76 repos

Awesome GitHub Repositories: Infrastructure

Foundational systems and hardware-level tools required to support the development, deployment, and scaling of machine learning workflows.

Explore 76 awesome GitHub repositories matching Artificial Intelligence & ML · Infrastructure. Refine with filters or upvote what's useful.


infiniflow/ragflow
73,425 stars on GitHub
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin
Processes unstructured data using deep document understanding to extract structured knowledge for high-quality information retrieval.
Python · agent · agentic · agentic-ai
redis/redis
73,096 stars on GitHub
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr
Accelerates machine learning workflows by serving pre-computed features directly from high-speed memory.
C · cache · caching · database
awesomedata/awesome-public-datasets
72,846 stars on GitHub
This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, t
Supplies a diverse collection of labeled datasets essential for training, validating, and benchmarking predictive models.
aaron-swartz · awesome-public-datasets · datasets
twitter/the-algorithm
72,764 stars on GitHub
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver
Deploys predictive models to score content relevance and user engagement probabilities in real time.
Scala
tesseract-ocr/tesseract
72,460 stars on GitHub
Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d
Refines recognition accuracy by applying document-specific image and language models tailored to varying typefaces and vocabularies.
C++ · hacktoberfest · lstm · machine-learning
CompVis/stable-diffusion
72,380 stars on GitHub
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of im
Coordinates model loading, hardware acceleration, and output processing to streamline production-ready inference.
Jupyter Notebook
josephmisiti/awesome-machine-learning
71,702 stars on GitHub
This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco
Presents interactive graphical interfaces for automated model generation and exploratory statistical data mining.
Python
PaddlePaddle/PaddleOCR
70,931 stars on GitHub
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen
Abstracts execution logic to allow seamless model operation across diverse CPU, GPU, and mobile hardware backends.
Python · ai4science · chineseocr · document-parsing
vllm-project/vllm
70,745 stars on GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Compresses large neural networks to reduce memory footprint while maintaining performance on resource-constrained hardware.
Python · amd · blackwell · cuda
dair-ai/Prompt-Engineering-Guide
70,526 stars on GitHub
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Lists benchmarks and evaluation metrics for assessing how well models recall information across large context windows.
MDX · agent · agents · ai-agents
binary-husky/gpt_academic
70,112 stars on GitHub
This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th
Manages and serves large language models on local hardware through a unified, web-based interface.
Python · academic · chatglm-6b · chatgpt
OpenHands/OpenHands
67,974 stars on GitHub
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system
Routes requests to different language models based on performance, cost, or capability requirements to optimize agent execution.
Python · agent · artificial-intelligence · chatgpt
hiyouga/LlamaFactory
67,386 stars on GitHub
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro
Simplifies complex model refinement by offering a unified interface for both full-parameter and efficient training methods.
Python · agent · ai · deepseek
xtekky/gpt4free
65,720 stars on GitHub
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co
Routes incoming requests through a unified interface that dynamically selects and cycles between multiple third-party service providers.
Python · chatbot · chatbots · chatgpt
scikit-learn/scikit-learn
65,178 stars on GitHub
Scikit-learn is a machine learning library for predictive data analysis that provides a collection of algorithms for supervised and unsupervised learning. It functions as a comprehensive toolkit for data preprocessing, dimensionality reduction, and model selection, allowing users to classify data objects, predict conti
Compares algorithm configurations and tunes hyperparameters to identify the most accurate approach for specific predictive tasks.
Python · data-analysis · data-science · machine-learning
FoundationAgents/MetaGPT
64,304 stars on GitHub
MetaGPT is an agentic workflow engine and multi-agent orchestration framework designed to automate complex software engineering and data analysis tasks. It functions as an automated software factory that transforms high-level natural language requirements into functional web applications, technical documentation, and p
Generates full-stack web applications by delegating planning, design, frontend, backend, and deployment tasks to an automated team of agents.
Python · agent · gpt · llm
keras-team/keras
63,858 stars on GitHub
Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a di
Coordinates automated workflows for training loops, batch processing, and validation dataset management.
Python · data-science · deep-learning · jax
openinterpreter/open-interpreter
62,257 stars on GitHub
Open Interpreter is an autonomous agent runtime that translates natural language instructions into executable code to interact with local software and operating systems. It functions as an orchestration framework that connects language models to a secure execution environment, enabling the development of agents capable
Standardizes requests and responses across multiple language models to facilitate consistent inference and streaming.
Python · chatgpt · gpt-4 · interpreter
traefik/traefik
61,814 stars on GitHub
Traefik is a cloud-native edge router and API gateway designed to manage service communication and traffic flow across distributed infrastructure. It functions as a dynamic service proxy that automatically discovers backend services and configures routing rules in real time, eliminating the need for manual restarts or
Caches model responses based on query semantics to minimize redundant computation and lower inference latency.
Go · consul · docker · etcd
pathwaycom/pathway
59,684 stars on GitHub
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with
Connects live data streams to language models for instant, context-aware content generation and analysis.
Python · batch-processing · data-analytics · data-pipelines

