awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Infrastructure · Awesome GitHub Repositories

76 repos


Foundational systems and hardware-level tools required to support the development, deployment, and scaling of machine learning workflows.

Explore 76 awesome GitHub repositories matching Artificial Intelligence & ML · Infrastructure.


Awesome Infrastructure GitHub Repositories

  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Processes unstructured data using deep document understanding to extract structured knowledge for high-quality information retrieval.

    Python · agent · agentic · agentic-ai
  • redis/redis

    73,096 · GitHub

    Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr

    Accelerates machine learning workflows by serving pre-computed features directly from high-speed memory.

    C · cache · caching · database
  • awesomedata/awesome-public-datasets

    72,846 · GitHub

    This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, t

    Supplies a diverse collection of labeled datasets essential for training, validating, and benchmarking predictive models.

    aaron-swartz · awesome-public-datasets · datasets
  • twitter/the-algorithm

    72,764 · GitHub

    The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver

    Deploys predictive models to score content relevance and user engagement probabilities in real-time.

    Scala
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    Refines recognition accuracy by applying document-specific image and language models tailored to varying typefaces and vocabularies.

    C++ · hacktoberfest · lstm · machine-learning
  • CompVis/stable-diffusion

    72,380 · GitHub

    Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of im

    Coordinates model loading, hardware acceleration, and output processing to streamline production-ready inference.

    Jupyter Notebook
  • josephmisiti/awesome-machine-learning

    71,702 · GitHub

    This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco

    Presents interactive graphical interfaces for automated model generation and exploratory statistical data mining.

    Python
  • PaddlePaddle/PaddleOCR

    70,931 · GitHub

    PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen

    Abstracts execution logic to allow seamless model operation across diverse CPU, GPU, and mobile hardware backends.

    Python · ai4science · chineseocr · document-parsing
  • vllm-project/vllm

    70,745 · GitHub

    vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen

    Compresses large neural networks to reduce memory footprint while maintaining performance on resource-constrained hardware.

    Python · amd · blackwell · cuda
  • dair-ai/Prompt-Engineering-Guide

    70,526 · GitHub

    This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task

    Lists benchmarks and evaluation metrics for assessing how well models recall information across large context windows.

    MDX · agent · agents · ai-agents
  • binary-husky/gpt_academic

    70,112 · GitHub

    This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th

    Manages and serves large language models on local hardware through a unified, web-based interface.

    Python · academic · chatglm-6b · chatgpt
  • OpenHands/OpenHands

    67,974 · GitHub

    OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system

    Routes requests to different language models based on performance, cost, or capability requirements to optimize agent execution.

    Python · agent · artificial-intelligence · chatgpt
  • hiyouga/LlamaFactory

    67,386 · GitHub

    LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro

    Simplifies complex model refinement by offering a unified interface for both full-parameter and efficient training methods.

    Python · agent · ai · deepseek
  • xtekky/gpt4free

    65,720 · GitHub

    This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co

    Routes incoming requests through a unified interface that dynamically selects and cycles between multiple third-party service providers.

    Python · chatbot · chatbots · chatgpt
  • scikit-learn/scikit-learn

    65,178 · GitHub

    Scikit-learn is a machine learning library for predictive data analysis that provides a collection of algorithms for supervised and unsupervised learning. It functions as a comprehensive toolkit for data preprocessing, dimensionality reduction, and model selection, allowing users to classify data objects, predict conti

    Compares algorithm configurations and tunes hyperparameters to identify the most accurate approach for specific predictive tasks.

    Python · data-analysis · data-science · machine-learning
  • FoundationAgents/MetaGPT

    64,304 · GitHub

    MetaGPT is an agentic workflow engine and multi-agent orchestration framework designed to automate complex software engineering and data analysis tasks. It functions as an automated software factory that transforms high-level natural language requirements into functional web applications, technical documentation, and p

    Generates full-stack web applications by delegating planning, design, frontend, backend, and deployment tasks to an automated team of agents.

    Python · agent · gpt · llm
  • keras-team/keras

    63,858 · GitHub

    Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a di

    Coordinates automated workflows for training loops, batch processing, and validation dataset management.

    Python · data-science · deep-learning · jax
  • openinterpreter/open-interpreter

    62,257 · GitHub

    Open Interpreter is an autonomous agent runtime that translates natural language instructions into executable code to interact with local software and operating systems. It functions as an orchestration framework that connects language models to a secure execution environment, enabling the development of agents capable

    Standardizes requests and responses across multiple language models to facilitate consistent inference and streaming.

    Python · chatgpt · gpt-4 · interpreter
  • traefik/traefik

    61,814 · GitHub

    Traefik is a cloud-native edge router and API gateway designed to manage service communication and traffic flow across distributed infrastructure. It functions as a dynamic service proxy that automatically discovers backend services and configures routing rules in real time, eliminating the need for manual restarts or

    Caches model responses based on query semantics to minimize redundant computation and lower inference latency.

    Go · consul · docker · etcd
  • pathwaycom/pathway

    59,684 · GitHub

    Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with

    Connects live data streams to language models for instant, context-aware content generation and analysis.

    Python · batch-processing · data-analytics · data-pipelines

Explore sub-tags

  • Data Ingestion and Preparation · 4 sub-tags · Tools focused on the initial stages of the pipeline, including loading, formatting, and augmenting raw data for model consumption.
  • Dataset Management · 2 sub-tags
  • Deployment & Serving · 8 sub-tags
  • Domain-Specific Processing Pipelines · 3 sub-tags · Specialized pipelines tailored for specific data modalities like media synthesis or real-time streaming inference.
  • Evaluation & Validation · 14 sub-tags
  • Integrated Development Platforms · 1 sub-tag · Comprehensive environments that bundle tools for the end-to-end lifecycle, including development, management, and operational workflows.
  • Machine Learning Training · 8 sub-tags · Frameworks and utilities used to train, fine-tune, and align machine learning models with specific objectives.
  • Model Evaluation and Analysis · 7 sub-tags · Tools and frameworks for measuring, benchmarking, and monitoring the performance and quality of machine learning models.
  • Model Inference and Serving · 7 sub-tags · Platforms and techniques for deploying, optimizing, and serving machine learning models for production use.
  • Model Management · 9 sub-tags · Tools and interfaces for organizing, loading, and executing machine learning models throughout their operational lifecycle.
  • Model Optimization · 5 sub-tags · Techniques and utilities designed to improve model performance, reduce resource consumption, and refine parameters for specific deployment environments.
  • Optimization & Inference · 8 sub-tags
  • Training & Tuning · 12 sub-tags
  • Training Monitoring & Profiling · 3 sub-tags