What are the best open-source GitHub repositories for a routing gateway for multiple LLM providers?

21st-dev/1code is the closest match — 1code is an AI-assisted development environment that provides a unified interface for switching between multiple AI coding agents. It toggles between a read-only analysis mode and a full execution mode, asking clarifying questions, building structured plans with previews, and requiring user approval before making code changes. The environment integrates with external services and tools through the Model Context Protocol (MCP), enabling connections to databa…

Why does 21st-dev/1code match “a routing gateway for multiple LLM providers”?

1code is an AI-assisted development environment that provides a unified interface for switching between multiple AI coding agents. It toggles between a read-only analysis mode and a full execution mode, asking clarifying questions, building structured plans with previews, and requiring user approva…

Why does xtekky/gpt4free match “a routing gateway for multiple LLM providers”?

This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models…

Why does cft0808/edict match “a routing gateway for multiple LLM providers”?

Edict is a multi-agent orchestration system and framework designed to coordinate specialized large language model agents. It functions as a workflow designer and orchestrator that decomposes complex objectives into structured plans, using directed acyclic graphs and role-based hierarchies to execut…

Why does vllm-project/vllm match “a routing gateway for multiple LLM providers”?

vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built…

Why does ai4finance-foundation/finrobot match “a routing gateway for multiple LLM providers”?

FinRobot is an AI-powered financial analysis framework that coordinates multiple specialized agents to automate equity research, financial analysis, and investment risk assessment. At its core, it functions as a multi-agent orchestration system where a director and task manager allocate financial t…

LLM Gateway and Routing Layers

Open-source infrastructure for batching, load balancing, and routing requests across multiple large language model backends.

Find the best repos with AI.We'll search the best matching repositories with AI.

21st-dev/1code
21st-dev/1code
5,549View on GitHub
1code is an AI-assisted development environment that provides a unified interface for switching between multiple AI coding agents. It toggles between a read-only analysis mode and a full execution mode, asking clarifying questions, building structured plans with previews, and requiring user approval before making code changes. The environment integrates with external services and tools through the Model Context Protocol (MCP), enabling connections to databases, project management systems, and code repositories. Agent sessions can run either locally or in persistent cloud sandboxes that stay al
TypeScriptAI Coding AssistantsAgent Switching InterfacesAI Coding Agent Platforms
View on GitHub5,549
xtekky/gpt4free
xtekky/gpt4free
66,335View on GitHub
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, consistent programming interface. By abstracting provider-specific protocols and authentication requirements, the tool simplifies the development of applications that rely on external AI services. The platform distinguishes itself through a resilient request routing architecture d
PythonAI Request RoutersConversation ManagementFailover Strategies
View on GitHub66,335
cft0808/edict
cft0808/edict
16,123View on GitHub
Edict is a multi-agent orchestration system and framework designed to coordinate specialized large language model agents. It functions as a workflow designer and orchestrator that decomposes complex objectives into structured plans, using directed acyclic graphs and role-based hierarchies to execute sub-tasks. The system is distinguished by its event-driven architecture, utilizing a publish-subscribe event bus and transactional outbox to manage agent communications and task transitions. It features a dedicated skill management system that allows for the importation, updating, and sandboxed ex
PythonAutomation WorkflowsAgent Communication RestrictionsAgent Governance
View on GitHub16,123
vllm-project/vllm
vllm-project/vllm
83,048View on GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token generation speed and memory efficiency, enabling both large-scale cloud deployments and local execution on personal hardware. The project distinguishes itself through advanced memory management and request scheduling techniques, most notably its use of non-contiguous key-value cach
PythonContinuous Batching StrategiesCustom Model Execution EnginesDistributed Model Servers
View on GitHub83,048
ai4finance-foundation/finrobot
AI4Finance-Foundation/FinRobot
6,252View on GitHub
FinRobot is an AI-powered financial analysis framework that coordinates multiple specialized agents to automate equity research, financial analysis, and investment risk assessment. At its core, it functions as a multi-agent orchestration system where a director and task manager allocate financial tasks to the most suitable large language models based on performance metrics and task requirements. The framework distinguishes itself through its ability to execute complex multi-step financial workflows by routing tasks through perception, reasoning, and action modules. It generates professional e
Jupyter NotebookAgent-Based FrameworksAgent Workflow OrchestrationsEquity Research Engines
View on GitHub6,252
huggingface/transformers
huggingface/transformers
161,630View on GitHub
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
PythonAPI FrameworksByte Pair EncodingsHybrid
View on GitHub161,630
huggingface/chat-ui
huggingface/chat-ui
10,766View on GitHub
This project is a web-based user interface for interacting with large language models, featuring streaming responses and persistent conversation history. It functions as an orchestration gateway that directs user prompts to specific language models and acts as a Model Context Protocol client to execute external tools and incorporate live data into conversations. The application includes a routing layer that analyzes input signals and tool requirements to dynamically direct messages to the most appropriate specialized model. It also provides customization settings for brand identity, allowing
TypeScriptWeb Chat InterfacesClient Side RenderingConversation History Management
View on GitHub10,766
hpcaitech/colossalai
hpcaitech/ColossalAI
41,395View on GitHub
ColossalAI is a distributed deep learning framework designed for training and deploying massive artificial intelligence models across clusters of hardware accelerators. It functions as a parallel computing engine that partitions model workloads and data across multiple processors to maximize memory efficiency and throughput. The platform distinguishes itself through a comprehensive suite of parallelization strategies, including multi-dimensional tensor parallelism and pipeline-based model parallelism, which segment neural network layers and stages across devices. To support large-scale genera
PythonDistributed Deep Learning FrameworksDistributed Training OrchestratorsLarge-Scale Model Training
View on GitHub41,395
affaan-m/ecc
affaan-m/ECC
221,981View on GitHub
ECC is an LLM agent orchestration framework and cross-platform AI tooling suite designed to coordinate multi-model workflows. It provides a system for managing specialized agent roles, reusable skills, and structured planning to execute complex software development tasks across different AI-powered code editors. The project distinguishes itself as a Model Context Protocol manager, providing a configuration layer to integrate external servers and audit tool execution. It further implements an agentic security sandbox that restricts sensitive file access and scans for secret leakage to secure a
JavaScriptAI Agent OrchestratorsAgent Access ControlsAgent Action Guardrails
View on GitHub221,981
berriai/litellm
BerriAI/litellm
50,579View on GitHub
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balanc
PythonModel GatewaysModel Safety FiltersRequest Routers
View on GitHub50,579
pytorch/examples
pytorch/examples
23,752View on GitHub
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
PythonMachine Learning ImplementationsPython Machine Learning LibrariesDeep Learning Frameworks
View on GitHub23,752
meta-llama/llama
meta-llama/llama
59,464View on GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateles
PythonInference EnginesLarge Language Model RuntimesLocal Inference Engines
View on GitHub59,464
nousresearch/hermes-agent
NousResearch/hermes-agent
195,049View on GitHub
Hermes-agent is an autonomous AI agent framework and runtime designed to execute complex tasks and synthesize new skills from execution traces. It includes a provider-agnostic gateway for routing requests across multiple model backends and a serverless runtime that suspends idle agent instances and resumes them on demand across containers and virtual machines. The project provides a desktop automation toolset that controls native GUI workflows on Linux by querying accessibility APIs and injecting input events. It further distinguishes itself with the ability to generate procedural skills from
PythonAutonomous Agent FrameworksAutonomous Task ExecutionAccessibility Tree Automation
View on GitHub195,049
stanfordnlp/dspy
stanfordnlp/dspy
35,325View on GitHub
DSPy is a declarative programming framework designed for building complex language model applications. It treats model interactions as modular, composable programs, allowing developers to define task logic through typed class schemas rather than relying on manually written prompts. By organizing workflows into hierarchical, reusable Python objects, the framework enables the construction of sophisticated AI systems that manage state and execution flow independently. The framework distinguishes itself through an automated optimization engine that iteratively refines prompt instructions and few-
PythonDeclarative AI FrameworksAgentic Orchestration FrameworksAI Signature Definitions
View on GitHub35,325
zhayujie/chatgpt-on-wechat
zhayujie/chatgpt-on-wechat
45,353View on GitHub
This project is an autonomous agent framework designed to integrate large language models with popular messaging platforms. It functions as a middleware platform that enables automated, multimodal interactions by decomposing complex user goals into sequential plans, executing them through external tools, and maintaining persistent context across sessions. The framework distinguishes itself through a modular skill architecture and a hybrid memory system. Users can extend system capabilities by installing custom logic modules from community hubs or generating them through natural language. The
PythonAgent FrameworksAgent OrchestratorsAgent Memory Systems
View on GitHub45,353
jingyaogong/minimind
jingyaogong/minimind
51,834View on GitHub
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and adva
PythonModel Training ToolkitsAgentic FrameworksAgentic Training Frameworks
View on GitHub51,834
wangrongding/wechat-bot
wangrongding/wechat-bot
9,806View on GitHub
This project is a WeChat LLM bot framework and messaging gateway designed to connect WeChat accounts to language models for automated responses and group chat interactions. It functions as an orchestration layer that routes incoming messages to AI agents and returns generated responses to users. The system distinguishes itself through a provider-agnostic routing mechanism that distributes messages across various cloud-based and local language model services. It includes a command-line interface for managing login sessions, searching chat history, and sending messages, as well as a whitelist-b
JavaScriptMessaging Bot FrameworksWeChat AI AutomationAI Agent Integrations
View on GitHub9,806
mudler/localai
mudler/LocalAI
46,889View on GitHub
LocalAI is a self-hosted inference server that enables the execution of machine learning models directly on local hardware. By providing a unified interface for text, image, and audio processing, it allows users to maintain full control over data privacy and infrastructure costs while eliminating dependencies on external network services. The platform functions as an API gateway that mimics standard cloud-based artificial intelligence interfaces, allowing existing applications to integrate local models as drop-in replacements. It utilizes a container-based architecture to package runtimes and
GoInference ServersLocal Inference EnginesLocal Model Serving
View on GitHub46,889
vllm-project/semantic-router
vllm-project/semantic-router
3,205View on GitHub
GoInference GatewaysAdversarial Input DetectionAI Inference
View on GitHub3,205
zai-org/chatglm-6b
zai-org/ChatGLM-6B
41,039View on GitHub
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
PythonAutoregressive Inference EnginesLocal Inference EnginesModel Runtimes
View on GitHub41,039
openai/openai-agents-python
openai/openai-agents-python
27,191View on GitHub
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
PythonAgentic Workflow FrameworksMulti-Agent Orchestration FrameworksAgentic Workflow Orchestration
View on GitHub27,191
tensorflow/tfjs-examples
tensorflow/tfjs-examples
6,783View on GitHub
This repository provides a collection of practical demonstrations and implementation guides for machine learning tasks using TensorFlow.js. It serves as a resource for developers to explore model architectures, training workflows, and data manipulation techniques across domains such as computer vision, natural language processing, and reinforcement learning. The project covers the full lifecycle of machine learning development, including tensor-based mathematical operations, model construction via high-level layer APIs or low-level tensor logic, and model serialization for various storage med
JavaScriptManual Memory ManagementCore Model APIsModel Execution APIs
View on GitHub6,783
nanmicoder/cc-haha
NanmiCoder/cc-haha
12,675View on GitHub
cc-haha is a cross-platform desktop agent and computer use framework that enables large language models to control local operating systems through screenshots, clicks, and keystrokes. It functions as an AI coding workbench and orchestration platform, allowing for the management of multi-project workflows and the coordination of multiple agents executing complex tasks in parallel. The system includes a model backend gateway to connect various artificial intelligence providers and local models to autonomous agents. It features a centralized permission gate for authorizing sensitive commands and
TypeScriptComputer UseDesktop AI AgentsAgent Team Orchestration
View on GitHub12,675
deepspeedai/deepspeed
deepspeedai/DeepSpeed
42,528View on GitHub
DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization
PythonDistributed Memory OptimizersDistributed Training FrameworksDistributed Training Optimizers
View on GitHub42,528
nvidia/nemoclaw
NVIDIA/NemoClaw
21,237View on GitHub
NemoClaw is an LLM agent orchestrator and sandboxed execution environment designed to deploy and manage the lifecycles of large language model agents. It provides a secure runtime that isolates persistent agents from the underlying host system to ensure operational security. The system includes a secure LLM inference gateway that acts as a managed routing layer, securing communication between AI agents and inference engines to prevent unauthorized access. It also integrates with NVIDIA OpenShell to run specialized agents within a secure shell environment. Operational control is provided thro
TypeScriptAgent Runtime SandboxingAgent Execution EnvironmentsAgent Lifecycle Management
View on GitHub21,237
huggingface/open-r1
huggingface/open-r1
26,326View on GitHub
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test
PythonCode-Integrated Training FrameworksLarge Scale Training SuitesReasoning Model Training Suites
View on GitHub26,326
langroid/langroid
langroid/langroid
3,894View on GitHub
Langroid is a multi-agent orchestration framework and tool integration suite designed for building complex AI applications. It serves as a multi-modal integration layer that connects diverse local and remote language models with an agentic retrieval-augmented generation system. The project distinguishes itself through a collaborative message-exchange paradigm, allowing specialized agents to delegate tasks hierarchically and coordinate via structured communication. It features an advanced state management system for conversational AI, including the ability to rewind and prune conversation hist
PythonAgent OrchestrationHierarchical Task DelegationLanguage Model Integrations
View on GitHub3,894
exo-explore/exo
exo-explore/exo
45,380View on GitHub
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
PythonDistributed AI SystemsDistributed Inference EnginesInference Engines
View on GitHub45,380
langchain-ai/langchain
langchain-ai/langchain
139,458View on GitHub
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
PythonAgent Orchestration FrameworksLLM Application OrchestrationLLM Integration Layers
View on GitHub139,458
mintplex-labs/anything-llm
Mintplex-Labs/anything-llm
61,663View on GitHub
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that all data processing and model inference remain within private, local environments to maintain data sovereignty. The system distinguishes itself through a modular agentic engine that allows for the definition of custom skills and external tool execution. By utilizing a multi-model
JavaScriptAgentic Workflow EnginesAI Agent OrchestratorsDocument-Aware AI Workspaces
View on GitHub61,663
stangirard/quivr
StanGirard/quivr
39,167View on GitHub
Quivr is a framework for building retrieval-augmented generation pipelines that connect large language models to custom knowledge bases. It serves as a generative AI integration layer that abstracts the process of transforming diverse document sources into searchable context for AI responses. The project orchestrates the end-to-end flow between document ingestion, vector storage management, and model provider interfaces. It features a vector-store-agnostic retrieval system and a modular API layer that allows for flexible switching between different generative model providers. The system cove
PythonRetrieval Augmented GenerationIngestion PipelinesKnowledge Base Retrieval
View on GitHub39,167
mlabonne/llm-course
mlabonne/llm-course
80,178View on GitHub
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as the practical implementation of supervised instruction fine-tuning and preference-based model alignment. The repository distinguishes itself by providing a deep dive into advanced model composition and optimization techniques. It details methodologies for weight-space mode
AI Research RepositoriesAwesome ListFine-Tuning Strategies
View on GitHub80,178
chatchat-space/langchain-chatchat
chatchat-space/Langchain-Chatchat
38,211View on GitHub
Langchain-Chatchat is a system for building retrieval-augmented generation applications and autonomous AI agents. It integrates a knowledge base management system and an agent framework to enable language models to interact with private documents and execute multi-step tasks through external tools. The platform supports local deployment of language models on private infrastructure to operate without an internet connection. It includes a multimodal AI platform that combines vision models for image analysis with text-to-image generation capabilities. The system provides a web-based conversatio
PythonKnowledge Base RetrievalLocal Model DeploymentAI Agent Orchestrators
View on GitHub38,211
ggml-org/llama.cpp
ggml-org/llama.cpp
116,799View on GitHub
Llama.cpp is an inference engine designed for the local execution of text-based and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, utilizing hardware-accelerated backends to optimize performance across diverse CPU and GPU architectures. The project distinguishes itself by offering a lightweight HTTP server that adheres to standard API specifications, enabling chat completion, embeddings, and reranking services. It includes a suite of tools for model quantization and conversion, which reduces memory us
C++Hardware Abstraction LayersText-Only Inference EnginesMultimodal Inference Engines
View on GitHub116,799
hwchase17/langchainjs
hwchase17/langchainjs
17,822View on GitHub
LangChainJS is an AI agent orchestrator and application framework designed for building autonomous systems that use large language models to plan and execute tasks. It serves as an integration library that connects language models with tools, memory, and external data sources to create context-aware logic and complex workflows. The project provides a provider-agnostic interface and model provider abstraction, allowing applications to switch between different language model providers without rewriting core logic. It includes a toolkit for retrieval augmented generation, utilizing retrievers to
TypeScriptAutonomous Agent FrameworksAgentic Workflow OrchestrationAI Agent Orchestrators
View on GitHub17,822
chatgptnextweb/nextchat
ChatGPTNextWeb/NextChat
88,256View on GitHub
NextChat is a self-hosted web application that provides a unified interface for interacting with multiple large language models. It functions as a conversational platform where users can manage and switch between diverse AI providers through configurable API backends, maintaining full control over their data and infrastructure. The platform features a persistent session layer designed to handle long-running dialogues by managing message history and context. It distinguishes itself through a structured prompt engineering environment that allows for the development and application of templates
TypeScriptConversational State ManagersLanguage Model Interaction PatternsLLM Chat Interfaces
View on GitHub88,256
hkuds/autoagent
HKUDS/AutoAgent
8,583View on GitHub
AutoAgent is a multi-agent orchestrator and natural language workflow builder designed to connect multiple large language models with external API tools. It provides a framework for designing multi-step agent interactions and reasoning processes using plain text instead of manual code. The platform functions as a tool integration gateway, linking agents to third-party platforms and authenticated browser sessions. It enables the execution of complex analytical tasks and deep research by distributing work across collaborative agent frameworks and importing browser cookies to access restricted w
PythonAgentic Workflow OrchestrationNatural Language Workflow BuildersAgentic RAG Development
View on GitHub8,583
paddlepaddle/paddleocr
PaddlePaddle/PaddleOCR
82,412View on GitHub
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into independent, configurable stages. This architecture supports automated document digitization and multilingual text recognition, capable of identifying text in over one hundred languages across diverse environments ranging from scanned documents to industrial scenes. The framework disti
PythonModular Vision PipelinesMultilingual Text RecognitionDeep Learning
View on GitHub82,412
hwchase17/chat-langchain
hwchase17/chat-langchain
6,377View on GitHub
This project is a conversational assistant and retrieval-augmented generation system designed to provide technical answers from official documentation and support knowledge bases. It implements a retrieval architecture that routes queries through specialized tools and utilizes a model abstraction layer to switch between different chat and embedding providers without modifying core integration code. The system employs a graph-based state machine for durable agent execution, enabling state persistence and human-in-the-loop interactions. It features an agentic middleware framework that allows fo
TypeScriptLangChain-Based AgentsLangGraph OrchestrationsAgentic RAG Platforms
View on GitHub6,377
lightning-ai/pytorch-lightning
Lightning-AI/pytorch-lightning
31,201View on GitHub
PyTorch Lightning is a deep learning research framework that provides a structured environment for organizing machine learning code. It functions as a unified trainer orchestrator, centralizing the execution flow by managing the interaction between hardware resources, data loaders, and model components. By decoupling model architecture from training logic, the framework enables researchers to maintain clean, modular codebases that remain portable across different environments. The framework distinguishes itself through a hardware-agnostic abstraction layer that scales deep learning workloads
PythonDeep Learning FrameworksModular Training OrchestratorsTraining Orchestrators
View on GitHub31,201
flashinfer-ai/flashinfer
flashinfer-ai/flashinfer
4,996View on GitHub
FlashInfer is a library of high-performance GPU kernels purpose-built for accelerating large language model inference. It provides optimized implementations for attention operations (including flash attention, page attention, multi-head latent attention, and cascade attention) using paged key-value caches, fused kernel composition, and just-in-time compilation. The library also includes specialized kernels for mixture-of-experts layers, block-scaled low-precision quantization (FP8, FP4), and distributed collective communication. What distinguishes FlashInfer is its fused all-reduce communicat
PythonAll-Reduces with Residual, RMSNorm, and QuantizationAttention Kernel LibrariesAttention State Merging
View on GitHub4,996
microsoft/semantic-kernel
microsoft/semantic-kernel
27,262View on GitHub
Semantic Kernel is an artificial intelligence orchestration framework designed to integrate large language models with existing codebases. It functions as an agentic workflow engine, providing a standardized interface that connects generative models to traditional application logic, data sources, and external tools to automate complex, multi-step business tasks. The platform distinguishes itself through a modular plugin architecture and a planner-based reasoning engine that decomposes high-level goals into executable sequences of functions. By utilizing a connector-based abstraction layer, it
C#Agent Orchestration FrameworksAI Orchestration FrameworksModel Abstraction Layers
View on GitHub27,262
567-labs/instructor
567-labs/instructor
13,176View on GitHub
Instructor is a framework designed for structured data extraction, validation, and language model integration. It functions as a library that transforms unstructured text into validated, type-safe objects by leveraging schema definitions and model-specific tool-calling capabilities. By acting as a validation middleware, the project ensures that language model outputs strictly conform to defined data structures. The library distinguishes itself through a robust validation-based retry loop that automatically re-submits failed responses with error feedback to iteratively correct schema complianc
PythonStructured Data ExtractionStructured Output ParsersLLM Integration Frameworks
View on GitHub13,176
haotian-liu/llava
haotian-liu/LLaVA
24,465View on GitHub
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
PythonMultimodal Large Language ModelsVision-Language PipelinesVisual Instruction Tuning
View on GitHub24,465
memorilabs/memori
MemoriLabs/Memori
15,358View on GitHub
Memori is an AI agent memory middleware platform designed to provide persistent, context-aware recall for language models. It functions as a non-intrusive layer that intercepts outbound model requests to automatically capture interaction history and execution traces, ensuring that agents maintain continuity across sessions without requiring modifications to existing application logic. The platform distinguishes itself through a dual-model storage architecture that maintains information as both structured relational primitives for precise fact retrieval and rolling narrative summaries for situ
PythonAwesome ListAgent Memory PersistenceAgent Memory Stores
View on GitHub15,358
nomic-ai/gpt4all
nomic-ai/gpt4all
77,375View on GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vect
C++C++ Inference BackendsLanguage Model OrchestrationLocal AI Inference
View on GitHub77,375
pipecat-ai/pipecat
pipecat-ai/pipecat
12,846View on GitHub
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue manag
PythonData Flow OrchestratorsMultimodal AI OrchestratorsMultimodal Service Orchestration
View on GitHub12,846
hiyouga/llamafactory
hiyouga/LlamaFactory
72,213View on GitHub
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experim
PythonExperiment TrackingLanguage Model Fine-TuningLarge Language Model Fine-Tuning Frameworks
View on GitHub72,213

LLM Gateway and Routing Layers

21st-dev/1code

xtekky/gpt4free

cft0808/edict

vllm-project/vllm

AI4Finance-Foundation/FinRobot

huggingface/transformers

huggingface/chat-ui

hpcaitech/ColossalAI

affaan-m/ECC

BerriAI/litellm

pytorch/examples

meta-llama/llama

NousResearch/hermes-agent

stanfordnlp/dspy

zhayujie/chatgpt-on-wechat

jingyaogong/minimind

wangrongding/wechat-bot

mudler/LocalAI

vllm-project/semantic-router

zai-org/ChatGLM-6B

openai/openai-agents-python

tensorflow/tfjs-examples

NanmiCoder/cc-haha

deepspeedai/DeepSpeed

NVIDIA/NemoClaw

huggingface/open-r1

langroid/langroid

exo-explore/exo

langchain-ai/langchain

Mintplex-Labs/anything-llm

StanGirard/quivr

mlabonne/llm-course

chatchat-space/Langchain-Chatchat

ggml-org/llama.cpp

hwchase17/langchainjs

ChatGPTNextWeb/NextChat

HKUDS/AutoAgent

PaddlePaddle/PaddleOCR

hwchase17/chat-langchain

Lightning-AI/pytorch-lightning

flashinfer-ai/flashinfer

microsoft/semantic-kernel

567-labs/instructor

haotian-liu/LLaVA

MemoriLabs/Memori

nomic-ai/gpt4all

pipecat-ai/pipecat

hiyouga/LlamaFactory