Open-source infrastructure for batching, load balancing, and routing requests across multiple large language model backends.
1code is an AI-assisted development environment that provides a unified interface for switching between multiple AI coding agents. It toggles between a read-only analysis mode and a full execution mode, asking clarifying questions, building structured plans with previews, and requiring user approval before making code changes. The environment integrates with external services and tools through the Model Context Protocol (MCP), enabling connections to databases, project management systems, and code repositories. Agent sessions can run either locally or in persistent cloud sandboxes that stay al
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, consistent programming interface. By abstracting provider-specific protocols and authentication requirements, the tool simplifies the development of applications that rely on external AI services. The platform distinguishes itself through a resilient request routing architecture d
Edict is a multi-agent orchestration system and framework designed to coordinate specialized large language model agents. It functions as a workflow designer and orchestrator that decomposes complex objectives into structured plans, using directed acyclic graphs and role-based hierarchies to execute sub-tasks. The system is distinguished by its event-driven architecture, utilizing a publish-subscribe event bus and transactional outbox to manage agent communications and task transitions. It features a dedicated skill management system that allows for the importation, updating, and sandboxed ex
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token generation speed and memory efficiency, enabling both large-scale cloud deployments and local execution on personal hardware. The project distinguishes itself through advanced memory management and request scheduling techniques, most notably its use of non-contiguous key-value cach
FinRobot is an AI-powered financial analysis framework that coordinates multiple specialized agents to automate equity research, financial analysis, and investment risk assessment. At its core, it functions as a multi-agent orchestration system where a director and task manager allocate financial tasks to the most suitable large language models based on performance metrics and task requirements. The framework distinguishes itself through its ability to execute complex multi-step financial workflows by routing tasks through perception, reasoning, and action modules. It generates professional e
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
This project is a web-based user interface for interacting with large language models, featuring streaming responses and persistent conversation history. It functions as an orchestration gateway that directs user prompts to specific language models and acts as a Model Context Protocol client to execute external tools and incorporate live data into conversations. The application includes a routing layer that analyzes input signals and tool requirements to dynamically direct messages to the most appropriate specialized model. It also provides customization settings for brand identity, allowing
ColossalAI is a distributed deep learning framework designed for training and deploying massive artificial intelligence models across clusters of hardware accelerators. It functions as a parallel computing engine that partitions model workloads and data across multiple processors to maximize memory efficiency and throughput. The platform distinguishes itself through a comprehensive suite of parallelization strategies, including multi-dimensional tensor parallelism and pipeline-based model parallelism, which segment neural network layers and stages across devices. To support large-scale genera
ECC is an LLM agent orchestration framework and cross-platform AI tooling suite designed to coordinate multi-model workflows. It provides a system for managing specialized agent roles, reusable skills, and structured planning to execute complex software development tasks across different AI-powered code editors. The project distinguishes itself as a Model Context Protocol manager, providing a configuration layer to integrate external servers and audit tool execution. It further implements an agentic security sandbox that restricts sensitive file access and scans for secret leakage to secure a
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balanc
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on your own hardware. The system distinguishes itself through specialized memory and computation management techniques, including memory-mapped weight loading and quantization-aware inference, which allow for efficient execution on standard consumer hardware. It utilizes a stateles
Hermes-agent is an autonomous AI agent framework and runtime designed to execute complex tasks and synthesize new skills from execution traces. It includes a provider-agnostic gateway for routing requests across multiple model backends and a serverless runtime that suspends idle agent instances and resumes them on demand across containers and virtual machines. The project provides a desktop automation toolset that controls native GUI workflows on Linux by querying accessibility APIs and injecting input events. It further distinguishes itself with the ability to generate procedural skills from
DSPy is a declarative programming framework designed for building complex language model applications. It treats model interactions as modular, composable programs, allowing developers to define task logic through typed class schemas rather than relying on manually written prompts. By organizing workflows into hierarchical, reusable Python objects, the framework enables the construction of sophisticated AI systems that manage state and execution flow independently. The framework distinguishes itself through an automated optimization engine that iteratively refines prompt instructions and few-
This project is an autonomous agent framework designed to integrate large language models with popular messaging platforms. It functions as a middleware platform that enables automated, multimodal interactions by decomposing complex user goals into sequential plans, executing them through external tools, and maintaining persistent context across sessions. The framework distinguishes itself through a modular skill architecture and a hybrid memory system. Users can extend system capabilities by installing custom logic modules from community hubs or generating them through natural language. The
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and adva
This project is a WeChat LLM bot framework and messaging gateway designed to connect WeChat accounts to language models for automated responses and group chat interactions. It functions as an orchestration layer that routes incoming messages to AI agents and returns generated responses to users. The system distinguishes itself through a provider-agnostic routing mechanism that distributes messages across various cloud-based and local language model services. It includes a command-line interface for managing login sessions, searching chat history, and sending messages, as well as a whitelist-b
LocalAI is a self-hosted inference server that enables the execution of machine learning models directly on local hardware. By providing a unified interface for text, image, and audio processing, it allows users to maintain full control over data privacy and infrastructure costs while eliminating dependencies on external network services. The platform functions as an API gateway that mimics standard cloud-based artificial intelligence interfaces, allowing existing applications to integrate local models as drop-in replacements. It utilizes a container-based architecture to package runtimes and
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
This repository provides a collection of practical demonstrations and implementation guides for machine learning tasks using TensorFlow.js. It serves as a resource for developers to explore model architectures, training workflows, and data manipulation techniques across domains such as computer vision, natural language processing, and reinforcement learning. The project covers the full lifecycle of machine learning development, including tensor-based mathematical operations, model construction via high-level layer APIs or low-level tensor logic, and model serialization for various storage med
cc-haha is a cross-platform desktop agent and computer use framework that enables large language models to control local operating systems through screenshots, clicks, and keystrokes. It functions as an AI coding workbench and orchestration platform, allowing for the management of multi-project workflows and the coordination of multiple agents executing complex tasks in parallel. The system includes a model backend gateway to connect various artificial intelligence providers and local models to autonomous agents. It features a centralized permission gate for authorizing sensitive commands and
DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization
NemoClaw is an LLM agent orchestrator and sandboxed execution environment designed to deploy and manage the lifecycles of large language model agents. It provides a secure runtime that isolates persistent agents from the underlying host system to ensure operational security. The system includes a secure LLM inference gateway that acts as a managed routing layer, securing communication between AI agents and inference engines to prevent unauthorized access. It also integrates with NVIDIA OpenShell to run specialized agents within a secure shell environment. Operational control is provided thro
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test
Langroid is a multi-agent orchestration framework and tool integration suite designed for building complex AI applications. It serves as a multi-modal integration layer that connects diverse local and remote language models with an agentic retrieval-augmented generation system. The project distinguishes itself through a collaborative message-exchange paradigm, allowing specialized agents to delegate tasks hierarchically and coordinate via structured communication. It features an advanced state management system for conversational AI, including the ability to rewind and prune conversation hist
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that all data processing and model inference remain within private, local environments to maintain data sovereignty. The system distinguishes itself through a modular agentic engine that allows for the definition of custom skills and external tool execution. By utilizing a multi-model
Quivr is a framework for building retrieval-augmented generation pipelines that connect large language models to custom knowledge bases. It serves as a generative AI integration layer that abstracts the process of transforming diverse document sources into searchable context for AI responses. The project orchestrates the end-to-end flow between document ingestion, vector storage management, and model provider interfaces. It features a vector-store-agnostic retrieval system and a modular API layer that allows for flexible switching between different generative model providers. The system cove
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as the practical implementation of supervised instruction fine-tuning and preference-based model alignment. The repository distinguishes itself by providing a deep dive into advanced model composition and optimization techniques. It details methodologies for weight-space mode
Langchain-Chatchat is a system for building retrieval-augmented generation applications and autonomous AI agents. It integrates a knowledge base management system and an agent framework to enable language models to interact with private documents and execute multi-step tasks through external tools. The platform supports local deployment of language models on private infrastructure to operate without an internet connection. It includes a multimodal AI platform that combines vision models for image analysis with text-to-image generation capabilities. The system provides a web-based conversatio
Llama.cpp is an inference engine designed for the local execution of text-based and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, utilizing hardware-accelerated backends to optimize performance across diverse CPU and GPU architectures. The project distinguishes itself by offering a lightweight HTTP server that adheres to standard API specifications, enabling chat completion, embeddings, and reranking services. It includes a suite of tools for model quantization and conversion, which reduces memory us
LangChainJS is an AI agent orchestrator and application framework designed for building autonomous systems that use large language models to plan and execute tasks. It serves as an integration library that connects language models with tools, memory, and external data sources to create context-aware logic and complex workflows. The project provides a provider-agnostic interface and model provider abstraction, allowing applications to switch between different language model providers without rewriting core logic. It includes a toolkit for retrieval augmented generation, utilizing retrievers to
NextChat is a self-hosted web application that provides a unified interface for interacting with multiple large language models. It functions as a conversational platform where users can manage and switch between diverse AI providers through configurable API backends, maintaining full control over their data and infrastructure. The platform features a persistent session layer designed to handle long-running dialogues by managing message history and context. It distinguishes itself through a structured prompt engineering environment that allows for the development and application of templates
AutoAgent is a multi-agent orchestrator and natural language workflow builder designed to connect multiple large language models with external API tools. It provides a framework for designing multi-step agent interactions and reasoning processes using plain text instead of manual code. The platform functions as a tool integration gateway, linking agents to third-party platforms and authenticated browser sessions. It enables the execution of complex analytical tasks and deep research by distributing work across collaborative agent frameworks and importing browser cookies to access restricted w
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into independent, configurable stages. This architecture supports automated document digitization and multilingual text recognition, capable of identifying text in over one hundred languages across diverse environments ranging from scanned documents to industrial scenes. The framework disti
This project is a conversational assistant and retrieval-augmented generation system designed to provide technical answers from official documentation and support knowledge bases. It implements a retrieval architecture that routes queries through specialized tools and utilizes a model abstraction layer to switch between different chat and embedding providers without modifying core integration code. The system employs a graph-based state machine for durable agent execution, enabling state persistence and human-in-the-loop interactions. It features an agentic middleware framework that allows fo
PyTorch Lightning is a deep learning research framework that provides a structured environment for organizing machine learning code. It functions as a unified trainer orchestrator, centralizing the execution flow by managing the interaction between hardware resources, data loaders, and model components. By decoupling model architecture from training logic, the framework enables researchers to maintain clean, modular codebases that remain portable across different environments. The framework distinguishes itself through a hardware-agnostic abstraction layer that scales deep learning workloads
FlashInfer is a library of high-performance GPU kernels purpose-built for accelerating large language model inference. It provides optimized implementations for attention operations (including flash attention, page attention, multi-head latent attention, and cascade attention) using paged key-value caches, fused kernel composition, and just-in-time compilation. The library also includes specialized kernels for mixture-of-experts layers, block-scaled low-precision quantization (FP8, FP4), and distributed collective communication. What distinguishes FlashInfer is its fused all-reduce communicat
Semantic Kernel is an artificial intelligence orchestration framework designed to integrate large language models with existing codebases. It functions as an agentic workflow engine, providing a standardized interface that connects generative models to traditional application logic, data sources, and external tools to automate complex, multi-step business tasks. The platform distinguishes itself through a modular plugin architecture and a planner-based reasoning engine that decomposes high-level goals into executable sequences of functions. By utilizing a connector-based abstraction layer, it
Instructor is a framework designed for structured data extraction, validation, and language model integration. It functions as a library that transforms unstructured text into validated, type-safe objects by leveraging schema definitions and model-specific tool-calling capabilities. By acting as a validation middleware, the project ensures that language model outputs strictly conform to defined data structures. The library distinguishes itself through a robust validation-based retry loop that automatically re-submits failed responses with error feedback to iteratively correct schema complianc
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
Memori is an AI agent memory middleware platform designed to provide persistent, context-aware recall for language models. It functions as a non-intrusive layer that intercepts outbound model requests to automatically capture interaction history and execution traces, ensuring that agents maintain continuity across sessions without requiring modifications to existing application logic. The platform distinguishes itself through a dual-model storage architecture that maintains information as both structured relational primitives for precise fact retrieval and rolling narrative summaries for situ
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vect
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue manag
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experim