The visitor is looking for an API gateway that aggregates and manages multiple LLM providers and local model endpoints under a single unified interface.

wei-shaw/claude-relay-service is the closest match: this repository is a dedicated LLM proxy gateway that provides multi-provider routing, load balancing, usage analytics, and rate limiting, making it a comprehensive solution for managing AI model endpoints. Other strong matches: mnfst/manifest, berriai/litellm, quantumnous/new-api, alibaba/higress.

Why does wei-shaw/claude-relay-service match “a gateway that unifies LLM provider APIs”?

This repository is a dedicated LLM proxy gateway that provides multi-provider routing, load balancing, usage analytics, and rate limiting, making it a comprehensive solution for managing AI model endpoints.

Why does mnfst/manifest match “a gateway that unifies LLM provider APIs”?

Manifest is a comprehensive LLM gateway that provides unified access to multiple providers, local model integration, request routing, and robust usage analytics, making it a direct fit for your requirements.

Why does berriai/litellm match “a gateway that unifies LLM provider APIs”?

LiteLLM is a comprehensive API gateway that provides a unified interface for over one hundred LLM providers, featuring robust support for load balancing, API key management, usage analytics, and rate limiting.

Why does quantumnous/new-api match “a gateway that unifies LLM provider APIs”?

This project is a comprehensive LLM API gateway that provides a unified interface for multiple providers, featuring robust load balancing, token usage tracking, and granular access control to manage diverse AI model endpoints.

Why does alibaba/higress match “a gateway that unifies LLM provider APIs”?

Higress is a cloud-native API gateway specifically designed to aggregate and manage traffic for multiple LLM providers and AI agents, offering the requested load balancing, rate limiting, and usage optimization features in a unified interface.

Unified LLM API Gateways

Open-source proxies that aggregate and manage requests across OpenAI, Anthropic, and local language models.


Wei-Shaw/claude-relay-service
12,114
This project is a secure intermediary proxy gateway for large language model APIs. It functions as a relay service that forwards requests to AI providers while managing service accounts and routing traffic. The service provides a compatibility layer that supports multiple endpoint formats, allowing different third-party AI clients to communicate with a single provider. It distinguishes itself through a service account management system that assigns individual proxy settings to multiple accounts to prevent IP bans and distributes traffic via load balancing to avoid rate limits. The system includes a rate limiter that restricts access based on token volume, concurrency, and custom identification keys. It monitors usage through a tracking system that records token consumption and request metrics per user. Reliability is maintained through a circuit-breaker mechanism that detects upstream connection failures and pauses routing to affected accounts using cooldown timers.
This repository is a dedicated LLM proxy gateway that provides multi-provider routing, load balancing, usage analytics, and rate limiting, making it a comprehensive solution for managing AI model endpoints.
JavaScript, LLM Gateways, Token Usage Analytics
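The circuit-breaker behaviour described for this relay (pause routing to a failing account, then resume after a cooldown) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `AccountBreaker`, `failure_threshold`, and `cooldown_seconds` are hypothetical names, and the clock is injected so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class AccountBreaker:
    """Tracks upstream failures for one account and pauses routing to it."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # hypothetical default
        self.cooldown_seconds = cooldown_seconds    # hypothetical default
        self.clock = clock
        self.failures = 0
        self.paused_until: Optional[float] = None

    def available(self) -> bool:
        """Return True if requests may be routed to this account right now."""
        if self.paused_until is None:
            return True
        if self.clock() >= self.paused_until:
            # Cooldown elapsed: allow traffic again and reset the counter.
            self.paused_until = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Too many consecutive failures: start the cooldown timer.
            self.paused_until = self.clock() + self.cooldown_seconds
```

A router would call `available()` before picking an account and report each outcome with `record_success()` or `record_failure()`.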
mnfst/manifest
7,022
Manifest is a language model provider unification system that standardizes access to multiple AI backends through a single interface. It functions as a centralized management layer for integrating various cloud-based and local model providers to simplify how applications request completions. The system provides intelligent model routing and high availability infrastructure by directing queries based on complexity and automatically triggering model fallbacks when a primary provider fails. It distinguishes itself through multi-tenant AI management, organizing agents into isolated groups with dedicated keys for authentication and telemetry. The project covers AI cost management and observability by tracking token usage, monitoring expenditures per request, and enforcing budget limits. These capabilities are supported by daily synchronization of model pricing from external sources and the tracking of performance metrics across agents. The system can be deployed as a containerized image using Docker to simplify self-hosted administration.
Manifest is a comprehensive LLM gateway that provides unified access to multiple providers, local model integration, request routing, and robust usage analytics, making it a direct fit for your requirements.
TypeScript, Token Usage Analytics, AI Cost Monitoring, Model Routing
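Routing by query complexity with automatic fallback, as described for Manifest, might look something like the sketch below. The tier names, model names, and the word-count heuristic are all assumptions made for illustration; the project's real scoring and routing rules are not shown in this listing.

```python
from typing import Callable, Dict, List

# Hypothetical tiers: each maps to an ordered list of models to try.
ROUTES: Dict[str, List[str]] = {
    "simple": ["small-model", "medium-model"],
    "complex": ["large-model", "medium-model"],
}


def classify(prompt: str, word_limit: int = 50) -> str:
    """Toy complexity score based on word count only."""
    return "simple" if len(prompt.split()) <= word_limit else "complex"


def complete(prompt: str, call_model: Callable[[str, str], str]) -> str:
    """Try each model for the prompt's tier in order; fall back on failure."""
    errors = []
    for model in ROUTES[classify(prompt)]:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # any provider error triggers the fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Here `call_model` stands in for whatever client actually sends the request, so the routing logic stays independent of any particular provider.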
BerriAI/litellm
50,579
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balancing, and automatic fallbacks without requiring code changes. It incorporates a robust security and compliance layer that enforces content moderation, secret redaction, and fine-grained access control. Additionally, it supports complex operational requirements such as semantic routing, rule-based complexity scoring, and persistent virtual key management for multi-tenant environments. Beyond core routing, the project provides comprehensive governance and observability tools to monitor usage, track spending, and log request metadata across teams. It includes an integrated software development kit for tool calling and agent orchestration, alongside support for advanced features like response caching, batch processing, and structured output configuration. The system is designed for enterprise-wide deployment, offering features for audit logging, single sign-on integration, and granular cost reporting.
LiteLLM is a comprehensive API gateway that provides a unified interface for over one hundred LLM providers, featuring robust support for load balancing, API key management, usage analytics, and rate limiting.
Python, API Key Management, Model Routing Configurations, Usage Limiters
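To make "abstracting vendor-specific schemas" concrete, the sketch below converts an OpenAI-style chat request into an Anthropic Messages-style body. It is not LiteLLM's code: `openai_to_anthropic` and `default_max_tokens` are hypothetical names, and the conversion is deliberately simplified to plain-text messages (no tools, images, or streaming).

```python
from typing import Any, Dict, List


def openai_to_anthropic(request: Dict[str, Any],
                        default_max_tokens: int = 1024) -> Dict[str, Any]:
    """Convert an OpenAI-style chat request into an Anthropic Messages-style body."""
    system_parts: List[str] = []
    messages: List[Dict[str, str]] = []
    for msg in request["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field,
            # not as an entry in the messages list.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    body: Dict[str, Any] = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is required by the Anthropic Messages API.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```

A gateway applies one such adapter per provider, so callers only ever send the single unified format.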
QuantumNous/new-api
39,722
This project is an AI model API gateway and proxy server designed to provide a unified interface for interacting with diverse artificial intelligence service providers. It functions as a centralized middleware platform that routes, load balances, and translates API requests across multiple models, enabling developers to access text, image, audio, and video generation capabilities through a single, standardized integration. The gateway distinguishes itself through comprehensive administrative and financial controls, including event-driven usage accounting, real-time token consumption tracking, and granular role-based access control. It supports complex traffic management by distributing requests across multiple credential pools and providers to optimize throughput and bypass rate limits. Furthermore, it integrates a robust identity federation system that supports OIDC, OAuth, and hardware-backed passkeys to secure user access and manage multi-tenant environments. Beyond core routing, the platform provides extensive tooling for service maintenance, including automated health checks, model registry synchronization, and content moderation filters. It also features a complete billing and payment infrastructure, allowing administrators to manage user credit balances, process prepaid redemptions, and monitor cost structures across different model vendors. The system is designed for flexible deployment across containerized and distributed infrastructure, with administrative interfaces for auditing usage logs, managing API channels, and configuring global system parameters.
This project is a comprehensive LLM API gateway that provides a unified interface for multiple providers, featuring robust load balancing, token usage tracking, and granular access control to manage diverse AI model endpoints.
Go, API Key Management, Load Balancers
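The usage accounting and prepaid credit balances mentioned for this gateway can be illustrated with a small ledger. Everything here is an assumption made for the example: the `Ledger` class, its field names, and the per-1,000-token pricing model are hypothetical, and the project itself is written in Go rather than Python.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Ledger:
    """Prepaid credit balances plus a per-request usage log."""
    price_per_1k_tokens: Dict[str, float]
    balances: Dict[str, float] = field(default_factory=dict)
    log: List[Tuple[str, str, int, float]] = field(default_factory=list)

    def top_up(self, user: str, credits: float) -> None:
        """Add prepaid credit to a user's balance."""
        self.balances[user] = self.balances.get(user, 0.0) + credits

    def charge(self, user: str, model: str, tokens: int) -> float:
        """Deduct the cost of a finished request; refuse if credit is short."""
        cost = tokens / 1000 * self.price_per_1k_tokens[model]
        if self.balances.get(user, 0.0) < cost:
            raise PermissionError(f"insufficient credit for {user}")
        self.balances[user] -= cost
        # Record who used which model, how many tokens, and what it cost.
        self.log.append((user, model, tokens, cost))
        return cost
```

The log gives an auditable record per user, while the balance check is what lets an administrator cap spending in advance.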
alibaba/higress
7,558
Higress is an AI API gateway and cloud-native traffic manager that functions as a Kubernetes ingress controller. It provides a centralized system for routing, securing, and optimizing traffic directed toward large language models, AI agents, and microservice architectures. The project distinguishes itself through deep AI orchestration, including the ability to host and manage Model Context Protocol servers that transform REST APIs into tools for AI agents. It features specialized AI infrastructure for model request proxying, protocol translation across multiple providers, and semantic-based caching to reduce token consumption and latency. Broad capabilities cover API lifecycle management and traffic control, including canary releases, load balancing, and rate limiting. The system includes a comprehensive security suite with WAF filtering, OIDC and OAuth2 identity integration, and automated TLS certificate management. Extensibility is provided via a WebAssembly-based plugin system that allows for hot-loading custom logic without interrupting traffic. The gateway can be deployed to Kubernetes or Docker and supports the Kubernetes Gateway API and Ingress standards.
Higress is a cloud-native API gateway specifically designed to aggregate and manage traffic for multiple LLM providers and AI agents, offering the requested load balancing, rate limiting, and usage optimization features in a unified interface.
Go, Load Balancers, Token Usage Analytics, Load Balancing Algorithms
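Semantic caching, which Higress is described as using to cut token consumption and latency, reuses a stored response when a new prompt is close enough in meaning to an earlier one. The sketch below shows the idea only: `embed` is a toy word-count stand-in for a real embedding model, and `SemanticCache` and its `threshold` are hypothetical names, not part of Higress.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real gateway would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the response cached for the most similar prompt, if close enough."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

On a cache hit the gateway can skip the upstream call entirely, which is where the token and latency savings come from.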
decolua/9router
17,690
9router is an AI model gateway designed to route requests from AI coding tools to multiple model providers through a single unified API. It provides administration for self-hosted AI proxy deployments, allowing users to manage API keys and model access on local servers or edge networks. The system differentiates itself through multi-provider API normalization, which translates incompatible request and response formats to ensure compatibility across different AI models. It features AI provider failover management to automatically switch between providers or accounts when quotas are exhausted or errors occur, and implements multi-account rotation to bypass individual provider limits. The gateway covers a broad set of capabilities including token optimization via payload compression, spending analysis and quota tracking, and encrypted configuration synchronization across devices. Traffic management is handled through capability-based routing and outbound proxy support, while security is maintained via API access keys and automated token refreshment. The application supports containerized deployment and can be hosted on local machines, virtual servers, or global edge networks.
9router is a dedicated LLM gateway that provides unified API access, multi-provider failover, account rotation, and usage analytics, making it a comprehensive solution for managing diverse AI model endpoints.
JavaScript, AI Gateways, Model Proxy Gateways, API Access Security
winfunc/opcode
22,083
Opcode is a desktop interface designed for managing AI-assisted software development workflows. It provides a centralized workspace to organize interactive programming sessions, configure specialized automated agents, and maintain oversight of development tasks through a visual environment. The platform distinguishes itself by integrating version control for AI conversations, allowing developers to create checkpoints and branches to navigate, compare, and revert between different interaction states. It also functions as a client for standardized context protocols, enabling the connection of external data sources to provide models with project-specific knowledge. The application includes comprehensive monitoring tools to track real-time token consumption and resource expenditure throughout the development lifecycle. By bridging command-line tools with a graphical interface and utilizing isolated execution environments for agents, it provides a structured approach to managing complex, automated coding projects.
This is a desktop-based AI development environment and agent workspace rather than a server-side API gateway designed to aggregate and proxy LLM traffic for external applications.
TypeScript, LLM Usage Metrics, Token Usage Analytics, AI Cost Monitoring
truefoundry/cognita
4,317
Cognita is a retrieval augmented generation orchestration framework used to build pipelines that connect document stores and language models to provide grounded answers. It functions as a document ingestion pipeline and a vector database integrator, managing the process of loading, parsing, and indexing files into a searchable knowledge base. The system includes a language model gateway proxy that provides a unified API to interact with multiple different model providers. This routing layer decouples the application from specific vendors, allowing requests to be proxied through a provider-agnostic interface. The framework covers contextual information retrieval through similarity search and reranking to generate responses with source citations. It supports incremental document indexing to process new or updated files without re-indexing entire datasets and allows for the integration of various vector store implementations.
This repository provides a unified LLM gateway proxy that supports multiple model providers, though its primary focus is on RAG orchestration and document ingestion pipelines rather than acting as a standalone API gateway for traffic management.
Python, LLM Gateways
fauxpilot/fauxpilot
14,732
Fauxpilot is a self-hosted AI coding assistant and local inference server. It functions as a proxy and API gateway that redirects traffic from IDE plugins to a local large language model, allowing for AI-assisted programming without external cloud dependencies. The project provides a specialized API emulation layer that mimics coding assistant protocols and a standardized OpenAI-compatible interface. This enables supported code editors to use local models for completions and suggestions by overriding default proxy URLs. The system includes capabilities for downloading and deploying local models, as well as a format-conversion pipeline to transform model files into optimized versions for specific inference engines. A model-agnostic backend allows for switching between different inference engines while maintaining the same API interfaces.
This project is a specialized inference server and proxy designed specifically for IDE coding assistant protocols rather than a general-purpose LLM API gateway for managing multiple providers and usage analytics.
Python, Local Model Inference Servers, OpenAI-Compatible
InsForge/InsForge
11,794
InsForge is a backend-as-a-service platform that provides an integrated suite of tools for managing relational databases, identity provision, object storage, and serverless compute. It functions as an open-source identity provider and a PostgreSQL database manager featuring integrated vector storage and row-level security. The platform serves as an LLM orchestration gateway, offering a unified endpoint to route requests across various AI providers through an OpenAI-compatible interface. It enables AI-driven application generation and connects AI agents to backend resources using a standardized context protocol. Broad capabilities include comprehensive OAuth and OIDC identity management, an S3-compatible object storage gateway, and a real-time pub-sub engine for database synchronization. The system also covers automated billing and subscription lifecycles with mirrored payment data, as well as serverless function runtimes triggered by HTTP requests or database events. Infrastructure is managed via a backend command-line interface and declarative configuration files.
InsForge functions as an LLM orchestration gateway that provides a unified OpenAI-compatible interface for routing requests across various AI providers, aligning with the core requirements for an LLM proxy.
TypeScript, API Key Management, LLM Gateways, AI Cost Monitoring
Kilo-Org/kilocode
15,616
Kilocode is an autonomous engineering platform designed to orchestrate AI agents for complex software development tasks. It functions as a comprehensive system for automating coding, testing, and repository management by integrating directly with your codebase and terminal. The platform provides a unified gateway for model orchestration, allowing for the management of agentic workflows, event-driven automation, and persistent session state across distributed development environments. The platform distinguishes itself through its federated task management and policy-based access control, which enable secure, collaborative development across independent instances. By maintaining semantic codebase indexing and a centralized model gateway, it ensures that AI agents have context-aware retrieval of project structures while managing authentication, rate limits, and automatic service failover across multiple AI providers. Beyond its core orchestration capabilities, the platform supports a wide range of functional areas including automated code review, security vulnerability triage, and multi-stage workflow planning. It provides granular control over agent permissions and tool execution, allowing teams to define custom operational modes and integrate external services through standardized protocols. The system is designed for extensibility, offering a framework to register custom tools and manage environment configurations through natural language commands. It includes robust monitoring and observability features to track agent performance, token consumption, and organizational adoption metrics.
Kilocode functions as an autonomous engineering platform that includes a centralized model gateway capable of managing multiple AI providers, authentication, rate limiting, and failover, which aligns with the core requirements for an LLM proxy.
TypeScript, LLM Gateways, Token Usage Analytics, API Throttling
cloudflare/moltworker
9,909
Moltworker is an AI agent sandbox and model orchestrator designed for the secure execution of untrusted code and shell commands generated by large language models. It functions as a gateway proxy that routes requests to multiple AI providers through a unified interface, integrating a container runtime backed by S3-compatible object storage to persist state across ephemeral lifecycles. The system distinguishes itself by combining an AI model orchestrator with a headless browser controller for automated web scraping and screenshot capture. It manages the full lifecycle of AI agents, including multi-channel chat integration, consolidated billing across different providers, and expenditure limits to control operational costs. The platform provides a broad suite of capabilities for ephemeral environment hosting, including isolated build pipelines and the exposure of services via preview URLs. It incorporates security and observability tools such as token-based proxy authentication, response caching, and traffic analysis to monitor token usage and request volume. The infrastructure supports real-time interaction through a browser-based terminal interface using WebSocket streaming and monitors filesystem changes for automated build processes.
Moltworker functions as an AI model orchestrator and gateway proxy that provides unified access to multiple providers, usage analytics, and rate limiting, though its primary focus is on secure agent sandboxing rather than acting as a pure-play API gateway.
TypeScript, AI Cost Monitoring
chopratejas/headroom
29,537
Headroom is an AI gateway proxy and token optimizer designed to reduce the cost and latency of large language model interactions. It functions as an intermediary that intercepts traffic between clients and providers to apply context compression, request routing, and format translation. The system differentiates itself through a Model Context Protocol server implementation that delivers compression and retrieval tools to compatible AI hosts. It employs a content-aware compression pipeline and tiered importance scoring to trim redundant data from logs and tool outputs while preserving essential information via a reversible local cache. The project covers a broad capability surface including synchronized agent memory systems, semantic vector storage for context management, and AST-based code indexing. It also provides observability tools for tracking token savings, simulating compression effects, and monitoring pipeline performance. The software is implemented in Python and supports standalone proxy deployment.
Headroom functions as an AI gateway proxy that intercepts and routes traffic between clients and LLM providers, providing the core infrastructure needed to manage model interactions and monitor performance.
Python, AI Cost Monitoring
xtekky/gpt4free
66,335
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, consistent programming interface. By abstracting provider-specific protocols and authentication requirements, the tool simplifies the development of applications that rely on external AI services. The platform distinguishes itself through a resilient request routing architecture designed to maintain service availability. It features an automated failover mechanism that monitors request status and dynamically switches between secondary providers when primary endpoints encounter errors or rate limits. This capability is complemented by support for both remote API interactions and local model execution, enabling users to run language models directly on their own hardware infrastructure. Beyond core connectivity, the system includes advanced tools for managing complex conversational states and real-time data retrieval. It supports sequential message history to maintain context across long sessions and integrates live web search capabilities to provide up-to-date information. The client also handles multimodal inputs, allowing for the processing of visual content and the generation of images from text descriptions through asynchronous, non-blocking communication patterns.
This project functions as an orchestration layer and proxy that standardizes access to multiple AI providers and local models, providing the unified interface and request routing capabilities required for an LLM gateway.
Python, AI Request Routers, Conversation Management, Failover Strategies
janhq/jan
43,043
Jan is a desktop application that functions as a local artificial intelligence model runtime and an open-standard API server. It enables the execution of large language models directly on local hardware, ensuring that data remains private and accessible offline while providing a unified interface for managing model weights and inference runtimes. The platform distinguishes itself by offering a modular inference backend that allows users to swap execution engines based on hardware compatibility and performance needs. It acts as a cross-platform orchestrator, providing the ability to switch between local model files and remote cloud-based AI providers through a single interface. By exposing these capabilities via an open-standard server layer, the application supports the integration of local AI into external software and development tools. Beyond its core runtime capabilities, the software provides an environment for configuring agentic workflows and autonomous task automation. It includes tools for managing server behaviors, such as network access, authentication, and remote tool execution, while maintaining state persistence through a local file-based database. The application is distributed as a cross-platform container to ensure consistent access to local files and system resources across different operating systems.
Jan functions as a local model runtime and API server that provides a unified interface for both local models and remote providers, effectively serving as an API gateway for LLM orchestration.
TypeScript, Local Model Runtimes, Desktop AI Runtimes, OpenAI-Compatible Servers
mudler/LocalAI
46,889
LocalAI is a self-hosted inference server that enables the execution of machine learning models directly on local hardware. By providing a unified interface for text, image, and audio processing, it allows users to maintain full control over data privacy and infrastructure costs while eliminating dependencies on external network services. The platform functions as an API gateway that mimics standard cloud-based artificial intelligence interfaces, allowing existing applications to integrate local models as drop-in replacements. It utilizes a container-based architecture to package runtimes and dependencies, ensuring consistent deployment across diverse hardware configurations. To optimize system performance, the server employs an on-demand orchestration layer that dynamically loads and unloads models based on active requests, minimizing memory usage during periods of inactivity. The system supports a wide range of model architectures through a flexible backend abstraction that allows for driver switching at runtime. Users can manage their models and interact with the service through a web interface or via standard web requests, which the proxy translates into model-specific execution commands. The software is distributed as a containerized application to facilitate deployment across various server and cloud environments.
LocalAI functions as a self-hosted inference server that provides a unified API interface for local models, effectively acting as a gateway that mimics standard cloud AI providers for your applications.
Go, Inference Servers, Local Inference Engines, Local Model Serving
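On-demand loading and unloading of models, as described for LocalAI, is sketched below using a least-recently-used policy. The LRU policy is a substitution chosen for illustration, since the listing does not say which eviction rule LocalAI applies; `ModelPool`, `loader`, and `capacity` are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable


class ModelPool:
    """Keeps at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any], capacity: int = 2) -> None:
        self.loader = loader        # function that loads a model by name
        self.capacity = capacity    # maximum number of models held in memory
        self.loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        """Return the named model, loading it first if it is not resident."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name]
        while len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)  # unload the least recently used
        self.loaded[name] = self.loader(name)
        return self.loaded[name]
```

Each incoming request calls `get()` with the model it needs, so memory is only spent on models that are actually in use.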
HKUDS/nanobot
44,285
Nanobot is an orchestration framework designed for building, deploying, and managing autonomous AI agents. It provides a secure runtime environment that supports persistent memory, multi-step workflow management, and tool integration, allowing agents to maintain context and state across long-running tasks. The platform distinguishes itself through a unified model gateway that normalizes requests across diverse local and remote language models, alongside a multi-channel integration layer that connects agents to various messaging platforms. It enforces security through containerized sandboxing and network policies, ensuring that agent execution remains isolated and controlled. The system includes comprehensive infrastructure for monitoring agent performance through internal tracing pipelines and managing background tasks via an event-driven job queue. It also provides standardized endpoints compatible with common model request formats, enabling interoperability with external applications and development kits.
This is an agent orchestration framework that includes a built-in model gateway capable of normalizing requests across various local and remote LLM providers, serving as a unified interface for your model management needs.
Python, Agent Runtimes, AI Agent Orchestration Frameworks, Autonomous Agent Orchestration
lmstudio-ai/lms
4,214
This project is a headless large language model inference engine and server manager designed for local deployments. It provides a developer toolkit and API gateway that allows for the management of model lifecycles and inference tasks without a graphical user interface. The system enables the deployment of model engines across different operating systems, cloud environments, or CI pipelines. It includes a command-line interface for bootstrapping development projects and automating the orchestration of loading and unloading model binaries based on specific workflow needs. The toolset covers infrastructure monitoring through real-time state-streaming logs and application status checks. It further provides a standardized network interface to expose inference capabilities to external software development kits.
This tool functions as a headless inference engine and API gateway specifically designed for managing local model lifecycles and serving them through a standardized network interface, though it lacks the multi-provider aggregation features found in broader LLM proxy solutions.
TypeScript, Headless Implementations, Local Inference Engines, CLI Model Management
