17 repository-uri
Translation layers that expose local model capabilities through standardized API endpoints.
Distinguishing note: Focuses on the gateway/proxy aspect of standardizing model outputs for external consumption.
Explore 17 awesome GitHub repositories matching artificial intelligence & ml · Model API Gateways. Refine with filters or upvote what's useful.
Aider is a command-line interface tool that enables large language models to directly edit, refactor, and manage source code within a local repository. It functions as an AI-powered coding assistant that integrates into the developer workflow, allowing users to apply code changes through natural language prompts while maintaining repository context and version control. The tool distinguishes itself through a specialized diff-based patching engine that parses model-generated search-and-replace blocks to modify specific file segments without rewriting entire files. It features a provider-agnost
Aider connects to various language models through a unified API gateway by configuring environment variables and specifying model identifiers during tool startup.
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
Converts local model outputs into common industry formats to ensure compatibility with existing AI development tools.
Nanobot is an orchestration framework designed for building, deploying, and managing autonomous AI agents. It provides a secure runtime environment that supports persistent memory, multi-step workflow management, and tool integration, allowing agents to maintain context and state across long-running tasks. The platform distinguishes itself through a unified model gateway that normalizes requests across diverse local and remote language models, alongside a multi-channel integration layer that connects agents to various messaging platforms. It enforces security through containerized sandboxing
Exposes standardized API endpoints that normalize interactions with diverse local and remote language models.
This project is an AI model API gateway and proxy server designed to provide a unified interface for interacting with diverse artificial intelligence service providers. It functions as a centralized middleware platform that routes, load balances, and translates API requests across multiple models, enabling developers to access text, image, audio, and video generation capabilities through a single, standardized integration. The gateway distinguishes itself through comprehensive administrative and financial controls, including event-driven usage accounting, real-time token consumption tracking,
Provides a unified API interface for invoking diverse text, image, audio, and video generation models.
Modular is a unified machine learning development platform designed for building, compiling, and deploying high-performance neural network models. It provides a comprehensive execution engine that supports both local and production-grade inference, enabling developers to manage the entire model lifecycle from initial architecture definition to scalable, containerized service deployment. The platform distinguishes itself through a hardware-agnostic runtime that abstracts diverse silicon architectures, allowing models to execute efficiently across varied compute environments. It includes a spec
Standardizes interactions with diverse artificial intelligence backends for text, image, and video generation.
Vercel is a cloud platform for building, deploying, and scaling web applications. It provides a unified infrastructure that automates the build process by detecting project frameworks and distributing static and dynamic content through a global content delivery network. The platform executes application logic using serverless functions that scale automatically based on real-time traffic demand. The platform distinguishes itself through a centralized AI gateway that proxies requests to multiple model providers, enabling standardized authentication, observability, and cost tracking. It supports
Provides endpoints to list available models and retrieve specific configuration details for model management.
This project is an autonomous AI agent framework and workflow orchestrator designed to automate machine learning engineering. It functions as a reasoning engine that reads research papers and writes code to train and deploy machine learning models through iterative reasoning loops and tool execution. The system distinguishes itself by integrating a GPU-accelerated sandboxed execution environment, allowing it to run and verify machine learning scripts in isolated remote containers. It utilizes a model provider integration gateway to route inference requests across various hosted or local endpo
Implements a translation layer to route requests to various hosted or local LLM endpoints via standard APIs.
KoboldCPP is a local large language model inference engine and GGUF model runner designed to execute quantized models on personal hardware. It functions as a multimodal AI server and API gateway, providing OpenAI-compatible endpoints that allow third-party clients to interact with locally hosted models. The project distinguishes itself as an AI storytelling backend, featuring dedicated tools for long-form narrative management through persistent memory, world lore tracking, and character state management. It further extends its capabilities as a multimodal server capable of processing text, im
Provides translation layers that expose local model capabilities through standardized OpenAI-compatible API endpoints.
This project is an on-device AI SDK providing a framework for running large language models, vision models, and speech models locally. It serves as an orchestration layer for local LLM execution, ensuring data privacy and offline availability by utilizing hardware acceleration on the device. The SDK is distinguished by its comprehensive voice and multimodal capabilities, including a coordinated voice pipeline for activity detection, speech-to-text, and text-to-speech synthesis. It also provides a dedicated implementation kit for local retrieval-augmented generation and tools for processing co
Defines model identifiers, download locations, and memory requirements for centralized local management.
Helicone is an AI gateway and observability platform designed to intercept, manage, and monitor interactions with large language models. By acting as a reverse-proxy, it provides a centralized layer for routing requests across multiple AI providers, allowing developers to maintain consistent application logic while gaining deep visibility into model performance, usage, and costs. The platform distinguishes itself through a robust suite of traffic management and prompt engineering tools. It enables policy-driven control, including automatic failover between providers, rate limiting, and edge-b
Routes requests through a unified gateway to generate model outputs while supporting flexible billing and authentication methods.
Llama-swap is a local inference orchestrator and API gateway for large language models. It functions as an OpenAI API proxy that manages the lifecycle of multiple local model servers, automatically starting and stopping them to swap models based on incoming request identifiers. The project distinguishes itself through dynamic model swapping and hardware optimization. It utilizes a specialized matrix-based concurrency control to define which models can run simultaneously and employs cost-based eviction to remove inactive servers from memory based on relative resource costs. The system provide
Provides a translation layer that exposes local model capabilities through standardized API endpoints.
Kokoro-FastAPI is a text-to-speech API and LLM speech synthesis server that generates spoken audio from text via a REST interface. It functions as a Kubernetes-native deployment designed for orchestrated speech synthesis. The system includes a voice blending engine that creates unique vocal profiles by mixing multiple existing voices using custom weight ratios. The service provides real-time audio streaming to reduce latency and generates word-level timestamps for speech synchronization. It manages hardware efficiency through on-demand model loading to optimize VRAM usage and includes system
Exposes the underlying synthesis model and monitoring tools through a FastAPI-based REST gateway.
This project is a headless large language model inference engine and server manager designed for local deployments. It provides a developer toolkit and API gateway that allows for the management of model lifecycles and inference tasks without a graphical user interface. The system enables the deployment of model engines across different operating systems, cloud environments, or CI pipelines. It includes a command-line interface for bootstrapping development projects and automating the orchestration of loading and unloading model binaries based on specific workflow needs. The toolset covers i
Exposes a standardized HTTP interface for local LLM inference, enabling integration with external SDKs.
This repository is a collection of node-based pipeline configurations, examples, and templates for generating AI media. It provides a workflow library and a curated gallery of blueprints designed for creating images, videos, and 3D assets using diffusion models. The project specifically offers a set of pre-configured node graphs for implementing advanced image generation and refinement techniques, with a focus on Stable Diffusion workflows. These examples demonstrate how to interconnect processing nodes to define complex generative logic without writing code. The available templates cover a
Manages third-party model access by routing requests through a unified interface with token-based billing.
Langroid is a multi-agent orchestration framework and tool integration suite designed for building complex AI applications. It serves as a multi-modal integration layer that connects diverse local and remote language models with an agentic retrieval-augmented generation system. The project distinguishes itself through a collaborative message-exchange paradigm, allowing specialized agents to delegate tasks hierarchically and coordinate via structured communication. It features an advanced state management system for conversational AI, including the ability to rewind and prune conversation hist
Provides a translation layer that standardizes API endpoints to allow seamless switching between diverse AI models.
DeepAnalyze is an autonomous data science agent and research pipeline designed to transform raw datasets into comprehensive analysis reports. It operates by generating and executing Python code to perform data preparation, modeling, and visualization. The system utilizes a secure, containerized execution environment to run generated scripts in isolation from the host system. It includes a benchmarking tool to evaluate the accuracy and performance of large language models against standardized data science tasks and a standardized API gateway for managing model completions and file uploads. Th
Implements a standardized server interface for managing chat completions and file uploads to underlying large language models.
ruby_llm is an LLM integration framework and AI agent orchestrator designed to connect applications to multiple large language model providers through a unified interface. It serves as a toolkit for building autonomous assistants with custom personas, managing structured output via JSON schemas, and implementing vector embedding engines for semantic search. The project distinguishes itself as an observability suite and multimodal toolkit. It provides specialized capabilities for tracking token usage, calculating model costs, and tracing workflows via OpenTelemetry, while supporting the proces
Implements interfaces for defining and persisting configuration metadata for external AI models.