# vllm-project/semantic-router

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/vllm-project-semantic-router).**

3,205 stars · 536 forks · Go · Apache-2.0

## Links

- GitHub: https://github.com/vllm-project/semantic-router
- Homepage: https://vllm-semantic-router.com
- awesome-repositories: https://awesome-repositories.com/repository/vllm-project-semantic-router.md

## Topics

`ai-gateway` `bert-classification` `fine-tuning` `golang` `huggingface-candle` `huggingface-transformers` `kubernetes` `llm` `llmrouter` `mcp` `mixture-of-models` `pii-detection` `prompt-engineering` `prompt-guard` `rust` `semantic-router` `vllm`

## Tags

### Artificial Intelligence & ML

- [Inference Gateways](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-gateways.md) — Accepts OpenAI-style chat and responses API requests and dispatches them to the appropriate backend model.
- [Intent Classification Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/agent-architectures/user-intent-modeling/intent-classification-pipelines.md) — Reads intent, domain, and safety profile of requests using purpose-built encoders before selecting a handling model. ([source](https://vllm-semantic-router.com/))
- [Model Request Routing](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-model-clients/model-request-routing.md) — Sends each request to the model that best balances quality, cost, latency, and privacy. ([source](https://vllm-semantic-router.com/))
- [Cost-Aware Model Routers](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-model-clients/model-request-routing/cost-aware-model-routers.md) — Routes routine traffic to cheaper models and reserves expensive frontier models for requests that need them. ([source](https://vllm-semantic-router.com/))
- [AI Request Routing](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-request-routing.md) — Routes AI inference requests to the optimal model based on semantic meaning, cost, and safety signals.
- [Chat Completion Services](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-completion-services.md) — Accepts OpenAI-style chat completion requests and routes them to appropriate backends. ([source](https://vllm-semantic-router.com/docs/api/router))
- [Inference Routing Protocols](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-routing-protocols.md) — Specifies a content-level classification and semantic routing framework as an IETF protocol. ([source](https://vllm-semantic-router.com/publications/))
- [Inference Cost Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training/cost-optimization-strategies/inference-cost-optimizers.md) — Sends routine traffic to cheaper models and reserves expensive frontier reasoning only for requests that need it. ([source](https://vllm-semantic-router.com/))
- [Real-Time Safety Enforcers](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-jailbreak-protections/real-time-safety-enforcers.md) — Detects and blocks jailbreak attempts, PII leaks, and hallucinations in real time during request routing.
- [Multi-Provider Abstractions](https://awesome-repositories.com/f/artificial-intelligence-ml/model-provider-integrations/multi-provider-abstractions.md) — Routes inference requests across local, private, and frontier models through a single normalizing layer.
- [GPU Fleet Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-provider-management/gpu-fleet-orchestrators.md) — Manages model selection, fleet sizing, and cost optimization across heterogeneous GPU fleets and providers.
- [Model Routing](https://awesome-repositories.com/f/artificial-intelligence-ml/model-routing.md) — Matches each query to the best model based on extracted signals for efficient mixture-of-models collaboration. ([source](https://vllm-semantic-router.com/docs/overview/goals))
- [Model Routing Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-routing-layers.md) — Routes requests across local, private, and frontier models through a single layer from edge devices to the cloud. ([source](https://vllm-semantic-router.com/))
- [Model Selection Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-selection-tools.md) — Routes simple tasks to smaller models and complex tasks to larger ones, minimizing token usage and expense. ([source](https://vllm-semantic-router.com/install.sh))
- [Cost-Performance Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-selection-tools/cost-performance-optimizers.md) — Routes simple tasks to cheaper models and complex tasks to larger ones to minimize cost. ([source](https://vllm-semantic-router.com/install.sh))
- [Routing Signal Extractors](https://awesome-repositories.com/f/artificial-intelligence-ml/routing-signal-extractors.md) — Extracts request, safety, follow-up, and preference signals from 16 families to inform routing decisions. ([source](https://vllm-semantic-router.com/docs/intro))
- [Semantic Routers](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-routers.md) — Routes inference requests to the optimal model based on semantic meaning, cost, latency, and safety signals.
- [Semantic Signal Fusion Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-signal-fusion-engines.md) — Combines multiple semantic signals with AND/OR logic to select the optimal model and configuration.
- [Fleet Sizing Dashboards](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-agent-simulations/simulation-dashboards/fleet-sizing-dashboards.md) — Provides a dashboard for simulating and sizing LLM GPU fleets to meet latency targets. ([source](https://vllm-semantic-router.com/docs/fleet-sim/overview))
- [Cross-Encoder Rerankers](https://awesome-repositories.com/f/artificial-intelligence-ml/document-rerankers/cross-encoder-rerankers.md) — Applies joint cross-attention scoring to query-candidate pairs for high-precision reranking. ([source](https://vllm-semantic-router.com/))
- [Hallucination Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/hallucination-detection.md) — Analyzes token-level output in real time to flag factual inaccuracies as the model generates text. ([source](https://vllm-semantic-router.com/docs/v0.1/intro))
- [Responses API Translators](https://awesome-repositories.com/f/artificial-intelligence-ml/openai-api-clients/responses-api-translators.md) — Accepts OpenAI Responses API requests and translates them to chat completions for routing. ([source](https://vllm-semantic-router.com/docs/api/router))
- [Reasoning Need Detectors](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-models/reasoning-need-detectors.md) — Determines whether a query requires reasoning and applies expensive reasoning models only when beneficial. ([source](https://vllm-semantic-router.com/publications/))
- [Token Budget Routers](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-token-budgeting/token-budget-routers.md) — Estimates token budgets and dispatches requests to short or long context pools to cut fleet cost. ([source](https://vllm-semantic-router.com/publications/))
- [Context Length Routers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-pipelines/rl-training-workflows/progressive-context-length-scaling/context-length-routers.md) — Routes requests based on context length to improve energy efficiency and reduce fleet cost. ([source](https://vllm-semantic-router.com/publications/))
- [Request-Response Translation Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/request-response-translation-layers.md) — Converts OpenAI-style chat requests to Anthropic's Messages API format and translates responses back. ([source](https://vllm-semantic-router.com/docs/api/router))
- [Cross-Encoder Rerankers](https://awesome-repositories.com/f/artificial-intelligence-ml/result-reranking/cross-encoder-rerankers.md) — Applies joint cross-attention scoring to query-candidate pairs for high-precision reranking. ([source](https://vllm-semantic-router.com/))
- [RAG Grounding Verifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-augmented-generation/rag-grounding-verifiers.md) — Checks long-document RAG responses for grounding against source contexts up to 32K tokens in real time. ([source](https://vllm-semantic-router.com/publications/))
- [Domain Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-classifiers/text-classifier-fine-tuning/domain-classifiers.md) — Routes queries to specialized models based on academic or professional domains using a fine-tuned classifier. ([source](https://vllm-semantic-router.com/docs/training/training-overview))
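
Several tags above describe sending routine traffic to cheaper models and reserving frontier models for requests that need them. The sketch below illustrates that idea under loud assumptions: the `Model` fields, the model names, the costs, and the keyword-based `estimate_complexity` heuristic are all hypothetical stand-ins, not the project's classifiers or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str           # hypothetical model identifier
    cost_per_1k: float  # illustrative cost per 1K tokens
    capability: float   # illustrative quality score in [0, 1]


def estimate_complexity(prompt: str) -> float:
    """Toy stand-in for a learned classifier: long prompts and
    reasoning keywords push the score toward 1.0."""
    keywords = ("prove", "derive", "step by step", "analyze")
    score = min(len(prompt.split()) / 200.0, 0.5)
    if any(k in prompt.lower() for k in keywords):
        score += 0.5
    return min(score, 1.0)


def pick_model(prompt: str, models: list[Model]) -> Model:
    """Return the cheapest model whose capability covers the estimated
    complexity; fall back to the most capable model otherwise."""
    need = estimate_complexity(prompt)
    eligible = [m for m in models if m.capability >= need]
    if not eligible:
        return max(models, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_1k)


# Hypothetical catalog, for illustration only.
MODELS = [
    Model("small-local", 0.1, 0.4),
    Model("mid-private", 1.0, 0.7),
    Model("frontier", 10.0, 1.0),
]
```

With this catalog, a short factual question would land on `small-local`, while a prompt asking for a step-by-step proof would be escalated to a more capable model. A real router would replace the heuristic with trained encoders and weigh latency and privacy as well as cost.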

### Part of an Awesome List

- [Sensitive Data Redaction](https://awesome-repositories.com/f/awesome-lists/devtools/information-extraction/sensitive-data-identification/sensitive-data-redaction.md) — Scans requests and responses for sensitive data and applies configurable policies to protect privacy. ([source](https://vllm-semantic-router.com/docs/v0.1/intro))

### Data & Databases

- [Workload-Based Model Selectors](https://awesome-repositories.com/f/data-databases/model-as-a-table-integrations/request-routing-by-model-id/workload-based-model-selectors.md) — Routes each inference request to the model best suited for its task, optimizing for latency, cost, or accuracy. ([source](https://vllm-semantic-router.com/vision-paper))
- [Semantic Caching](https://awesome-repositories.com/f/data-databases/response-caching/semantic-caching.md) — Caches responses for semantically similar queries using vector-based matching to reduce latency and cost. ([source](https://vllm-semantic-router.com/docs/intro))
- [Semantic Query Routing](https://awesome-repositories.com/f/data-databases/semantic-query-routing.md) — Routes inference requests to the optimal model based on semantic meaning of the input. ([source](https://vllm-semantic-router.com/publications/))
- [Category-Based Caches](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/caching-performance/caching-strategies/query-result-caching/category-based-caches.md) — Caches query results by category with per-category similarity thresholds and TTLs. ([source](https://vllm-semantic-router.com/publications/))
- [Memory-Augmented Model Routers](https://awesome-repositories.com/f/data-databases/indexing-and-search/recall-optimization/conversation-memory-retrieval/memory-augmented-model-routers.md) — Uses conversational memory and retrieval to let lightweight models match larger model performance on persistent queries. ([source](https://vllm-semantic-router.com/publications/))
- [Difficulty-Based Routers](https://awesome-repositories.com/f/data-databases/model-as-a-table-integrations/request-routing-by-model-id/difficulty-based-routers.md) — Estimates action difficulty for agent steps and routes to the cheapest model meeting a reliability threshold. ([source](https://vllm-semantic-router.com/publications/))
- [Semantic Search](https://awesome-repositories.com/f/data-databases/semantic-search.md) — Encodes queries and candidates into dense vectors to find semantically similar matches for caching or retrieval. ([source](https://vllm-semantic-router.com/))
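
The semantic caching and category-based cache tags describe reusing responses for semantically similar queries, with per-category similarity thresholds and TTLs. A minimal sketch of that behaviour follows, assuming embeddings are already computed elsewhere; the `CategoryCache` class, its method names, and the default policy values are hypothetical and chosen only for illustration.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CategoryCache:
    """Hypothetical cache: each category has its own threshold and TTL."""

    def __init__(self, policies, default=(0.9, 300.0)):
        # policies maps category -> (similarity threshold, TTL in seconds)
        self.policies = policies
        self.default = default
        self.entries = {}  # category -> [(embedding, response, expires_at)]

    def put(self, category, embedding, response, now=None):
        now = time.time() if now is None else now
        _, ttl = self.policies.get(category, self.default)
        self.entries.setdefault(category, []).append(
            (embedding, response, now + ttl)
        )

    def get(self, category, embedding, now=None):
        now = time.time() if now is None else now
        threshold, _ = self.policies.get(category, self.default)
        # Drop expired entries before searching.
        live = [e for e in self.entries.get(category, []) if e[2] > now]
        self.entries[category] = live
        best, best_sim = None, threshold
        for vec, response, _ in live:
            sim = cosine(embedding, vec)
            if sim >= best_sim:  # keep the most similar hit above threshold
                best, best_sim = response, sim
        return best
```

A stricter threshold for a category such as math makes near-miss queries fall through to the model, while a looser threshold for casual chat trades some precision for more cache hits. The linear scan is only suitable for a toy example; a production cache would use a vector index.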

### Development Tools & Productivity

- [Global Defaults](https://awesome-repositories.com/f/development-tools-productivity/global-defaults.md) — Provides typed built-in defaults for router, services, stores, and model catalog settings that can be selectively overridden. ([source](https://vllm-semantic-router.com/docs/proposals/unified-config-contract-v0-3))
- [Research Surfaces](https://awesome-repositories.com/f/development-tools-productivity/history-rewriting/metadata-rewriters/research-surfaces.md) — Serves as a research surface that lets teams add new signals, algorithms, and plugins without rewriting the serving path. ([source](https://vllm-semantic-router.com/docs/v0.3/intro))
- [Multi-Function Plugin Chains](https://awesome-repositories.com/f/development-tools-productivity/plugin-systems/custom-plugin-registrations/request-processing-plugins/multi-function-plugin-chains.md) — Extends request/response processing with plugins for caching, security, data protection, and threat detection. ([source](https://vllm-semantic-router.com/docs/intro))
- [Toggleable Plugin Chains](https://awesome-repositories.com/f/development-tools-productivity/plugin-systems/custom-plugin-registrations/request-processing-plugins/toggleable-plugin-chains.md) — Inspects and modifies requests and responses through an extensible plugin chain with per-decision toggling. ([source](https://vllm-semantic-router.com/docs/v0.3/intro))
- [Dense Vector Rankers](https://awesome-repositories.com/f/development-tools-productivity/search-ranking-algorithms/ai-based-relevance-ranking/dense-vector-rankers.md) — Encodes queries and candidates into dense vectors for similarity search and relevance scoring. ([source](https://vllm-semantic-router.com/))

### Graphics & Multimedia

- [AI Model Selection Signals](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-recording/audio-signal-mirroring/signal-routing/ai-model-selection-signals.md) — Combines multiple semantic signals with AND/OR logic to select the best model for each request. ([source](https://vllm-semantic-router.com/docs/v0.3/intro))

### Networking & Communication

- [LLM Threat Interceptors](https://awesome-repositories.com/f/networking-communication/traffic-interception/llm-threat-interceptors.md) — Intercepts and blocks jailbreak, PII, and hallucination risks before they reach a model. ([source](https://vllm-semantic-router.com/))
- [Signal-Based Routing Policies](https://awesome-repositories.com/f/networking-communication/traffic-routing-policies/signal-based-routing-policies.md) — Evaluates requests against configurable signals and projection rules to select the best model and route. ([source](https://vllm-semantic-router.com/docs/intro))
- [Signal-Based Routing Rules](https://awesome-repositories.com/f/networking-communication/traffic-routing-rules/signal-based-routing-rules.md) — Evaluates AND/OR decision rules over extracted signals and projections to select the active route and model candidates. ([source](https://vllm-semantic-router.com/docs/intro))
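
The routing tags above describe evaluating AND/OR decision rules over extracted signals to select a route and model. The following sketch shows one way such rules could be represented and evaluated. It is not the project's DSL: the class names, the signal names, the model names, and the use of a numeric priority to break ties between matching decisions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Signal:
    name: str  # matches when this signal name has fired


@dataclass(frozen=True)
class And:
    terms: tuple  # matches when every term matches


@dataclass(frozen=True)
class Or:
    terms: tuple  # matches when at least one term matches


Condition = Union[Signal, And, Or]


def matches(cond: Condition, fired: set) -> bool:
    """Recursively evaluate a condition tree against the fired signals."""
    if isinstance(cond, Signal):
        return cond.name in fired
    if isinstance(cond, And):
        return all(matches(t, fired) for t in cond.terms)
    if isinstance(cond, Or):
        return any(matches(t, fired) for t in cond.terms)
    raise TypeError(f"unsupported condition: {cond!r}")


@dataclass(frozen=True)
class Decision:
    name: str
    priority: int    # assumption: higher priority wins among matches
    when: Condition
    model: str       # hypothetical target model or handler


def decide(decisions, fired: set, default_model: str) -> str:
    """Return the model of the highest-priority matching decision."""
    hits = [d for d in decisions if matches(d.when, fired)]
    if not hits:
        return default_model
    return max(hits, key=lambda d: d.priority).model


# Hypothetical rule set, for illustration only.
DECISIONS = [
    Decision("block-unsafe", 100, Signal("jailbreak"), "refusal-handler"),
    Decision(
        "hard-math",
        50,
        And((Signal("domain:math"),
             Or((Signal("needs_reasoning"), Signal("long_context"))))),
        "reasoning-large",
    ),
    Decision("chat", 10, Or((Signal("domain:chat"), Signal("domain:other"))),
             "small-fast"),
]
```

Keeping the rules as data rather than application code is what lets routing policy change without touching the serving path; the evaluator itself stays a few lines long.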

### Operating Systems & Systems Programming

- [Input Modality Detectors](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management-systems/multi-modal-memory-routing/input-modality-detectors.md) — Detects text, image, or audio inputs and routes them to capable models. ([source](https://vllm-semantic-router.com/))

### Programming Languages & Runtimes

- [AI Inference](https://awesome-repositories.com/f/programming-languages-runtimes/domain-specific-languages/routing-dsls/ai-inference.md) — Defines routing logic using a DSL that owns model cards, signals, projections, and decisions. ([source](https://vllm-semantic-router.com/docs/proposals/unified-config-contract-v0-3))

### Security & Cryptography

- [Adversarial Input Detection](https://awesome-repositories.com/f/security-cryptography/adversarial-robustness-testing/adversarial-input-generation/adversarial-input-detection.md) — Identifies jailbreak attempts and prompt injections in real time to prevent unauthorized model behavior. ([source](https://vllm-semantic-router.com/docs/v0.1/intro))
- [Jailbreak Detectors](https://awesome-repositories.com/f/security-cryptography/adversarial-robustness-testing/adversarial-input-generation/adversarial-input-detection/jailbreak-detectors.md) — Identifies and blocks attempts to circumvent AI safety measures using a binary classification model. ([source](https://vllm-semantic-router.com/docs/training/training-overview))
- [Prompt Injection Detectors](https://awesome-repositories.com/f/security-cryptography/adversarial-robustness-testing/adversarial-input-generation/adversarial-input-detection/prompt-injection-detectors.md) — Blocks prompt injection and jailbreak attempts by inspecting incoming requests for malicious patterns. ([source](https://vllm-semantic-router.com/docs/intro))
- [Multi-Threat Policy Enforcers](https://awesome-repositories.com/f/security-cryptography/infrastructure-policy-enforcement/security-policy-enforcers/multi-threat-policy-enforcers.md) — Detects PII, jailbreak attempts, and hallucinations while logging all security decisions for audit trails. ([source](https://vllm-semantic-router.com/install.sh))
- [LLM Safety Enforcers](https://awesome-repositories.com/f/security-cryptography/license-compliance-tools/compliance-enforcement/llm-safety-enforcers.md) — Enforces safety and compliance by blocking jailbreak, PII, and hallucination threats at the routing layer. ([source](https://vllm-semantic-router.com/docs/intro))
- [PII Detection and Screening](https://awesome-repositories.com/f/security-cryptography/pii-detection-and-screening.md) — Identifies personal data in queries to protect user privacy using a dedicated classification model. ([source](https://vllm-semantic-router.com/docs/training/training-overview))
- [LLM Safety Enforcers](https://awesome-repositories.com/f/security-cryptography/request-authorization-enforcers/llm-safety-enforcers.md) — Enforces safety by blocking jailbreak, PII, and hallucination threats at the routing decision layer. ([source](https://vllm-semantic-router.com/docs/v0.3/intro))
- [LLM Request Scanners](https://awesome-repositories.com/f/security-cryptography/safety-profile-enforcers/llm-request-scanners.md) — Scans incoming requests for jailbreak attempts, PII leaks, and hallucinations before model execution.
- [ML Policy Conflict Detectors](https://awesome-repositories.com/f/security-cryptography/authorization-policies/authorization-policy-enforcement/conflict-resolution-policies/ml-policy-conflict-detectors.md) — Identifies when probabilistic ML predicates in routing policies silently co-fire on the same query. ([source](https://vllm-semantic-router.com/publications/))
- [Security Decision Loggers](https://awesome-repositories.com/f/security-cryptography/compliance-audit-tools/authorization-audit-trails/security-decision-loggers.md) — Logs all security decisions and applies model-specific PII policies to meet regulatory requirements. ([source](https://vllm-semantic-router.com/install.sh))
- [Token-Level Sensitive Span Detectors](https://awesome-repositories.com/f/security-cryptography/pii-detection-and-screening/token-level-sensitive-span-detectors.md) — Labels individual tokens to identify PII and safety-sensitive spans requiring localized intervention. ([source](https://vllm-semantic-router.com/))

### Software Engineering & Architecture

- [YAML Configuration Files](https://awesome-repositories.com/f/software-engineering-architecture/application-lifecycle-management/configuration-management/configuration-formats-and-schemas/yaml-configuration-files.md) — Defines listeners, providers, and routing rules for AI inference in a canonical `config.yaml` file. ([source](https://vllm-semantic-router.com/docs/installation))
- [Policy-to-Artifact Compilers](https://awesome-repositories.com/f/software-engineering-architecture/declarative-spec-compilers/policy-to-artifact-compilers.md) — Translates YAML policy files into verified decision nodes, Kubernetes artifacts, and protocol gates.
- [Orchestration Artifact Compilers](https://awesome-repositories.com/f/software-engineering-architecture/declarative-task-definitions/policy-feature-declarations/orchestration-artifact-compilers.md) — Compiles declarative routing policies into Kubernetes artifacts and protocol-boundary gates for deployment. ([source](https://vllm-semantic-router.com/publications/))
- [Routing Plugin Systems](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures/developer-authoring-interfaces/custom-module-implementations/module-functionality-extenders/plugin-extenders/routing-plugin-systems.md) — Adds new processing logic and signal types to routing behavior through configuration without modifying core code. ([source](https://vllm-semantic-router.com/install.sh))
- [Request-Response Filter Chains](https://awesome-repositories.com/f/software-engineering-architecture/interceptor-sequences/request-response-filter-chains.md) — Inspects and modifies requests and responses through an extensible chain of togglable plugins.
- [Routing Logic Decouplers](https://awesome-repositories.com/f/software-engineering-architecture/prompt-and-code-decoupling/routing-logic-decouplers.md) — Moves routing logic out of application code into reusable signals, decisions, and configuration. ([source](https://vllm-semantic-router.com/docs/v0.3/intro))
- [Web-Based Configuration Dashboards](https://awesome-repositories.com/f/software-engineering-architecture/application-lifecycle-management/configuration-management/configuration-interfaces-and-editors/web-based-configuration-generators/web-based-configuration-dashboards.md) — Walks through model setup, preset selection, and config activation using a browser-based interface. ([source](https://vllm-semantic-router.com/docs/installation))
- [LLM Fleet Capacity Planners](https://awesome-repositories.com/f/software-engineering-architecture/capacity-planning/llm-fleet-capacity-planners.md) — Sizes multi-pool LLM GPU fleets against P99 time-to-first-token targets using discrete-event simulation. ([source](https://vllm-semantic-router.com/publications/))
- [Workload CDF Optimizers](https://awesome-repositories.com/f/software-engineering-architecture/capacity-planning/llm-fleet-capacity-planners/workload-cdf-optimizers.md) — Derives the minimum-cost two-pool LLM fleet directly from the workload cumulative distribution function and latency target. ([source](https://vllm-semantic-router.com/publications/))
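
The filter-chain and plugin tags in this section describe inspecting and modifying requests through an extensible chain of togglable plugins. A minimal sketch of that pattern is shown here; the `PluginChain` class, the plugin names, and the dictionary-based request shape are hypothetical, and the email regex is a deliberately simple placeholder rather than a real PII detector.

```python
import re


def redact_emails(request: dict) -> dict:
    """Example plugin: replace email-like strings in the prompt."""
    out = dict(request)  # copy so the caller's request is not mutated
    out["prompt"] = re.sub(
        r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL]", request["prompt"]
    )
    return out


def add_system_prompt(request: dict) -> dict:
    """Example plugin: attach a system prompt to the request."""
    out = dict(request)
    out["system"] = "You are a concise assistant."
    return out


class PluginChain:
    """Runs registered plugins in order, skipping any that are toggled off."""

    def __init__(self):
        self._plugins = []  # (name, function) pairs in registration order

    def register(self, name, fn):
        self._plugins.append((name, fn))

    def run(self, request: dict, enabled: dict) -> dict:
        for name, fn in self._plugins:
            if enabled.get(name, False):  # plugins are off unless enabled
                request = fn(request)
        return request


chain = PluginChain()
chain.register("pii", redact_emails)
chain.register("system_prompt", add_system_prompt)
```

Passing a different `enabled` mapping per routing decision gives per-decision toggling: one route can run only the `pii` plugin while another runs the full chain, without re-registering anything.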

### User Interface & Experience

- [Signal Combination Logic](https://awesome-repositories.com/f/user-interface-experience/search-filtering-logic/combinator-logic/signal-combination-logic.md) — Combines multiple semantic signals with AND/OR logic to select the optimal model and configuration. ([source](https://vllm-semantic-router.com/docs/overview/goals))

### Web Development

- [AI Provider Routing](https://awesome-repositories.com/f/web-development/api-endpoint-configurations/service-endpoint-configurations/ai-provider-routing.md) — Extends inference APIs to support multi-provider routing for agentic AI workloads. ([source](https://vllm-semantic-router.com/publications/))
- [Multimodal Response Normalizers](https://awesome-repositories.com/f/web-development/api-management-tools/api-request-handling/multimodal-request-normalizers/multimodal-response-normalizers.md) — Routes multimodal and image-generation requests to backends and normalizes responses. ([source](https://vllm-semantic-router.com/docs/api/router))
- [Provider-Agnostic LLM Routing](https://awesome-repositories.com/f/web-development/provider-agnostic-llm-routing.md) — Coordinates requests across local, private, and frontier models using a single declarative policy layer.
- [AI Model](https://awesome-repositories.com/f/web-development/routing-strategies/ai-model.md) — Adds new signals, algorithms, and plugins for AI model routing without rewriting the serving path. ([source](https://vllm-semantic-router.com/docs/intro))

### Business & Productivity Software

- [Spending Controls](https://awesome-repositories.com/f/business-productivity-software/payment-integrations/spending-controls.md) — Reserves expensive model capabilities for high-value requests and uses caching and routing to reduce waste. ([source](https://vllm-semantic-router.com/docs/intro))
- [AI Token Spend Controllers](https://awesome-repositories.com/f/business-productivity-software/payment-integrations/spending-controls/ai-token-spend-controllers.md) — Reserves premium models and long context for high-value requests using caching and context-aware routing. ([source](https://vllm-semantic-router.com/docs/intro))

### DevOps & Infrastructure

- [Fleet Sizing What-If Simulators](https://awesome-repositories.com/f/devops-infrastructure/deployment-orchestration/deployment-simulators/fleet-sizing-what-if-simulators.md) — Replays traces and tests planning assumptions through simulation to validate fleet-sizing decisions. ([source](https://vllm-semantic-router.com/docs/fleet-sim/overview))
- [GPU Fleet Capacity Simulators](https://awesome-repositories.com/f/devops-infrastructure/gpu-fleet-capacity-simulators.md) — Sizes multi-pool LLM GPU fleets against latency targets using discrete-event simulation.
- [GPU Fleet Simulators](https://awesome-repositories.com/f/devops-infrastructure/gpu-fleet-capacity-simulators/gpu-fleet-simulators.md) — Simulates homogeneous, heterogeneous, or disaggregated GPU fleets to determine the configuration that meets a given latency target. ([source](https://vllm-semantic-router.com/docs/fleet-sim/overview))
- [GPU Fleet Cost Comparators](https://awesome-repositories.com/f/devops-infrastructure/resource-cost-management/cost-estimators/gpu-fleet-cost-comparators.md) — Compares yearly cost across GPU types, routing policies, and threshold settings for fleet optimization. ([source](https://vllm-semantic-router.com/docs/fleet-sim/overview))
