# vLLM Semantic Router

[![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue)](https://vllm-semantic-router.com)
[![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Community-yellow)](https://huggingface.co/LLM-Semantic-Router)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Crates.io](https://img.shields.io/crates/v/candle-semantic-router.svg)](https://crates.io/crates/candle-semantic-router)
![Test And Build](https://github.com/vllm-project/semantic-router/workflows/Test%20And%20Build/badge.svg)

**📚 [Complete Documentation](https://vllm-semantic-router.com) | 🚀 [Quick Start](https://vllm-semantic-router.com/docs/installation) | 📣 [Blog](https://vllm-semantic-router.com/blog/) | 📖 [API Reference](https://vllm-semantic-router.com/docs/api/router/)**

![code](./website/static/img/code.png)
## Innovations ✨

![architecture](./website/static/img/architecture.png)

### Intelligent Routing 🧠

#### Auto-Reasoning and Auto-Selection of Models

A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable model from a defined pool, based on **semantic understanding** of the request's intent (complexity, task, tools). Classification is performed with BERT. Conceptually similar to Mixture-of-Experts (MoE), which lives *within* a model, this system instead selects the best *entire model* for the nature of the task. Overall inference accuracy improves because each request is served from a pool of models better suited to different types of tasks:

![Model Accuracy](./website/static/img/category_accuracies.png)

The screenshot below shows the LLM Router dashboard in Grafana.

![LLM Router Dashboard](./website/static/img/grafana_screenshot.png)

The router is implemented in two ways:

- Golang (with a Rust FFI based on the [candle](https://github.com/huggingface/candle) Rust ML framework)
- Python

Benchmarking will be conducted to determine the best implementation.

#### Auto-Selection of Tools

Selects only the tools relevant to each prompt, avoiding irrelevant tools so as to reduce the number of prompt tokens and improve the LLM's tool-selection accuracy.

#### Category-Specific System Prompts

Automatically injects specialized system prompts based on query classification, ensuring optimal model behavior for different domains (math, coding, business, etc.) without manual prompt engineering.

### Enterprise Security 🔒

#### PII Detection

Detects personally identifiable information (PII) in the prompt and prevents it from being sent to the LLM, protecting user privacy.

#### Prompt Guard

Detects jailbreak prompts and blocks them before they reach the LLM, preventing the model from misbehaving.
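The routing idea above (classify a request, then pick the best whole model from a pool) can be sketched as a small toy. Here a keyword lookup stands in for the real BERT classifier, and the category names and model names are purely hypothetical:

```python
# Toy Mixture-of-Models routing sketch. The real router uses a BERT
# classifier; the keyword table and model pool below are illustrative only.
MODEL_POOL = {
    "math": "math-specialist-llm",
    "coding": "code-specialist-llm",
    "general": "general-purpose-llm",
}

KEYWORDS = {  # crude stand-in for semantic category classification
    "math": ("integral", "equation", "derivative"),
    "coding": ("python", "function", "bug"),
}

def classify(prompt: str) -> str:
    """Return the category of the prompt, defaulting to 'general'."""
    p = prompt.lower()
    for category, words in KEYWORDS.items():
        if any(w in p for w in words):
            return category
    return "general"

def route(prompt: str) -> str:
    """Select the model from the pool best suited to the prompt's category."""
    return MODEL_POOL[classify(prompt)]

print(route("Fix this Python function"))  # → code-specialist-llm
```

A production router would replace `classify` with the BERT model and fold in the complexity and tool signals described above; the selection step itself stays this simple.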
### Similarity Caching ⚡️

Caches responses keyed by the semantic representation of the prompt, reducing the number of prompt tokens processed and improving overall inference latency.

### Distributed Tracing 🔍

Comprehensive observability with OpenTelemetry distributed tracing provides fine-grained visibility into the request processing pipeline:

- **Request Flow Tracing**: Track requests through classification, security checks, caching, and routing
- **Performance Analysis**: Identify bottlenecks with detailed timing for each operation
- **Security Monitoring**: Trace PII detection and jailbreak prevention operations
- **Routing Decisions**: Understand why specific models were selected
- **OpenTelemetry Standard**: Industry-standard tracing with support for Jaeger, Tempo, and other OTLP backends

See the [Distributed Tracing Guide](https://vllm-semantic-router.com/docs/tutorials/observability/distributed-tracing/) for complete setup instructions.

## Documentation 📖

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

**👉 [Complete Documentation at Read the Docs](https://vllm-semantic-router.com/)**

The documentation includes:

- **[Installation Guide](https://vllm-semantic-router.com/docs/installation/)** - Complete setup instructions
- **[System Architecture](https://vllm-semantic-router.com/docs/overview/architecture/system-architecture/)** - Technical deep dive
- **[Model Training](https://vllm-semantic-router.com/docs/training/training-overview/)** - How the classification models work
- **[API Reference](https://vllm-semantic-router.com/docs/api/router/)** - Complete API documentation
- **[Distributed Tracing](https://vllm-semantic-router.com/docs/tutorials/observability/distributed-tracing/)** - Observability and debugging guide

## Community 👋

For questions, feedback, or to contribute, please join the `#semantic-router` channel in the vLLM Slack.
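The similarity-caching idea described above can be illustrated with a minimal sketch: embed each prompt, and return a cached response when a new prompt's embedding is close enough to a previous one. The embeddings here are hypothetical hand-written vectors and the threshold is arbitrary; the real router derives embeddings from a learned model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy cache keyed by prompt embeddings rather than exact strings."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        """Return a cached response if a semantically similar prompt was seen."""
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.1], "cached answer")
print(cache.get([0.99, 0.01, 0.12]))  # near-duplicate prompt: cache hit
print(cache.get([0.0, 1.0, 0.0]))     # unrelated prompt: None
```

A cache hit skips the LLM entirely, which is where the token and latency savings come from; the linear scan here would be replaced by an approximate nearest-neighbor index at scale.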
## Citation

If you find Semantic Router helpful in your research or projects, please consider citing it:

```bibtex
@misc{semanticrouter2025,
  title={vLLM Semantic Router},
  author={vLLM Semantic Router Team},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/semantic-router}},
}
```

## Star History 🔥

We opened the project on Aug 31, 2025. We love open source and collaboration ❤️

[![Star History Chart](https://api.star-history.com/svg?repos=vllm-project/semantic-router&type=Date)](https://www.star-history.com/#vllm-project/semantic-router&Date)