# vllm-mindspore

**Repository Path**: furyliu/vllm-mindspore

## Basic Information

- **Project Name**: vllm-mindspore
- **Description**: A vLLM plugin for MindSpore that enables deploying inference services for MindSpore models on the vLLM framework.
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master

## README
About MindSpore | vLLM MindSpore SIG | Issue Feedback
---

*Latest News* 🔥

- [2025/06] Adapted to vLLM [v0.8.3](https://github.com/vllm-project/vllm/releases/tag/v0.8.3), adding support for the vLLM V1 architecture and the Qwen3 large model.
- [2025/04] Adapted to vLLM [v0.7.3](https://github.com/vllm-project/vllm/releases/tag/v0.7.3), adding support for Automatic Prefix Caching, Chunked Prefill, Multi-step Scheduling, and MTP. In collaboration with the openEuler community and Shanghai Jiao Tong University, we achieved a full-stack open-source single-machine inference deployment for DeepSeek. You can read the detailed report [here](https://news.pku.edu.cn/xwzh/e13046c47d03471c8cebb950bd1f4598.htm).
- [2025/03] Adapted to vLLM [v0.6.6.post1](https://github.com/vllm-project/vllm/releases/tag/v0.6.6.post1), supporting the deployment of inference services for large models such as DeepSeek-V3/R1 and Qwen2.5 on MindSpore through `vllm.entrypoints`. In collaboration with the openEuler community and Peking University, we released a full-stack open-source DeepSeek inference solution. You can read the detailed report [here](https://news.pku.edu.cn/xwzh/e13046c47d03471c8cebb950bd1f4598.htm).
- [2025/02] The MindSpore community officially created the [mindspore/vllm-mindspore](https://gitee.com/mindspore/vllm-mindspore) repository, aiming to integrate MindSpore's large model inference capabilities into vLLM.

---

# Overview

vLLM MindSpore (`vllm-mindspore`) is a plugin developed by the [MindSpore community](https://www.mindspore.cn/en) that integrates MindSpore LLM inference capabilities into [vLLM](https://github.com/vllm-project/vllm). By combining the technical strengths of MindSpore and vLLM, it aims to provide a full-stack open-source, high-performance, and easy-to-use LLM inference solution.

The vLLM MindSpore plugin brings MindSpore large models into vLLM and enables the deployment of MindSpore-based LLM inference services. It follows these design principles:

- Interface compatibility: support the native APIs and service deployment interfaces of vLLM without adding new configuration files or interfaces, reducing the learning cost for users and ensuring ease of use (see the usage sketch at the end of this section).
- Minimal invasive modifications: minimize invasive changes to the vLLM code base to keep the system maintainable and evolvable.
- Component decoupling: minimize and standardize the coupling between MindSpore large model components and vLLM service components, so that the various MindSpore large model suites can be integrated easily.

Based on these design principles, vLLM MindSpore adopts the system architecture shown in the figure below and connects vLLM to MindSpore through two categories of components:

- Service components: vLLM MindSpore maps the PyTorch API calls in service components such as LLMEngine and the Scheduler to MindSpore capabilities, inheriting service features such as Continuous Batching and PagedAttention.
- Model components: vLLM MindSpore registers or replaces model components, including models, network layers, and custom operators, and integrates MindSpore Transformers, MindSpore One, other MindSpore large model suites, and custom large models into vLLM (see the registration sketch below).
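As an illustration of the model-component path, the following is a minimal, hypothetical sketch of how a plugin can register a MindSpore-backed implementation with vLLM's model registry. The `ModelRegistry` API comes from upstream vLLM; the module path and class name are placeholders rather than actual vllm-mindspore code.

```python
# Hypothetical sketch: pointing vLLM's model registry at a MindSpore-backed
# implementation, so vLLM instantiates it for checkpoints whose config declares
# the matching Hugging Face architecture name.
from vllm import ModelRegistry

# The "module:class" string form defers the import until the model is actually
# loaded. Both the module path and the class name below are placeholders.
ModelRegistry.register_model(
    "Qwen2ForCausalLM",  # HF architecture name to override
    "my_plugin.models.qwen2:MindSporeQwen2ForCausalLM",
)
```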
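In line with the interface-compatibility principle, serving a model is meant to look the same as with upstream vLLM. The sketch below uses vLLM's native offline-inference API; importing `vllm_mindspore` before `vllm` is assumed here as the way the plugin hooks in, and the model ID is only an example, so check the project documentation for the supported models and exact entry point.

```python
# Minimal usage sketch, assuming the plugin patches vLLM once it is imported.
import vllm_mindspore  # noqa: F401  (assumed import order: plugin first, then vLLM)

from vllm import LLM, SamplingParams

prompts = ["Introduce MindSpore in one sentence."]
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

# The same native vLLM entry point as upstream; the plugin maps its backend
# calls to MindSpore under the hood.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model ID

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```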