# Apt-Serve

Code repository for the SIGMOD '25 paper: "Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving".
Apt-Serve is a serving framework prototype implemented on top of vLLM (release version 0.5.0.post1). All add-ons introduced by the framework are located in the `additional_designs` folder.
Note that Apt-Serve is a research prototype and does not support the complete feature set of up-to-date vLLM. We have adopted only some key parts of the codebase for faster research iteration.

## Getting Started

1. Install the backbone system (vLLM 0.5.0.post1) first, following the guidelines at https://github.com/vllm-project/vllm.
2. Insert the additional designs:
```
bash additional_designs/insert_designs.sh
```
3. Install the customized CUDA kernels to support the hybrid cache:
```
python additional_designs/mixed_cache_kernels/mixed_cache_setup.py build_ext --inplace
```

With all these steps completed, the necessary implementation for the new designs is integrated into vLLM and ready for use.

## Sample Serving Traces

Follow the `readme.md` in the `sample_requests_from_datasets` folder to sample requests and create a serving trace. The sampled requests are automatically saved into the `./sampled_datasets/` folder.

## Serving Simulation

Use OPT-13B as an example.
Start the server side by:
```
python -m vllm.entrypoints.openai.api_server --model facebook/opt-13b --enforce-eager --disable-log-requests
```

After the server side is set up, start the client-side code to simulate the request arrivals:
```
python gen_client_requests.py --model facebook/opt-13b --request-rate 3 --cv 1 --dataset sharegpt
```

## Exemplar Serving Result Comparison

vLLM:
[Screenshot 2025-03-31 13 27 24]
Apt-Serve:
[Screenshot 2025-03-31 13 27 34]
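
As a side note on the client command in the Serving Simulation section, the `--request-rate` and `--cv` flags suggest a stochastic arrival process. The sketch below shows one common way such arrivals can be generated. It is an illustration under stated assumptions, not the code in `gen_client_requests.py`: it assumes that `--cv` denotes the coefficient of variation of inter-arrival times and that gaps are drawn from a Gamma distribution (with `cv = 1` this reduces to a Poisson process). The function names `sample_interarrival_times` and `arrival_timestamps` are hypothetical.

```python
import random


def sample_interarrival_times(num_requests, request_rate, cv, seed=None):
    """Draw gaps (in seconds) between consecutive requests.

    Assumption: gaps follow a Gamma distribution whose mean is
    1 / request_rate and whose coefficient of variation is cv.
    This is a hypothetical helper, not the repository's implementation.
    """
    if request_rate <= 0 or cv <= 0:
        raise ValueError("request_rate and cv must be positive")
    rng = random.Random(seed)
    shape = 1.0 / (cv * cv)            # Gamma shape: CV = 1 / sqrt(shape)
    scale = (cv * cv) / request_rate   # mean = shape * scale = 1 / request_rate
    return [rng.gammavariate(shape, scale) for _ in range(num_requests)]


def arrival_timestamps(gaps):
    """Turn inter-arrival gaps into absolute arrival times starting from 0."""
    timestamps = []
    now = 0.0
    for gap in gaps:
        now += gap
        timestamps.append(now)
    return timestamps


if __name__ == "__main__":
    # Mirrors the flags used above: --request-rate 3 --cv 1
    gaps = sample_interarrival_times(num_requests=5, request_rate=3.0, cv=1.0, seed=0)
    for t in arrival_timestamps(gaps):
        print(f"request arrives at t = {t:.3f} s")
```

Under these assumptions, a larger `cv` produces burstier traffic at the same average rate, which is the kind of load pattern a scheduler comparison is typically sensitive to.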