# Apt-Serve
**Repository Path**: plushuang/Apt-Serve
## Basic Information
- **Project Name**: Apt-Serve
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-12
- **Last Updated**: 2025-07-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Code repository for the SIGMOD 25 paper: "Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving".
Apt-Serve is a serving framework prototype implemented on top of vLLM (release version: 0.5.0.post1). All the add-ons introduced by the framework are located in the folder `additional_designs`.
Note that Apt-Serve is a research prototype and does not support the complete feature set of up-to-date vLLM. We have adopted only key parts of the codebase to enable faster research iteration.
## Getting Started
1. Install the backbone system (vLLM 0.5.0.post1) first, following the guidelines at https://github.com/vllm-project/vllm.
2. Insert the additional designs:
```bash
bash additional_designs/insert_designs.sh
```
3. Install the customized CUDA kernels to support the hybrid cache:
```bash
python additional_designs/mixed_cache_kernels/mixed_cache_setup.py build_ext --inplace
```
Once these steps are complete, the new designs are integrated into vLLM and ready for use.
## Sample Serving Traces
Follow the `readme.md` in the `sample_requests_from_datasets` folder to sample requests and create a serving trace.
The sampled requests are automatically saved into the `./sampled_datasets/` folder.
## Serving Simulation
We use OPT-13B as an example.
Start the server with:
```bash
python -m vllm.entrypoints.openai.api_server --model facebook/opt-13b --enforce-eager --disable-log-requests
```
Once the server is up, run the client-side script to simulate request arrivals:
```bash
python gen_client_requests.py --model facebook/opt-13b --request-rate 3 --cv 1 --dataset sharegpt
```
## Exemplar Serving Result Comparison
vLLM:
Apt-Serve:
