# MSD

**Repository Path**: vllm/msd

## Basic Information

- **Project Name**: MSD
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-31
- **Last Updated**: 2025-08-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Multimodal Speculative Decoding (MSD)

📄 [**Paper on arXiv**](https://arxiv.org/pdf/2505.14260)

*Speculative Decoding Reimagined for Multimodal Large Language Models*

---

## 🧠 MSD Models

You can directly use the Multimodal Speculative Decoding (MSD) models available on Hugging Face:

- **MSD-LLaVA1.5-7B**: [lucylyn/MSD-LLaVA1.5-7B](https://huggingface.co/lucylyn/MSD-LLaVA1.5-7B)
- **MSD-LLaVA1.5-13B**: [lucylyn/MSD-LLaVA1.5-13B](https://huggingface.co/lucylyn/MSD-LLaVA1.5-13B)
- **MSD-Qwen2VL-7B-Instruct**: [lucylyn/MSD-Qwen2VL-7B-Instruct](https://huggingface.co/lucylyn/MSD-Qwen2VL-7B-Instruct)

---

## 🧱 1. Setup & Installation

```bash
conda create -n msd python=3.10 -y
conda activate msd

# Ensure CUDA 12.1 is installed and configured
cd LLaVA
pip install -e .

cd ../EAGLE
pip install -e .

cd ../lmms-eval
pip install -e .
```

---

## 📥 2. Download Datasets

Download the annotations used for instruction tuning:

* [`ShareGPT_V3_unfiltered_cleaned_split.json`](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split.json)
* [`llava_v1_5_mix665k.json`](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json)

> ⚠️ Before use, process `llava_v1_5_mix665k.json` with [`EAGLE/eagle/ge_data/convert.py`](EAGLE/eagle/ge_data/convert.py) to fix formatting issues.

Then download the image data from the following datasets:

* **COCO**: [train2017](http://images.cocodataset.org/zips/train2017.zip)
* **GQA**: [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
* **OCR-VQA**: [Download script (Google Drive)](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)
  > 💡 Make sure all OCR-VQA images are saved as `.jpg`
* **TextVQA**: [train\_val\_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
* **Visual Genome**: [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)

After downloading, organize the data under `./image_data` in the following structure:

```
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
```

---

## ⚙️ 3. Data Processing

Use the following command to generate training data. You can control the target model by setting the `--model_type` argument (e.g., `llava_v15_t/v` or `qwen2_vl_t/v`):

```bash
cd EAGLE/eagle/ge_data
CUDA_VISIBLE_DEVICES=0 python -m eagle.ge_data.allocation \
    --outdir <outdir> \
    --model_type <model_type> \
    --model <model> \
    --image_data_path <image_data_path> \
    --json_data_path <json_data_path>
```
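As a concrete illustration, a hypothetical invocation for generating the text-side training data of LLaVA-1.5-7B might look like the sketch below. The output directory, the converted JSON filename, and the use of the Hugging Face ID `liuhaotian/llava-v1.5-7b` for `--model` are assumptions (a local checkpoint path may be required instead), and `llava_v15_t` is assumed to denote the text-side variant of the model type listed above; substitute your own values.

```bash
# Hypothetical example (not from the original README): generate text-side training
# data for LLaVA-1.5-7B. All paths and the model identifier are placeholders.
cd EAGLE/eagle/ge_data
CUDA_VISIBLE_DEVICES=0 python -m eagle.ge_data.allocation \
    --outdir ./train_data/llava_v15_7b_t \
    --model_type llava_v15_t \
    --model liuhaotian/llava-v1.5-7b \
    --image_data_path ./image_data \
    --json_data_path ./llava_v1_5_mix665k_converted.json
```

Presumably, running the same command again with the `_v` model type and a separate `--outdir` produces the visual-side data that the training step below expects.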
---

## 🏋️ 4. Train the Model

Use DeepSpeed to train the speculative decoding model. Modify the following paths according to your setup:

```bash
cd EAGLE/eagle/train
deepspeed --master_port 29504 --include localhost:0 main_deepspeed.py \
    --deepspeed_config ds_config.json \
    --tmpdir_v <tmpdir_v> \
    --tmpdir_t <tmpdir_t> \
    --basepath <basepath> \
    --cpdir <cpdir> \
    --config <config>
```

**Parameters:**

* `<tmpdir_v>`: directory containing the preprocessed visual data
* `<tmpdir_t>`: directory containing the preprocessed text data
* `<config>`: training configuration file, e.g., `llava_v15_7B_config.json`

---

## 📊 5. Evaluate the Model

Run evaluation with `lmms-eval`. The following example evaluates on the `ChartQA` task:

```bash
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 --main_process_port=29506 -m lmms_eval \
    --model <model_name> \
    --model_args pretrained="<model_path>" \
    --msd_model_path <msd_model_path> \
    --tasks chartqa \
    --batch_size 1 \
    --gen_kwargs temperature=0 \
    --use_msd
```

**Parameters:**

* `<model_name>`: short name identifier of your model, e.g., `llava_msd` or `qwen2_vl_msd`
* `<model_path>`: path to the base pretrained model
* `<msd_model_path>`: path to the MSD model

---
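For reference, a filled-in evaluation command for MSD-LLaVA1.5-7B might look like the sketch below. The base-model identifier `liuhaotian/llava-v1.5-7b` and the local MSD checkpoint directory are assumptions; point them at your own copies (for example, a local download of `lucylyn/MSD-LLaVA1.5-7B` from Hugging Face).

```bash
# Hypothetical example (not from the original README): evaluate MSD-LLaVA1.5-7B on ChartQA.
# The pretrained model ID and the MSD checkpoint directory are placeholders.
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 --main_process_port=29506 -m lmms_eval \
    --model llava_msd \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --msd_model_path ./MSD-LLaVA1.5-7B \
    --tasks chartqa \
    --batch_size 1 \
    --gen_kwargs temperature=0 \
    --use_msd
```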