# LightGen
**Repository Path**: shawn2020/LightGen
## Basic Information
- **Project Name**: LightGen
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-19
- **Last Updated**: 2025-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
Official PyTorch Implementation
#### [HF Checkpoint 🚀](https://huggingface.co/Beckham808/LightGen) | [Technical Report 📝](https://arxiv.org/pdf/2503.08619) | [机器之心 🤩](https://mp.weixin.qq.com/s/CNZoIxRbOO37rbEgwqLNQg) | [量子位 🤩](https://mp.weixin.qq.com/s/9dyep_QSeJSZh61jCEzjkg) | [HKUST AIS 🤩](https://ais.hkust.edu.hk/whats-happening/news/research-team-led-prof-harry-yang-developed-lightgen-budget-friendly-ai-image)
Xianfeng Wu<sup>1,2,#</sup> · Yajing Bai<sup>1,2,#</sup> · Haoze Zheng<sup>1,2,#</sup> · Harold (Haodong) Chen<sup>1,2,#</sup> · Yexin Liu<sup>1,2,#</sup> · Zihao Wang<sup>1,2</sup> · Xuran Ma<sup>1,2</sup> · Wenjie Shu<sup>1,2</sup> · Xianzu Wu<sup>1,2</sup> · Harry Yang<sup>1,2,*</sup> · Sernam Lim<sup>2,3,*</sup>

<sup>1</sup> HKUST AMC · <sup>2</sup> Everlyn AI · <sup>3</sup> UCF CS · <sup>#</sup> Equal contribution · <sup>*</sup> Corresponding Author
This is a PyTorch/GPU implementation of [LightGen](https://arxiv.org/abs/2503.08619). This repo provides an efficient pre-training pipeline for text-to-image generation, built on [Fluid](https://arxiv.org/pdf/2410.13863)/[MAR](https://github.com/LTH14/mar).
## 🦉 ToDo List
- [ ] Release the DPO post-processing code.
- [ ] Release the complete checkpoint.
- [ ] Add an Accelerate module.
## Env
```bash
conda create -n everlyn_video python=3.10
conda activate everlyn_video
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
# pip install -U xformers==0.0.26 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
## Prepare stage
```bash
huggingface-cli download --token <your_hf_token> --resume-download stabilityai/stable-diffusion-3.5-large --local-dir stable-diffusion-3.5-large # Image VAE
huggingface-cli download --resume-download google/flan-t5-xxl --local-dir google/flan-t5-xxl # Text Encoder
huggingface-cli download --repo-type dataset --resume-download jackyhate/text-to-image-2M --local-dir text-to-image-2M # Dataset
```
Untar script for text-to-image-2M:
```bash
#!/bin/bash
# Check if the 'untar' directory exists, and create it if it does not
mkdir -p untar
# Loop through all .tar files
for tar_file in *.tar; do
# Extract the numeric part, for example 00001, 00002, ...
dir_name=$(basename "$tar_file" .tar)
# Create the corresponding directory
mkdir -p "untar/$dir_name"
# Extract the tar file to the corresponding directory
tar -xvf "$tar_file" -C "untar/$dir_name"
echo "Extraction completed: $tar_file to untar/$dir_name"
done
echo "All files have been extracted."
```
The dataset is large enough that reading it through a normal dataset loader is slow, so first generate a JSON index file to accelerate loading: modify the paths in `scripts/generate_txt.py`, then run it.
```bash
python generate_json.py
```
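The index format is not specified in this README; as a rough illustration (the function name, path layout, and field names below are hypothetical, not the repo's actual schema), an index builder over the extracted `untar/` shards could look like:

```python
import json
from pathlib import Path


def build_index(root: str, out_path: str) -> int:
    """Scan extracted shards for image/caption pairs and dump a JSON index.

    Assumes each .jpg has a sibling .txt caption file (webdataset-style
    dump); adjust the suffixes to match the actual text-to-image-2M layout.
    """
    records = []
    for img in sorted(Path(root).rglob("*.jpg")):
        cap = img.with_suffix(".txt")
        if cap.exists():  # skip images that have no caption file
            records.append({"image": str(img), "caption": cap.read_text().strip()})
    with open(out_path, "w") as f:
        json.dump(records, f)
    return len(records)
```

Training code can then read this single JSON file instead of walking millions of small files on every startup.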
## Training
Script for the default setting; you can modify the settings in `scripts/run.sh`:
```bash
sh run.sh
```
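At train time the dataloader only needs to read the prebuilt index; a stdlib-only sketch of such a reader (a hypothetical class, the repo's real loader lives in its training code and would also decode images and tokenize captions) is:

```python
import json


class JsonIndexDataset:
    """Map-style dataset over a prebuilt JSON index (illustrative sketch)."""

    def __init__(self, index_path: str):
        # Load the whole index once; each record holds an image path and caption.
        with open(index_path) as f:
            self.records = json.load(f)

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, i: int):
        r = self.records[i]
        return r["image"], r["caption"]
```

A real version would subclass `torch.utils.data.Dataset` and return tensors, but the access pattern is the same.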
## Inference
Script for the default setting:
```bash
python pipeline_image.py
```
## Acknowledgements
A large portion of codes in this repo is based on [MAR](https://github.com/LTH14/mar).
## ✨ Star History
[Star History](https://star-history.com/#XianfengWu01/LightGen&Date)
## Cite
```bibtex
@article{wu2025lightgen,
  title   = {LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization},
  author  = {Wu, Xianfeng and Bai, Yajing and Zheng, Haoze and Chen, Harold Haodong and Liu, Yexin and Wang, Zihao and Ma, Xuran and Shu, Wen-Jie and Wu, Xianzu and Yang, Harry and others},
  journal = {arXiv preprint arXiv:2503.08619},
  year    = {2025}
}
```