# infinitystar

**Repository Path**: scotth/infinitystar

## Basic Information

- **Project Name**: infinitystar
- **Description**: InfinityStar is a unified spacetime autoregressive framework for high-resolution image and dynamic video synthesis
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/infinitystar
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2025-11-12
- **Last Updated**: 2025-11-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Infinity**⭐️**: Unified **S**pace**T**ime **A**uto**R**egressive Modeling for Visual Generation

[![demo platform](https://img.shields.io/badge/Play%20with%20Infinity%21-Infinity%20demo%20platform-lightblue)](http://opensource.bytedance.com/discord/invite) [![arXiv](https://img.shields.io/badge/arXiv%20paper-2511.04675-b31b1b.svg)](https://arxiv.org/abs/2511.04675) [![huggingface weights](https://img.shields.io/badge/%F0%9F%A4%97%20Weights-FoundationVision/Infinity-yellow)](https://huggingface.co/FoundationVision/InfinityStar)


---

## 🔥 Updates!!

* Nov 7, 2025: 🔥 Paper, training and inference code, checkpoints, and demo website released!
* Sep 18, 2025: 🎉 InfinityStar is accepted as a NeurIPS 2025 Oral.

## 🕹️ Try and Play with Infinity⭐️!

We provide a [demo website](http://opensource.bytedance.com/discord/invite) for you to play with InfinityStar and generate videos. Enjoy the fun of bitwise video autoregressive modeling!

## ✨ Overview

We introduce InfinityStar, a unified spacetime autoregressive framework for high-resolution image and dynamic video synthesis.

- 🧠 **Unified Spacetime Model**: A purely discrete, autoregressive approach that jointly captures spatial and temporal dependencies within a single, elegant architecture.
- 🎬 **Versatile Generation**: This unified design naturally supports a variety of generation tasks such as **text-to-image**, **text-to-video**, **image-to-video**, and **long interactive video synthesis** via straightforward temporal autoregression.
- 🏆 **Leading Performance & Speed**: In extensive experiments, InfinityStar scores **83.74** on VBench, outperforming all autoregressive models by large margins and even surpassing diffusion competitors such as HunyuanVideo, while running approximately **10x** faster than leading diffusion-based methods.
- 📖 **Pioneering High-Resolution Autoregressive Generation**: To our knowledge, InfinityStar is the first discrete autoregressive video generator capable of producing industrial-level 720p videos, setting a new standard for quality in its class.

### 🔥 Unified modeling for image, video generation and long interactive video synthesis 📈:

## 🎬 Video Demos

#### General Aesthetics

#### Anime & 3D Animation

#### Motion

#### Extended Application: Long Interactive Videos

## Benchmark

### Achieve SOTA performance on image generation benchmark:

### Achieve SOTA performance on video generation benchmark:

### Surpassing diffusion competitors like HunyuanVideo*:

## Visualization

### Text to image examples

### Image to video examples

### Video extrapolation examples

## 📑 Open-Source Plan

- [x] Training Code
- [x] Web Demo
- [x] InfinityStar Inference Code
- [x] InfinityStar Models Checkpoints
- [x] InfinityStar-Interact Inference Code
- [ ] InfinityStar-Interact Checkpoints

## Installation

1. We use FlexAttention to speed up training, which requires `torch>=2.5.1`.
2. Install the other pip packages via `pip3 install -r requirements.txt`.

## Training Scripts

We provide a comprehensive workflow for training and finetuning our model, covering data organization, feature extraction, and training scripts. For detailed instructions, please refer to `data/README.md`.

## Inference

* **720p Video Generation:** Use `tools/infer_video_720p.py` to generate 5-second videos at 720p resolution. Due to the high computational cost of training, our released 720p model is trained for 5-second video generation. This script also supports image-to-video generation by specifying an image path.
```bash
python3 tools/infer_video_720p.py
```

* **480p Variable-Length Video Generation:** We also provide an intermediate checkpoint for 480p resolution, capable of generating videos of 5 and 10 seconds. Since this model is not specifically optimized for Text-to-Video (T2V), we recommend using the experimental Image-to-Video (I2V) and Video-to-Video (V2V) modes for better results. To specify the video duration, edit the `generation_duration` variable in `tools/infer_video_480p.py` to either 5 or 10. This script also supports image-to-video and video continuation by providing a path to an image or a video.
```bash
python3 tools/infer_video_480p.py
```

* **480p Long Interactive Video Generation:** Use `tools/infer_interact_480p.py` to generate a long interactive video at 480p. You provide a reference video and multiple prompts, and the model generates the video interactively with your input.
```bash
python3 tools/infer_interact_480p.py
```

## Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

```
@Article{VAR,
  title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
  author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
  year={2024},
  eprint={2404.02905},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

```
@misc{Infinity,
  title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
  author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
  year={2024},
  eprint={2412.04431},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.04431},
}
```

```
@misc{InfinityStar,
  title={InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation},
  author={Jinlai Liu and Jian Han and Bin Yan and Hui Wu and Fengda Zhu and Xing Wang and Yi Jiang and Bingyue Peng and Zehuan Yuan},
  year={2025},
  eprint={2511.04675},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.04675},
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
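
## Appendix: Checking the PyTorch version

The Installation section states that FlexAttention requires `torch>=2.5.1`. The sketch below is one way to check that requirement before launching training. It is not part of this repository: the helper names `parse_release` and `meets_minimum` are hypothetical, and the comparison is deliberately simplified (it looks only at the leading numeric release segment, so a pre-release such as `2.5.1rc1` is treated as `2.5.1`).

```python
import re
from importlib import metadata

MIN_TORCH = "2.5.1"  # minimum version stated in the Installation section


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_TORCH) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Read the installed version from package metadata without importing torch.
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed")
    else:
        status = "OK" if meets_minimum(installed) else f"too old, need >= {MIN_TORCH}"
        print(f"torch {installed}: {status}")
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and works even when the GPU runtime is not yet configured.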