# step-audio-editx
**Repository Path**: lavine/step-audio-editx
## Basic Information
- **Project Name**: step-audio-editx
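- **Description**: Step-Audio-EditX is a powerful audio model built on a 3B-parameter LLM. It excels at expressive, iterative audio editing that covers emotion, speaking style, and paralinguistic information, and it provides strong zero-shot text-to-speech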
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/step-audio-editx
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-11-25
- **Last Updated**: 2025-11-25
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Step-Audio-EditX
| Type | Tag | Description |
|-------|-------|-------|
| emotion | happy | Expressing happiness |
| emotion | angry | Expressing anger |
| emotion | sad | Expressing sadness |
| emotion | fear | Expressing fear |
| emotion | surprised | Expressing surprise |
| emotion | confusion | Expressing confusion |
| emotion | empathy | Expressing empathy and understanding |
| emotion | embarrass | Expressing embarrassment |
| emotion | excited | Expressing excitement and enthusiasm |
| emotion | depressed | Expressing a depressed or discouraged mood |
| emotion | admiration | Expressing admiration or respect |
| emotion | coldness | Expressing coldness and indifference |
| emotion | disgusted | Expressing disgust or aversion |
| emotion | humour | Expressing humor or playfulness |
| speaking style | serious | Speaking in a serious or solemn manner |
| speaking style | arrogant | Speaking in an arrogant manner |
| speaking style | child | Speaking in a childlike manner |
| speaking style | older | Speaking in an elderly-sounding manner |
| speaking style | girl | Speaking in a light, youthful feminine manner |
| speaking style | pure | Speaking in a pure, innocent manner |
| speaking style | sister | Speaking in a mature, confident feminine manner |
| speaking style | sweet | Speaking in a sweet, lovely manner |
| speaking style | exaggerated | Speaking in an exaggerated, dramatic manner |
| speaking style | ethereal | Speaking in a soft, airy, dreamy manner |
| speaking style | whisper | Speaking in a whispering, very soft manner |
| speaking style | generous | Speaking in a hearty, outgoing, and straight-talking manner |
| speaking style | recite | Speaking in a clear, well-paced, poetry-reading manner |
| speaking style | act_coy | Speaking in a sweet, playful, and endearing manner |
| speaking style | warm | Speaking in a warm, friendly manner |
| speaking style | shy | Speaking in a shy, timid manner |
| speaking style | comfort | Speaking in a comforting, reassuring manner |
| speaking style | authority | Speaking in an authoritative, commanding manner |
| speaking style | chat | Speaking in a casual, conversational manner |
| speaking style | radio | Speaking in a radio-broadcast manner |
| speaking style | soulful | Speaking in a heartfelt, deeply emotional manner |
| speaking style | gentle | Speaking in a gentle, soft manner |
| speaking style | story | Speaking in a narrative, audiobook-style manner |
| speaking style | vivid | Speaking in a lively, expressive manner |
| speaking style | program | Speaking in a show-host/presenter manner |
| speaking style | news | Speaking in a news broadcasting manner |
| speaking style | advertising | Speaking in a polished, high-end commercial voiceover manner |
| speaking style | roar | Speaking in a loud, deep, roaring manner |
| speaking style | murmur | Speaking in a quiet, low manner |
| speaking style | shout | Speaking in a loud, sharp, shouting manner |
| speaking style | deeply | Speaking in a deep and low-pitched tone |
| speaking style | loudly | Speaking in a loud and high-pitched tone |
| paralinguistic | Breathing | Breathing sound |
| paralinguistic | Laughter | Laughter or laughing sound |
| paralinguistic | Uhm | Hesitation sound: "Uhm" |
| paralinguistic | Sigh | Sighing sound |
| paralinguistic | Surprise-oh | Expressing surprise: "Oh" |
| paralinguistic | Surprise-ah | Expressing surprise: "Ah" |
| paralinguistic | Surprise-wa | Expressing surprise: "Wa" |
| paralinguistic | Confirmation-en | Confirming: "En" |
| paralinguistic | Question-ei | Questioning: "Ei" |
| paralinguistic | Dissatisfaction-hnn | Dissatisfied sound: "Hnn" |
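
The tags listed above can be kept in a small lookup structure so that an edit request is checked before any audio is processed. The following is a minimal sketch, not part of the Step-Audio-EditX codebase: the names `SUPPORTED_TAGS` and `check_tag` are hypothetical, and only the tag strings themselves come from the table.

```python
# Tag names are copied from the table above. The container and the helper
# are hypothetical illustrations, not the project's own API.
SUPPORTED_TAGS = {
    "emotion": {
        "happy", "angry", "sad", "fear", "surprised", "confusion", "empathy",
        "embarrass", "excited", "depressed", "admiration", "coldness",
        "disgusted", "humour",
    },
    "speaking style": {
        "serious", "arrogant", "child", "older", "girl", "pure", "sister",
        "sweet", "exaggerated", "ethereal", "whisper", "generous", "recite",
        "act_coy", "warm", "shy", "comfort", "authority", "chat", "radio",
        "soulful", "gentle", "story", "vivid", "program", "news",
        "advertising", "roar", "murmur", "shout", "deeply", "loudly",
    },
    "paralinguistic": {
        "Breathing", "Laughter", "Uhm", "Sigh", "Surprise-oh", "Surprise-ah",
        "Surprise-wa", "Confirmation-en", "Question-ei", "Dissatisfaction-hnn",
    },
}


def check_tag(edit_type: str, tag: str) -> str:
    """Return `tag` unchanged if it is listed under `edit_type`, else raise ValueError."""
    if edit_type not in SUPPORTED_TAGS:
        raise ValueError(f"unknown edit type: {edit_type!r}")
    if tag not in SUPPORTED_TAGS[edit_type]:
        allowed = ", ".join(sorted(SUPPORTED_TAGS[edit_type]))
        raise ValueError(f"{tag!r} is not a {edit_type} tag; expected one of: {allowed}")
    return tag
```

For example, `check_tag("speaking style", "whisper")` returns `"whisper"`, while `check_tag("emotion", "whisper")` raises `ValueError` because `whisper` is listed as a speaking style rather than an emotion.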
## Feature Requests & Wishlist
💡 We welcome all ideas for new features! If you'd like to see a feature added to the project, please start a discussion in our [Discussions](https://github.com/stepfun-ai/Step-Audio-EditX/discussions) section.
We'll be collecting community feedback here and will incorporate popular suggestions into our future development plans. Thank you for your contribution!
## Demos
| Task | Text | Source | Edited |
|-------|-------|-------|-------|
| Emotion-Fear | 我总觉得,有人在跟着我,我能听到奇怪的脚步声。 | [fear_zh_female_prompt.webm](https://github.com/user-attachments/assets/a088c059-032c-423f-81d6-3816ba347ff5) | [fear_zh_female_output.webm](https://github.com/user-attachments/assets/917494ac-5913-4949-8022-46cf55ca05dd) |
| Style-Whisper | 比如在工作间隙,做一些简单的伸展运动,放松一下身体,这样,会让你更有精力。 | [whisper_prompt.webm](https://github.com/user-attachments/assets/ed9e22f1-1bac-417b-913a-5f1db31f35c9) | [whisper_output.webm](https://github.com/user-attachments/assets/e0501050-40db-4d45-b380-8bcc309f0b5f) |
| Style-Act_coy | 我今天想喝奶茶,可是不知道喝什么口味,你帮我选一下嘛,你选的都好喝~ | [act_coy_prompt.webm](https://github.com/user-attachments/assets/74d60625-5b3c-4f45-becb-0d3fe7cc4b3f) | [act_coy_output.webm](https://github.com/user-attachments/assets/b2f74577-56c2-4997-afd6-6bf47d15ea51) |
| Paralinguistics | 你这次又忘记带钥匙了 [Dissatisfaction-hnn],真是拿你没办法。 | [paralingustic_prompt.webm](https://github.com/user-attachments/assets/21e831a3-8110-4c64-a157-60e0cf6735f0) | [paralingustic_output.webm](https://github.com/user-attachments/assets/a82f5a40-c6a3-409b-bbe6-271180b20d7b) |
| Denoising | Such legislation was clarified and extended from time to time thereafter. No, the man was not drunk, he wondered how we got tied up with this stranger. Suddenly, my reflexes had gone. It's healthier to cook without sugar. | [denoising_prompt.webm](https://github.com/user-attachments/assets/70464bf4-ebde-44a3-b2a6-8c292333319b) | [denoising_output.webm](https://github.com/user-attachments/assets/7cd0ae8d-1bf0-40fc-9bcd-f419bd4b2d21) |
| Speed-Faster | 上次你说鞋子有点磨脚,我给你买了一双软软的鞋垫。 | [speed_faster_prompt.webm](https://github.com/user-attachments/assets/db46609e-1b98-48d8-99c8-e166cfdfc6e3) | [speed_faster_output.webm](https://github.com/user-attachments/assets/0fbc14ca-dd4a-4362-aadc-afe0629f4c9f) |
For more examples, see the [demo page](https://stepaudiollm.github.io/step-audio-editx/).
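
The Paralinguistics demo places a tag in square brackets inside the target text (`[Dissatisfaction-hnn]`). The sketch below shows one way such inline tags might be located or stripped before further processing. It assumes the bracket syntax seen in that demo row; the helper names and the regular expression are hypothetical and are not taken from the project's code.

```python
import re

# Paralinguistic tag names as listed in this README.
PARALINGUISTIC_TAGS = {
    "Breathing", "Laughter", "Uhm", "Sigh", "Surprise-oh", "Surprise-ah",
    "Surprise-wa", "Confirmation-en", "Question-ei", "Dissatisfaction-hnn",
}

# Assumed syntax: a tag name made of letters and hyphens inside square brackets.
TAG_PATTERN = re.compile(r"\[([A-Za-z-]+)\]")


def find_paralinguistic_tags(text):
    """Return (tag, character offset) pairs for every known bracketed tag in `text`."""
    return [
        (match.group(1), match.start())
        for match in TAG_PATTERN.finditer(text)
        if match.group(1) in PARALINGUISTIC_TAGS
    ]


def remove_paralinguistic_tags(text):
    """Delete known bracketed tags from `text`; unknown bracketed words are kept."""
    def _drop(match):
        return "" if match.group(1) in PARALINGUISTIC_TAGS else match.group(0)
    return TAG_PATTERN.sub(_drop, text)


demo_text = "你这次又忘记带钥匙了 [Dissatisfaction-hnn],真是拿你没办法。"
tags = find_paralinguistic_tags(demo_text)
```

Here `tags` holds a single entry for `Dissatisfaction-hnn`. Filtering against the known tag set means that bracketed text that is not a listed tag is left untouched.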
## Model Download
| Models | 🤗 Hugging Face | ModelScope |
|-------|-------|-------|
| Step-Audio-EditX | [stepfun-ai/Step-Audio-EditX](https://huggingface.co/stepfun-ai/Step-Audio-EditX) | [stepfun-ai/Step-Audio-EditX](https://modelscope.cn/models/stepfun-ai/Step-Audio-EditX) |
| Step-Audio-Tokenizer | [stepfun-ai/Step-Audio-Tokenizer](https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer) | [stepfun-ai/Step-Audio-Tokenizer](https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer) |
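
Both repositories in the table can be fetched programmatically. The sketch below uses `snapshot_download` from the `huggingface_hub` package. The repository IDs come from the table, while the `models/` directory layout and the helper names are assumptions made for illustration.

```python
from pathlib import Path

# Repository IDs copied from the Hugging Face column of the table above.
MODEL_REPOS = (
    "stepfun-ai/Step-Audio-EditX",
    "stepfun-ai/Step-Audio-Tokenizer",
)


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Map 'org/name' to '<root>/<name>'. The layout is an assumption, not a project requirement."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = "models") -> list:
    """Download every repository in MODEL_REPOS and return the local paths (needs network access)."""
    # Imported here so the path helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=target)
        paths.append(target)
    return paths
```

Calling `download_all()` would place the two repositories under `models/Step-Audio-EditX` and `models/Step-Audio-Tokenizer`. The paths that the project's own scripts expect are not stated in this section, so adjust `root` accordingly.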
## Model Usage
### 📜 Requirements
The following table shows the requirements for running the Step-Audio-EditX model (batch size = 1):
| Model | Parameters | Setting