# 🚀 Agentic-AIGC: Video Production with Multi-Modal Agents

One Prompt → Video Creation: AI Unleashed

## 🎯 Introduction

This project serves as a **Comprehensive Cookbook for Agentic-AIGC Development**, with a primary focus on video creation workflows. It guides readers through the emerging field of agent-based AI-generated content creation.

Video production represents the ultimate complexity challenge in AIGC. Creating professional videos requires seamless coordination of script writing, storyboard development, visual scene generation, character animation, audio synthesis, voice acting, background music composition, scene transitions, visual effects, and final editing. Traditional AIGC approaches rely on single-model generation with limited scope and coordination capabilities.

Agentic-AIGC represents the next frontier, where intelligent agents orchestrate sophisticated creative ecosystems. These agents coordinate multiple AI tools seamlessly, make nuanced creative decisions in real time, and, most importantly, maintain narrative and visual coherence across complex, multi-stage production pipelines.

---

## ✨ What Will You Gain?

### Core Knowledge & Skills

- 📚 **Agentic-AIGC Fundamentals** - Deep understanding of agent-based content generation concepts and architectural patterns
- 🍳 **Hands-On Experience** - Working implementations of complete video production workflows, from concept to final output
- 🤖 **Multi-Tool Coordination** - Practical examples of orchestrating different AI models for seamless creative collaboration

### Autonomous Video Creation Agents

- 🎬 **Intelligent Video Production** - Build agents that make independent creative decisions throughout video production pipelines.
- 🎵 **Self-Directed Audio Processing** - Develop agents that autonomously handle voice synthesis, music selection, and audio-visual synchronization.
- 🔧 **Agent Orchestration Patterns** - Master architectures where specialized agents collaborate to manage complex video workflows autonomously.

### Production-Ready Solutions

- 🌐 **Cross-Modal Applications** - Work with text, audio, and visual content simultaneously while maintaining narrative coherence.
- 📖 **Ready-to-Use Recipes** - Six comprehensive video production workflows you can immediately adapt and extend for your projects.
## 🧾 Table of Contents

- [🎯 Introduction](#-introduction)
- [✨ What Will You Gain?](#-what-will-you-gain)
- [🧾 Table of Contents](#-table-of-contents)
- [🍳 What is Agentic-AIGC](#-what-is-agentic-aigc)
- [🧾 Prerequisites & Setup](#-prerequisites--setup)
  - [Environment](#environment)
  - [Clone and Install](#clone-and-install)
  - [Download Required Models](#download-required-models)
  - [Configure LLM](#configure-llm)
- [🍽 Recipes: Creating Videos](#-recipes-creating-videos)
  - [🎬 Movie Edits (Rhythm-Based)](#-movie-edits-rhythm-based)
  - [📖 Novel-to-Screen Adaptation](#-novel-to-screen-adaptation)
  - [📰 News Summary](#-news-summary)
  - [😂 Meme Video](#-meme-video)
  - [🎵 Music Video (SVC)](#-music-video-svc)
  - [🎭 Cross-Culture Comedy](#-cross-culture-comedy)
- [📋 Configuration Details](#-configuration-details)
  - [Input Configuration](#input-configuration)
  - [Character Image for Visual Retrieval Enhancement](#character-image-for-visual-retrieval-enhancement)
  - [Running the Tool](#running-the-tool)
- [🎥 Demos](#-demos)
- [🙏 Acknowledgements](#-acknowledgements)

---

## 🍳 What is Agentic-AIGC

🚀 Recent breakthroughs in generative AI have transformed multimedia content creation across diverse domains. Powered by advanced diffusion models and Large Language Models (LLMs), AI-Generated Content (AIGC) has achieved remarkable success in 🖼️ image generation, 🎵 audio creation, 🎮 interactive media, and 🎭 multimodal experiences. ⚡ While these achievements are impressive, creating truly high-quality, sophisticated multimedia content, particularly 🎬 complex videos, presents challenges that extend far beyond simple generation tasks.
🎯 Success in this domain requires: i) 🔄 multi-modal alignment to synchronize visual, audio, and textual elements across temporal sequences; ii) 📖 maintaining narrative coherence and visual continuity throughout extended content; iii) 🎨 orchestrating dynamic scene compositions with complex transitions and character interactions; and iv) ⚙️ coordinating sophisticated production pipelines while ensuring professional quality standards across all components.

❌ These challenges cannot be addressed by generative models alone: they lack 🎭 orchestration capabilities for complex multi-step creative workflows, and they are unable to coordinate multiple specialized tools or maintain consistency across interconnected production processes that require 🗺️ deliberate planning and 🔗 cross-modal synchronization.

### 🔧 The Agentic-AIGC Solution

🎯 To directly address these fundamental challenges, **Agentic-AIGC** leverages 🤖 intelligent agent architectures that systematically solve each limitation through 🔄 coordinated automation. Unlike 📱 traditional generative models, the 🧠 agentic approach tackles 🎬 complex video production through ⚡ fully-automated intelligent workflows.

---

## 🧾 Prerequisites & Setup

This section walks you through the environment setup to get Agentic-AIGC running on your device.

### Environment

* **GPU Memory:** 8 GB
* **System:** Linux, Windows

### Clone and Install

```bash
# 1. Clone the repository
git clone https://github.com/HKUDS/Agentic-AIGC.git

# 2. Create and activate a Conda environment
conda create --name aicreator python=3.10
conda activate aicreator

# 3. Install system dependencies (pynini, ffmpeg)
conda install -y -c conda-forge pynini==2.1.6 ffmpeg

# 4. Install Python dependencies
pip install -r requirements.txt
```

### Download Required Models

Ensure `git-lfs` is installed first: [https://git-lfs.com](https://git-lfs.com)

```bash
git lfs install
```

Navigate to the `tools` directory and download the necessary models. You only need to download the models relevant to the video types you want to create (see the feature/model table below). Run each download below from the repository root.

```bash
# Example downloads (adjust paths and models as needed)

# Download CosyVoice
cd tools/CosyVoice
huggingface-cli download PillowTa1k/CosyVoice --local-dir pretrained_models

# Download fish-speech
cd tools/fish-speech
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5

# Download seed-vc
cd tools/seed-vc
huggingface-cli download PillowTa1k/seed-vc --local-dir checkpoints

# Download DiffSinger
cd tools/DiffSinger
huggingface-cli download PillowTa1k/DiffSinger --local-dir checkpoints

# Download MiniCPM
cd tools
git lfs clone https://huggingface.co/openbmb/MiniCPM-V-2_6-int4

# Download Whisper
cd tools
git lfs clone https://huggingface.co/openai/whisper-large-v3-turbo

# Download ImageBind
cd tools
mkdir .checkpoints
cd .checkpoints
wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
```

---

**Feature & Model Requirements Table:**
| Feature | Agentic-AIGC | Director | Funclip | NarratoAI | NotebookLM |
| --- | --- | --- | --- | --- | --- |
| Beat-synced Edits | ✅ | ✅ | ✅ | — | — |
| Storytelling Video | ✅ | — | — | ✅ | — |
| Video Overview | ✅ | ✅ | ✅ | ✅ | ✅ |
| Meme Video | ✅ | — | — | — | — |
| Music Remixes | ✅ | — | — | — | — |
| Comedy Remaking | ✅ | — | — | — | — |
| Feature Type | Video Demo | Required Models |
| --- | --- | --- |
| Cross Talk | English Stand-up Comedy to Chinese Crosstalk | CosyVoice, MiniCPM, Whisper, ImageBind |
| Talk Show | Chinese Crosstalk to English Stand-up Comedy | CosyVoice, MiniCPM, Whisper, ImageBind |
| MAD TTS | Xiao-Ming-Jian-Mo (小明剑魔) Meme | fish-speech |
| MAD SVC | AI Music Videos | DiffSinger, seed-vc, MiniCPM, Whisper, ImageBind |
| Rhythm | Spider-Man: Across the Spider-Verse | MiniCPM, Whisper, ImageBind |
| Comm | Novel-to-Screen Adaptation | MiniCPM, Whisper, ImageBind |
| News | Tech News: OpenAI's GPT-4o Image Generation Release | MiniCPM, Whisper, ImageBind |
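After downloading, a quick sanity check can confirm that the expected checkpoint locations exist before running `main.py`. The helper below is an illustrative sketch, not part of the repository; the paths simply mirror the download commands above, so adjust them if you used different locations.

```python
from pathlib import Path

# Expected locations, following the download commands above (illustrative).
MODEL_PATHS = [
    "tools/CosyVoice/pretrained_models",
    "tools/fish-speech/checkpoints/fish-speech-1.5",
    "tools/seed-vc/checkpoints",
    "tools/DiffSinger/checkpoints",
    "tools/MiniCPM-V-2_6-int4",
    "tools/whisper-large-v3-turbo",
    "tools/.checkpoints/imagebind_huge.pth",
]

def missing_models(paths, root="."):
    """Return the subset of expected model paths that do not exist under root."""
    return [p for p in paths if not (Path(root) / p).exists()]

if __name__ == "__main__":
    for p in missing_models(MODEL_PATHS):
        print(f"missing: {p}")
```

Only the models required by your chosen video type (per the table above) need to be present, so a few "missing" lines are expected if you skipped the others.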
### Configure LLM

1. **API Keys:** Edit `Agentic-AIGC/environment/config/config.yml` to add your LLM API key and base URL.
2. **Model Names:** Check and adjust the model names in `environment/config/llm.py` according to your LLM provider's requirements. For single-model APIs such as the official GPT API, use the specific model name (e.g., `gpt-4o-mini`) for all entries.

---

## 🍽 Agentic Video Creation: Step-by-Step Recipes

📋 Each production recipe below represents a distinct video format that can be automatically generated by Agentic-AIGC's intelligent agent system.

---

### ✂️ Video Editing

**Goal:** Create a video edit synchronized with music beats or based on a user's narrative idea, selecting high-energy or relevant clips from source videos.

**Key Steps:**

1. 📁 **Prepare Source Material**: Place your source video clips in a directory (e.g., `dataset/user_video/`).
2. 🎵 **Prepare Music (Optional, for beat-sync)**: Place your background music file (e.g., `.mp3`) in your project.
3. ▶️ **Run the Tool**: Execute `python main.py`.
4. 🎯 **Select Type**: When prompted, input the type (e.g., Rhythm-Based Video Editing).
5. 💬 **Provide Prompt**: Enter a detailed description of the editing style/feel you want (e.g., "Fast-paced action sequences with dynamic transitions...").
6. ⚙️ **(Optional) Adjust Beat Sync**: Modify parameters in `music_filter.py` (thresholds, masks) if needed.
7. 🎬 **Processing & Output**: The system will analyze the videos, detect beats, retrieve visually relevant clips, and generate the final edited video.

---

### 📖 Text-to-Video Adaptation

**Goal:** Transform written text (such as novel excerpts) into cinematic video content with AI-generated commentary and visually matched scenes from your source footage.

**Key Steps:**

1. 📁 **Prepare Source Material**: Place your source video clips in a directory (e.g., `dataset/user_video/`). Add your novel `.txt` file to the project.
2. 🎤 **(Optional) Prepare Voice Sample**: Place a short `.wav` file (e.g., `ava_16k.wav`) for voice cloning in `dataset/video_edit/voice_data/`.
3. ✍️ **(Optional) Prepare Style File**: Customize the `dataset/video_edit/writing_data/present_style.txt` file describing the desired commentary tone.
4. ▶️ **Run the Tool**: Execute `python main.py`.
5. 🎯 **Select Type**: When prompted, input the type (e.g., Novel-to-Screen Commentary).
6. 💬 **Provide Prompt**: Enter a prompt for the commentary script (e.g., "Generate an engaging commentary script with 1500 words.").
7. 🎬 **Processing & Output**: The system will generate the script, segment the content, synthesize audio narration, match visual scenes, and produce the final adapted video.

---

### 📰 Video Summarization

**Goal:** Generate concise summary videos from lengthy source content. Supports interviews, lectures, meetings, news videos, podcasts, webinars, documentaries, and other video/audio materials.

**Key Steps:**

1. 📁 **Prepare Source Material**: Place your source video/audio file in a directory (e.g., `dataset/user_video/`).
2. 🎤 **Prepare Voice Sample (Optional)**: Add a short `.wav` file (e.g., `ava_16k.wav`) for voice cloning in `dataset/video_edit/voice_data/`.
3. ✍️ **Prepare Style File (Optional)**: Customize `dataset/video_edit/writing_data/present_style.txt` to define the summary tone and style.
4. ▶️ **Run the Tool**: Execute `python main.py` to start the process.
5. 🎯 **Select Type**: When prompted, input the type (e.g., Summary of News).
6. 💬 **Provide Prompt**: Enter specifications for your summary (e.g., "Create a concise tech news summary with a conversational tone, maximum 250 words").
7. 🎬 **Processing Complete**: The system will automatically transcribe the content, generate the summary, synthesize the voiceover, match relevant clips, and produce the final video.

---

### 😂 Audio Editing

**Goal:** Replace existing video audio with custom scripts or narratives, maintaining precise video-audio synchronization for professional dubbing and creative content adaptation.

**Key Steps:**

1. 📁 **Prepare Source Video**: Place your source video file (e.g., `.mp4`) in a directory (e.g., `dataset/meme_video/`).
2. ⚙️ **Configure Settings**: Edit `Agentic-AIGC/environment/config/mad_tts.yml`. Set `video_path` to your source video and adjust `output_path` if needed.
3. ▶️ **Run the Tool**: Execute `python main.py` to start the audio editing process.
4. 🎯 **Select Type**: When prompted, choose TTS (Text-to-Speech) for audio generation.
5. ✍️ **Provide Script**: Enter detailed instructions for the new audio content (e.g., "Create a professional narration explaining machine learning concepts with clear pronunciation and appropriate pacing").
6. 🎵 **Processing Complete**: The system will extract the original audio, transcribe the existing content, generate new audio with Fish-Speech, synchronize the timing with the video frames, and produce the final edited video.

### 🎵 AI Cover Creation

**Goal:** Generate professional cover versions of songs using custom target voices, with precise audio-visual synchronization.

**Key Steps:**

1. 📂 **Prepare Files**: Place the MIDI file, lyrics (`.txt`), background music (BGM), and target voice sample (`.wav`) in the project directory.
2. ⚙️ **Configure Settings**: Edit `Agentic-AIGC/environment/config/mad_svc.yml`. Set the paths for `midi_path`, `lyrics_path`, `bgm_path`, and `target_voice_path`.
3. ▶️ **Run Tool**: Execute `python main.py`.
4. 🎯 **Select Mode**: Choose SVC (Singing Voice Conversion) when prompted.
5. ✍️ **Provide Instructions**: Enter an adaptation prompt (e.g., "Rock ballad style with emotional intensity, focusing on perseverance themes").
6. 🎶 **Processing**: The system processes the MIDI, generates audio (DiffSinger), clones the voice (Seed-VC), synchronizes timing, and integrates with the video pipeline.
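Before launching the SVC pipeline, it can help to verify that `mad_svc.yml` contains the four path keys from step 2 and that each path exists. The following is a minimal sketch, not part of the repository; the key names are the ones listed above, and the config dict would typically come from `yaml.safe_load` on `mad_svc.yml`.

```python
from pathlib import Path

# Keys referenced in the AI Cover Creation recipe above.
REQUIRED_KEYS = ("midi_path", "lyrics_path", "bgm_path", "target_voice_path")

def check_svc_config(cfg):
    """Return a list of problems found in a mad_svc-style config dict."""
    # Report keys that are absent entirely.
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in cfg]
    # Report keys whose configured path does not exist on disk.
    for k in REQUIRED_KEYS:
        if k in cfg and not Path(cfg[k]).exists():
            problems.append(f"{k} does not exist: {cfg[k]}")
    return problems
```

An empty return list means the four inputs are in place; anything else pinpoints what to fix before running `python main.py`.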
### 🎭 Cross-Cultural Content Adaptation

**Goal:** Adapt audio content (e.g., English talk shows) into different cultural formats (e.g., Chinese crosstalk), or vice versa.

**Key Steps:**

1. 📂 **Prepare Source Audio**: Place the source audio file (`.wav`) in a directory (e.g., `dataset/cross_talk/`).
2. 🎤 **Prepare Voice Samples**: Add target voice `.wav` files (e.g., Guo Degang, Fu Hang; ready-to-use samples are available in the repository).
3. ⚙️ **Configure Settings**: Edit `Agentic-AIGC/environment/config/cross_talk.yml` (or `talk_show.yml`). Set `audio_path` to the source audio, configure the `dou_gen` and `peng_gen` voice paths, and adjust the output path.
4. ▶️ **Run Tool**: Execute `python main.py`.
5. 🎯 **Select Mode**: Choose Cross Talk or Talk Show when prompted.
6. ✍️ **Provide Instructions**: Enter a content adaptation prompt (e.g., "Adapt this content into Chinese crosstalk format while maintaining the original humor style").
7. 🎭 **Processing**: The system adapts the script content, synthesizes target voices using CosyVoice, adds audio effects, and integrates with the video editing pipeline.

---

## 📋 Configuration Details

### Input Configuration

Input settings for the different video types are managed in YAML files located in `Agentic-AIGC/environment/config/`. Common parameters include:

* `reqs`: A prompt or instruction for the specific agent.
* `audio_path`: Path to the source audio file.
* `video_source_dir`: Path to the directory containing source video clips.
* `novel_path`: Path to the source text file (for novel adaptation).
* `output`: Path for the final generated video file.
* `dou_gen`, `peng_gen`, etc.: Paths to specific voice sample files for cloning.

Always ensure the paths in these YAML files are correct relative to your project structure.

### Running the Tool

After setup and configuration:

1. Activate your Conda environment: `conda activate aicreator`.
2. Run the main script from the project root: `python main.py`.
3. Follow the terminal prompts to select the video type and provide any required input.

## 🎥 Demos
*(Demo videos: Movie Edits, Meme Videos, Music Videos, Verbal Comedy Arts, Commentary Video, Video Overview.)*
For additional demo usage details, please refer to: 👉 [Demos Documentation](demos_documents.md)

You can find more fun videos on our Bilibili channel: 👉 [Bilibili Homepage](https://space.bilibili.com/3546868449544308). Feel free to check it out for more entertaining content! 😊

**Note**: All videos are used for research and demonstration purposes only. The audio and visual assets are sourced from the Internet. Please contact us if you believe any content infringes upon your intellectual property rights.

---

## 🙏 Acknowledgements

We extend our heartfelt appreciation to the countless individuals and organizations who have made Agentic-AIGC possible. This project builds upon the foundation laid by pioneering AI researchers and the vibrant open-source community worldwide. Their collective contributions, shared knowledge, and innovative breakthroughs have been instrumental in bringing this vision to life.

We are deeply grateful to the open-source community and AI service providers whose innovative tools and technologies serve as the cornerstone of our work:

- [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
- [Fish Speech](https://github.com/fishaudio/fish-speech)
- [Seed-VC](https://github.com/Plachtaa/seed-vc)
- [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger)
- [VideoRAG](https://github.com/HKUDS/VideoRAG)
- [ImageBind](https://github.com/facebookresearch/ImageBind)
- [whisper](https://github.com/openai/whisper)
- [MiniCPM](https://github.com/OpenBMB/MiniCPM-o)
- [Librosa](https://github.com/librosa/librosa)
- [moviepy](https://github.com/Zulko/moviepy)
- [ffmpeg](https://github.com/FFmpeg/FFmpeg)

Our work has been significantly enhanced by the creative contributions of talented content creators across diverse platforms:

- 🎬 Original video creators whose content served as valuable testing and demonstration material
- 🎭 Comedy artists whose performances inspired our cross-cultural adaptation features
- 🎥 Filmmakers and production teams behind the movies and TV shows showcased in our demonstrations
- ✂️ Content creators who generously shared their expertise and insights on video editing techniques

All content used in our demonstrations is for research purposes only. We deeply respect the intellectual property rights of all content creators and welcome any concerns or feedback regarding content usage.

- Spider-Man movie editing idea referenced from the Douyin account [@我是不是zx](https://www.douyin.com/user/MS4wLjABAAAApVuuGxyM7CI4MJRHQvc6SAy0J2zrJ12eg3f5jFqCIXk?from_tab_name=main&vid=7468621366913273115)