# GaussianTalker
**Repository Path**: github-cnpro/GaussianTalker
## Basic Information
- **Project Name**: GaussianTalker
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-24
- **Last Updated**: 2024-09-24
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting (ACM MM 2024)
This is our official implementation of the paper
"GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting"
by [Kyusun Cho](https://github.com/kyustorm7)\*, [Joungbin Lee](https://github.com/joungbinlee)\*, [Heeji Yoon](https://github.com/yoon-heez)\*, [Yeobin Hong](https://github.com/yeobinhong), [Jaehoon Ko](https://github.com/mlnyang), Sangjun Ahn, [Seungryong Kim](https://cvlab.korea.ac.kr)†
## ⚡️News
**❗️2024.06.13:** We also model the torso in the same space as the face using Gaussian splatting. **After checking out the torso branch**, you can train and render it in the same way.
## Introduction

For more information, please check out our [Paper](https://arxiv.org/abs/2404.16012v2) and our [Project page](https://ku-cvlab.github.io/GaussianTalker/).
## Installation
We implemented and tested **GaussianTalker** on NVIDIA RTX 3090 and A6000 GPUs.
Run the commands below to set up the environment (details are in `requirements.txt`):
```bash
git clone https://github.com/joungbinlee/GaussianTalker.git
cd GaussianTalker
git submodule update --init --recursive
conda create -n GaussianTalker python=3.7
conda activate GaussianTalker
pip install -r requirements.txt
pip install -e submodules/custom-bg-depth-diff-gaussian-rasterization
pip install -e submodules/simple-knn
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
pip install --upgrade "protobuf<=3.20.1"
```
## Download Dataset
We used talking portrait videos from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF), [GeneFace](https://github.com/yerfor/GeneFace) and [HDTF dataset](https://github.com/MRzzm/HDTF).
These are videos with a static camera, each about 3-5 minutes long.
You can download an example video with:
```bash
mkdir -p data/obama
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4
```
We also used [SynObama](https://grail.cs.washington.edu/projects/AudioToObama/) for cross-driven setting inference.
## Data Preparation
- Prepare the face-parsing model.
```bash
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
```
- Download the 3DMM model from [Basel Face Model 2009](https://faces.dmi.unibas.ch/bfm/main.php?nav=1-1-0&id=details) and put `01_MorphableModel.mat` into `data_utils/face_tracking/3DMM/`.
```bash
cd data_utils/face_tracking
python convert_BFM.py
cd ../../
python data_utils/process.py ${YOUR_DATASET_DIR}/${DATASET_NAME}/${DATASET_NAME}.mp4
```
- Obtain AU45 for eye blinking.
Run `FeatureExtraction` in [OpenFace](https://github.com/TadasBaltrusaitis/OpenFace), then rename and move the output CSV file to `(your dataset dir)/(dataset name)/au.csv`.
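Once `au.csv` is in place, the blink signal lives in the AU45 intensity column of the OpenFace output. A minimal reading sketch (the `AU45_r` column name follows OpenFace's convention, and OpenFace pads its header names with spaces, so headers are stripped first; this helper is illustrative, not part of the repository):

```python
import csv
import io

def read_au45(csv_text):
    """Extract the AU45 (blink) intensity values from an OpenFace CSV.

    OpenFace writes headers like "frame, AU45_r" with padding spaces,
    so field names are stripped before lookup.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accessing .fieldnames parses the header row; then replace it stripped.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return [float(row["AU45_r"]) for row in reader]
```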
```
├── (your dataset dir)
│   ├── (dataset name)
│   │   ├── gt_imgs
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   │   ├── 2.jpg
│   │   │   ├── ...
│   │   ├── ori_imgs
│   │   │   ├── 0.jpg
│   │   │   ├── 0.lms
│   │   │   ├── 1.jpg
│   │   │   ├── 1.lms
│   │   │   ├── ...
│   │   ├── parsing
│   │   │   ├── 0.png
│   │   │   ├── 1.png
│   │   │   ├── 2.png
│   │   │   ├── 3.png
│   │   │   ├── ...
│   │   ├── torso_imgs
│   │   │   ├── 0.png
│   │   │   ├── 1.png
│   │   │   ├── 2.png
│   │   │   ├── 3.png
│   │   │   ├── ...
│   │   ├── au.csv
│   │   ├── aud_ds.npy
│   │   ├── aud_novel.wav
│   │   ├── aud_train.wav
│   │   ├── aud.wav
│   │   ├── bc.jpg
│   │   ├── (dataset name).mp4
│   │   ├── track_params.pt
│   │   ├── transforms_train.json
│   │   ├── transforms_val.json
```
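To catch preprocessing failures before training, you can sanity-check that the layout above was actually produced. A minimal sketch (the file and directory lists are taken from the tree above; this helper is hypothetical, not part of the repository):

```python
from pathlib import Path

# Expected contents of a preprocessed dataset directory, per the tree above.
REQUIRED_DIRS = ["gt_imgs", "ori_imgs", "parsing", "torso_imgs"]
REQUIRED_FILES = ["au.csv", "aud.wav", "track_params.pt",
                  "transforms_train.json", "transforms_val.json"]

def check_dataset_layout(root):
    """Return a list of missing entries; an empty list means the layout is complete."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

Run it on `${YOUR_DATASET_DIR}/${DATASET_NAME}` after `process.py` finishes; any names it returns point to a preprocessing step that failed or was skipped.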
## Training
```bash
python train.py -s ${YOUR_DATASET_DIR}/${DATASET_NAME} --model_path ${YOUR_MODEL_DIR} --configs arguments/64_dim_1_transformer.py
```
## Rendering
Please adjust the batch size to fit your GPU memory.
```bash
python render.py -s ${YOUR_DATASET_DIR}/${DATASET_NAME} --model_path ${YOUR_MODEL_DIR} --configs arguments/64_dim_1_transformer.py --iteration 10000 --batch 128
```
## Inference with custom audio
Place your custom audio files (the `.wav` clip and its corresponding `.npy` features) in `${YOUR_DATASET_DIR}/${DATASET_NAME}`.
```bash
python render.py -s ${YOUR_DATASET_DIR}/${DATASET_NAME} --model_path ${YOUR_MODEL_DIR} --configs arguments/64_dim_1_transformer.py --iteration 10000 --batch 128 --custom_aud .npy --custom_wav .wav --skip_train --skip_test
```
## Citation
If you find our work useful in your research, please cite our work as:
```
@misc{cho2024gaussiantalker,
      title={GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting},
      author={Kyusun Cho and Joungbin Lee and Heeji Yoon and Yeobin Hong and Jaehoon Ko and Sangjun Ahn and Seungryong Kim},
      year={2024},
      eprint={2404.16012},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```