Official code for the paper KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation.

🎬 KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation

Official repository for **KeyVID**, presented in **“KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation.”** This work introduces a unified diffusion framework that generates temporally coherent videos conditioned on audio, guided by adaptive keyframe localization.

📦 Release Plan

  • Keyframe Localization — Coming soon
  • Keyframe Generation — Released
  • Interpolation Model — Released
  • Training Code — Coming soon
  • Checkpoints

⚙️ Environment Setup

We recommend using Python 3.10+ and PyTorch ≥ 2.1.

# Clone the repository
git clone https://github.com/XingruiWang/KeyVID.git
cd KeyVID

# Create environment
conda create -n keyvid python=3.10
conda activate keyvid

# Install dependencies
pip install -r requirements.txt
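
To confirm that the activated environment meets the versions recommended above, a small check can be run before inference. The following is a minimal sketch rather than part of the repository: `version_ge` and `check_env` are hypothetical helper names, and it assumes that `python` on the PATH is the interpreter of the `keyvid` environment.

```shell
# Hypothetical helpers, not part of the KeyVID repository.
# Function bodies use "( ... )" so they run in a subshell and keep
# their variables local without relying on non-POSIX features.

# version_ge A B: succeed when dotted version A is at least B.
version_ge() (
  a=${1%%[!0-9.]*}   # drop suffixes such as "+cu121"
  b=${2%%[!0-9.]*}
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    # Compare the current component numerically; a missing one counts as 0.
    [ "${x:-0}" -gt "${y:-0}" ] && exit 0
    [ "${x:-0}" -lt "${y:-0}" ] && exit 1
    # Advance to the next component, or empty the string when none is left.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  exit 0
)

# check_env: compare the installed Python and PyTorch versions with the
# recommendations in this README (Python 3.10+, PyTorch >= 2.1).
check_env() (
  py=$(python -c 'import platform; print(platform.python_version())') || exit 1
  if ! version_ge "$py" 3.10; then
    echo "Python $py is older than the recommended 3.10" >&2
    exit 1
  fi
  torch=$(python -c 'import torch; print(torch.__version__)') || exit 1
  if ! version_ge "$torch" 2.1; then
    echo "PyTorch $torch is older than the recommended 2.1" >&2
    exit 1
  fi
  echo "OK: Python $py, PyTorch $torch"
)
```

After sourcing the snippet, calling `check_env` inside the activated environment prints the detected versions or reports which recommendation is not met. Components are compared numerically, so 3.10 is treated as newer than 3.9, which a plain string comparison would get wrong.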

🚀 Inference

1️⃣ Keyframe Localization

Detect audio-synchronized keyframes:

bash scripts/run_ASVA_evaluation.sh asva_12_kf

2️⃣ Keyframe Generation

Generate keyframes aligned with localized timestamps:

bash scripts/run_ASVA_evaluation.sh asva_12_kf

Configuration example:

config="configs/inference_512_asva_12_keyframe_new_add_idx.yaml"
exp_root="${save_root}/ver_add_idx_add_fps/keyframes"
checkpoint="checkpoints/keyframe_generation/best_checkpoint.ckpt"
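
Before launching keyframe generation it can help to confirm that the config and the checkpoint are actually in place. The sketch below assumes the values above are paths relative to the repository root. `preflight` is a hypothetical helper, not something the repository provides, and `save_root` is left out because this README does not define it.

```shell
# Hypothetical helper, not part of the KeyVID repository.
# preflight CONFIG CHECKPOINT: report every missing file and return
# non-zero if at least one of them is absent.
preflight() (
  status=0
  [ -f "$1" ] || { echo "config not found: $1" >&2; status=1; }
  [ -f "$2" ] || { echo "checkpoint not found: $2" >&2; status=1; }
  exit "$status"
)
```

With `config` and `checkpoint` set as in the example, `preflight "$config" "$checkpoint" && bash scripts/run_ASVA_evaluation.sh asva_12_kf` starts the run only when both files exist.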

3️⃣ Interpolation

Generate smooth video transitions between keyframes:

bash scripts/run_ASVA_evaluation.sh asva_12_kf_interp

Configuration example:

config="configs/inference_512_asva_12_keyframe_kf_freenoise.yaml"
exp_root="${save_root}/ver_add_idx_add_fps/interpolation/"
checkpoint="checkpoints/interpolation/best_checkpoint.ckpt"
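
Steps 2 and 3 can be chained so that interpolation only starts once keyframe generation has finished successfully. This is a sketch under the assumption that the interpolation stage consumes the keyframes written by the previous stage. `run_stages` and `run_asva` are hypothetical helper names, while the script path and the stage names are the ones used above.

```shell
# Hypothetical helpers, not part of the KeyVID repository.

# run_stages RUNNER STAGE...: call RUNNER once per stage, in order,
# and stop at the first stage that fails.
run_stages() (
  runner=$1
  shift
  for stage in "$@"; do
    echo "==> $stage"
    "$runner" "$stage" || { echo "stage failed: $stage" >&2; exit 1; }
  done
)

# run_asva STAGE: wrap the evaluation script used in this README.
run_asva() {
  bash scripts/run_ASVA_evaluation.sh "$1"
}
```

From the repository root, `run_stages run_asva asva_12_kf asva_12_kf_interp` runs both stages back to back. Passing the runner as an argument keeps the loop independent of the actual script, so it can be dry-run with a stub function.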

📈 Evaluation

Scripts for quantitative evaluation (e.g., FID, FVD, AlignSync, RelSync) will be added soon.
In the meantime, the generated videos in the outputs/ directory can be inspected for qualitative comparison.
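
For a quick qualitative pass, the generated files can be listed before opening them in a video player. This is a minimal sketch, assuming the results are saved as `.mp4` files somewhere under `outputs/` (this README does not state the output format). `list_videos` is a hypothetical helper.

```shell
# Hypothetical helper, not part of the KeyVID repository.
# list_videos [DIR]: print every .mp4 file under DIR (default: outputs), sorted.
list_videos() (
  dir=${1:-outputs}
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    exit 1
  fi
  # The .mp4 extension is an assumption; adjust the pattern if needed.
  find "$dir" -type f -name '*.mp4' | sort
)
```

Calling `list_videos` with no argument searches `outputs/`, and a different directory can be passed as the first argument.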


📚 Citation

If you find this project useful, please cite:

@article{wang2025keyvid,
  title={KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation},
  author={Wang, Xingrui and Liu, Jiang and Wang, Ze and Yu, Xiaodong and Wu, Jialian and Sun, Ximeng and Su, Yusheng and Yuille, Alan and Liu, Zicheng and Barsoum, Emad},
  journal={arXiv preprint arXiv:2504.09656},
  year={2025}
}
