
DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution (ACM SIGGRAPH 2025)

Zhe Kong · Le Li · Yong Zhang* · Feng Gao · Shaoshu Yang · Tao Wang · Kaihao Zhang · Zhuoliang Kang ·

Xiaoming Wei · Guanying Chen · Wenhan Luo*

*Corresponding Authors


🏷️ Change Log

🔆 Method Overview

🔧 Dependencies and Installation

The code requires python==3.10.14, pytorch==2.1.1, and torchvision==0.16.1. Please follow the instructions here to install the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended. The project has been tested with CUDA 12.1.

conda create -n dam-vsr python=3.10.14
conda activate dam-vsr
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.23 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
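After installation, it can be worth confirming that the pinned packages above actually landed in the environment before running inference. The snippet below is a minimal sanity-check sketch (not part of the repository); it only compares installed package versions against the pins from the install commands using the standard library.

```python
# Environment sanity check: compare installed versions against the pins above.
# This is an illustrative helper, not a script shipped with DAM-VSR.
import sys
from importlib import metadata

# Version pins taken from the pip commands in this section.
EXPECTED = {"torch": "2.1.1", "torchvision": "0.16.1", "xformers": "0.0.23"}

def check_env():
    """Return a dict mapping package -> (expected version, installed version or None)."""
    report = {}
    for pkg, want in EXPECTED.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        report[pkg] = (want, have)
    return report

if __name__ == "__main__":
    print(f"python {sys.version_info.major}.{sys.version_info.minor}")
    for pkg, (want, have) in check_env().items():
        status = "OK" if have == want else "MISSING/MISMATCH"
        print(f"{pkg}: expected {want}, found {have} [{status}]")
```

Run it inside the `dam-vsr` conda environment; any MISSING/MISMATCH line points at a package to reinstall.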

⏬ Pretrained Model Preparation

1) Automatic Download

You can download all the required models directly with the following command:

python download.py

All models will be downloaded to the checkpoints directory. Alternatively, you can download each model manually.

2) Manual Download

Download the following models and place them in the checkpoints directory.

1. Video Super-Resolution Models: stabilityai/stable-video-diffusion-img2vid.
2. DAM-VSR: Fucius/DAM-VSR

The checkpoints directory structure should be arranged as:

checkpoints
    ├── stable-diffusion-xl-base-1.0
    ├── sd-turbo
    ├── DAM-VSR
    │       ├── SUPIR-v0Q.ckpt
    │       ├── controlnet
    │       ├── unet
    │       ├── lora
    │       ├── autoencoder_vq_f4.pth
    │       └── resshift_realsrx4_s4_v3.pth
    ├── clip-vit-large-patch14-336
    ├── llava-v1.5-13b
    ├── CLIP-ViT-bigG-14-laion2B-39B-b160k
    ├── stable-video-diffusion-img2vid
    ├── clip-vit-large-patch14
    └── noise_predictor_sd_turbo_v5.pth
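Before launching inference, the layout above can be verified programmatically. The helper below is a hypothetical sketch (the function name and `root` default are assumptions, not repo code); it simply checks that each top-level entry from the tree exists on disk.

```python
# Verify the checkpoints layout shown above.
# Illustrative helper only; adjust `root` if your checkpoints live elsewhere.
from pathlib import Path

# Top-level entries copied from the directory tree in this README.
EXPECTED_ENTRIES = [
    "stable-diffusion-xl-base-1.0",
    "sd-turbo",
    "DAM-VSR",
    "clip-vit-large-patch14-336",
    "llava-v1.5-13b",
    "CLIP-ViT-bigG-14-laion2B-39B-b160k",
    "stable-video-diffusion-img2vid",
    "clip-vit-large-patch14",
    "noise_predictor_sd_turbo_v5.pth",
]

def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (base / name).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```

An empty result means every entry from the tree is present; missing names can then be fetched via `python download.py` or the manual links above.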

🚀 Inference

For image super-resolution, you can choose SUPIR, InvSR, or ResShift.

For real-world or AIGC videos, we recommend SUPIR or InvSR for image super-resolution: SUPIR achieves the best visual quality, while InvSR achieves the best evaluation metrics.

python infer.py \
    --validation_data_dir="example/example1.mp4" \
    --max_cfg 3.0 \
    --backwrad_scale 0.3 \
    --sr_type="supir" \ # or "invsr"
    --use_usm

For synthetic degradations, it is recommended to utilize ResShift for image super-resolution.

python infer.py \
    --validation_data_dir="example/example1.mp4" \
    --max_cfg 1.0 \
    --backwrad_scale 1.0 \
    --lora_path='checkpoints/DAM-VSR/lora/vae-decoder.safetensors' \
    --sr_type="resshift"

We also provide a lighter variant that skips bidirectional sampling for faster generation.

python infer_accelerated.py \
    --validation_data_dir="example/example1.mp4" \
    --sr_type="supir" \ # invsr/resshift
    --use_usm
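To process several clips with the settings shown above, the commands can be assembled programmatically. The helper below is a hypothetical sketch: `build_cmd` and `run_folder` are not part of the repository, and the flag names (including `--backwrad_scale`, spelled as in the commands above) are copied verbatim from the examples.

```python
# Batch inference over a folder of videos by invoking infer.py per clip.
# Illustrative helper only; flag names mirror the example commands above.
import subprocess
from pathlib import Path

def build_cmd(video, sr_type="supir", max_cfg=3.0, backward_scale=0.3, use_usm=True):
    """Build the infer.py argument list for one video."""
    cmd = [
        "python", "infer.py",
        f"--validation_data_dir={video}",
        "--max_cfg", str(max_cfg),
        "--backwrad_scale", str(backward_scale),  # flag spelled as in the README
        f"--sr_type={sr_type}",
    ]
    if use_usm:
        cmd.append("--use_usm")
    return cmd

def run_folder(folder="example", pattern="*.mp4", **kwargs):
    """Run inference sequentially on every matching video in `folder`."""
    for video in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_cmd(video, **kwargs), check=True)
```

For synthetic degradations, call `run_folder` with `sr_type="resshift"`, `max_cfg=1.0`, and `backward_scale=1.0`, matching the ResShift example above.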

❤️ Acknowledgments

This project builds on SUPIR, InvSR, ResShift, svd-temporal-controlnet, and svd_keyframe_interpolation. Thanks for their awesome work.

🎓Citations

If our project helps your research or work, please consider citing our paper:

@inproceedings{kong2025dam,
  title={DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution},
  author={Kong, Zhe and Li, Le and Zhang, Yong and Gao, Feng and Yang, Shaoshu and Wang, Tao and Zhang, Kaihao and Kang, Zhuoliang and Wei, Xiaoming and Chen, Guanying and Luo, Wenhan},
  booktitle={ACM SIGGRAPH 2025},
  year={2025},
}
