Zhe Kong · Le Li · Yong Zhang* · Feng Gao · Shaoshu Yang · Tao Wang · Kaihao Zhang · Zhuoliang Kang ·
Xiaoming Wei · Guanying Chen · Wenhan Luo*
*Corresponding Authors
- [2025/7/2] 🔥 We release the source code and technical report of DAM-VSR.
The code requires python==3.10.14, pytorch==2.1.1, and torchvision==0.16.1. Please follow the instructions here to install the PyTorch and TorchVision dependencies. Installing both with CUDA support is strongly recommended; the project has been tested with CUDA 12.1.
conda create -n dam-vsr python=3.10.14
conda activate dam-vsr
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.23 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
You can directly download all the required models with the following command:
python download.py
All the models will be downloaded to the checkpoints path. Alternatively, you can download each model manually.
Download the following models and put them in the checkpoints directory.
1. Video Super-Resolution Model: stabilityai/stable-video-diffusion-img2vid.
2. SUPIR: stabilityai/stable-diffusion-xl-base-1.0, openai/clip-vit-large-patch14-336, liuhaotian/llava-v1.5-13b, openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k, SUPIR-v0Q.
3. InvSR: sd-turbo, noise_predictor_sd_turbo_v5.pth.
4. ResShift: resshift_realsrx4_s4_v3.pth, autoencoder_vq_f4.pth.
5. DAM-VSR: Fucius/DAM-VSR.
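If you prefer to script the manual downloads, here is a minimal sketch using huggingface_hub's `snapshot_download`. The repo IDs come from the list above; the mapping to local folder names mirrors the directory tree below and is an assumption, so adjust it to your setup.

```python
# Sketch: fetch the Hugging Face repos listed above into checkpoints/.
# Assumes `pip install huggingface_hub`; local folder names are assumptions.
REPOS = {
    "stabilityai/stable-video-diffusion-img2vid": "stable-video-diffusion-img2vid",
    "stabilityai/stable-diffusion-xl-base-1.0": "stable-diffusion-xl-base-1.0",
    "openai/clip-vit-large-patch14-336": "clip-vit-large-patch14-336",
    "openai/clip-vit-large-patch14": "clip-vit-large-patch14",
    "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k": "CLIP-ViT-bigG-14-laion2B-39B-b160k",
    "liuhaotian/llava-v1.5-13b": "llava-v1.5-13b",
    "Fucius/DAM-VSR": "DAM-VSR",
}

def fetch_all(root="checkpoints"):
    # Imported lazily so the mapping above can be inspected without the package.
    from huggingface_hub import snapshot_download
    for repo_id, folder in REPOS.items():
        snapshot_download(repo_id=repo_id, local_dir=f"{root}/{folder}")
```

Call `fetch_all()` once; re-running it resumes any partially downloaded repo.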
The checkpoints directory structure should be arranged as:
checkpoints
├── stable-diffusion-xl-base-1.0
├── sd-turbo
├── DAM-VSR
│ ├── SUPIR-v0Q.ckpt
│ ├── controlnet
│ ├── unet
│ ├── lora
│ ├── autoencoder_vq_f4.pth
│ └── resshift_realsrx4_s4_v3.pth
├── clip-vit-large-patch14-336
├── llava-v1.5-13b
├── CLIP-ViT-bigG-14-laion2B-39B-b160k
├── stable-video-diffusion-img2vid
├── clip-vit-large-patch14
└── noise_predictor_sd_turbo_v5.pth
For image super-resolution, you can choose SUPIR, InvSR, or ResShift.
For real-world or AIGC videos, we recommend SUPIR or InvSR for image super-resolution: SUPIR achieves the best visual quality, while InvSR achieves the best evaluation metrics.
python infer.py \
--validation_data_dir="example/example1.mp4" \
--max_cfg 3.0 \
--backwrad_scale 0.3 \
--use_usm \
--sr_type="supir"  # or "invsr"
For synthetic degradations, we recommend ResShift for image super-resolution.
python infer.py \
--validation_data_dir="example/example1.mp4" \
--max_cfg 1.0 \
--backwrad_scale 1.0 \
--lora_path='checkpoints/DAM-VSR/lora/vae-decoder.safetensors' \
--sr_type="resshift"
We also provide a lighter version that skips bidirectional sampling for faster generation.
python infer_accelerated.py \
--validation_data_dir="example/example1.mp4" \
--use_usm \
--sr_type="supir"  # or "invsr" / "resshift"
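To process a whole folder of clips, you can wrap infer.py in a small driver. This is a sketch under stated assumptions: `build_infer_cmd` and `run_batch` are hypothetical helpers, not part of the repo, and the flag names simply mirror the README examples above.

```python
# Sketch: batch-run infer.py over a folder of .mp4 clips.
# Flag names (including --backwrad_scale) mirror the README examples verbatim.
import subprocess
from pathlib import Path

def build_infer_cmd(video, sr_type="supir", max_cfg=3.0,
                    backwrad_scale=0.3, use_usm=True):
    """Assemble the infer.py argument list for one input video."""
    cmd = [
        "python", "infer.py",
        f"--validation_data_dir={video}",
        "--max_cfg", str(max_cfg),
        "--backwrad_scale", str(backwrad_scale),
        f"--sr_type={sr_type}",
    ]
    if use_usm:
        cmd.append("--use_usm")
    return cmd

def run_batch(folder="example", sr_type="supir"):
    """Run infer.py once per .mp4 in `folder`, stopping on the first failure."""
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(build_infer_cmd(str(video), sr_type=sr_type), check=True)
```

For ResShift inputs, pass `max_cfg=1.0`, `backwrad_scale=1.0`, and `use_usm=False` to match the synthetic-degradation example above.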
This project builds on SUPIR, InvSR, ResShift, svd-temporal-controlnet, and svd_keyframe_interpolation. Thanks for their awesome work.
If our project helps your research or work, please consider citing our paper:
@inproceedings{kong2025dam,
title={DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution},
author={Kong, Zhe and Li, Le and Zhang, Yong and Gao, Feng and Yang, Shaoshu and Wang, Tao and Zhang, Kaihao and Kang, Zhuoliang and Wei, Xiaoming and Chen, Guanying and Luo, Wenhan},
year={2025},
booktitle={ACM SIGGRAPH 2025},
}

