Sangbeom Lim1 · Seoung Wug Oh2 · Jiahui Huang2 · Heeji Yoon3
Seungryong Kim3 · Joon-Young Lee2
1Korea University 2Adobe Research 3KAIST AI
ArXiv 2026
VideoMaMa is a mask-guided video matting framework built on a video generative prior. Leveraging this prior, it delivers stable performance across diverse video domains with fine-grained matting quality.
For more visual results, please check out our project page.
VideoMaMa is an open-source project. If you find our work helpful, please consider giving this repository a ⭐.
- 2026-01-19: Our GitHub repo is now open!
- 2026-02-07: ComfyUI-VideoMaMa is now available! (Thanks to @okdalto)
Note: Training code is currently under internal review. Release coming soon.
- Release Demo & Model checkpoint. (Jan 19, 2026)
- Release ArXiv paper. (Jan 19, 2026)
- Release Training Code.
- Release Evaluation Code.
- Release MA-V dataset.
Please run

```bash
bash scripts/setup.sh
```

This will download the Stable Video Diffusion weights and set up the virtual environment needed to run the code. Activate the environment with `conda activate videomama`. The setup script also downloads SAM2, which is required for training SAM2-Matte.
Please check the demo README.
VideoMaMa model checkpoint — available on the Hugging Face Hub: SammyLim/VideoMaMa.
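The checkpoint can also be fetched programmatically with the `huggingface_hub` library. A minimal sketch; the `checkpoints/VideoMaMa` target directory is chosen to match the example inference command and can be adjusted to your own layout:

```python
# Minimal sketch: download the VideoMaMa checkpoint from the Hugging Face Hub.
# The local_dir "checkpoints/VideoMaMa" matches the path used in the example
# inference command below; change it if you keep checkpoints elsewhere.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SammyLim/VideoMaMa",
    local_dir="checkpoints/VideoMaMa",
)
print(local_dir)  # path to the downloaded checkpoint folder
```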
To run inference on a video, use the following command:
```bash
python inference_onestep_folder.py \
    --base_model_path "<stabilityai/stable-video-diffusion-img2vid-xt_path>" \
    --unet_checkpoint_path "<videomama_checkpoint_path>" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    [--optional_arguments]
```

For example, if you completed setup with the command above, this example will work:
```bash
python inference_onestep_folder.py \
    --base_model_path "checkpoints/stable-video-diffusion-img2vid-xt" \
    --unet_checkpoint_path "checkpoints/VideoMaMa" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    --keep_aspect_ratio
```

For more information about inference settings, please check the inference README.
Please check the data pipeline README.
Please check the training README.
```bibtex
@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}
```
- SAM2: Meta AI's Segment Anything 2
- Stable Video Diffusion: Stability AI's video generation model
- Gradio: For the amazing UI framework
For questions or issues, please open an issue on our GitHub repository.
We welcome any feedback, questions, or opportunities for collaboration. If you are interested in using this model for industrial applications, or have specific questions about the architecture and training, please feel free to reach out.
The code in this repository is released under the CC BY-NC 4.0 license, unless otherwise specified.
This repository builds on implementations and ideas from the Hugging Face ecosystem and the diffusion-e2e-ft project. Many thanks to the original authors and contributors for their open-source work.
The VideoMaMa model checkpoints (specifically VideoMama/unet/* and dino_projection_mlp.pth) are subject to the Stability AI Community License.
