Sangbeom Lim1 · Seoung Wug Oh2 · Jiahui Huang2 · Heeji Yoon3
Seungryong Kim3 · Joon-Young Lee2
1Korea University 2Adobe Research 3KAIST AI
ArXiv 2026
VideoMaMa is a mask-guided video matting framework built on a video generative prior. Leveraging this prior, it delivers stable performance across diverse video domains with fine-grained matting quality.
For more visual results, please check out our project page.
VideoMaMa is an open-source project. If you find our work helpful, please consider giving this repository a ⭐.
- 2026-01-19: Our GitHub repo is now open!
- 2026-02-07: ComfyUI-VideoMaMa is now available! (Thanks to @okdalto)
Note: Training code is currently under internal review. Release coming soon.
- Release Demo & Model checkpoint. (Jan 19, 2026)
- Release ArXiv paper. (Jan 19, 2026)
- Release Training Code.
- Release Evaluation Code.
- Release MA-V dataset.
Please run

```bash
bash scripts/setup.sh
```

This will download the Stable Video Diffusion weights and set up the virtual environment needed to run the code. Activate the environment with `conda activate videomama`. The setup script also downloads SAM2, which is required for training SAM2-Matte.
Please check the demo README.
VideoMaMa model checkpoint — available on the Hugging Face Hub: SammyLim/VideoMaMa.
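The checkpoint can also be fetched programmatically with the `huggingface_hub` library. A minimal sketch; the `checkpoints/VideoMaMa` target directory is chosen to match the example inference command and can be adjusted to your own layout:

```python
# Minimal sketch: download the VideoMaMa checkpoint from the Hugging Face Hub.
# The local_dir "checkpoints/VideoMaMa" matches the path used in the example
# inference command below; change it if you keep checkpoints elsewhere.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SammyLim/VideoMaMa",
    local_dir="checkpoints/VideoMaMa",
)
print(local_dir)  # path to the downloaded checkpoint folder
```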
To run inference on a video, use the following command:
```bash
python inference_onestep_folder.py \
    --base_model_path "<stabilityai/stable-video-diffusion-img2vid-xt_path>" \
    --unet_checkpoint_path "<videomama_checkpoint_path>" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    [--optional_arguments]
```

For example, if you completed setup with the command above, this example will work:
```bash
python inference_onestep_folder.py \
    --base_model_path "checkpoints/stable-video-diffusion-img2vid-xt" \
    --unet_checkpoint_path "checkpoints/VideoMaMa" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    --keep_aspect_ratio
```

For more information about inference settings, please check the inference README.
Please check the data pipeline README.
Please check the training README.
```bibtex
@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}
```
- SAM2: Meta AI's Segment Anything 2
- Stable Video Diffusion: Stability AI's video generation model
- Gradio: For the amazing UI framework
For questions or issues, please open an issue on our GitHub repository.
We welcome any feedback, questions, or opportunities for collaboration. If you are interested in using this model for industrial applications, or have specific questions about the architecture and training, please feel free to reach out.
The code in this repository is released under the CC BY-NC 4.0 license, unless otherwise specified.
This repository builds on implementations and ideas from the Hugging Face ecosystem and the diffusion-e2e-ft project. Many thanks to the original authors and contributors for their open-source work.
The VideoMaMa model checkpoints (specifically VideoMama/unet/* and dino_projection_mlp.pth) are subject to the Stability AI Community License.
