✨ VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? (CVPR 2025)

🌐 Homepage | 🔬 Paper | 👩‍💻 Code | 📊 Dataset | 📈 Evaluation | 🏆 Leaderboard

What is VidComposition?

VidComposition is a benchmark for evaluating how well Multimodal Large Language Models (MLLMs) understand fine-grained video composition. It assesses whether these models can interpret and analyze complex video compositions in which visual elements interact dynamically across time and space. By providing a detailed framework for understanding video content at a cinematic level, VidComposition addresses a gap in MLLM evaluation. It comprises 15 video comprehension tasks across five key areas of video composition.

VidComposition helps researchers and practitioners identify the strengths and limitations of MLLMs and where they can improve, offering insight into the challenges of understanding edited and compiled video content.

🏆 Leaderboard

alt text

Link

📉 Statistics

alt text

Link

👀 Visualization Results

alt text

✏️ Citation

@article{tang2024vidcompostion,
  title = {VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?},
  author = {Tang, Yunlong and Guo, Junjia and Hua, Hang and Liang, Susan and Feng, Mingqian and Li, Xinyang and Mao, Rui and Huang, Chao and Bi, Jing and Zhang, Zeliang and Fazli, Pooyan and Xu, Chenliang},
  journal = {arXiv preprint arXiv:2411.10979},
  year = {2024}
}