✨ VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? (CVPR 2025)

What is VidComposition?

VidComposition is a novel benchmark for evaluating the fine-grained video composition understanding of Multimodal Large Language Models (MLLMs). It assesses how well these models interpret and analyze complex video compositions, in which visual elements interact dynamically across time and space. It aims to fill a gap in MLLM evaluation by providing a detailed framework for understanding video content at a cinematic level, and comprises 15 intricate video comprehension tasks across five key areas of video composition.

VidComposition helps researchers and practitioners identify the strengths and limitations of MLLMs and where they can be improved, offering insight into the challenges of understanding edited and compiled video content.

🏆 Leaderboard

Link

📉 Statistics

Link

👀 Visualization Results

✏️ Citation

@article{tang2024vidcompostion,
  title = {VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?},
  author = {Tang, Yunlong and Guo, Junjia and Hua, Hang and Liang, Susan and Feng, Mingqian and Li, Xinyang and Mao, Rui and Huang, Chao and Bi, Jing and Zhang, Zeliang and Fazli, Pooyan and Xu, Chenliang},
  journal = {arXiv preprint arXiv:2411.10979},
  year = {2024}
}
