A framework for enhancing temporal understanding in Video-LLMs through synthetic preference data generation and training.
```bash
git clone https://github.com/yourusername/TimeWarp.git
cd TimeWarp
pip install -r requirements.txt
```

- Download the FineVideo dataset.
- Run the preprocessing pipeline:

```bash
chmod +x scripts/*.sh
./scripts/setup_dataset.sh path/to/finevideo path/to/output 5000
./scripts/generate_data.sh path/to/output
```
Refer to dpo_scripts/train_dpo.sh for DPO training configurations.
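For intuition, the core of DPO training is a per-pair loss that pushes the policy to prefer the chosen response over the rejected one, relative to a frozen reference model. The sketch below is only an illustration of that objective, not the repository's implementation; the function and parameter names are hypothetical:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    """
    # Implicit reward margins: how much more each model favors the
    # chosen (resp. rejected) response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed stably as softplus(-logits)
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits
```

When the policy and reference agree (all margins zero) the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss decreases.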
Evaluation scripts for various benchmarks, including MVBench, TempCompass, and our TimeWarp benchmarks, are available in the test/ directory.
```
TimeWarp/
├── 📂 timewarp/        # Core data generation modules
│   ├── preprocess/     # Video preprocessing
│   ├── pref_data/      # Preference data generation
│   └── benchmark/      # Benchmark creation
├── 📂 dpo_scripts/     # DPO training scripts
├── 📂 llava/           # Model implementations
├── 📂 test/            # Evaluation pipelines
├── 📂 inference/       # Inference utilities
└── 📂 scripts/         # Setup and generation scripts
```
If you find our work helpful, please consider citing:
```bibtex
@article{vani2025harnessing,
  title={Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs},
  author={Vani, Sameep and Jena, Shreyas and Patel, Maitreya and Baral, Chitta and Aditya, Somak and Yang, Yezhou},
  journal={arXiv preprint arXiv:2510.03955},
  year={2025}
}
```

We welcome contributions! Please feel free to submit pull requests, or open issues for bug reports and feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.
We thank the authors of LLaVA-Hound, Video-LLaMA3, and FineVideo for their foundational work.
