# TimeWarp: Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs


*Pipeline overview figure*

The official PyTorch implementation of a framework for enhancing temporal understanding in Video-LLMs through synthetic preference data generation and training.

## 🚀 Installation

```bash
git clone https://github.com/yourusername/TimeWarp.git
cd TimeWarp
pip install -r requirements.txt
```

## 📦 Dataset Setup

### Option 1: Download Preprocessed Data

The preprocessed preference data is available as a Hugging Face Dataset.

### Option 2: Generate from Scratch

1. Download the FineVideo Dataset.
2. Run the preprocessing pipeline:

   ```bash
   chmod +x scripts/*.sh
   ./scripts/setup_dataset.sh path/to/finevideo path/to/output 5000
   ./scripts/generate_data.sh path/to/output
   ```
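The pipeline above produces preference pairs: each video question is paired with a chosen (temporally accurate) answer and a rejected (temporally inconsistent) one. A minimal sketch of what one JSONL record might look like — the field names and values here are illustrative assumptions, not the pipeline's actual schema:

```python
import json

# Hypothetical preference-pair record; field names are assumptions,
# not the actual TimeWarp output schema.
record = {
    "video": "finevideo/clip_00042.mp4",
    "prompt": "What happens immediately after the chef flips the pancake?",
    "chosen": "The chef slides the pancake onto a plate.",
    "rejected": "The chef pours batter into the pan.",  # temporally wrong order
}

# Serialize as one JSON object per line (JSONL), the common format
# for preference-training datasets.
line = json.dumps(record)
assert json.loads(line) == record  # round-trips losslessly
print(line)
```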

## 🎯 Training & Evaluation

### Training

Refer to dpo_scripts/train_dpo.sh for DPO training configurations.
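The training script applies Direct Preference Optimization (DPO), which fine-tunes the policy to prefer the chosen answer over the rejected one while staying close to a frozen reference model. As a refresher, here is a minimal sketch of the per-pair DPO objective in plain Python; the actual script works on batched token log-probabilities, and `beta=0.1` is just a common default, not necessarily the repo's setting:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model. `beta` controls how
    far the policy may drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written stably as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy favors the chosen (temporally grounded)
# answer more strongly than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```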

### Evaluation

Evaluation scripts for various benchmarks, including MVBench, TempCompass, and our TimeWarp benchmarks, are available in the `test/` directory.

## 📁 Project Structure

```
TimeWarp/
├── 📂 timewarp/           # Core data generation modules
│   ├── preprocess/        # Video preprocessing
│   ├── pref_data/         # Preference data generation
│   └── benchmark/         # Benchmark creation
├── 📂 dpo_scripts/        # DPO training scripts
├── 📂 llava/              # Model implementations
├── 📂 test/               # Evaluation pipelines
├── 📂 inference/          # Inference utilities
└── 📂 scripts/            # Setup and generation scripts
```

## 📚 Citation

If you find our work helpful, please consider citing:

```bibtex
@article{vani2025harnessing,
  title={Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs},
  author={Vani, Sameep and Jena, Shreyas and Patel, Maitreya and Baral, Chitta and Aditya, Somak and Yang, Yezhou},
  journal={arXiv preprint arXiv:2510.03955},
  year={2025}
}
```

## 🤝 Contributing

We welcome contributions! Please feel free to submit pull requests, or open issues for bug reports and feature requests.

## 📄 License

This project is licensed under the MIT License; see the `LICENSE` file for details.

## 🙏 Acknowledgments

We thank the authors of LLaVA-Hound, Video-LLaMA3, and FineVideo for their foundational work.
