πŸ€— Hugging Face Dataset 🌐 Project Page arXiv GitHub License: MIT

πŸ“‘ Table of Contents

  • πŸš€ ChartScope
  • πŸ” TL;DR
  • 🚨 Important Updates
  • βš™οΈ Setup
  • πŸ“¦ ChartDQA Benchmark
  • πŸ“š Citation
  • πŸ“œ License

πŸš€ ChartScope

This is the official repository for our paper
πŸ“„ β€œIn-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding”
πŸ”— Project site

ChartScope lets you automatically generate synthetic chart data via Qwen3 and easily download the ChartDQA benchmark. Stay tuned for more updates! πŸ”₯

(Demo animation)

πŸ” TL;DR

This repo offers an automated, efficient pipeline powered by a text-only LLM. With a single command, you can generate:

  • πŸ“Š Chart images
  • πŸ—„οΈ Raw JSON data
  • ❓ Question–Answer pairs
  • 🐍 Python scripts
  • πŸ“– Background stories

🚨 Important Updates

  • July 18, 2025 – Data-generation pipeline & ChartDQA benchmark are now released! πŸŽ‰

βš™οΈ Setup

πŸ–₯️ Machine Environment

  • OS: Ubuntu 24.04.2 LTS
  • CUDA: 12.6
  • GPUs: Tested on 4Γ— NVIDIA L40 or 8Γ— NVIDIA H100

🐍 Python Environment

Requires: Python β‰₯ 3.10

# Core deps (pathlib, subprocess, and threading ship with Python's standard library and need no install)
pip install openai tqdm joblib
pip install -U "huggingface_hub[cli]"

# PyTorch for CUDA 12.6
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 \
  --index-url https://download.pytorch.org/whl/cu126

# Flash attention
pip install flash-attn==2.7.3

# vLLM & transformers
pip install vllm==0.9.0.1 transformers==4.51.3
pip install accelerate einops
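
To confirm the stack installed cleanly before moving on, a quick import check like the following minimal sketch should run without errors:

import torch

print(torch.__version__)             # expect 2.7.0
print(torch.cuda.is_available())     # should be True with CUDA 12.6 drivers
print(torch.cuda.device_count())     # e.g. 4 on a 4x L40 node

import flash_attn                    # fails here if the wheel did not build
import vllm
print(vllm.__version__)              # expect 0.9.0.1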

Download Models

mkdir model-weights
huggingface-cli download Qwen/Qwen3-32B --local-dir model-weights/Qwen3-32B
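
If you prefer to script the download, the huggingface_hub Python API that the CLI wraps does the same thing:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen3-32B", local_dir="model-weights/Qwen3-32B")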

Generating High-Quality Chart QAs with Qwen3

Note: The data in our paper were generated with OpenAI GPT models; this pipeline uses the open-source Qwen3 so it can be run publicly. You can switch from Qwen3 back to GPT by setting GPT_DEPLOY_NAME="gpt-o4-mini" in all files under scripts_api.

1. Launch the vLLM Server

bash launch.sh

# OR 

vllm serve \
  model-weights/Qwen3-32B/ \
  --tensor-parallel-size 4 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072
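
Once the server is up, you can smoke-test it through the OpenAI-compatible endpoint vLLM exposes. A minimal sketch, assuming the default port 8000 and that the served model name matches the path passed to vllm serve:

from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the api_key value is unused locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="model-weights/Qwen3-32B/",
    messages=[{"role": "user", "content": "Name three common chart types."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)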

2. Edit data/metadata.json to select which chart types to generate.
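
The exact schema of data/metadata.json is defined by this repo; purely as an illustration, narrowing the chart types programmatically might look like the sketch below (the "chart_types" key is a hypothetical assumption, not the actual schema):

import json

with open("data/metadata.json") as f:
    metadata = json.load(f)

# Keep only a few chart types ("chart_types" is an assumed key for illustration)
wanted = {"Area_Chart", "Bar_Chart", "Box_Plot"}
metadata["chart_types"] = [t for t in metadata["chart_types"] if t in wanted]

with open("data/metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)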

3. Generate JSON Template and README via JSON expert.

Before running, check the GPT/model version configured in the script.

python3 scripts_api/generate_json_template.py

4. Generate JSON data and QA via Data expert.

Before running, check the GPT/model version configured in the script.

python3 scripts_api/generate_json_data_and_qa.py

5. Generate Python scripts via Python expert.

Before running, check the GPT/model version configured in the script.

python3 scripts_api/generate_py_script.py

You can run this in parallel with step 4.

6. Validate QA & JSON format

python3 tools/data/check_json_qa_format.py
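
For orientation, the kind of check this step performs can be sketched as follows; the directory layout and field names here are assumptions for illustration, not the script's actual logic:

import json
from pathlib import Path

# Hypothetical check: every qa/*.json parses, and each pair has question/answer keys
for qa_file in Path("data").glob("*/qa/*.json"):
    try:
        pairs = json.loads(qa_file.read_text())
    except json.JSONDecodeError as err:
        print(f"Malformed JSON: {qa_file} ({err})")
        continue
    for pair in pairs:
        if "question" not in pair or "answer" not in pair:
            print(f"Missing question/answer fields: {qa_file}")
            break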

7. Merge all outputs

python3 tools/data/merge_folders.py

8. Produce chart images

Adjust the number of workers in the script to suit your machine.

python3 tools/data/generate_chart_image.py
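
"Workers" here controls how many generated chart scripts render in parallel. Conceptually (a hypothetical sketch using joblib, not the script's actual implementation):

import subprocess
from pathlib import Path
from joblib import Parallel, delayed

def render(script: Path) -> None:
    # Each generated Python script draws one chart image when executed
    subprocess.run(["python3", str(script)], check=True)

scripts = sorted(Path("data").glob("*/script/*.py"))     # assumed layout
Parallel(n_jobs=8)(delayed(render)(s) for s in scripts)  # n_jobs = number of workers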

9. Collect the generated data from data/final/.

⏱️ Time Cost (Qwen3-32B on 4 L40s)

Task                        Time per chart type
Template generation         4.3 min
JSON data & QA generation   2.2 min per pair
Python script generation    10 min per script

πŸ“¦ ChartDQA Benchmark

πŸ“₯ Data download

Hugging Face Dataset

πŸ—‚οΈ Data Structure

We provide two annotation formatsβ€”JSON and JSONLβ€”with identical QA pairs.

Use test.json for full evaluation and test_small.json for a quick run on 1,000 sampled QA pairs.

ChartDQA
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ Area_Chart/
β”‚   β”‚   β”œβ”€β”€ chart/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000_script_matplotlib_0.png
β”‚   β”‚   β”‚   └── ...
β”‚   β”‚   β”œβ”€β”€ csv/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000.csv
β”‚   β”‚   β”‚   └── ...
β”‚   β”‚   β”œβ”€β”€ json/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000.json
β”‚   β”‚   β”‚   └── ...
β”‚   β”‚   └── qa/
β”‚   β”‚       β”œβ”€β”€ 000000.json
β”‚   β”‚       └── ...
β”‚   β”œβ”€β”€ Bar_Chart/
β”‚   β”œβ”€β”€ Box_Plot/
β”‚   └── ...
β”œβ”€β”€ test.json
β”œβ”€β”€ test.jsonl
β”œβ”€β”€ test_small.json
└── test_small.jsonl
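
A minimal sketch for loading the JSONL annotations, assuming each line is one self-contained QA record (the field names follow whatever the released files contain):

import json

with open("ChartDQA/test_small.jsonl") as f:
    records = [json.loads(line) for line in f]

print(len(records))       # 1,000 sampled QA pairs
print(records[0].keys())  # inspect the annotation fields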

πŸ“š Citation

If you find ChartScope useful, please cite:

@inproceedings{fan2025chartscope,
  title={On pre-training of multimodal language models customized for chart understanding},
  author={Fan, Wan-Cyuan and Chen, Yen-Chun and Liu, Mengchen and Jacobson, Alexander and Yuan, Lu and Sigal, Leonid},
  booktitle={NeurIPS Workshop on Adaptive Foundation Models},
  year={2024}
}

πŸ“œ License

This project is licensed under the MIT License.
