### VQAv2

3. Submit the results to the [evaluation server](https://eval.ai/web/challenges/challenge-page/830/my-submission): `./playground/data/eval/vqav2/answers_upload`.
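To generate the answers to upload, a minimal sketch, assuming the repo provides a `scripts/v1_5/eval/vqav2.sh` script following the same pattern as the Q-Bench commands later in this document (script name and GPU list are assumptions):

```Shell
# Assumed script name; adjust the GPU list to your machine.
# Writes the submission file under ./playground/data/eval/vqav2/answers_upload
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
```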
### GQA

1. Download the GQA data and evaluation scripts following the official instructions and put them under `./playground/data/eval/gqa/data`. You may need to modify `eval.py` as [this](https://gist.github.com/haotian-liu/db6eddc2a984b4cbcc8a7f26fd523187) due to the missing assets in the GQA v1.2 release.
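To run inference and the GQA evaluation, a minimal sketch, assuming a `scripts/v1_5/eval/gqa.sh` script analogous to the other eval scripts in the repo (an assumption, not shown in this excerpt):

```Shell
# Assumed script name; runs inference and the GQA eval under ./playground/data/eval/gqa
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/gqa.sh
```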
### VizWiz

1. Download [`test.json`](https://vizwiz.cs.colorado.edu/VizWiz_final/vqa_data/Annotations.zip) and extract [`test.zip`](https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip) to `test`. Put them under `./playground/data/eval/vizwiz`.
3. Submit the results to the [evaluation server](https://eval.ai/web/challenges/challenge-page/1911/my-submission): `./playground/data/eval/vizwiz/answers_upload`.
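For the inference step that produces the upload file, a minimal sketch, assuming a `scripts/v1_5/eval/vizwiz.sh` script in the same style as the Q-Bench scripts below (the script name is an assumption):

```Shell
# Assumed script name; writes the submission file under ./playground/data/eval/vizwiz/answers_upload
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/vizwiz.sh
```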
### ScienceQA
1. Under `./playground/data/eval/scienceqa`, download `images`, `pid_splits.json`, `problems.json` from the `data/scienceqa` folder of the ScienceQA [repo](https://github.com/lupantech/ScienceQA).
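To run single-GPU inference and scoring, a minimal sketch, assuming a `scripts/v1_5/eval/sqa.sh` script following the repo's eval-script pattern (an assumption):

```Shell
# Assumed script name; runs inference and reports ScienceQA accuracy
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh
```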
### TextVQA

1. Download [`TextVQA_0.5.1_val.json`](https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json) and [images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip) and extract to `./playground/data/eval/textvqa`.
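To evaluate, a minimal sketch, assuming a `scripts/v1_5/eval/textvqa.sh` script following the same pattern (an assumption):

```Shell
# Assumed script name; runs single-GPU inference and the TextVQA accuracy script
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
```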
### POPE

1. Download `coco` from [POPE](https://github.com/AoiDragon/POPE/tree/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco) and put it under `./playground/data/eval/pope`.
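To evaluate, a minimal sketch, assuming a `scripts/v1_5/eval/pope.sh` script following the same pattern (an assumption):

```Shell
# Assumed script name; runs single-GPU inference and reports POPE metrics
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh
```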
### MMBench

1. Download [`mmbench_dev_20230712.tsv`](https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_20230712.tsv) and put it under `./playground/data/eval/mmbench`.
3. Submit the results to the [evaluation server](https://opencompass.org.cn/leaderboard-multimodal): `./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712`.
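For the inference step that produces the upload file, a minimal sketch, assuming a `scripts/v1_5/eval/mmbench.sh` script (an assumption):

```Shell
# Assumed script name; writes the submission file under
# ./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench.sh
```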
### MMBench-CN
1. Download [`mmbench_dev_cn_20231003.tsv`](https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_cn_20231003.tsv) and put under `./playground/data/eval/mmbench`.
3. Submit the results to the evaluation server: `./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003`.
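For the inference step, a minimal sketch, assuming a `scripts/v1_5/eval/mmbench_cn.sh` script (an assumption):

```Shell
# Assumed script name; writes the submission file under
# ./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench_cn.sh
```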
### SEED-Bench
1. Follow the official [instructions](https://github.com/AILab-CVC/SEED-Bench/blob/main/DATASET.md) to download the images and videos, and put the images under `./playground/data/eval/seed_bench/SEED-Bench-image`.
2. Extract the middle frame from each downloaded video and put these frames under `./playground/data/eval/seed_bench/SEED-Bench-video-image`. We provide our script `extract_video_frames.py` modified from the official one.
4. Optionally, submit the results to the leaderboard: `./playground/data/eval/seed_bench/answers_upload` using the official jupyter notebook.
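For the inference step that precedes the optional submission, a minimal sketch, assuming a `scripts/v1_5/eval/seed.sh` script (script name and GPU list are assumptions):

```Shell
# Assumed script name; adjust the GPU list to your machine.
# Writes answers under ./playground/data/eval/seed_bench/answers_upload
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/seed.sh
```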
### LLaVA-Bench-in-the-Wild
1. Extract contents of [`llava-bench-in-the-wild`](https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild) to `./playground/data/eval/llava-bench-in-the-wild`.
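To run inference and GPT-assisted scoring, a minimal sketch, assuming a `scripts/v1_5/eval/llavabench.sh` script and that the GPT judge needs an OpenAI API key (both assumptions):

```Shell
# Assumed script name; runs single-GPU inference and GPT-based scoring
# (may require OPENAI_API_KEY to be set in your environment)
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/llavabench.sh
```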
### MM-Vet

3. Evaluate the predictions in `./playground/data/eval/mmvet/results` using the official jupyter notebook.
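To generate the predictions to evaluate, a minimal sketch, assuming a `scripts/v1_5/eval/mmvet.sh` script (an assumption):

```Shell
# Assumed script name; writes predictions under ./playground/data/eval/mmvet/results
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmvet.sh
```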
## More Benchmarks
Below are awesome benchmarks for multimodal understanding from the research community.
### Q-Bench

2. Download and extract [images](https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench/resolve/main/images_llvisionqa.tar) and put all the images directly under `./playground/data/eval/qbench/images_llviqionqa`.
3. Single-GPU inference (change `dev` to `test` for evaluation on test set).
```Shell
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench.sh dev
```
4. Submit the results following the instructions [here](https://github.com/VQAssessment/Q-Bench#option-1-submit-results): `./playground/data/eval/qbench/llvisionqa_dev_answers.jsonl`.
### Chinese-Q-Bench
2. Download and extract [images](https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench/resolve/main/images_llvisionqa.tar) and put all the images directly under `./playground/data/eval/qbench/images_llviqionqa`.
3. Single-GPU inference (change `dev` to `test` for evaluation on test set).
```Shell
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench_zh.sh dev
```
4. Submit the results following the instructions [here](https://github.com/VQAssessment/Q-Bench#option-1-submit-results): `./playground/data/eval/qbench/llvisionqa_zh_dev_answers.jsonl`.
## LLaVA/docs/LLaVA_from_LLaMA2.md
…Llama 2 checkpoints, and release it to the community for public use.
You need to apply for and download the latest Llama 2 checkpoints to start your own training (apply [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/))
:volcano: How is the new LLaVA based on Llama 2 different from the one based on Llama 1? The differences in the training process are summarized below:
- **Pre-training**. The pre-trained base LLM is changed from Llama 1 to Llama 2.
- **Language instruction-tuning**. The previous LLaVA model starts with Vicuna, which is instruction-tuned on ShareGPT data from Llama 1; the new LLaVA model starts with Llama 2 Chat, which is an instruction-tuned checkpoint on dialogue data from Llama 2.
These are projector weights we have pretrained. You can use them for visual instruction tuning. They are just pretrained on image-text pairs and are NOT instruction-tuned, which means they may not follow instructions as well as our official models.
When using these projector weights to instruction-tune your LMM, please make sure that the relevant options are set correctly.
| Base LLM | Vision Encoder | Projection | Pretrain Data | Pretraining schedule | Download |
|----------|----------------|------------|---------------|----------------------|----------|
| Vicuna-13B-v1.3 | CLIP-L | Linear | LCS-558K | 1e | [projector](https://huggingface.co/liuhaotian/llava-pretrain-vicuna-13b-v1.3) |
| Vicuna-7B-v1.3 | CLIP-L | Linear | LCS-558K | 1e |[projector](https://huggingface.co/liuhaotian/llava-pretrain-vicuna-7b-v1.3)|
## Science QA Checkpoints
| Base LLM | Vision Encoder | Pretrain Data | Pretraining schedule | Finetuning Data | Finetuning schedule | Download |
|----------|----------------|---------------|----------------------|-----------------|---------------------|----------|
The model weights below are *merged* weights. You do not need to apply delta. The usage of LLaVA checkpoints should comply with the base LLM's model license.
The model weights below are *delta* weights. The usage of LLaVA checkpoints should comply with the base LLM's model license: [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md).