
Commit 106e2ae

- bug fixes
- cleanup
1 parent 5fcc1bc


44 files changed, +1206 -1348 lines

.gitignore

+1 -10
@@ -2,13 +2,4 @@
 /synthetic_or_generation/dis_background_removal_helpers/saved_models/isnet-general-use.pth
 /evaluated_checkpoints.json
 /requirements_submission.txt
-/EGE_README.MD
-/LLaVA/slurm_config_multiview.conf
-/LLaVA/slurm_config_multiview_aug.conf
-/LLaVA/slurm_config_multiview_temporal.conf
-/LLaVA/slurm_config_multiview_temporal_curriculum.conf
-/LLaVA/slurm_config_symbolic.conf
-/LLaVA/slurm_config_symbolic_synthetic_mv.conf
-/LLaVA/slurm_config_symbolic_synthetic_removal_mv.conf
-/slurm_config.conf
-/slurm_config_generate_novel_entities.conf
+/EGE_README.MD

LLaVA/docs/Evaluation.md

+1 -22
@@ -43,11 +43,9 @@ scripts, and the prediction files with LLaVA v1.5. Extract to `./playground/data
 
 1. Download [`test2015`](http://images.cocodataset.org/zips/test2015.zip) and put it under `./playground/data/eval/vqav2`.
 2. Multi-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
 ```
-
 3. Submit the results to the [evaluation server](https://eval.ai/web/challenges/challenge-page/830/my-submission): `./playground/data/eval/vqav2/answers_upload`.
 
 ### GQA
@@ -56,7 +54,6 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
 and put under `./playground/data/eval/gqa/data`. You may need to modify `eval.py` as [this](https://gist.github.com/haotian-liu/db6eddc2a984b4cbcc8a7f26fd523187) due to the missing assets in the
 GQA v1.2 release.
 2. Multi-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/gqa.sh
 ```
@@ -66,18 +63,15 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/gqa.sh
 1. Download [`test.json`](https://vizwiz.cs.colorado.edu/VizWiz_final/vqa_data/Annotations.zip) and extract [`test.zip`](https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip) to `test`. Put
 them under `./playground/data/eval/vizwiz`.
 2. Single-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/vizwiz.sh
 ```
-
 3. Submit the results to the [evaluation server](https://eval.ai/web/challenges/challenge-page/1911/my-submission): `./playground/data/eval/vizwiz/answers_upload`.
 
 ### ScienceQA
 
 1. Under `./playground/data/eval/scienceqa`, download `images`, `pid_splits.json`, `problems.json` from the `data/scienceqa` folder of the ScienceQA [repo](https://github.com/lupantech/ScienceQA).
 2. Single-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh
 ```
@@ -87,7 +81,6 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh
 1. Download [`TextVQA_0.5.1_val.json`](https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json) and [images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip) and extract
 to `./playground/data/eval/textvqa`.
 2. Single-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
 ```
@@ -96,7 +89,6 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
 
 1. Download `coco` from [POPE](https://github.com/AoiDragon/POPE/tree/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco) and put under `./playground/data/eval/pope`.
 2. Single-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh
 ```
@@ -107,7 +99,6 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh
 2. Downloaded images to `MME_Benchmark_release_version`.
 3. put the official `eval_tool` and `MME_Benchmark_release_version` under `./playground/data/eval/MME`.
 4. Single-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
 ```
@@ -116,43 +107,37 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
 
 1. Download [`mmbench_dev_20230712.tsv`](https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_20230712.tsv) and put under `./playground/data/eval/mmbench`.
 2. Single-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench.sh
 ```
-
 3. Submit the results to the [evaluation server](https://opencompass.org.cn/leaderboard-multimodal): `./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712`.
 
 ### MMBench-CN
 
 1. Download [`mmbench_dev_cn_20231003.tsv`](https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_cn_20231003.tsv) and put under `./playground/data/eval/mmbench`.
 2. Single-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench_cn.sh
 ```
-
 3. Submit the results to the evaluation server: `./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003`.
 
+
 ### SEED-Bench
 
 1. Following the official [instructions](https://github.com/AILab-CVC/SEED-Bench/blob/main/DATASET.md) to download the images and the videos. Put images
 under `./playground/data/eval/seed_bench/SEED-Bench-image`.
 2. Extract the video frame in the middle from the downloaded videos, and put them under `./playground/data/eval/seed_bench/SEED-Bench-video-image`. We provide our script `extract_video_frames.py`
 modified from the official one.
 3. Multiple-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/seed.sh
 ```
-
 4. Optionally, submit the results to the leaderboard: `./playground/data/eval/seed_bench/answers_upload` using the official jupyter notebook.
 
 ### LLaVA-Bench-in-the-Wild
 
 1. Extract contents of [`llava-bench-in-the-wild`](https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild) to `./playground/data/eval/llava-bench-in-the-wild`.
 2. Single-GPU inference and evaluate.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/llavabench.sh
 ```
@@ -161,11 +146,9 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/llavabench.sh
 
 1. Extract [`mm-vet.zip`](https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip) to `./playground/data/eval/mmvet`.
 2. Single-GPU inference.
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmvet.sh
 ```
-
 3. Evaluate the predictions in `./playground/data/eval/mmvet/results` using the official jupyter notebook.
 
 ## More Benchmarks
@@ -179,11 +162,9 @@ Below are awesome benchmarks for multimodal understanding from the research comm
 2. Download and extract [images](https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench/resolve/main/images_llvisionqa.tar) and put all the images directly
 under `./playground/data/eval/qbench/images_llviqionqa`.
 3. Single-GPU inference (change `dev` to `test` for evaluation on test set).
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench.sh dev
 ```
-
 4. Submit the results by instruction [here](https://github.com/VQAssessment/Q-Bench#option-1-submit-results): `./playground/data/eval/qbench/llvisionqa_dev_answers.jsonl`.
 
 ### Chinese-Q-Bench
@@ -194,9 +175,7 @@ CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench.sh dev
 2. Download and extract [images](https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench/resolve/main/images_llvisionqa.tar) and put all the images directly
 under `./playground/data/eval/qbench/images_llviqionqa`.
 3. Single-GPU inference (change `dev` to `test` for evaluation on test set).
-
 ```Shell
 CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench_zh.sh dev
 ```
-
 4. Submit the results by instruction [here](https://github.com/VQAssessment/Q-Bench#option-1-submit-results): `./playground/data/eval/qbench/llvisionqa_zh_dev_answers.jsonl`.

LLaVA/docs/LLaVA_from_LLaMA2.md

+1 -1
@@ -9,6 +9,7 @@ Llama 2 checkpoints, and release it to the community for the public use.
 
 You need to apply for and download the latest Llama 2 checkpoints to start your own training (apply [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/))
 
+
 ## Training
 
 Please
@@ -17,7 +18,6 @@ checkout [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/
 ## LLaVA (based on Llama 2), What is different?
 
 :volcano: How is the new LLaVA based on Llama 2 different from Llama 1? The comparisons of the training process are described:
-
 - **Pre-training**. The pre-trained base LLM is changed from Llama 1 to Llama 2
 - **Language instruction-tuning**. The previous LLaVA model starts with Vicuna, which is instruct tuned on ShareGPT data from Llama 1; The new LLaVA model starts with Llama 2 Chat, which is an
 instruct tuned checkpoint on dialogue data from Llama 2.

LLaVA/docs/LoRA.md

-3
@@ -12,13 +12,11 @@ the base model to use. Please make sure the base model corresponds to the LoRA c
 already, follow the instructions [here](https://github.com/lm-sys/FastChat#vicuna-weights)).
 
 #### Launch a controller
-
 ```Shell
 python -m llava.serve.controller --host 0.0.0.0 --port 10000
 ```
 
 #### Launch a gradio web server.
-
 ```Shell
 python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
 ```
@@ -27,7 +25,6 @@ You just launched the Gradio web interface. Now, you can open the web interface
 not launched any model worker yet. It will be automatically updated when you launch a model worker.
 
 #### Launch a model worker
-
 ```Shell
 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path liuhaotian/llava-vicuna-7b-v1.1-lcs_558k-instruct_80k_3e-lora-preview-alpha --model-base /path/to/vicuna-v1.1
 ```

LLaVA/docs/MODEL_ZOO.md

+5
@@ -23,6 +23,7 @@ Base model: Vicuna v1.5. Training logs: [wandb](https://api.wandb.ai/links/lht/6
 LLaVA-1.5 achieves SoTA performance across 11 benchmarks.
 </p>
 
+
 ## LLaVA-v1
 
 *Note: We recommend using the most capable LLaVA-v1.5 series above for the best performance.*
@@ -33,6 +34,7 @@ Base model: Vicuna v1.5. Training logs: [wandb](https://api.wandb.ai/links/lht/6
 | LLaMA-2-13B-Chat | CLIP-L | LCS-558K | 1e | LLaVA-Instruct-80K | full_ft-1e | 56.7 | 58.6 | 80.0 | 67.9 | [ckpt](https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview) |
 | LLaMA-2-7B-Chat | CLIP-L | LCS-558K | 1e | LLaVA-Instruct-80K | lora-1e | 51.2 | 58.9 | 71.6 | 62.8 | [LoRA](https://huggingface.co/liuhaotian/llava-llama-2-7b-chat-lightning-lora-preview) |
 
+
 ## Projector weights
 
 These are projector weights we have pretrained. You can use these projector weights for visual instruction tuning. They are just pretrained on image-text pairs and are NOT instruction-tuned, which
@@ -64,12 +66,14 @@ When using these projector weights to instruction-tune your LMM, please make sur
 | Vicuna-13B-v1.3 | CLIP-L | Linear | LCS-558K | 1e | [projector](https://huggingface.co/liuhaotian/llava-pretrain-vicuna-13b-v1.3) |
 | Vicuna-7B-v1.3 | CLIP-L | Linear | LCS-558K | 1e | [projector](https://huggingface.co/liuhaotian/llava-pretrain-vicuna-7b-v1.3) |
 
+
 ## Science QA Checkpoints
 
 | Base LLM | Vision Encoder | Pretrain Data | Pretraining schedule | Finetuning Data | Finetuning schedule | Download |
 |-----------------|----------------|---------------|----------------------|-----------------|---------------------|-----------------------------------------------------------------------------------|
 | Vicuna-13B-v1.3 | CLIP-L | LCS-558K | 1e | ScienceQA | full_ft-12e | [ckpt](https://huggingface.co/liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3) |
 
+
 ## Legacy Models (merged weights)
 
 The model weights below are *merged* weights. You do not need to apply delta. The usage of LLaVA checkpoints should comply with the base LLM's model license.
@@ -78,6 +82,7 @@ The model weights below are *merged* weights. You do not need to apply delta. Th
 |-------------|----------------|---------------|----------------------|--------------------|---------------------|-----------------------------------------------------------------------------|
 | MPT-7B-Chat | CLIP-L | LCS-558K | 1e | LLaVA-Instruct-80K | full_ft-1e | [preview](https://huggingface.co/liuhaotian/LLaVA-Lightning-MPT-7B-preview) |
 
+
 ## Legacy Models (delta weights)
 
 The model weights below are *delta* weights. The usage of LLaVA checkpoints should comply with the base LLM's model license: [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md).

LLaVA/docs/ScienceQA.md

-1
@@ -1,7 +1,6 @@
 ### ScienceQA
 
 #### Prepare Data
-
 1. Please see ScienceQA [repo](https://github.com/lupantech/ScienceQA) for setting up the dataset.
 2. Generate ScienceQA dataset for LLaVA conversation-style format.

LLaVA/docs/Windows.md

-2
@@ -6,14 +6,12 @@ now. More functionalities on Windows is to be added soon, stay tuned.*
 ## Installation
 
 1. Clone this repository and navigate to LLaVA folder
-
 ```bash
 git clone https://github.com/haotian-liu/LLaVA.git
 cd LLaVA
 ```
 
 2. Install Package
-
 ```Shell
 conda create -n llava python=3.10 -y
 conda activate llava

LLaVA/docs/macOS.md

-2
@@ -5,14 +5,12 @@
 ## Installation
 
 1. Clone this repository and navigate to LLaVA folder
-
 ```bash
 git clone https://github.com/haotian-liu/LLaVA.git
 cd LLaVA
 ```
 
 2. Install Package
-
 ```Shell
 conda create -n llava python=3.10 -y
 conda activate llava
