### ScienceQA

#### Prepare Data
1. Please see the ScienceQA [repo](https://github.com/lupantech/ScienceQA) for setting up the dataset.
2. Convert the ScienceQA dataset into LLaVA conversation-style format:
```Shell
python scripts/convert_sqa_to_llava.py \
    convert_to_llava \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --prompt-format "QCM-LEA" \
    --split {train,val,minival,test,minitest}
```
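Before training on the converted data, it can be worth a quick sanity check of the generated file. A minimal sketch, assuming the converter writes files such as `llava_train_QCM-LEA.json` into the base directory and that records follow the LLaVA conversation format (`id`, optional `image`, and a `conversations` list); verify both assumptions against your own output:

```python
import json

# Hypothetical path: adjust to the split and prompt format you converted above.
path = "/path/to/ScienceQA/data/scienceqa/llava_train_QCM-LEA.json"

with open(path) as f:
    records = json.load(f)

print(f"{len(records)} conversation records")

# Field names ("id", "image", "conversations", "from", "value") are assumed from
# the LLaVA conversation format; check them against your converted file.
sample = records[0]
print("id:", sample.get("id"), "| image:", sample.get("image"))
for turn in sample.get("conversations", []):
    print(f'{turn.get("from")}: {turn.get("value", "")[:80]}')
```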
#### Training
1. Pretraining
You can download our pretrained projector weights from our [Model Zoo](https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md), or train your own projector weights using [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/pretrain.sh).
2. Finetuning
See [`finetune_sqa.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/finetune_sqa.sh).
#### Evaluation
1. Multiple-GPU inference
You can run the evaluation on multiple GPUs and then concatenate the generated jsonl files. Please refer to our scripts for [batch evaluation](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_batch.sh) and [results gathering](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_gather.sh).
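The gathering step simply merges the per-GPU answer files into a single jsonl. A minimal sketch of that merge (the chunk naming below is hypothetical; match the glob to the `--answers-file` pattern your batch script actually used):

```python
import glob

# Hypothetical chunk naming; adjust the glob to your actual per-GPU answer files.
chunks = sorted(glob.glob("vqa/results/ScienceQA/test_llava-13b-chunk*.jsonl"))

with open("vqa/results/ScienceQA/test_llava-13b.jsonl", "w") as merged:
    for chunk in chunks:
        with open(chunk) as f:
            # Each line is a self-contained JSON record, so plain
            # concatenation preserves the jsonl format.
            merged.writelines(f)
```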
2. Single-GPU inference
(a) Generate LLaVA responses on the ScienceQA dataset
```Shell
python -m llava.eval.model_vqa_science \
    --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
    --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
    --image-folder /path/to/ScienceQA/data/scienceqa/images/test \
    --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --conv-mode llava_v1
```
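The answers file is newline-delimited JSON with one generation per line. A quick spot-check sketch, assuming each record carries a `question_id` and the generated `text` (field names taken from LLaVA's usual answer format; verify them against your own output):

```python
import json

# Path matching the --answers-file argument above.
with open("vqa/results/ScienceQA/test_llava-13b.jsonl") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        # "question_id" and "text" are assumed field names; adjust if needed.
        print(record.get("question_id"), "->", record.get("text", "")[:100])
        if i >= 4:  # show the first five responses only
            break
```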
(b) Evaluate the generated responses
```Shell
python llava/eval/eval_science_qa.py \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --output-file vqa/results/ScienceQA/test_llava-13b_output.json \
    --output-result vqa/results/ScienceQA/test_llava-13b_result.json
```
For reference, we attach our prediction files [`test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json) and [`test_sqa_llava_13b_v0.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json) for comparison when reproducing our results and for further detailed analysis.
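Before diffing your run against the attached reference predictions, it can help to confirm what each file actually contains, since the exact schema of the result files is not spelled out here. A minimal, schema-agnostic sketch (paths are hypothetical placeholders):

```python
import json

def describe(path):
    """Print a one-line summary of a JSON file's top-level structure."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        print(path, "-> dict with keys:", sorted(data.keys())[:10])
    elif isinstance(data, list):
        print(path, "-> list with", len(data), "entries")
    else:
        print(path, "->", type(data).__name__)

# Hypothetical local paths: your result file and the downloaded reference file.
describe("vqa/results/ScienceQA/test_llava-13b_result.json")
describe("test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json")
```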