Update README.md
README.md
@@ -18,10 +18,6 @@ In the v1.5 (08/2024) release, we present a series of XGen-MM models including:
For more details, check out our [tech report](https://arxiv.org/pdf/2408.08872), [fine-tuning code](https://github.com/salesforce/LAVIS/tree/xgen-mm), and project page (coming soon).
# Data
For DPO, we employ [VLFeedback](https://github.com/vlf-silkie/VLFeedback?tab=readme-ov-file), a synthetically annotated multimodal preference dataset built by prompting off-the-shelf VLMs to respond to a diverse mix of multimodal instructions; the responses are then scored by GPT-4V along three axes: helpfulness, visual faithfulness, and ethics. The dataset contains 80k such instructions. We construct preference data by marking the response with the highest average score across models as preferred and the one with the lowest as dispreferred, then filtering out examples whose preferred response scores poorly. This yields 62.6k preference examples.

For safety fine-tuning, we use the train split of the [VLGuard](https://github.com/ys-zong/VLGuard) dataset, which contains 2k examples of unsafe images and instructions, along with 5k examples randomly sampled from our instruction fine-tuning stage.
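
A minimal sketch of this selection rule is below. The record fields (`responses`, `scores`, `instruction`, `image`), the helper name, and the score threshold are illustrative assumptions, not the actual VLFeedback schema or our preprocessing code.

```python
# Sketch: build DPO preference pairs from judge-scored responses.
# The record layout and threshold are hypothetical and only illustrate
# the selection rule described above.

def build_preference_pairs(annotated_examples, min_chosen_score=4.0):
    """Pick the highest-scoring response as 'chosen' and the lowest as
    'rejected' for each instruction; drop pairs whose chosen response
    still scores poorly."""
    pairs = []
    for ex in annotated_examples:
        # Average the per-axis scores (helpfulness, visual faithfulness,
        # ethics) assigned to each candidate response.
        avg_scores = [
            sum(r["scores"].values()) / len(r["scores"]) for r in ex["responses"]
        ]
        best = max(range(len(avg_scores)), key=avg_scores.__getitem__)
        worst = min(range(len(avg_scores)), key=avg_scores.__getitem__)
        # Filter out examples where even the best response is low quality.
        if avg_scores[best] < min_chosen_score or best == worst:
            continue
        pairs.append({
            "image": ex["image"],
            "prompt": ex["instruction"],
            "chosen": ex["responses"][best]["text"],
            "rejected": ex["responses"][worst]["text"],
        })
    return pairs
```

Averaging the three per-axis scores before ranking keeps the pair construction simple, and the final filter drops pairs whose preferred response is itself weak.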
# DPO model results
| Model | VLGuard (↓) | HallusionBench (↑) | POPE (↑) | MMBench (dev) (↑) | SEED-IMG (↑) | MMStar (↑) | MME (norm) (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- |