---
library_name: transformers
language:
- en
datasets:
- chart-misinformation-detection/bar_line_pie_4k
pipeline_tag: image-text-to-text
tags:
- chart
---

# Model Card for LlavaNext BLP4k

<!-- Provide a quick summary of what the model is/does. -->
This is a LlavaNext model fine-tuned on a synthetic dataset of bar, line, and pie charts.
The goal is to detect whether a chart image contains a misleading element.
The misleading elements covered are limited to three types: a non-zero baseline in bar charts,
omitted x-axis data points in line charts, and segments that do not sum to 100% in pie charts.
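
To make the three categories concrete, here is a minimal rule-based sketch. It is not how the model works (the model looks at the chart image, not at metadata), and `ChartSpec`, its fields, and `misleading_reason` are hypothetical names introduced only for this illustration; they are not part of the dataset schema. The line-chart check assumes that "omitted data points" shows up as irregular spacing between x values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChartSpec:
    """Hypothetical description of a chart; not part of the dataset schema."""
    kind: str                          # "bar", "line", or "pie"
    y_axis_min: float = 0.0            # where the value axis starts (bar charts)
    x_values: Sequence[float] = ()     # plotted x positions (line charts)
    x_step: Optional[float] = None     # expected regular spacing (line charts)
    percentages: Sequence[float] = ()  # segment shares in percent (pie charts)


def misleading_reason(chart: ChartSpec, tol: float = 1e-6) -> Optional[str]:
    """Return a short reason if the chart matches one of the three patterns."""
    if chart.kind == "bar" and abs(chart.y_axis_min) > tol:
        return "non-zero baseline"
    if chart.kind == "line" and chart.x_step is not None:
        # Assumption: an omitted point appears as a gap wider than the usual step.
        gaps = [b - a for a, b in zip(chart.x_values, chart.x_values[1:])]
        if any(abs(g - chart.x_step) > tol for g in gaps):
            return "omitted x-axis data points"
    if chart.kind == "pie" and abs(sum(chart.percentages) - 100.0) > tol:
        return "segments do not sum to 100%"
    return None
```

For example, `misleading_reason(ChartSpec(kind="pie", percentages=[50, 30, 30]))` flags the pie chart because its segments add up to 110%.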


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [Team Snoopy](https://huggingface.co/chart-misinformation-detection)
- **Model type:** Multimodal Image + Text
- **Finetuned from model:** [LlavaNext](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [LlavaNext blp-4k](https://huggingface.co/chart-misinformation-detection/hf-llava-next-finetune-blp4k) (adapters only)
- **Demo:** [LlavaNext BLP4k](https://huggingface.co/spaces/chart-misinformation-detection/Llava-Next-BLP4k)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

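Use the code below to get started with the model. It loads the base model in 4-bit precision with bitsandbytes and attaches the fine-tuned adapters, which requires a CUDA GPU.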

```python
# Load model
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
from PIL import Image
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization (bitsandbytes) to reduce GPU memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)

processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# Attach the fine-tuned adapter weights to the quantized base model
model = PeftModel.from_pretrained(model, adapter_weights_repo)

# Preprocess input
image_url = "https://example.com/chart.png"  # placeholder: replace with the URL of your chart image
prompt = "[INST] <image>Evaluate if this chart is misleading, and if so explain [/INST]"
image = Image.open(requests.get(image_url, stream=True).raw)
# Move the input tensors to the GPU, where the quantized model lives
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

# Inference
output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=False))
```
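
The decoded output contains the prompt followed by the model's reply. As a small convenience sketch, the helpers below build a prompt in the `[INST] ... [/INST]` format used above and extract only the reply from the decoded string. `build_prompt` and `extract_answer` are hypothetical helper names and are not part of `transformers`; the set of special tokens stripped (`<s>`, `</s>`) is an assumption based on the Mistral tokenizer.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_prompt(question: str) -> str:
    """Wrap a question in the instruction format, with the image placeholder first."""
    return f"{INST_OPEN} <image>{question.strip()} {INST_CLOSE}"


def extract_answer(decoded: str) -> str:
    """Return the text after the last [/INST] marker, without <s> / </s> tokens."""
    # rpartition returns the whole string as the last element if the marker is absent.
    _, _, answer = decoded.rpartition(INST_CLOSE)
    for token in ("<s>", "</s>"):
        answer = answer.replace(token, "")
    return answer.strip()
```

With the snippet above, `extract_answer(processor.decode(output[0], skip_special_tokens=False))` would return just the model's explanation.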

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was trained on the [BLP4k dataset](https://huggingface.co/datasets/chart-misinformation-detection/bar_line_pie_4k), a set of synthetically created bar, line, and pie charts that includes both misleading and non-misleading examples.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->


## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**APA:**

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/).

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.

- Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). **Visual Instruction Tuning**. *NeurIPS*.