---
library_name: transformers
language:
- en
datasets:
- chart-misinformation-detection/bar_line_pie_4k
pipeline_tag: image-text-to-text
tags:
- chart
---

# Model Card for LlavaNext BLP4k

<!-- Provide a quick summary of what the model is/does. -->
This is a LlavaNext model fine-tuned on a synthetic dataset of bar, line, and pie charts.
Its goal is to detect whether a chart image contains a misleading element.
The misleading elements we consider are limited to three types: a non-zero baseline in bar charts,
omitted x-axis data points in line charts, and segments that do not sum to 100% in pie charts.
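
To make the three categories concrete, the sketch below states each one as a simple check on a chart's underlying data. It is only an illustration of the definitions: the function names, the `step` and `tol` parameters, and the sample values are assumptions made here, and the code is neither how the dataset was generated nor how the model reaches its answer.

```python
from typing import Sequence


def bar_has_nonzero_baseline(y_axis_min: float) -> bool:
    """Bar chart: the value axis is expected to start at zero."""
    return y_axis_min != 0


def line_omits_x_points(x_values: Sequence[int], step: int) -> bool:
    """Line chart: consecutive x-axis points are expected to be `step` apart."""
    gaps = [b - a for a, b in zip(x_values, x_values[1:])]
    return any(gap != step for gap in gaps)


def pie_does_not_sum_to_100(percentages: Sequence[float], tol: float = 0.5) -> bool:
    """Pie chart: segment percentages are expected to total 100, within `tol`."""
    return abs(sum(percentages) - 100.0) > tol


# Hypothetical sample values, one misleading case per chart type.
print(bar_has_nonzero_baseline(40))                     # True: axis starts at 40
print(line_omits_x_points([2018, 2019, 2021], step=1))  # True: 2020 is missing
print(pie_does_not_sum_to_100([40, 35, 30]))            # True: segments total 105
```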


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [Team Snoopy](https://huggingface.co/chart-misinformation-detection)
- **Model type:** Multimodal Image + Text
- **Finetuned from model:** [LlavaNext](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [LlavaNext blp-4k](https://huggingface.co/chart-misinformation-detection/hf-llava-next-finetune-blp4k) (adapters only)
- **Demo:** [LlavaNext BLP4k](https://huggingface.co/spaces/chart-misinformation-detection/Llava-Next-BLP4k)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model. The example loads the base model in 4-bit precision with bitsandbytes, so it requires a GPU.

```python
# Load model
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
from PIL import Image
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization (bitsandbytes) to reduce GPU memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)

processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# Attach the finetuned adapter weights to the quantized base model
model = PeftModel.from_pretrained(model, adapter_weights_repo)

# preprocess input
image_url = "https://example.com/chart.png"  # placeholder: replace with your chart image URL
prompt = "[INST] <image>Evaluate if this chart is misleading, and if so explain [/INST]"
image = Image.open(requests.get(image_url, stream=True).raw)
# Keyword arguments avoid depending on the processor's positional argument order
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# inference
output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=False))
```
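
The decoded string contains the prompt as well as the generated answer, because `generate` returns the input tokens followed by the new ones. A minimal sketch of one way to keep only the answer is shown below. The helper name `extract_answer` and the `sample` string are hypothetical; the sample is a made-up stand-in for `processor.decode(...)` output, not a real model response.

```python
def extract_answer(decoded: str) -> str:
    """Return the text that follows the final [/INST] marker."""
    answer = decoded.rsplit("[/INST]", 1)[-1]
    # Drop the end-of-sequence token that remains when special tokens are kept.
    return answer.replace("</s>", "").strip()


# Hypothetical decoded string, for illustration only.
sample = (
    "<s> [INST] <image>Evaluate if this chart is misleading, "
    "and if so explain [/INST] The chart is misleading. </s>"
)
print(extract_answer(sample))
```

Splitting on the last `[/INST]` marker relies on the Mistral-style instruction format used in the prompt above; a different prompt template would need a different delimiter.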

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[BLP4k dataset](https://huggingface.co/datasets/chart-misinformation-detection/bar_line_pie_4k): a dataset of synthetically created bar, line, and pie charts that includes both misleading and non-misleading examples.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->


## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**APA:**

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/).

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.

- Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). **Visual Instruction Tuning**. *NeurIPS*.