---
library_name: transformers
language:
- en
datasets:
- chart-misinformation-detection/bar_line_pie_4k
pipeline_tag: image-text-to-text
tags:
- chart
---
# Model Card for LlavaNext BLP4k
<!-- Provide a quick summary of what the model is/does. -->
This is a LlavaNext model fine-tuned on a synthetic dataset of bar, line, and pie charts.
The goal is to detect whether a chart image contains a misleading element.
The types of misleading elements we consider are limited to: a non-zero baseline in bar charts,
omitted x-axis data points in line charts, and segments that do not sum to 100% in pie charts.
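To make the three categories concrete, here is a minimal sketch that expresses each one as a check on a chart's underlying data. It is only an illustration of the definitions: the model itself works on chart images, and the function names, the evenly spaced x-axis assumption, and the tolerance value are all hypothetical choices made for this example.

```python
from typing import Sequence


def bar_has_nonzero_baseline(y_axis_min: float) -> bool:
    """Flag a bar chart whose value axis does not start at zero."""
    return y_axis_min != 0


def line_omits_x_points(x_values: Sequence[int], step: int = 1) -> bool:
    """Flag a line chart whose x values (assumed evenly spaced, e.g. years) have gaps."""
    return any(b - a != step for a, b in zip(x_values, x_values[1:]))


def pie_does_not_sum_to_100(percentages: Sequence[float], tol: float = 0.5) -> bool:
    """Flag a pie chart whose segment percentages do not total 100 (within tol)."""
    return abs(sum(percentages) - 100.0) > tol


print(bar_has_nonzero_baseline(40))             # True: axis starts at 40
print(line_omits_x_points([2018, 2019, 2021]))  # True: 2020 is missing
print(pie_does_not_sum_to_100([50, 30, 30]))    # True: segments total 110
```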
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- **Developed by:** [Team Snoopy](https://huggingface.co/chart-misinformation-detection)
- **Model type:** Multimodal Image + Text
- **Finetuned from model:** [LlavaNext](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [LlavaNext blp-4k](https://huggingface.co/chart-misinformation-detection/hf-llava-next-finetune-blp4k) (adapters only)
- **Demo:** [LlavaNext BLP4k](https://huggingface.co/spaces/chart-misinformation-detection/Llava-Next-BLP4k)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model. The code runs only on a GPU.
```python
# Load model
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
from PIL import Image
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization (requires bitsandbytes and a CUDA GPU)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)
processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)
# Attach the fine-tuned adapter weights to the quantized base model
model = PeftModel.from_pretrained(model, adapter_weights_repo)

# Preprocess input
image_url = "https://example.com/chart.png"  # placeholder: replace with your chart image URL
prompt = "[INST] <image>Evaluate if this chart is misleading, and if so explain [/INST]"
image = Image.open(requests.get(image_url, stream=True).raw)
# Move the input tensors to the GPU, where the quantized model lives
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")

# Inference
output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=False))
```
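The decoded string contains the prompt followed by the generated answer. As a minimal sketch, the helper below keeps only the text after the last `[/INST]` marker used in the prompt above. `extract_answer` is a hypothetical name introduced here, not part of `transformers`, and the `<s>` / `</s>` tokens it strips are assumed to be the special tokens of the Mistral-based tokenizer.

```python
def extract_answer(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last instruction marker, without <s> / </s> tokens."""
    # Everything before the last marker is the echoed prompt
    answer = decoded.rsplit(marker, 1)[-1]
    # Assumed special tokens of the Mistral-based tokenizer
    for token in ("</s>", "<s>"):
        answer = answer.replace(token, "")
    return answer.strip()


# Hand-written example string, not real model output
sample = "<s> [INST] <image>Evaluate if this chart is misleading, and if so explain [/INST] Yes, the y-axis starts at 40.</s>"
print(extract_answer(sample))  # Yes, the y-axis starts at 40.
```

With the snippet above, this would be called as `extract_answer(processor.decode(output[0], skip_special_tokens=False))`.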
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[BLP4k dataset](https://huggingface.co/datasets/chart-misinformation-detection/bar_line_pie_4k) (a dataset of synthetically created bar, line, and pie charts, including both misleading and non-misleading ones)
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**APA:**
- Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/).
- Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.
- Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). **Visual Instruction Tuning**. *NeurIPS*.