---
library_name: transformers
language:
- en
datasets:
- chart-misinformation-detection/bar_line_pie_4k
pipeline_tag: image-text-to-text
tags:
- chart
---

# Model Card for LlavaNext BLP4k

This is a LlavaNext model fine-tuned on a synthetic dataset of bar, line, and pie charts. The goal is to detect whether a chart image contains a misleading element. The types of misleading elements we consider are limited to: non-zero baselines in bar charts, omitted x-axis data points in line charts, and segments that do not sum to 100% in pie charts.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** [Team Snoopy](https://huggingface.co/chart-misinformation-detection)
- **Model type:** Multimodal Image + Text
- **Finetuned from model:** [LlavaNext](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)

### Model Sources

- **Repository:** [LlavaNext blp-4k](https://huggingface.co/chart-misinformation-detection/hf-llava-next-finetune-blp4k) (adapters only)
- **Demo:** [LlavaNext BLP4k](https://huggingface.co/spaces/chart-misinformation-detection/Llava-Next-BLP4k)

## Uses

### Direct Use

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
The example only works on a GPU.

```python
# Load model
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
from PIL import Image
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization (needs bitsandbytes and a CUDA GPU)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)
# Attach the fine-tuned adapter weights to the quantized base model
model = PeftModel.from_pretrained(model, adapter_weights_repo)

# Preprocess input
image_url = "..."  # placeholder: set this to the URL of the chart image to evaluate
# The base model's prompt format expects an <image> placeholder before the question
prompt = "[INST] <image>\nEvaluate if this chart is misleading, and if so explain [/INST]"
image = Image.open(requests.get(image_url, stream=True).raw)
# Move the input tensors to the same device as the model
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Inference
output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=False))
```

## Training Details

### Training Data

[BLP4k dataset](https://huggingface.co/datasets/chart-misinformation-detection/bar_line_pie_4k) (a dataset of synthetically created bar, line, and pie charts, including both misleading and non-misleading ones)

### Training Procedure

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

## Citation

**BibTeX:**

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/).
- Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.
- Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). **Visual Instruction Tuning**. *NeurIPS*.
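
A note on the getting-started snippet: because it decodes with `skip_special_tokens=False`, the printed string contains the echoed prompt and sequence tokens in addition to the model's verdict. The following is a minimal sketch of how the answer could be isolated, assuming the decoded text follows the `[INST] ... [/INST]` layout used in the prompt. The helper name `extract_answer` and the sample string are hypothetical and are not part of the released code.

```python
def extract_answer(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text that follows the last instruction marker."""
    # Keep only what comes after the final marker; if the marker is
    # absent, the whole string is kept.
    answer = decoded.rsplit(marker, 1)[-1]
    # Drop sequence tokens that remain when skip_special_tokens=False.
    for token in ("<s>", "</s>"):
        answer = answer.replace(token, "")
    return answer.strip()


# Made-up decoded output, used only to illustrate the expected layout.
sample = (
    "<s>[INST] Evaluate if this chart is misleading, and if so explain [/INST] "
    "Yes, the y-axis does not start at zero.</s>"
)
print(extract_answer(sample))
```

In practice, the helper would be applied to the string returned by `processor.decode(output[0], skip_special_tokens=False)`.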