|
---
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
extra_gated_fields:
  First name: text
  Last name: text
  Affiliation: text
  Job title: text
  Email: text
  Country: country
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
---
|
|
|
|
|
# FinLLaVA Model Card |
|
|
|
![overview](images/overview.png) |
|
|
|
## Model details |
|
|
|
**Model type:** |
|
FinLLaVA is an open-source chatbot trained by fine-tuning FinLLaMA-instruct on GPT-generated multimodal instruction-following data. |
|
It is an auto-regressive language model based on the transformer architecture.
|
|
|
**Model date:** |
|
FinLLaVA was trained in July 2024. |
|
|
|
**Paper or resources for more information:** |
|
|
|
https://arxiv.org/abs/2408.11878 |
|
|
|
|
|
## Intended use |
|
**Primary intended uses:** |
|
The primary use of FinLLaVA is research on financial large multimodal models and chatbots. Its primary intended users are researchers and practitioners in the financial domain.
|
|
|
|
|
## Training dataset |
|
- 468K ALLaVA-4V |
|
- 79K OCR-VQA |
|
- 20K SynthTabNet |
|
- 5K UniChart |
|
- 20K ChartQA |
|
- 30K Chart2Text |
|
- 665K LLaVA-v1.5-mix665k |
|
- 143K Evol-Instruct |
|
|
|
|
|
|
|
## Evaluation dataset |
|
A collection of four benchmarks, including MMMU (test), MMMU-Business (test), and our own Chart&Table Bench.
|
|
|
![outcome](images/outcome.png) |
|
|
|
# How to use |
|
|
|
## Installation |
|
|
|
1. Install the package
|
|
|
```shell |
|
conda create -n llava python=3.10 -y |
|
conda activate llava |
|
pip install --upgrade pip # enable PEP 660 support |
|
|
|
git clone https://github.com/haotian-liu/LLaVA.git |
|
cd LLaVA |
|
pip install -e . |
|
``` |
|
|
|
2. Install additional packages for training
|
|
|
```shell |
|
pip install -e ".[train]" |
|
pip install flash-attn --no-build-isolation |
|
``` |
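
With both steps done, a quick sanity check (a sketch, assuming the editable install above succeeded) confirms that the base `llava` package resolves:

```python
# Sanity check: the editable install above should expose the `llava` package.
import llava

print("llava imported from:", llava.__file__)
```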
|
|
|
## Interface |
|
|
|
```python |
|
from llava_llama3.serve.cli import chat_llava
from llava_llama3.model.builder import load_pretrained_model
import argparse
import os

# Directory containing this script, used to resolve the example image path
root_path = os.path.dirname(os.path.abspath(__file__))
print(f'\033[92m{root_path}\033[0m')

parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, default="")
parser.add_argument("--device", type=str, default="cuda")
parser.add_argument("--conv-mode", type=str, default="llama_3")
parser.add_argument("--temperature", type=float, default=0)  # 0 = greedy decoding
parser.add_argument("--max-new-tokens", type=int, default=512)
parser.add_argument("--load-8bit", action="store_true")
parser.add_argument("--load-4bit", action="store_true")
args = parser.parse_args()

# Load the tokenizer, model, and image processor
tokenizer, llava_model, image_processor, context_len = load_pretrained_model(
    args.model_path,
    None,
    'llava_llama3',
    args.load_8bit,
    args.load_4bit,
    device=args.device,
)

# Run a single-turn chat over an example image
print('\033[92mRunning chat\033[0m')
output = chat_llava(args=args,
                    image_file=root_path + '/data/llava_logo.png',
                    text='What is this?',
                    tokenizer=tokenizer,
                    model=llava_model,
                    image_processor=image_processor,
                    context_len=context_len)
print('\033[94m', output, '\033[0m')
|
``` |
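
Once the model is loaded, `chat_llava` can be called repeatedly with different images and prompts. A minimal sketch reusing the objects from the snippet above (the image path and question are placeholders):

```python
# Hypothetical follow-up query; reuses args, tokenizer, llava_model,
# image_processor, and context_len from the snippet above.
answer = chat_llava(args=args,
                    image_file=root_path + '/data/sample_chart.png',  # placeholder image
                    text='What trend does this chart show?',
                    tokenizer=tokenizer,
                    model=llava_model,
                    image_processor=image_processor,
                    context_len=context_len)
print(answer)
```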
|
|
|
If you encounter the error `No module named 'llava_llama3'`, set the `PYTHONPATH` as follows: |
|
|
|
```shell |
|
export PYTHONPATH=$PYTHONPATH:${your_dir}/llava_llama3
|
``` |
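
Alternatively, the search path can be extended from within Python before the imports, mirroring the export above (a sketch; the directory is a placeholder):

```python
import sys

# Assumption: replace with the directory used in the PYTHONPATH export above.
sys.path.append('/path/to/your_dir/llava_llama3')

from llava_llama3.serve.cli import chat_llava  # should now resolve
```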
|
|
|
## Disclaimer |
|
|
|
This repository and its contents are provided for academic and educational purposes only. None of the material constitutes financial, legal, or investment advice. No warranties, express or implied, are offered regarding the accuracy, completeness, or utility of the content. The authors and contributors are not responsible for any errors, omissions, or any consequences arising from the use of the information herein. Users should exercise their own judgment and consult professionals before making any financial, legal, or investment decisions. The use of the software and information contained in this repository is entirely at the user's own risk. |
|
|
|
By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages. |
|
|
|
## Citation
|
|
|
```bibtex |
|
@misc{xie2024openfinllmsopenmultimodallarge, |
|
title={Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications}, |
|
author={Qianqian Xie and Dong Li and Mengxi Xiao and Zihao Jiang and Ruoyu Xiang and Xiao Zhang and Zhengyu Chen and Yueru He and Weiguang Han and Yuzhe Yang and Shunian Chen and Yifei Zhang and Lihang Shen and Daniel Kim and Zhiwei Liu and Zheheng Luo and Yangyang Yu and Yupeng Cao and Zhiyang Deng and Zhiyuan Yao and Haohang Li and Duanyu Feng and Yongfu Dai and VijayaSai Somasundaram and Peng Lu and Yilun Zhao and Yitao Long and Guojun Xiong and Kaleb Smith and Honghai Yu and Yanzhao Lai and Min Peng and Jianyun Nie and Jordan W. Suchow and Xiao-Yang Liu and Benyou Wang and Alejandro Lopez-Lira and Jimin Huang and Sophia Ananiadou}, |
|
year={2024}, |
|
eprint={2408.11878}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2408.11878}, |
|
} |
|
``` |
|
|
|
|