
LLaVA-JP Model Card

This is a pretrained checkpoint; you can use it to instruction-tune your own multimodal models.

Check out the instructions here

Model details

Model type: LLaVA-JP is a vision-language model that can converse about input images.
This model is a large vision-language model (LVLM) trained with google/siglip-so400m-patch14-384 as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports high-resolution 768 x 768 input images through the scaling_on_scales method.
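The card does not describe how scaling_on_scales is wired into LLaVA-JP. The sketch below shows only the general "Scaling on Scales" (S²) idea, under several assumptions. The scales are taken to be 384 and 768. At each scale the image is split into 384 x 384 crops that the encoder can accept, the crops are encoded with shared weights, and the feature maps are stitched back together. Those maps are then average-pooled to the 27 x 27 base grid and concatenated along the channel axis. `toy_encoder` is a hypothetical stand-in for SigLIP (it just mean-pools 14 x 14 patches and applies a fixed random projection), and `s2_features` and `resize_down` are illustrative names, not functions from the LLaVA-JP code.

```python
import numpy as np

BASE = 384            # encoder input resolution (siglip-so400m-patch14-384)
PATCH = 14            # encoder patch size
GRID = BASE // PATCH  # 27 patches per side at the base resolution


def toy_encoder(img, dim=8, seed=0):
    """Hypothetical stand-in for the vision encoder: (384, 384, 3) -> (27, 27, dim).

    A fixed seed means every crop is encoded with the same "weights".
    """
    crop = img[: GRID * PATCH, : GRID * PATCH]  # 378 x 378, like a stride-14 conv without padding
    patches = crop.reshape(GRID, PATCH, GRID, PATCH, 3).mean(axis=(1, 3))  # (27, 27, 3)
    proj = np.random.default_rng(seed).standard_normal((3, dim))
    return patches @ proj


def resize_down(x, size):
    """Area-average downsampling of a square (H, W, C) array by an integer factor."""
    f = x.shape[0] // size
    return x.reshape(size, f, size, f, -1).mean(axis=(1, 3))


def s2_features(img, scales=(384, 768)):
    """Multi-scale features for a square image whose side equals max(scales) (assumption)."""
    feats = []
    for s in scales:
        x = resize_down(img, s) if img.shape[0] != s else img
        n = s // BASE  # crops per side at this scale
        # Encode each BASE x BASE crop and stitch the feature maps back into one grid.
        rows = [
            np.concatenate(
                [toy_encoder(x[i * BASE:(i + 1) * BASE, j * BASE:(j + 1) * BASE]) for j in range(n)],
                axis=1,
            )
            for i in range(n)
        ]
        fmap = np.concatenate(rows, axis=0)  # (n * 27, n * 27, dim)
        # Pool back to the base 27 x 27 grid so every scale aligns spatially.
        feats.append(resize_down(fmap, GRID))
    return np.concatenate(feats, axis=-1)  # channels concatenated across scales


img = np.random.default_rng(1).random((768, 768, 3))
print(s2_features(img).shape)  # (27, 27, 16)
```

In this scheme the number of visual tokens stays at the base 27 x 27 grid, while the channel width grows with the number of scales. With a real 1152-dimensional encoder and two scales, that would be 2304 channels, and a projector would then map them into the language model's embedding space.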

Training dataset

Acknowledgement

License

Apache-2.0


Dataset used to train toshi456/llava-jp-1.3b-v1.1-pretrain