---
license: llama2
language:
- en
- zh
tags:
- multimodal
datasets:
- liuhaotian/LLaVA-Pretrain
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
library_name: transformers
---

## **Citation**

If you find this model useful, please cite the following paper:

```
@article{huang2024deciphering,
  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2410.07167},
  year={2024}
}
```
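
## **Usage**

The metadata above tags this checkpoint as a `transformers` image-text-to-text model built on `lmsys/vicuna-7b-v1.5` and pretrained with `liuhaotian/LLaVA-Pretrain`. The snippet below is a minimal loading sketch under the assumption that the checkpoint follows the standard LLaVA-1.5 layout supported by `LlavaForConditionalGeneration`; the repository id `<this-repo>` is a placeholder for this model's actual Hugging Face id, and the Vicuna-style prompt template is assumed rather than confirmed by this card.

```python
# Minimal sketch, assuming a LLaVA-1.5-style checkpoint loadable with transformers.
# "<this-repo>" is a placeholder: replace it with this model's repository id.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

repo_id = "<this-repo>"  # placeholder
processor = AutoProcessor.from_pretrained(repo_id)
model = LlavaForConditionalGeneration.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Load an example image and build a Vicuna-style LLaVA prompt (assumed format).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```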