	---
	{}
	---

	[![CODE](https://img.shields.io/badge/GitHub-Repository-<COLOR>)](https://github.com/mbzuai-oryx/LLaVA-pp)

	# LLaMA-3-V: Extending the Visual Capabilities of LLaVA with Meta-Llama-3-8B-Instruct

	## Repository Overview

	This repository hosts LLaVA v1.5 trained with Meta-Llama-3-8B-Instruct as the language model. The integration aims to combine LLaVA's visual instruction tuning with the language capabilities of Llama 3 for stronger vision-language understanding.

	## Training Strategy
	- Pretraining: Only the vision-to-language projector is trained. The rest of the model is frozen.
	- Fine-tuning: The LLM is fine-tuned with LoRA. Only the vision backbone (CLIP) is kept frozen.
	- Note: The repository contains merged weights, i.e. the LoRA updates are already folded into the LLM weights, so no separate adapter is needed.
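To make the "merged weights" note concrete, here is a minimal sketch of what merging a LoRA adapter means. It assumes the commonly used LoRA rule `W' = W + (alpha / rank) * B @ A`; the model card does not state the rank, alpha, or merge procedure used for this model, so the tiny matrices and all names below (`merge_lora`, `W`, `A`, `B`) are hypothetical and for illustration only.

```python
# Illustrative only: tiny pure-Python matrices stand in for real weight tensors.
# All values and names are hypothetical; they are not taken from this model.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the usual LoRA merge rule (assumed here)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, shape (out, in)
A = [[1.0, 2.0]]              # LoRA down-projection, shape (rank, in)
B = [[0.5], [-0.5]]           # LoRA up-projection, shape (out, rank)
x = [[1.0], [1.0]]            # input as a column vector

merged = merge_lora(W, A, B, alpha=2.0, rank=1)

# Output computed with the adapter kept separate: W x + scale * B (A x)
scale = 2.0 / 1
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[b[0] + scale * a[0]] for b, a in zip(base_out, adapter_out)]

# Output computed with the merged weight only: W' x
merged_out = matmul(merged, x)

print(merged_out == separate)
```

Because the merged matrix gives the same output as the base weight plus the adapter path, a checkpoint with merged weights can be loaded like an ordinary model without any adapter files.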

	## Key Components

	- Base Large Language Model (LLM): [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
	- Base Large Multimodal Model (LMM): [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA)

	## Training Data

	- Pretraining Dataset: [LCS-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
	- Fine-tuning Dataset: [LLaVA-Instruct-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json)

	## Download

	```
	git lfs install
	git clone https://huggingface.co/MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct
	```
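As an alternative to cloning with git, the files can likely also be fetched from Python. The sketch below assumes the third-party `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `repo_url` and `download` are hypothetical and not part of this repository.

```python
from typing import Optional

# Repository ID as it appears in the clone URL above.
REPO_ID = "MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct"


def repo_url(repo_id: str) -> str:
    """Build the Hugging Face URL for a model repository."""
    return f"https://huggingface.co/{repo_id}"


def download(repo_id: str = REPO_ID, local_dir: Optional[str] = None) -> str:
    """Download all files of the repository and return the local path.

    Assumes `pip install huggingface_hub`; the import is kept inside the
    function so the rest of this sketch runs without the package.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example (needs network access and several GB of disk space):
# path = download(local_dir="./LLaVA-Meta-Llama-3-8B-Instruct")
print(repo_url(REPO_ID))
```

With `local_dir` left as `None`, the files go to the default Hugging Face cache directory instead of a folder of your choosing.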

	---


	## Contributions

	Contributions are welcome! Please 🌟 our repository [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) if you find this model useful.

	---