KoLLaVA: Korean Large Language and Vision Assistant (feat. LLaVA)
This model is a large multimodal model (LMM) that combines an LLM (KoVicuna) with the visual encoder of CLIP (ViT-14) and is trained on a Korean visual-instruction dataset.
Detailed code is available in the KoLLaVA GitHub repository.
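The card does not say how the vision encoder and the LLM are connected. In the original LLaVA design, patch features from the CLIP encoder are mapped into the LLM's word-embedding space by a trainable projection and then fed to the LLM together with the text token embeddings. The following is a minimal sketch of that idea in plain Python. It assumes a single linear projection, as in the first LLaVA release. The toy dimensions and the helper names `project` and `build_llm_input` are hypothetical. The real model splices image tokens in at an image placeholder position in the prompt; the sketch simply prepends them.

```python
# Toy sketch of a LLaVA-style vision-to-language connector (assumption:
# one linear projection). Dimensions are hypothetical stand-ins; the real
# CLIP feature width and KoVicuna embedding width are not stated in this card.
VISION_DIM = 4  # hypothetical width of one CLIP patch feature
TEXT_DIM = 3    # hypothetical width of one LLM token embedding


def project(patch_features, weight):
    """Map each patch feature z (length VISION_DIM) to h = W z (length TEXT_DIM).

    `weight` is a TEXT_DIM x VISION_DIM matrix given as a list of rows.
    """
    return [
        [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in weight]
        for z in patch_features
    ]


def build_llm_input(patch_features, weight, text_embeddings):
    """Return the LLM input sequence: projected image tokens, then text tokens.

    Simplification: image tokens are always prepended here instead of being
    inserted at an image placeholder inside the prompt.
    """
    image_tokens = project(patch_features, weight)
    return image_tokens + text_embeddings


if __name__ == "__main__":
    patches = [[1, 0, 0, 0], [0, 1, 0, 0]]             # two toy patch features
    W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]     # 3 x 4 projection
    text = [[0.5, 0.5, 0.5]]                           # one toy text embedding
    print(build_llm_input(patches, W, text))
```

In the real model, only parts of this pipeline may be updated during a given training stage (e.g. the projection, the LLM, or both); the card does not say which components were trained.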
Training hyperparameters
- learning_rate: 2e-5
- train_batch_size: 16
- distributed_type: multi-GPU (A100 80GB)
- num_devices: 4
- gradient_accumulation_steps: 1
- total_train_batch_size: 64
- total_eval_batch_size: 16
- lr_scheduler_type: cosine
- num_epochs: 1
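The batch-size figures above are consistent: 16 examples per device × 4 devices × 1 accumulation step gives the total train batch size of 64. The short sketch below reproduces that arithmetic and shows what a cosine learning-rate schedule peaking at 2e-5 looks like. The card does not state a warmup length or the total number of optimizer steps, so `warmup_steps=0` and the step counts used here are assumptions, and the helper names are hypothetical. The decay formula follows the common half-cosine shape (decaying to 0 at the final step).

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH = 16
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 1
PEAK_LR = 2e-5


def total_train_batch_size(per_device, devices, accum):
    """Examples contributing to one optimizer step across all GPUs."""
    return per_device * devices * accum


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by half-cosine decay.

    warmup_steps=0 is an assumption; the card does not report a warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(total_train_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    # Hypothetical run of 1000 optimizer steps, sampled at a few points.
    for s in (0, 250, 500, 750, 1000):
        print(s, cosine_lr(s, total_steps=1000))
```

With one epoch, the real number of optimizer steps would be the training-set size divided by 64 (rounded up), which this card does not report.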
Model License: Apache License 2.0