KoLLaVA: Korean Large Language and Vision Assistant (feat. LLaVA)

This model is a large multimodal model (LMM) that combines the LLM LLaMA-2-7b-ko with the visual encoder of CLIP (ViT-L/14), trained on a Korean visual-instruction dataset using QLoRA.

Detailed code is available in the KoLLaVA GitHub repository.
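For orientation, the sketch below shows the general LLaVA-style wiring this card describes: CLIP vision features are projected into the LLM's embedding space and consumed alongside the text tokens. The class name and checkpoint identifiers are placeholders rather than the exact KoLLaVA implementation; refer to the repository for the real code.

```python
# Minimal sketch of a LLaVA-style architecture (not the exact KoLLaVA code):
# a CLIP vision encoder, a projection into the LLM embedding space, and the LLM itself.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel

class LlavaStyleModel(nn.Module):
    def __init__(self, llm_name: str, vision_name: str = "openai/clip-vit-large-patch14"):
        super().__init__()
        self.vision_tower = CLIPVisionModel.from_pretrained(vision_name)
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        # Linear projector mapping CLIP patch features to the LLM hidden size
        self.projector = nn.Linear(
            self.vision_tower.config.hidden_size,
            self.llm.config.hidden_size,
        )

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor):
        # Encode the image into patch features and project them into token space
        patch_feats = self.vision_tower(pixel_values).last_hidden_state  # (B, N, D_vis)
        image_tokens = self.projector(patch_feats)                       # (B, N, D_llm)
        # Prepend the projected image tokens to the text embeddings and run the LLM
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)
```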

  • Training hyperparameters (a configuration sketch follows this list)
  • learning_rate: 2e-4
  • train_batch_size: 16
  • distributed_type: multi-GPU (RTX 3090 24GB)
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 4
  • lr_scheduler_type: cosine
  • num_epochs: 1
  • lora_enable: True
  • bits: 4
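
The hyperparameters above map fairly directly onto a QLoRA setup with Hugging Face transformers and peft. The sketch below is illustrative only: the LoRA rank, alpha, dropout, target modules, and output path are assumptions not stated in this card, and the actual training script lives in the KoLLaVA repository.

```python
# Illustrative QLoRA configuration mirroring the hyperparameters above.
# LoRA rank/alpha/dropout/target_modules and output_dir are assumed, not taken from this card.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# bits: 4 -> 4-bit NF4 quantization of the base LLM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# lora_enable: True -> adapters on the attention projections (assumed modules)
lora_config = LoraConfig(
    r=64,                      # assumed rank
    lora_alpha=16,             # assumed alpha
    lora_dropout=0.05,         # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# 16 per device x 4 devices x 2 accumulation steps = 128 total train batch size
training_args = TrainingArguments(
    output_dir="./kollava-qlora",   # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,   # x 4 devices -> total_eval_batch_size: 4
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    bf16=True,
)
```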

Model License: cc-by-nc-4.0
