ravi.naik

title: MultiModal Phi2
emoji: 🚀
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: mit

Phi2: Multimodal Finetuning

Details

  1. LLM Backbone: Phi2
  2. Vision Tower: clip-vit-large-patch14-336
  3. Audio Model: Whisper
  4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
  5. Finetuning Dataset: Instruct 150k dataset based on COCO
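
To make the component list above a bit more concrete, here is a minimal sketch of the tensor geometry it implies. It only does arithmetic: how many patch embeddings a 336-pixel, patch-14 vision tower produces per image, and what shape a single linear projection into the LLM's embedding space would need. The names `VisionTowerSpec` and `projector_shape` are hypothetical and not taken from this repository. The hidden sizes (1024 for the CLIP ViT-L vision tower, 2560 for Phi2) are taken from the published model configurations and should be checked against the checkpoints actually used. Whether this project uses a single linear projector at all is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VisionTowerSpec:
    """Geometry of a ViT-style vision tower (hypothetical helper, not from this repo)."""

    image_size: int   # input resolution in pixels (square image)
    patch_size: int   # side length of one patch in pixels
    hidden_size: int  # width of each patch embedding

    @property
    def num_patches(self) -> int:
        # A ViT splits the image into a grid of non-overlapping square patches.
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        per_side = self.image_size // self.patch_size
        return per_side * per_side


def projector_shape(vision: VisionTowerSpec, llm_hidden_size: int) -> Tuple[int, int]:
    """Weight shape (in_features, out_features) of a single linear projector.

    Assumption: one linear layer maps each patch embedding into the LLM's
    token-embedding space. The real projector may be deeper.
    """
    return (vision.hidden_size, llm_hidden_size)


# Values assumed from the public model configs, not stated in this README.
CLIP_VIT_L_14_336 = VisionTowerSpec(image_size=336, patch_size=14, hidden_size=1024)
PHI2_HIDDEN_SIZE = 2560

if __name__ == "__main__":
    print(CLIP_VIT_L_14_336.num_patches)                         # 576
    print(projector_shape(CLIP_VIT_L_14_336, PHI2_HIDDEN_SIZE))  # (1024, 2560)
```

Under these assumptions, each image contributes 576 patch embeddings (a 24 x 24 grid) that would be projected and placed alongside the text tokens, which is worth keeping in mind when budgeting context length.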

Design

image

Pretraining

Training Loss Curve

image

Learning Rate

image

Training Logs

image

Finetuning

Training Loss Curve

image

Learning Rate

image

Training Logs

image

Results

image