Model Card for Model ID

Model Details

Model Description

This model is DPO by argilla/dpo-mix-7k dataset on rungao2001/vicuna-7b-v1.5_deita10k_sft_full model.

  • Model type: Llama2 Decoder-Only
  • Language(s) (NLP): English
  • License: llama2
  • Finetuned from model: rungao2001/vicuna-7b-v1.5_deita10k_sft_full

Training Details

Training Data

argilla/dpo-mix-7k

Training Procedure

DPO

Notice: The chat_template was modified because the original vicuna1.1 format cannot be used in trl.DPOTrainer. The error "Conversation roles must alternate user/assistant/user/assistant/..." was removed, and the system message is output only when loop.index0 == 0 and role == 'user'.

Training Hyperparameters

  • Precision: BFloat16
  • Chat Template: Modified Vicuna 1.1
  • Global Batch Size: 128
  • Learning Rate: 1.0e-6
  • Num Epoches: 3
  • Max Prompt Length: 1800
  • Max Length: 2048
  • Training Steps 156

Evaluation

It Finally achieved loss=0.5006, and rewards/accuracies = 78.72% in the eval set of argilla/dpo-mix-7k

Testing Data, Factors & Metrics

Downloads last month
14
Safetensors
Model size
6.74B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train rungao2001/vicuna-7b-v1.5-dpo-mix-7k-full