ryota39/mluke-large-lite-reward

metadata

license: apache-2.0
base_model: studio-ousia/mluke-large-lite
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: out
    results: []

Fine-tuning

This model was trained to classify whether an input text is a "chosen" sentence or a "rejected" sentence.
The class probabilities (the last-layer logits after a softmax) can be used to quantify how preferable an input text is.
It was obtained by full-parameter fine-tuning of studio-ousia/mluke-large-lite on open-preference-v0.3.
Training was done in bf16 precision.
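The scoring described above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the checkpoint loads through `AutoModelForSequenceClassification`, that it has two output classes, and that index 1 is the "chosen" class (the card does not state the label order, so check `model.config.id2label` before relying on it). The names `preference_score` and `score_text` are hypothetical helpers.

```python
import math
from typing import Sequence

MODEL_ID = "ryota39/mluke-large-lite-reward"
CHOSEN_INDEX = 1  # ASSUMPTION: label order is not documented in the card


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_score(logits: Sequence[float], chosen_index: int = CHOSEN_INDEX) -> float:
    """Probability assigned to the assumed 'chosen' class."""
    return softmax(logits)[chosen_index]


def score_text(text: str, model_id: str = MODEL_ID) -> float:
    """Run the classifier on one text and return its preference score.

    Imports are local so the helpers above work without torch installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return preference_score(logits)
```

Calling `score_text("...")` downloads the checkpoint on first use. A score closer to 1 then means the text looks more like a "chosen" sentence, under the label-order assumption above.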

Metrics

train and validation split

train loss	eval loss	accuracy	recall	precision	f1-score
0.114	0.1615	0.9399	0.9459	0.9346	0.9402

test split

accuracy	recall	precision	f1-score
0.9416	0.9319	0.9504	0.9411
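As a quick consistency check on the tables above, the F1 score is the harmonic mean of precision and recall. The small sketch below recomputes it from the reported values; the function name `f1_from_precision_recall` is just an illustrative helper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the test split and for the final validation step.
test_f1 = f1_from_precision_recall(precision=0.9504, recall=0.9319)
val_f1 = f1_from_precision_recall(precision=0.9346, recall=0.9459)
print(f"test F1 = {test_f1:.4f}, validation F1 = {val_f1:.4f}")
```

Both recomputed values agree with the reported F1 scores to within rounding.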

Confusion matrix on the test split

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
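For readers who want to reproduce a similar run, the settings above map onto `transformers.TrainingArguments` roughly as sketched below. This is an approximation rather than the original training script: it assumes the batch sizes are per device, takes `output_dir` from the `name: out` metadata field, and adds `bf16=True` based on the fine-tuning notes. `TRAINING_KWARGS` and `build_training_arguments` are hypothetical names.

```python
# Keyword arguments mirroring the listed hyperparameters.
TRAINING_KWARGS = {
    "output_dir": "out",                # ASSUMPTION: taken from the model-index name
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # ASSUMPTION: listed batch size is per device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "bf16": True,                       # requires hardware with bf16 support
}


def build_training_arguments():
    """Create TrainingArguments lazily so the dict is usable without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

The resulting object can be passed to a `Trainer` together with the model, datasets, and a metrics function; those pieces are not described in this card and are left out here.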

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.4109	1.0	1479	0.2462	0.9003	0.8710	0.9399	0.9041
0.1579	2.0	2958	0.1573	0.9399	0.9495	0.9293	0.9393
0.114	3.0	4437	0.1615	0.9399	0.9346	0.9460	0.9403

Framework versions

Transformers 4.42.3
PyTorch 2.1.0+cu118
Datasets 2.20.0
Tokenizers 0.19.1