---
base_model: microsoft/Phi-3.5-mini-instruct
library_name: peft
license: mit
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: outputs
  results: []
---


# outputs

This model is a fine-tuned version of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) on an unspecified dataset.
It achieves the following results on the evaluation set:

- Loss: 1.3147

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 100
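The schedule implied by these settings (linear warmup for 20 steps, then cosine decay to zero over the remaining 80 of the 100 training steps) can be sketched in plain Python. This mirrors the shape of `get_cosine_schedule_with_warmup` from `transformers`; the constants are taken from the list above, and the helper name is illustrative:

```python
import math

LEARNING_RATE = 5e-05   # peak LR from the hyperparameters above
WARMUP_STEPS = 20       # lr_scheduler_warmup_steps
TRAINING_STEPS = 100    # training_steps

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

# Warmup ends exactly at the peak; the decay midpoint is half the peak; step 100 is ~0.
print(lr_at(0), lr_at(20), lr_at(60), lr_at(100))
```

Note also how the effective batch size arises: 2 examples per device × 2 gradient-accumulation steps = the listed total_train_batch_size of 4.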

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.2594        | 0.5263 | 5    | 2.2572          |
| 1.6785        | 1.0526 | 10   | 1.8170          |
| 1.6015        | 1.5789 | 15   | 1.4296          |
| 1.0556        | 2.1053 | 20   | 1.1199          |
| 0.9412        | 2.6316 | 25   | 1.0660          |
| 0.8872        | 3.1579 | 30   | 1.0523          |
| 0.9157        | 3.6842 | 35   | 1.0713          |
| 0.7735        | 4.2105 | 40   | 1.0983          |
| 0.6182        | 4.7368 | 45   | 1.0816          |
| 0.734         | 5.2632 | 50   | 1.1017          |
| 0.4736        | 5.7895 | 55   | 1.2109          |
| 0.3138        | 6.3158 | 60   | 1.2195          |
| 0.5315        | 6.8421 | 65   | 1.3147          |
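The validation loss in the table bottoms out near step 30 and climbs afterward, so the final loss of 1.3147 is not the best one seen during training. If intermediate checkpoints were kept, selecting the one with the lowest eval loss is a one-liner; the step/loss pairs below are transcribed from the table above:

```python
# (step, validation_loss) pairs transcribed from the training-results table
eval_history = [
    (5, 2.2572), (10, 1.8170), (15, 1.4296), (20, 1.1199),
    (25, 1.0660), (30, 1.0523), (35, 1.0713), (40, 1.0983),
    (45, 1.0816), (50, 1.1017), (55, 1.2109), (60, 1.2195),
    (65, 1.3147),
]

# Pick the checkpoint step with the minimum validation loss.
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 30 1.0523
```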

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
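As a usage sketch (not part of the original card): since `library_name` is `peft`, this repository holds an adapter rather than full model weights, so it is loaded on top of the base model. The adapter repo id `Jlonge4/outputs` is taken from the repository itself; dtype and device placement are assumptions. Running this downloads the base model and requires `transformers`, `peft`, and `torch`:

```python
# Sketch: load the PEFT adapter on top of the base Phi-3.5 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,  # assumption; use what your hardware supports
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
model = PeftModel.from_pretrained(base, "Jlonge4/outputs")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`PeftModel.from_pretrained` can also merge the adapter into the base weights via `merge_and_unload()` if a standalone model is needed for deployment.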