Edit model card

Visualize in Weights & Biases

meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30001

This model is a fine-tuned version of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit on the None dataset.

Model description

This model was trained on all Successful episodes similar to D10001 but instead of using the whole episode as input, each episode was split into conversation pieces.

e.g.

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
]
is split int:

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},

and

[
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
{
  role: 'user'
  content: '...'
},
{
  role: 'assistant'
  content: '...'
},
]

Training and evaluation data

After splitting, the dataset contains 9000 conversation bits accross all games.

The Dataset ID is D30001

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 7331
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • lr_scheduler_warmup_steps: 5
  • num_epochs: 1

Training results

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
Downloads last month
5
Inference API
Unable to determine this model’s pipeline type. Check the docs .

Collections including clembench-playpen/meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30001