# sparse_llama_7b_refined_web_50p_2024-03-24
This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unspecified dataset (the model name suggests a 50%-sparsity run over RefinedWeb). It achieves the following results on the evaluation set:
- Loss: 2.1950
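
The card provides no usage snippet, so here is a minimal inference sketch with Transformers, assuming the checkpoint is hosted as `thrunlab/sparse_llama_7b_refined_web_50p_2024-03-24` (per the repo name); `trust_remote_code=True` is included only because the repo is reported to contain custom code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model card title; adjust if the checkpoint moves.
model_id = "thrunlab/sparse_llama_7b_refined_web_50p_2024-03-24"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # assumption: fp16 is enough for single-GPU inference
    device_map="auto",
    trust_remote_code=True,      # the repo is reported to contain custom code
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```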
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 800
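
The per-device batch size of 1 across 4 GPUs with 8 gradient-accumulation steps yields the total train batch size of 32 (1 × 4 × 8). Below is a minimal `TrainingArguments` sketch of these settings, assuming the standard Trainer API; `output_dir` is a placeholder, and the optimizer is Trainer's default AdamW with the betas and epsilon listed above.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters. Launch with, e.g.,
#   torchrun --nproc_per_node=4 train.py
# so that 1 (per device) x 4 (GPUs) x 8 (accumulation) = 32 total train batch.
args = TrainingArguments(
    output_dir="sparse_llama_7b_refined_web_50p",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=0,
    gradient_accumulation_steps=8,
    max_steps=800,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```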
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------|:------|:-----|:----------------|
| 2.4969 | 0.01 | 25 | 2.7682 |
| 2.4532 | 0.02 | 50 | 2.7136 |
| 2.4855 | 0.02 | 75 | 2.6372 |
| 2.4368 | 0.03 | 100 | 2.6029 |
| 2.4952 | 0.04 | 125 | 2.5761 |
| 2.3209 | 0.05 | 150 | 2.5665 |
| 2.2798 | 0.06 | 175 | 2.5517 |
| 2.4447 | 0.06 | 200 | 2.5399 |
| 2.4008 | 0.07 | 225 | 2.5317 |
| 2.3508 | 0.08 | 250 | 2.5271 |
| 2.2851 | 0.09 | 275 | 2.5222 |
| 2.3171 | 0.1 | 300 | 2.5151 |
| 2.3594 | 0.1 | 325 | 2.5102 |
| 2.3233 | 0.11 | 350 | 2.5063 |
| 2.2479 | 0.12 | 375 | 2.5039 |
| 2.3484 | 0.13 | 400 | 2.5004 |
| 2.3252 | 0.14 | 425 | 2.4961 |
| 2.2819 | 0.14 | 450 | 2.4951 |
| 2.3504 | 0.15 | 475 | 2.4907 |
| 2.3745 | 0.16 | 500 | 2.4860 |
| 2.2705 | 0.17 | 525 | 2.4860 |
| 2.271 | 0.18 | 550 | 2.4836 |
| 2.3821 | 0.18 | 575 | 2.4820 |
| 2.2663 | 0.19 | 600 | 2.4795 |
| 2.2919 | 0.2 | 625 | 2.4764 |
| 2.3755 | 0.21 | 650 | 2.4718 |
| 2.2654 | 0.22 | 675 | 2.4745 |
| 2.2857 | 0.22 | 700 | 2.4723 |
| 2.3063 | 0.23 | 725 | 2.4716 |
| 2.2062 | 0.24 | 750 | 2.4698 |
| 2.2921 | 0.25 | 775 | 2.4664 |
| 2.3404 | 0.26 | 800 | 2.4676 |
### Framework versions
- Transformers 4.36.2
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.2
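
These pins matter when reproducing the results above; a small environment sanity check (a sketch, not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card; assert to catch environment drift.
assert transformers.__version__ == "4.36.2"
assert torch.__version__.startswith("2.1.1")  # 2.1.1+cu121 on the training machine
assert datasets.__version__ == "2.15.0"
assert tokenizers.__version__ == "0.15.2"
```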