# Groq_Llama-3-Tool-Use-VisitorRequest-Lora
This model is a fine-tuned version of Groq/Llama-3-Groq-8B-Tool-Use (the training dataset is not documented in this card). It achieves the following results on the evaluation set:
- Loss: 0.6920
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent configuration sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2
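
As a hedged illustration, the same configuration can be expressed with the Hugging Face `TrainingArguments` API (consistent with the Transformers/PEFT versions listed under Framework versions). The output directory below is a placeholder and not taken from the original run:

```python
from transformers import TrainingArguments

# Minimal sketch mirroring the hyperparameters listed above.
# output_dir is a hypothetical placeholder; it is not part of the original card.
training_args = TrainingArguments(
    output_dir="./visitor-request-lora",   # hypothetical
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,         # effective train batch size: 1 * 8 = 8
    num_train_epochs=2,
    lr_scheduler_type="constant",
    seed=42,
    # The card lists "Adam with betas=(0.9, 0.999) and epsilon=1e-08",
    # which matches the default optimizer settings in Transformers.
)
```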
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.3852        | 0.0635 | 1    | 5.9455          |
| 5.9202        | 0.1270 | 2    | 2.0524          |
| 2.1422        | 0.1905 | 3    | 1.2710          |
| 1.32          | 0.2540 | 4    | 1.1640          |
| 1.1839        | 0.3175 | 5    | 0.8947          |
| 0.9042        | 0.3810 | 6    | 0.9210          |
| 0.8737        | 0.4444 | 7    | 0.8815          |
| 0.8936        | 0.5079 | 8    | 0.7752          |
| 0.7229        | 0.5714 | 9    | 0.7938          |
| 0.8039        | 0.6349 | 10   | 0.8019          |
| 0.8014        | 0.6984 | 11   | 0.7562          |
| 0.749         | 0.7619 | 12   | 0.7280          |
| 0.71          | 0.8254 | 13   | 0.7346          |
| 0.7461        | 0.8889 | 14   | 0.7208          |
| 0.6635        | 0.9524 | 15   | 0.7096          |
| 0.7271        | 1.0159 | 16   | 0.7038          |
| 0.6691        | 1.0794 | 17   | 0.7117          |
| 0.6672        | 1.1429 | 18   | 0.7996          |
| 0.7766        | 1.2063 | 19   | 0.7333          |
| 0.6818        | 1.2698 | 20   | 0.7651          |
| 0.684         | 1.3333 | 21   | 0.7110          |
| 0.6575        | 1.3968 | 22   | 0.7213          |
| 0.6146        | 1.4603 | 23   | 0.7275          |
| 0.6245        | 1.5238 | 24   | 0.7908          |
| 0.7224        | 1.5873 | 25   | 0.7301          |
| 0.6472        | 1.6508 | 26   | 0.7082          |
| 0.6066        | 1.7143 | 27   | 0.7114          |
| 0.6735        | 1.7778 | 28   | 0.6984          |
| 0.6263        | 1.8413 | 29   | 0.6899          |
| 0.5998        | 1.9048 | 30   | 0.6920          |
### Framework versions
- PEFT 0.5.0
- Transformers 4.44.0
- Pytorch 2.1.0+cu118
- Datasets 2.16.0
- Tokenizers 0.19.1
## Model tree for mg11/Groq_Llama-3-Tool-Use-VisitorRequest-Lora

- Base model: meta-llama/Meta-Llama-3-8B
- Finetuned from: Groq/Llama-3-Groq-8B-Tool-Use
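
As a usage sketch (not part of the original card), the LoRA adapter can be attached to the Groq base model with the PEFT API. The repository IDs come from this page, while the dtype, prompt, and generation settings are illustrative assumptions:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Groq/Llama-3-Groq-8B-Tool-Use"
adapter_id = "mg11/Groq_Llama-3-Tool-Use-VisitorRequest-Lora"

# Load the base model, then attach the LoRA adapter weights on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical prompt; in practice the base model's chat/tool-use template should be applied.
inputs = tokenizer(
    "Hello, I would like to register a visitor for tomorrow.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```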