# Llama-3 8B RLHF checkpoint trained by OpenRLHF

Trained using the following models and datasets:
- Base model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
- Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
- Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1

## Training Hyperparameters

```
Actor Learning Rate: 5e-7
Critic Learning Rate: 9e-6
Learning Rate Scheduler: Cosine with 0.03 Warmup
PPO epoch: 1
Training Batch Size: 128
Experience Buffer Size: 1024
Reward Normalization: True
Max Prompt Length: 2048
Max Response Length: 2048
Max Samples: 100k
```
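For reference, a run with these hyperparameters can be sketched as an OpenRLHF PPO launch. This is an assumption-laden reconstruction, not the authors' actual command: the script module, flag spellings, and the `<base-model>` placeholder below are taken from OpenRLHF's example scripts and should be checked against the OpenRLHF repository.

```shell
# Hypothetical OpenRLHF PPO launch mirroring the hyperparameters above.
# Flag names follow OpenRLHF's example scripts; <base-model> is a placeholder
# for the base (SFT) checkpoint listed at the top of this card.
deepspeed --module openrlhf.cli.train_ppo \
    --pretrain <base-model> \
    --reward_pretrain OpenLLMAI/Llama-3-8b-rm-mixture \
    --prompt_data OpenLLMAI/prompt-collection-v0.1 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --lr_warmup_ratio 0.03 \
    --max_epochs 1 \
    --train_batch_size 128 \
    --rollout_batch_size 1024 \
    --normalize_reward \
    --prompt_max_len 2048 \
    --generate_max_len 2048 \
    --max_samples 100000
```

Here `--max_epochs` corresponds to the single PPO epoch per experience buffer, and `--rollout_batch_size` to the experience buffer size of 1024.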

## Training logs

<img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/Le91UD2mkieWjY06O815d.png" width="800px">