Llama-3 8B RLHF checkpoint trained by OpenRLHF

Using the models and datasets:

- Base model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
- Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
- Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1

Training Hyperparameters

```
Actor Learning Rate: 5e-7
Critic Learning Rate: 9e-6
Learning Rate Scheduler: Cosine with 0.03 Warmup
PPO epoch: 1
Training Batch Size: 128
Experience Buffer Size: 1024
Reward Normalization: True
Max Prompt Length: 2048
Max Response Length: 2048
Max Samples: 100k
```
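
The "Cosine with 0.03 Warmup" schedule above can be sketched as follows. This is an illustrative reimplementation, not OpenRLHF's own scheduler code, and it assumes 0.03 is a warmup *ratio* (3% of total steps) with decay to zero:

```python
import math

def lr_at(step, total_steps, peak_lr=5e-7, warmup_ratio=0.03):
    """Linear warmup for the first `warmup_ratio` of training, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With the actor LR of 5e-7: ramps up over the first 3% of steps,
# peaks at 5e-7, then decays toward 0 by the final step.
total = 1000
print(lr_at(0, total))     # 0.0
print(lr_at(30, total))    # 5e-07 (peak, end of warmup)
print(lr_at(1000, total))  # 0.0
```

The same shape applies to the critic LR (9e-6) by swapping `peak_lr`.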

Training logs

<img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/Le91UD2mkieWjY06O815d.png" width="800px">