chuyi777 committed on
Commit
737e12d
1 Parent(s): d921d0d

Create README.md

Files changed (1)
  1. README.md +27 -0
README.md ADDED
@@ -0,0 +1,27 @@
+ Llama-3 8B RLHF checkpoint trained by OpenRLHF
+
+ Using the models and datasets:
+
+ - Base model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
+ - Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
+ - Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+
+ Training Hyperparameters
+
+ ```
+ Actor Learning Rate: 5e-7
+ Critic Learning Rate: 9e-6
+ Learning Rate Scheduler: Cosine with 0.03 Warmup
+ PPO epoch: 1
+ Training Batch Size: 128
+ Experience Buffer Size: 1024
+ Reward Normalization: True
+ Max Prompt Length: 2048
+ Max Response Length: 2048
+ Max Samples: 100k
+ ```
+
+ Training logs
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/Le91UD2mkieWjY06O815d.png" width="800px">
+
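
The hyperparameters in the README imply a step count and a learning-rate curve that can be worked out by hand. The sketch below is one rough reading of them, not OpenRLHF's actual code. It assumes that "100k" means 100,000 prompts seen once, that the experience buffer is filled once per rollout and then consumed in mini-batches of the training batch size, and that "0.03 Warmup" is a linear warmup over the first 3% of optimizer steps. The function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list in the README.
ACTOR_LR = 5e-7
WARMUP_RATIO = 0.03
PPO_EPOCHS = 1
TRAIN_BATCH_SIZE = 128
EXPERIENCE_BUFFER_SIZE = 1024
MAX_SAMPLES = 100_000  # assumption: "100k" = 100,000 prompts, one pass


def optimizer_steps_per_rollout():
    """Mini-batch updates performed on one full experience buffer."""
    return PPO_EPOCHS * (EXPERIENCE_BUFFER_SIZE // TRAIN_BATCH_SIZE)


def total_optimizer_steps():
    """Assumed total updates: full rollouts only, partial last buffer dropped."""
    rollouts = MAX_SAMPLES // EXPERIENCE_BUFFER_SIZE
    return rollouts * optimizer_steps_per_rollout()


def cosine_lr(step, total_steps, peak_lr, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = total_optimizer_steps()
    print("steps per rollout:", optimizer_steps_per_rollout())
    print("total steps:", total)
    for step in (0, total // 2, total):
        print(step, cosine_lr(step, total, ACTOR_LR))
```

Under these assumptions each rollout yields 8 optimizer steps and the run has 776 in total, so the actor learning rate would peak at 5e-7 after about 23 steps and decay toward zero from there. The real schedule depends on how the trainer counts steps and episodes, which the README does not state.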