Llama-3 8B RLHF checkpoint trained by OpenRLHF

Using the models and datasets:

- Base model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
- Reward model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-mixture
- Prompt dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1

Training Hyperparameters

```
Actor Learning Rate: 5e-7
Critic Learning Rate: 9e-6
Learning Rate Scheduler: Cosine with 0.03 Warmup
PPO epoch: 1
Training Batch Size: 128
Experience Buffer Size: 1024
Reward Normalization: True
Max Prompt Length: 2048
Max Response Length: 2048
Max Samples: 100k
```
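
The "Cosine with 0.03 Warmup" schedule above can be sketched as follows. This is an illustrative reimplementation, not OpenRLHF's own scheduler code, and it assumes 0.03 is a warmup *ratio* (3% of total steps) with decay to zero:

```python
import math

def lr_at(step, total_steps, peak_lr=5e-7, warmup_ratio=0.03):
    """Linear warmup for the first `warmup_ratio` of training, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With the actor LR of 5e-7: ramps up over the first 3% of steps,
# peaks at 5e-7, then decays toward 0 by the final step.
total = 1000
print(lr_at(0, total))     # 0.0
print(lr_at(30, total))    # 5e-07 (peak, end of warmup)
print(lr_at(1000, total))  # 0.0
```

The same shape applies to the critic LR (9e-6) by swapping `peak_lr`.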

Training logs

<img src="https://cdn-uploads.huggingface.co/production/uploads/63f6c04ac96958470d1e9043/Le91UD2mkieWjY06O815d.png" width="800px">