Jellywibble
committed on
Commit
•
0e95bf0
1
Parent(s):
d49a221
Update README.md
README.md
CHANGED
@@ -68,4 +68,24 @@ The original dataset contains over 50 million rows of completions (chatbot respo
|
|
68 |
</figure>
|
69 |
|
70 |
### Training procedure
|
71 |
-
The `gpt2_base_retry_and_continue_12m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a single-output classification head, with binary cross-entropy loss. Training ran on 4xA40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5, for 2 epochs and a total of 375,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For the evaluation metrics used during training, please see our [Weights and Biases Log](https://wandb.ai/jellywibble/reward).
|
|
68 |
</figure>
|
69 |
|
70 |
### Training procedure
|
71 |
+
The `gpt2_base_retry_and_continue_12m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a single-output classification head, with binary cross-entropy loss. Training ran on 4xA40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5, for 2 epochs and a total of 375,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For the evaluation metrics used during training, please see our [Weights and Biases Log](https://wandb.ai/jellywibble/reward).
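The batch-size and step figures in the paragraph above can be cross-checked with a little arithmetic. The sketch below is not taken from the training code: the variable names are illustrative, and it assumes that every step consumes one full effective batch and that both epochs cover the same number of rows.

```python
# Figures quoted in the training-procedure paragraph.
num_gpus = 4
per_device_batch_size = 16
grad_accum_steps = 1
total_steps = 375_000
num_epochs = 2

# Effective batch size as stated in the text: GPUs x per-device batch x accumulation.
effective_batch_size = num_gpus * per_device_batch_size * grad_accum_steps

# Assumption: each optimizer step consumes exactly one effective batch.
examples_seen = total_steps * effective_batch_size

# Assumption: both epochs iterate over the same number of rows.
examples_per_epoch = examples_seen // num_epochs

print(effective_batch_size)  # 64
print(examples_seen)         # 24000000
print(examples_per_epoch)    # 12000000
```

Under these assumptions the run covers roughly 12 million rows per epoch, which would be consistent with the `12m` in the model name, although the README does not state that link explicitly.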
|
72 |
+
|
73 |
+
### BibTeX entry
|
74 |
+
To cite this model:
|
75 |
+
```bibtex
|
76 |
+
@misc{irvine2023rewarding,
|
77 |
+
author = {{Chai Research} and Irvine and Boubert and Raina and Liusie and Mudupalli and Korshuk and Liu and Cremer and Assassi and C. Beauchamp and Lu and Rialan and W. Beauchamp},
|
78 |
+
title = {{Rewarding chatbots for real-world engagement with millions of users}},
|
79 |
+
howpublished = {\url{https://arxiv.org/abs/2303.06135}},
|
80 |
+
year = 2023,
|
81 |
+
month = Mar
|
82 |
+
}
|
83 |
+
```
|
84 |
+
If you use this model, we would love to hear about it! Reach out via our [correspondence email](mailto:thomas@chai-research.com?subject=Chai%20Research%20Paper%20Enquiry) or on Discord.
|
85 |
+
|
86 |
+
### Acknowledgements
|
87 |
+
This project would not have been possible without the support of members of [Seamless Capital](https://www.seamless-capital.com/).
|
88 |
+
|
89 |
+
We thank the following authors from the [Machine Intelligence Laboratory](https://mi.eng.cam.ac.uk/) for their collaboration:
|
90 |
+
- [Vyas Raina](https://www.linkedin.com/in/vyas-raina-71b226152/)
|
91 |
+
- [Adian Liusie](https://www.linkedin.com/in/adian-liusie-00b60511a/)
|