Update README.md

README.md (changed):

The hunk sits at the end of the existing usage example (the `output = generate(` call), which is unchanged:

```python
print(output[0]["generated_text"])
```

The commit appends the following section after it:

## How it was trained

This model was trained over several sessions with TRL's [SFT Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer) and [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer), using the following settings.

For Supervised Fine-Tuning:

| Hyperparameter         | Value |
| :--------------------- | :---- |
| learning_rate          | 2e-5  |
| total_train_batch_size | 24    |
| max_seq_length         | 2048  |
| weight_decay           | 0     |
| warmup_ratio           | 0.02  |
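
To make the SFT rows concrete, here is a minimal, hypothetical launch script for a recent TRL version. The model and dataset names are placeholders (the README does not state them), and the per-device batch size and gradient accumulation below are just one way to reach a total train batch size of 24.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; the actual SFT data is not named in the README.
dataset = load_dataset("your-org/sft-dataset", split="train")

config = SFTConfig(
    output_dir="sft-checkpoint",
    learning_rate=2e-5,
    max_seq_length=2048,
    weight_decay=0.0,
    warmup_ratio=0.02,
    # total_train_batch_size = per_device_train_batch_size
    # * gradient_accumulation_steps * num_gpus, e.g. 4 * 6 * 1 = 24.
    per_device_train_batch_size=4,
    gradient_accumulation_steps=6,
)

trainer = SFTTrainer(
    model="your-org/base-model",  # placeholder base checkpoint
    args=config,
    train_dataset=dataset,
)
trainer.train()
```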

For Direct Preference Optimization:

| Hyperparameter         | Value  |
| :--------------------- | :----- |
| learning_rate          | 7.5e-7 |
| total_train_batch_size | 6      |
| max_length             | 2048   |
| max_prompt_length      | 1536   |
| max_steps              | 200    |
| weight_decay           | 0      |
| warmup_ratio           | 0.02   |
| beta                   | 0.1    |
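
A matching DPO sketch, under the same caveats: placeholder names, and an illustrative 2 × 3 batch split for the total of 6. When no `ref_model` is passed, recent TRL clones the policy as the frozen reference.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder names: DPO usually starts from the SFT checkpoint.
model = AutoModelForCausalLM.from_pretrained("your-org/sft-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("your-org/sft-checkpoint")
dataset = load_dataset("your-org/preference-dataset", split="train")

config = DPOConfig(
    output_dir="dpo-checkpoint",
    learning_rate=7.5e-7,
    max_length=2048,
    max_prompt_length=1536,
    max_steps=200,
    weight_decay=0.0,
    warmup_ratio=0.02,
    beta=0.1,  # weight of the implicit KL penalty toward the reference model
    # 2 * 3 * 1 GPU = total_train_batch_size of 6.
    per_device_train_batch_size=2,
    gradient_accumulation_steps=3,
)

trainer = DPOTrainer(
    model=model,          # ref_model omitted: TRL makes a frozen copy
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```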