Update README.md
README.md
```diff
@@ -85,20 +85,20 @@ print(tokenizer.decode(sample[0]))
 
 ## Training details
 
-The model is trained of
+The model is trained on 8 A100 80GB GPUs for approximately 50 hours.
 
 | Hyperparameters              |    Value    |
 | :----------------------------| :---------: |
-| per_device_train_batch_size  |
+| per_device_train_batch_size  |      8      |
 | gradient_accumulation_steps  |      1      |
 | epoch                        |      3      |
-| steps                        |
+| steps                        |    8628     |
 | learning_rate                |    2e-5     |
 | lr scheduler type            |   cosine    |
 | warmup ratio                 |     0.1     |
 | optimizer                    |    adamw    |
 | fp16                         |    True     |
-| GPU                          |
+| GPU                          | 8 A100 80GB |
 
 ### Important Note
 
```
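For readers reproducing the run, the hyperparameters added in this diff map naturally onto Hugging Face `TrainingArguments` names. A minimal sketch of that mapping follows; the use of the `transformers` Trainer API and the parameter names (`num_train_epochs`, `optim="adamw_torch"`, etc.) are assumptions on my part, since the README only lists the table, not the training script:

```python
# Hyperparameters from the table above, keyed by the (assumed)
# transformers.TrainingArguments parameter names.
hparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 3,        # "epoch" in the table
    "max_steps": 8628,            # "steps" in the table; overrides epochs if set
    "learning_rate": 2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "optim": "adamw_torch",       # "adamw" in the table
    "fp16": True,
}

# With transformers installed, this could feed the Trainer (hypothetical usage):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **hparams)

# Effective global batch size implied by the 8-GPU setup in the diff:
global_batch = (hparams["per_device_train_batch_size"]
                * hparams["gradient_accumulation_steps"]
                * 8)  # 8 GPUs
print(global_batch)  # 8 * 1 * 8 = 64
```

Note that when both `max_steps` and `num_train_epochs` are given, `max_steps` takes precedence in `TrainingArguments`, so 8628 would be the actual stopping point.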