Fix spelling error #26
opened by Techpro864

README.md CHANGED
@@ -129,7 +129,7 @@ We then apply the cross entropy loss by comparing with true pairs.
 
 #### Hyper parameters
 
-We trained
+We trained our model on a TPU v3-8. We train the model during 100k steps using a batch size of 1024 (128 per TPU core).
 We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
 a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.
 
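For context, the hyperparameters quoted in the new line (TPU v3-8, 100k steps, batch size 1024 split as 128 per core, 500 warm-up steps, 128-token sequences, AdamW at 2e-5) together with the in-batch cross-entropy over true pairs mentioned in the surrounding README text correspond roughly to the sketch below. This is not the repository's `train_script.py`; the base checkpoint name, scheduler choice, mean pooling, and similarity scale are assumptions made only to illustrate the numbers.

```python
# Minimal sketch of the setup described in the README diff above.
# NOT the repository's train_script.py: base checkpoint, scheduler,
# pooling, and similarity scale are placeholders/assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer, get_linear_schedule_with_warmup

MODEL_NAME = "bert-base-uncased"   # placeholder base checkpoint
MAX_SEQ_LENGTH = 128               # sequence length limit from the README
TOTAL_STEPS = 100_000              # 100k training steps
WARMUP_STEPS = 500                 # learning rate warm-up
GLOBAL_BATCH_SIZE = 1024           # 128 per core on an 8-core TPU v3-8
PER_CORE_BATCH_SIZE = GLOBAL_BATCH_SIZE // 8

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

def embed(texts):
    """Mean-pool token embeddings of sentences truncated to 128 tokens."""
    batch = tokenizer(
        texts, max_length=MAX_SEQ_LENGTH, truncation=True,
        padding="max_length", return_tensors="pt",
    )
    hidden = model(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)    # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)     # (B, H)

def in_batch_cross_entropy(anchors, positives, scale=20.0):
    """Cross-entropy where each anchor's matching row is its true pair."""
    scores = anchors @ positives.T * scale          # (B, B) similarity matrix
    labels = torch.arange(scores.size(0))           # diagonal entries = true pairs
    return F.cross_entropy(scores, labels)
```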