Update README.md
README.md
CHANGED
@@ -24,13 +24,16 @@ It achieves the following results on the evaluation set:
 
 See [config](https://huggingface.co/pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50/blob/main/config.json) for architecture details. Note that this model was trained from scratch and is not a ready-to-use pretrained checkpoint.
 
+This model uses the tokenizer from `roberta-base`.
+
 ## Intended uses & limitations
 
 More information needed
 
 ## Training and evaluation data
 
-
+> **Note:** this model was trained in `bf16`; the [official recommendation](https://github.com/facebookresearch/mega#tips) is `fp32`. Still exploring the effect of this choice.
+
 ## Training procedure
 
 ### Training hyperparameters
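The precision caveat behind the `bf16` note can be illustrated with a small, self-contained sketch (plain Python, independent of any training code, purely illustrative): `bfloat16` keeps `float32`'s 8-bit exponent range but truncates the mantissa to 7 explicit bits, so low-order precision is lost.

```python
import struct

def to_bf16(x: float) -> float:
    """Round-trip a float through bfloat16 precision by keeping only the
    top 16 bits of its float32 bit pattern (truncation, for illustration)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFF0000  # bf16 = float32's sign + 8 exponent + top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bf16(1.0))  # powers of two survive exactly: 1.0
print(to_bf16(0.1))  # only ~2-3 decimal digits of precision remain
```

The dynamic range matches `fp32`, which is why `bf16` usually trains stably, but accumulated rounding like this is the trade-off the note refers to.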
@@ -47,6 +50,12 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 3.0
 
+Additionally:
+
+- mask rate of 50%
+- whole-word masking
+
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
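The two masking settings added above (a 50% mask rate combined with whole-word masking) can be sketched in a few lines of plain Python. This is an illustrative toy, not the actual data collator; the `##` continuation prefix is borrowed from WordPiece-style tokenizers for readability, whereas `roberta-base` actually uses byte-level BPE.

```python
import random

def whole_word_mask(tokens, mask_rate=0.5, mask_token="<mask>", seed=0):
    """Mask whole words: a subword piece (marked '##' here, for illustration)
    is always masked together with the word it belongs to. The default
    mask_rate of 0.5 mirrors the MR50 setting in the model name."""
    rng = random.Random(seed)
    # Group token indices into words: a token starts a new word unless
    # it is a continuation piece ("##...").
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    out = list(tokens)
    for word in words:
        if rng.random() < mask_rate:  # one draw per word, not per token
            for i in word:
                out[i] = mask_token
    return out

tokens = ["the", "tok", "##eni", "##zer", "splits", "words"]
print(whole_word_mask(tokens))
```

The key property is that the masking decision is made once per word, so a model can never recover a masked word from an unmasked piece of itself.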