Update README.md
README.md
CHANGED
@@ -24,13 +24,16 @@ It achieves the following results on the evaluation set:
 
 See [config](https://huggingface.co/pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50/blob/main/config.json) for architecture details. Note that this model was trained from scratch and is not a ready-to-use pretrained checkpoint.
 
+This model uses the tokenizer from `roberta-base`.
+
 ## Intended uses & limitations
 
 More information needed
 
 ## Training and evaluation data
 
-
+> **Note:** this model was trained in `bf16`; the [official recommendation](https://github.com/facebookresearch/mega#tips) is `fp32`. Still exploring the effect of this choice.
+
 ## Training procedure
 
 ### Training hyperparameters
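The precision caveat behind the `bf16` note can be illustrated with a small, self-contained sketch (plain Python, independent of any training code, purely illustrative): `bfloat16` keeps `float32`'s 8-bit exponent range but truncates the mantissa to 7 explicit bits, so low-order precision is lost.

```python
import struct

def to_bf16(x: float) -> float:
    """Round-trip a float through bfloat16 precision by keeping only the
    top 16 bits of its float32 bit pattern (truncation, for illustration)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFF0000  # bf16 = float32's sign + 8 exponent + top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bf16(1.0))  # powers of two survive exactly: 1.0
print(to_bf16(0.1))  # only ~2-3 decimal digits of precision remain
```

The dynamic range matches `fp32`, which is why `bf16` usually trains stably, but accumulated rounding like this is the trade-off the note refers to.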
@@ -47,6 +50,12 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 3.0
 
+Additionally:
+
+- mask rate of 50%
+- whole-word masking
+
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
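The two masking settings added above (a 50% mask rate combined with whole-word masking) can be sketched in a few lines of plain Python. This is an illustrative toy, not the actual data collator; the `##` continuation prefix is borrowed from WordPiece-style tokenizers for readability, whereas `roberta-base` actually uses byte-level BPE.

```python
import random

def whole_word_mask(tokens, mask_rate=0.5, mask_token="<mask>", seed=0):
    """Mask whole words: a subword piece (marked '##' here, for illustration)
    is always masked together with the word it belongs to. The default
    mask_rate of 0.5 mirrors the MR50 setting in the model name."""
    rng = random.Random(seed)
    # Group token indices into words: a token starts a new word unless
    # it is a continuation piece ("##...").
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    out = list(tokens)
    for word in words:
        if rng.random() < mask_rate:  # one draw per word, not per token
            for i in word:
                out[i] = mask_token
    return out

tokens = ["the", "tok", "##eni", "##zer", "splits", "words"]
print(whole_word_mask(tokens))
```

The key property is that the masking decision is made once per word, so a model can never recover a masked word from an unmasked piece of itself.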