pszemraj committed
Commit 18b2a82
1 Parent(s): 64c86c0

Update README.md

Files changed (1): README.md (+10 -1)
@@ -24,13 +24,16 @@ It achieves the following results on the evaluation set:
 
 See [config](https://huggingface.co/pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50/blob/main/config.json) for architecture details. While not a ready 'pretrained' model, this was trained from scratch.
 
+This model uses the tokenizer from `roberta-base`.
+
 ## Intended uses & limitations
 
 More information needed
 
 ## Training and evaluation data
 
-- this was trained in `bf16`. the [official recommendation](https://github.com/facebookresearch/mega#tips) is fp32 - still exploring this.
+> **Note:** this was trained in `bf16`. the [official recommendation](https://github.com/facebookresearch/mega#tips) is fp32 - still exploring this.
+
 ## Training procedure
 
 ### Training hyperparameters
@@ -47,6 +50,12 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 3.0
 
+Additionally:
+
+- mask rate of 50%
+- whole-word masking
+
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
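The diff adds that training used a 50% mask rate with whole-word masking. A minimal pure-Python sketch of that idea (the function name, the `<mask>` token, and the toy tokenization below are illustrative only, not the model's actual training code): words are sampled for masking as units, so all sub-tokens of a chosen word are masked together rather than each token being sampled independently.

```python
import random

def whole_word_mask(tokens, word_ids, mask_rate=0.5, mask_token="<mask>", seed=0):
    """Mask whole words: sample `mask_rate` of the distinct words, then
    replace every token belonging to a sampled word with `mask_token`."""
    rng = random.Random(seed)
    words = sorted({w for w in word_ids if w is not None})
    n_to_mask = max(1, round(mask_rate * len(words)))
    picked = set(rng.sample(words, n_to_mask))
    return [mask_token if w in picked else t for t, w in zip(tokens, word_ids)]

# "unbelievable" splits into two sub-tokens that share word id 1,
# so they are masked (or kept) together.
tokens   = ["the", "un", "##believable", "story"]
word_ids = [0, 1, 1, 2]
print(whole_word_mask(tokens, word_ids))
```

Contrast this with plain token-level masking, where one sub-token of a word can be masked while its neighbor is left visible, making the prediction task easier.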
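The `bf16`-vs-fp32 note in the diff comes down to precision: bfloat16 keeps float32's 8 exponent bits (same dynamic range) but stores only 7 mantissa bits, so relative changes smaller than roughly 2^-8 can be rounded away. A framework-free sketch of that rounding (assuming round-to-nearest-even; `to_bf16` is an illustrative helper, not part of any library):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 (round-to-nearest-even):
    keep the sign, the 8 exponent bits, and the top 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits += 0x7FFF + ((bits >> 16) & 1)   # round to nearest, ties to even
    bits = (bits >> 16) << 16             # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]
```

This is one reason mixed-precision setups typically keep fp32 master copies of the weights: tiny bf16-rounded updates would otherwise vanish, which may be behind the upstream fp32 recommendation.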