Saving weights and logs at step 280000
README.md
CHANGED
@@ -17,22 +17,22 @@ datasets:
 Dataset:

 * [mC4 NL Cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
-* dataset
+* dataset config: full (33B tokens)

 Tokenizer:

-*
+* Tokenizer trained on mC4 with scripts from the Huggingface
 Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)

 Training details:

-* Trained for
+* Trained for 280k steps (30 dec 2021)
 * Block size: 512
 * Optimizer: adam, lr 8e-4, beta1 0.9, beta2 0.98
 * Warmup steps: 5000
 * Weight decay: 0.01

-Work in progress. Dec 2021
+Work in progress. Dec 2021-Jan2022

 * Many thanks to the [Google TPU Research Cloud](https://sites.research.google/trc/about/) for providing access to a TPU cluster!
 * Thanks to @gsarti for creating the [t5-flax-gcp
flax_model.msgpack
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d2bc942466bedf81fea88c9bbeaaafa7dfb2fec485a78c89c52705b841a2bf0a
 size 1419302302
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:f6576b3366a236813f2b96767c8eb783a6d52ed8aa71222557e56edeae404cf0
 size 1444576537
runs/events.out.tfevents.1640332964.t1v-n-f9cfcc28-w-0.384322.0.v2
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c60bb6a82a55ceae1859f8fc81e83b0c19ee72e64de5ecdc95e012746328f4c6
+size 43681985
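The three binary files in this commit are stored as Git LFS pointers: a `version` line, an `oid sha256:<digest>` line, and a `size` line in bytes. As a minimal sketch of how a downloaded file could be checked against such a pointer, the Python below parses the pointer text and compares size and digest. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any Git LFS tooling.

```python
import hashlib

# Pointer content for the TensorBoard events file, as shown in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:c60bb6a82a55ceae1859f8fc81e83b0c19ee72e64de5ecdc95e012746328f4c6
size 43681985
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

For files as large as the model weights here (about 1.4 GB), reading the file in chunks and feeding each chunk to the hash object's `update` method would avoid holding the whole file in memory.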