omarmomen/structroberta_sx2_final


omarmomen committed on Mar 26

Commit 0a98f57 • 1 Parent(s): 4f9be56

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -17,4 +17,6 @@ The paper titled "Increasing The Performance of Cognitively Inspired Data-Effici
 This model variant places the parser network after 4 attention blocks and increases the number of convolution layers in the parser network from 4 to 6.
-The model is pretrained on the BabyLM 10M dataset using a custom pretrained RobertaTokenizer (https://huggingface.co/omarmomen/babylm_tokenizer_32k).

+The model is pretrained on the BabyLM 10M dataset using a custom pretrained RobertaTokenizer (https://huggingface.co/omarmomen/babylm_tokenizer_32k).
+https://arxiv.org/abs/2310.20589