Link to code repository
#3, opened by ewre324
Hello, I was wondering whether the authors would be open-sourcing the code for training from scratch, as well as the training dataset?
Hi! Unfortunately we don’t have a repo for this, but the pretraining code was quite literally a slightly modified version of the language-modeling example scripts documented at https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/README.md
As for datasets, it’s the ones listed, plus others on our page. Those are the ones I’m able to share at the moment.