Link to code repository

#3
by ewre324 - opened

Hello, I was wondering if the authors would be open sourcing the code for training from scratch and the training dataset?

BEEspoke Data org

Hi! Unfortunately we don’t have a repo for this, but the pretraining code was quite literally a slightly modified version of https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/README.md
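The linked README covers example scripts such as `run_clm.py` for causal language modeling. Assuming a causal LM objective (the thread does not say which script was modified, or how), one core step in that script is packing tokenized text into fixed-length blocks before training. The pure-Python sketch below mirrors the shape of that packing step and is not the authors' actual code. The `block_size` value and the toy token IDs are illustrative assumptions.

```python
from itertools import chain


def group_texts(examples: dict, block_size: int) -> dict:
    """Pack tokenized examples into fixed-length blocks (sketch of run_clm.py-style packing).

    `examples` maps field names (e.g. "input_ids", "attention_mask") to lists of
    per-document token lists, as produced by a tokenizer in batched mode.
    """
    # Concatenate each field across all documents into one long sequence.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the tail that does not fill a complete block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal LM, labels are a copy of the inputs; Hugging Face causal LM
    # models shift the labels internally when computing the loss.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result


if __name__ == "__main__":
    batch = {
        "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
        "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1]],
    }
    print(group_texts(batch, block_size=4))
```

With `block_size=4`, the nine toy tokens become two blocks and the ninth token is dropped. Dropping the remainder rather than padding keeps every training example the same length without wasting compute on pad tokens.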

As for datasets, they’re the ones listed, plus others on our page, at least the ones I’m able to share at the moment.
