This model uses SciBERT for its initial embeddings and is further trained with masked language modeling (MLM) on a corpus of roughly 100,000 Earth-science publications. Stay tuned for downstream-task evaluations and further updates to the model.
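
Since the model is trained with an MLM objective, it can be queried directly with the Hugging Face `transformers` fill-mask pipeline. The sketch below is a minimal usage example; the model identifier is a placeholder, since the published id is not stated here.

```python
from transformers import pipeline

# Placeholder id -- replace with this model's actual Hugging Face identifier.
fill_mask = pipeline("fill-mask", model="path/to/this-model")

# SciBERT-style models use the [MASK] token; here we mask a term in an
# Earth-science sentence and ask the model to fill it in.
predictions = fill_mask("Plate [MASK] drives the formation of mid-ocean ridges.")

for p in predictions:
    print(p["token_str"], p["score"])
```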