# Logion: Machine Learning for Greek Philology

## For the most recent model, see: https://huggingface.co/cabrooks/LOGION-50k_wordpiece

Read the paper by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi on [arXiv](https://arxiv.org/abs/2305.01099).

Starting from the pre-trained weights and tokenizer of Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), we train further on a corpus of over 70 million words of premodern Greek.

Further information on this project, along with code for beam search over multiple masked tokens, can be found on [GitHub](https://github.com/charliecb/Logion).

We're adding more models trained on cleaner data and with different tokenizations, so keep an eye out!

## How to use

Requirements (`BertForMaskedLM` is a PyTorch model, so PyTorch is needed as well):

```bash
pip install transformers torch
```

Load the model and tokenizer directly from the Hugging Face Model Hub:

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-base")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-base")
```

## Model pre-training and tokenizer

The model was initialized from Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), which was in turn initialized from a [Modern Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1). Singh's Ancient Greek BERT was trained on data drawn from the First1KGreek Project, the Perseus Digital Library, the PROIEL Treebank, and Gorman's Treebank.

We train further on over 70 million words of premodern Greek, which we are happy to make available upon request. For more information, see footnote 2 of the [arXiv paper](https://arxiv.org/abs/2305.01099), which also describes training and evaluation in detail.

## Cite

If you use this model in your research, please cite the paper:

```bibtex
@misc{logion-base,
  title={Logion: Machine Learning for Greek Philology},
  author={Cowen-Breen, C. and Brooks, C. and Haubold, J. and Graziosi, B.},
  year={2023},
  eprint={2305.01099},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
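
## Example: predicting masked words

The snippet below is a minimal sketch, not taken from the Logion codebase, of how a model loaded as in **How to use** might be queried for a masked word. The helper name `predict_masked` and its `top_k` parameter are hypothetical. It uses only standard `transformers` and PyTorch calls and scores each mask position independently; for joint beam search over several masked tokens, see the code on [GitHub](https://github.com/charliecb/Logion).

```python
import torch
from transformers import PreTrainedModel, PreTrainedTokenizer


def predict_masked(text: str, tokenizer: PreTrainedTokenizer,
                   model: PreTrainedModel, top_k: int = 5):
    """Hypothetical helper: return the top_k (token, probability) pairs for
    each mask token in `text`, scoring every mask position independently."""
    inputs = tokenizer(text, return_tensors="pt")
    model.eval()  # switch off dropout so predictions are deterministic
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Indices of every mask token in the single input sequence.
    input_ids = inputs["input_ids"][0]
    mask_positions = (input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

    results = []
    for pos in mask_positions.tolist():
        # Convert this position's logits into a probability distribution.
        probs = torch.softmax(logits[0, pos], dim=-1)
        top = torch.topk(probs, k=top_k)
        tokens = tokenizer.convert_ids_to_tokens(top.indices.tolist())
        results.append(list(zip(tokens, top.values.tolist())))
    return results
```

With `tokenizer` and `model` loaded as above, a call such as `predict_masked(f"ἐν ἀρχῇ ἦν ὁ {tokenizer.mask_token}", tokenizer, model)` returns one list of candidates per mask, in order of appearance. Because each mask is filled without regard to the others, this is only a rough baseline when several adjacent words are missing.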