---
license: cc-by-4.0
---

# Automatic Translation Alignment of Ancient Greek Texts

GRC-ALIGNMENT is an XLM-RoBERTa-based model fine-tuned for automatic multilingual text alignment at the word level. The model was first trained on 12 million monolingual Ancient Greek tokens with a Masked Language Modeling (MLM) objective, and then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Ancient Greek-Latin, and Ancient Greek-Georgian.

### Multilingual Training Dataset

| Languages | Sentences | Source |
|:----------------------------------------------------|:---------:|:----------------------------------------------------------------------------------|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
| GRC-KAT<br>GRC-ENG<br>GRC-LAT<br>GRC-ITA<br>GRC-POR | 4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/) |

### Model Performance

| Languages | Alignment Error Rate |
|:---------:|:--------------------:|
| GRC-ENG | 19.73% (IterMax) |
| GRC-LAT | 23.91% (IterMax) |
| GRC-POR | 10.60% (ArgMax) |

The gold standard datasets are available on [GitHub](https://github.com/UgaritAlignment/Alignment-Gold-Standards).
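For illustration, here is a minimal sketch (not an official usage example from this repository) of how a fine-tuned XLM-R encoder of this kind is typically applied to word-level alignment: embed both sentences, build a cosine similarity matrix over the subword tokens, and keep the mutual-argmax pairs, corresponding to the ArgMax extraction reported above. The model ID, the example sentence pair, and the helper name are assumptions; adjust them to the actual repository.

```python
# Sketch: mutual-argmax (ArgMax) word alignment with a fine-tuned XLM-R encoder.
# MODEL_ID is a placeholder; substitute the actual Hugging Face repository name.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "UGARIT/grc-alignment"  # assumption, not confirmed by this card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Return contextual subword embeddings with <s> and </s> stripped."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[1:-1]

src = "μῆνιν ἄειδε θεά"          # hypothetical example pair
tgt = "sing goddess of the wrath"

e_src, e_tgt = embed(src), embed(tgt)

# Cosine similarity between every source and target subword.
sim = torch.nn.functional.normalize(e_src, dim=-1) @ \
      torch.nn.functional.normalize(e_tgt, dim=-1).T

# ArgMax extraction: keep (i, j) only when i and j are each other's best match.
fwd = sim.argmax(dim=1)  # best target subword for each source subword
bwd = sim.argmax(dim=0)  # best source subword for each target subword
links = [(i, int(j)) for i, j in enumerate(fwd) if int(bwd[j]) == i]
print(links)
```

Note that these links are between subword positions; a real pipeline would still project them back to word level, e.g. via the fast tokenizer's `word_ids()` mapping.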
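The performance table also cites IterMax. Below is a heavily simplified sketch of the iterative idea, not the exact algorithm from the SimAlign paper: re-run mutual argmax a few times, each round restricted to the source and target positions that earlier rounds left unaligned, so that links can be added beyond the one-shot ArgMax set. The function name and defaults are made up for illustration.

```python
# Simplified IterMax-style extraction: iterate mutual argmax over the
# positions that are still unaligned. An approximation for illustration,
# not the published IterMax algorithm.
import torch

def itermax(sim: torch.Tensor, max_iters: int = 2) -> list[tuple[int, int]]:
    links: set[tuple[int, int]] = set()
    src_free = set(range(sim.size(0)))
    tgt_free = set(range(sim.size(1)))
    for _ in range(max_iters):
        masked = sim.clone()
        for i in range(sim.size(0)):   # block already-aligned rows
            if i not in src_free:
                masked[i, :] = float("-inf")
        for j in range(sim.size(1)):   # block already-aligned columns
            if j not in tgt_free:
                masked[:, j] = float("-inf")
        fwd = masked.argmax(dim=1)
        bwd = masked.argmax(dim=0)
        new = [(i, int(fwd[i])) for i in src_free
               if int(fwd[i]) in tgt_free and int(bwd[fwd[i]]) == i]
        if not new:
            break
        for i, j in new:
            links.add((i, j))
            src_free.discard(i)
            tgt_free.discard(j)
    return sorted(links)
```

With the `sim` matrix from the previous sketch, the first iteration reproduces the mutual-argmax links and later iterations can only add links for still-unmatched tokens, so `itermax(sim)` returns a superset of the one-shot ArgMax result.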
If you use this model, please cite our paper:

```bibtex
@misc{yousef_palladino_wright_berti_2022,
  title = {Automatic Translation Alignment for Ancient Greek and Latin},
  url = {https://osf.io/8epsy},
  doi = {10.31219/osf.io/8epsy},
  publisher = {OSF Preprints},
  author = {Yousef, Tariq and Palladino, Chiara and Wright, David J and Berti, Monica},
  year = {2022},
  month = {Apr}
}
```