---
license: cc-by-4.0
language:
- nso
- en
pipeline_tag: text2text-generation
tags:
- m2m100
- translation
- africanlp
- african
- sepedi
- northern-sotho
---

# [nso-en] Northern Sotho [Sepedi] to English Translation Model based on M2M100 and The South African Gov-ZA multilingual corpus

This model was created from aligned Northern Sotho [Sepedi] to English sentences from [The South African Gov-ZA multilingual corpus](https://github.com/dsfsi/gov-za-multilingual).

The dataset contains cabinet statements from the South African government, maintained by the Government Communication and Information System (GCIS). The data was scraped from the government's website: https://www.gov.za/cabinet-statements

## Authors

- Vukosi Marivate - [@vukosi](https://twitter.com/vukosi)
- Matimba Shingange
- Richard Lastrucci
- Isheanesu Joseph Dzingirai
- Jenalea Rajab

## BibTeX entry and citation info

```
@article{lastrucci2023preparing,
  title   = {Preparing the Vuk'uzenzele and ZA-gov-multilingual South African multilingual corpora},
  author  = {Richard Lastrucci and Isheanesu Dzingirai and Jenalea Rajab and Andani Madodonga and Matimba Shingange and Daniel Njini and Vukosi Marivate},
  year    = {2023},
  journal = {arXiv preprint arXiv:2303.03750}
}
```

[Paper - Preparing the Vuk'uzenzele and ZA-gov-multilingual South African multilingual corpora](https://arxiv.org/abs/2303.03750)
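
## Example usage

Since the model is described as M2M100-based, it can probably be loaded with the standard M2M100 classes from the `transformers` library. The following is a minimal sketch, not the authors' documented procedure. It rests on several assumptions: `MODEL_ID` is a hypothetical placeholder (this card does not state the Hub repository ID), the checkpoint is assumed to ship the original M2M100 tokenizer, and the language codes `ns` (Northern Sotho) and `en` (English) are the stock M2M100 codes, which the fine-tuned checkpoint is assumed to have kept. The helper names `load_model` and `translate` are illustrative only.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-namespace/your-nso-en-model"


def load_model(model_id: str = MODEL_ID):
    """Load the model and tokenizer (requires `transformers`, `torch`, `sentencepiece`)."""
    # Imported lazily so that `translate` can be reused without these packages.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)
    return model, tokenizer


def translate(text: str, model, tokenizer, src_lang: str = "ns", tgt_lang: str = "en") -> str:
    """Translate one sentence; the language codes are assumed to be the stock M2M100 codes."""
    # Tell the tokenizer which source-language token to prepend.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    # Force the decoder to start with the target-language token.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

With a real repository ID in place of the placeholder, a call would look like `model, tokenizer = load_model()` followed by `translate("<Sepedi sentence>", model, tokenizer)`. Because the training data consists of cabinet statements, output quality on text from other domains may differ.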