Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Paper • 2409.04269
Dilmash: Karakalpak Machine Translation
Note Describes the development and evaluation of Dilmash models for Karakalpak machine translation, including datasets and methodologies used.
Note The original nllb-200-600M model, fine-tuned on the Dilmash parallel corpus.
Note The original nllb-200-600M model, fine-tuned on the Dilmash parallel corpus, with additional tokens from a larger Karakalpak monolingual corpus.
Note The original nllb-200-600M model, fine-tuned on the Dilmash parallel corpus and additional multilingual data from the TIL corpus, with additional tokens from a larger Karakalpak monolingual corpus.
Note The Dilmash parallel corpus used to fine-tune the Dilmash models.