jhu-clsp/kreyol-mt-scratch


n8rob committed on May 31

Commit 44212ca • 1 Parent(s): d031269

Update README.md

Files changed (1)

README.md +77 -5


@@ -1,21 +1,93 @@
 ---
 license: mit
 ---
-This is a many-to-many model for Creole-English, English-Creole and Creole-Creole MT, trained from scratch on all data.
 ```
 from transformers import MBartForConditionalGeneration, AutoModelForSeq2SeqLM
 from transformers import AlbertTokenizer, AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("n8rob/kreyol-mt-scratch", do_lower_case=False, use_fast=False, keep_accents=True)
 # The tokenizer we use is based on the AlbertTokenizer class which is essentially sentencepiece. We train this sentencepeice model from scratch.
-# Or use tokenizer = AlbertTokenizer.from_pretrained("n8rob/kreyol-mt-scratch", do_lower_case=False, use_fast=False, keep_accents=True)
-model = AutoModelForSeq2SeqLM.from_pretrained("n8rob/kreyol-mt-scratch")
-# Or use model = MBartForConditionalGeneration.from_pretrained("n8rob/kreyol-mt-scratch")
 # Some initial mapping
 bos_id = tokenizer._convert_token_to_id_with_added_voc("<s>")

 ---
 license: mit
+language:
+- acf
+- aoa
+- bah
+- bzj
+- bzk
+- cri
+- crs
+- dcr
+- djk
+- fab
+- fng
+- fpe
+- gcf
+- gcr
+- gpe
+- gul
+- gyn
+- hat
+- icr
+- jam
+- kea
+- kri
+- ktu
+- lou
+- mfe
+- mue
+- pap
+- pcm
+- pov
+- pre
+- rcf
+- sag
+- srm
+- srn
+- svc
+- tpi
+- trf
+- wes
+- ara
+- aze
+- ceb
+- deu
+- eng
+- fra
+- nep
+- por
+- spa
+- zho
+task_categories:
+- translation
 ---
+# Kreyòl-MT
+Welcome to the repository for our **from-scratch** **all-data** model.
+Please see our paper: 📄 ["Kreyòl-MT: Building Machine Translation for Latin American, Caribbean, and Colonial African Creole Languages"](https://arxiv.org/abs/2405.05376)
+And our GitHub repository: 💻 [Kreyòl-MT](https://github.com/JHU-CLSP/Kreyol-MT/tree/main)
+And cite our work:
+```
+@article{robinson2024krey,
+  title={Krey$\backslash$ol-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages},
+  author={Robinson, Nathaniel R and Dabre, Raj and Shurtz, Ammon and Dent, Rasul and Onesi, Onenamiyi and Monroc, Claire Bizon and Grobol, Lo{\"\i}c and Muhammad, Hasan and Garg, Ashi and Etori, Naome A and others},
+  journal={arXiv preprint arXiv:2405.05376},
+  year={2024}
+}
+```
+## Model hosted here
+This is a many-to-many model for translation into and out of Creole languages, trained from scratch on all data.
 ```
 from transformers import MBartForConditionalGeneration, AutoModelForSeq2SeqLM
 from transformers import AlbertTokenizer, AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/kreyol-mt-scratch", do_lower_case=False, use_fast=False, keep_accents=True)
 # The tokenizer we use is based on the AlbertTokenizer class which is essentially sentencepiece. We train this sentencepeice model from scratch.
+# Or use tokenizer = AlbertTokenizer.from_pretrained("jhu-clsp/kreyol-mt-scratch", do_lower_case=False, use_fast=False, keep_accents=True)
+model = AutoModelForSeq2SeqLM.from_pretrained("jhu-clsp/kreyol-mt-scratch")
+# Or use model = MBartForConditionalGeneration.from_pretrained("jhu-clsp/kreyol-mt-scratch")
 # Some initial mapping
 bos_id = tokenizer._convert_token_to_id_with_added_voc("<s>")