guillermoruiz committed
Commit: 81dd73b
Parent: ac6585c

Update README.md

Files changed (1): README.md (+20 -3)
README.md CHANGED
@@ -18,7 +18,11 @@ tokenizer:
 ---
 # BILMA (Bert In Latin aMericA)
 
-Bilma is a BERT implementation in tensorflow and trained on the Masked Language Model task under the https://sadit.github.io/regional-spanish-models-talk-2022/ datasets.
+Bilma is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task with the
+https://sadit.github.io/regional-spanish-models-talk-2022/ datasets. The model is trained on regionalized
+Spanish short texts from the Twitter (now X) platform.
+
+We have pretrained models for Argentina, Chile, Colombia, Spain, Mexico, the United States, Uruguay, and Venezuela.
 
 The accuracy of the models trained on the MLM task for the different regions is:
 
@@ -40,9 +44,10 @@ Install the following version for the transformers library
 Instantiate the tokenizer and the trained model:
 ```
 from transformers import AutoTokenizer
-tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
 from transformers import TFAutoModel
-model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True, include_top=False)
+
+tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
+model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True)
 ```
 
 Now, we will need some text to pass through the tokenizer:
@@ -67,3 +72,15 @@ which produces the output:
 ```
 ['vamos a comer tacos.', 'hace mucho que no voy al gym.']
 ```
+
+If you find this model useful for your research, please cite the following paper:
+```
+@misc{tellez2022regionalized,
+      title={Regionalized models for Spanish language variations based on Twitter},
+      author={Eric S. Tellez and Daniela Moctezuma and Sabino Miranda and Mario Graff and Guillermo Ruiz},
+      year={2022},
+      eprint={2110.06128},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
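
The README's example decodes the model's MLM predictions back into the original sentences. For readers curious about what that prediction step does conceptually, here is a minimal, self-contained sketch of filling masked positions from per-token logits. It uses NumPy only; the toy vocabulary, ids, and logits below are illustrative stand-ins, not BILMA's real vocabulary or outputs.

```python
import numpy as np

# Toy vocabulary and mask token id (illustrative only, not BILMA's real vocab).
vocab = ["[PAD]", "[MASK]", "vamos", "a", "comer", "tacos", "."]
MASK_ID = 1

# Token ids for "vamos a [MASK] tacos ." with position 2 masked.
input_ids = np.array([2, 3, 1, 5, 6])

# Fake logits of shape (seq_len, vocab_size); a real MLM head produces these.
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(input_ids), len(vocab)))
logits[2, 4] = 10.0  # pretend the model strongly predicts "comer" at the mask

def fill_masks(input_ids, logits, mask_id=MASK_ID):
    """Replace each masked position with the argmax token id of its logits."""
    out = input_ids.copy()
    masked = input_ids == mask_id
    out[masked] = logits[masked].argmax(axis=-1)
    return out

filled = fill_masks(input_ids, logits)
print(" ".join(vocab[i] for i in filled))  # -> "vamos a comer tacos ."
```

In a real pipeline the logits come from the model's forward pass and decoding is done by the tokenizer; the argmax-at-masked-positions step shown here is the core of how the example sentences in the README are recovered.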