guillermoruiz committed
Commit: 81dd73b
Parent: ac6585c

Update README.md

Files changed (1): README.md (+20 -3)
README.md CHANGED
@@ -18,7 +18,11 @@ tokenizer:
 ---
 # BILMA (Bert In Latin aMericA)
 
-Bilma is a BERT implementation in tensorflow and trained on the Masked Language Model task under the https://sadit.github.io/regional-spanish-models-talk-2022/ datasets.
+Bilma is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task with the
+https://sadit.github.io/regional-spanish-models-talk-2022/ datasets. The model is trained on regionalized
+Spanish short texts from the Twitter (now X) platform.
+
+We have pretrained models for Argentina, Chile, Colombia, Spain, Mexico, the United States, Uruguay, and Venezuela.
 
 The accuracy of the models trained on the MLM task for the different regions is:
 
@@ -40,9 +44,10 @@ Install the following version for the transformers library
 Instantiate the tokenizer and the trained model:
 ```
 from transformers import AutoTokenizer
-tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
 from transformers import TFAutoModel
-model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True, include_top=False)
+
+tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
+model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True)
 ```
 
 Now, we will need some text to pass through the tokenizer:
@@ -67,3 +72,15 @@ which produces the output:
 ```
 ['vamos a comer tacos.', 'hace mucho que no voy al gym.']
 ```
+
+If you find this model useful for your research, please cite the following paper:
+```
+@misc{tellez2022regionalized,
+      title={Regionalized models for Spanish language variations based on Twitter},
+      author={Eric S. Tellez and Daniela Moctezuma and Sabino Miranda and Mario Graff and Guillermo Ruiz},
+      year={2022},
+      eprint={2110.06128},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
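
The README's example decodes the model's MLM predictions back into the original sentences. For readers curious about what that prediction step does conceptually, here is a minimal, self-contained sketch of filling masked positions from per-token logits. It uses NumPy only; the toy vocabulary, ids, and logits below are illustrative stand-ins, not BILMA's real vocabulary or outputs.

```python
import numpy as np

# Toy vocabulary and mask token id (illustrative only, not BILMA's real vocab).
vocab = ["[PAD]", "[MASK]", "vamos", "a", "comer", "tacos", "."]
MASK_ID = 1

# Token ids for "vamos a [MASK] tacos ." with position 2 masked.
input_ids = np.array([2, 3, 1, 5, 6])

# Fake logits of shape (seq_len, vocab_size); a real MLM head produces these.
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(input_ids), len(vocab)))
logits[2, 4] = 10.0  # pretend the model strongly predicts "comer" at the mask

def fill_masks(input_ids, logits, mask_id=MASK_ID):
    """Replace each masked position with the argmax token id of its logits."""
    out = input_ids.copy()
    masked = input_ids == mask_id
    out[masked] = logits[masked].argmax(axis=-1)
    return out

filled = fill_masks(input_ids, logits)
print(" ".join(vocab[i] for i in filled))  # -> "vamos a comer tacos ."
```

In a real pipeline the logits come from the model's forward pass and decoding is done by the tokenizer; the argmax-at-masked-positions step shown here is the core of how the example sentences in the README are recovered.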