guillermoruiz committed
Commit 81dd73b • 1 Parent(s): ac6585c
Update README.md

README.md CHANGED

@@ -18,7 +18,11 @@ tokenizer:
 ---
 # BILMA (Bert In Latin aMericA)

-Bilma is a BERT implementation in TensorFlow and trained on the Masked Language Model task under the
+Bilma is a BERT implementation in TensorFlow and trained on the Masked Language Model task under the
+https://sadit.github.io/regional-spanish-models-talk-2022/ datasets. It is a model trained on regionalized
+Spanish short texts from the Twitter (now X) platform.
+
+We have pretrained models for the countries of Argentina, Chile, Colombia, Spain, Mexico, the United States, Uruguay, and Venezuela.

 The accuracy of the models trained on the MLM task for different regions is:

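Only the Mexican checkpoint, guillermoruiz/bilma_mx, is named later in this diff. As a purely illustrative sketch, assuming the other regional checkpoints follow an analogous naming scheme (the actual repository ids for the other countries are not given in this commit), selecting a region could look like this:

```python
from transformers import AutoTokenizer, TFAutoModel

# Hypothetical naming scheme: only "bilma_mx" (Mexico) is confirmed by this diff;
# any other country suffix here would be a placeholder for illustration only.
region = "mx"
repo_id = f"guillermoruiz/bilma_{region}"

# Load the tokenizer and the TensorFlow model for the chosen region.
tok = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModel.from_pretrained(repo_id, trust_remote_code=True)
```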
@@ -40,9 +44,10 @@ Install the following version for the transformers library
 Instantiate the tokenizer and the trained model:
 ```
 from transformers import AutoTokenizer
-tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
 from transformers import TFAutoModel
-
+
+tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
+model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True)
 ```

 Now, we will need some text and then pass it through the tokenizer:
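For readers following along, here is a minimal sketch of the step this hunk leads into: passing some text through the tokenizer and the model. The example sentences, the padding settings, and the assumption that the Bilma model accepts the tokenizer's output dictionary directly are illustrative and not taken from this commit.

```python
from transformers import AutoTokenizer, TFAutoModel

# Tokenizer and model exactly as instantiated in the hunk above.
tok = AutoTokenizer.from_pretrained("guillermoruiz/bilma_mx")
model = TFAutoModel.from_pretrained("guillermoruiz/bilma_mx", trust_remote_code=True)

# Example sentences taken from the README's sample output.
texts = ["vamos a comer tacos.", "hace mucho que no voy al gym."]

# Encode as a padded TensorFlow batch and run a forward pass.
# The exact structure of `outputs` depends on the custom Bilma code
# loaded via trust_remote_code, so it is only printed here.
inputs = tok(texts, padding=True, return_tensors="tf")
outputs = model(inputs)
print(outputs)
```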
@@ -67,3 +72,15 @@ which produces the output:
 ```
 ['vamos a comer tacos.', 'hace mucho que no voy al gym.']
 ```
+
+If you find this model useful for your research, please cite the following paper:
+```
+@misc{tellez2022regionalized,
+  title={Regionalized models for Spanish language variations based on Twitter},
+  author={Eric S. Tellez and Daniela Moctezuma and Sabino Miranda and Mario Graff and Guillermo Ruiz},
+  year={2022},
+  eprint={2110.06128},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
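The bracketed list above is the decoded output quoted by the README. As a rough illustration of how such strings are commonly recovered from a masked-language model, here is a sketch that assumes the model returns token logits of shape (batch, sequence, vocabulary) as its first output; neither assumption is confirmed by this commit.

```python
import tensorflow as tf

# Continues from the previous sketch: `tok`, `model`, and `inputs` are assumed defined.
# Assumption: the first element of the model output holds MLM logits with shape
# (batch_size, sequence_length, vocab_size); the custom Bilma code may differ.
logits = model(inputs)[0]

# Take the most likely token at every position and decode back to text.
token_ids = tf.argmax(logits, axis=-1).numpy()
decoded = tok.batch_decode(token_ids, skip_special_tokens=True)
print(decoded)  # expected to resemble: ['vamos a comer tacos.', 'hace mucho que no voy al gym.']
```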