---
license: mit
language:
- es
metrics:
- accuracy
pipeline_tag: fill-mask
widget:
- text: "Vamos a comer unos [MASK]"
  example_title: "Vamos a comer unos tacos"
tags:
- code
- nlp
- custom
- bilma
tokenizer:
- yes
---

# BILMA (Bert In Latin aMericA)

Bilma is a BERT implementation in TensorFlow, trained on the masked language model (MLM) task using the datasets described at https://sadit.github.io/regional-spanish-models-talk-2022/.

The accuracy of the models trained on the MLM task for the different regions is:

![bilma-mlm-comp](https://user-images.githubusercontent.com/392873/163045798-89bd45c5-b654-4f16-b3e2-5cf404e12ddd.png)

# Pre-requisites

You will need TensorFlow 2.4 or newer.

# Quick guide

See the demo notebooks for a quick guide on how to use the models.

Clone this repository and then run

```
bash download-emoji15-bilma.sh
```

to download the MX model. Then load the model and tokenizer:

```
from bilma import bilma_model

vocab_file = "vocab_file_All.txt"
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"
model = bilma_model.load(model_file)
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file, max_length=280)
```

Now you will need some text:

```
texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)
```

With this, you are ready to use the model:

```
p = model.predict(toks)
tokenizer.decode_emo(p[1])
```

which produces the output:

![emoji-output](https://user-images.githubusercontent.com/392873/165176270-77dd32ca-377e-4d29-ab4a-bc5f75913241.jpg)

Each emoji corresponds to an entry in `texts`.
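
If you want to inspect the raw classifier output yourself (for example, to get the top-k emoji candidates instead of only the best one), the decoding step conceptually reduces to an argmax over the label axis. The sketch below is a minimal, self-contained illustration using NumPy with a made-up four-emoji label set; the real label set, array shapes, and supported decoding come from the model and `tokenizer.decode_emo`, not from this snippet.

```python
import numpy as np

# Hypothetical label set for illustration only; the real
# emoji15 model predicts over its own set of 15 emojis.
EMOJIS = ["😀", "😢", "🌮", "📱"]

def decode_emoji_probs(probs, labels=EMOJIS):
    """Return the most likely emoji for each row of a (batch, n_labels) array."""
    ids = np.argmax(probs, axis=-1)
    return [labels[i] for i in ids]

# Toy probabilities for two texts.
p = np.array([[0.1, 0.1, 0.7, 0.1],
              [0.2, 0.1, 0.1, 0.6]])
print(decode_emoji_probs(p))  # ['🌮', '📱']
```

The same idea extends to top-k decoding by replacing `argmax` with `np.argsort(probs, axis=-1)[:, ::-1][:, :k]`.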