Spanish to Quechua translator

This model is a fine-tuned version of t5-small.

Model description

t5-small-finetuned-spanish-to-quechua was trained for 46 epochs on 102,747 sentences. Validation used 12,844 sentences, and 12,843 sentences were held out for testing.
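
As a quick check on the figures above, the split proportions can be worked out directly from the three sentence counts. This is only arithmetic on the reported numbers; the variable names are illustrative.

```python
# Sentence counts reported in the model description.
splits = {"train": 102_747, "validation": 12_844, "test": 12_843}

total = sum(splits.values())

# Share of each split as a percentage of all sentences, rounded to one decimal.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)   # 128434
print(shares)  # {'train': 80.0, 'validation': 10.0, 'test': 10.0}
```

The counts correspond to an approximately 80/10/10 split of 128,434 sentences.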

Intended uses & limitations

A large part of the dataset was extracted from biblical texts, so the model performs better on sentences that resemble those texts in style and vocabulary.

How to use

You can load the model and its tokenizer as follows:

>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> model_name = 'hackathon-pln-es/t5-small-finetuned-spanish-to-quechua'
>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)

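To translate a sentence:

>>> sentence = "Entonces dijo"
>>> # Tokenize the Spanish sentence into PyTorch tensors
>>> inputs = tokenizer(sentence, return_tensors="pt")
>>> # Generate the translation with beam search
>>> output = model.generate(inputs["input_ids"], max_length=40, num_beams=4, early_stopping=True)
>>> # skip_special_tokens=True drops markers such as <pad> and </s> from the decoded text
>>> translation = tokenizer.decode(output[0], skip_special_tokens=True)
>>> print('Original sentence: {}\nTranslated sentence: {}'.format(sentence, translation))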
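
To translate several sentences at once, the same calls can be wrapped in a small helper. The sketch below is not part of the original card: `translate_batch` is a hypothetical name, and it assumes `model` and `tokenizer` are the objects loaded above. It pads the batch and passes the attention mask so that padding positions are ignored during generation.

```python
from typing import List


def translate_batch(sentences: List[str], model, tokenizer,
                    max_length: int = 40, num_beams: int = 4) -> List[str]:
    """Translate a list of Spanish sentences with a single generate() call.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the
    objects returned by the from_pretrained() calls shown earlier.
    """
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        # The mask marks which positions are real tokens and which are padding.
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # Decode every sequence in the batch, dropping <pad> and </s> markers.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate_batch(["Entonces dijo"], model, tokenizer)` returns a list with one translated string per input sentence.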

Limitations and bias

Currently, this model can only translate into Ayacucho Quechua.

Training data

To train this model we used the Spanish to Quechua dataset.

Evaluation results

We obtained the following metrics during the training process:

  • eval_bleu = 2.9691
  • eval_loss = 1.2064628601074219
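
One way to read the loss figure is to convert it to perplexity. The sketch below assumes `eval_loss` is the mean per-token cross-entropy in nats, which is the usual convention for sequence-to-sequence models in `transformers`. The card does not state how the value was computed, so treat the result as indicative only.

```python
import math

# Value reported above; assumed to be mean per-token cross-entropy in nats.
eval_loss = 1.2064628601074219

# Perplexity is the exponential of the cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")  # perplexity = 3.34
```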

Team members
