|
--- |
|
model-index: |
|
- name: medieval-it5-base |
|
results: [] |
|
language: |
|
- it |
|
--- |
|
|
|
# medieval-it5-base |
|
|
|
This model is a version of [gsarti/it5-base](https://huggingface.co/gsarti/it5-base) fine-tuned on the [ita2medieval](https://huggingface.co/datasets/leobertolazzi/ita2medieval) dataset. The dataset contains sentences in medieval Italian along with their paraphrases in contemporary Italian (approximately 6.5k pairs in total).
|
|
|
The fine-tuning task is text style transfer from contemporary to medieval Italian.
|
|
|
|
|
## Using the model |
|
|
|
``` |
|
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
|
tokenizer = AutoTokenizer.from_pretrained("leobertolazzi/medieval-it5-base") |
|
model = AutoModelForSeq2SeqLM.from_pretrained("leobertolazzi/medieval-it5-base") |
|
``` |
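Continuing from the snippet above, the model can be used to turn a contemporary Italian sentence into medieval style. The example sentence and the generation settings (`max_length`, `num_beams`) below are illustrative choices, not values prescribed by the model, and no task prefix is assumed:

```
# Paraphrase a contemporary Italian sentence in medieval style.
input_text = "Mi trovai in un bosco buio senza sapere dove andare."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```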
|
|
|
Flax and TensorFlow versions of the model are also available:
|
``` |
|
from transformers import FlaxT5ForConditionalGeneration, TFT5ForConditionalGeneration |
|
model_flax = FlaxT5ForConditionalGeneration.from_pretrained("leobertolazzi/medieval-it5-base") |
|
model_tf = TFT5ForConditionalGeneration.from_pretrained("leobertolazzi/medieval-it5-base") |
|
``` |
|
|
|
## Training procedure |
|
|
|
The code used for the fine-tuning is available in this [repo](https://github.com/leobertolazzi/medievalIT5).
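For orientation, a minimal fine-tuning sketch with the `transformers` `Seq2SeqTrainer` could look like the following. The dataset column names (`"italian"`, `"medieval"`), hyperparameters, and preprocessing here are assumptions for illustration only; see the repo above for the actual training code:

```
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Column names are assumed; check the dataset card for the real ones.
dataset = load_dataset("leobertolazzi/ita2medieval")
tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-base")

def preprocess(batch):
    # Contemporary Italian as input, medieval Italian as target.
    model_inputs = tokenizer(batch["italian"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["medieval"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="medieval-it5-base",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```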
|
|
|
## Intended uses & limitations |
|
|
|
The biggest limitation of this project is the size of the ita2medieval dataset: it consists of only 6.5k sentence pairs, whereas [gsarti/it5-base](https://huggingface.co/gsarti/it5-base) has 220M parameters.
|
|
|
For this reason the results are often far from perfect, although some nice style translations can still be obtained.
|
|
|
It would be nice to expand ita2medieval with texts and paraphrases from more medieval Italian authors!
|
|
|
### Framework versions |
|
|
|
- Transformers 4.26.0 |
|
- Tokenizers 0.13.2 |
|
|