Tucano is a series of decoder-transformers based on the Llama 2 architecture, natively pre-trained in Portuguese.
Tucano: Advancing Neural Text Generation for Portuguese
To stimulate the future of open development of neural text generation in Portuguese, we present both GigaVerbo, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and Tucano, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on GitHub and Hugging Face.
Read our preprint on arXiv.
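As a rough illustration of how one of the released checkpoints might be used, the sketch below loads a model with the Hugging Face `transformers` library and generates a short continuation. It assumes the checkpoints load through the generic `AutoTokenizer` and `AutoModelForCausalLM` classes (plausible given that they are listed as Text Generation models, but not stated on this page) and that `transformers` and PyTorch are installed. The helper `generate`, the constant `TUCANO_BASE_MODELS`, the default `max_new_tokens`, and the example prompt are illustrative names and values, not part of the Tucano release.

```python
# Repository IDs copied from the model list on this page (base models only).
TUCANO_BASE_MODELS = [
    "TucanoBR/Tucano-160m",
    "TucanoBR/Tucano-630m",
    "TucanoBR/Tucano-1b1",
    "TucanoBR/Tucano-2b4",
]


def generate(prompt: str,
             model_id: str = TUCANO_BASE_MODELS[0],
             max_new_tokens: int = 50) -> str:
    """Hypothetical helper: continue `prompt` with one Tucano checkpoint.

    Assumes the checkpoint is loadable with the generic Auto* classes.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both calls download files from the Hugging Face Hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors and generate new tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (requires network access and downloads model weights):
# print(generate("A floresta da Amazônia é conhecida por sua"))
```

The smallest checkpoint is used as the default so that a first experiment stays cheap; passing a different `model_id` from the list switches to a larger model without any other change.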
Models (10)
TucanoBR/Tucano-160m (Text Generation)
TucanoBR/Tucano-630m (Text Generation)
TucanoBR/Tucano-1b1 (Text Generation)
TucanoBR/Tucano-2b4 (Text Generation)
TucanoBR/Tucano-1b1-Instruct (Text Generation)
TucanoBR/Tucano-2b4-Instruct (Text Generation)
TucanoBR/XGBRegressor-text-filter
TucanoBR/BERTimbau-large-text-filter (Text Classification)
TucanoBR/XGBClassifier-text-filter
TucanoBR/BERTimbau-base-text-filter (Text Classification)
Datasets (6)
TucanoBR/GigaVerbo (145M rows)
TucanoBR/GigaVerbo-Text-Filter (110k rows)
TucanoBR/Tucano-SFT (680k rows)
TucanoBR/alpaca-eval-pt (805 rows)
TucanoBR/lambada-pt (5.15k rows)
TucanoBR/wikipedia-PT (1.1M rows)