Tucano - a TucanoBR Collection

updated 15 days ago

Tucano is a series of decoder-transformers based on the Llama 2 architecture, natively pre-trained in Portuguese.

Tucano: Advancing Neural Text Generation for Portuguese

Paper • 2411.07854 • Published 16 days ago
TucanoBR/Tucano-2b4

Text Generation • Updated 10 days ago • 72

Note 2.4 billion-parameter version of the Tucano series.
TucanoBR/Tucano-2b4-Instruct

Text Generation • Updated 10 days ago • 281 • 1

Note 2.4 billion-parameter version of the Tucano series, fine-tuned on the TucanoBR/Tucano-SFT dataset.
TucanoBR/Tucano-1b1

Text Generation • Updated 10 days ago • 402

Note 1.1 billion-parameter version of the Tucano series.
TucanoBR/Tucano-1b1-Instruct

Text Generation • Updated 10 days ago • 398 • 1

Note 1.1 billion-parameter version of the Tucano series, fine-tuned on the TucanoBR/Tucano-SFT dataset.
TucanoBR/Tucano-630m

Text Generation • Updated 10 days ago • 23

Note 630 million-parameter version of the Tucano series.
TucanoBR/Tucano-160m

Text Generation • Updated 10 days ago • 37

Note 160 million-parameter version of the Tucano series.
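
The checkpoints listed above can presumably be loaded like any other Llama-style causal language model. The following is a minimal sketch, assuming the `transformers` library and PyTorch are installed and that the repositories load through the generic `Auto*` classes. The helper names `repo_for` and `generate` are hypothetical and used only for illustration; the repository IDs are the ones listed in this collection.

```python
# Sizes as they appear in the repository names of this collection.
BASE_SIZES = ("160m", "630m", "1b1", "2b4")
# Only these sizes are listed with an "-Instruct" variant.
INSTRUCT_SIZES = ("1b1", "2b4")


def repo_for(size: str, instruct: bool = False) -> str:
    """Build the Hugging Face repository ID for a Tucano checkpoint (hypothetical helper)."""
    if size not in BASE_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {BASE_SIZES}")
    if instruct:
        if size not in INSTRUCT_SIZES:
            raise ValueError(f"no Instruct variant is listed for size {size!r}")
        return f"TucanoBR/Tucano-{size}-Instruct"
    return f"TucanoBR/Tucano-{size}"


def generate(prompt: str, size: str = "160m", max_new_tokens: int = 50) -> str:
    """Download a base checkpoint and continue `prompt` (requires network access)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = repo_for(size)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("A capital do Brasil é")` would download the 160 million-parameter checkpoint on first use. The Instruct variants were fine-tuned on conversational data and may expect a chat-style prompt format, which this sketch does not handle.
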
TucanoBR/BERTimbau-large-text-filter

Text Classification • Updated 15 days ago • 4

Note BERTimbau-large fine-tuned on the TucanoBR/GigaVerbo-Text-Filter dataset.
TucanoBR/BERTimbau-base-text-filter

Text Classification • Updated 15 days ago • 9

Note BERTimbau-base fine-tuned on the TucanoBR/GigaVerbo-Text-Filter dataset.
TucanoBR/XGBClassifier-text-filter

Updated 15 days ago

Note XGBClassifier trained on the TucanoBR/GigaVerbo-Text-Filter dataset (requires the embeddings generated by sentence-transformers/LaBSE).
TucanoBR/XGBRegressor-text-filter

Updated 15 days ago

Note XGBRegressor trained on the TucanoBR/GigaVerbo-Text-Filter dataset (requires the embeddings generated by sentence-transformers/LaBSE).
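
The two XGBoost filters above operate on sentence embeddings rather than raw text, so scoring is a two-step process: embed, then predict. The following is a minimal sketch of that flow for the classifier. The function name `filter_texts`, the 0.5 threshold, and the assumption that class index 1 means "keep" are all illustrative and are not taken from the model cards. In real use, `embed` would be the `encode` method of a `SentenceTransformer("sentence-transformers/LaBSE")` instance and `classifier` would be the XGBClassifier loaded from the repository.

```python
from typing import Callable, List, Sequence


def filter_texts(
    texts: Sequence[str],
    embed: Callable[[Sequence[str]], Sequence[Sequence[float]]],
    classifier,
    threshold: float = 0.5,
) -> List[str]:
    """Keep the texts whose predicted probability for class 1 reaches `threshold`.

    `classifier` is any object with a scikit-learn style `predict_proba`
    method that returns one [p_class0, p_class1] row per input.
    """
    if not texts:
        return []
    embeddings = embed(texts)                      # step 1: text -> embedding vectors
    probabilities = classifier.predict_proba(embeddings)  # step 2: embeddings -> scores
    return [
        text
        for text, row in zip(texts, probabilities)
        if row[1] >= threshold                     # assumption: index 1 is the "keep" class
    ]
```

The regressor variant would follow the same pattern, with `predict` returning one score per text instead of class probabilities.
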
TucanoBR/GigaVerbo

Viewer • Updated 15 days ago • 145M • 551 • 2

Note GigaVerbo is a 780 GB Portuguese text dataset containing over 200 billion tokens, built by concatenating several datasets available on Hugging Face.
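
As a rough sanity check on the two figures in the GigaVerbo note, the short calculation below estimates the average number of bytes per token. It assumes that "GB" means 10^9 bytes and treats the token count as exactly 200 billion. Since the note says "over 200 billion", the result is best read as an upper bound.

```python
# Figures quoted in the note; the decimal interpretation of "GB" is an assumption.
size_bytes = 780 * 10**9      # 780 GB of text
token_count = 200 * 10**9     # "over 200 billion" tokens, taken at the lower bound

bytes_per_token = size_bytes / token_count
print(f"at most about {bytes_per_token:.1f} bytes per token")
```
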
TucanoBR/GigaVerbo-Text-Filter

Viewer • Updated 15 days ago • 110k • 54

Note GigaVerbo Text-Filter is a dataset with 110,000 randomly selected samples from 9 subsets of GigaVerbo, all scored by GPT-4o.
TucanoBR/Tucano-SFT

Viewer • Updated 15 days ago • 680k • 75

Note This is the dataset used to train the "Instruct" versions of the Tucano series.
TucanoBR/lambada-pt

Viewer • Updated 21 days ago • 5.15k • 23 • 2

Note This dataset is a Portuguese translation of the LAMBADA test split as pre-processed by OpenAI.
TucanoBR/alpaca-eval-pt

Viewer • Updated 17 days ago • 805 • 48

Note This dataset contains 805 samples from the Alpaca dataset, translated into Portuguese.
nicholasKluge/reward-aira-dataset

Viewer • Updated Jun 18 • 70k • 107 • 3

Note This dataset contains pairs of completions to prompts. Used for DPO fine-tuning.
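
To illustrate how pairs of completions are used in DPO (Direct Preference Optimization), the sketch below computes the standard DPO loss for a single prompt with one preferred and one rejected completion. It is not the training code used for Tucano. The function name, argument names, and the default `beta = 0.1` are assumptions chosen for illustration. The inputs are summed log-probabilities of each completion under the model being trained (the policy) and under a frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favours each completion than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # Numerically stable evaluation of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the preferred completion relative to the reference.
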