---
title: README
emoji: 🦜
colorFrom: gray
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
short_description: Description of the Mula project.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/62e1cc43926f4892a4ca2ff9/_1WxGqMpLN0RuX02Dq9Df.png
---
# Tucano: Advancing Neural Text Generation for Portuguese

An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black.

To stimulate the open development of neural text generation in Portuguese, we present **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and **[Tucano](https://huggingface.co/TucanoBR/Tucano-2b4)**, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on [GitHub](https://github.com/Nkluge-correa/Tucano) and Hugging Face. Read our preprint on [arXiv](https://arxiv.org/abs/2411.07854).
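
As a quick way to try one of the released checkpoints, the minimal sketch below continues a Portuguese prompt. It assumes the `TucanoBR/Tucano-2b4` repository linked above loads through the generic `AutoTokenizer` and `AutoModelForCausalLM` classes of the `transformers` library, and that `transformers` and PyTorch are installed. The helper name `generate_text` and its default of 50 new tokens are hypothetical choices made for illustration, not part of the project's code.

```python
MODEL_ID = "TucanoBR/Tucano-2b4"  # repository name taken from the model link above


def generate_text(prompt: str, max_new_tokens: int = 50) -> str:
    """Continue a Portuguese prompt with a Tucano checkpoint (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the checkpoint is compatible with the generic Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("A floresta amazônica é")` downloads the weights on first use, so expect a large download and a delay before the first result. For repeated calls, loading the tokenizer and model once outside the function would avoid reloading them each time.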