Today, April 26, is the Day of the Tatar Language! 🌟
To celebrate, we are releasing our new language model, Tweety Tatar 🐣
https://huggingface.co/Tweeties/tweety-tatar-base-7b-2024-v1
The model was converted from Mistral Instruct v0.2 using a novel technique called trans-tokenization. As a result, the model uses a brand-new tokenizer, fully tailored for the Tatar language.
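How trans-tokenization works exactly is left to the upcoming paper. As a rough illustration of one common idea behind giving a pretrained model a new tokenizer, the sketch below initializes each new token's embedding as a weighted average of existing source-model embeddings, using a token alignment between the old and new vocabularies. The helper `init_target_embeddings`, the toy vectors, and the alignment weights are all hypothetical. This is not presented as the authors' exact procedure.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def init_target_embeddings(
    source_emb: Dict[str, Vector],
    alignment: Dict[str, List[Tuple[str, float]]],
    dim: int,
) -> Dict[str, Vector]:
    """Hypothetical helper: build an embedding for each new (target) token as
    the weighted average of the source-model embeddings it is aligned to.

    Weights are normalized per target token. Source tokens missing from
    `source_emb` or with non-positive weight are ignored, and a target token
    with no usable alignment falls back to a zero vector.
    """
    target_emb: Dict[str, Vector] = {}
    for tgt_token, pairs in alignment.items():
        usable = [(s, w) for s, w in pairs if s in source_emb and w > 0]
        total = sum(w for _, w in usable)
        vec = [0.0] * dim
        for src_token, weight in usable:
            share = weight / total  # safe: total > 0 whenever usable is non-empty
            for i, x in enumerate(source_emb[src_token]):
                vec[i] += share * x
        target_emb[tgt_token] = vec
    return target_emb


# Toy 3-dimensional source embeddings (made-up values).
source_embeddings = {
    "water": [1.0, 0.0, 0.0],  # English token
    "вода": [0.0, 1.0, 0.0],   # Russian token
}

# Hypothetical alignment counts: Tatar "су" ("water") aligned 3x to "water", 1x to "вода".
alignment = {"су": [("water", 3.0), ("вода", 1.0)], "unseen": []}

new_embeddings = init_target_embeddings(source_embeddings, alignment, dim=3)
print(new_embeddings["су"])  # [0.75, 0.25, 0.0]
```

In practice the weights would come from some alignment between the two tokenizations (for example, statistics gathered over a parallel corpus); the numbers above only show the averaging step.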
We are also releasing a model that can be fine-tuned to translate English or Russian into Tatar, with performance similar to commercial offerings:
https://huggingface.co/Tweeties/tweety-tatar-hydra-base-7b-2024-v1
More details in our upcoming paper 👀
François REMY, Pieter Delobelle, Alfiya Khabibullina
Татар теле көне белән! (Happy Tatar Language Day!)