BiniyamAjaw/amharic_tokenizer

Amharic Tokenizer

Model Details

Vocabulary Size: 100,000
Tokenizer Type: Byte-Pair Encoding (BPE)

Model Description

Developed by: Biniyam Ajaw
Language(s) (NLP): Amharic and Amharic-Driven Languages
License: MIT

Model Sources

Repository: https://github.com/biniyam69/Amharic-LLM-Finetuning/

Uses

The tokenizer can be loaded with the AutoTokenizer class from the transformers package and used to tokenize Amharic text.
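As a minimal sketch of that usage, the snippet below loads the tokenizer by its Hub repository id and inspects how one string is split. It assumes the repository id is BiniyamAjaw/amharic_tokenizer, that the repository ships files AutoTokenizer can read, and that the transformers package is installed. The helper names load_tokenizer, describe, and demo are hypothetical and are not part of this repository.

```python
from typing import Any, Dict

# Assumption: the Hub repository id of this tokenizer.
REPO_ID = "BiniyamAjaw/amharic_tokenizer"


def load_tokenizer(repo_id: str = REPO_ID) -> Any:
    """Download and load the tokenizer from the Hugging Face Hub (needs network access)."""
    # Imported lazily so describe() can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def describe(tokenizer: Any, text: str) -> Dict[str, Any]:
    """Return the subword tokens, their ids, and the decoded round trip for one string."""
    tokens = tokenizer.tokenize(text)
    # add_special_tokens=False keeps only the ids produced from the text itself.
    ids = tokenizer.encode(text, add_special_tokens=False)
    decoded = tokenizer.decode(ids)
    return {"tokens": tokens, "ids": ids, "decoded": decoded, "n_tokens": len(ids)}


def demo() -> None:
    """Hypothetical usage: load the tokenizer and print what it does to a short greeting."""
    tok = load_tokenizer()
    info = describe(tok, "ሰላም")  # a common Amharic greeting
    print("tokens: ", info["tokens"])
    print("ids:    ", info["ids"])
    print("decoded:", info["decoded"])
    # The card states a vocabulary size of 100,000; this prints what the loaded object reports.
    print("vocab:  ", len(tok))
```

Calling demo() requires network access on first use. Comparing the decoded string with the input is a quick way to check how faithfully a given text round-trips, rather than assuming it does.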
