Amharic Tokenizer

Model Details

  • Vocabulary Size: 100,000
  • Tokenizer Type: Byte-Pair Encoding (BPE)

Model Description

  • Developed by: Biniyam Ajaw
  • Language(s) (NLP): Amharic and Amharic-Driven Languages
  • License: MIT

Uses

The tokenizer can be loaded with the AutoTokenizer class from the transformers package and used to tokenize Amharic text.
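A minimal sketch of that usage follows. It assumes the repository id BiniyamAjaw/amharic_tokenizer shown on this card and that the repository contains the tokenizer files AutoTokenizer expects. The helper names (load_tokenizer, tokenize_text, main) and the sample sentence are illustrative and not part of the original card.

```python
REPO_ID = "BiniyamAjaw/amharic_tokenizer"  # repository id as listed on this card


def load_tokenizer(repo_id=REPO_ID):
    """Download (or read from the local cache) the tokenizer files."""
    # Imported lazily so the helper below works with any tokenizer-like object.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_text(text, tokenizer):
    """Return the subword tokens and their vocabulary ids for `text`."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return tokens, ids


def main():
    # Assumption: network access is available on the first call.
    tokenizer = load_tokenizer()
    sample = "ሰላም ለዓለም"  # illustrative sample sentence, not from the card
    tokens, ids = tokenize_text(sample, tokenizer)
    print(tokens)
    print(ids)
```

Calling main() fetches the tokenizer from the Hugging Face Hub and prints the tokens and ids for the sample sentence. The exact split depends on the learned BPE merges, so no expected output is shown here.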

Dataset used to train BiniyamAjaw/amharic_tokenizer