Model Card for Model ID
A tokenizer for LLMs with an extended Hindi vocabulary, trained starting from the TinyLlama tokenizer. Updated vocabulary size: 35K.
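This card does not say how the Hindi pieces were merged into the TinyLlama vocabulary. The TinyLlama base vocabulary has 32,000 tokens. One common approach, which we assume here, is to train a separate Hindi subword model on the corpus and then append only the pieces the base vocabulary lacks. This keeps every existing token ID stable. The sketch below shows that merge step on toy dictionaries. The function `extend_vocab` and the sample pieces are hypothetical. The real tokenizer is a SentencePiece/Hugging Face tokenizer, not a plain dict.

```python
from typing import Dict, Iterable


def extend_vocab(base_vocab: Dict[str, int], new_pieces: Iterable[str]) -> Dict[str, int]:
    """Return a copy of base_vocab with unseen pieces appended after the last ID.

    Hypothetical helper (assumed merge strategy, not the authors' confirmed method):
    existing token IDs are left unchanged, so only new IDs need new embedding rows.
    """
    merged = dict(base_vocab)  # copy so the base vocabulary is not mutated
    next_id = max(merged.values(), default=-1) + 1
    for piece in new_pieces:
        if piece not in merged:  # skip pieces the base tokenizer already has
            merged[piece] = next_id
            next_id += 1
    return merged


# Toy example with a hypothetical 4-token base vocabulary.
base = {"<unk>": 0, "<s>": 1, "</s>": 2, "▁the": 3}
hindi_pieces = ["▁है", "▁के", "▁the", "▁में"]  # "▁the" is a duplicate and is skipped
extended = extend_vocab(base, hindi_pieces)
print(len(extended))     # 7
print(extended["▁में"])  # 6
```

If the model is then trained with the extended tokenizer, its embedding matrix must grow to match. In Hugging Face `transformers` this is typically done with `model.resize_token_embeddings(len(tokenizer))`.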
Model Details
Model Description
- Developed by: Atharva Nighot, Shreyas Joshi, Mahek Bagde, and team
- Model type: Tokenizer
- Language(s) (NLP): Primarily Hindi and English
- Finetuned from model (base tokenizer): TinyLlama/TinyLlama-1.1B-Chat-v1.0
Training Data
Trained on the harshitkaran/Hindi dataset.