Question about tokenizer.model file
#2 by TungLam - opened
Thank you so much for releasing the wonderful model.
I have a question about tokenizer.model. I see that the tokenizer.model file here is larger than the one in Mixtral-8x7B-v0.1. Did you customize the tokenizer?
Btw, is it possible to read exactly what is in the tokenizer.model file? I would like to inspect its content (code or something else).
Thank you so much
Yes, we have added CJK tokens to the original file.
You may want to examine it with the sentencepiece library.
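For anyone following that suggestion, here is a minimal sketch of inspecting the file with the sentencepiece Python package. It assumes the file is a standard SentencePiece model (a serialized protobuf rather than code) saved locally as tokenizer.model. The helper names `is_cjk`, `cjk_pieces`, and `load_processor` are made up for illustration, and the Unicode ranges are only a rough approximation of "CJK", not necessarily the definition used when the tokens were added.

```python
import sys


def is_cjk(ch: str) -> bool:
    """Rough check: True if ch falls in a common CJK Unicode block (assumed ranges)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def cjk_pieces(sp):
    """Return (id, piece) pairs for vocabulary pieces containing a CJK character.

    `sp` is any object exposing get_piece_size() and id_to_piece(i),
    such as a sentencepiece.SentencePieceProcessor.
    """
    found = []
    for i in range(sp.get_piece_size()):
        piece = sp.id_to_piece(i)
        if any(is_cjk(c) for c in piece):
            found.append((i, piece))
    return found


def load_processor(path: str):
    """Load a SentencePiece model file (requires `pip install sentencepiece`)."""
    import sentencepiece as spm
    return spm.SentencePieceProcessor(model_file=path)


if __name__ == "__main__":
    # The default path is an assumption; pass the real location as an argument.
    model_path = sys.argv[1] if len(sys.argv) > 1 else "tokenizer.model"
    sp = load_processor(model_path)
    print("vocab size:", sp.get_piece_size())
    found = cjk_pieces(sp)
    print("pieces containing CJK characters:", len(found))
    for i, piece in found[:20]:
        print(i, piece)
```

Running the same script on the Mixtral-8x7B-v0.1 tokenizer.model and comparing the two vocabulary sizes should show how many pieces were added, which would account for the difference in file size.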
Thank you so much for your answer.
TungLam changed discussion status to closed