Tokenizer? (#20, opened by LoadingALIAS)
I'm wondering which tokenizer was used for StarCoder2-15B. I know v1 used the GPT-2 (50k) tokenizer, and I'm hoping this one uses the GPT-3.5/GPT-4 (100k) tokenizer. Can anyone answer this for me?
Thank you!
@LoadingALIAS You can find the tokenizer configuration at https://huggingface.co/bigcode/starcoder2-15b/blob/main/tokenizer_config.json. It does not use the GPT-4 tokenizer.
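If you'd rather check this locally than read the JSON, here is a minimal sketch, assuming the `transformers` library is installed and the repo is reachable from your machine (you may need to be logged in to the Hub). The helper names `summarize_tokenizer` and `load_and_summarize` are made up for this example and are not part of any library.

```python
def summarize_tokenizer(tokenizer):
    """Return the class name and vocabulary sizes of an already-loaded tokenizer.

    Works with any object that exposes `vocab_size` and `__len__`,
    which Hugging Face tokenizers do.
    """
    return {
        "class": type(tokenizer).__name__,
        # Size of the base vocabulary, without tokens added afterwards.
        "base_vocab_size": tokenizer.vocab_size,
        # Size including any added/special tokens.
        "size_with_added_tokens": len(tokenizer),
    }


def load_and_summarize(repo_id="bigcode/starcoder2-15b"):
    """Download the tokenizer for `repo_id` from the Hub and summarize it."""
    # Imported here so summarize_tokenizer stays usable without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return summarize_tokenizer(tokenizer)
```

Calling `load_and_summarize()` fetches only the tokenizer files, not the model weights, and the reported sizes can be compared against the roughly 100k entries of the GPT-4 tokenizer.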