Token Configuration Correction

#2
Cognitive Computations org
edited Jun 17

Good afternoon!

When deploying this model locally, I discovered what I believe to be a bug in the configuration of tokenizer_config.json and config.json.

tokenizer_config.json: The "bos_token" is set to null when I believe it should be set to "<|im_start|>".

config.json: The "bos_token_id" is not set, but I believe it should be set to 151643 for the token "<|im_start|>".
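For anyone who wants to try the change on a local copy first, here is a minimal sketch of the two edits. The helper name apply_bos_fix and the directory layout are assumptions for illustration only; the token and ID are simply the values proposed above, so it is worth confirming the ID against the tokenizer's vocabulary before relying on it.

```python
import json
from pathlib import Path

# Values as proposed in this PR. Confirm the ID against the model's
# vocabulary before relying on it; it is not verified by this script.
PROPOSED_BOS_TOKEN = "<|im_start|>"
PROPOSED_BOS_TOKEN_ID = 151643


def apply_bos_fix(model_dir, bos_token=PROPOSED_BOS_TOKEN,
                  bos_token_id=PROPOSED_BOS_TOKEN_ID):
    """Hypothetical helper: write the proposed BOS settings into the
    tokenizer_config.json and config.json of a local model directory."""
    model_dir = Path(model_dir)
    updates = {
        "tokenizer_config.json": {"bos_token": bos_token},
        "config.json": {"bos_token_id": bos_token_id},
    }
    for filename, changes in updates.items():
        path = model_dir / filename
        data = json.loads(path.read_text(encoding="utf-8"))
        data.update(changes)  # overwrite or add only the BOS keys
        path.write_text(
            json.dumps(data, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
```

Calling apply_bos_fix("path/to/local/model") rewrites both files in place and leaves every other key untouched, so keeping a backup of the originals is a good idea.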

I've included the proposed changes in this pull request. The same issue also affects cognitivecomputations/dolphin-2.9.2-qwen2-7b. I'll submit another PR there if we are in agreement here.

Thank you for your contributions; I love these finetunes.

Warm regards,
Ben

bigstorm changed pull request status to open
Cognitive Computations org

cognitive contributions. Thanks for the PR - I added special tokens when training the model when I didn't need to. Old habits die hard.

Crystalcareai changed pull request status to merged
