Tokenizer Chat Template

#103
by Sm1Ling

Why does the model have the default Hugging Face chat template and not the Llama 3 special template?

In the configs it is given as:
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

Instead of

{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}

Also, I'm not sure about <|eot_id|>. It seems everything got mixed up.
I don't really know whether this issue was fixed. (I still don't have access to the original repo.)
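
For anyone who can load the tokenizer, here is a minimal sketch to check which template actually renders (the meta-llama/Meta-Llama-3-8B-Instruct repo id is an assumption; substitute whichever checkpoint you are using):

```python
from transformers import AutoTokenizer

# Repo id is an assumption; replace with your checkpoint.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "Hello!"}]
rendered = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(rendered)
# ChatML markers (<|im_start|> ... <|im_end|>) mean the default template is in
# effect; <|start_header_id|> ... <|eot_id|> mean the Llama 3 template is used.
```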

Correct. We don't know yet whether this issue is fixed. We need communication from Meta.

Use `chat_template` instead of `default_chat_template`.

It seems that's because the Llama 3 tokenizer_config.json they distributed is configured with "tokenizer_class": "PreTrainedTokenizerFast", which only uses the default_chat_template.
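
Until the config is fixed upstream, one workaround (a sketch, assuming the Llama 3 template quoted above is the one you want) is to assign the template string to the tokenizer's chat_template attribute, which takes precedence over default_chat_template:

```python
from transformers import AutoTokenizer

# The Llama 3 template quoted above, as a Python string.
LLAMA3_TEMPLATE = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + "
    "'<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)

# Repo id is an assumption; replace with your checkpoint.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer.chat_template = LLAMA3_TEMPLATE  # instance attribute wins over the class default

messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

Calling tokenizer.save_pretrained(...) afterwards should also persist the template as a chat_template field in tokenizer_config.json, so the fix survives reloads.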
