ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16 · Repetitive generation without additional EOS token

Jun 24

Hi! With the supplied generation_config, the model generates indefinitely and repeats itself in a chat setting, because its EOS token, '<|end_of_text|>' (id 128001), is rarely generated. It should work better if <|eot_id|> (id 128009) is added, since that token is generated at the end of every chat response.

Here's the config Meta supplies, which covers both cases:

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "do_sample": true,
  "temperature": 0.6,
  "max_length": 4096,
  "top_p": 0.9,
  "transformers_version": "4.40.0.dev0"
}
```

https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16/blob/main/generation_config.json
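To illustrate why a single stop id of 128001 leads to runaway, repetitive output, here is a toy, model-free sketch of a decoding loop that stops on any id in a stop set. The "model" is a hypothetical token stream that ends every reply with <|eot_id|> and never emits <|end_of_text|>; the content ids 101-103 and the function name generate_until_stop are made up for illustration and are not part of any library.

```python
from itertools import cycle
from typing import Iterable, List

# Llama 3 special-token ids, matching Meta's generation config above
END_OF_TEXT = 128001  # <|end_of_text|>
EOT_ID = 128009       # <|eot_id|>


def generate_until_stop(
    token_stream: Iterable[int], eos_token_ids: List[int], max_new_tokens: int
) -> List[int]:
    """Toy decoding loop: emit tokens until one is in eos_token_ids or the length limit is hit."""
    stop = set(eos_token_ids)
    out: List[int] = []
    for tok in token_stream:
        if len(out) >= max_new_tokens:
            break  # length limit reached without seeing a stop token
        out.append(tok)
        if tok in stop:
            break  # stop token emitted: end of this response
    return out


# Hypothetical "model": each chat reply ends with <|eot_id|>, then it keeps going.
# 101-103 are made-up content token ids; <|end_of_text|> never appears.
reply = [101, 102, 103, EOT_ID]

only_end_of_text = generate_until_stop(cycle(reply), [END_OF_TEXT], max_new_tokens=20)
both = generate_until_stop(cycle(reply), [END_OF_TEXT, EOT_ID], max_new_tokens=20)

print(len(only_end_of_text))  # 20: the reply repeats until max_new_tokens
print(both)                   # [101, 102, 103, 128009]
```

In transformers, the corresponding change should be to list both ids in eos_token_id, either in generation_config.json or as the eos_token_id keyword argument to model.generate; the loop above only shows the stopping logic, not how the library implements it.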

BlackSamorez

IST Austria Distributed Algorithms and Systems Lab org Jun 24

Isn't <|eot_id|> set as eos_token already? Look here.

amrothemich

Jun 25

My understanding (guesswork; I haven't looked at the code or documentation) is that the generation config specifies the EOS token separately, so that generation knows when to stop. In this model's generation_config it's set to 128001, which is almost never generated. The tokenizer has the real EOS token, so it knows what to append to a tokenized sequence, but generation needs the more "intermediate" stop token that marks the end of a single response (not necessarily the end of the whole conversation).
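If that reading is right, the fix belongs in the generation config rather than the tokenizer. A minimal sketch of patching a local copy of generation_config.json so that eos_token_id also contains <|eot_id|>, assuming the field may be stored either as a single int (such as 128001) or as a list; the helper names normalize_eos, add_stop_token and patch_file are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

EOT_ID = 128009  # <|eot_id|> in the Llama 3 tokenizer


def normalize_eos(value: Union[int, List[int], None]) -> List[int]:
    """eos_token_id may be missing, a single int, or a list; always return a new list."""
    if value is None:
        return []
    if isinstance(value, int):
        return [value]
    return list(value)


def add_stop_token(config: Dict[str, Any], token_id: int = EOT_ID) -> Dict[str, Any]:
    """Return a copy of a generation config whose eos_token_id list also contains token_id."""
    patched = dict(config)
    eos = normalize_eos(config.get("eos_token_id"))
    if token_id not in eos:
        eos.append(token_id)
    patched["eos_token_id"] = eos
    return patched


def patch_file(path: Path) -> None:
    """Rewrite a local generation_config.json in place (hypothetical helper)."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(add_stop_token(config), indent=2) + "\n")


print(add_stop_token({"bos_token_id": 128000, "eos_token_id": 128001}))
# {'bos_token_id': 128000, 'eos_token_id': [128001, 128009]}
```

The model would need to be reloaded from the patched directory for the new stop ids to take effect; the original config dict is left unmodified.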