Endless generation
First, thank you for creating this lorablated model.
I'm using TabbyAPI with a 6-bit quant and 8-bit head weights on this 70B. With either the `tokenizer_config.json` chat template or a custom generated one, I'm seeing endless generation, with `assistant` periodically appearing between what look like complete LLM responses.
Adding the stop word `assistant` (without any surrounding spaces) seems to stop the endless generation. Is this the expected fix, or do you have any thoughts on how or why this may be happening?
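For reference, this workaround can be expressed as a stop string in the OpenAI-compatible completion request that TabbyAPI accepts (field values here are illustrative, not the exact request I sent):

```json
{
  "model": "llama-3.1-70b-lorablated",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stop": ["assistant"]
}
```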
I've added a `generation_config.json`, which might fix your problem if TabbyAPI relies on it. If you still see this issue, could you try the non-lorablated L3.1 70B and tell me whether it works for you?
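The relevant part of a `generation_config.json` is the list of end-of-sequence token IDs, which lets the backend stop on `<|eot_id|>` instead of only `<|end_of_text|>`. A minimal sketch, assuming the standard Llama 3.1 special-token IDs (the actual file may contain additional sampling defaults):

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128008, 128009]
}
```

Here 128001 is `<|end_of_text|>`, 128008 is `<|eom_id|>`, and 128009 is `<|eot_id|>`; if the backend only honors the first of these, the model keeps generating past the end of its turn, which would produce exactly the leaked `assistant` headers described above.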
The `generation_config.json` seems to have fixed the issue. Thank you for the assistance.
The original L3.1 70B works without issue.
Excellent, thanks for your feedback!