The model produces nonsense

#4
by Pkoosha - opened

I have the model running on 4 A100-80GB GPUs. Upon asking a simple question or giving it a sentence to complete, it generates nonsense, and LangChain indicates that the context window is 4096. I have tried queries both shorter and longer than the 4096-token context window but still get nonsense.

Example output:
"" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "" " " " " " " " " " " " " " " " " " "" " " " " " " "" " " " " " " " " "" " " " " "" " " "" " " "" " " " "" " " " "" " " "" " " "" " " " "" "" "" " " "" " "" "" " "" " "" "" " "" "" "" " "" "" "" " "" "" "" " "" " "" "" "" "" "" " "" "" "" " " " "" "" "" " "" " "" " " "" " " " " " "" "" " " "" " " "" "" " "" "" "" " "" "" " "" " "" " "" " " "" "" "" " " " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" """ "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" """ "" "" "" "" "" """ "" """ "" "" "" "" """" "" """" "" """""" """"""" """" """""" """" """"" """"" """""" """"" """" """" """"""" """"""" """"" """"""" """"" """"" "" """" """" """"" """""" "" """"" """"""" """" """"""" """"""" """""""" """""""" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"""a"a"""a"a""""a""""""a"""""a""a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a"a""a"a"a"a"a""a"a"a""a"a"a"a"a"a""a"a""""a""""""""""a"a"""""a"a""""a""""a"a"""""""""""""""""a"a""a"a""a"a""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""a"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""a""""""""""""""""""""""""""

Hi,

I think the reason is that this model has been supervised fine-tuned with our prompt format.
It would be better if you ask questions following the prompt format used during supervised fine-tuning, like:
"Below is a material. Memorize the material and answer my question after the material. \n {material} \n Now the material ends. {question}"
https://github.com/dvlab-research/LongLoRA/blob/5056749a37833c1303129ddff3fde6ee26dfe86f/demo.py#L161
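As a rough sketch of what that looks like in code (not taken verbatim from demo.py; the material and question strings are placeholders), the prompt could be assembled like this:

# Sketch: assemble the prompt in the supervised fine-tuning format quoted above.
# material and question are placeholders for your own long document and query.
material = "(long document text goes here)"
question = "(your question goes here)"
prompt = (
    "Below is a material. Memorize the material and answer my question after the material.\n"
    f"{material}\n"
    f"Now the material ends. {question}"
)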

Regards,
Yukang Chen

Thanks @Yukang.

LongLoRA appears to be a great piece of work. My suspicion, however, is that this prompt format will result in low usage because it is uncommon.

I would recommend using either:
A. The Meta Llama prompt format, with [INST] and [/INST] (see the sketch after this list)
B. Guanaco style (i.e. ### Assistant, ### Human)
C. ChatML format (I find this difficult to use)
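For reference, option A would look roughly like this. This is a sketch of the standard Llama-2 chat wrapping, shown only for comparison; the current model was not fine-tuned on it:

# Hypothetical example of the Meta Llama-2 chat prompt format (option A above).
# The tokenizer normally adds the leading BOS token, so it is omitted here.
system = "You are a helpful assistant."
user_msg = "What colour are the wheels on the bus?"
prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_msg} [/INST]"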

Separately, just a nitpick, but it may be important. Better English for the prompt would be:

"Below you will find material to study. Please read and memorize this material, as you will be asked a question concerning it afterward.
{material}
The material concludes here. {question}"

In English, it's unusual to say "a material".

I've just tested out this model and am not getting correct output.

Reproduction:

prompt = "Below is a material. Memorize the material and answer my question after the material.\n The wheels on the bus are orange. \n Now the material ends. What colour are the wheels on the bus?"

tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    do_sample=False,
    max_new_tokens=512
)

print(tokenizer.decode(generation_output[0], skip_special_tokens=True).strip())

The output is just blank space (removed by the .strip() function):

Below is a material. Memorize the material and answer my question after the material.
 The wheels on the bus are orange. 
 Now the material ends. What colour are the wheels on the bus?

Hi,

Thanks for your suggestions on the prompt format. We are trying to improve the SFT model these days.

I think the reason is that the material is too short. The model has been tuned to fit long inputs. Please refer to this.

https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft/discussions/3#6519f285b239dc7a340b17c3
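If short inputs are indeed the problem, one quick sanity check is to pad the material out to several thousand tokens before asking the question. A rough sketch (the filler sentence and repetition count are arbitrary):

# Sketch: lengthen the material with filler so the prompt lands well inside the
# long-context regime the model was tuned for, then ask the same question.
filler = "This sentence is filler text used only to lengthen the context. " * 500
material = filler + "The wheels on the bus are orange."
prompt = (
    "Below is a material. Memorize the material and answer my question after the material.\n"
    f"{material}\n"
    "Now the material ends. What colour are the wheels on the bus?"
)
print("Prompt length in tokens:", len(tokenizer(prompt).input_ids))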

Regards,
Yukang Chen

Thanks @Yukang. I should have mentioned that I tested with a 5,000-token context. How much context do you think is needed for it to work? In what range will this model work?

Hi,

Many thanks for this discussion. We are preparing to release stronger models and the dataset next week. There will be more data, more question types, and a more general format compared with the one mentioned originally. The models should be better than "Llama-2-13b-chat-longlora-32k-sft". Thanks for your patience.

I will link to the new models here next week. We can discuss them then.

Regards,
Yukang Chen

Hi,

We have released our data for long instruction following, LongAlpaca-12k, and the updated models, LongAlpaca-7B/13B/70B. They are available at the links below. These models should be much better than the original SFT models.

https://huggingface.co/datasets/Yukang/LongAlpaca-12k
https://huggingface.co/Yukang/LongAlpaca-7B
https://huggingface.co/Yukang/LongAlpaca-13B
https://huggingface.co/Yukang/LongAlpaca-70B-lora
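A minimal loading sketch for the merged checkpoints above (the 70B release is a LoRA adapter and would additionally need the base model plus peft); the dtype and device_map choices are just assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load one of the full-weight LongAlpaca checkpoints listed above.
tokenizer = AutoTokenizer.from_pretrained("Yukang/LongAlpaca-13B")
model = AutoModelForCausalLM.from_pretrained(
    "Yukang/LongAlpaca-13B",
    torch_dtype=torch.float16,  # assumption: fp16 to fit on A100-class GPUs
    device_map="auto",          # assumption: shard across available GPUs
)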

Regards,
Yukang Chen

Great, could you clarify in the docs and repos what the context length is for these models? I see the Alpaca set is 12k, but does it also include your longer data?

Also worth highlighting that these will be licensed for non-commercial use due to Alpaca.

Last thought: if you fine-tune any further models, it's worth considering the Llama 2 prompt format ([INST], etc.), as compatibility there makes your models more plug-and-play for new users.

Thanks for your reminder.

12k means there are 12k QA pairs, not the context length. We will clarify this point.

For the license, we follow Alpaca's non-commercial license and state this in the last section of the README.md. We will highlight this.

Thanks for your suggestion on the prompt. We are trying this.
