The model does not follow the instruction.

#140 by mehdiorouji

I am trying to fine-tune the model on instructions from my own data with LoRA. Here is the format of the input I fine-tune the model on. In the instruction I ask the model to answer the question using only one word, 'Yes' or 'No'.
Instruction: ....
context: ....
question: ....
answer: Yes.
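
For reference, here is a minimal sketch of how each training example is assembled into that format (the field names and sample values are just placeholders for my own data):

```python
# Minimal sketch: turn one record into the prompt format above.
# Field names and the sample values are placeholders for my own data.
def build_prompt(example: dict) -> str:
    return (
        f"Instruction: {example['instruction']}\n"
        f"context: {example['context']}\n"
        f"question: {example['question']}\n"
        f"answer: {example['answer']}"
    )

sample = {
    "instruction": "Answer the question using only one word, 'Yes' or 'No'.",
    "context": "The store closes at 9 pm on weekdays.",
    "question": "Is the store open at 10 pm on a Tuesday?",
    "answer": "No.",
}
print(build_prompt(sample))
```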

However, the model does not follow the format or the instruction. The answer it generates is nonsense most of the time, and when it is sensical it contains all sorts of random characters alongside the 'yes' or 'no' answer (for example it outputs: 'Answer:nbsp;Yes.'). Things I have done so far to try to fix the issue:

  1. Trying different prompts.
  2. Applying LoRA to different layers (right now it is applied to the key, value, and query layers; see the sketch after this list).
  3. Reducing the temperature to avoid hallucination.
  4. Limiting the number of new tokens generated to a small value.
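
Here is a rough sketch of points 2-4 (the r, alpha, and dropout values are illustrative, not what I settled on):

```python
# Sketch of points 2-4: LoRA on the query/key/value projections plus
# conservative decoding settings. Hyperparameter values are illustrative.
from peft import LoraConfig
from transformers import GenerationConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # query, key, and value layers
    task_type="CAUSAL_LM",
)

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,    # point 3: low temperature to avoid hallucination
    max_new_tokens=4,   # point 4: a few tokens are enough for "Yes." or "No."
)
```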

Any advice on how to solve the issue is much appreciated.

Thank you so much!

Try using the regular model instead of the "instruct" version. The instruct version is tuned to answer prompts like a Q&A bot. For something that can auto-complete like your example, use meta-llama/Llama-3.1-8B.
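
Something like this rough sketch (the prompt content is just a placeholder):

```python
# Rough sketch of completion-style prompting with the base checkpoint,
# which continues the text instead of replying like a chat assistant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # base model, not the instruct version
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Instruction: Answer the question using only one word, 'Yes' or 'No'.\n"
    "context: The store closes at 9 pm on weekdays.\n"
    "question: Is the store open at 10 pm on a Tuesday?\n"
    "answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```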

I don't think the issue is whether you use the Llama 3 base model or the instruct model. My recommendation is to fine-tune the instruct version so that your data fits its original instruction-tuning prompt format. You can fine-tune the base model with any prompt format you like, but it's quite important to fine-tune the instruct version with its prefixed prompt format.
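
For example, a minimal sketch of rendering one training example with the instruct model's own chat template (the checkpoint name and example content are just placeholders):

```python
# Minimal sketch: render one example with the instruct model's chat template
# so the fine-tuning data matches the prompt format it was originally tuned on.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": (
        "Answer the question using only one word, 'Yes' or 'No'.\n"
        "context: The store closes at 9 pm on weekdays.\n"
        "question: Is the store open at 10 pm on a Tuesday?"
    )},
    {"role": "assistant", "content": "No."},
]

# For training, render the whole conversation; at inference time, drop the
# assistant turn and pass add_generation_prompt=True instead.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```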

Thank you so much for your insights! I agree that maintaining consistency between the format of the training prompts and the test prompts is important. For the test prompts, I am using the same structure, with the only difference being that the answer field is left blank for the model to generate its prediction. I also realized that ending the instruction, context, etc. with a full stop helps! The model has become much better at following the format, but it still gives nonsensical answers sometimes.
