Gradient is NaN when fine-tuning PyTorch model
I ran into a problem while fine-tuning the Longformer Large PyTorch model: the gradients become NaN during training. When I run the same fine-tuning setup with the Longformer Base model, everything works without issue.
I am unsure what could be causing this problem with the Longformer Large model specifically. I have made sure that my data is free of NaN values, and I have checked that I am using the correct version of the Longformer tokenizer.
If anyone has any suggestions or has encountered a similar issue while fine-tuning the Longformer Large model, please let me know. I would be grateful for any assistance or insights.
Thank you in advance for your help.
It can mean your learning rate is too large for this specific model. Try dropping it by a factor of 10 or more.
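For example, if you were fine-tuning with AdamW at the learning rate that worked for the base model, the change is just this (a sketch; the checkpoint name is the standard Longformer Large one and the numbers are only illustrative):

```python
import torch
from transformers import LongformerForSequenceClassification

model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-large-4096"
)

# Drop the learning rate ~10x relative to what worked for the base model;
# larger models often need a smaller lr to keep early gradients stable.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-6)  # e.g. was 2e-5
```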
@willieseun did you resolve the problem?
Nope
@willieseun Thanks for the response! In my case, I just kept training (even without reducing the lr); on the steps where the gradients were non-NaN they gradually got smaller, and the problem eventually resolved itself.
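If you'd rather not let the occasional NaN step touch the weights while you wait it out, you can skip those updates. A rough sketch (the helper is illustrative, and a standard PyTorch training loop is assumed):

```python
import torch

def grads_are_finite(model):
    """Return True only if every gradient in the model is finite (no NaN/Inf)."""
    return all(
        torch.isfinite(p.grad).all()
        for p in model.parameters()
        if p.grad is not None
    )

# Inside the training loop (sketch):
# loss.backward()
# if grads_are_finite(model):
#     optimizer.step()       # apply the update only on clean steps
# optimizer.zero_grad()      # always clear, so a bad step is fully discarded
```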
Alright. Thanks for the update
I had a similar issue, and switching from fp16 to bf16 mixed precision fixed it for me.
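Concretely, that just means autocasting to torch.bfloat16 instead of torch.float16. A minimal self-contained sketch with a stand-in model (bf16 needs a capable GPU, e.g. Ampere or newer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda"
model = nn.Linear(16, 2).to(device)  # stand-in for the Longformer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-6)
x = torch.randn(4, 16, device=device)
target = torch.randint(0, 2, (4,), device=device)

optimizer.zero_grad()
# bf16 has the same exponent range as fp32, so activations that overflow
# to inf/NaN under fp16 autocast stay finite under bf16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = F.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
```

(If you're on the Hugging Face Trainer instead of a hand-written loop, passing `bf16=True` rather than `fp16=True` in `TrainingArguments` does the same thing.)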