Disabled autocast
In lines 306 and 307 of `modeling_phi.py`, autocast is disabled due to overflow issues when using fp16. Bfloat16 does not have these issues, so it should not be disabled in that case, right?
Also, wasn't Phi-2 trained in mixed-precision fp16? Why wasn't this an issue during training, but it seems to be during inference?
Same question: can we simply enable autocast?
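(For context on the overflow point: fp16 saturates around 65504, while bfloat16 keeps float32's exponent range, so large attention logits overflow only in fp16. A quick illustration:)

```python
import torch

# fp16 tops out around 65504, so large attention logits overflow to inf;
# bf16 shares float32's exponent range and only loses mantissa precision.
x = torch.tensor([70000.0])
print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16)
print(x.to(torch.bfloat16))  # tensor([70144.], dtype=torch.bfloat16)
```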
@xueyanz Actually, I do think it is needed, even when using bfloat16. For some reason, autocasting the forward of the attention module leads to instability issues in training, so I will keep autocast disabled.
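(For anyone wondering what "disabling autocast" looks like mechanically: it is just a nested autocast region with `enabled=False` around the attention math, so those ops run in the module's own fp32 precision. A minimal sketch with an illustrative module, not the actual PhiAttention code:)

```python
import torch

class Attention(torch.nn.Module):
    # Illustrative attention module, not the real modeling_phi.py implementation.
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Opt this region out of any surrounding autocast so the matmuls
        # below run in fp32 regardless of the outer mixed-precision context.
        with torch.autocast(device_type=x.device.type, enabled=False):
            q, k, v = self.qkv(x.float()).chunk(3, dim=-1)
            scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
            return torch.softmax(scores, dim=-1) @ v
```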
Thanks so much for your prompt reply. I am trying to train Phi-2 inside a VLM using autocast. To disable autocast, do you manually cast to fp16?
I load it in fp32 and use `torch.amp` with bfloat16. The latest version of `modeling_phi.py` already disables autocast by itself in the forward method of the attention module. I'm actually also building a VLM using Phi-2; would you care to explain what you are doing at a high level? Also, MoE-LLaVA states that there are training instabilities when using Phi-2, maybe because they used a past version of the `modeling_phi.py` file.
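(Concretely, that setup looks roughly like the snippet below; a sketch assuming the standard transformers API, with the training loop details omitted:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Phi-2 in full fp32; the attention module opts itself out of autocast.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float32, trust_remote_code=True
).cuda()

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")

# Mixed precision via torch.amp: most ops run in bf16, excluded regions in fp32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(**inputs, labels=inputs["input_ids"]).loss
loss.backward()
```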
I am using the Inference API on Hugging Face. I attempted to load and access the model using the Hugging Face production endpoints, but gave up after several attempts. Are there any special settings or configs I need to be aware of to enable this on a private Hugging Face Inference API? Would help heaps.
I will not be training the language model, so I simply enabled autocast to check the performance, and the inference results seem reasonable. I am building a VLM along the lines of my past work.
OK, it seems that autocasting makes the outputs NaN even during evaluation.
Edited: never mind, I figured out how to use float16 & bfloat16 without needing autocast. Thank you.
Care to share what you did? Thanks.
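(For anyone else landing here: the usual way to run in half precision without autocast is to load the weights directly in the target dtype. A sketch assuming the transformers API, and not necessarily what the poster above did:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
# Load the weights directly in bf16 (or torch.float16); no autocast involved.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```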