If your model doesn't work well with mixed precision (for example, because it wasn't pretrained in mixed precision), values may overflow or underflow in the lower-precision format, which can cause a NaN loss.
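To make the failure mode concrete, here is a minimal sketch that emulates fp16 (IEEE 754 half precision) storage with Python's standard-library `struct` module. The helper `to_fp16` is hypothetical and only for illustration. It is not how PyTorch or Transformers implement mixed precision. It shows three things: a large value overflowing to `inf`, a tiny value underflowing to `0.0`, and an `inf` turning into `nan` through ordinary arithmetic.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to fp16 and back (hypothetical illustration helper).

    Uses the struct "e" (half precision) format. CPython raises OverflowError
    for finite values too large for fp16, so that case is mapped to a signed
    infinity, which is what IEEE round-to-nearest arithmetic yields on overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# Overflow: fp16's largest finite value is 65504, so a larger activation becomes inf.
big = to_fp16(70000.0)

# Underflow: values much smaller than ~6e-8 (the smallest fp16 subnormal) round to 0.0,
# so a gradient this small would silently vanish.
tiny = to_fp16(1e-8)

# Once an inf appears, ordinary arithmetic such as inf - inf (or inf * 0) yields NaN,
# and the NaN then propagates into the loss.
loss = to_fp16(big - big)

print(big, tiny, loss)  # inf 0.0 nan
```

Real training computes in float32 and casts selected operations to fp16, so the exact point where values blow up depends on the model. The arithmetic behind the NaN is the same, though.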