|
To avoid overflows under |
|
fp16 the activations must remain way below 1e4, because 1e4 * 1e4 = 1e8 so any matrix multiplication with |
|
large activations is going to lead to a numerical overflow condition. |
|
At the very start of the trace you can discover at which batch number the problem occurred (here Detected inf/nan during batch_number=0 means the problem occurred on the first batch). |
|
Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting |
|
for. |