You can see that it was called from the dropout attribute inside the DenseReluDense class. We can see
that it happened in the first layer of the 2nd block, during the very first batch. Finally, the largest absolute
value among the input elements was 6.27e+04, while the same measure for the output was inf.
You can see here that T5DenseGatedGeluDense.forward produced output activations whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 65504 (roughly 64K).
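
To make the "very close to the limit" point concrete, here is a minimal sketch of how little headroom a value of about 62.7K leaves in fp16. It uses NumPy's float16 rather than the model's own tensors, and the 1.05 scale factor is an assumption chosen purely for illustration; it is not a value taken from the report.

```python
import numpy as np

# Largest finite value representable in IEEE half precision (fp16).
fp16_max = float(np.finfo(np.float16).max)

# The absolute max activation quoted in the report, stored as fp16.
activation = np.float16(6.27e4)

# Multiplicative headroom left before the value overflows.
headroom = fp16_max / float(activation)

# Illustrative assumption: a later op scales the activation up by about 5%.
with np.errstate(over="ignore"):  # silence NumPy's overflow warning
    scaled = activation * np.float16(1.05)

print(f"fp16 max: {fp16_max}")
print(f"headroom: {headroom:.3f}x")
print(f"scaled:   {scaled}")
```

With less than 5% of headroom, even a modest increase in magnitude in a following operation is enough to push the result to inf.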