|
Or, if you're not sure how to interpret the output, you can share the log file in an Issue. |
|
Underflow and Overflow Detection |
|
|
|
This feature is currently available for PyTorch only. |
|
|
|
For multi-GPU training it requires DDP (torch.distributed.launch). |
|
|
|
This feature can be used with any nn.Module-based model. |
|
|
|
If you start getting loss=NaN, or the model exhibits some other abnormal behavior due to inf or nan in activations or weights, you need to find where the first underflow or overflow happens and what led to it.
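
To make the idea of "finding the first overflow" concrete, here is a minimal sketch in plain PyTorch. It is not the detector this section describes, only an illustration of the underlying approach: register a forward hook on every submodule and record the first one whose output contains `inf` or `nan`. The helper `find_first_nonfinite` and the toy `Scale` module are hypothetical names introduced for this example. The sketch only catches non-finite values, so an underflow that silently flushes activations to zero would go unnoticed.

```python
import torch
from torch import nn


def find_first_nonfinite(model: nn.Module, *inputs):
    """Run one forward pass and return the name of the first submodule
    whose output contains inf or nan, or None if every output is finite.

    Hypothetical helper for illustration; only tensor outputs are checked.
    """
    first_bad = []  # holds at most one module name
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            if first_bad:
                return  # an earlier module was already flagged
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                first_bad.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))

    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for handle in handles:
            handle.remove()  # always leave the model without hooks

    return first_bad[0] if first_bad else None


class Scale(nn.Module):
    """Toy module (an assumption for this demo) that multiplies by a constant."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor


# 1e20 fits in float32, but 1e20 * 1e20 exceeds its maximum (about 3.4e38).
model = nn.Sequential(Scale(1e20), Scale(1e20), nn.ReLU())
x = torch.ones(2, 4)
print(find_first_nonfinite(model, x))
```

In this toy model the second `Scale` layer (named `1` by `nn.Sequential`) is the first to produce `inf`, so that is the module reported. Because forward hooks fire when a module finishes, nested children are checked before their parents, which keeps the report pointed at the innermost offending module.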