Or if you're not sure how to interpret the output you can share the log file in an Issue. Underflow and Overflow Detection This feature is currently available for PyTorch-only. For multi-GPU training it requires DDP (torch.distributed.launch). This feature can be used with any nn.Module-based model. If you start getting loss=NaN or the model inhibits some other abnormal behavior due to inf or nan in activations or weights one needs to discover where the first underflow or overflow happens and what led to it.