If you start getting loss=NaN or the model inhibits some other abnormal behavior due to inf or nan in activations or weights one needs to discover where the first underflow or overflow happens and what led to it.