To avoid overflows under fp16, the activations must remain well below 1e4. The largest finite fp16 value is 65504 (about 6.5e4), while 1e4 * 1e4 = 1e8. As a result, any matrix multiplication with large activations will lead to a numerical overflow, and the result becomes `inf`.
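The following is a minimal sketch in plain Python, using only the standard library, that simulates fp16 arithmetic by round-tripping values through the IEEE 754 half-precision `struct` format (`"e"`). The helpers `to_fp16` and `fp16_dot` are hypothetical names for illustration. Rounding every intermediate sum to fp16 is a simplifying assumption: real GPU kernels often accumulate in fp32 and round only the output, but that output still overflows once it exceeds 65504. The sketch shows that a single product 1e4 * 1e4 overflows, and that even a short dot product of moderate values (seven products of 100 * 100, summing to 70000) overflows too.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to fp16, mapping finite overflow to +/-inf.

    struct's "e" format packs half-precision floats and raises OverflowError
    for finite values too large for fp16; inf and nan pack normally.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp16_dot(a, b):
    """Hypothetical dot product in which every product and partial sum is rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


if __name__ == "__main__":
    print(to_fp16(1e4))                         # 10000.0 is representable in fp16
    print(to_fp16(1e4 * 1e4))                   # 1e8 overflows to inf
    print(fp16_dot([100.0] * 7, [100.0] * 7))   # 70000 > 65504, so inf
```

Each element here (100.0) is far below 1e4, yet the accumulated result still overflows, which is why activations need headroom well below the fp16 limit.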