For example, if you're training with bf16, the data is also gathered in bf16 because gathering is a non-lossy operation.
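To make the claim concrete, here is a minimal pure-Python sketch, not taken from any library. It emulates bf16 as the top 16 bits of a float32 and treats a gather as plain concatenation of per-rank shards. The helper names (`to_bf16_bits`, `from_bf16_bits`, `all_gather`, `all_reduce_sum`) are hypothetical and only stand in for real collectives. The reduce step is added purely as a contrast, on the assumption that the sum is rounded back to bf16.

```python
import struct

def to_bf16_bits(x):
    """Round a finite float to bf16 (round-to-nearest-even); return the 16-bit pattern.
    Illustrative only: NaN, infinity, and overflow are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def from_bf16_bits(b):
    """Expand a 16-bit bf16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

def all_gather(shards):
    """Hypothetical stand-in for a gather: concatenate shards, copying bits unchanged."""
    return [b for shard in shards for b in shard]

def all_reduce_sum(shards):
    """Hypothetical stand-in for a reduce: elementwise sum, rounded back to bf16."""
    return [to_bf16_bits(sum(from_bf16_bits(b) for b in column))
            for column in zip(*shards)]

# Two simulated ranks, each holding a shard of bf16 values.
rank0 = [to_bf16_bits(v) for v in (256.0, 0.5)]
rank1 = [to_bf16_bits(v) for v in (1.0, 0.25)]

gathered = all_gather([rank0, rank1])      # bit patterns are only moved
reduced = all_reduce_sum([rank0, rank1])   # arithmetic happens, so rounding can occur

print([from_bf16_bits(b) for b in gathered])
print([from_bf16_bits(b) for b in reduced])
```

In this sketch the gathered values are bit-for-bit identical to the inputs, because nothing is computed. The reduced result can lose information: 256.0 + 1.0 needs more significand bits than bf16 provides, so the sum rounds back to 256.0.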