Question about gradient calculation

#30
by Infinity4B - opened

Hi, I am wondering: why does gradient calculation in float16/bfloat16 cost more VRAM than in float32? I searched the Internet, but I can't find an explanation for this.
