GPU and memory requirements

#89
by Sengil - opened

Could someone who knows the GPU and memory requirements for running this model tell me what they are?

@Sengil There are a few ways you can run it.

BF16/FP16 is the normal precision, and it is the precision this model ships in. The minimum is a GPU with 24 GB of VRAM, but you will want more to run it at a decent speed.

Q8/FP8/INT8 requires a GPU with 16 GB of VRAM. Q8 and torchao INT8 are the best choices here: their quality is incredibly close to BF16/FP16, and they are faster.

NF4v2/Q4/Quanto Q4 require as little as 8 GB of VRAM. They are slightly slower than the formats above but need much less VRAM, and they lose a little detail. NF4v2 is the fastest of these.
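The tiers above follow from simple arithmetic: the weights alone take roughly (parameter count × bytes per parameter). Below is a minimal sketch of that estimate. The parameter count is hypothetical (the thread does not state it), the 4-bit figure assumes two weights per byte and ignores per-block scales, and real VRAM use is higher because of activations, other model components, and framework overhead, unless parts of the model are offloaded to system RAM.

```python
# Approximate storage per weight for each format discussed in the thread.
# Assumption: 4-bit formats pack two weights into one byte; real NF4/Q4
# checkpoints also store per-block scales, so they are somewhat larger.
BYTES_PER_PARAM = {
    "bf16/fp16": 2.0,
    "q8/fp8/int8": 1.0,
    "nf4/q4": 0.5,
}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Estimate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    # Hypothetical parameter count; replace it with the figure from the model card.
    n_params = 12e9
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:12s} ~{weight_memory_gb(n_params, fmt):.1f} GB of weights")
```

Halving the bytes per weight halves the weight footprint, which is why each step down in precision roughly halves the VRAM you need for the weights.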

IMO, go for NF4v2, BF16, or INT8/Q8, depending on how much VRAM you have.
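The rule of thumb above can be written as a tiny lookup on available VRAM. This is a hypothetical helper, not part of any library; the thresholds are simply the minimums quoted in this thread (24 / 16 / 8 GB), not measured values, and more VRAM than the minimum still helps speed.

```python
def suggest_format(vram_gb: float) -> str:
    """Pick a precision tier using the minimum VRAM figures quoted in this thread."""
    if vram_gb >= 24:
        return "bf16/fp16"    # full precision, best quality
    if vram_gb >= 16:
        return "q8/fp8/int8"  # near-bf16 quality, faster
    if vram_gb >= 8:
        return "nf4v2/q4"     # lowest VRAM, slightly less detail
    return "below the quoted minimum (8 GB)"


if __name__ == "__main__":
    for vram in (48, 24, 16, 12, 6):
        print(vram, "GB ->", suggest_format(vram))
```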

@YaTharThShaRma999 Thank you for the detailed and clear information. I use this model in my project and plan to run the project on an AWS server, so I was really asking about AWS server requirements. What do you think, and is there anything more specific you could add? Thank you very much again for your time.
