OOMs on 8 GB GPU, is it normal?
#2 opened by tanimazsin130
It gives an OOM error even though use_fp16=True is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.
The same happens to me :/
I used the corpus of BeIR/nq to generate the sentences. Here are my test results (use_fp16=True, Linux, A800 GPU):
- model.encode(sentences, batch_size=128, max_length=512): 5.9 GB / GPU
- model.encode(sentences, batch_size=200, max_length=512): 7.6 GB / GPU
- model.encode(sentences, batch_size=256, max_length=512): 9.0 GB / GPU
- model.encode(sentences, batch_size=256, max_length=256): 5.7 GB / GPU
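For reference, a minimal sketch of how such peak-memory numbers can be checked. The model name and the FlagEmbedding FlagModel class are assumptions (the thread does not name them), and the placeholder texts stand in for the BeIR/nq corpus:

```python
import torch
from FlagEmbedding import FlagModel

# Placeholder model name; the thread does not say which checkpoint is used.
model = FlagModel("BAAI/bge-large-en-v1.5", use_fp16=True)

sentences = ["an example passage"] * 1024  # stand-in for the BeIR/nq corpus

torch.cuda.reset_peak_memory_stats()
embeddings = model.encode(sentences, batch_size=128, max_length=512)

# Peak allocated tensor memory; nvidia-smi may report a somewhat higher figure.
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")
```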
The default parameters are batch_size=256 and max_length=512, so an OOM on an 8 GB GPU is expected if you run the examples directly. To solve the problem, you have two choices (a short example follows the list):
- set a shorter max_length, if the sentences consist mostly of short sequences
- set a smaller batch_size
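A minimal sketch of both options combined, assuming the FlagEmbedding FlagModel API and a placeholder model name (the thread does not name the exact checkpoint); tune both values to your data and GPU:

```python
from FlagEmbedding import FlagModel

# Placeholder model name; substitute the checkpoint you are actually using.
model = FlagModel("BAAI/bge-large-en-v1.5", use_fp16=True)

sentences = ["a short query", "another short passage"]

# Smaller than the defaults (batch_size=256, max_length=512)
# to stay within roughly 8 GB of VRAM.
embeddings = model.encode(sentences, batch_size=64, max_length=256)
print(embeddings.shape)
```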
In my case, model.encode(sentences, batch_size=1, max_length=5000) uses 10.5 GB of VRAM.
I'm testing the model for multilingual retrieve and re-rank and it works pretty well, but it demands a lot of VRAM. I don't know if quantization is possible with this model's architecture, but loading it in 8 bits would be a big win.
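Not confirmed for this model, but as a hedged sketch: if the underlying encoder is a standard Hugging Face transformer, 8-bit loading via bitsandbytes could be tried directly with transformers. The model name, the CLS pooling, and the effect on embedding quality are all assumptions here:

```python
# Requires transformers, accelerate, and bitsandbytes.
import torch
from transformers import AutoTokenizer, AutoModel, BitsAndBytesConfig

model_name = "BAAI/bge-large-en-v1.5"  # placeholder; use your actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

sentences = ["une phrase courte", "a short sentence"]
inputs = tokenizer(
    sentences, padding=True, truncation=True, max_length=512, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state

# CLS pooling + L2 normalization mirrors common BGE-style usage (an assumption).
embeddings = torch.nn.functional.normalize(last_hidden[:, 0], dim=-1)
print(embeddings.shape)
```

Whether 8-bit weights keep retrieval quality acceptable would still need to be verified against the fp16 baseline.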