OOMs on 8 GB GPU, is it normal?

by tanimazsin130 - opened

It gives an OOM error even though use_fp16=True is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.

same happens to me :/

I used the BeIR/nq corpus as the source of sentences. Here are my test results (use_fp16=True, Linux, A800 GPU):

  • model.encode(sentences, batch_size=128, max_length=512): 5.9GB / GPU
  • model.encode(sentences, batch_size=200, max_length=512): 7.6GB / GPU
  • model.encode(sentences, batch_size=256, max_length=512): 9.0GB / GPU
  • model.encode(sentences, batch_size=256, max_length=256): 5.7GB / GPU
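To get a rough idea of which batch_size might fit a given card, the three max_length=512 measurements above can be fitted with a straight line. This is only a sketch: it assumes memory grows linearly with batch_size, it reuses numbers measured on an A800 rather than on an RTX 3070, and the helper names `fit_line` and `max_batch_size` are made up for illustration. The budget you pass in should leave headroom below the card's nominal 8 GB.

```python
# Measurements quoted in this thread (max_length=512, use_fp16=True):
# batch_size -> reported GPU memory in GB.
measured = {128: 5.9, 200: 7.6, 256: 9.0}


def fit_line(points):
    """Least-squares fit of memory_gb = a + b * batch_size."""
    xs = list(points.keys())
    ys = list(points.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # GB per extra sentence in the batch
    a = mean_y - b * mean_x  # fixed cost (weights, CUDA context, ...)
    return a, b


def max_batch_size(budget_gb, points=measured):
    """Largest batch_size whose predicted memory stays under budget_gb."""
    a, b = fit_line(points)
    return max(0, int((budget_gb - a) // b))


if __name__ == "__main__":
    a, b = fit_line(measured)
    print(f"fixed cost ~{a:.2f} GB, ~{b * 1024:.1f} MB per sentence")
    # Assumed budget: 7 GB usable out of 8 GB, leaving about 1 GB of headroom.
    print("estimated batch_size for a 7 GB budget:", max_batch_size(7.0))
```

Treat the result as a starting point and confirm it with an actual run, since real usage depends on the sequence lengths in each batch.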

The default parameters are batch_size=256, max_length=512, which took about 9.0 GB in the test above, so an OOM on an 8 GB card is normal if you run the examples directly. To solve the problem, you have two choices:

  • set a shorter max_length, if the sentences consist mostly of short sequences
  • set a smaller batch_size
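If you would rather not pick batch_size by hand, one option is to retry with a smaller batch whenever an OOM is raised. The sketch below is an assumption-laden illustration, not part of the library: `encode_with_backoff` and `encode_fn` are hypothetical names, and it relies on PyTorch reporting CUDA OOM as a RuntimeError whose message contains "out of memory". With the real model you would pass `model.encode` as `encode_fn`, assuming it accepts the batch_size and max_length keyword arguments used in this thread.

```python
def encode_with_backoff(encode_fn, sentences, batch_size=256,
                        max_length=512, min_batch_size=1):
    """Call encode_fn, halving batch_size after each out-of-memory error.

    Returns (result, batch_size_that_worked). Errors that are not OOM,
    or an OOM at min_batch_size, are re-raised unchanged.
    """
    while True:
        try:
            result = encode_fn(sentences, batch_size=batch_size,
                               max_length=max_length)
            return result, batch_size
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            # With PyTorch you may also want to call torch.cuda.empty_cache()
            # here before retrying; it is omitted to keep the sketch
            # dependency-free.
            batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    # Stand-in encoder for demonstration only: pretends anything above
    # batch_size=64 does not fit in memory.
    def fake_encode(sentences, batch_size, max_length):
        if batch_size > 64:
            raise RuntimeError("CUDA out of memory (simulated)")
        return [len(s) for s in sentences]

    _, used = encode_with_backoff(fake_encode, ["a", "bb", "ccc"])
    print("succeeded with batch_size =", used)
```

Halving is a simple choice. If the sentences are mostly short, lowering max_length first may recover more memory than shrinking the batch.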

In my case, model.encode(sentences, batch_size=1, max_length=5000) uses 10.5 GB of VRAM.
I'm testing the model for multilingual retrieval and re-ranking and it works pretty well, but it demands a lot of VRAM. I don't know if quantization is possible with this model's architecture, but loading in 8 bits would be a big win.
