Inference speed extremely slow

#2
by Borko24 - opened

It seems that when I load the 'gptq-4bit-32g-actorder_True' branch, inference is very slow. For reference, I am running on an A10 GPU with 24 GB of VRAM and I am using the model for experiments. Could this be because I need to install AutoGPTQ from source? I found an open issue from about a year ago reporting slow inference when using the pre-built wheels. The speed should be better than it currently is: it takes more than an hour to complete the tasks from the HumanEval dataset.
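For context, the loading path is roughly the standard Transformers GPTQ route with the `revision` argument, along the lines of this sketch (the repo name below is a placeholder, not the actual model):

```python
# Minimal sketch of loading a GPTQ branch via Transformers + AutoGPTQ.
# The repo id is a placeholder; only the revision name matches the branch in question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/some-model-GPTQ"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="gptq-4bit-32g-actorder_True",  # the branch that is slow here
    device_map="auto",
)

# Example generation, similar to a single HumanEval-style completion.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```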
