Inference speed is slow

#11
by kiran2405 - opened

I am trying to load this model for inference in a Databricks notebook using the code provided in the model card, but inference is very slow: it takes around 40-50 seconds to produce an answer even for simple prompts. How can I speed up inference?

My Databricks CUDA version is 11.4.

Installing auto-gptq directly from GitHub, instead of via `pip install auto-gptq`, solved the inference speed problem.
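For reference, a source install would look something like the following. This is a sketch, not the exact commands from the thread: the repository URL is assumed to be the PanQiWei/AutoGPTQ GitHub repo, and building from source requires a local CUDA toolkit compatible with your PyTorch build so the CUDA extension kernels can compile (a slow fallback path when those kernels are missing is a plausible cause of the original slowdown).

```
# Remove the previously installed PyPI wheel first
pip uninstall -y auto-gptq

# Install from source so the CUDA extension is compiled locally
# (repository URL assumed; needs a CUDA toolkit matching your PyTorch build)
pip install git+https://github.com/PanQiWei/AutoGPTQ.git
```

After reinstalling, restart the notebook kernel so the new build is picked up.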

kiran2405 changed discussion status to closed
