Quantized version taking too long on CPUs

#80
by SukanyaM - opened

Hi Team,

When I run the quantized version on a CPU-based GCP instance, each question takes ~10 minutes, whereas the same question takes only a few seconds through the API. Can someone please suggest an alternative here, either using a GPU or using the full version instead? Any articles or references would be appreciated.

Thanks
