Quantized version taking too long on CPUs
#80, opened by SukanyaM
Hi Team,
While using the quantized version on a CPU-only GCP instance, each question takes ~10 minutes to answer, whereas the same question takes only a few seconds through the API. Can someone please suggest an alternative here, either using a GPU or using the full version instead? Any articles or references would be appreciated.
Thanks