Support for quantized cache

#5
by dragstoll - opened

Hi
Is it possible to use quantized cache with this model?
I tried to use it with KV cache quantization:
cache_implementation="quantized",
cache_config={"nbits": 4, "backend": "quanto"},

But I'm getting an error: This model does not support the quantized cache. If you want your model to support quantized cache, please open an issue.
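
For reference, here is a minimal sketch of how these arguments might be passed, with a fallback to the default cache when the model rejects them. The helper names `quantized_cache_kwargs` and `generate_with_fallback` are hypothetical, `model` is assumed to be any object exposing a transformers-style `generate()` method, and the sketch assumes the error above surfaces as a `ValueError`. It is a workaround sketch, not a way to make the model support the quantized cache.

```python
from typing import Any, Dict


def quantized_cache_kwargs(nbits: int = 4, backend: str = "quanto") -> Dict[str, Any]:
    """Build the generate() keyword arguments quoted in this post."""
    return {
        "cache_implementation": "quantized",
        "cache_config": {"nbits": nbits, "backend": backend},
    }


def generate_with_fallback(model: Any, **inputs: Any) -> Any:
    """Try the quantized cache first; retry with the default cache if rejected.

    Assumption: the "does not support the quantized cache" error is raised
    as a ValueError by model.generate().
    """
    try:
        return model.generate(**inputs, **quantized_cache_kwargs())
    except ValueError as err:
        # Only swallow the specific error from this post; re-raise anything else.
        if "quantized cache" not in str(err):
            raise
        return model.generate(**inputs)
```

With a real model this would be called as `generate_with_fallback(model, **tokenized_inputs, max_new_tokens=...)`. The fallback gives up the memory savings, so it only helps keep a script running across models with mixed support.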
