Binary performance?
#7, opened by williambarberjr
Curious whether you've considered benchmarking GritLM-7B with the embedding quantization approach described here: https://huggingface.co/blog/embedding-quantization
It's essentially the only way I could practically use 4096-dimensional embeddings in production for retrieval over a large corpus. The model size itself would still be an issue when embedding queries in production, so pruning and/or quantizing the model itself may also be needed. But not all models play nicely with this approach of quantizing and then re-ranking with int8-quantized vectors loaded from disk. If you were already planning to benchmark this, I'd be very interested in the results.
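For anyone unfamiliar with the idea, here is a rough NumPy sketch of how I understand the quantize-then-re-rank flow: a coarse search over binary vectors using Hamming distance, followed by rescoring a small candidate set with int8 vectors. This is not the blog's code and not anything from GritLM. The function names, the `rescore_multiplier` parameter, the per-dimension min/max calibration, and the random data standing in for real embeddings are all my own assumptions, and the int8 index is kept in memory here rather than read from disk.

```python
import numpy as np

def binary_quantize(emb):
    """Threshold each dimension at 0 and pack 8 dimensions into one byte."""
    return np.packbits(emb > 0, axis=-1)

def int8_quantize(emb, lo, hi):
    """Map each dimension linearly from [lo, hi] onto the int8 range."""
    scale = (hi - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant dimensions
    q = np.round((emb - lo) / scale) - 128
    return np.clip(q, -128, 127).astype(np.int8)

def search(query, binary_index, int8_index, top_k=10, rescore_multiplier=4):
    """Coarse Hamming search on binary codes, then rescore with int8 vectors."""
    q_bits = binary_quantize(query[None, :])
    # Hamming distance = number of set bits after XOR
    xor = np.bitwise_xor(binary_index, q_bits)
    hamming = np.unpackbits(xor, axis=1).sum(axis=1)
    # Keep top_k * rescore_multiplier candidates for the second stage
    n_cand = min(top_k * rescore_multiplier, len(binary_index))
    cand = np.argpartition(hamming, n_cand - 1)[:n_cand]
    # Rescore candidates: float32 query against int8 document vectors
    scores = int8_index[cand].astype(np.float32) @ query.astype(np.float32)
    order = np.argsort(-scores)[:top_k]
    return cand[order], scores[order]

# Random data standing in for real embeddings (assumption, for illustration only)
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 256)).astype(np.float32)
binary_index = binary_quantize(corpus)
int8_index = int8_quantize(corpus, corpus.min(axis=0), corpus.max(axis=0))

# A query that is a slightly perturbed copy of document 5
query = corpus[5] + 0.01 * rng.standard_normal(256).astype(np.float32)
ids, scores = search(query, binary_index, int8_index, top_k=5)
print(ids)
```

The appeal at 4096 dimensions is the storage arithmetic: a float32 vector takes 16,384 bytes, the int8 version 4,096 bytes, and the binary version 512 bytes, so only the small binary index has to sit in memory while the int8 vectors are fetched for the handful of candidates.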
It'd be interesting to know, but I'm not planning to investigate it at the moment; cc @tomaarsen