Will this method of quantization appear in Ollama?

#1
by Regrin - opened

Hello!
I would really like to use models quantized to this degree on my (very weak) computer.
Can you tell me if this method will be available in Ollama?

VPTQ-community org

Hi Regrin,
Thanks for the suggestion!
I am very interested in adapting this method for use with Ollama.
When using VPTQ or similar methods, what are your most important requirements?
What device are you using? That information would help us understand the motivation for integrating it into Ollama.

VPTQ-community org

The backend of Ollama is llama.cpp, and we could support VPTQ in llama.cpp as the first step.

See, I'm using a relatively powerful laptop. That said, I only have 8GB of memory. I might build myself a server for LLMs, but that's a question for tomorrow.
I need a very small model with high quality.
I guess my priorities are RAG and programming.
In addition, I would like to train micro-models for my tasks. Is there any way to fine-tune your quantized models? Something like QLoRA.

VPTQ-community org

As the VPTQ maintainer mentioned, they will release the quantization code (https://github.com/microsoft/VPTQ/issues/29), so I expect you will be able to quantize your own pre-trained model, or integrate it with QLoRA after quantization.
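For illustration, here is a minimal sketch of that second route: load an already-quantized checkpoint and attach LoRA adapters with PEFT, so only the small adapter weights are trained while the quantized base stays frozen. The model ID is a placeholder, the `trust_remote_code=True` loading path is an assumption about how VPTQ checkpoints ship, and the target module names assume a Llama-style architecture; the PEFT calls themselves are the standard LoRA API.

```python
# Sketch: LoRA adapters on top of a quantized base model.
# Assumptions: the checkpoint loads through transformers with
# trust_remote_code=True, and the model ID below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "VPTQ-community/your-quantized-model"  # placeholder, not a real repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # assumed: custom VPTQ layers ship with the repo
    device_map="auto",
)

# Standard LoRA config: train small rank-8 adapters on the attention
# projections; module names assume a Llama-style architecture.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grads
```

This is the same pattern QLoRA uses over bitsandbytes 4-bit weights; whether gradients propagate through VPTQ layers the same way is something the maintainers would need to confirm.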

Thanks!

So it will be possible to run and fine-tune quantized models?

VPTQ-community org

Yes, please wait a few weeks.
