Quantizing the model - on our own
#3 opened by christianweyer
Thanks for this great model!
As llama.cpp does not support the CogVLMForCausalLM architecture, how could we quantize the model on our own?
It is not compatible with the llama.cpp (GGUF) format; we will provide an int4 HF model.
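In the meantime, one way to quantize an HF checkpoint yourself (without GGUF) is on-the-fly 4-bit loading with bitsandbytes through transformers. This is only a minimal sketch, not the author's official int4 recipe; the repo id below is a placeholder, and it assumes a CUDA GPU (bitsandbytes does not run on Apple Silicon) and that the model's custom code loads with `trust_remote_code=True`.

```python
# Minimal sketch: 4-bit NF4 quantization at load time via bitsandbytes.
# Assumptions: CUDA GPU available, MODEL_ID is a placeholder for this repo's id,
# and the custom CogVLM modeling code works under trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "your-org/your-cogvlm-checkpoint"  # placeholder, replace with the real repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

For 8-bit instead, swap the 4-bit settings for `load_in_8bit=True`; bitsandbytes itself only offers 8-bit and 4-bit, not 6-bit.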
Great!
How can we execute/run the model then, e.g. on an M3 Mac?
6-bit or 8-bit? 4-bit tends to be quite dumb.
We will provide 4-bit. It is not tested on Mac, because it uses Triton.
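For the M3 Mac question: since the int4 path relies on Triton/CUDA kernels, the more likely route on Apple Silicon is a plain fp16 load onto PyTorch's MPS backend. Below is a minimal sketch under two assumptions the author's reply does not confirm: that the model's custom code has a pure-PyTorch path without Triton, and that the fp16 weights fit in unified memory. The repo id is again a placeholder, and the text-only `generate` call is just to show device placement; real multimodal usage would go through the repo's own image-input code.

```python
# Minimal sketch: fp16 inference on Apple Silicon via the MPS backend.
# Assumptions: no Triton/CUDA-only kernels in the custom modeling code,
# and enough unified memory for the fp16 weights (~2 bytes per parameter).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-cogvlm-checkpoint"  # placeholder, replace with the real repo id

device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to(device)

# Text-only call purely to demonstrate device placement; image inputs need
# the repo's own multimodal preprocessing.
inputs = tokenizer("Describe the image.", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```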