Quantizing the model - on our own
#3 opened by christianweyer
Thanks for this great model!
As llama.cpp does not support the CogVLMForCausalLM architecture, how could we quantize the model on our own?
It is not compatible with the llama.cpp (GGUF) format; we will provide an int4 HF model.
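In the meantime, one way to quantize an HF checkpoint yourself (without GGUF) is on-the-fly 4-bit loading with bitsandbytes through transformers. This is only a minimal sketch, not the author's official int4 recipe; the repo id below is a placeholder, and it assumes a CUDA GPU (bitsandbytes does not run on Apple Silicon) and that the model's custom code loads with `trust_remote_code=True`.

```python
# Minimal sketch: 4-bit NF4 quantization at load time via bitsandbytes.
# Assumptions: CUDA GPU available, MODEL_ID is a placeholder for this repo's id,
# and the custom CogVLM modeling code works under trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "your-org/your-cogvlm-checkpoint"  # placeholder, replace with the real repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

For 8-bit instead, swap the 4-bit settings for `load_in_8bit=True`; bitsandbytes itself only offers 8-bit and 4-bit, not 6-bit.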
Great!
How can we execute/run the model then, e.g. on an M3 Mac?
6-bit or 8-bit? 4-bit tends to be quite dumb.
We will provide 4-bit. It is not tested on Mac, because it uses Triton.
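For the M3 Mac question: since the int4 path relies on Triton/CUDA kernels, the more likely route on Apple Silicon is a plain fp16 load onto PyTorch's MPS backend. Below is a minimal sketch under two assumptions the author's reply does not confirm: that the model's custom code has a pure-PyTorch path without Triton, and that the fp16 weights fit in unified memory. The repo id is again a placeholder, and the text-only `generate` call is just to show device placement; real multimodal usage would go through the repo's own image-input code.

```python
# Minimal sketch: fp16 inference on Apple Silicon via the MPS backend.
# Assumptions: no Triton/CUDA-only kernels in the custom modeling code,
# and enough unified memory for the fp16 weights (~2 bytes per parameter).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-cogvlm-checkpoint"  # placeholder, replace with the real repo id

device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to(device)

# Text-only call purely to demonstrate device placement; image inputs need
# the repo's own multimodal preprocessing.
inputs = tokenizer("Describe the image.", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```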