internlm2-limarp-chat-20b.Q4_K_S_imx.gguf vs internlm2-limarp-chat-20b.Q4_K_S.gguf
Hello!
Really like this model!
Can you please explain the difference between the IMX and regular GGUFs?
I've googled it, but couldn't find anything useful.
Thank you in advance!
Hello! I'm glad you like my model.
IMX here refers to quantizations done using the recent imatrix (importance matrix) feature from llama.cpp. They should perform slightly better than the regular quantizations while staying the same size; I've also put a rough sketch of how they're made below the links.
You can read more about this feature in these pull requests:
https://github.com/ggerganov/llama.cpp/pull/4861
https://github.com/ggerganov/llama.cpp/pull/4930
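For reference, the imatrix quants are made in two steps: first compute an importance matrix from the FP16 model over some calibration text, then pass that matrix to the quantizer. A minimal sketch using the llama.cpp CLI tools; the model/file names and calibration text here are just placeholders, not the exact files I used:

```sh
# 1) Compute an importance matrix from the FP16 model and a calibration text
#    (calibration.txt is a placeholder; any representative plain-text corpus works)
./imatrix -m internlm2-limarp-chat-20b.f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize to Q4_K_S, passing the importance matrix to guide the quantization
./quantize --imatrix imatrix.dat \
    internlm2-limarp-chat-20b.f16.gguf \
    internlm2-limarp-chat-20b.Q4_K_S_imx.gguf Q4_K_S

# The regular Q4_K_S is the same command without --imatrix
./quantize internlm2-limarp-chat-20b.f16.gguf \
    internlm2-limarp-chat-20b.Q4_K_S.gguf Q4_K_S
```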
Thank you for the answer! Got it! Sounds like a nice improvement over the old GGUFs!