K quants should not contain IQ4_NL types inside
That breaks support for backends that don't support I-Quants
Oh I don't know why I didn't get a notification from your GitHub ping...
I think it has something to do with the shape of the tensor not being divisible by 256
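For context, k-quants pack weights into 256-element super-blocks, so a tensor whose row size isn't a multiple of 256 can't use a K type and llama.cpp falls back to another format. A minimal sketch of that check (the constant name `QK_K` matches llama.cpp's source; the helper function itself is hypothetical):

```python
QK_K = 256  # k-quant super-block size in llama.cpp


def can_use_k_quant(row_size: int) -> bool:
    # A tensor row can only be k-quantized if it splits
    # evenly into 256-element super-blocks.
    return row_size % QK_K == 0


# A row of 2560 elements splits into 10 super-blocks and qualifies;
# a row of 2880 elements leaves a remainder of 64 and forces a
# fallback quantization type instead.
print(can_use_k_quant(2560))  # True
print(can_use_k_quant(2880))  # False
```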
Same thing on other people's quants too:
Here's a comment from slaren explaining when a similar thing happened with Qwen2 and my P100:
https://github.com/ggerganov/llama.cpp/issues/7805#issuecomment-2166507695
Hmm okay.
I wonder if it would be prudent to label such quants as non-K quants.
Having an I-quanted tensor breaks all other backends that don't support it, while people assume it's a regular K quant and wonder why Q3_K_M works but Q3_K_S doesn't
Never mind, ggerganov has provided an upcoming fix https://github.com/ggerganov/llama.cpp/pull/8489
A simple re-quant after this would solve the issue.
yeah it seems an odd choice to fall back to something that can be unsupported lol
glad a fix is coming..