K quants should not contain IQ4_NL types inside
That breaks support for backends that don't support I-Quants
Oh I don't know why I didn't get a notification from your GitHub ping...
I think it has something to do with the shape of the tensor not being divisible by 256
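For context, k-quants pack weights into 256-element super-blocks, so a tensor whose row size isn't a multiple of 256 can't use a K type and llama.cpp falls back to another format. A minimal sketch of that check (the constant name `QK_K` matches llama.cpp's source; the helper function itself is hypothetical):

```python
QK_K = 256  # k-quant super-block size in llama.cpp


def can_use_k_quant(row_size: int) -> bool:
    # A tensor row can only be k-quantized if it splits
    # evenly into 256-element super-blocks.
    return row_size % QK_K == 0


# A row of 2560 elements splits into 10 super-blocks and qualifies;
# a row of 2880 elements leaves a remainder of 64 and forces a
# fallback quantization type instead.
print(can_use_k_quant(2560))  # True
print(can_use_k_quant(2880))  # False
```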
Same thing on other people's quants too:
Here's a comment from slaren explaining when a similar thing happened with Qwen2 and my P100:
https://github.com/ggerganov/llama.cpp/issues/7805#issuecomment-2166507695
Hmm okay.
I wonder if it would be prudent to label such quants as non-K quants.
Having an I-quanted tensor breaks all other backends that don't support it, while people assume it's a regular K quant and wonder why Q3_K_M works but Q3_K_S doesn't
Never mind, ggerganov has provided an upcoming fix https://github.com/ggerganov/llama.cpp/pull/8489
A simple re-quant after this would solve the issue.
yeah it seems an odd choice to fall back to something that can be unsupported lol
glad a fix is coming..