Context Shift Problem
You have turned out a wonderful model and I thoroughly enjoy working with it. But there is one very serious problem: context shift does not work with this model in llama.cpp. Very often the entire contents of the context window get recalculated, which takes a very long time for such a large model and a large context, especially on weaker GPUs. It seems that llama.cpp cannot correctly match the current prompt against the previous one. This does not happen with plain Mistral Large 123B. Please try to get this problem sorted out - it's making it very difficult to work properly with your model.
thanks! though models can't affect that; it must be something in your inference frontend or lcpp/kcpp that is causing the re-processing of tokens.
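(For anyone debugging this: roughly speaking, llama.cpp's prompt caching only skips re-evaluation for the leading tokens that are identical between the previous prompt and the new one. The sketch below is plain illustrative Python, not llama.cpp's actual code; it just shows why a single early difference in the tokenized history forces nearly the whole context to be re-processed.)

```python
# Illustrative only: the server reuses the KV cache for the longest common
# token prefix between the previous prompt and the new one. If tokenization
# of the shared history differs between requests (e.g. chat template or
# tokenizer metadata differences), the common prefix collapses and nearly
# the whole context has to be re-evaluated.

def common_prefix_len(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Number of leading tokens the two prompts share."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy example: the history differs at one early token, so only 2 of the
# 9 new tokens can be reused and 7 must be re-processed.
old = [1, 100, 200, 300, 400, 500, 600, 700]
new = [1, 100, 999, 300, 400, 500, 600, 700, 800]

reused = common_prefix_len(old, new)
print(f"reusable tokens: {reused}, to re-process: {len(new) - reused}")
```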
A large chat with a 16k context window. Same settings in SillyTavern. Magnum-v2-123b-Q4_K - frequent full context recalculation (not on every reply, but often). The same model quantized by bartowski and mradermacher - same problems. bartowski/Mistral-Large-Instruct-2407-Q2_K (the plain model) - no problems, not a single full context recalculation in two hours of chat...
we have someone looking into the config.json, which might be different; we'll post if we find anything.
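In case it helps to reproduce that comparison, here is a quick sketch for diffing the finetune's config files against the base Mistral-Large-Instruct-2407 ones. The directory names are placeholders for locally downloaded copies; adjust the file list as needed.

```python
import json
from pathlib import Path

# Placeholder paths: point these at locally downloaded copies of the
# finetune's and the base model's config files.
FINETUNE_DIR = Path("magnum-v2-123b")
BASE_DIR = Path("Mistral-Large-Instruct-2407")

def diff_json(name: str) -> None:
    """Print keys whose values differ between the two models' JSON files."""
    a = json.loads((FINETUNE_DIR / name).read_text())
    b = json.loads((BASE_DIR / name).read_text())
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"{name}: {key!r} differs: {a.get(key)!r} vs {b.get(key)!r}")

for fname in ("config.json", "tokenizer_config.json", "generation_config.json"):
    try:
        diff_json(fname)
    except FileNotFoundError as exc:
        print(f"skipping {fname}: {exc}")
```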
Hmmm, I can push out a quant with the og model's config,
might be able to also fix it by editing the .gguf's metadata
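If it does come down to the GGUF metadata, here is a rough sketch for comparing the tokenizer-related keys of the two quants with the gguf Python package from llama.cpp's gguf-py. The filenames are placeholders, the string/scalar decoding mirrors what gguf_dump.py does (and may need adjusting for your gguf version), and array fields such as the full token list are skipped.

```python
from gguf import GGUFReader, GGUFValueType

# Placeholder filenames: point these at the finetune quant and the
# base-model quant that does not show the problem.
FINETUNE_GGUF = "magnum-v2-123b-Q4_K.gguf"
BASE_GGUF = "Mistral-Large-Instruct-2407-Q2_K.gguf"

def tokenizer_metadata(path: str) -> dict:
    """Collect string/scalar tokenizer.* metadata fields from a GGUF file.

    Array fields (e.g. the full token list) are skipped to keep the
    output readable; only single-value fields are compared.
    """
    reader = GGUFReader(path)
    out = {}
    for name, field in reader.fields.items():
        if not (name.startswith("tokenizer.") or "template" in name):
            continue
        if len(field.types) != 1:
            continue  # skip arrays and malformed fields
        if field.types[0] == GGUFValueType.STRING:
            out[name] = bytes(field.parts[-1]).decode("utf-8", errors="replace")
        else:
            out[name] = field.parts[-1][0]  # scalar value
    return out

a = tokenizer_metadata(FINETUNE_GGUF)
b = tokenizer_metadata(BASE_GGUF)
for key in sorted(set(a) | set(b)):
    if a.get(key) != b.get(key):
        print(f"{key} differs:\n  finetune: {a.get(key)!r}\n  base:     {b.get(key)!r}")
```

gguf-py also ships gguf_dump.py and gguf_set_metadata.py / gguf_new_metadata.py scripts, which should be the safer route for actually editing a key in place rather than doing it by hand.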