Context Shift Problem
You have turned out a wonderful model and I thoroughly enjoy working with it. But there is one very serious problem: context shift does not work with this model in llama.cpp. Very often the entire contents of the context window get recalculated, which takes a very long time for such a large model and a large context, especially on weaker GPUs. It seems that llama.cpp cannot correctly match the current prompt against the previous one. This does not happen with plain Mistral Large 123B. Please try to get this problem sorted out - it's making it very difficult to work properly with your model.
thanks! though models can't affect that; it must be something in your inference frontend or lcpp/kcpp that is causing the re-processing of tokens.
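(For anyone debugging this: roughly speaking, llama.cpp's prompt caching only skips re-evaluation for the leading tokens that are identical between the previous prompt and the new one. The sketch below is plain illustrative Python, not llama.cpp's actual code; it just shows why a single early difference in the tokenized history forces nearly the whole context to be re-processed.)

```python
# Illustrative only: the server reuses the KV cache for the longest common
# token prefix between the previous prompt and the new one. If tokenization
# of the shared history differs between requests (e.g. chat template or
# tokenizer metadata differences), the common prefix collapses and nearly
# the whole context has to be re-evaluated.

def common_prefix_len(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Number of leading tokens the two prompts share."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Toy example: the history differs at one early token, so only 2 of the
# 9 new tokens can be reused and 7 must be re-processed.
old = [1, 100, 200, 300, 400, 500, 600, 700]
new = [1, 100, 999, 300, 400, 500, 600, 700, 800]

reused = common_prefix_len(old, new)
print(f"reusable tokens: {reused}, to re-process: {len(new) - reused}")
```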
A large chat with a 16k context window. Same settings in SillyTavern. Magnum-v2-123b-Q4_K - frequent full context recalculation (not on every reply, but often). The same model quantized by bartowski and mradermacher - same problems. bartowski/Mistral-Large-Instruct-2407-Q2_K (the plain model) - no problems, not a single full context recalculation in two hours of chat...
we have someone looking into the config.json, which might be different; we'll post if we find anything.
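In case it helps to reproduce that comparison, here is a quick sketch for diffing the finetune's config files against the base Mistral-Large-Instruct-2407 ones. The directory names are placeholders for locally downloaded copies; adjust the file list as needed.

```python
import json
from pathlib import Path

# Placeholder paths: point these at locally downloaded copies of the
# finetune's and the base model's config files.
FINETUNE_DIR = Path("magnum-v2-123b")
BASE_DIR = Path("Mistral-Large-Instruct-2407")

def diff_json(name: str) -> None:
    """Print keys whose values differ between the two models' JSON files."""
    a = json.loads((FINETUNE_DIR / name).read_text())
    b = json.loads((BASE_DIR / name).read_text())
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"{name}: {key!r} differs: {a.get(key)!r} vs {b.get(key)!r}")

for fname in ("config.json", "tokenizer_config.json", "generation_config.json"):
    try:
        diff_json(fname)
    except FileNotFoundError as exc:
        print(f"skipping {fname}: {exc}")
```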
Hmmm, I can push out a quant with the og model's config,
might be able to also fix it by editing the .gguf's metadata
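If it does come down to the GGUF metadata, here is a rough sketch for comparing the tokenizer-related keys of the two quants with the gguf Python package from llama.cpp's gguf-py. The filenames are placeholders, the string/scalar decoding mirrors what gguf_dump.py does (and may need adjusting for your gguf version), and array fields such as the full token list are skipped.

```python
from gguf import GGUFReader, GGUFValueType

# Placeholder filenames: point these at the finetune quant and the
# base-model quant that does not show the problem.
FINETUNE_GGUF = "magnum-v2-123b-Q4_K.gguf"
BASE_GGUF = "Mistral-Large-Instruct-2407-Q2_K.gguf"

def tokenizer_metadata(path: str) -> dict:
    """Collect string/scalar tokenizer.* metadata fields from a GGUF file.

    Array fields (e.g. the full token list) are skipped to keep the
    output readable; only single-value fields are compared.
    """
    reader = GGUFReader(path)
    out = {}
    for name, field in reader.fields.items():
        if not (name.startswith("tokenizer.") or "template" in name):
            continue
        if len(field.types) != 1:
            continue  # skip arrays and malformed fields
        if field.types[0] == GGUFValueType.STRING:
            out[name] = bytes(field.parts[-1]).decode("utf-8", errors="replace")
        else:
            out[name] = field.parts[-1][0]  # scalar value
    return out

a = tokenizer_metadata(FINETUNE_GGUF)
b = tokenizer_metadata(BASE_GGUF)
for key in sorted(set(a) | set(b)):
    if a.get(key) != b.get(key):
        print(f"{key} differs:\n  finetune: {a.get(key)!r}\n  base:     {b.get(key)!r}")
```

gguf-py also ships gguf_dump.py and gguf_set_metadata.py / gguf_new_metadata.py scripts, which should be the safer route for actually editing a key in place rather than doing it by hand.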