this model in Ollama
#5 opened by robbiemu
I used Ollama's new integration to run this model directly (no Modelfile needed, hurrah!) -- specifically the Q4_K_M variant. I mostly use it through Open WebUI, in case that matters.
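In case it helps with reproducing, I believe the direct-run invocation looks like `ollama run hf.co/<user>/<repo>:Q4_K_M` (repo path elided; substitute this model's GGUF repo).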
I notice that the time to first token grows (seemingly exponentially) with the size of the context, in a way that is not at all comparable to the official Qwen variants at the same quantization.
Is anyone else seeing time to first token grow into the minutes with a fully loaded context (two 72 KB README.md files, marked to fully load, with the context window set to 32k)?
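For anyone who wants to compare numbers: here is a rough sketch of how I'd measure time to first token against Ollama's streaming `/api/generate` endpoint. The model tag is a placeholder for this model's GGUF repo, and `num_ctx=32768` matches the 32k context window mentioned above.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/<user>/<repo>:Q4_K_M"  # placeholder tag; substitute this model's repo

def time_to_first_token(prompt: str, num_ctx: int = 32768) -> float:
    """Return seconds from request start until the first streamed token."""
    payload = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": True,
        "options": {"num_ctx": num_ctx},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        # Ollama streams newline-delimited JSON chunks; stop at the first
        # chunk that actually carries generated text.
        for line in resp:
            chunk = json.loads(line)
            if chunk.get("response"):
                return time.monotonic() - start
    return time.monotonic() - start

# e.g. paste the two README files into the prompt and compare against a
# short prompt to see how time to first token scales with context size.
```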