https://huggingface.co/NeverSleep/Lumimaid-v0.2-70B
Please do this one.
Hi, I am currently delaying the larger llama-3.1 models because llama.cpp has no good support for them yet, and I'd like to avoid redoing the quants. This model (and many others) is already eagerly waiting :) I hope the RoPE fixes will land any day now, in which case this will be one of the first models to be quanted.
I'd like to second this request; I'm mainly after a quant for 24GB cards, like i1-IQ2_XS or i1-IQ2_XXS.
Thirded. In the meantime I followed the advice for the L3.1 8B models and manually set the RoPE base to 8M, so 70M for the 70B. I only did very brief testing on an existing chat, but nothing seemed wrong, and it was to some degree pulling more details from the context, though I was testing with only 24k.
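In case anyone wants to try the same workaround, here's a minimal sketch using llama-cpp-python (the model filename is a placeholder; the equivalent llama.cpp CLI flag is `--rope-freq-base`):

```python
# Minimal sketch, assuming llama-cpp-python is installed; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Lumimaid-v0.2-70B.i1-Q4_K_S.gguf",  # placeholder filename
    n_ctx=24576,                # 24k context, matching the test above
    n_gpu_layers=-1,            # offload all layers if they fit
    rope_freq_base=70_000_000,  # manual RoPE base: 8M for the 8B, so 70M for the 70B
)

out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```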
imatrix quants are currently generating, but depending on the (currently very chaotic) scheduling, they might or might not get interrupted by something else. Funnily enough, this model is in front of the llama-3.1 70b instruct model itself in the queue.
Already pulled the one I needed, Q4_K_S, the best option for 24k context on a pair of P40s. Amazing, people's "priorities". :D
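For anyone wondering why Q4_K_S is the sweet spot here, a rough back-of-envelope estimate (the ~4.58 bpw figure and the Llama-3 70B dimensions are assumptions, and compute buffers are ignored):

```python
# Back-of-envelope VRAM estimate; bpw and model dimensions are assumptions,
# and scratch/compute buffers are ignored, so treat this as a rough sketch.
params = 70.6e9          # approximate Llama-3.1 70B parameter count
bpw = 4.58               # commonly cited bits per weight for Q4_K_S
weights_gb = params * bpw / 8 / 1e9

# fp16 KV cache with GQA: 2 (K+V) * layers * kv_heads * head_dim * 2 bytes, per token
n_layers, n_kv_heads, head_dim, ctx = 80, 8, 128, 24576
kv_gb = 2 * n_layers * n_kv_heads * head_dim * 2 * ctx / 1e9

print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
      f"total ~{weights_gb + kv_gb:.1f} GB vs 48 GB on two P40s")
# -> weights ~40.4 GB, KV cache ~8.1 GB: right at the edge of 2x24 GB.
```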
Q4_K_S is always a good choice :)