Promising-looking results on 24GB VRAM, folks!
Good thread with some MMLU-Pro benchmarks over on r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/?context=3
I still want to find the "knee point" between the 72B and 32B quants...
A summary of Qwen2.5 model performance by parameter count and quant on the MMLU-Pro Computer Science benchmark, as submitted by redditors over on u/AaronFeng47's great recent post.
Model Parameters | Quant | File Size (GB) | MMLU-Pro Computer Science | Source |
---|---|---|---|---|
14B | ??? | ??? | 60.49 | Additional_test_758 |
32B | 4bit AWQ | 19.33 | 75.12 | russianguy |
32B | Q4_K_L-iMatrix | 20.43 | 72.93 | AaronFeng47 |
32B | Q4_K_M | 18.50 | 71.46 | AaronFeng47 |
32B | Q3_K_M | 14.80 | 72.93 | AaronFeng47 |
32B | Q3_K_M | 14.80 | 73.41 | VoidAlchemy |
32B | IQ4_XS | 17.70 | 73.17 | soulhacker |
72B | IQ3_XXS | 31.85 | 77.07 | VoidAlchemy |
Gemma2-27B-it | Q8_0 | 29.00 | 58.05 | AaronFeng47 |
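To help eyeball that "knee point", here's a quick score-per-GB pass over the rows above (just arithmetic on the self-reported numbers, so take it as loosely as the benchmarks themselves):

```python
# Score per GB of file size, using the self-reported numbers from the table above.
rows = [
    ("32B 4bit AWQ",        19.33, 75.12),
    ("32B Q4_K_L-iMatrix",  20.43, 72.93),
    ("32B Q4_K_M",          18.50, 71.46),
    ("32B Q3_K_M",          14.80, 73.41),
    ("32B IQ4_XS",          17.70, 73.17),
    ("72B IQ3_XXS",         31.85, 77.07),
    ("Gemma2-27B-it Q8_0",  29.00, 58.05),
]

for name, size_gb, score in sorted(rows, key=lambda r: r[2] / r[1], reverse=True):
    print(f"{name:20s} {size_gb:5.2f} GB  {score:5.2f}  ({score / size_gb:.2f} pts/GB)")
```

The 72B IQ3_XXS tops the raw score but gives the fewest points per GB of the Qwen quants, which is roughly where I'd expect the knee to be.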
References
https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/
https://www.reddit.com/r/LocalLLaMA/comments/1flfh0p/comment/lo7nppj/
Man, if we could find a way to distribute MMLU-Pro testing... that would be so cool haha.
Q3 better than Q4 ??? Craziness
Yeah, some interesting results for sure. Though I personally take all these benchmarks with a big grain of salt, then throw some salt over my shoulder for good measure too... lol...
It is just a single benchmark with 410 questions that allows random guesses, so it's hard to extrapolate whether Q3 would actually be better than Q4 on your exact data set, etc.
Fun stuff though, and Ollama-MMLU-Pro makes it easier than ever to try it yourself!
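For a sense of how noisy 410 questions can be, here's a back-of-the-envelope 95% confidence interval on those scores (just a rough sketch using the normal approximation, not anything the benchmark itself reports):

```python
import math

# Rough 95% confidence interval for an accuracy measured on n
# independent multiple-choice questions (normal approximation).
def ci95_points(accuracy_pct: float, n: int) -> float:
    p = accuracy_pct / 100.0
    return 1.96 * math.sqrt(p * (1.0 - p) / n) * 100.0

n = 410  # question count of the MMLU-Pro Computer Science subset
for score in (71.46, 72.93, 77.07):
    print(f"{score:.2f} +/- {ci95_points(score, n):.2f} points")
# Comes out to roughly +/- 4 points, so a lot of the Q3 vs Q4 gaps are within noise.
```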
> Man, if we could find a way to distribute MMLU-Pro testing... that would be so cool haha.
Huh, seems like the MMLU-Pro Leaderboard does have a "Submit here" button for self-reported results?! Kind of cool, curious to see if this gets picked up and whether it's automated or what...
> Q3 better than Q4 ??? Craziness
bartowski's own notes on the main page here seem to have different recommendations. This one has an imatrix, whereas the OP in that reddit post used Ollama models which come straight from the original Qwen source, so it's possible that things change with that difference (but I bet this imatrix is not multilingual).
@ubergarm yeah that's cool, but it would be even cooler if there was a way to just add your GPU to a pool, similar to distributed training or crypto, and then have the official MMLU-Pro run against the pool and compile the results, so everyone is just running a few iterations each.
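Totally hypothetical, but the pooling part could be as simple as deterministically sharding the question indices so every volunteer GPU only runs its slice and uploads a (correct, attempted) pair; nothing in this sketch is a real API, it just illustrates the idea:

```python
# Hypothetical sketch of splitting an MMLU-Pro run across a volunteer pool.
NUM_QUESTIONS = 410  # e.g. the Computer Science subset

def shard(worker_id: int, num_workers: int, total: int = NUM_QUESTIONS) -> list[int]:
    """Deterministic round-robin split so each worker gets a disjoint slice."""
    return list(range(worker_id, total, num_workers))

def pool_accuracy(reports: list[tuple[int, int]]) -> float:
    """Combine per-worker (correct, attempted) reports into one overall score."""
    correct = sum(c for c, _ in reports)
    attempted = sum(a for _, a in reports)
    return 100.0 * correct / attempted

# 10 volunteers each run ~41 questions through their local harness
# (Ollama-MMLU-Pro or similar) and only upload their (correct, attempted) pair.
print(len(shard(worker_id=3, num_workers=10)))        # 41
print(pool_accuracy([(30, 41), (29, 41), (31, 41)]))  # ~73.17
```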
@robbiemu that's true, but Q3 can also outperform Q4 in benchmarks without being better overall, just by chance or by fortunate pruning.
Hi @bartowski, thanks -- that's cool, so it's just the one "comp sci" measure in MMLU-Pro, but across the whole bench it probably doesn't remain so high.
If I can ask, where do your recommendations on the model page come from?
Purely from bits per weight and a general feeling that has been established over time. Generally speaking, it's been accepted to aim for above 4 bits per weight (Q4_K_M+), but models tend to stay surprisingly coherent all the way down to Q2.
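For what it's worth, you can back out the effective bits per weight implied by the file sizes in that table; here's a rough sketch (the ~32.5B / ~72.7B weight counts are my assumption, and GGUF quants keep some tensors at higher precision, so these are only ballpark averages):

```python
# Effective average bits per weight implied by a GGUF file size,
# using assumed total weight counts (ballpark only; embeddings and some
# other tensors stay at higher precision in every quant type).
def implied_bpw(file_size_gb: float, num_weights: float) -> float:
    return file_size_gb * 1e9 * 8 / num_weights

for name, size_gb, params in [
    ("32B Q4_K_M",  18.50, 32.5e9),
    ("32B Q3_K_M",  14.80, 32.5e9),
    ("72B IQ3_XXS", 31.85, 72.7e9),
]:
    print(f"{name:12s} ~{implied_bpw(size_gb, params):.2f} bits/weight")
```

By that rough math the 32B Q3_K_M lands around ~3.6 effective bpw and the 72B IQ3_XXS around ~3.5, under the 4+ bpw rule of thumb but still well above the Q2 floor.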