mradermacher/Qwen2-57B-A14B-Instruct-GGUF

metadata

base_model: Qwen/Qwen2-57B-A14B-Instruct
language:
  - en
library_name: transformers
license: apache-2.0
quantized_by: mradermacher
tags:
  - chat

About

The Qwen2-57B models seem to be broken. I have tried my best, but they likely need to be fixed upstream first. You have been warned.

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including how to concatenate multi-part files.
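As a rough illustration of the concatenation step, here is a minimal Python sketch. It assumes the parts are plain byte-wise splits of a single GGUF file that only need to be joined in order; that is an assumption, so check the linked READMEs for the files you actually download. The function name `concat_parts` and its parameters are hypothetical, not part of any tool.

```python
import shutil
from pathlib import Path


def concat_parts(part_paths, out_path, chunk_size=16 * 1024 * 1024):
    """Join split files byte-for-byte, in the order given, into out_path.

    Assumption: the parts are raw slices of one file, so plain
    concatenation restores the original.
    """
    out_path = Path(out_path)
    with out_path.open("wb") as dst:
        for part in part_paths:
            with Path(part).open("rb") as src:
                # Stream in chunks so a multi-gigabyte part is never
                # held in memory all at once.
                shutil.copyfileobj(src, dst, chunk_size)
    return out_path
```

A call might look like `concat_parts(["model.Q8_0.gguf.part1of2", "model.Q8_0.gguf.part2of2"], "model.Q8_0.gguf")`, where the file names are placeholders. The order of the list matters: the parts are written exactly in the order they are passed.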

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Link	Type	Size/GB	Notes
GGUF	Q2_K	21.2
GGUF	IQ3_XS	23.7
GGUF	Q3_K_S	25.0
GGUF	IQ3_S	25.0	beats Q3_K*
GGUF	IQ3_M	25.3
GGUF	Q3_K_M	27.6	lower quality
GGUF	Q3_K_L	29.9
GGUF	IQ4_XS	31.1
GGUF	Q4_K_S	32.8	fast, recommended
GGUF	Q4_K_M	35.0	fast, recommended
GGUF	Q5_K_S	39.7
GGUF	Q5_K_M	40.9
GGUF	Q6_K	47.2	very good quality
PART 1 PART 2	Q8_0	61.1	fast, best quality
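To narrow down the table for a given machine, a small helper like the following can list the quants whose file size fits a memory budget. This is only a sketch: the sizes are copied from the table above, while the function name `quants_that_fit` and the `headroom_gb` default are hypothetical choices. Actual memory use also depends on context length and runtime overhead, and a larger file is not necessarily a better-quality one.

```python
# (type, size in GB) pairs copied from the table above, smallest first.
QUANTS = [
    ("Q2_K", 21.2), ("IQ3_XS", 23.7), ("Q3_K_S", 25.0), ("IQ3_S", 25.0),
    ("IQ3_M", 25.3), ("Q3_K_M", 27.6), ("Q3_K_L", 29.9), ("IQ4_XS", 31.1),
    ("Q4_K_S", 32.8), ("Q4_K_M", 35.0), ("Q5_K_S", 39.7), ("Q5_K_M", 40.9),
    ("Q6_K", 47.2), ("Q8_0", 61.1),
]


def quants_that_fit(budget_gb, headroom_gb=2.0):
    """Return the quant types whose file size fits in the budget.

    headroom_gb is an arbitrary allowance for everything besides the
    weights (an assumption, not a measured figure).
    """
    limit = budget_gb - headroom_gb
    return [name for name, size in QUANTS if size <= limit]
```

For example, `quants_that_fit(36.0)` returns every type from Q2_K up to and including Q4_K_S, because Q4_K_M at 35.0 GB exceeds the 34.0 GB left after the assumed headroom.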

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

See https://huggingface.co/mradermacher/model_requests for answers to questions you might have, or if you want some other model quantized.

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time.