Llama 3.1 405B quants, with the llama.cpp version used to quantize each one:
- IQ1_S: 86.8 GB - b3459
- IQ1_M: 95.1 GB - b3459
- IQ2_XXS: 109.0 GB - b3459
- IQ3_XXS: 157.7 GB - b3484
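
To compare these sizes, it can help to convert them to an approximate bits-per-weight figure. The sketch below does that arithmetic. It assumes a nominal 405e9 parameters and that "GB" means 10^9 bytes; neither is stated on this page, so treat the results as rough estimates.

```python
# Rough bits-per-weight estimate for each quant listed above.
# Assumptions (not stated in the model card): 405e9 parameters,
# and "GB" meaning 10**9 bytes rather than GiB.
N_PARAMS = 405e9
BYTES_PER_GB = 1e9

quant_sizes_gb = {
    "IQ1_S": 86.8,
    "IQ1_M": 95.1,
    "IQ2_XXS": 109.0,
    "IQ3_XXS": 157.7,
}


def bits_per_weight(size_gb, n_params=N_PARAMS):
    """Convert a file size in GB to average bits stored per parameter."""
    return size_gb * BYTES_PER_GB * 8 / n_params


bpw = {name: bits_per_weight(size) for name, size in quant_sizes_gb.items()}
for name, value in bpw.items():
    print(f"{name}: ~{value:.2f} bits per weight")
```

The figure covers the whole file, including metadata and any tensors kept at higher precision, so it will not exactly match the nominal rate of each quant type.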
Quantized from the BF16 GGUF here: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
which was converted from Llama 3.1 405B Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct
imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
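
For anyone who wants to reproduce a quant, here is a minimal sketch of how a `llama-quantize` call with an imatrix could be assembled. The exact command used for these files is not given here, so this is only an illustration: the binary path and the GGUF file names are hypothetical placeholders, and only the `--imatrix` option and the positional order (input, output, type) follow llama.cpp's `llama-quantize` usage.

```python
import shlex


def quantize_command(src_gguf, imatrix, out_gguf, quant_type,
                     binary="./llama-quantize"):
    """Build the argument list for a llama-quantize run with an imatrix.

    All paths are placeholders; point them at your local files.
    """
    return [binary, "--imatrix", imatrix, src_gguf, out_gguf, quant_type]


# Hypothetical file names, for illustration only.
cmd = quantize_command(
    src_gguf="bf16-model.gguf",
    imatrix="405imatrix.dat",
    out_gguf="model-IQ1_S.gguf",
    quant_type="IQ1_S",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.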
Lmk if you need bigger quants.