etemiz/Llama-3.1-405B-Inst-GGUF


Llama 3.1 405B quants and the llama.cpp versions used for quantization

IQ1_S: 86.8 GB - b3459
IQ1_M: 95.1 GB - b3459
IQ2_XXS: 109.0 GB - b3459
IQ3_XXS: 157.7 GB - b3484
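
To put the file sizes above in perspective, here is a rough sketch that turns each size into an average bits-per-weight figure and picks the largest quant that fits a given budget. It assumes the sizes are decimal gigabytes (1e9 bytes) and uses the 410B parameter count reported for the GGUF on the model page; both are assumptions, not figures confirmed by the author. The helper names are made up for illustration.

```python
from typing import Optional

# File sizes in GB as listed above (assumed to be decimal GB, i.e. 1e9 bytes).
QUANT_SIZES_GB = {
    "IQ1_S": 86.8,
    "IQ1_M": 95.1,
    "IQ2_XXS": 109.0,
    "IQ3_XXS": 157.7,
}

# Assumption: parameter count taken from the "410B params" shown on the model page.
N_PARAMS = 410e9


def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Average number of bits stored per parameter for a file of this size."""
    return size_gb * 1e9 * 8 / n_params


def largest_fitting(budget_gb: float) -> Optional[str]:
    """Name of the largest listed quant whose file size is within budget_gb."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        print(f"{name}: {bits_per_weight(size):.2f} bits/weight (whole-file average)")
    print("Largest quant within 128 GB:", largest_fitting(128.0))
```

The budget check only compares file sizes. Actual memory use will be higher once the KV cache and runtime overhead are included, so treat the result as an upper bound on what might fit.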

Quantized from the BF16 model here: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/

which was converted from Llama 3.1 405B Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
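
For anyone who wants to make a different quant from the same BF16 source and imatrix, the sketch below builds a `llama-quantize` command line in the usual `--imatrix <file> <input> <output> <type>` form. The exact command used for these files is not stated here, so this is only an assumption about how such a run might look; the input and output file names and the `quantize_command` helper are hypothetical placeholders.

```python
import shlex
from typing import List


def quantize_command(src_gguf: str, dst_gguf: str, quant_type: str,
                     imatrix: str = "405imatrix.dat",
                     binary: str = "./llama-quantize") -> List[str]:
    """Build argv for: llama-quantize --imatrix <imatrix> <src> <dst> <type>."""
    return [binary, "--imatrix", imatrix, src_gguf, dst_gguf, quant_type]


if __name__ == "__main__":
    cmd = quantize_command(
        "meta-405b-instruct-bf16.gguf",    # hypothetical local name of the BF16 file
        "Llama-3.1-405B-Inst-IQ1_S.gguf",  # hypothetical output name
        "IQ1_S",
    )
    # Print the command for inspection; pass cmd to subprocess.run(cmd, check=True)
    # to execute it once the paths point at real files.
    print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces intact and makes it easy to loop over several quant types.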

Lmk if you need bigger quants.


Format: GGUF
Model size: 410B params
Architecture: llama
Quantization levels: 1-bit, 2-bit, 3-bit


Model tree for etemiz/Llama-3.1-405B-Inst-GGUF

Base model: meta-llama/Llama-3.1-405B
Finetuned: meta-llama/Llama-3.1-405B-Instruct
Quantized (26): this model