mradermacher committed: Update README.md
<!-- ### tags: nicoboss -->
weighted/imatrix quants of https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

These imatrix quants have recently been requantized from a higher-quality imatrix calculated
from the source model instead of the Q8_0, in what was probably the largest distributed imatrix
computation to date (and also one of the first).

<!-- provided-files -->
static quants are available at https://huggingface.co/mradermacher/Meta-Llama-3.1-405B-Instruct-GGUF

## Usage
Here is a handy graph by ikawrakow comparing some lower-quality quant
types (lower is better):

![image.png]()

And here are Artefact2's thoughts on the matter:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9