InferenceIllusionist committed
Commit 3d5fe52 • 1 Parent(s): 8016a2c
Update README.md

README.md CHANGED
@@ -32,6 +32,7 @@ PROUDLY PRESENTS
 Quantized from fp16.
 * Weighted quantizations were created using the fp16 GGUF and [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) in 234 chunks with n_ctx=512
 * This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the post above for details
+* The enhancedv2-turbomini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file
 
 For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
 
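For reference, an imatrix run like the one described above can be sketched with llama.cpp's `imatrix` tool. This is a hedged sketch, not the exact command used for this repo: the flags (`-m`, `-f`, `-c`, `-o`) come from llama.cpp's imatrix utility, and the file names here are hypothetical placeholders.

```shell
# Sketch only: generate an importance matrix from an fp16 GGUF
# using the enhancedV2-TurboMini calibration text at n_ctx=512.
# File names are placeholders, not the actual paths used in this commit.
./llama-imatrix \
  -m model-fp16.gguf \
  -f groups_merged-enhancedV2-TurboMini.txt \
  -c 512 \
  -o imatrix.dat
```

The resulting `imatrix.dat` would then be passed to `llama-quantize` (via `--imatrix`) to produce the weighted quants.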