---
license: llama3
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
tags:
- llama
base_model: mattshumer/ref_70_e3
pipeline_tag: text-generation
library_name: ggml
datasets:
- froggeric/imatrix
metrics:
- perplexity
---
# Reflection-Llama-3.1-70B-GGUF
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/lQJH2XICEKaACm9lfH7ZM.webp)
GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshumer/ref_70_e3)
> This is the new, working version of the Reflection Llama 3.1 70B model.
**Reflection Llama-3.1 70B is (purportedly) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches an LLM to detect mistakes in its reasoning and correct course.**
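A minimal usage sketch with the `llama-cpp-python` bindings follows; the file name, context size, and sampling settings are assumptions, and any of the quants listed below works the same way:

```python
from llama_cpp import Llama

# Hypothetical local file name -- pick whichever quant fits your hardware.
llm = Llama(
    model_path="Reflection-Llama-3.1-70B.Q4_K_S.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU when possible
)

out = llm(
    "What is 2 + 2? Think step by step before answering.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

For the quants marked as split below, pointing `model_path` at the first shard should be enough; llama.cpp picks up the remaining files automatically.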
| Quantization | Size | Split files | iMatrix |
| ------------ | ---- | ----------- | ------- |
| FP16 | 141GB | true | false |
| Q8_0_L | ??.?GB | true | false |
| Q8_0 | ??.?GB | true | false |
| Q6_K_L | ??.?GB | true | false |
| Q6_K | 57.9GB | true | false |
| Q5_K_L | 52.6GB | true | false |
| Q5_K_M | ??.?GB | true | false |
| Q5_K_S | 48.7GB | false | false |
| Q4_K_L | 45.3GB | false | false |
| Q4_K_M | ??.?GB | false | false |
| Q4_K_S | 40.3GB | false | false |
| IQ4_NL | 38.2GB | false | true |
| IQ4_XS | ??.?GB | false | true |
| Q3_K_XL | 37.2GB | false | false |
| Q3_K_L | 37.1GB | false | false |
| Q3_K_M | 34.3GB | false | false |
| IQ3_M | ??.?GB | false | true |
| Q3_K_S | ??.?GB | false | false |
| IQ3_S | ??.?GB | false | true |
| Q2_K_L | 29.4GB | false | false |
| IQ3_XS | ??.?GB | false | true |
| IQ3_XXS | ??.?GB | false | true |
| Q2_K | ??.?GB | false | true |
| Q2_K_S | ??.?GB | false | true |
| IQ2_M | 23.0GB | false | true |
| IQ2_S | 21.2GB | false | true |
| IQ2_XS | 20.2GB | false | true |
| IQ2_XXS | 18.2GB | false | true |
| IQ1_M | 16.0GB | false | true |
| IQ1_S | 14.6GB | false | true |
The `_L` and `_XL` suffixes mean that the token embeddings and output weights are kept at fp16 precision.
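As a rough sketch, such variants can be produced with llama.cpp's `llama-quantize` tool by pinning those tensors to f16; the binary path, file names, and base quant type here are assumptions:

```python
import subprocess

# Hypothetical paths and file names -- adjust to your llama.cpp build.
subprocess.run(
    ["./llama-quantize",
     "--token-embedding-type", "f16",  # keep token embeddings at fp16
     "--output-tensor-type", "f16",    # keep the output weights at fp16
     "Reflection-Llama-3.1-70B.FP16.gguf",    # source model
     "Reflection-Llama-3.1-70B.Q5_K_L.gguf",  # destination
     "Q5_K_M"],                        # base quant type for all other tensors
    check=True,
)
```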
The iMatrix calibration dataset is bartowski's [calibration_datav3.txt](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8). The importance matrix is computed on the static Q6_K quant over 125 chunks.
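For reference, a minimal sketch of how such an importance matrix can be computed with llama.cpp's `llama-imatrix` tool; the binary path and file names are assumptions:

```python
import subprocess

# Hypothetical paths -- adjust to your llama.cpp build and local files.
subprocess.run(
    ["./llama-imatrix",
     "-m", "Reflection-Llama-3.1-70B.Q6_K.gguf",  # static Q6_K quant
     "-f", "calibration_datav3.txt",              # bartowski's calibration data
     "-o", "imatrix.dat",                         # resulting importance matrix
     "--chunks", "125"],
    check=True,
)
```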
## Model Info
The model was not trained for 3 epochs: it is identical to the 2nd-epoch run [mattshumer/Reflection-Llama-3.1-70B-ep2-working](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B-ep2-working) (it is possible this one is also fake).
The fine-tuning was done using LoRA with rank 256 on the Llama-3.1-70B-Instruct model.
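For illustration, a rank-256 LoRA setup along these lines might look as follows with Hugging Face's `peft` library; only the rank comes from the card, while the alpha, dropout, and target modules are assumptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative only: a 70B model needs multiple GPUs or heavy offloading.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    device_map="auto",
)

config = LoraConfig(
    r=256,             # the rank reported for this fine-tune
    lora_alpha=512,    # assumption; often set to 2 * r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.05, # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
```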
## Benchmarks
**Warning: These are likely false scores and cannot be replicated with this model.**
All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `