## Overview

This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k), which is base Llama-2-13b with additional pretraining that applies YaRN scaling to RoPE to extend the useful context length to 64k tokens. Starting from that model, I performed instruction tuning with [Jon Durbin's Airoboros 2.1 dataset](https://huggingface.co/datasets/jondurbin/airoboros-2.1), with the same scaling approach applied.
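
If you want to verify the extended context window before committing the VRAM, here is a minimal sketch (assuming the standard `transformers` config attributes; the YaRN repos ship custom code, so field names may differ):

```python
from transformers import AutoConfig

# trust_remote_code is needed because the YaRN models ship custom config/modeling code.
cfg = AutoConfig.from_pretrained(
    "NousResearch/Yarn-Llama-2-13b-64k",
    trust_remote_code=True,
)

# Expect roughly 65536 positions for the 64k variant; rope_scaling (if exposed
# under this name) holds the YaRN interpolation parameters.
print(cfg.max_position_embeddings)
print(getattr(cfg, "rope_scaling", None))
```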

**This is a (merged) QLoRA fine-tune (rank 64)**.

The finetune was performed with 1x RTX 6000 Ada (~16 hours).
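
For reference, a rank-64 QLoRA setup with `peft` and `bitsandbytes` typically looks like the sketch below; the alpha, dropout, and target modules shown are illustrative assumptions, not the exact configuration used for this finetune.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the 64k base model in 4-bit (QLoRA-style quantized base).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Yarn-Llama-2-13b-64k",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
base = prepare_model_for_kbit_training(base)

# Rank-64 LoRA adapters; alpha/dropout/target_modules are illustrative, not the
# values actually used to train this model.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

After training, the LoRA adapter can be merged back into the base weights (e.g. via `merge_and_unload()`), which is what "merged" refers to above.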

## How to Use

Please comment with any questions and feedback on how this model performs, especially at long context lengths!

Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 65536 to utilize the full context capabilities. Again, `trust_remote_code=True` is imperative. Obviously, using the full context requires A LOT of VRAM.
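
Outside of ooba, loading with `transformers` follows the usual pattern. This is a hedged sketch rather than an official snippet; the model id below is a placeholder for this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the actual name of this repository.
model_id = "bhenrym14/airoboros-l2-13b-2.1-YaRN-64k"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # required for the custom YaRN RoPE code
)

# See the Prompting section below for the expected prompt format.
prompt = "USER: Summarize the following document...\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```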

**There may be issues on Windows systems loading this model due to the decimal in "2.1" found in the model name. Try simply changing the model directory name to omit this decimal if you have issues loading the model.**

### Benchmarks

ARC (25 shot): 60.32

Hellaswag (10 shot): 83.90

MMLU (5 shot): 54.39

## Prompting: