bhenrym14 committed on
Commit
92f3162
1 Parent(s): 5ece77f

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -8,11 +8,11 @@ datasets:
 
 ## Overview
 
-This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k). This starting point is Llama-2-13b with additional pretraining done with YaRN scaling applied to RoPE to extend the useful context length to 64k tokens. Starting with this model, I performed instruction tuning with [Jon Durbin's Airoboros 2.1 dataset](https://huggingface.co/datasets/jondurbin/airoboros-2.1), with same scaling approach applied.
+This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k), which is base Llama-2-13b with additional pretraining done with YaRN scaling applied to RoPE to extend the useful context length to 64k tokens. Starting with this model, I performed instruction tuning with [Jon Durbin's Airoboros 2.1 dataset](https://huggingface.co/datasets/jondurbin/airoboros-2.1), with the same scaling approach applied.
 
 **This is a (merged) QLoRA fine-tune (rank 64)**.
 
-The finetune was performed with 1x RTX 6000 Ada (~18 hours).
+The finetune was performed with 1x RTX 6000 Ada (~16 hours).
 
 
 ## How to Use
@@ -23,7 +23,7 @@ The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16
 
 Please comment with any questions and feedback on how this model performs, especially at long context lengths!
 
-Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 16384 to utilize the full context capabilities. Again `trust_remote_code=True` is imperative
+Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 65586 to utilize the full context capabilities. Again `trust_remote_code=True` is imperative. Obviously, using full context requires A LOT of VRAM.
 
 **There may be issues on Windows systems loading this model due to the decimal in "2.1" found in the model name. Try simply changing the model directory name to omit this decimal if you have issues loading the model.**
 
@@ -50,7 +50,9 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 ### Benchmarks
 
 ARC (25 shot): 60.32
+
 Hellaswag (10 shot): 83.90
+
 MMLU (5 shot): 54.39
 
 ## Prompting:
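
For reference, the README text in the diff above stresses that `trust_remote_code=True` is imperative when loading this model. Below is a minimal sketch of doing that with Hugging Face transformers; the repo id, dtype, and device settings are assumptions for illustration and are not part of this commit.

```python
# Minimal sketch, assuming the repo id below and a CUDA GPU with enough VRAM for a 13B model.
# If loading from a local directory on Windows, rename the folder to drop the "." per the README note.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/airoboros-l2-13b-2.1-YaRN-64k"  # assumed repo id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code=True is required so the custom YaRN-scaled RoPE code shipped with the
# model repo is used instead of the stock Llama rotary embeddings.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain YaRN RoPE scaling in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```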