bhenrym14 committed on
Commit
92f3162
1 Parent(s): 5ece77f

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -8,11 +8,11 @@ datasets:
 
 ## Overview
 
-This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k). This starting point is Llama-2-13b with additional pretraining done with YaRN scaling applied to RoPE to extend the useful context length to 64k tokens. Starting with this model, I performed instruction tuning with [Jon Durbin's Airoboros 2.1 dataset](https://huggingface.co/datasets/jondurbin/airoboros-2.1), with same scaling approach applied.
+This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k), which is base Llama-2-13b with additional pretraining done with YaRN scaling applied to RoPE to extend the useful context length to 64k tokens. Starting with this model, I performed instruction tuning with [Jon Durbin's Airoboros 2.1 dataset](https://huggingface.co/datasets/jondurbin/airoboros-2.1), with the same scaling approach applied.
 
 **This is a (merged) QLoRA fine-tune (rank 64)**.
 
-The finetune was performed with 1x RTX 6000 Ada (~18 hours).
+The finetune was performed with 1x RTX 6000 Ada (~16 hours).
 
 
 ## How to Use
@@ -23,7 +23,7 @@ The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16
 
 Please comment with any questions and feedback on how this model performs, especially at long context lengths!
 
-Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 16384 to utilize the full context capabilities. Again `trust_remote_code=True` is imperative
+Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 65586 to utilize the full context capabilities. Again `trust_remote_code=True` is imperative. Obviously, using full context requires A LOT of VRAM.
 
 **There may be issues on Windows systems loading this model due to the decimal in "2.1" found in the model name. Try simply changing the model directory name to omit this decimal if you have issues loading the model.**
 
@@ -50,7 +50,9 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 ### Benchmarks
 
 ARC (25 shot): 60.32
+
 Hellaswag (10 shot): 83.90
+
 MMLU (5 shot): 54.39
 
 ## Prompting:
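
For reference, the README text in the diff above stresses that `trust_remote_code=True` is imperative when loading this model. Below is a minimal sketch of doing that with Hugging Face transformers; the repo id, dtype, and device settings are assumptions for illustration and are not part of this commit.

```python
# Minimal sketch, assuming the repo id below and a CUDA GPU with enough VRAM for a 13B model.
# If loading from a local directory on Windows, rename the folder to drop the "." per the README note.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/airoboros-l2-13b-2.1-YaRN-64k"  # assumed repo id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code=True is required so the custom YaRN-scaled RoPE code shipped with the
# model repo is used instead of the stock Llama rotary embeddings.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain YaRN RoPE scaling in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```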