brucethemoose committed
Commit 88bcc86 (parent: b1d3c91)
Command typo
README.md
CHANGED
@@ -47,7 +47,7 @@ python convert.py --in_dir //home/alpha/FastModels/CapyTessBorosYi-34B-200K-DARE
 
 Second exllama quantization pass:
 ```
-python convert.py --in_dir /home/alpha/FastModels/CapyTessBorosYi-34B-200K-DARE-Ties -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/capytessmes.json --cal_dataset /home/alpha/Documents/medium.parquet -l 2048 -r 200 -ml 2048 -mr 40 -gr 200 -ss 4096 -b
+python convert.py --in_dir /home/alpha/FastModels/CapyTessBorosYi-34B-200K-DARE-Ties -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/capytessmes.json --cal_dataset /home/alpha/Documents/medium.parquet -l 2048 -r 200 -ml 2048 -mr 40 -gr 200 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction -nr
 ```
 
 dare_ties is testing with better perplexity than a regular ties merge with the same merge configuration. Model weights that add up to one also seem optimal from testing. And results at long context seem... better than the previous dare merge with Tess 1.2?
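For context, a dare_ties merge like the one discussed above is typically produced with a mergekit config along these lines. This is a minimal sketch only: the component model IDs, weights, and densities are illustrative placeholders, not the actual recipe used for this model (the base model ID is an assumption inferred from the model name).

```yaml
# Hypothetical mergekit config sketching a dare_ties merge where the
# per-model weights sum to one, as the note above found optimal.
# Component models, weights, and densities are placeholders.
merge_method: dare_ties
base_model: 01-ai/Yi-34B-200K   # assumed base, inferred from the model name
models:
  - model: example-org/finetune-a   # placeholder component model
    parameters:
      weight: 0.5    # weights across components sum to 1.0
      density: 0.5
  - model: example-org/finetune-b   # placeholder component model
    parameters:
      weight: 0.5
      density: 0.5
dtype: bfloat16
```

With `merge_method: dare_ties`, each component's delta from the base is randomly pruned to the given `density` and rescaled before sign-consensus merging, so the `weight` values control each model's overall contribution.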