Quant-Cartel
/

L3.1-70B-sunfall-v0.6.1-exl2-longcal

Transformers

Not-For-All-Audiences

Inference Endpoints

Model card Files Files and versions Community

rAIfle commited on Aug 10

Commit

7072d4a

•

1 Parent(s): 65c1699

Create README.md

Browse files

Files changed (1) hide show

README.md +119 -0

README.md ADDED Viewed

	@@ -0,0 +1,119 @@

+---
+license: llama3
+license_name: llama3
+license_link: LICENSE
+library_name: transformers
+tags:
+- not-for-all-audiences
+datasets:
+- crestf411/LimaRP-DS
+- AI-MO/NuminaMath-CoT
+---
+```
+  e88 88e                               d8
+ d888 888b  8888 8888  ,"Y88b 888 8e   d88
+C8888 8888D 8888 8888 "8" 888 888 88b d88888
+ Y888 888P  Y888 888P ,ee 888 888 888  888
+  "88 88"    "88 88"  "88 888 888 888  888
+      b
+      8b,
+  e88'Y88                  d8           888
+ d888  'Y  ,"Y88b 888,8,  d88    ,e e,  888
+C8888     "8" 888 888 "  d88888 d88 88b 888
+ Y888  ,d ,ee 888 888     888   888   , 888
+  "88,d88 "88 888 888     888    "YeeP" 888
+PROUDLY PRESENTS
+```
+# L3.1-70B-sunfall-v0.6.1-exl2-longcal
+Quantized using 115 rows of 8192 tokens from the default ExLlamav2-calibration dataset.
+Branches:
+- `main` -- `measurement.json`
+- `6b8h` -- 6bpw, 8bit lm_head
+- `4.65b6h` -- 4.65bpw, 6bit lm_head
+- `4.5b6h` -- 4.5bpw, 6bit lm_head
+- `2.25b6h` -- 2.25bpw, 6bit lm_head
+Original model link: [crestf411/L3.1-70B-sunfall-v0.6.1](https://huggingface.co/crestf411/L3.1-70B-sunfall-v0.6.1)
+Original model README below.
+-----
+Sunfall (2024-07-31) v0.6.1 on top of Meta's Llama-3 70B Instruct.
+**NOTE: This model requires a slightly lower temperature than usual. Recommended starting point in Silly Tavern are:**
+* Temperature: **1.2**
+* MinP: **0.06**
+* Optional DRY: **0.8 1.75 2 0**
+General heuristic:
+* Lots of slop: temperature is too low. Raise it.
+* Model is making mistakes about subtle or obvious details in the scene: temperature is too high. Lower it.
+*Mergers/fine-tuners: [there is a LoRA of this model](https://huggingface.co/crestf411/sunfall-peft/tree/main/l3.1-70b). Consider merging that instead of merging this model.*
+To use lore book tags ([example](https://files.catbox.moe/w5otyq.json)), make sure you use **Status: Blue (constant)** and write e.g.
+```
+Follow the Diamond Law at all costs.
+Tags: humor, dark, complex storytelling, intricate characters, immersive.
+```
+![sunfall-standard-sfw.png](https://huggingface.co/crestf411/L3-8B-sunfall-v0.4-stheno-v3.2/resolve/main/sunfall-standard-sfw.png?)
+This model has been trained on context that mimics that of Silly Tavern's Llama3-instruct preset, with the following settings:
+**System Prompt:**
+```
+You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.
+```
+The card has also been trained on content which includes a narrator card, which was used when the content did not mainly revolve around two characters. Future versions will expand on this idea, so forgive the vagueness at this time.
+(The Diamond Law is this, although new rules were added: https://files.catbox.moe/d15m3g.txt -- So far results are unclear, but the training was done with this phrase included, and the training data adheres to the law.)
+The model has also been trained to do storywriting. The system message ends up looking something like this:
+```
+You are an expert storyteller, who can roleplay or write compelling stories. Follow the Diamond Law at all costs. Below is a scenario with character descriptions and content tags. Write a story based on this scenario.
+Scenario: The story is about James, blabla.
+James is an overweight 63 year old blabla.
+Lucy: James's 62 year old wife.
+Tags: tag1, tag2, tag3, ...
+```
+MMLU-Pro Benchmark: model overall is higher than the instruct base, but it loses in specific categories.
+```
+Llama3.1 70B Instruct base:
+| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
+| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
+|   58.64 |   73.91 |    60.00 |     61.11 |            69.23 |     70.37 |       51.61 |  57.69 |   66.67 | 51.43 | 55.81 |      68.75 |   51.22 |      48.00 | 58.62 |
+|     224 |      17 |       15 |        22 |                9 |        19 |          16 |     15 |       8 |    18 |    24 |         11 |      21 |         12 |    17 |
+|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |
+Sunfall v0.6.1:
+| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
+| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
+|   60.73 |   78.26 |    60.00 |     55.56 |            69.23 |     70.37 |       64.52 |  65.38 |   75.00 | 42.86 | 62.79 |      68.75 |   56.10 |      56.00 | 51.72 |
+|     232 |      18 |       15 |        20 |                9 |        19 |          20 |     17 |       9 |    15 |    27 |         11 |      23 |         14 |    15 |
+|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |
+```
+The above benchmark output is with temp 0 and no other helping samplers. The model on its own is strong, but it gets more easily confused than the base instruct model.
+Probably because I traumatized it with my vile dataset. Who knows.