Transformers
Not-For-All-Audiences
Inference Endpoints
rAIfle's picture
Create README.md
7072d4a verified
|
raw
history blame
5.74 kB
metadata
license: llama3
license_name: llama3
license_link: LICENSE
library_name: transformers
tags:
  - not-for-all-audiences
datasets:
  - crestf411/LimaRP-DS
  - AI-MO/NuminaMath-CoT
  e88 88e                               d8     
 d888 888b  8888 8888  ,"Y88b 888 8e   d88     
C8888 8888D 8888 8888 "8" 888 888 88b d88888   
 Y888 888P  Y888 888P ,ee 888 888 888  888     
  "88 88"    "88 88"  "88 888 888 888  888     
      b                                        
      8b,                                      
 
  e88'Y88                  d8           888    
 d888  'Y  ,"Y88b 888,8,  d88    ,e e,  888    
C8888     "8" 888 888 "  d88888 d88 88b 888    
 Y888  ,d ,ee 888 888     888   888   , 888    
  "88,d88 "88 888 888     888    "YeeP" 888    
                                               
PROUDLY PRESENTS         

L3.1-70B-sunfall-v0.6.1-exl2-longcal

Quantized using 115 rows of 8192 tokens from the default ExLlamav2-calibration dataset.

Branches:

  • main -- measurement.json
  • 6b8h -- 6bpw, 8bit lm_head
  • 4.65b6h -- 4.65bpw, 6bit lm_head
  • 4.5b6h -- 4.5bpw, 6bit lm_head
  • 2.25b6h -- 2.25bpw, 6bit lm_head

Original model link: crestf411/L3.1-70B-sunfall-v0.6.1

Original model README below.


Sunfall (2024-07-31) v0.6.1 on top of Meta's Llama-3 70B Instruct.

NOTE: This model requires a slightly lower temperature than usual. Recommended starting point in Silly Tavern are:

  • Temperature: 1.2
  • MinP: 0.06
  • Optional DRY: 0.8 1.75 2 0

General heuristic:

  • Lots of slop: temperature is too low. Raise it.
  • Model is making mistakes about subtle or obvious details in the scene: temperature is too high. Lower it.

Mergers/fine-tuners: there is a LoRA of this model. Consider merging that instead of merging this model.

To use lore book tags (example), make sure you use Status: Blue (constant) and write e.g.

Follow the Diamond Law at all costs.

Tags: humor, dark, complex storytelling, intricate characters, immersive.

sunfall-standard-sfw.png

This model has been trained on context that mimics that of Silly Tavern's Llama3-instruct preset, with the following settings:

System Prompt:

You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.

The card has also been trained on content which includes a narrator card, which was used when the content did not mainly revolve around two characters. Future versions will expand on this idea, so forgive the vagueness at this time.

(The Diamond Law is this, although new rules were added: https://files.catbox.moe/d15m3g.txt -- So far results are unclear, but the training was done with this phrase included, and the training data adheres to the law.)

The model has also been trained to do storywriting. The system message ends up looking something like this:

You are an expert storyteller, who can roleplay or write compelling stories. Follow the Diamond Law at all costs. Below is a scenario with character descriptions and content tags. Write a story based on this scenario.

Scenario: The story is about James, blabla.

James is an overweight 63 year old blabla.

Lucy: James's 62 year old wife.

Tags: tag1, tag2, tag3, ...

MMLU-Pro Benchmark: model overall is higher than the instruct base, but it loses in specific categories.

Llama3.1 70B Instruct base:

| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
|   58.64 |   73.91 |    60.00 |     61.11 |            69.23 |     70.37 |       51.61 |  57.69 |   66.67 | 51.43 | 55.81 |      68.75 |   51.22 |      48.00 | 58.62 |
|     224 |      17 |       15 |        22 |                9 |        19 |          16 |     15 |       8 |    18 |    24 |         11 |      21 |         12 |    17 |
|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |

Sunfall v0.6.1:

| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
|   60.73 |   78.26 |    60.00 |     55.56 |            69.23 |     70.37 |       64.52 |  65.38 |   75.00 | 42.86 | 62.79 |      68.75 |   56.10 |      56.00 | 51.72 |
|     232 |      18 |       15 |        20 |                9 |        19 |          20 |     17 |       9 |    15 |    27 |         11 |      23 |         14 |    15 |
|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |

The above benchmark output is with temp 0 and no other helping samplers. The model on its own is strong, but it gets more easily confused than the base instruct model.

Probably because I traumatized it with my vile dataset. Who knows.