license: other
tags:
- storywriting
- text adventure
- not-for-all-audiences
license_name: microsoft-research-license
model-index:
- name: psyonic-cetacean-20B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 25.44
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 27.84
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.98
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.13
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.9
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.95
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
      name: Open LLM Leaderboard
Presenting the FP16 files for Psyonic-Cetacean-20B! This is an experimental Llama 2 stack merge built from the models and recipe below:
slices:
  - sources:
    - model: Orca2flat
      layer_range: [0, 16]
  - sources:
    - model: LLaMA2-13B-Psyfighter2 (FP16 not yet available)
      layer_range: [8, 24]
  - sources:
    - model: Orca2flat
      layer_range: [17, 32]
  - sources:
    - model: LLaMA2-13B-Psyfighter2 (FP16 not yet available)
      layer_range: [25, 40]
merge_method: passthrough
dtype: float16
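As a rough check on what this recipe produces, the sketch below expands each slice into the layers it contributes to the stack. It assumes `layer_range` is half-open (start included, end excluded), as in Python's `range`; the names `SLICES` and `stacked_layers` are illustrative and not part of any merge tool.

```python
# Slices copied from the recipe above as (model, start, end).
# Assumption: each range is half-open, i.e. `end` is excluded.
SLICES = [
    ("Orca2flat", 0, 16),
    ("LLaMA2-13B-Psyfighter2", 8, 24),
    ("Orca2flat", 17, 32),
    ("LLaMA2-13B-Psyfighter2", 25, 40),
]


def stacked_layers(slices):
    """Return the (model, source_layer) pair for each layer of the merged stack."""
    return [
        (model, layer)
        for model, start, end in slices
        for layer in range(start, end)
    ]


layers = stacked_layers(SLICES)
total_layers = len(layers)
print(total_layers)
```

Under that assumption the stack comes to 62 layers, and layer 16 of Orca2flat and layer 24 of Psyfighter2 are not used by any slice.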
Note: we also ran an inverted merge, but the output was not satisfactory and will not be released.
We first flattened the additional ChatML vocabulary tokens out of Orca-2-13B, then performed a stack merge with Psyfighter-2-13B. The results surprised us with their vividness, freshness of prose, obedience to instruction prompting, and formatting cohesion.
This model is focused on storywriting and text adventure, with a side order of Assistant and Chat functionality. Like its ancestor Psyfighter-2, this model works better if you let it improvise and riff on your concepts rather than feeding it an excess of detail. Additionally, either the removal of the ChatML vocab or the stack merging process itself has produced a model that is not merely uncensored but actively anti-censored, so please be aware that it can and will kill you during adventures or output NSFW material if prompted accordingly.
During testing, the model exhibited an especially strong affinity for science fiction and space opera writing, while handling fantasy elements quite well and horror elements slightly less so. Refer to the Psyfighter-2 model card for best prompting practices.
Despite being uncensored, the model does not drive towards NSFW on its own; we have tested it out to 16,000 tokens of context via RoPE scaling. It will follow your tone and style very well.
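For a sense of scale, the sketch below computes the position-scaling factor needed to stretch a base context window to 16000 tokens. It assumes the Llama 2 native context of 4096 tokens and plain linear RoPE scaling; this card does not say which scaling method or settings were used in testing, and `linear_rope_factor` is a hypothetical helper.

```python
def linear_rope_factor(target_ctx: int, native_ctx: int = 4096) -> float:
    """Factor by which positions are compressed under linear RoPE scaling.

    Assumption: native_ctx defaults to the Llama 2 context length of 4096.
    """
    if target_ctx <= 0 or native_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # Contexts that already fit in the native window need no scaling.
    return max(1.0, target_ctx / native_ctx)


factor = linear_rope_factor(16000)
print(factor)
```

Under these assumptions the factor works out to a little under 4, so a setting of 4.0 would cover the tested context length with some headroom.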
Please enjoy, and if you encounter anything exciting or weird, please reach out to me at [jebcarter@pm.me].
Special thanks as always to the KoboldAI crew who provided the mergebox, testing, and feedback on this model, and to gelukuMLG for the model mascot!
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 15.87 |
| IFEval (0-Shot) | 25.44 |
| BBH (3-Shot) | 27.84 |
| MATH Lvl 5 (4-Shot) | 0.98 |
| GPQA (0-shot) | 3.13 |
| MuSR (0-shot) | 16.90 |
| MMLU-PRO (5-shot) | 20.95 |
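
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores, which the following sketch recomputes. The dictionary simply restates the table; `scores` and `average` are illustrative names.

```python
# Benchmark scores restated from the table above.
scores = {
    "IFEval (0-Shot)": 25.44,
    "BBH (3-Shot)": 27.84,
    "MATH Lvl 5 (4-Shot)": 0.98,
    "GPQA (0-shot)": 3.13,
    "MuSR (0-shot)": 16.90,
    "MMLU-PRO (5-shot)": 20.95,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```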