NarrativeNexus_7B / README.md
metadata
license: other
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - jeiku/Cookie_7B
  - jeiku/SpaghettiOs_7B
  - jeiku/Rainbow_69_7B
  - jeiku/Paranoid_Android_7B
model-index:
  - name: NarrativeNexus_7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 66.13
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 85.74
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.17
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 63.95
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 79.01
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 51.78
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jeiku/NarrativeNexus_7B
          name: Open LLM Leaderboard

Nexus

This is my new favorite 7B, a merge of fine-tunes and merges that I've tossed together over the last week or so. It seems to be greater than the sum of its parts and performs well in riddle testing and markdown role-playing. I have also been using it to generate 1000-token narratives to improve custom story datasets for future models. It is highly descriptive and readily plays a futanari character; it can likely be used for female or male characters as well. Enjoy!

GGUF here: https://huggingface.co/jeiku/NarrativeNexus_7B_GGUF
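
Since the metadata lists `transformers` as the library, loading the full-precision weights might look like the sketch below. It assumes the repository id `jeiku/NarrativeNexus_7B` (taken from the leaderboard URLs in the metadata) and that `torch` and `transformers` are installed. The helper name `generate_narrative`, the default token budget, and the plain-text prompt are illustrative assumptions, not a documented prompt format for this model.

```python
MODEL_ID = "jeiku/NarrativeNexus_7B"  # repo id as it appears in the leaderboard URLs


def generate_narrative(prompt, max_new_tokens=256):
    """Hypothetical helper: load the model and continue `prompt`."""
    # Imports are local so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use a GPU with float16 (the merge dtype) when available, else CPU with float32.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_narrative("Write a short story about a lighthouse keeper.", max_new_tokens=1000)` would download the weights on first use and return the prompt followed by the generated continuation.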


This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with jeiku/Cookie_7B as the base.

Models Merged

The following models were included in the merge:

- jeiku/SpaghettiOs_7B
- jeiku/Rainbow_69_7B
- jeiku/Paranoid_Android_7B

Configuration

The following YAML configuration was used to produce this model:

merge_method: dare_ties
base_model: jeiku/Cookie_7B
parameters:
  normalize: true
models:
  - model: jeiku/SpaghettiOs_7B
    parameters:
      weight: 1
  - model: jeiku/Rainbow_69_7B
    parameters:
      weight: 1
  - model: jeiku/Paranoid_Android_7B
    parameters:
      weight: 0.75
dtype: float16
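
To make the configuration above more concrete, here is a toy, pure-Python sketch of the DARE TIES idea on flat lists of floats. It is not mergekit's implementation, which works on full tensors and may differ in detail. The function name `dare_ties_merge`, the `density` and `seed` parameters, and the example vectors are assumptions made for illustration; the configuration above does not set a density.

```python
import random


def dare_ties_merge(base, tuned, weights, density=1.0, normalize=True, seed=0):
    """Toy DARE TIES merge of several parameter vectors onto a base vector."""
    rng = random.Random(seed)

    # DARE step: take each model's difference from the base, randomly keep
    # each entry with probability `density`, and rescale kept entries by 1/density.
    deltas = []
    for model in tuned:
        delta = []
        for b, t in zip(base, model):
            if rng.random() < density:
                delta.append((t - b) / density)
            else:
                delta.append(0.0)
        deltas.append(delta)

    # TIES step: per parameter, elect a sign from the weighted sum of deltas,
    # then combine only the contributions that agree with that sign.
    merged = []
    for i, b in enumerate(base):
        contributions = [w * d[i] for w, d in zip(weights, deltas)]
        sign = 1.0 if sum(contributions) >= 0 else -1.0
        agreeing = [(w, c) for w, c in zip(weights, contributions) if c * sign > 0]
        if not agreeing:
            merged.append(b)  # nothing survives, so keep the base value
            continue
        total = sum(c for _, c in agreeing)
        # normalize: true divides by the summed weights of the agreeing models.
        scale = sum(w for w, _ in agreeing) if normalize else 1.0
        merged.append(b + total / scale)
    return merged


# Weights 1, 1, 0.75 mirror the configuration; the vectors are made up.
base = [0.0, 0.0]
tuned = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
print(dare_ties_merge(base, tuned, weights=[1, 1, 0.75]))  # prints [1.0, 1.0]
```

In the second position the middle model disagrees with the elected sign, so its contribution is discarded and the remaining two are averaged by their weights.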

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 68.30 |
| AI2 Reasoning Challenge (25-Shot) | 66.13 |
| HellaSwag (10-Shot)               | 85.74 |
| MMLU (5-Shot)                     | 63.17 |
| TruthfulQA (0-shot)               | 63.95 |
| Winogrande (5-shot)               | 79.01 |
| GSM8k (5-shot)                    | 51.78 |
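
As a quick sanity check, the reported average can be reproduced from the six benchmark scores. The snippet below is a small sketch that assumes the average is the plain arithmetic mean of those scores; the dictionary name `scores` is illustrative.

```python
# Benchmark scores as reported in the results above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.13,
    "HellaSwag (10-Shot)": 85.74,
    "MMLU (5-Shot)": 63.17,
    "TruthfulQA (0-shot)": 63.95,
    "Winogrande (5-shot)": 79.01,
    "GSM8k (5-shot)": 51.78,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 68.30
```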