tags:
  - merge
  - mergekit
  - lazymergekit
  - flemmingmiguel/NeuDist-Ro-7B
  - Blizado/discolm-mfto-7b-german-v0.1
  - ResplendentAI/Flora_DPO_7B
base_model:
  - flemmingmiguel/NeuDist-Ro-7B
  - Blizado/discolm-mfto-7b-german-v0.1
  - ResplendentAI/Flora_DPO_7B
license: cc-by-sa-4.0

Spaetzle-v12-7b

Spaetzle-v12-7b is a merge of the following models using LazyMergekit:

- flemmingmiguel/NeuDist-Ro-7B
- Blizado/discolm-mfto-7b-german-v0.1
- ResplendentAI/Flora_DPO_7B

As expected, this model is slightly worse than cstr/spaetzle-v8-7b on general English tasks, but slightly better on at least some German tasks: for example, it reaches an EQ-Bench (de) score of 64.81, but only

| Metric | Value |
|---|---|
| Avg. | 69.36 |
| AI2 Reasoning Challenge (25-Shot) | 65.96 |
| HellaSwag (10-Shot) | 86.16 |
| MMLU (5-Shot) | 63.48 |
| TruthfulQA (0-shot) | 57.84 |
| Winogrande (5-shot) | 80.03 |
| GSM8k (5-shot) | 62.70 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Spaetzle-v12-7b | 42.64 | 74.3 | 58.44 | 44.44 | 54.95 |
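
As a quick sanity check, the Avg. row of the Metric/Value table can be reproduced from the six benchmark scores. This is a minimal sketch that assumes Avg. is the unweighted mean of those scores; the variable names are illustrative only.

```python
# Scores copied from the Metric/Value table above.
leaderboard_scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.96,
    "HellaSwag (10-Shot)": 86.16,
    "MMLU (5-Shot)": 63.48,
    "TruthfulQA (0-shot)": 57.84,
    "Winogrande (5-shot)": 80.03,
    "GSM8k (5-shot)": 62.70,
}

# Assumption: "Avg." is the plain arithmetic mean of the six values.
leaderboard_avg = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"{leaderboard_avg:.2f}")  # prints 69.36
```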

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± 1.88 |
| | | acc_norm | 37.63 | ± 1.90 |
| agieval_lsat_ar | 0 | acc | 24.35 | ± 2.84 |
| | | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr | 0 | acc | 48.82 | ± 2.22 |
| | | acc_norm | 47.25 | ± 2.21 |
| agieval_lsat_rc | 0 | acc | 60.59 | ± 2.98 |
| | | acc_norm | 57.99 | ± 3.01 |
| agieval_sat_en | 0 | acc | 76.21 | ± 2.97 |
| | | acc_norm | 74.76 | ± 3.03 |
| agieval_sat_en_without_passage | 0 | acc | 46.60 | ± 3.48 |
| | | acc_norm | 45.63 | ± 3.48 |
| agieval_sat_math | 0 | acc | 37.27 | ± 3.27 |
| | | acc_norm | 33.18 | ± 3.18 |

Average: 42.64%

GPT4All

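| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 59.13 | ± 1.44 |
| | | acc_norm | 61.26 | ± 1.42 |
| arc_easy | 0 | acc | 83.67 | ± 0.76 |
| | | acc_norm | 80.89 | ± 0.81 |
| boolq | 1 | acc | 87.83 | ± 0.57 |
| hellaswag | 0 | acc | 66.45 | ± 0.47 |
| | | acc_norm | 84.63 | ± 0.36 |
| openbookqa | 0 | acc | 37.40 | ± 2.17 |
| | | acc_norm | 45.80 | ± 2.23 |
| piqa | 0 | acc | 82.15 | ± 0.89 |
| | | acc_norm | 83.13 | ± 0.87 |
| winogrande | 0 | acc | 76.56 | ± 1.19 |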

Average: 74.3%

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 42.59 | ± 1.73 |
| | | mc2 | 58.44 | ± 1.58 |

Average: 58.44%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 64.77 | ± 2.49 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 37.60 | ± 3.02 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 32.31 | ± 2.47 |
| | | exact_str_match | 21.45 | ± 2.17 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 31.00 | ± 2.07 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 22.43 | ± 1.58 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 53.00 | ± 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 40.40 | ± 2.20 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.30 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 68.50 | ± 1.04 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 48.66 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 30.36 | ± 1.46 |
| bigbench_snarks | 0 | multiple_choice_grade | 70.17 | ± 3.41 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.39 | ± 1.45 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 31.00 | ± 1.46 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.44 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.29 | ± 0.92 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 53.00 | ± 2.89 |

Average: 44.44%

Average score: 54.95%
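
The per-suite averages above are consistent with a simple rule: use acc_norm where a task reports it and plain acc otherwise, then take the unweighted mean. That rule is inferred from the numbers, not stated in this README, so treat the following as a sketch under that assumption. It recomputes the GPT4All average and the overall average score.

```python
from statistics import mean

# GPT4All: acc_norm where the table reports it, plain acc otherwise
# (boolq and winogrande only report acc).
gpt4all_scores = {
    "arc_challenge": 61.26,
    "arc_easy": 80.89,
    "boolq": 87.83,
    "hellaswag": 84.63,
    "openbookqa": 45.80,
    "piqa": 83.13,
    "winogrande": 76.56,
}
gpt4all_avg = mean(gpt4all_scores.values())

# Suite averages as reported above; the overall score is their mean.
suite_averages = {
    "AGIEval": 42.64,
    "GPT4All": 74.3,
    "TruthfulQA": 58.44,
    "Bigbench": 44.44,
}
overall_avg = mean(suite_averages.values())

print(f"GPT4All: {gpt4all_avg:.2f}")
print(f"Overall: {overall_avg:.3f}")
```

The overall mean comes out at roughly 54.955, which agrees with the reported 54.95% up to rounding.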

Elapsed time: 02:50:51

🧩 Configuration

models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: Blizado/discolm-mfto-7b-german-v0.1
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
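
To give a rough intuition for what `merge_method: dare_ties` does with the `density` and `weight` values above, here is a toy sketch on plain Python lists. It is not mergekit's implementation: the function names `dare_prune` and `dare_ties_merge` are hypothetical, the vectors are made up, and details such as normalization and `int8_mask` are left out. The idea it illustrates is: take each model's difference from the base, randomly keep a `density` fraction of the entries and rescale them (DARE), then, per parameter, keep only the weighted differences whose sign agrees with the majority direction (TIES-style sign election) and add them to the base.

```python
import random


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def dare_prune(delta, density, rng):
    """Keep each entry with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy DARE + sign-election merge on flat lists of floats (illustrative only)."""
    rng = random.Random(seed)
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # Drop-and-rescale each task vector, then apply its merge weight.
    weighted = [
        [w * d for d in dare_prune(delta, density, rng)]
        for delta, density, w in zip(deltas, densities, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [vec[i] for vec in weighted]
        elected = sign(sum(column))  # direction with the larger total mass
        # Only contributions that agree with the elected sign are kept.
        merged.append(b + sum(v for v in column if sign(v) == elected))
    return merged


# Made-up 4-parameter "models", using the densities and weights from the config above.
base = [0.0, 1.0, -1.0, 0.5]
finetuned = [
    [0.2, 1.1, -1.3, 0.5],
    [0.1, 0.8, -0.9, 0.9],
    [-0.3, 1.2, -1.1, 0.4],
]
print(dare_ties_merge(base, finetuned, [0.60, 0.65, 0.6], [0.30, 0.40, 0.3], seed=0))
```

With `density` set to 1.0 nothing is dropped, and the merge reduces to sign election over the weighted differences.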

💻 Usage

# Notebook cell: install dependencies (in a plain shell, drop the leading "!")
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v12-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" places the weights on the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; the returned text includes the prompt
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])