
language:
  - en
license: apache-2.0
tags:
  - axolotl
  - generated_from_trainer
base_model: pszemraj/Mistral-7B-v0.3-prune6
datasets:
  - BEE-spoke-data/knowledge-inoc-concat-v1
model-index:
  - name: Mistral-v0.3-6B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 45.14
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 71.65
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 51.83
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 45.64
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.77
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 8.34
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 24.54
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 13.52
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0.83
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 2.01
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.61
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 12.7
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
          name: Open LLM Leaderboard

Mistral-v0.3-6B

Brief continued pretraining at a context length of 4096 tokens to 'heal' the model after layer pruning.

Model description

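This model is a fine-tuned version of pszemraj/Mistral-7B-v0.3-prune6, trained on the BEE-spoke-data/knowledge-inoc-concat-v1 dataset (smorgasbord-tb-quality config). It achieves the following result on the held-out evaluation split (1% of the data, per val_set_size: 0.01):

Validation loss: 1.2860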

Axolotl config

axolotl version: 0.4.0

```yaml
base_model: pszemraj/Mistral-7B-v0.3-prune6
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

strict: false
seed: 80085
max_steps: 2000
# dataset
datasets:
    - path: BEE-spoke-data/knowledge-inoc-concat-v1
      name: smorgasbord-tb-quality
      type: completion
      field: text
val_set_size: 0.01

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Mistral-6B-v0.3-v0.1-ii
hub_model_id: pszemraj/Mistral-v0.3-6B-ii
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 2e-5
warmup_ratio: 0.1

load_in_8bit: false
load_in_4bit: false
bfloat16: true
tf32: true

flash_attention: true
torch_compile: true
torch_compile_backend: inductor
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 1
output_dir: /workspace/output-axolotl/output-model-6b
logging_steps: 6

deepspeed:

special_tokens:
```
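Several of the training hyperparameters listed under Training procedure follow directly from this config. The sketch below reproduces that arithmetic; the constants are copied from the config, except `NUM_GPUS = 1`, which is an assumption (it is consistent with the reported total_train_batch_size of 16). The helper `training_budget` is hypothetical, and because sample_packing fills each sequence with at most 4096 tokens, the token counts are upper bounds rather than exact figures.

```python
# Constants copied from the axolotl config, except NUM_GPUS (an assumption).
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
NUM_GPUS = 1  # assumption: single-GPU run, consistent with total_train_batch_size: 16
SEQUENCE_LEN = 4096
MAX_STEPS = 2000
WARMUP_RATIO = 0.1


def training_budget(micro_batch_size, grad_accum, num_gpus, seq_len, max_steps, warmup_ratio):
    """Hypothetical helper: derive per-step and total budgets from config values."""
    # Sequences that contribute to one optimizer step (the total train batch size).
    seqs_per_step = micro_batch_size * grad_accum * num_gpus
    # With sample packing, each packed sequence holds at most seq_len tokens,
    # so these token counts are upper bounds.
    max_tokens_per_step = seqs_per_step * seq_len
    max_total_tokens = max_tokens_per_step * max_steps
    # Warmup length implied by warmup_ratio over the max_steps budget.
    warmup_steps = round(warmup_ratio * max_steps)
    return seqs_per_step, max_tokens_per_step, max_total_tokens, warmup_steps


seqs, tok_step, tok_total, warmup = training_budget(
    MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS, SEQUENCE_LEN, MAX_STEPS, WARMUP_RATIO
)
print(f"sequences/step={seqs}, max tokens/step={tok_step:,}, "
      f"max total tokens={tok_total:,}, warmup steps={warmup}")
```

Under these assumptions the run sees at most about 131M tokens, which fits the description of this as a brief healing phase rather than full retraining.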

Quick eval

Quick eval for: pszemraj/Mistral-v0.3-6B-ii

lm-evaluation-harness settings: hf (pretrained=pszemraj/Mistral-v0.3-6B-ii,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 2

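| Tasks | Version | Filter | Metric | Value | | Stderr |
|---|---:|---|---|---:|---|---:|
| arc_easy | 1 | none | acc | 0.7109 | ± | 0.0093 |
| | | none | acc_norm | 0.6654 | ± | 0.0097 |
| boolq | 2 | none | acc | 0.7930 | ± | 0.0071 |
| lambada_openai | 1 | none | perplexity | 4.9892 | ± | 0.1269 |
| | | none | acc | 0.6746 | ± | 0.0065 |
| openbookqa | 1 | none | acc | 0.2460 | ± | 0.0193 |
| | | none | acc_norm | 0.3700 | ± | 0.0216 |
| piqa | 1 | none | acc | 0.7350 | ± | 0.0103 |
| | | none | acc_norm | 0.7350 | ± | 0.0103 |
| winogrande | 1 | none | acc | 0.6930 | ± | 0.0130 |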

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged_adamw_32bit (AdamW) with betas=(0.9,0.999), epsilon=1e-08 and weight_decay=0.1
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 2000
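With lr_scheduler_type: cosine and 200 warmup steps, the learning rate rises linearly from 0 to 2e-05 over the first 200 steps and then decays along a half cosine toward 0 at step 2000. The sketch below assumes the schedule matches the standard linear-warmup cosine schedule (as in transformers' `get_cosine_schedule_with_warmup` with its default of half a cosine cycle); the function name `cosine_lr_with_warmup` is illustrative, not part of axolotl.

```python
import math

BASE_LR = 2e-5      # learning_rate
WARMUP_STEPS = 200  # lr_scheduler_warmup_steps
TOTAL_STEPS = 2000  # training_steps


def cosine_lr_with_warmup(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        # Warmup phase: learning rate grows linearly with the step index.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: the factor goes from 1 to 0 as progress goes from 0 to 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 100, 200, 1100, 2000):
    print(f"step {s:>4}: lr = {cosine_lr_with_warmup(s):.2e}")
```

Under this assumption the learning rate is back to half its peak (1e-05) at step 1100, midway through the decay phase.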

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---:|---:|---:|
| No log | 0.0002 | 1 | 1.5980 |
| 1.578 | 0.0955 | 400 | 1.4028 |
| 1.5828 | 0.1911 | 800 | 1.3809 |
| 1.4355 | 0.2866 | 1200 | 1.3152 |
| 1.4618 | 0.3822 | 1600 | 1.2877 |
| 1.4551 | 0.4777 | 2000 | 1.2860 |
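Assuming the validation loss is the mean per-token cross-entropy in nats (the usual causal-LM loss in transformers), it converts to perplexity via exp(loss). The sketch below applies that conversion to the table; note this is perplexity on the held-out 1% split of the training data, so it is not directly comparable to the lambada_openai perplexity in the quick eval.

```python
import math

# Validation losses copied from the Training results table (step -> loss).
VAL_LOSS = {1: 1.5980, 400: 1.4028, 800: 1.3809, 1200: 1.3152, 1600: 1.2877, 2000: 1.2860}


def perplexity(loss_nats):
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(loss_nats)


for step, loss in VAL_LOSS.items():
    print(f"step {step:>4}: val loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, validation perplexity falls from about 4.94 at step 1 to about 3.62 at step 2000, with most of the drop in the first 1200 steps.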

Framework versions

- Transformers 4.40.2
- Pytorch 2.3.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1

Open LLM Leaderboard (v1) Evaluation Results

Detailed results are available on the Open LLM Leaderboard.

| Metric | Value |
|---|---:|
| Avg. | 49.23 |
| AI2 Reasoning Challenge (25-Shot) | 45.14 |
| HellaSwag (10-Shot) | 71.65 |
| MMLU (5-Shot) | 51.83 |
| TruthfulQA (0-shot) | 45.64 |
| Winogrande (5-shot) | 72.77 |
| GSM8k (5-shot) | 8.34 |

Open LLM Leaderboard v2 Evaluation Results

Detailed results are available on the Open LLM Leaderboard.

| Metric | Value |
|---|---:|
| Avg. | 10.03 |
| IFEval (0-Shot) | 24.54 |
| BBH (3-Shot) | 13.52 |
| MATH Lvl 5 (4-Shot) | 0.83 |
| GPQA (0-shot) | 2.01 |
| MuSR (0-shot) | 6.61 |
| MMLU-PRO (5-shot) | 12.70 |
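The Avg. rows in the two leaderboard tables appear to be unweighted means of their six benchmark scores. A quick check, assuming exactly that (the dictionary keys are just labels, and `leaderboard_average` is an illustrative helper):

```python
# Scores copied from the two leaderboard tables.
V1_SCORES = {
    "ARC (25-shot)": 45.14,
    "HellaSwag (10-shot)": 71.65,
    "MMLU (5-shot)": 51.83,
    "TruthfulQA (0-shot)": 45.64,
    "Winogrande (5-shot)": 72.77,
    "GSM8k (5-shot)": 8.34,
}
V2_SCORES = {
    "IFEval (0-shot)": 24.54,
    "BBH (3-shot)": 13.52,
    "MATH Lvl 5 (4-shot)": 0.83,
    "GPQA (0-shot)": 2.01,
    "MuSR (0-shot)": 6.61,
    "MMLU-PRO (5-shot)": 12.70,
}


def leaderboard_average(scores):
    """Unweighted mean over benchmarks (assumed to be how the Avg. rows are computed)."""
    return sum(scores.values()) / len(scores)


print(f"v1 average: {leaderboard_average(V1_SCORES):.3f}")
print(f"v2 average: {leaderboard_average(V2_SCORES):.3f}")
```

The recomputed means (about 49.228 and 10.035) agree with the reported 49.23 and 10.03 to within the rounding of the displayed per-benchmark scores.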