---
language:
- en
license: apache-2.0
tags:
- axolotl
- generated_from_trainer
base_model: pszemraj/Mistral-7B-v0.3-prune6
datasets:
- BEE-spoke-data/knowledge-inoc-concat-v1
model-index:
- name: Mistral-v0.3-6B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 45.14
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 71.65
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 51.83
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 45.64
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.77
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 8.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 24.54
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 13.52
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.83
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.01
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 12.7
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Mistral-v0.3-6B
      name: Open LLM Leaderboard
---


# Mistral-v0.3-6B

A brief round of continued pretraining at a context length of 4096 tokens to "heal" the layer-pruned base model.
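
For a rough sense of what the layer pruning removed, the sketch below counts parameters for a Mistral-7B-v0.3-style decoder before and after dropping whole decoder layers. It assumes the public Mistral-7B-v0.3 shape (32 layers, hidden size 4096, MLP size 14336, 32 query heads and 8 KV heads with head dim 128, vocabulary 32768, untied embeddings) and assumes that "prune6" means six decoder layers were removed; neither is stated in this card. Mistral has no bias terms, so only weight matrices and RMSNorm vectors are counted.

```python
# Rough parameter count for a Mistral-7B-v0.3-style decoder before and after
# removing whole decoder layers. Shapes are assumed from the public
# Mistral-7B-v0.3 config; "6 layers removed" is an assumption from the "prune6" name.

def layer_params(hidden: int, intermediate: int, n_heads: int, n_kv_heads: int) -> int:
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * hidden * intermediate                    # gate_proj, up_proj, down_proj
    norms = 2 * hidden                                 # input and post-attention RMSNorm
    return attn + mlp + norms


def model_params(n_layers: int, hidden: int = 4096, intermediate: int = 14336,
                 n_heads: int = 32, n_kv_heads: int = 8, vocab: int = 32768) -> int:
    embed_and_head = 2 * vocab * hidden  # untied input embedding and lm_head
    final_norm = hidden                  # final RMSNorm before lm_head
    per_layer = layer_params(hidden, intermediate, n_heads, n_kv_heads)
    return n_layers * per_layer + embed_and_head + final_norm


full = model_params(32)
pruned = model_params(32 - 6)
print(f"full:   {full / 1e9:.2f}B")
print(f"pruned: {pruned / 1e9:.2f}B")
```

Under these assumptions the count drops from about 7.25B to about 5.94B parameters, which is consistent with the "6B" in the model name.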

## Model description

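This model is [pszemraj/Mistral-7B-v0.3-prune6](https://huggingface.co/pszemraj/Mistral-7B-v0.3-prune6) with continued pretraining on the `smorgasbord-tb-quality` config of [BEE-spoke-data/knowledge-inoc-concat-v1](https://huggingface.co/datasets/BEE-spoke-data/knowledge-inoc-concat-v1).
It achieves the following results on the evaluation set (a 1% held-out split, `val_set_size: 0.01`):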
- Loss: 1.2860

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: pszemraj/Mistral-7B-v0.3-prune6
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

strict: false
seed: 80085
max_steps: 2000
# dataset
datasets:
    - path: BEE-spoke-data/knowledge-inoc-concat-v1
      name: smorgasbord-tb-quality
      type: completion 
      field: text 
val_set_size: 0.01

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Mistral-6B-v0.3-v0.1-ii
hub_model_id: pszemraj/Mistral-v0.3-6B-ii
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 2e-5
warmup_ratio: 0.1

load_in_8bit: false
load_in_4bit: false
bfloat16: true
tf32: true

flash_attention: true
torch_compile: true 
torch_compile_backend: inductor 
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 1
output_dir: /workspace/output-axolotl/output-model-6b
logging_steps: 6

deepspeed:

special_tokens:

```

</details><br>
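
The config implies an upper bound on how much text the healing run saw. A minimal arithmetic sketch, assuming a single GPU (consistent with `total_train_batch_size: 16` in the hyperparameters below) and assuming sample packing fills every 4096-token sequence completely, which real packing only approximates:

```python
# Upper-bound token budget for the healing run, from the axolotl config above.
# Assumes 1 GPU and perfectly full packed sequences (real packing leaves some slack).
micro_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 1  # assumption: matches total_train_batch_size = 16
sequence_len = 4096
max_steps = 2000

sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = sequences_per_step * sequence_len
total_tokens = tokens_per_step * max_steps

print(f"sequences/step: {sequences_per_step}")       # 16
print(f"tokens/step:    {tokens_per_step:,}")        # 65,536
print(f"total tokens:   {total_tokens / 1e6:.1f}M")  # 131.1M
```

So the run processed at most roughly 131M tokens, a small budget compared with the original pretraining of the base model.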

## Quick eval

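Zero-shot results from EleutherAI's lm-evaluation-harness for `pszemraj/Mistral-v0.3-6B-ii` (the `hub_model_id` in the config above), run with `hf (pretrained=pszemraj/Mistral-v0.3-6B-ii,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 2`. The perplexity stderr was estimated by bootstrapping.
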
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_easy      |      1|none  |     0|acc       |0.7109|±  |0.0093|
|              |       |none  |     0|acc_norm  |0.6654|±  |0.0097|
|boolq         |      2|none  |     0|acc       |0.7930|±  |0.0071|
|lambada_openai|      1|none  |     0|perplexity|4.9892|±  |0.1269|
|              |       |none  |     0|acc       |0.6746|±  |0.0065|
|openbookqa    |      1|none  |     0|acc       |0.2460|±  |0.0193|
|              |       |none  |     0|acc_norm  |0.3700|±  |0.0216|
|piqa          |      1|none  |     0|acc       |0.7350|±  |0.0103|
|              |       |none  |     0|acc_norm  |0.7350|±  |0.0103|
|winogrande    |      1|none  |     0|acc       |0.6930|±  |0.0130|
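
The Stderr column for the accuracy metrics can be reproduced from the accuracy and the number of evaluated examples. A sketch, assuming the harness reports the standard error of the mean over per-example 0/1 scores (sample standard deviation divided by sqrt(n)), and assuming the usual split sizes (ARC-Easy test: 2376, BoolQ validation: 3270, LAMBADA OpenAI test: 5153, OpenBookQA test: 500, PIQA validation: 1838, Winogrande validation: 1267), which are not listed in this card:

```python
import math


def acc_stderr(p: float, n: int) -> float:
    # Standard error of the mean for 0/1 scores: sample std / sqrt(n) = sqrt(p(1-p)/(n-1))
    return math.sqrt(p * (1 - p) / (n - 1))


# (accuracy from the table above, assumed number of evaluated examples)
results = {
    "arc_easy": (0.7109, 2376),
    "boolq": (0.7930, 3270),
    "lambada_openai": (0.6746, 5153),
    "openbookqa": (0.2460, 500),
    "piqa": (0.7350, 1838),
    "winogrande": (0.6930, 1267),
}

for task, (p, n) in results.items():
    se = acc_stderr(p, n)
    lo, hi = p - 1.96 * se, p + 1.96 * se  # 95% normal-approximation interval
    print(f"{task:<15} acc={p:.4f} stderr={se:.4f} 95% CI=[{lo:.4f}, {hi:.4f}]")
```

Each computed stderr agrees with the table to four decimals. The intervals also make the noise concrete: the OpenBookQA accuracy, for example, is only pinned down to within roughly ±0.04.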



## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged AdamW, 32-bit (`paged_adamw_32bit`), with betas=(0.9,0.999), epsilon=1e-08, and weight_decay=0.1
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 2000
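
`lr_scheduler_warmup_steps: 200` follows from `warmup_ratio: 0.1` times 2000 steps. The sketch below reproduces the resulting learning-rate curve, assuming the linear-warmup-then-cosine-decay-to-zero formula of Hugging Face Transformers' `get_cosine_schedule_with_warmup` with its default half cosine cycle; this config sets no extra scheduler options such as a minimum learning rate.

```python
import math


def lr_at(step: int, peak_lr: float = 2e-5, warmup_steps: int = 200, total_steps: int = 2000) -> float:
    # Linear warmup from 0 to peak_lr
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Then half a cosine cycle from peak_lr down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup_steps = int(0.1 * 2000)  # warmup_ratio * max_steps
for step in (0, 100, 200, 1100, 2000):
    print(step, f"{lr_at(step, warmup_steps=warmup_steps):.2e}")
```

The peak of 2e-5 is reached at step 200, the rate is halved at the midpoint of the decay (step 1100), and it is back to zero at step 2000, where the final checkpoint was taken.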

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 1.5980          |
| 1.578         | 0.0955 | 400  | 1.4028          |
| 1.5828        | 0.1911 | 800  | 1.3809          |
| 1.4355        | 0.2866 | 1200 | 1.3152          |
| 1.4618        | 0.3822 | 1600 | 1.2877          |
| 1.4551        | 0.4777 | 2000 | 1.2860          |
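
Two quantities are easy to read off this table: validation perplexity, which is exp(loss) when the loss is mean per-token cross-entropy in nats, and how much of the dataset the 2000 steps covered. A small sketch, assuming the logged loss is the standard mean token-level cross-entropy used for causal LM training in Transformers:

```python
import math

# (step, epoch, validation loss) copied from the training results table
history = [
    (1, 0.0002, 1.5980),
    (400, 0.0955, 1.4028),
    (800, 0.1911, 1.3809),
    (1200, 0.2866, 1.3152),
    (1600, 0.3822, 1.2877),
    (2000, 0.4777, 1.2860),
]

for step, epoch, val_loss in history:
    # perplexity = exp(mean per-token cross-entropy in nats)
    print(f"step {step:>4}: val loss {val_loss:.4f} -> perplexity {math.exp(val_loss):.2f}")

# Steps in one full epoch, implied by the last row
final_step, final_epoch, _ = history[-1]
print(f"implied steps per epoch: {final_step / final_epoch:.0f}")
```

Validation perplexity falls from about 4.94 to about 3.62, with most of the gain before step 1600. The implied epoch length of about 4187 steps shows the run stopped at roughly 48% of one epoch, because `max_steps: 2000` takes precedence over `num_epochs: 1`.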


### Framework versions

- Transformers 4.40.2
- Pytorch 2.3.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1

# [Open LLM Leaderboard (v1) Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__Mistral-v0.3-6B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |49.23|
|AI2 Reasoning Challenge (25-Shot)|45.14|
|HellaSwag (10-Shot)              |71.65|
|MMLU (5-Shot)                    |51.83|
|TruthfulQA (0-shot)              |45.64|
|Winogrande (5-shot)              |72.77|
|GSM8k (5-shot)                   | 8.34|
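
The Avg. row is the unweighted mean of the six benchmark scores, which a quick check confirms:

```python
# Open LLM Leaderboard (v1) scores from the table above
v1_scores = {
    "ARC (25-shot)": 45.14,
    "HellaSwag (10-shot)": 71.65,
    "MMLU (5-shot)": 51.83,
    "TruthfulQA (0-shot)": 45.64,
    "Winogrande (5-shot)": 72.77,
    "GSM8k (5-shot)": 8.34,
}

# Unweighted mean across benchmarks
average = sum(v1_scores.values()) / len(v1_scores)
print(f"{average:.2f}")  # 49.23
```

Because every benchmark gets equal weight, the low GSM8k score pulls the average down noticeably.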


# [Open LLM Leaderboard (v2) Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__Mistral-v0.3-6B)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |10.03|
|IFEval (0-Shot)    |24.54|
|BBH (3-Shot)       |13.52|
|MATH Lvl 5 (4-Shot)| 0.83|
|GPQA (0-shot)      | 2.01|
|MuSR (0-shot)      | 6.61|
|MMLU-PRO (5-shot)  |12.70|
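
The v2 leaderboard reports normalized scores, rescaled so that random guessing maps to 0 and a perfect score to 100, which is why GPQA shows 2.01 rather than an accuracy near 25%. A sketch of that rescaling and its inverse, assuming a single-task benchmark with a known random baseline (GPQA is 4-way multiple choice, so a baseline of 0.25; MATH and IFEval have a baseline of 0, so their scores are raw percentages). The leaderboard normalizes multi-task benchmarks such as BBH and MuSR per subtask, which this sketch does not reproduce.

```python
# Leaderboard-v2-style normalization: the random baseline maps to 0, a perfect score to 100.
def normalize(raw: float, baseline: float) -> float:
    if raw <= baseline:
        return 0.0  # assumption: below-chance results are clipped to 0
    return 100.0 * (raw - baseline) / (1.0 - baseline)


def denormalize(score: float, baseline: float) -> float:
    # Inverse of normalize() for scores above 0
    return baseline + (score / 100.0) * (1.0 - baseline)


# GPQA is 4-way multiple choice, so the assumed random baseline is 0.25.
gpqa_raw = denormalize(2.01, baseline=0.25)
print(f"GPQA raw acc_norm ~ {gpqa_raw:.4f}")

# The Avg. row: unweighted mean of the six normalized scores in the table above.
v2_scores = [24.54, 13.52, 0.83, 2.01, 6.61, 12.70]
v2_average = sum(v2_scores) / len(v2_scores)
print(f"average ~ {v2_average:.3f}")
```

Under these assumptions the GPQA entry corresponds to a raw normalized accuracy of about 26.5%, only slightly above chance, and the mean of the six rounded scores matches the reported 10.03 up to rounding.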