---
license: apache-2.0
tags:
- axolotl
- dpo
- trl
- generated_from_trainer
base_model: mistralai/Mistral-Nemo-Instruct-2407
model-index:
- name: Humanish-Mistral-Nemo-Instruct-2407
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 54.51
name: strict accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 32.71
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 7.63
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 5.03
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 9.4
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 28.01
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
name: Open LLM Leaderboard
---
[](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config
axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-Nemo-Instruct-2407
model_type: MistralForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
chat_template: inst
rl: dpo
datasets:
- path: HumanLLMs/humanish-dpo-project
type: chatml.prompt_pairs
conversation: mistral
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./humanish-mistral-nemo-instruct-2407
sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
hub_model_id: HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
pad_token:
save_safetensors: true
```
# Humanish-Mistral-Nemo-Instruct-2407
This model is a fine-tuned version of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on an unknown dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 341
### Training results
### Framework versions
- PEFT 0.13.0
- Transformers 4.45.1
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.0
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HumanLLMs__Humanish-Mistral-Nemo-Instruct-2407)
| Metric |Value|
|-------------------|----:|
|Avg. |22.88|
|IFEval (0-Shot) |54.51|
|BBH (3-Shot) |32.71|
|MATH Lvl 5 (4-Shot)| 7.63|
|GPQA (0-shot) | 5.03|
|MuSR (0-shot) | 9.40|
|MMLU-PRO (5-shot) |28.01|