Mistral-7B-OpenOrca-lora-merged

This is a test.

This is a regenerated model that combines the base model Mistral-7B-v0.1 with the LoRA model Mistral-7B-OpenOrca-lora.
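The card does not show how the merge was performed. With the PEFT library, a merge of this kind is usually done by loading the adapter on top of the base model and calling `merge_and_unload()`. The per-layer arithmetic behind such a merge can be sketched as below, assuming the standard LoRA update W = W0 + (alpha / r) * B * A. The helper names `matmul` and `merge_lora` and the toy matrices are illustrative only and are not taken from this repository.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w0, lora_a, lora_b, alpha):
    """Fold a LoRA update into a base weight matrix.

    w0:     base weight, shape (out, in)
    lora_a: LoRA A matrix, shape (r, in)
    lora_b: LoRA B matrix, shape (out, r)
    alpha:  LoRA scaling hyperparameter
    """
    r = len(lora_a)              # rank = number of rows of A
    scale = alpha / r            # standard LoRA scaling factor
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Toy example: 2x2 identity base weight with a rank-1 adapter.
w0 = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # shape (1, 2)
lora_b = [[1.0], [0.0]]      # shape (2, 1)
merged = merge_lora(w0, lora_a, lora_b, alpha=2.0)
print(merged)
```

After merging, the adapter no longer exists as a separate module: the model has the same shape and inference cost as the base model, which is why the merged checkpoint can be evaluated like any ordinary model.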

This LoRA model was extracted from the parameter-efficient fine-tuned model Mistral-7B-OpenOrca. It still needs to be verified whether the LoRA model achieves performance comparable to the original model.

The final goal is to create a toolkit that can simultaneously load multiple LoRA modules, and automatically switch to the appropriate combination of LoRA modules based on user queries to generate the best answer.

The source code is here

Mistral-7B-OpenOrca

Local Test

| Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | GSM8K_acc (8-shot) | Open LLM Score |
|---|---|---|---|---|---|---|
| Mistral-7B-OpenOrca | 71 | 83 | 61.42 | 45 | 40 | 65.11 |
| r=256 | 68 | 84 | 64.28 | 46.953 | 41 | 65.81 |
| r=64 | 67 | 84 | 64.26 | 47.32 | 41 | 65.65 |
| r=16 | 65 | 83 | 62.84 | 46.95 | 38 | 64.45 |

Open LLM Leaderboard

| Model | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | Open LLM Score |
|---|---|---|---|---|---|
| Mistral-7B-SlimOrca | 62.54 | 83.86 | 62.77 | 54.23 | 65.85 |
| Mistral-7B-OpenOrca | 64.08 | 83.99 | 62.24 | 53.05 | 65.84 |
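The Open LLM Score values above are consistent with a plain average of the ARC, HellaSwag, MMLU, and TruthfulQA results. GSM8K, which appears only in the local test, does not seem to be included. A minimal sketch that recomputes the score from the rows reported above follows. The function name `open_llm_score` and the `local/` and `leaderboard/` key prefixes are illustrative assumptions.

```python
from statistics import mean

# (ARC, HellaSwag, MMLU, TruthfulQA, reported Open LLM Score),
# copied from the two tables above.
ROWS = {
    "local/Mistral-7B-OpenOrca": (71, 83, 61.42, 45, 65.11),
    "local/r=256": (68, 84, 64.28, 46.953, 65.81),
    "local/r=64": (67, 84, 64.26, 47.32, 65.65),
    "local/r=16": (65, 83, 62.84, 46.95, 64.45),
    "leaderboard/Mistral-7B-SlimOrca": (62.54, 83.86, 62.77, 54.23, 65.85),
    "leaderboard/Mistral-7B-OpenOrca": (64.08, 83.99, 62.24, 53.05, 65.84),
}


def open_llm_score(arc, hellaswag, mmlu, truthfulqa):
    """Unweighted mean of the four benchmark scores."""
    return mean([arc, hellaswag, mmlu, truthfulqa])


for name, (*scores, reported) in ROWS.items():
    computed = open_llm_score(*scores)
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Each computed value agrees with the reported score to within rounding, so the r=256 and r=64 rows can be compared directly with the original model on this average.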

lm-evaluation-harness

Open LLM Leaderboard

| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
|---|---|---|---|
| ARC | 64.08 | | |
| HellaSwag | 83.99 | | |
| MMLU | 62.24 | | |
| TruthfulQA | 53.05 | | |
| Average | 65.84 | | |

HumanEval

| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
|---|---|---|---|
| humaneval-python | 35.976 | | |

Training procedure

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
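The settings above correspond to constructor arguments of `BitsAndBytesConfig` in the transformers library. The sketch below shows one way they could be reproduced in code. The names `BNB_KWARGS` and `build_bnb_config` are illustrative and do not come from the training script, which is not shown in this card.

```python
# Values copied from the list above. quant_method is left out because it is
# implied by the config class rather than passed as an argument.
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config():
    """Build a transformers BitsAndBytesConfig from BNB_KWARGS.

    The imports are local so the dict above can be inspected on a machine
    without torch or transformers installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(BNB_KWARGS)
    # Turn the dtype name into the corresponding torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained` when loading the base model for 4-bit training.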

Framework versions

  • PEFT 0.5.0