merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the della_linear merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
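
As a rough illustration of what a della_linear-style merge does, the sketch below works on toy Python lists instead of real tensors. It assumes the usual reading of the three parameters: each model's difference from the base (its delta) is pruned stochastically, with keep probabilities spread over density ± epsilon by magnitude rank; the surviving deltas are combined as a weighted sum; and the result is scaled by lambda before being added back to the base. The names `magprune` and `della_linear_sketch` are hypothetical, and mergekit's actual implementation may differ in details such as how ranking is applied within a tensor.

```python
import random


def magprune(delta, density, epsilon, rng):
    """Randomly drop entries of a delta vector, favoring large magnitudes.

    Assumption: keep probabilities run linearly from (density - epsilon) for
    the smallest-magnitude entry to (density + epsilon) for the largest.
    Survivors are rescaled by 1 / keep_prob so each entry keeps its expected value.
    """
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # smallest magnitude first
    pruned = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        keep_prob = (density - epsilon) + 2 * epsilon * frac
        if rng.random() < keep_prob:
            pruned[i] = delta[i] / keep_prob
    return pruned


def della_linear_sketch(base, models, epsilon, lam, normalize=True, seed=0):
    """Toy merge: prune each delta, take a weighted sum, scale by lam, add to base."""
    rng = random.Random(seed)
    merged_delta = [0.0] * len(base)
    total_weight = 0.0
    for m in models:
        delta = [t - b for t, b in zip(m["tensor"], base)]
        pruned = magprune(delta, m["density"], epsilon, rng)
        for i, d in enumerate(pruned):
            merged_delta[i] += m["weight"] * d
        total_weight += m["weight"]
    if normalize and total_weight != 0:
        merged_delta = [d / total_weight for d in merged_delta]
    return [b + lam * d for b, d in zip(base, merged_delta)]


if __name__ == "__main__":
    # Toy values only; the tensors are made up, while epsilon and lam mirror this card.
    base = [0.0, 0.0, 0.0, 0.0]
    models = [
        {"tensor": [0.4, -0.1, 0.0, 0.2], "weight": 0.22, "density": 0.68},
        {"tensor": [0.1, 0.3, -0.2, 0.0], "weight": 0.19, "density": 0.75},
    ]
    print(della_linear_sketch(base, models, epsilon=0.01, lam=1.5))
```

With density set to 1.0 and epsilon set to 0.0, nothing is dropped and the sketch reduces to a plain weighted average of the deltas, which is a convenient way to sanity-check it.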

Models Merged

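The following models were included in the merge:

- CultriX/Qwenfinity-2.5-14B
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwen2.5-14B-Broca
- qingy2019/Qwen2.5-Math-14B-Instruct
- CultriX/SeQwence-14Bv1
- sometimesanotion/Qwen2.5-14B-Vimarckoso
- allknowingroger/QwenSlerp6-14B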

Configuration

The following YAML configuration was used to produce this model:

merge_method: della_linear
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16
parameters:
  epsilon: 0.01  # Reduced from 0.012 for even finer parameter scaling, enhancing precision in blending.
  lambda: 1.5    # Increased from 1.4 to further emphasize significant model contributions, particularly from specialized models.
  normalize: true # Maintains balanced parameter integration, crucial for stability across diverse benchmarks.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.65         # Increased from 1.6 to further boost logical reasoning, leveraging Qwen2.5-14B-Broca's strength.
    tinyHellaswag: 1.55   # Slightly increased from 1.5 to enhance contextual understanding, supported by SeQwence-14Bv1.
    tinyMMLU: 1.7         # Increased from 1.65 for improved domain knowledge, utilizing Qwenfinity-2.5-14B's broad capabilities.
    tinyTruthfulQA: 1.95  # Slightly increased from 1.9 to maximize accurate factual reasoning, with Qwenfinity-2.5-14B's contribution.
    tinyTruthfulQA_mc1: 1.75 # Increased from 1.7 for enhanced multiple-choice reasoning, supported by Qwen2.5-14B-Emergedv3.
    tinyWinogrande: 1.8   # Increased from 1.75 for advanced reasoning and contextual prediction, leveraging Qwen2.5-14B-Broca.
    IFEval: 2.0            # Increased from 1.9 to prioritize instruction-following, with Q2.5-Veltha-14B-0.5's strong performance.
    BBH: 1.75              # Slightly increased from 1.7 for complex reasoning, supported by SeQwence-14B-EvolMerge's strength.
    MATH: 2.2              # Increased from 2.1 to maximize mathematical reasoning, with Qwen2.5-Math-14B-Instruct's specialization.
    GPQA: 1.85             # Increased from 1.8 for enhanced graduate-level QA, leveraging Qwen2.5-14B-Wernicke's capabilities.
    MUSR: 1.95             # Increased from 1.9 for strengthened multi-step reasoning, with Qwen2.5-14B-Vimarckoso's expertise.
    MMLU-PRO: 1.85         # Increased from 1.8 to further boost domain multitask performance, utilizing QwenSlerp6-14B.
  smoothing_factor: 0.08 # Reduced from 0.1 for more precise blending, allowing distinct model strengths to be preserved.

gradient_clipping:
  CultriX/Qwen2.5-14B-Wernickev3: 0.85  # Slightly reduced from 0.86 to allow a bit more contribution from the base model.
  CultriX/Qwenfinity-2.5-14B: 0.82      # Reduced from 0.83 to balance its broad multitask contribution.
  djuna/Q2.5-Veltha-14B-0.5: 0.92       # Slightly increased from 0.91 to allow more contribution in advanced reasoning.
  CultriX/Qwen2.5-14B-Broca: 0.86       # Slightly increased from 0.85 to leverage its logical reasoning strengths.
  qingy2019/Qwen2.5-Math-14B-Instruct: 0.94 # Increased from 0.93 to maximize its mathematical reasoning contribution.
  CultriX/SeQwence-14Bv1: 0.87          # Slightly reduced from 0.88 to balance its generalist multitask support.
  sometimesanotion/Qwen2.5-14B-Vimarckoso: 0.90 # Increased from 0.89 for enhanced multi-step reasoning.
  allknowingroger/QwenSlerp6-14B: 0.86  # Slightly reduced from 0.87 to refine its contextual reasoning integration.

models:
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.25       # Slightly reduced from 0.26 to balance with other models while maintaining a strong foundation.
      density: 0.72      # Increased from 0.7 to preserve more of its critical reasoning parameters.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.22       # Slightly reduced from 0.23 for a more balanced contribution across its broad capabilities.
      density: 0.68      # Increased from 0.65 to retain more of its multitask performance.
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.20       # Reduced from 0.22 to balance its specialized contributions with the overall blend.
      density: 0.75      # Increased from 0.72 to further leverage its strengths in IFEval and advanced reasoning.
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.16       # Slightly increased from 0.15 to enhance its logical reasoning and factual QA contributions.
      density: 0.68      # Increased from 0.65 to preserve more of its capabilities in the tiny benchmarks.
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.19       # Slightly increased from 0.18 to further emphasize mathematical reasoning.
      density: 0.75      # Increased from 0.73 to retain more of its specialized mathematical parameters.
  - model: CultriX/SeQwence-14Bv1
    parameters:
      weight: 0.13       # Slightly reduced from 0.14 to fine-tune its generalist multitask support.
      density: 0.65      # Increased from 0.63 to preserve more of its diverse capabilities.
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso
    parameters:
      weight: 0.11       # Slightly reduced from 0.12 to balance its multi-step reasoning contributions.
      density: 0.62      # Increased from 0.6 to retain more of its specialized reasoning strengths.
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.09       # Slightly reduced from 0.1 to refine its contextual reasoning contributions.
      density: 0.65      # Increased from 0.62 to preserve more of its capabilities in MMLU-PRO and contextual tasks.
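
The per-model weights in this configuration add up to 1.35 rather than 1. Since `normalize: true` is set, one plausible reading is that each weight is divided by the total before the contributions are combined. The short sketch below computes those relative shares. The helper `normalized_shares` is hypothetical, and whether the base model's own entry counts toward the total is an implementation detail this card does not state, so the helper accepts an optional `exclude` argument for that case.

```python
# Weights copied from the `models:` section of the configuration above.
WEIGHTS = {
    "CultriX/Qwen2.5-14B-Wernickev3": 0.25,
    "CultriX/Qwenfinity-2.5-14B": 0.22,
    "djuna/Q2.5-Veltha-14B-0.5": 0.20,
    "CultriX/Qwen2.5-14B-Broca": 0.16,
    "qingy2019/Qwen2.5-Math-14B-Instruct": 0.19,
    "CultriX/SeQwence-14Bv1": 0.13,
    "sometimesanotion/Qwen2.5-14B-Vimarckoso": 0.11,
    "allknowingroger/QwenSlerp6-14B": 0.09,
}

BASE_MODEL = "CultriX/Qwen2.5-14B-Wernickev3"


def normalized_shares(weights, exclude=()):
    """Divide each weight by the sum of the weights that are not excluded."""
    kept = {name: w for name, w in weights.items() if name not in exclude}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


if __name__ == "__main__":
    print(f"raw total: {sum(WEIGHTS.values()):.2f}")
    for name, share in normalized_shares(WEIGHTS).items():
        print(f"{name}: {share:.3f}")
    # Alternative assumption: the base model's own entry is left out of the total.
    for name, share in normalized_shares(WEIGHTS, exclude=(BASE_MODEL,)).items():
        print(f"(base excluded) {name}: {share:.3f}")
```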

Format: Safetensors. Model size: 14.8B params. Tensor type: BF16.

Model repository: CultriX/Qwen2.5-14B-Brocav6