BigWeave Viable Collection

All viable models from the BigWeave series, excluding quantized versions.
The BigWeave models aim to experimentally identify merge settings for increasing model performance. The version number merely tracks various attempts and is not a quality indicator. Only results demonstrating good performance are retained and shared.
Prompt format: Mistral, Vicuna and Alpaca.

This is a merge of 152334H/miqu-1-70b-sf and lizpreciatior/lzlv_70b_fp16_hf. Using exl2 measurements, we identify the least important layers of lzlv. These layers are then extended with the layers between them to form longer runs of consecutive layers, and the resulting slices are inserted into miqu.
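The layer-selection step can be illustrated with a small sketch: given per-layer importance scores (e.g. derived from exl2 quantization measurements), pick the k least important layers and collapse them into runs of consecutive indices. The scores and helper name below are illustrative, not the actual measurements used for this merge.

```python
def least_important_runs(scores, k):
    """Pick the k layers with the lowest importance scores and
    merge them into half-open runs of consecutive layer indices."""
    picked = sorted(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
    runs, start = [], picked[0]
    for prev, cur in zip(picked, picked[1:]):
        if cur != prev + 1:          # gap between selected layers
            runs.append((start, prev + 1))
            start = cur
    runs.append((start, picked[-1] + 1))
    return runs

# Illustrative scores for an 8-layer model (lower = less important)
scores = [0.9, 0.2, 0.1, 0.3, 0.8, 0.7, 0.15, 0.6]
print(least_important_runs(scores, 4))  # layers 1,2,3,6 -> [(1, 4), (6, 7)]
```

In a real merge the gaps between such runs would additionally be filled with the in-between layers to produce the longer consecutive slices described above.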
Merge configuration:
```yaml
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 1]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [1, 26]
  - sources:
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [9, 44]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [27, 52]
  - sources:
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [45, 60]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [53, 79]
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [79, 80]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [79, 80]
        parameters:
          weight: 0
merge_method: linear
parameters:
  weight: 1.0
dtype: float16
tokenizer_source: 152334H/miqu-1-70b-sf
```
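As a sanity check, the slice layer_ranges above can be summed to get the depth of the merged model; the ranges are half-open, as in mergekit. A minimal sketch:

```python
# Half-open [start, end) layer ranges taken from the merge configuration
slices = [
    (0, 1),    # miqu + lzlv (lzlv weight 0)
    (1, 26),   # miqu
    (9, 44),   # lzlv
    (27, 52),  # miqu
    (45, 60),  # lzlv
    (53, 79),  # miqu
    (79, 80),  # miqu + lzlv (lzlv weight 0)
]
total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 128 layers in the merged model
```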
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 68.03 |
| AI2 Reasoning Challenge (25-Shot) | 68.17 |
| HellaSwag (10-Shot) | 88.54 |
| MMLU (5-Shot) | 70.51 |
| TruthfulQA (0-shot) | 62.47 |
| Winogrande (5-shot) | 82.08 |
| GSM8k (5-shot) | 36.39 |
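The Avg. row is the arithmetic mean of the six benchmark scores, which can be checked in a couple of lines:

```python
# Benchmark scores from the table above
scores = [68.17, 88.54, 70.51, 62.47, 82.08, 36.39]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 68.03
```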