---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
tags:
- merge
- mergekit
---

Another experiment in the line of [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7).

Steps taken to produce this model:

* Train loyal-piano-m7
* Apply cDPO with HuggingFaceH4/ultrafeedback_binarized to produce loyal-piano-m7-cdpo
* Train another model on a different sampling of the same source datasets as loyal-piano; let's call it servile-harpsichord
* Apply cDPO to servile-harpsichord with allenai/ultrafeedback_binarized_cleaned, Intel/orca_dpo_pairs, and a helpfulness-only version of PKU-Alignment/PKU-SafeRLHF (a rough sketch of this step follows the list)
* TIES merge several checkpoints of servile-harpsichord-cdpo with loyal-piano-m7-cdpo
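
For the cDPO steps, here is a minimal sketch of what that stage can look like. It is *not* the exact training code used for this model: it assumes trl's `DPOTrainer` (cDPO is just DPO with `label_smoothing` > 0), made-up hyperparameters and a placeholder SFT checkpoint path, and the older trl calling convention (newer releases move these arguments onto `DPOConfig`). The `better_response_id` filtering shows one way to build a "helpfulness-only" preference set from PKU-SafeRLHF.

```python
# Illustrative only: hypothetical paths/hyperparameters, older trl-style API.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

SFT_MODEL = "path/to/servile-harpsichord"  # placeholder for the SFT checkpoint

def pku_helpfulness_only(example):
    # Keep only the helpfulness preference (better_response_id); ignore the
    # safety labels entirely.
    better = example["better_response_id"]
    return {
        "prompt": example["prompt"],
        "chosen": example[f"response_{better}"],
        "rejected": example[f"response_{1 - better}"],
    }

pku = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")
pku = pku.map(pku_helpfulness_only, remove_columns=pku.column_names)
# In practice this would be concatenated with ultrafeedback_binarized_cleaned
# and orca_dpo_pairs, mapped to the same prompt/chosen/rejected schema.

model = AutoModelForCausalLM.from_pretrained(SFT_MODEL)
ref_model = AutoModelForCausalLM.from_pretrained(SFT_MODEL)
tokenizer = AutoTokenizer.from_pretrained(SFT_MODEL)

trainer = DPOTrainer(
    model,
    ref_model,
    args=TrainingArguments(
        output_dir="servile-harpsichord-cdpo",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
        bf16=True,
        remove_unused_columns=False,
    ),
    beta=0.1,              # assumed
    label_smoothing=0.25,  # > 0 gives the noise-robust cDPO loss; value assumed
    train_dataset=pku,
    tokenizer=tokenizer,
)
trainer.train()
```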

Local benchmarks show the result to be better than any of the individual components. Let's see if that holds up!

Trained using the Alpaca prompt format.
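
For reference, the standard Alpaca template (no-input variant) looks like this; the with-input variant adds an `### Input:` block after the instruction:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```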

Configuration for final merge:

```yml
models:
  - model: chargoddard/loyal-piano-m7-cdpo
    parameters:
      density: 1.0
      weight: 1.0
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-4186
    parameters:
      weight: 0.1
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-5796
    parameters:
      weight: 0.2
  - model: /home/ubuntu/servile-harpsichord-cdpo/checkpoint-6118
    parameters:
      weight: 0.3
  - model: /home/ubuntu/servile-harpsichord-cdpo/final
    parameters:
      weight: 0.4
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
parameters:
  density: 0.4
  normalize: true
  int8_mask: true
```
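
To rerun the merge, a config like this can be passed directly to mergekit's CLI, e.g. `mergekit-yaml config.yml ./output-model-directory --cuda` (the local servile-harpsichord-cdpo checkpoint paths would need to be replaced with your own copies). Per-model `parameters` take precedence over the top-level `parameters` block, so loyal-piano-m7-cdpo stays fully dense (density 1.0) while the servile-harpsichord-cdpo checkpoints fall back to the global density of 0.4 and are blended with weights that increase toward the later checkpoints; `normalize: true` rescales the weights to sum to 1.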