
# Technoculture/MedMerge-6-7b-alpha-dpo

## Open LLM Leaderboard

(Open LLM Leaderboard results plot)

| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Orca-2-7b | 78.4 | 76.1 | 53.7 | 52.4 | 74.2 | 47.2 |
| LLAMA-2-7b | 43.2 | 77.1 | 44.4 | 38.7 | 69.5 | 16 |
| MT7Bi-sft | 54.1 | 75.11 | - | 43.08 | 72.14 | 15.54 |
| MedMerge-6-7b | 29.52 | 41.04 | - | 37.53 | 59.35 | 0.91 |
| MedMerge-6-7b-alpha-dpo | 54.27 | 75.6 | 52.65 | 43.94 | 71.03 | 26.16 |

## Training Details

  • GPU: Nvidia A100 Tensor Core GPU
  • Total Batches: 4266
  • Epochs: 3
  • Duration: 3 hours and 57 minutes
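DPO fine-tunes the policy model on preference pairs against a frozen reference model. A minimal sketch of the standard per-example DPO loss follows; the `beta` value of 0.1 is illustrative, not a hyperparameter reported for this run:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin), where the margin
    is how much more the policy prefers the chosen response over the
    rejected one, relative to the frozen reference model."""
    margin = (policy_chosen_logp - policy_rejected_logp) \
           - (ref_chosen_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference margin the loss is ln(2); raising the policy's
# log-probability of the chosen response lowers the loss.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # ≈ 0.6931
```

In practice this objective is applied over sequence log-probabilities by a trainer such as TRL's `DPOTrainer`; the function above only shows the loss itself.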

## DPO Training Dataset Mixture

| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|---|---|---|---|
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k |
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k |
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k |
| **Total** | | | 11.38k |
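The mixture is built by subsampling each source dataset at its listed ratio and concatenating the results. A small sketch of that arithmetic, using the rounded row counts from the table (the slight gap from the reported 11.38k total comes from that rounding):

```python
# Dataset names and ratios are taken from the table above;
# row counts are the table's rounded figures expanded to integers.
mixture = {
    "argilla/distilabel-math-preference-dpo":       (2400, 1.0),
    "argilla/distilabel-intel-orca-dpo-pairs":      (12900, 0.5),
    "jondurbin/truthy-dpo-v0.1":                    (1040, 1.0),
    "argilla/distilabel-capybara-dpo-7k-binarized": (7500, 0.2),
}

# Rows kept from each dataset after applying its sampling ratio.
sampled = {name: int(rows * ratio) for name, (rows, ratio) in mixture.items()}
total = sum(sampled.values())
print(f"total ≈ {total / 1000:.2f}k")  # ≈ 11.39k with these rounded inputs
```

With the `datasets` library, the same mixture could be assembled by `shuffle().select(range(n))` on each split before `concatenate_datasets`; the snippet above only verifies the row accounting.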

## Training Loss Plot

(training loss plot)

## Smoothed Training Loss Plot

(smoothed training loss plot)

For full details of the DPO training, please see our notebook.

Open In Colab
