Verah/mistral-japanese-stabalelm-merge

This is a linear model merge of:

60% https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

40% https://huggingface.co/stabilityai/japanese-stablelm-instruct-gamma-7b

I recommend following the Mistral chat template and prompting in English.

Evaluation

Tested on correct en-jp translation identification on the first 10k rows of https://huggingface.co/datasets/Verah/tatoeba_dedupe_en-jp_2024-March-01

Desired behaviour is to not accept any translation when we deliberaly test incorrect pairings from the dataset, and to not reject any translation when shown only correctly paired examples.

Model	False Admissions	False Rejections
Mistral Instruct	41	600
(This Model)	13	1839
JP Stable LM Gamma	9679	138
Hermes2DPO	20	598

I made the test harder by concatenating 3 paired sentences together, in the false admissions case 1 out of those 3 was incorrectly paired.

Model	False Admissions	False* Rejections
(This Model)	89	5508
Hermes2DPO	537	1458

This model also wanted to reject many "correct" translations, however 3 unrelated sentences back to back isn't a very correct thing to be doing, either.