Adding Evaluation Results

1f79a96 verified 2 months ago

4.46 kB

	---
	license: apache-2.0
	library_name: transformers
	base_model:
	- flammenai/Flammades-Mistral-Nemo-12B
	datasets:
	- flammenai/MahouMix-v1
	model-index:
	- name: Mahou-1.5-mistral-nemo-12B
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 67.51
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 36.26
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 5.06
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 3.47
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 16.47
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 28.91
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=flammenai/Mahou-1.5-mistral-nemo-12B
	name: Open LLM Leaderboard
	---
	![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)

	# Mahou-1.5-mistral-nemo-12B

	Mahou is designed to provide short messages in a conversational context. It is capable of casual conversation and character roleplay.

	### Chat Format

	This model has been trained to use ChatML format.

	```
	<\|im_start\|>system
	{{system}}<\|im_end\|>
	<\|im_start\|>{{char}}
	{{message}}<\|im_end\|>
	<\|im_start\|>{{user}}
	{{message}}<\|im_end\|>
	```

	### Roleplay Format

	- Speech without quotes.
	- Actions in `asterisks`

	```
	leans against wall cooly so like, i just casted a super strong spell at magician academy today, not gonna lie, felt badass.
	```

	### SillyTavern Settings

	1. Use ChatML for the Context Template.
	2. Enable Instruct Mode.
	3. Use the [Mahou ChatML Instruct preset](https://huggingface.co/datasets/flammenai/Mahou-ST-ChatML-Instruct/raw/main/Mahou.json).
	4. Use the [Mahou Sampler preset](https://huggingface.co/datasets/flammenai/Mahou-ST-Sampler-Preset/raw/main/Mahou.json).

	### Method

	[ORPO finetuned](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) with 4x H100 for 3 epochs.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_flammenai__Mahou-1.5-mistral-nemo-12B)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|26.28\|
	\|IFEval (0-Shot) \|67.51\|
	\|BBH (3-Shot) \|36.26\|
	\|MATH Lvl 5 (4-Shot)\| 5.06\|
	\|GPQA (0-shot) \| 3.47\|
	\|MuSR (0-shot) \|16.47\|
	\|MMLU-PRO (5-shot) \|28.91\|