Adding Evaluation Results

3fe59c9 verified 4 months ago

4.42 kB

	---
	language:
	- en
	license: cc-by-nc-4.0
	model-index:
	- name: L3-70B-Euryale-v2.1
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 73.84
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 48.7
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 20.85
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 10.85
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 12.25
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 45.6
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-70B-Euryale-v2.1
	name: Open LLM Leaderboard
	---

	![Euryale](https://images7.alphacoders.com/921/921311.jpg)

	She's back!

	Stheno's Sister Model, designed to impress.

	```
	- Same Dataset used as Stheno v3.2 -> See notes there.
	- LoRA Fine-Tune -> FFT is simply too expensive.
	- Trained over 8x H100 SXMs and then some more afterwards.
	```

	Testing Notes
	```
	- Better prompt adherence.
	- Better anatomy / spatial awareness.
	- Adapts much better to unique and custom formatting / reply formats.
	- Very creative, lots of unique swipes.
	- Is not restrictive during roleplays.
	- Feels like a big brained version of Stheno.
	```

	Likely due to it being a 70B model instead of 8B. Similar vibes comparing back in llama 2, where 70B models were simply much more 'aware' in the subtler areas and contexts a smaller model like a 7B or 13B simply were not able to handle.

	---

	Recommended Sampler Settings:
	```
	Temperature - 1.17
	min_p - 0.075
	Repetition Penalty - 1.10
	```

	SillyTavern Instruct Settings:
	<br>Context Template: Llama-3-Instruct-Names
	<br>Instruct Presets: [Euryale-v2.1-Llama-3-Instruct](https://huggingface.co/Sao10K/L3-70B-Euryale-v2.1/blob/main/Euryale-v2.1-Llama-3-Instruct.json)

	---

	As per usual, support me here:

	Ko-fi: https://ko-fi.com/sao10k

	```
	Art by wada_kazu / わだかず (pixiv page private?)
	```
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Sao10K__L3-70B-Euryale-v2.1)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|35.35\|
	\|IFEval (0-Shot) \|73.84\|
	\|BBH (3-Shot) \|48.70\|
	\|MATH Lvl 5 (4-Shot)\|20.85\|
	\|GPQA (0-shot) \|10.85\|
	\|MuSR (0-shot) \|12.25\|
	\|MMLU-PRO (5-shot) \|45.60\|