spow12 leaderboard-pr-bot committed on
Commit
5e2a89a
1 Parent(s): 2e7f1bf

Adding Evaluation Results (#1)

- Adding Evaluation Results (8de051f9b8ee6bcadce8fb3f4d9ae4c7d90f02a4)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +117 -9
README.md CHANGED
@@ -1,10 +1,4 @@
  ---
- base_model:
- - mistralai/Mistral-Nemo-Instruct-2407
- - NeverSleep/Lumimaid-v0.2-12B
- - Epiculous/Violet_Twilight-v0.1
- - Sao10K/MN-12B-Lyra-v4
- - anthracite-org/magnum-v2-12b
  language:
  - en
  - fr
@@ -16,14 +10,115 @@ language:
  - zh
  - ja
  license: cc-by-nc-4.0
- pipeline_tag: text-generation
+ library_name: transformers
  tags:
  - nsfw
  - Visual novel
  - roleplay
  - mergekit
  - merge
- library_name: transformers
+ base_model:
+ - mistralai/Mistral-Nemo-Instruct-2407
+ - NeverSleep/Lumimaid-v0.2-12B
+ - Epiculous/Violet_Twilight-v0.1
+ - Sao10K/MN-12B-Lyra-v4
+ - anthracite-org/magnum-v2-12b
+ pipeline_tag: text-generation
+ model-index:
+ - name: ChatWaifu_v1.4
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 56.91
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 31.63
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 7.85
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 7.61
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 20.03
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 27.5
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v1.4
+       name: Open LLM Leaderboard
  ---
  
  # Model Card for Model ID
@@ -158,4 +253,17 @@ This repository can use Visual novel-based RAG, but i will not distribute it yet
  url = { https://huggingface.co/spow12/ChatWaifu_v1.4 },
  publisher = { Hugging Face }
  }
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_spow12__ChatWaifu_v1.4)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |25.25|
+ |IFEval (0-Shot)    |56.91|
+ |BBH (3-Shot)       |31.63|
+ |MATH Lvl 5 (4-Shot)| 7.85|
+ |GPQA (0-shot)      | 7.61|
+ |MuSR (0-shot)      |20.03|
+ |MMLU-PRO (5-shot)  |27.50|
+
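
The `Avg.` row in the added table can be sanity-checked against the six benchmark rows. The sketch below is not part of the commit. It assumes the average is an unweighted mean of the six scores, which the commit does not state, and the variable names are illustrative. Because the table shows values rounded to two decimals, the check uses a small tolerance rather than exact equality.

```python
from statistics import mean

# Scores copied from the table added to README.md in this commit.
scores = {
    "IFEval (0-Shot)": 56.91,
    "BBH (3-Shot)": 31.63,
    "MATH Lvl 5 (4-Shot)": 7.85,
    "GPQA (0-shot)": 7.61,
    "MuSR (0-shot)": 20.03,
    "MMLU-PRO (5-shot)": 27.50,
}
reported_avg = 25.25  # the "Avg." row of the same table

# Assumption: the average is an unweighted mean over the six benchmarks.
computed_avg = mean(scores.values())

# The displayed scores are rounded, so compare with a tolerance of 0.01.
matches = abs(computed_avg - reported_avg) < 0.01
print(f"computed={computed_avg:.3f} reported={reported_avg:.2f} matches={matches}")
```

Under that assumption the rounded scores agree with the reported average to within 0.01.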