Adding Evaluation Results

#3
by T145 - opened
Files changed (1)
  1. README.md +114 -0
README.md CHANGED
@@ -18,6 +18,105 @@ language:
 - en
 base_model:
 - meta-llama/Llama-3.1-8B
+model-index:
+- name: Dolphin3.0-Llama3.1-8B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 76.21
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 27.63
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 10.5
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.36
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.97
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 22.13
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=cognitivecomputations%2FDolphin3.0-Llama3.1-8B
+      name: Open LLM Leaderboard
 ---
 
 # Dolphin 3.0 Llama 3.1 8B 🐬
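Once merged, the `model-index` block above becomes machine-readable, since it follows the Hub's standard evaluation-results schema. A minimal sketch of reading it back, assuming the `huggingface_hub` client library is installed and this PR has been merged:

```python
# Minimal sketch: read the model-index metadata back from the Hub.
# ModelCard.load() fetches README.md and parses its YAML front matter;
# each model-index entry is exposed as an EvalResult object.
from huggingface_hub import ModelCard

card = ModelCard.load("cognitivecomputations/Dolphin3.0-Llama3.1-8B")
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_value} ({result.metric_type})")
```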
 
@@ -124,3 +223,18 @@ Special thanks to
 - Deepseek, for the ridiculously fast Deepseek-V3 that we used to augment the data.
 
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/cognitivecomputations__Dolphin3.0-Llama3.1-8B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=cognitivecomputations%2FDolphin3.0-Llama3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric              | Value (%) |
+|---------------------|----------:|
+| **Average**         |     24.97 |
+| IFEval (0-Shot)     |     76.21 |
+| BBH (3-Shot)        |     27.63 |
+| MATH Lvl 5 (4-Shot) |     10.50 |
+| GPQA (0-shot)       |      4.36 |
+| MuSR (0-shot)       |      8.97 |
+| MMLU-PRO (5-shot)   |     22.13 |
+
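The **Average** row matches the arithmetic mean of the six benchmark scores. A quick sanity check in plain Python, with the values copied from the table above:

```python
# Sanity-check the Average row against the mean of the six scores.
scores = {
    "IFEval (0-Shot)": 76.21,
    "BBH (3-Shot)": 27.63,
    "MATH Lvl 5 (4-Shot)": 10.50,
    "GPQA (0-shot)": 4.36,
    "MuSR (0-shot)": 8.97,
    "MMLU-PRO (5-shot)": 22.13,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 24.97, matching the Average row
```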