Hastagaras leaderboard-pr-bot commited on
Commit
d4b2a26
1 Parent(s): 1e57db1

Adding Evaluation Results (#1)

Browse files

- Adding Evaluation Results (5dd6b237a9117806adf34a96388afc4c62c52d6c)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +121 -5
README.md CHANGED
@@ -1,13 +1,116 @@
1
  ---
2
- base_model:
3
- - Hastagaras/anjrit
4
- - Hastagaras/anying
5
  library_name: transformers
6
  tags:
7
  - mergekit
8
  - merge
9
  - not-for-all-audiences
10
- license: llama3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
  ---
12
  # ANJIRRR
13
 
@@ -57,4 +160,17 @@ parameters:
57
 
58
  ```
59
 
60
- **WARNING:** This model has not been extensively tested or evaluated, and its performance characteristics are currently unknown. It may generate harmful, biased, or inappropriate content. Please exercise caution and use it at your own risk and discretion.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ license: llama3
 
 
3
  library_name: transformers
4
  tags:
5
  - mergekit
6
  - merge
7
  - not-for-all-audiences
8
+ base_model:
9
+ - Hastagaras/anjrit
10
+ - Hastagaras/anying
11
+ model-index:
12
+ - name: Anjir-8B-L3
13
+ results:
14
+ - task:
15
+ type: text-generation
16
+ name: Text Generation
17
+ dataset:
18
+ name: AI2 Reasoning Challenge (25-Shot)
19
+ type: ai2_arc
20
+ config: ARC-Challenge
21
+ split: test
22
+ args:
23
+ num_few_shot: 25
24
+ metrics:
25
+ - type: acc_norm
26
+ value: 63.57
27
+ name: normalized accuracy
28
+ source:
29
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
30
+ name: Open LLM Leaderboard
31
+ - task:
32
+ type: text-generation
33
+ name: Text Generation
34
+ dataset:
35
+ name: HellaSwag (10-Shot)
36
+ type: hellaswag
37
+ split: validation
38
+ args:
39
+ num_few_shot: 10
40
+ metrics:
41
+ - type: acc_norm
42
+ value: 84.15
43
+ name: normalized accuracy
44
+ source:
45
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
46
+ name: Open LLM Leaderboard
47
+ - task:
48
+ type: text-generation
49
+ name: Text Generation
50
+ dataset:
51
+ name: MMLU (5-Shot)
52
+ type: cais/mmlu
53
+ config: all
54
+ split: test
55
+ args:
56
+ num_few_shot: 5
57
+ metrics:
58
+ - type: acc
59
+ value: 67.67
60
+ name: accuracy
61
+ source:
62
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
63
+ name: Open LLM Leaderboard
64
+ - task:
65
+ type: text-generation
66
+ name: Text Generation
67
+ dataset:
68
+ name: TruthfulQA (0-shot)
69
+ type: truthful_qa
70
+ config: multiple_choice
71
+ split: validation
72
+ args:
73
+ num_few_shot: 0
74
+ metrics:
75
+ - type: mc2
76
+ value: 52.67
77
+ source:
78
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
79
+ name: Open LLM Leaderboard
80
+ - task:
81
+ type: text-generation
82
+ name: Text Generation
83
+ dataset:
84
+ name: Winogrande (5-shot)
85
+ type: winogrande
86
+ config: winogrande_xl
87
+ split: validation
88
+ args:
89
+ num_few_shot: 5
90
+ metrics:
91
+ - type: acc
92
+ value: 78.61
93
+ name: accuracy
94
+ source:
95
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
96
+ name: Open LLM Leaderboard
97
+ - task:
98
+ type: text-generation
99
+ name: Text Generation
100
+ dataset:
101
+ name: GSM8k (5-shot)
102
+ type: gsm8k
103
+ config: main
104
+ split: test
105
+ args:
106
+ num_few_shot: 5
107
+ metrics:
108
+ - type: acc
109
+ value: 67.78
110
+ name: accuracy
111
+ source:
112
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Hastagaras/Anjir-8B-L3
113
+ name: Open LLM Leaderboard
114
  ---
115
  # ANJIRRR
116
 
 
160
 
161
  ```
162
 
163
+ **WARNING:** This model has not been extensively tested or evaluated, and its performance characteristics are currently unknown. It may generate harmful, biased, or inappropriate content. Please exercise caution and use it at your own risk and discretion.
164
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
165
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Hastagaras__Anjir-8B-L3)
166
+
167
+ | Metric |Value|
168
+ |---------------------------------|----:|
169
+ |Avg. |69.07|
170
+ |AI2 Reasoning Challenge (25-Shot)|63.57|
171
+ |HellaSwag (10-Shot) |84.15|
172
+ |MMLU (5-Shot) |67.67|
173
+ |TruthfulQA (0-shot) |52.67|
174
+ |Winogrande (5-shot) |78.61|
175
+ |GSM8k (5-shot) |67.78|
176
+