Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +118 -2

README.md CHANGED

@@ -1,10 +1,113 @@
 ---
 license: other
-license_name: microsoft-research-license
 tags:
 - storywriting
 - text adventure
 - not-for-all-audiences
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6459a451abdbb77c4c6d8258/uNoKlBulkRF3mCoMgetGs.png)
@@ -46,4 +149,17 @@ Despite that, we have tested the model out to 16000 context via Rope scaling and
 Please enjoy, and if you encounter anything exciting or weird, please reach out to me at [jebcarter@pm.me].
-Special thanks as always to the KoboldAI crew who provided the mergebox, testing, and feedback on this model, and to gelukuMLG for the model mascot!

 ---
 license: other
 tags:
 - storywriting
 - text adventure
 - not-for-all-audiences
+license_name: microsoft-research-license
+model-index:
+- name: psyonic-cetacean-20B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 63.57
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 86.2
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.66
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 57.55
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 78.14
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 14.71
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jebcarter/psyonic-cetacean-20B
+      name: Open LLM Leaderboard
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6459a451abdbb77c4c6d8258/uNoKlBulkRF3mCoMgetGs.png)
 Please enjoy, and if you encounter anything exciting or weird, please reach out to me at [jebcarter@pm.me].
+Special thanks as always to the KoboldAI crew who provided the mergebox, testing, and feedback on this model, and to gelukuMLG for the model mascot!
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jebcarter__psyonic-cetacean-20B)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |59.97|
+|AI2 Reasoning Challenge (25-Shot)|63.57|
+|HellaSwag (10-Shot)              |86.20|
+|MMLU (5-Shot)                    |59.66|
+|TruthfulQA (0-shot)              |57.55|
+|Winogrande (5-shot)              |78.14|
+|GSM8k (5-shot)                   |14.71|
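The Avg. row is consistent with an unweighted mean of the six benchmark scores: (63.57 + 86.20 + 59.66 + 57.55 + 78.14 + 14.71) / 6 = 359.83 / 6 ≈ 59.97. Below is a minimal Python sketch that recomputes that average from the values in the model-index entries above and renders a table with the same column layout. It assumes the leaderboard average is a plain arithmetic mean rounded to two decimals. `SCORES`, `leaderboard_average`, and `render_table` are hypothetical names for illustration only; they are not part of the open-llm-leaderboard-results-pr tool or any Hugging Face library.

```python
# Per-benchmark scores copied from the model-index entries in this PR.
# (Hypothetical helper data; not produced by the PR tool itself.)
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 63.57,
    "HellaSwag (10-Shot)": 86.20,
    "MMLU (5-Shot)": 59.66,
    "TruthfulQA (0-shot)": 57.55,
    "Winogrande (5-shot)": 78.14,
    "GSM8k (5-shot)": 14.71,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: the leaderboard's "Avg." is a plain arithmetic mean.
    """
    return round(sum(scores.values()) / len(scores), 2)


def render_table(scores: dict[str, float]) -> str:
    """Render a Markdown table: an Avg. row followed by one row per benchmark."""
    # The first column is as wide as the longest label.
    width = max(len("Metric"), len("Avg."), *(len(name) for name in scores))
    rows = [
        f"|{'Metric'.center(width)}|Value|",
        f"|{'-' * width}|----:|",  # right-aligned Value column
        f"|{'Avg.'.ljust(width)}|{leaderboard_average(scores):.2f}|",
    ]
    rows += [f"|{name.ljust(width)}|{value:.2f}|" for name, value in scores.items()]
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_table(SCORES))
```

Formatting every value with `:.2f` is what turns the YAML value `86.2` into the `86.20` shown in the README table.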