KlaudiaTH committed
Commit aa451eb • 1 Parent(s): 362f22e

Fixed README

Files changed (1)
  1. README.md +0 -32
README.md CHANGED
@@ -1,13 +1,3 @@
- # New data model
-
- The new data model is constructed by taking the individual JSON files in data/new_eval, combining them into
- a single simple format, and, from the combined dataframe, creating individual files for each model.
-
- For new eval runs that have to be appended, we first identify the model associated with the JSON file
- produced by the eval harness, select the corresponding model file to append to, find the unique rows (unique combinations
- of model name, language, task group and few-shot setting) in the JSON file, and append them if the number of unique rows is not 0.
-
-
  ---
  title: Leaderboard
  emoji: 👏
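
As a rough illustration of the combine-and-append flow described in the removed section above, a minimal pandas sketch could look like the following. The directory layout, the column names (`model_name`, `language`, `task_group`, `few_shot`) and the helper names are assumptions for illustration, not the repository's actual code.

```python
# Minimal sketch only; paths, column names and helpers are assumed, not taken from the repo.
import json
from pathlib import Path

import pandas as pd

KEY_COLS = ["model_name", "language", "task_group", "few_shot"]  # assumed unique key

def build_combined_frame(src_dir: str = "data/new_eval") -> pd.DataFrame:
    """Combine the individual JSON result files into one flat dataframe."""
    records = []
    for path in Path(src_dir).glob("*.json"):
        with open(path) as f:
            records.append(json.load(f))  # assumes one flat record per file
    return pd.DataFrame(records)

def write_per_model_files(df: pd.DataFrame, out_dir: str = "data/models") -> None:
    """Split the combined dataframe into one file per model."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for model_name, group in df.groupby("model_name"):
        group.to_json(Path(out_dir) / f"{model_name}.json", orient="records")

def append_new_run(run_df: pd.DataFrame, model_file: Path) -> None:
    """Append only rows whose (model, language, task group, few shot) key is new."""
    existing = pd.read_json(model_file)
    keys = existing[KEY_COLS].drop_duplicates()
    merged = run_df.merge(keys, on=KEY_COLS, how="left", indicator=True)
    new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    if len(new_rows) > 0:  # append only if there are unseen rows
        pd.concat([existing, new_rows]).to_json(model_file, orient="records")
```

The left merge with `indicator=True` is one convenient way to keep only rows whose key combination is not yet present in the existing per-model file.
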
@@ -20,30 +10,8 @@ pinned: false
  license: unknown
  ---
 
- # Introduction
 
  This is the OpenGPT-X multilingual leaderboard source code repository.
  The leaderboard aims to provide an overview of LLM performance across various languages.
  The basic task set consists of MMLU, ARC, HellaSwag, GSM8k, TruthfulQA and belebele.
  To make the results comparable to the Open LLM leaderboard (https://huggingface.co/open-llm-leaderboard), we selected the former five tasks based on our internal machine translations of the English base tasks, in addition to the high-quality multilingual benchmark belebele by Meta.
-
- # Usage
-
- The hosted leaderboard can be found at https://huggingface.co/spaces/openGPT-X/leaderboard.
- To extend its functionality, please create a PR.
-
- # Adding new tasks
-
- To add new evaluation tasks, proceed as follows:
-
- 1. Add task information to `TASK_INFO` in `src/data.py`. It should be a dict mapping the task display name to the metric to be shown, as well as a dict containing mappings from two-letter language codes to the corresponding lm-eval-harness task selection string. See existing task information for reference (a hypothetical example follows below).
- 2. Add evaluation results as detailed below.
-
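
An entry in `TASK_INFO` following step 1 might look roughly like the sketch below. The dict layout, the `metric` and `languages` keys, and the harness task strings (`arc_challenge_de`, `arc_challenge_fr`) are illustrative assumptions; the existing entries in `src/data.py` are the authoritative reference.

```python
# Hypothetical TASK_INFO entry; structure and task names are assumed, not taken from src/data.py.
TASK_INFO = {
    "ARC": {
        "metric": "acc_norm",   # metric to show on the leaderboard (assumed)
        "languages": {          # two-letter code -> lm-eval-harness task selection string (assumed)
            "de": "arc_challenge_de",
            "fr": "arc_challenge_fr",
        },
    },
}
```
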
- # Adding new models
-
- It is possible to change the display name of a particular model.
- Simply add an entry to `_MODEL_NAMES` in `src/data.py`.
-
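
Such an entry might look roughly like the following; the mapping direction (Hugging Face repository id to display name) is an assumption.

```python
# Hypothetical _MODEL_NAMES entry; the mapping direction is assumed.
_MODEL_NAMES = {
    "meta-llama/Llama-2-7b-hf": "Llama 2 7B",  # repo id -> display name shown on the leaderboard
}
```
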
- # Adding evaluation results
-
- Copy the `.json` output generated by the lm-eval-harness into `data`.
 