Zhiyu Wu committed • 862fdcc
Parent(s): d846882

Add Pegasus scripts for running NLP evaluation (#9)

Browse files:
- pegasus/README.md +26 -0
- pegasus/nlp-eval.yaml +68 -0
pegasus/README.md
CHANGED
@@ -58,3 +58,29 @@

```console
$ pegasus q
```

`q` stands for queue. Each command is run once on the next available (`hostname`, `gpu`) combination.
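Pegasus reads its queue from `queue.yaml`, a YAML list of jobs that pairs a command template with parameter lists (this is the schema used by `nlp-eval.yaml` below, and the reason for the `cp nlp-eval.yaml queue.yaml` step in the next section). A minimal sketch, with an illustrative `echo` command:

```yaml
# Hypothetical minimal queue.yaml: queues one command per model value.
# {{model}} is substituted from the list below; {{ gpu }} is filled in
# by Pegasus with the GPU index of the (hostname, gpu) slot it picks.
- command:
    - docker exec leaderboard{{ gpu }} echo {{model}}
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
```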

## NLP-eval

Now use Pegasus to run benchmarks for all the models across all nodes.

```console
$ cd pegasus
$ cp nlp-eval.yaml queue.yaml
$ pegasus q
```

For some tasks, if the CUDA memory of a single GPU is not enough, you can use more GPUs as follows.

1. Create a larger Docker container with more GPUs, e.g. two GPUs:

```console
$ docker run -dit --name leaderboard_nlp_tasks --gpus '"device=0,1"' -v /data/leaderboard:/data/leaderboard -v $HOME/workspace/leaderboard:/workspace/leaderboard ml-energy:latest bash
```

2. Then run the specific task with Pegasus, or run it directly with:

```console
$ docker exec leaderboard_nlp_tasks python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks {{task}} --num_fewshot {{shot}}
```

Replace `{{model}}`, `{{task}}`, and `{{shot}}` with the specific model, task, and few-shot count, as in the example right after this.
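For instance, a direct run of vicuna-13B on HellaSwag with 10 few-shot examples (values taken from `nlp-eval.yaml` below):

```console
$ docker exec leaderboard_nlp_tasks python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained=/data/leaderboard/weights/lmsys/vicuna-13B,trust_remote_code=True,use_accelerate=True --tasks hellaswag --num_fewshot 10
```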
pegasus/nlp-eval.yaml
ADDED
@@ -0,0 +1,68 @@
- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks arc_challenge --num_fewshot 25
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat

- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks hellaswag --num_fewshot 10
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat

- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks truthfulqa_mc --num_fewshot 0
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat
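When queued, Pegasus expands each entry above into one command per `model` value and fills `{{ gpu }}` with the GPU it assigns. For example, the first entry's llama-7B job, assuming it is assigned GPU 0, runs as:

```console
$ docker exec leaderboard0 python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained=/data/leaderboard/weights/metaai/llama-7B,trust_remote_code=True,use_accelerate=True --tasks arc_challenge --num_fewshot 25
```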