Zhiyu Wu committed • 862fdcc
Parent(s): d846882

Add Pegasus scripts for running NLP evaluation (#9)

Browse files:
- pegasus/README.md +26 -0
- pegasus/nlp-eval.yaml +68 -0
pegasus/README.md
CHANGED
@@ -58,3 +58,29 @@

```console
$ pegasus q
```

`q` stands for queue. Each command is run once on the next available (`hostname`, `gpu`) combination.
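Pegasus reads its queue from `queue.yaml`, a YAML list of jobs that pairs a command template with parameter lists (this is the schema used by `nlp-eval.yaml` below, and the reason for the `cp nlp-eval.yaml queue.yaml` step in the next section). A minimal sketch, with an illustrative `echo` command:

```yaml
# Hypothetical minimal queue.yaml: queues one command per model value.
# {{model}} is substituted from the list below; {{ gpu }} is filled in
# by Pegasus with the GPU index of the (hostname, gpu) slot it picks.
- command:
    - docker exec leaderboard{{ gpu }} echo {{model}}
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
```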

## NLP-eval

Now use Pegasus to run benchmarks for all the models across all nodes.

```console
$ cd pegasus
$ cp nlp-eval.yaml queue.yaml
$ pegasus q
```

For some tasks, if the CUDA memory of a single GPU is not enough, you can use more GPUs as follows.

1. Create a larger Docker container with more GPUs, e.g. two GPUs:

```console
$ docker run -dit --name leaderboard_nlp_tasks --gpus '"device=0,1"' -v /data/leaderboard:/data/leaderboard -v $HOME/workspace/leaderboard:/workspace/leaderboard ml-energy:latest bash
```

2. Then run the specific task with Pegasus, or run it directly with:

```console
$ docker exec leaderboard_nlp_tasks python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks {{task}} --num_fewshot {{shot}}
```

Replace `{{model}}`, `{{task}}`, and `{{shot}}` with the specific model, task, and few-shot count, as in the example right after this.
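For instance, a direct run of vicuna-13B on HellaSwag with 10 few-shot examples (values taken from `nlp-eval.yaml` below):

```console
$ docker exec leaderboard_nlp_tasks python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained=/data/leaderboard/weights/lmsys/vicuna-13B,trust_remote_code=True,use_accelerate=True --tasks hellaswag --num_fewshot 10
```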
pegasus/nlp-eval.yaml
ADDED
@@ -0,0 +1,68 @@
- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks arc_challenge --num_fewshot 25
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat

- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks hellaswag --num_fewshot 10
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat

- command:
    - docker exec leaderboard{{ gpu }} python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained={{model}},trust_remote_code=True,use_accelerate=True --tasks truthfulqa_mc --num_fewshot 0
  model:
    - /data/leaderboard/weights/metaai/llama-7B
    - /data/leaderboard/weights/metaai/llama-13B
    - /data/leaderboard/weights/lmsys/vicuna-7B
    - /data/leaderboard/weights/lmsys/vicuna-13B
    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
    - /data/leaderboard/weights/BAIR/koala-7b
    - /data/leaderboard/weights/BAIR/koala-13b
    - camel-ai/CAMEL-13B-Combined-Data
    - databricks/dolly-v2-12b
    - FreedomIntelligence/phoenix-inst-chat-7b
    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
    - lmsys/fastchat-t5-3b-v1.0
    - Neutralzz/BiLLa-7B-SFT
    - nomic-ai/gpt4all-13b-snoozy
    - openaccess-ai-collective/manticore-13b-chat-pyg
    - OpenAssistant/oasst-sft-1-pythia-12b
    - project-baize/baize-v2-7B
    - StabilityAI/stablelm-tuned-alpha-7b
    - togethercomputer/RedPajama-INCITE-7B-Chat
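When queued, Pegasus expands each entry above into one command per `model` value and fills `{{ gpu }}` with the GPU it assigns. For example, the first entry's llama-7B job, assuming it is assigned GPU 0, runs as:

```console
$ docker exec leaderboard0 python lm-evaluation-harness/main.py --device cuda --no_cache --model hf-causal-experimental --model_args pretrained=/data/leaderboard/weights/metaai/llama-7B,trust_remote_code=True,use_accelerate=True --tasks arc_challenge --num_fewshot 25
```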