Commit 0787166 by Jae-Won Chung (parent: 069d87a)
Add a section in Limitations
LEADERBOARD.md (+5 -0)
@@ -61,6 +61,11 @@ See [here](https://github.com/ml-energy/leaderboard/tree/master/sharegpt) for mo
 - `hellaswag`: [HellaSwag dataset](https://allenai.org/data/hellaswag), measuring grounded commonsense, 10 shot
 - `truthfulqa`: [TruthfulQA dataset](https://arxiv.org/abs/2109.07958), measuring truthfulness against questions that elicit common falsehoods, 0 shot
 
+## Limitations
+
+Currently, inference is run with basically bare PyTorch with batch size 1, which is unrealistic assuming a production serving scenario.
+Hence, absolute latency, throughput, and energy numbers should not be used to estimate figures in real production settings, while relative comparison makes some sense.
+
 ## Upcoming
 
 - Within the Summer, we'll add an LLM Arena for energy consumption!