add blog link
ZeroEval-main/result_dirs/zebra-grid.summary.json
CHANGED
@@ -372,16 +372,5 @@
     "Hard Puzzle Acc": "0.00",
     "Total Puzzles": 1000,
     "Reason Lens": "1592.60"
-  },
-  {
-    "Model": "gemma-2-27b-it@vllm",
-    "Mode": "greedy",
-    "Puzzle Acc": "0.47",
-    "Cell Acc": "0.31",
-    "No answer": "96.23",
-    "Easy Puzzle Acc": "2.08",
-    "Hard Puzzle Acc": "0.00",
-    "Total Puzzles": 212,
-    "Reason Lens": "1280.62"
   }
 ]
_header.md
CHANGED
@@ -2,5 +2,5 @@
 
 # 🦓 ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models
 <!-- [📑 FnF Paper](https://arxiv.org/abs/2305.18654) | -->
-[📰 Blog]() [💻 GitHub](https://github.com/yuchenlin/ZeroEval) | [🤗 HuggingFace](https://huggingface.co/collections/allenai/zebra-logic-bench-6697137cbaad0b91e635e7b0) | [🐦 X](https://twitter.com/billyuchenlin/) | [💬 Discussion](https://huggingface.co/spaces/allenai/ZebraLogicBench-Leaderboard/discussions) | Updated: **{LAST_UPDATED}**
+[📰 Blog](https://huggingface.co/blog/yuchenlin/zebra-logic) [💻 GitHub](https://github.com/yuchenlin/ZeroEval) | [🤗 HuggingFace](https://huggingface.co/collections/allenai/zebra-logic-bench-6697137cbaad0b91e635e7b0) | [🐦 X](https://twitter.com/billyuchenlin/) | [💬 Discussion](https://huggingface.co/spaces/allenai/ZebraLogicBench-Leaderboard/discussions) | Updated: **{LAST_UPDATED}**
 