Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
pminervini
commited on
Merge branch 'main' of https://huggingface.co/spaces/pminervini/hallucinations-leaderboard into main
Browse files- src/display/about.py +4 -0
src/display/about.py
CHANGED
@@ -5,6 +5,10 @@ TITLE = """<h1 align="center" id="space-title">Hallucinations Leaderboard</h1>""
|
|
5 |
INTRODUCTION_TEXT = """
|
6 |
π The Hallucinations Leaderboard aims to track, rank and evaluate hallucinations in LLMs.
|
7 |
|
|
|
|
|
|
|
|
|
8 |
Submit a model for automated evaluation on the [Edinburgh International Data Facility](https://www.epcc.ed.ac.uk/hpc-services/edinburgh-international-data-facility) (EIDF) GPU cluster on the "Submit" page.
|
9 |
The backend of the Hallucinations leaderboard is based on the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) --- more details in the "About" page.
|
10 |
Metrics and datasets used by the Hallucinations Leaderboard were identified while writing our [awesome-hallucinations-detection](https://github.com/EdinburghNLP/awesome-hallucination-detection) page (you are encouraged to contribute to this list via pull requests).
|
|
|
5 |
INTRODUCTION_TEXT = """
|
6 |
π The Hallucinations Leaderboard aims to track, rank and evaluate hallucinations in LLMs.
|
7 |
|
8 |
+
It evaluates the propensity for hallucination in Large Language Models (LLMs) across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, Hallucination Detection, and Self-Consistency. The evaluation encompasses a wide range of datasets such as NQ Open, TriviaQA, TruthfulQA, XSum, CNN/DM, RACE, SQuADv2, MemoTrap, IFEval, FEVER, FaithDial, True-False, HaluEval, and SelfCheckGPT, offering a comprehensive assessment of each model's performance in generating accurate and contextually relevant content.
|
9 |
+
|
10 |
+
A more detailed explanation of the definition of hallucination and the leaderboard's motivation, tasks and dataset can be found on the "About" page and [The Hallucinations Leaderboard blog post](https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations).
|
11 |
+
|
12 |
Submit a model for automated evaluation on the [Edinburgh International Data Facility](https://www.epcc.ed.ac.uk/hpc-services/edinburgh-international-data-facility) (EIDF) GPU cluster on the "Submit" page.
|
13 |
The backend of the Hallucinations leaderboard is based on the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) --- more details in the "About" page.
|
14 |
Metrics and datasets used by the Hallucinations Leaderboard were identified while writing our [awesome-hallucinations-detection](https://github.com/EdinburghNLP/awesome-hallucination-detection) page (you are encouraged to contribute to this list via pull requests).
|