Update src/about.py
src/about.py (+2 -2)
@@ -37,7 +37,7 @@ TITLE = f"""
 INTRODUCTION_TEXT = """
 Persian LLM Leaderboard is designed to be a challenging benchmark and provide a reliable evaluation of LLMs in the Persian language.
 
-Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *
+Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PK-BETS*, challenging the native knowledge of the models along with
 linguistic skills and their level of bias, ethics, and trustworthiness. **These datasets are not yet public, but they will be uploaded onto huggingface along with a detailed paper
 explaining the data and performance of relevant models.**
 
@@ -59,7 +59,7 @@ This benchmark can also be used by multilingual researchers to measure how well
 We use our own framework to evaluate the models on the following benchmarks (TO BE RELEASED SOON).
 ### Tasks
 - PeKA: Persian Knowledge Assessment (0-shot) - a set of multiple-choice questions that tests the level of native knowledge in the Persian language across more than 15 domains and categories: from art to history and geography, cinema, TV, sports, law and medicine, and much more.
-
+- PK-BETS: Persian Bias, Ethics, Toxicity, and Skills (0-shot) - a test of the model's knowledge of Persian and its capability in linguistic skills such as grammar and paraphrasing, along with questions examining the bias, ethics, and toxicity of the model.
 - <a href="https://arxiv.org/abs/2404.06644" target="_blank">Khayyam Challenge (Persian MMLU)</a> (0-shot) - comprising 20,805 four-choice questions (of which we use 20,776, removing questions longer than 200 words) sourced from 38 diverse tasks extracted from Persian examinations, spanning a wide spectrum of subjects, complexities, and ages.
 - <a href="https://arxiv.org/abs/2012.06154" target="_blank">ParsiNLU MCQA</a> (0-shot) - a series of multiple-choice questions in the domains of *literature*, *math & logic*, and *common knowledge*.
 - <a href="https://arxiv.org/abs/2012.06154" target="_blank">ParsiNLU NLI</a> (max[0,3,5,10]-shot) - a 3-way classification to determine whether a hypothesis sentence entails, contradicts, or is neutral with respect to a given premise sentence.