open_dutch_llm_leaderboard

Bram Vanroy

	TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard (Dutch only)</h1>'

	INTRO_TEXT = """
	## About

	This is a fork of the [Open Multilingual LLM Evaluation Leaderboard](https://huggingface.co/spaces/uonlp/open_multilingual_llm_leaderboard), restricted to Dutch models and augmented with additional model results.
	The models are tested only on the Dutch versions of the following benchmarks, which the original authors of the Open Multilingual LLM Evaluation Leaderboard translated into Dutch automatically with `gpt-35-turbo`.

	- <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot)
	- <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (10-shot)
	- <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (5-shot)
	- <a href="https://arxiv.org/abs/2109.07958" target="_blank"> TruthfulQA </a> (0-shot)

	I do not maintain those datasets; I only run the benchmarks and add the results to this space. For questions regarding the test sets or running them yourself, see [the original GitHub repository](https://github.com/laiviet/lm-evaluation-harness).

	All models are benchmarked in 8-bit precision.
	"""

	CREDIT = """
	## Credit

	This leaderboard has borrowed heavily from the following sources:

	- Datasets (AI2_ARC, HellaSwag, MMLU, TruthfulQA)
	- Evaluation code (EleutherAI's lm-evaluation-harness repo)
	- Leaderboard code (HuggingFaceH4's open_llm_leaderboard repo)
	- The multilingual version of the leaderboard (uonlp's open_multilingual_llm_leaderboard repo)

	"""


	CITATION = f"""
	## Citation


	If you use the Dutch benchmark results or this specific leaderboard page, please cite the following paper:

	TBD


	If you use the multilingual benchmarks, please cite the following paper:

	```bibtex
	@misc{{lai2023openllmbenchmark,
	author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
	title={{Open Multilingual LLM Evaluation Leaderboard}},
	year={{2023}}
	}}
	```
	"""