mfromm committed
Commit 69937a1
1 Parent(s): 567e8b1

Update README.md

Files changed (1)
  1. README.md +16 -9
README.md CHANGED
@@ -217,15 +217,22 @@ More information regarding the pre-training are available in our model preprint
 
  <!-- This section describes the evaluation protocols and provides the results. -->
 
- More information regarding our translated benchmarks are available in our preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- Teuken-7B-instruct-commercial was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
+ Results on multilingual benchmarks for 21 European languages with instruction-tuned models:
+ | Model | Avg. | EU21-ARC | EU21-HeSw | EU21-TQA | EU21-MMLU |
+ |--------------------------------|--------|----------|-----------|----------|-----------|
+ | Meta-Llama-3.1-8B-Instruct | **.563** | .563 | .579 | .532 | **.576** |
+ | Mistral-7B-Instruct-v0.3 | .527 | .530 | .538 | **.548** | .491 |
+ | Salamandra-7B-Instruct | .543 | **.595** | **.637** | .482 | .459 |
+ | Aya-23-8B | .485 | .475 | .535 | .476 | .455 |
+ | Occiglot-7B-eu5-Instruct | .475 | .484 | .519 | .471 | .428 |
+ | Pharia-1-LLM-7B-C-A | .417 | .396 | .438 | .469 | .366 |
+ | Bloomz-7B1 | .358 | .316 | .354 | .461 | .302 |
+ | **Teuken-7B-instruct-commercial-v0.4** | .53x | .57x | .62x | .47x | .42x |
+
+ More information regarding the quality of our translated benchmarks is available in our evaluation preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
+ More evaluation results for Teuken-7B-instruct-research-v0.4 are available in our model preprint ["Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs"](https://arxiv.org/abs/2410.03730).
+
+ The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can also be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
 
  ## Technical Specifications
 
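A quick consistency check on the added table: the Avg. column appears to be the arithmetic mean of the four EU21 benchmark columns (this interpretation is an assumption, not stated in the diff). A minimal sketch, with the values copied from the table; the Teuken row is excluded because its digits are partially masked:

```python
# Published rows from the README table: avg, then [ARC, HeSw, TQA, MMLU].
# Assumption: "Avg." is the mean of the four EU21 scores, rounded to 3 decimals.
rows = {
    "Meta-Llama-3.1-8B-Instruct": (0.563, [0.563, 0.579, 0.532, 0.576]),
    "Mistral-7B-Instruct-v0.3":   (0.527, [0.530, 0.538, 0.548, 0.491]),
    "Salamandra-7B-Instruct":     (0.543, [0.595, 0.637, 0.482, 0.459]),
    "Aya-23-8B":                  (0.485, [0.475, 0.535, 0.476, 0.455]),
    "Occiglot-7B-eu5-Instruct":   (0.475, [0.484, 0.519, 0.471, 0.428]),
    "Pharia-1-LLM-7B-C-A":        (0.417, [0.396, 0.438, 0.469, 0.366]),
    "Bloomz-7B1":                 (0.358, [0.316, 0.354, 0.461, 0.302]),
}

for name, (avg, scores) in rows.items():
    mean = sum(scores) / len(scores)
    # Tolerance of 1e-3 absorbs rounding in the published 3-decimal values.
    assert abs(mean - avg) < 1e-3, (name, mean, avg)
```

Every published average agrees with the mean of its row to within rounding, which supports reading Avg. as a plain unweighted mean over the four benchmarks.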