mfromm committed
Commit 69937a1
1 Parent(s): 567e8b1

Update README.md

Files changed (1)
  1. README.md +16 -9
README.md CHANGED
@@ -217,15 +217,22 @@ More information regarding the pre-training are available in our model preprint
 
  <!-- This section describes the evaluation protocols and provides the results. -->
 
- More information regarding our translated benchmarks are available in our preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- Teuken-7B-instruct-commercial was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
+ Results on multilingual benchmarks for 21 European languages with instruction-tuned models:
+ | Model | Avg. | EU21-ARC | EU21-HeSw | EU21-TQA | EU21-MMLU |
+ |--------------------------------|--------|----------|-----------|----------|-----------|
+ | Meta-Llama-3.1-8B-Instruct | **.563** | .563 | .579 | .532 | **.576** |
+ | Mistral-7B-Instruct-v0.3 | .527 | .530 | .538 | **.548** | .491 |
+ | Salamandra-7B-Instruct | .543 | **.595** | **.637** | .482 | .459 |
+ | Aya-23-8B | .485 | .475 | .535 | .476 | .455 |
+ | Occiglot-7B-eu5-Instruct | .475 | .484 | .519 | .471 | .428 |
+ | Pharia-1-LLM-7B-C-A | .417 | .396 | .438 | .469 | .366 |
+ | Bloomz-7B1 | .358 | .316 | .354 | .461 | .302 |
+ | **Teuken-7B-instruct-commercial-v0.4** | .53x | .57x | .62x | .47x | .42x |
+
+ More information regarding the quality of our translated benchmarks is available in our evaluation preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
+ More evaluation results for Teuken-7B-instruct-research-v0.4 are available in our model preprint ["Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs"](https://arxiv.org/abs/2410.03730).
+
+ The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can also be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
 
  ## Technical Specifications
 
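A quick consistency check on the added table: the Avg. column appears to be the arithmetic mean of the four EU21 benchmark columns (this interpretation is an assumption, not stated in the diff). A minimal sketch, with the values copied from the table; the Teuken row is excluded because its digits are partially masked:

```python
# Published rows from the README table: avg, then [ARC, HeSw, TQA, MMLU].
# Assumption: "Avg." is the mean of the four EU21 scores, rounded to 3 decimals.
rows = {
    "Meta-Llama-3.1-8B-Instruct": (0.563, [0.563, 0.579, 0.532, 0.576]),
    "Mistral-7B-Instruct-v0.3":   (0.527, [0.530, 0.538, 0.548, 0.491]),
    "Salamandra-7B-Instruct":     (0.543, [0.595, 0.637, 0.482, 0.459]),
    "Aya-23-8B":                  (0.485, [0.475, 0.535, 0.476, 0.455]),
    "Occiglot-7B-eu5-Instruct":   (0.475, [0.484, 0.519, 0.471, 0.428]),
    "Pharia-1-LLM-7B-C-A":        (0.417, [0.396, 0.438, 0.469, 0.366]),
    "Bloomz-7B1":                 (0.358, [0.316, 0.354, 0.461, 0.302]),
}

for name, (avg, scores) in rows.items():
    mean = sum(scores) / len(scores)
    # Tolerance of 1e-3 absorbs rounding in the published 3-decimal values.
    assert abs(mean - avg) < 1e-3, (name, mean, avg)
```

Every published average agrees with the mean of its row to within rounding, which supports reading Avg. as a plain unweighted mean over the four benchmarks.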