leafspark
/

Mistral-Large-218B-Instruct

+---
+license: other
+license_name: mrl
+license_link: https://mistral.ai/licenses/MRL-0.1.md
+language:
+  - en
+  - fr
+  - de
+  - es
+  - it
+  - pt
+  - zh
+  - ja
+  - ru
+  - ko
+---
+# Mistral-Large-218B-Instruct
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
+Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
+Self-merged from the original Mistral Large 2, see mergekit config below.
+## Key features
+- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
+- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
+- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
+- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON outputting.
+- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
+- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
+- Large Context: Features a large 128k context window for handling extensive input.
+## Metrics
+Note: The following metrics are based on the original model and may differ for this 218B parameter version. Updated benchmarks will be provided when available.
+**Base Pretrained Benchmarks**
+| Benchmark | Score |
+| --- | --- |
+| MMLU | 84.0% |
+**Base Pretrained Multilingual Benchmarks (MMLU)**
+| Benchmark | Score |
+| --- | --- |
+| French | 82.8% |
+| German | 81.6% |
+| Spanish | 82.7% |
+| Italian | 82.7% |
+| Dutch | 80.7% |
+| Portuguese | 81.6% |
+| Russian | 79.0% |
+| Korean | 60.1% |
+| Japanese | 78.8% |
+| Chinese | 74.8% |
+**Instruction Benchmarks**
+| Benchmark | Score |
+| --- | --- |
+| MT Bench | 8.63 |
+| Wild Bench | 56.3 |
+| Arena Hard| 73.2 |
+**Code & Reasoning Benchmarks**
+| Benchmark | Score |
+| --- | --- |
+| Human Eval | 92% |
+| Human Eval Plus| 87% |
+| MBPP Base| 80% |
+| MBPP Plus| 69% |
+**Math Benchmarks**
+| Benchmark | Score |
+| --- | --- |
+| GSM8K | 93% |
+| Math Instruct (0-shot, no CoT) | 70% |
+| Math Instruct (0-shot, CoT)| 71.5% |
+## Usage
+This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
+## Hardware Requirements
+Given the size of this model (218B parameters), it requires substantial computational resources for inference:
+- Recommended: 8xH100 (640GB)
+- Alternatively: Distributed inference setup across multiple machines.
+## Limitations
+- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
+- Due to its size, inference may be computationally expensive and require significant hardware resources.
+- As with all large language models, it may exhibit biases present in its training data.
+- The model's outputs should be critically evaluated, especially for sensitive applications.
+## Notes
+This was just a fun testing model, merged with the `merge.py` script in the base of the repo. Find GGUFs at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/)
+Compatible `mergekit` config:
+```yaml
+slices:
+- sources:
+  - layer_range: [0, 20]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [10, 30]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [20, 40]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [30, 50]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [40, 60]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [50, 70]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [60, 80]
+    model: mistralai/Mistral-Large-Instruct-2407
+- sources:
+  - layer_range: [70, 87]
+    model: mistralai/Mistral-Large-Instruct-2407
+merge_method: passthrough
+dtype: bfloat16
+```