---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---
# Mistral-Large-218B-Instruct
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
It is a self-merge of the original Mistral Large 2 (Mistral-Large-Instruct-2407); see the mergekit config below.
## Key features
- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large Context: Features a large 128k context window for handling extensive input.
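To illustrate the function-calling interface, here is a minimal sketch that assumes the chat template inherited from `mistralai/Mistral-Large-Instruct-2407` accepts a `tools` argument through `transformers`; this has not been verified against the merged weights:

```python
# Minimal sketch: building a tool-calling prompt with the HF chat template.
# Assumes transformers >= 4.42 and that the inherited Mistral-Large-Instruct-2407
# tokenizer's chat template serializes the tool schema below (not verified for this merge).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2407")

# Hypothetical tool definition in the standard JSON-schema format.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# The rendered prompt embeds the tool schema so the model can emit a JSON tool call.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```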
## Metrics
Note: The following metrics were reported for the original Mistral-Large-Instruct-2407 and may differ for this 218B self-merge. Updated benchmarks will be provided when available.
**Base Pretrained Benchmarks**
| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |
**Base Pretrained Multilingual Benchmarks (MMLU)**
| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |
**Instruction Benchmarks**
| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |
**Code & Reasoning Benchmarks**
| Benchmark | Score |
| --- | --- |
| Human Eval | 92% |
| Human Eval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |
**Math Benchmarks**
| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |
## Usage
This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
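As a non-authoritative sketch, loading with Hugging Face `transformers` should follow the usual pattern for Mistral-Large-Instruct checkpoints; the repo id and dtype below are assumptions:

```python
# Minimal sketch: loading and prompting the merged model with transformers.
# Assumes the safetensors shards in this repo load like any Mistral-Large checkpoint;
# swap in a local path or the correct repo id as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all visible GPUs
)

messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```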
## Hardware Requirements
Given the size of this model (218B parameters), it requires substantial computational resources for inference; in bfloat16 the weights alone occupy roughly 218B × 2 bytes ≈ 436 GB, before accounting for the KV cache:
- Recommended: 8xH100 (640GB)
- Alternatively: a distributed inference setup across multiple machines (see the sketch below).
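As one hedged option, a tensor-parallel deployment with vLLM on a single 8-GPU node might look like the following; the repo id, context length, and parallelism settings are illustrative assumptions, not tested:

```python
# Hypothetical sketch: serving the merged model with vLLM across 8 GPUs.
# Assumes vLLM recognizes this merge as a standard Mistral architecture.
from vllm import LLM, SamplingParams

llm = LLM(
    model="leafspark/Mistral-Large-218B-Instruct",  # assumed repo id
    tensor_parallel_size=8,   # spread the weights across 8xH100
    dtype="bfloat16",
    max_model_len=32768,      # reduce from 128k if KV-cache memory is tight
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Write a bash one-liner to count unique IPs in an access log."], params)
print(outputs[0].outputs[0].text)
```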
## Limitations
- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.
## Notes
This was just a fun testing model, merged with the `merge.py` script in the base of this repo. GGUF quantizations are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).
Compatible `mergekit` config:
```yaml
slices:
  - sources:
      - layer_range: [0, 20]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [10, 30]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [20, 40]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [30, 50]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [40, 60]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [50, 70]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [60, 80]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [70, 87]
        model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
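To reproduce the merge with stock `mergekit`, this config can be run with the `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./Mistral-Large-218B-Instruct`), although the original merge was produced with the repo's `merge.py` script.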