---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
---
# Mistral-Large-218B-Instruct
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
It is a self-merge of the original Mistral Large 2 (Mistral-Large-Instruct-2407); see the mergekit config below.
## Key features
- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large Context: Features a large 128k context window for handling extensive input.
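To illustrate the function-calling interface, here is a minimal sketch that assumes the chat template inherited from `mistralai/Mistral-Large-Instruct-2407` accepts a `tools` argument through `transformers`; this has not been verified against the merged weights:

```python
# Minimal sketch: building a tool-calling prompt with the HF chat template.
# Assumes transformers >= 4.42 and that the inherited Mistral-Large-Instruct-2407
# tokenizer's chat template serializes the tool schema below (not verified for this merge).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2407")

# Hypothetical tool definition in the standard JSON-schema format.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# The rendered prompt embeds the tool schema so the model can emit a JSON tool call.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```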
## Metrics
Note: The following metrics were reported for the original Mistral-Large-Instruct-2407 and may differ for this 218B self-merge. Updated benchmarks will be provided when available.
**Base Pretrained Benchmarks**
| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |
**Base Pretrained Multilingual Benchmarks (MMLU)**
| Language | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |
**Instruction Benchmarks**
| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard | 73.2 |
**Code & Reasoning Benchmarks**
| Benchmark | Score |
| --- | --- |
| Human Eval | 92% |
| Human Eval Plus | 87% |
| MBPP Base | 80% |
| MBPP Plus | 69% |
**Math Benchmarks**
| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT) | 71.5% |
## Usage
This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
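As a non-authoritative sketch, loading with Hugging Face `transformers` should follow the usual pattern for Mistral-Large-Instruct checkpoints; the repo id and dtype below are assumptions:

```python
# Minimal sketch: loading and prompting the merged model with transformers.
# Assumes the safetensors shards in this repo load like any Mistral-Large checkpoint;
# swap in a local path or the correct repo id as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all visible GPUs
)

messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```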
## Hardware Requirements
Given the size of this model (218B parameters), it requires substantial computational resources for inference; in bfloat16 the weights alone occupy roughly 218B × 2 bytes ≈ 436 GB, before accounting for the KV cache:
- Recommended: 8xH100 (640GB)
- Alternatively: a distributed inference setup across multiple machines (see the sketch below).
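As one hedged option, a tensor-parallel deployment with vLLM on a single 8-GPU node might look like the following; the repo id, context length, and parallelism settings are illustrative assumptions, not tested:

```python
# Hypothetical sketch: serving the merged model with vLLM across 8 GPUs.
# Assumes vLLM recognizes this merge as a standard Mistral architecture.
from vllm import LLM, SamplingParams

llm = LLM(
    model="leafspark/Mistral-Large-218B-Instruct",  # assumed repo id
    tensor_parallel_size=8,   # spread the weights across 8xH100
    dtype="bfloat16",
    max_model_len=32768,      # reduce from 128k if KV-cache memory is tight
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Write a bash one-liner to count unique IPs in an access log."], params)
print(outputs[0].outputs[0].text)
```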
## Limitations
- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.
## Notes
This was just a fun testing model, merged with the `merge.py` script in the base of this repo. GGUF quantizations are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).
Compatible `mergekit` config:
```yaml
slices:
  - sources:
      - layer_range: [0, 20]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [10, 30]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [20, 40]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [30, 50]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [40, 60]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [50, 70]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [60, 80]
        model: mistralai/Mistral-Large-Instruct-2407
  - sources:
      - layer_range: [70, 87]
        model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```
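To reproduce the merge with stock `mergekit`, this config can be run with the `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./Mistral-Large-218B-Instruct`), although the original merge was produced with the repo's `merge.py` script.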