---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - zh
  - ja
  - ru
  - ko
---

Mistral-Large-218B-Instruct


Mistral-Large-218B-Instruct is a dense Large Language Model (LLM) with 218 billion parameters, offering strong reasoning, knowledge, and coding capabilities.

It was created by self-merging the original Mistral Large 2 (mistralai/Mistral-Large-Instruct-2407); see the mergekit config below.

Key features

  • Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
  • Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
  • Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
  • Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
  • Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
  • Mistral Research License: Allows usage and modification for research and non-commercial purposes.
  • Large Context: Features a large 128k context window for handling extensive input.
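
As an illustration of the function-calling feature, the sketch below passes an OpenAI-style tool schema through the tokenizer's chat template. It is a minimal sketch only: the repo id leafspark/Mistral-Large-218B-Instruct is assumed, as is that the merged tokenizer keeps the upstream Mistral chat template with tool support.

from transformers import AutoTokenizer

MODEL_ID = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

# OpenAI-style tool schema; the chat template renders it into the prompt
# so the model can answer with a JSON tool call.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[weather_tool],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)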

Metrics

Note: The following metrics are those reported for the original Mistral Large 2 model and may not reflect this 218B-parameter merge. Updated benchmarks will be provided when available.

Base Pretrained Benchmarks

Benchmark | Score
MMLU | 84.0%

Base Pretrained Multilingual Benchmarks (MMLU)

Language | MMLU Score
French | 82.8%
German | 81.6%
Spanish | 82.7%
Italian | 82.7%
Dutch | 80.7%
Portuguese | 81.6%
Russian | 79.0%
Korean | 60.1%
Japanese | 78.8%
Chinese | 74.8%

Instruction Benchmarks

Benchmark | Score
MT Bench | 8.63
Wild Bench | 56.3
Arena Hard | 73.2

Code & Reasoning Benchmarks

Benchmark | Score
Human Eval | 92%
Human Eval Plus | 87%
MBPP Base | 80%
MBPP Plus | 69%

Math Benchmarks

Benchmark | Score
GSM8K | 93%
Math Instruct (0-shot, no CoT) | 70%
Math Instruct (0-shot, CoT) | 71.5%

Usage

This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
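Until then, a minimal loading sketch with Hugging Face transformers is shown below. It assumes the weights live at leafspark/Mistral-Large-218B-Instruct and that device_map="auto" can shard the model across the available GPUs; treat it as illustrative rather than a tested recipe.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # ~2 bytes per parameter
    device_map="auto",           # shard layers across all visible GPUs
)

messages = [{"role": "user", "content": "Write a short poem about large language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))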

Hardware Requirements

Given the size of this model (218B parameters), it requires substantial computational resources for inference:

  • Recommended: 8xH100 (640GB)
  • Alternatively: Distributed inference setup across multiple machines.
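
As a rough sanity check on those numbers, the sketch below estimates weight memory alone (ignoring KV cache, activations, and framework overhead); it is an estimate, not a measured footprint.

# Back-of-the-envelope weight-memory estimate for a 218B-parameter dense model.
params = 218e9

for precision, bytes_per_param in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:,.0f} GiB for weights alone")

# bf16 weights already exceed 400 GiB, which is why a 640 GB (8x80 GB)
# node or a multi-node setup is suggested above.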

Limitations

  • This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
  • Due to its size, inference may be computationally expensive and require significant hardware resources.
  • As with all large language models, it may exhibit biases present in its training data.
  • The model's outputs should be critically evaluated, especially for sensitive applications.

Notes

This was just a fun testing model, merged with the merge.py script in the base of the repo. GGUF quantizations are available at leafspark/Mistral-Large-218B-Instruct-GGUF.

Compatible mergekit config:

slices:
- sources:
  - layer_range: [0, 20]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [10, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [20, 40]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [30, 50]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 60]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [50, 70]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [60, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [70, 87]
    model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
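
As a rough sanity check on the advertised size, the sketch below recomputes the merged layer count and parameter count from the slices above. It assumes mergekit's half-open layer_range convention ([start, end)) and an 88-layer, ~123B-parameter base model; both figures are assumptions, not values read from this repo.

# Slices copied from the mergekit config above.
slices = [(0, 20), (10, 30), (20, 40), (30, 50),
          (40, 60), (50, 70), (60, 80), (70, 87)]

# Assuming layer_range is half-open, each slice contributes end - start layers.
merged_layers = sum(end - start for start, end in slices)
print(merged_layers)  # 157 decoder layers in the merged model

# Assumed figures for the Mistral-Large-Instruct-2407 base model.
base_layers, base_params = 88, 123e9

# Per-layer parameters scale with the layer count (embeddings, copied once,
# are ignored here), giving roughly the advertised 218B.
approx_params = base_params * merged_layers / base_layers
print(f"~{approx_params / 1e9:.0f}B parameters")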