leafspark committed on
Commit
b2aed30
1 Parent(s): 70952b9

readme: add model card

Files changed (1)
  1. README.md +133 -5
README.md CHANGED
@@ -1,5 +1,133 @@
- ---
- license: other
- license_name: mrl
- license_link: https://mistral.ai/licenses/MRL-0.1.md
- ---
+ ---
+ license: other
+ license_name: mrl
+ license_link: https://mistral.ai/licenses/MRL-0.1.md
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - it
+ - pt
+ - zh
+ - ja
+ - ru
+ - ko
+ ---
+
+ # Mistral-Large-218B-Instruct
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
+
+ Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
+
+ Self-merged from the original Mistral Large 2; see the mergekit config below.
+
+ ## Key features
+ - Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
+ - Multilingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
+ - Proficient in coding: Trained on 80+ programming languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specialized languages like Swift and Fortran.
+ - Agent-centric: Best-in-class agentic capabilities with native function calling and JSON output.
+ - Advanced reasoning: State-of-the-art mathematical and reasoning capabilities.
+ - Mistral Research License: Allows usage and modification for research and non-commercial purposes.
+ - Large context: Features a 128k context window for handling extensive input.
+
+ ## Metrics
+
+ Note: The following metrics are for the original model and may differ for this 218B-parameter merge. Updated benchmarks will be provided when available.
+
+ **Base Pretrained Benchmarks**
+
+ | Benchmark | Score |
+ | --- | --- |
+ | MMLU | 84.0% |
+
+ **Base Pretrained Multilingual Benchmarks (MMLU)**
+ | Language | Score |
+ | --- | --- |
+ | French | 82.8% |
+ | German | 81.6% |
+ | Spanish | 82.7% |
+ | Italian | 82.7% |
+ | Dutch | 80.7% |
+ | Portuguese | 81.6% |
+ | Russian | 79.0% |
+ | Korean | 60.1% |
+ | Japanese | 78.8% |
+ | Chinese | 74.8% |
+
+ **Instruction Benchmarks**
+
+ | Benchmark | Score |
+ | --- | --- |
+ | MT Bench | 8.63 |
+ | Wild Bench | 56.3 |
+ | Arena Hard | 73.2 |
+
+ **Code & Reasoning Benchmarks**
+ | Benchmark | Score |
+ | --- | --- |
+ | Human Eval | 92% |
+ | Human Eval Plus | 87% |
+ | MBPP Base | 80% |
+ | MBPP Plus | 69% |
+
+ **Math Benchmarks**
+
+ | Benchmark | Score |
+ | --- | --- |
+ | GSM8K | 93% |
+ | Math Instruct (0-shot, no CoT) | 70% |
+ | Math Instruct (0-shot, CoT) | 71.5% |
+
+ ## Usage
+
+ This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
+
+ ## Hardware Requirements
+
+ At 218B parameters, this model requires substantial computational resources for inference:
+ - Recommended: 8x H100 (640 GB total)
+ - Alternatively: A distributed inference setup across multiple machines.
+
+ ## Limitations
+
+ - This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
+ - Due to its size, inference may be computationally expensive and require significant hardware resources.
+ - As with all large language models, it may exhibit biases present in its training data.
+ - The model's outputs should be critically evaluated, especially for sensitive applications.
+
+ ## Notes
+
+ This is just a fun test model, merged with the `merge.py` script in the root of the repo. GGUFs are available at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/).
+
+ Compatible `mergekit` config:
+ ```yaml
+ slices:
+ - sources:
+   - layer_range: [0, 20]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [10, 30]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [20, 40]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [30, 50]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [40, 60]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [50, 70]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [60, 80]
+     model: mistralai/Mistral-Large-Instruct-2407
+ - sources:
+   - layer_range: [70, 87]
+     model: mistralai/Mistral-Large-Instruct-2407
+ merge_method: passthrough
+ dtype: bfloat16
+ ```
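
The sketch below is not part of the commit. It is a rough sanity check of how the slice list in the config could add up to the 218B figure and to the memory budget in the Hardware Requirements section. It assumes that each `layer_range` is half-open (`[start, end)`) and that the source model uses the architecture constants listed at the top of the block; those constants are assumptions of this sketch and should be checked against the source model's `config.json`. All function names are hypothetical helpers.

```python
# Slices copied from the mergekit config in the diff.
# Assumption: each range is half-open, i.e. [start, end).
SLICES = [(0, 20), (10, 30), (20, 40), (30, 50),
          (40, 60), (50, 70), (60, 80), (70, 87)]

# Assumed architecture of the source model (not stated in the commit;
# verify against the source model's config.json before relying on them).
HIDDEN = 12288
INTERMEDIATE = 28672
NUM_HEADS = 96
NUM_KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 32768


def merged_layer_count(slices):
    """Total number of stacked layers produced by a passthrough merge."""
    return sum(end - start for start, end in slices)


def params_per_layer():
    """Parameter count of one decoder layer (grouped-query attention + gated MLP)."""
    q_proj = HIDDEN * NUM_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * NUM_KV_HEADS * HEAD_DIM
    o_proj = NUM_HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN               # two RMSNorm weight vectors
    return q_proj + kv_proj + o_proj + mlp + norms


def total_params(num_layers):
    """Layers plus input embeddings, an untied output head, and the final norm."""
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


layers = merged_layer_count(SLICES)
params = total_params(layers)
weights_gb = params * 2 / 1e9  # bfloat16 stores 2 bytes per parameter

print(f"layers: {layers}")
print(f"parameters: {params / 1e9:.1f}B")
print(f"bfloat16 weights: {weights_gb:.0f} GB")
```

Under these assumptions the sketch gives 157 layers and roughly 218 billion parameters, or about 436 GB of bfloat16 weights. That would leave headroom within 640 GB for the KV cache and activations, though actual memory use depends on the inference framework, batch size, and context length.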