library_name: transformers
MiniMax-Text-01
1. Introduction
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock its long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), MiniMax-Text-01 is trained with a context length of 1 million tokens and can handle contexts of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates top-tier performance.
2. Model Architecture
The architecture of MiniMax-Text-01 is briefly described as follows:
- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number of Layers: 80
- Hybrid Attention: a softmax attention layer follows every 7 lightning attention layers (see the layer-layout sketch after this list).
- Number of attention heads: 64
- Attention head dimension: 128
- Mixture of Experts:
  - Number of experts: 32
  - Expert hidden dimension: 9216
  - Top-2 routing strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064
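For intuition, here is a minimal sketch of the layer layout these numbers imply: 80 layers in which every eighth layer uses softmax attention and the others use lightning attention, with each MoE block routing tokens to 2 of 32 experts. This is only an illustration of the specification above, not MiniMax's implementation, and all names in the snippet are hypothetical.

```python
# Illustrative sketch of the layer layout described above
# (hypothetical names, not MiniMax's actual implementation).

NUM_LAYERS = 80      # Number of Layers
SOFTMAX_EVERY = 8    # 7 lightning attention layers, then 1 softmax attention layer
NUM_EXPERTS = 32     # experts per MoE block
TOP_K = 2            # top-2 routing strategy

def attention_type(layer_idx: int) -> str:
    """Softmax attention in every eighth position, lightning attention elsewhere."""
    return "softmax" if (layer_idx + 1) % SOFTMAX_EVERY == 0 else "lightning"

layer_types = [attention_type(i) for i in range(NUM_LAYERS)]
assert layer_types.count("softmax") == NUM_LAYERS // SOFTMAX_EVERY  # 10 softmax layers in total

# Each token is routed to only TOP_K of NUM_EXPERTS experts per MoE block, which
# (together with the non-expert weights) is how 45.9B of the 456B total
# parameters end up activated per token.
print(layer_types[:8])  # ['lightning', 'lightning', ..., 'lightning', 'softmax']
```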
3. Evaluation
Core Academic Benchmarks
Tasks | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
---|---|---|---|---|---|---|---|---|
General | ||||||||
MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
Reasoning | ||||||||
GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
Mathematics | ||||||||
GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
Coding | ||||||||
MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
* Evaluated following a 0-shot CoT setting.
Long Benchmarks
4M Needle In A Haystack Test
Ruler
Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
---|---|---|---|---|---|---|---|---|---|
GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - |
Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |
LongBench v2
Model | overall | easy | hard | short | medium | long |
---|---|---|---|---|---|---|
Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
w/ CoT | ||||||
GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
DeepSeek-V3 | - | - | - | - | - | - |
Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
w/o CoT | ||||||
GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
DeepSeek-V3 | 48.7 | - | - | - | - | - |
Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |
MTOB
Context Type | no context | half book | full book | Δ half book | Δ full book |
---|---|---|---|---|---|
eng → kalam (ChrF) | |||||
GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - |
Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
kalam → eng (BLEURT) | |||||
GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
4. Quickstart
Here we provide a simple example of loading the tokenizer and the model and generating content.
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)
# quantization config, int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
      + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)
# assume 8 GPUs
world_size = 8
# set device map: embeddings on the first GPU, final norm and lm_head on the last GPU
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
# spread the decoder layers evenly across the GPUs
layers_per_device = hf_config.num_hidden_layers // world_size
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
# load the model in bfloat16, shard it across GPUs according to device_map, and apply int8 quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)
# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
# strip the prompt tokens and decode only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"response: {response}")
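A note on the settings above: with 456B total parameters, the raw bfloat16 weights occupy roughly 912 GB, while int8 quantization reduces them to roughly 456 GB, i.e. about 57 GB of weights per GPU when sharded evenly across 8 devices (rough arithmetic that ignores activations, the KV cache, and buffers). That is why int8 weights are recommended here and offload_buffers=True is set; note also that a few modules (lm_head, embed_tokens, and each layer's coefficient and block_sparse_moe.gate) are kept unquantized in the config above.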
5. Citation
@misc{minimax2025minimax01scalingfoundationmodels,
title={MiniMax-01: Scaling Foundation Models with Lightning Attention},
author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
year={2025},
eprint={2501.08313},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.08313},
}
6. Chatbot & API
For general use and evaluation, we provide a Chatbot with online search capabilities and an online API for developers.
Contact us at model@minimaxi.com.