MiniMax-AI committed
Commit e1640d7 · 1 Parent(s): 45fa731

Initial Commit

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. LICENSE +42 -0
  3. README.md +205 -3
  4. config.json +126 -0
  5. configuration_minimax_text_01.py +152 -0
  6. figures/MiniMaxLogo.png +0 -0
  7. figures/TextBench.png +0 -0
  8. figures/VisionBench.png +0 -0
  9. figures/hailuo.svg +1 -0
  10. figures/image.jpg +0 -0
  11. figures/minimax.svg +1 -0
  12. figures/niah.png +3 -0
  13. main.py +100 -0
  14. merges.txt +0 -0
  15. model-00000-of-00413.safetensors +3 -0
  16. model-00001-of-00413.safetensors +3 -0
  17. model-00002-of-00413.safetensors +3 -0
  18. model-00003-of-00413.safetensors +3 -0
  19. model-00004-of-00413.safetensors +3 -0
  20. model-00005-of-00413.safetensors +3 -0
  21. model-00006-of-00413.safetensors +3 -0
  22. model-00007-of-00413.safetensors +3 -0
  23. model-00008-of-00413.safetensors +3 -0
  24. model-00009-of-00413.safetensors +3 -0
  25. model-00010-of-00413.safetensors +3 -0
  26. model-00011-of-00413.safetensors +3 -0
  27. model-00012-of-00413.safetensors +3 -0
  28. model-00013-of-00413.safetensors +3 -0
  29. model-00014-of-00413.safetensors +3 -0
  30. model-00015-of-00413.safetensors +3 -0
  31. model-00016-of-00413.safetensors +3 -0
  32. model-00017-of-00413.safetensors +3 -0
  33. model-00018-of-00413.safetensors +3 -0
  34. model-00019-of-00413.safetensors +3 -0
  35. model-00020-of-00413.safetensors +3 -0
  36. model-00021-of-00413.safetensors +3 -0
  37. model-00022-of-00413.safetensors +3 -0
  38. model-00023-of-00413.safetensors +3 -0
  39. model-00024-of-00413.safetensors +3 -0
  40. model-00025-of-00413.safetensors +3 -0
  41. model-00026-of-00413.safetensors +3 -0
  42. model-00027-of-00413.safetensors +3 -0
  43. model-00028-of-00413.safetensors +3 -0
  44. model-00029-of-00413.safetensors +3 -0
  45. model-00030-of-00413.safetensors +3 -0
  46. model-00031-of-00413.safetensors +3 -0
  47. model-00032-of-00413.safetensors +3 -0
  48. model-00033-of-00413.safetensors +3 -0
  49. model-00034-of-00413.safetensors +3 -0
  50. model-00035-of-00413.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ figures/niah.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,42 @@
+
+ MINIMAX MODEL LICENSE AGREEMENT
+
+ 1. Definitions
+ "Agreement" means the terms and conditions for use, reproduction, distribution and modification of the MiniMax Model Materials set forth herein.
+ "Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering into this Agreement on their behalf.
+ "MiniMax Model" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by MiniMax at https://huggingface.co/MiniMaxAI/MiniMaxText01, https://huggingface.co/MiniMaxAI/MiniMaxVL01, https://github.com/MiniMax-AI/MiniMax01. In this Agreement, the MiniMax Model includes MiniMaxText01 and MiniMaxVL01.
+ "MiniMax Model Materials" means, collectively, MiniMax’s proprietary MiniMax Model and Documentation (and any portion thereof) made available under this Agreement.
+ "MiniMax" or "we" means MiniMax AI.
+
+ 2. License Rights and Redistribution
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under MiniMax’s intellectual property or other rights owned by MiniMax embodied in the MiniMax Model Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the MiniMax Model Materials.
+ b. Redistribution and Use.
+ i. If you distribute or make available the MiniMax Model Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such MiniMax Model Materials; and (B) prominently display “Built with MiniMax AI” on a related website, user interface, blog post, about page, or product documentation. If you use the MiniMax Model Materials to create, train, fine-tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “MiniMax” at the beginning of any such AI model name.
+ ii. You must retain in all copies of the MiniMax Model Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “MiniMax AI model is licensed under the MiniMax License, Copyright © MiniMax. All Rights Reserved.”
+ iii. Your use of the MiniMax Model Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Prohibited Uses Policy for the MiniMax Model Materials, which is hereby incorporated by reference into this Agreement.
+ iv. You will not use the MiniMax Model Materials or any output or results of the MiniMax Model Materials to improve any other large language model.
+
+ 3. Additional Commercial Terms. If, on the MiniMax Model Materials release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 100 million monthly active users in the preceding calendar month, you must request a license from MiniMax, which MiniMax may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until MiniMax otherwise expressly grants you such rights.
+
+ 4. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE MINIMAX MODEL MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND MINIMAX DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE MINIMAX MODEL MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE MINIMAX MODEL MATERIALS AND ANY OUTPUT AND RESULTS.
+
+ 5. Limitation of Liability. IN NO EVENT WILL MINIMAX OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF MINIMAX OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 6. Intellectual Property.
+ a. No trademark licenses are granted under this Agreement, and in connection with the MiniMax Model Materials, neither MiniMax nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the MiniMax Model Materials or as set forth in this Section 6(a). MiniMax hereby grants you a license to use "MiniMaxText01" or "MiniMaxVL01" (the "Mark") solely as required to comply with the last sentence of Section 2.b.i. All goodwill arising out of your use of the Mark will inure to the benefit of MiniMax.
+ b. Subject to MiniMax’s ownership of MiniMax Model Materials and derivatives made by or for MiniMax, with respect to any derivative works and modifications of the MiniMax Model Materials that are made by you, as between you and MiniMax, you are and will be the owner of such derivative works and modifications.
+ c. If you institute litigation or other proceedings against MiniMax or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the MiniMax Model Materials or outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless MiniMax from and against any claim by any third party arising out of or related to your use or distribution of the MiniMax Model Materials.
+ 7. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the MiniMax Model Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. MiniMax may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the MiniMax Model Materials. Sections 2, 3 and 6 shall survive the termination of this Agreement.
+
+ 8. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of Singapore without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. Any dispute arising out of or in connection with this Agreement, including any question regarding its existence, validity or termination, shall be referred to and finally resolved by arbitration administered by the Singapore International Arbitration Centre (“SIAC”) in accordance with the Arbitration Rules of the Singapore International Arbitration Centre (“SIAC Rules”) for the time being in force, which rules are deemed to be incorporated by reference in this clause.
+
+ You agree you will not use, or allow others to use, MiniMaxText01 or MiniMaxVL01 to:
+ 1. Violate any applicable federal, state, local, or international law or regulation, or infringe upon the lawful rights or interests of any third party.
+ 2. Assist with, engage in or in any way associate with any military purpose.
+ 3. Exploit, harm, or attempt to exploit or harm minors in any way.
+ 4. Generate or disseminate false or misleading information with the intent to harm others.
+ 5. Generate or disseminate content prohibited by applicable laws or regulations.
+ 6. Generate or disseminate personally identifiable information without proper authorization or for unreasonable or unlawful purposes.
+ 7. Defame, disparage, harass, or cause harm to any individual or entity.
+ 8. Carry out fully automated decision-making that adversely affects an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation.
+ 9. Promote discrimination, hate speech, or harmful behavior towards individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other legally protected characteristics or categories.
README.md CHANGED
@@ -1,3 +1,205 @@
- ---
- license: unknown
- ---
+ <div align="center">
+ <img src="figures/MiniMaxLogo.png" width="60%" alt="MiniMax-Text-01" />
+ </div>
+ <hr>
+
+ <div align="center" style="line-height: 1;">
+ <a href="https://www.minimaxi.com/en" target="_blank" style="margin: 2px;">
+ <img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
+ <img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="https://www.hailuo.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/Chat-_Hailuo AI-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgMzc1LjE0IDM3NS4xNCI+PGRlZnM+PHN0eWxlPi5jbHMtMXtmaWxsOnVybCgjdW5uYW1lZC1ncmFkaWVudCk7fTwvc3R5bGU+PGxpbmVhckdyYWRpZW50IGlkPSJ1bm5hbWVkLWdyYWRpZW50IiB4MT0iOC40MiIgeTE9IjEzLjgxIiB4Mj0iNDI5LjY1IiB5Mj0iNDIyLjM3IiBncmFkaWVudFVuaXRzPSJ1c2VyU3BhY2VPblVzZSI+PHN0b3Agb2Zmc2V0PSIwLjA5IiBzdG9wLWNvbG9yPSIjZmZhYjBjIi8+PHN0b3Agb2Zmc2V0PSIwLjMxIiBzdG9wLWNvbG9yPSIjZmY1NTM4Ii8+PHN0b3Agb2Zmc2V0PSIwLjQ2IiBzdG9wLWNvbG9yPSIjZTk0MDVkIi8+PHN0b3Agb2Zmc2V0PSIwLjc1IiBzdG9wLWNvbG9yPSIjZDI2NmRhIi8+PHN0b3Agb2Zmc2V0PSIwLjg5IiBzdG9wLWNvbG9yPSIjZDU4NGVmIi8+PC9saW5lYXJHcmFkaWVudD48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMzc1LjE0LDE4Ny41N0MzNzUuMTQsODQsMjkwLjc0LS4yNiwxODcuMDksMCw4NC4yNi4yNi4yNiw4NC4yNSwwLDE4Ny4wOWMtLjI2LDEwMy42NSw4NCwxODgsMTg3LjU3LDE4OEgzMTAuODJBNjQuMjEsNjQuMjEsMCwwLDAsMzc1LDMxMC45M1YxOTMuODJoMEMzNzUuMDksMTkxLjc5LDM3NS4xNCwxODkuNjcsMzc1LjE0LDE4Ny41N1ptLTI4NCwxMDQuMTdjLTI5Ljg2LTI1LjQ5LTQ4LjI2LTY2LjI3LTQ3LjQtMTA3Ljg1cS4wOS00LjM4LjQ2LTguNzNWMTc1YzQuMzItNDkuNiwzNi4zNy05NS44OCw4MS4yOS0xMTcuMzZTMjI2LjUyLDQwLjIxLDI2Ny44NSw2OHM2Ni4zMiw3OC4yMSw2My40LDEyNy45MmExNzgsMTc4LDAsMCwxLTUuMTQsMzIuMjVjLTEsNC4yLTIuMyw4LjU3LTUuMjgsMTEuNzJzLTguMiw0LjYtMTEuNzMsMi4wOWMtMy4zNy0yLjQxLTMuODctNy4xMi00LjE2LTExLjI1LTIuMzMtMzMuMzctMTEuMjQtNjcuNzYtMzMuNzktOTIuNDdhMTAzLjY3LDEwMy42NywwLDAsMC02Ni4zOC0zMi44NEExMDcuMTksMTA3LjE5LDAsMCwwLDEzMy4yMiwxMjVDMTE2LDEzNy4yNywxMDIuNTUsMTU0Ljg4LDk2LDE3NXMtNS44Niw0Mi42MSwyLjcxLDYxLjkzYTgxLjg5LDgxLjg5LDAsMCwwLDI5LjcxLDM1YzIyLjk0LDE1LjA2LDU0LjMxLDE3LjIsNzguMTQsMy42czM4LjA3LTQzLjEsMzItNjkuODZTMjA1LjQsMTU4LDE3OC4xMSwxNjAuODRjLTQuMTYuNDMtMTAuMTMsMC0xMC4yOC00LjIxLS4xMi0zLjI0LDMuNzctNC45NCw3LTUuNTIsMjcuNjgtNSw1Ny4zNCw5LjA5LDcyLjUzLDMyLjc3czE2LDU1LjQxLDMuNTYsODAuNjYtMzcsNDMuNjktNjQuMzYsNTAuMzVDMTQ5LjY4LDMyMy44NywxMTYuMzEsMzEzLjI1LDkxLjExLDI5MS43NFoiLz48L3N2Zz4=&logoWidth=16" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://intl.minimaxi.com" style="margin: 2px;">
+ <img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+ <div align="center" style="line-height: 1;">
+ <a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE" style="margin: 2px;">
+ <img alt="License" src="https://img.shields.io/badge/📜_License-Model_Agreement-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+
+ # MiniMax-Text-01
+
+ ## 1. Introduction
+
+ MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the model's long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), MiniMax-Text-01 extends its training context length to 1 million tokens and can handle a context of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
+
+ <p align="center">
+ <img width="100%" src="figures/TextBench.png">
+ </p>
+
+ ## 2. Model Architecture
+
+ The architecture of MiniMax-Text-01 is briefly described as follows (see the sketch after this list):
+ - Total Parameters: 456B
+ - Activated Parameters per Token: 45.9B
+ - Number of Layers: 80
+ - Hybrid Attention: a softmax attention layer is positioned after every 7 lightning attention layers.
+ - Number of attention heads: 64
+ - Attention head dimension: 128
+ - Mixture of Experts:
+   - Number of experts: 32
+   - Expert hidden dimension: 9216
+   - Top-2 routing strategy
+ - Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
+ - Hidden Size: 6144
+ - Vocab Size: 200,064
+
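The hybrid layer pattern above can be written out directly. A minimal sketch, assuming only the numbers quoted in this list; it reproduces the `attn_type_list` field of the `config.json` in this commit, where 0 marks a lightning attention layer and 1 a softmax attention layer:

```python
# Reconstruct the 80-layer hybrid pattern described above:
# 7 lightning attention layers, then 1 softmax attention layer, repeated.
num_layers = 80
attn_type_list = [1 if (i + 1) % 8 == 0 else 0 for i in range(num_layers)]

assert sum(attn_type_list) == 10            # 10 softmax attention layers in total
assert attn_type_list[:8] == [0] * 7 + [1]  # matches the first 8 entries of config.json

# RoPE on half of each head: head dimension 128 -> rotary dimension 64,
# consistent with "rotary_dim": 64 in config.json.
head_dim = 128
assert head_dim // 2 == 64
```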
+ ## 3. Evaluation
+
+ ### Core Academic Benchmarks
+
+ | **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
+ |-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------|
+ | **General** | | | | | | | | |
+ | MMLU<sup>*</sup> | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 |
+ | MMLU-Pro<sup>*</sup> | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
+ | SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
+ | C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** |
+ | IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
+ | Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
+ | **Reasoning** | | | | | | | | |
+ | GPQA<sup>*</sup> _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
+ | DROP<sup>*</sup> _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 |
+ | **Mathematics** | | | | | | | | |
+ | GSM8k<sup>*</sup> | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
+ | MATH<sup>*</sup> | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 |
+ | **Coding** | | | | | | | | |
+ | MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
+ | HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
+
+ <sup>*</sup> Evaluated following a _0-shot CoT_ setting.
+
+ ### Long Benchmarks
+ #### 4M Needle In A Haystack Test
+ <p align="center">
+ <img width="90%" src="figures/niah.png">
+ </p>
+
+ #### Ruler
+ | Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
+ |-------|----|----|-----|-----|-----|------|------|------|----|
+ | **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
+ | **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
+ | **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
+ | **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
+ | **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |
+
+ #### LongBench v2
+ | **Model** | **overall** | **easy** | **hard** | **short** | **medium** | **long** |
+ |----------------------------|-------------|----------|----------|------------|------------|----------|
+ | Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
+ | **w/ CoT** | | | | | | |
+ | GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
+ | Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
+ | Deepseek-V3 | - | - | - | - | - | - |
+ | Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
+ | **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** |
+ | **w/o CoT** | | | | | | |
+ | GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
+ | Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
+ | Deepseek-V3 | 48.7 | - | - | - | - | - |
+ | Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** |
+ | **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 |
+
+ #### MTOB
+ | **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
+ |------------------|----------------|---------------|---------------|------------------|-----------------|
+ | **eng → kalam (ChrF)** | | | | | |
+ | GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
+ | Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
+ | Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
+ | Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
+ | Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
+ | **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
+ | **kalam → eng (BLEURT)** | | | | | |
+ | GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
+ | Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
+ | Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
+ | Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
+ | Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
+ | **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
+
+
+ ## 4. Quickstart
+ Here we provide a simple example of loading the tokenizer and model to generate content.
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
+
+ # load hf config
+ hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)
+
+ # quantization config, int8 is recommended
+ quantization_config = QuantoConfig(
+     weights="int8",
+     modules_to_not_convert=[
+         "lm_head",
+         "embed_tokens",
+     ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
+     + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
+ )
+
+ # assume 8 GPUs
+ world_size = 8
+
+ # set device map: embeddings on the first GPU, final norm and lm_head on the last
+ device_map = {
+     'model.embed_tokens': 'cuda:0',
+     'model.norm': f'cuda:{world_size - 1}',
+     'lm_head': f'cuda:{world_size - 1}'
+ }
+ # shard the decoder layers evenly across GPUs
+ layers_per_device = hf_config.num_hidden_layers // world_size
+ for i in range(world_size):
+     for j in range(layers_per_device):
+         device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
+
+ # load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01")
+ prompt = "Hello!"
+ messages = [
+     {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
+     {"role": "user", "content": [{"type": "text", "text": prompt}]},
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ # tokenize and move to device
+ model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
+
+ # load bfloat16 model, move to device, and apply quantization
+ quantized_model = AutoModelForCausalLM.from_pretrained(
+     "MiniMax-Text-01",
+     torch_dtype="bfloat16",
+     device_map=device_map,
+     quantization_config=quantization_config,
+     trust_remote_code=True,
+     offload_buffers=True,
+ )
+
+ # generate response
+ generation_config = GenerationConfig(
+     max_new_tokens=20,
+     eos_token_id=200020,
+     use_cache=True,
+ )
+ generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
+ # strip the prompt tokens and decode only the newly generated ones
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
+
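For reference, with `world_size = 8` and 80 hidden layers, the device-map loop above assigns 10 consecutive decoder layers to each GPU. A small sanity-check sketch under the same assumptions:

```python
# Layer sharding produced by the device_map loop above:
# layers 0-9 -> cuda:0, 10-19 -> cuda:1, ..., 70-79 -> cuda:7.
world_size, num_hidden_layers = 8, 80
layers_per_device = num_hidden_layers // world_size  # 10

placement = {
    f"model.layers.{i * layers_per_device + j}": f"cuda:{i}"
    for i in range(world_size)
    for j in range(layers_per_device)
}
assert placement["model.layers.0"] == "cuda:0"
assert placement["model.layers.9"] == "cuda:0"
assert placement["model.layers.79"] == "cuda:7"
```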
+ ## 5. Chatbot & API
+ For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers.
+
+ Contact us at [model@minimaxi.com](mailto:model@minimaxi.com).
config.json ADDED
@@ -0,0 +1,126 @@
+ {
+   "architectures": [
+     "MiniMaxText01ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "attn_type_list": [
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1,
+     0, 0, 0, 0, 0, 0, 0, 1
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_minimax_text_01.MiniMaxText01Config",
+     "AutoModelForCausalLM": "modeling_minimax_text_01.MiniMaxText01ForCausalLM"
+   },
+   "bos_token_id": null,
+   "eos_token_id": null,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 6144,
+   "initializer_range": 0.02,
+   "intermediate_size": 9216,
+   "layernorm_full_attention_alpha": 3.5565588200778455,
+   "layernorm_full_attention_beta": 1.0,
+   "layernorm_linear_attention_alpha": 3.5565588200778455,
+   "layernorm_linear_attention_beta": 1.0,
+   "layernorm_mlp_alpha": 3.5565588200778455,
+   "layernorm_mlp_beta": 1.0,
+   "max_position_embeddings": 10240000,
+   "model_type": "minimax_text_01",
+   "num_attention_heads": 64,
+   "num_experts_per_tok": 2,
+   "num_hidden_layers": 80,
+   "num_key_value_heads": 8,
+   "num_local_experts": 32,
+   "output_router_logits": false,
+   "postnorm": true,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000000,
+   "rotary_dim": 64,
+   "router_aux_loss_coef": 0.001,
+   "router_jitter_noise": 0.0,
+   "shared_intermediate_size": 0,
+   "shared_moe_mode": "sigmoid",
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.45.2",
+   "use_cache": true,
+   "vocab_size": 200064
+ }
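For orientation, the `auto_map` block above is what lets the generic `AutoConfig` entry point resolve to the `MiniMaxText01Config` class shipped in this repo. A minimal sketch, assuming a local checkout of this repository at the path `MiniMax-Text-01` (as in the Quickstart):

```python
from transformers import AutoConfig

# trust_remote_code=True resolves auto_map to
# configuration_minimax_text_01.MiniMaxText01Config
cfg = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)
print(cfg.model_type)           # minimax_text_01
print(cfg.num_hidden_layers)    # 80
print(sum(cfg.attn_type_list))  # 10 softmax attention layers out of 80
```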
configuration_minimax_text_01.py ADDED
@@ -0,0 +1,152 @@
+ """ MiniMaxText01 model configuration"""
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+
+ logger = logging.get_logger(__name__)
+
+
+ class MiniMaxText01Config(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`MiniMaxText01Model`]. It is used to instantiate a
+     MiniMaxText01 model according to the specified arguments, defining the model architecture. Instantiating a configuration
+     with the defaults will yield a similar configuration to that of the MiniMaxText01.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+
+     Args:
+         vocab_size (`int`, *optional*, defaults to 32000):
+             Vocabulary size of the MiniMaxText01 model. Defines the number of different tokens that can be represented
+             by the `inputs_ids` passed when calling [`MiniMaxText01Model`].
+         hidden_size (`int`, *optional*, defaults to 4096):
+             Dimension of the hidden representations.
+         intermediate_size (`int`, *optional*, defaults to 14336):
+             Dimension of the MLP representations.
+         num_hidden_layers (`int`, *optional*, defaults to 32):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 32):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         num_key_value_heads (`int`, *optional*, defaults to 8):
+             This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+             `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
+             `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
+             converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+             by meanpooling all the original heads within that group. For more details check out [this
+             paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
+         hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+             The non-linear activation function (function or string) in the decoder.
+         max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
+             The maximum sequence length that this model might ever be used with. MiniMaxText01's sliding window attention
+             allows sequences of up to 4096*32 tokens.
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         rms_norm_eps (`float`, *optional*, defaults to 1e-05):
+             The epsilon used by the rms normalization layers.
+         use_cache (`bool`, *optional*, defaults to `True`):
+             Whether or not the model should return the last key/values attentions (not used by all models). Only
+             relevant if `config.is_decoder=True`.
+         pad_token_id (`int`, *optional*):
+             The id of the padding token.
+         bos_token_id (`int`, *optional*, defaults to 1):
+             The id of the "beginning-of-sequence" token.
+         eos_token_id (`int`, *optional*, defaults to 2):
+             The id of the "end-of-sequence" token.
+         tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+             Whether the model's input and output word embeddings should be tied.
+         rope_theta (`float`, *optional*, defaults to 1000000.0):
+             The base period of the RoPE embeddings.
+         sliding_window (`int`, *optional*):
+             Sliding window attention window size. If not specified, will default to `4096`.
+         attention_dropout (`float`, *optional*, defaults to 0.0):
+             The dropout ratio for the attention probabilities.
+         num_experts_per_tok (`int`, *optional*, defaults to 2):
+             The number of experts to route per token; can also be interpreted as the `top-k` routing parameter.
+         num_local_experts (`int`, *optional*, defaults to 8):
+             Number of experts per Sparse MLP layer.
+         output_router_logits (`bool`, *optional*, defaults to `False`):
+             Whether or not the router logits should be returned by the model. Enabling this will also
+             allow the model to output the auxiliary loss. See [here]() for more details
+         router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
+             The aux loss factor for the total loss.
+         router_jitter_noise (`float`, *optional*, defaults to 0.0):
+             Amount of noise to add to the router.
+
+     ```python
+     >>> from transformers import MiniMaxText01Model, MiniMaxText01Config
+
+     >>> # Initializing a MiniMaxText01 style configuration
+     >>> configuration = MiniMaxText01Config()
+
+     >>> # Initializing a model from the MiniMaxText01 style configuration
+     >>> model = MiniMaxText01Model(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "MiniMaxText01"
+     keys_to_ignore_at_inference = ["past_key_values"]
+
+     def __init__(
+         self,
+         vocab_size=32000,
+         hidden_size=4096,
+         intermediate_size=14336,
+         num_hidden_layers=32,
+         num_attention_heads=32,
+         num_key_value_heads=8,
+         hidden_act="silu",
+         max_position_embeddings=4096 * 32,
+         initializer_range=0.02,
+         rms_norm_eps=1e-5,
+         use_cache=True,
+         pad_token_id=None,
+         bos_token_id=None,
+         eos_token_id=None,
+         tie_word_embeddings=False,
+         rope_theta=1e6,
+         sliding_window=None,
+         attention_dropout=0.0,
+         num_experts_per_tok=2,
+         num_local_experts=8,
+         output_router_logits=False,
+         router_aux_loss_coef=0.001,
+         router_jitter_noise=0.0,
+         **kwargs,
+     ):
+         self.vocab_size = vocab_size
+         self.max_position_embeddings = max_position_embeddings
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.sliding_window = sliding_window
+
+         # for backward compatibility
+         if num_key_value_heads is None:
+             num_key_value_heads = num_attention_heads
+
+         self.num_key_value_heads = num_key_value_heads
+         self.hidden_act = hidden_act
+         self.initializer_range = initializer_range
+         self.rms_norm_eps = rms_norm_eps
+         self.use_cache = use_cache
+         self.rope_theta = rope_theta
+         self.attention_dropout = attention_dropout
+
+         self.num_experts_per_tok = num_experts_per_tok
+         self.num_local_experts = num_local_experts
+         self.output_router_logits = output_router_logits
+         self.router_aux_loss_coef = router_aux_loss_coef
+         self.router_jitter_noise = router_jitter_noise
+         super().__init__(
+             pad_token_id=pad_token_id,
+             bos_token_id=bos_token_id,
+             eos_token_id=eos_token_id,
+             tie_word_embeddings=tie_word_embeddings,
+             **kwargs,
+         )
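Note that the defaults in `__init__` above are generic and much smaller than this release; the shipped `config.json` overrides most of them. A hedged sketch of building the config with the released hyperparameters, assuming this file is importable as `configuration_minimax_text_01`:

```python
from configuration_minimax_text_01 import MiniMaxText01Config

# Values taken from the config.json shown earlier in this commit;
# all other arguments keep the class defaults.
config = MiniMaxText01Config(
    vocab_size=200064,
    hidden_size=6144,
    intermediate_size=9216,
    num_hidden_layers=80,
    num_attention_heads=64,
    num_key_value_heads=8,
    max_position_embeddings=10240000,
    rope_theta=10000000,
    num_local_experts=32,
    num_experts_per_tok=2,
)
print(config.num_hidden_layers)  # 80
```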
figures/MiniMaxLogo.png ADDED
figures/TextBench.png ADDED
figures/VisionBench.png ADDED
figures/hailuo.svg ADDED
figures/image.jpg ADDED
figures/minimax.svg ADDED
figures/niah.png ADDED

Git LFS Details

  • SHA256: 73fbd47b590198dad0ea6be7c45c35ce738a2978deb893c842721f0f0cf02eb8
  • Pointer size: 132 Bytes
  • Size of remote file: 1.47 MB
main.py ADDED
@@ -0,0 +1,100 @@
+ from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
+ import torch
+ import argparse
+
+ """
+ usage:
+ export SAFETENSORS_FAST_GPU=1
+ python main.py --quant_type int8 --world_size 8 --model_id <model_path>
+ """
+
+ def generate_quanto_config(hf_config: AutoConfig, quant_type: str):
+     QUANT_TYPE_MAP = {
+         "default": None,
+         "int8": QuantoConfig(
+             weights="int8",
+             modules_to_not_convert=[
+                 "lm_head",
+                 "embed_tokens",
+             ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
+             + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
+         ),
+     }
+     return QUANT_TYPE_MAP[quant_type]
+
+
+ def parse_args():
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--quant_type", type=str, default="default", choices=["default", "int8"])
+     parser.add_argument("--model_id", type=str, required=True)
+     parser.add_argument("--world_size", type=int, required=True)
+     return parser.parse_args()
+
+
+ def check_params(args, hf_config: AutoConfig):
+     if args.quant_type == "int8":
+         assert args.world_size >= 8, "int8 weight-only quantization requires at least 8 GPUs"
+
+     assert hf_config.num_hidden_layers % args.world_size == 0, f"num_hidden_layers({hf_config.num_hidden_layers}) must be divisible by world_size({args.world_size})"
+
+
+ @torch.no_grad()
+ def main():
+     args = parse_args()
+     print("\n=============== Argument ===============")
+     for key in vars(args):
+         print(f"{key}: {vars(args)[key]}")
+     print("========================================")
+
+     model_id = args.model_id
+
+     hf_config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
+     check_params(args, hf_config)
+     quantization_config = generate_quanto_config(hf_config, args.quant_type)
+
+     # embeddings on the first GPU; final norm and lm_head on the last
+     device_map = {
+         'model.embed_tokens': 'cuda:0',
+         'model.norm': f'cuda:{args.world_size - 1}',
+         'lm_head': f'cuda:{args.world_size - 1}'
+     }
+     # shard the decoder layers evenly across GPUs
+     layers_per_device = hf_config.num_hidden_layers // args.world_size
+     for i in range(args.world_size):
+         for j in range(layers_per_device):
+             device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
+
+     tokenizer = AutoTokenizer.from_pretrained(model_id)
+     prompt = "Hello!"
+     messages = [
+         {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
+         {"role": "user", "content": [{"type": "text", "text": prompt}]},
+     ]
+     text = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+     model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
+     quantized_model = AutoModelForCausalLM.from_pretrained(
+         model_id,
+         torch_dtype="bfloat16",
+         device_map=device_map,
+         quantization_config=quantization_config,
+         trust_remote_code=True,
+         offload_buffers=True,
+     )
+     generation_config = GenerationConfig(
+         max_new_tokens=20,
+         eos_token_id=200020,
+         use_cache=True,
+     )
+     generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
+     print(f"generated_ids: {generated_ids}")
+     # strip the prompt tokens and decode only the newly generated ones
+     generated_ids = [
+         output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+     ]
+     response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+     print(response)
+
+ if __name__ == "__main__":
+     main()
merges.txt ADDED
model-00000-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45b85926994c2ee092c2319158471ed9262d146bd5973c442e135dae9e21624d
+ size 4916773000
model-00001-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cdaea944f170b60d206e41a80accbfcf5b9c74744f014a819c30f45cb9a9130
+ size 2191113152
model-00002-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f15b83f1afd5c5da8853b7d9bd2c9814dbcb2de7a2f1a24765e16aa7a310d82
+ size 2330307784
model-00003-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83f66465aca90c07b247e218950c1d03594dc08c7170ad2e1a8aaa82b26612fe
+ size 2254810656
model-00004-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7ddaa7380c6dc2cc109b209bfb8450f56bd88f03bcba19f9f0735d24d65a1ed
+ size 2116402376
model-00005-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:99060d0311ad70c53858396b8fbf2a24e2476b965e809335aac99eb3a536c685
+ size 2103016184
model-00006-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4785d81a7df999ea5ffd7688ddc2228c607c6b01173e6e53cbe718fa2da6f8b2
+ size 2254810688
model-00007-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1306d2e054fd20b6900b6d50965ee12257116f5967747a6cfe6dcfd5d8d4f8cd
+ size 2116402392
model-00008-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afb8c7602baf0a815e8ad02cbecfb80a9ae5d29f9ce10a578762544a5de1c0cb
+ size 2202839784
model-00009-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1d8a6c70f49e8adaf8dafd15c8235be9331ffb975f2dce5ffe986cfca770598
+ size 2151680440
model-00010-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f94ec16a2f79adf7fa7eb9ba583fe5441146a8ee9405eb3a9e4ba62ce58b0016
+ size 2264926800
model-00011-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f4bd5579bb8314355e60ab945da0d43332914854a6877e1c7e29c0796bdcc45
+ size 2151680448
model-00012-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ad1363efef21cc1eff543eb1e18cdc35b706925468b31746589dd5520b27c44
+ size 2264926776
model-00013-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4ed9d125a753491a1a52257e4c3313c81e4b23ceec1ea92060f21fda6506d33
+ size 2151680456
model-00014-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3874e30152a24167cffb45a368d644e848b39191a18250c8baa372badf1404fa
+ size 2264926792
model-00015-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33fa7b68a1ca03cd36dc34b43098a4342cb90b6ab29a4ed81bbc5fd22ff3206b
+ size 2151680440
model-00016-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52a2cdc9917920ae2e678dd8c8f5118bd269310dc24e09a9795def0bfc6db289
+ size 2264926792
model-00017-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:966bb346a62cb5bcaf692cae407fb53e3ba58a6a7798a5012e7979110d44debb
+ size 2151680456
model-00018-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6acadaecce036887bbf3d9e80bd9765f3f63ed147b0318c47dc151c484ee5dea
+ size 2264926776
model-00019-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d45b4bfc0d98ffea2bbfa70672199845c424645139170705d1a615cb4b32bedf
+ size 2151680456
model-00020-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4d89e6f890817b4eaacbb61d447ea9741a1ae1750bb6b555f49f97eb53eed3a
+ size 2264926792
model-00021-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:989d3448acab544b95096da7cb3fa2530c658c54eb9a067ef0f9735d0fed98e9
+ size 2151680440
model-00022-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4079b9211347c16ab94cf92b3be67b2506f7fd22924feeb886c511878ada34a
+ size 2264926792
model-00023-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa5667b97d73be5c951a84fd5141d7e5098854e386b08de3ad2d4a26619121d0
+ size 2151680456
model-00024-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1b4cfc292b3496550591242fa27d8853f48f26952a68a62264559e2a8a7026c
+ size 2264926776
model-00025-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0084a9a0f19e13b06b7fd6c2ab77212ecabc1237991715a4cc0a8b3760bab059
+ size 2151680456
model-00026-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec99d07905384e93ff617910268fd614403273bb72a0e9a66e23477609404d39
+ size 2264926800
model-00027-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf1ae16fe8da6e760ccdcbff16b81047460783c12a97e35572db64b705832b36
+ size 2151680440
model-00028-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce157751c2a5278886267e41e2ad71a8fdc7bdd5f6fa798c939234992dd11ab2
+ size 2264926792
model-00029-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7683f74793f627034ee64af8bac8fdd15ebc0e50127cc9779668091ca3410398
+ size 2151680456
model-00030-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a713332d8255b32f06f8ca0ff73b6a847c2e788f3086d26e921aa927b1630b2
+ size 2264926776
model-00031-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ddc6b163c1970bce84c2c78a34e426873dadda5b5ef1d574e044f914ffae47d
+ size 2151680448
model-00032-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57f494acdd9fcdb5338f90b471913df350612bba845ef2c8de766b77c95104a2
+ size 2264926800
model-00033-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc35a4d97876ed9cd70ab29ba5c7b486e7c94d38c6caa944c01c8d3aeccfc841
+ size 2151680440
model-00034-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a2fd3f245a02e4ac7e398d6431129ab8eb7a1a2650a5ece70415ad03a3d8819
+ size 2264926784
model-00035-of-00413.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc3cd1085b6b2bd1e31e9722248130bfeb7ff9525a48f0bec08f86b62db752d5
+ size 2151680456