---
license: apache-2.0
datasets:
- Mathoctopus/GSM8KInstruct_Parallel
language:
- en
- es
- zh
- de
- ru
- th
- sw
- ja
- fr
- bn
---
### Introduction
We introduce MathOctopus, a series of open-source large language models (LLMs) tailored for multilingual math problem-solving. The MathOctopus models are trained on the 🤗 MGSM8KInstruct dataset, which covers ten languages.
MathOctopus notably outperforms conventional open-source LLMs and surpasses ChatGPT in few-shot scenarios.
### Datasets
#### **MGSM8KInstruct**
| Training Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:----------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MGSM8KInstruct | 7473 | 7472 | 7466 | 6539 | 7466 | 7470 | 7469 | 7471 | 7361 | 7473 | **73.6K** |
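
The Overall figure can be checked by summing the per-language counts in the table above. The following is a minimal Python sketch; the variable names are illustrative and not part of any released code.

```python
# Per-language training example counts, copied from the MGSM8KInstruct table.
MGSM8K_INSTRUCT_COUNTS = {
    "En": 7473, "Sw": 7472, "Zh": 7466, "Bn": 6539, "De": 7466,
    "Es": 7470, "Fr": 7469, "Ja": 7471, "Ru": 7361, "Th": 7473,
}

# Total number of training examples across all ten languages.
total = sum(MGSM8K_INSTRUCT_COUNTS.values())

# Language with the fewest training examples.
smallest = min(MGSM8K_INSTRUCT_COUNTS, key=MGSM8K_INSTRUCT_COUNTS.get)

print(f"{total:,}")  # prints "73,660"
print(smallest)      # prints "Bn"
```

The total of 73,660 is what the table reports as 73.6K. Bengali (Bn) has the fewest examples, at 6,539.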
#### **MSVAMP**
| Test Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:----------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MSVAMP | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | **10K** |
#### Usage
Our datasets and models are all available on Hugging Face.
🤗 [MGSM8KInstruct_Parallel Dataset](https://huggingface.co/datasets/Mathoctopus/GSM8KInstruct_Parallel)
🤗 [MSVAMP Dataset](https://huggingface.co/datasets/Mathoctopus/MSVAMP)
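
As a rough sketch, the two datasets could be fetched with the Hugging Face `datasets` library. The repository IDs are taken from the links above. The helper name `load_mathoctopus_dataset` and the dictionary keys are hypothetical, and this assumes the repositories load without extra arguments such as a configuration name, which may not hold.

```python
# Repository IDs copied from the dataset links above.
DATASET_IDS = {
    "MGSM8KInstruct": "Mathoctopus/GSM8KInstruct_Parallel",
    "MSVAMP": "Mathoctopus/MSVAMP",
}


def load_mathoctopus_dataset(name: str):
    """Download one of the datasets by its short name (hypothetical helper).

    Assumes `pip install datasets` and network access to the Hugging Face Hub.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[name])


if __name__ == "__main__":
    dataset = load_mathoctopus_dataset("MSVAMP")
    print(dataset)  # shows the available splits and columns
```

Printing the returned object is a convenient way to inspect the split and column names before writing any processing code, since those are not listed here.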
Or you can directly download them from
## Models
| Base Model: LLaMA | Parallel-Training | Cross-Training |
|----|---------------------------------------------------------------|---------------------------------------------------------------------------|
| 7B-LLaMA 2 | [MathOctopus-Parallel-7B](https://huggingface.co/Mathoctopus/Parallel_7B) | [MathOctopus-Cross-7B](https://huggingface.co/Mathoctopus/Cross_7B) |
|| [MathOctopus-Parallel-xRFT-7B](https://huggingface.co/Mathoctopus/Parallel_xRFT_7B)|[MathOctopus-Cross-xRFT-7B](https://huggingface.co/Mathoctopus/Cross_xRFT_7B)|
| 13B-LLaMA 2 | [MathOctopus-Parallel-13B] | [MathOctopus-Cross-13B] |
|| [MathOctopus-Parallel-xRFT-13B](https://huggingface.co/Mathoctopus/Parallel_xRFT_13B/tree/main)|[MathOctopus-Cross-xRFT-13B]|
| 33B-LLaMA 1 | [MathOctopus-Parallel-33B] | [MathOctopus-Cross-33B] |
| 70B-LLaMA 2 | Coming soon! | Coming soon! |

- `*-Parallel` refers to models trained with the parallel-training strategy.
- `*-Cross` refers to models trained with the cross-training strategy.
- `*-xRFT` refers to models trained with multilingual rejection sampling.
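
A minimal sketch of running one of the released checkpoints with the `transformers` library follows. The model ID comes from the table above. The prompt wording in `build_prompt` is a hypothetical placeholder, not the template the models were trained with, so check the model repository for the actual format before relying on the outputs.

```python
MODEL_ID = "Mathoctopus/Parallel_7B"  # from the MathOctopus-Parallel-7B link above


def build_prompt(question: str, language: str = "English") -> str:
    """Wrap a math question in a simple instruction (HYPOTHETICAL template)."""
    return f"Question: {question}\nPlease answer in {language}.\nAnswer:"


def generate_answer(question: str, language: str = "English",
                    max_new_tokens: int = 256) -> str:
    """Load the checkpoint and generate an answer.

    Assumes `pip install transformers torch` and enough memory for a 7B model.
    """
    # Imported lazily so this module can be imported without `transformers`.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question, language), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_answer("Tom has 3 apples and buys 5 more. How many does he have?"))
```

Swapping `MODEL_ID` for another linked repository, such as `Mathoctopus/Cross_7B`, should load the corresponding cross-trained variant in the same way.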
### **Overall Results on MGSM**
| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 52.0 | 23.6 | 31.6 | 18.8 | 38.0 | 39.2 | 36.4 | 27.2 | 33.6 | 21.6 | 32.2 |
| **xRFT**-MathOctopus<sup>C</sup>| 51.2 | 24.0 | 33.2 | 18.8 | 36.0 | 41.2 | 37.6 | 29.6 | 36.4 | 25.2 | 33.3 |
| MathOctopus<sup>P</sup>-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopus<sup>P</sup> | 52.4 | 39.2 | 38.4 | 28.8 | 44.8 | 42.4 | 43.6 | 36.0 | 39.6 | 34.4 | 40.0 |
| **xRFT**-MathOctopus<sup>P</sup>| 54.8 | 38.4 | 45.2 | 33.2 | 43.6 | 45.2 | 38.0 | 35.6 | 48.4 | 36.4 | 41.9 |
<p></p >
| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 56.4 | 27.2 | 39.2 | 24.0 | 47.6 | 49.6 | 47.6 | 40.4 | 42.0 | 24.8 | 39.9 |
| **xRFT**-MathOctopus<sup>C</sup>| 53.6 | 28.0 | 45.2 | 21.2 | 48.0 | 46.4 | 46.0 | 35.2 | 45.6 | 28.8 | 39.8 |
| MathOctopus<sup>P</sup> | 53.2 | 42.8 | 48.8 | 35.2 | 44.4 | 48.0 | 48.4 | 43.2 | 47.6 | 46.8 | 45.8 |
| **xRFT**-MathOctopus<sup>P</sup>| 51.6 | 46.0 | 51.2 | 42.0 | 49.2 | 53.2 | 49.6 | 39.6 | 47.6 | 46.0 | 47.6 |
<p></p >
| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 55.6 | 24.4 | 36.0 | 19.2 | 40.4 | 51.2 | 44.4 | 27.2 | 37.2 | 21.6 | 35.7 |
| **xRFT**-MathOctopus<sup>C</sup>| 53.6 | 27.6 | 34.4 | 19.2 | 47.2 | 47.6 | 44.8 | 30.8 | 38.8 | 22.8 | 36.7 |
| MathOctopus<sup>P</sup> | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| **xRFT**-MathOctopus<sup>P</sup>| 51.6 | 47.2 | 52.4 | 37.6 | 51.2 | 52.8 | 44.4 | 41.6 | 50.0 | 47.6 | 47.6 |
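
For the 7B rows checked here, the Overall column matches the unweighted mean of the ten per-language accuracies, rounded to one decimal place. The Python sketch below reproduces three of those values from the 7B MGSM table; the function name `macro_average` is illustrative.

```python
from statistics import fmean


def macro_average(scores: list[float]) -> float:
    """Unweighted mean of per-language accuracies, rounded to one decimal."""
    return round(fmean(scores), 1)


# Per-language MGSM accuracies (En, Sw, Zh, Bn, De, Es, Fr, Ja, Ru, Th)
# copied from the 7B table above.
cross_7b = [52.0, 23.6, 31.6, 18.8, 38.0, 39.2, 36.4, 27.2, 33.6, 21.6]
parallel_7b = [52.4, 39.2, 38.4, 28.8, 44.8, 42.4, 43.6, 36.0, 39.6, 34.4]
xrft_parallel_7b = [54.8, 38.4, 45.2, 33.2, 43.6, 45.2, 38.0, 35.6, 48.4, 36.4]

print(macro_average(cross_7b))          # prints 32.2
print(macro_average(parallel_7b))       # prints 40.0
print(macro_average(xrft_parallel_7b))  # prints 41.9
```

Because every language counts equally, gains on low-resource languages such as Bn and Sw move the Overall score as much as gains on En.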
### **Overall Results on MSVAMP**
| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 49.2 | 36.6 | 43.6 | 30.2 | 48.6 | 46.8 | 46.4 | 42.5 | 46.7 | 34.0 | 42.5 |
| **xRFT**-MathOctopus<sup>C</sup>| 49.9 | 37.7 | 43.3 | 32.9 | 46.5 | 47.6 | 47.3 | 42.7 | 46.6 | 36.2 | 43.1 |
| MathOctopus<sup>P</sup>-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopus<sup>P</sup> | 46.5 | 40.1 | 42.5 | 29.1 | 43.5 | 45.4 | 46.0 | 42.5 | 45.4 | 35.7 | 41.7 |
| **xRFT**-MathOctopus<sup>P</sup>| 46.8 | 42.3 | 43.2 | 32.8 | 43.1 | 44.5 | 45.3 | 43.2 | 42.1 | 40.5 | 42.4 |
<p></p >
| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 56.6 | 40.4 | 49.0 | 30.3 | 50.9 | 54.2 | 54.7 | 46.3 | 52.4 | 35.7 | 47.1 |
| **xRFT**-MathOctopus<sup>C</sup>| 52.9 | 41.9 | 49.2 | 34.1 | 50.5 | 52.8 | 51.5 | 45.8 | 50.2 | 35.7 | 46.5 |
| MathOctopus<sup>P</sup> | 50.7 | 43.4 | 42.6 | 31.8 | 48.4 | 49.4 | 50.6 | 41.1 | 46.9 | 39.3 | 44.4 |
| **xRFT**-MathOctopus<sup>P</sup>| 44.6 | 43.4 | 46.4 | 34.2 | 47.7 | 48.2 | 49.9 | 43.1 | 48.2 | 39.5 | 44.5 |
<p></p >
| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup> | 51.5 | 42.1 | 46.2 | 23.2 | 50.5 | 52.1 | 52.9 | 42.2 | 50.5 | 33.4 | 44.5 |
| **xRFT**-MathOctopus<sup>C</sup>| 48.1 | 42.8 | 43.6 | 23.3 | 48.7 | 50.0 | 48.9 | 43.4 | 44.6 | 35.5 | 42.9 |
| MathOctopus<sup>P</sup> | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| **xRFT**-MathOctopus<sup>P</sup>| 48.0 | 42.3 | 46.1 | 36.2 | 47.5 | 48.5 | 48.3 | 45.8 | 47.2 | 41.2 | 45.1 |
### **MathOctopus in English**
| Models | GSM8K | SVAMP |
|:--------------------------------|:--------|:--------|
| LLaMA 2-7B | 42.4 | 38.3 |
| MathOctopus<sup>P</sup>-7B | 49.3 | 46.8 |
| MathOctopus<sup>C</sup>-7B | 50.8 | 49.3 |
| LLaMA 2-13B | 51.0 | 50.9 |
| MathOctopus<sup>P</sup>-13B | 55.5 | 52.1 |
| MathOctopus<sup>C</sup>-13B | 56.6 | 56.6 |
| LLaMA 1-33B | 50.0 | 49.0 |
| MathOctopus<sup>P</sup>-33B | 56.0 | 52.5 |
| MathOctopus<sup>C</sup>-33B | 53.7 | 51.5 |
## Intended Uses
These models are trained for research purposes. They are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed.