metadata
license: apache-2.0
Introduction
We introduce MathOctopus, a series of open-source large language models (LLMs) specifically tailored for multilingual math problem-solving. The MathOctopus models are trained on the MGSM8KInstruct dataset, which covers ten distinct languages.
MathOctopus notably outperforms conventional open-source LLMs and surpasses ChatGPT in few-shot scenarios.
Datasets
MGSM8KInstruct
| Training Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MGSM8KInstruct | 7473 | 7472 | 7466 | 6539 | 7466 | 7470 | 7469 | 7471 | 7361 | 7473 | 73.6K |
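
As a quick sanity check on the table above, the per-language counts can be summed to recover the overall size. The snippet below is only an illustrative sketch: the dictionary name is made up here, and the numbers are copied from the table rather than read from the released dataset.

```python
# Per-language training-set sizes copied from the MGSM8KInstruct table.
# The variable name is illustrative, not part of any released code.
mgsm8k_instruct_counts = {
    "En": 7473, "Sw": 7472, "Zh": 7466, "Bn": 6539, "De": 7466,
    "Es": 7470, "Fr": 7469, "Ja": 7471, "Ru": 7361, "Th": 7473,
}

# Total number of training examples across all ten languages.
total = sum(mgsm8k_instruct_counts.values())

# Language with the fewest training examples.
smallest = min(mgsm8k_instruct_counts, key=mgsm8k_instruct_counts.get)

print(f"total examples: {total}")
print(f"smallest split: {smallest} ({mgsm8k_instruct_counts[smallest]})")
```

The sum comes to 73,660 examples, which the table reports as 73.6K, and Bengali (Bn) is the smallest split.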
MSVAMP
| Test Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MSVAMP | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 10K |
Usage
Our dataset and models are all available on Hugging Face.
MathInstruct Dataset
Or you can directly download them from
Models
*-Parallel refers to our model trained with the parallel-training strategy.
*-Cross refers to our model trained with the cross-training strategy.
*-xRFT refers to our model trained with multilingual rejection sampling.
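
The xRFT procedure is not spelled out here. As a rough sketch only, assuming it follows the usual rejection-sampling recipe (sample several candidate solutions per question, then keep the distinct ones whose final answer matches the reference), the filtering step might look like the code below. The names `extract_final_answer`, `rejection_sample`, and `fake_sampler` are hypothetical, and the answer-extraction heuristic is an assumption, not the authors' implementation.

```python
import re
from typing import Callable, List, Optional


def extract_final_answer(solution: str) -> Optional[float]:
    """Return the last number in a solution string, or None (illustrative heuristic)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def rejection_sample(
    question: str,
    gold_answer: float,
    sampler: Callable[[str, int], List[str]],
    k: int = 4,
) -> List[str]:
    """Keep distinct sampled solutions whose final answer matches the gold answer."""
    kept: List[str] = []
    for candidate in sampler(question, k):
        answer = extract_final_answer(candidate)
        is_correct = answer is not None and abs(answer - gold_answer) < 1e-6
        if is_correct and candidate not in kept:
            kept.append(candidate)
    return kept


def fake_sampler(question: str, k: int) -> List[str]:
    """Stand-in for a model call; returns canned reasoning paths for the demo."""
    pool = [
        "3 + 4 = 7. The answer is 7",
        "3 * 4 = 12. The answer is 12",
        "4 + 3 = 7. The answer is 7",
        "3 + 4 = 7. The answer is 7",
    ]
    return pool[:k]


kept = rejection_sample("Tom has 3 apples and buys 4 more. How many?", 7.0, fake_sampler)
print(kept)
```

In a multilingual setting, the same filter would presumably be run once per language, with the sampler prompted in that language, and the kept solutions added to the training data.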
Overall Results on MGSM
| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 52.0 | 23.6 | 31.6 | 18.8 | 38.0 | 39.2 | 36.4 | 27.2 | 33.6 | 21.6 | 32.2 |
| xRFT-MathOctopusC | 51.2 | 24.0 | 33.2 | 18.8 | 36.0 | 41.2 | 37.6 | 29.6 | 36.4 | 25.2 | 33.3 |
| MathOctopusP-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopusP | 52.4 | 39.2 | 38.4 | 28.8 | 44.8 | 42.4 | 43.6 | 36.0 | 39.6 | 34.4 | 40.0 |
| xRFT-MathOctopusP | 54.8 | 38.4 | 45.2 | 33.2 | 43.6 | 45.2 | 38.0 | 35.6 | 48.4 | 36.4 | 41.9 |

| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 56.4 | 27.2 | 39.2 | 24.0 | 47.6 | 49.6 | 47.6 | 40.4 | 42.0 | 24.8 | 39.9 |
| xRFT-MathOctopusC | 53.6 | 28.0 | 45.2 | 21.2 | 48.0 | 46.4 | 46.0 | 35.2 | 45.6 | 28.8 | 39.8 |
| MathOctopusP | 53.2 | 42.8 | 48.8 | 35.2 | 44.4 | 48.0 | 48.4 | 43.2 | 47.6 | 46.8 | 45.8 |
| xRFT-MathOctopusP | 51.6 | 46.0 | 51.2 | 42.0 | 49.2 | 53.2 | 49.6 | 39.6 | 47.6 | 46.0 | 47.6 |

| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 55.6 | 24.4 | 36.0 | 19.2 | 40.4 | 51.2 | 44.4 | 27.2 | 37.2 | 21.6 | 35.7 |
| xRFT-MathOctopusC | 53.6 | 27.6 | 34.4 | 19.2 | 47.2 | 47.6 | 44.8 | 30.8 | 38.8 | 22.8 | 36.7 |
| MathOctopusP | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| xRFT-MathOctopusP | 51.6 | 47.2 | 52.4 | 37.6 | 51.2 | 52.8 | 44.4 | 41.6 | 50.0 | 47.6 | 47.6 |
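
The Overall column in these tables appears to be the unweighted mean of the ten per-language accuracies. The sketch below checks that reading against two 7B rows copied from the MGSM results. The helper name `overall` is illustrative, and the averaging rule is inferred from the numbers rather than stated by the authors.

```python
from statistics import mean

LANGS = ["En", "Sw", "Zh", "Bn", "De", "Es", "Fr", "Ja", "Ru", "Th"]

# Per-language accuracy (%) for two 7B rows, copied from the MGSM table.
rows = {
    "MathOctopusC": [52.0, 23.6, 31.6, 18.8, 38.0, 39.2, 36.4, 27.2, 33.6, 21.6],
    "xRFT-MathOctopusP": [54.8, 38.4, 45.2, 33.2, 43.6, 45.2, 38.0, 35.6, 48.4, 36.4],
}


def overall(scores):
    """Unweighted mean over languages, rounded to one decimal like the table."""
    return round(mean(scores), 1)


for name, scores in rows.items():
    assert len(scores) == len(LANGS)
    print(f"{name}: overall = {overall(scores)}")
```

Both rows reproduce the reported values (32.2 and 41.9), which suggests every language carries equal weight in the Overall score.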
Overall Results on MSVAMP
| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 49.2 | 36.6 | 43.6 | 30.2 | 48.6 | 46.8 | 46.4 | 42.5 | 46.7 | 34.0 | 42.5 |
| xRFT-MathOctopusC | 49.9 | 37.7 | 43.3 | 32.9 | 46.5 | 47.6 | 47.3 | 42.7 | 46.6 | 36.2 | 43.1 |
| MathOctopusP-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopusP | 46.5 | 40.1 | 42.5 | 29.1 | 43.5 | 45.4 | 46.0 | 42.5 | 45.4 | 35.7 | 41.7 |
| xRFT-MathOctopusP | 46.8 | 42.3 | 43.2 | 32.8 | 43.1 | 44.5 | 45.3 | 43.2 | 42.1 | 40.5 | 42.4 |

| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 56.6 | 40.4 | 49.0 | 30.3 | 50.9 | 54.2 | 54.7 | 46.3 | 52.4 | 35.7 | 47.1 |
| xRFT-MathOctopusC | 52.9 | 41.9 | 49.2 | 34.1 | 50.5 | 52.8 | 51.5 | 45.8 | 50.2 | 35.7 | 46.5 |
| MathOctopusP | 50.7 | 43.4 | 42.6 | 31.8 | 48.4 | 49.4 | 50.6 | 41.1 | 46.9 | 39.3 | 44.4 |
| xRFT-MathOctopusP | 44.6 | 43.4 | 46.4 | 34.2 | 47.7 | 48.2 | 49.9 | 43.1 | 48.2 | 39.5 | 44.5 |

| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopusC | 51.5 | 42.1 | 46.2 | 23.2 | 50.5 | 52.1 | 52.9 | 42.2 | 50.5 | 33.4 | 44.5 |
| xRFT-MathOctopusC | 48.1 | 42.8 | 43.6 | 23.3 | 48.7 | 50.0 | 48.9 | 43.4 | 44.6 | 35.5 | 42.9 |
| MathOctopusP | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| xRFT-MathOctopusP | 48.0 | 42.3 | 46.1 | 36.2 | 47.5 | 48.5 | 48.3 | 45.8 | 47.2 | 41.2 | 45.1 |
MathOctopus in English
| Models | GSM8K | SVAMP |
|---|---|---|
| LLaMA 2-7B | 42.4 | 38.3 |
| MathOctopusP-7B | 49.3 | 46.8 |
| MathOctopusC-7B | 50.8 | 49.3 |
| LLaMA 2-13B | 51.0 | 50.9 |
| MathOctopusP-13B | 55.5 | 52.1 |
| MathOctopusC-13B | 56.6 | 56.6 |
| LLaMA 1-33B | 50.0 | 49.0 |
| MathOctopusP-33B | 56.0 | 52.5 |
| MathOctopusC-33B | 53.7 | 51.5 |
Intended Uses
These models are trained for research purposes. They are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed.
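
For readers who want to try a checkpoint, the following is a minimal sketch of querying a causal language model through the Hugging Face `transformers` library. It rests on several assumptions: the repository ID must be supplied by the caller (none is given here), the prompt wording in `build_prompt` is a plain illustrative instruction and not necessarily the template the released checkpoints were trained with, and the function names are hypothetical.

```python
def build_prompt(question: str) -> str:
    """Wrap a math question in a plain instruction (assumed format, not an official template)."""
    return (
        "Below is a math problem. Solve it step by step and state the final answer.\n\n"
        f"Problem: {question}\n\nSolution:"
    )


def solve(question: str, repo_id: str, max_new_tokens: int = 256) -> str:
    """Generate a solution with a causal LM checkpoint identified by repo_id."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    # Greedy decoding keeps the output deterministic for a given checkpoint.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# The question can be written in any of the supported languages, here German.
print(build_prompt("Janet hat 3 Äpfel und kauft 4 weitere. Wie viele Äpfel hat sie jetzt?"))
```

Calling `solve(question, repo_id)` downloads the checkpoint on first use, so the 13B and larger models will need substantial memory.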