license: apache-2.0

Introduction

We introduce 🐙 MathOctopus, a series of open-source large language models (LLMs) tailored for multilingual math problem-solving. The MathOctopus models are trained on the 🤗 MGSM8KInstruct dataset, which covers ten languages. MathOctopus notably outperforms conventional open-source LLMs and also outperforms ChatGPT in few-shot scenarios.

Datasets

MGSM8KInstruct

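| Training Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MGSM8KInstruct | 7473 | 7472 | 7466 | 6539 | 7466 | 7470 | 7469 | 7471 | 7361 | 7473 | 73.6K |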
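
As a quick sanity check on the training-set sizes, the per-language counts can be summed directly. This is only a sketch that copies the figures from the table above; the variable names are illustrative.

```python
# Per-language training example counts, copied from the MGSM8KInstruct table.
counts = {
    "En": 7473, "Sw": 7472, "Zh": 7466, "Bn": 6539, "De": 7466,
    "Es": 7470, "Fr": 7469, "Ja": 7471, "Ru": 7361, "Th": 7473,
}

# Total number of training examples across all ten languages.
total = sum(counts.values())

# Language with the fewest examples.
smallest = min(counts, key=counts.get)

print(total)     # 73660
print(smallest)  # Bn
```

The sum is 73,660, which the table reports as 73.6K; Bengali (Bn) has the fewest examples.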

MSVAMP

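| Test Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MSVAMP | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 10K |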

Usage

Our dataset and models are all available on Hugging Face.

🤗 MathInstruct Dataset

Or you can directly download them from

Models

| Base Model: LLaMA | Parallel-Training | Cross-Training |
|---|---|---|
| 7B-LLaMA 2 | 🐙 MathOctopus-Parallel-7B | 🐙 MathOctopus-Cross-7B |
| | 🐙 MathOctopus-Parallel-xRFT-7B | 🐙 MathOctopus-Cross-xRFT-7B |
| 13B-LLaMA 2 | 🐙 MathOctopus-Parallel-13B | 🐙 MathOctopus-Cross-13B |
| | 🐙 MathOctopus-Parallel-xRFT-13B | 🐙 MathOctopus-Cross-xRFT-13B |
| 33B-LLaMA 1 | 🐙 MathOctopus-Parallel-33B | 🐙 MathOctopus-Cross-33B |
| 70B-LLaMA 2 | Coming soon! | Coming soon! |

*-Parallel refers to our model trained with the parallel-training strategy.

*-Cross refers to our model trained with the cross-training strategy.

*-xRFT refers to our model trained with multilingual rejection sampling.
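
To make the rejection-sampling idea concrete, here is a minimal sketch of the generic recipe: sample several candidate solutions per question, keep only those whose final answer matches the reference, and drop duplicates. It is not the authors' pipeline. The helper names (`extract_final_answer`, `rejection_sample`), the `sample_fn` callback, and the "last number in the text" answer heuristic are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional


def extract_final_answer(solution: str) -> Optional[float]:
    """Return the last number appearing in a solution, or None if there is none.

    Assumption: the final answer is the last number in the text.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)
    return float(numbers[-1]) if numbers else None


def rejection_sample(
    question: str,
    gold_answer: float,
    sample_fn: Callable[[str], str],
    num_samples: int = 8,
) -> List[str]:
    """Sample candidate solutions and keep the distinct ones that are correct."""
    kept: List[str] = []
    seen = set()
    for _ in range(num_samples):
        solution = sample_fn(question)  # hypothetical: one sampled model output
        answer = extract_final_answer(solution)
        if answer is None or abs(answer - gold_answer) > 1e-6:
            continue  # reject: no answer found, or the answer is wrong
        if solution in seen:
            continue  # reject: exact duplicate of a kept solution
        seen.add(solution)
        kept.append(solution)
    return kept
```

Running such a filter over the questions in each of the ten languages would yield a multilingual set of additional correct solutions for further fine-tuning.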

Overall Results on MGSM

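| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 52.0 | 23.6 | 31.6 | 18.8 | 38.0 | 39.2 | 36.4 | 27.2 | 33.6 | 21.6 | 32.2 |
| xRFT-MathOctopus-C | 51.2 | 24.0 | 33.2 | 18.8 | 36.0 | 41.2 | 37.6 | 29.6 | 36.4 | 25.2 | 33.3 |
| MathOctopus-P-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopus-P | 52.4 | 39.2 | 38.4 | 28.8 | 44.8 | 42.4 | 43.6 | 36.0 | 39.6 | 34.4 | 40.0 |
| xRFT-MathOctopus-P | 54.8 | 38.4 | 45.2 | 33.2 | 43.6 | 45.2 | 38.0 | 35.6 | 48.4 | 36.4 | 41.9 |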

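| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 56.4 | 27.2 | 39.2 | 24.0 | 47.6 | 49.6 | 47.6 | 40.4 | 42.0 | 24.8 | 39.9 |
| xRFT-MathOctopus-C | 53.6 | 28.0 | 45.2 | 21.2 | 48.0 | 46.4 | 46.0 | 35.2 | 45.6 | 28.8 | 39.8 |
| MathOctopus-P | 53.2 | 42.8 | 48.8 | 35.2 | 44.4 | 48.0 | 48.4 | 43.2 | 47.6 | 46.8 | 45.8 |
| xRFT-MathOctopus-P | 51.6 | 46.0 | 51.2 | 42.0 | 49.2 | 53.2 | 49.6 | 39.6 | 47.6 | 46.0 | 47.6 |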

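| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 55.6 | 24.4 | 36.0 | 19.2 | 40.4 | 51.2 | 44.4 | 27.2 | 37.2 | 21.6 | 35.7 |
| xRFT-MathOctopus-C | 53.6 | 27.6 | 34.4 | 19.2 | 47.2 | 47.6 | 44.8 | 30.8 | 38.8 | 22.8 | 36.7 |
| MathOctopus-P | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| xRFT-MathOctopus-P | 51.6 | 47.2 | 52.4 | 37.6 | 51.2 | 52.8 | 44.4 | 41.6 | 50.0 | 47.6 | 47.6 |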
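
The Overall column appears to be the unweighted mean of the ten per-language accuracies, rounded to one decimal place. The card does not state this explicitly, so treat it as an assumption; the sketch below checks it against the 7B parallel-training row of the MGSM results. The `overall` helper is an illustrative name.

```python
from statistics import mean

LANGS = ["En", "Sw", "Zh", "Bn", "De", "Es", "Fr", "Ja", "Ru", "Th"]


def overall(scores: dict) -> float:
    """Unweighted mean over languages, rounded to one decimal place."""
    return round(mean(scores[lang] for lang in LANGS), 1)


# 7B parallel-training model on MGSM, values copied from the table above.
row = dict(zip(LANGS, [52.4, 39.2, 38.4, 28.8, 44.8, 42.4, 43.6, 36.0, 39.6, 34.4]))

print(overall(row))  # 40.0
```

The result matches the 40.0 reported for that row.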

Overall Results on MSVAMP

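| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 49.2 | 36.6 | 43.6 | 30.2 | 48.6 | 46.8 | 46.4 | 42.5 | 46.7 | 34.0 | 42.5 |
| xRFT-MathOctopus-C | 49.9 | 37.7 | 43.3 | 32.9 | 46.5 | 47.6 | 47.3 | 42.7 | 46.6 | 36.2 | 43.1 |
| MathOctopus-P-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopus-P | 46.5 | 40.1 | 42.5 | 29.1 | 43.5 | 45.4 | 46.0 | 42.5 | 45.4 | 35.7 | 41.7 |
| xRFT-MathOctopus-P | 46.8 | 42.3 | 43.2 | 32.8 | 43.1 | 44.5 | 45.3 | 43.2 | 42.1 | 40.5 | 42.4 |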

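| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 56.6 | 40.4 | 49.0 | 30.3 | 50.9 | 54.2 | 54.7 | 46.3 | 52.4 | 35.7 | 47.1 |
| xRFT-MathOctopus-C | 52.9 | 41.9 | 49.2 | 34.1 | 50.5 | 52.8 | 51.5 | 45.8 | 50.2 | 35.7 | 46.5 |
| MathOctopus-P | 50.7 | 43.4 | 42.6 | 31.8 | 48.4 | 49.4 | 50.6 | 41.1 | 46.9 | 39.3 | 44.4 |
| xRFT-MathOctopus-P | 44.6 | 43.4 | 46.4 | 34.2 | 47.7 | 48.2 | 49.9 | 43.1 | 48.2 | 39.5 | 44.5 |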

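| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MathOctopus-C | 51.5 | 42.1 | 46.2 | 23.2 | 50.5 | 52.1 | 52.9 | 42.2 | 50.5 | 33.4 | 44.5 |
| xRFT-MathOctopus-C | 48.1 | 42.8 | 43.6 | 23.3 | 48.7 | 50.0 | 48.9 | 43.4 | 44.6 | 35.5 | 42.9 |
| MathOctopus-P | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| xRFT-MathOctopus-P | 48.0 | 42.3 | 46.1 | 36.2 | 47.5 | 48.5 | 48.3 | 45.8 | 47.2 | 41.2 | 45.1 |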

MathOctopus in English

| Models | GSM8K | SVAMP |
|---|---|---|
| LLaMA 2-7B | 42.4 | 38.3 |
| MathOctopus-P-7B | 49.3 | 46.8 |
| MathOctopus-C-7B | 50.8 | 49.3 |
| LLaMA 2-13B | 51.0 | 50.9 |
| MathOctopus-P-13B | 55.5 | 52.1 |
| MathOctopus-C-13B | 56.6 | 56.6 |
| LLaMA 1-33B | 50.0 | 49.0 |
| MathOctopus-P-33B | 56.0 | 52.5 |
| MathOctopus-C-33B | 53.7 | 51.5 |

Intended Uses

These models are trained for research purposes. They are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed.
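
For applications like these, loading a checkpoint and generating an answer might look like the sketch below. It assumes the standard `transformers` causal-LM interface. The helper names `load_model` and `solve` are illustrative, and the repository ID in the usage comment is a placeholder, not a confirmed name. The question is passed as raw text; instruction-tuned models usually expect the prompt template they were trained with, which this card does not specify, so output quality may depend on supplying it.

```python
def load_model(repo_id: str):
    """Load a tokenizer and causal LM from the Hugging Face Hub.

    `repo_id` is supplied by the caller; no specific repository is assumed.
    """
    # Imported lazily so this module can be imported without the dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def solve(tokenizer, model, question: str, max_new_tokens: int = 256) -> str:
    """Generate a solution for one math question (no prompt template applied)."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (placeholder repository ID, replace with a real one):
# tokenizer, model = load_model("<org>/<model-name>")
# print(solve(tokenizer, model, "Janet has 3 apples and buys 4 more. How many now?"))
```

Keeping loading and generation separate lets an application load the model once and reuse it across many questions.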