reproduce mpt-7b-chat
#6 opened by ehartford
In order to reproduce your results, and in the spirit of open source, I would like to request:
- the final dataset you used to train mpt-7b-chat (and the scripts you used to compose this dataset from the source datasets)
- the code used to train the model
- the hyperparameters you used to train the model (the exact command line and arguments would be lovely)
- the hardware you used and how long training took

Thank you.
You can find most of the info you're looking for in MosaicML's MPT blog post and in the github repo.
- Datasets used for fine-tuning mpt-7b-chat: the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. The datasets used to pre-train the base model are listed in the blog post. By the way, their subset of "The Stack" coding dataset doesn't include JavaScript and TypeScript.
- They used their own compute platform. You can use the example command provided here (`mcli run -f mcli-1b.yaml --cluster CLUSTER --gpus GPUS --name NAME --follow`) with a modified version of the mcli-1b.yaml config, probably replacing the training config for mpt-1b (in line 19) with the training config for mpt-7b (a hedged sketch of that launch follows this list). A similar process for fine-tuning is mostly explained in the GitHub repo.
- This is the config for fine-tuning.
- Hardware, cost, and training time: see the table in the blog post.
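
For concreteness, here is a minimal sketch of that launch: the same `mcli run` command from above, pointed at a copy of mcli-1b.yaml in which the training config has been swapped from mpt-1b to mpt-7b. The file name `mcli-7b.yaml` is an assumption chosen for illustration, not a file name from the repo; check llm-foundry for the actual file layout.

```sh
# Sketch only: assumes you have copied mcli-1b.yaml to a new file
# (called mcli-7b.yaml here purely for illustration) and edited the
# training-config reference on line 19 to point at the mpt-7b config.
# CLUSTER, GPUS, and NAME are placeholders for your own cluster name,
# GPU count, and run name.
mcli run -f mcli-7b.yaml --cluster CLUSTER --gpus GPUS --name NAME --follow
```

The same pattern should apply to fine-tuning: point the launch YAML at the mpt-7b-chat fine-tuning config instead of the pre-training one.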
Closing as stale
abhi-mosaic changed discussion status to closed