reproduce mpt-7b-chat
#6 opened by ehartford
In order to reproduce your results, and in the spirit of open source, I would like to request:
- the final dataset you used to train mpt-7b-chat (and the scripts you used to compose this dataset from the source datasets)
- the code used to train the model
- the hyperparameters you used to train the model (the exact command line and arguments would be lovely)
- the hardware you used and how long training took

Thank you.
You can find most of the info you're looking for in MosaicML's MPT blog post and in the github repo.
- Datasets used for fine-tuning mpt-7b-chat: the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. The datasets used to pre-train the base model are listed in the blog post. By the way, their subset of "The Stack" coding dataset doesn't include JavaScript and TypeScript.
- They used their own compute platform. You can use the example command provided here (`mcli run -f mcli-1b.yaml --cluster CLUSTER --gpus GPUS --name NAME --follow`) with a modified version of the mcli-1b.yaml config, probably replacing the training config for mpt-1b (in line 19) with the training config for mpt-7b (a hedged sketch of that launch follows this list). A similar process for fine-tuning is mostly explained in the GitHub repo.
- This is the config for fine-tuning.
- Hardware, cost, and training time: see the table in the blog post.
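
For concreteness, here is a minimal sketch of that launch: the same `mcli run` command from above, pointed at a copy of mcli-1b.yaml in which the training config has been swapped from mpt-1b to mpt-7b. The file name `mcli-7b.yaml` is an assumption chosen for illustration, not a file name from the repo; check llm-foundry for the actual file layout.

```sh
# Sketch only: assumes you have copied mcli-1b.yaml to a new file
# (called mcli-7b.yaml here purely for illustration) and edited the
# training-config reference on line 19 to point at the mpt-7b config.
# CLUSTER, GPUS, and NAME are placeholders for your own cluster name,
# GPU count, and run name.
mcli run -f mcli-7b.yaml --cluster CLUSTER --gpus GPUS --name NAME --follow
```

The same pattern should apply to fine-tuning: point the launch YAML at the mpt-7b-chat fine-tuning config instead of the pre-training one.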
Closing as stale
abhi-mosaic changed discussion status to closed