reproduce mpt-7b-chat

#6
by ehartford - opened

In order to reproduce your results, and in the spirit of open source, I would like to request:

  1. The final dataset you used to train mpt-7b-chat (and the scripts you used to compose this dataset from the source datasets)
  2. The code used to train the model
  3. The hyperparameters you used to train the model (the exact command line and arguments would be lovely)
  4. The hardware you used and how long it took

Thank you

You can find most of the info you're looking for in MosaicML's MPT blog post and in the GitHub repo.

  1. Datasets used for fine-tuning mpt-7b-chat: ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct. The datasets used to pre-train the base model are listed in the data mix table in the blog post. [data mix table from the MPT blog post] By the way, their subset of The Stack coding dataset doesn't include JavaScript or TypeScript.
  2. They used their own compute platform. You can use the example command provided here (mcli run -f mcli-1b.yaml --cluster CLUSTER --gpus GPUS --name NAME --follow) with a modified version of the mcli-1b.yaml config, probably replacing the training config for mpt-1b (referenced on line 19 of that file) with the training config for mpt-7b; see the sketch after this list. The process for fine-tuning is similar and is mostly covered in the GitHub repo.
  3. This is the config for fine-tuning.
  4. Hardware, cost, and training time:
    [hardware, cost, and training time table from the MPT blog post]
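
A minimal sketch of the pre-training launch described in point 2, assuming you have access to the MosaicML platform and a local copy of mcli-1b.yaml from llm-foundry; the cluster name, GPU count, and run name below are illustrative placeholders, and the exact path of the mpt-7b training YAML depends on your checkout of the repo:

    # Edit your copy of mcli-1b.yaml so its training config points at the
    # mpt-7b training YAML in llm-foundry instead of the mpt-1b one, then
    # launch on the MosaicML platform (the values below are placeholders):
    mcli run -f mcli-1b.yaml --cluster my-cluster --gpus 8 --name mpt-7b-repro --follow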

Closing as stale

abhi-mosaic changed discussion status to closed
