Text Generation
Transformers
PyTorch
mpt
Composer
MosaicML
llm-foundry
custom_code
text-generation-inference
abhi-mosaic committed on
Commit 82d0c1a
1 Parent(s): 1df4d76

update README.md

Files changed (1)
  1. README.md +21 -17
README.md CHANGED
@@ -12,7 +12,7 @@ inference: false
# MPT-7B-Instruct

MPT-7B-Instruct is a model for short-form instruction following.
- It is built by finetuning [MPT-7B](https://huggingface.co/spaces/mosaicml/mpt-7b) on a [dataset](https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
+ It is built by finetuning [MPT-7B](https://huggingface.co/spaces/mosaicml/mpt-7b) on a [dataset](https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
* License: _CC-By-SA-3.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct)

@@ -55,37 +55,41 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
  trust_remote_code=True
)
```
- Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
+ Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
`MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.

- To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
+ To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
```python
- config = transformers.AutoConfig.from_pretrained(
-   'mosaicml/mpt-7b-instruct',
-   trust_remote_code=True
- )
+ import torch
+ import transformers
+
+ name = 'mosaicml/mpt-7b-instruct'
+
+ config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'
+ config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
-   'mosaicml/mpt-7b-instruct',
+   name,
  config=config,
-   torch_dtype=torch.bfloat16,
+   torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)
- model.to(device='cuda:0')
```

Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

```python
- config = transformers.AutoConfig.from_pretrained(
-   'mosaicml/mpt-7b-instruct',
-   trust_remote_code=True
- )
- config.update({"max_seq_len": 4096})
+ import transformers
+
+ name = 'mosaicml/mpt-7b-instruct'
+
+ config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
+ config.max_seq_len = 4096 # (input + output) tokens can now be up to 4096
+
model = transformers.AutoModelForCausalLM.from_pretrained(
-   'mosaicml/mpt-7b-instruct',
+   name,
  config=config,
  trust_remote_code=True
)
@@ -182,4 +186,4 @@ Please cite this model using the following format:
  note = {Accessed: 2023-03-28}, % change this date
  urldate = {2023-03-28} % change this date
}
- ```
+ ```
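
Once the model is loaded as in the snippets above, generation goes through the standard `transformers` generate API. The sketch below is illustrative and not part of this commit: it assumes `model` was loaded on `cuda:0` as shown, uses the `EleutherAI/gpt-neox-20b` tokenizer (the tokenizer MPT-7B was trained with), and the prompt and sampling parameters are arbitrary examples.

```python
# Minimal generation sketch (illustrative; not part of this commit).
import torch
import transformers

# Assumption: MPT-7B uses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# `model` is assumed to be loaded on cuda:0 as in the snippets above.
prompt = 'Explain what ALiBi attention does, in one sentence.'
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,   # illustrative generation budget
        do_sample=True,      # sample instead of greedy decoding
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```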