bigscience
/

T0pp

Text2Text Generation

text-generation-inference

Model card Files Files and versions Community

stas commited on Oct 18, 2021

Commit

89880f4

•

1 Parent(s): 58161ad

fix typos

Files changed (1) hide show

README.md +2 -2

README.md CHANGED Viewed

@@ -58,7 +58,7 @@ If you want to use another checkpoint, please replace the path in `AutoTokenizer
 # Training procedure
-T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapated T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
 At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
@@ -119,7 +119,7 @@ We also evaluate T0, T0p and T0pp on the a subset of the [BIG-bench benchmark](h
 # Limitations
-- The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational ressources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
 - We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.
 - Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non English text.

 # Training procedure
+T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
 At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
 # Limitations
+- The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational resources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
 - We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.
 - Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non English text.