Text2Text Generation
Transformers
PyTorch
English
switch_transformers
Files changed (1)
  1. README.md +2 -10
README.md CHANGED
@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 
 # Evaluation
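The two hunk headers can be cross-checked against the diffstat. In unified-diff notation, `@@ -193,11 +189,7 @@` means 11 lines starting at line 193 of the old file and 7 lines starting at line 189 of the new one, so each hunk shifts every later line by `new_len - old_len`. The minimal Python sketch below parses both headers and checks two things: the first hunk's shift of -4 explains why the second hunk starts at 189 rather than 193, and the total shift of -8 agrees with `README.md +2 -10`. The helper `parse_hunk` and the regex are hypothetical illustrations and are not taken from git or the Hugging Face Hub.

```python
import re

# Unified-diff hunk header: "@@ -old_start,old_len +new_start,new_len @@ optional context"
HUNK_RE = re.compile(r"^@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def parse_hunk(header: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: return (old_start, old_len, new_start, new_len)."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = (int(g) for g in m.groups())
    return old_start, old_len, new_start, new_len


# Hunk headers copied from the README.md diff (trailing context text omitted).
headers = ["@@ -158,11 +158,7 @@", "@@ -193,11 +189,7 @@"]
hunks = [parse_hunk(h) for h in headers]

offset = 0  # cumulative line shift introduced by earlier hunks
for old_start, old_len, new_start, new_len in hunks:
    # A hunk's new start equals its old start plus the shift from earlier hunks.
    assert new_start == old_start + offset
    # Context lines count on both sides, so this hunk shifts later lines by new_len - old_len.
    offset += new_len - old_len

# The diffstat "README.md +2 -10" implies the same net change: 2 - 10 lines.
assert offset == 2 - 10
print(offset)  # -8
```

Running it prints `-8`. That value is the first hunk's -4 (11 old lines become 7) plus the second hunk's -4, which matches 2 added lines minus 10 removed ones.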