Update README.md (#3), opened by ybelkada

README.md CHANGED
@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 
 # Evaluation