Fix typo #4
opened by rcojocaru

README.md CHANGED
```diff
@@ -181,7 +181,7 @@ Falcon2-11B is a causal decoder-only model trained on a causal language modeling
 
 The architecture is broadly adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), with the following differences:
 
-* **
+* **Positional embeddings:** rotary ([Su et al., 2021](https://arxiv.org/abs/2104.09864));
 * **Attention:** multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)) and FlashAttention-2 ([Dao, 2023](https://arxiv.org/abs/2307.08691));
 * **Decoder-block:** parallel attention/MLP.
```
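The restored bullet names rotary positional embeddings (RoPE). As a rough illustration of the idea, a minimal NumPy sketch is below; this is an assumption-laden toy, not Falcon2-11B's actual implementation. It rotates each pair of channels by a position-dependent angle, so that dot products between query and key vectors depend only on their relative distance.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings (RoPE) to x of shape (seq_len, dim).

    Channel pairs (i, i + dim/2) are rotated by an angle that grows with
    position; position 0 is rotated by zero, and each rotation preserves
    the vector's norm.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, following Su et al., 2021.
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1, x2) channel pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each position's transform is a pure rotation, token vector norms are unchanged and position 0 passes through untouched.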