Update README.md
README.md
@@ -3,3 +3,13 @@ license: other
license_name: llama3
license_link: LICENSE
---

In the original Llama 3 8B (base) model, the weights of some special tokens are all zeros, which may cause NaN gradients during training. This version re-initializes the weights of the following special tokens to alleviate the problem:

```
<|eot_id|>
<|start_header_id|>
<|end_header_id|>
```
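To check which tokens are affected, one option is to look for rows of the embedding (or `lm_head`) matrix whose entries are all exactly zero. The sketch below does this with NumPy. It assumes the weights have been exported as a `(vocab_size, hidden_size)` array. `zero_rows` is a hypothetical helper, not part of any library, and the toy matrix stands in for real model weights.

```python
import numpy as np


def zero_rows(weight: np.ndarray) -> list[int]:
    """Return the indices of rows in a (vocab_size, hidden_size) matrix that are entirely zero."""
    # A row counts as zero only if no entry differs from 0.
    is_zero = ~np.any(weight != 0, axis=1)
    return np.flatnonzero(is_zero).tolist()


# Toy example (hypothetical data): 5 tokens, hidden size 4, tokens 1 and 3 zeroed out.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 4)).astype(np.float32)
emb[[1, 3]] = 0.0
print(zero_rows(emb))  # [1, 3]
```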

We set the weights of these tokens in `embed` and `lm_head` to the mean of the weights of all other tokens.
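The mean initialization can be sketched as follows. This is an illustration under stated assumptions, not the exact script used for this release. Each weight matrix is treated as a NumPy array of shape `(vocab_size, hidden_size)`. The rows for the listed tokens are overwritten with the mean of every other row. `reinit_to_mean` and the token ids in the toy example are hypothetical.

```python
import numpy as np


def reinit_to_mean(weight: np.ndarray, token_ids: list[int]) -> None:
    """In place: set each row in token_ids to the mean of all *other* rows."""
    keep = np.ones(weight.shape[0], dtype=bool)
    keep[token_ids] = False  # exclude the re-initialized tokens from the mean
    # Average in float32 to limit rounding error if the weights are low precision.
    mean = weight[keep].astype(np.float32).mean(axis=0)
    weight[token_ids] = mean.astype(weight.dtype)


# Toy example (hypothetical data): 4 tokens, hidden size 2; token 3 starts as all zeros.
embed = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]], dtype=np.float32)
lm_head = embed.copy()
special_ids = [3]  # hypothetical ids; the real ones come from the tokenizer
for w in (embed, lm_head):  # applied separately to `embed` and `lm_head`
    reinit_to_mean(w, special_ids)
print(embed[3])  # [3. 4.]
```

With Hugging Face `transformers`, the two matrices would be `model.get_input_embeddings().weight` and `model.get_output_embeddings().weight`, and the ids could be looked up with `tokenizer.convert_tokens_to_ids`. The same masking logic would then be applied to those PyTorch tensors inside `torch.no_grad()`.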