Update README.md
README.md CHANGED
---
license: apache-2.0
---

Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with [`Birchlabs/llama-13b-stepwise-adapter`](Birchlabs/llama-13b-stepwise-adapter).
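A rough sketch of how these weights might be combined with that adapter at load time (the base checkpoint and the weights file name below are assumptions, not verified contents of either repo):

```python
# Illustrative only: "huggyllama/llama-13b" and "stepwise-embeddings.pt" are
# assumptions; check the actual repository contents before using.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b", torch_dtype=torch.float16
)
# the grown tokenizer is assumed to ship with the adapter repo
tokenizer = AutoTokenizer.from_pretrained("Birchlabs/llama-13b-stepwise-adapter")

# grow embed_tokens / lm_head to the enlarged vocabulary
base.resize_token_embeddings(len(tokenizer))

# overwrite the freshly-initialized rows with the finetuned weights from this repo
# (assumed here to be a plain PyTorch state dict)
embeddings_state = torch.load("stepwise-embeddings.pt", map_location="cpu")
base.load_state_dict(embeddings_state, strict=False)

# attach the stepwise QLoRA adapter on top
model = PeftModel.from_pretrained(base, "Birchlabs/llama-13b-stepwise-adapter")
```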
Prior to finetuning, we grew the vocabulary of the tokenizer and the embedding layers. The new embedding rows were initialized to the average of the existing embeddings and needed training, so we trained them. These are the weights from that training.
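That vocabulary-growing step looked roughly like the sketch below (the added tokens are placeholders, and `huggyllama/llama-13b` is an assumed base checkpoint):

```python
# Sketch of growing the vocabulary and average-initializing the new rows.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-13b")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-13b")

old_vocab_size = model.get_input_embeddings().weight.shape[0]
tokenizer.add_tokens(["<|step_start|>", "<|step_end|>"])  # placeholder new tokens
model.resize_token_embeddings(len(tokenizer))

# average-initialize the new rows of both embed_tokens and lm_head
with torch.no_grad():
    for module in (model.get_input_embeddings(), model.get_output_embeddings()):
        mean_row = module.weight[:old_vocab_size].mean(dim=0)
        module.weight[old_vocab_size:] = mean_row
```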
Ordinarily, a QLoRA finetune of an LLM would not finetune `embed_tokens: Embedding` (you'd need to get a bit creative: not only have its dimensions changed, but I don't believe any established way exists to train _adapters_ over `Embedding`s).
Nor, apparently, would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't adapt it the same way you adapt the other `Linear` layers), because its dimensions have grown.
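For reference, here is a minimal sketch of one way to make both layers fully trainable alongside a QLoRA adapter, using the PEFT library's `modules_to_save` option; the hyperparameters and base checkpoint are illustrative, not necessarily what was used here:

```python
# One possible setup (not necessarily the exact recipe used here): list the
# grown embed_tokens and lm_head in modules_to_save so full copies of them are
# trained and saved alongside the LoRA adapters over the Linear projections.
# Assumes the embeddings were already resized as in the sketch above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # trained in full, not adapted
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```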