Update README.md
README.md CHANGED
---
license: apache-2.0
---

Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with [`Birchlabs/llama-13b-stepwise-adapter`](Birchlabs/llama-13b-stepwise-adapter).
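A rough sketch of how these weights might be combined with that adapter at load time (the base checkpoint and the weights file name below are assumptions, not verified contents of either repo):

```python
# Illustrative only: "huggyllama/llama-13b" and "stepwise-embeddings.pt" are
# assumptions; check the actual repository contents before using.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b", torch_dtype=torch.float16
)
# the grown tokenizer is assumed to ship with the adapter repo
tokenizer = AutoTokenizer.from_pretrained("Birchlabs/llama-13b-stepwise-adapter")

# grow embed_tokens / lm_head to the enlarged vocabulary
base.resize_token_embeddings(len(tokenizer))

# overwrite the freshly-initialized rows with the finetuned weights from this repo
# (assumed here to be a plain PyTorch state dict)
embeddings_state = torch.load("stepwise-embeddings.pt", map_location="cpu")
base.load_state_dict(embeddings_state, strict=False)

# attach the stepwise QLoRA adapter on top
model = PeftModel.from_pretrained(base, "Birchlabs/llama-13b-stepwise-adapter")
```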
Prior to finetuning, we grew the vocabulary of the tokenizer and the embedding layers. The new embedding rows were initialized to the average of the existing embeddings and needed training, so we trained them. These are the weights from that training.
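That vocabulary-growing step looked roughly like the sketch below (the added tokens are placeholders, and `huggyllama/llama-13b` is an assumed base checkpoint):

```python
# Sketch of growing the vocabulary and average-initializing the new rows.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-13b")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-13b")

old_vocab_size = model.get_input_embeddings().weight.shape[0]
tokenizer.add_tokens(["<|step_start|>", "<|step_end|>"])  # placeholder new tokens
model.resize_token_embeddings(len(tokenizer))

# average-initialize the new rows of both embed_tokens and lm_head
with torch.no_grad():
    for module in (model.get_input_embeddings(), model.get_output_embeddings()):
        mean_row = module.weight[:old_vocab_size].mean(dim=0)
        module.weight[old_vocab_size:] = mean_row
```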
Ordinarily, a QLoRA finetune of an LLM would not finetune `embed_tokens: Embedding` (you'd need to get a bit creative: not only have its dimensions changed, but I don't believe any established way exists to train _adapters_ over `Embedding`s).
Nor, apparently, would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't adapt it the same way you adapt the other `Linear` layers), because its dimensions have grown.
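For reference, here is a minimal sketch of one way to make both layers fully trainable alongside a QLoRA adapter, using the PEFT library's `modules_to_save` option; the hyperparameters and base checkpoint are illustrative, not necessarily what was used here:

```python
# One possible setup (not necessarily the exact recipe used here): list the
# grown embed_tokens and lm_head in modules_to_save so full copies of them are
# trained and saved alongside the LoRA adapters over the Linear projections.
# Assumes the embeddings were already resized as in the sketch above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # trained in full, not adapted
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```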