Idle chitchat: does a low-rank/low-dimensional (I don't know the maths words) change imply a small diff?
Sorry if this is already covered in the paper; I'm exhausted and speculating about things I don't really understand, and I'll take another look later.
I was wondering whether the blog post/paper you refer to mentioning a one-dimensional feature, plus the fact that you simply can't have adjusted billions of parameters that quickly, implies that most (almost all?) of the weights in this transformer are exactly the same as the base model's?
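For what it's worth, the check itself is trivial once you have both checkpoints loaded as flat arrays. A toy sketch with synthetic stand-ins (not real weights; the perturbation pattern is made up for illustration):

```python
import numpy as np

# Synthetic stand-ins for the two checkpoints -- in reality you'd load
# the base and fine-tuned weight tensors and flatten/concatenate them.
rng = np.random.default_rng(0)
base = rng.standard_normal(1_000_000).astype(np.float32)
tuned = base.copy()

# Pretend fine-tuning nudged 1,000 of the million weights.
touched = rng.choice(base.size, size=1_000, replace=False)
tuned[touched] += 0.05

# Exact bitwise comparison: how many parameters differ at all?
changed = int(np.count_nonzero(base != tuned))
print(f"{changed} of {base.size} weights differ ({changed / base.size:.4%})")
```

(In practice even a "small" update often touches every weight by a tiny amount, so you'd also want a histogram of `|tuned - base|` rather than just an exact-equality count.)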
So my question is: how many weights changed, and if you know, how do you know? (You can just say "maths", that's okay.) Bonus points:
If you don't know, and you're easily amused like I am, you could try feeding the original model and this one into a solid (single-archive) compression algorithm; zpaq will crush it.
My hypothesis is that this model and the original together in one archive/compressed tar should be barely bigger than either model compressed alone, i.e. much smaller than the two compressed separately.
I can't test this myself because the download takes me a day, so I won't be getting the base model afterwards. Maybe I'll check a single chunk when I get home.
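The redundancy intuition is easy to demo in miniature with stdlib lzma instead of zpaq (synthetic arrays again; this works because lzma's dictionary easily spans both copies at this size):

```python
import lzma

import numpy as np

# Two nearly identical "models": random float32 weights plus a tiny perturbation.
rng = np.random.default_rng(0)
base = rng.standard_normal(50_000).astype(np.float32)  # ~200 KB of "weights"
tuned = base.copy()
tuned[rng.choice(base.size, size=500, replace=False)] += 0.01

a, b = base.tobytes(), tuned.tobytes()
together = len(lzma.compress(a + b))                      # one solid archive
separate = len(lzma.compress(a)) + len(lzma.compress(b))  # two archives

print(f"together: {together} bytes, separate: {separate} bytes")
```

The solid archive comes out far smaller than the two compressed separately, because the compressor can encode the second copy mostly as back-references into the first. Note this only holds if the second model fits inside the compressor's window, which is why zpaq (or lzma with a big dictionary) is the right tool and gzip isn't.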
Because it kind of feels like alignment was a last-minute bodge. 1-D morality? That's not just finetuning, that's tensor witchcraft. That's WAY too interpretable.