Smaller sizes - around 1B or 3B?

#5
by nirajandhakal - opened

Can you prune and distill the model down to 1B and 3B?

Llama tried over-pruning from 8B down to 3B, and that model turned out to be garbage for its size. You can't really get away with more than about a 30% reduction. It'd be better to just train the smaller models from scratch with the help of logits from the 14B.
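To make "with the help of logits" concrete, here's a rough sketch of the usual logit-distillation loss: the KL divergence between the temperature-softened teacher and student distributions. This is plain Python over one token position, not anyone's actual training code. The names `log_softmax` and `distillation_loss` and the default `temperature=2.0` are my own assumptions for illustration.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one position.

    The temperature ** 2 factor is the common convention for keeping
    gradient magnitudes comparable across temperatures (an assumption
    here, not something stated in this thread).
    """
    teacher_logp = log_softmax(teacher_logits, temperature)
    student_logp = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(t) * (t - s)
        for t, s in zip(teacher_logp, student_logp)
    )
    return kl * temperature ** 2

# Hypothetical logits over a 4-token vocabulary.
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.5, 1.5, 0.0, -1.0]
loss = distillation_loss(student, teacher)
```

In a real run you'd average this over every position in the batch and typically mix it with the normal next-token cross-entropy. For scale: going from 14B to 3B is roughly a 79% cut in parameters, and 14B to 1B is roughly 93%, both well past the ~30% mentioned above.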
