405B or 410B?

#8
by alielfilali01 - opened

The name and the advertising suggest 405B, but the safetensors tag shows the model as 410B! Given the overall size the difference may be negligible, but that's still 5B parameters unaccounted for. Is there any specific reason?

@Ali-C137 It's probably ignoring the embedding params.
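To see whether the embeddings alone are in the right ballpark, here's a quick back-of-the-envelope count. It assumes a vocabulary of 128,256 tokens, a hidden size of 16,384 and an untied output head (LM head separate from the input embedding); these are my assumptions based on the public Llama 3.1 405B config, not values read from the checkpoint.

```python
# Rough size of the embedding parameters, assuming Llama-3-405B-style
# dimensions (assumed values; check the model's config.json for the real ones).
VOCAB_SIZE = 128_256   # assumed tokenizer vocabulary size
HIDDEN_SIZE = 16_384   # assumed model dimension
TIED = False           # assumed: input embedding and LM head are separate matrices


def embedding_params(vocab: int, hidden: int, tied: bool) -> int:
    """Parameters in the token embedding plus the output (LM head) projection."""
    per_matrix = vocab * hidden
    return per_matrix if tied else 2 * per_matrix


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, TIED)
print(f"embedding + LM head: {emb:,} (~{emb / 1e9:.2f}B)")
```

Under those assumptions this comes to roughly 4.2B, the same order as the ~5B gap, so excluding embeddings is at least a plausible explanation.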

According to the Llama 3 tech paper, 405B was supposed to use 8 key-value heads (the same as 8B and 70B); in that case the model would be 405B (including embeddings). They later switched to 16 key-value heads (the currently published model) but didn't want to change the model name. They should mention it in the tech paper, though.
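A minimal sketch of the parameter arithmetic behind this, to compare 8 vs 16 KV heads. The dimensions (126 layers, model dim 16,384, FFN dim 16,384 → 53,248, 128 attention heads) are taken as assumptions from the Llama 3 paper's 405B column, with an assumed vocab of 128,256 and untied embeddings. It also assumes a standard Llama-style block (no biases, SwiGLU MLP, grouped-query attention, RMSNorm weights only); it is an estimate, not a count read from the actual safetensors files.

```python
from dataclasses import dataclass


@dataclass
class LlamaDims:
    # Assumed Llama 3 405B dimensions; not read from any checkpoint.
    layers: int = 126
    hidden: int = 16_384
    ffn: int = 53_248
    heads: int = 128
    kv_heads: int = 8
    vocab: int = 128_256


def count_params(d: LlamaDims) -> tuple[int, int]:
    """Return (total, non_embedding) parameter counts for a Llama-style decoder.

    Assumes: no biases, SwiGLU MLP (gate/up/down), RMSNorm weights only,
    grouped-query attention, untied input embedding and LM head.
    """
    head_dim = d.hidden // d.heads
    kv_dim = d.kv_heads * head_dim
    attn = 2 * d.hidden * d.hidden + 2 * d.hidden * kv_dim  # q,o + k,v projections
    mlp = 3 * d.hidden * d.ffn                               # gate, up, down
    norms = 2 * d.hidden                                     # two RMSNorms per layer
    non_embedding = d.layers * (attn + mlp + norms) + d.hidden  # + final norm
    embedding = 2 * d.vocab * d.hidden                       # embed_tokens + lm_head
    return non_embedding + embedding, non_embedding


for kv in (8, 16):
    total, non_emb = count_params(LlamaDims(kv_heads=kv))
    print(f"kv_heads={kv:2d}: total {total / 1e9:.2f}B, non-embedding {non_emb / 1e9:.2f}B")
```

Under these assumptions, 8 KV heads gives about 405.85B including embeddings and 16 KV heads gives about 410.08B, which matches the safetensors figure. Note that 16 KV heads with embeddings excluded also lands near 405.9B, so the arithmetic alone can't distinguish the two explanations in this thread.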
