Are there bias weights in Llama 3?
#202, opened by Iionbarista
I was looking through the safetensors index file: https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/model.safetensors.index.json
and found that it lists no tensors designated as biases.
Does Llama have no biases, or are they loaded implicitly from the weights?
Or are they replaced by the layer norm?
The Google PaLM paper mentions:
No biases were used in any of the dense kernels or layer norms. We found this to result in increased training stability for large models.
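For anyone who wants to repeat the check on the index file, here is a minimal sketch. It assumes the index JSON has a top-level `weight_map` object mapping tensor names to shard files, and that a bias tensor would have a name ending in `.bias`. The `find_bias_keys` helper and the `sample_index` excerpt are hypothetical and only illustrate the file's layout; they are not a copy of the real file.

```python
import json


def find_bias_keys(index: dict) -> list[str]:
    """Return tensor names in a safetensors index whose name ends in '.bias'."""
    # Assumption: tensor names live under the top-level "weight_map" key.
    weight_map = index.get("weight_map", {})
    return sorted(name for name in weight_map if name.endswith(".bias"))


# Hypothetical excerpt in the style of model.safetensors.index.json.
# The tensor and shard names are illustrative, not copied from the real file.
sample_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors"
  }
}
""")

bias_keys = find_bias_keys(sample_index)
print(f"{len(bias_keys)} bias tensors found: {bias_keys}")
```

To run it against the real file, download the index JSON from the link above and pass the result of `json.load` to `find_bias_keys` instead of `sample_index`. An empty list only shows that no tensor is stored under a `.bias` name; it does not by itself say how the model code treats biases.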