Differences between OrionForCausalLM and LlamaForCausalLM
#5 · opened by J22
As far as I can tell, the only differences are that input_layernorm, post_attention_layernorm, and the final norm are changed from LlamaRMSNorm to nn.LayerNorm.
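To make the norm swap concrete, here is a minimal sketch contrasting the two: a Llama-style RMSNorm (no mean subtraction, no bias) next to PyTorch's nn.LayerNorm, which Orion uses instead. The RMSNorm class below is an illustrative reimplementation, not the exact transformers code.

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Minimal sketch of Llama-style RMSNorm: scales by the root mean
    square of the activations; does not center the input and has no bias."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

hidden = 8
x = torch.randn(2, hidden) + 5.0  # non-zero mean makes the difference visible
rms = RMSNorm(hidden)
ln = nn.LayerNorm(hidden)  # what Orion uses: centers the input, learnable bias

out_rms = rms(x)
out_ln = ln(x)
# LayerNorm output has (near-)zero mean per row; RMSNorm only rescales,
# so the shifted input keeps a clearly non-zero mean
print(out_ln.mean(-1))
print(out_rms.mean(-1))
```

At initialization (weight = 1, bias = 0) the only behavioral difference is the centering, but after training the extra bias and mean subtraction make the checkpoints incompatible between the two norm types.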
The attention and embedding implementations also differ in the trust_remote_code version of the model.