It is strongly recommended to train Gemma2 models with the `eager` attention implementation
#10 opened by JaronTHU
Why is the `eager` attention implementation preferred? The full warning is:
> It is strongly recommended to train Gemma2 models with the `eager` attention implementation instead of `{self.config._attn_implementation}`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
Hello, please see this comment: https://huggingface.co/google/gemma-2-9b-it/discussions/9#667ebaf549c4138109bf96ff
I get it, thank you~
JaronTHU changed discussion status to closed.