It is strongly recommended to train Gemma2 models with the `eager` attention implementation
#10 opened by JaronTHU
Why is the `eager` attention implementation preferred? The full warning is:
> It is strongly recommended to train Gemma2 models with the `eager` attention implementation instead of `{self.config._attn_implementation}`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
Hello, please see this comment: https://huggingface.co/google/gemma-2-9b-it/discussions/9#667ebaf549c4138109bf96ff
I get it, thank you~
JaronTHU changed discussion status to closed.