fix-glu-mlp #17
opened by michael-guenther
The GluMLP does not work without flash attention because the tensors are passed in a different shape. This PR fixes the issue. I also verified that the embeddings computed with and without flash attention are identical.
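For context, here is a minimal sketch of the kind of shape-agnostic gated MLP this fix implies. The class and parameter names (`GluMLP`, `up_proj`, `down_proj`, SiLU gating) are assumptions for illustration, not the repository's actual code; the point is that operating on the last dimension keeps the module working for both the packed and padded tensor layouts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GluMLP(nn.Module):
    """Hypothetical gated MLP sketch (names/dims are assumptions).

    Projects to 2x the intermediate size, splits into value and gate
    halves, applies the activation to the gate, then projects back.
    """

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up_proj = nn.Linear(hidden_size, 2 * intermediate_size)
        self.down_proj = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Splitting along the *last* dimension keeps the module agnostic
        # to the input layout: packed (total_tokens, hidden) as used with
        # flash attention, or padded (batch, seq_len, hidden) without it.
        value, gate = self.up_proj(x).chunk(2, dim=-1)
        return self.down_proj(value * F.silu(gate))

# Both layouts produce outputs of the same shape as their inputs:
mlp = GluMLP(hidden_size=64, intermediate_size=128)
packed = mlp(torch.randn(10, 64))       # (total_tokens, hidden)
padded = mlp(torch.randn(2, 5, 64))     # (batch, seq_len, hidden)
```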
michael-guenther changed pull request status to open
LGTM!
michael-guenther changed pull request status to merged