Support for xFormers and FlashAttention

#9
by le723z - opened

Hi, thanks for the great work! I found the model very helpful for my ongoing project. May I ask whether you plan to add xFormers support for accelerated inference, as you recently did for gte-large-en-v1.5?

Best

Alibaba-NLP org

Sorry, we do not have plans to support xFormers for gte-Qwen2-1.5B-instruct. However, this model already supports flash-attention, and we believe the two offer similar inference speed.
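
For reference, here is a minimal sketch of enabling flash-attention when loading the model through the Hugging Face `transformers` library, using its standard `attn_implementation` argument. It assumes the `flash-attn` package is installed, a compatible CUDA GPU is available, and half-precision weights are used (flash-attention requires fp16/bf16); the last-token pooling shown is an illustrative choice, not taken verbatim from the model card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,                 # flash-attention needs fp16/bf16
    attn_implementation="flash_attention_2",   # enable flash-attention kernels
).to("cuda")
model.eval()

# Encode one query; with a single unpadded sequence, the last token's
# hidden state can serve as the sentence embedding.
inputs = tokenizer("What is flash attention?", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, -1]   # shape: (1, hidden_size)
```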
