Support for xFormers and FlashAttention

#9
by le723z - opened

Hi, thanks for the great work! I found the model very helpful for my ongoing project. May I ask whether you plan to add xFormers support for accelerated inference, as you recently did for gte-large-en-v1.5?

Best

Alibaba-NLP org

Sorry, we do not have plans to support xFormers for gte-Qwen2-1.5B-instruct. However, this model already supports flash-attention, and we believe the two offer similar inference speed.
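
For reference, here is a minimal sketch of enabling flash-attention when loading the model through the Hugging Face `transformers` library, using its standard `attn_implementation` argument. It assumes the `flash-attn` package is installed, a compatible CUDA GPU is available, and half-precision weights are used (flash-attention requires fp16/bf16); the last-token pooling shown is an illustrative choice, not taken verbatim from the model card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,                 # flash-attention needs fp16/bf16
    attn_implementation="flash_attention_2",   # enable flash-attention kernels
).to("cuda")
model.eval()

# Encode one query; with a single unpadded sequence, the last token's
# hidden state can serve as the sentence embedding.
inputs = tokenizer("What is flash attention?", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, -1]   # shape: (1, hidden_size)
```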
