It would be great to support the XLM-R models (base, large, XL, XXL) with Flash Attention 2.
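For reference, this is roughly the call one would expect to work once support lands — a minimal sketch assuming transformers' standard `attn_implementation` argument, which currently raises an error for architectures without FA2 support:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Also: "xlm-roberta-base", "facebook/xlm-roberta-xl", "facebook/xlm-roberta-xxl"
model_id = "xlm-roberta-large"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",   # fails today for XLM-R; this is the requested feature
).to("cuda")

inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
```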