Why is mt0-large 1.3B while mt5-large is 780M?

#6
by tansq - opened

Why is mt0-large 1.3B while mt5-large is 780M?

BigScience Workshop org

Where did you get the 780M figure from? The PyTorch weights file is the same size for both models, and the mT5 paper says:

"Following the original T5 recipe, we consider five model sizes: Small (≈ 300M parameters), Base (580M), Large (1.2B), XL (3.7B), and XXL (13B)."
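A quick way to settle questions like this is to count the parameters of the loaded checkpoint directly rather than trusting a model-card label. A minimal sketch, assuming PyTorch is available; the `count_params` helper is illustrative, and it is demonstrated on a tiny stand-in module (loading the actual mt0-large or mt5-large checkpoints via the `transformers` library works the same way but requires a multi-GB download):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Sum the element counts of all trainable parameter tensors."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Small stand-in module: a Linear(10, 4) has a 4x10 weight matrix
# plus 4 biases, i.e. 44 parameters in total.
tiny = nn.Linear(10, 4)
print(count_params(tiny))  # 44
```

For the real models you would pass the object returned by `AutoModelForSeq2SeqLM.from_pretrained(...)` to the same helper and compare the two totals.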
