Testing on MGSM in different languages is insightful
#12, opened by Kechen-Li
I’ve noticed that performance differs quite a bit between en and zh. Will you be doing more analysis in multilingual settings?
Waiting for the updates.
Thank you for your question! You're absolutely right; multilingual performance has always been a key focus of this research project. The current results (zh and en only) don’t yet fully explain the performance differences. Moving forward, we’ll run comprehensive tests across at least all 10 languages in MGSM and add fine-grained analysis of reasoning capabilities across languages. Stay tuned!