Lucky52
Collection
Ji, S., & Chen, P. (2024). How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM. In Proceedings of COLING 2025.
•
52 items
•
Updated
This HF repository hosts instruction fine-tuned multilingual BLOOM model using the parallel instruction dataset called Bactrain-X in 52 languages. We progressively add a language during instruction fine-tuning at each time, and train 52 models in total. Then, we evaluate those models in three multilingual benchmarks.
Please refer to our paper for more details.
The model checkpoint should be loaded using transformers
library.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaLA-LM/lucky52-bloom-7b1-no-41")
model = AutoModelForCausalLM.from_pretrained("MaLA-LM/lucky52-bloom-7b1-no-41")
@inproceedings{ji2025lucky52,
title={How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM},
author={Shaoxiong Ji and Pinzhen Chen},
year={2025},
booktitle={Proceedings of COLING},
url={https://arxiv.org/abs/2404.04850},
}