MMLU
Just curious, does the training dataset include the MMLU dataset that is used during evaluation? If so, is it still fair to use MMLU as a metric to evaluate this model? Compared with other 13B (or even 30B) models, its MMLU score is very high.
Hi,
Yes, our model was trained on MMLU*, as we stated in the readme description and in the results table we provided. This model was originally meant to be private, but the company decided to release it as well. Our main goal was improving the model's performance in Polish, and our internal evaluation didn't include any of the leaderboard datasets, as we focused on testing the model in Polish. We originally planned to release a model trained without MMLU, but that training is still in progress. We will release it as soon as it finishes, though that will take another few weeks. Our 7B model's results were obtained without MMLU in the training set.
We never planned to evaluate trurl-13b on MMLU in the leaderboard; unfortunately, anyone can submit a model for evaluation.
Since it is already on the leaderboard, on Monday we will send a request to the HF leaderboard team to remove our 13B model from the MMLU ranking.
I hope this answers your question.
*It was only a part of MMLU, modified to plain text (no ABCD answer choices), but MMLU data was still in the training set.
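To illustrate what "modified to only text (no ABCD answers)" could mean, here is a minimal sketch of turning a multiple-choice MMLU item into a plain question/answer pair. The item structure and the `mmlu_to_text` helper are assumptions for illustration, not VoiceLab's actual preprocessing pipeline:

```python
# Hypothetical sketch: convert one MMLU multiple-choice item into plain
# text by dropping the A/B/C/D labels and keeping only the question and
# the text of the correct answer. Not the authors' actual code.

def mmlu_to_text(question: str, choices: list[str], answer_index: int) -> str:
    """Return a plain-text question/answer pair for one MMLU item."""
    return f"Question: {question}\nAnswer: {choices[answer_index]}"

item = {
    "question": "Which planet is closest to the Sun?",
    "choices": ["Venus", "Mercury", "Earth", "Mars"],
    "answer": 1,  # index of the correct choice
}

print(mmlu_to_text(item["question"], item["choices"], item["answer"]))
```

Even in this stripped form, the questions and answers themselves overlap with the evaluation set, which is why the model's MMLU score is inflated.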
Hello,
Just a quick update. Our model has been excluded from the leaderboard, and we asked whether it would be possible to add a new parameter to the model card that would block submissions of the model to the leaderboard, so a similar situation (a third party submitting it again) won't happen.
We will release the model trained without MMLU when training ends, and that one will be submitted to the leaderboard by us. However, we expect its results will not be much higher than Llama 2's (if not slightly lower), as the model was trained to perform better in Polish, especially on business-related tasks, and no effort was made toward beating other models on HF benchmarks.
Cheers
Just a quick update. The new Trurl is available here: https://huggingface.co/Voicelab/trurl-2-13b-academic