tweetSentBR is missing from the Hub
Hello eduagarcia!

I tried running your fork on all benchmarks (congratulations on putting this together!), but I got a warning saying that eduagarcia/tweetsentbr is not available on the Hub. If you want to reproduce this error, just run this cell on Colab:
!git clone --branch main https://github.com/eduagarcia/lm-evaluation-harness-pt.git
!cd lm-evaluation-harness-pt && pip install -e . -q
!pip install cohere tiktoken sentencepiece -q
!cd lm-evaluation-harness-pt && python lm_eval \
--model huggingface \
--model_args pretrained="nicholasKluge/TeenyTinyLlama-160m",revision="main" \
--tasks enem_challenge,bluex,oab_exams,assin2_rte,assin2_sts,faquad_nli,hatebr_offensive,portuguese_hate_speech,tweetsentbr \
--device cuda:0 \
--output_path "./"
Could you re-upload the tweetsentbr dataset? Or have you removed it from this evaluation harness?
The tweetsentbr dataset is currently private. The authors have not made the text of the tweets publicly available due to Twitter/X policies. More information is available in their Git repository: https://bitbucket.org/HBrum/tweetsentbr
To access the complete dataset, you have two options:
- Recreate the dataset from the tweet IDs in the Git repository using the paid Twitter/X API.
- Ask the original authors for a copy of the dataset (contact information is available in the original Git repository).
Once you have the text of the tweets, you need to create a Hugging Face dataset in the format expected by the task.
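Here is a minimal sketch of how such a dataset could be built and pushed to the Hub with the datasets library. The file names, splits, and the "text"/"label" columns below are assumptions for illustration; check the task config in my fork for the exact fields it expects.

from datasets import Dataset, DatasetDict
import pandas as pd

# Hypothetical CSV files holding the recovered tweet texts and their labels;
# the "text" and "label" column names are assumptions, not the task's actual schema.
train_df = pd.read_csv("tweetsentbr_train.csv")
test_df = pd.read_csv("tweetsentbr_test.csv")

dataset = DatasetDict({
    "train": Dataset.from_pandas(train_df),
    "test": Dataset.from_pandas(test_df),
})

# Upload to your own namespace on the Hub (keep it private to respect the
# original Twitter/X restrictions), e.g. "your-username/tweetsentbr".
dataset.push_to_hub("your-username/tweetsentbr", private=True)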
Then you just need to update the dataset_path field in the YAML configuration file in my lm-evaluation-harness fork so it points to your copy of the dataset: https://github.com/eduagarcia/lm-evaluation-harness-pt/blob/main/lm_eval/tasks/portuguese/tweetsentbr.yaml
I'm sorry about the hassle with the private dataset. The reason for including tweetsentbr is that we wanted to compare our work with others, like the Poeta benchmark from the Sabiá model, but I get that it can be annoying. Just so you know, the benchmark rankings remain largely consistent when averaging the scores of the other 8 tasks, so you can evaluate your models without it and still get good feedback.
Right now, the submission queue is jammed because we are reevaluating all models on a new version of the benchmark; however, we are prioritizing new submissions. While it may take some time, feel free to submit to the leaderboard.
Thank you for the fast reply and clarifications! I understand how these things are, so no problem.
I already forked the harness and ran the evaluations on my own (but I'm looking forward to having the results of my models on the leaderboard!). For now, I simply did not run the tweetsentbr task, but everything else worked perfectly.
Thanks again for this. It is great that we have a standard benchmark for Portuguese LLMs now!
Thank you! Based on the current queue rate, your models should be evaluated sometime tomorrow.
I've added a new feature to the leaderboard: if you hide a column corresponding to a dataset, the averages and rankings now update accordingly. This should help with offline comparisons.
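For anyone doing that comparison offline, here is a rough sketch of the same computation with pandas; the model names and scores are made up for illustration, not real leaderboard numbers.

import pandas as pd

# Made-up scores for two models on a few of the benchmark tasks.
scores = pd.DataFrame(
    {
        "enem_challenge": [0.45, 0.30],
        "assin2_rte": [0.88, 0.71],
        "tweetsentbr": [0.62, 0.58],
    },
    index=["model-a", "model-b"],
)

# Hide tweetsentbr: the average and ranking are recomputed from the remaining tasks.
visible = scores.drop(columns=["tweetsentbr"])
summary = visible.mean(axis=1).to_frame("average")
summary["rank"] = summary["average"].rank(ascending=False).astype(int)
print(summary.sort_values("rank"))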
This is actually quite useful; thank you for the work!