Accessing examples used for n-shot evals
#26 opened about 1 month ago by akritivij
Certain models perhaps clogging up the leaderboard?, Check logs?
1
#25 opened 6 months ago by CombinHorizon
How are Faithfulness and Factuality calculated?
2
#22 opened 8 months ago by UjjwalP
How could #parameter of a model be 0?
2
#20 opened 9 months ago by zhiminy
Why is the score for RACE so low?
1
#18 opened 9 months ago by scinerd68
Adding German Faithfulness Detection Task
1
#16 opened 9 months ago by mtc
Adding SummEdits to leaderboard?
1
#12 opened 10 months ago by philippelaban
Adding tasks from the USB benchmark (for summarization)
1
#11 opened 10 months ago by kundank
Adding the Snowball Hallucination detection datasets
#9 opened 10 months ago by ofirpress
Longform QA
2
#8 opened 10 months ago by shehzaadzd
Metrics for hallucination detection for summarization.
4
#6 opened 10 months ago by rohitsaxena
Hello all!
#5 opened 10 months ago by pminervini