Accuracy on Spider
#7, opened by dalematthews9
I wrote code to generate predictions for the 1000 questions in this file:
https://github.com/taoyds/spider/blob/master/evaluation_examples/dev.sql
Then I used the following Python script to evaluate execution accuracy:
https://github.com/taoyds/spider/blob/master/evaluation.py
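To make clear what I mean by execution accuracy, here is a minimal sketch of the idea. It is not the Spider script itself, and its matching logic may differ from what `evaluation.py` actually does. The function names (`run_query`, `execution_accuracy`, `demo`) and the toy `singer` table are hypothetical, made up for illustration. The assumption is that a prediction counts as correct when it runs without error and returns the same rows as the gold query, ignoring row order.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match.

    Assumption: a failed predicted query always counts as wrong.
    """
    correct = 0
    total = 0
    for predicted_sql, gold_sql in pairs:
        total += 1
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if predicted_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / total if total else 0.0


def demo():
    """Toy example with a hypothetical table, not real Spider data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "A"), (2, "B")])
    pairs = [
        # Different SQL text, same result: counted as correct.
        ("SELECT count(*) FROM singer", "SELECT COUNT(id) FROM singer"),
        # Predicted query references a missing table: counted as wrong.
        ("SELECT name FROM singers", "SELECT name FROM singer"),
    ]
    return execution_accuracy(conn, pairs)


if __name__ == "__main__":
    print(demo())
```

Under this definition, a prediction that fails to execute at all (for example, because of a malformed query or a wrong table name) scores zero, so formatting problems in generated SQL lower the number just as much as logically wrong queries do.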
NSQL-350m gets 22% execution accuracy on easy questions, which is well below the 51% accuracy on Spider reported in the blog post below.
https://www.numbersstation.ai/post/introducing-nsql-open-source-sql-copilot-foundation-models
What is the reason for this discrepancy?