Update contamination_report.csv
What are you reporting:
- Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
Evaluation dataset(s): openai_humaneval
Contaminated model(s): gpt-3.5-turbo-1106, gpt-3.5-turbo-0613
Contaminated split(s): 41.47%, 23.79%
Briefly describe your method to detect data contamination
- Model-based approach
Model-based approaches
The cited paper reports that ChatGPT shows high contamination levels when tested on the HumanEval dataset. This is evident from its high Average Peak and Leak Ratio values, especially in comparison with the clean CodeForces2305 dataset, where ChatGPT's performance drops. The paper's TED method is reported to be effective at identifying and mitigating this contamination. The values can be verified in Table 5 of the cited paper.
Citation
Is there a paper that reports the data contamination or describes the method used to detect data contamination?
URL: https://arxiv.org/pdf/2402.15938
Citation: @misc{dong2024generalization, title={Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models}, author={Yihong Dong and Xue Jiang and Huanyu Liu and Zhi Jin and Ge Li}, year={2024}, eprint={2402.15938}, archivePrefix={arXiv}, primaryClass={cs.CL} }
Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Suryansh Sharma
- Institution: Indian Institute of Technology Kharagpur
- Email: suryansh.s@kgpian.iitkgp.ac.in
Hi @suryanshs16103,
The evidence you are trying to add is already in the database. Please check this PR.
Before creating a new PR, please check whether the evidence you want to add is already in the database. I will close this PR.
Best,
Oscar
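For future submissions, the duplicate check requested above could be scripted roughly as follows. This is only a minimal sketch: the column names `dataset` and `model`, the comma delimiter, and the helper `find_existing_evidence` are all assumptions made for illustration and may not match the actual layout of contamination_report.csv, so adjust them to the real header before use. The sample rows simply mirror the values reported in this PR.

```python
import csv
import io

# Illustrative rows only; the real contamination_report.csv may use
# different column names and a different delimiter (assumption).
SAMPLE = """dataset,model,contaminated_pct
openai_humaneval,gpt-3.5-turbo-1106,41.47
openai_humaneval,gpt-3.5-turbo-0613,23.79
"""


def find_existing_evidence(csv_text, dataset, model,
                           dataset_col="dataset", model_col="model",
                           delimiter=","):
    """Return all rows whose dataset and model match (case-insensitive).

    The column names are hypothetical defaults; pass the real ones
    from the report's header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    matches = []
    for row in reader:
        row_dataset = (row.get(dataset_col) or "").strip().lower()
        row_model = (row.get(model_col) or "").strip().lower()
        if row_dataset == dataset.lower() and row_model == model.lower():
            matches.append(row)
    return matches


if __name__ == "__main__":
    hits = find_existing_evidence(SAMPLE, "openai_humaneval", "gpt-3.5-turbo-1106")
    if hits:
        print("Evidence already present; no new PR needed.")
    else:
        print("No matching entry found.")
```

To run it against the real file, read the file's text with `open(path, encoding="utf-8").read()` and pass it in place of `SAMPLE`.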