GPT-3.5 HumanEval_R CodeForces2305 contamination based on https://arxiv.org/abs/2402.15938 42e416f verified suryanshs16103 committed on May 26
Add reports from Benchmarking paper "Benchmark Leakage in Large Language Models" (#27) 25633c4 verified OSainz SinclairWang committed on May 24
Add Reports Based on "Llemma: An Open Language Model For Mathematics" (#23) 9fba4d8 verified OSainz wlchen committed on May 13
Add Aquila model series which have gsm8k test set contamination (#21) 8f6a7cc verified OSainz bpHigh committed on May 6
GPT-3.5 Spider contamination based on https://arxiv.org/pdf/2402.08100 (#18) dc4c3f8 verified OSainz bpHigh committed on May 6
Superglue/RealNews Contamination based on "Noise-Robust De-Duplication at Scale" (#15) 888fb82 verified OSainz emilys committed on Apr 29
Mistral 7B Arc Easy Contamination based on "Proving Test Set Contamination in Black Box Language Models" (#14) 4f71313 verified OSainz AmeyaPrabhu committed on Apr 29
Added Contamination Evidence from GPT4 Tech Report using String matching on GPT-4 (#11) f82db5d verified OSainz AmeyaPrabhu committed on Apr 29
GPT-3.5Turbo HumanEval Contamination based on "Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models" (#16) 6b722ae verified OSainz jupyter31 committed on Apr 29
Added Contamination Evidence on MMLU of ChatGPT/GPT4 from "Investigating data contamination in modern benchmarks for large language models" (#10) f5daf9b verified OSainz AmeyaPrabhu committed on Apr 29
Added Contamination Info on Old Models: GPT3, FLAN, GLaM, PaLM, PaLM 2 (#13) c4acbf6 verified OSainz AmeyaPrabhu committed on Apr 25
Contamination results based on "Data Contamination Quiz" (#9) 36aaa79 verified OSainz shahriargolchin committed on Apr 25
Code contamination in HumanEval and MBPP (#12) ffb0d75 verified OSainz AmeyaPrabhu committed on Apr 25
Add model-based results for MedNLI, RadNLI for GPT-3.5 and GPT-4 (#8) d57b460 verified Iker j-chim committed on Apr 23
Add data from "An Open-Source Data Contamination Report for Large Language Models" (#5) 619ed3b verified Iker vishaal27 committed on Apr 23
Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (#6) 935e79b verified Iker vishaal27 committed on Apr 18