{"notes":[{"content":{"title":{"value":"Proving Test Set Contamination in Black-Box Language Models"},"authors":{"value":["Yonatan Oren","Nicole Meister","Niladri S. Chatterji","Faisal Ladhak","Tatsunori Hashimoto"]},"authorids":{"value":["~Yonatan_Oren1","~Nicole_Meister1","~Niladri_S._Chatterji1","~Faisal_Ladhak2","~Tatsunori_Hashimoto1"]},"keywords":{"value":["language modeling","memorization","dataset contamination"]},"abstract":{"value":"Large language models are trained on vast amounts of internet data, prompting concerns that they have memorized public benchmarks. Detecting this type of contamination is challenging because the pretraining data used by proprietary models are often not publicly accessible.\n\nWe propose a procedure for detecting test set contamination of language models with exact false positive guarantees and without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples.\n\nWe demonstrate that our procedure is sensitive enough to reliably detect contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets only 1000 examples, and datasets that appear only a few times in the pretraining corpus. Finally, we evaluate LLaMA-2 to apply our test in a realistic setting and find our results to be consistent with existing contamination evaluations."},"primary_area":{"value":"societal considerations including fairness, safety, privacy"},"code_of_ethics":{"value":"I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics."},"submission_guidelines":{"value":"I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide."},"anonymous_url":{"value":"I certify that there is no URL (e.g., github page) that could be used to find authors' identity."},"no_acknowledgement_section":{"value":"I certify that there is no acknowledgement section in this submission for double blind review."},"venue":{"value":"ICLR 2024 oral"},"venueid":{"value":"ICLR.cc/2024/Conference"},"pdf":{"value":"/pdf/cfd79aaab7bdcd4f7c032c57fe7e607058042c80.pdf"},"supplementary_material":{"value":"/attachment/aa53d1c5e16ec98e4af4f92f0eef6c0e5dfe7646.zip"},"_bibtex":{"value":"@inproceedings{\noren2024proving,\ntitle={Proving Test Set Contamination in Black-Box Language Models},\nauthor={Yonatan Oren and Nicole Meister and Niladri S. 
Chatterji and Faisal Ladhak and Tatsunori Hashimoto},\nbooktitle={The Twelfth International Conference on Learning Representations},\nyear={2024},\nurl={https://openreview.net/forum?id=KS8mIvetg2}\n}"},"paperhash":{"value":"oren|proving_test_set_contamination_in_blackbox_language_models"}},"id":"KS8mIvetg2","forum":"KS8mIvetg2","signatures":["ICLR.cc/2024/Conference/Submission9019/Authors"],"readers":["everyone"],"writers":["ICLR.cc/2024/Conference","ICLR.cc/2024/Conference/Submission9019/Authors"],"number":9019,"odate":1697213872796,"invitations":["ICLR.cc/2024/Conference/-/Submission","ICLR.cc/2024/Conference/-/Post_Submission","ICLR.cc/2024/Conference/Submission9019/-/Revision","ICLR.cc/2024/Conference/Submission9019/-/Rebuttal_Revision","ICLR.cc/2024/Conference/-/Edit","ICLR.cc/2024/Conference/Submission9019/-/Camera_Ready_Revision"],"domain":"ICLR.cc/2024/Conference","tcdate":1695534774121,"cdate":1695534774121,"tmdate":1713672762124,"mdate":1713672762124,"pdate":1705411055804,"version":2,"details":{"writable":false}}],"count":1}
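The abstract above describes a permutation-style test: under exchangeability, every ordering of the benchmark is equally likely, so the canonical ordering's log-likelihood should not be an outlier among shuffled orderings. The following is a minimal Python sketch of that basic idea, not the paper's full procedure (the paper uses a more powerful sharded likelihood comparison, omitted here). The names `contamination_p_value`, `log_likelihood`, `model_logprob`, and `benchmark_examples` are hypothetical; `log_likelihood` is assumed to be a callable returning the model's log-probability of the examples presented in a given order (e.g., as one concatenated string).

```python
import random

def contamination_p_value(log_likelihood, examples, num_permutations=99, seed=0):
    """Permutation-test sketch (hypothetical helper, not the paper's exact code).

    Under no contamination, an exchangeable benchmark makes the canonical
    ordering's log-likelihood rank uniformly among shuffled orderings, so
    the returned p-value has an exact false-positive guarantee.
    """
    rng = random.Random(seed)
    canonical = log_likelihood(examples)  # score of the canonical ordering
    shuffled_scores = []
    for _ in range(num_permutations):
        perm = list(examples)
        rng.shuffle(perm)
        shuffled_scores.append(log_likelihood(perm))
    # Rank of the canonical score among all num_permutations + 1 orderings;
    # a small p-value means the canonical order is suspiciously likely.
    rank = 1 + sum(score >= canonical for score in shuffled_scores)
    return rank / (num_permutations + 1)

# Hypothetical usage, assuming `model_logprob` wraps a black-box API that
# returns log p(text) under the model being audited:
#   p = contamination_p_value(
#       lambda exs: model_logprob("\n\n".join(exs)), benchmark_examples)
#   # p < 0.05 would flag potential test set contamination.
```

The exactness of the guarantee comes purely from exchangeability: no assumption about the model is needed, since a clean model scores the canonical ordering no differently from any shuffle.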