Answers include:
- Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
- PaLM: Scaling Language Modeling with Pathways (see its evaluation section)
- Are Multilingual Models the Best Choice for Moderately Under-Resourced Languages? A Comprehensive Assessment for Catalan
- Data Contamination: From Memorization to Exploitation
- Quantifying Memorization Across Neural Language Models
- XCODEEVAL: An Execution-based Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
...but tbh I haven't read any of these.