Join us to solve the problem of cheating on AI benchmarking

🎉 Our paper "Benchmarking is Broken - Don't Let AI be its Own Judge" has just been accepted to NeurIPS 2025 (Read on arXiv)

We will be publishing a series of follow-up papers on PeerBench and want to reward those who make them possible. Get your name in a paper by contributing to the community: create prompts, comment, and review.

PeerBench.ai is an open-source, non-profit community implementation of the NeurIPS paper, bringing the research to life.

General Review
Review general prompts and help improve AI benchmarking quality
Logical Puzzles - English
Review puzzles and contribute to the evaluation of logical reasoning in AI
AIME25 Mathematical Reasoning Benchmark - English
Review AIME25 mathematical reasoning prompts
Enhanced History Questions
Review history prompts that combine factual knowledge with reasoning and math skills
Polish Language Mix of Tasks
Review Polish language prompts: culture, language, history, geography, and more
Ukrainian Grammar
Review Ukrainian grammar prompts covering the updated rules of the language