## Overview

This dataset contains **60 advanced mathematical problems in English**, part of the multilingual AIME25 benchmark designed to evaluate reasoning capabilities in Large Language Models. Based on AIME-level competition mathematics, these problems are solved by fewer than 5% of top high school math competition participants. Each problem has 4 multiple-choice options with one correct answer and three plausible distractors.

## What It Tests

- **Mathematical Reasoning**: Multi-step problem-solving in number theory, geometry, algebra, combinatorics, and probability
- **Logical Inference**: Complex reasoning requiring creative approaches beyond formula application
- **Numerical Precision**: Exact computation with large numbers and complex calculations

## Who Should Contribute

- **Creators & Validators**: Mathematics competition coaches (AIME/IMO), professional mathematicians, PhD students, advanced math educators
- **Reviewers**: Math professors, Olympiad committee members, assessment specialists
- **Researchers**: Teams evaluating LLM mathematical reasoning and educational AI

## Why Experts?

AIME problems are solved by <5% of top math competition participants. Contributors need deep expertise to verify correctness, ensure clarity, validate plausible wrong answers, and maintain mathematical rigor.

## Use Cases

Benchmarking LLM mathematical reasoning, evaluating educational AI, multi-step inference research, and cross-lingual assessment (when combined with Spanish and Chinese versions).

- **Total Prompts**: 60
- **Scored Responses**: 2440
- **Contributors**: 5
- **Avg Score**: 0.5

| Rank | Model | Avg. Score | Prompts Tested | Avg. Response Time |
|---|---|---|---|---|
| 🥇 1 | x-ai/grok-4 | 0.97 | 60 | 402ms |
| 🥈 2 | google/gemini-3-pro-preview | 0.97 | 60 | 99ms |
| 🥉 3 | openai/gpt-5-codex | 0.94 | 60 | 292ms |
| 4 | google/gemini-2.5-pro | 0.90 | 60 | 216ms |
| 5 | x-ai/grok-4-fast | 0.90 | 60 | 148ms |
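
The page does not say how the average scores in the leaderboard are computed. As a rough sketch only, assuming each scored response is a `(model, prompt_id, score)` record with a score between 0 and 1, a per-model average and ranking could be aggregated as follows. The function name `rank_models` and the record layout are hypothetical and are not taken from the benchmark.

```python
from collections import defaultdict


def rank_models(scored_responses):
    """Aggregate (model, prompt_id, score) records into ranked rows.

    Assumed, hypothetical record layout: score is a float in [0, 1].
    Returns a list of (model, average_score, prompts_tested) tuples,
    sorted by average score (highest first), then by model name.
    """
    totals = defaultdict(float)   # sum of scores per model
    counts = defaultdict(int)     # number of scored responses per model
    prompts = defaultdict(set)    # distinct prompts seen per model

    for model, prompt_id, score in scored_responses:
        totals[model] += score
        counts[model] += 1
        prompts[model].add(prompt_id)

    rows = [
        (model, totals[model] / counts[model], len(prompts[model]))
        for model in totals
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Toy data for illustration only; not real benchmark results.
    demo = [
        ("model-a", 1, 1.0),
        ("model-a", 2, 0.0),
        ("model-b", 1, 1.0),
        ("model-b", 2, 1.0),
    ]
    for rank, (model, avg, tested) in enumerate(rank_models(demo), start=1):
        print(f"{rank} | {model} | {avg:.2f} | {tested}")
```

Ties on average score, such as the two models at 0.97 above, need a tie-breaking rule; this sketch falls back to alphabetical order by model name, which is an arbitrary choice and may differ from how the leaderboard orders them.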

| Contributor | Role | Affiliation | ID |
|---|---|---|---|
| Zerui | Owner | Princeton University | 75325312-2baf-496c-8400-64890cb7e397 |
| Mikołaj Glinka | Admin | University of Warsaw | 13be1550-1ff0-4bf6-b3c8-679b858b35e1 |
| mis***@gmail.com | Admin | | f51d0200-afed-4994-8e46-9f48cef6610a |
| mik***@forest-ai.org | Contributor | Warsaw University of Technology | 28c9e78b-a9a0-4e42-92b2-c81bf439a762 |
| Muhammed Karamuk | Contributor | Istanbul University | 5a67a84a-95ab-4091-b675-996440b596d3 |