## Overview

This dataset contains **60 advanced mathematical problems in English**, part of the multilingual AIME25 benchmark designed to evaluate reasoning capabilities in Large Language Models. Based on AIME-level competition mathematics, these problems are solved by fewer than 5% of top high school math competition participants. Each problem has 4 multiple-choice options with one correct answer and three plausible distractors.

## What It Tests

- **Mathematical Reasoning**: Multi-step problem-solving in number theory, geometry, algebra, combinatorics, and probability
- **Logical Inference**: Complex reasoning requiring creative approaches beyond formula application
- **Numerical Precision**: Exact computation with large numbers and complex calculations

## Who Should Contribute

- **Creators & Validators**: Mathematics competition coaches (AIME/IMO), professional mathematicians, PhD students, advanced math educators
- **Reviewers**: Math professors, Olympiad committee members, assessment specialists
- **Researchers**: Teams evaluating LLM mathematical reasoning and educational AI

## Why Experts?

AIME problems are solved by fewer than 5% of top math competition participants. Contributors need deep expertise to verify correctness, ensure clarity, validate plausible wrong answers, and maintain mathematical rigor.

## Use Cases

Benchmarking LLM mathematical reasoning, evaluating educational AI, multi-step inference research, and cross-lingual assessment (when combined with the Spanish and Chinese versions). A minimal scoring sketch follows.
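Since each item is multiple-choice with a single correct option, scoring reduces to exact-match accuracy. The sketch below shows one way to compute a per-model score under assumed field names (`question`, `choices`, `answer`) and an assumed `model_answers` mapping; these are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical item schema for illustration; the dataset's actual
# field names and answer encoding may differ.
@dataclass
class Item:
    question: str
    choices: list[str]   # 4 options: one correct, three distractors
    answer: str          # letter of the correct option, e.g. "B"

def accuracy(items: list[Item], model_answers: dict[str, str]) -> float:
    """Fraction of items where the model chose the correct option.

    `model_answers` maps each question to the option letter the
    model selected.
    """
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if model_answers.get(item.question) == item.answer
    )
    return correct / len(items)
```

A leaderboard entry such as 0.97 would then correspond to the model answering roughly 58 of the 60 prompts correctly under this exact-match scheme.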
| Metric | Value |
|---|---|
| Total Prompts | 60 |
| Scored Responses | 2,440 |
| Contributors | 9 |
| Average Overall Score | 48% |
| Rank | Model | Avg. Score | Prompts Tested | Avg. Response Time |
|---|---|---|---|---|
| 1 | x-ai/grok-4 | 0.97 | 60 | 402ms |
| 2 | google/gemini-3-pro-preview | 0.97 | 60 | 99ms |
| 3 | openai/gpt-5-codex | 0.94 | 60 | 292ms |
| 4 | google/gemini-2.5-pro | 0.90 | 60 | 216ms |
| 5 | x-ai/grok-4-fast | 0.90 | 60 | 148ms |