An artificial intelligence system developed by Google DeepMind has achieved gold medal-level performance at the International Mathematical Olympiad, marking a significant milestone in machine reasoning and mathematical problem-solving capabilities.
An advanced version of Google’s Gemini model with Deep Think solved five of the six problems at the 2025 International Mathematical Olympiad, earning 35 of a possible 42 points and meeting the gold medal threshold. The achievement represents a dramatic leap from last year, when Google’s combined AI systems solved four problems and earned only a silver medal.
“We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points — a gold medal score,” said IMO President Prof. Dr. Gregor Dolinar. “Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow.”
The International Mathematical Olympiad stands as the world’s most prestigious competition for young mathematicians, held annually since 1959. Each participating country sends up to six elite pre-university students who compete to solve six exceptionally difficult problems across algebra, combinatorics, geometry and number theory. Only about 8 percent of contestants typically receive gold medals, making the achievement particularly noteworthy.
Racing Against OpenAI
Google’s announcement came just days after OpenAI revealed that its experimental reasoning model had also achieved gold medal-level performance on the same 2025 IMO problems. However, the two companies took different approaches to their achievements, highlighting the competitive intensity in AI development.
While Google officially entered its system in the competition and received formal certification from IMO organizers, OpenAI evaluated its model independently on the publicly available problems. OpenAI researcher Alexander Wei announced their results first over the weekend, before official IMO results were released, drawing some criticism for potentially overshadowing the student competitors.
“Google waited for the IMO to officially certify the competition results rather than release its results over the weekend out of respect for the students in the competition,” said Thang Luong, who led Google’s technical direction for the IMO effort.
Both AI systems solved five of the six problems and earned identical scores of 35 points, showing that multiple research teams have now reached this benchmark. Only 67 of the 630 human contestants this year achieved gold medal status, placing the AI performance roughly within the top 10 percent of participants.
Technical Breakthrough
What makes this year’s achievement particularly impressive is the fundamental shift in how the AI approaches mathematical problems. Unlike Google’s previous systems, which required experts to translate problems from natural language into formal proof languages such as Lean, the new Gemini model operates entirely in natural language.
“AlphaGeometry and AlphaProof required experts to first translate problems from natural language into domain-specific languages, such as Lean, and vice-versa for the proofs. It also took two to three days of computation,” Google researchers explained. “Our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions — all within the 4.5-hour competition time limit.”
This represents a significant advance in making AI mathematical reasoning more accessible and practical. The system can now read standard problem statements and produce complete mathematical proofs without requiring specialized formatting or extended computation time.
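To make the contrast concrete, below is a purely hypothetical illustration, not an actual IMO problem, of the kind of formal statement that earlier systems such as AlphaProof worked with: even a simple fact must first be encoded in Lean’s syntax before the system can prove it, whereas the new model reads the English problem statement and writes its proof in ordinary mathematical prose.

```lean
-- Hypothetical illustration only: a toy statement formalized in Lean 4.
-- AlphaProof-style systems required problems to be translated into this
-- kind of formal language; the new Gemini model reads the English statement
-- directly and answers in natural-language prose.
theorem toy_example (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n   -- the proof is a single appeal to commutativity
```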
General Purpose vs. Specialized Systems
The breakthrough carries broader implications beyond mathematics competitions. Both Google and OpenAI emphasized that their systems represent advances in general-purpose reasoning rather than narrow, task-specific programming.
“Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,” said OpenAI’s Alexander Wei.
This contrasts sharply with previous AI achievements in games like Go, chess, or poker, where researchers spent years developing systems that excelled in one narrow domain. The IMO-capable models are built on the same general-purpose language models used for everyday tasks like writing and coding.
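Neither company has published details of its inference pipeline, but “test-time compute scaling” generally means spending more computation per problem at answer time, for instance by sampling many candidate solutions and keeping the most promising one. The sketch below is a minimal, hypothetical illustration of that idea; the `generate_candidate` and `score` functions are placeholders, not any real API from Google or OpenAI.

```python
import random

# Hypothetical sketch of test-time compute scaling via best-of-n sampling.
# The functions below are placeholders standing in for a language model
# and a grader; they do not represent either company's actual system.

def generate_candidate(problem: str, seed: int) -> str:
    """Placeholder: one sampled proof attempt for the given problem."""
    random.seed(seed)
    return f"candidate proof #{seed} for: {problem}"

def score(candidate: str) -> float:
    """Placeholder: a self-evaluation or verifier score in [0, 1]."""
    return random.random()

def solve(problem: str, n_samples: int = 8) -> str:
    """Spend more compute by sampling many attempts and keeping the best one."""
    candidates = [generate_candidate(problem, seed) for seed in range(n_samples)]
    return max(candidates, key=score)

if __name__ == "__main__":
    print(solve("Toy problem: show that the sum of two even numbers is even."))
```

The design point is simply that extra inference-time sampling and selection, rather than task-specific engineering, is one way a general-purpose model can be pushed toward harder problems.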
“Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre-college prodigies,” noted OpenAI researcher Sebastien Bubeck.
Competition Context
IMO problems stump all but a handful of human solvers and require sustained creative thinking over periods that can stretch to hours. They demand not just computational skill but genuine mathematical insight and the ability to construct rigorous logical arguments across multiple pages of reasoning.
This year’s competition was held in Australia, with problems covering advanced topics in algebra, geometry, number theory and combinatorics. The questions are designed to challenge students who have often trained for thousands of hours and frequently go on to become professional mathematicians.
Prof. Sir Timothy Gowers, an IMO gold medalist and Fields Medal winner who helped evaluate Google’s solutions last year, noted the significance of the achievement. “The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art,” he said.
Looking Forward
The rapid progress in mathematical reasoning capabilities suggests that AI systems may soon become valuable tools for professional mathematicians and scientists. Google envisions “agents that combine natural language fluency with rigorous reasoning” becoming “invaluable tools for mathematicians, scientists, engineers, and researchers, helping us advance human knowledge on the path to AGI.”
However, both companies were careful to emphasize that these achievements should complement rather than replace human mathematical talent. The IMO exists to promote the “beauty of mathematics” to high school students and encourage them to pursue careers in the field.
“Our leap from silver to gold medal-standard in just one year shows a remarkable pace of progress in AI,” Google noted, while acknowledging that the real purpose of mathematics competitions remains inspiring the next generation of human mathematicians.
The timing and competitive nature of the announcements underscore how rapidly AI capabilities are advancing and how intensely companies are competing to demonstrate their technological leadership in reasoning and mathematical problem-solving.
As AI systems continue to approach and surpass human-level performance on increasingly complex cognitive tasks, the mathematical olympiad achievement represents another significant milestone on the broader path toward artificial general intelligence. Yet questions remain about when and how these reasoning capabilities will reach researchers and the broader public: both companies have indicated that their gold medal systems remain experimental and are not expected to be publicly available for several months.
