DeepMind and OpenAI models solve mathematical problems at the level of the best students
- Eugene Li
- Jul 29
- 3 min read
For the first time, large language models have performed on a par with gold medalists at the International Mathematical Olympiad.
Google DeepMind announced on July 21 that its software had solved a set of mathematical problems at the level of the world's best high-school students, earning a gold-medal score at the International Mathematical Olympiad (IMO). At first glance, this marks only a slight improvement over last year: the company's system scored in the upper range of the silver-medal standard at the 2024 Olympiad, while this year it scored in the lower range of the human gold-medal standard.
But this year's scores hide a "big paradigm shift", says Thang Luong, a computer scientist at DeepMind in Mountain View, California. The company achieved its previous results using two artificial-intelligence (AI) tools, AlphaGeometry and AlphaProof, purpose-built to carry out rigorous logical steps in mathematical proofs. That process required human experts to first translate the problem statements into something similar to a programming language, and then to translate the AI's solutions back into English.
"This year, everything is natural language, end to end," says Luong. The team used a large language model (LLM) called DeepThink, which is based on its Gemini system but includes some additional developments that make it better and faster at producing mathematical arguments, such as running several chains of thought in parallel. "For a long time, I didn't think we could go that far with LLMs," Luong adds.
DeepThink scored 35 out of 42 points on the six problems given to this year's Olympiad contestants. Under an agreement with the organizers, the computer's solutions were graded by the same judges who marked the human participants.
Separately, OpenAI, the creator of ChatGPT based in San Francisco, California, said its own LLM had solved the same Olympiad problems at gold-medal level, but the company evaluated those solutions independently.
Impressive performance
For many years, AI researchers have tended to fall into one of two camps. Until 2012, the leading approach was to hand-code rules of logical reasoning into machines. Since then, neural networks, which learn automatically from huge troves of data, have made a series of sensational breakthroughs, and tools such as OpenAI's ChatGPT are now in widespread use.
Gary Marcus, a cognitive scientist at New York University (NYU) in New York City, called the DeepMind and OpenAI results "very impressive". Marcus is a proponent of the hand-coded-logic approach, also known as neurosymbolic AI, and a frequent critic of what he considers hype around LLMs. Nevertheless, writing on Substack with NYU computer scientist Ernest Davis, he commented that "being able to solve mathematical problems at the level of the top 67 high-school students in the world is really good mathematical problem-solving".
It remains to be seen whether LLMs' edge on IMO problems will last, or whether neurosymbolic AI will regain the lead. "At the moment, the two camps continue to develop," says Luong, who works on both approaches. "They could come together."
His team has already experimented with using LLMs to automate the translation of mathematical statements from natural language into the formal language that AlphaGeometry can read.
Systems such as AlphaProof also have the advantage that the correctness of their proofs can be verified automatically, whereas proofs written by an LLM must be checked by people, just as human-written mathematics is. Many mathematicians are working on translating human-written proofs into machine-readable languages so that computers can check their correctness.
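To give a flavour of what such a machine-readable proof looks like, here is a minimal sketch in the Lean proof assistant (one such formal language; the article does not name a specific system). A simple statement that a human would accept at a glance is written so that a computer can check the proof mechanically:

```lean
-- A trivially checkable theorem: adding zero to a natural number
-- changes nothing. Lean verifies the proof automatically; if the
-- proof were wrong, the file would fail to compile.
theorem add_zero_example (n : Nat) : n + 0 = n := by
  rfl
```

Formalizing real research-level proofs in this style is far harder, which is why the translation effort the article describes is a project in its own right.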
Ready for research?
Mathematician Kevin Buzzard of Imperial College London wrote on the chat platform Zulip that success at the Mathematical Olympiad does not necessarily mean a young mathematician is ready to conduct advanced research. It is an "open question", he added, whether these systems' performance will translate into an ability to tackle hard research problems.
Ken Ono, a mathematician at the University of Virginia in Charlottesville, agrees. "I see AIs as valuable research partners, providing quick access to the scientific literature and data summaries, as well as offering effective strategies for surprisingly complex problems," he says. But he adds that "these tests and benchmarks do not correspond to what theoretical mathematicians do".
DeepMind says it will later give some researchers access to a version of DeepThink. "Very soon, we could have an AI that collaborates on mathematics," says Luong.