Generative AI has a reputation for struggling with math, often making mistakes even in elementary arithmetic. However, Google DeepMind recently announced that its AI achieved a score equivalent to a silver medal at the International Mathematical Olympiad (IMO)(1). Based on that announcement, let's try to predict the future of next-generation generative AI.

1. How Did AI Solve Complex Math Problems?

The achievement is impressive:

“Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.”

This is an amazing score: 28 out of 42 points, one point short of the gold-medal threshold. Of the two systems, we'll focus on AlphaProof, the reasoning system.

AlphaProof is explained as follows:

“AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go.”

In simple terms, abundant data is available for math problems written in natural language, but generative AI tends to make plausible yet incorrect statements (hallucinations), which makes that data hard to use reliably. Therefore, Google used its generative AI, Gemini, to translate math problems into the formal language Lean, where every proof step can be checked mechanically. These formal statements were then handed to the AlphaZero algorithm, known for its long-term planning and reasoning capabilities, which searched for proofs. The chart below provides a clear illustration.

AlphaZero has already proven its reasoning prowess in board games like Go. This achievement demonstrates the successful application of its capabilities to the realm of mathematics. Remarkable!
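For readers who have never seen Lean, the snippet below gives a feel for what a "formal statement" looks like. It is only a minimal illustration written for this article in Lean 4, not something taken from AlphaProof, and the theorem name is made up. The point is that the statement and its proof are code, so the Lean checker either accepts the proof or rejects it, leaving no room for a plausible but wrong answer.

```lean
-- Formal statement: addition of natural numbers is commutative.
-- The proof simply cites a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, proved by direct computation.
example : 2 + 2 = 4 := rfl
```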

2. Implications from AlphaZero

Let's briefly revisit AlphaZero, which reappears here. It is a groundbreaking AI that combines RL (Reinforcement Learning) with MCTS (Monte Carlo Tree Search). Its predecessor, AlphaGo, gained fame in March 2016 as the first AI to defeat a top professional Go player. AlphaZero went further: it achieved superhuman ability without relying on human-created data, training itself entirely on self-generated data. Upon hearing this for the first time, many might wonder, "How is that even possible?" AlphaZero accomplishes this through self-play, generating massive amounts of training data by playing against itself. Refer to the research paper(2) for more details.
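To make "MCTS plus self-play" a little more concrete, here is a minimal Python sketch on a toy game. Everything in it is an assumption chosen for illustration: the game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), the function names, and the parameters. It also swaps in plain UCT, meaning the UCB1 selection rule with random rollouts, where AlphaZero uses a neural network to guide the search, so treat it as a rough picture of the idea rather than DeepMind's method.

```python
import math
import random

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


class Node:
    def __init__(self, stones):
        self.stones = stones   # stones left for the player about to move
        self.children = {}     # move -> Node
        self.visits = 0
        self.wins = 0.0        # wins for the player who moved INTO this node


def rollout(stones, rng):
    """Play random moves to the end; 1.0 if the player moving first wins."""
    first_player_turn = True
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1.0 if first_player_turn else 0.0
        first_player_turn = not first_player_turn


def simulate(node, rng, c=1.4):
    """One MCTS iteration; returns the result for the player to move at node."""
    if node.stones == 0:
        result = 0.0                      # the opponent just took the last stone
    elif node.visits == 0:
        result = rollout(node.stones, rng)
    else:
        untried = [m for m in legal_moves(node.stones) if m not in node.children]
        if untried:                       # expansion: try a new move
            move = rng.choice(untried)
            child = node.children[move] = Node(node.stones - move)
        else:                             # selection: UCB1 over known moves
            child = max(
                node.children.values(),
                key=lambda ch: ch.wins / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
        result = 1.0 - simulate(child, rng, c)
    node.visits += 1
    node.wins += 1.0 - result             # credit goes to the previous mover
    return result


def mcts_move(stones, rng, iterations=500):
    """Search from the given position (needs iterations >= 2) and pick a move."""
    root = Node(stones)
    for _ in range(iterations):
        simulate(root, rng)
    return max(root.children, key=lambda m: root.children[m].visits)


def self_play_game(start_stones, rng, iterations=500):
    """Play one game against itself; return (stones, move, outcome) records,
    where outcome is +1 if the player who made that move went on to win."""
    history = []
    stones, player = start_stones, 0
    while stones > 0:
        move = mcts_move(stones, rng, iterations)
        history.append((player, stones, move))
        stones -= move
        player = 1 - player
    winner = history[-1][0]               # whoever took the last stone
    return [(s, m, 1 if p == winner else -1) for p, s, m in history]


if __name__ == "__main__":
    for record in self_play_game(10, random.Random(0)):
        print(record)
```

Each record pairs a position and a move with the eventual result of the game. In a full system, records like these would be the self-generated training data used to improve the model that guides the next round of search.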

3. The Fusion of Current Generative AI and AlphaGo

Interestingly, Demis Hassabis, CEO of Google DeepMind, recently hinted at the future of their generative AI(3). The key takeaways are:

• Gemini is a natively multimodal model.
• It can understand various aspects of the world, including language, images, videos, and audio.
• Current models are incapable of long-term planning and problem-solving.
• DeepMind possesses expertise in this field through AlphaGo.
• The next-generation model will be an agent that fuses Gemini and AlphaGo.

It's plausible to view the project that reached silver-medal level at the Math Olympiad as a step towards overcoming the limitations of generative AI in "long-term planning." However, one might ask, "How exactly will this fusion work?" A prominent long-form paper(4) published in June 2024 provides clues.

A look back at AlphaGo—the first AI system that beat the world champions at Go, decades before it was thought possible—is useful here

• In step 1, AlphaGo was trained by imitation learning on expert human Go games. This gave it a foundation.

• In step 2, AlphaGo played millions of games against itself. This let it become superhuman at Go: remember the famous move 37 in the game against Lee Sedol, an extremely unusual but brilliant move a human would never have played.

Developing the equivalent of step 2 for LLMs is a key research problem for overcoming the data wall (and, moreover, will ultimately be the key to surpassing human-level intelligence).

AlphaGo eventually transitioned to self-play, generating its own training data and eliminating the need for human input. This remarkable result was made possible by the combination of "Reinforcement Learning and MCTS." The future of next-generation AI hinges on how generative AI can be trained using this mechanism.
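The two steps described above, imitation first and self-play second, can be sketched in a few lines of Python. This is a toy under loudly stated assumptions: the game is a simple stone-taking game (remove 1 to 3 stones, taking the last one wins), the "expert" games are hypothetical data supplied by the caller, and a lookup table stands in for the neural networks that AlphaGo actually used. All function names are invented for this sketch.

```python
import random
from collections import defaultdict

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def imitation_policy(expert_games):
    """Step 1: copy the experts. For each position, keep their most common move.
    Each game is a list of (stones, move) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for game in expert_games:
        for stones, move in game:
            counts[stones][move] += 1
    return {s: max(c, key=c.get) for s, c in counts.items()}


def choose(policy, stones, rng, epsilon):
    """Follow the policy, but explore a random move with probability epsilon."""
    if stones not in policy or rng.random() < epsilon:
        return rng.choice(legal_moves(stones))
    return policy[stones]


def self_play_improve(policy, start_stones, rng, games=5000, epsilon=0.2):
    """Step 2: self-play. Both sides use the current policy, every move is
    scored by the final result, and the policy is updated from those scores."""
    total = defaultdict(float)   # summed outcomes per (stones, move)
    visits = defaultdict(int)    # times each (stones, move) was played
    policy = dict(policy)        # work on a copy of the imitation policy
    for _ in range(games):
        stones, trajectory = start_stones, []
        while stones > 0:
            move = choose(policy, stones, rng, epsilon)
            trajectory.append((stones, move))
            stones -= move
        outcome = 1.0            # the last mover won; alternate going backwards
        for state_move in reversed(trajectory):
            total[state_move] += outcome
            visits[state_move] += 1
            outcome = -outcome
        for s, _ in trajectory:  # prefer the move with the best average result
            tried = [m for m in legal_moves(s) if visits[(s, m)] > 0]
            policy[s] = max(tried, key=lambda m: total[(s, m)] / visits[(s, m)])
    return policy


if __name__ == "__main__":
    # Hypothetical expert games, used only to seed the policy.
    experts = [[(5, 1), (4, 3), (1, 1)], [(5, 2), (3, 3)]]
    cloned = imitation_policy(experts)
    improved = self_play_improve(cloned, 10, random.Random(0))
    print("after imitation:", cloned)
    print("after self-play:", improved)
```

The design point is the hand-off: the imitation step only gives the policy a starting point, and from then on the training data comes entirely from games the policy plays against itself.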

Conclusion:

The ability to execute long-term plans opens up a plethora of possibilities. Imagine AI formulating long-term investment strategies or serving as legal advisors in court, excelling in tasks that demand prolonged reasoning and debate. The world is undoubtedly on the verge of transformation, and the future is incredibly exciting.

That's all for today. Stay tuned!

1) AI achieves silver-medal standard solving International Mathematical Olympiad problems, Google DeepMind, 25 July 2024
2) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Google DeepMind, 5 Dec 2017
3) Unreasonably Effective AI with Demis Hassabis, Google DeepMind, 14 Aug 2024 (around 18:00)
4) Situational Awareness: The Decade Ahead, p. 28, Leopold Aschenbrenner, June 2024

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.