AlphaGo

Paying homage to AlphaGo, we've launched our own AI Go project at ToshiStats!

Reinforcement learning has become a hot topic since the release of OpenAI's o1-preview. Looking back, it was Google DeepMind's AlphaGo, released in March 2016, that truly brought reinforcement learning into the public eye. Go, with its vast search space, was traditionally a formidable challenge for computers. Amateur high-dan levels were roughly the limit at the time. However, AlphaGo, combining reinforcement learning and Monte Carlo Tree Search (MCTS), exceeded expert expectations, becoming the first AI Go player to defeat a top professional. Inspired by this, we've launched our own AI Go project, "ToshiStats-Go project," to research reinforcement learning. We're excited to see what we can achieve.

 

1. Creating a Go Game Environment

We've decided to build our own Go game environment from scratch. Given the exceptional coding capabilities of o1-preview, we're using it as a coding assistant for this project. We're iteratively developing the code by requesting o1-preview to generate the Go game environment code, executing it in Google Colab, then requesting further refinements based on the results, and repeating the process. Within a few iterations, we were able to establish a basic framework and a functional environment. While we can't perfectly implement a complex game like Go, we've created something akin to "simple-go." This should be sufficient for implementing reinforcement learning and improving its accuracy. Below is an example of o1-preview's explanation of a code modification. As you can see, it's quite detailed.

o1-preview's explanation of a code modification
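As a concrete illustration of what such a stripped-down environment can look like, here is a minimal sketch of my own (hypothetical code, not the actual ToshiStats-Go implementation): stones are placed, opponent groups left without liberties are captured, and suicide moves are rejected. Ko, scoring, and game-end detection are omitted.

```python
class SimpleGo:
    """A stripped-down Go board: place stones, capture opponent groups
    with no liberties, and reject suicide moves."""

    def __init__(self, size=5):
        self.size = size
        self.board = [[0] * size for _ in range(size)]  # 0 empty, 1 black, 2 white

    def neighbors(self, r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.size and 0 <= nc < self.size:
                yield nr, nc

    def group_and_liberties(self, r, c):
        """Flood-fill the group containing (r, c) and collect its liberties."""
        color = self.board[r][c]
        group, liberties, stack = {(r, c)}, set(), [(r, c)]
        while stack:
            cr, cc = stack.pop()
            for nr, nc in self.neighbors(cr, cc):
                if self.board[nr][nc] == 0:
                    liberties.add((nr, nc))
                elif self.board[nr][nc] == color and (nr, nc) not in group:
                    group.add((nr, nc))
                    stack.append((nr, nc))
        return group, liberties

    def play(self, r, c, color):
        """Place a stone. Returns False for occupied points or suicide."""
        if self.board[r][c] != 0:
            return False
        self.board[r][c] = color
        # Capture any adjacent opponent group left without liberties.
        for nr, nc in self.neighbors(r, c):
            if self.board[nr][nc] == 3 - color:
                group, libs = self.group_and_liberties(nr, nc)
                if not libs:
                    for gr, gc in group:
                        self.board[gr][gc] = 0
        # Reject suicide: our own group must end up with a liberty.
        _, libs = self.group_and_liberties(r, c)
        if not libs:
            self.board[r][c] = 0
            return False
        return True
```

Even this much is enough to play a recognizable game and, more importantly, to serve as a reinforcement-learning environment.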

 

2. Trying a Game of Go

Let's give it a try! The current AI model plays random moves, so it's not very strong. As shown in the example below, a human can win with careful play. While a 9x9 board is available, the calculations can be time-consuming, so we'll stick with a 5x5 board for now. It's enjoyable enough, and if you'd like to try it yourself, please download the Colab notebook from our GitHub repository (1). A GPU is not required.

Trial run of ToshiStats-Go

 

3. Perfect Go Rules Are Difficult

Go has some very complex rules. In particular, determining the life and death of stones, especially in the endgame, proved challenging. Implementing "ko" and "seki" also seems difficult. Connecting to an external Go system might solve these issues, but for now, we'll continue with a lightweight environment that completes calculations within the notebook to facilitate reinforcement learning experimentation. We'll strive to make this series engaging and easy to follow, comparing our progress with simpler games such as Gomoku (five in a row). We appreciate your continued interest.
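For what it's worth, one standard way to approximate the ko rule is "positional superko": keep a record of every board position seen so far and reject any move that would recreate one. A tiny self-contained sketch (my own illustration; `place_stone` is a hypothetical helper, and a full version would also apply captures before hashing the position):

```python
def place_stone(board, r, c, color):
    """Hypothetical helper: return a new board tuple with the stone placed.
    A complete version would also remove captured groups before returning."""
    rows = [list(row) for row in board]
    rows[r][c] = color
    return tuple(tuple(row) for row in rows)

class SuperkoTracker:
    """Track every position seen; a move that recreates one is illegal."""

    def __init__(self, start):
        self.history = {start}

    def is_legal(self, board, r, c, color):
        return place_stone(board, r, c, color) not in self.history

    def record(self, board):
        self.history.add(board)
```

Because boards are stored as hashable tuples, the membership test is a cheap set lookup, which keeps this practical even inside a training loop.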

 

So, there you have it! We've successfully implemented a Go playing environment in Colab. From here, we'll dive into reinforcement learning and begin training our AI Go player. Stay tuned!





 
 

1) ToshiStatsGo-project

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Reflections on the Future of AI Inspired by the 2024 Nobel Prizes in Physics and Chemistry

Last week was truly astonishing. Two prominent figures in AI, Geoffrey Hinton and Demis Hassabis, were awarded the Nobel Prizes in Physics and Chemistry, respectively. To my knowledge, no one had predicted these individuals as Nobel laureates. The world must be equally surprised. I'd like to take this opportunity to reflect on their achievements and speculate on the future of AI.

 

1. The Nobel Prize in Physics

Let's start with Geoffrey Hinton, a professor at the University of Toronto, who has been researching AI since the 1970s. In 2018, he shared the Turing Award, a prestigious prize for computer scientists, with two other researchers. He's often called the "Godfather of AI." Now 76, he's still actively working. I actually took a massive open online course (MOOC) he offered back in 2013. It was a valuable lecture that led me into the world of AI. Over a decade ago, courses teaching Neural Networks were scarce, so I was fortunate to stumble upon his lectures. Back then, my knowledge was limited to logistic regression models, so much of what he taught seemed incredibly complex and I remember thinking, "This seems amazing, but probably won't be immediately useful." I never imagined he'd win the Nobel Prize in Physics ten years later. Fortunately, his lectures from that time appear to be accessible on the University of Toronto website (1). I highly recommend checking them out. (The Nobel Prize in Physics was awarded jointly to John Hopfield and Geoffrey Hinton.)

 


2. The Nobel Prize in Chemistry

The Nobel Prize in Chemistry recipient is considerably younger, Demis Hassabis, currently 48. He is a co-founder of one of the world's leading AI companies, Google DeepMind. AlphaFold2 is specifically cited for his award. It's a groundbreaking AI model for predicting the 3D structure of proteins, and is said to have made significant contributions to drug discovery and other fields. He is not only a brilliant AI researcher but also a business leader at Google DeepMind. When presenting to a general audience, he mostly talks about the achievements of Google DeepMind, rather than his personal accomplishments. There's no doubt that the catalyst that propelled this company to the top tier of AI companies was AlphaGo, which appeared about four years before AlphaFold2, in March 2016. The reinforcement learning used in this model is still actively being researched to give large language models true logic and reasoning capabilities. AlphaGo inspired me to seriously study reinforcement learning. I wrote about it on my blog in April 2016. It's a fond memory. (The Nobel Prize in Chemistry was awarded jointly to David Baker, John M. Jumper, and Demis Hassabis.)

AlphaGo

 

3. Scientific and Technological Development and AI

I completely agree that the two individuals discussed here have pioneered new paradigms in AI. However, their being awarded the Nobel Prizes in Physics and Chemistry is a landmark event, demonstrating that AI has transcended its own boundaries and become an indispensable tool for scientific advancement as a whole. Going forward, we need to discuss how to leverage AI and integrate it into all aspects of human intellectual activity. Further development might even lead to the kind of intelligence explosion described by Leopold Aschenbrenner's "SITUATIONAL AWARENESS" that I previously mentioned on my blog, potentially surpassing human intelligence. The implications of these Nobel Prizes are profound.

 

What are your thoughts? I'm a business person, but I believe the same applies to the business world. With the incredibly rapid pace of AI development, I hope to offer new insights based on a clear understanding of these trends. That's all for today. Stay tuned!

 


(1) X post by Geoffrey Hinton, Jan 16, 2019





The combination of Monte Carlo Tree Search (MCTS) and generative AI could be a real game-changer in the future!

"Monte Carlo Tree Search," a search technique, gained fame in March 2016 when AlphaGo became the first AI to defeat a top professional Go player. Its effectiveness increases significantly when combined with reinforcement learning, making it a powerful tool. However, implementing it can be quite challenging. With the recent release of ChatGPT canvas (1) on October 3rd, I want to explore implementing Monte Carlo Tree Search in a simple game. Let's begin!

 

1. AlphaGo and Monte Carlo Tree Search
AlphaGo, which decisively defeated 18-time Go world champion Lee Sedol in March 2016, owed its strength to the combination of reinforcement learning and Monte Carlo Tree Search (MCTS), as discussed previously. A research paper (2) illustrates the performance comparison of various Go AI programs.

Performance comparison of various Go AI programs.

The leftmost "Raw network" doesn't utilize MCTS during inference, resulting in lower performance compared to AlphaGo Zero next to it. This highlights the significant contribution of MCTS. In AlphaGo Zero, MCTS is executed as shown in the diagram below. The action probability 'p' is trained to approach the probability 'π' of the next move selected by MCTS, gradually improving accuracy. For details, please refer to (2).

AlphaGo Zero "Monte Carlo Tree Search"
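For reference, the mechanism in (2) can be summarized in three formulas: the PUCT selection rule that mixes the value estimate Q with the network's prior P, the visit-count search policy π, and the training loss that pulls the network policy p toward π and the value v toward the game outcome z (symbols as in the paper):

```latex
% Selection: choose the action maximizing value plus an exploration bonus
a_t = \arg\max_a \bigl( Q(s_t, a) + U(s_t, a) \bigr), \qquad
U(s, a) = c_{\mathrm{puct}} \, P(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}

% Search policy: proportional to exponentiated visit counts (temperature \tau)
\pi(a \mid s) = \frac{N(s, a)^{1/\tau}}{\sum_b N(s, b)^{1/\tau}}

% Loss: value error + policy cross-entropy + L2 regularization
l = (z - v)^2 - \boldsymbol{\pi}^{\top} \log \mathbf{p} + c \, \lVert \theta \rVert^2
```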


2. Implementing MCTS in a simple game
Witnessing MCTS's success in AlphaGo makes you want to try it out yourself. The recent release of ChatGPT canvas (1) from OpenAI provides the perfect opportunity. As their message "A new way of working with ChatGPT to write and code" suggests, it offers a new user experience. I promptly asked ChatGPT canvas, "Could you make code of Tic Tac Toe by using python and MCTS?"

Unlike regular ChatGPT, a separate window opens and generates Python code as shown below.

I also wanted an explanation, so I added a prompt to provide it in English. Since the generated code cannot be executed within the canvas, I copied and pasted it into Google Colab to run it.

I was able to enjoy the game as shown below. Fantastic!

The generative AI model GPT-4o, powering ChatGPT canvas, appears to have improved coding abilities, likely due to post-training on data distilled from the recently released, logically robust o1-preview. While I encountered occasional errors, copying and pasting them into a prompt for correction quickly resolved the issues. It felt like a significant upgrade to a full-fledged code assistant. I'm eager to use it more. The generated code can be found at (3).
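To make the idea concrete, here is a compact UCT-style MCTS for Tic-Tac-Toe (my own illustrative sketch, not the canvas-generated code in (3)). It follows the four classic steps: selection by UCB1, expansion of one untried move, a random-playout simulation, and backpropagation of the result.

```python
import math
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is completed, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, v in enumerate(board) if v == '']

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board, self.player = board, player  # player = side to move here
        self.parent, self.move = parent, move
        self.children, self.untried = [], legal_moves(board)
        self.wins, self.visits = 0.0, 0

    def ucb1(self, c=1.4):
        # Exploitation term + standard UCB1 exploration term
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(board, player):
    """Play random moves to the end; return the winner or None for a draw."""
    board = board[:]
    while True:
        w = winner(board)
        if w:
            return w
        free = legal_moves(board)
        if not free:
            return None
        board[random.choice(free)] = player
        player = 'O' if player == 'X' else 'X'

def mcts(board, player, iters=1000):
    root = Node(board[:], player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one child for a randomly chosen untried move.
        if node.untried and winner(node.board) is None:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child_board = node.board[:]
            child_board[m] = node.player
            nxt = 'O' if node.player == 'X' else 'X'
            node.children.append(Node(child_board, nxt, node, m))
            node = node.children[-1]
        # 3. Simulation: random playout from the new node.
        w = winner(node.board) or rollout(node.board, node.player)
        # 4. Backpropagation: credit the player who moved INTO each node.
        while node:
            node.visits += 1
            mover = 'O' if node.player == 'X' else 'X'
            if w == mover:
                node.wins += 1
            elif w is None:
                node.wins += 0.5
            node = node.parent
    # The most-visited child is the recommended move.
    return max(root.children, key=lambda n: n.visits).move
```

With a winning move on the board, the search converges on it within a few hundred iterations; the same skeleton scales to other perfect-information games by swapping out `winner` and `legal_moves`.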

 

3. Promising combination of generative AI and MCTS
Research on incorporating the AlphaGo mechanism into generative AI is actively underway. From AlphaGo Zero (released in 2017) onward, these systems require no human input at all (in this case, game records). This freedom from data constraints makes the approach promising for addressing training-data scarcity. The combination of reinforcement learning and MCTS offers flexible design possibilities, making it highly intriguing for developers. From the perspective of test-time computing, highlighted by OpenAI's o1-preview, it's a technology worth focusing on. In the next post, I plan to delve deeper into MCTS by examining published research papers. Stay tuned!

 

What do you think? The concept of MCTS is relatively simple, which broadens its applicability. It works well with ChatGPT canvas, and I'm excited to continue experimenting. Currently, it's available only to paid subscribers, but it's expected to be available to free users upon general release. I'm looking forward to it. That's all for today. Stay tuned!

 

1) Introducing canvas, OpenAI, Oct 3, 2024
2) Mastering the game of Go without human knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, Google DeepMind, Nature, Vol. 550, p. 355, Oct 19, 2017
3) Monte-Carlo-Tree-Search-with-ChatGPT-canvas, Oct 6, 2024

 


The Future of Generative AI: Predicting the Next Generation Based on Google DeepMind's Math Olympiad Breakthrough

Generative AI has a reputation for struggling with math, often making mistakes even with simple elementary-level arithmetic. However, Google DeepMind recently announced that their AI achieved a score equivalent to a silver medal in the International Mathematical Olympiad (IMO)(1). Based on this article, let's delve into predicting the future of next-generation generative AI.

 

1. How Did AI Solve Complex Math Problems?

The achievement is impressive:

“Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.”


This is an amazing score, just shy of a gold medal. Of the two systems, we'll focus here on AlphaProof, the reasoning system.

AlphaProof is explained as follows:

“AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go.”

In simple terms, while there is abundant data available for math problems written in natural language, generative AI tends to make plausible yet incorrect statements (hallucinations), making it difficult to utilize effectively. Therefore, Google utilized its generative AI, Gemini, to translate math problems into the formal language Lean. This formal representation was then fed into AlphaZero, known for its long-term planning and reasoning capabilities, for computation. The chart below provides a clear illustration.

                                                                          AlphaProof's Structure

AlphaZero has already proven its reasoning prowess in board games like Go. This achievement demonstrates the successful application of its capabilities to the realm of mathematics. Remarkable!
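To give a feel for what "translating into the formal language Lean" means, here is a toy example of my own (not an actual IMO problem): the natural-language claim "the sum of two even numbers is even" becomes a machine-checkable Lean 4 theorem, so a proof either compiles or it doesn't, and there is no room for a plausible-sounding hallucination.

```lean
-- Toy illustration only: a natural-language statement rendered in Lean 4.
-- "The sum of two even numbers is even."
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```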

 

2. Implications from AlphaZero

Let's briefly revisit AlphaZero, which made a reappearance here. It is a groundbreaking AI that combines reinforcement learning (RL) and Monte Carlo Tree Search (MCTS). Its predecessor, AlphaGo, gained fame in March 2016 as the first AI to defeat a top professional Go player; AlphaZero can be thought of as a generalized successor to that initial model. It's important to emphasize that AlphaZero achieved superhuman ability without relying on human-created data: it trained itself using self-generated data. Upon hearing this for the first time, many might wonder, "How is that even possible?" AlphaZero accomplishes this through self-play, generating massive amounts of training data by playing against itself. Refer to the research paper (2) for more details.
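The self-play idea can be illustrated with a toy example (my own sketch, far simpler than AlphaZero's actual pipeline, which also uses MCTS to improve its targets): even random-vs-random Tic-Tac-Toe games, labeled with their final outcomes, already yield a supervised training dataset with zero human input.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    """Play one random-vs-random game and label every position from the
    mover's perspective: +1 win, -1 loss, 0 draw."""
    board, player, states = [''] * 9, 'X', []
    while winner(board) is None and '' in board:
        states.append((tuple(board), player))
        free = [i for i, v in enumerate(board) if v == '']
        board[random.choice(free)] = player
        player = 'O' if player == 'X' else 'X'
    w = winner(board)
    return [(s, 0 if w is None else (1 if p == w else -1)) for s, p in states]

# A few thousand such games produce a labeled dataset with no human input.
dataset = [pair for _ in range(1000) for pair in self_play_game()]
```

Train a value model on these pairs, use it to pick better moves, regenerate the data, and repeat: that loop, in essence, is how self-play bootstraps past human-created data.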

 

3. The Fusion of Current Generative AI and AlphaGo

Interestingly, Demis Hassabis, CEO of Google DeepMind, recently hinted at the future of their generative AI(3). The key takeaways are:

  • “Gemini” is a natively multimodal model.

  • It can understand various aspects of the world, including language, images, videos, and audio.

  • Current models are incapable of long-term planning and problem-solving.

  • DeepMind possesses expertise in this field through AlphaGo.

  • The next-generation model will be an agent that fuses Gemini and AlphaGo.

 

It's plausible to view the project that secured a silver medal in the Math Olympiad as a step towards overcoming the limitations of generative AI in "long-term planning." However, one might question, "How exactly will this fusion work?" A prominent long-form paper (4) in June of this year provides clues.

“A look back at AlphaGo—the first AI system that beat the world champions at Go, decades before it was thought possible—is useful here.

• In step 1, AlphaGo was trained by imitation learning on expert human Go games. This gave it a foundation.

• In step 2, AlphaGo played millions of games against itself. This let it become superhuman at Go: remember the famous move 37 in the game against Lee Sedol, an extremely unusual but brilliant move a human would never have played.

Developing the equivalent of step 2 for LLMs is a key research problem for overcoming the data wall (and, moreover, will ultimately be the key to surpassing human-level intelligence).”

AlphaGo eventually transitioned to self-play, generating its own training data and eliminating the need for human input, a feat made possible by the combination of reinforcement learning and MCTS. The future of next-generation AI hinges on how generative AI can be trained using this mechanism.

 

Conclusion:

The ability to execute long-term plans opens up a plethora of possibilities. Imagine AI formulating long-term investment strategies or serving as legal advisors in court, excelling in tasks that demand prolonged reasoning and debate. The world is undoubtedly on the verge of transformation, and the future is incredibly exciting.

That's all for today. Stay tuned!

 





1) AI achieves silver-medal standard solving International Mathematical Olympiad problems, Google DeepMind, Jul 25, 2024
2) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Google DeepMind, Dec 5, 2017
3) Unreasonably Effective AI with Demis Hassabis, Google DeepMind, Aug 14, 2024 (around 18:00)
4) SITUATIONAL AWARENESS: The Decade Ahead, p. 28, Leopold Aschenbrenner, June 2024












