The combination of Monte Carlo Tree Search (MCTS) and generative AI could be a real game-changer in the future!
Monte Carlo Tree Search (MCTS) is a search technique that gained fame in March 2016, when AlphaGo became the first AI to defeat a top professional Go player. It becomes considerably more effective when combined with reinforcement learning, which makes it a powerful tool. Implementing it, however, can be quite challenging. With ChatGPT canvas (1) released on October 3rd, I want to try implementing Monte Carlo Tree Search in a simple game. Let's begin!
1. AlphaGo and Monte Carlo Tree Search
AlphaGo, which decisively defeated 18-time Go world champion Lee Sedol in March 2016, owed its strength to the combination of reinforcement learning and Monte Carlo Tree Search (MCTS). The AlphaGo Zero paper (2) includes a performance comparison of various Go AI programs.
In that comparison, the leftmost entry, "Raw network", doesn't use MCTS during inference and performs worse than AlphaGo Zero next to it. This highlights how much MCTS contributes. In AlphaGo Zero, MCTS is executed as shown in the diagram below. The network's move probability 'p' is trained to approach the probability 'π' of the next move selected by MCTS, so the network gradually becomes more accurate. For details, please refer to (2).
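To make the relationship between 'p' and 'π' a little more concrete, here is a minimal sketch in plain Python. It follows the description in (2), where π is proportional to the root visit counts raised to 1/temperature and the policy part of the loss is the cross-entropy between π and p. The function names and the toy numbers are my own assumptions for illustration, not code from the paper.

```python
import math

def visit_counts_to_policy(visit_counts, temperature=1.0):
    """Turn root visit counts N(a) into search probabilities pi(a) ∝ N(a)^(1/temperature)."""
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def policy_cross_entropy(pi, p, eps=1e-12):
    """Cross-entropy -sum_a pi(a) * log p(a); a smaller value means p is closer to pi."""
    return -sum(t * math.log(q + eps) for t, q in zip(pi, p))

# Toy example (assumed numbers): MCTS visited three candidate moves 10, 60 and 30 times.
visit_counts = [10, 60, 30]
pi = visit_counts_to_policy(visit_counts)

p_before = [1 / 3, 1 / 3, 1 / 3]   # an untrained, uniform policy
p_after = [0.15, 0.55, 0.30]       # a policy that has moved toward pi

loss_before = policy_cross_entropy(pi, p_before)
loss_after = policy_cross_entropy(pi, p_after)
print(pi, loss_before, loss_after)
```

The loss drops as 'p' moves toward 'π', which is the direction training pushes the network. The full loss in (2) also includes a value term and regularization, which this sketch leaves out.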
2. Implementing MCTS in a simple game
Witnessing MCTS's success in AlphaGo makes you want to try it out yourself. The recent release of ChatGPT canvas (1) from OpenAI provides the perfect opportunity. As their message "A new way of working with ChatGPT to write and code" suggests, it offers a new user experience. I promptly asked ChatGPT canvas, "Could you make code of Tic Tac Toe by using python and MCTS?"
Unlike regular ChatGPT, a separate window opens and generates Python code as shown below.
I also wanted an explanation, so I added a prompt to provide it in English. Since the generated code cannot be executed within the canvas, I copied and pasted it into Google Colab to run it.
I was able to enjoy the game as shown below. Fantastic!
The generative AI model GPT-4o, which powers ChatGPT canvas, appears to have improved coding abilities, likely due to post-training with data distilled from the recently released o1-preview, which is strong at logical reasoning. I did run into occasional errors, but copying and pasting them into a prompt for correction quickly resolved them. It felt like a significant upgrade to a full-fledged code assistant. I'm eager to use it more. The generated code can be found at (3).
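For readers who want a feel for what such a program involves, here is a minimal sketch of MCTS for Tic Tac Toe. Please note that this is not the code generated by ChatGPT canvas (that is at (3)). It is an independent illustration that uses plain UCT with random rollouts rather than a neural network, and the names (`Node`, `mcts`, `rollout`), the exploration constant and the iteration count are all my own assumptions.

```python
import math
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, 'draw' if the board is full, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def play(board, move, player):
    return board[:move] + (player,) + board[move + 1:]

def other(player):
    return 'O' if player == 'X' else 'X'

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board = board      # tuple of 9 cells: 'X', 'O' or ' '
        self.player = player    # player to move at this node
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = [] if winner(board) else legal_moves(board)
        self.visits = 0
        self.wins = 0.0         # score for the player who moved INTO this node

    def best_child(self, c=1.4):
        # UCB1: average score plus an exploration bonus for rarely visited children.
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(board, player, rng):
    """Play uniformly random moves until the game ends and return the result."""
    while winner(board) is None:
        board = play(board, rng.choice(legal_moves(board)), player)
        player = other(player)
    return winner(board)

def mcts(board, player, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(board, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.best_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(play(node.board, move, node.player),
                         other(node.player), node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        result = rollout(node.board, node.player, rng)
        # 4. Backpropagation: credit each node from its mover's point of view.
        while node is not None:
            node.visits += 1
            if result == other(node.player):
                node.wins += 1.0
            elif result == 'draw':
                node.wins += 0.5
            node = node.parent
    # Choose the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

# Usage: X to move with an immediate win available at cell 2 (top-right corner).
board = ('X', 'X', ' ',
         'O', 'O', ' ',
         ' ', ' ', ' ')
best = mcts(board, 'X')
print("MCTS chooses cell", best)
```

The four commented steps are the usual MCTS loop. AlphaGo Zero replaces the random rollout in step 3 with a neural network evaluation and uses the network's 'p' to guide selection, which is the part that reinforcement learning improves over time.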
3. Promising combination of Generative AI and MCTS
Research on incorporating the AlphaGo mechanism into generative AI is actively underway. AlphaGo Zero, released in 2017, and the versions that followed it don't require any human input (in this case, game records). This freedom from data constraints makes the approach a promising way to address training data scarcity. The combination of reinforcement learning and MCTS also offers flexible design possibilities, which makes it highly intriguing for developers. From the perspective of test-time compute, highlighted by OpenAI's o1-preview, it's a technology worth focusing on. In the next post, I plan to delve deeper into MCTS by examining published research papers.
What do you think? The concept of MCTS is relatively simple, which broadens its applicability. It works well with ChatGPT canvas, and I'm excited to continue experimenting. Currently, canvas is available only to paid subscribers, but it's expected to be available to free users upon general release. I'm looking forward to it. That's all for today. Stay tuned!
1) Introducing canvas, OpenAI, Oct 3 2024
2) Mastering the game of Go without human knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, Google DeepMind, Oct 19 2017, Nature, vol. 550, 355
3) Monte-Carlo-Tree-Search-with-ChatGPT-canvas, Oct 6 2024
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.