Reinforcement learning has become a hot topic since the release of OpenAI's o1-preview. Looking back, though, it was Google DeepMind's AlphaGo, which defeated top professional Lee Sedol in March 2016, that first brought reinforcement learning into the public eye. Go, with its vast search space, had long been a formidable challenge for computers; at the time, the strongest programs played at roughly amateur high-dan level. By combining reinforcement learning with Monte Carlo Tree Search (MCTS), AlphaGo exceeded expert expectations and became the first AI to defeat a top professional Go player. Inspired by this, we've launched our own AI Go project, the "ToshiStats-Go project," to research reinforcement learning. We're excited to see what we can achieve.
1. Creating a Go Game Environment
We've decided to build our own Go game environment from scratch. Given o1-preview's exceptional coding capabilities, we're using it as a coding assistant for this project. The workflow is iterative: we ask o1-preview to generate code for the Go environment, run it in Google Colab, and then request refinements based on the results. Within a few iterations, we had a basic framework and a functional environment. We can't perfectly implement a game as complex as Go, but we've created something akin to "simple-go," which should be sufficient for implementing reinforcement learning and improving its accuracy. Below is an example of o1-preview's explanation of a code modification; as you can see, it's quite detailed.
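To give a concrete flavor of what a "simple-go" environment involves, here is a minimal sketch of a board that supports stone placement, capture, and a suicide check. This is our own illustration for this post, not the code o1-preview generated for us; the class and method names (SimpleGo, play, group_and_liberties) are made up for the example.

```python
# A minimal sketch of a "simple-go" style environment (illustrative only;
# names and structure are hypothetical, not our actual notebook code).

class SimpleGo:
    EMPTY, BLACK, WHITE = 0, 1, 2

    def __init__(self, size=5):
        self.size = size
        self.board = [[self.EMPTY] * size for _ in range(size)]

    def neighbors(self, r, c):
        """Yield the on-board points adjacent to (r, c)."""
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.size and 0 <= nc < self.size:
                yield nr, nc

    def group_and_liberties(self, r, c):
        """Flood-fill the group containing (r, c); return its stones and liberties."""
        color = self.board[r][c]
        stones, liberties, stack = {(r, c)}, set(), [(r, c)]
        while stack:
            cr, cc = stack.pop()
            for nr, nc in self.neighbors(cr, cc):
                if self.board[nr][nc] == self.EMPTY:
                    liberties.add((nr, nc))
                elif self.board[nr][nc] == color and (nr, nc) not in stones:
                    stones.add((nr, nc))
                    stack.append((nr, nc))
        return stones, liberties

    def play(self, r, c, color):
        """Place a stone, capture dead opponent groups, and reject suicide moves."""
        if self.board[r][c] != self.EMPTY:
            return False
        self.board[r][c] = color
        opponent = self.BLACK if color == self.WHITE else self.WHITE
        # Remove any adjacent opponent group left with zero liberties.
        for nr, nc in self.neighbors(r, c):
            if self.board[nr][nc] == opponent:
                stones, libs = self.group_and_liberties(nr, nc)
                if not libs:
                    for sr, sc in stones:
                        self.board[sr][sc] = self.EMPTY
        # Undo the move if our own group ends up with no liberties (suicide).
        _, libs = self.group_and_liberties(r, c)
        if not libs:
            self.board[r][c] = self.EMPTY
            return False
        return True
```

Even this stripped-down version captures the essentials an RL agent needs: a board state, a legality check, and state transitions.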
2. Trying a Game of Go
Let's give it a try! The current AI model plays random moves, so it's not very strong, and as the example below shows, a human can win with careful play. A 9x9 board is available, but its calculations can be time-consuming, so we'll stick with a 5x5 board for now. It's enjoyable enough, and if you'd like to try it yourself, please download the Colab notebook from our GitHub repository (1). A GPU is not required.
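For reference, here is how a random-move opponent of this kind could look, building on the hypothetical SimpleGo sketch above. Our actual notebook's interface may differ; this just shows the idea of an agent that picks uniformly among legal moves.

```python
import random

def random_move(game, color):
    """Try the empty points in random order; play and return the first legal one."""
    points = [(r, c) for r in range(game.size) for c in range(game.size)
              if game.board[r][c] == game.EMPTY]
    random.shuffle(points)
    for r, c in points:
        if game.play(r, c, color):  # play() rejects occupied points and suicide
            return (r, c)
    return None  # no legal move left: treat as a pass

# Two random players on a 5x5 board, stopping after two consecutive passes.
game = SimpleGo(size=5)
color, passes = SimpleGo.BLACK, 0
for _ in range(200):  # hard cap: without a ko rule, play could cycle forever
    passes = passes + 1 if random_move(game, color) is None else 0
    if passes == 2:
        break
    color = SimpleGo.WHITE if color == SimpleGo.BLACK else SimpleGo.BLACK

for row in game.board:
    print(" ".join(".XO"[v] for v in row))
```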
3. Perfect Go Rules Are Difficult
Go has some very complex rules. Determining the life and death of stones, especially in the endgame, proved challenging, and implementing "ko" and "seki" also looks difficult (one possible approach to ko is sketched below). Connecting to an external Go system might solve these issues, but for now we'll stick with a lightweight environment that completes all calculations within the notebook, which makes reinforcement learning experiments easier. We'll strive to make this series engaging and easy to follow, comparing our progress against simpler games such as Gomoku (connect five). We appreciate your continued interest.
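As one illustration of how "ko" could be handled, a common approach is positional superko: remember every board position seen in the game and reject any move that recreates one. Here is a sketch, again extending the hypothetical SimpleGo class above; this is not what our notebook currently does.

```python
# Sketch of ko handling via positional superko (illustrative only).
class SimpleGoWithKo(SimpleGo):
    def __init__(self, size=5):
        super().__init__(size)
        self.history = {self.position_key()}  # includes the empty starting position

    def position_key(self):
        """A hashable snapshot of the current board."""
        return tuple(tuple(row) for row in self.board)

    def play(self, r, c, color):
        saved = [row[:] for row in self.board]  # snapshot for rollback
        if not super().play(r, c, color):
            return False
        key = self.position_key()
        if key in self.history:  # move recreates an earlier position: illegal ko
            self.board = saved
            return False
        self.history.add(key)
        return True
```

Tracking positions this way rules out repetition cycles, but situations like "seki" still require genuine life-and-death judgment at scoring time, which is where much of the remaining difficulty lies.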
So, there you have it! We've successfully implemented a Go playing environment in Colab. From here, we'll dive into reinforcement learning and begin training our AI Go player. Stay tuned!
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.