Hello. DeepSeek-R1, released on January 20th (1), has sparked excitement among AI professionals and investors worldwide. I believe its impact is comparable to that of ChatGPT's debut. Here, I'd like to consider why it has drawn so much global attention.
1. What Was New?
DeepSeek-R1's performance is remarkable: it stands shoulder to shoulder with OpenAI's o1, the established reasoning model. Below is a comparison across various benchmarks, on which DeepSeek-R1 rivals o1. That a newcomer model suddenly matched OpenAI, the front-runner in generative AI, is undoubtedly why the world is so astonished.
Performance comparison across various benchmarks
While DeepSeek-R1 appeared suddenly, like a comet, it rests on several technical breakthroughs. Among the most significant is a training method called GRPO (Group Relative Policy Optimization). Like some existing generative AI models, DeepSeek-R1 uses reinforcement learning to acquire advanced reasoning abilities in mathematics and coding. Reinforcement learning is a powerful training technique that does not require step-by-step "correct answer" demonstrations; the model learns from a reward signal instead. However, it is a complex and resource-intensive approach. DeepSeek adopted a method, GRPO, that requires only one model instead of the usual two. The figure below gives an overview: PPO, in the upper section, is the technique commonly used in existing models; GRPO, in the lower section, is the new method.
PPO vs GRPO
Compared with PPO, GRPO has no Value model; it keeps only the Policy model, so a single model suffices where PPO needs two. Since each model here is a massive generative AI, being able to complete training with only one has a major impact on resource savings. That DeepSeek-R1, developed by a Chinese company that cannot use the latest GPUs because of US semiconductor export restrictions, achieved such remarkable results may well be related to this. For more technical details, please refer to the DeepSeek-R1 paper (2); GRPO itself was first introduced in the DeepSeekMath paper (3).
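To make the mechanism concrete, here is a minimal Python sketch of GRPO's group-relative advantage estimation: the policy samples a group of outputs for one prompt, each output is scored (for example, by a rule-based check on the final answer), and each reward is normalized against the group's own mean and standard deviation. That group baseline is what stands in for PPO's Value model. The function name and the 0/1 rewards are illustrative assumptions, not DeepSeek's actual code.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage estimation in the spirit of GRPO:
    normalize each sampled output's reward against the mean and
    standard deviation of its own group, so no value model is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero std
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one prompt, scored 1.0 if the final
# answer passed a rule-based check and 0.0 otherwise (illustrative)
rewards = [1.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```

Because the baseline comes from the group's own statistics, no second large network has to be trained or held in GPU memory, which is exactly the resource saving described above.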
2. Why Did It Attract Global Attention?
DeepSeek-R1 was released as an open-weight model that anyone can download and use. Moreover, the entire training method, including GRPO, was described in detail in the research papers. Until now, most generative AI models, with a few exceptions, could only be accessed through APIs and could not be downloaded; how they were trained was rarely disclosed, making them black boxes. Against this backdrop, releasing a cutting-edge model in a form that AI researchers worldwide can actually use had a profound impact. However amazing a model is said to be, if its inner workings are unknown, neither criticism nor suggestions for improvement are possible. With DeepSeek-R1, I feel the open-source community can, for the first time, participate in the development of the most advanced generative AI models.
3. What Will Become of Generative AI in the Future?
AI developers around the world are already starting to adopt methods like GRPO in developing state-of-the-art models, and DeepSeek-R1 has proven this is possible without enormous cost. I am currently following a public project called "Open-R1" (4), which plans to release not only the training data but also the training code, neither of which was published for DeepSeek-R1. I believe this is revolutionary.
Open-R1
Of course, I expect similar projects to start worldwide, and I am looking forward to that. It's exciting!
How was it? The landscape around generative AI has changed in an instant. New generative AI models will keep arriving, and it is hard to look away. I will continue to deliver the latest news. Stay tuned!
1) DeepSeek-R1 Release, DeepSeek-AI, Jan 20, 2025
2) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek-AI, Jan 22, 2025
3) DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, DeepSeek-AI, Apr 27, 2024
4) Open-R1: a fully open reproduction of DeepSeek-R1, Hugging Face, Jan 28, 2025
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.