OpenAI's "o1-preview" Arrives: Is This the Next Leap Towards Artificial General Intelligence?!
On September 12, 2024, OpenAI released a preview of its much-anticipated new generative AI model, "o1" (pronounced "oh-one"). I had the opportunity to try it out, and here are my initial impressions.
1. Model Overview
o1 has many features, but the key points are as follows:
Specialized for scientific, coding, and mathematical reasoning.
Available in two versions: o1-preview and o1-mini.
Currently in preview, with limited functionality and performance.
Positioned as a new model series rather than a direct successor to GPT-4.
Usage of o1-preview is limited to 30 requests per week.
Price: o1-preview is about six times more expensive than GPT-4o.
For more details, please refer to the official website (1).
Compared to GPT-4o, o1-preview performs better in coding, data analysis, and mathematics, as shown below. o1 therefore seems likely to excel in fields where existing generative AI has struggled to reach satisfactory accuracy. However, because it works through a Chain of Thought before answering, it can take considerable time to respond, which makes it unsuitable for tasks that require real-time answers.
GPT-4o vs. o1-preview: Task Performance Comparison
2. Challenging o1 with Game24
Let's test the capabilities of o1-preview. A common example of a task that generative AI struggles with is Game24.
This is a simple mathematical puzzle with the following rules:
Use the four given numbers and basic arithmetic operations (addition, subtraction, multiplication, division).
Create a mathematical expression that results in 24.
Each of the four given numbers can be used only once.
Example: 13, 10, 9, 4 → (10 - 4) × (13 - 9) = 6 × 4 = 24
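To make the rules concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the puzzle, not how o1 or any other model solves it. The function names solve24 and _search are my own hypothetical choices. The approach repeatedly picks two of the remaining values, combines them with one of the four operations, and recurses until a single value is left. Exact fractions are used so that division does not introduce rounding errors.

```python
from fractions import Fraction
from itertools import combinations


def solve24(numbers, target=24):
    """Return one expression string that evaluates to target, or None if none exists."""
    # Pair each value with the expression text that produced it.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value remains, so every given number has been used once.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two remaining values and replace them with one combined value.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        # Division is only allowed when the divisor is non-zero.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))

        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    # Prints one valid expression for the example above (not necessarily the same one).
    print(solve24([13, 10, 9, 4]))
```

Because the search space for four numbers is tiny, this exhaustive approach finishes almost instantly. The puzzle is hard for a language model not because of its size, but because it requires systematic trial and error.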
When attempting this with o1-preview, it produced the following result. It successfully solved the puzzle! The response took about 15 seconds, likely due to internal trial-and-error processes.
Game24 instruction
o1-preview Game24 Trial Result
When trying the same with GPT-4o:
GPT-4o Game24 Trial Result
In this trial, GPT-4o failed to provide a correct answer. Although this is only a single example, it suggests that o1 has an advantage in tasks that require strong logical reasoning.
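When comparing model answers like these, it helps to check them mechanically rather than by eye. The following is a small sketch of such a check, assuming the model's answer has already been extracted as a single arithmetic expression. The function name check_answer and its helper are hypothetical. It confirms that the expression uses each given number exactly once and evaluates to exactly 24.

```python
import ast
import operator
import re
from fractions import Fraction

# Only the four basic binary operations are permitted by the Game24 rules.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression exactly, rejecting anything but integers and + - * /."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def check_answer(expression, numbers, target=24):
    """Return True if expression uses each number exactly once and equals target."""
    # Accept the typographic operators that often appear in model output.
    normalized = expression.replace("×", "*").replace("÷", "/").replace("−", "-")

    # Each given number must appear exactly once.
    used = sorted(int(token) for token in re.findall(r"\d+", normalized))
    if used != sorted(numbers):
        return False

    try:
        tree = ast.parse(normalized, mode="eval")
        return _evaluate(tree.body) == target
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False


if __name__ == "__main__":
    print(check_answer("(10 - 4) × (13 - 9)", [13, 10, 9, 4]))
    print(check_answer("13 + 10 + 9 - 4", [13, 10, 9, 4]))
```

The expression is parsed with the ast module instead of being passed to eval, so arbitrary code in a model's output is never executed. Unary minus and decimal numbers are rejected on purpose, since the rules allow only the four given numbers and the four basic operations.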
3. The Impact on the Future of Generative AI
o1's new capabilities are attributed to its use of Chain of Thought reasoning: it generates a task-specific chain of thought before answering, which makes correct answers more likely. However, the Chain of Thought itself, which shows how the answer was derived, is not revealed to the user. This is somewhat disappointing, as users typically want to understand not only the answer but also why it was reached. It is therefore understandable that some perceive the model as a black box.
I hope that the open-source development community will research this aspect further and share its findings with the world. With excellent open generative AI models such as Llama and Gemma now available, I believe that users will be able to inspect and verify Chain of Thought reasoning in the near future.
Conclusion
o1-preview seems to have been received with a level of excitement not seen since the release of GPT-4 in March 2023. In the next installment, I plan to explore the technology behind this impressive generative AI, based on external speculation. That's all for today. Stay tuned!
(1) Introducing OpenAI o1, OpenAI, Sep 12, 2024
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.