AGI

Reflections on the Future of AI Inspired by the 2024 Nobel Prizes in Physics and Chemistry

Last week was truly astonishing. Two prominent figures in AI, Geoffrey Hinton and Demis Hassabis, were awarded the Nobel Prizes in Physics and Chemistry, respectively. To my knowledge, no one had predicted either of them as a Nobel laureate, so the world must have been just as surprised as I was. I'd like to take this opportunity to reflect on their achievements and speculate on the future of AI.

 

1. The Nobel Prize in Physics

Let's start with Geoffrey Hinton, a professor at the University of Toronto, who has been researching AI since the 1970s. In 2018, he shared the Turing Award, a prestigious prize for computer scientists, with two other researchers, and he is often called the "Godfather of AI." Now 76, he is still actively working. I actually took a massive open online course (MOOC) he offered back in 2013; it was a valuable lecture series that led me into the world of AI. Over a decade ago, courses teaching neural networks were scarce, so I was fortunate to stumble upon his. Back then, my knowledge was limited to logistic regression models, so much of what he taught seemed incredibly complex, and I remember thinking, "This seems amazing, but probably won't be immediately useful." I never imagined he'd win the Nobel Prize in Physics ten years later. Fortunately, his lectures from that time appear to be accessible on the University of Toronto website (1). I highly recommend checking them out. (The Nobel Prize in Physics was awarded jointly to John Hopfield and Geoffrey Hinton.)

 


2. The Nobel Prize in Chemistry

The Nobel Prize in Chemistry went to a considerably younger figure: Demis Hassabis, currently 48. He is a co-founder of one of the world's leading AI companies, Google DeepMind. His award specifically cites AlphaFold2, a groundbreaking AI model for predicting the 3D structure of proteins, which is said to have made significant contributions to drug discovery and other fields. He is not only a brilliant AI researcher but also a business leader at Google DeepMind, and when presenting to a general audience he mostly talks about the company's achievements rather than his personal accomplishments. There's no doubt that the catalyst that propelled the company to the top tier of AI companies was AlphaGo, which appeared about four years before AlphaFold2, in March 2016. The reinforcement learning used in that model is still actively being researched as a way to give large language models true logic and reasoning capabilities. AlphaGo is what inspired me to seriously study reinforcement learning; I wrote about it on my blog in April 2016, and it's a fond memory. (The Nobel Prize in Chemistry was awarded jointly to David Baker, John M. Jumper, and Demis Hassabis.)

AlphaGo

 

3. Scientific and Technological Development and AI

The two individuals discussed here have unquestionably pioneered new paradigms in AI. But their being awarded the Nobel Prizes in Physics and Chemistry is a landmark event that goes further: it demonstrates that AI has transcended its own boundaries and become an indispensable tool for scientific advancement as a whole. Going forward, we need to discuss how to leverage AI and integrate it into all aspects of human intellectual activity. Further development might even lead to the kind of intelligence explosion described in Leopold Aschenbrenner's "SITUATIONAL AWARENESS," which I previously discussed on my blog, potentially surpassing human intelligence. The implications of these Nobel Prizes are profound.

 

What are your thoughts? I'm a businessperson, and I believe the same applies to the business world. With the incredibly rapid pace of AI development, I hope to offer new insights based on a clear understanding of these trends. That's all for today. Stay tuned!

 


1) X post by Geoffrey Hinton, Jan 16, 2019

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.




The combination of Monte Carlo Tree Search (MCTS) and generative AI could be a real game-changer in the future!

"Monte Carlo Tree Search," a search technique, gained fame in March 2016 when AlphaGo became the first AI to defeat a top professional Go player. Its effectiveness increases significantly when combined with reinforcement learning, making it a powerful tool. However, implementing it can be quite challenging. With the recent release of ChatGPT canvas (1) on October 3rd, I want to explore implementing Monte Carlo Tree Search in a simple game. Let's begin!

 

1. AlphaGo and Monte Carlo Tree Search
AlphaGo, which decisively defeated 18-time Go world champion Lee Sedol in March 2016, owed its strength to the combination of reinforcement learning and Monte Carlo Tree Search (MCTS), as discussed previously. A research paper (2) illustrates the performance comparison of various Go AI programs.

Performance comparison of various Go AI programs.

The leftmost "Raw network" doesn't utilize MCTS during inference, resulting in lower performance compared to AlphaGo Zero next to it. This highlights the significant contribution of MCTS. In AlphaGo Zero, MCTS is executed as shown in the diagram below. The action probability 'p' is trained to approach the probability 'π' of the next move selected by MCTS, gradually improving accuracy. For details, please refer to (2).

AlphaGo Zero "Monte Carlo Tree Search"
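For reference, the two formulas at the heart of this search loop, as given in (2) (my transcription, with notation following the paper), are the selection rule and the derivation of π from visit counts:

$$a = \arg\max_a \left( Q(s,a) + U(s,a) \right), \qquad U(s,a) = c_{\mathrm{puct}}\, P(s,a)\, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}$$

$$\pi(a \mid s_0) = \frac{N(s_0,a)^{1/\tau}}{\sum_b N(s_0,b)^{1/\tau}}$$

Here N(s,a) is the visit count accumulated during search, P(s,a) is the network's prior probability 'p' for that move, and τ is a temperature; the more a move is visited during search, the larger its share of π becomes.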


2. Implementing MCTS in a simple game
Witnessing MCTS's success in AlphaGo makes you want to try it out yourself. The recent release of ChatGPT canvas (1) from OpenAI provides the perfect opportunity. As their message "A new way of working with ChatGPT to write and code" suggests, it offers a new user experience. I promptly asked ChatGPT canvas, "Could you make code of Tic Tac Toe by using python and MCTS?"

Unlike regular ChatGPT, a separate window opens and generates Python code as shown below.

I also wanted an explanation, so I added a prompt to provide it in English. Since the generated code cannot be executed within the canvas, I copied and pasted it into Google Colab to run it.

I was able to enjoy the game as shown below. Fantastic!

The generative AI model GPT-4o, which powers ChatGPT canvas, appears to have improved coding abilities, likely due to post-training on data distilled from the recently released, logically robust o1-preview. While I encountered occasional errors, copying and pasting them into a prompt for correction quickly resolved the issues. It felt like a significant upgrade to a full-fledged code assistant. I'm eager to use it more. The generated code can be found at (3).
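For readers who would like to see the shape of the algorithm without opening (3), here is a minimal, self-contained UCT-style sketch of MCTS for Tic-Tac-Toe. This is my own simplified version, not the canvas-generated program; choices such as the exploration constant 1.4 and the 2,000-iteration budget are arbitrary.

import math
import random

# Tic-Tac-Toe board: a tuple of 9 cells, each 'X', 'O', or ' '.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def play(board, i, player):
    b = list(board)
    b[i] = player
    return tuple(b)

class Node:
    def __init__(self, board, player, parent=None, move=None):
        self.board, self.player = board, player      # player = side to move
        self.parent, self.move = parent, move
        self.children, self.untried = [], moves(board)
        self.wins, self.visits = 0.0, 0

    def ucb1(self, c=1.4):
        return self.wins / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def rollout(board, player):
    # Play uniformly random moves to the end; return 'X', 'O', or None (draw).
    while True:
        w = winner(board)
        if w or not moves(board):
            return w
        board = play(board, random.choice(moves(board)), player)
        player = 'O' if player == 'X' else 'X'

def mcts(root_board, root_player, iters=2000):
    root = Node(root_board, root_player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one untried child, unless the game is over.
        if node.untried and winner(node.board) is None:
            m = node.untried.pop(random.randrange(len(node.untried)))
            nxt = 'O' if node.player == 'X' else 'X'
            child = Node(play(node.board, m, node.player), nxt, node, m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        result = rollout(node.board, node.player)
        # 4. Backpropagation: credit wins to the player who moved into each node.
        while node:
            node.visits += 1
            mover = 'O' if node.player == 'X' else 'X'
            if result == mover:
                node.wins += 1.0
            elif result is None:
                node.wins += 0.5   # draws count half
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move

print("MCTS opens at square:", mcts(tuple(' ' * 9), 'X'))

The four phases (selection, expansion, simulation, backpropagation) are the same skeleton AlphaGo Zero builds on; it replaces the random rollout with a value network and folds the policy network's prior into the selection rule.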

 

3. Promising combination of Generative AI and MCTS
Research on incorporating the AlphaGo mechanism into generative AI is actively underway. Versions after AlphaGo Zero, released in 2017, don't require any human input (in this case, game records). This freedom from data constraints makes it a promising technology to address training data scarcity. The combination of reinforcement learning and MCTS offers flexible design possibilities, making it highly intriguing for developers. From the perspective of test-time computing, highlighted by OpenAI's o1-preview, it's a technology worth focusing on. In the next post, I plan to delve deeper into MCTS by examining published research papers. Stay tuned!

 

What do you think? The concept of MCTS is relatively simple, which broadens its applicability. It works well with ChatGPT canvas, and I'm excited to continue experimenting. Currently, it's available only to paid subscribers, but it's expected to be available to free users upon general release. I'm looking forward to it. That's all for today. Stay tuned!

 

1) Introducing canvas, OpenAI, Oct 3, 2024
2) Mastering the game of Go without human knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, Google DeepMind, Nature, Vol. 550, pp. 354–359, Oct 19, 2017
3) Monte-Carlo-Tree-Search-with-ChatGPT-canvas, Oct 6, 2024

 


Looking at OpenAI's o1-preview, I thought, "Reinforcement learning might take the leading role in AI development!"

It's been three weeks since OpenAI's o1-preview unveiled a new paradigm for generative AI. Its accuracy on logical tasks during inference is remarkable. Unfortunately, its mechanism isn't public, but it would be fascinating to know the state of the art in related technologies. Luckily, a helpful research paper (1) has been released by the University of California, Berkeley and Google DeepMind, which I'd like to introduce here and use to speculate on the mechanisms behind o1-preview. Let's begin!

 
1. What We Learned from OpenAI's o1-preview and the Latest Research Papers

According to the OpenAI website (2), we've learned two key things. First, o1-preview leverages reinforcement learning for enhanced performance. Second, it emphasizes "chain of thought" and prioritizes test-time computing. However, this information alone isn't enough for a fruitful technical discussion. Therefore, let's examine recent research papers on natural language processing using reinforcement learning. From several papers, I've selected one related to hierarchical reinforcement learning. This algorithm is reportedly effective for "multi-turn" conversations that extend over multiple exchanges. As you may have experienced, when using ChatGPT or similar models to obtain information, you rarely get the desired result in a single attempt; often, several interactions with the generative AI are required. In such cases, the number of generated tokens steadily increases, creating a challenging situation for efficient training of the generative AI. This new algorithm aims to address that challenge. A possible application is the task of "maximizing customer satisfaction at the end of a multi-turn conversation with a generative AI assistant."

 

2. Hierarchical Reinforcement Learning

The algorithm presented in this paper (1) is called "hierarchical reinforcement learning" and is characterized by the following hierarchical structure:

The most notable aspect here is the two-tiered structure consisting of the utterance level and the token level. Separating utterance-level language processing from the processing of individual minimal units of action at the token level is highly effective for efficient training. Typically, generative AI operates on "next token prediction," where it diligently predicts the next best word based on the prompt's instructions. Its accuracy is remarkable, often generating more polished language than I can. However, in "multi-turn" scenarios with continuous utterances, the number of tokens increases, making training more challenging. This is where reinforcement learning at the utterance level comes into play, with rewards also being considered at this level. For example, a reward could be devised where "+1" is awarded for successfully retrieving the necessary information by searching a website and "0" for failure. This facilitates efficient training. Based on this reward, an action-value function is calculated and used for reinforcement learning at the token level. This reportedly enables significantly more efficient training. For further details, please refer to (1).
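To make the two tiers concrete, here is a toy sketch of the credit assignment (my own schematic, not the ArCHer implementation from (1); the flight-search exchange and the 0.95 discount are invented for illustration):

# Rewards arrive once per utterance (turn); every token in a turn then
# trains against that turn's value, so no per-token reward is needed.
conversation = [
    {"tokens": ["Which", "airline", "?"], "reward": 0.0},      # not found yet
    {"tokens": ["Found", "it", ":", "UA107"], "reward": 1.0},  # success: +1
]

gamma = 0.95  # discount applied between utterances, not between tokens

# Utterance level: value of each turn = discounted sum of future turn rewards.
values, future = [], 0.0
for turn in reversed(conversation):
    future = turn["reward"] + gamma * future
    values.insert(0, future)

# Token level: each token inherits the value of the utterance it belongs to.
for turn, v in zip(conversation, values):
    for tok in turn["tokens"]:
        print(f"token {tok!r:>10}  target value {v:.3f}")

In the real algorithm these utterance-level values come from a learned critic and drive token-level policy updates, but the division of labor is the same: the top tier answers "was this turn useful?", the bottom tier answers "which word next?".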

 

3. Flexibility in Reinforcement Learning Design

As we've seen, hierarchical reinforcement learning offers flexibility and a high degree of design freedom. While it's used here to separate utterance-level and token-level analysis, it appears to be employed for other enhancements as well. For example, a research paper (3) from Google DeepMind uses hierarchical reinforcement learning to improve self-correction capabilities:

"Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data."

It's exciting to anticipate the various use cases that will likely emerge in the future. For more details, please refer to (3).

 

What do you think? The acclaim for o1-preview seems to be growing daily. While it's unlikely that the details of its mechanism will be revealed soon, speculating about it from the outside is crucial for understanding AGI. Next time, I'd like to consider the application examples of o1-preview. That's all for today. Stay tuned!

 

1) ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL, Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar, University of California, Berkeley and Google DeepMind, Feb 29, 2024
2) Introducing OpenAI o1, OpenAI, Sep 12, 2024
3) Training Language Models to Self-Correct via Reinforcement Learning, Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, JD Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust, Google DeepMind, Sep 19, 2024

 


OpenAI o1-preview: A Breakthrough in Generative AI, Introducing a Novel Paradigm

Last week, we introduced OpenAI o1 (1). Despite still being in preview, it boasts high performance, and as evidenced by the leaderboard below (2), it seems to be widely regarded as having overwhelming performance, especially in mathematics and coding. In this article, we'd like to explore why OpenAI o1 demonstrates higher accuracy than existing generative AI models like GPT-4.

Scores of various generative AIs in the mathematics field

 

1. Chain of Thought is Key

The star of the OpenAI o1 model is "Chain of Thought." This refers to "a series of intermediate reasoning steps," which was previously considered an important element of prompts created by users for existing generative AI models. By incorporating "Chain of Thought" into prompts, users enabled generative AI to engage in deeper and broader thinking before providing an answer, thereby improving accuracy. "Chain of Thought" became known to the public through a 2022 research paper (3). Please refer to it for details.
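To make this concrete, here is a made-up example in the spirit of (3); the step-by-step reasoning between the question and the final answer is the Chain of Thought that a user's prompt would elicit (the café and its prices are invented for illustration):

Q: A café sells coffee for $3 and muffins for $2. Alice buys 2 coffees and 3 muffins. How much does she pay?
A: Let's think step by step. 2 coffees cost 2 × 3 = $6. 3 muffins cost 3 × 2 = $6. In total, $6 + $6 = $12. The answer is $12.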

 

2. OpenAI o1 Can Generate Chain of Thought Independently

OpenAI o1 can generate Chain of Thought internally on its own. Users don't need to devise Chain of Thought themselves; it's generated automatically. This is why it achieved high accuracy in mathematics and coding. Unfortunately, OpenAI seems to have decided not to disclose the Chain of Thought itself. Users can only see a summary of it. If you're like most users, you're probably thinking, "I'd really like to see that!" We hope that OpenAI will change its policy and release it in the future.

 

3. Creating a Reward Model

OpenAI has released very little information about what we'll discuss from here on. Please note that the following is based on speculation drawn from previously published research papers and information shared by OpenAI researchers. To enable generative AI to automatically generate task-specific Chain of Thought for practical use, we must evaluate whether the generated Chain of Thought is actually correct. This is where the Reward model comes into play. A 2023 research paper(4) from OpenAI provides a detailed explanation of how to train a Reward model, so let's look to it for clues.

The data for training the Reward model takes the form of Chain of Thought, as shown below. This research paper limits the tasks to mathematics. Since it's challenging for humans to manually create a Chain of Thought for each task, they are automatically generated using GPT-4, which is called the Generator. Humans then label each step of the Chain of Thought produced by the Generator on a three-point scale (correct, incorrect, or neither). This completes the training data. In the example below, you can see that each step is assigned one of the three labels. It must have been quite a task for humans to label a large amount of such data.

Training data for the Reward model
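As a rough sketch of how such step-level labels are put to work (my own schematic, not OpenAI's code; (4) scores a candidate solution as the product of its per-step correctness probabilities):

# Toy process-reward scoring. Each step of a candidate solution gets a
# probability of being correct; the solution's score is their product,
# so a single bad step sinks the whole solution.
candidate_steps = [
    "48 / 2 = 24",   # a correct step
    "24 + 1 = 26",   # an incorrect step
]

def reward_model(step: str) -> float:
    # Stand-in for the trained Reward model: in reality this is a learned
    # classifier over (problem, steps-so-far, current step).
    return 0.95 if step == "48 / 2 = 24" else 0.10

score = 1.0
for step in candidate_steps:
    score *= reward_model(step)

print(f"solution score: {score:.3f}")  # 0.095: this solution is rejected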

 

4. Training Generative AI Through Reinforcement Learning

Once the Reward model is complete, we can train the generative AI using reinforcement learning. As a result, the generative AI can generate the correct Chain of Thought for the task. We, the users, actually run OpenAI o1 and benefit from the generated Chain of Thought. Unfortunately, OpenAI has not disclosed the specific method for training OpenAI o1 using reinforcement learning. Since this directly affects accuracy, it's unlikely to be released in the future. However, researchers worldwide are working on this issue and have published several promising results. As this is a technology that will support the future development of generative AI, we would like to revisit it in a future article.
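Although the actual recipe is undisclosed, the overall loop can be illustrated with a toy REINFORCE example (entirely my own schematic: the "policy" picks one of three canned chains of thought, and a stand-in reward model prefers the one that reasons to the right answer):

import math, random

chains = ["guess: 25", "6 * 4 = 24, so 24", "6 + 4 = 10, so 10"]

def reward_model(chain):
    # Stand-in reward: 1 if the final answer after "so " is 24.
    return 1.0 if chain.split("so ")[-1].strip() == "24" else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits, lr = [0.0, 0.0, 0.0], 0.5
for _ in range(300):
    probs = softmax(logits)
    i = random.choices(range(3), weights=probs)[0]   # sample a chain
    r = reward_model(chains[i])                      # score it
    baseline = sum(p * reward_model(c) for p, c in zip(probs, chains))
    # REINFORCE: d(log pi_i)/d(logit_j) = 1{j=i} - p_j
    for j in range(3):
        logits[j] += lr * (r - baseline) * ((1.0 if j == i else 0.0) - probs[j])

print({c: round(p, 2) for c, p in zip(chains, softmax(logits))})
# Nearly all probability mass ends up on the chain that reaches 24.

A real system replaces the three canned chains with free-form generation from the language model and the stand-in reward with the trained Reward model, but the gradient logic is the same.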

 

5. A New Paradigm for Generative AI

OpenAI's website includes the following statement and chart:

“Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”

Model computational cost and accuracy

 

Until now, there has been much discussion about how increasing the computational cost (time and number of parameters) in pre-training improves the accuracy of generative AI, but there hasn't been much in-depth discussion about the relationship between inference computational cost and accuracy. However, it has now become clear that by generating Chain of Thought itself and then providing an answer, generative AI can answer tasks requiring complex logical reasoning with high accuracy, albeit with significantly increased inference computational cost. The chart on the right above shows that accuracy improves as the computational cost at inference time increases. We believe this is a groundbreaking development. Therefore, it will be important to consider both training and inference computational costs for generative AI in the future. This marks the dawn of a new paradigm for generative AI.
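One concrete way to see the chart's logic is best-of-N sampling, a simple test-time-compute technique (my own minimal sketch; it is not claimed to be o1's actual mechanism, which is undisclosed):

import random

def generator():
    # Stand-in model: answers a puzzle correctly with probability 0.3.
    return "24" if random.random() < 0.3 else str(random.randint(0, 99))

def verifier(answer):
    return 1.0 if answer == "24" else 0.0   # stand-in reward model

def best_of_n(n):
    # Spend n times the inference compute, keep the best-scoring sample.
    return max((generator() for _ in range(n)), key=verifier)

trials = 2000
for n in (1, 4, 16):
    acc = sum(best_of_n(n) == "24" for _ in range(trials)) / trials
    print(f"N={n:2d}  accuracy ~ {acc:.2f}")

Sampling more candidates (more inference compute) raises accuracy with no change to the trained model, which is exactly the trade-off the chart describes.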

 

OpenAI o1 has not only improved accuracy but has also brought about a new paradigm for the development of generative AI. We look forward to seeing how future generative AIs will follow in its footsteps. That's all for today. Stay tuned!

 



1) Introducing OpenAI o1, OpenAI, Sep 12, 2024
2) LMSYS Chatbot Arena Leaderboard
3) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al., Google Research, Brain Team, Jan 2023
4) Let's Verify Step by Step, Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe, OpenAI, May 2023

 









OpenAI's "o1-preview" Arrives: Is This the Next Leap Towards Artificial General Intelligence?!

On September 12, 2024, OpenAI released its new generative AI model "o1" (pronounced "oh-one"), which had been the subject of much speculation. I had the opportunity to try it out, and here are my initial impressions.

 

1. Model Overview

As a new generative AI model, o1 has various features, but the key points are as follows:

  • Specialized for scientific, coding, and mathematical reasoning.

  • Available in two versions: OpenAI o1 and OpenAI o1-mini.

  • Currently in preview with limited functionality and performance.

  • Not a successor to GPT-4.

  • Usage of OpenAI o1 is limited to 30 requests per week.

  • Price: OpenAI o1 is about six times more expensive than GPT-4o.

For more details, please refer to the official website (1).

Compared to GPT-4o, o1-preview demonstrates superior performance in coding, data analysis, and mathematics, as shown below. It seems likely that o1 will excel in fields where existing generative AI has struggled to achieve satisfactory accuracy. However, because it utilizes Chain of Thought reasoning to arrive at answers, it can take a considerable amount of time to respond, making it unsuitable for tasks requiring real-time answers.

GPT-4o vs. o1-preview: Task Performance Comparison

 

2. Challenging o1 with Game24

Let's test the capabilities of o1-preview. A common example of a task that generative AI struggles with is Game24.

This is a simple mathematical puzzle with the following rules:

  • Use the four given numbers and basic arithmetic operations (addition, subtraction, multiplication, division).

  • Create a mathematical expression that results in 24.

  • Each of the four given numbers can be used only once.

Example: 13, 10, 9, 4 → (10 - 4) × (13 - 9)

When attempting this with o1-preview, it produced the following result. It successfully solved the puzzle! The response took about 15 seconds, likely due to internal trial-and-error processes.

Game24 instruction

o1-preview Game24 Trial Result

When trying the same with GPT-4o:

GPT-4o Game24 Trial Result

GPT-4o fails to provide a correct answer. This highlights o1's superiority in tasks that require strong logical reasoning.
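Incidentally, Game24 answers are easy to verify mechanically, since the whole search space can be enumerated. Here is a small brute-force checker, handy for grading model outputs (a straightforward sketch; the five expression templates cover all parenthesizations of four numbers):

from itertools import permutations, product

OPS = ['+', '-', '*', '/']

def solve24(nums, target=24, eps=1e-6):
    # Try every ordering of the numbers, every operator triple, and
    # every one of the five parenthesizations.
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(OPS, repeat=3):
            for e in (
                f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                f"({a}{o1}({b}{o2}{c})){o3}{d}",
                f"({a}{o1}{b}){o2}({c}{o3}{d})",
                f"{a}{o1}(({b}{o2}{c}){o3}{d})",
                f"{a}{o1}({b}{o2}({c}{o3}{d}))",
            ):
                try:
                    if abs(eval(e) - target) < eps:
                        return e
                except ZeroDivisionError:
                    continue
    return None

print(solve24([13, 10, 9, 4]))  # finds an expression such as (10-4)*(13-9)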

 

3. The Impact on the Future of Generative AI

o1's newfound capabilities are attributed to its incorporation of Chain of Thought reasoning, enabling it to generate task-specific chains of thought and produce more reliable correct answers. However, the Chain of Thought process, which demonstrates how the correct answer is derived, is not revealed to the user. This is somewhat disappointing, as users typically want to understand not only the correct answer but also "why" that answer was reached. Therefore, it's understandable that some may perceive it as a black box. We hope that the open-source development community will further research this aspect and share their findings with the world. With excellent open-source generative AI models like Llama and Gemma currently available, we believe that user verification of Chain of Thought will become possible in the near future.

 

Conclusion

o1-preview seems to have been received with a level of excitement not seen since the release of GPT-4 in March 2023. In the next installment, I plan to explore the technology behind this impressive generative AI, based on external speculation. That's all for today. Stay tuned!

 

1) Introducing OpenAI o1, OpenAI, Sep 12, 2024 

 


Can you believe it? A prediction that "AGI will appear before us in 2027" has been announced by a former OpenAI researcher!

A surprising prediction has been announced. Leopold Aschenbrenner, a former researcher at OpenAI, claims that AGI, matching the capabilities of human experts in various fields, will emerge in 2027, just a few years from now. It's hard to believe, but let's take a look at his argument.

 

1. From the Past to the Present, and into the Future

To predict the future, it is important to understand how generative AI has developed from the past to the present. Here, the author introduces the concept of OOM (Orders of Magnitude). Simply put, OOM counts factors of ten: on a graph where each unit on the vertical axis represents a tenfold increase, exponential growth appears as a straight line. OOM = 4 means 10,000 times. The vertical axis of this graph shows computational power (physical compute and algorithmic efficiency) in OOMs, with the current pinnacle of AI, GPT-4, used as the benchmark.
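In symbols (my own restatement of the author's definition, writing C for effective compute and taking GPT-4 as the baseline):

$$\mathrm{OOM}(C) = \log_{10}\frac{C}{C_{\mathrm{GPT\text{-}4}}}, \qquad \mathrm{OOM} = 5 \;\Longleftrightarrow\; C = 10^{5} \times C_{\mathrm{GPT\text{-}4}} = 100{,}000 \times C_{\mathrm{GPT\text{-}4}}$$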

GPT-2 appeared in 2019, and four years later, in 2023, GPT-4 was introduced. The performance improvement during this period is roughly OOM=5 (100,000 times). If GPT-2 is like a preschooler, then GPT-4 is like a smart high schooler. Now, if we extend this straight line from 2023 to 2027, we can predict that an AI with OOM=5 (100,000 times) higher performance than GPT-4 will be born. This level of AI is expected to achieve AGI. If it is reasonable to connect the points with a straight line, then this prediction is not entirely far-fetched.

 

2. From Which Fields Will AI Growth Emerge?

We have explained that AI will significantly improve its performance by 2027, but what technological innovations will make this possible? The author points to the following three drivers. This graph also has a vertical axis in OOM, so a one-unit increase means tenfold.

First, the blue part represents improvements in computational resource efficiency. This is achieved through the development of new GPUs and the construction of ultra-large GPU clusters.

The second, green part comes from improvements in training and inference algorithms and in training data. There are concerns that training data may become scarce in the near future, but it is expected that this can be overcome by generating synthetic data.

The third, red part, which the author calls "unhobbling," refers to advances that let us extract the necessary information from the raw AI, give it precise instructions, and have it execute what we want. Even now, research on how to give instructions to AI, such as Chain of Thought, is actively being conducted. In the future, AI is expected to function as an agent and develop further, leading to significant performance improvements.

Wow, that sounds amazing.

 

3. What Happens After AGI is Achieved?

Once AGI is achieved, the evolution of AI will move to the next phase. Here, the main players are no longer humans but countless AIs. These AIs are trained for AI development and can work continuously 24/7. Therefore, by taking over AI development from humans, productivity will dramatically increase, and as a result, it is predicted that Super Intelligence, which completely surpasses humans, will be born by 2030. The graph below illustrates this.

It was already challenging to understand the prediction of AGI appearing in 2027, but by this point, it’s honestly beyond imagination to think about what our society will look like. Work, education, taxation, healthcare, and even national security will likely look completely different. We can only hope that AI will be a bright star of hope for all humanity.

 

Let’s conclude with the author’s words. It will be exciting to see if AGI is truly realized by 2027. The original paper is a massive 160 pages, but it’s worth reading. You can access it from the link below, so please give it a try.

 

Again, critically, don't just imagine an incredibly smart ChatGPT: unhobbling gains should mean that this looks more like a drop-in remote worker, an incredibly smart agent that can reason and plan and error-correct and knows everything about you and your company and can work on a problem independently for weeks. We are on course for AGI by 2027. These AI systems will basically be able to automate basically all cognitive jobs (think: all jobs that could be done remotely).

 

1) SITUATIONAL AWARENESS The Decade Ahead, Leopold Aschenbrenner, June 2024, situational-awareness.ai 

 



GPT-4V is here. I tried it immediately and was amazed. It can do this too!

Sorry to keep you waiting. OpenAI's GPT-4 now comes with image recognition capabilities. To be precise, the capability was demonstrated when GPT-4 debuted in March of this year, but it has only now, half a year later, been made available to users. I recently tried the new feature in ChatGPT+ and, in a word, it's incredible!

By the way, the image mentioned above was also created with a combination of GPT-4 and DALL-E 3.

Now, let's start the experiment!


First, we'll start with recognizing mobile phones. It can accurately count the number of mobile phones. This is a piece of cake.

 

I thought flight information would be challenging, but it identified the destination impeccably. Since it's originally an excellent language model, it seems proficient in deriving meaning from images.

 

It can even recognize Osaka's Tsutenkaku tower. Local information is no problem.

 

For a change, I inserted an image of analysis results. It can read graphs effortlessly. This is impressive!

 

What shocked me was that it could easily count cars. Of course, it's not a specialized object detection model, so errors will always occur. I believe there were about 48 cars in this photo, but for general use, this margin of error seems acceptable. It's astonishing what it can do by just being given an image.

 

It can count cans, but the error is relatively significant. It might struggle with cluttered items.

 

It also does well at reading English text in an OCR-like manner.

 

It can also easily read the time displayed on electronic signboards.

How did you find it? Without any fine-tuning, it achieved this much. GPT-4V has just been launched, and various use cases are likely to emerge in the future. I look forward to introducing interesting examples here as they arise. Stay tuned!

 

Copyright © 2023 Toshifumi Kuga. All rights reserved
