Reinforcement learning has become a hot topic since the release of OpenAI's o1-preview. Looking back, though, it was Google DeepMind's AlphaGo, which defeated top professional Lee Sedol in March 2016, that first brought reinforcement learning into the public eye. Go, with its vast search space, had long been a formidable challenge for computers; at the time, the strongest programs played at roughly amateur high-dan level. By combining reinforcement learning with Monte Carlo Tree Search (MCTS), AlphaGo exceeded expert expectations and became the first AI to defeat a top professional Go player. Inspired by this, we've launched our own AI Go project, the "ToshiStats-Go project," to research reinforcement learning. We're excited to see what we can achieve.
1. Creating a Go Game Environment
We've decided to build our own Go game environment from scratch. Given o1-preview's exceptional coding capabilities, we're using it as a coding assistant for this project. The workflow is iterative: we ask o1-preview to generate code for the Go environment, run it in Google Colab, and then request refinements based on the results. Within a few iterations, we had a basic framework and a functional environment. We can't perfectly implement a game as complex as Go, but we've created something akin to "simple-go," which should be sufficient for implementing reinforcement learning and improving its accuracy. Below is an example of o1-preview's explanation of a code modification; as you can see, it's quite detailed.
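To give a concrete flavor of what a "simple-go" environment involves, here is a minimal sketch of a board that supports stone placement, capture, and a suicide check. This is our own illustration for this post, not the code o1-preview generated for us; the class and method names (SimpleGo, play, group_and_liberties) are made up for the example.

```python
# A minimal sketch of a "simple-go" style environment (illustrative only;
# names and structure are hypothetical, not our actual notebook code).

class SimpleGo:
    EMPTY, BLACK, WHITE = 0, 1, 2

    def __init__(self, size=5):
        self.size = size
        self.board = [[self.EMPTY] * size for _ in range(size)]

    def neighbors(self, r, c):
        """Yield the on-board points adjacent to (r, c)."""
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.size and 0 <= nc < self.size:
                yield nr, nc

    def group_and_liberties(self, r, c):
        """Flood-fill the group containing (r, c); return its stones and liberties."""
        color = self.board[r][c]
        stones, liberties, stack = {(r, c)}, set(), [(r, c)]
        while stack:
            cr, cc = stack.pop()
            for nr, nc in self.neighbors(cr, cc):
                if self.board[nr][nc] == self.EMPTY:
                    liberties.add((nr, nc))
                elif self.board[nr][nc] == color and (nr, nc) not in stones:
                    stones.add((nr, nc))
                    stack.append((nr, nc))
        return stones, liberties

    def play(self, r, c, color):
        """Place a stone, capture dead opponent groups, and reject suicide moves."""
        if self.board[r][c] != self.EMPTY:
            return False
        self.board[r][c] = color
        opponent = self.BLACK if color == self.WHITE else self.WHITE
        # Remove any adjacent opponent group left with zero liberties.
        for nr, nc in self.neighbors(r, c):
            if self.board[nr][nc] == opponent:
                stones, libs = self.group_and_liberties(nr, nc)
                if not libs:
                    for sr, sc in stones:
                        self.board[sr][sc] = self.EMPTY
        # Undo the move if our own group ends up with no liberties (suicide).
        _, libs = self.group_and_liberties(r, c)
        if not libs:
            self.board[r][c] = self.EMPTY
            return False
        return True
```

Even this stripped-down version captures the essentials an RL agent needs: a board state, a legality check, and state transitions.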
2. Trying a Game of Go
Let's give it a try! The current AI model plays random moves, so it's not very strong, and as the example below shows, a human can win with careful play. A 9x9 board is available, but its calculations can be time-consuming, so we'll stick with a 5x5 board for now. It's enjoyable enough, and if you'd like to try it yourself, please download the Colab notebook from our GitHub repository (1). A GPU is not required.
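For reference, here is how a random-move opponent of this kind could look, building on the hypothetical SimpleGo sketch above. Our actual notebook's interface may differ; this just shows the idea of an agent that picks uniformly among legal moves.

```python
import random

def random_move(game, color):
    """Try the empty points in random order; play and return the first legal one."""
    points = [(r, c) for r in range(game.size) for c in range(game.size)
              if game.board[r][c] == game.EMPTY]
    random.shuffle(points)
    for r, c in points:
        if game.play(r, c, color):  # play() rejects occupied points and suicide
            return (r, c)
    return None  # no legal move left: treat as a pass

# Two random players on a 5x5 board, stopping after two consecutive passes.
game = SimpleGo(size=5)
color, passes = SimpleGo.BLACK, 0
for _ in range(200):  # hard cap: without a ko rule, play could cycle forever
    passes = passes + 1 if random_move(game, color) is None else 0
    if passes == 2:
        break
    color = SimpleGo.WHITE if color == SimpleGo.BLACK else SimpleGo.BLACK

for row in game.board:
    print(" ".join(".XO"[v] for v in row))
```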
3. Perfect Go Rules Are Difficult
Go has some very complex rules. Determining the life and death of stones, especially in the endgame, proved challenging, and implementing "ko" and "seki" also looks difficult (one possible approach to ko is sketched below). Connecting to an external Go system might solve these issues, but for now we'll stick with a lightweight environment that completes all calculations within the notebook, which makes reinforcement learning experiments easier. We'll strive to make this series engaging and easy to follow, comparing our progress against simpler games such as Gomoku (connect five). We appreciate your continued interest.
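As one illustration of how "ko" could be handled, a common approach is positional superko: remember every board position seen in the game and reject any move that recreates one. Here is a sketch, again extending the hypothetical SimpleGo class above; this is not what our notebook currently does.

```python
# Sketch of ko handling via positional superko (illustrative only).
class SimpleGoWithKo(SimpleGo):
    def __init__(self, size=5):
        super().__init__(size)
        self.history = {self.position_key()}  # includes the empty starting position

    def position_key(self):
        """A hashable snapshot of the current board."""
        return tuple(tuple(row) for row in self.board)

    def play(self, r, c, color):
        saved = [row[:] for row in self.board]  # snapshot for rollback
        if not super().play(r, c, color):
            return False
        key = self.position_key()
        if key in self.history:  # move recreates an earlier position: illegal ko
            self.board = saved
            return False
        self.history.add(key)
        return True
```

Tracking positions this way rules out repetition cycles, but situations like "seki" still require genuine life-and-death judgment at scoring time, which is where much of the remaining difficulty lies.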
So, there you have it! We've successfully implemented a Go playing environment in Colab. From here, we'll dive into reinforcement learning and begin training our AI Go player. Stay tuned!
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.