AGI in 2 Years or 5 Years? — Survival Strategies for 2030

In January 2026, several interviews with CEOs of top AI labs were released. One particularly fascinating encounter was the face-to-face interview (1) between Anthropic CEO Dario Amodei and Google DeepMind CEO Demis Hassabis. I have summarized my thoughts on what their comments imply. I hope you find this insightful!

 

1. Will AGI Arrive Within 2 Years?

Dario seems to hold the more accelerated timeline for the realization of AGI. While prefacing his thoughts with "It is difficult to predict exactly when it will happen," he pointed to the reality inside his own company: "There are already engineers at Anthropic who say they no longer write code themselves. In the next 6 to 12 months, AI might handle the majority of code development. I feel that loop is closing rapidly." He argued that AI development is hitting a flywheel effect, noting in particular that progress in coding and research is so remarkable that AI intelligence will surpass public expectations within a few short years.

A prime example is Claude Code, released by Anthropic last year. This revolutionary product is currently taking the software development world by storm. It is no exaggeration to say that the common refrain "I don’t code manually anymore" is a direct result of this tool. In fact, I recently used it to tackle a past Kaggle competition; I achieved an AUC of 0.79 with zero manual coding, which absolutely stunned me (3).

 

2. AGI is Still 5 Years Away

On the other hand, Demis maintains his characteristically cautious stance. He often remarks that there is a "50% chance of achieving AGI in five years." His reasoning is grounded in the current limitations of AI: "Today’s AI isn't yet consistently superior to humans across all fields. A model might show incredible performance in one area but make elementary mistakes in another. This inconsistency means we haven't reached AGI yet." He believes two or three more major breakthroughs are required, which explains his longer timeline compared to Dario.

Unlike Anthropic, which is heavily optimized for coding and language, Google is focusing on a broader spectrum. One such focus is World Models—simulations of the physical spaces we inhabit. In these models, physical laws such as gravity are reproduced, allowing the AI to better understand the "real" world. Genie 3 (2) is their latest model in this category. While it has only been released in the US so far, I am eagerly anticipating its global rollout. The "breakthroughs" Demis mentions likely lie at the end of this developmental path.

 

3. Are We Prepared for AGI?

While their timelines differ, Dario and Demis agree on one fundamental point: AGI—which will surpass human capabilities in every field—is not far off. Nearly ten years ago, in March 2016, DeepMind’s AlphaGo defeated one of the world’s top Go professionals. Since then, humans have been unable to regain the upper hand against AI in Go. Soon, we may reach a point where humans can no longer outperform AI in any field. What we are seeing in the world of coding today is a precursor to that shift.

It is a world that is difficult to visualize. Industrial structures will be upended, and the very role of "human work" will change. It is hard to say that we are currently prepared for this reality. In 2026, we must begin a serious global dialogue on how to adapt. I look forward to engaging in these discussions with people around the world.

I highly recommend watching the full interview with Dario and Demis. These two individuals hold the keys to our collective future. That’s all for today. Stay tuned!

 

1) The Day After AGI | World Economic Forum Annual Meeting 2026, World Economic Forum, Jan 21, 2026
2) Genie 3, Google DeepMind, Jan 29, 2026
3) Is agentic coding viable for Kaggle competitions?, Jan 16, 2026



You can enjoy our video news ToshiStats-AI from this link, too!

Copyright © 2026 Toshifumi Kuga. All rights reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Is agentic coding viable for Kaggle competitions?

The "Agentic Coding" trend continues to accelerate as we enter 2026. In this post, I will challenge myself to see how high I can push accuracy by delegating the coding process to an AI agent, using data from the Kaggle competition Home Credit Default Risk [1]. Let's get started right away.

 

1. Combining Claude Code and Opus 4.5

I will be using Opus 4.5, a generative AI renowned for its coding capabilities. Additionally, I will use Claude Code as my coding assistant, as shown below. While I enter instructions into the prompt box, I do not write any Python code myself.

You can see the words "plan mode" at the bottom of the screen. In this mode, Claude Code formulates an implementation plan based on my instructions. I simply review it, and if everything looks good, I authorize the execution.

Let's look at the actual instructions I issued. It is quite long for a "prompt," spanning about two A4 pages. The beginning of the implementation instructions is shown below. I wrote it in great detail. I'd like you to pay special attention to the final instruction regarding the creation of 50 new features using ratio calculations.

              Part of the Product Requirement Document

Below is a portion of the implementation plan formulated by the AI agent. It details the method for creating new features via ratio calculations. Although I only specified the quantity of features, the plan shows that it selected features likely to be relevant to loan defaults before calculating the ratios.

The AI agent utilized its own domain knowledge to make these selections; they were certainly not chosen at random. This demonstrates the high-level judgment capabilities unique to AI agents.

              New feature creation plan by the AI Agent

            Part of the new features actually created by the AI Agent
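In pandas terms, the ratio-feature step the plan describes might look like the sketch below. The column pairs are hypothetical stand-ins chosen in the spirit of the agent's plan (repayment burden relative to income, credit relative to goods price), not the code it actually generated.

```python
import numpy as np
import pandas as pd

def add_ratio_features(df: pd.DataFrame, pairs) -> pd.DataFrame:
    """Add one numerator/denominator ratio column per (num, den) pair.

    Division by zero is mapped to NaN so that gradient-boosting
    libraries can treat it as a missing value.
    """
    out = df.copy()
    for num, den in pairs:
        out[f"{num}_TO_{den}_RATIO"] = out[num] / out[den].replace(0, np.nan)
    return out

# Hypothetical feature pairs in the spirit of the agent's plan.
pairs = [("AMT_ANNUITY", "AMT_INCOME_TOTAL"),
         ("AMT_CREDIT", "AMT_GOODS_PRICE")]
df = pd.DataFrame({"AMT_ANNUITY": [10000.0, 20000.0],
                   "AMT_INCOME_TOTAL": [100000.0, 0.0],
                   "AMT_CREDIT": [300000.0, 500000.0],
                   "AMT_GOODS_PRICE": [270000.0, 450000.0]})
df = add_ratio_features(df, pairs)
print(df.filter(like="_RATIO"))
```

Selecting *which* pairs to compute is exactly the domain-knowledge step the agent handled on its own.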

 

2. Achieving an AUC of 0.79

By adopting LightGBM as the machine learning library, using the newly created features, and performing hyperparameter tuning, I was able to achieve an AUC of 0.79063, as shown below.

Reaching this level without writing a single line of Python code myself marks this experiment as a success. The data used to build the machine learning model consisted of seven different CSV files. These had to be merged correctly, and the AI agent handled this task seamlessly. Truly impressive!

                 Evaluation results on Kaggle
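Merging those files is essentially a series of groupby-aggregate-join operations. A minimal sketch with two toy tables (the real competition has seven files, such as application_train.csv and bureau.csv; the values here are made up):

```python
import pandas as pd

# Toy stand-ins: one row per applicant, plus a one-to-many credit history.
app = pd.DataFrame({"SK_ID_CURR": [1, 2],
                    "AMT_CREDIT": [1000.0, 2000.0]})
bureau = pd.DataFrame({"SK_ID_CURR": [1, 1, 2],
                       "AMT_CREDIT_SUM": [300.0, 500.0, 700.0]})

# Aggregate the many-rows table down to one row per applicant,
# then left-join it onto the main table.
bureau_agg = (bureau.groupby("SK_ID_CURR")["AMT_CREDIT_SUM"]
                    .agg(["mean", "sum"])
                    .add_prefix("BUREAU_")
                    .reset_index())
merged = app.merge(bureau_agg, on="SK_ID_CURR", how="left")
print(merged)
```

Getting the aggregation keys and join directions right across seven files is exactly the tedious, error-prone work the agent handled seamlessly.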

 

3. Will AI Agents Handle Future Machine Learning Model Development?

While the computation time depends on the number of features created, it generally took between 1 and 4 hours. I ran the process several times, and the calculation never stopped due to syntax errors. The AI agent likely corrected any errors itself before proceeding to the next step.

Therefore, once the initial implementation plan is approved, the results are generated without any further human intervention. This could be revolutionary. You simply input what you want to achieve via a PRD (Product Requirement Document), the AI agent creates an implementation plan, and once you approve it, you just wait for the results. The potential for multiplying productivity several times over is certainly there.

 

How was it? I was personally astonished by the high potential of the "Claude Code and Opus 4.5" combination. With a little ingenuity, it seems capable of even more.

This story is just beginning. Opus 4.5 will likely be upgraded to Opus 5 within the year. I am already looking forward to seeing what AI agents will be capable of then.

That’s all for today. Stay tuned!




1) Home Credit Default Risk, Kaggle



Copyright © 2026 Toshifumi Kuga. All rights reserved

"Claude Code + Opus 4.5" Arrives as the 2026 Game Changer!

2026 has officially begun! The AI community is already abuzz with talk of "agentic coding" using Claude Code + Opus 4.5. I decided to build an actual application myself to test the potential of this combination. Let’s dive in.

 

1. Claude Code + Opus 4.5

These are the coding assistant and frontier model from Anthropic, respectively, both renowned for their strength in coding tasks. I imagine many will use them integrated into an IDE like VS Code, as shown below. You can see the selected model is Opus 4.5. Also, notice the "plan mode" indicator at the bottom.

                   Claude Code

Here, a data scientist inputs a prompt detailing exactly what they want to develop. The system then enters "plan mode" and generates an implementation plan like the following. The actual output is quite long, but here is the summary:

                   Implementation Plan

The goal this time is to create an application that combines machine learning and Generative AI, as described above. Once you agree to this implementation plan, the actual coding begins.

 

2. Completion of the AI App with GUI

In this completed app, you can input customer data via the screen below to calculate the probability of default, which can then be used to assess loan eligibility.

The first customer shows low risk, so a loan appears feasible.

                    Input Screen

                   Default Probability 1

                 Default Probability 2

For the second customer, as highlighted in the red frame, the payment status shows a 2-month delay. The probability of default skyrockets to 65.54%. This is a no-go for a loan.

 

3. Validating Model Accuracy on a Separate Screen

This screen displays the metrics for the constructed prediction model, allowing you to gauge its accuracy. While figures like AUC are bread and butter for experts, they might be a bit difficult for general business users to grasp.

To address this, I decided to include natural language explanations. By leveraging Generative AI, implementing multilingual support is relatively straightforward.

Switching the setting changes the text from English to Japanese. Of course, support for other languages could be added with further development.
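The actual app sends the metric to a generative model, but the language-switching interface can be illustrated with a tiny rule-based stand-in (the function name and wording below are my own, not the app's code):

```python
def explain_auc(auc: float, lang: str = "en") -> str:
    """Toy, template-based stand-in for the GenAI explanation layer."""
    texts = {
        "en": (f"The model's AUC is {auc:.2f}. Values closer to 1.0 mean "
               "the model separates defaulters from non-defaulters better."),
        "ja": (f"このモデルのAUCは{auc:.2f}です。1.0に近いほど、"
               "デフォルト顧客と正常顧客をよく識別できていることを意味します。"),
    }
    return texts[lang]

print(explain_auc(0.79, "en"))
print(explain_auc(0.79, "ja"))
```

With a generative model behind this interface, adding another language is just another prompt, not another template.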

While I used Opus 4.5 during the development phase, this application uses an open-source Generative AI model internally. This allows it to function completely disconnected from the internet—making it ideal even for enterprises with strict security requirements.

 

So, what are your thoughts?

An application with this rich feature set and a high-precision machine learning model was completed entirely without hand-written code. I didn't write a single line this time.

Opus 4.5 was truly impressive; the process never stalled due to syntax errors or similar issues. I can genuinely feel that the accuracy is on a completely different level compared to just six months ago. Moving forward, it seems likely that "agentic coding" will become the standard starting point for creating new machine learning models and GenAI apps. It feels like PoC-level projects could now be knocked out in a matter of days.

I’m looking forward to building many more things. That’s all for today.

Stay tuned!

 

Copyright © 2026 Toshifumi Kuga. All rights reserved

What Awaits Us in 2026? Bold Predictions for AI Agents & Machine Learning

Happy New Year!

As we finally step into 2026, I am sure many of you are keenly interested in how AI agents will develop this year. Therefore, I would like to make some bold predictions by raising three key points, while also considering their connection to machine learning. Let's get started.

 

1. A Dramatic Leap in Multimodal Performance

I believe the high precision of the image generation AI "Nano Banana Pro (1)," released by Google on November 20, 2025, likely stunned not just AI researchers but the general public as well. Its ability to thoroughly grasp the meaning of a prompt and faithfully reproduce it in an image is magnificent, possessing a capability that could be described as "Text-to-Infographics."

Furthermore, its multilingual capabilities have improved significantly, allowing it to generate Japanese neon-sign text flawlessly, as shown below.

"明けましておめでとう 2026" (Happy New Year 2026)

This model is not a simple image generation AI; it is built on top of the Gemini 3 Pro frontier model with added image generation capabilities. That is why the AI can deeply understand the user's prompt and generate images that align with their intent. Google also possesses AI models like Genie 3 (2) that perform simulations using video, leading the industry in multimodal models. We certainly cannot take our eyes off their movements in 2026.

 

2. The Explosive Popularity of "Agentic Coding"

Currently, coding by AI agents—"Agentic Coding"—has become a massive global movement. However, for complex code, it is not yet 100% perfect, and human review is still necessary. Additionally, humans still need to create the Product Requirement Document (PRD), which serves as the blueprint for implementation.

I have built several default prediction models used in the financial industry, and I always feel that development is more efficient when the human side first creates a precise PRD. By doing so, we can largely entrust the actual coding to the AI agent. Here is an example of such a default prediction model.

However, the speed of evolution for frontier models is tremendous. In the latter half of 2026, we expect updates like Gemini 4, GPT-6, and Claude 5, and frankly, it is difficult to even imagine what capabilities AI agents will acquire as a result.

Alongside the progress of these models, the toolsets known as "code assistants" are also likely to significantly improve their capabilities. Tools like Claude Code, Gemini CLI, Cursor, and Codex have become indispensable for programmers today, but in 2026, these code assistants will likely play an active role in fields closer to business, such as machine learning and economic analysis.

At this point, calling them "code assistants" might be off the mark; a broader name like "Thinking Machine for Business" might be more appropriate. The day when those who don't know how to code can master these tools may be close at hand. It is very exciting.

 

3. AI Agents and Governance

As mentioned above, it is predicted that in 2026, AI agents will increasingly permeate large organizations such as corporations and governments. However, there is one thing we must be careful about here.

The behavior of AI agents changes probabilistically. This means that different outputs can be produced for the same input, which is vastly different from current systems. Furthermore, if an AI agent possesses the ability for Recursive Self-Improvement (updating and improving itself), it means the AI agent will change over time and in response to environmental changes.

In 2026, we must begin discussions on governance: how do we structure organizational processes and achieve our goals using AI agents that possess characteristics unlike any previous system? This is a very difficult theme, but I believe it is unavoidable if humanity is to securely capture the benefits and gains from AI agents. I previously established corporate governance structures in the financial industry, and I hope to contribute even a little based on that experience.

 

What did you think? It looks like AI evolution will accelerate even further in 2026. I hope we can all enjoy it together. I look forward to another great year with you all.

 


1) Introducing Nano Banana Pro, Google, Nov 20, 2025
2) Genie 3: A new frontier for world models, Jack Parker-Holder and Shlomi Fruchter, Google DeepMind, August 5, 2025

Copyright © 2026 Toshifumi Kuga. All rights reserved

Gemini 3 Flash: The Multi-modal Powerhouse Dominating the 2026 AI Scene!

Gemini 3 Flash (1) — likely the final major AI model debut of 2025 — is currently making waves. Despite being positioned as an affordable, mid-tier model, its performance is reportedly on par with flagship models. Today, I want to put Gemini 3 Flash to the test and see just how much its multimodal capabilities have evolved. Let’s dive right in.

 

1. App Development

To conduct our experiments, I wanted to create a simple application using Google AI Studio. By simply entering a prompt into the interface, the app was ready in an instant. No Python was used at all. This level of accessibility means even non-engineers can build functional apps now. Things have truly become incredibly convenient.

 

2. Object Counting

First, I challenged the model with a task that has historically been difficult for AI: counting objects. I asked the AI to count the number of cans and cars in an image. I counted them myself as well, and the AI’s response was spot on. At this level of accuracy, we might no longer need specialized object detection models for general tasks.

 

3. Economic Analysis from Charts

Next, let’s try a task that requires a higher level of intelligence: interpreting economic indicators from charts and generating an analytical report. Japan has entered a super-aging society faster than any other developed nation, and its labor force is steadily declining. For this test, I provided charts of the labor force population, the unemployment rate, and manufacturing-sector hourly wages. I then instructed the AI to read these charts, synthesize the data, and produce a comprehensive analysis.

Labor force population

Unemployment rate

                Manufacturing sector hourly wages

In 30 seconds, the economic report was generated. Below is an excerpt. I was genuinely impressed by the depth of analysis derived from just three charts. Gemini 3 Flash is truly formidable!

 

Conclusion

What do you think? Gemini 3 Flash is a fantastic value, being significantly cheaper than rival flagship models. Given that its multimodal performance is top-tier, I believe this will become the "go-to" model for many users. For AI startups like ours, having a model that allows for extensive experimentation with high token volumes without breaking the bank is incredibly reassuring. I highly recommend giving it a try!

Stay tuned!

 



1) Gemini 3 Flash: frontier intelligence built for speed, Dec 17, 2025, Google

Copyright © 2025 Toshifumi Kuga. All rights reserved

Improving ML Vibe Coding Accuracy: Hands-on with Claude Code's Plan Mode

2025 was a year where I actively incorporated "Vibe Coding" into machine learning. After repeated trials, I encountered situations where coding accuracy was inconsistent—sometimes good, sometimes bad.

Therefore, in this experiment, I decided to use Claude Code "Plan Mode" (1) to automatically generate an implementation plan via an AI agent before generating the actual code. Based on this plan, I will attempt to see if a machine learning model can be built stably using "Vibe Coding." Let's get started!

 

1. Generating an Implementation Plan with Claude Code "Plan Mode"

Once again, I would like to build a model that predicts in advance whether a customer will default (on a loan, etc.). I will use publicly available credit card default data (2). For the code assistant, I am using Claude Code, and for the IDE, the familiar VS Code.

To provide input to the Claude Code AI agent, I summarized the task and implementation points into a "Product Requirement Document (PRD)." This is the only document I created.

I input this PRD into Claude Code "Plan Mode" and instructed it to: "Create a plan to create predictive model under the folder of PD-20251217".

Within minutes, the following implementation plan was generated. Comparing it to the initial PRD, you can see how much it has been refined. Note that I am showing only about half of the actual plan here; the full version is remarkably detailed. The AI agent's ability to think this far ahead is simply amazing.

 

2. Beautifully Visualizing Prediction Accuracy

When this implementation plan is approved and executed, the prediction model is generated. Naturally, we are curious about the accuracy of the resulting model.

Here, accuracy is presented clearly, just as the implementation plan specified. While these are familiar metrics for machine learning experts, all the important ones are covered and visualized in an easy-to-understand way, summarized in a single HTML file viewable in a browser.

The charts below are excerpts from that file. It includes ROC curves, SHAP values, and even hyperparameter tuning results. This time, the total implementation time was about 10 minutes. If it can be generated automatically to this extent in that amount of time, I’d rather leave it to the AI agent.
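For reference, the central metric in that report, ROC AUC, comes straight from scikit-learn. A minimal reproduction with toy labels and scores (the real values come from the generated model's predictions on the validation split):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground-truth labels and predicted default probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

auc = roc_auc_score(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)  # points for the ROC plot
print(f"AUC = {auc:.3f}")
```

The generated HTML report bundles this together with SHAP values and tuning results, which is what makes it handy to share.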

 

3. Meta-Prompting with Claude Code "Plan Mode"

A Meta-Prompt refers to a "prompt (instruction to AI) used to create and control prompts."

In this case, I called Claude Code "Plan Mode" and instructed it to "generate an implementation plan" based on my PRD. This is nothing other than executing a meta-prompt in "Plan Mode."

Thanks to the meta-prompt, I didn't have to write a detailed implementation plan myself; I only needed to review the output. It is efficient because I can review it before coding, and since that implementation plan can be viewed as a highly precise prompt, the accuracy of the actual coding is expected to improve.

To be honest, I don't have the confidence to write the entire implementation plan myself. I definitely want to leave it to the AI agent. It has truly become convenient!

 

How was it? Generating implementation plans with Claude Code "Plan Mode" seems applicable not only to machine learning but also to various other fields and tasks. I definitely intend to continue trying it out in the future. I encourage everyone to give it a challenge as well.

That’s all for today. Stay tuned!




You can enjoy our video news ToshiStats-AI from this link, too!

1) How to use Plan Mode, Anthropic

2) Default of Credit Card Clients








Copyright © 2025 Toshifumi Kuga. All rights reserved

Can You "Vibe Code" Machine Learning? I Tried It and Built an App

2025 was the year the coding style known as "Vibe Coding" truly gained mainstream acceptance. So, for this post, I conducted an experiment to see just how far we could go in building a machine learning model using only AI agents via "Vibe Coding"—with almost zero human programming involved. Let's get started!

 
1. The Importance of the "Product Requirement Document" for Task Description

This time, I wanted to build a model that predicts whether bank loan customers will default. I used the publicly available Credit Card Default dataset (1).

In Vibe Coding, we delegate the actual writing of the program to the AI agent, while the human shifts to a reviewer role. In practice, having a tool called a "Code Assistant" is very convenient. For this experiment, I used Google's Gemini CLI. For the IDE, I used the familiar VS Code.

Gemini CLI

To entrust the coding to an AI agent, you must teach it exactly what you want it to do. While it is common to enter instructions as prompts in a chatbot, in Vibe Coding, we want to use the same prompts repeatedly, so we often input them as Markdown files.

It is best to use what is called a "Product Requirement Document (PRD)" for this content. You summarize the goals you want the product to achieve, the libraries you want to use, etc. The PRD I created this time is as follows:

PRD

By referencing this PRD and entering a prompt to create a default prediction model, the model was built in just a few minutes. The evaluation metric, AUC, was also excellent, ranging between 0.74 and 0.75. Amazing!!

 

2. Describing the Folder Structure with PROJECT_SUMMARY

It is wonderful that the machine learning model was created, but if left as is, we won't know which files are where, and handing it over to a third party becomes difficult.

Therefore, if you input the prompt: "Analyze the current directory structure and create a concise summary that includes: 1. A tree view of all files 2. Brief description of what each file does 3. Key dependencies and their purposes 4. Overall architecture pattern Save this as PROJECT_SUMMARY.md", it will create a Markdown file like the one below for you.

PROJECT_SUMMARY.md

With this, anyone can understand the folder structure at any time, and it is also convenient when adding further functional extensions later. I highly recommend creating a PROJECT_SUMMARY.md.
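Incidentally, the "tree view" part of that prompt is something you could also generate yourself with the standard library. A small sketch (my own helper, not what the agent produced):

```python
import os
import tempfile

def tree(root: str, prefix: str = "") -> list[str]:
    """Return an indented tree view of the files under root (stdlib only)."""
    lines = []
    entries = sorted(os.listdir(root))
    for i, name in enumerate(entries):
        last = (i == len(entries) - 1)
        lines.append(prefix + ("└── " if last else "├── ") + name)
        path = os.path.join(root, name)
        if os.path.isdir(path):
            lines.extend(tree(path, prefix + ("    " if last else "│   ")))
    return lines

# Demo on a throwaway project layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "src"))
for f in ("README.md", os.path.join("src", "main.py")):
    open(os.path.join(root, f), "w").close()
print("\n".join(tree(root)))
```

Of course, the value of the AI-generated PROJECT_SUMMARY.md is the per-file descriptions and architecture notes, which a script like this cannot provide.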

 

3. Adding a UI and Turning the ML Model into an App

Since we built such a good model, we want people to use it. So, I experimented to see if I could build an app using Vibe Coding as well.

I created PRD-pdapp.md and asked the AI agent to build the app. I instructed it to save the model file and to use Streamlit for app development. The actual file and its translation are below:

PRD-pdapp.md

When executed, the following app was created. It looks cool, doesn't it?

You can input customer data using the boxes and sliders on the left, and when you click the red button, the probability of default is calculated.

  • Customer 1: Default probability is 7.65%, making them a low-risk customer.

  • Customer 2: Default probability is 69.15%, which is high, so I don't think we can offer them a loan. The PAY_0 Status is "2", meaning their most recent payment status is 2 months overdue. This is the biggest factor driving up the default probability.

As you can see, having a UI is incredibly convenient because you can check the model's behavior by changing the input data. I was able to create an app like this using Vibe Coding. Wonderful.
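The behavior described above, where a recent payment delay dominates the default probability, is easy to reproduce on synthetic data. A toy sketch with a logistic model (my own illustration, not the app's actual code or its numbers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: PAY_0 is months of delay on the latest payment
# (-1 = paid duly), as in the credit-card default dataset.
rng = np.random.default_rng(0)
pay_0 = rng.integers(-1, 4, size=500)
# In this toy data, default becomes more likely the longer the delay.
y = (rng.random(500) < 0.05 + 0.2 * np.clip(pay_0, 0, None)).astype(int)

clf = LogisticRegression().fit(pay_0.reshape(-1, 1), y)
p_on_time = clf.predict_proba([[0]])[0, 1]
p_delayed = clf.predict_proba([[2]])[0, 1]
print(f"on time: {p_on_time:.2%}, 2 months late: {p_delayed:.2%}")
```

A UI on top of exactly this kind of scoring function is what makes it easy to probe the model by changing one input at a time.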

 

How was it? It was indeed possible to perform machine learning using Vibe Coding. However, instead of programming code, you need to create precise PRDs. I believe this will become a new and crucial skill. I encourage you all to give it a try.

That’s all for today. Stay tuned!

 


1) Default of Credit Card Clients

 



Copyright © 2025 Toshifumi Kuga. All rights reserved

The OpenAI Code Red: What’s Next for the Generative AI Market?

In late November 2022, OpenAI released ChatGPT. It has been three years since then, and just as it was about to celebrate its third birthday, an event occurred that dampened the celebratory mood. CEO Sam Altman declared a "CODE RED" (Emergency) (1). The driving force behind this was the breakthrough of the new generative AI, "Gemini 3" (2), released by Google on November 18. Today, I would like to delve into this theme and forecast the generative AI market for 2026. Let’s get started.

 

1. Gemini 3 vs. GPT-5

On August 6, 2025, OpenAI released GPT-5. Since it was the first major update since GPT-4, people had very high expectations. However, in reality, it was difficult to perceive a significant difference compared to other models. Although it managed to update scores across various benchmarks, the impression was that its impact felt somewhat muted compared to the arrival of GPT-4.

Of course, it is evolving steadily, so if rival companies' models had remained stagnant, I believe it could have celebrated its third birthday peacefully. However, the moves made by its rival, Google, surpassed our expectations. On November 18, 2025, Gemini 3 was released, and everyone was astonished by its high performance. Its scores in almost all benchmarks surpassed those of GPT-5, and for the first time since the birth of ChatGPT, GPT-5 lost its "technological competitive advantage." The battle surrounding generative AI has entered a new phase.

 

2. Why Gemini 3 is Particularly Superior

There are several technical talking points, but what I am paying special attention to is its high capability in image processing and generation. As the leaderboard (3) below shows, its strength is overwhelming and unrivaled. The famous image generation app Nano Banana Pro is officially named Gemini 3 Pro Image, and its high scores truly stand out.

                        Leaderboard

When considering individual customers, the ability to easily generate and edit images exactly as envisioned is crucial and can serve as a "killer app." I feel that once individuals experience the technical level of Gemini 3, they will find it difficult to easily switch back to competitor apps. The image below was generated using Nano Banana Pro. As you can see, it has become easy to render both English and Japanese text together on an image. Previously, Japanese text was often incomplete or incomprehensible, so it was quite moving to see clean Japanese generated for the first time.

                   Image generated by Nano Banana Pro

 

3. The Generative AI Market in 2026

With Sam Altman issuing a CODE RED, I believe OpenAI will allocate significant development resources to improving the model itself and will frantically work to close this gap in the image generation field. On the other hand, Google, armed with Gemini 3, possesses several multimodal generative AI models beyond just Nano Banana Pro, and I expect them to leverage that expertise to aim for further breakthroughs.

In particular, generative AI capable of simulation using 3D structures—known as World Models—will likely influence Large Language Models (LLMs) as well, solidifying Google's competitive advantage. One has to admit that Google, which owns YouTube, is incredibly strong in this field. It looks like 2026 will be a year where we cannot take our eyes off how OpenAI launches its counterattack.

 

How was it? While there are several other players creating generative AI, I believe the industry style will involve companies defining their own positions within the context of the "OpenAI vs. Google" battle. Therefore, the outcome of OpenAI vs. Google is extremely important for all AI-related companies. I would like to write another blog post on this same theme if the opportunity arises.

That’s all for today. Stay tuned!









You can enjoy our video news ToshiStats-AI from this link, too!


1) Sam Altman’s ‘Code Red’ Memo Urges ChatGPT Improvements Amid Growing Google Threat, Reports Say, Forbes, 2 Dec 2025
2) A new era of intelligence with Gemini 3, Google, 18 Nov 2025
3)  Leaderboard Overview





Copyright © 2025 Toshifumi Kuga. All rights reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Game Changer: How Nano Banana Pro is Redefining Digital Marketing!

Fresh off the heels of last week's model release, Google has debuted yet another image generation model: Nano Banana Pro (Gemini 3 Pro Image). Word on the street is that it boasts incredible performance. So let's dive in, test it out, and see its potential capabilities.

 

1. The Latest Tokyo Fashion Trends

Fashion evolves with every season, and keeping up with the trends can be a challenge. However, the internet is overflowing with the latest style information. I figured that by feeding this real-time data into generative AI, we could generate images of models wearing the styles currently in vogue. Let's give it a try. Below is the original image of the model. She is wearing an outfit typical of Japanese autumn.

Original Image

I fed this original image and the prompt "Perform Google Search for current Tokyo fashion trends for 20s lady and apply that style to the model in the attached photo. 4 images are needed." into Nano Banana Pro.

Generated Images

The same model appears in all four images, maintaining consistency. Furthermore, the latest fashion trends have been incorporated thanks to Google Search. This is wonderful. Nano Banana Pro's Grounding feature using Google Search is excellent. As the model updates in the future, we can expect the accuracy of capturing trendy fashion to improve even further.

 

2. Creating a Signature Cafe Menu

Next, I want to devise a set menu featuring shortcake and coffee for opening a cafe in Ashiya, a high-end residential area in Japan. For this one too, I prepared a prompt to generate the image after researching currently popular cakes using Google Search.

"I am opening a cafe in Ashiya, Japan, featuring a fruit shortcake and coffee set as the signature dish. Use Google Search to identify current cake trends in Ashiya City. Then, create a high-quality menu image for this set that includes a description and price in English, incorporating the local trends."

I generated the following Japanese and English versions of the menu.

English Version

Japanese Version

Both the Japanese and English text are perfect. I think this is a huge leap forward, especially since AI image generation has struggled to correctly render local languages like Japanese until now. I’m sure it will work well with other local languages too. It looks like Nano Banana Pro will be able to perform globally, regardless of language.

 

3. 3D Visualization of Loss Functions

Raising the abstraction level a bit, I want to create a 3D visualization of a loss function—a topic often discussed when building targeting models for marketing—and clearly explain the concept of gradient descent. Nano Banana Pro can understand even theoretical, highly abstract phenomena like loss functions and map them in 3D. Below is the result. You can see at a glance how the parameters get stuck in a local minimum and cannot reach the point where the loss function attains its global minimum. Amazing.

Gradient Descent Method
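The trap the figure depicts can also be reproduced numerically. Below is a minimal, self-contained Python sketch; the double-well loss function is my own illustrative choice, not anything from the article. Gradient descent started on the right-hand slope settles into a shallow local minimum, while a start on the left-hand slope reaches the deeper global minimum.

```python
# Minimal illustration of gradient descent getting trapped: a double-well
# loss with a shallow local minimum near x ~ 0.96 and a deeper global
# minimum near x ~ -1.04. The function is an illustrative choice.

def loss(x):
    return x**4 - 2 * x**2 + 0.3 * x

def grad(x):
    # Analytic derivative of the loss above.
    return 4 * x**3 - 4 * x + 0.3

def gradient_descent(x0, lr=0.05, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_trapped = gradient_descent(1.5)    # settles in the shallow local minimum
x_escaped = gradient_descent(-1.5)   # reaches the deeper global minimum
print(x_trapped, x_escaped)
```

The two runs end in different minima, and the trapped run finishes with the higher loss value, which is exactly the phenomenon the 3D picture makes visible.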

 

How was it? Even from these few experiments, the excellence of Nano Banana Pro is clear. I have a hunch that Nano Banana Pro is going to change the very methods of digital marketing. I felt particularly strong potential in the Grounding feature using Google Search. I plan to cover Nano Banana Pro again in the near future.

That’s all for today. Stay tuned!

 



You can enjoy our video news ToshiStats-AI from this link, too!

 

1) Introducing Nano Banana Pro, Google, 20 Nov 2025




















Google Antigravity: The Game Changer for Software Development in the Agent-First Era

Google has unveiled Gemini 3.0, its new generative AI, and "Antigravity" (1), a next-gen IDE powered by it. Google states that "Google Antigravity is our agentic development platform, evolving the IDE into the agent-first era," signaling a shift toward truly agent-centric development. Here, I’m going to task Antigravity with creating a "Bank Complaint Classification App." I want to actually run it to explore its potential.

                   Antigravity

 

1. Agentic Development with Antigravity

Antigravity is built on top of VS Code. If you are a VS Code user, the editor will look familiar, making it very approachable and easy to pick up. However, the real power of Antigravity lies in its dedicated interface for agentic development: the Agent Manager (shown below). Just enter a prompt into the box and run it to kick off "Vibe Coding." The prompt shown here is the very simple one I entered at the beginning of the development process. Antigravity also appears to be packed with various features designed to facilitate efficient communication with the Agent. For more details, please check the website (1).

                         Agent Manager

 

2. Prompt Refinement and Improvement

Just because you start "Vibe Coding" doesn't mean you'll get perfect code immediately. I started with a simple prompt this time as well, but the process proved to be more challenging than anticipated. While Gemini 3.0 Pro often demonstrates human-level capability when handling HTML and CSS for website building, the framework used for this app—Google ADK—is a brand-new agent development kit that just debuted in April 2025. Consequently, there are likely very few code examples available on the web, and I assume it hasn't been fully absorbed into Gemini 3.0's training data yet.

               Development with Google ADK

It was quite a struggle, but as shown above, I managed to build a fully functional app via "Vibe Coding." To generate these files, I relied solely on natural language instructions; I didn't write a single line of code directly in the editor. However, I did include simple code snippets within the prompts. This is a technique known as "few-shot learning," where you provide examples to guide the model. I believe this approach is highly effective when Vibe Coding with Gemini 3.0 for Google ADK development. While this might become unnecessary as Gemini 3 is updated in the future, it’s certainly a technique worth remembering for now.
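As a concrete illustration of that few-shot technique, here is a hedged Python sketch of how such a prompt might be assembled. The embedded agent snippet is entirely made up for illustration; it is not real Google ADK code.

```python
# Hypothetical sketch of "few-shot learning" in a prompt: worked code
# examples are prepended so the model imitates an unfamiliar framework's
# conventions. The snippet below is invented, not real Google ADK code.

EXAMPLES = [
    {
        "instruction": "Define a minimal agent",
        "snippet": (
            "agent = Agent(\n"
            "    name='classifier',\n"
            "    instruction='Classify the complaint into a category.',\n"
            ")"
        ),
    },
]

def build_prompt(task: str) -> str:
    """Prepend worked examples to the task so the model can copy the style."""
    parts = [
        "You are coding against a framework with few public examples.",
        "Follow the patterns shown below exactly.\n",
    ]
    for ex in EXAMPLES:
        parts.append(f"# {ex['instruction']}\n{ex['snippet']}\n")
    parts.append(f"Task: {task}")
    return "\n".join(parts)

prompt = build_prompt("Build a bank complaint classification app.")
print(prompt)
```

The point is simply that the example code travels inside the prompt text, giving the model a pattern to imitate when its training data is thin.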

Bank Complaint Classification App using Google ADK

The screenshot above shows the "Bank Complaint Classification App" I developed. I verified its accuracy with some simple examples, and the results were excellent. It seems the internal prompts within the app were generated very effectively. Impressive work!

 

3. Summary of Building a Complaint Classification App with ADK

  • Total Time: 6 hours (starting from the Antigravity installation) to complete the app.

  • Execution: With the finalized prompt, the run time is just over a minute.

  • Manual Effort: Writing the Google ADK code for this app by hand, without vibe coding, would be only about a 20-minute task.

  • Reasons for the Delay:

    • I had to iterate on the prompts several times because Gemini 3 is still unfamiliar with Google ADK.

    • I had to explicitly instruct it on file structures and code syntax.

    • I was also using Antigravity for the first time.

  • Conclusion: It is manageable once you understand Gemini 3 Pro's behavior regarding Google ADK.

 

So, what do you think?

It took a little longer because I wasn't used to the new IDE yet, but the combination of Gemini 3.0 Pro and Antigravity was outstanding. I could really feel its high potential. Since the execution speed itself is fast, next time I plan to challenge myself by "Vibe Coding" a multi-agent app. Look forward to it! That's all for today. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!



1) Experience liftoff with the next-generation IDE, Google,  19 Nov 2025








OpenAI & MUFG : A Strategic Collaboration Poised to Reshape the Future of Finance

On November 12, 2025, OpenAI and MUFG (Mitsubishi UFJ Financial Group, Inc.), one of Japan's three largest financial groups, announced a strategic collaboration (1). As this content has the potential to transform Japan's financial sector, I'd like to share the key points from the news release along with my own analysis. Let's get started!

 

1. Business Transformation Utilizing AI

"Beginning in January 2026, all approximately 35,000 employees of MUFG Bank will use ChatGPT Enterprise in their daily operations." This is a significant step forward in transforming the subsidiary bank into an AI-native organization. It's presumed that OpenAI and MUFG, having already collaborated for over a year, have accumulated considerable expertise in applying AI to banking operations. If they can unlock the full potential of the generative AI GPT-5 through ChatGPT Enterprise, the impact on their business processes is expected to be substantial.

                   ChatGPT Enterprise

 

2. Talent Development

"Furthermore, to accelerate the company-wide adoption of AI, the two companies will establish a project team. They will collaborate on training specialized personnel, or 'AI Champions,' who can drive AI utilization and organizational reform. This will be supported by providing education, training programs, and support for MUFG's company-wide AI adoption campaign, 'Hello, AI @MUFG.'" As this indicates, talent development is essential for embedding AI within the company. While GPT-5 is highly capable, it cannot completely replace human abilities. Collaboration between AI and humans remains indispensable. There is no fixed methodology for how we communicate with AI to achieve our goals; I believe this will continue to be a process of trial and error.

 

3. Creating Innovative Customer Experiences in Retail

"We will install an 'AI Concierge' equipped with the latest AI into the apps provided by MUFG's group companies. This will go beyond simply answering questions to provide personalized support that becomes more tailored with use. In the future, data from each app will be integrated, enabling the AI to grasp the customer's entire transaction history and offer precise suggestions from any app. The first implementation is planned for the digital bank scheduled to launch next fiscal year, with the aim of creating an AI-native digital bank." Of the various retail measures, this "AI Concierge for personalized support" is particularly striking. I believe that without accurately recorded past transaction histories and conversations, providing relevant support is impossible. The entry of Japan's largest financial group into the "AI Concierge" space holds great significance for the financial industry. I'm looking forward to trying it myself.

 

4. Participation in the OpenAI Ecosystem

"We will explore integration with 'Apps in ChatGPT,' which OpenAI announced in October. By connecting MUFG's group company apps and services to ChatGPT's framework, we aim to offer a new financial experience where customers can naturally discuss household financial management and asset investment tailored to their situation, all within the flow of a conversation with ChatGPT." This can be interpreted as MUFG's medium-to-long-term strategy to enter the OpenAI ecosystem. OpenAI is solidifying its position as a global portal to the internet and, from that base, has begun building an ecosystem to realize "Agentic Commerce." I believe MUFG is considering being one of the first in the world to take this leap. I'm excited to see how this unfolds.

 



What did you think? While it has only just been announced and details are still scarce, I feel the content clearly conveys the strong commitment from both companies. I am very excited to see how this "tag team" will change the future of finance in Japan and Asia. For those who wish to read the full content of this release, please see the original source (1). That's all for today. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!


1) Initiatives for AI-Driven Business Transformation and New Service Creation in the Retail Sector, Mitsubishi UFJ Financial Group, Inc. (MUFG) and MUFG Bank, Ltd., 12 Nov 2025



This Is What Happens When an AI Agent Runs Our 2025 Autumn Marketing!

Hello, the daytime high in Tokyo has dropped to 16°C, and it's starting to feel very much like autumn. For those unfamiliar with autumn in Japan, this is the season when the leaves on the mountains change from green to orange. The entire mountainside is dyed orange, creating a beautiful and spectacular view. Therefore, I decided to use orange as the background color for this marketing campaign's promotional video. The challenge is: "Devise a campaign to sell cakes to women in Ashiya, an affluent residential area in the Kansai region." What happens when we entrust this task to an AI agent? Let's find out.

 

1. Creating an AI Marketing Agent with "Google Opal"

This time, I'm creating an AI marketing agent using Google Opal (1). As the description says, "Opal, our no-code AI mini-app builder," you can easily develop an AI agent app like the one below.

For this AI agent's development, I only entered the following prompt: "You are an expert in marketing campaigns. You will be given the following information: 1. The product/service to sell, 2. The target customer, 3. The location/region, 4. The time/season of the campaign, 5. The desired brand image color, 6. A photo of the facilitator. Using this information, please create the following: a. A marketing strategy, b. A marketing campaign name, c. A logo based on the name, d. A promotional video featuring the facilitator, complete with BGM."

Just by executing this, the AI agent builds a workflow like the one shown above. After that, you simply switch to the app and answer questions about your task, and the marketing campaign is created. Amazing, isn't it?

 

2. Marketing Strategy and Logo

Once you input all the necessary information, you get the results back immediately. First is the marketing strategy. In reality, a more detailed discussion followed. This time, I'll just introduce the beginning. Even though I didn't input very detailed information about the campaign at the initial stage, I think this marketing strategy is well-done.

                  Marketing Strategy

Next is the marketing campaign name and logo. What it generated was a cool, French-style logo. I'd love to try using it sometime.

          Logo

 

3. Three Short Promotional Videos

First, I provide the AI agent with a base image of a woman. Then, using this image as a starting point and based on the created marketing strategy, an approximately 8-second short video is generated. It's exciting to see what kind of video the AI agent will produce. This time, it created three videos with BGM. All of them are based on the theme of "Autumn Cakes." It's hard to pick a winner; they are all excellent. After actually creating the videos, I felt that even 8 seconds is enough to convey the image clearly. Which one did you like the best?

 

What did you think? Although this was just a demo AI agent, I was astonished at what it could accomplish with no code, no programming. It seems like it will become a powerful ally for marketers. Of course, there are limitations, but what I created this time can be done for free with just a Google account. I highly recommend giving it a try. ToshiStats will continue to share more about AI agents. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Opal is now available in more than 160 countries, Google, 7 Nov 2025


OpenAI vs. Google: Who Has the Right Take on AGI?

Recently, OpenAI CEO Sam Altman commented on YouTube (1) that 'it is plausible that a legitimate AI Researcher will be achieved by March 2028.' Can this really be achieved in such a short time, less than 2.5 years from now? I would like to consider this deeply, comparing it with the statements of Demis Hassabis, CEO of rival Google DeepMind.

 
1. Achieving a Legitimate AI Researcher by March 2028

As for when Artificial General Intelligence (AGI)—which would surpass human intelligence—will actually be achieved, opinions are divided even among experts. Amidst this, OpenAI CEO Sam Altman commented, referencing the following timeline, that 'it is plausible that a legitimate AI Researcher will be achieved by March 2028.'

Of course, this is an internal goal, and he isn't claiming it's AGI. However, if AI can take on the role of a researcher, technological development will accelerate dramatically, and the current industrial structure will likely change completely. I think it's groundbreaking that they have set a timeline for such a high-impact goal. The issue is its feasibility. Although technical points were discussed in this YouTube video, I felt that alone was insufficient to explain its feasibility. There is likely much that cannot be disclosed as it is confidential information, but it would have been better if there had been a more in-depth explanation.

 

2. Current AI Lacks Consistency

At this point, let's introduce the opinion (2) of Google DeepMind CEO Demis Hassabis regarding the realization of AGI. As you know, he is a co-founder of DeepMind and has aimed to develop AGI since its founding in 2010. Despite that extensive experience, he says it will still take 5 to 10 years to achieve AGI. One reason is that 'current generative AI exhibits PhD-level capabilities for some tasks, yet at other times it can make mistakes on simple high school math.' In short, its abilities 'lack consistency.' Consistency is essential for achieving AGI, and apparently two or three more breakthroughs will be necessary to get there. I find this a rather cautious view. For other points of discussion, please watch the YouTube video (2).

 

3. AI is Steadily Evolving, Step by Step

Although there are differences in their definitions of AGI and their timelines, both parties seem to agree on its eventual realization. We cannot predict when breakthroughs will occur. I believe the only thing we should do is 'prepare for the emergence of AGI.' Whether it arrives in 2028 or 10 years from now, we need to start preparing now for how we can use AGI, considered humanity's greatest invention, to realize a better society, industry, and life. Even as we speak, AI is likely evolving beneath the surface. Our company, ToshiStats, intends to continue these discussions in order to successfully incorporate those advancements.



You can enjoy our video news ToshiStats-AI from this link, too!


1) Sam, Jakub, and Wojciech on the future of OpenAI with audience Q&A, OpenAI, 30 Oct 2025

2) Google DeepMind CEO Demis Hassabis on AI, Creativity, and a Golden Age of Science | All-In Summit,  13 Sep 2025






"AGI Is Still a Decade Away" : A Message from a Genius AI Engineer

I recently found an interesting interview video on YouTube. It was an interview with a prominent AI engineer, and the message from it was the shocking statement that "AGI is still a decade away."

While many opinions suggest AGI will be realized in just a few years, his mention of such a long timespan, 10 years, seems to have gathered global attention. This time, I'd like to share the key points that caught my attention from the video (1), which is over two hours long, and a subsequent post (2) he made on X.

Andrej Karpathy (left)— “We’re summoning ghosts, not building animals”

 

1. AGI is Still a Decade Away

The timeline for achieving AGI is debated among researchers, but the claim that it will take 10 years feels like a minority opinion, perhaps due to the flood of hype surrounding AI agents.

Of course, he has his reasons for asserting this. His tweet (2) stated: "There is still a lot of work (grunt work, integration work, sensors/actuators to the physical world, social work, safety & security work (jailbreaks, poisoning, etc)) to be done before we get to something that you’d rather hire than a human for any job in the world."

Indeed, AI agents in the world of text, like coding, have only just begun this year. The speculation that it will take a considerable amount of time to achieve an AGI that can also operate with high precision in the real world, including physical interaction, feels very convincing.

 

2. On LLM Agents

I believe this topic is especially important for those who use code assistants. His tweet included a critical comment on the current state: "I live in an intermediate world of collaborating with LLMs, where our pros/cons combine. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless."

I also feel that "those unfamiliar with AI technology might misunderstand, thinking they can easily build anything just by asking a code assistant." The performance of the latest generative AI like GPT-5 is incredible, but I believe there are still many cases where you can't just delegate 100% of a task to it. A collaborative relationship is still necessary, where the human decides the basic outline and structure, has the AI agent draft the details, and then the human reviews the results.

Once AGI is achieved, human intervention shouldn't be necessary at all, but it makes sense that it will take a considerable time to get there.

 

3. On Education in the AGI Era

Let's approach this final topic with optimism. In the interview, he spoke about the future of education, saying: "Teaching Assistants are currently human, but I think they can be replaced by AI in the future. Even in that case, the overall structure of the course would be devised by myself or the faculty, but perhaps in the future, AGI will even do that."

In fact, my company is also developing an e-learning program. While I am designing the overall structure, an AI avatar is scheduled to deliver the actual lectures. It's not possible to automate everything with current AI agents, but I think everyone can agree on the point that by humans and AI collaborating, we can create wonderful educational programs.

I'd like to close with his words: "If you have a perfect AI tutor, maybe you can get extremely far, the geniuses today are barely scratching the surface of what a human mind can do."

 

What did you think?

I want to note that he is bullish on the realization of AGI itself; it's his opinion on the timeline that differs from the consensus. Although the time until realization may vary, AGI will eventually appear before us.

What I've introduced here is just a tiny fraction of the more-than-two-hour interview. I highly recommend that you all watch this wonderful interview. I'm sure you will find some hints about the future of AGI.

Well, that's all for today. Stay tuned!






You can enjoy our video news ToshiStats-AI from this link, too!


1) Andrej Karpathy — “We’re summoning ghosts, not building animals” ,  Dwarkesh Podcast, 18 Oct 2025

2) X_post, Andrej Karpathy, 19 Oct 2025







Your Guide to AI Agents: Insights from Andrew Ng's Latest Course

A new online course called "Agentic AI" (1) has been released by DeepLearning.AI. The creator is Andrew Ng, an adjunct professor at Stanford University, also famous for his past machine learning courses. For me, this is the first course I've taken from him since the Deep Learning Specialization in 2018. I've just completed it, and I'd like to share my thoughts and a recommendation.

 

1. Course Overview

The course is divided into five modules, each consisting of 5-7 short videos (about 5-10 minutes each), a quiz, and coding tasks in Jupyter notebooks. By passing each assignment, you are ultimately awarded a certificate of completion. The level is listed as intermediate; while basic knowledge of Python is necessary, I believe that even those without specialized AI knowledge can work through the material and naturally come to understand it. The main topics are as follows:

Reflection: AI critiques its own work and iterates to improve quality—like code review, but automated.

Tool Use: Connect AI to databases, APIs, and external services so it can actually perform actions, not just generate text.

Planning: Break complex tasks into executable steps that AI can follow and adapt when things don’t go as expected.

Multi-Agent: Coordinate multiple specialized AI systems to handle different parts of a complex workflow.

Created by Andrew Ng, who teaches at Stanford while concurrently doing practical consulting work, I found the course to have a wonderful balance between theory and practice.

 

2. Reflection and Tool Use

The second and third modules are critical technologies for the future realization of AGI. In particular, "Reflection," where an AI improves itself, is also known as Recursive Self Improvement and is a field being researched worldwide. This module introduces a method that allows even non-experts to incorporate reflection functionality, which I am very eager to try implementing. Additionally, using tools allows a generative AI to incorporate information that is difficult to acquire on its own, thereby enhancing the AI agent's capabilities. Furthermore, this information can be applied to the "Reflection" process, promising a synergistic effect. I'm also keen to implement this and see what kind of information can be integrated.
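To make the Reflection idea concrete, here is a minimal generate-critique-revise loop in Python. The generator and critic are stand-in functions I invented for illustration; in a real agent both would be LLM calls, and the stopping rule would be a genuine quality check rather than a toy string test.

```python
# Sketch of a "Reflection" loop: a generator produces a draft, a critic
# scores it, and the draft is revised until the critique passes.
# Both functions are stand-ins; in practice they would be LLM calls.

def generate(task, feedback=None):
    draft = f"Answer to: {task}"
    if feedback:
        draft += f" [revised to address: {feedback}]"
    return draft

def critique(draft):
    # Toy rule standing in for a real quality check.
    if "[revised" not in draft:
        return "missing supporting detail"
    return None  # None means the critic is satisfied

def reflect(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft
    return draft

result = reflect("Summarize the quarterly complaint trends.")
print(result)
```

The same loop structure accommodates Tool Use naturally: the critic (or generator) can call external tools and feed what it finds back into the next revision.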

 

3. Error Analysis

As Andrew Ng states, this fourth module is, in my opinion, the most important and valuable content in the course. Generative AI is excellent, but it is not perfect; there is still a considerable chance it will produce incorrect answers. To raise accuracy to a practical level, the course therefore emphasizes a strategy of quickly identifying the parts of the overall process with the lowest performance and allocating resources to improving those areas. For a complex AI agent that may contain numerous sub-agents, I can certainly see how identifying and prioritizing the reinforcement of its weaknesses is incredibly important in practical applications.
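That error-analysis strategy can be sketched in a few lines of Python: score each stage of a pipeline on labelled examples and direct effort at the weakest one. The stage names and results below are fabricated purely for illustration.

```python
# Toy error analysis: tally per-stage accuracy from an evaluation run,
# then flag the weakest stage as the place to invest improvement effort.
# The (stage, correct?) records are fabricated for illustration.
from collections import defaultdict

trace = [
    ("retrieval", True), ("retrieval", True), ("retrieval", False),
    ("classification", True), ("classification", False),
    ("classification", False), ("summarization", True),
]

totals = defaultdict(lambda: [0, 0])  # stage -> [correct, total]
for stage, ok in trace:
    totals[stage][1] += 1
    if ok:
        totals[stage][0] += 1

accuracy = {s: c / t for s, (c, t) in totals.items()}
weakest = min(accuracy, key=accuracy.get)
print(accuracy, "-> improve:", weakest)
```

Trivial as it looks, keeping this kind of per-stage scoreboard is what turns "the agent is sometimes wrong" into a concrete improvement plan.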

 

So, what did you think? With a flood of AI-related news every day, many people are likely wondering, "How should I proceed with my AI projects from now on?" I believe this course provides a valuable perspective for thinking in the medium to long term. While it is a paid course, it is not as expensive as university tuition, and I highly recommend trying it. Incidentally, because I studied intensively, I was able to receive my certificate in about three days. It's certainly possible for a business professional to complete it over a long weekend.

Well, that's all for today. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!


1) Agentic AI, Andrew Ng, DeepLearning.AI, Oct 2025







Copyright © 2025 Toshifumi Kuga. All rights reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The Secret to High-Accuracy AI: An Exploration of a Machine Learning Engineering Agent

In a previous post, I explained Google's research paper "MLE-STAR" (1) and uncovered the mechanism by which an AI can build its own high-accuracy machine learning models. This time, I'm going to implement that AI agent using the Google ADK and experiment to see if it can truly achieve high accuracy. For reference, the MLE-STAR code is available as open source (2).

 

1. The Information I Provided

With MLE-STAR, humans only need to handle the data input and task definition. The data I used for this experiment comes from the Kaggle competition "Home Credit Default Risk" (3). While the original data consists of 8 files, I combined them into a single file for this experiment. I reduced the training data to 10% of the original, resulting in about 30,000 samples, and kept the original test data of 48,700 samples.

The task was set as follows: "A classification task to predict default." Note that to speed up the experiment, the number of iterative loops was set to a minimum.

                     Task Setup

 

2. Deciding Which Model to Use

MLE-STAR uses a web search to select the optimal model for the given task. In this case, it ultimately chose LightGBM. To finish the experiment quickly, I configured it to select only one model; if I had set it to select two, it likely would have also chosen something like XGBoost. Both are models frequently used in data science competitions.

                Model Selection by MLE-STAR

It generated the initial script below. As a frequent user of LightGBM, I find the code familiar, but generating it in an instant is something only an AI can do. It's amazing!

 

3. Identifying Key Code Blocks with "Ablation Studies"

Next, it uses ablation studies to identify which code blocks should be improved. In this case, ablation 2 showed that removing early stopping worsened the model's performance, so this feature was kept in the training process from then on.

               Ablation Studies Results by MLE-STAR

 

4. Iteratively Improving the Model

Based on the ablation studies, MLE-STAR decided to improve the model using the following two techniques: K-fold target encoding and binary encoding. These techniques themselves are common in machine learning and are not particularly unusual.

                   K-fold Target Encoding

                     Binary Encoding

This ability to "use ablation studies to identify which code blocks to improve" is likely a major reason for MLE-STAR's high accuracy. I look forward to seeing how this functionality evolves in the future.

 

5. The Results Are In. Unfortunately, I Lost.

For its final step, MLE-STAR ensembles the models to create the final version. For more details, please see the research paper. It also generates a CSV file with the default predictions, which I slightly modified and promptly submitted to Kaggle. This task is evaluated using AUC, where a score closer to 1 indicates higher accuracy.
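The paper describes a more sophisticated ensembling procedure; the basic idea, averaging the predicted probabilities of several models before computing AUC, can be shown in a few lines. The labels and predictions below are made up for illustration.

```python
# Basic probability-averaging ensemble (illustrative values only).
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
pred_a = np.array([0.2, 0.7, 0.6, 0.4, 0.9, 0.1, 0.5, 0.8])  # model A
pred_b = np.array([0.3, 0.8, 0.5, 0.2, 0.7, 0.3, 0.6, 0.9])  # model B

ensemble = (pred_a + pred_b) / 2  # average the predicted probabilities
print(f"A: {roc_auc_score(y_true, pred_a):.3f}, "
      f"B: {roc_auc_score(y_true, pred_b):.3f}, "
      f"ensemble: {roc_auc_score(y_true, ensemble):.3f}")
```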

The top score is the result I achieved using my own LightGBM model. The score in the red box at the bottom is the one automatically generated by MLE-STAR. With a margin of more than 0.01 on both the Public and Private scores, it was my complete defeat.

             Kaggle Prediction Accuracy Evaluation (AUC)

Improving AUC by 0.01 is quite a challenge, which gives a glimpse of how excellent MLE-STAR is. I didn't perform any extensive tuning on my LightGBM model, so my score would likely have improved had I spent time tuning it manually. However, MLE-STAR produced its result in about 7 minutes from the start of the computation, so from an efficiency standpoint, I couldn't compete.

 
 

So, what did you think? Although this was a limited experiment, I feel I was able to grasp the high potential of MLE-STAR. I was truly impressed by the power of its Recursive Self-Improvement, which identifies specific code blocks and improves upon them autonomously.

Here at ToshiStats, I plan to continue digging into MLE-STAR. Stay tuned!









1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement, Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, and Tomas Pfister, Google Cloud / KAIST, Aug 23, 2025

2) Machine Learning Engineering with Multiple Agents (MLE-STAR), Google

3) Home Credit Default Risk, Kaggle




Is an AI Machine Learning Assistant Finally a Reality? I Looked Into It, and It's Incredible!

I often build machine learning models for my job. The process of collecting data, creating features, and gradually improving the model's accuracy takes time, specialized knowledge, and programming skills in various libraries. I've always found it to be quite a challenge. That's why I've been hoping for an AI that could skillfully assist with this work, and recently, a potential candidate has emerged. I'd like to take a deep dive into it right away.

 
1. A Basic Three-Layer Structure

This AI assistant is called MLE-STAR, and according to a research paper (1), it has the following structure. Simply put, it first searches the internet for promising libraries. Next, after writing code using those libraries, it identifies which parts, called "code blocks," should be improved further. Finally, it decides how to improve those code blocks. Let's explore each of these steps in detail.

 

2. Selecting the Optimal Library with a Search Function

To create a high-accuracy machine learning model, you first need to decide "what kind of model to use." This means you have to select a library to implement the model. This is where the search function comes in. For example, in a finance task to calculate default probability, many methods are possible, but gradient boosting is often used in competitions like Kaggle. I also use gradient boosting in most cases. It seems MLE-STAR can use its search function to find the optimal library on its own, even without me specifying "use gradient boosting." That's amazing! This would eliminate the need for humans to research everything, leading to greater efficiency.

 

3. Finding Where to Improve the Code and Steadily Making Progress

Once the library is chosen and a baseline script is written, it's time to start making improvements to increase accuracy. But it's often difficult to know where to begin. MLE-STAR employs an ablation study to understand how accuracy changes when a feature is added or removed, thereby identifying the most impactful code block. This part of the process typically relies on human experience and intuition, involving a lot of trial and error. By using MLE-STAR, we can make data-driven decisions, which is incredibly efficient.

 

4. Iterating Until Accuracy Actually Improves

Once the code block for improvement is identified, the system gradually changes parameters and confirms the accuracy improvements. This is also done automatically within a loop, without requiring human intervention. The accuracy is calculated at each step, and as a rule, only changes that improve performance are adopted, ensuring that the model's accuracy steadily increases. Incredible, isn't it? In fact, a graph comparing the performance of MLE-STAR with past AI assistants shows that MLE-STAR won a "gold medal" in approximately 36% of the tasks, highlighting its superior performance.
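The adopt-only-improvements loop can be sketched abstractly. Everything here is illustrative: the score function and candidate changes are toy stand-ins for real training runs and parameter edits.

```python
# Toy greedy improvement loop: try candidate changes, keep only those
# that improve the score.

def improve_greedily(score_fn, params: dict, candidates: list) -> dict:
    best = dict(params)
    best_score = score_fn(best)
    for change in candidates:            # each change is a partial update
        trial = {**best, **change}
        s = score_fn(trial)
        if s > best_score:               # adopt only if it improves
            best, best_score = trial, s
    return best

# toy score function: peaks at depth=6, lr=0.1 (a real loop would train
# and validate a model here)
def score_fn(p):
    return -abs(p["depth"] - 6) - 10 * abs(p["lr"] - 0.1)

result = improve_greedily(score_fn,
                          {"depth": 3, "lr": 0.3},
                          [{"depth": 6}, {"lr": 0.1}, {"depth": 12}])
print(result)  # -> {'depth': 6, 'lr': 0.1}
```

The "adopt only if it improves" guard is the property described in the text: it guarantees the score is monotonically non-decreasing across the loop.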

 

So, what did you think? This new framework for an AI assistant looks extremely promising. In particular, its ability to identify which code blocks to improve and then actually increase the accuracy is likely to become even more powerful as the performance of foundation models continues to advance. I'm truly excited about future developments.

Next time, I plan to apply it to some actual analysis data to see what kind of accuracy it can achieve. Stay tuned!







1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement, Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, and Tomas Pfister, Google Cloud / KAIST, Aug 23, 2025




A Sweet Strategy: Selling Cakes in Wealthy Residential Areas!

Have you ever thought about opening a cake shop? As a cake lover myself, I often find myself wondering, "What kind of cake would be perfect?" However, developing a concrete business strategy is a real challenge. So this time, I'd like to conduct a case study with the support of an AI "marketing agency" agent. Let's get started.


1. Selling Cakes in an Upscale Kansai Neighborhood

The business scenario I've prepared for this case is a simple one:

Goal: To sell premium fruit cakes in the Kansai region.

  • Cake Features: Premium shortcakes featuring strawberries, peaches, and muscat grapes.

  • Target Audience: Women in their 20s to 40s living in upscale residential areas.

  • Stores: 3 cafes near Yamate Dentetsu Ashiya Station, 1 cafe near Kaigan Dentetsu Ashiya Station.

  • Direct Sales Outlet: 1 store inside the Yamate Dentetsu Ashiya Station premises.

  • Branding: The brand's primary color will be blue, with the website and logo also unified in blue.

  • Current Plan: In the process of planning a sales promotion for the autumn season.

From here, what kind of concrete business strategy can we derive? First, I'll input the business scenario into the AI marketing agency.

The first thing it does is automatically generate 10 cool domain names.

It's hard to choose, but for now, I'll proceed with branding using "PremiumAshiyaCake.com".

 

2. A Practical Business Strategy

Now, let's ask the AI marketing agency to formulate a business strategy for selling our premium fruit cakes in Kansai. When prompted to input the necessary information, I re-entered the business scenario, and the following business strategy was generated in about two minutes. Amazing!

It's a long document, over five pages, so I can't share it all, but here is the "Core of the Marketing Strategy."

  • Overall Approach: Direct Response that Inspires Aspiration

    • We will build an aspirational, luxury brand image through beautiful content, and then convert that desire into immediate store visits using precisely targeted calls-to-action (CTAs).

  • Core Message and Positioning:

    • Positioning Statement: For the discerning women of Kansai, Premium Ashiya Cake is the patisserie that transforms a moment into a cherished memory with its exquisitely crafted seasonal shortcakes.

    • Tagline / Core Message: "Premium Ashiya Cake: An exquisite moment, crafted for you."

  • Key Pillars of the Strategy:

    • Visual Elegance and a "Blue" Signature: All visuals must be of professional, magazine-quality. The brand color "blue" will be used as a sophisticated accent in styling—such as on blue ribbons, parts of the tableware, or as background elements—to create a recognizable and unique visual signature.

    • Hyper-local Exclusivity: Marketing efforts will be geographically and demographically laser-focused on the target audience residing in Ashiya and its surrounding affluent areas. This creates an "in-the-know" allure for locals.

    • Seasonal Storytelling: Treat each season's campaign as a major event. We will build a narrative around the star ingredients, such as Shine Muscat grapes from a specific partner farm, to build anticipation and justify the premium price point.

This is wonderfully practical content. The keywords I provided—"blue," "Ashiya," and "muscat"—have been skillfully integrated into the strategy.

 

3. The Logo is Excellent, Too—This is Usable!

Because I specified in the initial business scenario that I wanted to "unify the color scheme based on blue," it created this cool logo for me. It really looks like something I could use right away. Google's image generation AI, Imagen 3.0, is used here. The quality of this AI is always highly rated, so it's no surprise that the logo generated this time is also of outstanding quality.

 

So, what did you think of the AI marketing agency? The business strategy is professional, and it's amazing how it automatically created the domain names and logo with such excellent results. Although I couldn't introduce it this time, it also includes a website-creation feature. It's surprising that a tool this capable is actually available for free: the development kit "Google ADK" is provided as open source, and the AI marketing agency from this article can be downloaded and used for free as a sample (1). For those who can use Python, I think you'll get the hang of it with a little practice. The only running cost is the usage fee for Google Gemini 2.5 Pro, so the cost-effectiveness is outstanding. I encourage you all to give it a try.

Please note that this story is a work of fiction and does not represent anything that actually exists. That's all for today, stay tuned!

 


1) Marketing Agency, Google, May 2025




Unlocking Sales Forecasts: Can GPT-5 Reveal the Most Important Data?

Have you ever found yourself in marketing wanting to predict sales and gathering a ton of data? For example, say you have sticker sales data (1) like the set below, where the num_sold column represents the number of units sold. This is actually a large dataset with over 200,000 entries. Among these data columns (which we call "features"), which one is the most important for predicting sales? They all seem important, and it's impossible to check 200,000 records one by one. So let's try asking the generative AI GPT-5.

                         Sticker sales data

 

1. Asking GPT-5 with a Prompt

To identify the important features for a prediction, you first have to create a predictive model. This is a task that data scientists perform all the time. However, they usually create these models by coding in Python, which can be a high barrier for the average business person. So, isn't there an easier way? Yes, and this is where prompts come in handy. If you can give instructions to GPT-5 with a prompt, no coding is necessary. Here is the prompt I created for this task.

     data & prompt

Key points of the prompt:

  • Use HistGradientBoostingRegressor from sklearn.

  • Evaluate the error using mean_absolute_percentage_error.

  • Split the data into train-data and test-data at an 80:20 ratio.

  • Display the top 10 feature importances with their original variable names.

  • Print the results as numerical output.

By getting the top 10 feature importances, we can understand which data column is the most significant. I won't explain the predictive model itself this time, so for those who want to dive deeper, please refer to a machine learning textbook.

 

2. The Code Actually Being Executed

Based on the prompt above, GPT-5 generated the following Python code on its own. It might look complicated to non-specialists, but rest assured, we don't have to touch Python at all. However, we can review this code to see how the calculation is being done, so it's by no means a black box. I believe this transparency is very important when using GPT-5 in a business context.

                 GPT-5's code for building the prediction model

 

3. "Product" Was the Most Important!

Ultimately, we got the following result.

Feature Importance Ranking

A higher "importance" value in the table above means the feature is more significant. This analysis revealed that "product" was overwhelmingly important. It seems that thinking about "what is selling" is essential. This is followed by "store" and "country". This suggests that considering "in what kind of store" and "in which country" is also crucial.

                     feature importance ranking

 

So, what did you think? This time, we instructed GPT-5 via a prompt to calculate which features matter most for predicting sales. You may run into errors along the way that GPT-5 has to correct itself, so some basic knowledge of machine learning is helpful. Still, we obtained the result without writing any Python ourselves, which means marketing professionals can start trying this today. I hope you can use the method introduced here in your own marketing work. That's all for now. Stay tuned!

 




1) Forecasting Sticker Sales, Kaggle, January 1, 2025




How to Turn GPT-5 into a Pro Marketing Analyst with AI Agents!

A while back, I introduced a guide to prompting GPT-5, but it can be quite a challenge to write a perfect prompt from scratch. Not to worry! You can actually have GPT-5 write prompts for GPT-5. Pretty cool, right? Let's take a look at how.

 

1. Using GPT-5 to Do a Marketer's Job

I have some global sales data for stickers (1). Based on this data, I want to develop a sales strategy.

                 Global Sticker Sales Records

In a typical company, a data scientist would analyze the data, and a marketing manager would then create an action plan based on the results. We're going to see if we can get GPT-5 to handle this entire process. Of course, this requires a good prompt, but what kind of prompt is best? This is where it gets tricky. The principle I always adhere to is this: "Data analysis is a means, not an end." There are many data analysis methods, so the same data can be analyzed in various ways. However, what we really want is a sales strategy that boosts revenue. With this in mind, let's reconsider what makes a good prompt.

It's a bit of a puzzle, but I've managed to draft a preliminary version.

 

2. Using Metaprompting to Improve the Prompt with GPT-5

Now, let's have GPT-5 improve the prompt I quickly drafted. The image below shows the process. The first red box is my draft prompt.

                    Metaprompt

The second red box explicitly states the principle: "Perform data analysis with the goal of creating a Marketing strategy." When you provide the data and run this prompt, GPT-5 creates the improvement suggestions you see below, which are very detailed. I actually ran this process twice to get a better result.

                   Final Prompt
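Metaprompting itself is just a prompt that asks the model to critique and rewrite another prompt. A minimal sketch of the wrapper is below; the wording and the `call_model` stub are my own illustration, not the exact text from my run or from OpenAI's guide.

```python
# Sketch of metaprompting: wrap a draft prompt in a request to improve it.

def build_metaprompt(draft_prompt: str, goal: str) -> str:
    return (
        "You are a prompt engineer. Improve the prompt below.\n"
        f"Guiding principle: {goal}\n"
        "Point out its weaknesses, then output a revised prompt.\n\n"
        f"--- DRAFT PROMPT ---\n{draft_prompt}"
    )

def call_model(prompt: str) -> str:
    """Stub for a real chat-completion call (e.g., an OpenAI client)."""
    return "[improved prompt would be returned here]"

meta = build_metaprompt(
    "Analyze this sales data and report anything interesting.",
    "Perform data analysis with the goal of creating a marketing strategy.",
)
improved = call_model(meta)  # in the post, this refinement was run twice
```

The key design point is stating the guiding principle explicitly in the wrapper, so every revision stays anchored to the business goal rather than to analysis for its own sake.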

 

3. The Result: GPT-5 Generates a Marketing Strategy!

Running the final prompt took about a minute and produced the following output. The detailed analysis and resulting insights are directly connected to marketing actions, staying true to our initial principle. It's fantastic.

The output is concise and perfect for busy executives. Creating this content on my own would likely take an entire day, but with GPT-5 the whole process, including the time it took to draft the initial prompt myself, takes only about 30 minutes. This really shows how powerful GPT-5 is.

 

What do you think? This time, we explored a method for getting GPT-5 to improve its own prompts. This technique is called Metaprompting, and it's described in the OpenAI GPT-5 Prompting Guide (2).

I encourage you to try Metaprompting starting today and take your AI agent to the next level. That's all for now! Stay tuned!

 




 


1) Forecasting Sticker Sales, Kaggle, January 1, 2025

2) GPT-5 Prompting Guide, OpenAI, August 7, 2025

