Toshifumi Kuga

February 1, 2026

Genie 3, agentic coding, claude code

AGI in 2 Years or 5 Years? — Survival Strategies for 2030

In January 2026, several interviews with CEOs of top AI labs were released. One particularly fascinating encounter was the face-to-face interview (1) between Anthropic CEO Dario Amodei and Google DeepMind CEO Demis Hassabis. I have summarized my thoughts on what their comments imply. I hope you find this insightful!

1. Will AGI Arrive Within 2 Years?

Dario seems to hold a more accelerated timeline for the realization of AGI. While prefacing his thoughts with "It is difficult to predict exactly when it will happen," he pointed to the reality within his own company: "There are already engineers at Anthropic who say they no longer write code themselves. In the next 6 to 12 months, AI might handle the majority of code development. I feel that loop is closing rapidly." He argued that AI development is hitting a flywheel effect, noting in particular that progress in coding and research is so remarkable that AI capabilities will surpass public expectations within a few short years.

A prime example is Claude Code, released by Anthropic last year. This revolutionary product is currently taking the software development world by storm. It is no exaggeration to say that the common refrain "I don’t code manually anymore" is a direct result of this tool. In fact, I recently used it to tackle a past Kaggle competition; I achieved an AUC of 0.79 with zero manual coding, which absolutely stunned me (3).

2. AGI is Still 5 Years Away

On the other hand, Demis maintains his characteristically cautious stance. He often remarks that there is a "50% chance of achieving AGI in five years." His reasoning is grounded in the current limitations of AI: "Today’s AI isn't yet consistently superior to humans across all fields. A model might show incredible performance in one area but make elementary mistakes in another. This inconsistency means we haven't reached AGI yet." He believes two or three more major breakthroughs are required, which explains his longer timeline compared to Dario.

Unlike Anthropic, which is heavily optimized for coding and language, Google is focusing on a broader spectrum. One such focus is World Models—simulations of the physical spaces we inhabit. In these models, physics like gravity are reproduced, allowing the AI to better understand the "real" world. Genie 3 (2) is their latest version in this category. While it has only been released in the US so far, I am eagerly anticipating its global rollout. The "breakthroughs" Demis mentions likely lie at the end of this developmental path.

3. Are We Prepared for AGI?

While their timelines differ, Dario and Demis agree on one fundamental point: AGI, which will surpass human capabilities in every field, is not far off. Almost exactly ten years ago, in March 2016, DeepMind’s AlphaGo defeated Lee Sedol, one of the world’s top Go professionals. Since then, human players have been unable to keep pace with the strongest Go AI. Soon, we may reach a point where humans can no longer outperform AI in any field. What we are seeing in the world of coding today is the precursor to that shift.

It is a world that is difficult to visualize. Industrial structures will be upended, and the very role of "human work" will change. It is hard to say that we are currently prepared for this reality. In 2026, we must begin a serious global dialogue on how to adapt. I look forward to engaging in these discussions with people around the world.

I highly recommend watching the full interview with Dario and Demis. These two individuals hold the keys to our collective future. That’s all for today. Stay tuned!

1) The Day After AGI | World Economic Forum Annual Meeting 2026, World Economic Forum, Jan 21, 2026
2) Genie 3, Google DeepMind, Jan 29, 2026
3) Is agentic coding viable for Kaggle competitions?, January 16, 2026

You can enjoy our video news ToshiStats-AI from this link, too!

Copyright © 2026 Toshifumi Kuga. All rights reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

January 16, 2026

Agentic AI, claude code, Opus 4.5, agentic coding

Is agentic coding viable for Kaggle competitions?

The "Agentic Coding" trend continues to accelerate as we enter 2026. In this post, I challenge myself to see how high I can push accuracy by delegating the coding process to an AI agent, using data from the Kaggle competition Home Credit Default Risk (1). Let's get started right away.

1. Combining Claude Code and Opus 4.5

I will be using Opus 4.5, a generative AI renowned for its coding capabilities. Additionally, I will use Claude Code as my coding assistant, as shown below. While I enter instructions into the prompt box, I do not write any Python code myself.

You can see the words "plan mode" at the bottom of the screen. In this mode, Claude Code formulates an implementation plan based on my instructions. I simply review it, and if everything looks good, I authorize the execution.

Let's look at the actual instructions I issued. It is quite long for a "prompt," spanning about two A4 pages. The beginning of the implementation instructions is shown below. I wrote it in great detail. I'd like you to pay special attention to the final instruction regarding the creation of 50 new features using ratio calculations.

Part of the Product Requirement Document

Below is a portion of the implementation plan formulated by the AI agent. It details the method for creating new features via ratio calculations. Although I only specified the quantity of features, the plan shows that it selected features likely to be relevant to loan defaults before calculating the ratios.

The AI agent utilized its own domain knowledge to make these selections; they were certainly not chosen at random. This demonstrates the high-level judgment capabilities unique to AI agents.

New feature creation plan by the AI Agent

Part of the new features actually created by the AI Agent
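To give a feel for what "creating new features using ratio calculations" means, here is a minimal sketch in plain Python. It is not the code the AI agent wrote. The function name `make_ratio_features`, the column names, and the sample values are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def make_ratio_features(row, columns, max_features=50):
    """Build ratio features from ordered pairs of numeric columns.

    A ratio is set to None when a value is missing or the denominator is zero.
    """
    features = {}
    for numerator, denominator in permutations(columns, 2):
        if len(features) >= max_features:
            break
        top = row.get(numerator)
        bottom = row.get(denominator)
        name = f"{numerator}_DIV_{denominator}"
        if top is None or bottom is None or bottom == 0:
            features[name] = None
        else:
            features[name] = top / bottom
    return features


# Hypothetical applicant record (names and numbers are assumptions).
applicant = {
    "credit_amount": 400000.0,
    "annual_income": 100000.0,
    "annuity": 20000.0,
}
ratios = make_ratio_features(applicant, list(applicant))
print(ratios["credit_amount_DIV_annual_income"])  # 4.0
```

A ratio such as credit amount divided by income is often more informative for default risk than either raw value alone, which is presumably why choosing the pairs with domain knowledge matters more than generating them at random.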

2. Achieving an AUC of 0.79

By adopting LightGBM as the machine learning library, using the newly created features, and performing hyperparameter tuning, I was able to achieve an AUC of 0.79063, as shown below.
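For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the toy data are my own assumptions, and this is not how the competition score was produced.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = default, 0 = no default (values are illustrative only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.79 means the model orders roughly four out of five such pairs correctly.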

Reaching this level without writing a single line of Python code myself marks this experiment as a success. The data used to build the machine learning model consisted of seven different CSV files. These had to be merged correctly, and the AI agent handled this task seamlessly. Truly impressive!
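Merging several files "correctly" mostly comes down to collapsing each one-to-many table to one row per customer before joining it to the main table. The following is a minimal sketch of that idea in plain Python, assuming a hypothetical key column `customer_id` and a hypothetical `amount` column. The agent's actual merge logic is not shown in this post, so treat this only as an illustration.

```python
from collections import defaultdict


def aggregate_by_key(rows, key, value):
    """Collapse a one-to-many table into one summary record per key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {f"{value}_count": len(v), f"{value}_mean": sum(v) / len(v)}
        for k, v in groups.items()
    }


def left_join(main_rows, summary, key):
    """Attach summary columns to every main row; unmatched rows get None."""
    columns = sorted({name for stats in summary.values() for name in stats})
    joined = []
    for row in main_rows:
        stats = summary.get(row[key], {})
        merged = dict(row)
        for name in columns:
            merged[name] = stats.get(name)
        joined.append(merged)
    return joined


# Hypothetical data: two applications, and past loans for only one of them.
applications = [{"customer_id": 1, "target": 0}, {"customer_id": 2, "target": 1}]
past_loans = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 300.0},
]
table = left_join(applications, aggregate_by_key(past_loans, "customer_id", "amount"), "customer_id")
print(table[0]["amount_mean"], table[1]["amount_mean"])  # 200.0 None
```

Aggregating first keeps the number of rows in the main table unchanged, which is the usual way to avoid accidentally duplicating applicants during a join.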

3. Will AI Agents Handle Future Machine Learning Model Development?

While the computation time depends on the number of features created, it generally took between 1 and 4 hours. I ran the process several times, and the calculation never stopped due to syntax errors. The AI agent likely corrected any errors itself before proceeding to the next calculation step.

Therefore, once the initial implementation plan is approved, the results are generated without any further human intervention. This could be revolutionary. You simply input what you want to achieve via a PRD (Product Requirement Document), the AI agent creates an implementation plan, and once you approve it, you just wait for the results. The potential for multiplying productivity several times over is certainly there.

How was it? I was personally astonished by the high potential of the "Claude Code and Opus 4.5" combination. With a little ingenuity, it seems capable of even more.

This story is just beginning. Opus 4.5 will likely be upgraded to Opus 5 within the year. I am already looking forward to seeing what AI agents will be capable of then.

That’s all for today. Stay tuned!

1) Home Credit Default Risk, Kaggle

Toshifumi Kuga

January 9, 2026

Opus 4.5, claude code, generative ai, ADK, Machine Learning, Agentic AI

"Claude Code + Opus 4.5" Arrives as the 2026 Game Changer!

2026 has officially begun! The AI community is already abuzz with talk of "agentic coding" using Claude Code + Opus 4.5. I decided to build an actual application myself to test the potential of this combination. Let’s dive in.

1. Claude Code + Opus 4.5

These are the coding assistant and frontier model from Anthropic, respectively, both renowned for their strength in coding tasks. I imagine many will use them integrated into an IDE like VS Code, as shown below. You can see the selected model is Opus 4.5. Also, notice the "plan mode" indicator at the bottom.

Here, a data scientist inputs a prompt detailing exactly what they want to develop. The system then enters "plan mode" and generates an implementation plan like the following. The actual output is quite long, but here is the summary:

The goal this time is to create an application that combines machine learning and Generative AI, as described above. Once you agree to this implementation plan, the actual coding begins.

2. Completion of the AI App with GUI

In this completed app, you can input customer data via the screen below to calculate the probability of default, which can then be used to assess loan eligibility.

The first customer shows low risk, so a loan appears feasible.

Default Probability 2

For the second customer, as highlighted in the red frame, the payment status shows a 2-month delay. The probability of default skyrockets to 65.54%. This is a no-go for a loan.
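The step from a predicted probability to a lending decision can be sketched as a simple cutoff rule. The function name `loan_decision` and both cutoff values below are hypothetical. The app's real decision logic and thresholds are not described in this post, so this is only a sketch under those assumptions.

```python
def loan_decision(default_probability, review_cutoff=0.2, decline_cutoff=0.5):
    """Map a predicted default probability to a coarse lending decision.

    Both cutoffs are illustrative assumptions, not values used by the app.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability >= decline_cutoff:
        return "decline"
    if default_probability >= review_cutoff:
        return "manual review"
    return "approve"


# The second customer's predicted probability from the text above.
print(loan_decision(0.6554))  # decline
```

In practice the cutoffs would be set from the lender's risk appetite and the cost of each type of error, not picked by hand as they are here.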

3. Validating Model Accuracy on a Separate Screen

This screen displays the metrics for the constructed prediction model, allowing you to gauge its accuracy. While figures like AUC are bread and butter for experts, they might be a bit difficult for general business users to grasp.

To address this, I decided to include natural language explanations. By leveraging Generative AI, implementing multilingual support is relatively straightforward.

Switching the setting changes the text from English to Japanese. Of course, support for other languages could be added with further development.

While I used Opus 4.5 during the development phase, this application uses an open-source Generative AI model internally. This allows it to function completely disconnected from the internet—making it ideal even for enterprises with strict security requirements.

So, what are your thoughts?

An application with this rich feature set and a high-precision machine learning model was completed without any manual coding. I didn't write a single line of code this time.

Opus 4.5 was truly impressive; the process never stalled due to syntax errors or similar issues. I can genuinely feel that the accuracy is on a completely different level compared to just six months ago. Moving forward, it seems likely that "agentic coding" will become the standard starting point for creating new machine learning models and GenAI apps. It feels like PoC-level projects could now be knocked out in a matter of days.

I’m looking forward to building many more things. That’s all for today.

Stay tuned!

Toshifumi Kuga

January 2, 2026

AI agent, Machine Learning, claude code, Governance

What Awaits Us in 2026? Bold Predictions for AI Agents & Machine Learning

Happy New Year!

As we finally step into 2026, I am sure many of you are keenly interested in how AI agents will develop this year. Therefore, I would like to make some bold predictions by raising three key points, while also considering their connection to machine learning. Let's get started.

1. A Dramatic Leap in Multimodal Performance

The high precision of the image generation AI "Nano Banana Pro" (1), released by Google on November 20, 2025, likely stunned not just AI researchers but the general public as well. Its ability to thoroughly grasp the meaning of a prompt and faithfully reproduce it in an image is magnificent; it could fairly be described as "Text-to-Infographics."

Furthermore, its multilingual capabilities have improved significantly, allowing it to perfectly generate Japanese neon signs like this: "明けましておめでとう 2026" (Happy New Year 2026)

This model is not a simple image generation AI; it is built on top of the Gemini 3 Pro frontier model with added image generation capabilities. That is why the AI can deeply understand the user's prompt and generate images that align with their intent. Google also possesses AI models like Genie 3 (2) that perform simulations using video, leading the industry with multimodal models. We certainly cannot take our eyes off their movements in 2026.

2. The Explosive Popularity of "Agentic Coding"

Currently, coding by AI agents—"Agentic Coding"—has become a massive global movement. However, for complex code, it is not yet 100% perfect, and human review is still necessary. Additionally, humans still need to create the Product Requirement Document (PRD), which serves as the blueprint for implementation.

I have built several default prediction models used in the financial industry, and I always feel that development is more efficient when the human side first creates a precise PRD. By doing so, we can largely entrust the actual coding to the AI agent. This is an example of a default prediction model.

However, the speed of evolution for frontier models is tremendous. In the latter half of 2026, we expect updates like Gemini 4, GPT-6, and Claude 5, and frankly, it is difficult to even imagine what capabilities AI agents will acquire as a result.

Alongside the progress of these models, the toolsets known as "code assistants" are also likely to significantly improve their capabilities. Tools like Claude Code, Gemini CLI, Cursor, and Codex have become indispensable for programmers today, but in 2026, these code assistants will likely play an active role in fields closer to business, such as machine learning and economic analysis.

At this point, calling them "code assistants" might be off the mark; a broader name like "Thinking Machine for Business" might be more appropriate. The day when those who don't know how to code can master these tools may be close at hand. It is very exciting.

3. AI Agents and Governance

As mentioned above, it is predicted that in 2026, AI agents will increasingly permeate large organizations such as corporations and governments. However, there is one thing we must be careful about here.

AI agents behave probabilistically: the same input can produce different outputs, which is a sharp departure from conventional deterministic systems. Furthermore, if an AI agent is capable of Recursive Self-Improvement (updating and improving itself), it will change over time and in response to its environment.

In 2026, we must begin discussions on governance: how do we structure organizational processes and achieve our goals using AI agents whose characteristics are unlike those of any previous system? This is a very difficult theme, but I believe it is unavoidable if humanity is to securely capture the benefits of AI agents. I previously established corporate governance structures in the financial industry, and I hope to contribute, even a little, based on that experience.
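To make the point about probabilistic behavior concrete, here is a toy sketch in plain Python. It samples an "action" from a fixed set of scores many times and counts how often each one appears. The action names, the scores, and the temperature parameter are all hypothetical, and real AI agents are far more complex than this.

```python
import math
import random
from collections import Counter


def sample_action(scores, temperature, rng):
    """Pick one action; temperature 0 always takes the highest-scoring one."""
    if temperature == 0:
        return max(scores, key=scores.get)
    weights = [math.exp(value / temperature) for value in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]


def output_distribution(scores, temperature, runs=1000, seed=0):
    """Repeat the same input many times and count the distinct outputs."""
    rng = random.Random(seed)
    return Counter(sample_action(scores, temperature, rng) for _ in range(runs))


# Hypothetical scores an agent might assign to three possible actions.
scores = {"approve": 2.0, "escalate": 1.5, "reject": 0.5}
greedy = output_distribution(scores, temperature=0)
sampled = output_distribution(scores, temperature=1.0)
print(greedy)   # one action, chosen every time
print(sampled)  # the same input spread across several actions
```

From a governance perspective, a distribution like `sampled` is the thing to monitor: instead of testing one input against one expected output, controls need to check how often each outcome occurs and whether that mix stays within agreed limits.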

What did you think? It looks like AI evolution will accelerate even further in 2026. I hope we can all enjoy it together. I look forward to another great year with you all.

1) Introducing Nano Banana Pro, Google, Nov 20, 2025
2) Genie 3: A new frontier for world models, Jack Parker-Holder and Shlomi Fruchter, Google DeepMind, August 5, 2025

Toshifumi Kuga

December 18, 2025

Vibe Coding, claude code, AI agent, Machine Learning, Plan Mode, MetaPrompt

Improving ML Vibe Coding Accuracy: Hands-on with Claude Code's Plan Mode

2025 was the year I actively incorporated "Vibe Coding" into machine learning. After repeated trials, I found that coding accuracy was inconsistent: sometimes good, sometimes bad.

Therefore, in this experiment, I decided to use Claude Code "Plan Mode" (1) to automatically generate an implementation plan via an AI agent before generating the actual code. Based on this plan, I will attempt to see if a machine learning model can be built stably using "Vibe Coding." Let's get started!

1. Generating an Implementation Plan with Claude Code "Plan Mode"

Once again, I would like to build a model that predicts in advance whether a customer will default (on a loan, etc.). I will use publicly available credit card default data (2). For the code assistant, I am using Claude Code, and for the IDE, the familiar VS Code.

To provide input to the Claude Code AI agent, I summarized the task and implementation points into a "Product Requirement Document (PRD)." This is the only document I created.

I input this PRD into Claude Code "Plan Mode" and instructed it to: "Create a plan to create predictive model under the folder of PD-20251217".

Within minutes, the following implementation plan was generated. Comparing it to the initial PRD, you can see how refined it is. Note that I am only showing half of the actual plan generated here—a truly detailed plan was created. I can only say that the ability of the AI agent to envision this far is amazing.

2. Beautifully Visualizing Prediction Accuracy

When this implementation plan is approved and executed, the prediction model is generated. Naturally, we are curious about the accuracy of the resulting model.

Here, it is visualized clearly according to the implementation plan. While these are familiar metrics for machine learning experts, all the important ones are covered and visualized in an easy-to-understand way, summarized as a single HTML file viewable in a browser.

The charts below are excerpts from that file. It includes ROC curves, SHAP values, and even hyperparameter tuning results. This time, the total implementation time was about 10 minutes. If it can be generated automatically to this extent in that amount of time, I’d rather leave it to the AI agent.
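For reference, the points that make up a ROC curve can be computed with a few lines of plain Python. This is a minimal sketch with a hypothetical function name `roc_points` and toy data. It is not the code behind the charts in the generated report.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs.

    One point is produced per distinct score used as a threshold,
    starting from the origin (0.0, 0.0). Labels must be 0 or 1.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy example: 1 = default, 0 = no default (values are illustrative only).
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(curve)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Plotting these pairs with the false positive rate on the horizontal axis gives the familiar curve, and the closer it hugs the top-left corner, the better the model separates defaulters from non-defaulters.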

3. Meta-Prompting with Claude Code "Plan Mode"

A Meta-Prompt refers to a "prompt (instruction to AI) used to create and control prompts."

In this case, I called Claude Code "Plan Mode" and instructed it to "generate an implementation plan" based on my PRD. This is nothing other than executing a meta-prompt in "Plan Mode."

Thanks to the meta-prompt, I didn't have to write a detailed implementation plan myself; I only needed to review the output. It is efficient because I can review it before coding, and since that implementation plan can be viewed as a highly precise prompt, the accuracy of the actual coding is expected to improve.
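The idea of a meta-prompt can be sketched as a small function that wraps a PRD in instructions asking for a plan and no code. This is only an illustration of the concept. The function name `build_planning_prompt`, the wording of the template, and the list of plan steps are my own assumptions, and it does not represent how Claude Code implements Plan Mode internally.

```python
def build_planning_prompt(prd_text, output_folder):
    """Wrap a PRD in instructions that request a plan, not code.

    The template wording and the step list are illustrative assumptions.
    """
    sections = [
        "You are in planning mode. Do not write code yet.",
        f"Create an implementation plan for a predictive model under the folder {output_folder}.",
        "List the steps in order: data loading, feature engineering, "
        "model training, evaluation, reporting.",
        "Product Requirement Document:",
        prd_text.strip(),
    ]
    return "\n\n".join(sections)


# Hypothetical one-line PRD; a real PRD would be much longer.
prd = "Predict whether a credit card customer will default next month."
prompt = build_planning_prompt(prd, "PD-20251217")
print(prompt)
```

The output of the planning step then becomes the detailed prompt for the coding step, which is why reviewing the plan before approval is where human effort pays off most.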

To be honest, I don't have the confidence to write the entire implementation plan myself. I definitely want to leave it to the AI agent. It has truly become convenient!

How was it? Generating implementation plans with Claude Code "Plan Mode" seems applicable not only to machine learning but also to various other fields and tasks. I definitely intend to continue trying it out in the future. I encourage everyone to give it a challenge as well.

That’s all for today. Stay tuned!

1) How to use Plan Mode, Anthropic

2) Default of Credit Card Clients

Copyright © 2025 Toshifumi Kuga. All rights reserved