Machine Learning

"ClaudeCode + Opus 4.5" Arrives as the 2026 Game Changer !

2026 has officially begun! The AI community is already abuzz with talk of "agentic coding" using ClaudeCode + Opus 4.5. I decided to build an actual application myself to test the potential of this combination. Let’s dive in.

 

1. ClaudeCode + Opus 4.5

These are the coding assistant and frontier model from Anthropic, respectively, both renowned for their strength in coding tasks. I imagine many will use them integrated into an IDE like VS Code, as shown below. You can see the selected model is Opus 4.5. Also, notice the "plan mode" indicator at the bottom.

                   ClaudeCode

Here, a data scientist inputs a prompt detailing exactly what they want to develop. The system then enters "plan mode" and generates an implementation plan like the following. The actual output is quite long, but here is the summary:

                   Implementation Plan

The goal this time is to create an application that combines machine learning and Generative AI, as described above. Once you agree to this implementation plan, the actual coding begins.

 

2. Completion of the AI App with GUI

In this completed app, you can input customer data via the screen below to calculate the probability of default, which can then be used to assess loan eligibility.

The first customer shows low risk, so a loan appears feasible.

                    Input Screen

                   Default Probability 1

‍                 ‍Default Probability 2

For the second customer, as highlighted in the red frame, the payment status shows a 2-month delay. The probability of default skyrockets to 65.54%. This is a no-go for a loan.

 

3. Validating Model Accuracy on a Separate Screen

This screen displays the metrics for the constructed prediction model, allowing you to gauge its accuracy. While figures like AUC are bread and butter for experts, they might be a bit difficult for general business users to grasp.

To address this, I decided to include natural language explanations. By leveraging Generative AI, implementing multilingual support is relatively straightforward.

Switching the setting changes the text from English to Japanese. Of course, support for other languages could be added with further development.

While I used Opus 4.5 during the development phase, this application uses an open-source Generative AI model internally. This allows it to function completely disconnected from the internet—making it ideal even for enterprises with strict security requirements.

 

So, what are your thoughts?

An application with this rich feature set and a high-precision machine learning model was completed entirely with no-code. I didn't write a single line of code this time.

Opus 4.5 was truly impressive; the process never stalled due to syntax errors or similar issues. I can genuinely feel that the accuracy is on a completely different level compared to just six months ago. moving forward, it seems likely that "agentic coding" will become the standard starting point for creating new machine learning models and GenAI apps. It feels like PoC-level projects could now be knocked out in a matter of days.

I’m looking forward to building many more things. That’s all for today.

Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

Copyright © 2026 Toshifumi Kuga. All right reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

What Awaits Us in 2026? Bold Predictions for AI Agents & Machine Learning

Happy New Year!

As we finally step into 2026, I am sure many of you are keenly interested in how AI agents will develop this year. Therefore, I would like to make some bold predictions by raising three key points, while also considering their connection to machine learning. Let's get started.

 

1. A Dramatic Leap in Multimodal Performance

I believe the high precision of the image generation AI "Nano Banana Pro (1)," released by Google on November 20, 2025, likely stunned not just AI researchers but the general public as well. Its ability to thoroughly grasp the meaning of a prompt and faithfully reproduce it in an image is magnificent, possessing a capability that could be described as "Text-to-Infographics."

Furthermore, its multilingual capabilities have improved significantly, allowing it to perfectly generate Japanese neon signs like this: "明けましておめでとう 2026" (Happy New Year 2026)

"明けましておめでとう 2026" (Happy New Year 2026)

This model is not a simple image generation AI; it is built on top of the Gemini 3 Pro frontier model with added image generation capabilities. That is why the AI can deeply understand the user's prompt and generate images that align with their intent. Google also possesses AI models like Genie 3(2) that perform simulations using video, leading the industry with multimodal models. We certainly cannot take our eyes off their movements in 2026.

 

2. The Explosive Popularity of "Agentic Coding"

Currently, coding by AI agents—"Agentic Coding"—has become a massive global movement. However, for complex code, it is not yet 100% perfect, and human review is still necessary. Additionally, humans still need to create the Product Requirement Document (PRD), which serves as the blueprint for implementation.

I have built several default prediction models used in the financial industry, and I always feel that development is more efficient when the human side first creates a precise PRD. By doing so, we can largely entrust the actual coding to the AI agent. This is an example of default prediction model.

However, the speed of evolution for frontier models is tremendous. In the latter half of 2026, we expect updates like Gemini 4, GPT-6, and Claude 5, and frankly, it is difficult to even imagine what capabilities AI agents will acquire as a result.

Alongside the progress of these models, the toolsets known as "code assistants" are also likely to significantly improve their capabilities. Tools like Claude Code, Gemini CLI, Cursor, and Codex have become indispensable for programmers today, but in 2026, these code assistants will likely play an active role in fields closer to business, such as machine learning and economic analysis.

At this point, calling them "code assistants" might be off the mark; a broader name like "Thinking Machine for Business" might be more appropriate. The day when those who don't know how to code can master these tools may be close at hand. It is very exciting.

 

3. AI Agents and Governance

As mentioned above, it is predicted that in 2026, AI agents will increasingly permeate large organizations such as corporations and governments. However, there is one thing we must be careful about here.

The behavior of AI agents changes probabilistically. This means that different outputs can be produced for the same input, which is vastly different from current systems. Furthermore, if an AI agent possesses the ability for Recursive Self-Improvement (updating and improving itself), it means the AI agent will change over time and in response to environmental changes. In 2026, we must begin discussions on governance: how do we structure organizational processes and achieve our goals using AI agents that possess characteristics unlike any previous system? This is a very difficult theme, but I believe it is unavoidable if humanity is to securely capture the benefits and gains from AI agents. I previously established corporate governance structures in the financial industry, and I hope to contribute even a little based on that experience.

 

What did you think? It looks like AI evolution will accelerate even further in 2026. I hope we can all enjoy it together. I look forward to another great year with you all.

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Introducing Nano Banana Pro, Google, Nov 20, 2025
2) Genie 3: A new frontier for world models, Jack Parker-Holder and Shlomi Fruchter, Google DeepMind, August 5, 2025

Copyright © 2026 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Improving ML Vibe Coding Accuracy: Hands-on with Claude Code's Plan Mode

2025 was a year where I actively incorporated "Vibe Coding" into machine learning. After repeated trials, I encountered situations where coding accuracy was inconsistent—sometimes good, sometimes bad.

Therefore, in this experiment, I decided to use Claude Code "Plan Mode" (1) to automatically generate an implementation plan via an AI agent before generating the actual code. Based on this plan, I will attempt to see if a machine learning model can be built stably using "Vibe Coding." Let's get started!

 

1. Generating an Implementation Plan with Claude Code "Plan Mode"

Once again, I would like to build a model that predicts in advance whether a customer will default (on a loan, etc.). I will use publicly available credit card default data (2). For the code assistant, I am using Claude Code, and for the IDE, the familiar VS Code.

To provide input to the Claude Code AI agent, I summarized the task and implementation points into a "Product Requirement Document (PRD)." This is the only document I created.

I input this PRD into Claude Code "Plan Mode" and instructed it to: "Create a plan to create predictive model under the folder of PD-20251217".

Within minutes, the following implementation plan was generated. Comparing it to the initial PRD, you can see how refined it is. Note that I am only showing half of the actual plan generated here—a truly detailed plan was created. I can only say that the ability of the AI agent to envision this far is amazing.

 

2. Beautifully Visualizing Prediction Accuracy

When this implementation plan is approved and executed, the prediction model is generated. Naturally, we are curious about the accuracy of the resulting model.

Here, it is visualized clearly according to the implementation plan. While these are familiar metrics for machine learning experts, all the important ones are covered and visualized in an easy-to-understand way, summarized as a single HTML file viewable in a browser.

The charts below are excerpts from that file. It includes ROC curves, SHAP values, and even hyperparameter tuning results. This time, the total implementation time was about 10 minutes. If it can be generated automatically to this extent in that amount of time, I’d rather leave it to the AI agent.

 

3. Meta-Prompting with Claude Code "Plan Mode"

A Meta-Prompt refers to a "prompt (instruction to AI) used to create and control prompts."

In this case, I called Claude Code "Plan Mode" and instructed it to "generate an implementation plan" based on my PRD. This is nothing other than executing a meta-prompt in "Plan Mode."

Thanks to the meta-prompt, I didn't have to write a detailed implementation plan myself; I only needed to review the output. It is efficient because I can review it before coding, and since that implementation plan can be viewed as a highly precise prompt, the accuracy of the actual coding is expected to improve.

To be honest, I don't have the confidence to write the entire implementation plan myself. I definitely want to leave it to the AI agent. It has truly become convenient!

 

How was it? Generating implementation plans with Claude Code "Plan Mode" seems applicable not only to machine learning but also to various other fields and tasks. I definitely intend to continue trying it out in the future. I encourage everyone to give it a challenge as well.

That’s all for today. Stay tuned!




You can enjoy our video news ToshiStats-AI from this link, too!

1) How to use Plan Mode,  Anthropic

2) Default of Credit Card Clients








Copyright © 2025 Toshifumi Kuga. All right reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Can You "Vibe Code" Machine Learning? I Tried It and Built an App

2025 was the year the coding style known as "Vibe Coding" truly gained mainstream acceptance. So, for this post, I conducted an experiment to see just how far we could go in building a machine learning model using only AI agents via "Vibe Coding"—with almost zero human programming involved. Let's get started!

 
  1. The Importance of the "Product Requirement Document" for Task Description

This time, I wanted to build a model that predicts whether bank loan customers will default. I used the publicly available Credit Card Default dataset (1).

In Vibe Coding, we delegate the actual writing of the program to the AI agent, while the human shifts to a reviewer role. In practice, having a tool called a "Code Assistant" is very convenient. For this experiment, I used Google's Gemini CLI. For the IDE, I used the familiar VS Code.

Gemini CLI

To entrust the coding to an AI agent, you must teach it exactly what you want it to do. While it is common to enter instructions as prompts in a chatbot, in Vibe Coding, we want to use the same prompts repeatedly, so we often input them as Markdown files.

It is best to use what is called a "Product Requirement Document (PRD)" for this content. You summarize the goals you want the product to achieve, the libraries you want to use, etc. The PRD I created this time is as follows:

PRD

By referencing this PRD and entering a prompt to create a default prediction model, the model was built in just a few minutes. The evaluation metric, AUC, was also excellent, ranging between 0.74 and 0.75. Amazing!!

 

2. Describing the Folder Structure with PROJECT_SUMMARY

It is wonderful that the machine learning model was created, but if left as is, we won't know which files are where, and handing it over to a third party becomes difficult.

Therefore, if you input the prompt: "Analyze the current directory structure and create a concise summary that includes: 1. A tree view of all files 2. Brief description of what each file does 3. Key dependencies and their purposes 4. Overall architecture pattern Save this as PROJECT_SUMMARY.md", it will create a Markdown file like the one below for you.

PROJECT_SUMMARY.md

With this, anyone can understand the folder structure at any time, and it is also convenient when adding further functional extensions later. I highly recommend creating a PROJECT_SUMMARY.md.

 

3. Adding a UI and Turning the ML Model into an App

Since we built such a good model, we want people to use it. So, I experimented to see if I could build an app using Vibe Coding as well.

I created PRD-pdapp.md and asked the AI agent to build the app. I instructed it to save the model file and to use Streamlit for app development. The actual file and its translation are below:

PRD-pdapp.md

When executed, the following app was created. It looks cool, doesn't it?

You can input customer data using the boxes and sliders on the left, and when you click the red button, the probability of default is calculated.

  • Customer 1: Default probability is 7.65%, making them a low-risk customer.

  • Customer 2: Default probability is 69.15%, which is high, so I don't think we can offer them a loan. The PAY_0 Status is "2", meaning their most recent payment status is 2 months overdue. This is the biggest factor driving up the default probability.

As you can see, having a UI is incredibly convenient because you can check the model's behavior by changing the input data. I was able to create an app like this using Vibe Coding. Wonderful.

 

How was it? It was indeed possible to perform machine learning using Vibe Coding. However, instead of programming code, you need to create precise PRDs. I believe this will become a new and crucial skill. I encourage you all to give it a try.

That’s all for today. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Default of Credit Card Clients

 



Copyright © 2025 Toshifumi Kuga. All right reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The Secret to High-Accuracy AI: An Exploration of Machine Learning engineering agent

In a previous post, I explained Google's research paper, "MLE STAR" (1), and uncovered the mechanism by which an AI can build its own high-accuracy machine learning models. This time, I'm going to implement that AI agent using the Google ADK and experiment to see if it can truly achieve high accuracy. For reference, the MLE STAR code is available as open source (2).

 

1. The Information I Provided

With MLE STAR, humans only need to handle the data input and task definition. The data I used for this experiment comes from the Kaggle competition "Home Credit Default Risk" (3). While the original data consists of 8 files, I combined them into a single file for this experiment. I reduced the training data to 10% of the original, resulting in about 30,000 samples, and kept the original test data of 48,700 samples.

The task was set as follows: "A classification task to predict default." Note that to speed up the experiment, the number of iterative loops was set to a minimum.

                     Task Setup

 

2. Deciding Which Model to Use

MLE STAR uses a web search to select the optimal model for the given task. In this case, it ultimately chose LightGBM. To finish the experiment quickly, I configured it to select only one model. If I had set it to select two, it likely would have also chosen something like XGBoost. Both are models frequently used in data science competitions.

                Model Selection by MLE STAR

It generated the initial script below. As a frequent user of LightGBM, the code looks familiar, but the ability to generate it in an instant is something only an AI can do. It's amazing!

 

3. Identifying Key Code Blocks with "Ablation Studies"

Next, it uses ablation studies to identify which code blocks should be improved. In this case, ablation2 showed that removing Early Stopping worsened the model's performance, so this feature was kept in the training process from then on.

               Ablation Studies Results by MLE STAR

 

4. Iteratively Improving the Model

Based on the ablation studies, MLE STAR decided to improve the model using the following two techniques: K-fold target encoding and binary encoding. These techniques themselves are common in machine learning and are not particularly unusual.

                   K-fold Target Encoding

                     Binary Encoding

This ability to "use ablation studies to identify which code blocks to improve" is likely a major reason for MLE STAR's high accuracy. I look forward to seeing how this functionality evolves in the future.

 

5. The Results Are In. Unfortunately, I Lost.

For its final step, MLE STAR ensembles the models to create the final version. For more details, please see the research paper. It also generates a CSV file with the default predictions, which I slightly modified and promptly submitted to Kaggle. This task is evaluated using AUC, where a score closer to 1 indicates higher accuracy.

The top score is the result I achieved using my own LightGBM model. The score in the red box at the bottom is the one automatically generated by MLE STAR. With a difference of more than 0.01 on both the Public and Private scores, it was my complete defeat.

             Kaggle Prediction Accuracy Evaluation (AUC)

Improving the AUC by 0.01 is quite a challenge, which gives a glimpse into how excellent MLE STAR is. I didn't perform any extensive tuning on my LightGBM model, so I believe my score would have improved if I had spent time tuning it manually. However, MLE STAR produced its result in about 7 minutes from the start of the computation, so from an efficiency standpoint, I couldn't compete.

 
 

So, what did you think? Although this was a limited experiment, I feel I was able to grasp the high potential of MLE STAR. I was truly impressed by the power of its Recursive Self-Improvement, which identifies specific code blocks and improves upon them autonomously.

Here at Toshi Stats, I plan to continue digging into MLE STAR. Stay tuned!





You can enjoy our video news ToshiStats-AI from this link, too!




1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam1 2 *, Jinsung Yoon1, Jiefeng Chen1, Jinwoo Shin2, Sercan Ö. Arık1 and Tomas Pfister1, Google Cloud1, KAIST2,  23, Aug 2025

2) Machine Learning Engineering with Multiple Agents (MLE-STAR) , Google

3) Home Credit Default Risk, kaggle



Copyright © 2025 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Is an AI Machine Learning Assistant Finally a Reality? I Looked Into It, and It's Incredible!

I often build machine learning models for my job. The process of collecting data, creating features, and gradually improving the model's accuracy takes time, specialized knowledge, and programming skills in various libraries. I've always found it to be quite a challenge. That's why I've been hoping for an AI that could skillfully assist with this work, and recently, a potential candidate has emerged. I'd like to take a deep dive into it right away.

 
  1. A Basic Three-Layer Structure

This AI assistant is called MLE-STAR, and according to a research paper (1), it has the following structure. Simply put, it first searches the internet for promising libraries. Next, after writing code using those libraries, it identifies which parts, called "code blocks," should be improved further. Finally, it decides how to improve those code blocks. Let's explore each of these steps in detail.

 

2. Selecting the Optimal Library with a Search Function

To create a high-accuracy machine learning model, you first need to decide "what kind of model to use." This means you have to select a library to implement the model. This is where the search function comes in. For example, in a finance task to calculate default probability, many methods are possible, but gradient boosting is often used in competitions like Kaggle. I also use gradient boosting in most cases. It seems MLE-STAR can use its search function to find the optimal library on its own, even without me specifying "use gradient boosting." That's amazing! This would eliminate the need for humans to research everything, leading to greater efficiency.

 

3. Finding Where to Improve the Code and Steadily Making Progress

Once the library is chosen and a baseline script is written, it's time to start making improvements to increase accuracy. But it's often difficult to know where to begin. MLE-STAR employs an ablation study to understand how accuracy changes when a feature is added or removed, thereby identifying the most impactful code block. This part of the process typically relies on human experience and intuition, involving a lot of trial and error. By using MLE-STAR, we can make data-driven decisions, which is incredibly efficient.

 

4. Iterating Until Accuracy Actually Improves

Once the code block for improvement is identified, the system gradually changes parameters and confirms the accuracy improvements. This is also done automatically within a loop, without requiring human intervention. The accuracy is calculated at each step, and as a rule, only changes that improve performance are adopted, ensuring that the model's accuracy steadily increases. Incredible, isn't it? In fact, a graph comparing the performance of MLE-STAR with past AI assistants shows that MLE-STAR won a "gold medal" in approximately 36% of the tasks, highlighting its superior performance.

 

So, what did you think? This new framework for an AI assistant looks extremely promising. In particular, its ability to identify which code blocks to improve and then actually increase the accuracy is likely to become even more powerful as the performance of foundation models continues to advance. I'm truly excited about future developments.

Next time, I plan to apply it to some actual analysis data to see what kind of accuracy it can achieve. Stay tuned!




You can enjoy our video news ToshiStats-AI from this link, too!



1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam1 2 *, Jinsung Yoon1, Jiefeng Chen1, Jinwoo Shin2, Sercan Ö. Arık1 and Tomas Pfister1, Google Cloud1, KAIST2,  23, Aug 2025



Copyright © 2025 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.