Toshifumi Kuga

October 19, 2024

AlphaGo, AI, Reinforcement Learning, OpenAI, o1

Paying homage to AlphaGo, we've launched our own AI Go project at ToshiStats!

Toshifumi Kuga

October 19, 2024

AlphaGo, AI, Reinforcement Learning, OpenAI, o1

Reinforcement learning has become a hot topic since the release of OpenAI's o1-preview. Looking back, it was Google DeepMind's AlphaGo, released in March 2016, that truly brought reinforcement learning into the public eye. Go, with its vast search space, was traditionally a formidable challenge for computers. Amateur high-dan levels were roughly the limit at the time. However, AlphaGo, combining reinforcement learning and Monte Carlo Tree Search (MCTS), exceeded expert expectations, becoming the first AI Go player to defeat a top professional. Inspired by this, we've launched our own AI Go project, "ToshiStats-Go project," to research reinforcement learning. We're excited to see what we can achieve.

1. Creating a Go Game Environment

We've decided to build our own Go game environment from scratch. Given the exceptional coding capabilities of o1-preview, we're using it as a coding assistant for this project. We're iteratively developing the code by requesting o1-preview to generate the Go game environment code, executing it in Google Colab, then requesting further refinements based on the results, and repeating the process. Within a few iterations, we were able to establish a basic framework and a functional environment. While we can't perfectly implement a complex game like Go, we've created something akin to "simple-go." This should be sufficient for implementing reinforcement learning and improving its accuracy. Below is an example of o1-preview's explanation of a code modification. As you can see, it's quite detailed.

o1-preview's explanation of code modification

2. Trying a Game of Go

Let's give it a try! The current AI model plays random moves, so it's not very strong. As shown in the example below, a human can win with careful play. While a 9x9 board is available, the calculations can be time-consuming, so we'll stick with a 5x5 board for now. It's enjoyable enough, and if you'd like to try it yourself, please download the Colab notebook from our Github repository (1). A GPU is not required.

3. Perfect Go Rules Are Difficult

Go has some very complex rules. In particular, determining the life and death of stones, especially in the endgame, proved challenging. Implementing "ko" and "seki" also seems difficult. Connecting to an external Go system might solve these issues, but for now, we'll continue with a lightweight environment that completes calculations within the notebook to facilitate reinforcement learning experimentation. We'll strive to make this series engaging and easy to follow, comparing our progress with simpler games like Gomoku or connect five. We appreciate your continued interest.

So, there you have it! We've successfully implemented a Go playing environment in Colab. From here, we'll dive into reinforcement learning and begin training our AI Go player. Stay tuned!

1) ToshiStatsGo-project

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

July 14, 2024

AI, artificial intelligence, generative ai, in context learning, prompt

Google DeepMind's new prompt engineering technique, "Many-Shot In-Context Learning," is amazing!

Toshifumi Kuga

July 14, 2024

AI, artificial intelligence, generative ai, in context learning, prompt

I recently came across an interesting research paper, "Many-Shot In-Context Learning" (1), by Google DeepMind, and I'd like to share a brief overview. Although it's a highly technical paper, it offers valuable insights that we can apply to our own prompt writing. Let's dive in.

1. Utilizing Context Effectively

When you write prompts for language models or generative AI like ChatGPT, you probably input the information you want, like a search engine, such as "What is the capital of Japan?" However, generative AI can handle much larger amounts of information. For example, as shown in the chart below, you can load a PDF document and then write a prompt like "Summarize this," and the AI will output a summary of the PDF's content. Think of a prompt as an "instruction to the generative AI." The additional information you provide is called the context.

2. What's Needed to Use Generative AI in a Business Setting

Now that we have a basic understanding of how to use generative AI, let's consider what's needed to use it in a company or business setting. Obviously, when you represent your company and interact with customers, you wouldn't express "personal opinions or feelings." You wouldn't say, "I personally don't think this new product will sell." Specifically, companies have established rules and manuals that employees must follow. Normally, employees cannot violate these rules. Therefore, to use generative AI in a company, it must output answers that comply with each company's "rules and manuals," not just general answers. So, how do you convey these rules to the generative AI? One way is to input the "rules and manuals" directly into the generative AI along with the prompt, as shown in the chart above. Many recent generative AIs have "context windows" of 100,000 tokens or more. This represents the amount of information that can be input and output at once, and 100,000 tokens is about 70,000 words in English. You can input a considerable amount of "rules and manuals." Some models, like Google's Gemini 1.5 Pro, can input up to 2 million tokens, which is enough for about 3,000 pages of English manuals. That's amazing. These context windows are sometimes called "long context windows."

3. Many-Shot In-Context Learning

"Many-Shot In-Context Learning" is a technique that utilizes these "long context windows" even more effectively. You may have heard of a similar term, "Few-Shot Learning." "Few-Shot Learning" is a method where you first provide the generative AI with a few "question and answer pairs" as examples and then ask the question you want to know. For instance, you might give examples like "The capital of the United States is Washington, D.C." and "The capital of China is Beijing," and then ask the AI, "What is the capital of Japan?" "Many-Shot In-Context Learning" increases the number of these "question and answer pairs" to 10-10,000. This is said to improve accuracy. The graph below shows that in machine translation and summarization tasks, increasing the number of examples to 500-1,000 improves accuracy. 2 to the power of 10 is 1024. The idea is to put as many examples as possible into the "long context window" since it can easily handle them.

The relationship between accuracy and the number of examples in machine translation and summarization.

What do you think? If simply increasing the number of examples improves accuracy, it might be worth trying. For those who say, "I can't create so many examples myself," "Many-Shot In-Context Learning" also suggests a method to create synthetic data using an LLM (language model). If you're interested, please check out the paper. But if it's just about 10 examples, you could probably create them yourself. I'll give it a try and update here if I get good results. That's all for today. Stay tuned!

1) "Many-Shot In-Context Learning", Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle, Google DeepMind, 22 May 2024, https://arxiv.org/abs/2404.11018

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

May 20, 2024

AI, artificial intelligence, fine-tuning, generative ai, Llama3

Llama3-8B has shown impressive performance even when fine-tuned on Japanese data. Its high base performance likely plays a significant role in this.

Toshifumi Kuga

May 20, 2024

AI, artificial intelligence, fine-tuning, generative ai, Llama3

In the previous post, we introduced the high performance of Llama3-70B. However, Llama3 also has a smaller 8B model, and I've been wanting to fine-tune it to fit my own tasks. Since it's small, it's cost-effective and fast, so if you have a clear task in mind, this 8B model will surely be an option. Therefore, this time, we will fine-tune the Llama3-8B model for the task of classifying the published Livedoor-news Japanese articles (3) into several genres, and check its accuracy. Let's get started!

Creating an Alpaca-style dataset

Livedoor-news Japanese articles are divided into the following 9 genres. The distribution of each genre is shown in the following chart.

'kaden-channel',
'livedoor-homme',
'topic-news',
'sports-watch',
'peachy',
'dokujo-tsushin',
'it-life-hack',
'movie-enter',
'smax'

Distribution and sample size of each genre

This time, we will randomly extract 1000 samples for both training and validation data, and actually classify each article into the above 9 genres to verify whether high accuracy can be achieved. We have adopted Alpaca as the data format. As shown below, it consists of instruction, input, and output. Here, the instruction is common to all samples.

2. Fine-tuning using Hugging face TRL + "unsloth"

This time, we used Hugging face's TRL (1), a library for fine-tuning LLMs, along with "unsloth", a library for accelerating training, to efficiently perform fine-tuning. The development environment was Google Colab, and we prepared a paid L4 (GPU) instance. The training time was about 100 minutes for 4 epochs. L4 has 22.5GB of GPU-RAM, which is large enough for this training. Also, "unsloth" prepares a 4-bit quantized model for fine-tuning, so you can download and use it directly from Hugging Face Hub, which is convenient. This training process was based on the "unsloth" notebook (2). If you are interested in speeding up training, please check it out.

3. Verify model accuracy

At first, I simply asked, "The skill to score a penalty kick from this impossible angle is amazing." The answer was "sports-watch". It's a soccer/football story, so I think it's a reasonable answer.

Next, I asked, "Which is better, iPhone or Android?" The answer was "it-life-hack". This is also a good answer.

It's hard to type in one by one, and the actual articles are longer and more complex. This time, I prepared 1000 validation data samples and tried it. The result was a very good accuracy of 94.5%. Since the input is Japanese, I thought Llama3 would struggle, but I was surprised that it easily exceeded 90%. It must be the effect of pre-training with a huge corpus of 15 trillion tokens. Even the 8B model seems to be practical in Japanese if fine-tuned.

How was it? Even though Llama3-8B is small, it has high potential and seems to be active in various places. Fine-tuning is required for each task, but "unsloth" can help speed it up. If you want to shorten the training time, please try it. This time, we were able to obtain sufficient accuracy in about 2 hours even with a general-purpose single GPU. It's a reliable ally for small startups like us! If you want to try it by yourself, you can use my notebook here.

We will update you as we gain new insights. Stay tuned!

(1) TRL - Transformer Reinforcement Learning https://huggingface.co/docs/trl/en/index

(2) Alpaca + Llama-3 8b full example.ipynb https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing#scrollTo=iHjt_SMYsd3P

(3) Livedoor-news Japanese articles https://www.rondhuit.com/download.html

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

May 5, 2024

AI, generative ai, LLM, Meta, Llama3

Llama3: Exceeding Expectations and Expanding Horizons!

Toshifumi Kuga

May 5, 2024

AI, generative ai, LLM, Meta, Llama3

The release of the new LLM "Llama3" by Meta on April 18th has created quite a stir in the AI community (1). As a highly anticipated open-source model with performance expectations on par with GPT-4, its potential applications seem limitless.

Examining the performance on leaderboards (2), Llama3 is available in two sizes: 70B and 8B parameters. The larger 70B model, in particular, demonstrates capabilities that rival proprietary models such as GPT-4, Claude3-Opus, and Gemini 1.5 Pro.

To assess Llama3's performance, a test was conducted using a bank customer complaint classification task. The objective was to evaluate the model's accuracy in categorizing complaints without any fine-tuning.

1.To what extent can we discriminate between six categories of customer complaints without training?

The dataset consisted of customer complaints from a US bank, categorized into six product areas:

Mortgage
Checking or savings account
Student loan
Money transfer, virtual currency, or money service
Bank account or service
Consumer loan

Examples of these complaints, all in English, were provided.

A random sample of 500 complaints was used with a prompt instructing Llama3-70B to assign a product category to each complaint. The results were astounding, achieving an accuracy rate of 88.6%. This near 90% accuracy was unprecedented and speaks volumes about Llama3's potential.

2. Maintaining Accuracy with Japanese Data?

Considering the potential use of Llama3 in Japan, the English dataset was translated into Japanese using Google Translate. The classification task was then repeated with the translated data.

Despite Llama3's training data being predominantly English (around 95%), the model maintained an impressive accuracy rate of 82.8% with the Japanese data. This suggests that Llama3's capabilities extend beyond English and hold promise for multilingual applications.

3. Conclusion and Future Prospects

Llama3 has proven to be a top-tier performer, despite being open-source. This achievement deserves appreciation for Meta's contribution to the AI community. Hopefully, other companies like Google will follow suit and release their own open-source models more.

Further experiments are planned to evaluate the accuracy and computational speed of the smaller 8B model. Stay tuned for the results!

1) meta website https://llama.meta.com/llama3/
2) LMSYS Chatbot Arena Leaderboard　https://chat.lmsys.org/?leaderboard
3) https://github.com/TOSHISTATS/Classification-of-Consumer-Complaints-by-Llama3/tree/main

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

March 25, 2024

agent, AI, artificial intelligence, Claude3, generative ai

I tried the new generative AI model "Claude3 Haiku". Fast, smart, and low-priced. I want to use it as an AI agent!

Toshifumi Kuga

March 25, 2024

agent, AI, artificial intelligence, Claude3, generative ai

On March 14th, "Claude3 Haiku" (1), the lightest model among the Claude3 generative AIs, was released and became available for use in web applications and APIs. I'm usually drawn to the highest-performing models, but this time I'd like to focus on the lightest one. Recently, algorithms that execute repetitive calculations like AI Agents have become more common. I want to use high-end models like GPT4, but they are very costly to run. So I was looking for a low-cost, high-performance model, and "Claude3 Haiku" is perfect as it costs 1/60th of the high-end model "Claude3 Opus" while still delivering excellent performance. I'd like to try it out here right away. The details of each model are as follows.

1. First, let's test the text

I checked if "Claude3 Haiku" knows about Hiroshima-style okonomiyaki, a hyper-local Japanese food. I used to live in Hiroshima, so I know it well, and I think this answer is generally good. The Japanese is clean, so it passes for now.

Next, I asked about transportation from Tokyo to Osaka. Unfortunately, there was one clear mistake. The travel time by bus is stated as "about 4 hours and 30 minutes," but in reality, it takes around 8 hours. This is a hallucination.

Then I asked about the "Five Forces," a framework for analyzing market competitiveness. It analyzed the automotive industry, and the analysis incorporates the latest examples, such as the threat of electric vehicles as substitutes, making it a sufficient quality starting point for discussion. However, the fact that it's not in a table format is a drawback.

2. Next, let's analyze images.

First, I asked about the number of smartphones, but unfortunately, it got it wrong. It may not be good at counting.

This is a photo of the Atomic Bomb Dome in Hiroshima. It answered this perfectly. It seems to understand famous Japanese buildings.

This is a photo of a streetcar running in Hiroshima City. I think it captures it pretty well overall. However, the streetcars don't run solely for tourists, so the explanation may be somewhat incomplete.

This is a flight information board at Haneda Airport. It perfectly understands the detailed information. Excellent.

Counting the number of cars in a parking lot is a difficult task for generative AI. This time it answered 60 cars, but there are actually 48. If the accuracy improves a bit more, it will reach a practical level, which is a bit disappointing.

3. Impressions of using "Claude3 Haiku".

Honestly, the performance was unbelievable for a general-use AI. The Japanese is natural and clean. The fact that it can incorporate and analyze images in the first place is groundbreaking. Multimodality has arrived in general-use AI. The calculation speed is also fast, and I think it will be applied to applications that require real-time responses. And the cost is low. This allows for plenty of interesting experiments. It's a savior for startups with tight cost constraints! I want to continue doing interesting experiments using "Claude3 Haiku". Stay tuned!

(1) Claude 3 Haiku: our fastest model yet 2024.3.14 Anthropic

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

February 19, 2024

AI, Google DeepMind, LLM, generative ai, gemini pro

The Evolution of AI Accelerates: A Deep Dive into Google's "Gemini 1.5 Pro"

Toshifumi Kuga

February 19, 2024

AI, Google DeepMind, LLM, generative ai, gemini pro

The pace of AI advancement is truly remarkable, and this year is no exception. Google has unveiled a new generative AI called "Gemini 1.5 Pro," which boasts a groundbreaking Mixture-of-Experts (MoE) architecture. Currently only available to a limited number of users, with broader testing to come, this technology presents intriguing breakthroughs that warrant a closer look.

1. Unprecedented Context Window of 1 Million Tokens

Gemini 1.5 Pro boasts a context window that is unfathomable by existing LLMs, capable of processing up to 1 million tokens. Research has even demonstrated data ingestion of up to 10 million tokens. This represents a revolutionary breakthrough, considering that GPT-4's context window is limited to 128,000 tokens (1).

Comparison of Context Windows for Different LLMs

With such an extensive context window, Gemini 1.5 Pro can ingest an entire book at once. Currently, when creating RAG systems and referencing internal documents, chunking is necessary to accommodate the LLM's context window. However, with Gemini 1.5 Pro, this requirement is minimized, simplifying RAG development and operation. Furthermore, the model maintains high accuracy, even with such a large context window, achieving over 99% accuracy in information retrieval tests (see chart below).

2. Remarkable In-Context Learning Capabilities

The ability to process vast amounts of data is not the only noteworthy aspect of Gemini 1.5 Pro. It also excels at understanding and applying this information to various tasks. This is evident in its in-context learning capabilities, showcased in a Kalamang language translation task. The model was trained using a Kalamang grammar book and dictionary, enabling it to translate between English and Kalamang.

Gemini 1.5 Pro outperformed other models, achieving scores that rival those of human learners. This is an astonishing feat.

3. Towards Individualized Agents with Gemini 1.5 Pro

If a model can acquire translation capabilities simply by reading a grammar book, it stands to reason that it can also learn from knowledge systems in other domains and apply that knowledge to various tasks. In other words, Gemini 1.5 Pro has the potential to develop its own "frame of reference" that influences its understanding and values. The ability to incorporate a vast amount of data into its context through its extensive context window has significant implications in this regard. This is because it allows Gemini 1.5 Pro to potentially become an individualized agent with diverse perspectives in the future. The Kalamang translation experiment provides promising evidence of this potential.

Gemini 1.5 Pro is a remarkable advancement in AI technology, offering unprecedented capabilities in terms of context window size and in-context learning. "A host of improvements made across nearly the entire model stack (architecture, data, optimization and systems) allows Gemini 1.5 Pro to achieve comparable quality to Gemini 1.0 Ultra , while using significantly less training compute and being significantly more efficient to serve" according to the report(1). This is truly a testament to the rapid progress being made in the field of AI.

I am eager to experiment with Gemini 1.5 Pro once it becomes publicly available. Stay tuned for future updates!

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, Gemini Team, Google

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

December 31, 2023

synthetic data, AI, fine-tuning, LLM, Google DeepMind, promt

"REST MEETS REACT" is a new prompt-engineering method using synthetic data. It holds immense potential for enhancing AI without relying on human-generated data

Toshifumi Kuga

December 31, 2023

synthetic data, AI, fine-tuning, LLM, Google DeepMind, promt

Happy New Year! Thank you for your continued support. Promptly, Google DeepMind has announced a new, advanced prompt engineering method suitable for the new year. It is a paper titled "REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT"(1). It incorporates fine-tuning with synthetic data, which looks promising! Let's get started.

1.Prompt Structure

This prompt is designed with a web Q&A system in mind that answers complex questions. The structure is as follows:

The blue part in the figure above represents the flow of the agent described in the prompt, aiming to answer complex questions using web search. In the latter half, "Relevance self-check" and "Grounding self-check" are functions for the agent to check its own answers. It's a self-check function. For a detailed explanation of the entire flow, please refer to the paper.

2. "Reward Model" - The Key to Success

Now, let's explain the core part of self-improvement. In a nutshell, it's about "creating new high-quality data and fine-tuning the model with it." . This function consists of three parts:

Grow: Start with a model capable of running Search Agent, using Google PaLM 2-L model for this purpose. Trajectories are collected based on a selected set of 2000 public questions. Trajectory, though an unfamiliar term, refers to the reasoning process and is commonly used in reinforcement learning.
Improve: Convert trajectories into data for fine-tuning, using the Reward model to select only high-quality data. No external data, like labels, are used.
Fine-tuning: Fine-tune a new model of the same size with this new data, ensuring it performs better than the original.

This process is repeated with the better model using the new data. As a result, accuracy improves while maintaining the original data, without adding external data. Therefore, the accuracy of the Reward model in ranking is crucial. The Reward model is constructed as a set of prompts in this paper. Let's look more closely at these prompts, showing only the initial part.

The goal of this rating is to filter out bad actions so that they'll be excluded from the fine-tuning dataset.
Overall, we want the agent to produce relevant and grounded answers with minimal steps. Anything deviating from this goal is considered bad.
If any element (thoughts, comments, etc.) is empty, then it's automatically bad.

"Filter out" indicates a method of discarding items that don't meet the standards and adopting only the high-quality data that remains. Please see the paper (p19) for details.

3.Improve Accuracy with Synthetic Data

Papers including this one have been published in late 2023, focusing on using the Reward model to create high-quality synthetic data for model fine-tuning and accuracy improvement. Vigorous research is expected to continue in 2024, yielding various results. Especially in the LLM field, collecting high-quality training data is becoming increasingly difficult, and fine-tuning with synthetic data is anticipated as a solution.

How was it? The improvement in model accuracy with synthetic data is expected to be a very effective development method for startups like us, who cannot collect vast amounts of data independently. Our blog will continue to follow these synthetic data and other technological innovations, so stay tuned. Wishing you a great year!

1) “REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT" Renat Aksitov†1 , Sobhan Miryoosefi†1 , Zonglin Li†1 , Daliang Li†1 , Sheila Babayan†2 , Kavya Kopparapu†2 , Zachary Fisher1 , Ruiqi Guo1 , Sushant Prakash1 , Pranesh Srinivasan3 , Manzil Zaheer2 , Felix Yu1 , and Sanjiv Kumar1, 1Google Research, 2Google DeepMind, 3Google †Core contributors, 15 Dec 2023, https://arxiv.org/abs/2312.10003

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

November 14, 2023

DevDay, OpenAI, LLM, AI, ChatGPT

The era of "agent-style applications" has arrived, earlier than expected and seems to be accelerating even further

Toshifumi Kuga

November 14, 2023

DevDay, OpenAI, LLM, AI, ChatGPT

On November 6, the OpenAI DevDay was held, marking its first annual developer's conference. The technological developments since the debut of GPT-4 in March 2023 were introduced at once. There's too much to cover comprehensively, so I'll leave that to OpenAI CEO Sam Altman, but here I want to raise three key points I've considered and explore them further.

Price is Key

The anticipated price reduction has been realized. GPT-4 is roughly about 65% off. Of course, the reduction varies depending on usage. I've already tried the new GPT-4 Turbo for half a day, and it cost about $5, which would have definitely exceeded $10 before. This makes it more viable for Proof of Concept (PoC) use. It seems the time has come to utilize GPT-4's still unseen potential in various areas. A wallet-friendly approach is a welcome change for everyone.

2. Building AI Apps Without Being a Programmer

At this developer's conference, I noticed many features that operate with no-code. GPTs, which allow creation of customized ChatGPT in a dialogue format, is a prime example. The developer-oriented Assistants API also doesn't require coding if used with the Playground. With the code interpreter tool already implemented, writing prompts to invoke and execute it automates the rest. This is impressive.

I implemented a model to calculate default probabilities using a step-by-step prompt, from 1 to 5, with the code-interpreter turned on, without writing any specific code. When executed, the model was successfully created, and it performed tasks like calculating AUC and generating histograms as instructed.

3. Easy Construction of "Agent-Style Applications"

Listening to OpenAI CEO Sam Altman's presentation, I felt a strong emphasis on agents. The Playground Tool includes function calling, which seems to make it much easier to create agents that determine their next actions based on situations. While open-source implementations of agents have been increasing, I didn't expect them to be implemented this quickly on the OpenAI platform. Paired with GPTs, the year of 2024 feels like it could be the first year of "agent-style applications." This is truly exciting.

How about these new services? Following the announcements at DevDay, developers worldwide seem to be thinking about various AI applications. I'm also eager to start creating an agent-style application. Stay tuned!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

November 2, 2023

AI, Deep Learning, GPT4, LLM, LangChain, Step-Back Prompting

New Prompt Engineering Method from Google DeepMind Surpassing CoT in Accuracy !

Toshifumi Kuga

November 2, 2023

AI, Deep Learning, GPT4, LLM, LangChain, Step-Back Prompting

Hello everyone, how have you been? There are only two months left in this year. It has truly been a year of incredible AI advancements, and it doesn't seem to be slowing down. Recently, Google DeepMind announced a new prompt-engineering method called "Step-Back Prompting (1)." Let's dive into the details right away.

Step-Back Prompting:

Coming from DeepMind, one might initially think it's a complicated method, but the concept turned out to be quite simple. Instead of directly answering the question input by the user, the process involves:

Creating a more generalized and essential question (Stepback Question)
Answering the generated question (Stepback Answer)
Producing the final answer to the user based on the original question and the generated response (Final Answer)

The paper abstract has the following note which could give insights on the Stepback Answer:

"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise. — Edsger W. Dijkstra"

2. Automatic Generation of "Stepback Question":

The key to this method seems to be the effective creation of the Stepback Question. However, constantly coming up with the Stepback Question could be challenging. While searching for an easier way, an excellent automatic generation method was introduced in LangChain's cookbook (2), which seems to apply Few shot learning.

By presenting these two examples to the model first, when a new user question like "Was ChatGPT around when Trump was president?" is posed,

As shown, a more general question, "When was ChatGPT developed?" is generated. Using this to guide the final answer results in higher accuracy. Although not always 100% correct based on my own trials, the accuracy does seem notably higher. According to the paper, it even achieves accuracy surpassing GPT-4 in some instances.

3. Anticipation for Future Developments:

Since "Step-Back Prompting" has a simple structure, it seems versatile for various applications. It can also be combined with existing techniques like CoT. Looking forward to its future growth, it seems highly compatible with LangChain and easy to implement, which will likely lead to an increase in use cases.

So, what do you think? I will continue to experiment and if there are any significant findings, I'll share them here. Stay tuned!

1) “TAKE A STEP BACK: EVOKING REASONING VIA ABSTRACTION IN LARGE LANGUAGE MODELS" Huaixiu Steven Zheng∗ Swaroop Mishra∗ Xinyun Chen Heng-Tze Cheng Ed H. Chi Quoc V Le Denny Zhou Google DeepMind” Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, Google DeepMind, 9 Oct 2023, https://arxiv.org/abs/2310.06117

2) langchain/cookbook/stepback-qa.ipynb https://github.com/langchain-ai/langchain/blob/master/cookbook/stepback-qa.ipynb

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

October 22, 2023

AGI, ChatGPT, AI, GPT4, OpenAI

GPT-4V is here. I tried it immediately and was amazed. It can do this too!

Toshifumi Kuga

October 22, 2023

AGI, ChatGPT, AI, GPT4, OpenAI

Sorry to keep you waiting. OpenAI's GPT-4 now comes with image recognition capabilities. To be precise, it was demonstrated when it debuted in March of this year, but it has only now been made available to users after half a year. I recently tried the new feature in ChatGPT+ and, in a word, it's incredible!

By the way, the image mentioned above was also created with a combination of GPT-4 and DALL-E3.

Now, let's start the experiment!

First, we'll start with recognizing mobile-phones. It can accurately count the number of mobile-phones. This is a piece of cake.

I thought flight information would be challenging, but it identified the destination impeccably. Since it's originally an excellent language model, it seems proficient in deriving meaning from images.

It can even read Osaka's Tsutenkaku tower. Local information is no problem.

For a change, I inserted an image of analysis results. It can read graphs effortlessly. This is impressive!

What shocked me was that it could easily count cars. Of course, it's not a specialized object detection model, so errors will always occur. I believe there were about 48 cars in this photo, but for general use, this margin of error seems acceptable. It's astonishing what it can do by just being given an image.

It can count cans, but the error is relatively significant. It might struggle with cluttered items.

It works well to read English text in an OCR-like manner.

It can also easily read the time displayed on electronic signboards.

How did you find it? Without any fine-tuning, it achieved this much. GPT-4V has just been launched, and various use cases are likely to emerge in the future. I look forward to introducing interesting examples here as they arise. Stay tuned!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

September 4, 2023

AI, LLM, NLP, Tree of Thoughts, GPT4, Graph of Thoughts

The "Graph of Thoughts" might pave the way for new avenues in "human and LLM (Large Language Model) collaboration"!

Toshifumi Kuga

September 4, 2023

AI, LLM, NLP, Tree of Thoughts, GPT4, Graph of Thoughts

Last week, I came across an intriguing paper on Large Language Models (LLMs). It appears to further develop the "Tree of Thoughts" (ToT) reasoning method I mentioned before, introducing a new technique called the "Graph of Thoughts" (GoT). Let's take a closer look.

Features of GoT

First, let's compare various methods using the chart provided in the paper.

The far right shows the newly introduced GoT. Key differences from ToT may be that GoT allows for the merging of thoughts, and that users can define the shape of the GoT themselves. Incidentally, this merging is referred to as "aggregation" within the paper. While it may seem similar to ToT, the differences might be significant. Let's explore this in more detail.

2. Four Key Modules

GoT (Graph of Thoughts) has the following four crucial modules. Understanding these will clarify the differences between it and ToT (Tree of Thoughts).

Prompter
Parser
Scoring & Validation
Controller

Let's look at each one in detail. The Prompter, as the name suggests, is the module responsible for creating prompts. The Parser extracts the required information, or "thoughts," from the LLM (Large Language Model). You might think of the Prompter as handling input and the Parser as managing output. Scoring & Validation is the module that evaluates the gathered thoughts. This evaluation allows us to select the thoughts worth keeping. Finally, let's elaborate on the Controller. It is responsible for adding new thoughts or merging multiple thoughts, a process referred to as "transform." The Controller decides which transformations should be applied to which thoughts and passes this information to the Prompter. It is a critical module for executing problem-solving strategies. It has two functions: Graph of Operations (GoO), which is an execution plan for operations defined by the user, and Graph Reasoning State (GRS), which maintains the ongoing LLM reasoning process based on the thought state.

3. Considering the Number Sorting Problem

Since merely talking abstractly may not advance understanding, let's consider an actual task. This time we will consider sorting a list of 64 numbers in ascending order. Here, we'll see how the Graph of Operations (GoO) comes into play. In the chart below, each thought is tagged with operations like G (Generate), S (Sort), K (Keep the best), and A (Merge). Initially, we take a list of 64 numbers and divide it into four lists, each containing 16 numbers. Each of these lists is then sorted and evaluated. Only the list with the highest accuracy is kept. These are then further merged to form a new list containing 32 numbers. You'll see various operations functioning as the process progresses.

For those who want to delve deeper, detailed explanations are available here, particularly in the green part of the chart above.

It might feel complex at a glance, but it's user-controllable, allowing you to incorporate your domain knowledge. I am excited to conduct various experiments in the future.

Thank you for your attention! I will keep you updated on the progress of GoT and hope to share more with you soon. Stay tuned!

1) "Graph of Thoughts: Solving Elaborate Problems with Large Language Models", Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler, 21 Aug 2023, https://arxiv.org/abs/2308.09687v2

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 25, 2023

AI, ChatGPT, LLM, NLP, OpenAI

Fine-tuning has come to ChatGPT. Its effects are outstanding, and if implemented according to the task, we can perhaps expect significant improvements in accuracy!!

Toshifumi Kuga

August 25, 2023

AI, ChatGPT, LLM, NLP, OpenAI

Hello everyone, how are you doing? Although the illustration is autumn-like, it seems that summer will stick around for a while in Japan

While that was happening, I suddenly received a message from OpenAI saying, "The fine-tuning feature has been implemented." I have always fine-tuned open-source models, so I was a little disappointed that ChatGPT didn't have this feature. But it seems that it has finally made its appearance. I guess OpenAI got a little serious. Let's get started right away.

Is fine-tuning effective for ChatGPT?

I'm sure you all want to know, "Does fine-tuning work well with ChatGPT?" So I created a small dataset and conducted a simple experiment. To put it briefly, "Something amazing is happening!" Below is the table with the results.

I had GPT3.5 perform a 6-class classification task and expected some fine-tuning effects. However, exceeding an accuracy of 0.8 was unexpected. The normal GPT3.5 only barely surpassed 0.5, so I initially thought that the model's potential was lacking. However, an accuracy of 0.88 appeared on the first fine-tuning, which was hard to believe. Upon changing the seed and refreshing the data, it still yielded an accuracy near 0.8, completely different from the normal accuracy. The compatibility between fine-tuning and ChatGPT must be outstanding.

2. Experiment Details

In this experiment, the task was to identify what type of financial product a given English complaint was about. This is a task of classifying 6 different financial products such as home loans or bank accounts, and the data used for fine-tuning consisted of 100 samples each for training and validation, which is a minimum configuration. The training results show a decrease in training loss and eventually seem to reach zero (actually it continues to go down further). Quick conclusion: it went well. Using this fine-tuned model yielded the results mentioned in section 1.

3. Discussion

Just by looking at the results of this experiment, we can't definitively say that fine-tuning always succeeds. Various cases will emerge in the future, and it will be important to make comprehensive judgments based on those results. Especially this time, minimal prompt engineering was done. Combining prompt engineering and fine-tuning to achieve the best performance is a future challenge. There are many points to consider, like cost and computation time. It will require trial and error. While GPT-4 indeed performs well with an accuracy around 0.8 for this task, its cost is high, and implementation isn't always straightforward. Even in such cases, the new weapon of fine-tuning has come into our hands, increasing our options and potentially moving us a step forward in problem-solving.

How was it? I would like to introduce more experiments and their results here in the future. Stay tuned!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

July 22, 2023

AI, NLP, LLM, Llama2

"Llama2" is a great LLM as it is Open source and for commercial use. I want to try many applications with this language model.

Toshifumi Kuga

July 22, 2023

AI, NLP, LLM, Llama2

Llama2 website https://ai.meta.com/llama/

Hi friend, I would like to introduce a new LLM, which was released from Meta on July 18,2023. It is called “Llama2”. I have some experiments with this model. Let us start!

1. What is Llama2?

“Llama2” is language model from Meta AI. Many researchers are very excited because it is a open source and available for commercial usage. Its specs are explained in the table below.

2. Let us extract information from the article in English

I want to perform a small experiment to extract information from text.

sentiment
root cause of the sentiment
name of product
name of makers of product

I made my prompt and fiction story in the mail. Then run Llama2 13B chat. Here are the results

Woh, looks good! I can obtain the information I need from text. Unfortunately the model cannot output it in Japanese.

3. Let us see how it works against Japanese sentences

Next, I would like to apply the same prompt against Japanese sentences here.

Woh, looks good, too! Although the model cannot output it in Japanese, either.

4. Llama2 has a great potential for AI applications in the future!

Today I found that Llama2 works in English very well. When we want to minimize running costs for AI applications or keep secret/confidential data within our organization, this model can be a good candidate for AI models in our applications. It is great to have many choices of LLMs in addition to proprietary models, such as ChatGPT.

I want to mention a great repo on GitHub. It makes it easier to compare many open source LLMs, It is a strong recommendation for everyone who is interested in LLMs. Thanks camenduru!

Thanks for your attention! would like to follow the progress of Llama2 and share it with you soon. Stay tuned!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

July 10, 2023

AI, LLM, NLP, ChatGPT, Tree of Thoughts

"Tree of Thoughts" can go mainstream in prompt engineering!

Toshifumi Kuga

July 10, 2023

AI, LLM, NLP, ChatGPT, Tree of Thoughts

Today, I found a very interesting paper called “Tree of Thoughts (ToT)”(1). With ToT, we can solve the tasks, where we could not do it before. So I want to share it with you and consider how it works together. Let us start now!

1.Chain of Thoughts (CoT)

This paper provides four kinds of prompting as the chart below says. The left one is called “IO prompting” and is relatively simple. The right one is the most complex, called “Tree of Thoughts (ToT)”.

Among four kinds of prompting, I focus on Chain of Thoughts (CoT) first because it gives us a fundamental space to explore. The paper says “The key idea is to introduce a chain of thoughts z1, · · · , zn to bridge x and y, where each zi is a coherent language sequence that serves as a meaningful intermediate step toward problem solving“. By CoT, we explore a prompting method for improving the reasoning abilities of LLMs and solve complex tasks effectively. Once we understand how CoT works, let us move on ToT.

2. Tree of Thoughts (ToT)

Let us expand CoT with tree search so that we can apply it to more complex tasks effectively. This paper says “we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.”. Sounds great! OK, let us consider how it works.

ToT has four steps to implement it. I would like to explain them step by step.

decompose the process into thoughts
- each thought should be small enough so that LLMs can generate promising and diverse samples
generate states
- generate potential thoughts from each state. There are two kinds of methods to do this according to this paper.
evaluate each state
- LLMs evaluate each state to decide how a tree should grow
search for the best state
- If the current state is not good enough, we should search into other branches. There are several search algorithms to do that.

3. ToT can be solved by MCTS

Although ToT can be solved with relatively simple Tree Search algorithms, we can use more advanced algorithms, such as Monte Carlo Tree Search (MCTS). It has been famous since AlphaGo defeated a professional human Go player in March 2016. In AlphaGo, MCTS is combined with Neural network. This is sometimes called “model guided Tree Search” and we do not need to search for the whole possible state anymore. In the picture, Demis Hassabis, Google DeepMind CEO, explained how it works(2).

It must be exciting when ToT can be searched by MTCS in the near future as wider and deeper states can be explored and it must provide us good results.

Thanks for your attention! I would like to follow the progress of ToT and share it with you soon. Stay tuned!

1) “Tree of Thoughts: Deliberate Problem Solving with Large Language Models” Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 17 May 2023, https://arxiv.org/abs/2305.10601

2) Using AI to Accelerate Scientific Discovery | Campus Lecture with Demis Hassabis, https://www.youtube.com/watch?v=Ds132TzmLRQ&t=1381s

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

June 18, 2023

AI, GPT4, LLM, NLP, Function calling, OpenAI

“Function calling” is a game changer as GPT can access outside and be converted to our agents easily!

Toshifumi Kuga

June 18, 2023

AI, GPT4, LLM, NLP, Function calling, OpenAI

Today, I want to create web-site with a description of the Japanese sweets collection, just like “Dorayaki“ in the picture above. So I ordered my AI agent to create an awesome web-site. But is it really possible? I am sure yes, it is!. As you know, OpenAI created GPT, which is very intelligent large language model (LLM). On 13 June 2023, “Function calling” was introduced by OpenAI. It can bridge GPT to other systems, APIs and functions outside. Let me explain step by step!

1.What is the advantage of “Function calling”?

Function calling makes it easy for GPT to access functions outside. For example, when you want to create a web-site where Japanese sweets are explained to customers, you need to connect GPT to the function that can write code of web-site with HTML/CSS. With “Function calling”, GPT can call this function and pass the parameters, such as “explanations of Japanese sweets” to this function. Official documents says “The latest models (gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to both detect when a function should to be called (depending on the input) and to respond with JSON that adheres to the function signature.”

2. The list of “functions” is key to set “function calling” up

“Function calling”looks great! But how can we implement in our code. I think it is so simple. Just prepare the list of functions. This should have

"name"
"description"
"parameters" : "type" , "properties", "required"

In ChatCompletion.create, we should add “functions=functions” because we want to call the function. The other part of the code has not changed so much. The code below shows us an example of functions, which comes from Official documents. Please look at these docs for the details if needed.

3. Let us see how the generated web looks like

OK, it is the time that we see the result from our agent. I instruct "Create a web-site for a pretty Japanese sweets collection" to our agent. Text of “title” and “explanation” are generated by GPT3.5-turbo and are sent to the function that creates a web. Here is the result. All are written in Japanese. The title means “a pretty Japanese sweets collection". The sentences of the explanation are pretty good! I do not think there is a need to fix or modify these sentences at all.

If you want to know more details with the code, you can see it here.

https://github.com/TOSHISTATS/Wagashi-Collection-Web-Generation-agent-by-GPT3.5#readme

Hope you can understand how AI agents work. I think potential use-cases of “Function calling”are limitless. I tried several use cases by “Function calling” and found that it can be a game changer to develop LLM application systems. I would like to update my article about AI agents by OpenAI GPT soon. Stay tuned!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

June 12, 2023

LLM, NLP, AGI, GPT4, Deep Learning

"Large Language Models as Tool Makers" is a good idea to enhance the accuracy of the model while keeping the cost of computation low!

Toshifumi Kuga

June 12, 2023

LLM, NLP, AGI, GPT4, Deep Learning

Since GPT4 , one of the most intelligent Large Language Model (LLM) in the world, was released on 14 March 2023, many people are surprised because it is very intelligent. This is great. But there is one problem for users. It is not free service. Users should pay the fee of GPT4, based on how much tokens they use on GPT4. Therefore when we continue to use GPT4 all day long, it must be very expensive. Of course we prefer more intelligence. But we should consider the cost of it. There is a trade off between them. What should we do to solve it? Last week, I found a good research paper called “Large Language Models as Tool Makers“(1). All charts below come from this awesome paper. The idea is simple and looks promising to tackle these problems. So let me explain more details.

1. Tool user and Tool maker

The basic idea is as follows. Let us have two LLMs, one is called “Tool maker” and another is called “Tool user”. When a new task comes to us, Tool maker create “tools” for this task. Once “tools” are ready, they are passed to Tool user for inference. These tools are reusable to solve similar tasks in the future. So GPT4 can be used only as Tool maker as it should be more intelligent. Light weights models, such as GPT3.5, can be used as a Tool user. Then we can reduce the cost of computation for inference. It sounds great! The chart below explains how it works.

2. How can we create tools for our task?

As we want to keep the accuracy of the results, Tool maker should create better tools. There are three steps to do that.

• Tool Proposing: The tool maker generates a Python function to solve the given task. If the proposed tool makes errors, the tool maker makes another tool.

• Tool Verification: The tool maker generates unit tests using validation samples and subsequently executes these tests. 3 validation samples are prepared here. If the tool fails any of these tests, the tool makes an attempt to fix the issues. The paper explains it as follows “This stage fulfills two key roles: 1) it provides examples that demonstrate how to convert natural language questions into function calls, and 2) it verifies the tool’s reliability, enabling the entire process to be fully automated.”

• Tool Wrapping: If the execution or verification passes the preset threshold, tool maker prepares the wrapped tool for tool user. This step involves wrapping up the function code and providing demonstrations of how to convert a task into a function call. This final product is then ready for use by the tool user.

The chart below shows us how it works.

Once the tool is ready, the tool is passed to the tool-user. The tool user should solve various instances of the task by using tools made by the Tool maker. The prompt for this stage is the wrapped tool which contains the function for solving the task and demonstrations of how to convert a task query into a function call. With the demonstrations, tool user can then generate the required function call in an in-context learning fashion. The function calls are then executed to solve the task. The chart below shows us how the processes from Tool maker to Tool user are going.

3. Can we confirm if tools that fit our tasks are available or not?

Here, we use a third LLM called “the dispatcher”. Because the dispatcher maintains a record of existing tools produced by the tool maker, it can confirm if tools that fit our task are available at the time when a task is received, If no appropriate tool is found, the dispatcher identifies the instance as a new task and solves the instance with a powerful model, such as GPT4. The dispatcher’s workflow is shown here.

That is it! This is a major part of “Large Language Models as Tool Makers” or “LATM” in short. By LATM, we might reduce computational cost for heavy models, such as GPT4. It is amazing! Hope you enjoy the article today. I will update new technologies around LLM in the near future. Stay tuned!

1) “Large Language Models as Tool Makers” Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou, 26 May 2023, https://arxiv.org/abs/2305.17126

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

May 14, 2023

AI

LLM can be "reasoning engine" to create our agents. It must be a game changer!

Toshifumi Kuga

May 14, 2023

AI

Recently, Large Language model (LLM) is getting more attractions all over the world. Google released their new LLM called “PaLM 2” on 10 May 2023. It starts competing against “ChatGPT” which was released in Nov 2022 and attracts over 100 million users in just two months. LLM is expected to be more intelligent in a short period as competition between big IT companies is getting tough. What does it mean to us? . Let us consider step by step!

1. How can we create our own agent?

In my article in Feb 2023, I said AI can be our agent which understands our languages. Let us consider how it is possible step by step. When I want to eat lunch. I just order my agent, saying “I would like to have lunch”, LLM can understand what I say and try to order my favorite hamburger at the restaurant. For LLM to act against the outside world (such as call restaurants), it needs some tools, which can be created with libraries such as “LangChain”. Then LLM can order my lunch and finally I can have lunch, anyway. It sounds good. Let us move deeper.

2. LLM is not just an “interface” to us with natural languages.

As I said in Feb this year, the first time I used ChatGPT, I felt like it could understand what I said. But now I do not think it is just an“interface” any more. Because LLM is trained with massive amounts of text from the web, books and other sources, LLM obtains a lot of knowledge of human beings from the past to the present. Since Chat GPT appeared in front of us last year, I performed many experiments with LLM and found that LLM has an ability to make decisions. Although it is not perfect, it sometimes performs at the same level as human beings. It is amazing! In addition to that, LLM is still in the early stage and evolves on a daily basis!

3. LLM will be more and more intelligent as a “reasoning engine”!

Mr. Sam Altman, OpenAI CEO says in youtube “ChatGPT may be a reasoning engine”(1). I completely agree with his opinion. When we create our agents, LLM works as a “reasoning engine” to make decisions to solve complex tasks. Around LLM, there are many systems to act against the outside world, such as “search web” or “shop in e-commerce”. All we have to do is think “how can we enable LLM make the right decisions”. Because LLM is very new for everyone, no one knows the right answer. Fortunately, LLM can understand our languages, it may not need programming anymore. It is very important for us. So let us consider step by step!

I would like to update the progress of AI agents. Stay tuned!

1) Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367 https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s (around 14:25)

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.