ChatGPT

The era of "agent-style applications" has arrived, earlier than expected and seems to be accelerating even further

On November 6, the OpenAI DevDay was held, marking its first annual developer's conference. The technological developments since the debut of GPT-4 in March 2023 were introduced at once. There's too much to cover comprehensively, so I'll leave that to OpenAI CEO Sam Altman, but here I want to raise three key points I've considered and explore them further.




  1. Price is Key

The anticipated price reduction has been realized. GPT-4 is roughly about 65% off. Of course, the reduction varies depending on usage. I've already tried the new GPT-4 Turbo for half a day, and it cost about $5, which would have definitely exceeded $10 before. This makes it more viable for Proof of Concept (PoC) use. It seems the time has come to utilize GPT-4's still unseen potential in various areas. A wallet-friendly approach is a welcome change for everyone.



2. Building AI Apps Without Being a Programmer

At this developer's conference, I noticed many features that operate with no-code. GPTs, which allow creation of customized ChatGPT in a dialogue format, is a prime example. The developer-oriented Assistants API also doesn't require coding if used with the Playground. With the code interpreter tool already implemented, writing prompts to invoke and execute it automates the rest. This is impressive.

I implemented a model to calculate default probabilities using a step-by-step prompt, from 1 to 5, with the code-interpreter turned on, without writing any specific code. When executed, the model was successfully created, and it performed tasks like calculating AUC and generating histograms as instructed.





3. Easy Construction of "Agent-Style Applications"

Listening to OpenAI CEO Sam Altman's presentation, I felt a strong emphasis on agents. The Playground Tool includes function calling, which seems to make it much easier to create agents that determine their next actions based on situations. While open-source implementations of agents have been increasing, I didn't expect them to be implemented this quickly on the OpenAI platform. Paired with GPTs, the year of 2024 feels like it could be the first year of "agent-style applications." This is truly exciting.

How about these new services? Following the announcements at DevDay, developers worldwide seem to be thinking about various AI applications. I'm also eager to start creating an agent-style application. Stay tuned!




Copyright © 2023 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

GPT-4V is here. I tried it immediately and was amazed. It can do this too!

Sorry to keep you waiting. OpenAI's GPT-4 now comes with image recognition capabilities. To be precise, it was demonstrated when it debuted in March of this year, but it has only now been made available to users after half a year. I recently tried the new feature in ChatGPT+ and, in a word, it's incredible!

By the way, the image mentioned above was also created with a combination of GPT-4 and DALL-E3.

Now, let's start the experiment!


First, we'll start with recognizing mobile-phones. It can accurately count the number of mobile-phones. This is a piece of cake.

 

I thought flight information would be challenging, but it identified the destination impeccably. Since it's originally an excellent language model, it seems proficient in deriving meaning from images.

 

It can even read Osaka's Tsutenkaku tower. Local information is no problem.

 

For a change, I inserted an image of analysis results. It can read graphs effortlessly. This is impressive!

 

What shocked me was that it could easily count cars. Of course, it's not a specialized object detection model, so errors will always occur. I believe there were about 48 cars in this photo, but for general use, this margin of error seems acceptable. It's astonishing what it can do by just being given an image.

 

It can count cans, but the error is relatively significant. It might struggle with cluttered items.

 

It works well to read English text in an OCR-like manner.

 

It can also easily read the time displayed on electronic signboards.

How did you find it? Without any fine-tuning, it achieved this much. GPT-4V has just been launched, and various use cases are likely to emerge in the future. I look forward to introducing interesting examples here as they arise. Stay tuned!

 

Copyright © 2023 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Fine-tuning has come to ChatGPT. Its effects are outstanding, and if implemented according to the task, we can perhaps expect significant improvements in accuracy!!

Hello everyone, how are you doing? Although the illustration is autumn-like, it seems that summer will stick around for a while in Japan

While that was happening, I suddenly received a message from OpenAI saying, "The fine-tuning feature has been implemented." I have always fine-tuned open-source models, so I was a little disappointed that ChatGPT didn't have this feature. But it seems that it has finally made its appearance. I guess OpenAI got a little serious. Let's get started right away.

 
  1. Is fine-tuning effective for ChatGPT?

I'm sure you all want to know, "Does fine-tuning work well with ChatGPT?" So I created a small dataset and conducted a simple experiment. To put it briefly, "Something amazing is happening!" Below is the table with the results.

Accuracy for 100 samples

I had GPT3.5 perform a 6-class classification task and expected some fine-tuning effects. However, exceeding an accuracy of 0.8 was unexpected. The normal GPT3.5 only barely surpassed 0.5, so I initially thought that the model's potential was lacking. However, an accuracy of 0.88 appeared on the first fine-tuning, which was hard to believe. Upon changing the seed and refreshing the data, it still yielded an accuracy near 0.8, completely different from the normal accuracy. The compatibility between fine-tuning and ChatGPT must be outstanding.

 

2. Experiment Details

In this experiment, the task was to identify what type of financial product a given English complaint was about. This is a task of classifying 6 different financial products such as home loans or bank accounts, and the data used for fine-tuning consisted of 100 samples each for training and validation, which is a minimum configuration. The training results show a decrease in training loss and eventually seem to reach zero (actually it continues to go down further). Quick conclusion: it went well. Using this fine-tuned model yielded the results mentioned in section 1.

 

3. Discussion

Just by looking at the results of this experiment, we can't definitively say that fine-tuning always succeeds. Various cases will emerge in the future, and it will be important to make comprehensive judgments based on those results. Especially this time, minimal prompt engineering was done. Combining prompt engineering and fine-tuning to achieve the best performance is a future challenge. There are many points to consider, like cost and computation time. It will require trial and error. While GPT-4 indeed performs well with an accuracy around 0.8 for this task, its cost is high, and implementation isn't always straightforward. Even in such cases, the new weapon of fine-tuning has come into our hands, increasing our options and potentially moving us a step forward in problem-solving.

How was it? I would like to introduce more experiments and their results here in the future. Stay tuned!




Copyright © 2023 Toshifumi Kuga. All right reserved



Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"Tree of Thoughts" can go mainstream in prompt engineering!

Today, I found a very interesting paper called “Tree of Thoughts (ToT)”(1). With ToT, we can solve the tasks, where we could not do it before. So I want to share it with you and consider how it works together. Let us start now!

1.Chain of Thoughts (CoT)

This paper provides four kinds of prompting as the chart below says. The left one is called “IO prompting” and is relatively simple. The right one is the most complex, called “Tree of Thoughts (ToT)”.

Among four kinds of prompting, I focus on Chain of Thoughts (CoT) first because it gives us a fundamental space to explore. The paper says “The key idea is to introduce a chain of thoughts z1, · · · , zn to bridge x and y, where each zi is a coherent language sequence that serves as a meaningful intermediate step toward problem solving“. By CoT, we explore a prompting method for improving the reasoning abilities of LLMs and solve complex tasks effectively. Once we understand how CoT works, let us move on ToT.

 

2. Tree of Thoughts (ToT)

Let us expand CoT with tree search so that we can apply it to more complex tasks effectively. This paper says “we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.”. Sounds great! OK, let us consider how it works.

ToT has four steps to implement it. I would like to explain them step by step.

  • decompose the process into thoughts

    • each thought should be small enough so that LLMs can generate promising and diverse samples

  • generate states

    • generate potential thoughts from each state. There are two kinds of methods to do this according to this paper.

  • evaluate each state

    • LLMs evaluate each state to decide how a tree should grow

  • search for the best state

    • If the current state is not good enough, we should search into other branches. There are several search algorithms to do that.


3. ToT can be solved by MCTS

Although ToT can be solved with relatively simple Tree Search algorithms, we can use more advanced algorithms, such as Monte Carlo Tree Search (MCTS). It has been famous since AlphaGo defeated a professional human Go player in March 2016. In AlphaGo, MCTS is combined with Neural network. This is sometimes called “model guided Tree Search” and we do not need to search for the whole possible state anymore. In the picture, Demis Hassabis, Google DeepMind CEO, explained how it works(2).

It must be exciting when ToT can be searched by MTCS in the near future as wider and deeper states can be explored and it must provide us good results.

 

Thanks for your attention! I would like to follow the progress of ToT and share it with you soon. Stay tuned!

 

1) “Tree of Thoughts: Deliberate Problem Solving with Large Language Models” Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 17 May 2023, https://arxiv.org/abs/2305.10601

2) Using AI to Accelerate Scientific Discovery | Campus Lecture with Demis Hassabis, https://www.youtube.com/watch?v=Ds132TzmLRQ&t=1381s

 



Copyright  ©  2023  Toshifumi Kuga  All right reserved



Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

LLM can be "reasoning engine" to create our agents. It must be a game changer!

Recently, Large Language model (LLM) is getting more attractions all over the world. Google released their new LLM called “PaLM 2” on 10 May 2023. It starts competing against “ChatGPT” which was released in Nov 2022 and attracts over 100 million users in just two months. LLM is expected to be more intelligent in a short period as competition between big IT companies is getting tough. What does it mean to us? . Let us consider step by step!


1. How can we create our own agent?

In my article in Feb 2023, I said AI can be our agent which understands our languages. Let us consider how it is possible step by step. When I want to eat lunch. I just order my agent, saying “I would like to have lunch”, LLM can understand what I say and try to order my favorite hamburger at the restaurant. For LLM to act against the outside world (such as call restaurants), it needs some tools, which can be created with libraries such as “LangChain”. Then LLM can order my lunch and finally I can have lunch, anyway. It sounds good. Let us move deeper.


2. LLM is not just an “interface” to us with natural languages.

As I said in Feb this year, the first time I used ChatGPT, I felt like it could understand what I said. But now I do not think it is just an“interface” any more. Because LLM is trained with massive amounts of text from the web, books and other sources, LLM obtains a lot of knowledge of human beings from the past to the present. Since Chat GPT appeared in front of us last year, I performed many experiments with LLM and found that LLM has an ability to make decisions. Although it is not perfect, it sometimes performs at the same level as human beings. It is amazing! In addition to that, LLM is still in the early stage and evolves on a daily basis!



3. LLM will be more and more intelligent as a “reasoning engine”!

Mr. Sam Altman, OpenAI CEO says in youtube “ChatGPT may be a reasoning engine”(1). I completely agree with his opinion. When we create our agents, LLM works as a “reasoning engine” to make decisions to solve complex tasks. Around LLM, there are many systems to act against the outside world, such as “search web” or “shop in e-commerce”. All we have to do is think “how can we enable LLM make the right decisions”. Because LLM is very new for everyone, no one knows the right answer. Fortunately, LLM can understand our languages, it may not need programming anymore. It is very important for us. So let us consider step by step!


I would like to update the progress of AI agents. Stay tuned!



1) Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367 https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s (around 14:25)


Copyright  ©  2023  Toshifumi Kuga  All right reserved


Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.