
I tried the new generative AI model "Claude 3 Haiku". It is fast, smart, and low-priced. I want to use it as an AI agent!

On March 14th, "Claude 3 Haiku" (1), the lightest model in the Claude 3 family of generative AIs, was released and became available in both the web application and the API. I'm usually drawn to the highest-performing models, but this time I'd like to focus on the lightest one. Algorithms that execute repetitive computations, such as AI agents, have recently become more common. I'd like to use high-end models such as GPT-4 for them, but they are very costly to run. So I was looking for a low-cost, high-performance model, and "Claude 3 Haiku" is perfect: it costs about 1/60th as much as the high-end model "Claude 3 Opus" while still delivering excellent performance. I'd like to try it out right away. The details of each model are as follows.
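To see what that 1/60 figure means for an agent workload, here is a back-of-the-envelope cost comparison. The per-million-token prices below are the ones published at launch (Haiku: $0.25 in / $1.25 out; Opus: $15 in / $75 out) and the daily token volume is just an illustrative assumption.

```python
PRICES = {  # USD per 1M tokens: (input, output)
    "claude-3-haiku": (0.25, 1.25),
    "claude-3-opus": (15.00, 75.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost for a given token volume."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A hypothetical agent loop that burns 2M input and 0.5M output tokens per day:
haiku = cost_usd("claude-3-haiku", 2_000_000, 500_000)
opus = cost_usd("claude-3-opus", 2_000_000, 500_000)
print(f"Haiku: ${haiku:.2f}/day, Opus: ${opus:.2f}/day, ratio: {opus / haiku:.0f}x")
```

For a repetitive agent loop, that difference decides whether an experiment is affordable at all.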




1. First, let's test the text

I checked whether "Claude 3 Haiku" knows about Hiroshima-style okonomiyaki, a hyper-local Japanese food. I used to live in Hiroshima, so I know it well, and I think this answer is generally good. The Japanese is clean, so it passes for now.




Next, I asked about transportation from Tokyo to Osaka. Unfortunately, there was one clear mistake: the travel time by bus is stated as "about 4 hours and 30 minutes," but in reality it takes around 8 hours. This is a hallucination.



Then I asked about the "Five Forces," a framework for analyzing market competitiveness. It analyzed the automotive industry, and the analysis incorporates recent examples, such as the threat of electric vehicles as substitutes, making it a good enough starting point for discussion. However, the lack of a table format is a drawback.





2. Next, let's analyze images.

First, I asked it to count the smartphones, but unfortunately, it got the number wrong. Counting may not be its strong suit.
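For reference, images are sent to the Messages API as base64-encoded content blocks alongside the text question. The sketch below only builds that payload offline; actually calling the model would additionally require the `anthropic` SDK and an API key, and the question string is just an example.

```python
import base64

def image_message(image_bytes: bytes, question: str,
                  media_type: str = "image/jpeg") -> dict:
    """Build one user message mixing an image and a text question,
    in the content-block format the Anthropic Messages API expects."""
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": base64.b64encode(image_bytes).decode()}},
            {"type": "text", "text": question},
        ],
    }

msg = image_message(b"...jpeg bytes...", "How many smartphones are in this photo?")
# Pass [msg] as `messages` to client.messages.create(model="claude-3-haiku-20240307", ...)
```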




This is a photo of the Atomic Bomb Dome in Hiroshima. It answered this perfectly. It seems to understand famous Japanese buildings.





This is a photo of a streetcar running in Hiroshima City. I think it captures the scene pretty well overall. However, the streetcars don't run solely for tourists, so the explanation may be somewhat inaccurate.




This is a flight information board at Haneda Airport. It perfectly understands the detailed information. Excellent.





Counting the number of cars in a parking lot is a difficult task for generative AI. This time it answered 60 cars, but there are actually 48. It is not quite at a practical level yet, which is a bit disappointing; a little more accuracy would get it there.






3. Impressions of using "Claude 3 Haiku"

Honestly, the performance was hard to believe for such a light model. The Japanese is natural and clean. The very fact that it can take in and analyze images is groundbreaking: multimodality has arrived in general-purpose AI. Inference is also fast, so I expect it to be applied to applications that require real-time responses. And the cost is low, which allows for plenty of interesting experiments. It's a savior for startups with tight cost constraints! I want to keep running interesting experiments with "Claude 3 Haiku". Stay tuned!

(1) "Claude 3 Haiku: our fastest model yet", Anthropic, March 14, 2024

Copyright © 2024 Toshifumi Kuga. All rights reserved.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The era of "agent-style applications" has arrived earlier than expected, and it seems to be accelerating even further

On November 6, OpenAI DevDay, the company's first annual developer conference, was held. The technological developments since the debut of GPT-4 in March 2023 were introduced all at once. There's too much to cover comprehensively, so I'll leave that to OpenAI CEO Sam Altman; here I want to raise three key points I've been considering and explore them further.




1. Price is Key

The anticipated price reduction has been realized. GPT-4 is roughly 65% off, though the exact reduction varies with usage. I've already tried the new GPT-4 Turbo for half a day, and it cost about $5; before, it would certainly have exceeded $10. This makes it much more viable for proof-of-concept (PoC) work. It seems the time has come to tap GPT-4's still-unseen potential in various areas. A wallet-friendly price is a welcome change for everyone.
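As a sanity check on that discount, here is the arithmetic using the per-1K-token prices as announced at the time (GPT-4: $0.03 in / $0.06 out; GPT-4 Turbo: $0.01 in / $0.03 out); the session token counts are made-up illustrative numbers, and the exact percentage depends on your input/output mix.

```python
def session_cost(in_price: float, out_price: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Cost in USD, given per-1K-token prices and a token count."""
    return in_price * in_tokens / 1000 + out_price * out_tokens / 1000

# A hypothetical PoC session: 300K prompt tokens, 100K completion tokens.
old = session_cost(0.03, 0.06, 300_000, 100_000)   # GPT-4
new = session_cost(0.01, 0.03, 300_000, 100_000)   # GPT-4 Turbo
print(f"GPT-4: ${old:.2f}, GPT-4 Turbo: ${new:.2f}, saving {1 - new / old:.0%}")
```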



2. Building AI Apps Without Being a Programmer

At this developer conference, I noticed many features that operate with no code. GPTs, which let you create a customized ChatGPT through dialogue, are a prime example. The developer-oriented Assistants API also requires no coding if you use it from the Playground. With the code interpreter tool already built in, you just write prompts to invoke it and the rest is automated. This is impressive.

I implemented a model to calculate default probabilities using a step-by-step prompt (steps 1 through 5) with the code interpreter turned on, without writing any code myself. When executed, the model was created successfully, and it performed tasks like calculating AUC and generating histograms as instructed.
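For those who do want to drive the same setup from code, the sketch below shows roughly what the configuration looks like. The step prompts, assistant name, and model string are my own illustrative stand-ins, not the exact ones I used; the commented-out call is the (beta) Assistants API and would need `pip install openai` plus an API key.

```python
# Example steps, in the spirit of the five-step prompt described above:
STEPS = [
    "1. Load the attached credit dataset into a DataFrame.",
    "2. Split it into train and test sets.",
    "3. Fit a logistic regression model for default probability.",
    "4. Report the AUC on the test set.",
    "5. Plot a histogram of the predicted probabilities.",
]

def build_instructions(steps: list) -> str:
    """Join numbered steps into one instruction block for the assistant."""
    return "Work through these steps in order:\n" + "\n".join(steps)

assistant_config = {
    "name": "default-probability-analyst",          # illustrative name
    "model": "gpt-4-1106-preview",                  # model string at DevDay time
    "tools": [{"type": "code_interpreter"}],        # lets the assistant run Python
    "instructions": build_instructions(STEPS),
}
# from openai import OpenAI
# assistant = OpenAI().beta.assistants.create(**assistant_config)
```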





3. Easy Construction of "Agent-Style Applications"

Listening to OpenAI CEO Sam Altman's presentation, I sensed a strong emphasis on agents. The Playground's tools include function calling, which seems to make it much easier to create agents that decide their next action based on the situation. While open-source implementations of agents have been multiplying, I didn't expect this to land on the OpenAI platform so quickly. Paired with GPTs, 2024 feels like it could be the first year of "agent-style applications." This is truly exciting.
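The core of function calling is a loop: describe your tools in a JSON schema, let the model pick one, then run the matching local function. The sketch below simulates the model's reply so it runs offline; in a real agent, `tool_call` would come back from `chat.completions.create(..., tools=[WEATHER_TOOL])`, and the weather tool itself is a made-up example.

```python
import json

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route the model's chosen function call to local Python code."""
    fn = REGISTRY[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulated model output choosing the tool:
tool_call = {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}
print(dispatch(tool_call))  # -> Sunny in Tokyo
```

The agent's "decision" is exactly this choice of function name and arguments; everything else is plumbing.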

How about these new services? Following the announcements at DevDay, developers worldwide seem to be thinking about various AI applications. I'm also eager to start creating an agent-style application. Stay tuned!




Copyright © 2023 Toshifumi Kuga. All rights reserved.


An LLM can be the "reasoning engine" that powers our agents. It must be a game changer!

Recently, large language models (LLMs) have been attracting attention all over the world. Google released its new LLM, "PaLM 2", on 10 May 2023. It competes against "ChatGPT", which was released in November 2022 and attracted over 100 million users in just two months. LLMs are expected to become much more intelligent in a short period as competition between the big IT companies heats up. What does this mean for us? Let us consider it step by step!


1. How can we create our own agent?

In my article in February 2023, I said AI can be an agent that understands our language. Let us consider, step by step, how this is possible. When I want to eat lunch, I just tell my agent, "I would like to have lunch". The LLM understands what I say and tries to order my favorite hamburger from the restaurant. For the LLM to act on the outside world (such as calling restaurants), it needs tools, which can be created with libraries such as "LangChain". The LLM can then order my lunch, and I finally get to eat. Sounds good. Let us go deeper.
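The lunch flow above can be sketched in miniature. To keep it self-contained, the "reasoning" step is faked with a keyword match standing in for a real LLM call, and the ordering tool is a stub standing in for the kind of outside-world integration one might wire up with a library like LangChain; every name here is illustrative.

```python
def order_hamburger(restaurant: str) -> str:
    """Tool: act on the outside world (here, just a stub)."""
    return f"Ordered a hamburger from {restaurant}."

TOOLS = {"order_lunch": order_hamburger}

def fake_llm_decide(user_message: str):
    """Stand-in for the LLM: map an utterance to (tool name, argument)."""
    if "lunch" in user_message.lower():
        return ("order_lunch", "my favorite burger place")
    return None

def agent(user_message: str) -> str:
    """Understand the request, pick a tool, and act."""
    decision = fake_llm_decide(user_message)
    if decision is None:
        return "I'm not sure what you want."
    tool_name, arg = decision
    return TOOLS[tool_name](arg)

print(agent("I would like to have lunch"))
# -> Ordered a hamburger from my favorite burger place.
```

Swap `fake_llm_decide` for a real LLM call and the structure of the agent stays the same.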


2. An LLM is not just a natural-language "interface" for us.

As I said in February this year, the first time I used ChatGPT, I felt like it could understand what I said. But now I no longer think it is just an "interface". Because LLMs are trained on massive amounts of text from the web, books, and other sources, they absorb a great deal of human knowledge from the past to the present. Since ChatGPT appeared last year, I have run many experiments with LLMs and found that they have an ability to make decisions. Although not perfect, they sometimes perform at the same level as human beings. It is amazing! On top of that, LLMs are still at an early stage and are evolving daily!



3. LLMs will become more and more intelligent as "reasoning engines"!

OpenAI CEO Sam Altman says on YouTube that "ChatGPT may be a reasoning engine" (1). I completely agree. When we create our agents, the LLM works as a "reasoning engine" that makes decisions to solve complex tasks. Around the LLM sit many systems that act on the outside world, such as "search the web" or "shop in e-commerce". All we have to do is figure out how to enable the LLM to make the right decisions. Because LLMs are so new, no one knows the right answer yet. Fortunately, since LLMs understand our language, programming may no longer be required. That matters a great deal for us. So let us consider it step by step!


I will keep you updated on the progress of AI agents. Stay tuned!



(1) Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367, https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s (around 14:25)


Copyright © 2023 Toshifumi Kuga. All rights reserved.

