artificial intelligence

I tried the new generative AI model "Claude3 Haiku". Fast, smart, and low-priced. I want to use it as an AI agent!

On March 14th, "Claude3 Haiku" (1), the lightest model among the Claude3 generative AIs, was released and became available for use in web applications and APIs. I'm usually drawn to the highest-performing models, but this time I'd like to focus on the lightest one. Recently, algorithms that execute repetitive calculations like AI Agents have become more common. I want to use high-end models like GPT4, but they are very costly to run. So I was looking for a low-cost, high-performance model, and "Claude3 Haiku" is perfect as it costs 1/60th of the high-end model "Claude3 Opus" while still delivering excellent performance. I'd like to try it out here right away. The details of each model are as follows.




1. First, let's test the text

I checked if "Claude3 Haiku" knows about Hiroshima-style okonomiyaki, a hyper-local Japanese food. I used to live in Hiroshima, so I know it well, and I think this answer is generally good. The Japanese is clean, so it passes for now.




Next, I asked about transportation from Tokyo to Osaka. Unfortunately, there was one clear mistake. The travel time by bus is stated as "about 4 hours and 30 minutes," but in reality, it takes around 8 hours. This is a hallucination.



Then I asked about the "Five Forces," a framework for analyzing market competitiveness. It analyzed the automotive industry, and the analysis incorporates the latest examples, such as the threat of electric vehicles as substitutes, making it a sufficient quality starting point for discussion. However, the fact that it's not in a table format is a drawback.





2. Next, let's analyze images.

First, I asked about the number of smartphones, but unfortunately, it got it wrong. It may not be good at counting.




This is a photo of the Atomic Bomb Dome in Hiroshima. It answered this perfectly. It seems to understand famous Japanese buildings.





This is a photo of a streetcar running in Hiroshima City. I think it captures it pretty well overall. However, the streetcars don't run solely for tourists, so the explanation may be somewhat incomplete.




This is a flight information board at Haneda Airport. It perfectly understands the detailed information. Excellent.





Counting the number of cars in a parking lot is a difficult task for generative AI. This time it answered 60 cars, but there are actually 48. If the accuracy improves a bit more, it will reach a practical level, which is a bit disappointing.






3. Impressions of using "Claude3 Haiku".

Honestly, the performance was unbelievable for a general-use AI. The Japanese is natural and clean. The fact that it can incorporate and analyze images in the first place is groundbreaking. Multimodality has arrived in general-use AI. The calculation speed is also fast, and I think it will be applied to applications that require real-time responses. And the cost is low. This allows for plenty of interesting experiments. It's a savior for startups with tight cost constraints! I want to continue doing interesting experiments using "Claude3 Haiku". Stay tuned!

(1) Claude 3 Haiku: our fastest model yet   2024.3.14  Anthropic

Copyright © 2024 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The Evolution of AI Accelerates: A Deep Dive into Google's "Gemini 1.5 Pro"

The pace of AI advancement is truly remarkable, and this year is no exception. Google has unveiled a new generative AI called "Gemini 1.5 Pro," which boasts a groundbreaking Mixture-of-Experts (MoE) architecture. Currently only available to a limited number of users, with broader testing to come, this technology presents intriguing breakthroughs that warrant a closer look.

 
 

1. Unprecedented Context Window of 1 Million Tokens

Gemini 1.5 Pro boasts a context window that is unfathomable by existing LLMs, capable of processing up to 1 million tokens. Research has even demonstrated data ingestion of up to 10 million tokens. This represents a revolutionary breakthrough, considering that GPT-4's context window is limited to 128,000 tokens (1).

Comparison of Context Windows for Different LLMs

With such an extensive context window, Gemini 1.5 Pro can ingest an entire book at once. Currently, when creating RAG systems and referencing internal documents, chunking is necessary to accommodate the LLM's context window. However, with Gemini 1.5 Pro, this requirement is minimized, simplifying RAG development and operation. Furthermore, the model maintains high accuracy, even with such a large context window, achieving over 99% accuracy in information retrieval tests (see chart below).

 
 

2. Remarkable In-Context Learning Capabilities

The ability to process vast amounts of data is not the only noteworthy aspect of Gemini 1.5 Pro. It also excels at understanding and applying this information to various tasks. This is evident in its in-context learning capabilities, showcased in a Kalamang language translation task. The model was trained using a Kalamang grammar book and dictionary, enabling it to translate between English and Kalamang.

English to Kalamang Translation Test

Gemini 1.5 Pro outperformed other models, achieving scores that rival those of human learners. This is an astonishing feat.

 
 

3. Towards Individualized Agents with Gemini 1.5 Pro

If a model can acquire translation capabilities simply by reading a grammar book, it stands to reason that it can also learn from knowledge systems in other domains and apply that knowledge to various tasks. In other words, Gemini 1.5 Pro has the potential to develop its own "frame of reference" that influences its understanding and values. The ability to incorporate a vast amount of data into its context through its extensive context window has significant implications in this regard. This is because it allows Gemini 1.5 Pro to potentially become an individualized agent with diverse perspectives in the future. The Kalamang translation experiment provides promising evidence of this potential.

Gemini 1.5 Pro is a remarkable advancement in AI technology, offering unprecedented capabilities in terms of context window size and in-context learning. "A host of improvements made across nearly the entire model stack (architecture, data, optimization and systems) allows Gemini 1.5 Pro to achieve comparable quality to Gemini 1.0 Ultra , while using significantly less training compute and being significantly more efficient to serve" according to the report(1). This is truly a testament to the rapid progress being made in the field of AI.

I am eager to experiment with Gemini 1.5 Pro once it becomes publicly available. Stay tuned for future updates!

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, Gemini Team, Google

 

Copyright © 2024 Toshifumi Kuga. All right reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.