
The new generative AI "Google Gemini 1.5 Pro" is as amazing as expected!

Last month, I reported that Google had released a new generative AI called "Gemini 1.5 Pro" (1). Today, access to Gemini 1.5 Pro has finally arrived at ToshiStats, so I would like to experiment with it right away.



1. Can the 1 million token long context window really work?

Gemini 1.5 Pro boasts an incredibly long context window of 1 million tokens, something unthinkable for previous LLMs. A claim this bold naturally raises the question, "Can this really work?" Today, I would like to explore its capabilities here. I have prepared two experiments: the first is to extract detailed information, including numbers, from a relatively short document, and the second is to see whether it can answer comprehensive questions well from a document over 200,000 tokens long. Let's begin.



2. Information extraction from Toyota Motor Corporation's financial results  

First, I will check whether it can accurately extract numerical information from Toyota Motor Corporation's financial results for the fiscal year ended March 2023. The document runs 28 pages and about 27,000 tokens, so it is not especially long, but this kind of extraction task comes up often in practice. I prepared 13 questions. Let's upload the material to Google AI Studio and ask them one by one.

Google AI Studio 
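The questions in this post were asked interactively in Google AI Studio, but the same workflow can also be scripted. Below is a minimal sketch assuming the google-generativeai Python SDK and its File API; the file name, model name, and questions are illustrative placeholders, not the exact ones used in this experiment.

```python
# Minimal sketch, assuming the google-generativeai SDK: upload a PDF once,
# then ask each question against it. File name, model name, and questions
# are illustrative placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the financial results PDF via the File API
report = genai.upload_file("toyota_fy2023_financial_results.pdf")

model = genai.GenerativeModel("gemini-1.5-pro")

questions = [
    "What was the operating income for the fiscal year ended March 2023?",
    "What was the return on equity (ROE)?",
    # ... the remaining questions
]

for q in questions:
    # Each call passes the whole document plus one question to the model
    response = model.generate_content([report, q])
    print(q, "->", response.text)
```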




Here are the results. The correct answer rate is about 88%.

Questions and Results & Computation Time

For question 8, the financial results refer to ROE as "Return on equity attributable to owners of the parent company". This long phrase in Japanese (kanji) may have been difficult for the American-born Gemini 1.5 Pro to parse. However, the model derived the value on its own from other related figures, so I marked it △ and awarded 0.5 points. It seems to handle Japanese quite well. Incidentally, the average time to produce an answer was around 30 seconds, so the computational resources also appear to have been optimized. One caveat: this material was published on May 10, 2023, so it is possible that it was included in the training data.





3. Information extraction from the Financial Inspection Manual 

Next, I would like to analyze a lengthy financial regulatory document, the "Financial Inspection Manual", with Gemini 1.5 Pro. This is an unforgettable document for anyone who has worked in Japan's financial industry since around 2000, as it sparked discussions on establishing risk management and governance frameworks. It runs 366 pages and 220,526 tokens, exceeding 200,000 tokens. To my knowledge, Gemini 1.5 Pro is the first generative AI that can take in a document of this volume all at once. I was able to upload it smoothly, as shown below, and it answers properly when asked questions.

Uploading the Financial Inspection Manual
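Before querying, it is easy to confirm how much of the 1-million-token window the manual actually occupies. Here is a rough sketch, again assuming the google-generativeai SDK; the file name is a placeholder.

```python
# Rough sketch, assuming the google-generativeai SDK: check how many tokens
# the uploaded manual occupies before querying it. The file name is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

manual = genai.upload_file("financial_inspection_manual.pdf")
model = genai.GenerativeModel("gemini-1.5-pro")

# Should report roughly 220,000 tokens, comfortably inside the 1M-token window
print(model.count_tokens([manual]).total_tokens)
```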

I asked, "What should be done to verify the appropriateness of the credit rating assignment process?" and received the following response (only items 1 and 2 are shown here, as the full answer is long). To verify the appropriateness of the credit rating assignment process, the following items need to be examined:

* 1. Validity of rating criteria

    * Do the rating criteria appropriately reflect the credit risk of the debtors?

    * Are the rating criteria defined in an objective and verifiable manner?  

    * Are the rating criteria consistent with the debtor classifications?

    * Are the rating criteria periodically reviewed?

* 2. Appropriateness of rating assignment procedures      

    * Are the rating assignment procedures carried out appropriately in accordance with the rating criteria?

    * Are the rating assignment procedures implemented in an accurate, objective, and verifiable manner?

    * Can the rating assignment procedures timely reflect changes in the debtor's situation? 

    * Are the rating assignment procedures performed by an independent department not influenced by sales departments, etc.?

It continues with 3: Model Utilization, 4: Data Maintenance, 5: Internal Audit, and so on. It is a comprehensive, holistic answer that follows the principles of risk management and is consistent with the descriptions around page 142 of the Financial Inspection Manual. Notably, the sections on credit risk management sit in the middle of this manual, and it has often been observed that generative AI accuracy drops for content in the middle of long inputs (the so-called "lost in the middle" problem). Gemini 1.5 Pro, however, does not seem to have any issues here. Despite the specialized content, it provided a very good answer. The computation time was around 90 seconds, which is practical enough. It will surely make a good risk management assistant.





How was that? It seems that Gemini 1.5 Pro can analyze materials of over 200,000 tokens quite accurately, even in Japanese. It might also be useful for internal document search tasks at work. Next time, I would like to take on even more difficult tasks in English. Stay tuned!

 

1) Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, Gemini Team, Google

Copyright © 2024 Toshifumi Kuga. All rights reserved.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The Evolution of AI Accelerates: A Deep Dive into Google's "Gemini 1.5 Pro"

The pace of AI advancement is truly remarkable, and this year is no exception. Google has unveiled a new generative AI called "Gemini 1.5 Pro," which boasts a groundbreaking Mixture-of-Experts (MoE) architecture. Currently only available to a limited number of users, with broader testing to come, this technology presents intriguing breakthroughs that warrant a closer look.

 
 

1. Unprecedented Context Window of 1 Million Tokens

Gemini 1.5 Pro boasts a context window far beyond that of existing LLMs, capable of processing up to 1 million tokens. Research has even demonstrated data ingestion of up to 10 million tokens. This represents a revolutionary breakthrough, considering that GPT-4's context window is limited to 128,000 tokens (1).

Comparison of Context Windows for Different LLMs

With such an extensive context window, Gemini 1.5 Pro can ingest an entire book at once. Today, when building RAG systems that reference internal documents, chunking is necessary to fit material into the LLM's context window. With Gemini 1.5 Pro, this requirement is greatly reduced, simplifying RAG development and operation. Furthermore, the model maintains high accuracy even at this scale, achieving over 99% accuracy in information retrieval tests (see chart below).
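For contrast, here is a toy sketch of the fixed-size chunking step that conventional RAG pipelines need when a document exceeds the model's context window; the chunk size and overlap are arbitrary illustrative values, not recommendations.

```python
# Toy fixed-size chunker of the kind conventional RAG pipelines use when a
# document does not fit into the model's context window. Chunk size and
# overlap are illustrative values, not recommendations.
def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

# With Gemini 1.5 Pro's context window, an entire book can often be passed
# in directly, so this pre-processing step (and the retrieval index built
# on top of it) can frequently be skipped.
sample = "x" * 10_000  # stand-in for a long document
print(len(chunk_text(sample)), "chunks would be needed in a chunked pipeline")
```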

 
 

2. Remarkable In-Context Learning Capabilities

The ability to process vast amounts of data is not the only noteworthy aspect of Gemini 1.5 Pro. It also excels at understanding this information and applying it to new tasks. This is evident in its in-context learning capabilities, showcased in a Kalamang language translation task: the model was given a Kalamang grammar book and dictionary directly in its context window, with no fine-tuning, and this alone enabled it to translate between English and Kalamang.

English to Kalamang Translation Test

Gemini 1.5 Pro outperformed other models, achieving scores that rival those of human learners. This is an astonishing feat.
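To make the setup concrete, here is a minimal sketch of how such an in-context learning prompt can be assembled: the reference materials are simply placed in the context ahead of the translation request. The variable contents and the example sentence are placeholders, not the actual materials used in the report.

```python
# Illustrative only: assembling a long-context, in-context-learning prompt in
# the spirit of the Kalamang experiment. The strings below are stand-ins for
# the real resources, which are far larger.
grammar_book = "... full text of a Kalamang grammar book ..."
dictionary = "... Kalamang-English word list ..."

prompt = (
    "Using the grammar book and dictionary below, translate the final "
    "sentence from English into Kalamang.\n\n"
    f"GRAMMAR BOOK:\n{grammar_book}\n\n"
    f"DICTIONARY:\n{dictionary}\n\n"
    "English: The children are playing by the river.\n"
    "Kalamang:"
)
# The assembled prompt, hundreds of thousands of tokens long in the real
# experiment, is sent to the model in a single call; no fine-tuning is involved.
print(len(prompt), "characters in this toy prompt")
```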

 
 

3. Towards Individualized Agents with Gemini 1.5 Pro

If a model can acquire translation capabilities simply by reading a grammar book, it stands to reason that it can also learn from knowledge systems in other domains and apply that knowledge to various tasks. In other words, Gemini 1.5 Pro has the potential to develop its own "frame of reference" that influences its understanding and values. The ability to incorporate a vast amount of data into its context through its extensive context window has significant implications in this regard. This is because it allows Gemini 1.5 Pro to potentially become an individualized agent with diverse perspectives in the future. The Kalamang translation experiment provides promising evidence of this potential.

Gemini 1.5 Pro is a remarkable advancement in AI technology, offering unprecedented capabilities in terms of context window size and in-context learning. "A host of improvements made across nearly the entire model stack (architecture, data, optimization and systems) allows Gemini 1.5 Pro to achieve comparable quality to Gemini 1.0 Ultra, while using significantly less training compute and being significantly more efficient to serve," according to the report (1). This is truly a testament to the rapid progress being made in the field of AI.

I am eager to experiment with Gemini 1.5 Pro once it becomes publicly available. Stay tuned for future updates!

1) Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, Gemini Team, Google

 

Copyright © 2024 Toshifumi Kuga. All rights reserved.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"REST MEETS REACT" is a new prompt-engineering method using synthetic data. It holds immense potential for enhancing AI without relying on human-generated data

Happy New Year! Thank you for your continued support. Right at the start of the year, Google DeepMind has announced a new, advanced prompt engineering method: a paper titled "REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT" (1). It incorporates fine-tuning with synthetic data, which looks promising! Let's get started.

 

1. Prompt Structure

This prompt is designed with a web Q&A system in mind, one that answers complex questions. The structure is as follows:

The blue part in the figure above represents the flow of the agent described in the prompt, which aims to answer complex questions using web search. In the latter half, "Relevance self-check" and "Grounding self-check" are steps in which the agent checks its own answers. For a detailed explanation of the entire flow, please refer to the paper.
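As a rough illustration of that flow, here is a schematic sketch of a search-based agent with the two self-check steps at the end. The llm() and web_search() helpers are placeholders for the model call and search tool described in the paper, not real APIs, and the prompts are heavily simplified.

```python
# Schematic sketch of the agent flow: decide whether more evidence is needed,
# search and summarize, then draft an answer and run the two self-checks.
# llm() and web_search() are placeholders, not real APIs.

def llm(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder for a web search tool returning result snippets."""
    raise NotImplementedError

def search_agent(question: str, max_steps: int = 5) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Do we need another web search? Reply 'no' or 'yes: <query>'."
        )
        if decision.lower().startswith("no"):
            break
        query = decision.split(":", 1)[-1].strip()
        snippets = web_search(query)
        notes.append(llm(f"Summarize what is relevant to '{question}':\n{snippets}"))

    answer = llm(f"Question: {question}\nNotes: {notes}\nWrite the final answer.")

    # Relevance self-check and Grounding self-check on the draft answer
    relevant = llm(f"Does this answer address the question?\nQuestion: {question}\nAnswer: {answer}")
    grounded = llm(f"Is every claim supported by the notes?\nNotes: {notes}\nAnswer: {answer}")
    if "yes" in relevant.lower() and "yes" in grounded.lower():
        return answer
    return "Unable to produce a relevant, grounded answer."
```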

 

2. "Reward Model" - The Key to Success

Now, let's explain the core of the self-improvement process. In a nutshell, it is about "creating new high-quality data and fine-tuning the model with it." The process consists of three parts:

  • Grow: Start with a model capable of running the Search Agent; the paper uses Google's PaLM 2-L for this purpose. Trajectories are collected for a selected set of 2,000 public questions. "Trajectory" may be an unfamiliar term; it refers to the agent's reasoning process and is commonly used in reinforcement learning.

  • Improve: Convert the trajectories into fine-tuning data, using the Reward model to select only the high-quality ones. No external data, such as human labels, is used.

  • Fine-tuning: Fine-tune a new model of the same size on this new data, so that it performs better than the original (a schematic sketch of the full cycle follows below).
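Here is a schematic sketch of this Grow / Improve / Fine-tune cycle. The helper functions are placeholders for the paper's components (Search Agent rollouts, the prompt-based Reward model, and ordinary supervised fine-tuning), not real APIs, and the threshold is an arbitrary illustrative value.

```python
# Schematic sketch of one ReST iteration (Grow -> Improve -> Fine-tune).
# All helpers are placeholders for the paper's components, not real APIs.

def collect_trajectories(model, questions):
    """Grow: roll out the Search Agent on the public questions (placeholder)."""
    raise NotImplementedError

def rate_trajectory(trajectory) -> float:
    """Improve: the prompt-based Reward model rates one trajectory (placeholder)."""
    raise NotImplementedError

def fine_tune(base_model, dataset):
    """Fine-tune: supervised fine-tuning of a same-size model (placeholder)."""
    raise NotImplementedError

def rest_iteration(model, questions, threshold: float = 0.5):
    # Grow: collect reasoning trajectories with the current model
    trajectories = collect_trajectories(model, questions)

    # Improve: keep only trajectories the Reward model rates highly;
    # no external labels are involved (kept trajectories are then
    # converted into fine-tuning examples)
    dataset = [t for t in trajectories if rate_trajectory(t) >= threshold]

    # Fine-tune a fresh model of the same size on the filtered data,
    # then repeat the whole cycle with the improved model
    return fine_tune(model, dataset)
```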

This process is then repeated, with the better model generating the next round of data. As a result, accuracy improves while keeping to the original question set, without adding any external data. The quality of the Reward model's ratings is therefore crucial. In this paper, the Reward model is constructed as a set of prompts. Let's look more closely at these prompts (only the initial part is shown here):

  • The goal of this rating is to filter out bad actions so that they'll be excluded from the fine-tuning dataset.

  • Overall, we want the agent to produce relevant and grounded answers with minimal steps. Anything deviating from this goal is considered bad.

  • If any element (thoughts, comments, etc.) is empty, then it's automatically bad.

"Filter out" indicates a method of discarding items that don't meet the standards and adopting only the high-quality data that remains. Please see the paper (p19) for details.

 




3. Improve Accuracy with Synthetic Data

In late 2023, a number of papers, including this one, were published on using a Reward model to create high-quality synthetic data for fine-tuning and improving model accuracy. Vigorous research is expected to continue in 2024 and yield a variety of results. In the LLM field especially, collecting high-quality training data is becoming increasingly difficult, and fine-tuning with synthetic data is anticipated as a solution.


 


How was it? Improving model accuracy with synthetic data promises to be a very effective development approach for startups like ours, which cannot collect vast amounts of data on their own. This blog will continue to follow synthetic data and other technological innovations, so stay tuned. Wishing you a great year!






1) "REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT", Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, and Sanjiv Kumar; Google Research, Google DeepMind, Google; 15 Dec 2023, https://arxiv.org/abs/2312.10003





Copyright © 2023 Toshifumi Kuga. All rights reserved.





Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.