
Google's new generative AI "Gemini 1.5 Pro" is as amazing as expected!

Last month, I reported that Google had released a new generative AI called "Gemini 1.5 Pro" (1). Today, Gemini 1.5 Pro finally arrived at Toshi Stats, so I would like to experiment with it right away.



1. Can the 1 million token long context window really work?

Gemini 1.5 Pro boasts an incredibly long context window of 1 million tokens, unthinkable for earlier LLMs. A claim this bold naturally raises the question, "Can this really work?" Today I explore its capabilities with two experiments. The first extracts detailed information, including figures, from a relatively short document; the second tests whether it can answer comprehensive questions about a document over 200,000 tokens long. Let's begin.



2. Information extraction from Toyota Motor Corporation's financial results  

First, I will check whether it can accurately extract numerical information from Toyota Motor Corporation's financial results for the fiscal year ended March 2023. At 28 pages and about 27,000 tokens, it is not a long document, but this is a task often seen in practice. I prepared 13 questions. Let's upload the document to Google AI Studio and ask them one by one.
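The article runs these questions through the Google AI Studio UI. As a rough sketch, the same flow could also be scripted against the Gemini API. Everything below is an assumption, not taken from the article: the google-generativeai package, the file name toyota_fy2023.pdf, the GOOGLE_API_KEY environment variable, and the sample questions, which are illustrative stand-ins for the article's 13.

```python
# Hedged sketch: asking questions about an uploaded PDF via the Gemini API.
# Assumptions (not from the article): package, file name, env var, questions.
import os

QUESTIONS = [  # illustrative stand-ins for the article's 13 questions
    "What was the operating income for the fiscal year ended March 2023?",
    "What was the return on equity (ROE)?",
]

def build_prompt(question: str) -> str:
    """Wrap a question with an instruction to answer only from the document."""
    return ("Answer using only the attached financial results document. "
            "Quote the exact figure where possible.\n\nQuestion: " + question)

def main() -> None:
    import google.generativeai as genai  # pip install google-generativeai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    doc = genai.upload_file("toyota_fy2023.pdf")      # ~27,000 tokens
    model = genai.GenerativeModel("gemini-1.5-pro")
    for q in QUESTIONS:
        resp = model.generate_content([doc, build_prompt(q)])
        print(q, "->", resp.text)

if __name__ == "__main__":
    main()  # requires network access and a valid API key
```

The instruction to answer only from the attachment is one simple way to discourage the model from falling back on whatever it may remember from training data.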

Google AI Studio 




Here are the results. The correct answer rate was about 88%.

Questions and Results & Computation Time

For question 8, the financial results refer to ROE as "Return on equity attributable to owners of the parent company." This long Japanese (kanji) phrase may have been difficult for the American-born Gemini 1.5 Pro to parse. However, it derived the value by calculating it from other related figures on its own, so I gave it a "△" (partial credit) worth 0.5 points. It seems to handle Japanese quite well. Incidentally, the average computation time was around 30 seconds per answer, so the compute resources also appear to have been well optimized. Note that this document was published on May 10, 2023, so it may have been incorporated into the training data.
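The scoring above can be made concrete: full credit counts 1.0 and the partial "△" evaluation counts 0.5. The breakdown of 11 correct, 1 partial, and 1 wrong is an assumption on my part, chosen because it is the breakdown of 13 questions consistent with the reported rate of about 88%.

```python
# Hedged sketch of the partial-credit scoring. The mark breakdown
# (11 correct, 1 partial "△", 1 wrong) is an assumption consistent
# with the ~88% rate reported in the article.
def answer_rate(marks: list[float]) -> float:
    """Average score over all questions; each mark is 1.0, 0.5, or 0.0."""
    return sum(marks) / len(marks)

marks = [1.0] * 11 + [0.5] + [0.0]   # 13 questions in total
print(f"{answer_rate(marks):.0%}")   # → 88%
```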





3. Information extraction from the Financial Inspection Manual 

Next, I would like to analyze a lengthy financial administrative document, the "Financial Inspection Manual," with Gemini 1.5 Pro. This is an unforgettable document for anyone who has worked in Japan's financial industry since 2000, as it sparked discussions on establishing risk management and governance frameworks. It runs to 366 pages and 220,526 tokens, exceeding 200,000 tokens. Gemini 1.5 Pro is the first generative AI that can take in a document of this volume all at once. I was able to upload it smoothly, as shown below, and it answered questions properly.
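Before uploading a document this large, it is worth checking that it fits in the 1M-token window with room left for the question and answer. The helper below is a sketch; the file name and the use of count_tokens to verify the article's 220,526-token figure are my assumptions, not from the article.

```python
# Hedged sketch: does a document fit in Gemini 1.5 Pro's 1M-token context?
# The reserve size, file name, and count_tokens usage are assumptions.
CONTEXT_WINDOW = 1_000_000  # Gemini 1.5 Pro's advertised long context

def fits_in_context(n_tokens: int, reserve: int = 8_192) -> bool:
    """Leave some room (reserve) for the question and the answer."""
    return n_tokens + reserve <= CONTEXT_WINDOW

print(fits_in_context(220_526))   # → True: the manual fits easily
print(fits_in_context(995_000))   # → False: too close to the limit

def main() -> None:  # requires network access and a valid API key
    import google.generativeai as genai
    doc = genai.upload_file("financial_inspection_manual.pdf")
    model = genai.GenerativeModel("gemini-1.5-pro")
    print(model.count_tokens([doc]).total_tokens)

if __name__ == "__main__":
    main()
```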

Uploading the Financial Inspection Manual

I actually asked, "What should be done to verify the appropriateness of the credit rating assignment process?" and received the following response (only items 1 and 2 are shown here, as the full answer is long). To verify the appropriateness of the credit rating assignment process, the following items need to be examined:

1. Validity of rating criteria

    * Do the rating criteria appropriately reflect the credit risk of the debtors?

    * Are the rating criteria defined in an objective and verifiable manner?

    * Are the rating criteria consistent with the debtor classifications?

    * Are the rating criteria periodically reviewed?

2. Appropriateness of rating assignment procedures

    * Are the rating assignment procedures carried out appropriately in accordance with the rating criteria?

    * Are the rating assignment procedures implemented in an accurate, objective, and verifiable manner?

    * Can the rating assignment procedures reflect changes in a debtor's situation in a timely manner?

    * Are the rating assignment procedures performed by an independent department, free from influence by sales departments and the like?

It continues with 3: Model Utilization, 4: Data Maintenance, 5: Internal Audit, and so on. The answer is comprehensive, follows the principles of risk management, and is consistent with the descriptions around page 142 of the Financial Inspection Manual. Moreover, the sections on credit risk management sit in the middle of the manual, and past reports have noted that generative AIs tend to be less accurate on the middle parts of long inputs; Gemini 1.5 Pro showed no such issue. Despite the specialized content, it provided a very good answer, and the computation time of around 90 seconds is practical enough. It will surely make a good risk management assistant.





How was that? Gemini 1.5 Pro seems able to analyze documents over 200,000 tokens quite accurately, even in Japanese. It might also be useful for internal document search at work. Next time, I would like to take on even more difficult tasks in English. Stay tuned!

 

(1) Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, Gemini Team, Google

Copyright © 2024 Toshifumi Kuga. All rights reserved

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Fine-tuning GPT-3.5 with synthetic text generated by GPT-4: the accuracy has improved! In the future, we might not even need training text?

Hello! Although it is already the latter half of September, it is still quite hot in Japan. The photos may feel mismatched, but I'm deliberately sticking to an autumn theme in the hope that it cools down soon. It may well stay hot for the rest of the month, though.

Now, about the fine-tuning of GPT-3.5 that I introduced the other day: it is certainly a hot topic, and I think there is strong demand in companies to specialize its performance for specific tasks. With this in mind, we ran an experiment assuming the case where you want to proceed even without data at hand: generating synthetic text with GPT-4 and then fine-tuning on it.

 
1. Experiment Details

Just like the previous experiment, the task is to determine which financial product a given English-language complaint is about. The complaints come from the banking industry, so the task involves distinguishing between six types of financial products, such as mortgages and bank accounts. As before, the validation data was minimal: 100 samples. The training data, however, is different this time: we generated customer complaint emails with GPT-4, and at a glance they are indistinguishable from real ones. GPT-4's performance is indeed impressive. We generated 15 such customer complaints for training and then proceeded with fine-tuning.
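The preparation step described above can be sketched as code: each synthetic complaint becomes one chat-format training record in the JSONL file that GPT-3.5 fine-tuning expects. The product names, the system prompt, and the example complaint below are illustrative assumptions; only the overall shape (15 labelled complaints, six categories) comes from the article.

```python
# Hedged sketch of building the fine-tuning file. Label names, system
# prompt, and the sample complaint are assumptions, not from the article.
import json

PRODUCTS = [  # six banking product types; names are illustrative
    "mortgage", "bank account", "credit card",
    "consumer loan", "money transfer", "debt collection",
]

SYSTEM = "Classify the customer complaint into one of: " + ", ".join(PRODUCTS)

def to_chat_example(complaint: str, label: str) -> dict:
    """One training record in the chat format used by GPT-3.5 fine-tuning."""
    assert label in PRODUCTS
    return {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": complaint},
        {"role": "assistant", "content": label},
    ]}

examples = [to_chat_example(
    "I was charged a late fee on my home loan even though I paid on time.",
    "mortgage",
)]

with open("train.jsonl", "w") as f:   # upload this file to the fine-tuning API
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

With only 15 records, each line of this file carries a lot of weight, which is one reason prompt design for the GPT-4 generation step matters so much.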

synthetic text generated by GPT-4


2. Experiment Results

Since this was our first time using synthetic text, we were unsure about the outcome, but we were able to confirm the effectiveness of fine-tuning, as shown below. Although the improvement isn't dramatic with just 15 samples, accuracy on this task improved over the base GPT-3.5, which scored 0.50 to 0.55.
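The comparison above boils down to computing accuracy over the 100 validation complaints for each model. A minimal sketch of that evaluation step, with toy stand-in labels rather than the article's data:

```python
# Hedged sketch of the evaluation: fraction of predicted product labels
# that match the validation labels. The toy data below is illustrative.
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Share of exact label matches; both lists must be the same length."""
    assert len(predictions) == len(labels)
    hits = sum(p == t for p, t in zip(predictions, labels))
    return hits / len(labels)

labels      = ["mortgage", "bank account", "credit card", "mortgage"]
predictions = ["mortgage", "bank account", "mortgage",    "mortgage"]
print(accuracy(predictions, labels))   # → 0.75
```

Running this once with the base model's predictions and once with the fine-tuned model's predictions gives the two numbers being compared.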

For more details on the experiment, please refer to this notebook.

 

3. Discussion

Fine-tuning on synthetic text was a method scarcely considered before, but with the arrival of GPT-4 it is becoming realistic. There are several points to examine, such as the number of samples and how to write the generation prompts, but the advantage of being able to start without any data at hand is significant. Currently, GPT-4 is the only option for the generation model, but new models such as Google's Gemini are expected to become available next year. Technology is advancing rapidly, so we can expect a lot more in the future.

So, what did you think? We will continue to conduct various experiments and share our findings here. See you again soon!




Copyright © 2023 Toshifumi Kuga. All rights reserved
