The release of Meta's new LLM, Llama3, on April 18, 2024 has created quite a stir in the AI community (1). As a highly anticipated open-source model expected to perform on par with GPT-4, it opens up a wide range of potential applications.
Llama3 is available in two sizes: 70B and 8B parameters. On public leaderboards (2), the larger 70B model in particular demonstrates capabilities that rival proprietary models such as GPT-4, Claude3-Opus, and Gemini 1.5 Pro.
To assess Llama3's performance in practice, a test was conducted on a bank customer complaint classification task (3). The objective was to measure how accurately the model categorizes complaints without any fine-tuning.
1. To what extent can we discriminate between six categories of customer complaints without training?
The dataset consisted of customer complaints from a US bank, categorized into six product areas:
Mortgage
Checking or savings account
Student loan
Money transfer, virtual currency, or money service
Bank account or service
Consumer loan
All of the complaints are written in English.
A random sample of 500 complaints was drawn, and Llama3-70B was given a prompt instructing it to assign a product category to each complaint. The results were astounding: an accuracy of 88.6%, meaning 443 of the 500 complaints were classified correctly. Reaching nearly 90% accuracy without any fine-tuning is remarkable and speaks volumes about Llama3's potential.
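The exact prompt and code used in the experiment are not reproduced here. As a minimal sketch, assuming the model is exposed as a plain function `classify(prompt) -> str` (for example a wrapper around whatever Llama3-70B endpoint you use), a zero-shot evaluation loop could look like the following. The prompt wording and the helper names `build_prompt`, `parse_label`, `sample_indices`, and `evaluate` are hypothetical, not the author's implementation.

```python
import random
from typing import Callable, List, Optional, Sequence

# The six product categories from the dataset.
CATEGORIES = [
    "Mortgage",
    "Checking or savings account",
    "Student loan",
    "Money transfer, virtual currency, or money service",
    "Bank account or service",
    "Consumer loan",
]


def sample_indices(n_total: int, k: int = 500, seed: int = 0) -> List[int]:
    """Draw a reproducible random sample of k distinct complaint indices."""
    return random.Random(seed).sample(range(n_total), k)


def build_prompt(complaint: str) -> str:
    """Hypothetical zero-shot prompt listing the allowed categories."""
    options = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "Classify the following bank customer complaint into exactly one "
        "product category.\n"
        f"Categories:\n{options}\n"
        "Answer with the category name only.\n\n"
        f"Complaint: {complaint}"
    )


def parse_label(reply: str) -> Optional[str]:
    """Return the first category name found in the model's reply.

    Longer names are checked first; None means the reply could not be mapped.
    """
    text = reply.strip().lower()
    for c in sorted(CATEGORIES, key=len, reverse=True):
        if c.lower() in text:
            return c
    return None


def evaluate(complaints: Sequence[str], labels: Sequence[str],
             classify: Callable[[str], str]) -> float:
    """Fraction of complaints whose parsed prediction equals the gold label.

    Unparseable replies (None) are counted as errors.
    """
    if len(complaints) != len(labels):
        raise ValueError("complaints and labels must have the same length")
    correct = 0
    for text, gold in zip(complaints, labels):
        if parse_label(classify(build_prompt(text))) == gold:
            correct += 1
    return correct / len(labels)
```

Keeping the model call behind a single function makes it easy to swap Llama3-70B for the 8B model or another LLM while reusing the same prompt, parsing, and scoring.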
2. Is Accuracy Maintained with Japanese Data?
Considering the potential use of Llama3 in Japan, the English dataset was translated into Japanese using Google Translate. The classification task was then repeated with the translated data.
Although Llama3's training data is predominantly English (around 95%), the model still achieved an impressive accuracy of 82.8% on the Japanese data, only 5.8 percentage points below the English result. This suggests that Llama3's capabilities extend beyond English and hold promise for multilingual applications.
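To put the two accuracy figures in context, the sketch below converts them into counts out of the 500 sampled complaints and attaches a 95% Wilson score interval to each. This is an added back-of-the-envelope calculation, not part of the original analysis; it assumes the 500 complaints are a simple random sample and treats each run on its own.

```python
import math
from typing import Tuple

N = 500  # complaints in the random sample


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


for name, acc in [("English", 0.886), ("Japanese", 0.828)]:
    correct = round(acc * N)  # number of correctly classified complaints
    lo, hi = wilson_interval(acc, N)
    print(f"{name}: {correct}/{N} correct, 95% CI {lo:.3f}-{hi:.3f}")
```

With 500 samples each interval spans roughly ±3 percentage points. Because the Japanese run reuses the same complaints in translation, a paired test such as McNemar's on per-complaint outcomes would give a sharper comparison, but it requires the per-complaint results rather than the two summary accuracies.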
3. Conclusion and Future Prospects
Llama3 has proven to be a top-tier performer even though it is open-source, and Meta deserves appreciation for this contribution to the AI community. Hopefully, other companies such as Google will follow suit and release more open-source models of their own.
Further experiments are planned to evaluate the accuracy and computational speed of the smaller 8B model. Stay tuned for the results!
1) Meta, Llama 3 website: https://llama.meta.com/llama3/
2) LMSYS Chatbot Arena Leaderboard: https://chat.lmsys.org/?leaderboard
3) ToshiStats, Classification-of-Consumer-Complaints-by-Llama3 (GitHub): https://github.com/TOSHISTATS/Classification-of-Consumer-Complaints-by-Llama3/tree/main
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the code and the software.