Google follows OpenAI’s lead by saying almost nothing about its new AI program PaLM 2


When AI researchers at Google revealed a major new program a year ago – the Pathways Language Model (PaLM) – they devoted several hundred words of a technical paper to describing the new AI techniques used to achieve the program's results.

When it presented PaLM's successor last week, PaLM 2, Google revealed almost nothing. In a single table tucked into an appendix at the end of the 92-page technical report, the Google researchers state very briefly that, this time around, they won't say anything:


PaLM-2 is a new state-of-the-art language model. We have small, medium, and large variants that use stacked layers based on the Transformer architecture, with variable settings depending on model size. Further details on the model's size and architecture are not released outside of the company.

A turning point in the entire history of AI publishing

The deliberate refusal to divulge what is called the architecture of PaLM 2, that is, the way the program is built, not only contradicts the earlier PaLM paper; it also marks a turning point in the history of AI publishing, which has mostly been grounded in open-source software and has routinely included substantial detail about a program's architecture.

This is clearly a response to one of Google's main competitors, OpenAI, which stunned the research community in April by refusing to release details of its latest "generative AI" program, GPT-4. Prominent AI scholars have warned that OpenAI's surprising choice could have a chilling effect on disclosure across the industry, and the PaLM 2 paper is the first major sign that they may be right.

(Google also published a blog post summarizing what is new in PaLM 2, but without technical details.)

Google is reversing decades of open publishing

PaLM 2, like GPT-4, is a generative AI program that can produce passages of text in response to prompts, allowing it to perform a range of tasks such as answering questions and writing software code.

Like OpenAI, Google is reversing decades of open publishing in AI research. It was a 2017 Google research paper, "Attention Is All You Need," that revealed in great detail a revolutionary program called the Transformer. That program was quickly adopted by much of the AI research community, and by industry, to develop natural language processing programs.

Among those derivatives is ChatGPT, the program OpenAI unveiled in the fall that sparked worldwide enthusiasm for generative AI.


None of the authors of the original Transformer paper, including Ashish Vaswani, appears among the authors of PaLM 2.

In a way, by revealing in a single paragraph that PaLM 2 is a descendant of the Transformer, and refusing to divulge anything else, the company's researchers make clear both their contribution to the field and their intention to end that tradition of sharing research advances.

The rest of the paper focuses on the training data used and on the benchmark scores on which the program shines.

The paper offers one key insight, drawing on the AI research literature: there is an ideal balance between the amount of data a machine learning program is trained on and the size of the program.

By finding the right balance between program size and the amount of training data, the authors were able to put PaLM 2 on a diet, so that the program itself is much smaller than the original PaLM, they write. That seems significant, given that the recent trend in AI has been in the opposite direction, toward ever larger scale.

As the authors write,

“The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model, but it uses more training compute. Our evaluation results show that PaLM 2 models significantly outperform PaLM on a variety of tasks, including natural language generation, translation, and reasoning. These results suggest that scaling models is not the only way to improve performance. Instead, performance can be unlocked by careful data selection and efficient architecture/objectives. Furthermore, a smaller but higher-quality model greatly improves inference efficiency, reduces serving cost, and enables the model's downstream application for a greater number of applications and users.”

The authors of PaLM 2 argue that there is a sweet spot between the size of the program and the amount of training data. The PaLM 2 models show a marked improvement in accuracy over PaLM on benchmark tests, as the authors highlight in a single table:


[Figure: PaLM 2 beats PaLM on benchmark tests, May 2023. Source: Google]

In doing so, they build on observations from the past couple of years of practical research on scaling artificial intelligence programs.

For example, widely cited work published last year by Jordan Hoffmann and colleagues at Google's DeepMind established what is now known as the Chinchilla rule of thumb, a formula for balancing the amount of training data against program size.
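To make that rule of thumb concrete, here is a minimal sketch, not taken from the PaLM 2 or Chinchilla papers themselves, that applies the commonly cited approximation of roughly 20 training tokens per model parameter, together with the standard estimate that training cost is about 6 × parameters × tokens in floating-point operations; the 70-billion-parameter figure is purely illustrative.

# Rough sketch of the Chinchilla rule of thumb: for a compute-optimal
# training run, use roughly 20 training tokens per model parameter.
# The 20x ratio and the 6*N*D cost estimate are common approximations,
# not figures taken from the PaLM 2 report.

CHINCHILLA_TOKENS_PER_PARAM = 20  # assumed approximate ratio

def compute_optimal_tokens(num_params: float) -> float:
    """Estimate how many training tokens the heuristic suggests."""
    return CHINCHILLA_TOKENS_PER_PARAM * num_params

def training_flops(num_params: float, num_tokens: float) -> float:
    """Standard approximation: training cost ~ 6 * parameters * tokens FLOPs."""
    return 6 * num_params * num_tokens

params = 70e9  # a hypothetical 70-billion-parameter model
tokens = compute_optimal_tokens(params)
print(f"Suggested training tokens: {tokens:.2e}")                       # ~1.4e12
print(f"Approximate training FLOPs: {training_flops(params, tokens):.2e}")  # ~5.9e23

Under this heuristic, shrinking the model while enlarging the training set can keep, or even raise, the total training compute, which is consistent with the paper's claim that PaLM 2-L is smaller than PaLM yet trained with more compute.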

The PaLM 2 scientists obtained slightly different figures from those of Hoffmann and his team, but ones that confirm the paper's conclusions. They present their results head-to-head with the Chinchilla work in a single scaling chart:


[Figure: PaLM 2 scaling versus Chinchilla, May 2023. Source: Google]

The idea is in line with efforts by young companies such as Snorkel, a three-year-old San Francisco-based AI startup, which in November unveiled tools for labeling training data. Snorkel's premise is that better data curation can reduce some of the computation required.

The focus on the "sweet spot" marks something of a departure from the original PaLM. With that model, Google emphasized the sheer scale of the program's training, calling it "the largest TPU-based system configuration used for training to date," a reference to Google's TPU computer chips.

This time, no such claim is made. For all that the new PaLM 2 work leaves unsaid, it can be read as confirming the trend away from size for its own sake and toward a more thoughtful treatment of scale and capability.


Source: “ZDNet.com”


