Culture of secrecy: when an AI pioneer outflanks OpenAI


Cerebras’ Andromeda supercomputer was used to train seven language programs similar to OpenAI’s ChatGPT. Image: Cerebras.

The world of artificial intelligence, and in particular the wildly popular branch known as “generative AI”, which automatically creates text and images, risks closing in on itself because of the chilling effect of companies choosing not to publish the details of their research.

But this trend towards secrecy is also prompting some players in the AI world to step in and fill the disclosure void.

On Tuesday, AI pioneer Cerebras Systems, maker of a computer dedicated to AI, released several generative AI programs as open source, for unrestricted use.

“Companies make different decisions, and we disagree”

The programs were “trained” by Cerebras, meaning they were brought to optimal performance using the company’s powerful supercomputer, sparing outside researchers some of that work.

“Companies are making different decisions than they made a year or two ago, and we disagree with those decisions,” Cerebras co-founder and CEO Andrew Feldman said in an interview with ZDNET, alluding to the decision by OpenAI, the creator of ChatGPT, not to release technical details when it unveiled its latest generative AI program, GPT-4, this month, a decision that has been widely criticized in the AI research world.

Slides from Cerebras’ March 2023 announcement. Images: Cerebras.

The code is available on AI startup Hugging Face’s website and on GitHub

“We believe that an open and vibrant community – not just made up of researchers, and not just three, four, five or eight people who own an LLM, but a vibrant community in which start-ups and medium-sized and large companies train large language models – is good for us and for others,” he says.

The term “large language model” refers to AI programs based on machine learning, in which a neural network captures the statistical distribution of words in a sample of data. This allows a large language model to predict the next word in a sequence, the capability underlying popular generative AI programs such as ChatGPT.

The same kind of machine learning approach applies to generative AI in other domains, such as OpenAI’s DALL-E, which generates images from a text prompt.

Cerebras has released seven large language models in the same style as OpenAI’s GPT program, which kicked off the generative AI craze in 2018. The code is available on AI start-up Hugging Face’s website and on GitHub.
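For readers who want to try the released checkpoints, here is a minimal sketch using the Hugging Face transformers library to load one of the models and have it predict a continuation of a prompt. The repository name "cerebras/Cerebras-GPT-111M" is an assumption based on Cerebras’ naming and may differ from the actual model ID on Hugging Face.

```python
# Minimal sketch: load a Cerebras-GPT checkpoint and generate text.
# The model ID below is an assumption; check Hugging Face for the exact name.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "cerebras/Cerebras-GPT-111M"  # smallest of the seven released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Generative AI is"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the next token to extend the prompt.
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```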

Programs vary in size, from 111 million parameters to 13 billion

The programs vary in size, from 111 million parameters, or neural weights, to 13 billion. More parameters generally make an AI program more capable, so the Cerebras code offers a range of performance levels.

The company has released not only the programs’ source code, in Python and TensorFlow format, under the Apache 2.0 open-source license, but also the details of the training regime that brought the programs to their trained, functional state.

This disclosure allows researchers to review and reproduce the work of Cerebras.

According to Andrew Feldman, this is the first time a GPT-like program has been released “using state-of-the-art training techniques.”

Other published AI training work has either withheld technical data, as with OpenAI’s GPT-4, or used programs that were not trained in a compute-optimal way, meaning the amount of training data was not matched to the size of the model, as a post on Cerebras’ technical blog explains.
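To give a feel for what “matching data to model size” means, here is an illustrative back-of-the-envelope calculation, assuming the roughly 20-tokens-per-parameter rule of thumb popularized by DeepMind’s Chinchilla work; the exact ratios Cerebras used are described in its own blog post and may differ.

```python
# Illustrative arithmetic only: compute-optimal training data volume
# under an assumed 20-tokens-per-parameter heuristic (not a Cerebras figure).
model_sizes = {
    "111M": 111e6,   # smallest released model
    "13B": 13e9,     # largest released model
}

TOKENS_PER_PARAM = 20  # assumed heuristic

for name, params in model_sizes.items():
    tokens = params * TOKENS_PER_PARAM
    print(f"{name}: ~{tokens / 1e9:.1f} billion training tokens")
```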

Slide from Cerebras’ March 2023 announcement. Image: Cerebras.

Language models of this size are notoriously compute-intensive. The Cerebras work released Tuesday was developed on a cluster of 16 of its CS-2 computers, dorm-refrigerator-sized machines designed specifically for AI programs. The cluster, whose existence the company had previously revealed, is known as the Andromeda supercomputer, and it can dramatically cut the work of training LLMs that would otherwise require thousands of Nvidia GPU chips.

As part of Tuesday’s release, Cerebras offered what it said was the first open-source scaling law, a benchmark for how the accuracy of such programs improves with model size, derived from open-source data. The dataset used is the open-source The Pile, an 825-gigabyte collection of mostly professional and academic texts introduced in 2020 by the nonprofit lab EleutherAI.
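A scaling law of this kind is typically expressed as a power law: loss falls predictably as model size (or compute) grows. The sketch below shows the generic form of such a fit; the data points are hypothetical placeholders, not Cerebras’ published numbers.

```python
# Generic sketch of fitting a power-law scaling curve: loss as a function of
# parameter count, L(N) = a * N**(-b) + c. All values below are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Loss decreases as a power of parameter count, approaching a floor c.
    return a * np.power(n, -b) + c

# Hypothetical (model size, loss) pairs spanning ~100M to ~13B parameters.
params = np.logspace(np.log10(111e6), np.log10(13e9), 7)
loss = np.array([2.9, 2.7, 2.55, 2.4, 2.3, 2.2, 2.1])

(a, b, c), _ = curve_fit(power_law, params, loss, p0=[10.0, 0.1, 1.5])
print(f"fitted scaling exponent b ≈ {b:.3f}")
```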

Slide from Cerebras’ March 2023 announcement. Image: Cerebras.

Earlier scaling laws from OpenAI and Google’s DeepMind used training data that was not open source.

In the past, Cerebras has touted the efficiency benefits of its systems. The ability to train demanding natural-language programs efficiently is at the heart of the open-publishing question, Feldman said.

“If you can achieve efficiencies, you can afford to put things out in the open-source community,” he explains. “Efficiency allows us to act quickly and easily and do our part for the community.”

One of the main reasons OpenAI and others are starting to close off their work from the rest of the world is to protect their source of profit in the face of the rising cost of training AI, he says.

“It’s so expensive that they decided it was a strategic asset, and they decided to hide it from the community because it’s strategic for them,” he says. “And I think that’s a very reasonable strategy.”

“It’s a reasonable strategy if a company wants to invest a lot of time, effort and money and not share the results with the rest of the world,” he adds.

However, “we think this makes the ecosystem less interesting, and that in the long term it limits the growth” of research, he believes.

Companies can “hoard” resources, such as datasets or expertise in models, by keeping them to themselves, he observes.

“The question is how these resources are used strategically in the landscape,” he says. “We think we can help by providing open models, using data that everyone can see.”

Asked what might come of the open-source release, Feldman remarked that “hundreds of distinct institutions could work with these GPT models that they otherwise couldn’t, and solve problems that might otherwise have been set aside”.

Source: ZDNet.com




