Why open source is the birthplace of artificial intelligence


In a way, open source and artificial intelligence were born together.

In 1971, if you had mentioned AI to most people, they might have thought of Isaac Asimov’s Three Laws of Robotics. Yet AI was already a real discipline that year at MIT, where Richard M. Stallman (RMS) joined the MIT Artificial Intelligence Lab. Years later, as proprietary software emerged, RMS developed the radical idea of free software. Decades later, that concept, rebranded by others as open source, would become the birthplace of modern AI.

It was not a science fiction author but a computer scientist, Alan Turing, who launched the modern AI movement. Turing’s 1950 paper “Computing Machinery and Intelligence” is the origin of the Turing test. In short, the test holds that if a machine can convince you that you are talking to a human being, it is intelligent.

Why didn’t we have GNU-ChatGPT?

According to some, today’s AIs can already pass that test. I don’t agree, but we are clearly on the right track.

In 1955, computer scientist John McCarthy coined the term “artificial intelligence,” and he went on to create the language Lisp. According to fellow computer scientist Paul Graham, McCarthy did for programming what Euclid did for geometry: he showed how, from a handful of simple operators and a notation for functions, you can build a complete programming language.

Lisp, in which code and data are interchangeable, became the first AI language. It was also RMS’s first programming love.
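McCarthy’s point can be made concrete in a few dozen lines. The sketch below is a deliberately minimal Lisp interpreter written in Python (the function names and the tiny set of built-in operators are my own illustrative choices, not McCarthy’s original definitions); it shows how little machinery a complete evaluator actually needs.

```python
import operator

def tokenize(src):
    """Split an s-expression string into tokens."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested Python list from a token stream."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token  # a symbol

# A "handful of simple operators" is enough to start with.
ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env=ENV):
    if isinstance(expr, str):   # symbol lookup
        return env[expr]
    if isinstance(expr, int):   # literal
        return expr
    op, *args = expr
    if op == "quote":           # special form: return data unevaluated
        return args[0]
    fn = evaluate(op, env)
    vals = [evaluate(a, env) for a in args]
    result = vals[0]            # fold the operator over its arguments
    for v in vals[1:]:
        result = fn(result, v)
    return result

def run(src):
    return evaluate(parse(tokenize(src)))

run("(+ 1 (* 2 3))")     # evaluates to 7
run("(quote (1 2 3))")   # code and data share one representation
```

Note how `quote` captures Lisp’s defining trait, mentioned above: the same nested-list structure serves as both program and data.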

So why didn’t we have GNU-ChatGPT in the 1980s?

ChatGPT and Llama 2 were born from open source

There are many theories. My favorite is that early AI had the right ideas in the wrong decade. The hardware was not up to the challenge, and other essential ingredients, such as Big Data, were not yet available to help real AI get off the ground. Open-source projects such as Hadoop, Spark, and Cassandra later provided the tools AI and machine learning needed to store and process large amounts of data on clusters of machines. Without that data, and fast access to it, large language models (LLMs) could not work.

Today, even Bill Gates, who is no fan of open source, admits that open source-based AI is the biggest thing since he discovered the idea of the graphical user interface (GUI) in 1980. From that GUI idea, you may remember, Gates built a little program called Windows.

In particular, today’s extremely popular generative AI models, such as ChatGPT and Llama 2, were born from open source. That does not mean ChatGPT, Llama 2, or DALL-E are themselves free software. They are not.

TensorFlow and PyTorch power ChatGPT

Oh, they were supposed to be. As Elon Musk, an early investor in OpenAI, put it: “OpenAI was created as an open-source company (that’s why I named it ‘Open’ AI), a non-profit to serve as a counterweight to Google. But it has now become a closed-source, maximum-profit company effectively controlled by Microsoft. Not at all what I intended.”

Regardless, OpenAI and all other generative AI programs are built on open-source foundations. In particular, Hugging Face’s Transformers is the leading open-source library for building machine learning models. It provides pre-trained models, architectures, and tools for natural language processing tasks, letting developers build on existing models and fine-tune them for specific use cases. ChatGPT relies on the Hugging Face library for its LLMs. Without Transformers, there is no ChatGPT.

Additionally, TensorFlow and PyTorch, developed by Google and Facebook respectively, powered ChatGPT. These Python frameworks provide the essential tools and libraries for building and training deep learning models. Needless to say, other open-source AI/ML programs are built on top of these frameworks. For example, Keras, a high-level API for TensorFlow, is often used by developers without deep learning experience to build neural networks.
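What both frameworks ultimately automate is the same training loop: forward pass, loss, gradient, parameter update. A dependency-free sketch of that loop for a single linear neuron, with the gradient derived by hand rather than by the frameworks’ automatic differentiation (the data and learning rate are illustrative choices, not anything from TensorFlow or PyTorch):

```python
# Learn y = w * x from examples by gradient descent on squared error.
# This is the core loop that TensorFlow and PyTorch automate at scale,
# adding autodiff, GPU execution, and batching.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
w, lr = 0.0, 0.05                            # initial weight, learning rate

for _ in range(200):
    # d/dw of mean squared error (2x per sample), averaged over the data
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                           # the parameter update step

# after training, w has converged very close to 2.0
```

In a real framework, the hand-written `grad` line is the part you never write: you define only the model and the loss, and autodiff produces the gradients.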

The Meta license trap

You can debate endlessly about which one is better – and AI programmers do – but both TensorFlow and PyTorch are used in many projects. So behind the scenes of your favorite AI chatbot is a mix of different open-source projects.

Some high-profile programs, like Meta’s Llama 2, claim to be free software. This is not the case. Granted, many free software programmers have turned to Llama because it is as user-friendly as any of the major AI programs, and you can download it, use it, and easily build Llama-powered apps. There is only one small problem, buried in the license: if your program is wildly successful and has “over 700 million monthly active users in the preceding month, you must request a license from Meta, which Meta may grant to you in its sole discretion.”

So you can give up any dreams of becoming a billionaire by building a virtual girlfriend or boyfriend on Llama. Mark Zuckerberg may thank you for helping him earn a few extra billion.

The revealing leak from the Google engineer

But there are some genuinely open-source LLMs, such as Falcon 180B.

However, almost all major commercial LLMs are not truly open source. To be sure, all the main LLMs were trained on open data. For example, GPT-4 and most other major LLMs get some of their data from Common Crawl, a text archive containing petabytes of data scraped from the web. If you have written something on a public site – a happy-birthday message on Facebook, a comment on Reddit, a mention on Wikipedia, or a book on Archive.org – and that text was rendered in HTML, there is a good chance your data is in there.

So, is open source doomed to remain AI’s obliging helper? Not so fast.

In a leaked internal Google document, a Google AI engineer wrote: “The uncomfortable truth is that we are no longer in a position to win the generative AI arms race. While we were bickering, a third faction has been quietly eating our lunch.”

That third faction? The open-source community.

AI doesn’t just run in the cloud

It turns out you don’t need cloud computing giants or thousands of high-end GPUs to get useful answers from generative AI. In fact, you can run LLMs on a smartphone: people have run models on a Pixel 6 at five tokens per second. You can also fine-tune a personalized AI on your laptop in an evening. When you can “personalize a language model in a few hours on consumer hardware,” the engineer notes, “that’s a big deal.”

With Hugging Face’s open-source implementation of low-rank adaptation (LoRA), you can fine-tune a model at a fraction of the cost and time of other methods. What fraction? How about personalizing a language model in a few hours on consumer hardware?
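The trick behind LoRA explains the savings: instead of updating a full weight matrix W of size d×k, you train two small matrices B (d×r) and A (r×k) with rank r much smaller than d and k, and the effective weights become W + BA. A minimal sketch in plain Python, with tiny illustrative dimensions of my own choosing (not Hugging Face’s actual API):

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_adapted(W, B, A):
    """Effective weights after low-rank adaptation: W + B @ A."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, k, r = 4, 4, 1                       # full dims vs. tiny adapter rank
W = [[0.0] * k for _ in range(d)]       # frozen base weights (never trained)
B = [[1.0] for _ in range(d)]           # d x r trainable adapter
A = [[0.5] * k]                         # r x k trainable adapter

W_adapted = lora_adapted(W, B, A)

# Trainable parameters: d*r + r*k = 8 instead of d*k = 16 here;
# at realistic sizes (d = k = 4096, r = 8) that is roughly 0.4%
# of the full matrix -- which is why laptop-scale tuning works.
```

Because W stays frozen and only B and A are trained, the update is also stackable and cheap to store, which is exactly the property the Google engineer highlights in the quote below.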

The Google developer adds:

“Part of what makes LoRA so effective is that, like other forms of fine-tuning, it is stackable. Improvements like instruction tuning can be applied and then leveraged as other contributors add dialogue, reasoning, or tool use. While the individual tunings are small, their sum need not be. This allows model improvements to accumulate over time. It means that as new and better datasets and tasks become available, the model can be cheaply kept up to date, without ever having to pay the cost of a full run.”

“Direct competition with open source is a losing proposition”

Our mysterious programmer concluded: “Directly competing with open source is a losing proposition… We should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has significant advantages that we cannot replicate.”

Thirty years ago, no one imagined that an open source operating system could one day supplant proprietary systems like Unix and Windows. It may take much less than three decades for a truly open and comprehensive AI program to supplant the semi-proprietary programs we use today.


Source: “ZDNet.com”




