Stable Diffusion, Imagen, Dall-E 2: AIs sometimes generate images that are strictly identical… to the images that fed them


Robin Lamorlette

February 06, 2023 at 10:45 a.m.


Dall-E - Outpainting © OpenAI

Running short of inspiration, AIs have begun reproducing, stroke for stroke, the works they were trained on. This finding, logical enough in the end, comes from a committee of researchers who are experts in the field of AI.

The research group in question includes artificial intelligence specialists from no less than Google, DeepMind, ETHZ, UC Berkeley and Princeton University. In other words, people who know what they are talking about.

Out of inspiration

Available on arXiv.org and cited as the source below, the study shows that popular image-generating AIs such as DALL-E, Imagen, Midjourney and Stable Diffusion sometimes run out of inspiration, so much so that they can find nothing better to produce than what was used to train them.

As a reminder, these tools were fed by their creators with thousands, even millions, of images before being placed in the hands of the public. Since then, they have seen the exponential growth we all know, much to the displeasure of many art communities.

The research group thus estimates that out of 1,000 images generated by these artificial intelligences, around 100 are near-identical reproductions of the images that were used to train them.
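For the curious, here is a minimal sketch of how such overlap might be flagged, comparing a generated image against a training image. The metric (root-mean-square pixel distance) and the threshold are illustrative assumptions on our part, not the exact procedure used in the study.

```python
# A hypothetical near-duplicate check: flag a generated image as a
# near-copy of a training image. The RMS pixel distance and the 0.1
# threshold are illustrative choices, not the study's actual method.
import numpy as np
from PIL import Image

def load_as_array(path, size=(256, 256)):
    """Load an image, resize it, and scale pixel values to [0, 1]."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0

def rms_distance(a, b):
    """Root-mean-square pixel distance; 0.0 means identical images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def is_near_copy(generated_path, training_path, threshold=0.1):
    """True if the generated image is almost pixel-identical to the training image."""
    return rms_distance(load_as_array(generated_path), load_as_array(training_path)) < threshold
```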

A copyright issue

Digging deeper, the committee of scientists highlighted another well-known downside of these particularly popular tools. To train them, their creators collected samples from all over the Internet, many of which are protected by copyright.

More precisely, according to the research group, 35% of the reused images explicitly display a notice of such intellectual property rights. The remaining 65% do not state it in black and white, but in principle fall under the general regime of copyright protection.

The solution generally adopted by the creators of such tools is to add noise to the images during the processing phase, giving the (often easily detected) illusion that the resulting image is not a copy. The committee of scientists concludes its study on this point by recommending that the creators of these tools add a marking system to the images used in their training, in order to avoid such awkward repetitions.
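That noise-adding step is, in fact, how diffusion models learn in the first place: training images are progressively corrupted with Gaussian noise, and the model learns to reverse the process. Below is a minimal sketch of that forward noising step, following the standard DDPM formulation; the schedule values are textbook defaults, not code from the study.

```python
# A minimal sketch of the forward noising step of a diffusion model
# (standard DDPM formulation). The linear variance schedule below uses
# textbook defaults, not values taken from the study.
import numpy as np

def noise_image(x0, t, num_steps=1000):
    """Return a noisy version of image x0 at diffusion step t.

    x0 is an array of pixel values scaled to [0, 1]; t is in [0, num_steps).
    """
    betas = np.linspace(1e-4, 0.02, num_steps)   # per-step noise variances
    alpha_bar = np.cumprod(1.0 - betas)[t]       # cumulative signal kept at step t
    eps = np.random.randn(*x0.shape)             # fresh Gaussian noise
    # q(x_t | x_0): scale the image down and mix in noise. The later the
    # step, the less of the original image survives.
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
```

At t close to num_steps, alpha_bar approaches zero and the output is essentially pure noise; generating an image then amounts to running this process in reverse.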

Source: arXiv.org


