Generative art models “remember” certain images, posing a risk to privacy


A bias already identified by researchers in GPT-2.

© Getty Images / Yuichiro Chino

Are image-generating artificial intelligences a privacy risk? The results of a scientific paper, reported by Gizmodo, raise the question.

A bias common to all models

A group of scientists from DeepMind, UC Berkeley, Princeton, and ETH Zurich succeeded in generating synthetic images nearly identical to ones the model had seen during its learning phase. As a reminder, generative artificial intelligences such as DALL-E or Stable Diffusion are trained on databases of several thousand images, a process known as deep learning. As part of their demonstration, the researchers notably managed to recover an original image of Anne Graham Lotz, an American Protestant evangelist, that was part of the training data.

On the right, the original image “learned” by the AI; on the left, the one ultimately generated.

© Screenshot

To obtain these near-original photos memorized by the AI, the researchers repeatedly asked the software to create an image from the same prompt. They then checked whether the results appeared in the AI's training database. Of approximately 350,000 images generated, 94 direct matches and 109 near matches were identified, a memorization rate of about 0.03% (94 direct matches out of roughly 350,000 generations), very low compared with the full set of stored images. All diffusion models suffer from the same problem, to a greater or lesser degree.
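To make the procedure concrete, here is a minimal Python sketch of that repeated-sampling test. It is not the authors' code: `generate(prompt)` stands in for any text-to-image model, `training_images` is assumed to hold the candidate training images as arrays, and a simple pixel-space distance with a hypothetical `threshold` replaces the more robust similarity measure used in the study.

```python
# Minimal sketch of the extraction test described above, NOT the paper's
# actual implementation. `generate` is any callable that returns one
# synthetic image (as a float array in [0, 1]) per call.

import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square pixel distance between two same-shaped images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def extraction_test(generate, prompt, training_images,
                    n_samples=500, threshold=0.05):
    """Prompt the model repeatedly with the same caption and count
    generations that land unusually close to a known training image."""
    matches = []
    for i in range(n_samples):
        sample = generate(prompt)
        # Compare against every candidate training image for this caption.
        best = min(l2_distance(sample, t) for t in training_images)
        if best < threshold:  # suspiciously close -> likely memorized
            matches.append((i, best))
    return matches
```

At the scale reported in the article, this kind of counting yields the quoted figure: 94 direct matches across roughly 350,000 generations is about 0.03%.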

The risk of medical data

Even though the reproduction rate is relatively low for now, the scientists fear that as models grow, more of the learned information will be regenerated in raw form. “Maybe next year, the new model that comes out will be much bigger and much more powerful, and these memorization risks will be much higher than today,” says Vikash Sehwag, a doctoral candidate at Princeton University who took part in the study, quoted by Gizmodo.

An almost identical reproduction of the data stored by the AI.

© Screenshot

Eric Wallace, a doctoral student at UC Berkeley, points to the potentially harmful consequences of this bias if AI were used to produce synthetic medical data from X-rays. Could the patients' original scans be recovered? “It’s pretty rare, so you might not notice it happening at first, and then you might actually deploy that dataset to the web,” warns the scientist, who notes that the objective of this research is “to anticipate these types of errors.”
