The next big AI threat may already be lurking on the web

Artificial intelligence (AI) and machine learning experts warn of the risk of data ‘poisoning’ attacks, which can undermine the large datasets used to train the deep learning models behind many AI services.

Data poisoning occurs when attackers tamper with the training data used to build deep learning models. This makes it possible to affect the decisions made by the AI in a way that is difficult to track.

By altering the source information used to train machine learning algorithms, data poisoning attacks can be extremely powerful, because the AI learns from incorrect data and can therefore make “bad” decisions that have important consequences.

Split-view poisoning, small but strong

There is currently no evidence of real-world attacks involving the poisoning of web-scale datasets. But a group of AI and machine learning researchers from Google, ETH Zurich, NVIDIA and Robust Intelligence say they have demonstrated that poisoning attacks are possible which “guarantee” that malicious examples appear in the web-scale datasets used to train the largest machine learning models.

“Although large deep learning models are resilient, even tiny amounts of ‘noise’ in the training sets (i.e. a poison attack) are enough to introduce targeted errors into the behavior of the model,” warn the researchers.

The researchers explain that, using the techniques they devised to exploit the way datasets work, they could have poisoned 0.01% of the largest deep learning datasets with little effort and at low cost. While 0.01% may not sound like a large share of a dataset, the researchers warn that it is “enough to poison a model.”

This attack is known as “split-view poisoning”. If an attacker managed to gain control of a web resource indexed by a particular dataset, they could poison the collected data, rendering it inaccurate, with the potential to negatively affect the entire algorithm.
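To make the split-view idea concrete, here is a minimal Python sketch. It assumes a hypothetical dataset index that records a SHA-256 digest for each URL at curation time; the dictionaries `WEB_AT_CURATION` and `WEB_AT_DOWNLOAD` are illustrative stand-ins for the web at two moments, and the digest check is one possible safeguard chosen for illustration, not something the article attributes to any particular dataset.

```python
import hashlib

# Hypothetical stand-ins for the same URL at two points in time.
WEB_AT_CURATION = {"http://example.com/cat.jpg": b"picture of a cat"}
WEB_AT_DOWNLOAD = {"http://example.com/cat.jpg": b"attacker-chosen content"}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_index(web: dict) -> list:
    """Curator's view: record each URL and a digest of what it served."""
    return [{"url": url, "sha256": sha256_hex(body)} for url, body in web.items()]


def download(index: list, web: dict, verify: bool) -> list:
    """Downloader's view: fetch each URL again, optionally checking the digest."""
    kept = []
    for entry in index:
        body = web[entry["url"]]
        if verify and sha256_hex(body) != entry["sha256"]:
            continue  # content changed since curation: drop the sample
        kept.append(body)
    return kept


index = build_index(WEB_AT_CURATION)
unverified = download(index, WEB_AT_DOWNLOAD, verify=False)
verified = download(index, WEB_AT_DOWNLOAD, verify=True)
```

Without the check, the downloader silently accepts whatever the URL serves today; with it, the altered sample is dropped. The trade-off in this sketch is that content which changed for innocent reasons would be dropped as well.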

Expired domain names, still receiving traffic

One way for attackers to achieve this goal is to purchase expired domain names. Domains expire regularly and can then be purchased by someone else, providing a perfect opportunity for a data poisoner. “The adversary does not need to know the exact time when clients will download the resource in the future: by owning the domain, the adversary guarantees that any future downloads will collect poisoned data,” the researchers said.

Researchers point out that buying a domain and exploiting it for malicious purposes is not a new idea. Cybercriminals use it to spread malware. But attackers with different intentions could potentially poison a large dataset.
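The exposure created by expired domains can be estimated with a short script. The sketch below is illustrative only: `dataset_urls` and `expired_domains` are hypothetical inputs (a real audit would need an actual dataset index and a real source of domain expiry information), and the matching rule treats subdomains as belonging to the expired domain.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical dataset index (URLs only) and hypothetical list of lapsed domains.
dataset_urls = [
    "http://photos.example.org/a.jpg",
    "http://photos.example.org/b.jpg",
    "http://lapsed-site.example/c.jpg",
    "http://cdn.lapsed-site.example/d.jpg",
]
expired_domains = {"lapsed-site.example"}


def is_under(host: str, domain: str) -> bool:
    """True if host equals the domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def exposed_urls(urls, expired):
    """Count dataset URLs whose host falls under each expired domain."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        for domain in expired:
            if is_under(host, domain):
                counts[domain] += 1
    return counts


exposure = exposed_urls(dataset_urls, expired_domains)
# Share of the index that whoever buys the expired domains would control.
share = sum(exposure.values()) / len(dataset_urls)
```

In this toy index, half of the URLs sit under a domain that anyone could buy, which is far more than the 0.01% the researchers describe as sufficient.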

Front-running poisoning, a sore spot for Wikipedia

Additionally, the researchers detailed a second type of attack they call “front-running poisoning.”

In this case, the attacker does not have full control of the specific dataset, but is able to accurately predict when a web resource will be accessed for inclusion in a snapshot of the dataset. With this knowledge, the attacker can poison the dataset just before the information is collected.

Even if the information returns to its original, unmanipulated form after just a few minutes, the dataset will still be incorrect in the snapshot taken when the malicious attack was active.
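The timing logic behind front-running can be sketched in a few lines of Python. All timings below are hypothetical, chosen only to illustrate why a short-lived edit can still end up in a snapshot; the function name `snapshot_is_poisoned` is likewise illustrative.

```python
from datetime import datetime, timedelta


def snapshot_is_poisoned(edit_time, revert_time, snapshot_time):
    """The snapshot captures the malicious edit only if it is taken
    while the edit is still live on the page."""
    return edit_time <= snapshot_time < revert_time


# Hypothetical timings: the page is snapshotted at 12:00 and
# malicious edits are reverted five minutes after being made.
snapshot = datetime(2023, 1, 1, 12, 0)
time_to_revert = timedelta(minutes=5)

well_timed = snapshot - timedelta(minutes=1)   # edit made at 11:59
too_early = snapshot - timedelta(minutes=30)   # edit made at 11:30

hit = snapshot_is_poisoned(well_timed, well_timed + time_to_revert, snapshot)
miss = snapshot_is_poisoned(too_early, too_early + time_to_revert, snapshot)
```

The well-timed edit is captured even though it stays live for only five minutes, while the same edit made half an hour earlier has already been reverted by the time the snapshot is taken.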

One of the most widely used resources for finding training data for machine learning is Wikipedia. But the nature of Wikipedia means anyone can edit it – and according to the researchers, an attacker “can poison a training set from Wikipedia by making malicious edits”.

Predicting snapshots, the key to a successful poisoning

Wikipedia’s datasets are not based on the live page, but on snapshots taken at a specific time, which means attackers who intervene at the right moment can maliciously modify the page and cause inaccurate data to be collected, which will then be stored in the dataset permanently.

“An attacker who can predict when a Wikipedia page will be used for inclusion in the next snapshot can perform poisoning immediately before scraping. Even if the edit is quickly undone on the live page, the snapshot will contain the malicious content – forever,” the researchers wrote.

The way Wikipedia uses a well-documented protocol to produce snapshots means that it is possible to predict the times of article snapshots with great accuracy. The researchers suggest that it is possible to exploit this protocol to poison Wikipedia pages with a success rate of 6.5%.

This percentage may seem low, but the sheer number of Wikipedia pages and the way they are used to train machine learning models mean that it would be possible to feed models with inaccurate information.
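A quick calculation shows why a 6.5% rate is not negligible. The sketch below reads the figure as an independent per-attempt success probability, which is a simplifying assumption made here for illustration and not a claim from the researchers; the attempt counts are hypothetical.

```python
from fractions import Fraction

# 6.5% success rate, kept as an exact fraction to avoid rounding noise.
success_rate = Fraction(65, 1000)

# Hypothetical number of pages an attacker tries to poison.
attempts = 1000
expected_poisoned = attempts * success_rate

# Probability that at least one of a small batch of attempts succeeds,
# assuming the attempts are independent of each other.
small_batch = 10
p_at_least_one = 1 - float(1 - success_rate) ** small_batch
```

Under these assumptions, a thousand attempts would be expected to leave 65 poisoned pages in the snapshot, and even ten attempts give close to even odds that at least one gets through.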

The researchers note that they did not modify any live Wikipedia pages and that they informed Wikipedia of the attacks and potential ways to defend against them as part of the responsible disclosure process. ZDNET contacted Wikipedia for comment. The researchers also note that the purpose of publishing the paper is to encourage others in the security field to conduct their own research on how to defend AI and machine learning systems from malicious attacks.

“Our work is only a starting point for the community to develop a better understanding of the risks of generating models from web-scale data,” the paper states.

