From ChatGPT to Google Bard, a security flaw affects generative AIs


Camille Coirault

September 11, 2023 at 1:00 p.m.

0

Death and Robots © © Ecran Large

© Widescreen

This isn’t the first time, and it certainly won’t be the last. Once again, cybersecurity researchers have uncovered a new exploitable flaw within AI-powered language models.

After the discovery of measures to circumvent chatbot protections, this time another type of software weakness was spotted by these researchers. These language models can be easily manipulated by anyone with sufficient knowledge of computer science or cybersecurity. This new way of hijacking chatbots like Bard or ChatGPT is called “Indirect Prompt Injection”.

Indirect Prompt Injection: a clever but potentially dangerous diversion

When you interact with a chatbot powered by generative AI, you type a query in text form. These instructions, called “prompts”, then allow the system to access your request directly. To prevent illegal or fraudulent use, chatbots have protections preventing them from providing information if the prompt in question is suspicious. ChatGPT or Google Bard will never give you a foolproof method for organizing an assassination or a bank robbery, for example. Still happy.

In practice, and for most users, these protections work and are effective. The recent discovery of these researchers is, however, worrying. Instead of providing a prompt directly, it is possible to provide hidden instructions (in a PDF or web page, for example) to a model to make the AI ​​act by ignoring its protective measures. Hundreds of cases of Indirect Prompt Injection have already been documented, and this is clearly only the beginning.

Chatbot © © Deemerwha studio / Shutterstock

© Deemerwha studio / Shutterstock

A practice that tends to accelerate

With this technique, the field of possibilities is wide open: data theft, execution of malicious code or manipulation of information. The head of information security at Google DeepMind, Vijay Bolina, assures that this threat is serious. If this indirect injection technique was previously considered “problematic”, today it is viewed with much more concern. Indeed, this type of misuse was rather rare, but things have changed, and this process is more and more frequent since it is possible to connect language models to the Internet and to different plugins.

Even if there is no miracle solution, Bolina assures for his part that Google DeepMind is working seriously on the development of AI models capable of identifying this type of suspicious activity. Once again, it’s a game of cat and mouse between service providers and hackers. With always this same question remaining unanswered: who will be the quickest to lose the other?

Source : Wired



Source link -99