Recognizing AI-generated text: harder than it seems


Researchers at Stanford University in the United States examined tools designed to detect AI-generated text by testing seven of them. To do so, they submitted 91 essays written for the TOEFL (Test of English as a Foreign Language) by people whose native language is not English. The detectors frequently labeled these texts as produced by AI rather than by humans: almost 98% of the essays were flagged as AI-generated by at least one of the tools. By contrast, the detectors had a much easier time with essays written by American eighth-grade students, which they identified correctly in 90% of cases, as the researchers explain in their study published in the journal Patterns.

These results show that AI-generated text has a very particular style. ChatGPT and other generative AI tools tend to produce text that is close to flawless: it contains not the slightest spelling or grammatical error. But it is also fairly plain, with no complex grammatical constructions or “rare” words (formal vocabulary, slang, etc.). AI-generated text gives “the illusion of accuracy,” according to Melissa Heikkilä, a journalist specializing in artificial intelligence. “The sentences AIs produce look correct: they use the right kinds of words in the right order. But [these technologies] don’t know what any of it means. These language models predict the most likely next word in a sentence. They don’t have a clue what’s right or wrong,” she explains in MIT Technology Review.
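To make the idea of “predicting the most likely next word” concrete, here is a deliberately tiny sketch. It assumes a toy bigram model (counting which word follows which in a small sample text), which is far simpler than the neural networks behind ChatGPT; the function names and the sample corpus are hypothetical and only illustrate the principle Heikkilä describes.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus (toy model)."""
    words = corpus.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent successor of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))  # → cat
```

The model picks “cat” after “the” only because that pairing is the most frequent in its sample, not because it understands anything about cats, which is the point of the quote above.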

The complexity of the language

AI-text detectors were designed to recognize these stylistic tics. They rely on algorithms that assess the complexity of a piece of writing (what researchers call its “perplexity”), looking, for example, at the words and turns of phrase used. If these are rudimentary, the software tends to conclude that the text was produced by an AI rather than by a person with a limited vocabulary. “If you use common English words, the detectors will give a low complexity score [to your text], which means it will likely be considered AI-generated. If you use complex words, algorithms are more likely to classify your text as being written by a human,” said James Zou, assistant professor at Stanford University and lead author of the study, in a press release.
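The scoring principle Zou describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any of the seven tested detectors actually works: it uses a crude word-frequency (unigram) model built from a tiny hypothetical reference text, and a hypothetical cut-off of 20. Real detectors score text with a full language model and calibrate their own thresholds.

```python
import math
from collections import Counter

# Hypothetical reference text standing in for "common English".
REFERENCE = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog are good friends and the day is good"
)

THRESHOLD = 20.0  # hypothetical cut-off, chosen only for this toy example


def unigram_perplexity(text: str, reference: str = REFERENCE) -> float:
    """Perplexity of `text` under an add-one-smoothed word-frequency model of `reference`.

    Common words get high probability, so text made of them scores LOW;
    rare or unseen words get low probability, so they push the score UP.
    """
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    vocab_size = len(ref_counts) + 1  # +1 reserves probability mass for unseen words
    words = text.lower().split()
    if not words:
        raise ValueError("cannot score empty text")
    log_prob = 0.0
    for w in words:
        p = (ref_counts[w] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))


def looks_ai_generated(text: str) -> bool:
    """Flag text as 'AI-like' when its perplexity falls below the threshold."""
    return unigram_perplexity(text) < THRESHOLD


print(round(unigram_perplexity("the cat sat on the mat"), 1))  # → 10.3
print(looks_ai_generated("the cat sat on the mat"))  # → True
print(looks_ai_generated("a perspicacious feline reclined upon the tapestry"))  # → False
```

The plain sentence is flagged simply because its words are common, while the sentence full of rarer words passes as “human”. This mirrors the bias the study reports against writers with a more limited English vocabulary, and it shows why swapping a few words for less common ones can be enough to change the verdict.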

This is why James Zou and his colleagues recommend caution when using these AI-text detectors, especially in educational or professional settings. They are not 100% reliable and, above all, can easily be fooled by changing a few words or turns of phrase.
