Has ChatGPT lost reliability? According to these researchers, “yes”, and not just a little!


Camille Coirault

July 20, 2023 at 4:30 p.m.

Is OpenAI's chatbot slowly getting less intelligent? That, at any rate, is what several teams of researchers appear to confirm in a study published on July 18.

While ChatGPT’s popularity dipped slightly in June, the chatbot is now raising concerns about its reliability. Researchers from UC Berkeley and Stanford recently published a paper indicating that GPT-4 has undergone significant changes, losing some of its performance in the process.

Questionable reliability and declining math skills

The researchers evaluated OpenAI’s two language models, GPT-4 and GPT-3.5, by giving each a simple math problem: identifying prime numbers. The result was rather alarming: GPT-4 fared worse than the free version. In June, it gave the correct answer only 2.4% of the time, down from 97.6% in March, while GPT-3.5 improved over the same period. These results raise real questions, especially since the problems are not advanced mathematics. If such poor results were confirmed over time, version 4 would be seriously compromised for some of its specific uses.
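To make the kind of test described above concrete, here is a minimal sketch of how answers to "is this number prime?" questions could be scored. It is not the researchers' code: the function names, the answer format, and the sample answers are all assumptions made up for illustration.

```python
def is_prime(n: int) -> bool:
    """Ground truth via trial division; adequate for small test numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def accuracy(answers: dict) -> float:
    """Share of model answers that match the ground truth.

    `answers` maps each number to the model's claim (True = "it is prime").
    """
    correct = sum(1 for n, claim in answers.items() if claim == is_prime(n))
    return correct / len(answers)


# Hypothetical model answers, invented for this example.
sample_answers = {7: True, 9: True, 11: False, 10: False}
print(f"Accuracy: {accuracy(sample_answers):.1%}")
```

Running the same question set against a model at two different dates and comparing the two accuracy figures would give the kind of before-and-after comparison the article reports.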

OpenAI’s generative AI models had already been criticized for their reliability, in particular for inaccurate historical facts and for relaying false information. This new finding is unlikely to earn good press for the company, which has not yet provided any official explanation.

Visible behavioral changes

Mathematics is not the only area affected. The researchers also reported that ChatGPT had more difficulty explaining why certain questions were too sensitive to address. Previously, the OpenAI chatbot explained rather precisely why it could not answer a given question (requests that were illegal or immoral, for example). The more recent version is much more evasive and does not provide explanations. Instead, it declines to answer and offers an apology.

Version 4 has also reportedly deteriorated on spatial reasoning problems. A trick question like “Imagine that you are in a room with three doors. You enter through the right door and exit through the left door. Where do you find yourself now?” could trip up the chatbot. This was not the case with the previous version. This degradation could also limit the use of ChatGPT in certain circumstances.

The results of the study conducted by the UC Berkeley and Stanford researchers are quite clear: GPT-4 shows a slight deterioration in its initial capabilities. For the moment, it is difficult to know where this problem comes from. What is certain is that users and companies that rely on AI models will have to be more vigilant in the future. We are also entitled to expect more transparency from OpenAI about the origin of these changes.

Sources: Gizmodo, Cointelegraph, James Zou on Twitter


