ChatGPT can write code. But above all, it can fix it


ChatGPT, OpenAI’s AI-powered chatbot, can write code. But above all, it is remarkably good at fixing software bugs. Its main advantage over other AI methods and models is its unique ability to hold a dialogue with humans – which allows it to improve the accuracy of its answers.

How effective are current methods?

To measure its effectiveness, researchers from the Johannes Gutenberg University of Mainz (Germany) and University College London (United Kingdom) compared OpenAI’s ChatGPT with standard automated program repair techniques and two deep learning approaches to program repair: the CoCoNut method, created by researchers at the University of Waterloo, Canada; and Codex, the OpenAI GPT-3 model that underpins GitHub’s Copilot autocompletion service.

After analyzing the results, “we find that ChatGPT’s bug fixing performance is competitive to the common deep learning approaches CoCoNut and Codex,” the researchers write in an arXiv research paper, first spotted by New Scientist.

They add that this performance is “significantly better than the results reported for the standard program repair approaches”.

OpenAI highlights the dialogue capacity of ChatGPT

The fact that ChatGPT can solve coding problems is nothing new. But this study points out that its unique ability to dialogue with humans gives it a potential advantage over other approaches and models.

The researchers tested the performance of ChatGPT using the QuixBugs bug-fixing benchmark. The standard Automated Program Repair (APR) systems it was compared against are at a disadvantage, as they were developed before 2018.

ChatGPT is based on the so-called transformer architecture, which Meta’s chief AI scientist Yann LeCun this week pointed out was developed by Google. Codex, CodeBERT from Microsoft Research, and its predecessor BERT from Google are all based on Google’s method.

OpenAI highlights ChatGPT’s dialogue capability in code debugging examples, where it can ask for clarification and receive hints from a person to arrive at a better answer. OpenAI trained the large language models (LLMs) behind ChatGPT and GPT-3.5 using Reinforcement Learning from Human Feedback (RLHF).

The quality of the suggestions remains uncertain

While ChatGPT’s chat capability can help arrive at a more correct answer, the quality of its suggestions remains unclear, the researchers note. That’s why they wanted to evaluate ChatGPT’s performance when it comes to bug fixes.

The researchers tested ChatGPT on the 40 Python problems in QuixBugs and then manually checked whether the suggested solution was correct or not. They repeated each query four times, because the reliability of ChatGPT’s responses is somewhat hit-and-miss, as a Wharton professor found after putting the chatbot through an MBA exam.
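To give a sense of what the benchmark asks of these systems: each QuixBugs program contains a single-line defect. The sketch below is an illustrative example of that shape of bug (a bit-counting routine where one wrong operator breaks the algorithm); the function names here are chosen for illustration, not taken from the paper.

```python
def bitcount_buggy(n):
    """Count the set bits of n -- BUGGY version.
    `n ^= n - 1` does not clear the lowest set bit, so the loop
    returns wrong counts and can even run forever (e.g. for n = 1)."""
    count = 0
    while n:
        n ^= n - 1  # BUG: wrong operator
        count += 1
    return count


def bitcount_fixed(n):
    """Count the set bits of n using Kernighan's trick:
    `n & (n - 1)` clears the lowest set bit on each iteration."""
    count = 0
    while n:
        n &= n - 1  # one-line fix
        count += 1
    return count
```

A repair system (or ChatGPT) is considered successful on such a task if it produces the one-line change that turns the buggy version into the fixed one.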

ChatGPT fixed 19 of the 40 Python bugs, putting it on par with CoCoNut (19) and close to Codex (21). Standard APR methods, by contrast, solved only seven of the problems.

ChatGPT’s success rate reached 77.5% during interactions

The researchers found that when they continued the dialogue and gave ChatGPT follow-up information, its success rate reached 77.5%.

The productivity implications for developers, however, are ambiguous. Stack Overflow recently banned ChatGPT-generated answers for being low quality yet plausible-looking. The Wharton professor, for his part, felt that ChatGPT could be a great companion for MBA students, as it can act as an “intelligent consultant” – one that produces elegant but often wrong answers – and thereby encourages critical thinking.

“This shows that human input can be of much help to an automated APR system, with ChatGPT providing the means to do so,” the researchers write. “Despite its great performance, the question arises whether the mental cost required to verify ChatGPT’s answers outweighs the advantages it brings.”

Source: ZDNet.com




