Can GitHub Copilot really improve developer productivity?


GitHub has published research showing that use of Copilot, its recently released code completion tool, correlates with higher developer productivity.

GitHub Copilot, an artificial intelligence pair programming service, was made available to the public a month ago at a cost of $10 per user per month, or $100 per user per year.

It is available as an extension for Microsoft’s Visual Studio Code editor and suggests code that developers can accept, reject or modify. The suggestions are generated by OpenAI Codex, an AI model that is itself a version of GPT-3 and has been trained on billions of lines of publicly available source code, including code published on GitHub.

Controversial

Copilot has caused some controversy, as not all developers are happy that their code was used to train the model. But GitHub has published a study aimed at verifying its theory that Copilot makes developers more productive.

Its researchers analyzed 2,631 survey responses from developers using Copilot and matched their responses to metrics collected in the IDE (integrated development environment). The challenge was to find the best method to measure the effect of Copilot on developer productivity.

“We found that the rate of acceptance of submitted suggestions is a better predictor of productivity than other measures,” explain the authors.

The tricky business of measuring productivity

GitHub’s measurement approach differs from that of another study, published in April, which also examined Copilot’s impact on developer productivity but did so by measuring how long participants took to complete repetitive tasks.

The authors of that study concluded that Copilot did not necessarily improve task completion time or success rate, but that most of the 24 participants preferred using it because it often provided a useful starting point and saved them from searching online.

One of the authors of the GitHub study, Albert Ziegler, compares the service to “a pair programmer with a computer attached”: very good at small tasks, and reliable enough to close all the brackets in the right order.

Defining productivity

But in software development, “productivity” can mean many different things in practice. “Do developers ideally want to save keystrokes or avoid Google and Stack Overflow searches?” asks Albert Ziegler in a blog post.

“Should GitHub Copilot help them stay in the flow by giving them highly accurate solutions for mechanical, calculator-like tasks? Or should it inspire them with suggestions that might help them get unstuck?”

The GitHub study set out to answer three key questions:

  1. Do respondents feel that GitHub Copilot makes them more productive?
  2. Does this sentiment translate into objective usage metrics?
  3. Which usage metrics best reflect this sentiment?

Better productivity

Albert Ziegler notes that the results of the study show that Copilot “is correlated with higher developer productivity.” The metric most strongly correlated with reported productivity was the acceptance rate: the number of suggestions accepted divided by the number of suggestions shown.

“This acceptance rate indicates how many code suggestions produced by GitHub Copilot are deemed promising enough to be accepted,” he notes.

Additionally, the developers who report the highest productivity gains with Copilot also accept the highest proportion of the code suggestions they are shown.
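To make the metric concrete, here is a minimal Python sketch that computes an acceptance rate (suggestions accepted divided by suggestions shown) for a few developers and correlates it with their self-reported productivity. The telemetry counts, the survey scores and the `acceptance_rate` and `pearson` helpers are all hypothetical, invented for illustration; this is not GitHub’s analysis code, and the study’s actual statistical method may differ. The correlation used here is a plain Pearson coefficient.

```python
import math


def acceptance_rate(accepted: int, shown: int) -> float:
    """Share of displayed suggestions that the developer accepted."""
    if shown <= 0:
        raise ValueError("at least one suggestion must have been shown")
    return accepted / shown


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-developer telemetry: (suggestions accepted, suggestions shown)
telemetry = [(30, 100), (45, 120), (10, 80), (60, 150)]
# Hypothetical self-reported productivity scores on a 1-5 scale, one per developer
survey_scores = [3.5, 4.0, 2.0, 4.5]

rates = [acceptance_rate(accepted, shown) for accepted, shown in telemetry]
print([round(r, 3) for r in rates])  # prints [0.3, 0.375, 0.125, 0.4]
print(round(pearson(rates, survey_scores), 2))
```

With these made-up numbers the correlation comes out close to 1, which mirrors the kind of relationship the study reports but proves nothing by itself: four data points are far too few to support any conclusion.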

Not all languages are equal

The study also reveals different levels of acceptance rates depending on the language.

“We are aware that there are significant differences in how GitHub Copilot behaves across different programming languages,” note the GitHub authors.

“The most common languages among our user base are TypeScript (24.7% of all completions shown in the observed period, 21.9% for users polled in the survey), JavaScript (21.3%, 24.2%) and Python (14.1%, 14.5%). The latter two benefit from higher acceptance rates, which could indicate a relative strength of neural tools compared to deductive tools for untyped languages. Regardless of language, survey participants had a slightly higher acceptance rate than users overall,” the authors note in the report.

Copilot’s value does not lie in correctly automated code

The authors also point out that their measures of persistence (how many accepted suggestions are retained over time) do not correspond as closely to reported productivity.

“Consistent with previous work, we collected measures of completion acceptance, but we also developed measures of persistence. These measures are based on the idea that for longer completions, a developer might have to take more corrective actions after accepting one, such as deleting or correcting an erroneous suggestion. We were surprised to find that acceptance rate (number of acceptances normalized by number of completions shown) correlated better with reported productivity than our measures of persistence.”
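A persistence measure could be sketched along the following lines. The article does not spell out how GitHub defines persistence, so everything here is an assumption: the hypothetical `AcceptedCompletion` record, the idea of comparing the accepted text with the same region of the file after some delay, and the 0.9 similarity threshold computed with Python’s `difflib.SequenceMatcher` are illustrative choices, not the study’s definition.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class AcceptedCompletion:
    """Hypothetical record of one accepted suggestion."""
    inserted_text: str     # text inserted when the suggestion was accepted
    text_after_delay: str  # same region of the file, re-read some time later


def persisted(completion: AcceptedCompletion, threshold: float = 0.9) -> bool:
    """Count a completion as persisting if it is still mostly unchanged."""
    similarity = SequenceMatcher(
        None, completion.inserted_text, completion.text_after_delay
    ).ratio()
    return similarity >= threshold


def persistence_rate(completions: list[AcceptedCompletion], threshold: float = 0.9) -> float:
    """Fraction of accepted completions that survived the delay (hypothetical metric)."""
    if not completions:
        raise ValueError("no accepted completions to evaluate")
    kept = sum(persisted(c, threshold) for c in completions)
    return kept / len(completions)


samples = [
    AcceptedCompletion("return a + b", "return a + b"),              # kept verbatim
    AcceptedCompletion("for i in range(n):", "for i in range(n):"),  # kept verbatim
    AcceptedCompletion("x = parse(data)", "x = parse_json(raw)"),    # heavily corrected
    AcceptedCompletion("print(result)", ""),                         # deleted afterwards
]
print(persistence_rate(samples))  # prints 0.5
```

Here two of the four hypothetical completions survive unchanged, so the persistence rate is 0.5. A measure like this rewards suggestions that need no later correction, which is why one might expect it to track productivity well; the study found that the simpler acceptance rate did better.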

The authors conclude that Copilot’s value lies not in how many lines of code it automates correctly, but in giving users a starting point to modify. “But looking back, it makes sense. Coding is not typing, and the core value of GitHub Copilot is not about the user entering as many lines of code as possible. Rather, it lies in helping the user make the best progress towards their goals,” they point out.

Source: ZDNet.com




