Does GitHub Copilot Improve Code Quality? Here's How We Lie With Statistics | Jadarma's Blog (2024-11-20)

submitted by thingsiplay@beehaw.org
Counter article: https://jadarma.github.io/blog/posts/2024/11/does-github-copilot-improve-code-quality-heres-how-we-lie-with-statistics/ It responds to the original statistics article from GitHub: https://github.blog/news-insights/research/does-github-copilot-improve-code-quality-heres-what-the-data-says/

If you would rather watch a reaction video on the article, here is one from The Primeagen: https://youtu.be/IxYN7DKefmI or watch it on Invidious, a privacy-focused web client for YouTube that avoids using YouTube directly: https://inv.nadeko.net/watch?v=IxYN7DKefmI

20 Comments

It annoys me so much, too, that Microsoft keeps on advertising with those fictitious numbers, despite multiple studies showing very different results. At some point, it's just misleading advertising, which is illegal where I live.

Because it's a "study" with "statistics" and not an "advertisement", I assume it does not fall under advertising law. And too many people take it seriously simply because it presents numbers... Microsoft is not the only company doing this, but it is one of the strongest companies to fight against. It's actually depressing.

I will instead label them as “Copilot-ers” and “Control-ers”, for brevity.

Were the Copilot-ers copiloted or are they copilots? 🤔 There are probably both kinds.

They are co-copilots copiloting the copilot. There is no top level pilot involved.

Terrible article. 90% fluffy rant. 10% actual points.

Obviously GitHub is biased here, but anyone that has actually used Copilot knows it is useful. It's not going to write your whole program for you but it clearly improves productivity by a small amount (which makes it a no-brainer commercially).

For some reason the author clearly *needs* Copilot to be useless. I'm not sure why.

While the author does not like Copilot or Ai tools for this task, the article is not about Copilot itself. The author explains why the statistics and the article from Github/Microsoft are nonsense and misleading at best, or straight-up lies at worst. It's not just that Github is biased here; they straight up lie with the statistics, and the points Github brings up make no sense or are misleading. The author of this article actually did a good job of breaking it down and explaining each point.

which makes it a no-brainer commercially

There is no such thing as "no-brainer commercially" when Ai is involved. If you turn off your brain because you are using Ai, then you are using Ai wrongly. And soon you will find yourself in trouble, especially if it's used commercially.

A commercial no-brainer means it makes such financial sense that even someone with no brain would make the same decision.
For a $100/year subscription, it has to save something like 2 hours of dev time per year for it to make financial sense.

It doesn't mean that anyone gets to switch off their brain.

There is no such thing as “no-brainer commercially” when Ai is involved

There absolutely is. Copilot is $100/year (or something like that). Developer salaries are like $100k/year (depending on location). So it only has to improve productivity by 0.1% to be worthwhile. It *easily* does that.
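For anyone who wants to check the break-even arithmetic in this thread, here is a rough sketch. It uses only the ballpark figures quoted above ($100/year subscription, $100k/year salary). The 2,000 working hours per year is my own assumption, not something anyone here stated, and the function names are made up for illustration. It also ignores everything except subscription cost versus salary, so extra review time or overhead is not accounted for.

```python
def break_even_fraction(subscription_per_year: float, salary_per_year: float) -> float:
    """Fraction of a developer's yearly output the tool must add to pay for itself."""
    return subscription_per_year / salary_per_year


def break_even_hours(
    subscription_per_year: float,
    salary_per_year: float,
    work_hours_per_year: float = 2000.0,  # assumption: roughly 50 weeks x 40 hours
) -> float:
    """Hours of developer time per year the tool must save to pay for itself."""
    hourly_cost = salary_per_year / work_hours_per_year
    return subscription_per_year / hourly_cost


if __name__ == "__main__":
    # Ballpark figures taken from the comments, not real pricing data.
    print(f"{break_even_fraction(100, 100_000):.1%}")  # prints 0.1%
    print(break_even_hours(100, 100_000))              # prints 2.0
```

Under those assumptions the "0.1%" and "about 2 hours per year" figures are the same break-even point expressed two ways. Whether the tool actually clears that bar once quality and review costs are included is a separate question the numbers do not answer.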

You can't "turn off your brain" when using copilot. It isn't that advanced yet.

Productivity goes up, quality goes down.

Have you ever seriously used Copilot?

My team tested it out for our company (17k employees) and it was so bad we immediately said no. It wasn’t just harmful, it was actively intrusive. I’d be trying to type something and it would autocomplete the exact opposite of what I wanted to type. I was constantly deleting what it wrote because it was nowhere in the vicinity of being correct. The same experience was had across everyone else that tried it.

Claude on the other hand is wonderful.

I have, but in my experience any personal gains are lost if I account for the extra time needed to review other devs' PRs. The volume of sloc submitted has gone way up, but everything runs and looks fine, so the bugs that do sneak in are really nasty little things.

The experience of using it to fill out, like a wall of config changes, or a bunch of repetitive test cases is good though.

I'm neither a professional programmer nor a user of Ai but...

Do you think your experience, I'm guessing a pre-ai trained programmer, is reflective of post-ai trained programmers?

With the inevitable reliance on AI in learning and training, will the creativity of new programmers drop? Is that even a problem?

He obviously hasn't. This is one of those things where some people feel threatened by something, haven't used it, and feel like they can comment based on how they imagine it is.

Reminds me of when the iPhone came out. You had all sorts of nonsense criticism of it from people that had clearly never even touched one.

Copilot was worse than useless in my situation. My team tested it out and immediately were losing productivity due to it. Terrible autocomplete is worse than just jumping over to ChatGPT or Claude when we actually need help. It literally was so bad we all overwhelmingly said “no, but can we get something like