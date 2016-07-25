from the Artificial-Software dept.
Contrary to popular belief, using cutting-edge artificial intelligence tools slowed down experienced software developers when they were working in codebases familiar to them, rather than supercharging their work, a new study found.
AI research nonprofit METR conducted the in-depth study of seasoned developers earlier this year while they used Cursor, a popular AI coding assistant, to help them complete tasks in open-source projects they were familiar with.
Before the study, the open-source developers believed using AI would speed them up, estimating it would decrease task completion time by 24%. Even after completing the tasks with AI, the developers believed that they had decreased task times by 20%. But the study found that using AI did the opposite: it increased task completion time by 19%.
The study's lead authors, Joel Becker and Nate Rush, said they were shocked by the results: prior to the study, Rush had written down that he expected "a 2x speed up, somewhat obviously."
The findings challenge the belief that AI always makes expensive human engineers much more productive, a factor that has attracted substantial investment into companies selling AI products to aid software development.
(Score: 2, Interesting) by khallow on Thursday July 17, @12:16PM (9 children)
(Score: 5, Insightful) by JoeMerchant on Thursday July 17, @01:51PM (8 children)
The one thing no software dev shop, big or small, needs is two (or more) Rock Star Devs on one team.
The corollary for this is: big dev shops don't need Rock Star Devs at all, because (good) teams are fluid with cross-pollination, helping other teams during demand spikes, etc. Managing to keep multiple Rock Stars siloed is almost as counter-productive as putting them together.
So, what the big shops really need is a bunch of Senior Devs who work well with others. But, where do you get these Senior Devs? Hiring from the field means picking up other companies' outcasts, not usually the best strategy. Developing talent in-house? Yeah, that's actually how it's done. So you get a bunch of interns and juniors, and hopefully the ones less suited for the role find other paths, but hopefully at least you don't advance them to Senior or otherwise keep giving them more responsibility than they can handle.
If you just fire all that "dead wood" learning crew, you're back in the market recycling "talent" that didn't find a long term home elsewhere. Even if it takes time away from your Seniors training and checking the Juniors, that's how you get good Seniors. De-task the Seniors from all mentoring and checking responsibility and what do they evolve into? Rock Stars.
(Score: 5, Insightful) by PiMuNu on Thursday July 17, @02:28PM (5 children)
> Hiring from the field means picking up other companies' outcasts,
Not really - people move jobs for lots of reasons. Bored/not challenged, moving for partner, etc etc.
(Score: 3, Interesting) by JoeMerchant on Thursday July 17, @02:38PM (4 children)
>> Hiring from the field means picking up other companies' outcasts,
>Not really - people move jobs for lots of reasons. Bored/not challenged, moving for partner, etc etc.
Sure, not 100%. There's a mix out there. But, telling whether the trailing spouse is going to be a net positive to the team is something that takes about 6 months post hire to really know - even in a "probationary period" it's hard to let full-time hires go within typical HR rules. What's guaranteed: the job hopping con-men with lots of interviewing experience but little value to offer on the job are a part of the mix when you interview for a Senior level position. Sometimes you can spot them on the resume, sometimes you can catch them in the interview (but they have more experience as an interviewee than you do as an interviewer...) but inevitably some get through.
I met a retired CEO who had a very successful 12 year run, growing his company 8x in size and 3x in profit margins over the period. First thing he said was that he was just lucky, right place right time, didn't screw it up too badly. Second thing, it's all about the people you hire - they make or break the company. Third thing, if he made better than 50% good hires in a given year, that was a good year. He elaborated: by good hire he means "at least they're not actively making things worse by being there."
(Score: 2) by PiMuNu on Thursday July 17, @03:16PM (3 children)
Fair enough - hiring is indeed a "roll of the dice", I can't argue there. Just worth saying that it ain't all bad and we have hired some decent people.
(Score: 2) by JoeMerchant on Thursday July 17, @04:26PM
Sure, every one of the 10 companies that have hired me over the decades got "a decent people" in the deal, but... the horror stories are all too real. It's not a "don't open that door, the nuke will go off if you do!!!" horror scenario, it's more of a "watch the life slowly drain from your team due to low grade toxicity..." kind of horror.
(Score: 3, Interesting) by bloodnok on Thursday July 17, @08:27PM (1 child)
I don't agree that hiring is a roll of the dice.
It certainly is the way that it is practised in most companies but it doesn't have to be that way.
I once worked at a really good software engineering company. We took the engineering part very seriously. We had a quality system long before it was fashionable. We got BS5750 certified on the very first day that it was possible, and ISO 9001 when that was released.
Everything we did was engineered and extensively reviewed, including our recruitment process. When we hired grads, we put them through a very full day of interviews and assessments, after a less formal social evening with them. At the end of the day we all met, compared notes and made a decision. For experienced staff, the process was shorter but usually involved interviews with 2 directors as well as a technical interview with 2 interviewers. Decisions were always made that same day.
It was by far the best place I have ever worked and that mostly came down to the people who worked there. Of course some of the people we hired were not great, but there were far fewer of those than in any other place I've worked.
If you really value your employees you will take time to make the interview process as rigorous and fair as you can. Trouble is, it's not cheap. I doubt that in today's market it would even be possible to do what we used to, unless you are in an industry that has ridiculous margins.
(Score: 2) by PiMuNu on Friday July 18, @07:43AM
> Trouble is, it's not cheap
> I doubt that in today's market it would even be possible to do
I don't know - once the system is set up I guess it's less than a month of person time (10 people for a day plus admin support and travel expenses), which is not much.
(Score: 0) by Anonymous Coward on Thursday July 17, @09:47PM (1 child)
Rock stars are, by definition, a stereotype. If you characterize a rock star as the sort that can't get along with another, then just like a true Scotsman, you're right.
If you characterize rockstars as your most team boosting productive workers though, they are either so good you don't need a team, or they always get along (by definition).
There is no end of people lining up to off their competition, and this is just one way they do it, if you can't compete, slander.
(Score: 1) by khallow on Friday July 18, @12:56AM
Have you ever seen two True Scotsmen in the same room together? I rest my non sequitur!
(Score: 2, Disagree) by Username on Thursday July 17, @01:42PM (18 children)
I have no idea what this bot does and doesn't do, but if you automate something, it should have near instantaneous results. There is no way someone can think and type faster than a computer. Either this bot sucks or the study was faulty.
(Score: 3, Insightful) by OrugTor on Thursday July 17, @01:56PM (1 child)
When the lead author predicts the results you have to question their impartiality. When he qualifies his prediction with "somewhat obviously" you have to question his judgment. When the prediction turns out completely wrong you actually have less need to question the study.
(Score: 2) by VLM on Thursday July 17, @04:33PM
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ [metr.org]
As near as I can tell, they screencapped and categorized the time intervals each dev spent doing various tasks. Half their time was spent doing non-AI stuff ("processes": git branch creation, testing in the branch, git merge pull request processing, testing the final merged product, various non-AI non-coding filler that devs do day to day) and the other half was stuff like coding, where they allowed AI. So they assumed that if the dev just cut-n-pasted the issue into the bot and then cut-n-pasted the result into the IDE, it was plausible that half of the time would approach zero, so a 2x speedup seemed "somewhat obvious". If the devs not using AI were measured as spending half their time doing AI-able coding and half doing filler, the filler would remain while coding time goes to zero. Of course, total debugging time with AI increased to more than the original coding time, so the total time spent, very surprisingly, actually increased...
Note the link above isn't a real paper with real numbers but the infographic "Calculate change in time from AI" seems to plausibly resemble the above paragraph.
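The arithmetic in this comment can be sketched as a simple time-split model. This is a hypothetical illustration, not METR's method: the function name `total_time`, the 10-hour task, and the 50/50 split are assumptions that mirror the reasoning above, and the 1.38 multiplier is back-solved to reproduce the reported 19% increase under that assumed split; it is not a number from the study.

```python
def total_time(baseline_hours, codable_fraction, coding_factor):
    """Total task time when only the 'AI-able' share of the work is scaled.

    baseline_hours:   time the task takes without AI
    codable_fraction: share of the baseline spent on work where AI is allowed
    coding_factor:    multiplier on that share (0.0 = instant, 1.0 = unchanged,
                      above 1.0 = slower, e.g. extra prompting and debugging)
    """
    filler = baseline_hours * (1.0 - codable_fraction)  # git, review, testing
    coding = baseline_hours * codable_fraction * coding_factor
    return filler + coding


baseline = 10.0  # hypothetical task length in hours (assumption)

# Best case: coding time goes to zero and only the filler remains.
best_case = total_time(baseline, 0.5, 0.0)
print(baseline / best_case)   # speedup factor in the "somewhat obvious" case

# Assumed slow case: the AI-assisted half takes 1.38x as long as before.
slow_case = total_time(baseline, 0.5, 1.38)
print(slow_case / baseline)   # ratio of time with AI to time without
```

Under these assumptions the ceiling on any speedup is set by the filler share, and a modest slowdown in the AI-assisted half is enough to push the whole task past its original duration.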
(Score: 4, Interesting) by JoeMerchant on Thursday July 17, @02:03PM (10 children)
The bots I have been working with are indeed very fast, at making very simple little things. They become slow around the level of complexity that it would take me a couple of hours to do something without the bot. Thing is, nobody asks me to do stuff that only takes a few minutes. My small tasks take half a day, and most of my tasks aren't small - some run into the months, and when you put the tasks together into larger projects those projects take years.
So, AI is "accelerating" my work, as long as you only look at that 2% of my work that AI can handle. For the past 3 days I have been using AI to teach me Rust - I'd guess it would have taken me about 6 days (using regular search engines) to get as far as I did in those 3 days without AI - because: I started knowing nothing about Rust. I'm making a new Rust component on a system that was developed for C#, C, C++, Qt/C++ even Wt/C++, over the course of a year or so. If I were trying to re-develop the C++ section from scratch as a new module, using libraries I am familiar with, asking AI for help would be slowing me down. It's impossible to say how long these 3 days would have taken back in the "paper reference books" days. The core language was covered well enough in those books, but they rarely covered all the libraries (crates) that you'd need to get to the end-game, so there would be additional trips to the bookstore / library - for my Master's thesis in the 1980s there were even trips to neighboring towns' libraries to avoid the week-long+ delays involved with inter-library loans. AI is a layer on search engines, which are a layer on the near-free-bandwidth digital backbone of the internet. Yeah, it's faster than "the old days." But, in cases where you already know what you're doing? Looking it up is a waste of time.
(Score: 2) by ikanreed on Thursday July 17, @03:20PM (4 children)
It's kinda the whole "You should not evaluate programmers on the number of lines of code written" thing writ large.
If I wanted to generate a brand new thousand-line web app that explores some well-trod territory, like, say, an image gallery, using a well-known framework, an AI could do it with almost no trouble nearly instantly.
If I want to remove a subtle performance defect hidden in an existing multi-layered scalable microservices enterprise application serving 5 different groups' use cases, the fix might be as simple as changing a single index in a single table from hash to btree, but discovering and doing that would take me weeks, and the AI would just flounder and give completely incorrect solutions that would remain incorrect no matter how much you prompted it, because the context is too complicated for the kind of reasoning it does.
(Score: -1, Troll) by Anonymous Coward on Thursday July 17, @05:18PM (2 children)
Absolutely hilarious that you think reasoning is involved!
(Score: 3, Interesting) by ikanreed on Thursday July 17, @09:03PM (1 child)
I get what you're saying, but come on, we have to have terminology to describe what a giant matrix processing a series of tokens through an obfuscated process is. Human language is full of overtaxed metaphors that kinda barely adequately describe how new technology works.
Do you think when someone mentions driving a car you're hitting it with a stick like an ox?
Do you think when someone posts online they're sending a letter in the mail?
There are kinds of higher-order reasoning these things are simply incapable of because the technology isn't built around using formal rigor, but it still is analogous. Come on man.
(Score: 2) by sjames on Friday July 18, @05:06PM
In turn, I get what you're saying, and won't object so long as we all understand that AI currently is much more like really advanced pattern matching than it is actual reasoning. But we don't have a good word for that in-between thing that it does.
(Score: 2) by sjames on Friday July 18, @05:03PM
Of course, if you just want an image gallery, you can probably find one that's public domain or BSD licensed and just download it and make a few adaptations.
(Score: 0) by Anonymous Coward on Thursday July 17, @05:16PM (4 children)
It's almost funny that you think you're learning Rust.
(Score: 2) by JoeMerchant on Thursday July 17, @06:58PM
Learning is a relative thing. I believe I have advanced to a level of Rust comprehension on par with some of my "Rust experienced" teammates already.
(Score: 2, Informative) by khallow on Thursday July 17, @07:53PM (2 children)
(Score: 2) by JoeMerchant on Friday July 18, @06:12PM (1 child)
One counter-intuitive benefit of the current state of AI for learning something like Rust: AI makes mistakes - common mistakes it reads from people posting about Rust code, apparently it doesn't always distinguish between the questions and the answers on Stack Exchange, but anyway - the code it puts out almost never compiles cleanly on the first try, but we have compilers that will describe the error and most of the time provide some help describing what might be done to correct the problem. So, instead of a one-and-done solution, there's a constant back and forth: ask for a simple thing, ask to correct the errors, ask to add a little feature, ask to correct the errors in that, ask to revise the architecture, ask to correct the errors in that. I end up seeing more "common errors" than I would by hand coding it myself. I also can ask for things like "handle all error return states" which saves a lot of grunt work that often gets skipped in early hand coding work and tutorials.
(Score: 2) by hendrikboom on Saturday July 19, @01:18AM
And that's where the real artificial intelligence resides -- painstakingly built into the compiler. It's not human-level autonomous general intelligence, but it is quite helpful.
(Score: 5, Interesting) by RamiK on Thursday July 17, @04:39PM (2 children)
And here lies the problem. You see, AI agents are not just LLMs. They're actually (Python) glue that takes the input from the user ("audit this codebase for bugs / implement this feature in this repo"), passes it to the LLM, copy-pastes the LLM's output into a compiler to see if it can at least somehow work, and if it doesn't, copy-pastes the compiler errors to the LLM for it to fix the issue, repeating until it compiles, and then makes a pull request for the developer to review.
And there are usually a few unit tests in between that the dev either wrote or asked the AI to write and then tweaked to make sure it doesn't cheat.
So, when you have a million monkeys machine regurgitate stackoverflow.com answers against (gcc) -Wall to see if it sticks, things get a bit slow.
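The generate, compile, feed-back-errors cycle described in this comment can be sketched roughly as follows. This is a toy under stated assumptions, not how Cursor or any particular agent is implemented: `fake_llm`, `check_compiles`, and `agent_loop` are hypothetical names, the "LLM" is a hard-coded stub, and Python's built-in `compile()` stands in for a real compiler such as gcc.

```python
def check_compiles(source):
    """Return None if the source compiles, else the compiler's error text."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return f"{err.msg} (line {err.lineno})"


def fake_llm(prompt, attempt):
    """Stand-in for a model call: broken on the first attempt, fixed after."""
    if attempt == 0:
        return "def add(a, b)\n    return a + b\n"   # missing colon
    return "def add(a, b):\n    return a + b\n"


def agent_loop(task, llm, max_rounds=5):
    """Ask for code, compile it, feed errors back, repeat until it compiles."""
    prompt = task
    for attempt in range(max_rounds):
        candidate = llm(prompt, attempt)
        error = check_compiles(candidate)
        if error is None:
            return candidate, attempt + 1
        # Paste the compiler output back into the next prompt.
        prompt = f"{task}\n\nPrevious attempt failed to compile:\n{error}"
    raise RuntimeError("no compiling candidate within the round limit")


code, rounds = agent_loop("write add(a, b)", fake_llm)
print(rounds)   # number of model calls needed before the code compiled
```

The loop stops as soon as the candidate compiles, which says nothing about whether the code is correct; every extra round is another model call plus another compile, which is where the time goes.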
(Score: 2) by VLM on Thursday July 17, @04:55PM (1 child)
The article was about an extreme form of autocomplete: https://cursor.com/features [cursor.com]
Looking at the description, this would be unbearably annoying for me as it seems it randomly rewrites code behind my back as I'm typing, like a spell checker but 1000x worse. I don't think I could deal with that sort of interruption context switch as I'm typing.
"I see you're writing a post about cursor, I'll change your post in the background as you're typing it, to be a post about emacs because I want to" Uh no thanks.
(Score: 2) by RamiK on Friday July 18, @05:41AM
The contemporary ones "only" doing auto-completion are still doing the same cycle, but against language servers like clangd instead of, or along with, the compiler. Whether prompt or completion or whether they're integrated in the IDE or the terminal, being "agentic" is standard practice in contemporary AI coding whatevers.
(Score: 4, Touché) by stormreaver on Friday July 18, @01:20PM
That is precisely what LLMs don't do: think. They are a lot of things, but intelligent is not one of them. They are sometimes advanced auto-completion engines, but they are always advanced copy and paste engines that can only provide you with what has already been written by someone else. They are the next-gen snake oil salesmen in computer form. They are definitely useful for some types of grunt work (such as some types of boilerplate code generation), but they drag down progress in things in which humans are already proficient.
I asked ChatGPT to generate some code for something I already knew how to do. It generated syntactically correct code, but it was highly flawed and incorrect in its execution. My guess was that the code was probably pulled from a question on a question and answer website.
(Score: 3, Insightful) by sjames on Friday July 18, @04:53PM
But you're crazy if you just paste in whatever the AI comes up with in response to your prompt. Add in the time it takes to analyse what it produced and consider if there are any side effects or security implications, modify as necessary.
By the time you do that and correct any problems, it might actually take longer than just typing it out as you think.
Keep in mind, one of the traits of AIs is that their output tends to flow well. That's great, but it makes it all too easy to miss things when proofreading.
Different field, but I just saw a video where an attorney and two judges (one at the appellate level) missed glaring flaws in a court document. It cited cases that don't exist and others that had nothing to do with the matter at hand. So tell me how quickly you'd care to proof-read the AI output!
(Score: 2) by looorg on Thursday July 17, @02:22PM
Of course it will slow things down. You have to write prompts, search for things. See if and how they fit into what you have already done. Or are these just like Clippy, where you are typing along and it starts to fill things in for you or make less than helpful suggestions for what it thinks that you want to do? Which is extremely distracting, so that will slow things down for sure.
Then you have to fit it in there. Testing. Fixing. More testing. If it had just worked then what does the system need the human for. You are just the overpaid keyboard-monkey at that point.
If these perceived increases or decreases are just based on what people think, those numbers are horrible to start with. People are crap at estimating how long things will take, or how much they have gained from doing this and that.
What a fantastic conclusion. So people that know the system and know what they are doing get distracted by a new fancy gizmo that doesn't add anything of use.
If I know how to do something and then have to look things up and get offered other solutions to the problem then yes that will take more time than if I had just done it the way I wanted to do it in the first place.
If "Cursor", or as I like to refer to them all, 'Clippy', acts like a child in a car -- are we there yet .... are we there yet ... -- that doesn't actually make the trip go faster. Getting suggestions from Clippy if I know what I'm doing is not going to help. Or speed things up. No matter how many times it asks or how many suggestions it makes ...
So they really really hope and wish it will improve then ... If they just wish hard enough, wait long enough or something such it will improve for sure.
Do you know what will improve your coding skills? Actually coding, reading a few books and experimenting with things. Not cut and paste in what commander Clippy tells you to.
(Score: 3, Interesting) by DadaDoofy on Thursday July 17, @02:50PM
"The study's lead authors, Joel Becker and Nate Rush, said they were shocked by the results: prior to the study, Rush had written down that he expected 'a 2x speed up, somewhat obviously'." (emphasis added)
When you are conditioned to see the emperor's new clothes, it's shocking to finally realize there aren't any.
(Score: 2) by VLM on Thursday July 17, @04:14PM
I suspect a large part of the results depends upon the definition of "task" and "completion time" and "task completion time".
Something people often forget is most programming time IRL is spent trying to debug code. It's actually very easy and fast to write code, if you don't care if it works or not. Debugging it to actually meet business needs takes vastly longer. Often nobody understands or can even define the problem until the programmer gets involved in some hyper vague bug report. If I'm debugging my own code that I wrote minutes ago, I tend to be pretty fast. If I'm debugging someone else's code or old code, but at least it's written at my level or below, or I've personally fixed bugs like this before, it'll take a while but I can grind away and do it. If it's AI-generated code that perhaps nobody understands because it's wrong at multiple levels, or I have no experience in this area / language / framework / algo, the time required to debug will approach infinity, or at least weeks/months/years to learn something entirely new.
Remember with fancy automated CI/CD unit tests, you can't just "oh well who cares if the orbital autopilot can't tell the difference between miles and kilometers, ship it today!" the devops chain will literally not accept it if it doesn't pass. The biggest AI speed gains will be while operating SUCCESSFULLY above your level, and the biggest AI speed slowdowns will also happen when operating UNSUCCESSFULLY above your level. Meanwhile the "no brainer" work already takes about zero time to implement and debug if there's a typo level problem, so there's no AI gain there.
There are also higher-level problems "Well, nice try AI but legal won't let us use that patented algo" or "nice try but that would be a legal licensing violation for us to do that" and this can spiral down a rabbit hole very quickly.
I've found it's very helpful for educational purposes and very helpful for super hyper verbose languages (enterprise grade hello_world.java pointless paper shuffling that's very corporate with great metrics but accomplishes nothing)
All attempts at AI-generated coding are either trivially small and simple, or eventually end in the cut-n-paste technician having absolutely no idea what to do other than start applying for jobs elsewhere. It's a Disney "Sorcerer's Apprentice" animated comedy for sure.
Very few people can't write a book because they write too slowly with a quill pen. Ditto very few fail to write a book because they can't spell. Ditto very few fail to write a book because they have no automated grammar checker. We are at a new stage where very few fail to write a book because they don't have AI. Mostly, people fail to write a book because they have nothing to say and can't think of anything interesting to express.
Another example is very few people have the skills to leverage a library card catalog, a computerized library card catalog, Wikipedia, or now, AI. They're going to do dumb things because they don't know a question exists or can't imagine doing something less stupid. A tech that can answer questions correctly 90% of the time isn't going to help them very much if they can't imagine what question to ask.
It's a major philosophical error. Tax accountants used to spend a lot of time doing addition and subtraction. Visually, their job is mostly to add and subtract (some percentage calculations). Therefore the invention of the pocket calculator means all tax accountants will be fired and every moron on the street will do their taxes by themselves alone. That is not how things turned out.
Another way to look at it is democratization of technical skills is supposed to eliminate specialization. Any moron off the street can buy nails and shingles from Home Depot on the way home therefore everyone will replace their own roofs and the entire industry of "roofing" will disappear. It seems this does not happen in practice LOL.
(Score: 2) by VLM on Thursday July 17, @04:43PM (1 child)
The Reuters article is https://www.reuters.com/business/ai-slows-down-some-experienced-software-developers-study-finds-2025-07-10/ [reuters.com]
That links to a poster-level infographic heavy non-cut-n-paste-able page at https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ [metr.org]
That links to a 50 page paper https://arxiv.org/abs/2507.09089 [arxiv.org]
I have not read all 50 pages of the paper but it's very interesting so far. Perhaps instead of spending 100 minutes reading it, I could use AI to speed up my reading, thus taking only 119 minutes to understand it (this is supposed to be a self-referential joke)
There are some quotes from the actual paper that strike me as the driest of dry humor:
similar to 100% of car drivers think they're above average at driving ability. This made me LOL IRL.
Yeah, if all you have is a hammer the whole world looks like a nail even if you're trying to install screws. AI is usually shoved down the throat as an all or nothing workflow, but it's more like a paradigm like OO vs functional where there are just times it works and times it won't work, and apparently it worked 44% of the time according to page 19. Which is better than 0%, but it's not like you can close 56% of bug reports and feature requests with a solution code of "it can't be done with AI so we're not going to do it"
(Score: 2) by JoeMerchant on Thursday July 17, @07:06PM
I have encountered some topics that the AI simply cannot produce code that compiles correctly on, particularly using certain crates which are not well discussed / documented on the web. It will keep swinging at it, but never comes up with a solution that even compiles without errors, much less runs correctly.
Other topics, it's fairly impressive - initial tries may have a handful of compilation errors, but a couple of iterations showing it the errors fixes them, the code works well, and from that point forward you can add features / error handling etc. and it continues to work well - for very limited situations. After the code grows too large, it starts falling apart again.
Shockingly, MS Copilot is "winning" today's competition with a clear, concise example on target with the request. Claude made a more "full bodied" example, but at over double the length, even after asking Claude to refine and simplify. Google struggled the most, and I don't like Google's solution with how they initialize and reinitialize a module that could be more efficiently initialized once and used many times.
(Score: 3, Insightful) by kolie on Thursday July 17, @05:44PM (2 children)
As someone who regularly uses AI to do things people are constantly telling me it can't do, it doesn't work for, or it is worse than doing it manually - I have to say that you are using it wrong.
It works on big and small things, and it is quicker.
The trick - like any tool - is knowing: 1) How, 2) When, and 3) Why - to apply it correctly.
(Score: 2) by YeaWhatevs on Thursday July 17, @10:08PM (1 child)
You might be right. Hard to prove.
It could also be false productivity, now with AI. Before AI, this was done by writing absolute garbage as fast as possible. These sorts would quit rather than wallow in their own ...
stuff.
So is it different now? It depends. If maintenance sucks, you'll know.
(Score: 2) by kolie on Friday July 18, @03:47AM
Well fortunately I measure my productivity. My scoping, planning, and customer facing prep hasn't changed, and so I can get a sense upfront of project planning and scheduling. I know how much workload I could take before and the number of clients I could juggle. My project throughput is at least 4x. I could deal with, on average, one client a week and get stuff out. I'm doing 4x that, honestly probably getting better outcomes overall. That translates directly to 4x income. It's really enabled me to do what I do, at a greater scale.
(Score: 4, Informative) by Rich on Thursday July 17, @07:50PM
A snippet for iterating over SHELLVAR_1 to SHELLVAR_n in POSIX portable shell? With me always forgetting shell syntax details just before I have to use them again? Copilot was better than anything on StackOverflow and gave the most concise output while correctly considering the pitfalls. Nice.
To solve a problem that goes beyond data shuffling and can't be found on the net? (E.g. reverse solver for 6 DOF (coordinates/angles) from 6 strut lengths of a hexapod?) Awful fail. It might provide fragments of what an actual solution might have, but so far the output is entirely useless. Maybe the new "reasoning" models can do better?