Developers tend to scrutinize AI-generated code less critically and they learn less from it [uni-saarland.de]:
When two software developers collaborate on a programming project—known in technical circles as 'pair programming'—it tends to yield a significant improvement in the quality of the resulting software. 'Developers can often inspire one another and help avoid problematic solutions. They can also share their expertise, thus ensuring that more people in their organization are familiar with the codebase,' explains Sven Apel, professor of computer science at Saarland University. Together with his team, Apel has examined whether this collaborative approach works equally well when one of the partners is an AI assistant. [...]
For the study, the researchers used GitHub Copilot, an AI-powered coding assistant introduced by Microsoft in 2021, which, like similar products from other companies, has now been widely adopted by software developers. These tools have significantly changed how software is written. 'It enables faster development and the generation of large volumes of code in a short time. But this also makes it easier for mistakes to creep in unnoticed, with consequences that may only surface later on,' says Sven Apel. The team wanted to understand which aspects of human collaboration enhance programming and whether these can be replicated in human-AI pairings. Participants were tasked with developing algorithms and integrating them into a shared project environment.
'Knowledge transfer is a key part of pair programming,' Apel explains. 'Developers will continuously discuss current problems and work together to find solutions. This does not involve simply asking and answering questions, it also means that the developers share effective programming strategies and volunteer their own insights.' According to the study, such exchanges also occurred in the AI-assisted teams, but the interactions were less intense and covered a narrower range of topics. 'In many cases, the focus was solely on the code,' says Apel. 'By contrast, human programmers working together were more likely to digress and engage in broader discussions and were less focused on the immediate task.'
One finding particularly surprised the research team: 'The programmers who were working with an AI assistant were more likely to accept AI-generated suggestions without critical evaluation. They assumed the code would work as intended,' says Apel. 'The human pairs, in contrast, were much more likely to ask critical questions and were more inclined to carefully examine each other's contributions.' He believes this tendency to trust AI more readily than human colleagues may extend to other domains as well. 'I think it has to do with a certain degree of complacency, a tendency to assume the AI's output is probably good enough, even though we know AI assistants can also make mistakes.' Apel warns that this uncritical reliance on AI could lead to the accumulation of 'technical debt', which can be thought of as the hidden cost of the work that will later be needed to correct these mistakes, and which complicates the software's further development.