But its suggestions are so annoyingly plausible:
ChatGPT, OpenAI's fabulating chatbot, produces wrong answers to software programming questions more than half the time, according to a study from Purdue University. That said, the bot was convincing enough to fool a third of participants.
The Purdue team analyzed ChatGPT's answers to 517 Stack Overflow questions to assess their correctness, consistency, comprehensiveness, and conciseness. The US academics also conducted linguistic and sentiment analysis of the answers, and questioned a dozen volunteer participants on the results generated by the model.
"Our analysis shows that 52 percent of ChatGPT answers are incorrect and 77 percent are verbose," the team's paper concluded. "Nonetheless, ChatGPT answers are still preferred 39.34 percent of the time due to their comprehensiveness and well-articulated language style." Among the set of preferred ChatGPT answers, 77 percent were wrong.
OpenAI on the ChatGPT website acknowledges its software "may produce inaccurate information about people, places, or facts." We've asked the lab if it has any comment about the Purdue study.
The pre-print paper is titled, "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions." It was written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang.
"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," their paper stated. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."
Even when the answer had a glaring error, the paper stated, two out of the 12 participants still marked the response as preferred. The paper attributes this to ChatGPT's pleasant, authoritative style.
"From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct," the paper explained.
(Score: 3, Insightful) by turgid on Sunday August 20, @01:58PM (1 child)
ChatGPT is going to be used for coding by the same people who copy and paste from Stack Overflow. Superficially, it will make them look very productive to their managers.
The managers will encourage the use of ChatGPT and soon the number of people directly employed in coding will decrease. However, these things will cause a huge amount of latent trouble.
The copy-and-pasters do so because they do not understand what they are doing. They do not understand what they are being asked to do or how to break it down into simple problems to be solved, and often do not know enough about the languages they're using, the operating system, libraries, data structures and algorithms to be able to make informed decisions about what they're doing.
They do not understand enough to be able to ask better questions to get better requirements or to learn what they need to learn. ChatGPT automates this process.
I predict an awful lot of very bad, broken code being written by ChatGPT soon, and deployed in production by people incapable of understanding any of this, or indeed unwilling to understand because that would burst their cheap coding bubble.
In a few more years, the world will grind to a halt and there will be plenty of opportunities for people like us to make a lot of money putting things right. Sit back and enjoy the spectacle with the proverbial popcorn.
(Score: 0) by Anonymous Coward on Sunday August 20, @02:21PM
I think overall you are correct, but I also think that there are a fair number of people who do generally know what they're doing, but still copy from Stack Overflow and will use ChatGPT out of convenience. For instance, say one needs to write some code doing something they haven't done a lot of. They could spend the time to break things down, do research, etc., or they could copy an example and move on. It's the same way people rely on libraries now and will import the whole library out of convenience (laziness?) instead of pulling in only what they need (or writing it themselves).
(Score: 2) by JoeMerchant on Sunday August 20, @02:28PM
52% is better than most developers I know. If, after two iterations, that rate increases to 75% or better, that's a valuable assistant.
