ChatGPT Gets Code Questions Wrong 52% of the Time

posted by requerdanos on Sunday August 20 2023, @01:44PM

from the convincingly-wrong dept.

But its suggestions are so annoyingly plausible:

ChatGPT, OpenAI's fabulating chatbot, produces wrong answers to software programming questions more than half the time, according to a study from Purdue University. That said, the bot was convincing enough to fool a third of participants.
The Purdue team analyzed ChatGPT's answers to 517 Stack Overflow questions to assess the correctness, consistency, comprehensiveness, and conciseness of ChatGPT's answers. The US academics also conducted linguistic and sentiment analysis of the answers, and questioned a dozen volunteer participants on the results generated by the model.
"Our analysis shows that 52 percent of ChatGPT answers are incorrect and 77 percent are verbose," the team's paper concluded. "Nonetheless, ChatGPT answers are still preferred 39.34 percent of the time due to their comprehensiveness and well-articulated language style." Among the set of preferred ChatGPT answers, 77 percent were wrong.
OpenAI on the ChatGPT website acknowledges its software "may produce inaccurate information about people, places, or facts." We've asked the lab if it has any comment about the Purdue study.
The pre-print paper is titled, "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions." It was written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang.
"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," their paper stated. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."
Even when the answer has a glaring error, the paper stated, two out of the 12 participants still marked the response preferred. The paper attributes this to ChatGPT's pleasant, authoritative style.
"From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct," the paper explained.
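For a sense of scale, the percentages quoted above can be turned into rough counts. This is only a back-of-envelope sketch: it assumes the 52 percent and 77 percent figures apply to all 517 analyzed answers, which the summary implies but does not state outright, and the helper name `approx_count` is made up for illustration.

```python
# Back-of-envelope conversion of the reported percentages into rough counts.
# ASSUMPTION: the 52% and 77% figures are taken over all 517 analyzed answers.

TOTAL_ANSWERS = 517   # Stack Overflow questions analyzed, per the summary
PARTICIPANTS = 12     # volunteers in the user study, per the summary


def approx_count(percent: float, total: int) -> int:
    """Hypothetical helper: nearest whole number of items for a percentage."""
    return round(percent / 100 * total)


incorrect = approx_count(52, TOTAL_ANSWERS)   # answers judged incorrect
verbose = approx_count(77, TOTAL_ANSWERS)     # answers judged verbose

# Two of the 12 participants preferred an answer with a glaring error.
fooled_share = 2 / PARTICIPANTS * 100

print(f"about {incorrect} of {TOTAL_ANSWERS} answers incorrect")
print(f"about {verbose} of {TOTAL_ANSWERS} answers verbose")
print(f"{fooled_share:.1f}% of participants preferred a glaringly wrong answer")
```

Under that assumption, roughly 269 answers were incorrect and roughly 398 were verbose; the exact counts are in the paper, not in this summary.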

Journal Reference:
Kabir, Samia, Udo-Imeh, David N., Kou, Bonan, et al. Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions, arXiv (DOI: 10.48550/arXiv.2308.02312)

This discussion was created by requerdanos (5997) for logged-in users only, but now has been archived. No new comments can be posted.

Popcorn followed by Profit (Score: 5, Insightful) by turgid on Sunday August 20 2023, @01:58PM (18 children)

Re:Popcorn followed by Profit (Score: 2, Insightful) by Anonymous Coward on Sunday August 20 2023, @02:21PM (4 children)

Re:Popcorn followed by Profit (Score: 5, Funny) by krishnoid on Sunday August 20 2023, @04:46PM (1 child)

Re:Popcorn followed by Profit (Score: 0) by Anonymous Coward on Sunday August 20 2023, @04:48PM

Re:Popcorn followed by Profit (Score: 3, Interesting) by krishnoid on Sunday August 20 2023, @05:58PM

Re:Popcorn followed by Profit (Score: 3, Insightful) by sjames on Sunday August 20 2023, @07:56PM

Re:Popcorn followed by Profit (Score: 3, Interesting) by JoeMerchant on Sunday August 20 2023, @03:01PM (5 children)

Re:Popcorn followed by Profit (Score: 5, Informative) by choose another one on Sunday August 20 2023, @03:53PM (2 children)

Re:Popcorn followed by Profit (Score: 3, Informative) by sjames on Sunday August 20 2023, @08:01PM

Re:Popcorn followed by Profit (Score: 2) by JoeMerchant on Sunday August 20 2023, @08:26PM

Re:Popcorn followed by Profit (Score: 2) by mhajicek on Sunday August 20 2023, @05:11PM (1 child)

Re:Popcorn followed by Profit (Score: 3, Funny) by stormreaver on Sunday August 20 2023, @11:17PM

Re:Popcorn followed by Profit (Score: 2) by DadaDoofy on Sunday August 20 2023, @03:30PM (5 children)

Re:Popcorn followed by Profit (Score: 2) by HiThere on Sunday August 20 2023, @04:56PM (4 children)

Re:Popcorn followed by Profit (Score: 2) by DadaDoofy on Sunday August 20 2023, @08:50PM (3 children)

Re:Popcorn followed by Profit (Score: 2) by HiThere on Sunday August 20 2023, @11:17PM (2 children)

Re:Popcorn followed by Profit (Score: 2) by DadaDoofy on Sunday August 20 2023, @11:42PM (1 child)

Re:Popcorn followed by Profit (Score: 2) by HiThere on Monday August 21 2023, @02:07AM

Re:Popcorn followed by Profit (Score: 3, Informative) by Opportunist on Sunday August 20 2023, @04:11PM

52% is pretty good (Score: 4, Insightful) by JoeMerchant on Sunday August 20 2023, @02:28PM (6 children)

Re:52% is pretty good (Score: 4, Insightful) by Opportunist on Sunday August 20 2023, @04:13PM (5 children)

Re:52% is pretty good (Score: 2) by krishnoid on Sunday August 20 2023, @04:48PM

Re:52% is pretty good (Score: 0) by Anonymous Coward on Sunday August 20 2023, @07:18PM (2 children)

Re:52% is pretty good (Score: 2) by istartedi on Sunday August 20 2023, @09:27PM (1 child)

Re:52% is pretty good (Score: 2) by Freeman on Monday August 21 2023, @06:55PM

Re:52% is pretty good (Score: 2) by JoeMerchant on Sunday August 20 2023, @08:28PM

In other words (Score: 5, Funny) by sigterm on Sunday August 20 2023, @02:30PM (3 children)

Re:In other words (Score: 3, Touché) by JoeMerchant on Sunday August 20 2023, @03:04PM (1 child)

Re:In other words (Score: 2) by krishnoid on Sunday August 20 2023, @04:50PM

Re:In other words (Score: 2) by Opportunist on Sunday August 20 2023, @04:19PM

Déjà vu (Score: 5, Funny) by looorg on Sunday August 20 2023, @03:06PM (2 children)

Re:Déjà vu (Score: 3, Informative) by Thexalon on Monday August 21 2023, @11:38AM

Re:Déjà vu (Score: 2) by DannyB on Monday August 21 2023, @09:16PM

Stack Overflow (Score: 3, Insightful) by VLM on Monday August 21 2023, @03:09PM

Is anyone here old enough to remember? (Score: 2) by DannyB on Monday August 21 2023, @09:15PM (2 children)

Re:Is anyone here old enough to remember? (Score: 3, Insightful) by maxwell demon on Tuesday August 22 2023, @04:11AM (1 child)

Re:Is anyone here old enough to remember? (Score: 2) by DannyB on Tuesday August 22 2023, @03:24PM