SoylentNews is people

posted by requerdanos on Sunday August 20 2023, @01:44PM
from the convincingly-wrong dept.

But its suggestions are so annoyingly plausible:

ChatGPT, OpenAI's fabulating chatbot, produces wrong answers to software programming questions more than half the time, according to a study from Purdue University. That said, the bot was convincing enough to fool a third of participants.

The Purdue team analyzed ChatGPT's answers to 517 Stack Overflow questions to assess the correctness, consistency, comprehensiveness, and conciseness of ChatGPT's answers. The US academics also conducted linguistic and sentiment analysis of the answers, and questioned a dozen volunteer participants on the results generated by the model.

"Our analysis shows that 52 percent of ChatGPT answers are incorrect and 77 percent are verbose," the team's paper concluded. "Nonetheless, ChatGPT answers are still preferred 39.34 percent of the time due to their comprehensiveness and well-articulated language style." Among the set of preferred ChatGPT answers, 77 percent were wrong.
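Multiplying the two preference figures gives a rough sense of how often a wrong ChatGPT answer won outright. This back-of-the-envelope sketch assumes both percentages are taken over the same set of participant judgments. The excerpt does not state the denominators, so treat the result as approximate:

```latex
\begin{aligned}
P(\text{ChatGPT preferred and wrong})
  &= P(\text{preferred}) \times P(\text{wrong} \mid \text{preferred}) \\
  &\approx 0.3934 \times 0.77 \approx 0.30
\end{aligned}
```

That is roughly three judgments in ten where participants picked an answer that was incorrect.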

OpenAI on the ChatGPT website acknowledges its software "may produce inaccurate information about people, places, or facts." We've asked the lab if it has any comment about the Purdue study.

The pre-print paper is titled, "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions." It was written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang.

"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," their paper stated. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."

Even when the answer has a glaring error, the paper stated, two out of the 12 participants still marked the response preferred. The paper attributes this to ChatGPT's pleasant, authoritative style.

"From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct," the paper explained.

Journal Reference:
Kabir, Samia, Udo-Imeh, David N., Kou, Bonan, et al. Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions, arXiv (DOI: 10.48550/arXiv.2308.02312)



Related Stories

Others' Code Sucks for Your Problems 19 comments

Other people's code might look good at a glance but is wrong at least half the time. AI or help sites don't offer as much help as one would think.

"Among other findings, the authors found ChatGPT is more likely to make conceptual errors than factual ones. 'Many answers are incorrect due to ChatGPT's incapability to understand the underlying context of the question being asked,' the paper found."

"Stack Overflow's annual Developer Survey of 90,000 coders recently found that 77 percent of developers are favorable of AI tools, but only 42 percent trust the accuracy of those tools. OverflowAI developed with community at the core and with a focus on the accuracy of data and AI-generated content."

https://www.theregister.com/2023/08/07/chatgpt_stack_overflow_ai/
https://arxiv.org/abs/2308.02312



  • (Score: 5, Insightful) by turgid on Sunday August 20 2023, @01:58PM (18 children)

    by turgid (4318) Subscriber Badge on Sunday August 20 2023, @01:58PM (#1321086) Journal

    ChatGPT is going to be used for coding by the same people who copy and paste from stackoverflow. Superficially, it will make them look very productive to their managers.

    The managers will encourage the use of ChatGPT and soon the number of people directly employed in coding will decrease. However, these things will cause a huge amount of latent trouble.

    The copy-and-pasters do so because they do not understand what they are doing. They do not understand what they are being asked to do or how to break it down into simple problems to be solved, and often do not know enough about the languages they're using, the operating system, libraries, data structures and algorithms to be able to make informed decisions about what they're doing.

    They do not understand enough to be able to ask better questions to get better requirements or to learn what they need to learn. ChatGPT automates this process.

    I predict an awful lot of very bad, broken code being written by ChatGPT soon, and deployed in production by people incapable of understanding any of this, or indeed unwilling to understand because that would burst their cheap coding bubble.

    In a few more years, the world will grind to a halt and there will be plenty of opportunities for people like us to make a lot of money putting things right. Sit back and enjoy the spectacle with the proverbial popcorn.

    • (Score: 2, Insightful) by Anonymous Coward on Sunday August 20 2023, @02:21PM (4 children)

      by Anonymous Coward on Sunday August 20 2023, @02:21PM (#1321091)

      I think overall you are correct, but I also think that there are a fair number of people who do generally know what they're doing, but still copy from Stackoverflow and will use ChatGPT out of convenience. For instance, say one needs to write some code doing something they haven't done a lot of. They could spend the time to break things down, do research, etc., or they could copy an example and move on. The same way people rely on libraries now and will import the whole library out of convenience (laziness?) instead of pulling in only what they need (or writing it themselves).

      • (Score: 5, Funny) by krishnoid on Sunday August 20 2023, @04:46PM (1 child)

        by krishnoid (1156) on Sunday August 20 2023, @04:46PM (#1321126)

        Well put, I think in general that's an accurate assessment. However, I additionally suspect that there are a lot of roughly competent developers who still cut and paste from technology question-and-answer online resources and use GitHub Copilot out of convenience. As an example, say you are tasked with writing software that performs an unfamiliar set of operations. You could engage in analyzing the request, investigate further, etc., or duplicate an existing sample and continue to the rest of the task. This would be similar to how libraries are currently used, where developers will import the entire library out of convenience (laziness?) instead of just extracting the required routines or recreating it themselves.

        • (Score: 0) by Anonymous Coward on Sunday August 20 2023, @04:48PM

          by Anonymous Coward on Sunday August 20 2023, @04:48PM (#1321127)

          It took me a minute, but I eventually got it. :)

      • (Score: 3, Interesting) by krishnoid on Sunday August 20 2023, @05:58PM

        by krishnoid (1156) on Sunday August 20 2023, @05:58PM (#1321139)

        When I've pulled a whole routine from Stack Overflow, I'll cite the link -- academic anti-plagiarism habits and all -- in a comment above the code. Also, if you're responsible for the correct operation of source code that you check in (within limits), maybe you'll double-check the stuff you get out of one of these systems. You know, being able to describe how the code works "in your own words." A culture of that could help keep the importing of ChatGPT code from getting out of control.

        Maybe that's a good use for more structured checkin comments; you're allowed to cut and paste in OpenAI code, but you have to tag it as AI-generated and describe how it works in the checkin comment or else it'll get flagged during a quality audit. The worse you describe it, the more the finger points back at you when it breaks.
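A minimal sketch of the check described above, assuming hypothetical commit-message trailers named `AI-Generated:` and `How-It-Works:` and an arbitrary ten-word minimum for the explanation; none of these is an existing convention or tool. It could run as a Git `commit-msg` hook (Git passes the path of the proposed message file, and a non-zero exit aborts the commit), or the same `check_commit_message` function could be run over history during a later audit:

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: AI-tagged commits must explain the code."""
import re
import sys

# Hypothetical trailer names and threshold; not an existing Git convention.
AI_TAG = re.compile(r"^AI-Generated:[ \t]*(yes|true)[ \t]*$", re.IGNORECASE | re.MULTILINE)
EXPLANATION = re.compile(r"^How-It-Works:[ \t]*(\S.*)$", re.IGNORECASE | re.MULTILINE)
MIN_EXPLANATION_WORDS = 10  # arbitrary cut-off for "in your own words"


def check_commit_message(message: str) -> list[str]:
    """Return the problems found; an empty list means the message passes."""
    problems: list[str] = []
    if AI_TAG.search(message):
        match = EXPLANATION.search(message)
        if match is None:
            problems.append("AI-generated code must include a How-It-Works: line")
        elif len(match.group(1).split()) < MIN_EXPLANATION_WORDS:
            problems.append("How-It-Works: explanation is too short")
    return problems


def main(argv: list[str]) -> int:
    # Git runs a commit-msg hook with one argument: the path of the file
    # holding the proposed message. A non-zero exit status aborts the commit.
    with open(argv[1], encoding="utf-8") as f:
        message = f.read()
    problems = check_commit_message(message)
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Only act when Git supplies the message-file path.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as `.git/hooks/commit-msg` and made executable, it would reject an AI-tagged commit whose explanation is missing or too thin; untagged commits pass untouched. A word count is a crude proxy for "describes how it works", which is why the comment's suggestion of a human quality audit still matters.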

      • (Score: 3, Insightful) by sjames on Sunday August 20 2023, @07:56PM

        by sjames (2882) on Sunday August 20 2023, @07:56PM (#1321154) Journal

        Perhaps so. They will fail because of the other problem with ChatGPT answers: when it's wrong, it tends to be very plausibly wrong. It can actually be hard to see where and why it is wrong because its wrong answers are very fluent.

    • (Score: 3, Interesting) by JoeMerchant on Sunday August 20 2023, @03:01PM (5 children)

      by JoeMerchant (3937) on Sunday August 20 2023, @03:01PM (#1321100)

      Coding is actually irrelevant

      >get better requirements

      That's the crux of the human developer value proposition these days. All those algorithms and methods they taught in software school for the last 50 years (and probably still do) are only marginally valuable knowledge today, knowing they exist and how to look them up and apply them is what really matters, and I bet ChatGPT's success rate choosing good sort algorithms and demonstrating their implementation is considerably higher than 52%.

      A frequent problem I used to solve was embodied in this type of exchange:

      Customer says: "I want X so I can Y."

      I explain: "X doesn't really exist, we have A and/or B which can Y, but, are you really trying to Y? Seems to me that you would get more value out of Z."

      "Oh, you can Z? I didn't know that was possible.". Often followed by: "We have never Zed before, I don't see the value."

      When feasible, I then give them a prototype that Zs to play with. They either ignore it, or they play with it and see the value, and then we develop the Z idea as a dramatically improved way of meeting the needs traditionally served by Y.

      The software that Zs is like plumbing and wiring in a building, you have to have it, but where and how you build a building is usually far more important than how it is plumbed or wired. I'm happy to have AI do as much of the standard plumbing and wiring work as it is capable of, the final inspection should show any problems in its work.

      --
      🌻🌻🌻
      • (Score: 5, Informative) by choose another one on Sunday August 20 2023, @03:53PM (2 children)

        by choose another one (515) on Sunday August 20 2023, @03:53PM (#1321114)

        Yes, but is this really a _developer_ value proposition or did you just reinvent the business analyst ??

        Let's face it, bridging the gap between "what we can make the computer do" and "what the user wants or needs (not necessarily the same thing)" has been a (human) skillset and value proposition for decades.

        Dev methodologies like Scrum try to bridge the gap by throwing "might this be what you want?" stuff from devs to users on a very early and very regular basis, but in the end this is very heavy on user engagement time and costs, meaning you usually end up with someone effectively being a knowledgeable user proxy interfacing with the dev team - aka a business analyst.

        Business Analyst, Product Manager, Product Owner - I've been called all three at various times, and I think there is more overlap between the roles than most people think. I've seen articles that attempt to define the differences between some of those roles and come out diametrically opposed (i.e. one says X does A and Y does B, another says X does B while Y does A). Conclusion: overlap, unclear definitions, put whichever sounds best on your CV. I've even seen articles where people asked ChatGPT to define and contrast the roles; the result wasn't much better. _Sounded_ good though.

        Overall conclusion: ChatGPT should be in sales.

        • (Score: 3, Informative) by sjames on Sunday August 20 2023, @08:01PM

          by sjames (2882) on Sunday August 20 2023, @08:01PM (#1321155) Journal

          There's a reason why the senior developer positions used to be called programmer/analyst.

        • (Score: 2) by JoeMerchant on Sunday August 20 2023, @08:26PM

          by JoeMerchant (3937) on Sunday August 20 2023, @08:26PM (#1321158)

          ChatGPT has been in sales since the day it was released, and before. The old job of fluff-article writer for hire has also been subsumed by ChatGPT, and ChatGPT often turns out better copy than what they used to pay a penny a word for.

          --
          🌻🌻🌻
      • (Score: 2) by mhajicek on Sunday August 20 2023, @05:11PM (1 child)

        by mhajicek (51) on Sunday August 20 2023, @05:11PM (#1321133)

        I would like to buy some Z, please.

        --
        The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
        • (Score: 3, Funny) by stormreaver on Sunday August 20 2023, @11:17PM

          by stormreaver (5101) on Sunday August 20 2023, @11:17PM (#1321174)

          I would like to buy some Z, please.

          I've been running short on z's for years. Interestingly, I started noticing my deficit after I had kids. I think they're stealing them from me.

    • (Score: 2) by DadaDoofy on Sunday August 20 2023, @03:30PM (5 children)

      by DadaDoofy (23827) on Sunday August 20 2023, @03:30PM (#1321109)

      I believe this is correct. Marketing types have been selling low code and no code for at least the last 35 years. It has always failed to live up to the hype. ChatGPT is probably the closest thing to something semi-useful.

      You are dead on about the clean-up that will be necessary as managers push this code into production and hope for the best. In most organizations, management gambles instead of investing in proper testing. Sometimes they win, and a lot of times they lose, at which point high-priced consultants are brought in to save the day. So yeah, rather than being a threat to human coders, it's actually job security for coders who know what they are doing.

      • (Score: 2) by HiThere on Sunday August 20 2023, @04:56PM (4 children)

        by HiThere (866) on Sunday August 20 2023, @04:56PM (#1321132) Journal

        No. Many of the prior approaches were dead accurate within a very limited domain, and didn't work at all outside that domain. ChatGPT always appears to work, and the domain where it's valid has not been specified.

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 2) by DadaDoofy on Sunday August 20 2023, @08:50PM (3 children)

          by DadaDoofy (23827) on Sunday August 20 2023, @08:50PM (#1321162)

          "Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer."

          Yeah so, I'm reasonably sure you have not reviewed the several million lines of code that make up the operating system you allow to run your computer. What am I missing here? That a "known" third party is harmless?

          • (Score: 2) by HiThere on Sunday August 20 2023, @11:17PM (2 children)

            by HiThere (866) on Sunday August 20 2023, @11:17PM (#1321173) Journal

            Not personally, but it's Linux...several different groups of people have.

            --
            Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
            • (Score: 2) by DadaDoofy on Sunday August 20 2023, @11:42PM (1 child)

              by DadaDoofy (23827) on Sunday August 20 2023, @11:42PM (#1321177)

              Sure, but "several different people" probably "reviewed" the JavaScript that runs in your browser as well. I'm missing the distinction.

              • (Score: 2) by HiThere on Monday August 21 2023, @02:07AM

                by HiThere (866) on Monday August 21 2023, @02:07AM (#1321189) Journal

                No, that's not necessarily true at all. A lot of dynamically loaded libraries never get properly reviewed, or at least that's what the news over the last several years leads me to believe. Including some this month.

                --
                Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 3, Informative) by Opportunist on Sunday August 20 2023, @04:11PM

      by Opportunist (5545) on Sunday August 20 2023, @04:11PM (#1321119)

      Maybe so, but it's great job security for me.

      Yours,
      Security consultant and pentester

  • (Score: 4, Insightful) by JoeMerchant on Sunday August 20 2023, @02:28PM (6 children)

    by JoeMerchant (3937) on Sunday August 20 2023, @02:28PM (#1321094)

    52% is better than most developers I know. If, after two iterations that rate increases to 75% or better, that's a valuable assistant.
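The arithmetic behind "after two iterations" can be sketched under two generous assumptions: each attempt succeeds independently, and a correct answer can be recognized when it appears (the study suggests users often miss subtle errors). The study reports 52 percent of answers incorrect, so the per-attempt success rate is p = 0.48, and the chance of at least one correct answer in n attempts is:

```latex
\begin{aligned}
P_n &= 1 - (1 - p)^n \\
P_2 &= 1 - 0.52^2 = 1 - 0.2704 = 0.7296 \\
P_3 &= 1 - 0.52^3 = 1 - 0.140608 \approx 0.859
\end{aligned}
```

Under these assumptions two attempts land just under 75 percent and three clear it comfortably; reading 52 percent as the success rate instead gives P_2 = 1 - 0.48^2 = 0.7696.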

    --
    🌻🌻🌻
    • (Score: 4, Insightful) by Opportunist on Sunday August 20 2023, @04:13PM (5 children)

      by Opportunist (5545) on Sunday August 20 2023, @04:13PM (#1321120)

      52% is a coin flip. It's better than asking your manager for his opinion, granted, but if you call yourself a programmer and get 52% of your tasks wrong, you should be fired.

      • (Score: 2) by krishnoid on Sunday August 20 2023, @04:48PM

        by krishnoid (1156) on Sunday August 20 2023, @04:48PM (#1321128)

        Yeah, but if 52% of the time it works every time, that's hard to beat.

      • (Score: 0) by Anonymous Coward on Sunday August 20 2023, @07:18PM (2 children)

        by Anonymous Coward on Sunday August 20 2023, @07:18PM (#1321151)

        If a train leaves station A going at .. and a train leaves station B going at... compute the point at which they will intersect.

        Here's a coin. Show your work.
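For the record, the textbook version has a closed-form answer with no coin required. Assuming, hypothetically, that the trains depart at the same moment from stations a distance D apart on the same line and travel toward each other at constant speeds v_A and v_B (the comment leaves the actual values out), they meet at time t* at distance x* from station A:

```latex
\begin{aligned}
t^\ast &= \frac{D}{v_A + v_B} \\
x^\ast &= v_A \, t^\ast = \frac{v_A \, D}{v_A + v_B}
\end{aligned}
```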

        • (Score: 2) by istartedi on Sunday August 20 2023, @09:27PM (1 child)

          by istartedi (123) on Sunday August 20 2023, @09:27PM (#1321166) Journal

          Trick question. The answer is "Never, assuming you have double tracks and/or a properly functioning signal system and engineers who aren't drunk".

          --
          Appended to the end of comments you post. Max: 120 chars.
          • (Score: 2) by Freeman on Monday August 21 2023, @06:55PM

            by Freeman (732) on Monday August 21 2023, @06:55PM (#1321288) Journal

            You're assuming they're not on the same track and not heading towards each other. Lots of assumptions to be had here.

            --
            Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
      • (Score: 2) by JoeMerchant on Sunday August 20 2023, @08:28PM

        by JoeMerchant (3937) on Sunday August 20 2023, @08:28PM (#1321160)

        I get more than 52% of my tasks wrong all day long. I don't hand it off to the next person on the team until I've got it at least 90% right...

        --
        🌻🌻🌻
  • (Score: 5, Funny) by sigterm on Sunday August 20 2023, @02:30PM (3 children)

    by sigterm (849) on Sunday August 20 2023, @02:30PM (#1321095)

    "Algorithm generating sentences by scoring words according to their occurrence in training data, fails to produce functioning computer programs when prompted."

    I'm shocked.

  • (Score: 5, Funny) by looorg on Sunday August 20 2023, @03:06PM (2 children)

    by looorg (578) on Sunday August 20 2023, @03:06PM (#1321102)

    Did ChatGPT write this? Cause this was on the frontpage about 10 days ago. Just with another headline. But the same paper. The same conclusion.

    https://soylentnews.org/article.pl?sid=23/08/11/1049207

    • (Score: 3, Informative) by Thexalon on Monday August 21 2023, @11:38AM

      by Thexalon (636) on Monday August 21 2023, @11:38AM (#1321235)

      Well, Soylent was aiming to be like the green site used to be, and dupes are very much part of the tradition!

      --
      "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
    • (Score: 2) by DannyB on Monday August 21 2023, @09:16PM

      by DannyB (5839) Subscriber Badge on Monday August 21 2023, @09:16PM (#1321306) Journal

      ChatGPT doesn't have to write the articles. It merely has to accept the articles to appear on the front page.

      --
      The server will be down for replacement of vacuum tubes, belts, worn parts and lubrication of gears and bearings.
  • (Score: 3, Insightful) by VLM on Monday August 21 2023, @03:09PM

    by VLM (445) Subscriber Badge on Monday August 21 2023, @03:09PM (#1321251)

    The problem with SO, aside from being turbotoxic for years, is the karma farmers.

    ChatGPT is used by the farmers to farm, making useless 'answers'. That's why traffic to SO is declining. Chatbots are ruining the supply side not the demand side, and without a useful supply to middleman, the site has no meaning and the users don't go there.

  • (Score: 2) by DannyB on Monday August 21 2023, @09:15PM (2 children)

    by DannyB (5839) Subscriber Badge on Monday August 21 2023, @09:15PM (#1321305) Journal

    Do you remember:

    The Last One

    or how about this one . . .

    Savvy

    Maybe I'm getting too old. Remember ads that ran for months, getting everyone excited that programmers would soon be obsolete!

    --
    The server will be down for replacement of vacuum tubes, belts, worn parts and lubrication of gears and bearings.
    • (Score: 3, Insightful) by maxwell demon on Tuesday August 22 2023, @04:11AM (1 child)

      by maxwell demon (1608) on Tuesday August 22 2023, @04:11AM (#1321338) Journal

      Programmers are obsolete. Nowadays we have software developers. :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2) by DannyB on Tuesday August 22 2023, @03:24PM

        by DannyB (5839) Subscriber Badge on Tuesday August 22 2023, @03:24PM (#1321376) Journal

        I am too old. I don't know which term to use any longer.

        --
        The server will be down for replacement of vacuum tubes, belts, worn parts and lubrication of gears and bearings.