SoylentNews is people

posted by janrinok on Saturday May 30, @08:02AM

The rapid introduction of AI-generated code is increasingly leading to production failures, according to a survey.

In a study by the software company CloudBees, more than 200 technology executives were surveyed about the use of AI in their companies. 81 percent reported problems such as functional errors, security vulnerabilities, and performance issues after deployment that are related to AI-generated code. 63 percent additionally reported compliance violations caused by the AI. These also sometimes made their way into production.

One issue appears to be, according to CloudBees' survey results, that testers can no longer keep up with validating AI code. 62 percent increased automated tests, 30 percent added more manual verification steps. However, only half believe that the formal review processes for AI code are truly always applied in their company. For many, managing the test environment has become a greater burden than writing the code itself.

[...] Classic software engineering is based on systems that deliver identical results for identical inputs. Generative AI, on the other hand, works with probabilities and can produce different variants of the same code even with consistent logic. This stochastic behavior leads to problems, particularly where one hundred percent accuracy is a central criterion; for example, in security-critical development environments.
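
The contrast between deterministic and probabilistic output can be illustrated with a toy sketch. It assumes a hypothetical three-way distribution over candidate completions; the names `CANDIDATE_PROBS`, `greedy_pick`, and `sampled_pick` are invented for illustration and come from no real model or library. Always taking the most probable candidate gives the same result on every run, while sampling in proportion to probability gives varying results for the same input.

```python
import random

# Hypothetical toy "model": fixed probabilities for three equivalent completions.
CANDIDATE_PROBS = {
    "return a + b": 0.6,
    "return b + a": 0.3,
    "return sum((a, b))": 0.1,
}


def greedy_pick(probs):
    """Deterministic: always choose the most probable candidate."""
    return max(probs, key=probs.get)


def sampled_pick(probs, rng):
    """Stochastic: draw one candidate in proportion to its probability."""
    candidates = list(probs)
    weights = [probs[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def distinct_outputs(pick, runs):
    """Collect the set of different outputs seen over repeated runs."""
    return {pick() for _ in range(runs)}


rng = random.Random()  # unseeded, so the draws differ from run to run
greedy_results = distinct_outputs(lambda: greedy_pick(CANDIDATE_PROBS), 1000)
sampled_results = distinct_outputs(lambda: sampled_pick(CANDIDATE_PROBS, rng), 1000)

print("distinct greedy outputs: ", len(greedy_results))
print("distinct sampled outputs:", len(sampled_results))
```

In this sketch all three variants are functionally equivalent, which mirrors the "different variants of the same code" point; passing a fixed seed to `random.Random` would make the sampled sequence reproducible, at the cost of pinning one particular draw.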

[Source]: heise online



  • (Score: 4, Informative) by turgid on Saturday May 30, @08:49AM (2 children)

    by turgid (4318) Subscriber Badge on Saturday May 30, @08:49AM (#1443889) Journal

    What did you think was going to happen? More popcorn...

    • (Score: 5, Funny) by JoeMerchant on Saturday May 30, @12:31PM

      by JoeMerchant (3937) on Saturday May 30, @12:31PM (#1443906)

      Sounds to me like a whole lot of inexperienced software development managers just got a fresh crop of interns they don't know how to handle.

      --
      🌻🌻🌻🌻 [google.com]
    • (Score: 3, Funny) by aafcac on Saturday May 30, @08:53PM

      by aafcac (17646) on Saturday May 30, @08:53PM (#1443933)

      That they were all going to get fabulously wealthy before handing the problem off to somebody else.

  • (Score: 5, Insightful) by jb on Saturday May 30, @09:02AM (4 children)

    by jb (338) on Saturday May 30, @09:02AM (#1443892)

    All depends on what you mean by "AI".

    Non-determinism is indeed an issue with most (but not all: ESes provide the canonical counterexample) types of AI. But with some of those (for example ML), the non-determinism doesn't have to be a fatal flaw.

    My guess is that when the authors say "AI" what they really mean is "LLMs" which are not AI at all (no intelligence involved, artificial or otherwise). With LLMs the problem of non-determinism is vastly overshadowed by the fact that all they are designed to do is emit plausible-sounding mimicry. They lack understanding (which ESes have) and they lack adaptation (which both ML and GAs have, each in their own way).

    With all that missing, an LLM has almost no chance at all of coming up with the correct solution to any problem ... unless that particular problem has already been attempted many times before AND those previous attempts were overwhelmingly more often than not successful AND those previous attempts made it into the training data set AND the problem was defined in exactly the same way AND the context window is wider than the problem and its solution together, etc. etc. etc.

    Even if all of those preconditions are met, it should be blindingly obvious that simply reproducing the known good solution directly would STILL be more reliable than hoping that the LLM will do so. And of course most of the time those preconditions are not all met.

    A hammer can only be a useful tool if its user already knows how a nail should be driven in.

    Likewise, a decision support / problem solving tool can only be useful if its user already understands the problem domain and the nature of problem solving itself.

    The key difference is that using a hammer (instead of your fist) to drive in a nail has obvious benefits (you can repeat it all day without ending up with a painful and bloody fist), whereas using an LLM (instead of your brain) to "solve" a problem appears to have no real benefits at all.

    Bottom line: LLMs are toys, not tools. Those contributing to (or even just failing to object to) the myth that they could ever be tools have a lot to answer for.

    • (Score: 2) by VLM on Saturday May 30, @01:53PM (3 children)

      by VLM (445) on Saturday May 30, @01:53PM (#1443907)

      I agree with about 99% of your post, maybe more, but would extend your remarks with this demonstration:

      emit plausible-sounding mimicry

      I've had truly excellent results with LLMs with prompts like "A syllabus including several textbook recommendations for Go language assuming the students already know another C-like language."

      I've had awful results with prompts like "In tinygo. Connect the BME280 digital barometer output only to a MQTT server." Part of the problem is they really only added ESP32 support for wifi in tinygo like last month. Or maybe last winter. A while ago anyway. But still... if the LLM can't answer it, it shouldn't just make stuff up and try to pass it off as true as if it's a journalist or an academic.

      • (Score: 3, Insightful) by ikanreed on Sunday May 31, @06:16AM (2 children)

        by ikanreed (3164) on Sunday May 31, @06:16AM (#1443973) Journal

        The difference between those cases is whether you need to guarantee quality.

        With a book recommendation, the most essential quality control is at the point of creating the book, and merely recommending one is the easy job, and a syllabus is at most guide posts on the path of learning. A bad choice for either is easily circumvented with a good educator at the helm.

        In the second case, you need the code to do exactly what it says on the box, and the most essential quality control has to be at the point of code creation. An error in implementing a communication standard ruins the whole thing.

        In this way, I've come to think of LLMs as the "I don't give a shit about this" button. You can press it anytime you feel a task's results have no value beyond being "done" and its quality is someone else's problem.

        • (Score: 2) by jb on Sunday May 31, @09:27AM

          by jb (338) on Sunday May 31, @09:27AM (#1443984)

          With a book recommendation, the most essential quality control is at the point of creating the book, and merely recommending one is the easy job

          In theory perhaps, but not in practice. You'd be surprised how many terribly written (and sometimes even outright wrong) textbooks actually get published ... and how many of those somehow end up as the set text for some course or other. I've taught at three institutions. Saw that happen at two of them (only occasionally at one, but most of the time at the other).

          [The one where that never happened at all was an exception simply because they had a policy of "no textbooks" by default (just cite the relevant papers as you go; that's why we pay big money for institutional licenses from all the relevant publishers). Securing an exception to that policy required getting up and justifying in front of all the other academics in the faculty why your course absolutely had to have a textbook and why the one you proposed was the best choice (very few people asked for exceptions; and understandably those who did made very careful choices). It felt a bit weird when I first started teaching there, but it didn't take long to appreciate the wisdom of their policy.]

          So if you ever need to set a text for a course, for goodness sake read the damned thing first! And if it's not a field in which you're already well versed, just delegate the task to someone who is.

          and a syllabus is at most guide posts on the path of learning. A bad choice for either is easily circumvented with a good educator at the helm.

          Absolutely, couldn't agree more. But it's a waste of time (and rather demoralising for both staff & students) when you have to repeat "don't read the set text, it's wrong about X, Y & Z from today's class" in every single lecture (yes I had one course where that actually happened). Eventually I started reserving a block of time at the end of each lecture to point out the most egregious errors in the relevant chapter of the set text then hold a class discussion on how to prove them wrong. Did not make me popular with management (who set the texts, despite having little to no background in the subject matter, and steadfastly refused to change them), but at the end of the day it'd be me (not the delusional textbook author) setting the exam and I wanted my students to have a decent chance of passing it.

        • (Score: 2) by VLM on Sunday May 31, @03:32PM

          by VLM (445) on Sunday May 31, @03:32PM (#1444007)

          In the first example, especially for self education, there is no educator. Soon there will not be as many educators: "just replace them all with LLMs". However, it's plausible, if maybe over-anthropomorphized, that an LLM "read" all the book reviews and caught on to certain authors tagged "former C programmer" or prose like "as a former C programmer I liked..." and can provide a reasonably helpful suggestion customized to me. I liked the book that was suggested to me. I don't remember the title.

          In the second example, the LLM fails because there's examples online of, like, Zephyr OS having net_mgmt() calls to configure wifi (well, a simplification, on some boards, etc). It can't find anything for the new wifi support in tinygo so it makes up stuff that would look reasonable to a former ZephyrOS programmer. Which is not how tinygo, as implemented last month, works at all. So the response is a confident sounding hallucinatory dumpster fire.

          "I don't give a shit about this" button.

          I'd more or less agree but to put a finer point on it, it's the "I could assign this to the summer intern, who knows next to nothing, and with close supervision he might be able to handle it, or maybe not" button.

  • (Score: 5, Insightful) by Anonymous Coward on Saturday May 30, @12:26PM (6 children)

    by Anonymous Coward on Saturday May 30, @12:26PM (#1443905)

    So what they're saying is, this is like Offshoring or Outsourcing, 2.0?

    Gasp. Shock. Color me surprised. Who could have ever expected that great cost savings would lead to lower quality and, in the end, great cost increase, and re-hiring of technically competent developers to clean up the problems / rewrite software / throwing systems away. I couldn't possibly have guessed this would happen.

    • (Score: 3, Interesting) by VLM on Saturday May 30, @02:05PM (5 children)

      by VLM (445) on Saturday May 30, @02:05PM (#1443908)

      I can add to your list

      Every problem is best solved by elaborate object oriented simulations.

      Let's make every piece of code fit a "pattern"

      I'm still not sure what Agile XP means because IRL all it means is the manager pushing it is too stupid to manage for reals but he saw it on the cover of an IT management magazine so he's just going to seagull manage for awhile (fly in, shit on everything, fly out, it's someone else's problem to fix now)

      We will implement the solution NOT by programming, but by drawing a ridiculously complicated drawing that boils down to a visual program, then run the diagram thru a translator that spurts out infinitely long boilerplate that someone replaces by hand with ... programming.

      "We don't need programs because we have spreadsheets and spreadsheets don't need programmers". Yeah that works great LOL.

      "We don't need desktops, laptops, or servers anymore because we have phones and tablets"

      A lot of it comes from the addictive mindset. If a little is good then enormous industrial quantities must be better. A beer after work once in awhile is a great thing, so slammin a case every single night until liver failure must be even better. I lift weights and you see this a lot with roid nuts who go a little overboard on the juice... I would go as far as saying "Corn Syrup" is not the ultimate evil as long as you only consume like one slice of pecan pie only on Thanksgiving night. "Every liquid you drink contains corn syrup" is a highway to obesity and T2 diabetes obviously, but lots of people get addicted and follow that road...

      "a little goes a long way" is good advice. One AI prompt per week is likely highly profitable. One a day is likely a net gain, possibly not. "You need me to type your prompts into chat because... uh well trust me you need me as a middleman": that's a productivity disaster in the long run and even often in the short run.

      • (Score: 2) by VLM on Saturday May 30, @02:09PM

        by VLM (445) on Saturday May 30, @02:09PM (#1443909)

        "a little goes a long way" is good advice.

        Oooh I thought of another quote that's great for LLMs "Trust, but verify". They're wrong way too often to be blindly trusted without skilled professional verification. But they're correct often enough to be a decent starting point or "inspiration" on the path to the true solution.

      • (Score: 3, Insightful) by JoeMerchant on Saturday May 30, @04:27PM (2 children)

        by JoeMerchant (3937) on Saturday May 30, @04:27PM (#1443920)

        I agree that AI chat all day every day is insane. However, accepting the first response is worse: respond, revise, refine until the response starts making sense.

        --
        🌻🌻🌻🌻 [google.com]
        • (Score: 3, Informative) by VLM on Sunday May 31, @03:14PM (1 child)

          by VLM (445) on Sunday May 31, @03:14PM (#1444005)

          respond, revise, refine

          This reminds me of when desktop publishing lowered office productivity instead of increasing it.

          "Well, in the old days you'd spend 30 minutes with the mimeograph machine but with this new fangled mac and laserwriter you can replicate that work in 10 minutes" and then IRL it results in people spending 2 hours optimizing fonts and alignments and adding clip art instead of "F it, send it" back in the mimeograph days. Remember when a calibration log worksheet only took 30 minutes to create? Well, it takes two hours now. It looks great, but graphic art where it's not needed does not increase profits...

          • (Score: 2) by JoeMerchant on Sunday May 31, @04:50PM

            by JoeMerchant (3937) on Sunday May 31, @04:50PM (#1444014)

            It can go both ways, and I definitely find that bringing out the sledgehammer for the flies is inefficient, there are more nimble tools that hit the flies easier, faster and more efficiently.

            Then there are the times when the sledgehammer is exactly the right tool to pound a stake into the earth and trying to do it with lesser instruments ends up being a task of pre-excavation with small digging tools followed by placement of the stake and careful tamping of the earth back around it after placement, when two or three whacks with the big hammer would have done the job better, faster and cheaper - all 3 at once.

            --
            🌻🌻🌻🌻 [google.com]
      • (Score: 3, Informative) by Thexalon on Sunday May 31, @07:53PM

        by Thexalon (636) on Sunday May 31, @07:53PM (#1444041)

        I'm still not sure what Agile XP means because IRL all it means is the manager pushing it is too stupid to manage for reals but he saw it on the cover of an IT management magazine so he's just going to seagull manage for awhile

        In theory, "Agile" and "XP" mean that your developers can more easily respond to requirements changes in the middle of a project, because there are systems to adjust to those changes built into your regular team practices.
        In practice, they give management the idea that the more frequently they demand that developers report what they've done to management, the faster the project gets done.

        Both of these ideas continue to be wrong because they keep trying to fit the square peg of software development into the round hole of industrial production models and the optimization target of Feature List divided by Developer Salaries, with no regard for pretty much any other factor.

        --
        "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
  • (Score: 5, Funny) by istartedi on Saturday May 30, @02:47PM

    by istartedi (123) on Saturday May 30, @02:47PM (#1443913) Journal

    If you think about where the training data came from, there's a good chance that Stack Overflow might indirectly cause a stack overflow.

    --
    Appended to the end of comments you post. Max: 120 chars.
  • (Score: 5, Insightful) by kolie on Saturday May 30, @04:18PM (6 children)

    by kolie (2622) Subscriber Badge on Saturday May 30, @04:18PM (#1443918) Journal

    Before AI: Human writes code => Human doesn't test it => Production breaks.

    After AI: AI writes code => Human doesn't test it => Production breaks.

    If you massively increase the sheer volume of code being committed into production every day, the total number of bugs will go up proportionally. Why is anyone surprised that AI code is "increasingly leading to failures"? It's increasingly leading the percentage of code written, period. You can't replace the bottleneck of human typing speed with an AI firehose and expect your existing, overworked QA team to just magically absorb the impact.

    • (Score: 4, Interesting) by JoeMerchant on Saturday May 30, @04:55PM

      by JoeMerchant (3937) on Saturday May 30, @04:55PM (#1443923)

      The thing I find: before AI, human writes code, refines until it seems good, writes a few requirements, tests to the requirements (which of course pass), field reports bugs, code maintenance ensues.

      What I predict post AI: human writes specs, AI writes code and tests, refines until tests all pass. Field reports bugs, spec maintenance ensues.

      One thing that I do see being different post AI is that document synchrony should improve if people take the time to ask for it. Review and exposure of known vulnerabilities should also improve.

      --
      🌻🌻🌻🌻 [google.com]
    • (Score: 5, Insightful) by https on Sunday May 31, @12:28AM (2 children)

      by https (5248) on Sunday May 31, @12:28AM (#1443947) Journal

      What's a QA team?

      --
      Offended and laughing about it.
    • (Score: 2) by VLM on Sunday May 31, @03:21PM (1 child)

      by VLM (445) on Sunday May 31, @03:21PM (#1444006)

      I would agree with and extend your remarks WRT adding a 4th step, a human takes responsibility for the failure and tries to fix it. After all, the business depends on it working.

      In example 1, if they or a close-ish coworker wrote the code, they stand a fighting chance of actually fixing what they're responsible for fixing.

      In example 2, they can't write the code they're responsible for fixing; it's too expensive to keep that level of peeps on the payroll once programming is replaced with prompt writing. This turns into a major business management problem. Who's going to fix the security hole that's so complicated the prompt writers don't know how to describe it or how to avoid repeating the problem in the future? Well... they won't. Maybe they could hire me as a consultant?

      The question for AI generated code isn't really who wrote it, but who's responsible and capable of fixing it when it doesn't work? That's easy, almost free, when a human is hired to write it. For AI/LLMs there's just a shrinkwrap license saying that some megacorporation, perhaps not even in your country, is not responsible for it not working, so ... tough luck.

      • (Score: 2) by Thexalon on Sunday May 31, @08:06PM

        by Thexalon (636) on Sunday May 31, @08:06PM (#1444046)

        I would agree with and extend your remarks WRT adding a 4th step, a human takes responsibility for the failure and tries to fix it. After all, the business depends on it working.

        Oh, my sweet summer child: In dysfunctional organizations (a.k.a. most of them), humans avoid taking responsibility for the failure as much as they possibly can, because taking responsibility results in professional penalties. And in very dysfunctional organizations, they also avoid trying to fix it, because (a) that's work and everybody is trying to avoid that, and (b) if you figured out a way to fix it they will conclude that you must have caused it. When the dust settles, the only people who will get rewarded for the problem are the people that spent lots of time calling lots of meetings to talk about the problem. The person or people who worked from 8 AM to 5 AM the next day and fixed the problem will either get no credit at all or will be penalized for their role in causing it (see point b).

        --
        "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
  • (Score: 4, Insightful) by PinkyGigglebrain on Sunday May 31, @12:06AM

    by PinkyGigglebrain (4458) on Sunday May 31, @12:06AM (#1443945)

    ... more than 200 technology executives were surveyed about the use of AI in their companies. 81 percent admitted to problems such as functional errors, security vulnerabilities, and performance issues after deployment that are related to AI-generated code. 63 percent additionally reported compliance violations caused by the AI. ...

    FTFY

    problems are like cockroaches, for every one you see there are hundreds still hidden in the walls (of code).

    --
    "Beware those who would deny you Knowledge, For in their hearts they dream themselves your Master."
  • (Score: 3, Funny) by Anonymous Coward on Sunday May 31, @05:48AM

    by Anonymous Coward on Sunday May 31, @05:48AM (#1443971)

    from the no-shit-sherlock dept.
