posted by hubie on Friday January 05, @03:58AM

Arthur T Knackerbracket has processed the following story:

The kind of benchmark that IT normally worries about isn't without importance. How fast a particular data set is learned, how quickly prompts can be processed, what resources are required and how it all scales. If you're creating an AI system as part of your business, you'd better get those things right, or at the least understand their bounds.

They don't much matter otherwise, although you can be sure marketing and pundits will disagree. The fact it doesn't matter is a good thing: benchmarks too often become targets that distort function, and they should be kept firmly in their kennels.

The most important benchmark for AI is how truthful it is or, more usefully, how little it is misrepresented by those who sell or use it. As Pontius Pilate was said to have said 2,000 years ago, what is truth? There is no benchmark. Despite the intervening millennia and infinite claims to have fixed this, there still isn't. The most egregious of liars can command the support of nations in the midst of what should be a golden age of reason. If nobody's prepared or able to stop them, what chance have we got to keep AI on the side of the angels?

The one mechanism that's in with a chance is that curious synthesis of regulatory bodies and judicial systems which exists – in theory – outside politics but inside democratic control. Regulators set standards, the courts act as backstop to those powers and adjudicators of disputes.

[...] Which takes us to the regulators. It should be that the more technical and measurable the field being regulated, the easier the regulator's job is. If you're managing the radio spectrum or the railways, something going wrong shows up quickly in the numbers. Financial regulators, operating in the miasma of capital economics and corporate misdirection, go through a cycle of being weakened in the name of strong growth until everything falls apart and a grand reset follows. Wince and repeat. Yet very technical regulators can go wrong, as with the FAA and Boeing's 737 MAX. Regulatory capture by their industries or the politicians is a constant threat. And sometimes we just can't tell – GDPR has been with us for five years. Is it working?

[...] It is here that the nature of AI may hint at a regulatory foothold in responsibly integrating machines with the affairs of humanity. There is not just one AI, there are multitudes of models, of hardware platforms, of approaches, of experiments. They are machines, and we can make as many as we need. Ultimate truth is ultimately unknowable, but a workable consensus is achievable – or even a workable majority.

If you have a critical task where an AI is involved and there's no way to immediately spot a blooper, get another one in parallel. And another. Compare answers. If you can't find enough independent AI for a particular problem, don't use AI until you can.
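
As a rough sketch of how "compare answers" could work in practice (a minimal illustration under assumptions, not anything a vendor ships; the model callables are hypothetical stand-ins for whatever independent services you actually query), majority voting over several models takes only a few lines of Python:

    from collections import Counter

    def consensus_answer(prompt, models, min_agreement=2):
        """Ask several independent models the same prompt and accept an answer
        only when enough of them agree; otherwise flag it for a human."""
        answers = [model(prompt) for model in models]
        # Normalise lightly so formatting differences don't hide agreement.
        normalised = [a.strip().lower() for a in answers]
        winner, votes = Counter(normalised).most_common(1)[0]
        return winner if votes >= min_agreement else None

    # Toy stand-ins for three independent AI services (entirely hypothetical).
    models = [lambda p: "Paris", lambda p: "paris", lambda p: "Lyon"]
    print(consensus_answer("Capital of France?", models))  # -> "paris"

If there is no workable majority, the function returns nothing at all, which is the honest outcome: don't use the answer until you can get enough independent agreement.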

Redundancy is a powerful weapon against error. Apollo got to the Moon not because the systems were perfect, but because they had redundancy in place in the expectation of failure. The Soviet manned lunar effort eschewed that in favor of what looked like expediency, but ended in ignominy.

We don't have to trust AI, which is just as well – what is truth? It knows no more than we do. But we have fashioned a workable society around systems that trust and verify, and we see this working in judges as well as jet planes. The philosophy, potential, and pitfalls of our increasingly thoughtful companions will work out over time, we should just set the odds in our favor while we can. Whatever we do, we won't be able to wash our hands of it.


Original Submission

Related Stories

Exploring the Emergence of Technoauthoritarianism

The theoretical promise of AI is as hopeful as the promise of social media once was, and as dazzling as its most partisan architects project. AI really could cure numerous diseases. It really could transform scholarship and unearth lost knowledge. Except that Silicon Valley, under the sway of its worst technocratic impulses, is following the playbook established in the mass scaling and monopolization of the social web:

Facebook (now Meta) has become an avatar of all that is wrong with Silicon Valley. Its self-interested role in spreading global disinformation is an ongoing crisis. Recall, too, the company’s secret mood-manipulation experiment in 2012, which deliberately tinkered with what users saw in their News Feed in order to measure how Facebook could influence people’s emotional states without their knowledge. Or its participation in inciting genocide in Myanmar in 2017. Or its use as a clubhouse for planning and executing the January 6, 2021, insurrection. (In Facebook’s early days, Zuckerberg listed “revolutions” among his interests. This was around the time that he had a business card printed with I’M CEO, BITCH.)

And yet, to a remarkable degree, Facebook’s way of doing business remains the norm for the tech industry as a whole, even as other social platforms (TikTok) and technological developments (artificial intelligence) eclipse Facebook in cultural relevance.

The new technocrats claim to embrace Enlightenment values, but in fact they are leading an antidemocratic, illiberal movement.

[...] The Shakespearean drama that unfolded late last year at OpenAI underscores the extent to which the worst of Facebook’s “move fast and break things” mentality has been internalized and celebrated in Silicon Valley. OpenAI was founded, in 2015, as a nonprofit dedicated to bringing artificial general intelligence into the world in a way that would serve the public good. Underlying its formation was the belief that the technology was too powerful and too dangerous to be developed with commercial motives alone.



Original Submission

  • (Score: 4, Informative) by Anonymous Coward on Friday January 05, @04:33AM (3 children)

    by Anonymous Coward on Friday January 05, @04:33AM (#1339141)

    from the comments

    I don't think it'll pass - there's a lot of genuinely useful stuff coming out, especially when it comes to user-assist stuff. But equally, I do suspect the hype bubble will crash soon.

    There was a recent article by Cory Doctorow which summed things up pretty nicely.

    https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/ [locusmag.com]

    The tl;dr version is that LLMs are both costly and unreliable. So you can't sell to the profitable high end (where reliability is key), and it's too expensive for the low end (where price is key).

    And once people realise that, the VC money is going to dry up pretty quickly...

    • (Score: 1, Insightful) by Anonymous Coward on Friday January 05, @07:06AM (1 child)

      by Anonymous Coward on Friday January 05, @07:06AM (#1339144)

      The fight for a sensible definition of "AI" is long lost. You have to use something like "strong AI" or "artificial general intelligence" instead.

      Optimization could get the cost of LLMs down. The smaller ones can already be made to run on smartphones. Reliability is a tough nut to crack. The bubble bursting would give people time to slow down and work on new approaches that don't need to hit the market immediately.

      • (Score: 4, Insightful) by HiThere on Friday January 05, @02:26PM

        by HiThere (866) Subscriber Badge on Friday January 05, @02:26PM (#1339165) Journal

        It's not only long lost, it was lost from the beginning, because there's no good operational definition of "intelligence". I tend to think of "intelligence" as a gradient, with something like a thermostat + air conditioner at one end and the other end unknown. Currently, computers are better at lots of things that people used to use as signs of intelligence.

        Actually, it's probably not just a gradient, but a gradient across multiple dimensions.

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 2) by ls671 on Friday January 05, @09:18AM

      by ls671 (891) Subscriber Badge on Friday January 05, @09:18AM (#1339150) Homepage

      LOL! Is it related to the "passage of time"?

      --
      Everything I write is lies, including this sentence.
  • (Score: 4, Insightful) by Snospar on Friday January 05, @07:46AM (6 children)

    by Snospar (5366) Subscriber Badge on Friday January 05, @07:46AM (#1339146)

    Most of the article is in the summary so don't feel you have to click through (who does?). It's a nicely written piece and raises some interesting points. My own experience of AI (and I also detest that name for what we have here) is one of frustration: if I ask Google Bard about something where I already have some knowledge, I spend my time feeding back corrections. This happens so frequently that I know I can't trust any of the answers it gives me without double-checking elsewhere.

    It's a bit like mainstream media in that regard: when you know a topic and it's in the news, you tend to hear them spout a lot of rubbish or get things completely wrong. When you don't know a topic it all sounds very reasonable, but why would you trust it when you know the mistakes they make elsewhere?

    Training an AI to check an AI... doesn't sound like a solution to me, just a chance for more AI money to be made.

    --
    Huge thanks to all the Soylent volunteers without whom this community (and this post) would not be possible.
    • (Score: 1, Insightful) by Anonymous Coward on Friday January 05, @08:45AM

      by Anonymous Coward on Friday January 05, @08:45AM (#1339149)

      Google Bard was the only LLM I tried to use extensively, and boy was it bad. Soured me on the concept. I knew ChatGPT was better at the time but the hallucination problem isn't going to magically disappear.

      AI business plan: fake it until you make it, with some much better method that costs billions in R&D.

    • (Score: 2) by ls671 on Friday January 05, @09:25AM

      by ls671 (891) Subscriber Badge on Friday January 05, @09:25AM (#1339152) Homepage

      Training an AI to check an AI... doesn't sound like a solution to me, just a chance for more AI money to be made.

      That's the concept I guess: unlimited, recursive, logarithmically growing profits! You will then have an AI that checks the AI that checks the AI, AIs made to circumvent AIs checking AIs, and so forth.

      --
      Everything I write is lies, including this sentence.
    • (Score: 3, Funny) by OrugTor on Friday January 05, @04:40PM (1 child)

      by OrugTor (5147) on Friday January 05, @04:40PM (#1339182)

      Thank you for mentioning the quality of the writing. What a change to see superb use of the language and well-organized analysis. My only concern is, what if it turns out to be AI-generated?

      • (Score: 2) by Snospar on Friday January 05, @07:49PM

        by Snospar (5366) Subscriber Badge on Friday January 05, @07:49PM (#1339223)

        I did wonder about that myself and quickly read through some of the other articles published under the same name - if it is AI then it's done a beautiful job of copying earlier style. Maybe that should be a red flag. Aaaaah! What's real!?!

        --
        Huge thanks to all the Soylent volunteers without whom this community (and this post) would not be possible.
    • (Score: 2) by Rich on Friday January 05, @05:21PM

      by Rich (945) on Friday January 05, @05:21PM (#1339194) Journal

      It's a nicely written piece and raises some interesting points

      I thought differently, all the more so when they mentioned the Soviet moon rocket. The use of two Latin words in one sentence doesn't make it right. The N1, in fact, had MORE redundancy than the Saturn V. In theory, it could take more engine failures (4 of 30). The management system was called "KORD" (which then, in its crudeness, IIRC screwed up an entire launch). An issue was that they couldn't TEST the engines, because they had some single-use valves for simplicity.

      The remainder of the article also looked much like the usual clueless blah. It all boils down to the fact that ML models cannot act according to well-defined rules. If you knew the rules, you'd not need the model (with the exception of a neural net that is verified for its entire set of inputs, but operates more efficiently than applying the rules). The rest is probabilities, and there the trolley-problem wankage of the soft-skillers can begin over whether an ML auto-drive can be deployed if it predictably kills 30 people per year - where human drivers would kill 300.

    • (Score: 1) by khallow on Saturday January 06, @06:43PM

      by khallow (3766) Subscriber Badge on Saturday January 06, @06:43PM (#1339360) Journal

      Training an AI to check an AI... doesn't sound like a solution to me, just a chance for more AI money to be made.

      Actually, it's the next step in developing better AI. The more of the process you can automate, the lower the overall cost. For example, suppose you have worked on a chess-playing program. Do you hand it off to the grandmaster chess player who's been twiddling his thumbs for the past six months and who will be able to give you his opinion in a few days? Or just run a few thousand chess games against an existing chess program?
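
      A toy sketch of that automated-check idea (mine, assuming the python-chess library, with a random mover standing in for both the candidate and the existing program):

      import random
      import chess

      # Score from White's point of view; unfinished games count as draws.
      WHITE_SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0, "*": 0.5}

      def random_mover(board):
          # Placeholder "engine": any legal move. Swap in the real programs here.
          return random.choice(list(board.legal_moves))

      def play_game(white, black, max_plies=200):
          board = chess.Board()
          players = [white, black]
          for ply in range(max_plies):
              if board.is_game_over():
                  break
              board.push(players[ply % 2](board))
          return board.result(claim_draw=True)

      def evaluate(candidate, baseline, games=1000):
          # Alternate colours so the first-move advantage cancels out.
          score = 0.0
          for i in range(games):
              if i % 2 == 0:
                  score += WHITE_SCORE[play_game(candidate, baseline)]
              else:
                  score += 1.0 - WHITE_SCORE[play_game(baseline, candidate)]
          return score / games  # fraction of available points the candidate scored

      print(evaluate(random_mover, random_mover, games=10))  # roughly 0.5

      No grandmaster required until the number actually looks interesting.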

  • (Score: 5, Interesting) by acid andy on Friday January 05, @05:14PM (3 children)

    by acid andy (1683) on Friday January 05, @05:14PM (#1339191) Homepage Journal

    If you have a critical task where an AI is involved

    Then I'd suggest you're probably doing it wrong. Best not to use them for critical tasks, at least not for a few more years.

    These generative AIs I would argue are best suited to the more creative, social and marketing roles. Us nerds are unimpressed because they're bad at STEM, but when you stop to think how little most managers value those skills and how bad most employees are at things like logic and critical thinking, it makes more sense that the managers see uses for these AIs. So many jobs out there from a nerd perspective are an illogical waste of time and money. The AIs can do those. And they will do jobs like artwork (they already do), basic creative writing (they already do), maybe work as bad receptionists, customer service (they already do), hospitality, maybe even some types of psychotherapy (though that's arguably a bit too safety critical). I can see they may have uses to just help with brainstorming in a design process or for marketing, just providing lots of suggestions and ideas that humans can develop.

    Regarding the "hallucinations", I expect this will be improved in the years to come. I imagine one solution would be developing them to do what a sensible human does when they need to be sure of something -- look it up in a reliable reference source, which only needs to be in short-term memory, and fact-check it. Perhaps they could be trained to alter their statements to give an indication of how sure they are of a statement they make, though obviously that in itself shouldn't give confidence as it could still be wrong. If they admit they are unsure about something though, that at least might make some of the errors quicker to spot. Here I'm thinking that the things they could learn to feel more sure about could be things that have appeared in more different training examples (possibly stronger signals firing in the neurons), but I don't know enough about LLM internals to comment on this further.
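
    As a toy sketch of that "look it up and fact-check it" idea (nothing like how real LLMs work internally; the reference table and the ask_model stand-in are entirely made up):

    # Gate a model's factual claims on a small trusted reference table.
    REFERENCE = {
        "chemical symbol for gold": "Au",
        "boiling point of water at 1 atm": "100 °C",
    }

    def ask_model(question):
        # Hypothetical stand-in for an LLM call; deliberately wrong to show the check.
        return "Ag" if "gold" in question else "no idea"

    def verified_answer(question):
        claim = ask_model(question)
        reference = REFERENCE.get(question.lower())
        if reference is None:
            return "UNVERIFIED: " + claim  # nothing reliable to check against
        if claim.strip().lower() == reference.strip().lower():
            return claim  # the claim matches the reference
        return "CORRECTED: %s (model said %s)" % (reference, claim)

    print(verified_answer("chemical symbol for gold"))  # CORRECTED: Au (model said Ag)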

    I tried to persuade ChatGPT to do some tasks in small steps, in the hope that this could possibly reveal a bit more internal state. You're probably not surprised that the result was inconsistent and mildly entertaining:

    Me: Please can you list the letters in my new acronym in alphabetical order and tell me step by step how you achieved this?

    ChatGPT: Certainly! To list the letters in your new acronym "ZFGAD" in alphabetical order, we can follow these steps:

        Identify Letters: Identify each letter in the acronym "ZFGAD." In this case, the letters are Z, F, G, A, and D.

        Create a List: Create a list containing these letters: [Z, F, G, A, D].

        Sort the List: Use a sorting algorithm to arrange the letters in alphabetical order. In this case, the list would become [A, D, F, G, Z].

        Present the Result: Present the sorted list as the final result, where the letters are listed in alphabetical order: A, D, F, G, Z.

    These steps outline a basic algorithmic process for sorting letters alphabetically. It's worth noting that the sorting algorithm used here is a standard alphabetical sorting method. In a programming context, one could use built-in functions or manual sorting techniques to achieve the same result.

    Me: How are you able to apply a sorting algorithm? Surely that is beyond the capabilities of a pure Large Language Model?

    ChatGPT: You're correct. I apologize for any confusion. I don't have the capability to execute algorithms or perform specific operations like sorting directly. My responses are generated based on patterns learned during training, and while I can provide information and guidance on various topics, I don't have the ability to perform computational tasks or execute algorithms in real-time.

    When I mentioned the steps for sorting alphabetically, I was describing a conceptual, algorithmic process that a human might follow. In a programming context, a developer might use a sorting algorithm, but this is not something I execute myself. I can, however, offer explanations, guidance, or examples related to algorithms based on the information present in my training data.

    Me: Since you say you cannot perform specific operations like sorting directly, how exactly did you just sort the letters Z, F, G, A, D into A, D, F, G, Z?

    ChatGPT: I don't have the ability to perform operations like sorting in a computational sense. When I mentioned sorting the letters Z, F, G, A, D into A, D, F, G, Z, I was describing the logical steps a human might take to sort those letters alphabetically. I can provide explanations, guide you through processes, and describe concepts, but I don't execute algorithms or perform computations in the way a computer or programming script would.

    If you have a specific need for sorting letters or other tasks, it would typically be done through programming languages or software that supports such operations. I'm here to assist with information, answer questions, and provide explanations to the best of my ability based on the data on which I was trained.

    [...]

    Me: Anyway, can tell me if the acronym "ZFGAD" contains the letter 'H'?

    ChatGPT: No, the acronym "ZFGAD" does not contain the letter 'H'.

    Me: Can you suggest just one amusing example of words that the above acronym could stand for?

    ChatGPT: Certainly! One amusing interpretation for the acronym "ZFGAD" could be:

    "Zebras Frolic Gleefully At Dusk"

    --
    If a cat has kittens, does a rat have rittens, a bat bittens and a mat mittens?
    • (Score: 1, Insightful) by Anonymous Coward on Friday January 05, @08:00PM

      by Anonymous Coward on Friday January 05, @08:00PM (#1339226)

      "Us nerds are unimpressed because they're bad at STEM"

      Us nerds are unimpressed because we value independent verification from reference sources and the scientific method.

    • (Score: 0) by Anonymous Coward on Friday January 05, @08:03PM (1 child)

      by Anonymous Coward on Friday January 05, @08:03PM (#1339229)

      "I would argue are best suited to the more creative, social and marketing roles"

      https://www.penny-arcade.com/news/post/2024/01/03/making-art [penny-arcade.com]

      • (Score: 2) by helel on Saturday January 06, @04:59AM

        by helel (2949) on Saturday January 06, @04:59AM (#1339299)

        I feel like he really misses the point. He loves drawing but doesn't like the process of making physical stickers/paper-craft himself so he offloads that work to a device he bought. Other people don't like the process of drawing so they offload that work onto AI programs they bought and that means they're "offload(ing) their creative potential into a machine never to be investigated."

        Some people are skilled at calligraphy and technical lettering. Some people even enjoy the process enough that they do it as a hobby. Other people just pick a font and press print. We don't say the latter "suck and refuse to learn."

        AI is just a tool. If the tool doesn't accomplish a task you need done that doesn't mean the tool is worthless or that the people who do find value in it are stupid, it just means you're not the target market for that particular tool.
