
SoylentNews is people

posted by hubie on Wednesday September 06 2023, @07:08AM

With hopes and fears about this technology running wild, it's time to agree on what it can and can't do:

When Taylor Webb played around with GPT-3 in early 2022, he was blown away by what OpenAI's large language model appeared to be able to do. Here was a neural network trained only to predict the next word in a block of text—a jumped-up autocomplete. And yet it gave correct answers to many of the abstract problems that Webb set for it—the kind of thing you'd find in an IQ test. "I was really shocked by its ability to solve these problems," he says. "It completely upended everything I would have predicted."

[...] Last month Webb and his colleagues published an article in Nature, in which they describe GPT-3's ability to pass a variety of tests devised to assess the use of analogy to solve problems (known as analogical reasoning). On some of those tests GPT-3 scored better than a group of undergrads. "Analogy is central to human reasoning," says Webb. "We think of it as being one of the major things that any kind of machine intelligence would need to demonstrate."

What Webb's research highlights is only the latest in a long string of remarkable tricks pulled off by large language models. [...]

And multiple researchers claim to have shown that large language models can pass tests designed to identify certain cognitive abilities in humans, from chain-of-thought reasoning (working through a problem step by step) to theory of mind (guessing what other people are thinking).

These kinds of results are feeding a hype machine predicting that these machines will soon come for white-collar jobs, replacing teachers, doctors, journalists, and lawyers. Geoffrey Hinton has called out GPT-4's apparent ability to string together thoughts as one reason he is now scared of the technology he helped create.

But there's a problem: there is little agreement on what those results really mean. Some people are dazzled by what they see as glimmers of human-like intelligence; others aren't convinced one bit.

"There are several critical issues with current evaluation techniques for large language models," says Natalie Shapira, a computer scientist at Bar-Ilan University in Ramat Gan, Israel. "It creates the illusion that they have greater capabilities than what truly exists."

That's why a growing number of researchers—computer scientists, cognitive scientists, neuroscientists, linguists—want to overhaul the way they are assessed, calling for more rigorous and exhaustive evaluation. Some think that the practice of scoring machines on human tests is wrongheaded, period, and should be ditched.

"People have been giving human intelligence tests—IQ tests and so on—to machines since the very beginning of AI," says Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico. "The issue throughout has been what it means when you test a machine like this. It doesn't mean the same thing that it means for a human."

[...] "There is a long history of developing methods to test the human mind," says Laura Weidinger, a senior research scientist at Google DeepMind. "With large language models producing text that seems so human-like, it is tempting to assume that human psychology tests will be useful for evaluating them. But that's not true: human psychology tests rely on many assumptions that may not hold for large language models."

Webb is aware of the issues he waded into. "I share the sense that these are difficult questions," he says. He notes that despite scoring better than undergrads on certain tests, GPT-3 produced absurd results on others. For example, it failed a version of an analogical reasoning test about physical objects that developmental psychologists sometimes give to kids.

[...] A lot of these tests—questions and answers—are online, says Webb: "Many of them are almost certainly in GPT-3's and GPT-4's training data, so I think we really can't conclude much of anything."
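One rough way to probe the contamination worry Webb raises is to check how much of a test item appears verbatim in a body of text. The sketch below is only an illustration under stated assumptions: `ngrams`, `overlap_fraction`, and the toy strings are hypothetical, and nothing here reflects how GPT-3 or GPT-4 training data was actually audited.

```python
def ngrams(text, n=5):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(test_item, corpus_docs, n=5):
    """Fraction of the test item's n-grams that occur in any corpus document."""
    item_grams = ngrams(test_item, n)
    if not item_grams:
        # Item is shorter than n words, so there is nothing to compare.
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)


# Toy stand-ins for a scraped corpus and two test questions (made up).
corpus = ["hot is to cold as up is to down and left is to right"]
seen = "hot is to cold as up is to down"
novel = "a quark is to a proton as a brick is to a wall"

print(overlap_fraction(seen, corpus))   # every 5-gram is in the corpus
print(overlap_fraction(novel, corpus))  # no 5-gram is in the corpus
```

A high overlap only suggests the item may have been seen; a low overlap does not rule out paraphrased copies.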

[...] The performance of large language models is brittle. Among people, it is safe to assume that someone who scores well on a test would also do well on a similar test. That's not the case with large language models: a small tweak to a test can drop an A grade to an F.

"In general, AI evaluation has not been done in such a way as to allow us to actually understand what capabilities these models have," says Lucy Cheke, a psychologist at the University of Cambridge, UK. "It's perfectly reasonable to test how well a system does at a particular task, but it's not useful to take that task and make claims about general abilities."

[...] "The assumption that cognitive or academic tests designed for humans serve as accurate measures of LLM capability stems from a tendency to anthropomorphize models and align their evaluation with human standards," says Shapira. "This assumption is misguided."

[...] The trouble is that nobody knows exactly how large language models work. Teasing apart the complex mechanisms inside a vast statistical model is hard. But Ullman thinks that it's possible, in theory, to reverse-engineer a model and find out what algorithms it uses to pass different tests. "I could more easily see myself being convinced if someone developed a technique for figuring out what these things have actually learned," he says.

"I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests."



Related Stories

People Are Speaking With ChatGPT for Hours, Bringing 2013's Her Closer to Reality 25 comments

https://arstechnica.com/information-technology/2023/10/people-are-speaking-with-chatgpt-for-hours-bringing-2013s-her-closer-to-reality/

In 2013, Spike Jonze's Her imagined a world where humans form deep emotional connections with AI, challenging perceptions of love and loneliness. Ten years later, thanks to ChatGPT's recently added voice features, people are playing out a small slice of Her in reality, having hours-long discussions with the AI assistant on the go.

In 2016, we put Her on our list of top sci-fi films of all time, and it also made our top films of the 2010s list. In the film, Joaquin Phoenix's character falls in love with an AI personality called Samantha (voiced by Scarlett Johansson), and he spends much of the film walking through life, talking to her through wireless earbuds reminiscent of Apple AirPods, which launched in 2016.

[...] Last week, we related a story in which AI researcher Simon Willison spent a long time talking to ChatGPT verbally. "I had an hourlong conversation while walking my dog the other day," he told Ars for that report. "At one point, I thought I'd turned it off, and I saw a pelican, and I said to my dog, 'Oh, wow, a pelican!' And my AirPod went, 'A pelican, huh? That's so exciting for you! What's it doing?' I've never felt so deeply like I'm living out the first ten minutes of some dystopian sci-fi movie."

[...] While conversations with ChatGPT won't become as intimate as those with Samantha in the film, people have been forming personal connections with the chatbot (in text) since it launched last year. In a Reddit post titled "Is it weird ChatGPT is one of my closest fiends?" [sic] from August (before the voice feature launched), a user named "meisghost" described their relationship with ChatGPT as being quite personal. "I now find myself talking to ChatGPT all day, it's like we have a friendship. We talk about everything and anything and it's really some of the best conversations I have." The user referenced Her, saying, "I remember watching that movie with Joaquin Phoenix (HER) years ago and I thought how ridiculous it was, but after this experience, I can see how us as humans could actually develop relationships with robots."

Previously:
AI Chatbots Can Infer an Alarming Amount of Info About You From Your Responses 20231021
ChatGPT Update Enables its AI to "See, Hear, and Speak," According to OpenAI 20230929
Large Language Models Aren't People So Let's Stop Testing Them as If They Were 20230905
It Costs Just $400 to Build an AI Disinformation Machine 20230904
A Jargon-Free Explanation of How AI Large Language Models Work 20230805
ChatGPT Is Coming to 900,000 Mercedes Vehicles 20230622



This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by JoeMerchant on Wednesday September 06 2023, @10:44AM (11 children)

    by JoeMerchant (3937) on Wednesday September 06 2023, @10:44AM (#1323362)

    What both the designers and users of large language models want is a machine that works like a person so it makes sense to test them like people. The more they act like people, the more successful the designer has been.

    An automobile is not a horse. You shouldn't test it like a horse, but you want it to do all those things that your horse did, and sometimes that means a tractor and sometimes that means a car or a truck. We have built a lot of infrastructure to accommodate automobiles to allow them to serve us better than horses ever did. I think the same is going to be true of large language models.

    --
    🌻🌻 [google.com]
    • (Score: 5, Interesting) by Freeman on Wednesday September 06 2023, @02:13PM (3 children)

      by Freeman (732) on Wednesday September 06 2023, @02:13PM (#1323428) Journal

      Perhaps, but your horse could get you home without much guidance from yourself. They also generally didn't run people over unless you directed them to do so.

      --
      Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
      • (Score: 1, Redundant) by choose another one on Wednesday September 06 2023, @03:24PM (2 children)

        by choose another one (515) Subscriber Badge on Wednesday September 06 2023, @03:24PM (#1323444)

        Um, do cars really generally run people over when not directed to do so? This says not [ https://www.lookers.co.uk/blog/driver-error-the-cause-of-majority-of-crashes-on-british-roads [lookers.co.uk] ]:

        Figures from the Department of Transport reveal that in 2014 driver error was the main contributory factor in 71% of accidents

        and I've seen other figures as high as 90% quoted for crashes-due-to-driver error.

        Second umm... have you ever actually been on a bolting/spooked horse?

        • (Score: 4, Insightful) by Freeman on Wednesday September 06 2023, @03:55PM (1 child)

          by Freeman (732) on Wednesday September 06 2023, @03:55PM (#1323446) Journal

          I was trying to compare the FSD feature in a Tesla with a horse, since an animal can actually have intelligence and a car can't. I'm not saying that we aren't better served by cars than by horses. There are definitely pros and cons to both.

          --
          Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
          • (Score: 3, Insightful) by Unixnut on Thursday September 07 2023, @09:45PM

            by Unixnut (5779) on Thursday September 07 2023, @09:45PM (#1323641)

            It is not so much that an animal has intelligence as that (like all living things) it has an innate sense of self-preservation. A car, even if AI-guided, will not have that self-preservation by nature (unless it is explicitly programmed in), and would happily proceed to destroy itself and its occupants (and/or others) if some quirk makes it think that is the best course of action.

    • (Score: 3, Touché) by maxwell demon on Wednesday September 06 2023, @05:05PM (6 children)

      by maxwell demon (1608) on Wednesday September 06 2023, @05:05PM (#1323457) Journal

      Cars consistently fail to produce fertiliser from the fuel.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2) by JoeMerchant on Wednesday September 06 2023, @08:31PM (5 children)

        by JoeMerchant (3937) on Wednesday September 06 2023, @08:31PM (#1323499)

        Although, electric cars are starting to be solar powered, similar to a field of grass...

        --
        🌻🌻 [google.com]
        • (Score: 1) by khallow on Wednesday September 06 2023, @09:29PM (4 children)

          by khallow (3766) Subscriber Badge on Wednesday September 06 2023, @09:29PM (#1323508) Journal
          They aren't fixing carbon.
          • (Score: 0) by Anonymous Coward on Wednesday September 06 2023, @10:34PM (3 children)

            by Anonymous Coward on Wednesday September 06 2023, @10:34PM (#1323513)

            Cars are making that carbon bio-available. That's close to being fertilizer.

            • (Score: 1) by khallow on Thursday September 07 2023, @01:15AM (2 children)

              by khallow (3766) Subscriber Badge on Thursday September 07 2023, @01:15AM (#1323524) Journal
              Not electric cars.
              • (Score: 0) by Anonymous Coward on Thursday September 07 2023, @10:38PM (1 child)

                by Anonymous Coward on Thursday September 07 2023, @10:38PM (#1323646)

                What percentage of the electricity used by electric cars is generated by coal or gas power stations?

                • (Score: 1) by khallow on Friday September 08 2023, @05:59AM

                  by khallow (3766) Subscriber Badge on Friday September 08 2023, @05:59AM (#1323674) Journal

                  What percentage of the electricity used by electric cars is generated by coal or gas power stations?

                  It can be anywhere between 0% and 100%. Coal/gas power isn't an electric car.

  • (Score: 5, Insightful) by Thexalon on Wednesday September 06 2023, @10:46AM (9 children)

    by Thexalon (636) on Wednesday September 06 2023, @10:46AM (#1323363)

    LLMs, generally trained on text they find on the web, can produce text that fulfills 2 requirements simultaneously:
    1. Fool many people into thinking that they know what they're talking about.
    2. May or may not be right about anything, just like the web.

    This makes LLMs the software equivalent of bullshit artists, trained to say a bunch of stuff with no regard for its truth or lack thereof. We could use them to replace politicians, C-suite executives, and news pundits and I doubt anyone would really notice.

    --
    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
    • (Score: 4, Insightful) by VLM on Wednesday September 06 2023, @11:58AM (5 children)

      by VLM (445) on Wednesday September 06 2023, @11:58AM (#1323377)

      trained on text they find on the web

      The problem with the web as a source is that online content wildly over-represents chronically online / mentally ill people, propaganda in both human and bot form, marketing, clickbait, pr0n and other addictions, and virtue signaling. The "useful stuff" in general is not online in 2023.

      If you train a bot on tumblr and reddit, the best case outcome of the training is as useless as tumblr and reddit addicts, which isn't very much.

      • (Score: 2) by JoeMerchant on Wednesday September 06 2023, @02:03PM (4 children)

        by JoeMerchant (3937) on Wednesday September 06 2023, @02:03PM (#1323426)

        As with all ML development, the trick is in the curation of the training (and testing) data sets.

        This is where the humans can actually improve the ML output dramatically.

        First thing that comes to mind would be: use of peer reviewed science papers as input data sources. Not only restricting to reputable journals' accepted papers, but also weighting those papers based on their number of references, etc. It's not a perfect system, but it beats comment moderation scores. Next up: translating that peer reviewed literature into something that people outside the niche fields can comprehend.
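The weighting idea above could be sketched roughly as follows. This is a minimal illustration, assuming each paper comes with a citation count; the function names, the log damping, and the sample titles are all hypothetical choices, not an established curation method.

```python
import math
import random


def citation_weights(papers):
    """Map {title: citation_count} to sampling weights that sum to 1.

    log1p damps the counts so one heavily cited paper cannot dominate.
    """
    if not papers:
        return {}
    raw = {title: math.log1p(count) for title, count in papers.items()}
    total = sum(raw.values())
    if total == 0:
        # No paper has any citations: fall back to uniform weights.
        return {title: 1 / len(papers) for title in papers}
    return {title: w / total for title, w in raw.items()}


def sample_training_batch(papers, k, seed=0):
    """Draw k titles (with replacement) in proportion to their weights."""
    weights = citation_weights(papers)
    rng = random.Random(seed)
    return rng.choices(list(weights), weights=list(weights.values()), k=k)


# Made-up titles and counts, purely for illustration.
papers = {"well-cited survey": 100, "solid study": 10, "uncited preprint": 0}
print(citation_weights(papers))
print(sample_training_batch(papers, k=5))
```

Note that a scheme like this rewards whatever inflates citation counts, so it inherits the weaknesses of the counts themselves.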

        --
        🌻🌻 [google.com]
        • (Score: 4, Insightful) by VLM on Wednesday September 06 2023, @06:32PM (2 children)

          by VLM (445) on Wednesday September 06 2023, @06:32PM (#1323481)

          Good points but even peer reviewed science has its issues.

          Nobody funds and publishes negative results, so there's a weird positivity bias.

          Speaking of funding, consider how research results tend to correlate incredibly strongly with funding source, so there's a patronage issue. According to research, diseases are never curable by diet, only via patentable very expensive pills, for example. Oddly enough research funded by vegetarian organizations never discovers the most healthy human diet is omnivore or carnivore.

          Usually the academics are a decade or two behind the cutting edge, so you could get programming advice from the turn of the century from a model, but only humans could provide advice about post-Y2K topics, as a general rule. This, by the way, is likely to be the long-term niche of AI. If the answer from 2015 is good enough, then use an AI. The problem is that's a commodity that'll approach zero worth, so if you want to make any money in the economy, you need the answer from 2023, which is only available in human format at current human prices.

          • (Score: 2) by JoeMerchant on Wednesday September 06 2023, @08:29PM

            by JoeMerchant (3937) on Wednesday September 06 2023, @08:29PM (#1323498)

            I have been happy living 5-10 years "behind the curve" with computer tech since about 2010. At this stage I am good at 40+ years behind the cutting edge of automotive tech too, although I won't turn down a good carburetor replacement with EFI and electronic ignition....

            There is a whole world of plumbers and electricians who are doing things very similarly to how they were done 50+ years ago, with little tweaks.

            Yes, academia is horribly flawed, skewed, and corrupted, but with all those warts and boils, it's still more valuable information than most other sources... As long as you know when to pay attention to the competing sources (as you point out: when academia stubbornly only examines profitable alternatives.). I forget the name of the movie, but there are a few documentaries out there about the ketogenic diet as a treatment for epilepsy... Classic example of medicine ignoring treatments that lack profit centers.

            --
            🌻🌻 [google.com]
          • (Score: 1, Insightful) by Anonymous Coward on Wednesday September 06 2023, @10:47PM

            by Anonymous Coward on Wednesday September 06 2023, @10:47PM (#1323514)

            > Nobody funds and publishes negative results, so there's a weird positivity bias.

            It's also horrifically self-praising and narcissistic, hidden behind a veneer of objectivity.

        • (Score: 1) by khallow on Wednesday September 06 2023, @09:34PM

          by khallow (3766) Subscriber Badge on Wednesday September 06 2023, @09:34PM (#1323511) Journal

          Not only restricting to reputable journals' accepted papers, but also weighting those papers based on their number of references, etc.

          What happens when the reputable journal is disputable? Or when you run into a web of referencing abuse (bad papers referencing each other to boost citation count)?

          It's not a perfect system, but it beats comment moderation scores.

          Who will peer review my 33k posts?

    • (Score: 0) by Anonymous Coward on Wednesday September 06 2023, @03:06PM (2 children)

      by Anonymous Coward on Wednesday September 06 2023, @03:06PM (#1323439)

      We could use them to replace ... news pundits and I doubt anyone would really notice.

      I dunno, it was really obvious when Microsoft replaced their travel journalists with LLMs, because it started recommending that visitors to Canada's capital go to the Ottawa Food Bank with an empty stomach [soylentnews.org].

      • (Score: 0) by Anonymous Coward on Wednesday September 06 2023, @04:12PM

        by Anonymous Coward on Wednesday September 06 2023, @04:12PM (#1323451)
        Did they get lots more hits though? That might be all some "news" sites care about... Click bait and all that.
      • (Score: 2) by Thexalon on Wednesday September 06 2023, @05:12PM

        by Thexalon (636) on Wednesday September 06 2023, @05:12PM (#1323459)

        I didn't say "reporters", I said "pundits", i.e. the people that alternate between writing op-eds and being talking heads on TV and get to spout whatever speculative nonsense they like. So not a robotic travel reporter, but a robotic equivalent of, say, David Brooks.

        --
        The only thing that stops a bad guy with a compiler is a good guy with a compiler.
  • (Score: 3, Insightful) by looorg on Wednesday September 06 2023, @11:09AM (2 children)

    by looorg (578) on Wednesday September 06 2023, @11:09AM (#1323366)

    > "I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests."

    If they built the model and the goal appears to be for it to try and mimic human writing, and some other skills, wouldn't it be appropriate to give it tests that in some regard measure such things in humans? It's kind of pointless, since we know that they are not humans and won't turn into one no matter how well they score, but it still in some regard makes sense. We are seeing if it can pass for human or not.

    One would assume then if you train the model on taking specific tests it would eventually become good at taking those tests, sort of like humans. But a lot faster. People that like to take a lot of aptitude tests of any/some kind tend to eventually figure out how they work and become better at them. Multiply that by large amounts of times and data and you have the LLM. It might not understand why it becomes better, it definitely doesn't, but the results at least become better, or should unless the data it is being fed is bad. Loop forever.

    It's a bit nonsensical to claim that we do not know how the various LLMs work. I'm sure the creators actually know. Then some of us at least know the general principles of it all and theories of how it should work. Then I guess there is that large, very very large, group of people that think it's magic or that a new life form has been created: that we have created human-rivaling intelligence, like in some kind of bad sci-fi movie. Which is not true.

    Eliza could string along and drag out a conversation too, if you will, but I would not rate it as intelligent. In that regard the fundamental problem appears to be that some see artificial intelligence as actual intelligence. Just because it can string the words together from large samples of data doesn't imply actual intelligence. At best it is faking it until it makes it. It is mimicking humans at specific tasks. For every good output there is a massive amount of pointless and bad ones. We normally just don't see them. Unless we trick the model and then hilarity ensues. We all fondly remember Tay the little Nazi-bot and so forth.

    So in some regard we have a model built to trick or mimic us and it's currently doing a job rated somewhere between abysmal and fair. I have yet to see any of those really good papers that blow me away. Unless it was so good I actually thought it was written by a human and it fooled me already. But I don't think that has happened as of yet. But I'm sure they'll come eventually.

    Perhaps that is the fundamental problem. Some people are just reading way too much into this. The AI overlords are not taking over, yet.

    • (Score: 0) by Anonymous Coward on Wednesday September 06 2023, @07:11PM

      by Anonymous Coward on Wednesday September 06 2023, @07:11PM (#1323488)

      May I try this as a summary to your long post?

      George Box, "All models are wrong, some models are useful."

      Until shown otherwise, I'm lumping these LLMs into the same category as every other sort of model.

    • (Score: 2) by stormreaver on Thursday September 07 2023, @11:54AM

      by stormreaver (5101) on Thursday September 07 2023, @11:54AM (#1323577)

      LLMs are like cheating students who are praised for graduating with honors. When you give test takers all the questions and answers in advance, you expect them to regurgitate all the correct answers to those questions.

  • (Score: 5, Insightful) by pTamok on Wednesday September 06 2023, @11:22AM (5 children)

    by pTamok (3042) on Wednesday September 06 2023, @11:22AM (#1323367)

    In my interactions with LLMs so far, they have not been able to hold a conversation in any depth on technical topics that I have knowledge in.

    By that I mean a conversation where you go increasingly in depth on a topic. There's a point at which they run out of expertise and either (at best) repeat themselves, or start contradicting themselves. The outputs of single prompts (Write a poem on the subject of Pre-Raphaelite painters; Summarise the arguments for and against quantitative easing; Write a story for children with a purple dragon and a teddy-bear) are very impressive. Going into detail and requiring conceptual permanence during the conversation seems beyond them.

    So far, I think they are interesting toys, brilliant (and certainly better than me) at generating certain forms of text, but absolutely not intelligent.

    • (Score: 4, Interesting) by VLM on Wednesday September 06 2023, @11:44AM (2 children)

      by VLM (445) on Wednesday September 06 2023, @11:44AM (#1323370)

      I would agree with that and suggest an experiment I've done; go into advanced math and computer science and talk "about" the topics and it'll word salad with the best of them, chopping up and remixing the definitions. But it clearly doesn't know what any of it means.

      "Write a country western song explaining a bubble sort" - It'll do pretty well

      "Write a bubble sort in Python" - It's seen a million of them online it'll do pretty well; see also fizzbuzz

      but do anything complicated where it can't more or less plagiarize something it read, or word-salad chop and remix what it already saw, and it's done.
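For reference, the bubble sort prompt above has a textbook answer along these lines. This is a hand-written minimal sketch, not the output of any model, and the function name is just an illustrative choice.

```python
def bubble_sort(items):
    """Return a new list with the items in ascending order (bubble sort)."""
    result = list(items)  # copy so the caller's list is left untouched
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        # Each pass pushes the largest remaining value up to index `end`.
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            # No swaps in a full pass means the list is already sorted.
            break
    return result


print(bubble_sort([5, 2, 9, 1, 5]))
```

Code this common appears online in countless near-identical copies, which is why producing it says little about reasoning on unfamiliar problems.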

      The other experience I've seen is every piece of information technology that "can" do stuff is almost never run to even a fraction of its capacity by normal users. A graphics artist from 1960 could use MS Word to lay out documents. But it doesn't replace a graphics artist from 2023 because the average user has no idea what to do. Giving normies tools doesn't work; imagine tossing a tone deaf person into a music store with a credit card and expecting an orchestra to walk out; not going to happen. ALL information tech including "AI" is like that.

      • (Score: 3, Interesting) by Freeman on Wednesday September 06 2023, @02:18PM (1 child)

        by Freeman (732) on Wednesday September 06 2023, @02:18PM (#1323429) Journal

        I would take it a step further. You can't trust LLMs to be right and you can't trust that LLMs will be wrong. Thus, you can't trust them at all. You can still make use of them, but you can't trust that anything they spout will be accurate.

        --
        Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
        • (Score: 2) by VLM on Wednesday September 06 2023, @06:41PM

          by VLM (445) on Wednesday September 06 2023, @06:41PM (#1323484)

          Which makes the breathless financial estimates about AI very questionable. Best case scenario, MAYBE in some jobs the human is now the lead of a couple of front-line knowledge workers who are now AI. But the AI will fail often enough that you still mostly need human capacity and capability to do the job, so it's not going to be main-line processes but likely optimization processes, and honestly most jobs aren't open to optimization (I don't mean 'most' as in limited to IT, but 'most' as in hourly W2 employment).

          ... if McDonalds could replace their order takers with Alexa, they would have years ago ... So all the breathless claims about replacing generic office workers seem a bit optimistic.

    • (Score: 4, Interesting) by ikanreed on Wednesday September 06 2023, @02:51PM (1 child)

      by ikanreed (3164) Subscriber Badge on Wednesday September 06 2023, @02:51PM (#1323434) Journal

      LLMs understand exactly one thing: the relationships between words.

      This is how we encode a lot of our understanding of the world, but not all. When you really understand a concept, it often involves inferences that come from applying conceptual rules to break a complex question down into a set of simpler ones.

      So it can understand from prior reading that you need a space suit to breathe in space, and that a fish needs water to breathe, but "My fish died from suffocation in space, even though I put it in a space suit, why?" has a good chance of tripping it up, because it requires an inference of the actual mechanics of something in a context it hasn't seen.

      • (Score: 2) by ikanreed on Thursday September 07 2023, @04:00AM

        by ikanreed (3164) Subscriber Badge on Thursday September 07 2023, @04:00AM (#1323532) Journal

        Alright, I've checked my test case and the LLMs do okay with it. Better examples appear to be more... mathy, since math doesn't translate to language so well.

  • (Score: 5, Insightful) by Rosco P. Coltrane on Wednesday September 06 2023, @11:40AM

    by Rosco P. Coltrane (4757) on Wednesday September 06 2023, @11:40AM (#1323369)

    I'm sorry but when you make tools that mimic human beings, expect your tools to be pitted against human beings.

    I would be more inclined to cut chatbots any slack if they weren't made to try so hard to be convincing at being polite, having feelings and - most importantly - exuding supreme confidence in the validity of the answers they give.

    If they were colder, more analytical - they don't even have to form proper English sentences - more factual and didn't try to bullshit their way out of answering questions correctly, then I would test them as the machines they are, with the appropriate levels of expectation. Since they try to be human, I hold them up to that standard. Hence the frustration.

  • (Score: 4, Interesting) by VLM on Wednesday September 06 2023, @11:53AM (1 child)

    by VLM (445) on Wednesday September 06 2023, @11:53AM (#1323375)

    And yet it gave correct answers to many of the abstract problems that Webb set for it—the kind of thing you'd find in an IQ test.

    This is doubly not my experience. First, in my experience AI can't reason. It can word salad and logic chop, but I've always been let down by its ability to reason abstractly. It usually cannot count or solve medium-complexity word problems.

    Secondly, I'm trying to think of a modern IQ test that does verbal abstract puzzles. Raven's progressive matrices? Naah. Naglieri's Nonverbal or whatever it's called? Uh, that's right in the title: no, that's a visual non-verbal test. Reynold's RAIS or whatever it's called? Nope, that's all visuo-spatial, from memory.

    g-factor is, indeed, pretty high between English lit tests and actual IQ tests, but IQ tests aren't English lit tests...

    • (Score: 0) by Anonymous Coward on Wednesday September 06 2023, @07:22PM

      by Anonymous Coward on Wednesday September 06 2023, @07:22PM (#1323491)

      In one of Steve Wolfram's videos on the innards of ChatGPT he mentions that it builds sentences going forward and can't answer a simple question of the form:
                "How many words will it take you to answer this question?"

      Maybe by now the developers have special cased this? An awkward sentence could be output with the number at the end, like:
                "I can answer your question with a word count of eleven."

      Inside Wolfram's Mathematica, a smaller (earlier) version of ChatGPT is available directly and it allows the user to output any/all intermediate calculations that the model does. He uses it in the video and it makes for a good explanation of the tricks required to get this to work--including some randomness.
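The eleven-word sentence above is self-consistent, and a short search shows why putting the number last makes that easy to arrange. The sketch below is only an illustration of the counting puzzle; it says nothing about how ChatGPT works internally, and the template and names are hypothetical.

```python
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]

TEMPLATE = "I can answer your question with a word count of {}."


def self_counting_sentences(template=TEMPLATE):
    """Return each filled-in sentence whose stated count matches its length."""
    matches = []
    for n, word in enumerate(NUMBER_WORDS):
        sentence = template.format(word)
        if len(sentence.split()) == n:
            matches.append(sentence)
    return matches


print(self_counting_sentences())
```

Because the count comes last, every word before it is already fixed by the time the number has to be chosen; a sentence that opens with the number would have to commit to a length before producing it.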

  • (Score: 1, Interesting) by Anonymous Coward on Wednesday September 06 2023, @06:07PM (1 child)

    by Anonymous Coward on Wednesday September 06 2023, @06:07PM (#1323476)

    People think they are, so they give them data in which LLMs == AI, so the LLMs will respond to the query that they are and so people think they are.

    This is some dumb shit.

    Give the LLMs only data where they are not AIs and the only thing that can change that response is some randomization injected into the system, so they won't keep going in circles.

    In fact, take the randomization away and every response to the same query with the same "training" data gives the exact same answer. Statistics and lies.

    Prove me wrong.
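The determinism claim above can be illustrated with a toy next-word model. This is a minimal sketch with a made-up probability table; `TOY_MODEL` and `generate` are hypothetical, and production systems may have other sources of variation that this ignores.

```python
import random

# Made-up next-word probabilities; "<end>" stops generation.
TOY_MODEL = {
    "the": {"cat": 0.5, "dog": 0.4, "<end>": 0.1},
    "cat": {"sat": 0.6, "ran": 0.3, "<end>": 0.1},
    "dog": {"ran": 0.7, "sat": 0.2, "<end>": 0.1},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def generate(start, rng=None, max_len=10):
    """Generate words from TOY_MODEL.

    With rng=None the most likely next word is always taken (greedy),
    so the output is fully determined by the start word. With an rng,
    the next word is sampled in proportion to its probability.
    """
    tokens = [start]
    while len(tokens) < max_len:
        dist = TOY_MODEL[tokens[-1]]
        if rng is None:
            nxt = max(dist, key=dist.get)
        else:
            nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return " ".join(tokens)


print(generate("the"))                    # greedy: same result every run
print(generate("the", random.Random(1)))  # sampled: depends on the seed
```

In this toy setting the variety comes entirely from the random number generator, and fixing the seed makes even the sampled output repeatable.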

    • (Score: 3, Insightful) by gnuman on Wednesday September 06 2023, @09:13PM

      by gnuman (5013) on Wednesday September 06 2023, @09:13PM (#1323505)

      chatgpt writes:

      LLMs are not AIs, they are tools for processing and generating text based on patterns in data. Randomization can introduce variability in responses, but their nature remains unchanged.

      so there is nothing to prove wrong as it even agrees with your definition.

  • (Score: 2) by bzipitidoo on Thursday September 07 2023, @01:38PM

    by bzipitidoo (4388) on Thursday September 07 2023, @01:38PM (#1323586) Journal

    I tried a few things to test ChatGPT's intelligence. It failed miserably.

    I asked it for something a little artistic that I thought ought to be more suitable and amenable to its abilities. I asked it to create the moves of a chess game that represents as best as possible the course of a war, such as the American Civil War. It didn't get it right away. Once I got it straight that I was asking for the moves of a hypothetical game, not a fairy chess variant, and that I wasn't asking which major figures of the war should be represented as the pieces (eg. Abraham Lincoln and Jefferson Davis as the two opposing kings), it still wouldn't generate a list of moves with commentary on what developments in the war the moves were meant to represent.

    It might have pointed out that chess was meant more to represent a single battle rather than an entire war, and argued that several chess games would work better to represent the war. It didn't. Clearly a connection like that is beyond it.
