SoylentNews is people

posted by Fnord666 on Sunday August 06 2023, @11:16AM   Printer-friendly
from the peering-into-the-abyss dept.

https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/

When ChatGPT was introduced last fall, it sent shockwaves through the technology industry and the larger world. Machine learning researchers had been experimenting with large language models (LLMs) for a few years by that point, but the general public had not been paying close attention and didn't realize how powerful they had become.

Today, almost everyone has heard about LLMs, and tens of millions of people have tried them out. But not very many people understand how they work.

If you know anything about this subject, you've probably heard that LLMs are trained to "predict the next word" and that they require huge amounts of text to do this. But that tends to be where the explanation stops. The details of how they predict the next word are often treated as a deep mystery.
[...]
To understand how language models work, you first need to understand how they represent words. Humans represent English words with a sequence of letters, like C-A-T for "cat." Language models use a long list of numbers called a "word vector." For example, here's one way to represent cat as a vector:

[0.0074, 0.0030, -0.0105, 0.0742, 0.0765, -0.0011, 0.0265, 0.0106, 0.0191, 0.0038, -0.0468, -0.0212, 0.0091, 0.0030, -0.0563, -0.0396, -0.0998, -0.0796, ..., 0.0002]

(The full vector is 300 numbers long.)

Why use such a baroque notation? Here's an analogy. Washington, DC, is located at 38.9 degrees north and 77 degrees west. We can represent this using a vector notation:

  • Washington, DC, is at [38.9, 77]
  • New York is at [40.7, 74]
  • London is at [51.5, 0.1]
  • Paris is at [48.9, -2.4]

This is useful for reasoning about spatial relationships.
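
As a rough illustration of that kind of spatial reasoning, the sketch below treats the four coordinate pairs above as plain vectors and compares straight-line (Euclidean) distances between them. That is a simplifying assumption: real geographic distance would need a great-circle formula. The helper names `distance` and `nearest` are made up for this example.

```python
import math

# Coordinates from the list above: [degrees north, degrees west].
cities = {
    "Washington, DC": [38.9, 77],
    "New York": [40.7, 74],
    "London": [51.5, 0.1],
    "Paris": [48.9, -2.4],
}


def distance(a, b):
    """Straight-line distance between two coordinate vectors."""
    return math.dist(a, b)


def nearest(name):
    """Return the other city whose vector is closest to the named city."""
    return min(
        (other for other in cities if other != name),
        key=lambda other: distance(cities[name], cities[other]),
    )


print(nearest("Washington, DC"))
print(nearest("London"))
```

With these numbers, Washington's nearest neighbor is New York and London's is Paris, which matches the intuition that nearby vectors describe nearby places.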
[...]
For example, the words closest to cat in vector space include dog, kitten, and pet. A key advantage of representing words with vectors of real numbers (as opposed to a string of letters, like C-A-T) is that numbers enable operations that letters don't.
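
One common way to measure "closeness" between word vectors is cosine similarity. The sketch below uses invented 3-dimensional vectors purely for illustration; they are not taken from word2vec or any real model, and the function names are hypothetical.

```python
import math

# Made-up 3-dimensional vectors for illustration only. Real word vectors
# are learned from text and have hundreds of dimensions.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.8, 0.9, 0.2],
    "dog": [0.9, 0.6, 0.2],
    "car": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(word, k=2):
    """Return the k other words most similar to `word`, best match first."""
    others = [w for w in toy_vectors if w != word]
    others.sort(
        key=lambda w: cosine_similarity(toy_vectors[word], toy_vectors[w]),
        reverse=True,
    )
    return others[:k]


print(closest_words("cat"))
```

In this toy space the two words nearest to "cat" are "kitten" and "dog", while "car" points in a very different direction.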

Words are too complex to represent in only two dimensions, so language models use vector spaces with hundreds or even thousands of dimensions.
[...]
Researchers have been experimenting with word vectors for decades, but the concept really took off when Google announced its word2vec project in 2013. Google analyzed millions of documents harvested from Google News to figure out which words tend to appear in similar sentences. Over time, a neural network trained to predict which words co-occur with other words learned to place similar words (like dog and cat) close together in vector space.
[...]
Because these vectors are built from the way humans use words, they end up reflecting many of the biases that are present in human language. For example, in some word vector models, "doctor minus man plus woman" yields "nurse." Mitigating biases like this is an area of active research.
[...]
Traditional software is designed to operate on data that's unambiguous. If you ask a computer to compute "2 + 3," there's no ambiguity about what 2, +, or 3 mean. But natural language is full of ambiguities that go beyond homonyms and polysemy:

  • In "the customer asked the mechanic to fix his car," does "his" refer to the customer or the mechanic?
  • In "the professor urged the student to do her homework" does "her" refer to the professor or the student?
  • In "fruit flies like a banana" is "flies" a verb (referring to fruit soaring across the sky) or a noun (referring to banana-loving insects)?

People resolve ambiguities like this based on context, but there are no simple or deterministic rules for doing this. Rather, it requires understanding facts about the world. You need to know that mechanics typically fix customers' cars, that students typically do their own homework, and that fruit typically doesn't fly.

Word vectors provide a flexible way for language models to represent each word's precise meaning in the context of a particular passage.
[...]
Research suggests that the first few layers focus on understanding the sentence's syntax and resolving ambiguities like we've shown above. Later layers (which we're not showing to keep the diagram a manageable size) work to develop a high-level understanding of the passage as a whole.
[...]
In short, these nine attention heads enabled GPT-2 to figure out that "John gave a drink to John" doesn't make sense and choose "John gave a drink to Mary" instead.

We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict "Mary" as the next word.
[...]
In a 2020 paper, researchers from Tel Aviv University found that feed-forward layers work by pattern matching: Each neuron in the hidden layer matches a specific pattern in the input text.
[...]
Recent research from Brown University revealed an elegant example of how feed-forward layers help to predict the next word. Earlier, we discussed Google's word2vec research showing it was possible to use vector arithmetic to reason by analogy. For example, Berlin - Germany + France = Paris.

The Brown researchers found that feed-forward layers sometimes use this exact method to predict the next word.
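
The analogy arithmetic can be sketched with hand-built vectors. Everything below is an assumption made for illustration: the four dimensions (German, French, Italian, is-a-capital) and the vectors themselves are invented so the arithmetic works out exactly, whereas learned vectors only satisfy such relations approximately.

```python
import math

# Invented dimensions: [German, French, Italian, is-a-capital].
toy_vectors = {
    "Germany": [1, 0, 0, 0],
    "France": [0, 1, 0, 0],
    "Italy": [0, 0, 1, 0],
    "Berlin": [1, 0, 0, 1],
    "Paris": [0, 1, 0, 1],
    "Rome": [0, 0, 1, 1],
}


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [
        x - y + z
        for x, y, z in zip(toy_vectors[a], toy_vectors[b], toy_vectors[c])
    ]
    # Skip the input words so the answer is a new word.
    candidates = [w for w in toy_vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(toy_vectors[w], target))


print(analogy("Berlin", "Germany", "France"))
```

Subtracting "Germany" removes the country component from "Berlin" and leaves only the capital component; adding "France" then lands on the vector for "Paris".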
[...]
All the parts of LLMs we've discussed in this article so far—the neurons in the feed-forward layers and the attention heads that move contextual information between words—are implemented as a chain of simple mathematical functions (mostly matrix multiplications) whose behavior is determined by adjustable weight parameters. Just as the squirrels in my story loosen and tighten the valves to control the flow of water, so the training algorithm increases or decreases the language model's weight parameters to control how information flows through the neural network.
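
To make "increases or decreases the weight parameters" concrete, here is a minimal sketch with a single weight and a hand-derived gradient. It is not how LLMs are trained at scale (they have billions of weights and use backpropagation to compute the gradients); the data, learning rate, and function names are all assumptions chosen for the example.

```python
# Toy data generated by the rule y = 3 * x. The "model" is y = w * x,
# so training should move the single weight w toward 3.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]


def mean_squared_error(w):
    """Average squared gap between the model's outputs and the targets."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(w=0.0, learning_rate=0.1, steps=50):
    """Repeatedly nudge w in the direction that reduces the error."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


trained_w = train()
print(trained_w, mean_squared_error(trained_w))
```

Each step raises the weight when the model's outputs are too small and lowers it when they are too large, which is the same "loosen or tighten the valve" idea applied to one valve.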
[...]
(If you want to learn more about backpropagation, check out our 2018 explainer on how neural networks work.)
[...]
Over the last five years, OpenAI has steadily increased the size of its language models. In a widely read 2020 paper, OpenAI reported that the accuracy of its language models scaled "as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude."

The larger their models got, the better they were at tasks involving language. But this was only true if they increased the amount of training data by a similar factor. And to train larger models on more data, you need a lot more computing power.
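
A power law has a simple signature: multiplying the input by a fixed factor always multiplies the output by the same fixed factor, no matter where you start. The sketch below shows this for a hypothetical error curve of the form (N_C / N) ** ALPHA. The constants are placeholders chosen for illustration, not the values reported by OpenAI, and treating the scaled quantity as an error that shrinks with model size is an assumption of this example.

```python
# Placeholder constants for illustration only.
N_C = 1e13    # hypothetical reference model size
ALPHA = 0.08  # hypothetical power-law exponent


def error(model_size):
    """Hypothetical error as a power law in model size."""
    return (N_C / model_size) ** ALPHA


def improvement_from_doubling(model_size):
    """Ratio of error after doubling model size to error before."""
    return error(2 * model_size) / error(model_size)


for size in (1e6, 1e8, 1e10):
    print(size, error(size), improvement_from_doubling(size))
```

The ratio printed in the last column is the same at every size (it equals 2 ** -ALPHA), which is why such trends show up as straight lines on a log-log plot across many orders of magnitude.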
[...]
Psychologists call this capacity to reason about the mental states of other people "theory of mind." Most people have this capacity from the time they're in grade school. Experts disagree about whether any non-human animals (like chimpanzees) have theory of mind, but there's a general consensus that it's important for human social cognition.

Earlier this year, Stanford psychologist Michal Kosinski published research examining the ability of LLMs to solve theory-of-mind tasks. He gave various language models passages like the one we quoted above and then asked them to complete a sentence like "she believes that the bag is full of." The correct answer is "chocolate," but an unsophisticated language model might say "popcorn" or something else.
[...]
It's worth noting that researchers don't all agree that these results indicate evidence of theory of mind; for example, small changes to the false-belief task led to much worse performance by GPT-3, and GPT-3 exhibits more variable performance across other tasks measuring theory of mind. As one of us (Sean) has written, it could be that successful performance is attributable to confounds in the task—a kind of "clever Hans" effect, only in language models rather than horses.
[...]
In April, researchers at Microsoft published a paper arguing that GPT-4 showed early, tantalizing hints of artificial general intelligence—the ability to think in a sophisticated, human-like way.

For example, one researcher asked GPT-4 to draw a unicorn using an obscure graphics programming language called TiKZ. GPT-4 responded with a few lines of code that the researcher then fed into the TiKZ software. The resulting images were crude, but they showed clear signs that GPT-4 had some understanding of what unicorns look like.
[...]
At the moment, we don't have any real insight into how LLMs accomplish feats like this. Some people argue that such examples demonstrate that the models are starting to truly understand the meanings of the words in their training set. Others insist that language models are "stochastic parrots" that merely repeat increasingly complex word sequences without truly understanding them.
[...]
Further, prediction may be foundational to biological intelligence as well as artificial intelligence. In the view of philosophers like Andy Clark, the human brain can be thought of as a "prediction machine" whose primary job is to make predictions about our environment that can then be used to navigate that environment successfully.
[...]
Traditionally, a major challenge for building language models was figuring out the most useful way of representing different words—especially because the meanings of many words depend heavily on context. The next-word prediction approach allows researchers to sidestep this thorny theoretical puzzle by turning it into an empirical problem. It turns out that if we provide enough data and computing power, language models end up learning a lot about how human language works simply by figuring out how to best predict the next word. The downside is that we wind up with systems whose inner workings we don't fully understand.


This discussion was created by Fnord666 (652) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Touché) by Isia on Sunday August 06 2023, @11:37AM

    by Isia (25931) on Sunday August 06 2023, @11:37AM (#1319359) Journal

    Where the ChatGPT hype comes from.
    The average person could not imagine an AI in this performance class.
    What they lack is to have seen and understood https://openai.com/blog/emergent-tool-use/ [openai.com]
    And ChatGPT has given these people an interface to AI that they didn't have before to understand it.
    And now all the dummies are making a fuss about ChatGPT.

    The real fun is in the 'fault adaptive deep reinforcement learning algorithm'.
    What can you do with it?
    AI quick self learning.
    Walking: Learning to walk in the real world in 1 hour (no simulator) https://www.youtube.com/watch?v=xAXvfVTgqr0 [youtube.com]
                                A small step to https://www.youtube.com/watch?v=G6fMV1UPzkg [youtube.com]
    Playing Go: No human can defeat the latest Go AI. The best human Go players were defeated 60:0 (60 games, 60 times lost).
                                        It's so bad that humans don't even understand the moves anymore...until they are "suddenly" defeated.
    Playing Stratego: No human can defeat the latest Stratego AI. Great idea to teach an AI the art of war.

    --
    Belief in a higher being is for the stupid, the weak and the cowardly.
  • (Score: 2) by looorg on Sunday August 06 2023, @01:18PM (1 child)

    by looorg (578) on Sunday August 06 2023, @01:18PM (#1319365)

    Too many words. It could be optimized to just say Linear Algebra, this is what all the various vector calculations would fall under, and Statistics. But perhaps that just isn't technomacy enough. Also Linear Algebra makes most people have nightmares and horrible flashbacks to their youth.

    • (Score: 0) by Anonymous Coward on Sunday August 06 2023, @02:02PM

      by Anonymous Coward on Sunday August 06 2023, @02:02PM (#1319367)

      > ...most people ...

      ... most people around here ...
      ftfy

      I don't think that most people, in the normal broad meaning, have ever heard of linear algebra. In my case linear algebra is all good, a guy I work with is highly skilled at it, and I get to see the results.

  • (Score: 4, Interesting) by VLM on Sunday August 06 2023, @03:59PM (1 child)

    by VLM (445) on Sunday August 06 2023, @03:59PM (#1319382)

    It's overly complicated for a "jargon free explanation"

    It's simpler, really. Joe 6 Pack is familiar with web searches over the last quarter century. Also with the magic of spell checkers and recent progress in grammar checkers. Also familiar with crappy response-bots that never really work well.

    What if, when you typed in something, it did some limited pre-filtering and formatting, then piled together a lot of web searches, ran the pile through a grammar checker, did some filtering again, ran another checker, and eventually just gave you the result?

    That's not bad for a non-math answer.

    • (Score: 0) by Anonymous Coward on Thursday August 10 2023, @06:15AM

      by Anonymous Coward on Thursday August 10 2023, @06:15AM (#1319754)
      The explanation I like is an LLM is basically auto-complete on steroids. You supply it some keywords and it tries to auto-complete them based on the training data that it got from the Internet. And people have found that picking the statistically most likely word all the time doesn't produce good results, picking something less likely works better.

      So in many cases the autocomplete works fine. In other cases it can be pretty bad.
  • (Score: 3, Interesting) by captain normal on Sunday August 06 2023, @06:45PM (1 child)

    by captain normal (2205) on Sunday August 06 2023, @06:45PM (#1319395)

    There are over 7000 languages on this planet. Seems the Boffins are just scratching the surface of one language. How big a data center will be required to deal with all these? And that's just written or typed language. Many languages depend on visual cues from the speaker to convey meaning. For instance, I spent nearly a year in Sri Lanka, where there are several languages spoken but there is one thing common to all: nodding the head up and down indicates "no" and nodding the head side to side means "yes". I can see a potential for AI in working on taming all this babel just for the sake of communication. Yet it seems that all this work is just to try to get people to purchase a certain item or vote for a certain person or issue. Is it really worth all the person hours and resources being thrown at it?

    --
    "If men were angels, government would not be necessary." James Madison
    • (Score: 4, Interesting) by inertnet on Sunday August 06 2023, @10:08PM

      by inertnet (4071) on Sunday August 06 2023, @10:08PM (#1319403) Journal

      Google Translate does that; it clearly tokenizes everything into English before translating. That is very annoying, because when I want to double check my German writing, it compresses every "du, Sie, ihr" into a simple "you", which won't be translated into the corresponding Dutch word (my native language) or vice versa. deepl.com does a much better job, or at least it gives clear options. I understand German well enough to see where Google gets it wrong, but it's been 50 years since I learned it in school, so I like to check my correspondence before I send it. I'm glad I found deepl.com for that.

  • (Score: 2) by krishnoid on Sunday August 06 2023, @09:36PM

    by krishnoid (1156) on Sunday August 06 2023, @09:36PM (#1319402)

    But natural language is full of ambiguities that go beyond homonyms and polysemy:

    • In "the customer asked the mechanic to fix his car," does "his" refer to the customer or the mechanic?
    • In "the professor urged the student to do her homework" does "her" refer to the professor or the student?
      ...

    See? You can complain about homo something and poly stuff, but it's the pronouns that really cause the problems. But then again, (#notall)humans created pronouns, and language, and nowadays, are variously retargeting pronouns towards societal strife. So maybe artificial intelligence will eventually realize the root cause of the problem, fix the glitch [youtu.be], and things will go back to the way they were.

  • (Score: 0) by Anonymous Coward on Monday August 07 2023, @02:30AM

    by Anonymous Coward on Monday August 07 2023, @02:30AM (#1319421)

    For example, the words closest to cat in vector space include dog, kitten, and pet. A key advantage of representing words with vectors of real numbers (as opposed to a string of letters, like C-A-T) is that numbers enable operations that letters don't.

    Have they already gone beyond location and added on relationships?

    e.g. in some cases cow is to grass the way horse is to grass and man is to cow.

    Or is that not helpful?

  • (Score: 0) by Anonymous Coward on Monday August 07 2023, @11:51AM (3 children)

    by Anonymous Coward on Monday August 07 2023, @11:51AM (#1319456)

    Think this might be helpful to help people minimize their tax burden?

    If nothing else, give them the burden of having to reply to teams of computer generated bullshit.

    • (Score: 1, Touché) by Anonymous Coward on Monday August 07 2023, @01:42PM (2 children)

      by Anonymous Coward on Monday August 07 2023, @01:42PM (#1319467)

      The last place I want a lying AI is doing my taxes! If the AI is given instructions to "reduce taxes" it will probably try to claim the 15% depletion allowance on oil production...even though I don't own any oil stock...

      • (Score: 2) by Freeman on Monday August 07 2023, @03:49PM (1 child)

        by Freeman (732) on Monday August 07 2023, @03:49PM (#1319479) Journal

        Very much so, the likes of ChatGPT and other "AIs" tend to make things up. Which you definitely don't want to happen with regards to your taxes. Hmm..., why is this guy claiming X Y Z credits, when they didn't have anything like that the year before? Audit time!

        --
        Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
        • (Score: 0) by Anonymous Coward on Monday August 07 2023, @09:11PM

          by Anonymous Coward on Monday August 07 2023, @09:11PM (#1319508)

          > Audit time!

          For an ultimate dystopian scenario, what if the IRS (taxing agency) starts using AI to determine who to audit??!!
