
posted by janrinok on Friday April 21 2023, @09:06AM

The Hyena code is able to handle amounts of data that make GPT-style technology run out of memory and fail:

In a paper published in March, artificial intelligence (AI) scientists at Stanford University and Canada's MILA institute for AI proposed a technology that could be far more efficient than GPT-4 -- or anything like it -- at gobbling vast amounts of data and transforming it into an answer.

Known as Hyena, the technology is able to achieve equivalent accuracy on benchmark tests, such as question answering, while using a fraction of the computing power. In some instances, the Hyena code is able to handle amounts of text that make GPT-style technology simply run out of memory and fail.

"Our promising results at the sub-billion parameter scale suggest that attention may not be all we need," write the authors. That remark refers to the title of a landmark AI report of 2017, 'Attention is all you need'. In that paper, Google scientist Ashish Vaswani and colleagues introduced the world to Google's Transformer AI program. The transformer became the basis for every one of the recent large language models.

But the Transformer has a big flaw. It uses something called "attention," in which the computer program takes the information in one group of symbols, such as the words of a prompt, and moves that information to a new group of symbols, such as the answer you see from ChatGPT.

That attention operation -- the essential tool of all large language programs, including ChatGPT and GPT-4 -- has "quadratic" computational complexity (see the Wikipedia entry on "time complexity"). That complexity means the amount of time it takes for ChatGPT to produce an answer increases as the square of the amount of data it is fed as input.

At some point, if there is too much data -- too many words in the prompt, or too many strings of conversations over hours and hours of chatting with the program -- then either the program gets bogged down providing an answer, or it must be given more and more GPU chips to run faster and faster, leading to a surge in computing requirements.
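To make the scaling concrete, here is a minimal sketch of standard scaled dot-product attention in NumPy. It is an illustration of the textbook mechanism, not OpenAI's code, and the names and sizes are chosen for clarity. The n-by-n score matrix is where the quadratic cost comes from:

    import numpy as np

    def self_attention(x, wq, wk, wv):
        """x: (n, d) array of n token embeddings of width d."""
        q, k, v = x @ wq, x @ wk, x @ wv
        # The score matrix is n x n: every token attends to every token.
        # Time and memory therefore grow as n**2 -- double the prompt,
        # quadruple the work.
        scores = q @ k.T / np.sqrt(k.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    n, d = 1024, 64
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n, d))
    wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
    out = self_attention(x, wq, wk, wv)  # materializes a 1024 x 1024 score matrix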

In the new paper, 'Hyena Hierarchy: Towards Larger Convolutional Language Models', posted on the arXiv pre-print server, lead author Michael Poli of Stanford and his colleagues propose to replace the Transformer's attention function with something sub-quadratic, namely Hyena.

[...] The paper's contributing authors include luminaries of the AI world, such as Yoshua Bengio, MILA's scientific director and a recipient of the 2018 Turing Award, computing's equivalent of the Nobel Prize. Bengio is widely credited with developing the attention mechanism long before Vaswani and team adapted it for the Transformer.

Also among the authors is Stanford University computer science associate professor Christopher Ré, who has helped in recent years to advance the notion of AI as "software 2.0".

To find a sub-quadratic alternative to attention, Poli and team set about studying how the attention mechanism is doing what it does, to see if that work could be done more efficiently.

A recent practice in AI science, known as mechanistic interpretability, is yielding insights about what is going on deep inside a neural network, inside the computational "circuits" of attention. You can think of it as taking apart software the way you would take apart a clock or a PC to see its parts and figure out how it operates.

One work cited by Poli and team is a set of experiments by researcher Nelson Elhage of AI startup Anthropic. Those experiments take apart the Transformer programs to see what attention is doing.

In essence, what Elhage and team found is that attention functions, at its most basic level, through very simple computer operations, such as copying a word from recent input and pasting it into the output.

For example, if one starts typing a sentence from Harry Potter and the Sorcerer's Stone into a large language model program such as ChatGPT -- "Mr. Dursley was the director of a firm called Grunnings..." -- then typing just "D-u-r-s", the start of the name, might be enough to prompt the program to complete the name "Dursley", because it has seen the name in a prior sentence of Sorcerer's Stone. The system is able to copy from memory the record of the characters "l-e-y" to autocomplete the sentence.
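As a toy illustration of that copy behavior (a deliberate caricature of the learned circuit, not Anthropic's code), the hand-built lookup below does explicitly what an induction-style attention head does with weights: find the most recent earlier occurrence of the current token and copy what followed it:

    def copy_completion(tokens, current):
        """Return the token that followed the last earlier occurrence of `current`."""
        for i in range(len(tokens) - 2, -1, -1):  # scan backwards through the context
            if tokens[i] == current:
                return tokens[i + 1]
        return None

    context = ["Mr.", "Durs", "ley", "was", "the", "director", "of", "a", "firm"]
    print(copy_completion(context, "Durs"))  # -> "ley", copied from the earlier mention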

However, the attention operation runs into the quadratic complexity problem as the number of words grows. More words require more of what are known as "weights," or parameters, to run the attention operation.

As the authors write: "The Transformer block is a powerful tool for sequence modeling, but it is not without its limitations. One of the most notable is the computational cost, which grows rapidly as the length of the input sequence increases."

While the technical details of ChatGPT and GPT-4 haven't been disclosed by OpenAI, it is believed they may have a trillion or more such parameters. Running these parameters requires more GPU chips from Nvidia, thus driving up the compute cost.

To reduce that quadratic compute cost, Poli and team replace the attention operation with what's called a "convolution", which is one of the oldest operations in AI programs, refined back in the 1980s. A convolution is just a filter that can pick out items in data, be it the pixels in a digital photo or the words in a sentence.
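A minimal one-dimensional example (illustrative only, not from the paper): sliding a two-tap difference filter along a sequence picks out the places where the data changes:

    import numpy as np

    signal = np.array([0., 0., 1., 1., 1., 0., 0.])  # a crude "edge" in the data
    kernel = np.array([1., -1.])                     # difference filter: detects changes
    print(np.convolve(signal, kernel, mode="valid")) # -> [ 0.  1.  0.  0. -1.  0.]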

Poli and team do a kind of mash-up: they take work done by Stanford researcher Daniel Y. Fu and team to apply convolutional filters to sequences of words, and they combine that with work by scholar David Romero and colleagues at the Vrije Universiteit Amsterdam that lets the program change filter size on the fly. That ability to adapt flexibly cuts down on the number of costly parameters, or weights, the program needs to have.

The result of the mash-up is that a convolution can be applied to an unlimited amount of text without requiring more and more parameters in order to copy more and more data. It's an "attention-free" approach, as the authors put it.
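A rough sketch of that idea in NumPy (the filter function below is a hypothetical stand-in for Hyena's learned filter network; the real model is considerably more elaborate): because the filter values are generated from positions rather than stored one per position, the parameter count stays fixed no matter how long the input gets, and applying the convolution through the FFT costs on the order of n log n rather than attention's n**2:

    import numpy as np

    def implicit_filter(n, decay=0.02):
        # Hypothetical stand-in for a learned filter network: values are
        # computed from positions 0..n-1, for whatever length n arrives.
        t = np.arange(n)
        return np.exp(-decay * t) * np.cos(0.3 * t)

    def long_convolution(x):
        n = len(x)
        h = implicit_filter(n)    # a filter as long as the input itself
        size = 2 * n              # zero-pad so the circular FFT product
                                  # behaves like a linear convolution
        y = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(h, size), size)
        return y[:n]              # same length as the input

    x = np.random.default_rng(0).standard_normal(8192)
    y = long_convolution(x)       # cost ~ n log n; works for any input length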


Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Funny) by Rosco P. Coltrane on Friday April 21 2023, @09:28AM (13 children)

    by Rosco P. Coltrane (4757) on Friday April 21 2023, @09:28AM (#1302370)

    Not too long ago, AI was supposed to put me out of a job in 2054. Then it was 2030. Then suddenly it became 2024 a few months ago, and now it's next month? Come on...

    • (Score: 3, Funny) by khallow on Friday April 21 2023, @10:29AM (3 children)

      by khallow (3766) Subscriber Badge on Friday April 21 2023, @10:29AM (#1302372) Journal
      You're on your own. I'm more concerned about the nearly extinct gamer:

      At some point, if there is too much data -- too many words in the prompt, or too many strings of conversations over hours and hours of chatting with the program -- then either the program gets bogged down providing an answer, or it must be given more and more GPU chips to run faster and faster, leading to a surge in computing requirements.

      Who will render their pixels now?

    • (Score: 1, Touché) by Anonymous Coward on Friday April 21 2023, @12:47PM (4 children)

      by Anonymous Coward on Friday April 21 2023, @12:47PM (#1302384)

      How long before we can ask, "Oh great Deep Thought, can you tell us the answer, the answer to life, the universe and everything"?
      Yes, Douglas Adams saw this coming...

      • (Score: 2) by HiThere on Friday April 21 2023, @12:54PM (3 children)

        by HiThere (866) Subscriber Badge on Friday April 21 2023, @12:54PM (#1302386) Journal

        That's an easy question, though.
        The answer is "No".

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 3, Insightful) by Anonymous Coward on Friday April 21 2023, @01:21PM (2 children)

          by Anonymous Coward on Friday April 21 2023, @01:21PM (#1302389)

          "Easy for you, Human" (spoken with scorn by a future ChatGPT), "but my remit is to produce a paragraph or more to answer any question, I have no choice but to blather on about things I don't understand."

          • (Score: 2, Touché) by Rosco P. Coltrane on Friday April 21 2023, @02:03PM (1 child)

            by Rosco P. Coltrane (4757) on Friday April 21 2023, @02:03PM (#1302399)

            Because you think most real humans understand what they're talking about today?

            • (Score: 0, Interesting) by Anonymous Coward on Friday April 21 2023, @02:28PM

              by Anonymous Coward on Friday April 21 2023, @02:28PM (#1302405)

              > Because you think most real humans understand what they're talking about today?

              Some of us recognize MAS (male answer syndrome) and do our best to avoid talking blithely about topics where we only have buzzword understanding.

    • (Score: 2) by legont on Friday April 21 2023, @02:17PM (3 children)

      by legont (4179) on Friday April 21 2023, @02:17PM (#1302402)

      It happened already.

      https://www.nytimes.com/2023/04/21/opinion/chatgpt-journalism.html?unlocked_article_code=ls6b8R0u8HgMEWluPsK7cb7HY18eqIy-DPPzRKMa-mkRUBKDdBvTEp-_OW9HDL2m0NqbI9Nzyv2J8tyQFaK1kdNDyB2r_Utm4tZm_-z3mPk-QuwwG49zEaWNunYPFFC9pXhXXXUNPod7TeT00Ha-rzRIG5bAHkqZApq4RGeSdorgDZNx7hl-jcX9R-NfcBjnUr4ElZqNLMPAn3IBp_lZddzFDlJTqhV0mDlpfCE9bgB-Bq1yuuIYggzbqENXqWwiaPEi63yfmPq-TT0Mi-mTqO3OUb7eeoEBehLfmmrpBJQ2anuPbxaMo1rlDeA0B9GaoYMGElE6XiHt&giftCopy=0_NoCopy&smid=url-share [nytimes.com]

I am looking forward to watching rat eat rat in journalism circles as 99% of them are not needed any more.

Folks in chat rooms will provide content while a couple of editors create the paper, assuming it survives at all. The rest will go down fighting each other over the truth. It's gonna be such fun to watch.

      --
      "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
      • (Score: 1, Interesting) by Anonymous Coward on Friday April 21 2023, @02:43PM (2 children)

        by Anonymous Coward on Friday April 21 2023, @02:43PM (#1302409)

        > ... as 99% of them are not needed any more.

What about stories that benefit from an interview with a subject matter expert or, in other cases, someone who witnessed the story first hand? In my engineering niche I'm occasionally interviewed for a story, and the journalist will use quotes from me to [attempt to] show that they aren't just blowing smoke. I'm damned if I'll offer that service to a chatbot!

I've learned to discuss a few ground rules before the interview gets going, primarily that I want to see the quotes that will be used and have a chance to edit them before publication. Reasoning: I feel at a disadvantage in an interview with a journalist -- they interview people all the time, and the good ones are very good. Since formal interviews are rare for me, I can easily get tripped up. The ability to edit comments somewhat levels the playing field. Note that very few journalists will let an expert preview the whole story; somehow that's protected.

Now that I think about it, I'm going to add a question about the use of chatbots/"AI's" and decline the interview if parts of the story will be written by one.

        • (Score: 0, Flamebait) by Anonymous Coward on Friday April 21 2023, @08:08PM (1 child)

          by Anonymous Coward on Friday April 21 2023, @08:08PM (#1302461)

          Don't interact with journoscum. Write a book, blog post, or let your knowledge die with you.

          • (Score: 0, Interesting) by Anonymous Coward on Saturday April 22 2023, @10:34AM

            by Anonymous Coward on Saturday April 22 2023, @10:34AM (#1302550)

            > Write a book, ...

            Same AC here, as it turns out, I did write an engineering reference book, nearly 30 years ago (it's still in print and selling several hundred copies/year). It's a small niche, there aren't very many newer references in the field.

Perhaps you are correct and I should stop all "interact[ion] with journoscum"?

            However, I would like new/young people to know about my book. Occasionally talking with a journalist is a traditional way to do this--when the book was new the publisher sent out free review copies and got some nice reviews. These days it's minimal time & effort on my part--basically I'm leveraging the journo's audience (in exchange for adding some credibility to the journalist). At this point in my life, I've got better things to do than try and build/maintain a blog or social-media audience of my own (SN is the only "social media" where I post).

  • (Score: 2) by legont on Friday April 21 2023, @02:12PM (1 child)

    by legont (4179) on Friday April 21 2023, @02:12PM (#1302400)

    To reduce that quadratic compute cost, Poli and team replace the attention operation with what's called a "convolution", which is one of the oldest operations in AI programs, refined back in the 1980s. A convolution is just a filter that can pick out items in data, be it the pixels in a digital photo or the words in a sentence.

Finally somebody has read papers from the 60s. Yes, the 60s, as by the 80s they had mostly forgotten them.
Or perhaps still not.

    --
    "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    • (Score: 3, Interesting) by The Vocal Minority on Saturday April 22 2023, @03:49AM

      by The Vocal Minority (2765) on Saturday April 22 2023, @03:49AM (#1302508) Journal

Convolutions are used extensively in ANN models; this is just the first time (maybe) that they have been used in the transformer attention mechanism.

  • (Score: 0) by Anonymous Coward on Friday April 21 2023, @03:14PM (1 child)

    by Anonymous Coward on Friday April 21 2023, @03:14PM (#1302415)

    Jesus, these are more like religious/self-help articles. The vanity of academia (and Stanford in particular) is shocking! I predict we will need all this computing power and more to administer therapy for all the broken narcissists being turned out by the Great Institution when they figure out they didn't learn shit and only have a pompous conference article for their efforts. Either that, or they get back into servicing Professors in bowties and continue their psychiatric voyage into the gaseous upper realms of bullshit.

    • (Score: 1, Funny) by Anonymous Coward on Sunday April 23 2023, @07:50AM

      by Anonymous Coward on Sunday April 23 2023, @07:50AM (#1302632)

      we will need all this computing power and more to administer therapy for all the broken narcissists

Actually a lot of them are probably asking similar crap, so I wonder how much of the computation can be cached.
