
posted by hubie on Wednesday October 12 2022, @12:24AM   Printer-friendly

DeepMind unveils first AI to discover faster matrix multiplication algorithms:

Can artificial intelligence (AI) create its own algorithms to speed up matrix multiplication, one of machine learning’s most fundamental tasks? Today, in a paper published in Nature, DeepMind unveiled AlphaTensor, the “first artificial intelligence system for discovering novel, efficient and provably correct algorithms.” The Google-owned lab said the research “sheds light” on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices.

Ever since the Strassen algorithm was published in 1969, computer science has been on a quest to surpass its speed of multiplying two matrices. While matrix multiplication is one of algebra’s simplest operations, taught in high school math, it is also one of the most fundamental computational tasks and, as it turns out, one of the core mathematical operations in today’s neural networks.

[...] This research delves into how AI could be used to improve computer science itself, said Pushmeet Kohli, head of AI for science at DeepMind, at a press briefing.

“If we’re able to use AI to find new algorithms for fundamental computational tasks, this has enormous potential because we might be able to go beyond the algorithms that are currently used, which could lead to improved efficiency,” he said.

This is a particularly challenging task, he explained, because the process of discovering new algorithms is so difficult, and automating algorithmic discovery using AI requires a long and difficult reasoning process — from forming intuition about the algorithmic problem to actually writing a novel algorithm and proving that the algorithm is correct on specific instances.

“This is a difficult set of steps and AI has not been very good at that so far,” he said.

[...] According to DeepMind, AlphaTensor discovered algorithms that are more efficient than the state of the art for many matrix sizes and outperform human-designed ones.

AlphaTensor begins without any knowledge about the problem, Kohli explained, and then gradually learns what is happening and improves over time. “It first finds this classroom algorithm that we were taught, and then it finds historical algorithms such as Strassen’s and then at some point, it surpasses them and discovers completely new algorithms that are faster than previously.”

Kohli said he hopes that this paper inspires others to use AI to guide algorithmic discovery for other fundamental computational tasks. “We think this is a major step in our path towards really using AI for algorithmic discovery,” he said.

What other new algorithms will be discovered? I wonder when they will attempt to apply this to factorization?


Original Submission

 
This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Insightful) by stormwyrm on Wednesday October 12 2022, @12:56AM (9 children)

    by stormwyrm (717) on Wednesday October 12 2022, @12:56AM (#1276155) Journal

    Here is the paper. [nature.com] (doi:10.1038/s41586-022-05172-4). And the blog post from the researchers [deepmind.com]. TFA sounds more like a PR puff piece.

    I am a bit sceptical that this actually represents a practical performance improvement. Strassen's algorithm for 2×2 matrices requires seven multiplications instead of the eight the standard O(n³) algorithm uses, but the standard algorithm needs only four additions while Strassen's uses 18. I don't think a single multiplication costs 14 more cycles than an addition on modern silicon, and I doubt that was true even when Strassen discovered his algorithm. For multiplying a 4×5 matrix by a 5×5 matrix, the standard algorithm does 100 multiplications and Strassen's does 80, but the new technique DeepMind used has discovered a scheme with 76. Now how many additions will it take, and how much extra storage? Will saving 24 multiplications be worth all the extra additions and the memory and cache overhead of the new technique?
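    To make the operation counts above concrete, here is Strassen's seven-product identity for a 2×2 multiplication written out as a minimal Python sketch (scalar entries, no recursion); counting the `+`/`-` operations gives the 18 additions mentioned above:

    ```python
    def strassen_2x2(A, B):
        """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)."""
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        # Seven products, at the cost of extra additions/subtractions:
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        # Recombine into the four entries of the product:
        return ((m1 + m4 - m5 + m7, m3 + m5),
                (m2 + m4,           m1 - m2 + m3 + m6))
    ```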

    Just like Strassen's algorithm, it seems like no more than a theoretical curiosity. It seems doubtful it can deliver any actual performance gains. Still, it's intriguing that they were able to extend the domain into mathematics.

    --
    Numquam ponenda est pluralitas sine necessitate.
  • (Score: 2, Interesting) by khallow on Wednesday October 12 2022, @01:17AM

    by khallow (3766) Subscriber Badge on Wednesday October 12 2022, @01:17AM (#1276160) Journal

    Just like Strassen's algorithm, it seems like no more than a theoretical curiosity.

    Strassen's algorithm is a genuine speed improvement. Block matrix multiplication is more costly than block matrix addition, and the gap grows with matrix size. The matrices need to be large enough, though; IIRC it takes somewhere around 32 or more rows/columns in each dimension to justify the overhead of the extra additions.

    I don't know about the present Google approach, but my understanding has been that as one gets the theoretical number of block matrix multiplications down, the rest of the overhead goes up. Some of these algorithms only see a performance advantage at absurd sizes, maybe millions of rows/columns. They probably also require considerable massaging to get the right number of rows/columns (like Strassen, which wants an even number of rows/columns).
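    The fallback structure described above, recurse with Strassen, then switch to the naive algorithm below a tuned cutoff, can be sketched as follows. The `CROSSOVER` value of 64 is a hypothetical placeholder; real crossover points are machine-dependent, as the comment notes. The sketch assumes square matrices whose size is a power of two, sidestepping the row/column massaging:

    ```python
    import numpy as np

    CROSSOVER = 64  # hypothetical tuning constant; real values are machine-dependent

    def strassen(A, B):
        """Strassen multiply for square matrices whose size is a power of two."""
        n = A.shape[0]
        if n <= CROSSOVER:
            return A @ B  # fall back: below this size the extra additions don't pay off
        h = n // 2
        a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # Seven recursive block products instead of eight:
        m1 = strassen(a11 + a22, b11 + b22)
        m2 = strassen(a21 + a22, b11)
        m3 = strassen(a11, b12 - b22)
        m4 = strassen(a22, b21 - b11)
        m5 = strassen(a11 + a12, b22)
        m6 = strassen(a21 - a11, b11 + b12)
        m7 = strassen(a12 - a22, b21 + b22)
        C = np.empty_like(A)
        C[:h, :h] = m1 + m4 - m5 + m7
        C[:h, h:] = m3 + m5
        C[h:, :h] = m2 + m4
        C[h:, h:] = m1 - m2 + m3 + m6
        return C
    ```

    Note the temporaries m1..m7 are where the extra memory traffic comes from: each recursion level allocates several submatrix-sized buffers that the naive triple loop never needs.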

  • (Score: 2) by bzipitidoo on Wednesday October 12 2022, @02:55AM (4 children)

    by bzipitidoo (4388) Subscriber Badge on Wednesday October 12 2022, @02:55AM (#1276177) Journal

    Yes, it will be worth any finite number of extra addition operations to avoid just 1 multiplication. You might have to scale up to very large numbers and large matrices to realize the savings. But it's there.

    The issue is that for m-digit numbers, multiplication takes at least m log m operations if m is large enough for the Fast Fourier Transform to be worth using; otherwise you're stuck with grade-school multiplication, which takes m^2 operations. Addition takes m operations. Even if you have to do 100 extra additions, 100m will be smaller than m log m, for sufficiently large m.
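    The trade-off in this comment can be checked with a toy cost model. The cost functions below are illustrative only (unit digit-operations, no constant factors), but they show the two regimes: against grade-school multiplication 100 extra additions win quickly, while against FFT-based multiplication they only win once log2(m) exceeds 100:

    ```python
    import math

    def naive_mul_cost(m):
        return m * m              # grade-school multiplication: ~m^2 digit ops

    def fft_mul_cost(m):
        return m * math.log2(m)   # FFT-based multiplication: ~m*log2(m) digit ops

    def extra_adds_cost(m, k=100):
        return k * m              # k extra additions of m-digit numbers

    # Against grade-school multiplication, 100 extra additions win once m > 100:
    assert extra_adds_cost(1000) < naive_mul_cost(1000)

    # Against FFT multiplication, they only win once log2(m) > 100, i.e. m > 2^100:
    assert extra_adds_cost(10**6) > fft_mul_cost(10**6)
    ```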

    • (Score: 3, Informative) by stormwyrm on Wednesday October 12 2022, @03:55AM (1 child)

      by stormwyrm (717) on Wednesday October 12 2022, @03:55AM (#1276184) Journal

      Absolute hogwash when we're talking about the ordinary, everyday systems that most often arise in practice, especially those using the integer and floating-point data types provided by modern hardware. Even moving data around is not free, and these other algorithms do far more of it than the basic one. Strassen's algorithm, for instance, is actually counterproductive for matrices smaller than around 1000×1000 because of its substantial overhead. Even for larger matrices the performance gains are marginal, around 10% at best over the basic matrix multiplication algorithm. It also costs extra memory (on top of the eight megabytes each of your 1000×1000 double-precision floating-point matrices already consumes, you'll need several times more to multiply them) and, for floating-point implementations, decreased numerical stability. The Coppersmith–Winograd algorithm and this new one developed at DeepMind are likely even worse in these respects. There is a sizeable constant factor in the performance of these "better" algorithms, hidden by big-O notation, that makes them not worth using except in highly exceptional circumstances.
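      The eight-megabyte figure in the comment is just arithmetic, and easy to verify:

      ```python
      # Memory footprint of one 1000x1000 matrix of double-precision floats.
      n = 1000
      bytes_per_float64 = 8
      matrix_bytes = n * n * bytes_per_float64
      assert matrix_bytes == 8_000_000  # 8 MB per matrix, before any Strassen temporaries
      ```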

      • (Score: 1) by khallow on Thursday October 13 2022, @03:22PM

        by khallow (3766) Subscriber Badge on Thursday October 13 2022, @03:22PM (#1276447) Journal

        Strassen's algorithm for instance is actually counterproductive for matrices that are less than around 1000 in size because of the algorithm's substantial overhead.

        You're also assuming the existence of fast vector operations, which a lot of CPUs have and which greatly speed up normal matrix multiplication. On CPUs without them, Strassen's algorithm achieves its advantage at much smaller matrix sizes.

    • (Score: 2) by FatPhil on Wednesday October 12 2022, @07:21PM (1 child)

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Wednesday October 12 2022, @07:21PM (#1276280) Homepage
      You do realise that 'm' is the *size* of the inputs.
      So if lg(m) is 100, then you're talking 10^30 digit numbers. We don't have the capacity to even store such numbers, let alone manipulate them, yet.
      So, nah, 100m is a loss compared to m.lg(m).
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by bzipitidoo on Wednesday October 12 2022, @08:56PM

        by bzipitidoo (4388) Subscriber Badge on Wednesday October 12 2022, @08:56PM (#1276304) Journal

        True enough, I went overboard, 100 is too big a constant. But 20 is likely worthwhile.

  • (Score: 2) by FatPhil on Wednesday October 12 2022, @07:34PM (1 child)

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Wednesday October 12 2022, @07:34PM (#1276284) Homepage
    Can confirm - sniffing around their git repo, I came across https://github.com/deepmind/alphatensor/blob/main/recombination/sota.py
    which has 220 lines of data, but only these three lines of relevance:
            (3, 4, 5): 47, # Discovered by AlphaTensor.
            (4, 4, 5): 63, # Discovered by AlphaTensor.
            (4, 5, 5): 76, # Discovered by AlphaTensor.
    So it's a minor improvement, even theoretically, to a tiny subset of the problem space. However, still an interesting curiosity.
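    For scale, the keys in that file are (m, n, p) shapes for an m×n by n×p product, and the values are multiplication counts, which can be set against the naive count of m·n·p (what follows is my reading of the snippet above, not DeepMind's code):

    ```python
    def standard(m, n, p):
        """Multiplication count of the naive algorithm for (m x n) @ (n x p)."""
        return m * n * p

    # Counts quoted from the sota.py snippet above (discovered by AlphaTensor):
    alphatensor = {(3, 4, 5): 47, (4, 4, 5): 63, (4, 5, 5): 76}

    for (m, n, p), count in sorted(alphatensor.items()):
        print(f"{m}x{n} @ {n}x{p}: {count} multiplications vs naive {standard(m, n, p)}")
    ```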

    I'm surprised that these haven't been solved by exhaustive search. There are only 2^12 and 2^20 sums, so only 2^32 products. Surely that's a smaller linear relation problem than some NFS factorisations? Or is it a finite cover problem? Or a lattice reduction problem (cf. Noam Elkies' sets anagram solution to what is a finite cover problem)? Or some other problem type; there's always more than one way to approach these things.
    • (Score: 2) by FatPhil on Wednesday October 12 2022, @07:46PM

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Wednesday October 12 2022, @07:46PM (#1276288) Homepage
      Ah, they searched much larger matrix sizes than just the m<=n<=p<=12 in that file, and found a bunch of improvement over prior state-of-the-art algorithms.
  • (Score: 3, Funny) by FatPhil on Wednesday October 12 2022, @07:37PM

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Wednesday October 12 2022, @07:37PM (#1276286) Homepage
    Damn, I forgot about this!

    I think at least you will be smart enough to snigger at these two lines from the SOTA file I linked to above:

          a, b, c = sorted([a, b, c])
    ...
          return int(math.ceil((3 * b * c + max(b, c)) / 2))