DeepMind unveils first AI to discover faster matrix multiplication algorithms:
Can artificial intelligence (AI) create its own algorithms to speed up matrix multiplication, one of machine learning’s most fundamental tasks? Today, in a paper published in Nature, DeepMind unveiled AlphaTensor, the “first artificial intelligence system for discovering novel, efficient and provably correct algorithms.” The Google-owned lab said the research “sheds light” on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices.
Ever since the Strassen algorithm was published in 1969, computer science has been on a quest to surpass its speed of multiplying two matrices. While matrix multiplication is one of algebra’s simplest operations, taught in high school math, it is also one of the most fundamental computational tasks and, as it turns out, one of the core mathematical operations in today’s neural networks.
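For readers who haven't seen it since high school, the "classroom" algorithm the article alludes to is just three nested loops; this minimal sketch (function name is mine, for illustration) shows why multiplying two n×n matrices this way costs n³ scalar multiplications, the baseline Strassen first improved on:

```python
# The "classroom" matrix multiplication algorithm: three nested loops,
# n^3 scalar multiplications for two n-by-n matrices. Strassen's 1969
# algorithm was the first to bring the exponent below 3.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

Each of the n·p output entries is a dot product of length m, hence n·m·p multiplications in total, or n³ in the square case.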
[...] This research delves into how AI could be used to improve computer science itself, said Pushmeet Kohli, head of AI for science at DeepMind, at a press briefing.
“If we’re able to use AI to find new algorithms for fundamental computational tasks, this has enormous potential because we might be able to go beyond the algorithms that are currently used, which could lead to improved efficiency,” he said.
This is a particularly challenging task, he explained, because the process of discovering new algorithms is so difficult, and automating algorithmic discovery using AI requires a long and difficult reasoning process — from forming intuition about the algorithmic problem to actually writing a novel algorithm and proving that the algorithm is correct on specific instances.
“This is a difficult set of steps and AI has not been very good at that so far,” he said.
[...] According to DeepMind, AlphaTensor discovered algorithms that are more efficient than the state of the art for many matrix sizes and outperform human-designed ones.
AlphaTensor begins without any knowledge about the problem, Kohli explained, and then gradually learns what is happening and improves over time. “It first finds this classroom algorithm that we were taught, and then it finds historical algorithms such as Strassen’s and then at some point, it surpasses them and discovers completely new algorithms that are faster than previously.”
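To make the "historical algorithms such as Strassen's" concrete, here is an illustrative implementation of Strassen's 1969 scheme for the 2×2 case (variable and function names are mine, not from any paper): seven multiplications M1..M7 instead of the classroom algorithm's eight. Applied recursively to matrix blocks, this recurrence yields a sub-cubic running time; AlphaTensor searches for decompositions of exactly this kind.

```python
# Strassen's seven-product scheme for 2x2 matrices. Each m_i is one
# multiplication of (sums of) entries; the classroom algorithm would
# use eight multiplications. Recursing on blocks gives O(n^log2(7)).

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's seven products."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

The saving of one multiplication per level is what compounds under recursion: log₂(7) ≈ 2.807, versus the classroom algorithm's exponent of 3.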
Kohli said he hopes that this paper inspires others to use AI to guide algorithmic discovery for other fundamental computational tasks. "We think this is a major step in our path towards really using AI for algorithmic discovery," he said.

What other new algorithms will be discovered? I wonder when they will attempt to apply this to factorization?
(Score: 2) by bzipitidoo on Wednesday October 12 2022, @02:55AM (4 children)
Yes, it will be worth any finite number of extra addition operations to avoid just 1 multiplication. You might have to scale up to very large numbers and large matrices to realize the savings. But it's there.
The issue is that for m digits, multiplication takes at least m log m operations if m is large enough for the Fast Fourier Transform to be worth using; otherwise, you're stuck with grade-school multiplication, which takes m^2 operations. Addition takes only m operations. So even if you have to do 100 extra additions, 100m will be smaller than m log m for sufficiently large m.
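A back-of-the-envelope sketch of the trade-off this comment is making (function names are mine, and the cost models are the comment's idealized digit-operation counts, not real instruction counts): k extra m-digit additions cost about k·m operations, one saved FFT-based multiplication is worth about m·log₂(m), so the trade only pays off once log₂(m) exceeds k, i.e. m > 2^k.

```python
import math

def extra_addition_cost(m, k):
    """Digit operations for k extra additions of m-digit numbers."""
    return k * m

def one_multiplication_cost(m):
    """Idealized digit operations for one FFT-based m-digit multiplication."""
    return m * math.log2(m)

def crossover_size(k):
    """Smallest power-of-two m where k extra additions beat one saved
    multiplication: k*m < m*log2(m) exactly when log2(m) > k."""
    return 2 ** k
```

For k = 100 the crossover is m = 2^100 digits, which is why the constant in front of the additions matters so much in practice.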
(Score: 3, Informative) by stormwyrm on Wednesday October 12 2022, @03:55AM (1 child)
Absolute hogwash when we're talking about the ordinary, everyday systems that most often arise in practice, especially those using the integer and floating-point data types provided by modern hardware. Even moving data around is not free, and these other algorithms do far more of it than the basic one. Strassen's algorithm, for instance, is actually counterproductive for matrices smaller than around 1000×1000 because of its substantial overhead. Even for larger matrices the performance gains are marginal, around 10% at best over the basic matrix multiplication algorithm. It also incurs a cost in increased memory usage (meaning that in addition to the eight megabytes each of your 1000×1000 double-precision floating-point matrices already consumes, you'll need several times more to multiply them), and for floating-point implementations, decreased numerical stability. The Coppersmith–Winograd algorithm and this new one developed at DeepMind are likely even worse in these respects. There is a sizeable constant factor in the performance of these "better" algorithms, hidden by big-O notation, that makes them not worth using except in highly exceptional circumstances.
Numquam ponenda est pluralitas sine necessitate.
(Score: 1) by khallow on Thursday October 13 2022, @03:22PM
You're also assuming the existence of fast vector operations, which a lot of CPUs have and which greatly speed up normal matrix multiplication. On CPUs that don't have them, Strassen's algorithm achieves its advantage at a much smaller matrix size.
(Score: 2) by FatPhil on Wednesday October 12 2022, @07:21PM (1 child)
So if lg(m) is 100, then you're talking 10^30 digit numbers. We don't have the capacity to even store such numbers, let alone manipulate them, yet.
So, nah, 100m is a loss compared to m.lg(m).
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
(Score: 2) by bzipitidoo on Wednesday October 12 2022, @08:56PM
True enough, I went overboard, 100 is too big a constant. But 20 is likely worthwhile.