Is Matrix Multiplication Ugly?
A few weeks ago I was minding my own business, peacefully reading a well-written and informative article about artificial intelligence, when I was ambushed by a passage in the article that aroused my pique. That's one of the pitfalls of knowing too much about a topic a journalist is discussing; journalists often make mistakes that most readers wouldn't notice but that raise the hackles or at least the blood pressure of those in the know.
The article in question appeared in The New Yorker. The author, Stephen Witt, was writing about the way that your typical Large Language Model, starting from a blank slate, or rather a slate full of random scribbles, is able to learn about the world, or rather the virtual world called the internet. Throughout the training process, billions of numbers called weights get repeatedly updated so as to steadily improve the model's performance. Picture a tiny chip with electrons racing around in etched channels, and slowly zoom out: there are many such chips in each server node and many such nodes in each rack, with racks organized in rows, many rows per hall, many halls per building, many buildings per campus. It's a sort of computer-age version of Borges' Library of Babel. And the weight-update process that all these countless circuits are carrying out depends heavily on an operation known as matrix multiplication.
Witt explained this clearly and accurately, right up to the point where his essay took a very odd turn.
Here's what Witt went on to say about matrix multiplication:
"'Beauty is the first test: there is no permanent place in the world for ugly mathematics,' the mathematician G. H. Hardy wrote, in 1940. But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board. It is possessed of neither beauty nor symmetry: in fact, in matrix multiplication, a times b is not the same as b times a."
The last sentence struck me as a bizarre non sequitur, somewhat akin to saying "Number addition has neither beauty nor symmetry, because when you write two numbers backwards, their new sum isn't just their original sum written backwards; for instance, 17 plus 34 is 51, but 71 plus 43 isn't 15."
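The non-commutativity Witt laments is easy to exhibit directly. Here is a minimal Python sketch (the two matrices are arbitrary examples of my own choosing, not from the article):

```python
def matmul(A, B):
    """Multiply two square matrices, each given as a list of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]  # multiplying by B on the right swaps columns

print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]  -- AB and BA differ
```

The asymmetry isn't a defect: A-then-B and B-then-A are genuinely different sequences of transformations (here, "swap columns" versus "swap rows"), so of course their matrices differ.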
The next day I sent the following letter to the magazine:
"I appreciate Stephen Witt shining a spotlight on matrices, which deserve more attention today than ever before: they play important roles in ecology, economics, physics, and now artificial intelligence ("Information Overload", November 3). But Witt errs in bringing Hardy's famous quote ("there is no permanent place in the world for ugly mathematics") into his story. Matrix algebra is the language of symmetry and transformation, and the fact that a followed by b differs from b followed by a is no surprise; to expect the two transformations to coincide is to seek symmetry in the wrong place — like judging a dog's beauty by whether its tail resembles its head. With its two-thousand-year-old roots in China, matrix algebra has secured a permanent place in mathematics, and it passes the beauty test with flying colors. In fact, matrices are commonplace in number theory, the branch of pure mathematics Hardy loved most."
[...] I'm guessing that part of Witt's confusion arises from the fact that actually multiplying matrices of numbers to get a matrix of bigger numbers can be very tedious, and tedium is psychologically adjacent to distaste and a perception of ugliness. But the tedium of matrix multiplication is tied up with its symmetry (whose existence Witt mistakenly denies). When you multiply two n-by-n matrices A and B in the straightforward way, you have to compute n² numbers in the same unvarying fashion, and each of those n² numbers is the sum of n terms, and each of those n terms is the product of an element of A and an element of B in a simple way. It's only human to get bored and inattentive and then make mistakes because the process is so repetitive. We tend to think of symmetry and beauty as synonyms, but sometimes excessive symmetry breeds ennui; repetition in excess can be repellent. Picture the Library of Babel and the existential dread the image summons.
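The repetitive structure described above — n² entries, each a sum of n products — can be sketched in a few lines of Python. The multiplication counter is my own addition, just to make the tedium quantifiable:

```python
def matmul_count(A, B):
    """Schoolbook product of two n-by-n matrices, counting scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):          # n^2 output entries...
        for j in range(n):
            for k in range(n):  # ...each a sum of n products
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

ones = [[1] * 5 for _ in range(5)]
_, m = matmul_count(ones, ones)
print(m)  # 125 = 5^3 scalar multiplications for a pair of 5-by-5 matrices
```

One hundred twenty-five identical little chores for a single 5-by-5 product: perfectly symmetric, and perfectly mind-numbing for a human, which is rather the point.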
G. H. Hardy, whose famous remark Witt quotes, was in the business of proving theorems, and he favored conceptual proofs over calculational ones. If you showed him a proof of a theorem in which the linchpin of your argument was a 5-page verification that a certain matrix product had a particular value, he'd say you didn't really understand your own theorem; he'd assert that you should find a more conceptual argument and then consign your brute-force proof to the trash. But Hardy's aversion to brute force was specific to the domain of mathematical proof, which is far removed from math that calculates optimal pricing for annuities or computes the wind-shear on an airplane wing or fine-tunes the weights used by an AI. Furthermore, Hardy's objection to your proof would focus on the length of the calculation, and not on whether the calculation involved matrices. If you showed him a proof that used 5 turgid pages of pre-19th-century calculation that never mentioned matrices once, he'd still say "Your proof is a piece of temporary mathematics; it convinces the reader that your theorem is true without truly explaining why the theorem is true."
If you forced me at gunpoint to multiply two 5-by-5 matrices together, I'd be extremely unhappy, and not just because you were threatening my life; the task would be inherently unpleasant. But the same would be true if you asked me to add together a hundred random two-digit numbers. It's not that matrix-multiplication or number-addition is ugly; it's that such repetitive tasks are the diametrical opposite of the kind of conceptual thinking that Hardy loved and I love too. Any kind of mathematical content can be made stultifying when it's stripped of its meaning and reduced to mindless toil. But that casts no shade on the underlying concepts. When we outsource number-addition or matrix-multiplication to a computer, we rightfully delegate the soul-crushing part of our labor to circuitry that has no soul. If we could peer into the innards of the circuits doing all those matrix multiplications, we would indeed see a nightmarish, Borgesian landscape, with billions of nails being hammered into billions of boards, over and over again. But please don't confuse that labor with mathematics.
(Score: 4, Touché) by ledow on Wednesday November 26, @09:15AM (3 children)
Subtraction and division are non commutative in a similar way.
a - b does not equal b - a.
But apparently, we can just ignore that most basic of mathematical functions because it's inconvenient to our argument.
You know... like mathematicians always do with such things. They're renowned for just being that damn illogical, right?
(Score: 2, Informative) by shrewdsheep on Wednesday November 26, @10:14AM (1 child)
I agree with your point wholeheartedly. The author obviously did not study any linear algebra; otherwise he would have been overwhelmed by its elegance and beauty.
To nitpick, subtraction and division are not binary operators in mathematics. The minus sign is shorthand for the inverse, i.e. a unary operator: a - a = a + Inv(a) = 0, so that a - b = a + Inv(b) = Inv(b) + a = -b + a. Likewise for division.
(Score: 3, Interesting) by ledow on Sunday November 30, @01:32PM
You are of course correct; I'm just being facetious about their overwrought reaction and incredibly contrived example. (There are many examples of truly non-commutative operations, though not ones that would get me any upvotes on SoylentNews...)
I adore the first few chapters of The OpenGL SuperBible (3rd edition or below) purely because it showed me that everything in 3D computer graphics is nothing more than matrix multiplication - the viewpoint, the camera angles, moving the object points, projecting them onto a 2D plane, forming shadows [really just 2D projection again], etc. All of it is just a bunch of matrices multiplied by the vector co-ordinates of the object in 3D space.
As someone who studied mathematics, I just found that beautiful.
Maths is often very much about CONVERTING EVERYTHING YOU'RE DOING to the right paradigm to make the job as easy as possible so you can draw analogies in fields that are far simpler to work in, even if you have to convert back and forth at the end. Pretty much everything to do with computers pushing things through matrix multiplication is really using maths to speak a language that computers are extraordinarily good at, to extract an incredibly complex answer.
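ledow's point about graphics can be illustrated with a small sketch — plain Python rather than OpenGL, and the specific rotation example is mine: composing two transformations is literally just multiplying their matrices, and applying the result to a point is one more matrix-vector product.

```python
import math

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Apply a matrix to a column vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def rotation(theta):
    """2-D rotation matrix for an angle theta, in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Two quarter-turns composed by matrix multiplication give one half-turn:
R90 = rotation(math.pi / 2)
R180 = matmul(R90, R90)
p = matvec(R180, [1.0, 0.0])  # the point (1, 0), rotated 180 degrees
print([round(x, 6) for x in p])  # [-1.0, 0.0]
```

A full graphics pipeline chains many such matrices (model, view, projection) into one product and then hits every vertex with it — exactly the "convert everything to the right paradigm" move described above.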
(Score: 2) by aafcac on Wednesday November 26, @06:33PM
Yes, but that's a case for the negative sign belonging to the term rather than using a minus to hide it. Sign convention is actually a pretty important thing that oftentimes isn't mentioned. Some physics books use a positive g for gravitational acceleration and others use a negative one. I don't personally like the positive g, because it means you get nonsense like an extra subtraction sign for things that follow a similar constant acceleration due to other forces.
I'm not a big fan of including subtraction signs unless the intent is to take things away, rather than to avoid having to use a negative constant, as often happens with constant acceleration due to gravity. A good example of a subtraction making sense is in statistics, where you often see Q written as 1 - P rather than 1 + (-P). I can't recall ever having seen the latter, and sometimes they don't even use Q; it's just 1 - P as needed.