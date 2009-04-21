from the hashed-potatoes-or-corned-beef-taste-better dept.
Rice, Intel Optimize AI Training for Commodity Hardware
Researchers claim they can do deep learning faster and cheaper on a commodity CPU than with a GPU.
The press release actually contains links to the original papers!
“The whole industry is fixated on one kind of improvement: faster matrix multiplications,” Shrivastava said. “Everyone is looking at specialized hardware and architectures to push matrix multiplication. People are now even talking about having specialized hardware-software stacks for specific kinds of deep learning. Instead of taking an expensive algorithm and throwing the whole world of system optimization at it, I’m saying, ‘Let’s revisit the algorithm.’”
Shrivastava’s lab did that in 2019, recasting DNN training as a search problem that could be solved with hash tables. Their “sub-linear deep learning engine” (SLIDE) is specifically designed to run on commodity CPUs, and Shrivastava and collaborators from Intel showed it could outperform GPU-based training when they unveiled it at MLSys 2020.
Can we hope that this will free up GPU inventory for actual gamers?
SLIDE Algorithm for Training Deep Neural Nets Faster on CPUs than GPUs
From SLIDE algorithm for training deep neural nets faster on CPUs than GPUs:
Rice University computer scientists have overcome a major obstacle in the burgeoning artificial intelligence industry by showing it is possible to speed up deep learning technology without specialized acceleration hardware like GPUs.
Computer scientists from Rice, supported by collaborators from Intel, will present their results today at the Austin Convention Center as a part of the machine learning systems conference MLSys.
[...] SLIDE doesn’t need GPUs because it takes a fundamentally different approach to deep learning. The standard “back-propagation” training technique for deep neural networks requires matrix multiplication, an ideal workload for GPUs. With SLIDE, Shrivastava, Chen and Medini turned neural network training into a search problem that could instead be solved with hash tables.
This radically reduces the computational overhead for SLIDE compared to back-propagation training. For example, a top-of-the-line GPU platform like the ones Amazon, Google and others offer for cloud-based deep learning services has eight Tesla V100s and costs about $100,000, Shrivastava said.
“We have one in the lab, and in our test case we took a workload that’s perfect for V100, one with more than 100 million parameters in large, fully connected networks that fit in GPU memory,” he said. “We trained it with the best (software) package out there, Google’s TensorFlow, and it took 3 1/2 hours to train. We then showed that our new algorithm can do the training in one hour, not on GPUs but on a 44-core Xeon-class CPU.”
(Emphasis retained from original source.)
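For readers wondering what "a search problem that could be solved with hash tables" might look like in practice, here is a minimal sketch of the general idea. This is not the SLIDE code. It assumes signed random projections as the hash function, and every name in it (`make_hasher`, `build_tables`, `active_neurons`, `sparse_forward`) is made up for illustration. Each neuron's weight vector is indexed in a few hash tables, the input is hashed the same way, and only the neurons that land in the same bucket are evaluated, so no full matrix multiplication is performed for the layer.

```python
import random
from collections import defaultdict


def make_hasher(dim, n_bits, rng):
    """Build a hash function from n_bits random hyperplanes (an assumed LSH choice)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(vec):
        key = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            # One bit per hyperplane: which side of the plane the vector falls on.
            key = (key << 1) | (1 if dot >= 0 else 0)
        return key

    return hash_vector


def build_tables(weights, hashers):
    """Index every neuron's weight vector in one hash table per hasher."""
    tables = []
    for hasher in hashers:
        table = defaultdict(list)
        for neuron_id, w in enumerate(weights):
            table[hasher(w)].append(neuron_id)
        tables.append(table)
    return tables


def active_neurons(x, hashers, tables):
    """Hash the input and return the union of neurons sharing a bucket with it."""
    selected = set()
    for hasher, table in zip(hashers, tables):
        selected.update(table.get(hasher(x), []))
    return selected


def sparse_forward(x, weights, selected):
    """Compute pre-activations for the selected neurons only."""
    return {j: sum(w * v for w, v in zip(weights[j], x)) for j in selected}


# Toy layer: 1000 neurons with 16 inputs each, indexed in 4 tables of 8-bit keys.
rng = random.Random(0)
dim, n_neurons, n_bits, n_tables = 16, 1000, 8, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_neurons)]
hashers = [make_hasher(dim, n_bits, rng) for _ in range(n_tables)]
tables = build_tables(weights, hashers)

x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
selected = active_neurons(x, hashers, tables)
outputs = sparse_forward(x, weights, selected)
print(f"evaluated {len(outputs)} of {n_neurons} neurons")
```

Vectors pointing in similar directions tend to share bucket keys, so the lookup favors neurons whose weights align with the input, which are the ones likely to have large activations. A real training system would also have to keep the tables in sync as the weights change. That part is omitted here.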