from the if-you-have-to-ask-how-much-it-costs... dept.
Google has assembled thousands of Tensor Processor Units (TPUs) into giant programmable supercomputers and made them available on Google Cloud
[...] To be precise, Google has used a "two-dimensional toroidal" mesh network to enable multiple racks of TPUs to be programmable as one colossal AI supercomputer. The company says more than 1,000 TPU chips can be connected by the network.
Google claims each TPU v3 pod can deliver more than 100 petaFLOPS of computing power, which puts them amongst the world's top five supercomputers in terms of raw mathematical operations per second. Google added the caveat, however, that the pods operate at a lower numerical precision, making them more appropriate for superfast speech recognition or image classification – workloads that do not need high levels of precision.
Update, 9 June 2021: Google reports this week in the journal Nature that its next generation AI chip, succeeding the TPU version 4, was designed in part using an AI that researchers described to IEEE Spectrum last year. They've made some improvements since Spectrum last spoke to them. The AI now needs fewer than six hours to generate chip floorplans that match or beat human-produced designs at power consumption, performance, and area. Expert humans typically need months of iteration to do this task.
Original blog post from 23 March 2020 follows:
There's been a lot of intense and well-funded work developing chips that are specially designed to perform AI algorithms faster and more efficiently. The trouble is that it takes years to design a chip, and the universe of machine learning algorithms moves a lot faster than that. Ideally you want a chip that's optimized to do today's AI, not the AI of two to five years ago. Google's solution: have an AI design the AI chip.
"We believe that it is AI itself that will provide the means to shorten the chip design cycle, creating a symbiotic relationship between hardware and AI, with each fueling advances in the other," they write in a paper describing the work that posted today to Arxiv.
"We have already seen that there are algorithms or neural network architectures that... don't perform as well on existing generations of accelerators, because the accelerators were designed like two years ago, and back then these neural nets didn't exist," says Azalia Mirhoseini, a senior research scientist at Google. "If we reduce the design cycle, we can bridge the gap."
1.) Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, et al. A graph placement methodology for fast chip design, Nature (DOI: 10.1038/s41586-021-03544-w)
2.) Anna Goldie, Azalia Mirhoseini. Placement Optimization with Deep Reinforcement Learning, (DOI: https://arxiv.org/abs/2003.08445)
Related: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Hundred Petaflop Machine Learning Supercomputers Now Available on Google Cloud
Google Replaced Millions of Intel Xeons with its Own "Argos" Video Transcoding Units