from the seen-any-good-deals-on-Xeons-lately? dept.
Google has designed its own new processors, the Argos video (trans)coding units (VCU), that have one solitary purpose: processing video. The highly efficient new chips have allowed the technology giant to replace tens of millions of Intel CPUs with its own silicon.
For many years Intel's video decoding/encoding engines that come built into its CPUs have dominated the market both because they offered leading-edge performance and capabilities and because they were easy to use. But custom-built application-specific integrated circuits (ASICs) tend to outperform general-purpose hardware because they are designed for one workload only. As such, Google turned to developing its own specialized hardware for video processing tasks for YouTube, and to great effect.
However, Intel may have a trick up its sleeve with its latest tech that could win back Google's specialized video processing business.
[...] Instead of stream processors like we see in GPUs, Google's VCU integrates ten H.264/VP9 encoder engines, several decoder cores, four LPDDR4-3200 memory channels (featuring 4x32-bit interfaces), a PCIe interface, a DMA engine, and a small general-purpose core for scheduling purposes. Most of the IP, except the in-house designed encoders/transcoders, was licensed from third parties to cut down on development costs. Each VCU is also equipped with 8GB of usable ECC LPDDR4 memory.
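For a rough sense of what that memory configuration implies, the sketch below works out a theoretical peak bandwidth. It assumes the spec list means four 32-bit interfaces in total, each running at 3200 mega-transfers per second; the function name and constants are illustrative, and the result is a back-of-the-envelope ceiling, not a figure reported by Google.

```python
# Back-of-the-envelope peak memory bandwidth for the VCU.
# All inputs are assumptions read from the spec list above, not published figures.
TRANSFERS_PER_SECOND = 3200e6   # LPDDR4-3200: 3200 mega-transfers per second
BITS_PER_CHANNEL = 32           # assumed width of each interface
CHANNELS = 4                    # assumed number of interfaces


def peak_bandwidth_gb_per_s(transfers_per_second, bits_per_channel, channels):
    """Theoretical peak in GB/s (1 GB = 1e9 bytes), ignoring refresh and protocol overhead."""
    bytes_per_transfer = bits_per_channel / 8
    return transfers_per_second * bytes_per_transfer * channels / 1e9


per_channel = peak_bandwidth_gb_per_s(TRANSFERS_PER_SECOND, BITS_PER_CHANNEL, 1)
total = peak_bandwidth_gb_per_s(TRANSFERS_PER_SECOND, BITS_PER_CHANNEL, CHANNELS)
print(f"per channel: {per_channel:.1f} GB/s, total: {total:.1f} GB/s")
```

Under those assumptions each interface tops out at about 12.8 GB/s and the four together at about 51.2 GB/s; sustained bandwidth in practice would be lower.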
[...] Intel isn't standing still, though. The company's DG1 Xe-LP-based quad-chip SG1 server card can decode up to 28 4Kp60 streams as well as transcode up to 12 simultaneous streams. Essentially, Intel's SG1 does exactly what Google's Argos VCU does: scale video decoding and transcoding performance separately from the server count and thus reduce the number of general-purpose processors required in a data center used for video applications.
Google still uses Xeon servers as hosts, attaching up to 20 Argos VCUs to each. It's estimated that the VCUs replaced between 4 million and 33 million Xeons.
Update, 9 June 2021: Google reports this week in the journal Nature that its next generation AI chip, succeeding the TPU version 4, was designed in part using an AI that researchers described to IEEE Spectrum last year. They've made some improvements since Spectrum last spoke to them. The AI now needs fewer than six hours to generate chip floorplans that match or beat human-produced designs at power consumption, performance, and area. Expert humans typically need months of iteration to do this task.
Original blog post from 23 March 2020 follows:
There's been a lot of intense and well-funded work developing chips that are specially designed to perform AI algorithms faster and more efficiently. The trouble is that it takes years to design a chip, and the universe of machine learning algorithms moves a lot faster than that. Ideally you want a chip that's optimized to do today's AI, not the AI of two to five years ago. Google's solution: have an AI design the AI chip.
"We believe that it is AI itself that will provide the means to shorten the chip design cycle, creating a symbiotic relationship between hardware and AI, with each fueling advances in the other," they write in a paper describing the work that was posted today to arXiv.
"We have already seen that there are algorithms or neural network architectures that... don't perform as well on existing generations of accelerators, because the accelerators were designed like two years ago, and back then these neural nets didn't exist," says Azalia Mirhoseini, a senior research scientist at Google. "If we reduce the design cycle, we can bridge the gap."
1.) Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, et al. A graph placement methodology for fast chip design, Nature (DOI: 10.1038/s41586-021-03544-w)
2.) Anna Goldie, Azalia Mirhoseini. Placement Optimization with Deep Reinforcement Learning, arXiv (https://arxiv.org/abs/2003.08445)
Related: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Hundred Petaflop Machine Learning Supercomputers Now Available on Google Cloud
Google Replaced Millions of Intel Xeons with its Own "Argos" Video Transcoding Units