
SoylentNews is people

posted by martyb on Friday June 04, @06:02PM
from the seen-any-good-deals-on-Xeons-lately? dept.

Google Replaces Millions of Intel's CPUs With Its Own Homegrown Chips

Google has designed its own new processors, the Argos video (trans)coding units (VCU), that have one solitary purpose: processing video. The highly efficient new chips have allowed the technology giant to replace tens of millions of Intel CPUs with its own silicon.

For many years Intel's video decoding/encoding engines that come built into its CPUs have dominated the market both because they offered leading-edge performance and capabilities and because they were easy to use. But custom-built application-specific integrated circuits (ASICs) tend to outperform general-purpose hardware because they are designed for one workload only. As such, Google turned to developing its own specialized hardware for video processing tasks for YouTube, and to great effect.

However, Intel may have a trick up its sleeve with its latest tech that could win back Google's specialized video processing business.

[...] Instead of stream processors like we see in GPUs, Google's VCU integrates ten H.264/VP9 encoder engines, several decoder cores, four LPDDR4-3200 memory channels (featuring 4x32-bit interfaces), a PCIe interface, a DMA engine, and a small general-purpose core for scheduling purposes. Most of the IP, except the in-house designed encoders/transcoders, was licensed from third parties to cut down on development costs. Each VCU is also equipped with 8GB of usable ECC LPDDR4 memory.
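
For a rough sense of scale, the memory figures quoted above can be turned into a theoretical peak bandwidth. This is only a back-of-the-envelope sketch: it assumes "4x32-bit interfaces" means four independent 32-bit channels and that "LPDDR4-3200" means 3200 mega-transfers per second per pin. The function name is made up for illustration, and sustained real-world bandwidth would be lower.

```python
def peak_bandwidth_gb_s(channels: int, bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumption: every channel moves bus_width_bits per transfer and
    runs at the full rated transfer rate with no overhead.
    """
    bytes_per_transfer = channels * bus_width_bits // 8
    # MB/s = bytes per transfer * mega-transfers per second; divide by 1000 for GB/s
    return bytes_per_transfer * mega_transfers_per_s / 1000


if __name__ == "__main__":
    # Assumed reading of the article: 4 channels, 32 bits each, 3200 MT/s
    print(peak_bandwidth_gb_s(4, 32, 3200), "GB/s theoretical peak")
```

Under those assumptions the four channels together top out at about 51 GB/s, shared by the ten encoder engines and the decoder cores.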

[...] Intel isn't standing still, though. The company's DG1 Xe-LP-based quad-chip SG1 server card can decode up to 28 4Kp60 streams as well as transcode up to 12 simultaneous streams. Essentially, Intel's SG1 does exactly what Google's Argos VCU does: scale video decoding and transcoding performance separately from the server count and thus reduce the number of general-purpose processors required in a data center used for video applications.

Google still uses Xeon servers to attach up to 20 of the Argos VCUs. It's estimated that it replaced between 4 and 33 million Xeons.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 5, Interesting) by dltaylor on Friday June 04, @09:44PM (1 child)


    One of the features the Amiga brought to the game, and one that required 486/Pentium-class processors to supplant the original 68000s, was the video/audio coprocessor named "Copper". The Amiga could display multiple color depths on a raster-line-by-line basis, so video could range from monochrome (lowest memory use) up to 4096 colors (take that, EGA) at the cost of more memory. It could perform Boolean operations on video memory, handle sprites and collision detection, and relocate memory with DMA (IIRC on the last, it has been a while). Modern GPUs have much more capability, but it has been 35 years.
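
    To put rough numbers on the memory trade-off between color depths: the sketch below assumes a 320x200 low-resolution screen and the original chipset's planar layout, where each bitplane adds one bit per pixel. The function name and the chosen resolution are illustrative assumptions, not details from the comment.

```python
def framebuffer_bytes(width: int, height: int, bitplanes: int) -> int:
    """Bytes of display memory for a planar screen with the given number of bitplanes."""
    # Each bitplane stores exactly one bit for every pixel on screen.
    return width * height * bitplanes // 8


if __name__ == "__main__":
    # Assumed 320x200 low-res screen; 6 bitplanes is the HAM mode behind the 4096-color figure.
    modes = [(1, "monochrome"), (5, "32 colors"), (6, "HAM, up to 4096 colors")]
    for planes, label in modes:
        print(f"{planes} bitplane(s), {label}: {framebuffer_bytes(320, 200, planes)} bytes")
```

    On those assumptions a monochrome screen needs 8,000 bytes and the 4096-color mode needs six times that.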

    It also took until the Sound Blaster 16 (1992) for PCs to catch up on sound, but that's a different story.

    At one point Motorola built communication processors (DMA, and protocol packing/unpacking for many frame types) 4 at once into a chip with a 68000 core to supervise them. Put 4 of those (16 channels) on a single VME card, shut down the internal CPUs to free up some memory bandwidth, and supervised them with a 68040. Streamed 16 video channels per card, 4 cards in the system, from hard drives for hotel video distribution back in '91. Did more cards per backplane in testing, but the customer spec'd 4x4.

    Higher-density silicon allows many of the coprocessor functions to be built into a single chip, but dedicated coprocessors taking load off the core CPU is an architecture that goes back at least to the Amiga.

  • (Score: 2) by FatPhil on Saturday June 05, @08:20AM

    Sounds quite similar to the TI C80MVP - one RISC master core, 4 DSPs (56K family, IIRC), and a SDMA unit on one chip - that I was doing similar stuff on back in the early 90s. Similar application too. Happy days. When microkernels fit in 3.5K.
    --
    I know I'm God, because every time I pray to him, I find I'm talking to myself.