SoylentNews is people

posted by martyb on Saturday April 08 2017, @11:23PM
from the if-spammers-used-'em-would-we-have-phish-and-chips? dept.

This week Google released a report detailing the design and performance characteristics of the Tensor Processing Unit (TPU), its custom ASIC for the inference phase of neural networks (NN). Google has been using the machine learning accelerator in its datacenters since 2015, but hasn't said much about the hardware until now.

In a blog post published on April 5, 2017, Norm Jouppi, distinguished hardware engineer at Google, observes, "The need for TPUs really emerged about six years ago, when we started using computationally expensive deep learning models in more and more places throughout our products. The computational expense of using these models had us worried. If we considered a scenario where people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to double the number of Google data centers!"

The paper, "In-Datacenter Performance Analysis of a Tensor Processing Unit" (a joint effort of more than 70 authors), describes the TPU as follows:

"The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power."
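As a rough sanity check, the quoted 92 TOPS peak can be reproduced from the MAC count. The sketch below assumes, beyond what the excerpt states, that the 65,536 MACs form a 256 x 256 array clocked at 700 MHz (the figures reported in Google's paper), and that each multiply-accumulate counts as two operations (a multiply and an add). The constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the quoted peak throughput.
# Assumptions (not stated in the excerpt): a 256 x 256 MAC array,
# a 700 MHz clock, and 2 operations (multiply + add) per MAC per cycle.

ARRAY_DIM = 256      # assumed side length of the square MAC array
CLOCK_HZ = 700e6     # assumed clock frequency: 700 MHz
OPS_PER_MAC = 2      # one multiply-accumulate = one multiply + one add


def peak_tops(array_dim: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak throughput, in tera-operations per second, of a square MAC array."""
    macs = array_dim * array_dim
    return macs * ops_per_mac * clock_hz / 1e12


macs = ARRAY_DIM * ARRAY_DIM
print(f"MACs: {macs:,}")                                    # MACs: 65,536
print(f"Peak: {peak_tops(ARRAY_DIM, CLOCK_HZ):.1f} TOPS")   # Peak: 91.8 TOPS
```

The result, about 91.8 TOPS, rounds to the 92 TOPS quoted above; real workloads reach a lower fraction of this peak whenever the array cannot be kept fully busy.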

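To make "8-bit MAC matrix multiply" concrete, here is a minimal pure-Python sketch of the arithmetic such a unit performs: signed 8-bit inputs whose products are summed into wider accumulators. The 32-bit accumulator width and the function names are assumptions for illustration, and the plain loop nest models only the arithmetic, not how the hardware actually moves data through its array.

```python
# Minimal sketch of an 8-bit multiply-accumulate (MAC) matrix multiply.
# Assumptions for illustration: signed int8 inputs and 32-bit accumulators.
# The loop nest reproduces only the arithmetic, not the hardware's dataflow.

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def check_int8(matrix):
    """Raise ValueError if any element falls outside the signed 8-bit range."""
    for row in matrix:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} does not fit in int8")


def int8_matmul(a, b):
    """Multiply int8 matrices a (n x k) and b (k x m) using int32 accumulation."""
    check_int8(a)
    check_int8(b)
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != m for row in b):
        raise ValueError("matrix dimensions do not match")
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # models a single 32-bit accumulator
            for p in range(k):
                # One MAC: an 8-bit x 8-bit product (fits in 16 bits) added to acc.
                acc += a[i][p] * b[p][j]
                if not INT32_MIN <= acc <= INT32_MAX:
                    raise OverflowError("accumulator overflowed int32")
            out[i][j] = acc
    return out


a = [[127, -128], [1, 2]]
b = [[2, 0], [1, 3]]
print(int8_matmul(a, b))  # [[126, -384], [4, 6]]
```

The -384 in the output would not fit in 8 bits, which is why the sums are kept in accumulators wider than the inputs.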

  • (Score: 2) by Snotnose on Sunday April 09 2017, @01:59AM (2 children)

    by Snotnose (1623) on Sunday April 09 2017, @01:59AM (#491039)

    When I came of age, think early '80s, chip advances were all about "how low can they go". I had several discussions at trade shows along the lines of "they're at the human-hair level, can't get much smaller" and "they're at 10 atoms, how small can they go?"

    Now it's "it costs too much to shrink the die, so let's optimize the CPU for different workloads".

    Not really seeing a problem with that; you can only shrink transistors so far before architecture takes over.

    Forget the past, ya can't change it. Forget the future, ya can't predict it. Forget the present, I didn't get you one
  • (Score: 2) by takyon on Sunday April 09 2017, @02:09AM

    by takyon (881) on Sunday April 09 2017, @02:09AM (#491044)

    Look at the TPU's lower power consumption, or the even lower power consumption of neuromorphic (NPU?) chip designs. These are just begging to be scaled vertically, and could deliver orders-of-magnitude better performance in their niche.

  • (Score: 2) by kaszz on Sunday April 09 2017, @05:43PM

    by kaszz (4211) on Sunday April 09 2017, @05:43PM (#491207)

    It's rather about which approach is the lowest-hanging fruit for the time being. If it's cheaper to just pack in more transistors and raise the clock frequency, then that will be done. If architectural approaches give more bang for the buck, then that is what will be done.