Stories
Slash Boxes
Comments

SoylentNews is people

posted by martyb on Saturday April 08 2017, @11:23PM   Printer-friendly
from the if-spammers-used-'em-would-we-have-phish-and-chips? dept.

This week Google released a report detailing the design and performance characteristics of the Tensor Processing Unit (TPU), its custom ASIC for the inference phase of neural networks (NN). Google has been using the machine learning accelerator in its datacenters since 2015, but hasn't said much about the hardware until now.

In a blog post published yesterday (April 5, 2017), Norm Jouppi, distinguished hardware engineer at Google, observes, "The need for TPUs really emerged about six years ago, when we started using computationally expensive deep learning models in more and more places throughout our products. The computational expense of using these models had us worried. If we considered a scenario where people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to double the number of Google data centers!"

The paper, "In-Datacenter Performance Analysis of a Tensor Processing Unit​," (the joint effort of more than 70 authors) describes the TPU thusly:

"The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power."


Original Submission

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by kaszz on Sunday April 09 2017, @05:43PM

    by kaszz (4211) on Sunday April 09 2017, @05:43PM (#491207) Journal

    Rather about which approach will be the lowest hanging fruit for the time being. If it's cheaper to just pack more transistors and increase the frequency, then that will be done. If architectural approaches means more for the bang-vs-buck factor. Then that is what will be done.

    Starting Score:    1  point
    Karma-Bonus Modifier   +1  

    Total Score:   2