
posted by martyb on Saturday April 08 2017, @11:23PM
from the if-spammers-used-'em-would-we-have-phish-and-chips? dept.

This week Google released a report detailing the design and performance characteristics of the Tensor Processing Unit (TPU), its custom ASIC for the inference phase of neural networks (NN). Google has been using the machine learning accelerator in its datacenters since 2015, but hasn't said much about the hardware until now.

In a blog post published yesterday (April 5, 2017), Norm Jouppi, distinguished hardware engineer at Google, observes, "The need for TPUs really emerged about six years ago, when we started using computationally expensive deep learning models in more and more places throughout our products. The computational expense of using these models had us worried. If we considered a scenario where people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to double the number of Google data centers!"

The paper, "In-Datacenter Performance Analysis of a Tensor Processing Unit" (the joint effort of more than 70 authors), describes the TPU as follows:

"The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power."


Original Submission

 
  • (Score: 4, Insightful) by fishybell on Sunday April 09 2017, @12:20AM (4 children)

    by fishybell (3156) on Sunday April 09 2017, @12:20AM (#491015)

    I'm sure that as more and more large companies with large datacenters start looking at these results, you'll see more of them jump on the ASIC bandwagon. Given a large enough requirement for the same type of operation over and over, ASICs will always win out in the long run. We've seen it with Bitcoin, and now we're seeing it with datacenters. The fact that it's doing neural-net calculations is completely irrelevant.

    • (Score: 1, Redundant) by Ethanol-fueled on Sunday April 09 2017, @01:01AM (1 child)

      by Ethanol-fueled (2792) on Sunday April 09 2017, @01:01AM (#491023) Homepage

      In case anybody asks why ASICs aren't used more, it's because they're way expensive compared to, say, an FPGA or CPLD. It wouldn't make sense to order just 20 of them unless you're fucking Google and possess the required Jew Golds.

      • (Score: 0) by Anonymous Coward on Tuesday April 11 2017, @05:07PM

        by Anonymous Coward on Tuesday April 11 2017, @05:07PM (#492358)

        Why are you so afraid of jews to the point where you have to spout tired cartman-esque nonsense? What, they killed your family and burnt down your village or something?

    • (Score: 2) by RamiK on Sunday April 09 2017, @08:44AM

      by RamiK (1813) on Sunday April 09 2017, @08:44AM (#491121)

      The fact that it's doing neural-net calculations is completely irrelevant.

      Yup. Once everyone sees how efficient my orthodontist's distal cutters are, they'll all want their own specialized, custom-made tools. CNC always wins out in the long run.

      --
      compiling...
    • (Score: 2) by kaszz on Sunday April 09 2017, @06:49PM

      by kaszz (4211) on Sunday April 09 2017, @06:49PM (#491227) Journal

      So how much does an ASIC cost these days?

      Say 130 nm process, 2 million transistors?

  • (Score: 2) by takyon on Sunday April 09 2017, @12:32AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday April 09 2017, @12:32AM (#491018) Journal

    Google sees domain-specific custom chips as the future, with chips 200 times or more faster than Intel chips [nextbigfuture.com]

    The TPU die leverages its advantage in MACs and on-chip memory to run short programs written using the domain-specific TensorFlow framework 15 times as fast as the K80 GPU die, resulting in a performance/Watt advantage of 29 times, which is correlated with performance/total cost of ownership. Compared to the Haswell CPU die, the corresponding ratios are 29 and 83. While future CPUs and GPUs will surely run inference faster, a redesigned TPU using circa-2015 GPU memory would go two to three times as fast and boost the performance/Watt advantage to nearly 70 over the K80 and 200 over Haswell.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by Snotnose on Sunday April 09 2017, @01:59AM (2 children)

    by Snotnose (1623) on Sunday April 09 2017, @01:59AM (#491039)

    When I came of age, think early '80s, chip advances were all about "how low can they go". I had several discussions at trade shows along the lines of "they're at the human-hair level, they can't get much smaller" and "they're at 10 atoms, how small can they go?"

    Now it's "costs too much to shrink the die, hmm, let's optimize the CPU for different workloads".

    Not really seeing a problem with that; you can only shrink transistors so far before architecture takes over.

    --
    When the dust settled America realized it was saved by a porn star.
    • (Score: 2) by takyon on Sunday April 09 2017, @02:09AM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday April 09 2017, @02:09AM (#491044) Journal

      Look at the TPU's lower power consumption, or the even lower power consumption of neuromorphic (NPU?) chip designs. These are just begging to be scaled vertically, and could lead to orders of magnitude better performance for their niche.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by kaszz on Sunday April 09 2017, @05:43PM

      by kaszz (4211) on Sunday April 09 2017, @05:43PM (#491207) Journal

      It's rather about which approach is the lowest-hanging fruit at the time. If it's cheaper to just pack in more transistors and increase the frequency, then that is what will be done. If architectural approaches mean more bang for the buck, then that is what will be done.

  • (Score: 0) by Anonymous Coward on Sunday April 09 2017, @02:11PM

    by Anonymous Coward on Sunday April 09 2017, @02:11PM (#491163)

    That paper is well written, but unless you are already in the field, it would be a research project to figure out what it says.

    For a description of what they built, how about a nice C implementation that runs slowly, but simply?
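A minimal sketch of what such a slow-but-simple C version of the core operation might look like, purely as an illustration and not code from the paper: the matrix dimensions and names below are made up, and the real MXU is a 256x256 systolic array rather than a loop nest, but the arithmetic (8-bit inputs multiplied and accumulated into 32-bit results) is the same idea.

#include <stdint.h>
#include <stdio.h>

/* Naive int8 matrix multiply with int32 accumulation:
 * C[i][j] = sum over p of A[i][p] * B[p][j]
 * A is m x k, B is k x n, C is m x n, all row-major. */
static void matmul_int8(const int8_t *a, const int8_t *b, int32_t *c,
                        int m, int k, int n)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int p = 0; p < k; p++)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

int main(void)
{
    /* Tiny made-up example: 2x3 activations times 3x2 weights. */
    int8_t a[2 * 3] = { 1, -2, 3,   4, 5, -6 };
    int8_t b[3 * 2] = { 7, 8,   -9, 10,   11, 12 };
    int32_t c[2 * 2];

    matmul_int8(a, b, c, 2, 3, 2);

    for (int i = 0; i < 2; i++)
        printf("%d %d\n", (int)c[i * 2], (int)c[i * 2 + 1]);
    return 0;
}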
