
posted by martyb on Friday May 19 2017, @12:34AM   Printer-friendly
from the Are-you-thinking-what-I'm-thinking? dept.

Google's machine-learning-oriented chips have gotten an upgrade:

At Google I/O 2017, Google announced its next-generation machine learning chip, called the "Cloud TPU." The new TPU no longer does only inference; now it can also train neural networks.

[...] In last month's paper, Google hinted that a next-generation TPU could be significantly faster if certain modifications were made. The Cloud TPU seems to have received some of those improvements. It's now much faster, and it can also do floating-point computation, which means it's suitable for training neural networks, too.

According to Google, the chip can achieve 180 teraflops of floating-point performance, which is six times more than Nvidia's latest Tesla V100 accelerator for FP16 half-precision computation. Even when compared against Nvidia's "Tensor Core" performance, the Cloud TPU is still 50% faster.
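
A quick back-of-the-envelope check of those ratios, assuming Nvidia's published V100 figures of roughly 30 teraflops for FP16 and 120 teraflops for the Tensor Cores (exact numbers depend on clocks, so treat this as a sketch):

    # Rough sanity check of the quoted speedups against the Tesla V100.
    cloud_tpu_tflops = 180        # Google's claimed Cloud TPU figure
    v100_fp16_tflops = 30         # approximate V100 FP16 (non-tensor) throughput
    v100_tensor_tflops = 120      # approximate V100 Tensor Core throughput

    print(cloud_tpu_tflops / v100_fp16_tflops)    # 6.0  -> "six times more"
    print(cloud_tpu_tflops / v100_tensor_tflops)  # 1.5  -> "50% faster"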

[...] Google will also donate access to 1,000 Cloud TPUs to top researchers under the TensorFlow Research Cloud program to see what people do with them.

Also at EETimes and Google.

Previously: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Nvidia Compares Google's TPUs to the Tesla P40
NVIDIA's Volta Architecture Unveiled: GV100 and Tesla V100


Original Submission

 
  • (Score: 2) by takyon on Friday May 19 2017, @02:09AM (3 children)

    by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Friday May 19 2017, @02:09AM (#511942) Journal

    The world's fastest supercomputer, Sunway TaihuLight, has 40,960 "Chinese-designed SW26010 manycore 64-bit RISC processors based on the Sunway architecture". Speed is 93 petaflops, 125 petaflops peak (LINPACK, so take it with some salt).

    I believe the "Cloud TPU" is 4 smaller TPUs in one unit (not sure). So tensor performance per individual TPU is 45 (tensor) "teraflops". So you get these numbers [nextbigfuture.com]:

    • Google will make 1,000 Cloud TPUs (44 petaflops) available at no cost to ML researchers via the TensorFlow Research Cloud.
    • 24 second-generation TPUs would deliver over 1 petaflops
    • 256 second-generation TPUs in a cluster can deliver 11.5 petaflops

    It seems to scale well. Anyway, to reach 125 petaflops you would need 2,778 of them, and to get to 1 exaflops, 22,222. It would probably cost Google well under a billion dollars to build the machine-learning equivalent of an exaflops machine.
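
    A rough check of that arithmetic (a sketch, assuming 45 teraflops per second-generation TPU chip and four chips per Cloud TPU unit, which is my reading above rather than an official per-chip breakdown):

        # Back-of-the-envelope scaling, all figures in teraflops.
        tflops_per_chip = 45                 # assumed: 180 TFLOPS Cloud TPU / 4 chips
        print(24 * tflops_per_chip)          # 1,080  -> "over 1 petaflops"
        print(256 * tflops_per_chip)         # 11,520 -> "11.5 petaflops"
        print(125_000 / tflops_per_chip)     # ~2,778 chips to match TaihuLight's peak
        print(1_000_000 / tflops_per_chip)   # ~22,222 chips for a (tensor) exaflops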

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by kaszz on Friday May 19 2017, @04:32AM (2 children)

    by kaszz (4211) on Friday May 19 2017, @04:32AM (#512013) Journal

    36.8e15 FLOPS is the estimated computational power required to simulate a human brain in real time.

    At a price of 30 million dollars?
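
    The implied arithmetic, as a sketch (the 45 TFLOPS per chip is the figure assumed above, and the $30 million is a guess rather than a published price):

        # How many second-generation TPU chips the brain estimate would take.
        brain_flops = 36.8e15
        tflops_per_chip = 45e12                      # assumed per-chip figure from above
        chips = brain_flops / tflops_per_chip        # ~818 chips
        units = chips / 4                            # ~205 four-chip Cloud TPU units
        print(round(chips), round(units))
        print(30e6 / units)                          # ~$147k per unit implied by a $30M total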

    • (Score: 2) by HiThere on Friday May 19 2017, @05:10PM (1 child)

      by HiThere (866) Subscriber Badge on Friday May 19 2017, @05:10PM (#512258) Journal

      It depends on which estimate you use. That number isn't known to within even an order of magnitude. Particularly if you allow the exclusion of parts of the brain that are dedicated to, e.g., handling blood chemistry. And particularly if you include speculation that some quantum effects are involved in thought.

      In fact, the entire basis of thought isn't really understood, so flops might be a poor measure of what it takes to simulate it. Perhaps integer arithmetic is better, or fixed point. Flops only matter because of the selected algorithm, and I'm really dubious about that. That said, this doesn't imply that the current "deep learning" approach won't work. It's just that you can't assume its computational requirements will be equivalent; they could be much higher or much lower.

      --
      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
      • (Score: 2) by kaszz on Friday May 19 2017, @05:27PM

        by kaszz (4211) on Friday May 19 2017, @05:27PM (#512270) Journal

        Well, now that the capacity is becoming available, maybe it will enable research to find out?