posted by martyb on Tuesday April 11 2017, @04:24PM

Following Google's release of a paper detailing how its tensor processing units (TPUs) beat 2015 CPUs and GPUs at machine learning inference tasks, Nvidia has countered with results from its Tesla P40:

Google's TPU went online in 2015, which is why the company compared its performance against other chips that it was using at that time in its data centers, such as the Nvidia Tesla K80 GPU and the Intel Haswell CPU.

Google is only now releasing the results, possibly because it doesn't want other machine learning competitors (think Microsoft, rather than Nvidia or Intel) to learn about the secrets that make its AI so advanced, at least until it's too late to matter. Releasing the TPU results now could very well mean Google is already testing or even deploying its next-generation TPU.

Nevertheless, Nvidia took the opportunity to show that its latest inference GPUs, such as the Tesla P40, have evolved significantly since then, too. Some of the increase in inference performance seen by Nvidia GPUs is due to the company jumping from the previous 28nm process node to the 16nm FinFET node. This jump offered its chips about twice as much performance per Watt.

Nvidia also further improved its GPU architecture for deep learning in Maxwell, and then again in Pascal. Yet another reason the new GPUs are so much faster at inference is that Nvidia's deep learning and inference-optimized software has improved significantly as well.

Finally, perhaps the main reason the Tesla P40 can be up to 26x faster than the old Tesla K80, according to Nvidia, is that the P40 supports INT8 computation, as opposed to the K80's FP32-only support. Inference doesn't need very high accuracy in its calculations, and 8-bit integers seem to be enough for most types of neural networks.
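
To get a rough sense of what INT8 inference means in practice, here is a minimal sketch of symmetric weight quantization in NumPy; the scaling scheme and names are illustrative assumptions, not Nvidia's or Google's actual pipeline.

    import numpy as np

    def quantize_int8(weights):
        # Symmetric quantization: map FP32 weights onto the signed range [-127, 127].
        scale = np.max(np.abs(weights)) / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an FP32 approximation of the original weights.
        return q.astype(np.float32) * scale

    # Toy check: a random weight matrix survives the round trip with small error.
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print(np.max(np.abs(w - dequantize(q, scale))))  # error on the order of scale/2

The speedup on hardware like the P40 comes from running the matrix multiplies on the int8 values themselves, with wider integer accumulation, rather than from this conversion step.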

Google's TPUs use less power, have an unknown cost (the P40 can cost $5,700), and may have advanced considerably since 2015.

Previously: Google Reveals Homegrown "TPU" For Machine Learning


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 0) by Anonymous Coward on Tuesday April 11 2017, @08:22PM

    by Anonymous Coward on Tuesday April 11 2017, @08:22PM (#492436)

    Article is a bit loose with the terms precision and accuracy. 8 bit versus 32 bit is purely precision. Accuracy is something else.

    E.g. 10 +/- 1000000 may be an accurate estimate. 10.000 +/- 0.0001 may be an inaccurate estimate.

    The precision is different in each case, but precision by itself says nothing about whether 10 is actually the right answer.
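
    A quick numeric sketch of the distinction, with a made-up true value for illustration:

        # Suppose the true value is 12.0 (made up for the example).
        true_value = 12.0

        # 10 +/- 1000000: imprecise, but the interval contains the truth -> accurate.
        print(abs(10.0 - true_value) <= 1_000_000.0)   # True

        # 10.000 +/- 0.0001: very precise, but the interval misses the truth -> inaccurate.
        print(abs(10.000 - true_value) <= 0.0001)      # False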

  • (Score: 0) by Anonymous Coward on Tuesday April 11 2017, @08:39PM

    by Anonymous Coward on Tuesday April 11 2017, @08:39PM (#492447)

    Anyone else read the headline and think it was partly about Tesla cars? The current top Model S is P100D, so the names are very close.

  • (Score: 2) by Techwolf on Tuesday April 11 2017, @09:23PM (1 child)

    by Techwolf (87) on Tuesday April 11 2017, @09:23PM (#492468)

    I always wonder about this for games. Why do they use floating point instead of pure integers that are much faster to handle?

    • (Score: 2) by ledow on Wednesday April 12 2017, @11:53AM

      by ledow (5567) on Wednesday April 12 2017, @11:53AM (#492696) Homepage

      Same reason the 286/386 used to have a separate floating-point co-processor (the 287/387).

      Refactoring your code to perform the same calculations to the same level of accuracy as a basic float/double is always possible, but the extra handling eats up more CPU time than it would to just use floating point in the first place.

      Although you CAN do everything in integers, you're basically using them to emulate floating-point (actually, more likely fixed-point), and that means more work to interpret values, track error amounts, etc. than it would to just use dedicated floating-point instructions.

      And what precisely do you think the hardware-accelerated FPUs of old did that made them worth building into all modern processors and GPUs? Floating point: making the underlying binary bits represent floating-point numbers, directly in hardware.

      Good luck making an HD 3D game using fixed-point / integer arithmetic only and getting anywhere near the same smoothness in the same CPU/GPU time.

      It's like saying "Hey, just do all your maths on your fingers, because everyone can count to 10 quickly!" Sure. You can. And you can break down any maths problem in the world to something you can do via counting on your fingers in various elaborate ways. But it is never going to be quicker even if a single base operation is quicker to perform.
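
      To make the "emulating floating point with integers" point concrete, here is a minimal fixed-point sketch in Python (the Q16.16 format is just one common choice, picked for illustration):

          FRAC_BITS = 16
          ONE = 1 << FRAC_BITS            # 1.0 in Q16.16 fixed point

          def to_fixed(x: float) -> int:
              return int(round(x * ONE))

          def fixed_mul(a: int, b: int) -> int:
              # The raw product carries 32 fractional bits; shift back and drop the low ones.
              return (a * b) >> FRAC_BITS

          a, b = to_fixed(3.14159), to_fixed(0.001)
          print(fixed_mul(a, b) / ONE)    # ~0.00316: the small operand already lost digits
          print(3.14159 * 0.001)          # 0.00314159: float keeps the significant digits

      Every operation needs that kind of manual scale bookkeeping, and small values lose digits fast; that is exactly the work a hardware FPU does for you.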
