
posted on Thursday February 23 2017, @05:47AM
from the is-baidu-evil-or-not? dept.

Baidu has released an implementation of an algorithm intended to reduce GPU communication bottlenecks when training neural networks:

Baidu's Silicon Valley AI Lab (SVAIL) announced an implementation of the ring allreduce algorithm for the deep learning community, which will enable significantly faster training of neural networks across many GPU nodes. As neural networks have grown to include hundreds of millions or even over a billion parameters, the number of GPU nodes needed for training has also increased. However, as the number of nodes grows, the system becomes less efficient in terms of how much useful computation each node performs. This has increased the need for algorithms that maximize performance across such highly parallel systems.
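To see why per-node efficiency falls as the cluster grows, it helps to compare the communication cost of a naive reduce-to-one-node scheme with a bandwidth-optimal ring. The sketch below is illustrative only; the byte counts follow the standard allreduce analysis, not figures from the article.

```python
def allreduce_traffic_bytes(num_nodes, data_bytes):
    """Per-link traffic for allreducing `data_bytes` of gradients.

    Naive scheme: every worker sends its data to one root, which sums it and
    broadcasts the result back, so the root's link carries ~2*(N-1)*D bytes.
    Ring allreduce: each node only exchanges data with its ring neighbors,
    moving ~2*(N-1)/N*D bytes no matter how many nodes participate.
    """
    n, d = num_nodes, data_bytes
    naive_root_link = 2 * (n - 1) * d      # grows linearly with the node count
    ring_per_node = 2 * (n - 1) / n * d    # approaches 2*D as n grows
    return naive_root_link, ring_per_node

# Example: 40 nodes, 400 MB of parameters (roughly 100M float32 values).
naive, ring = allreduce_traffic_bytes(40, 400e6)
print(f"naive root link: {naive / 1e9:.1f} GB, ring per node: {ring / 1e9:.2f} GB")
```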

Using all the GPU nodes more efficiently means training can be done faster, and the company training the network doesn't have to spend as much on hardware that would otherwise sit underutilized. Baidu has taken one algorithm, called the "ring allreduce," from the high-performance computing (HPC) world and brought it to deep learning to increase the efficiency of its GPU nodes. The ring allreduce algorithm could speed up the training of an example neural network by 31x across 40 GPUs, compared to using a single GPU.
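The ring allreduce itself works in two phases: a scatter-reduce pass in which each node forwards one chunk per step to its ring neighbor and accumulates what it receives, followed by an allgather pass that circulates the finished chunks. A minimal single-process simulation of that data movement (a sketch of the general technique, not Baidu's implementation) looks like this:

```python
import numpy as np

def ring_allreduce_sim(tensors):
    """Simulate a sum ring-allreduce over equal-length 1-D arrays, one per
    "worker". Every worker should end up holding the element-wise sum."""
    n = len(tensors)
    # Each worker splits its tensor into n chunks, one per ring position.
    chunks = [np.array_split(t.astype(np.float64), n) for t in tensors]

    # Phase 1: scatter-reduce. In each of the n-1 steps, worker i sends one
    # chunk to worker (i+1) % n, which adds it into its own copy. Afterwards,
    # worker i holds the fully reduced chunk (i+1) % n.
    for step in range(n - 1):
        for sender in range(n):
            c = (sender - step) % n
            receiver = (sender + 1) % n
            chunks[receiver][c] = chunks[receiver][c] + chunks[sender][c]

    # Phase 2: allgather. Each worker forwards its most recently completed
    # chunk around the ring until everyone holds every reduced chunk.
    for step in range(n - 1):
        for sender in range(n):
            c = (sender + 1 - step) % n
            receiver = (sender + 1) % n
            chunks[receiver][c] = chunks[sender][c]

    return [np.concatenate(ch) for ch in chunks]

# Quick check: every simulated worker ends up with the same summed tensor.
workers = [np.random.randn(10) for _ in range(4)]
results = ring_allreduce_sim(workers)
assert all(np.allclose(r, np.sum(workers, axis=0)) for r in results)
```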

[...] The group released its ring allreduce implementation both as a standalone C++ library and as a patch for TensorFlow.
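The article doesn't show the library's API, so the snippet below uses mpi4py's standard Allreduce call as a stand-in to illustrate the usage pattern in data-parallel training: after each backward pass, every worker's gradient buffer is summed across ranks and divided by the worker count. The function and buffer names are hypothetical.

```python
# Hypothetical data-parallel gradient averaging using a generic MPI allreduce
# as a stand-in for Baidu's library (whose exact entry points aren't shown here).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def average_gradients(grad):
    """Sum `grad` across all ranks in place, then scale so every worker holds
    the same averaged gradient before applying its optimizer update."""
    comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
    grad /= comm.Get_size()
    return grad

# Each rank contributes the gradient it computed for one layer.
local_grad = np.random.randn(1024).astype(np.float32)
averaged = average_gradients(local_grad)
```

Launched with something like `mpirun -np 40 python train.py` (filename illustrative), every rank ends the step with identical averaged gradients.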

It was recently announced that Japan's Tokyo Institute of Technology would build TSUBAME3.0 using Nvidia Pascal P100 GPUs with the goal of creating "Japan's fastest AI supercomputer":

Projections are that it will deliver 12.2 double-precision petaflops and 47.2 half-precision petaflops (peak specs). [...] Increasingly, we're seeing Nvidia refer to half-precision floating-point capability as "AI computation." Half-precision is suitable for many AI training workloads (but by no means all), and it's usually sufficient for inferencing tasks.

With this rubric in mind, Nvidia says TSUBAME3.0 is expected to deliver more than 47 petaflops of "AI horsepower"; when operated in tandem with TSUBAME2.5, the top speed increases to 64.3 petaflops, which would make it Japan's highest-performing AI supercomputer.
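The caveat that half-precision suits many but "by no means all" workloads comes down to float16's narrow range and coarse resolution. A quick NumPy illustration (not from the article):

```python
import numpy as np

# float16 carries roughly 3 decimal digits and tops out just above 65504,
# which is fine for many training/inference workloads but loses tiny gradient
# updates and overflows on large activations or loss values.
print(np.finfo(np.float16).max)            # 65504.0, the largest finite value
print(np.float16(1.0) + np.float16(1e-4))  # 1.0 -- the small update vanishes
print(np.float16(70000.0))                 # inf -- out of range
```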


Original Submission

 
  • (Score: 0) by Anonymous Coward on Thursday February 23 2017, @07:37PM (#470848)

    I guess it's because there is no copyright law in China...