SoylentNews is people

Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations

Accepted submission by takyon at 2018-08-15 13:00:24

NVIDIA Reveals Next-Gen Turing GPU Architecture: NVIDIA Doubles-Down on Ray Tracing, GDDR6, & More

The big change here is that NVIDIA is including even more ray tracing hardware with Turing in order to offer faster and more efficient hardware ray tracing acceleration. New to the Turing architecture is what NVIDIA is calling an RT core. We aren't fully informed on its underpinnings at this time, but it serves as a dedicated ray tracing processor. These processor blocks accelerate both ray-triangle intersection checks and bounding volume hierarchy (BVH) manipulation, the BVH being a very popular data structure for storing objects for ray tracing.
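To make "ray-triangle intersection check" concrete, here is a minimal software sketch using the well-known Möller-Trumbore algorithm. It is only an illustration of the kind of test involved; the source does not say how the RT core implements it, and the function and helper names are invented for this example.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle_intersect(
    origin: Vec3, direction: Vec3, v0: Vec3, v1: Vec3, v2: Vec3, eps: float = 1e-9
) -> Optional[float]:
    """Return the distance t along the ray to the hit point, or None on a miss.

    Moller-Trumbore test (illustrative; not NVIDIA's hardware method).
    """
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


# Example: a ray fired along +z at a triangle lying in the z = 0 plane.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *triangle)
miss = ray_triangle_intersect((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *triangle)
```

A renderer runs a test like this for every candidate triangle a ray might hit, and a BVH exists to keep that candidate list short, which is why both operations are attractive targets for dedicated hardware.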

NVIDIA is stating that the fastest Turing parts can cast 10 Billion (Giga) rays per second, which compared to the unaccelerated Pascal is a 25x improvement in ray tracing performance.
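As a rough sanity check on those numbers, the short calculation below derives the implied Pascal rate and a peak per-pixel ray budget. The resolutions and the 60 frames-per-second target are assumptions chosen for illustration, not figures from NVIDIA, and the quoted rate is a peak, so real workloads would see less.

```python
TURING_RAYS_PER_SEC = 10e9  # "10 Billion (Giga) rays per second" from the text
SPEEDUP_OVER_PASCAL = 25    # "25x improvement" from the text

# Implied unaccelerated Pascal rate: 10e9 / 25 = 4e8 rays per second.
pascal_rays_per_sec = TURING_RAYS_PER_SEC / SPEEDUP_OVER_PASCAL


def rays_per_pixel_per_frame(rays_per_sec: float, width: int, height: int, fps: int) -> float:
    """Peak ray budget per pixel per frame at a given resolution and frame rate."""
    return rays_per_sec / (width * height * fps)


# Assumed targets (not from the source): 1080p and 4K at 60 frames per second.
budget_1080p = rays_per_pixel_per_frame(TURING_RAYS_PER_SEC, 1920, 1080, 60)
budget_4k = rays_per_pixel_per_frame(TURING_RAYS_PER_SEC, 3840, 2160, 60)

print(f"Pascal (implied): {pascal_rays_per_sec:.1e} rays/s")
print(f"1080p60 budget:   {budget_1080p:.1f} rays per pixel per frame")
print(f"4K60 budget:      {budget_4k:.1f} rays per pixel per frame")
```

Under these assumptions the peak budget works out to roughly 80 rays per pixel per frame at 1080p and roughly 20 at 4K.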

The Turing architecture also carries over the tensor cores from Volta, and indeed these have even been enhanced over Volta. The tensor cores are an important aspect of multiple NVIDIA initiatives. Along with speeding up ray tracing itself, NVIDIA's other tool in their bag of tricks is to reduce the number of rays required in a scene by using AI denoising to clean up an image, which is something the tensor cores excel at. Of course that's not the only thing tensor cores are for (NVIDIA's entire AI/neural networking empire is all but built on them), so while not a primary focus for the SIGGRAPH crowd, this also confirms that NVIDIA's most powerful neural networking hardware will be coming to a wider range of GPUs.

New to Turing is support for a wider range of precisions, and as such the potential for significant speedups in workloads that don't require high precision. On top of Volta's FP16 precision mode, Turing's tensor cores also support INT8 and even INT4 precisions. These are 2x and 4x faster than FP16 respectively, and while NVIDIA's presentation doesn't dive too deep here, I would imagine they're doing something similar to the data packing they use for low-precision operations on the CUDA cores. And without going too deep ourselves here, while reducing the precision of a neural network has diminishing returns (by INT4 we're down to a total of just 16(!) values), there are certain models that really can get away with this very low level of precision. As a result the lower precision modes, while not always useful, will undoubtedly make some users quite happy with the throughput, especially in inferencing tasks.
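To illustrate what "just 16 values" and "data packing" mean in practice, the sketch below quantizes floats to signed 4-bit integers and packs two of them into each byte. It is a software illustration only: the symmetric scaling scheme and the low-nibble-first layout are assumptions made for this example, and nothing here describes how the tensor cores actually store or process INT4 data.

```python
from typing import List


def quantize_int4(values: List[float], scale: float) -> List[int]:
    """Map floats to signed 4-bit integers in [-8, 7] (16 distinct values)."""
    return [max(-8, min(7, round(x / scale))) for x in values]


def pack_int4(qs: List[int]) -> bytes:
    """Pack signed 4-bit values two per byte, low nibble first (assumed layout)."""
    if len(qs) % 2:
        qs = qs + [0]  # pad to an even count without mutating the caller's list
    packed = bytearray()
    for lo, hi in zip(qs[0::2], qs[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Reverse pack_int4, sign-extending each nibble back to [-8, 7]."""
    out: List[int] = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


weights = [0.0, 0.5, -1.0, 3.0]
q = quantize_int4(weights, scale=0.25)  # 3.0 saturates at the top of the range
packed = pack_int4(q)
restored = unpack_int4(packed, len(q))
```

The same arithmetic is what makes the quoted speedups plausible on paper: the 16 bits that hold one FP16 value can hold two INT8 values or four INT4 values, at the cost of the clamping and rounding error visible in the example.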

Also of note is the introduction of GDDR6 into some GPUs. The NVIDIA Quadro RTX 8000 will come with 48 GB of GDDR6 memory and a total memory bandwidth of 672 GB/s, which compares favorably to previous-generation GPUs featuring High Bandwidth Memory. Turing also supports the recently announced VirtualLink, and the video encoder block has been updated to include support for 8K H.265/HEVC encoding.
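The 672 GB/s figure can be reproduced from bus width and per-pin data rate. The 384-bit bus and 14 Gbps per-pin rate used below are not stated in the text above; they are assumptions taken from the card's published specifications, so treat this as a plausibility check rather than a derivation.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: one data pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumed values (not from the source text): 384-bit bus, 14 Gbps GDDR6.
bandwidth = memory_bandwidth_gb_per_s(bus_width_bits=384, gbps_per_pin=14.0)
print(f"{bandwidth:.0f} GB/s")
```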

Ray tracing combined with various shortcuts (4m27s video, 4m16s video) could be used for good-looking results in real time.

Also at Engadget, Notebookcheck, and The Verge.

See also: What is Ray Tracing and Why Do You Want it in Your GPU?