NVIDIA Reveals Next-Gen Turing GPU Architecture: NVIDIA Doubles-Down on Ray Tracing, GDDR6, & More
The big change here is that NVIDIA is going to be including even more ray tracing hardware with Turing in order to offer faster and more efficient hardware ray tracing acceleration. New to the Turing architecture is what NVIDIA is calling an RT core, the underpinnings of which we aren't fully informed on at this time, but which serves as a dedicated ray tracing processor. These processor blocks accelerate both ray-triangle intersection checks and bounding volume hierarchy (BVH) manipulation, the latter being a very popular data structure for storing objects for ray tracing.
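For readers who have not seen one, a ray-triangle intersection check can be sketched in software as below. This uses the well-known Möller–Trumbore algorithm purely as an illustration; the article does not say what method the RT cores implement, and the function and helper names here are made up for the example.

```python
# Illustrative software sketch of a ray-triangle intersection test
# (Moller-Trumbore). This is an assumption-laden example, NOT a
# description of how NVIDIA's RT cores work internally.

EPSILON = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_intersect(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > EPSILON else None  # ignore hits behind the origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
print(hit, miss)
```

A renderer runs a test like this for every candidate triangle a ray might touch, which is why doing it in fixed-function hardware is attractive.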
NVIDIA is stating that the fastest Turing parts can cast 10 Billion (Giga) rays per second, which compared to the unaccelerated Pascal is a 25x improvement in ray tracing performance.
The Turing architecture also carries over the tensor cores from Volta, and indeed these have even been enhanced over Volta. The tensor cores are an important aspect of multiple NVIDIA initiatives. Along with speeding up ray tracing itself, NVIDIA's other tool in their bag of tricks is to reduce the number of rays required in a scene by using AI denoising to clean up an image, which is something the tensor cores excel at. Of course that's not the only feature tensor cores are for – NVIDIA's entire AI/neural networking empire is all but built on them – so while not a primary focus for the SIGGRAPH crowd, this also confirms that NVIDIA's most powerful neural networking hardware will be coming to a wider range of GPUs.
New to Turing is support for a wider range of precisions, and as such the potential for significant speedups in workloads that don't require high precisions. On top of Volta's FP16 precision mode, Turing's tensor cores also support INT8 and even INT4 precisions. These are 2x and 4x faster than FP16 respectively, and while NVIDIA's presentation doesn't dive too deep here, I would imagine they're doing something similar to the data packing they use for low-precision operations on the CUDA cores. And without going too deep ourselves here, while reducing the precision of a neural network has diminishing returns – by INT4 we're down to a total of just 16(!) values – there are certain models that really can get away with this very low level of precision. And as a result the lower precision modes, while not always useful, will undoubtedly make some users quite happy at the throughput, especially in inferencing tasks.
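To make the "just 16 values" point concrete, here is a rough sketch of symmetric linear quantization to INT8 and INT4. The scheme, the sample weights, and the function names are assumptions chosen for illustration; the article does not describe how any particular framework or Turing's tensor cores quantize data.

```python
# Sketch of symmetric linear quantization (an assumed, simplified scheme).
# The sample weights are arbitrary illustration data, not from any model.


def quantize(values, bits):
    """Map floats to signed integer codes of the given bit width."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def worst_error(values, bits):
    """Largest absolute round-trip error for the given bit width."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [-0.9, -0.31, -0.02, 0.0, 0.07, 0.4, 0.7]
for bits in (8, 4):
    codes, _ = quantize(weights, bits)
    print(f"INT{bits}: {2 ** bits} representable codes, codes={codes}, "
          f"worst error={worst_error(weights, bits):.4f}")
```

With only 4 bits, every weight has to snap to one of 16 codes, so the round-trip error is much larger than with INT8. Whether a given model tolerates that is an empirical question.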
Also of note is the introduction of GDDR6 into some GPUs. The NVIDIA Quadro RTX 8000 will come with 24 GB of GDDR6 memory and a total memory bandwidth of 672 GB/s, which compares favorably to previous-generation GPUs featuring High Bandwidth Memory. Turing supports the recently announced VirtualLink. The video encoder block has been updated to include support for 8K H.265/HEVC encoding.
Ray tracing combined with various shortcuts (4m27s video, 4m16s video) could be used for good-looking results in real time.
Also at Engadget, Notebookcheck, and The Verge.
See also: What is Ray Tracing and Why Do You Want it in Your GPU?
Related Stories
VR rivals come together to develop a single-cable spec for VR headsets
Future generations of virtual reality headsets for PCs could use a single USB Type-C cable for both power and data. That's thanks to a new standardized spec from the VirtualLink Consortium, a group made up of GPU vendors AMD and Nvidia and virtual reality rivals Valve, Microsoft, and Facebook-owned Oculus.
The spec uses the USB Type-C connector's "Alternate Mode" capability to implement different data protocols (such as Thunderbolt 3 data or DisplayPort and HDMI video) over the increasingly common cables, combined with Type-C's support for power delivery. The new headset spec combines four lanes of HBR3 ("high bitrate 3") DisplayPort video (for a total of 32.4 gigabits per second of video data) with a USB 3.1 generation 2 (10 gigabit per second) data channel for sensors and on-headset cameras, plus 27W of electrical power.
That much video data is sufficient for two 3840×2160 streams at 60 frames per second, or even higher frame rates if Display Stream Compression is also used. Drop the resolution to 2560×1440, and two uncompressed 120 frame per second streams would be possible.
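The video figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes 24 bits per pixel and ignores blanking intervals and other protocol overhead. It also assumes the quoted 32.4 Gbps is the raw line rate, of which 8b/10b encoding leaves 80% for payload; none of those assumptions come from the article itself.

```python
# Rough bandwidth check for the VirtualLink video numbers.
# Assumptions (not from the article): 24 bits per pixel, no blanking
# overhead, and 8b/10b line encoding on the HBR3 lanes.

HBR3_LANE_GBPS = 8.1           # raw rate per lane
LANES = 4
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 line bits

raw_gbps = HBR3_LANE_GBPS * LANES
payload_gbps = raw_gbps * ENCODING_EFFICIENCY


def stream_gbps(width, height, fps, bits_per_pixel=24, streams=2):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * fps * bits_per_pixel * streams / 1e9


dual_4k60 = stream_gbps(3840, 2160, 60)
dual_1440p120 = stream_gbps(2560, 1440, 120)

print(f"raw link: {raw_gbps:.1f} Gbps, payload: {payload_gbps:.2f} Gbps")
print(f"2 x 3840x2160 @ 60 fps:  {dual_4k60:.2f} Gbps")
print(f"2 x 2560x1440 @ 120 fps: {dual_1440p120:.2f} Gbps")
```

Under these assumptions both configurations fit within the payload budget, with the dual 4K case leaving little headroom, which is consistent with the article's note that higher frame rates would need Display Stream Compression.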
Framerate is too low, and it's not wireless. Lame.
VirtualLink website. Also at The Verge.
NVIDIA Announces the GeForce RTX 20 Series: RTX 2080 Ti & 2080 on Sept. 20th, RTX 2070 in October
NVIDIA's Gamescom 2018 keynote just wrapped up, and as many have been expecting since it was announced last month, NVIDIA is getting ready to launch their next generation of GeForce hardware. Announced at the event and going on sale starting September 20th is NVIDIA's GeForce RTX 20 series, which is succeeding the current Pascal-powered GeForce GTX 10 series. Based on NVIDIA's new Turing GPU architecture and built on TSMC's 12nm "FFN" process, NVIDIA has lofty goals, looking to drive an entire paradigm shift in how games are rendered and how PC video cards are evaluated. CEO Jensen Huang has called Turing NVIDIA's most important GPU architecture since 2006's Tesla GPU architecture (G80 GPU), and from a features standpoint it's clear that he's not overstating matters.
[...] So what does Turing bring to the table? The marquee feature across the board is hybrid rendering, which combines ray tracing with traditional rasterization to exploit the strengths of both technologies. This announcement is essentially a continuation of NVIDIA's RTX announcement from earlier this year, so if you thought that announcement was a little sparse, well then here is the rest of the story.
The big change here is that NVIDIA is going to be including even more ray tracing hardware with Turing in order to offer faster and more efficient hardware ray tracing acceleration. New to the Turing architecture is what NVIDIA is calling an RT core, the underpinnings of which we aren't fully informed on at this time, but which serves as a dedicated ray tracing processor. These processor blocks accelerate both ray-triangle intersection checks and bounding volume hierarchy (BVH) manipulation, the latter being a very popular data structure for storing objects for ray tracing.
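The reason a BVH helps is that a ray can be tested against a node's bounding box first, and everything inside the box can be skipped on a miss. A common way to do that box test in software is the "slab" method, sketched below. This is a generic textbook technique offered as an illustration; the article does not say how the RT cores traverse a BVH, and the function name is made up.

```python
# Sketch of the "slab" ray vs. axis-aligned bounding box test often used
# when walking a BVH in software. Illustrative only; it makes no claim
# about NVIDIA's hardware implementation.


def ray_hits_box(origin, direction, box_min, box_max):
    """Return True if the ray (origin + t * direction, t >= 0) touches the box."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        lo, hi = box_min[axis], box_max[axis]
        if d == 0.0:
            # Ray runs parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)   # latest entry across all axes so far
        t_far = min(t_far, t1)     # earliest exit across all axes so far
        if t_near > t_far:
            return False           # the entry/exit intervals do not overlap
    return True


unit_box = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(ray_hits_box((0.5, 0.5, 5.0), (0.0, 0.0, -1.0), *unit_box))  # aimed at the box
print(ray_hits_box((2.0, 0.5, 5.0), (0.0, 0.0, -1.0), *unit_box))  # passes beside it
```

Only when a box test succeeds does the traversal descend to child boxes or, at the leaves, to the more expensive per-triangle checks.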
NVIDIA is stating that the fastest GeForce RTX part can cast 10 Billion (Giga) rays per second, which compared to the unaccelerated Pascal is a 25x improvement in ray tracing performance.
Nvidia has confirmed that the machine learning capabilities (tensor cores) of the GPU will be used to smooth out problems with ray-tracing. Real-time AI denoising (4m17s) will be used to reduce the number of samples per pixel needed to achieve photorealism.
Previously: Microsoft Announces Directx 12 Raytracing API
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Related: Real-time Ray-tracing at GDC 2014
Nvidia's Turing pricing strategy has been 'poorly received,' says Instinet
Instinet analyst Romit Shah commented Friday on Nvidia Corp.'s new Turing GPU, now that reviews of the product are out. "The 2080 TI is indisputably the best consumer GPU technology available, but at a prohibitive cost for many gamers," he wrote. "Ray tracing and DLSS [deep learning super sampling], while apparently compelling features, are today just 'call options' for when game developers create content that this technology can support."
Nvidia shares fall after Morgan Stanley says the performance of its new gaming card is disappointing
"As review embargos broke for the new gaming products, performance improvements in older games is not the leap we had initially hoped for," Morgan Stanley analyst Joseph Moore said in a note to clients on Thursday. "Performance boost on older games that do not incorporate advanced features is somewhat below our initial expectations, and review recommendations are mixed given higher price points." Nvidia shares closed down 2.1 percent Thursday.
Moore noted that Nvidia's new RTX 2080 card performed only 3 percent better than the previous generation's 1080 Ti card at 4K resolutions.
Nvidia has announced its $2,500 Turing-based Titan RTX GPU. It is said to have a single precision performance of 16.3 teraflops and "tensor performance" of 130 teraflops. Double precision performance has been neutered down to 0.51 teraflops, down from 6.9 teraflops for last year's Volta-based Titan V.
The card includes 24 gigabytes of GDDR6 VRAM clocked at 14 Gbps, for a total memory bandwidth of 672 GB/s.
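The 672 GB/s figure follows from the per-pin data rate once a bus width is known. The sketch below assumes a 384-bit memory bus for the Titan RTX and a 352-bit bus at the same 14 Gbps for the RTX 2080 Ti; neither bus width is stated in the text above, so treat them as outside assumptions.

```python
# Memory bandwidth = per-pin data rate x bus width / 8 bits per byte.
# The bus widths below are assumptions, not figures from the article.


def bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits):
    """Peak memory bandwidth in gigabytes per second."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


titan_rtx = bandwidth_gb_s(14, 384)     # assumed 384-bit bus
rtx_2080_ti = bandwidth_gb_s(14, 352)   # assumed 352-bit bus

print(f"Titan RTX:   {titan_rtx:.0f} GB/s")
print(f"RTX 2080 Ti: {rtx_2080_ti:.0f} GB/s")
print(f"Difference:  {100 * (titan_rtx / rtx_2080_ti - 1):.1f}%")
```

Under those assumptions the Titan RTX comes out roughly 9% ahead of the RTX 2080 Ti in memory bandwidth.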
Drilling a bit deeper, there are really three legs to Titan RTX that set it apart from NVIDIA's other cards, particularly the GeForce RTX 2080 Ti. Raw performance is certainly one of those; we're looking at about 15% better performance in shading, texturing, and compute, and around a 9% bump in memory bandwidth and pixel throughput.
However arguably the lynchpin to NVIDIA's true desired market of data scientists and other compute users is the tensor cores. Present on all NVIDIA's Turing cards and the heart and soul of NVIDIA's success in the AI/neural networking field, NVIDIA gave the GeForce cards a singular limitation that is nonetheless very important to the professional market. In their highest-precision FP16 mode, Turing is capable of accumulating at FP32 for greater precision; however on the GeForce cards this operation is limited to half-speed throughput. This limitation has been removed for the Titan RTX, and as a result it's capable of full-speed FP32 accumulation throughput on its tensor cores.
Given that NVIDIA's tensor cores have nearly a dozen modes, this may seem like an odd distinction to make between the GeForce and the Titan. However for data scientists it's quite important; FP32 accumulate is frequently necessary for neural network training – FP16 accumulate doesn't have enough precision – especially in the big money fields that will shell out for cards like the Titan and the Tesla. So this small change is a big part of the value proposition to data scientists, as NVIDIA does not offer a cheaper card with the chart-topping 130 TFLOPS of tensor performance that Titan RTX can hit.
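A toy example shows why the accumulator's precision matters. The sketch below emulates a running sum whose result is rounded to FP16 or FP32 after every addition, using Python's `struct` module to do the rounding. It is a simplified stand-in for the idea of FP16 versus FP32 accumulation, not a model of what a tensor core actually does.

```python
import struct

# Emulate a running sum kept in a low-precision accumulator by rounding
# after every addition. A simplified illustration only.


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, round_fn):
    """Sum the values, rounding the running total with round_fn each step."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


ones = [1.0] * 4096
fp16_sum = accumulate(ones, to_fp16)
fp32_sum = accumulate(ones, to_fp32)

print(fp16_sum)  # 2048.0: above 2048, FP16 can no longer represent "+1"
print(fp32_sum)  # 4096.0
```

Once the FP16 total reaches 2048, adjacent representable values are 2 apart, so adding 1.0 rounds straight back to 2048 and the sum stops growing. Training sums up many small gradient contributions in much the same way, which is one reason a wider accumulator is wanted.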
Previously: More Extreme in Every Way: The New Titan Is Here – NVIDIA TITAN Xp
Nvidia Announces Titan V
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
Nvidia's Turing GPU Pricing and Performance "Poorly Received"
Q2VKPT [is] an interesting graphics research project whose goal is to create the first entirely raytraced game with fully dynamic real-time lighting, based on the Quake II engine Q2PRO. Rasterization is used only for the 2D user interface (UI).
Q2VKPT is powered by the Vulkan API and now, with the release of the GeForce RTX graphics cards capable of accelerating ray tracing via hardware, it can get close to 60 frames per second at 1440p (2560×1440) resolution with the RTX 2080 Ti GPU according to project creator Christoph Schied.
The project consists of about 12K lines of code which completely replace the graphics code of Quake II. It's open source and can be freely downloaded via GitHub.
This is how path tracing + denoising (4m16s video) works.
Also at Phoronix.
Related: Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
AMD, Nvidia Have Launched the Least-Appealing GPU Upgrades in History
Yesterday, AMD launched the Radeon VII, the first 7nm GPU. The card is intended to compete with Nvidia's RTX family of Turing-class GPUs, and it does, broadly matching the RTX 2080. It also matches the RTX 2080 on price, at $700. Because this card began life as a professional GPU intended for scientific computing and AI/ML workloads, it's unlikely that we'll see lower-end variants. That section of AMD's product stack will be filled by 7nm Navi, which arrives later this year.
Navi will be AMD's first new 7nm GPU architecture and will offer a chance to hit 'reset' on what has been, to date, the least compelling suite of GPU launches AMD and Nvidia have ever collectively kicked out the door. Nvidia has relentlessly moved its stack pricing higher while holding performance per dollar mostly constant. With the RTX 2060 and GTX 1070 Ti fairly evenly matched across a wide suite of games, the question of whether the RTX 2060 is better priced largely hinges on whether you stick to formal launch pricing for both cards or check historical data for actual price shifts.
Such comparisons are increasingly incidental, given that Pascal GPU prices are rising and cards are getting harder to find, but they aren't meaningless for people who either bought a Pascal GPU already or are willing to consider a used card. If you're an Nvidia fan already sitting on top of a high-end Pascal card, Turing doesn't offer you a great deal of performance improvement.
AMD has not covered itself in glory, either. The Radeon VII is, at least, unreservedly faster than the Vega 64. There's no equivalent last-generation GPU in AMD's stack to match it. But it also duplicates the Vega 64's overall power and noise profile, limiting the overall appeal, and it matches the RTX 2080's bad price. A 1.75x increase in price for a 1.32x increase in 4K performance isn't a great ratio even by the standards of ultra-high-end GPUs, where performance typically comes with a price penalty.
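The value argument in that last sentence is simple division, shown here as a quick check. Only the two ratios quoted above are used; no prices or benchmark numbers are assumed.

```python
# Performance per dollar relative to the older card, from the quoted ratios.
price_ratio = 1.75   # "1.75x increase in price"
perf_ratio = 1.32    # "1.32x increase in 4K performance"

perf_per_dollar = perf_ratio / price_ratio
print(f"Relative performance per dollar: {perf_per_dollar:.2f}")
print(f"Change: {100 * (perf_per_dollar - 1):.0f}%")
```

In other words, each dollar buys roughly a quarter less 4K performance than it did with the older card.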
Rumors and leaks have suggested that Nvidia will release a Turing-based GPU called the GTX 1660 Ti (which has also been referred to as "1160"), with a lower price but missing the dedicated ray-tracing cores of the RTX 2000-series. AMD is expected to release "7nm" Navi GPUs sometime during 2019.
Radeon VII launch coverage also at AnandTech, Tom's Hardware.
Related: AMD Returns to the Datacenter, Set to Launch "7nm" Radeon Instinct GPUs for Machine Learning in 2018
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
AMD Announces "7nm" Vega GPUs for the Enterprise Market
Nvidia Announces RTX 2060 GPU
AMD Announces Radeon VII GPU, Teases Third-Generation Ryzen CPU
AMD Responds to Radeon VII Short Supply Rumors
Crytek has showcased a new real-time raytracing demo which is said to run on most mainstream, contemporary GPUs from NVIDIA and AMD. The minds behind one of the most visually impressive FPS franchises, Crysis, have released their new "Noir" demo, which was run on an AMD Radeon RX Vega graphics card, showing that raytracing is possible even without an NVIDIA RTX graphics card.
[...] Crytek states that the experimental ray tracing feature based on CRYENGINE's Total Illumination used to create the demo is both API and hardware agnostic, enabling ray tracing to run on most mainstream, contemporary AMD and NVIDIA GPUs. However, the future integration of this new CRYENGINE technology will be optimized to benefit from performance enhancements delivered by the latest generation of graphics cards and supported APIs like Vulkan and DX12.
Related: Real-time Ray-tracing at GDC 2014
Microsoft Announces Directx 12 Raytracing API
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
Q2VKPT: An Open Source Game Demo with Real-Time Path Tracing
AMD and Nvidia's Latest GPUs Are Expensive and Unappealing
Nvidia Ditches the Ray-Tracing Cores with Lower-Priced GTX 1660 Ti
NVIDIA Releases DirectX Raytracing Driver for GTX Cards; Posts Trio of DXR Demos
Last month at GDC 2019, NVIDIA revealed that they would finally be enabling public support for DirectX Raytracing on non-RTX cards. Long baked into the DXR specification itself – which is designed [to] encourage ray tracing hardware development while also allowing it to be implemented via traditional compute shaders – the addition of DXR support in cards without hardware support for it is a small but important step in the deployment of the API and its underlying technology. At the time of their announcement, NVIDIA announced that this driver would be released in April, and now this morning, NVIDIA is releasing the new driver.
As we covered in last month's initial announcement of the driver, this has been something of a long time coming for NVIDIA. The initial development of DXR and the first DXR demos (including the Star Wars Reflections demo) were all handled on cards without hardware RT acceleration; in particular NVIDIA Volta-based video cards. Microsoft used their own fallback layer for a time, but for the public release it was going to be up to GPU manufacturers to provide support, including their own fallback layer. So we have been expecting the release of this driver in some form for quite some time.
Of course, the elephant in the room in enabling DXR on cards without RT hardware is what it will do for performance – or perhaps the lack thereof.
Also at Wccftech.
See also: NVIDIA shows how much ray-tracing sucks on older GPUs
[For] stuff that really adds realism, like advanced shadows, global illumination and ambient occlusion, the RTX 2080 Ti outperforms the 1080 Ti by up to a factor of six.
To cite some specific examples, Port Royal will run on the RTX 2080 Ti at 53.3 fps at 2,560 x 1,440 with advanced reflections and shadows, along with DLSS anti-aliasing, turned on. The GTX 1080, on the other hand, will run at just 9.2 fps with those features enabled and won't give you any DLSS at all. That effectively makes the feature useless on those cards for that game. With basic reflections on Battlefield V, on the other hand, you'll see 30 fps on the 1080 Ti compared to 68.3 on the 2080 Ti.
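Those frame rates translate into the following speedups and per-frame times. The sketch uses only the numbers quoted above; the helper names are made up for the example.

```python
# Speedups and frame times implied by the quoted frame rates.


def speedup(new_fps, old_fps):
    """How many times faster the newer card renders."""
    return new_fps / old_fps


def frame_time_ms(fps):
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000.0 / fps


port_royal = speedup(53.3, 9.2)     # RTX 2080 Ti vs. GTX 1080, as quoted
battlefield_v = speedup(68.3, 30.0)  # RTX 2080 Ti vs. GTX 1080 Ti, as quoted

print(f"Port Royal:    {port_royal:.1f}x "
      f"({frame_time_ms(9.2):.0f} ms -> {frame_time_ms(53.3):.0f} ms per frame)")
print(f"Battlefield V: {battlefield_v:.1f}x "
      f"({frame_time_ms(30.0):.0f} ms -> {frame_time_ms(68.3):.0f} ms per frame)")
```

The Port Royal gap is close to the "factor of six" quoted earlier, while the gap with only basic reflections enabled in Battlefield V is a little over 2x.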
Previously:
Microsoft Announces Directx 12 Raytracing API
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
Q2VKPT: An Open Source Game Demo with Real-Time Path Tracing
AMD and Nvidia's Latest GPUs Are Expensive and Unappealing
Nvidia Ditches the Ray-Tracing Cores with Lower-Priced GTX 1660 Ti
Crytek Demos Real-Time Raytracing for AMD and Non-RTX Nvidia GPUs
(Score: 0) by Anonymous Coward on Thursday August 16 2018, @03:33AM (6 children)
In the old days, you'd buy a machine that had features X, Y, and Z, along with a manual for how to make it do X, or Y, or Z. Then, you'd program it to do interesting things with those features.
Yet, today, I can't figure out how the FUCK I can tell a modern machine what to do. I mean, you can't even set a Macintosh's cpu frequency, or control its fans should you have the need.
GPUs are a whole 'nother beast; the only way to use them is to work within the awful confines of some poorly documented, proprietary API or language. How does anybody make these bloody things work for them?
(Score: 1, Informative) by Anonymous Coward on Thursday August 16 2018, @04:23AM (2 children)
Obviously you've never worked with GPU programming, because it isn't as mysterious or proprietary as you think. Programming for GPUs of almost any flavor is easily accomplished through DirectX, OpenGL, and Vulkan, which all have support for compute shaders. The CUDA framework is also available if you are Nvidia-specific. Difficulty-wise, it's not that difficult if you are used to programming already. I can see how this might be daunting to a novice programmer, however.
(Score: 0) by Anonymous Coward on Thursday August 16 2018, @04:33AM (1 child)
Get it yet?
(Score: 2) by Aiwendil on Thursday August 16 2018, @07:03AM
You mean like the x86 instruction set?
(Score: 2) by shortscreen on Thursday August 16 2018, @07:16AM (1 child)
Most x86 CPUs of recent decades have a set of MSRs to change the speed. You just need to find the docs that apply to your CPU.
Controlling fans is more complicated (I believe it involves ACPI) but possible.
On GPUs you have a point. You need the binary blob. And as for the APIs... I've only monkeyed around with OpenGL 1.x, because that's the only one I could make any sense of (well, glide looked easy but it's a bit obscure).
(Score: 0) by Anonymous Coward on Thursday August 16 2018, @08:51PM
Chances are that the OP can't just write a program to control those things without also telling Mac OS X to back off; good luck figuring out how to do that in a timely manner, or without insider knowledge.
He's right. People don't own their machines any more.
(Score: 0) by Anonymous Coward on Thursday August 16 2018, @10:03AM
Buy a game and play it?
(Score: 0) by Anonymous Coward on Thursday August 16 2018, @03:13PM
fuck you, nvidia!