from the Deep-learning?-Pfft!-I'm-waiting-for-Deep-Thought! dept.
Nvidia has announced the Titan V, a $3,000 Volta-based flagship GPU capable of around 15 teraflops of single-precision performance and 110 teraflops of "tensor performance (deep learning)". It has slightly greater performance but less VRAM than the Tesla V100, a $10,000 GPU aimed at professional users.
Would you consider it a card for "consumers"?
It seems like Nvidia announces the fastest GPU in history multiple times a year, and that's exactly what's happened again today; the Titan V is "the most powerful PC GPU ever created," in Nvidia's words. It represents a more significant leap than most products that have made that claim, however, as it's the first consumer-grade GPU based around Nvidia's new Volta architecture.
That said, a liberal definition of the word "consumer" is in order here — the Titan V sells for $2,999 and is focused on AI and scientific simulation processing. Nvidia claims 110 teraflops of tensor performance from its 21.1 billion transistors, with 12GB of HBM2 memory, 5120 CUDA cores, and 640 "tensor cores" that are said to offer up to 9x the deep-learning performance of its predecessor.
NVIDIA is releasing the GeForce GTX 1080 Ti, a $699 GPU with performance and specifications similar to that of the NVIDIA Titan X:
Unveiled last week at GDC and launching [March 10th] is the GeForce GTX 1080 Ti. Based on NVIDIA's GP102 GPU – aka Bigger Pascal – the job of GTX 1080 Ti is to serve as a mid-cycle refresh of the GeForce 10 series. Like the GTX 980 Ti and GTX 780 Ti before it, that means taking advantage of improved manufacturing yields and reduced costs to push out a bigger, more powerful GPU to drive this year's flagship video card. And, for NVIDIA and their well-executed dominance of the high-end video card market, it's a chance to run up the score even more. With the GTX 1080 Ti, NVIDIA is aiming for what they're calling their greatest performance jump yet for a modern Ti product – around 35% on average. This would translate into a sizable upgrade for GeForce GTX 980 Ti owners and others for whom GTX 1080 wasn't the card they were looking for.
[...] Going by the numbers then, the GTX 1080 Ti offers just over 11.3 TFLOPS of FP32 performance. This puts the expected shader/texture performance of the card 28% ahead of the current GTX 1080, while the ROP throughput advantage stands at 26%, and the memory bandwidth advantage at a much greater 51.2%. Real-world performance will of course be influenced by a blend of these factors, with memory bandwidth being the real wildcard. Otherwise, relative to the NVIDIA Titan X, the two cards should end up quite close, trading blows now and then.
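Those percentages can be sanity-checked from public spec-sheet numbers. A minimal sketch follows; the GTX 1080 figures (2560 CUDA cores at a 1733 MHz boost clock, a 256-bit bus at 10 Gbps) and the GTX 1080 Ti's 3584 cores, 1582 MHz boost, and 352-bit bus are assumptions taken from NVIDIA's published specifications, not from the article itself:

```python
def tflops_fp32(cuda_cores, boost_mhz):
    """Peak FP32 = cores x 2 ops per clock (one FMA) x boost clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12

def bandwidth_gbs(gbps_per_pin, bus_width_bits):
    """Peak bandwidth = per-pin data rate x bus width / 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8

gtx_1080_ti_tf = tflops_fp32(3584, 1582)  # ~11.34 TFLOPS
gtx_1080_tf    = tflops_fp32(2560, 1733)  # ~8.87 TFLOPS
gtx_1080_ti_bw = bandwidth_gbs(11, 352)   # 484 GB/s
gtx_1080_bw    = bandwidth_gbs(10, 256)   # 320 GB/s

print(f"FP32 advantage:      {gtx_1080_ti_tf / gtx_1080_tf - 1:.1%}")  # ~27.8%
print(f"Bandwidth advantage: {gtx_1080_ti_bw / gtx_1080_bw - 1:.1%}")  # ~51.2%
```

The math lands almost exactly on the quoted "28% ahead" and "51.2%" figures.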
Speaking of the Titan, on an interesting side note, NVIDIA isn't going to be doing anything to hurt the compute performance of the GTX 1080 Ti to differentiate the card from the Titan, which has proven popular with GPU compute customers. Crucially, this means that the GTX 1080 Ti gets the same 4:1 INT8 performance ratio of the Titan, which is critical to the cards' high neural networking inference performance. As a result the GTX 1080 Ti actually has slightly greater compute performance (on paper) than the Titan. And NVIDIA has been surprisingly candid in admitting that unless compute customers need the last 1GB of VRAM offered by the Titan, they're likely going to buy the GTX 1080 Ti instead.
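The 4:1 INT8 ratio mentioned above translates directly into a paper peak for inference throughput: four 8-bit operations per FP32 operation. A quick back-of-the-envelope calculation, using the article's 11.3 TFLOPS figure:

```python
# The 4:1 INT8 ratio means four 8-bit ops per FP32 op per clock,
# so peak INT8 inference throughput is simply 4x the FP32 figure.
fp32_tflops = 11.3              # GTX 1080 Ti FP32 peak, from the article
int8_tops = 4 * fp32_tflops     # ~45.2 INT8 TOPS (theoretical peak)
print(int8_tops)
```

As with all such figures, this is a theoretical maximum, not sustained real-world throughput.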
The card includes 11 GB of Micron's second-generation GDDR5X memory operating at 11 Gbps compared to 12 GB of GDDR5X at 10 Gbps for the Titan X.
NVIDIA issued a press release for its new card, Titan Xp:
Introduced today [April 6], the Pascal-powered TITAN Xp pushes more cores, faster clocks, faster memory and more TFLOPS than its predecessor, the 2016 Pascal-powered TITAN X.
With the new TITAN Xp we're delivering a card to users who demand the very best NVIDIA GPU, directly from NVIDIA and supported by NVIDIA.
- 12GB of GDDR5X memory running at 11.4 Gbps
- 3,840 CUDA cores running at 1.6GHz
- 12 TFLOPs of brute force
This is extreme performance for extreme users where every drop counts.
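The "12 TFLOPs" headline figure in the spec list above follows directly from the other two numbers, since each CUDA core retires one fused multiply-add (two floating-point operations) per clock:

```python
# Peak FP32 = CUDA cores x 2 ops per clock (one FMA) x boost clock
cuda_cores = 3840
boost_ghz = 1.6

peak_tflops = cuda_cores * 2 * boost_ghz * 1e9 / 1e12
print(peak_tflops)  # 12.288 -- rounds to the advertised 12 TFLOPS
```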
Open to Mac Community
Speaking of users, we're also making the new TITAN Xp open to the Mac community with new Pascal drivers, coming this month. For the first time, this gives Mac users access to the immense horsepower delivered by our award-winning Pascal-powered GPUs.
TITAN Xp is available now for $1,200 direct from nvidia.com, and select system builders soon.
Don't shoot the messenger.
[More details can be found on the TITAN Xp product page where you can also place an order (Limit 2 per customer). --Ed.]
Nvidia has announced its $2,500 Turing-based Titan RTX GPU. It is said to have a single precision performance of 16.3 teraflops and "tensor performance" of 130 teraflops. Double precision performance has been neutered down to 0.51 teraflops, down from 6.9 teraflops for last year's Volta-based Titan V.
The card includes 24 gigabytes of GDDR6 VRAM clocked at 14 Gbps, for a total memory bandwidth of 672 GB/s.
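The article quotes the per-pin data rate (14 Gbps) and the total bandwidth (672 GB/s) but not the memory bus width; it falls out of the arithmetic, as this small check shows:

```python
# Bandwidth = per-pin rate x bus width / 8 bits per byte,
# so: bus width = bandwidth x 8 / per-pin rate.
gbps_per_pin = 14
total_gb_s = 672

bus_width_bits = total_gb_s * 8 / gbps_per_pin
print(bus_width_bits)  # 384.0 -- implying a 384-bit GDDR6 bus
```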
Drilling a bit deeper, there are really three legs to Titan RTX that set it apart from NVIDIA's other cards, particularly the GeForce RTX 2080 Ti. Raw performance is certainly one of those; we're looking at about 15% better performance in shading, texturing, and compute, and around a 9% bump in memory bandwidth and pixel throughput.
However, arguably the linchpin for NVIDIA's true desired market of data scientists and other compute users is the tensor cores. Present on all of NVIDIA's Turing cards and the heart and soul of NVIDIA's success in the AI/neural networking field, the tensor cores on the GeForce cards carry a singular limitation that is nonetheless very important to the professional market. In its highest-precision FP16 mode, Turing is capable of accumulating at FP32 for greater precision; however, on the GeForce cards this operation is limited to half-speed throughput. This limitation has been removed for the Titan RTX, and as a result it's capable of full-speed FP32 accumulation throughput on its tensor cores.
Given that NVIDIA's tensor cores have nearly a dozen modes, this may seem like an odd distinction to make between the GeForce and the Titan. However for data scientists it's quite important; FP32 accumulate is frequently necessary for neural network training – FP16 accumulate doesn't have enough precision – especially in the big money fields that will shell out for cards like the Titan and the Tesla. So this small change is a big part of the value proposition to data scientists, as NVIDIA does not offer a cheaper card with the chart-topping 130 TFLOPS of tensor performance that Titan RTX can hit.
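The precision problem described above is easy to demonstrate on the CPU. This toy NumPy sketch mimics the rounding behavior (it does not model the tensor-core datapath itself): once an FP16 accumulator grows large enough, its representable values are spaced further apart than the increments being added, and the running sum simply stops growing. Accumulating in FP32 avoids this.

```python
import numpy as np

# Sum 4096 ones. Above 2048, float16's spacing between representable
# values (ULP) is 2.0, so adding 1.0 rounds back to the same value
# and the FP16 accumulator stalls.
fp16_sum = np.float16(0)
for _ in range(4096):
    fp16_sum = np.float16(fp16_sum + np.float16(1.0))

# An FP32 accumulator represents every intermediate sum exactly here.
fp32_sum = np.float32(0)
for _ in range(4096):
    fp32_sum += np.float32(1.0)

print(fp16_sum)  # 2048.0 -- stalled
print(fp32_sum)  # 4096.0 -- exact
```

Gradient accumulation during neural network training hits exactly this failure mode, which is why FP32 accumulate matters so much to the customers NVIDIA is targeting.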
Previously: More Extreme in Every Way: The New Titan Is Here – NVIDIA TITAN Xp
Nvidia Announces Titan V
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
Nvidia's Turing GPU Pricing and Performance "Poorly Received"