SoylentNews is people

posted by martyb on Monday December 03 2018, @07:41PM
from the moah-powah dept.

Nvidia has announced its $2,500 Turing-based Titan RTX GPU. It is said to have a single precision performance of 16.3 teraflops and "tensor performance" of 130 teraflops. Double precision performance has been cut to 0.51 teraflops, down from 6.9 teraflops for last year's Volta-based Titan V.

The card includes 24 gigabytes of GDDR6 VRAM clocked at 14 Gbps, for a total memory bandwidth of 672 GB/s.
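
The bandwidth figure follows from the per-pin data rate and the width of the memory bus. A quick check in Python, assuming a 384-bit bus (the summary above gives only the data rate, so the bus width is an assumption here):

```python
# Peak memory bandwidth = per-pin data rate x bus width / 8 bits per byte.
# ASSUMPTION: a 384-bit memory bus; the summary states only the 14 Gbps rate.

def memory_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Return peak bandwidth in GB/s for a per-pin rate (Gbps) and bus width (bits)."""
    return data_rate_gbps * bus_width_bits / 8

titan_rtx_bandwidth = memory_bandwidth_gb_per_s(14, 384)
print(titan_rtx_bandwidth)  # 672.0
```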

Drilling a bit deeper, there are really three legs to Titan RTX that set it apart from NVIDIA's other cards, particularly the GeForce RTX 2080 Ti. Raw performance is certainly one of those; we're looking at about 15% better performance in shading, texturing, and compute, and around a 9% bump in memory bandwidth and pixel throughput.

However, arguably the lynchpin to NVIDIA's true desired market of data scientists and other compute users is the tensor cores. They are present on all of NVIDIA's Turing cards and are the heart and soul of NVIDIA's success in the AI/neural networking field, but NVIDIA gave the GeForce cards a singular limitation that is nonetheless very important to the professional market. In their highest-precision FP16 mode, Turing is capable of accumulating at FP32 for greater precision; however, on the GeForce cards this operation is limited to half-speed throughput. This limitation has been removed for the Titan RTX, and as a result it's capable of full-speed FP32 accumulation throughput on its tensor cores.

Given that NVIDIA's tensor cores have nearly a dozen modes, this may seem like an odd distinction to make between the GeForce and the Titan. However for data scientists it's quite important; FP32 accumulate is frequently necessary for neural network training – FP16 accumulate doesn't have enough precision – especially in the big money fields that will shell out for cards like the Titan and the Tesla. So this small change is a big part of the value proposition to data scientists, as NVIDIA does not offer a cheaper card with the chart-topping 130 TFLOPS of tensor performance that Titan RTX can hit.
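
To see why the accumulator precision matters, here is a minimal NumPy sketch. It is not a model of how tensor cores work internally; it simply adds many small FP16 values into a running total kept in either FP16 or FP32. The helper name `accumulate` and the choice of 10,000 values near 0.01 are assumptions made for illustration.

```python
import numpy as np

def accumulate(values, acc_dtype):
    """Sum FP16 inputs one by one, rounding the running total to acc_dtype."""
    acc = acc_dtype(0)
    for v in values:
        # Each partial sum is rounded to the accumulator's precision.
        acc = acc_dtype(acc + acc_dtype(v))
    return float(acc)

# 10,000 copies of roughly 0.01, stored as FP16 inputs.
values = np.full(10_000, 0.01, dtype=np.float16)

fp16_total = accumulate(values, np.float16)
fp32_total = accumulate(values, np.float32)

print("FP16 accumulate:", fp16_total)
print("FP32 accumulate:", fp32_total)
```

The true sum is close to 100. With an FP16 accumulator the running total stops growing once it reaches 32, because the gap between adjacent FP16 values at that magnitude is larger than twice the value being added, so each addition rounds back to the same number. The FP32 accumulator does not hit that wall at this scale.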

Previously: More Extreme in Every Way: The New Titan Is Here – NVIDIA TITAN Xp
Nvidia Announces Titan V
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations
Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance
Nvidia's Turing GPU Pricing and Performance "Poorly Received"


Original Submission

Related Stories

More Extreme in Every Way: The New Titan Is Here – NVIDIA TITAN Xp 20 comments

NVIDIA issued a press release for its new card, Titan Xp:

Introduced today [April 6], the Pascal-powered TITAN Xp pushes more cores, faster clocks, faster memory and more TFLOPS than its predecessor, the 2016 Pascal-powered TITAN X.

With the new TITAN Xp we're delivering a card to users who demand the very best NVIDIA GPU, directly from NVIDIA and supported by NVIDIA.

Key stats:

  • 12GB of GDDR5X memory running at 11.4 Gbps
  • 3,840 CUDA cores running at 1.6GHz
  • 12 TFLOPs of brute force

This is extreme performance for extreme users where every drop counts.

Open to Mac Community

Speaking of users, we're also making the new TITAN Xp open to the Mac community with new Pascal drivers, coming this month. For the first time, this gives Mac users access to the immense horsepower delivered by our award-winning Pascal-powered GPUs.

TITAN Xp is available now for $1,200 direct from nvidia.com, and select system builders soon.

Don't shoot the messenger.

[More details can be found on the TITAN Xp product page where you can also place an order (Limit 2 per customer). --Ed.]


Original Submission

Nvidia Announces Titan V 1 comment

Nvidia has announced the Titan V, a $3,000 Volta-based flagship GPU capable of around 15 teraflops single-precision and 110 teraflops of "tensor performance (deep learning)". It has slightly greater performance but less VRAM than the Tesla V100, a $10,000 GPU aimed at professional users.

Would you consider it a card for "consumers"?

It seems like Nvidia announces the fastest GPU in history multiple times a year, and that's exactly what's happened again today; the Titan V is "the most powerful PC GPU ever created," in Nvidia's words. It represents a more significant leap than most products that have made that claim, however, as it's the first consumer-grade GPU based around Nvidia's new Volta architecture.

That said, a liberal definition of the word "consumer" is in order here — the Titan V sells for $2,999 and is focused around AI and scientific simulation processing. Nvidia claims 110 teraflops of performance from its 21.1 billion transistors, with 12GB of HBM2 memory, 5120 CUDA cores, and 640 "tensor cores" that are said to offer up to 9x the deep-learning performance of its predecessor.

Previously: Nvidia Releases the GeForce GTX 1080 Ti: 11.3 TFLOPS of FP32 Performance
More Extreme in Every Way: The New Titan Is Here – NVIDIA TITAN Xp


Original Submission

Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations 8 comments

NVIDIA Reveals Next-Gen Turing GPU Architecture: NVIDIA Doubles-Down on Ray Tracing, GDDR6, & More

The big change here is that NVIDIA is going to be including even more ray tracing hardware with Turing in order to offer faster and more efficient hardware ray tracing acceleration. New to the Turing architecture is what NVIDIA is calling an RT core, the underpinnings of which we aren't fully informed on at this time, but serve as dedicated ray tracing processors. These processor blocks accelerate both ray-triangle intersection checks and bounding volume hierarchy (BVH) manipulation, the latter being a very popular data structure for storing objects for ray tracing.

NVIDIA is stating that the fastest Turing parts can cast 10 Billion (Giga) rays per second, which compared to the unaccelerated Pascal is a 25x improvement in ray tracing performance.
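
Some back-of-the-envelope arithmetic on those figures, as a sketch: the implied Pascal rate comes directly from the two quoted numbers, while the 4K resolution and 60 frames per second are assumptions picked for illustration, and the quoted rate is a peak claim rather than a measured result.

```python
# Figures quoted above: 10 billion rays per second, a 25x gain over Pascal.
TURING_RAYS_PER_SECOND = 10e9
SPEEDUP_OVER_PASCAL = 25

# Implied rate for unaccelerated Pascal.
pascal_rays_per_second = TURING_RAYS_PER_SECOND / SPEEDUP_OVER_PASCAL

# ASSUMPTION: a 3840x2160 frame rendered at 60 frames per second.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
rays_per_pixel_per_frame = TURING_RAYS_PER_SECOND / (WIDTH * HEIGHT * FPS)

print(f"Implied Pascal rate: {pascal_rays_per_second / 1e9:.1f} Grays/s")
print(f"Ray budget at 4K60: {rays_per_pixel_per_frame:.1f} rays per pixel per frame")
```

Under these assumptions the peak budget is about 20 rays per pixel per frame.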

The Turing architecture also carries over the tensor cores from Volta, and indeed these have even been enhanced over Volta. The tensor cores are an important aspect of multiple NVIDIA initiatives. Along with speeding up ray tracing itself, NVIDIA's other tool in their bag of tricks is to reduce the amount of rays required in a scene by using AI denoising to clean up an image, which is something the tensor cores excel at. Of course that's not the only feature tensor cores are for – NVIDIA's entire AI/neural networking empire is all but built on them – so while not a primary focus for the SIGGRAPH crowd, this also confirms that NVIDIA's most powerful neural networking hardware will be coming to a wider range of GPUs.

New to Turing is support for a wider range of precisions, and as such the potential for significant speedups in workloads that don't require high precisions. On top of Volta's FP16 precision mode, Turing's tensor cores also support INT8 and even INT4 precisions. These are 2x and 4x faster than FP16 respectively, and while NVIDIA's presentation doesn't dive too deep here, I would imagine they're doing something similar to the data packing they use for low-precision operations on the CUDA cores. And without going too deep ourselves here, while reducing the precision of a neural network has diminishing returns – by INT4 we're down to a total of just 16(!) values – there are certain models that really can get away with this very low level of precision. And as a result the lower precision modes, while not always useful, will undoubtedly make some users quite happy at the throughput, especially in inferencing tasks.
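
The trade-off between bit width and precision can be sketched in a few lines of NumPy. The code below uses symmetric uniform quantization, which is one common scheme and not necessarily what NVIDIA's hardware or software does; the function names and the random test weights are assumptions for illustration.

```python
import numpy as np

def representable_values(bits: int) -> int:
    """Number of distinct values an integer format with this many bits can hold."""
    return 2 ** bits

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to signed integers, mapped back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for INT8, 7 for INT4
    scale = np.max(np.abs(weights)) / qmax     # float step size per integer level
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

# Hypothetical weights: 1,000 samples from a standard normal distribution.
rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)

for bits in (8, 4):
    error = np.mean(np.abs(weights - quantize(weights, bits)))
    print(f"INT{bits}: {representable_values(bits)} values, mean abs error {error:.4f}")
```

INT8 offers 256 levels and INT4 only 16, so the INT4 reconstruction error is noticeably larger. Whether a given model tolerates that error depends on the model.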

Also of note is the introduction of GDDR6 into some GPUs. The NVIDIA Quadro RTX 8000 will come with 24 GB of GDDR6 memory and a total memory bandwidth of 672 GB/s, which compares favorably to previous-generation GPUs featuring High Bandwidth Memory. Turing supports the recently announced VirtualLink. The video encoder block has been updated to include support for 8K H.265/HEVC encoding.

Ray-tracing combined with various (4m27s video) shortcuts (4m16s video) could be used for good-looking results in real time.

Also at Engadget, Notebookcheck, and The Verge.

See also: What is Ray Tracing and Why Do You Want it in Your GPU?


Original Submission

Nvidia Announces RTX 2080 Ti, 2080, and 2070 GPUs, Claims 25x Increase in Ray-Tracing Performance 23 comments

NVIDIA Announces the GeForce RTX 20 Series: RTX 2080 Ti & 2080 on Sept. 20th, RTX 2070 in October

NVIDIA's Gamescom 2018 keynote just wrapped up, and as many have been expecting since it was announced last month, NVIDIA is getting ready to launch their next generation of GeForce hardware. Announced at the event and going on sale starting September 20th is NVIDIA's GeForce RTX 20 series, which is succeeding the current Pascal-powered GeForce GTX 10 series. Based on NVIDIA's new Turing GPU architecture and built on TSMC's 12nm "FFN" process, NVIDIA has lofty goals, looking to drive an entire paradigm shift in how games are rendered and how PC video cards are evaluated. CEO Jensen Huang has called Turing NVIDIA's most important GPU architecture since 2006's Tesla GPU architecture (G80 GPU), and from a features standpoint it's clear that he's not overstating matters.

[...] So what does Turing bring to the table? The marquee feature across the board is hybrid rendering, which combines ray tracing with traditional rasterization to exploit the strengths of both technologies. This announcement is essentially a continuation of NVIDIA's RTX announcement from earlier this year, so if you thought that announcement was a little sparse, well then here is the rest of the story.

The big change here is that NVIDIA is going to be including even more ray tracing hardware with Turing in order to offer faster and more efficient hardware ray tracing acceleration. New to the Turing architecture is what NVIDIA is calling an RT core, the underpinnings of which we aren't fully informed on at this time, but serve as dedicated ray tracing processors. These processor blocks accelerate both ray-triangle intersection checks and bounding volume hierarchy (BVH) manipulation, the latter being a very popular data structure for storing objects for ray tracing.

NVIDIA is stating that the fastest GeForce RTX part can cast 10 Billion (Giga) rays per second, which compared to the unaccelerated Pascal is a 25x improvement in ray tracing performance.

Nvidia has confirmed that the machine learning capabilities (tensor cores) of the GPU will be used to smooth out problems with ray-tracing. Real-time AI denoising (4m17s) will be used to reduce the number of samples per pixel needed to achieve photorealism.

Previously: Microsoft Announces Directx 12 Raytracing API
Nvidia Announces Turing Architecture With Focus on Ray-Tracing and Lower-Precision Operations

Related: Real-time Ray-tracing at GDC 2014


Original Submission

Nvidia's Turing GPU Pricing and Performance "Poorly Received" 20 comments

Nvidia's Turing pricing strategy has been 'poorly received,' says Instinet

Instinet analyst Romit Shah commented Friday on Nvidia Corp.'s new Turing GPU, now that reviews of the product are out. "The 2080 TI is indisputably the best consumer GPU technology available, but at a prohibitive cost for many gamers," he wrote. "Ray tracing and DLSS [deep learning super sampling], while apparently compelling features, are today just 'call options' for when game developers create content that this technology can support."

Nvidia shares fall after Morgan Stanley says the performance of its new gaming card is disappointing

"As review embargos broke for the new gaming products, performance improvements in older games is not the leap we had initially hoped for," Morgan Stanley analyst Joseph Moore said in a note to clients on Thursday. "Performance boost on older games that do not incorporate advanced features is somewhat below our initial expectations, and review recommendations are mixed given higher price points." Nvidia shares closed down 2.1 percent Thursday.

Moore noted that Nvidia's new RTX 2080 card performed only 3 percent better than the previous generation's 1080Ti card at 4K resolutions.

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Monday December 03 2018, @07:54PM (2 children)

    by Anonymous Coward on Monday December 03 2018, @07:54PM (#769264)

    NVIDIA gimping their fp16 performance is what originally turned me against them (also the telemetry). If there was competition this card would go for half to a quarter of the price.

    And it isn't really about the hardware, it is about CUDA.

    • (Score: 2) by edIII on Monday December 03 2018, @09:34PM

      by edIII (791) on Monday December 03 2018, @09:34PM (#769295)

      The telemetry is why I use the open source drivers and not the official nvidia ones for Ubuntu.

      I wish we could create a telemetry RBL and incorporate that into our firewalls. That way, even network guests couldn't send back telemetry.

      --
      Technically, lunchtime is at any moment. It's just a wave function.
    • (Score: 2) by Hyperturtle on Tuesday December 04 2018, @04:04PM

      by Hyperturtle (2824) on Tuesday December 04 2018, @04:04PM (#769617)

      I agree, $2500 for a card is hard to justify.

      At least the telemetry can be blocked or disabled without affecting the utility of their present line-up. Everything else is what you are paying for, and it's not worth that unless the performance desired requires numerous other cards (like previous generation titans or Tis) with a topology that costs too much to justify.

      Sometimes, it is cheaper to buy the bloated card, and I think they understand their market enough that anyone willing to pay for it has done the math or isn't paying for it themselves. Or they don't care about prices, because shiny.

  • (Score: 0) by Anonymous Coward on Monday December 03 2018, @07:55PM (2 children)

    by Anonymous Coward on Monday December 03 2018, @07:55PM (#769265)
    • (Score: 3, Informative) by takyon on Tuesday December 04 2018, @01:41AM (1 child)

      by takyon (881) <{takyon} {at} {soylentnews.org}> on Tuesday December 04 2018, @01:41AM (#769380) Journal

      130 TFLOPS is lower precision "tensor performance", not the number given by LINPACK. It's useful to machine learning users, but misleading.

      However if a technology like this [soylentnews.org] pans out, we could see 1 petaflops smartphone SoCs or maybe 1 exaflops desktop PCs.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by opinionated_science on Tuesday December 04 2018, @02:22PM

        by opinionated_science (4031) on Tuesday December 04 2018, @02:22PM (#769547)

        indeed a more appropriate label would be "Titan RTX now 510 GFlops Double precision!!!!".

        If you want a real comparison, the ASCI red machine (1997) had a peak of 1.3 Tflops (LINPACK), remained number 1 for 7 years (2 upgrades).

        That was an x86 chip and they used 9152 of them.

        two decades supercomputer to desktop.

        Let that sink in as you have your coffee....

  • (Score: 2) by MichaelDavidCrawford on Monday December 03 2018, @08:29PM (4 children)

    by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Monday December 03 2018, @08:29PM (#769275) Homepage Journal

    ... is how long it would pay for itself by mining ASIC-Resistant Crypto.

    Such resistance is typically built in to the mining algorithm by requiring more memory than is feasible to include on an ASIC chip.

    However, some intentionally ASIC-Resistant Cryptos turn out to not really be as resistant as their designers envisioned.

    Carbonizingly,

    MDC

    --
    Yes I Have No Bananas. [gofundme.com]
    • (Score: 3, Interesting) by RamiK on Monday December 03 2018, @09:57PM (3 children)

      by RamiK (1813) on Monday December 03 2018, @09:57PM (#769305)

      So there's this cattery owner. She loves Persians but has no huge love for the trade shows so she's not competing anymore. Keeps a male and a few females and sells the odd litter to folks in the business or even just her vet. She's registered as an NPO and operates at a loss. But she doesn't care. Profits were never the point. She just loves those cats.

      Now, why am I telling you this? Well, here's the thing: Some* miners are gamers. They're not in it for the profits. All they want is to game with the latest and greatest card and when a new one comes out, flip it off on ebay. For them, mining just cuts some costs for their hobby. If it wasn't viable, they wouldn't be able to afford fancy cards so they'd just game on lesser ones.

      So, be careful assuming people only care about "how long it would pay for itself by mining ASIC-Resistant Crypto". There are some irrational consumers out there that are normally willing to buy a $600 card just for gaming and are looking at those $2500 cards thinking "how long it would return $1900". And suffice to say, with enough of those customers around the valuation of the coins can get pretty screwy...

      * https://cryptomenow.com/coinshares-released-a-19-page-report-on-bitcoin-mining-here-are-the-highlights/ [cryptomenow.com]

      --
      compiling...
      • (Score: 0) by Anonymous Coward on Monday December 03 2018, @10:36PM

        by Anonymous Coward on Monday December 03 2018, @10:36PM (#769321)

        >For them, mining just cuts some costs for their hobby. If it wasn't viable, they wouldn't be able to afford fancy cards so they'd just game on lesser ones.

        This can be risky. NiceHash fried something on my 1080, causing green artifacts. Between the NiceHash heist and the card it killed, I took a loss. I'm just glad it didn't kill my 1080ti.

      • (Score: 2) by MichaelDavidCrawford on Tuesday December 04 2018, @04:25AM (1 child)

        by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Tuesday December 04 2018, @04:25AM (#769436) Homepage Journal

        The problem we've got isn't so much that prices have plummeted.

        The problem we've got is that it now costs me significantly more to pay for the electricity to mine than mining will pay in coins.

        That's a _widespread_ phenomenon; network hashrates have become smoking radioactive craters for most if not all cryptos.

        That will make the price crash even worse because what the miners are paid for is confirming transactions. Without enough network hashrate, rather than twenty minutes to confirm a transaction, I expect by now it takes a whole day for some coins. As the price continues to drop, more and more miners stop mining, the confirmations take longer and longer, leading fewer and fewer to be willing to buy because doing so takes so very long.

        When I get some money in Real Soon Now, I fully intend to buy some cryptos - but also fully expect to wait days for those transactions to clear.

        The only way for this problem to be solved is for a whole bunch of miners to hurl themselves on hand-grenades for the good of the community. In my specific case, I only have one rig. I don't expect anyone to run _all_ their rigs but I _do_ expect lots of fellow miners to run at least a few rigs apiece.

        Miners tend to be intimately familiar with why we mine; I haven't raised this issue in a public way before now, but you can be certain that this weekend or so I'll be blasting this concern throughout every corner of The Series Of Tubes.

        --
        Yes I Have No Bananas. [gofundme.com]
        • (Score: 0) by Anonymous Coward on Tuesday December 04 2018, @05:24PM

          by Anonymous Coward on Tuesday December 04 2018, @05:24PM (#769664)

          "but also fully expect to wait days for those transactions to clear."

          get off the cheese, man.

  • (Score: 0) by Anonymous Coward on Monday December 03 2018, @10:50PM (1 child)

    by Anonymous Coward on Monday December 03 2018, @10:50PM (#769326)

    I still am running an ET4000, how much faster would this be?

  • (Score: 2) by RamiK on Tuesday December 04 2018, @08:09PM

    by RamiK (1813) on Tuesday December 04 2018, @08:09PM (#769728)

    https://www.gamersnexus.net/guides/3394-rtx-2080-ti-artifacting-failure-analysis-crashing-black-screens [gamersnexus.net]

    Overheating, core dumps and, rarely, boards cracking due to mechanical stress from the fans and the heat.

    On a more positive note, along with ebay's entries for 1070s and 1080s and their increased presence in steam's hardware survey [steampowered.com], they're probably selling better now.

    --
    compiling...