from the that's-a-lot-of-horsepower dept.
During a keynote at GTC 2016, Nvidia announced the Tesla P100, a 16nm FinFET Pascal graphics processing unit with 15.3 billion transistors intended for high performance and cloud computing customers. The GPU includes 16 GB of High Bandwidth Memory 2.0 with 720 GB/s of memory bandwidth and a unified memory architecture. It also uses the proprietary NVLink, an interconnect with 160 GB/s of bandwidth, rather than the slower PCI-Express.
Nvidia claims the Tesla P100 will reach 5.3 teraflops of FP64 (double precision) performance, along with 10.6 teraflops of FP32 and 21.2 teraflops of FP16. 3584 of a maximum possible 3840 stream processors are enabled on this version of the GP100 die.
At the keynote, Nvidia also announced a 170 teraflops (FP16) "deep learning supercomputer" or "datacenter in a box" called DGX-1. It contains eight Tesla P100s and will cost $129,000. The first units will be going to research institutions such as the Massachusetts General Hospital.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year.
GPUs could include 8 GB of HBM2 using half of the die space used by AMD's Fury X, or just one-quarter of the die space if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be slightly physically larger than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm2) HBM2 package, compared to 5.48 mm × 7.29 mm (39.94 mm2) HBM1 packages.
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of a 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, which is double that of a HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
Nvidia revealed key details about its upcoming "Pascal" consumer GPUs at a May 6th event. These GPUs are built using a 16nm FinFET process from TSMC rather than the 28nm processes that were used for several previous generations of both Nvidia and AMD GPUs.
The GeForce GTX 1080 will outperform the GTX 980, GTX 980 Ti, and Titan X cards. Nvidia claims that GTX 1080 can reach 9 teraflops of single precision performance, while the GTX 1070 will reach 6.5 teraflops. A single GTX 1080 will be faster than two GTX 980s in SLI.
Both the GTX 1080 and 1070 will feature 8 GB of VRAM. Unfortunately, neither card contains High Bandwidth Memory 2.0 like the Tesla P100 does. Instead, the GTX 1080 has GDDR5X memory while the 1070 is sticking with GDDR5.
The GTX 1080 starts at $599 and is available on May 27th. The GTX 1070 starts at $379 on June 10th. Your move, AMD.
NVIDIA is releasing the GeForce GTX 1080 Ti, a $699 GPU with performance and specifications similar to that of the NVIDIA Titan X:
Unveiled last week at GDC and launching [March 10th] is the GeForce GTX 1080 Ti. Based on NVIDIA's GP102 GPU – aka Bigger Pascal – the job of GTX 1080 Ti is to serve as a mid-cycle refresh of the GeForce 10 series. Like the GTX 980 Ti and GTX 780 Ti before it, that means taking advantage of improved manufacturing yields and reduced costs to push out a bigger, more powerful GPU to drive this year's flagship video card. And, for NVIDIA and their well-executed dominance of the high-end video card market, it's a chance to run up the score even more. With the GTX 1080 Ti, NVIDIA is aiming for what they're calling their greatest performance jump yet for a modern Ti product – around 35% on average. This would translate into a sizable upgrade for GeForce GTX 980 Ti owners and others for whom GTX 1080 wasn't the card they were looking for.
[...] Going by the numbers then, the GTX 1080 Ti offers just over 11.3 TFLOPS of FP32 performance. This puts the expected shader/texture performance of the card 28% ahead of the current GTX 1080, while the ROP throughput advantage stands 26%, and memory bandwidth at a much greater 51.2%. Real-world performance will of course be influenced by a blend of these factors, with memory bandwidth being the real wildcard. Otherwise, relative to the NVIDIA Titan X, the two cards should end up quite close, trading blows now and then.
Speaking of the Titan, on an interesting side note, NVIDIA isn't going to be doing anything to hurt the compute performance of the GTX 1080 Ti to differentiate the card from the Titan, which has proven popular with GPU compute customers. Crucially, this means that the GTX 1080 Ti gets the same 4:1 INT8 performance ratio of the Titan, which is critical to the cards' high neural networking inference performance. As a result the GTX 1080 Ti actually has slightly greater compute performance (on paper) than the Titan. And NVIDIA has been surprisingly candid in admitting that unless compute customers need the last 1GB of VRAM offered by the Titan, they're likely going to buy the GTX 1080 Ti instead.
The card includes 11 GB of Micron's second-generation GDDR5X memory operating at 11 Gbps compared to 12 GB of GDDR5X at 10 Gbps for the Titan X.