from the x-makes-it-better dept.
The new technology is designed to improve the bandwidth available to high-performance graphics processing units without fundamentally changing the memory architecture of graphics cards or the memory technology itself, much like previous generations of GDDR, although these new specifications arguably push the physical limits of the technology and hardware in their current form. The GDDR5X SGRAM (synchronous graphics random access memory) standard is based on the GDDR5 technology introduced in 2007 and first used in 2008. The GDDR5X standard brings three key improvements to the well-established GDDR5: it increases data rates by up to a factor of two, it improves the energy efficiency of high-end memory, and it defines new chip capacities that enable denser memory configurations on add-in graphics boards and other devices. What is most important for chip developers and graphics card makers is that GDDR5X should not require drastic changes to graphics card designs, and the general feature set of GDDR5 remains unchanged (which is why it is not being called GDDR6).
[...] The key improvement of the GDDR5X standard over its predecessor is its all-new 16n prefetch architecture, which enables up to 512 bits (64 bytes) per array read or write access. By contrast, the GDDR5 technology features an 8n prefetch architecture and can read or write up to 256 bits (32 bytes) of data per cycle. The doubled prefetch and increased data transfer rates are expected to double the effective memory bandwidth of GDDR5X sub-systems. However, the actual performance of graphics cards will depend not just on DRAM architecture and frequencies, but also on memory controllers and applications. Therefore, we will need to test actual hardware to find out the real-world benefits of the new memory.
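As a rough back-of-the-envelope illustration (the speed grades below are illustrative, not from the article), per-chip bandwidth is simply the per-pin data rate multiplied by the chip's 32-bit interface; the 16n prefetch is what lets the DRAM core feed the doubled per-pin rate without running its internal arrays any faster:

```python
# Rough per-chip bandwidth: per-pin data rate x interface width.
def chip_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int = 32) -> float:
    """Peak bandwidth of a single DRAM chip in GB/s."""
    return pin_rate_gbps * bus_width_bits / 8

# Illustrative speed grades: 7 Gbps for GDDR5, 14 Gbps for GDDR5X.
print(chip_bandwidth_gb_s(7.0))   # GDDR5:  28.0 GB/s per chip
print(chip_bandwidth_gb_s(14.0))  # GDDR5X: 56.0 GB/s per chip
```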
What purpose does GDDR5X serve when superior first- and second-generation High Bandwidth Memory (HBM) is around? GDDR5X memory will be cheaper than HBM, and adopting it is an evolutionary rather than a revolutionary change from existing GDDR5-based hardware.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year.
GPUs could include 8 GB of HBM2 using half the number of memory packages found on AMD's Fury X, or just a quarter if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be considerably larger physically than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm²) HBM2 package, compared to its 5.48 mm × 7.29 mm (39.94 mm²) HBM1 packages.
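Running the numbers on the SK Hynix dimensions above (a simple footprint check, not from the original story):

```python
# Total package footprint: 4 GB of HBM1 (four 1 GB stacks) versus
# 8 GB of HBM2 (two 4 GB stacks), using SK Hynix's dimensions.
hbm1_mm2 = 5.48 * 7.29    # ~39.95 mm^2 per 1 GB HBM1 package
hbm2_mm2 = 7.75 * 11.87   # ~91.99 mm^2 per 4 GB HBM2 package
print(4 * hbm1_mm2)  # Fury X's 4 GB of HBM1: ~159.8 mm^2
print(2 * hbm2_mm2)  # 8 GB via two HBM2 packages: ~184.0 mm^2
```

So while HBM2 needs far fewer packages per gigabyte, the total footprint of an 8 GB two-stack HBM2 configuration would be roughly comparable to, not half of, the Fury X's four-stack HBM1 footprint.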
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of an 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, double the 128GBps of an HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
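The press-release figures check out against the interface widths (1024 bits per HBM stack, 32 bits per GDDR5 chip):

```python
# HBM2: 2 Gbps/pin across a 1024-bit stack interface.
# GDDR5: 9 Gbps/pin across a 32-bit chip interface.
hbm2_gb_s  = 2 * 1024 / 8      # 256.0 GB/s per stack
gddr5_gb_s = 9 * 32 / 8        # 36.0 GB/s per chip
print(hbm2_gb_s / gddr5_gb_s)  # ~7.1x, i.e. "more than seven-fold"
```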
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
Nvidia revealed key details about its upcoming "Pascal" consumer GPUs at a May 6th event. These GPUs are built using a 16nm FinFET process from TSMC rather than the 28nm processes that were used for several previous generations of both Nvidia and AMD GPUs.
The GeForce GTX 1080 will outperform the GTX 980, GTX 980 Ti, and Titan X cards. Nvidia claims that GTX 1080 can reach 9 teraflops of single precision performance, while the GTX 1070 will reach 6.5 teraflops. A single GTX 1080 will be faster than two GTX 980s in SLI.
Both the GTX 1080 and 1070 will feature 8 GB of VRAM. Unfortunately, neither card contains High Bandwidth Memory 2.0 like the Tesla P100 does. Instead, the GTX 1080 has GDDR5X memory while the 1070 is sticking with GDDR5.
The GTX 1080 starts at $599 and is available on May 27th. The GTX 1070 starts at $379 on June 10th. Your move, AMD.
NVIDIA is releasing the GeForce GTX 1080 Ti, a $699 GPU with performance and specifications similar to that of the NVIDIA Titan X:
Unveiled last week at GDC and launching [March 10th] is the GeForce GTX 1080 Ti. Based on NVIDIA's GP102 GPU – aka Bigger Pascal – the job of GTX 1080 Ti is to serve as a mid-cycle refresh of the GeForce 10 series. Like the GTX 980 Ti and GTX 780 Ti before it, that means taking advantage of improved manufacturing yields and reduced costs to push out a bigger, more powerful GPU to drive this year's flagship video card. And, for NVIDIA and their well-executed dominance of the high-end video card market, it's a chance to run up the score even more. With the GTX 1080 Ti, NVIDIA is aiming for what they're calling their greatest performance jump yet for a modern Ti product – around 35% on average. This would translate into a sizable upgrade for GeForce GTX 980 Ti owners and others for whom GTX 1080 wasn't the card they were looking for.
[...] Going by the numbers then, the GTX 1080 Ti offers just over 11.3 TFLOPS of FP32 performance. This puts the expected shader/texture performance of the card 28% ahead of the current GTX 1080, while the ROP throughput advantage stands at 26%, and the memory bandwidth advantage at a much greater 51.2%. Real-world performance will of course be influenced by a blend of these factors, with memory bandwidth being the real wildcard. Otherwise, relative to the NVIDIA Titan X, the two cards should end up quite close, trading blows now and then.
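For reference, the headline TFLOPS figure is just two FP32 operations (one fused multiply-add) per CUDA core per clock, using the GTX 1080 Ti's published shader count and boost clock:

```python
# FP32 throughput = 2 FLOPs (one FMA) per CUDA core per clock.
cores = 3584       # CUDA cores on the GTX 1080 Ti's GP102
boost_ghz = 1.582  # rated boost clock
tflops = 2 * cores * boost_ghz / 1000
print(f"{tflops:.2f} TFLOPS")  # ~11.34, matching "just over 11.3"
```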
Speaking of the Titan, on an interesting side note, NVIDIA isn't going to be doing anything to hurt the compute performance of the GTX 1080 Ti to differentiate the card from the Titan, which has proven popular with GPU compute customers. Crucially, this means that the GTX 1080 Ti gets the same 4:1 INT8 performance ratio as the Titan, which is critical to the cards' high neural network inference performance. As a result the GTX 1080 Ti actually has slightly greater compute performance (on paper) than the Titan. And NVIDIA has been surprisingly candid in admitting that unless compute customers need the last 1GB of VRAM offered by the Titan, they're likely going to buy the GTX 1080 Ti instead.
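The 4:1 ratio (via Pascal's DP4A 8-bit dot-product instruction) implies roughly four times the FP32 rate for INT8 inference:

```python
# 4 INT8 ops per FP32 op per clock, at the same clocks.
int8_tops = 4 * 11.34
print(f"~{int8_tops:.0f} INT8 TOPS (on paper)")  # ~45 TOPS
```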
The card includes 11 GB of Micron's second-generation GDDR5X memory operating at 11 Gbps compared to 12 GB of GDDR5X at 10 Gbps for the Titan X.
JEDEC has announced that it expects to finalize the DDR5 standard by next year. It says that DDR5 will double bandwidth and density, and increase power efficiency, presumably by lowering the operating voltages again (perhaps to 1.1 V). Availability of DDR5 modules is expected by 2020:
You may have just upgraded your computer to use DDR4 recently or you may still be using DDR3, but in either case, nothing stays new forever. JEDEC, the organization in charge of defining new standards for computer memory, says that it will be demoing the next-generation DDR5 standard in June of this year and finalizing the standard sometime in 2018. DDR5 promises double the memory bandwidth and density of DDR4, and JEDEC says it will also be more power-efficient, though the organization didn't release any specific numbers or targets.
The DDR4 SDRAM specification was finalized in 2012, and DDR3 in 2007, so DDR5's arrival is to be expected (cue the Soylentils still using DDR2). One way to double the memory bandwidth of DDR5 is to double the DRAM prefetch to 16n, matching GDDR5X.
Graphics cards are beginning to ship with GDDR5X. Some graphics cards and Knights Landing Xeon Phi chips include High Bandwidth Memory (HBM). A third generation of HBM will offer increased memory bandwidth and density, and more than 8 dies in a stack. Samsung has also talked about a cheaper version of HBM for consumers with a lower total bandwidth. SPARC64 XIfx chips include Hybrid Memory Cube. GDDR6 SDRAM could raise per-pin bandwidth to 14 Gbps or more, up from the 10-14 Gbps range of GDDR5X, while lowering power consumption.
Samsung has announced the mass production of 16 Gb GDDR6 SDRAM chips with a higher-than-expected pin speed. The chips could see use in upcoming graphics cards that are not equipped with High Bandwidth Memory:
Samsung has beaten SK Hynix and Micron to be the first to mass produce GDDR6 memory chips. Samsung's 16Gb (2GB) chips are fabricated on a 10nm process and run at 1.35V. The new chips have a whopping 18Gb/s pin speed and will be able to reach a transfer rate of 72GB/s. Samsung's current 8Gb (1GB) GDDR5 memory chips, besides having half the density, work at 1.55V with up to 9Gb/s pin speeds. In a pre-CES 2018 press release, Samsung briefly mentioned the impending release of these chips. However, the speed on release is significantly faster than the earlier stated 16Gb/s pin speed and 64GB/s transfer rate.
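The quoted 72GB/s follows directly from the pin speed and the standard 32-bit GDDR chip interface:

```python
# Per-chip transfer rate: pin speed x interface width / 8 bits per byte.
pin_gbps = 18  # Gb/s per pin
bus_bits = 32  # standard GDDR chip interface
print(pin_gbps * bus_bits / 8)  # 72.0 GB/s per chip
```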
18 Gbps exceeds what the JEDEC standard calls for.
It would seem that Micron this morning has accidentally spilled the beans on the future of graphics card memory technologies – and outed one of NVIDIA's next-generation RTX video cards in the process. In a technical brief that was posted to their website, dubbed "The Demand for Ultra-Bandwidth Solutions", Micron detailed their portfolio of high-bandwidth memory technologies and the market needs for them. Included in this brief was information on the previously-unannounced GDDR6X memory technology, as well as some information on what seems to be the first card to use it, NVIDIA's GeForce RTX 3090.
[...] At any rate, as this is a market overview rather than a technical deep dive, the details on GDDR6X are slim. The document links to another, still-unpublished document, "Doubling I/O Performance with PAM4: Micron Innovates GDDR6X to Accelerate Graphics Memory", that would presumably contain further details on GDDR6X. Nonetheless, even this high-level overview gives us a basic idea of what Micron has in store for later this year.
The key innovation for GDDR6X appears to be that Micron is moving from using POD135 coding on the memory bus – a binary (two state) coding format – to four state coding in the form of Pulse-Amplitude Modulation 4 (PAM4). In short, Micron would be doubling the number of signal states in the GDDR6X memory bus, allowing it to transmit twice as much data per clock.
[...] According to Micron's brief, they're expecting to get GDDR6X to 21Gbps/pin, at least to start with. This is a far cry from doubling GDDR6's existing 16Gbps/pin rate, but it's also a data rate that would be grounded in the limitations of PAM4 and DRAM. PAM4 itself is easier to achieve than binary coding at the same total data rate, but having to accurately determine four states instead of two is conversely a harder task. So a smaller jump isn't too surprising.
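As a toy illustration of the signaling change (the level mapping below is illustrative; real PAM4 links typically Gray-code the levels to limit bit errors):

```python
# NRZ/binary (e.g. GDDR6's POD signaling) carries 1 bit per symbol
# using 2 voltage levels; PAM4 carries 2 bits per symbol using 4
# levels, doubling the data rate at the same symbol rate.
def pam4_encode(bits: str) -> list[int]:
    """Pack a bit string into PAM4 symbols, two bits per symbol."""
    assert len(bits) % 2 == 0
    return [int(bits[i:i+2], 2) for i in range(0, len(bits), 2)]

print(pam4_encode("10110100"))  # 8 bits -> 4 symbols: [2, 3, 1, 0]
```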
The leaked Ampere-based RTX 3090 seems to be Nvidia's attempt to compete with AMD's upcoming RDNA2 ("Big Navi") GPUs without lowering the price of the usual high-end "Titan" GPU (Titan RTX launched at $2,499). Here are some of the latest leaks for the RTX 30 "Ampere" GPU lineup.
Related: PCIe 6.0 Announced for 2021: Doubles Bandwidth Yet Again (uses PAM4)