from the how-much-does-it-cost? dept.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year.
GPUs could include 8 GB of HBM2 using half of the die space used by AMD's Fury X, or just one-quarter of the die space if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be slightly physically larger than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm2) HBM2 package, compared to 5.48 mm × 7.29 mm (39.94 mm2) HBM1 packages.
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of a 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, which is double that of a HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
Today was Advanced Micro Devices' (AMD) 2015 Financial Analyst Day. The last one was held in 2012. Since then, the company has changed leadership, put its APUs in the major consoles, and largely abandoned the high-end chip market to Intel. Now AMD says it is focusing on gaming, virtual reality, and datacenters. AMD has revealed details of upcoming CPUs and GPUs at the event:
Perhaps the biggest announcement relates to AMD's x86 Zen CPUs, coming in 2016. AMD is targeting a 40% increase in instructions-per-clock (IPC) with Zen cores. By contrast, Intel's Haswell (a "Tock") increased IPC by about 10-11%, and Broadwell (a "Tick") increased IPC by about 5-6%. AMD is also abandoning the maligned Bulldozer modules with Clustered Multithreading in favor of a Simultaneous Multithreading design, similar to Intel's Hyperthreading. Zen is a high priority for AMD to the extent that it is pushing back its ARM K12 chips to 2017. AMD is also shifting focus away from Project Skybridge, an "ambidextrous framework" that combined x86 and ARM cores in SoCs. Zen cores will target a wide range of designs from "top-to-bottom", including both sub-10W TDPs and up to 100W. The Zen architecture will be followed by Zen+ at some point.
On the GPU front, AMD's 2016 GPUs will use FinFETs. AMD plans to be the first vendor to use High Bandwidth Memory (HBM), a 3D/stacked memory standard that enables much higher bandwidth (hence the name) and saves power. NVIDIA also plans to use HBM in its Pascal GPUs slated for 2016. The HBM will be positioned around the processor, as the GPU's thermal output would make cooling the RAM difficult if it were on top. HBM is competing against the similar Hybrid Memory Cube (HMC) standard.
HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. Although HBM1's memory bandwidth per watt is tripled compared to GDDR5, the memory in AMD's example uses a little less than half the power (30 W for the R9 290X down to 14.6 W) due to the increased bandwidth. HBM stacks will also use 5-10% as much area of the GPU to provide the same amount of memory that GDDR5 would. That could potentially halve the size of the GPU:
By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm2), versus 110mm × 90mm (9900mm2) for R9 290X.
HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.
Samsung has developed the world's first 128 GB DDR4 registered memory modules for servers. From the press release:
Following Samsung's introduction of the world-first 3D TSV DDR4 DRAM (64GB) in 2014, the company's new TSV registered dual inline memory module (RDIMM) marks another breakthrough that opens the door for ultra-high capacity memory at the enterprise level. Samsung's new TSV DRAM module boasts the largest capacity and the highest energy efficiency of any DRAM modules today, while operating at high speed and demonstrating excellent reliability.
From The Register:
The Register is aware of servers with 96 DIMM slots, which means ... WOAH! ... 12.2 terabytes of RAM in a single server if you buy Samsung's new babies.
Samsung says these new DIMMS are special because "the chip dies are ground down to a few dozen micrometers, pierced with hundreds of fine holes and vertically connected by electrodes passing through the holes, allowing for a significant boost in signal transmission."
There's also "a special design through which the master chip of each 4GB package embeds the data buffer function to optimise module performance and power consumption."
The new technology is designed to improve bandwidth available to high-performance graphics processing units without fundamentally changing the memory architecture of graphics cards or memory technology itself, similar to other generations of GDDR, although these new specifications are arguably pushing the phyiscal[sic] limits of the technology and hardware in its current form. The GDDR5X SGRAM (synchronous graphics random access memory) standard is based on the GDDR5 technology introduced in 2007 and first used in 2008. The GDDR5X standard brings three key improvements to the well-established GDDR5: it increases data-rates by up to a factor of two, it improves energy efficiency of high-end memory, and it defines new capacities of memory chips to enable denser memory configurations of add-in graphics boards or other devices. What is very important for developers of chips and makers of graphics cards is that the GDDR5X should not require drastic changes to designs of graphics cards, and the general feature-set of GDDR5 remains unchanged (and hence why it is not being called GDDR6).
[...] The key improvement of the GDDR5X standard compared to the predecessor is its all-new 16n prefetch architecture, which enables up to 512 bit (64 Bytes) per array read or write access. By contrast, the GDDR5 technology features 8n prefetch architecture and can read or write up to 256 bit (32 Bytes) of data per cycle. Doubled prefetch and increased data transfer rates are expected to double effective memory bandwidth of GDDR5X sub-systems. However, actual performance of graphics cards will depend not just on DRAM architecture and frequencies, but also on memory controllers and applications. Therefore, we will need to test actual hardware to find out actual real-world benefits of the new memory.
What purpose does GDDR5X serve if superior 1st and 2nd generation High Bandwidth Memory (HBM) are around? GDDR5X memory will be cheaper than HBM and its use is more of an evolutionary than revolutionary change from existing GDDR5-based hardware.
During a keynote at GTC 2016, Nvidia announced the Tesla P100, a 16nm FinFET Pascal graphics processing unit with 15.3 billion transistors intended for high performance and cloud computing customers. The GPU includes 16 GB of High Bandwidth Memory 2.0 with 720 GB/s of memory bandwidth and a unified memory architecture. It also uses the proprietary NVLink, an interconnect with 160 GB/s of bandwidth, rather than the slower PCI-Express.
Nvidia claims the Tesla P100 will reach 5.3 teraflops of FP64 (double precision) performance, along with 10.6 teraflops of FP32 and 21.2 teraflops of FP16. 3584 of a maximum possible 3840 stream processors are enabled on this version of the GP100 die.
At the keynote, Nvidia also announced a 170 teraflops (FP16) "deep learning supercomputer" or "datacenter in a box" called DGX-1. It contains eight Tesla P100s and will cost $129,000. The first units will be going to research institutions such as the Massachusetts General Hospital.
SK Hynix will begin mass production of 4 GB HBM2 memory stacks soon:
SK Hynix has quietly added its HBM Gen 2 memory stacks to its public product catalog earlier this month, which means that the start of mass production should be imminent. The company will first offer two types of new memory modules with the same capacity, but different transfer-rates, targeting graphics cards, HPC accelerators and other applications. Over time, the HBM2 family will get broader.
SK Hynix intends to initially offer its clients 4 GB HBM2 4Hi stack KGSDs (known good stack dies) based on 8 Gb DRAM devices. The memory devices will feature a 1024-bit bus as well as 1.6 GT/s (H5VR32ESM4H-12C) and 2.0 GT/s (H5VR32ESM4H-20C) data-rates, thus offering 204 GB/s and 256 GB/s peak bandwidth per stack.
Samsung has already manufactured 4 GB stacks. Eventually, there will also be 2 GB and 8 GB stacks available.
In a surprising move, SK Hynix has announced its first memory chips based on the yet-unpublished GDDR6 standard. The new DRAM devices for video cards have capacity of 8 Gb and run at 16 Gbps per pin data rate, which is significantly higher than both standard GDDR5 and Micron's unique GDDR5X format. SK Hynix plans to produce its GDDR6 ICs in volume by early 2018.
GDDR5 memory has been used for top-of-the-range video cards for over seven years, since summer 2008 to present. Throughout its active lifespan, GDDR5 increased its data rate by over two times, from 3.6 Gbps to 9 Gbps, whereas its per chip capacities increased by 16 times from 512 Mb to 8 Gb. In fact, numerous high-end graphics cards, such as NVIDIA's GeForce GTX 1060 and 1070, still rely on the GDDR5 technology, which is not going anywhere even after the launch of Micron's GDDR5X with up to 12 Gbps data rate per pin in 2016. As it appears, GDDR6 will be used for high-end graphics cards starting in 2018, just two years after the introduction of GDDR5X.
In response to increased demand, Samsung is increasing production of the densest HBM2 DRAM available:
Samsung on Tuesday announced that it is increasing production volumes of its 8 GB, 8-Hi HBM2 DRAM stacks due to growing demand. In the coming months the company's 8 GB HBM2 chips will be used for several applications, including those for consumers, professionals, AI, as well as for parallel computing. Meanwhile, AMD's Radeon Vega graphics cards for professionals and gamers will likely be the largest consumers of HBM2 in terms of volume. And while AMD is traditionally a SK Hynix customer, the timing of this announcement with AMD's launches certainly suggests that AMD is likely a Samsung customer this round as well.
Samsung's 8 GB HBM Gen 2 memory KGSDs (known good stacked die) are based on eight 8-Gb DRAM devices in an 8-Hi stack configuration. The memory components are interconnected using TSVs and feature over 5,000 TSV interconnects each. Every KGSD has a 1024-bit bus and offers up to 2 Gbps data rate per pin, thus providing up to 256 GB/s of memory bandwidth per single 8-Hi stack. The company did not disclose power consumption and heat dissipation of its HBM memory components, but we have reached out [to] Samsung for additional details.