from the wishing-our-memory-was-high-bandwidth dept.
HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
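Those figures follow directly from bus width times per-pin data rate. A quick sanity check in Python (the R9 290X's 512-bit, 5 Gbps GDDR5 configuration is the 320 GB/s baseline cited below):

```python
def peak_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# GDDR5 on the R9 290X: 512-bit bus at 5 Gbps per pin
print(peak_bandwidth_gbs(512, 5))       # 320.0 GB/s
# AMD's 4-stack HBM example: 4 stacks x 1024-bit, at just 1 Gbps per pin
print(peak_bandwidth_gbs(4 * 1024, 1))  # 512.0 GB/s
```

The slow-but-wide bus ends up with 60% more bandwidth than the fast-but-narrow one.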
AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. HBM1 roughly triples memory bandwidth per watt compared to GDDR5, but because total bandwidth also increases, the memory in AMD's example draws a little less than half the power (14.6 W, down from 30 W for the R9 290X) rather than a third of it. HBM stacks will also occupy only 5-10% of the board area that GDDR5 would need for the same amount of memory. That could potentially halve the size of the GPU package:
By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm2), versus 110mm × 90mm (9900mm2) for R9 290X.
HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.
Today was Advanced Micro Devices' (AMD) 2015 Financial Analyst Day. The last one was held in 2012. Since then, the company has changed leadership, put its APUs in the major consoles, and largely abandoned the high-end chip market to Intel. Now AMD says it is focusing on gaming, virtual reality, and datacenters. AMD has revealed details of upcoming CPUs and GPUs at the event:
Perhaps the biggest announcement relates to AMD's x86 Zen CPUs, coming in 2016. AMD is targeting a 40% increase in instructions-per-clock (IPC) with Zen cores. By contrast, Intel's Haswell (a "Tock") increased IPC by about 10-11%, and Broadwell (a "Tick") increased IPC by about 5-6%. AMD is also abandoning the maligned Bulldozer modules with Clustered Multithreading in favor of a Simultaneous Multithreading design, similar to Intel's Hyperthreading. Zen is a high priority for AMD to the extent that it is pushing back its ARM K12 chips to 2017. AMD is also shifting focus away from Project Skybridge, an "ambidextrous framework" that combined x86 and ARM cores in SoCs. Zen cores will target a wide range of designs from "top-to-bottom", including both sub-10W TDPs and up to 100W. The Zen architecture will be followed by Zen+ at some point.
On the GPU front, AMD's 2016 GPUs will use FinFETs. AMD plans to be the first vendor to use High Bandwidth Memory (HBM), a 3D/stacked memory standard that enables much higher bandwidth (hence the name) and saves power. NVIDIA also plans to use HBM in its Pascal GPUs slated for 2016. The HBM will be positioned around the processor, as the GPU's thermal output would make cooling the RAM difficult if it were on top. HBM is competing against the similar Hybrid Memory Cube (HMC) standard.
Computex 2015 has seen the enthusiastic promotion of Internet of Things devices, including a "smart diaper" seen on the show floor and a "smart vase" monitoring air quality featured in Intel's keynote.
The 6th generation of Intel Core processors, the 14nm "Tock" Skylake, was shown off in a 10mm thick all-in-one design with 4K resolution, but no new details about the CPUs were given. Sales of another form factor, the 2-in-1, were said to have increased 75% year-on-year, and they are expected to be more affordable this year. Intel also plans to increase the performance of its Atom-based Compute Sticks and release a more powerful Core M version this year.
Intel's Broadwell Xeon server chips will feature Iris Pro graphics. For example, the Xeon E3-1200 v4 includes Iris Pro P6300, resulting in a chip suitable for video transcoding. More details are at The Platform. Two socketed 65W Broadwell desktop processors with Iris Pro 6200 graphics have been announced. Both chips have 128 MB of on-die eDRAM acting as L4 cache. Other Broadwell desktop and laptop chips have been announced, and should be available within the next two months (followed by the first Skylake mobile chips in September).
Intel wants to bring wireless power and connectivity to Skylake laptops and tablets. Some Skylake devices will use WiGig (802.11ad) for data transfer, WiDi for display transfer, and Rezence magnetic resonance wireless charging. The extent to which PC vendors will commit to these cable-cutting wireless standards across Skylake devices remains to be seen. Intel also formally announced the merger of the Power Matters Alliance (PMA) and the Alliance for Wireless Power (A4WP).
Intel has deprecated the current Mini DisplayPort connector for Thunderbolt and adopted USB Type-C as the Thunderbolt 3 connector. Intel intended for the Thunderbolt interface to be carried over USB ports in the first place back in 2011, but was blocked by the USB consortium at the time. Now that USB Type-C supports "Alternate Mode" functionality, the time has come for Intel to ditch MiniDP, the connector for 100 million Thunderbolt devices (many, but peanuts compared to USB). Intel has also doubled the maximum bandwidth of Thunderbolt 3 to 40 Gbps, four times that of USB 3.1. Power consumption is halved, and the connector can drive two external 4K displays simultaneously, or a single 5K display, at 60 Hz.
AMD has announced a launch date for graphics cards employing high-bandwidth memory (HBM): June 16th at E3. The NVIDIA GeForce GTX 980 Ti GPU was unveiled and reviewed the day before Computex. AMD's Carrizo APUs for laptops have been launched, at least on paper. AMD is explicitly targeting the $400-700 laptop segment with 15 W Carrizo chips. AMD has demoed FreeSync-over-HDMI, although hardware support remains scarce.
Broadcom and Qualcomm have unveiled 802.11ac MU-MIMO "Wave 2" products with 4x4 antenna configurations. Eight-antenna access points are capable of reaching an aggregate capacity of 6.77 Gbps. Broadcom also announced a 1 Watt gigabit Ethernet chip supporting the Energy Efficient Ethernet standard 802.3az, targeting European Code of Conduct energy efficiency requirements.
AMD has launched its 300 series GPUs. The new GPUs are considered "refreshes" of the "Hawaii" architecture, although there are some improvements. For example, the Radeon R7 360 has 2 GB of VRAM instead of the 1 GB of the Radeon R7 260, as well as a slightly higher memory clock. Radeon R9 390X and Radeon R9 390 boost clock speeds and double VRAM to 8 GB compared to the 4 GB of the 290X and 290, but will launch at a higher price than the older GPUs currently sell for. Is the VRAM boost worth it?
While one could write a small tome on the matter of memory capacity, especially in light of the fact that the Fury series only has 4GB of memory, ultimately the fact that the 390 series has 8GB now is due to a couple of factors. The first of which is the fact that 4GB Hawaii cards require 2Gb GDDR5 chips (2x16), a capacity that is slowly going away in favor of the 4Gb chips used on the Playstation 4 and many of the 2015 video cards. The other reason is that it allows AMD to exploit NVIDIA's traditional stinginess with VRAM; just as with the 290 series versus the GTX 780/770, this means AMD once again has a memory capacity advantage, which helps to shore up the value of their cards versus what NVIDIA offers at the same price.
Meanwhile with the above in mind, based on comments from AMD product managers, it sounds like the use of 4Gb chips also plays a part in the [20%] memory clockspeed increases we're seeing on the 390 series. Later generation chips don't just get bigger, but they get faster and operate at lower voltages as well, and from what we've seen it looks like AMD is taking advantage of all of these factors.
More interesting will be the Radeon R9 Fury X and Radeon R9 Fury, which use the new "Fiji" architecture. These will be AMD's first GPUs to ship with 4 GB of High Bandwidth Memory (HBM). Fury X is a water-cooled card that will launch June 24th for $649. Fury is an air-cooled version with fewer stream processors and texture units than the Fury X (built from partially disabled, lower-yielding chips). It will launch on July 14th at $549. AMD claims that the new Fiji GPUs have 1.5 times the performance per watt of the R9 290X, partially due to the decrease in power needed by stacks of HBM vs. GDDR5 memory.
Later this summer, AMD will launch a 6" Fiji card with HBM called "Nano". AMD will launch a "Dual" card sometime in the fall, presumably the equivalent of two Fury X GPUs.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and NVIDIA will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year.
GPUs could include 8 GB of HBM2 using half as many memory packages as AMD's Fury X, or just one-quarter as many if 8 GB HBM2 packages are used next year. Correction: individual HBM2 packages may be slightly larger physically than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm²) HBM2 package, compared to 5.48 mm × 7.29 mm (39.94 mm²) HBM1 packages.
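Those package dimensions put the correction in perspective: an HBM2 package has roughly 2.3 times the footprint of an HBM1 package, but holds four times the memory, so capacity per square millimetre still improves. A quick check:

```python
# Package footprints from the article (width x depth, mm)
hbm1_area = 5.48 * 7.29    # ~39.95 mm^2 per 1 GB HBM1 package
hbm2_area = 7.75 * 11.87   # ~91.99 mm^2 per 4 GB HBM2 package

print(hbm2_area / hbm1_area)              # ~2.30x the footprint per package
print((4 / hbm2_area) / (1 / hbm1_area))  # ~1.74x the capacity per mm^2
```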
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of an 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, which is double that of an HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
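Samsung's comparisons reduce to the same width-times-rate arithmetic; a sketch assuming the spec's 1024-bit bus per stack and a standard 32-bit GDDR5 chip interface:

```python
def peak_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    # Peak bandwidth (GB/s) = bus width (bits) x per-pin rate (Gb/s) / 8
    return bus_width_bits * pin_rate_gbps / 8

hbm2 = peak_bandwidth_gbs(1024, 2.0)  # 256.0 GB/s per HBM2 package
hbm1 = peak_bandwidth_gbs(1024, 1.0)  # 128.0 GB/s per HBM1 package
gddr5 = peak_bandwidth_gbs(32, 9.0)   # 36.0 GB/s per 9 Gbps GDDR5 chip

print(hbm2 / hbm1)   # 2.0: double HBM1
print(hbm2 / gddr5)  # ~7.1: the "more than seven-fold" claim
```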
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
SK Hynix will begin mass production of 4 GB HBM2 memory stacks soon:
SK Hynix has quietly added its HBM Gen 2 memory stacks to its public product catalog earlier this month, which means that the start of mass production should be imminent. The company will first offer two types of new memory modules with the same capacity, but different transfer-rates, targeting graphics cards, HPC accelerators and other applications. Over time, the HBM2 family will get broader.
SK Hynix intends to initially offer its clients 4 GB HBM2 4Hi stack KGSDs (known good stack dies) based on 8 Gb DRAM devices. The memory devices will feature a 1024-bit bus as well as 1.6 GT/s (H5VR32ESM4H-12C) and 2.0 GT/s (H5VR32ESM4H-20C) data-rates, thus offering 204 GB/s and 256 GB/s peak bandwidth per stack.
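The quoted per-stack bandwidths follow from the 1024-bit bus and each part's data-rate (the 204 GB/s figure is 204.8 GB/s rounded down); for example:

```python
BUS_WIDTH_BITS = 1024  # per HBM2 stack

# Data rates (GT/s) for the two SK Hynix parts named above
parts = {"H5VR32ESM4H-12C": 1.6, "H5VR32ESM4H-20C": 2.0}

for part, rate in parts.items():
    print(part, BUS_WIDTH_BITS * rate / 8, "GB/s")
# H5VR32ESM4H-12C 204.8 GB/s
# H5VR32ESM4H-20C 256.0 GB/s
```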
Samsung has already manufactured 4 GB stacks. Eventually, there will also be 2 GB and 8 GB stacks available.