from the making-a-comeback dept.
Today was Advanced Micro Devices' (AMD) 2015 Financial Analyst Day. The last one was held in 2012. Since then, the company has changed leadership, put its APUs in the major consoles, and largely abandoned the high-end chip market to Intel. Now AMD says it is focusing on gaming, virtual reality, and datacenters. AMD has revealed details of upcoming CPUs and GPUs at the event:
Perhaps the biggest announcement relates to AMD's x86 Zen CPUs, coming in 2016. AMD is targeting a 40% increase in instructions-per-clock (IPC) with Zen cores. By contrast, Intel's Haswell (a "Tock") increased IPC by about 10-11%, and Broadwell (a "Tick") increased IPC by about 5-6%. AMD is also abandoning the maligned Bulldozer modules with Clustered Multithreading in favor of a Simultaneous Multithreading design, similar to Intel's Hyperthreading. Zen is a high priority for AMD to the extent that it is pushing back its ARM K12 chips to 2017. AMD is also shifting focus away from Project Skybridge, an "ambidextrous framework" that combined x86 and ARM cores in SoCs. Zen cores will target a wide range of designs from "top-to-bottom", including both sub-10W TDPs and up to 100W. The Zen architecture will be followed by Zen+ at some point.
On the GPU front, AMD's 2016 GPUs will use FinFETs. AMD plans to be the first vendor to use High Bandwidth Memory (HBM), a 3D/stacked memory standard that enables much higher bandwidth (hence the name) and saves power. NVIDIA also plans to use HBM in its Pascal GPUs slated for 2016. The HBM will be positioned around the processor, as the GPU's thermal output would make cooling the RAM difficult if it were on top. HBM is competing against the similar Hybrid Memory Cube (HMC) standard.
Although High Bandwidth Memory is on track for 2016, it will actually be featured in an AMD desktop GPU to be released this quarter. AnandTech expects HBM to become a standard feature in AMD APUs, which benefit from higher memory bandwidth:
Coupled with the fact that any new GPU from AMD should also include AMD's latest color compression technology, and the implication is that the effective increase in memory bandwidth should be quite large. For AMD, they see this as being one of the keys of delivering better 4K performance along with better VR performance.
Finally, while talking about HBM on GPUs, AMD is also strongly hinting that they intend to bring HBM to other products as well. Given their product portfolio, we consider this to be a pretty transparent hint that the company wants to build HBM-equipped APUs. AMD's APUs have traditionally struggled to reach peak performance due to their lack of memory bandwidth – 128-bit DDR3 only goes so far – so HBM would be a natural extension to APUs."
AMD's Carrizo APUs will be released beginning this quarter, but it may be worth it to wait:
Badging aside, AMD still will have to face the fact that they're launching a 28nm notebook APU versus Intel's 14nm notebook CPUs, the company is once again banking on their strong GPU performance to help drive sales. Coupled with the combination of low power optimizations in Carrizo and full fixed-function hardware decoding of HEVC, and AMD will be relying on Carrizo to carry them through to 2016 and Zen.
AMD also announced Radeon M300 discrete GPUs for notebooks, promising "refined efficiency and power management" as well as DirectX 12 support.
One of the more interesting chips on AMD's roadmap may be a "high-performance server APU" intended for both high-performance computing and workstations.
HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. Although HBM1's memory bandwidth per watt is tripled compared to GDDR5, the memory in AMD's example uses a little less than half the power (30 W for the R9 290X down to 14.6 W) due to the increased bandwidth. HBM stacks will also use 5-10% as much area of the GPU to provide the same amount of memory that GDDR5 would. That could potentially halve the size of the GPU:
By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm2), versus 110mm × 90mm (9900mm2) for R9 290X.
HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year.
GPUs could include 8 GB of HBM2 using half of the die space used by AMD's Fury X, or just one-quarter of the die space if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be slightly physically larger than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm2) HBM2 package, compared to 5.48 mm × 7.29 mm (39.94 mm2) HBM1 packages.
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of a 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, which is double that of a HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
A CERN engineer has leaked a few details of an unreleased 32-core AMD "Zen" processor featuring support for 8 channels of DDR4 memory. The processor connects two 16-core CPUs with an on-die interconnect, and could be a replacement for older AMD Opteron chips and competitor to Intel's Xeon chips. Zen is the name of AMD's upcoming 14nm architecture:
AMD is long overdue for a major architecture update, though one is coming later this year. Featuring the codename "Zen," AMD's already provided a few details, such as that it will be built using a 14nm FinFET process technology and will have high core counts. In time, AMD will reveal all there is to know but Zen, but in the meantime, we now have a few additional details to share thanks to a computer engineer at CERN.
CERN engineer Liviu Valsan recently gave a presentation on technology and market trends for the data center. At around 2 minutes into the discussion, he brought up AMD's Zen architecture with a slide that contained some previously undisclosed details (along with a few things we already knew). One of the more interesting revelations was that upcoming x86 processors based on Zen will feature up to 32 physical cores.
Before you get too excited about the high core count, there are two things to note. The first is that AMD is employing a "bit of a trick," to use Valsan's words. To achieve a 32-core design, Valsan says AMD will use two 16-core CPUs on a single die with a next-generation interconnect, presumably one that would reduce or be void of bottlenecks.
The second thing to consider is that it's highly unlikely AMD would release a 32-core processor into the consumer market. Zen-based Opterons aren't out of the question—servers and workstations could take real advantage of the additional cores—but as far as FX processors go, it's more realistic to expect offerings to boast up to 8 cores, maybe even 16 at some point.
Previously: AMD's Upcoming "Zen" APU - 16 Cores, 32 Threads