from the but-still-no-gpus dept.
Today, [AMD] announced its most ambitious goal yet: to increase the energy efficiency of its Epyc CPUs and Instinct AI accelerators 30-fold by 2025. This would help data centers and supercomputers achieve high performance with significant power savings over current solutions.
If it achieves this goal, the savings would add up to billions of kilowatt-hours of electricity in 2025 alone, meaning the energy required to perform a single calculation in high-performance computing tasks will have decreased by 97 percent.
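As a sanity check on the 97 percent figure (my arithmetic, not AMD's): a 30x gain in performance per watt implies each operation consumes one-thirtieth of the baseline energy.

```python
# A 30x improvement in performance per watt means each operation
# consumes 1/30th of the baseline energy.
baseline = 1.0            # normalized energy per operation at the baseline
improved = baseline / 30  # energy per operation after a 30x efficiency gain

reduction = 1 - improved / baseline
print(f"energy per operation down {reduction:.1%}")  # ~96.7%, i.e. about 97%
```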
Increasing energy efficiency this much will involve a lot of engineering wizardry, including AMD's stacked 3D V-Cache chiplet technology. The company acknowledges the difficult task ahead of it, now that "energy-efficiency gains from process node advances are smaller and less frequent."
To make the goal relevant to worldwide energy use, AMD goes beyond compute-node performance-per-watt measurements and factors in segment-specific data center power usage effectiveness (PUE) with equipment utilization taken into account. The energy-consumption baseline extrapolates the industry's 2015-2020 energy-per-operation improvement rates out to 2025. The measured 2020-2025 energy-per-operation improvement in each segment is then weighted by projected worldwide volumes multiplied by the Typical Energy Consumption (TEC) of each computing segment, arriving at a meaningful metric of actual worldwide energy-usage improvement.
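As a rough illustration of that weighting scheme (the segment names and all numbers below are made-up placeholders, not AMD's data), the worldwide metric is an energy-weighted average of per-segment improvements:

```python
# Illustrative sketch of the weighted metric described above: each
# segment's energy-per-operation improvement is weighted by its
# projected worldwide volume times Typical Energy Consumption (TEC).
segments = {
    # name: (volume_units, TEC_kWh, energy_per_op_improvement)
    # all values are invented for illustration only
    "servers":   (1_000_000, 1500.0, 2.5),
    "hpc_nodes": (  100_000, 4000.0, 3.0),
}

def weighted_improvement(segments):
    total_energy = sum(v * tec for v, tec, _ in segments.values())
    return sum(v * tec * imp for v, tec, imp in segments.values()) / total_energy

print(f"worldwide-weighted improvement: {weighted_improvement(segments):.2f}x")
```

Segments that consume more total energy worldwide pull the metric harder, which is the point of weighting by volume times TEC rather than averaging segments equally.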
See the 25x20 Initiative from a few years ago.
AMD has announced its "Milan-X" Epyc CPUs, which reuse the same Zen 3 chiplets found in "Milan" Epyc CPUs with up to 64 cores, but with triple the L3 cache using stacked "3D V-Cache" technology designed in partnership with TSMC. This means that some Epyc CPUs will go from having 256 MiB of L3 cache to a whopping 768 MiB (804 MiB total when including L1 and L2 cache). 2-socket servers using Milan-X can have over 1.5 gigabytes of L3 cache. According to AMD, the huge amount of additional cache results in average performance gains of around 50% in "targeted workloads". Microsoft found an 80% improvement in some workloads (e.g. computational fluid dynamics) due to the increase in effective memory bandwidth.
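The cache totals check out with back-of-the-envelope arithmetic, assuming Zen 3's standard per-core cache sizes (512 KiB of L2 and 64 KiB of L1 per core):

```python
# Rough check of the Milan-X cache totals, assuming standard Zen 3
# per-core caches: 512 KiB L2 and 64 KiB L1 (32 KiB instruction +
# 32 KiB data) per core.
cores = 64
l3_mib = 768                  # 3x the 256 MiB of vanilla Milan via 3D V-Cache
l2_mib = cores * 512 // 1024  # 32 MiB
l1_mib = cores * 64 // 1024   # 4 MiB

total_mib = l3_mib + l2_mib + l1_mib
print(total_mib)              # 804 MiB, matching the figure above
print(2 * l3_mib)             # 1536 MiB of L3 in a 2-socket server
```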
AMD's next generation of Instinct high-performance computing GPUs will use a multi-chip module (MCM) design, essentially chiplets for GPUs. The Instinct MI250X includes two "CDNA 2" dies for a total of 220 compute units, compared to 120 compute units for the previous monolithic MI100 GPU. Depending on the operation, performance is roughly doubled (FP32 Vector/Matrix, FP16 Matrix, INT8 Matrix), quadrupled (FP64 Vector), or octupled (FP64 Matrix). VRAM has been quadrupled to 128 GB of High Bandwidth Memory. Power consumption of the world's first MCM GPU will be high, as it has a 560 W TDP.
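One way to read those numbers (my arithmetic, using only the figures above): the FP64 vector speedup outpaces the growth in compute units, so per-compute-unit FP64 throughput must also have improved in CDNA 2.

```python
# The MI250X's FP64 vector throughput is ~4x the MI100's, but it has
# only 220/120 ~ 1.83x the compute units, so the remaining gain must
# come from each CDNA 2 compute unit doing more FP64 work.
mi100_cus, mi250x_cus = 120, 220

cu_ratio = mi250x_cus / mi100_cus       # ~1.83x
fp64_vector_gain = 4.0                  # stated above
per_cu_gain = fp64_vector_gain / cu_ratio
print(f"~{per_cu_gain:.2f}x FP64 vector throughput per compute unit")
```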
The Frontier exascale supercomputer will use both Epyc CPUs and Instinct MI200 GPUs.
AMD officially confirmed that upcoming Zen 4 "Genoa" Epyc CPUs made on a TSMC "5nm" node will have up to 96 cores. AMD also announced "Bergamo", a 128-core "Zen 4c" Epyc variant, with the 'c' indicating "cloud-optimized". This is a denser, more power-efficient version of Zen 4 with a smaller cache. According to a recent leak, Zen 4c chiplets will have 16 cores instead of 8, will retain simultaneous multithreading, and will be used in future Zen 5 Ryzen desktop CPUs as AMD's answer to Intel's Alder Lake heterogeneous ("big.LITTLE") x86 microarchitecture.
Also at Tom's Hardware (Milan-X).
Previously: AMD Reveals 'Instinct' for Machine Intelligence
AMD Launches "Milan" Epyc Server CPUs, with Zen 3 and up to 64 Cores
AMD at Computex 2021: 5000G APUs, 6000M Mobile GPUs, FidelityFX Super Resolution, and 3D Chiplets
AMD Unveils New Ryzen V-Cache Details at HotChips 33
AMD Aims to Increase Energy Efficiency of Epyc CPUs and Instinct AI Accelerators 30x by 2025
AMD claims to have improved performance by about 5x while cutting power use to about 1/6th, when comparing 2014 "Kaveri" mobile APUs to 2020 "Renoir" mobile APUs. This exceeds a goal of improving efficiency by 25x by 2020:
The base value for AMD's goal is its Kaveri mobile processors, which by today's standards set a very low bar. As AMD moved to Carrizo, it implemented new power monitoring features on chip that allowed the system to offer a better distribution of power and run closer to the true voltage needed, not wasting power. After Carrizo came Bristol Ridge, still based on the older cores, but with a new DDR4 controller as well as lower-powered processors that were better optimized for efficiency.
A big leap came with Raven Ridge, with AMD combining its new highly efficient Zen x86 cores and Vega integrated graphics. This heralded a vast improvement in performance due to doubling the cores and improving the graphics, all within a similar power window as Bristol Ridge. This boosted the important 25x20 metric, keeping it well above the 'linear' gain.
[...] The jump from Picasso to Renoir has been well documented. Our first use of the CPUs, reviewed in the ASUS Zephyrus G14, left us with our mouths open, almost literally. We called it a 'Mobile Revival', showcasing AMD's transition from Zen+ to Zen 2, from GF 12nm to TSMC 7nm, along with a lot of strong design and optimization on the graphics side. The changes from the 2019 to the 2020 chip include doubling the core count, from four to eight, and improving the clock-for-clock performance by 15-20%, but also improving the graphics performance and frequencies despite moving from a silicon design with 11 compute units down to one with 8. This comes in line with a number of power updates, adhering to ACPI specifications, and as we discussed with Sam Naffziger, AMD Fellow, supporting the new S0ix low power states has helped tremendously. The reduction in the fabric power, along with additional memory bandwidth, offered large gains.
AMD accomplished this while using refined "7nm" Vega GPU cores in its APUs, instead of moving to a newer architecture such as RDNA2.
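The 25x20 arithmetic works out simply: roughly 5x the performance at roughly one-sixth the power multiplies out to about a 30x gain in performance per watt, clearing the 25x target.

```python
# Combining the two factors from AMD's claim: ~5x performance at ~1/6th
# the power gives the overall performance-per-watt improvement.
perf_gain = 5.0
power_ratio = 1 / 6  # Renoir power draw relative to Kaveri

efficiency_gain = perf_gain / power_ratio
print(f"~{efficiency_gain:.0f}x, vs. the 25x goal")  # ~30x
```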