
SoylentNews is people

posted by janrinok on Tuesday November 09 2021, @05:48AM

AMD has announced its "Milan-X" Epyc CPUs, which reuse the same Zen 3 chiplets found in "Milan" Epyc CPUs with up to 64 cores, but with triple the L3 cache using stacked "3D V-Cache" technology designed in partnership with TSMC. This means that some Epyc CPUs will go from having 256 MiB of L3 cache to a whopping 768 MiB (804 MiB of cache when including L1 and L2 cache). 2-socket servers using Milan-X can have over 1.5 gigabytes of L3 cache. The huge amount of additional cache results in average performance gains in "targeted workloads" of around 50% according to AMD. Microsoft found an 80% improvement in some workloads (e.g. computational fluid dynamics) due to the increase in effective memory bandwidth.
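As a rough check of the cache figures above, the sketch below tallies on-die cache for a 64-core Milan-X part. It assumes Zen 3 details that the article does not state: eight 8-core chiplets, each with 32 MiB of base L3 plus 64 MiB of stacked V-Cache, and per-core caches of 32 KiB L1 instruction + 32 KiB L1 data and 512 KiB L2.

```python
# Rough cache tally for a 64-core Milan-X CPU.
# Assumed Zen 3 figures (not from the article): 8 chiplets x 8 cores,
# 32 MiB base L3 + 64 MiB stacked V-Cache per chiplet,
# 32 KiB L1I + 32 KiB L1D and 512 KiB L2 per core.
KIB_PER_MIB = 1024

CHIPLETS = 8
CORES_PER_CHIPLET = 8
BASE_L3_MIB = 32
STACKED_L3_MIB = 64
L1_KIB_PER_CORE = 32 + 32
L2_KIB_PER_CORE = 512


def cache_totals_mib(sockets: int = 1) -> dict:
    """Return L3 without/with V-Cache, and all cache levels, in MiB."""
    cores = CHIPLETS * CORES_PER_CHIPLET
    l3_base = CHIPLETS * BASE_L3_MIB
    l3_vcache = CHIPLETS * (BASE_L3_MIB + STACKED_L3_MIB)
    l1 = cores * L1_KIB_PER_CORE / KIB_PER_MIB
    l2 = cores * L2_KIB_PER_CORE / KIB_PER_MIB
    return {
        "l3_base": l3_base * sockets,
        "l3_vcache": l3_vcache * sockets,
        "all_levels": (l3_vcache + l1 + l2) * sockets,
    }


print(cache_totals_mib())           # {'l3_base': 256, 'l3_vcache': 768, 'all_levels': 804.0}
print(cache_totals_mib(sockets=2))  # 2-socket server
```

Under these assumptions a two-socket server holds 1536 MiB of L3, which is exactly 1.5 GiB (about 1.61 decimal GB).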

AMD's next generation of Instinct high-performance computing GPUs will use a multi-chip module (MCM) design, essentially chiplets for GPUs. The Instinct MI250X includes two "CDNA 2" dies for a total of 220 compute units, compared to 120 compute units for the previous MI100 monolithic GPU. Performance is roughly doubled (FP32 Vector/Matrix, FP16 Matrix, INT8 Matrix), quadrupled (FP64 Vector), or octupled (FP64 Matrix). VRAM has been quadrupled to 128 GB of High Bandwidth Memory. Power consumption of the world's first MCM GPU will be high, as it has a 560 Watt TDP.
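The compute-unit and memory figures can be cross-checked with a few lines of arithmetic. This sketch assumes the 220 compute units are split evenly across the two CDNA 2 dies, and infers the MI100's memory from the stated 4x increase rather than from a spec sheet.

```python
# MI250X vs MI100 figures from the summary above.
MI250X_CUS = 220
MI250X_DIES = 2
MI100_CUS = 120
MI250X_HBM_GB = 128
HBM_SCALE = 4  # "VRAM has been quadrupled"

cus_per_die = MI250X_CUS // MI250X_DIES      # assumes an even split across dies
cu_ratio = MI250X_CUS / MI100_CUS            # relative compute-unit count
mi100_hbm_gb = MI250X_HBM_GB // HBM_SCALE    # implied MI100 memory size

print(cus_per_die, round(cu_ratio, 2), mi100_hbm_gb)  # 110 1.83 32
```

A 1.83x increase in compute units falls a little short of "roughly doubled" throughput, which is consistent with higher clock speeds making up the difference.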

The Frontier exascale supercomputer will use both Epyc CPUs and Instinct MI200 GPUs.

AMD officially confirmed that upcoming Zen 4 "Genoa" Epyc CPUs made on a TSMC "5nm" node will have up to 96 cores. AMD also announced "Bergamo", a 128-core "Zen 4c" Epyc variant, with the 'c' indicating "cloud-optimized". This is a denser, more power-efficient version of Zen 4 with a smaller cache. According to a recent leak, Zen 4c chiplets will have 16 cores instead of 8, will retain hyperthreading, and will be used in future Zen 5 Ryzen desktop CPUs as AMD's answer to Intel's Alder Lake heterogeneous ("big.LITTLE") x86 microarchitecture.

Also at Tom's Hardware (Milan-X).

Previously: AMD Reveals 'Instinct' for Machine Intelligence
AMD Launches "Milan" Epyc Server CPUs, with Zen 3 and up to 64 Cores
AMD at Computex 2021: 5000G APUs, 6000M Mobile GPUs, FidelityFX Super Resolution, and 3D Chiplets
AMD Unveils New Ryzen V-Cache Details at HotChips 33
AMD Aims to Increase Energy Efficiency of Epyc CPUs and Instinct AI Accelerators 30x by 2025

Original Submission

Related Stories

AMD Reveals ‘Instinct’ for Machine Intelligence

Arthur T Knackerbracket has found the following story:

At the AMD Tech Summit in Sonoma, Calif., last week (Dec. 7-9), CEO Lisa Su unveiled the company's vision to accelerate machine intelligence over the next five to ten years with an open and heterogeneous computing approach and a new suite of hardware and open-source software offerings.

The roots of this strategy can be traced back to the company's acquisition of graphics chipset manufacturer ATI in 2006 and the subsequent launch of the CPU-GPU hybrid Fusion generation of computer processors. In 2012, the Fusion platform matured into the Heterogeneous System Architecture (HSA), now owned and maintained by the HSA Foundation.

Ten years since launching Fusion, AMD believes it has found the killer app for heterogeneous computing in machine intelligence, which is driven by exponential data surges.

"We generate 2.5 quintillion bytes of data every single day – whether you're talking about Tweets, YouTube videos, Facebook, Instagram, Google searches or emails," said Su. "We have incredible amounts of data out there. And the thing about this data is it's all different – text, video, audio, monitoring data. With all this different data, you really are in a heterogeneous system and that means you need all types of computing to satisfy this demand. You need CPUs, you need GPUs, you need accelerators, you need ASICs, you need fast interconnect technology. The key to it is it's a heterogeneous computing architecture.

"Why are we so excited about this? We've actually been talking about heterogeneous computing for the last ten years," Su continued. "This is the reason we wanted to bring CPUs and GPUs together under one roof and we were doing this when people didn't understand why we were doing this and we were also learning about what the market was and where the market needed these applications, but it's absolutely clear that for the machine intelligence era, we need heterogeneous compute."

Aiming to boost the performance, efficiency, and ease of implementation of deep learning workloads, AMD is introducing a brand-new hardware platform, Radeon Instinct, and new Radeon open source software solutions.

[...] "We are going to address key verticals that leverage a common infrastructure," said Raja Koduri, senior vice president and chief architect of Radeon Technologies Group. "The building block is our Radeon Instinct hardware platform, and above that we have the completely open source Radeon software platform. On top of that we're building optimized machine learning frameworks and libraries."

AMD is also investing in open interconnect technologies for heterogeneous accelerators; the company is a founding member of CCIX, Gen-Z and OpenCAPI.

[...] The AMD Tech Summit is a follow-on to the inaugural summit that debuted last December (2015). That first event was initiated by Raja Koduri as a team-building activity for the newly minted Radeon Technologies Group. The initial team of about 80, essentially hand-picked by Koduri to focus on graphics, met in Sonoma along with about 15 members of the press. The event was expanded this year to accommodate other AMD departments and nearly 100 media and analyst representatives.

-- submitted from IRC

Original Submission

AMD Launches "Milan" Epyc Server CPUs, with Zen 3 and up to 64 Cores

AMD Unveils EPYC 'Milan' 7003 CPUs, Zen 3 Comes to 64-Core Server Chips

AMD unveiled its EPYC 7003 'Milan' processors today, claiming that the chips, which bring the company's powerful Zen 3 architecture to the server market for the first time, take the lead as the world's fastest server processor with its flagship 64-core 128-thread EPYC 7763. Like the rest of the Milan lineup, this chip comes fabbed on the 7nm process and is drop-in compatible with existing servers. AMD claims it brings up to twice the performance of Intel's competing Xeon Cascade Lake Refresh chips in HPC, Cloud, and enterprise workloads, all while offering a vastly better price-to-performance ratio.

Milan's agility lies in the Zen 3 architecture and its chiplet-based design. This microarchitecture brings many of the same benefits that we've seen with AMD's Ryzen 5000 series chips that dominate the desktop PC market, like a 19% increase in IPC and a larger unified L3 cache. Those attributes, among others, help improve AMD's standing against Intel's venerable Xeon lineup in key areas, like single-threaded work, and offer a more refined performance profile across a broader spate of applications.

One interesting new SKU is the EPYC 7663, a 56-core, 112-thread CPU with 7 working cores on each of the 8-core chiplets. There is also a 28-core EPYC 7453.

Next up, Zen 4 "Genoa".

Also at AnandTech, The Next Platform, Phoronix, and Ars Technica.

See also: The Tour of Italy with EPYC Milan: Interview with AMD's Forrest Norrod
AMD video announcement (51m4s) and recap (10m43s)

Original Submission

AMD at Computex 2021: 5000G APUs, 6000M Mobile GPUs, FidelityFX Super Resolution, and 3D Chiplets

AMD's Ryzen 5000G APUs now have a release date for the DIY market: August 5th. The 8-core Ryzen 7 5700G has a suggested price of $359, while the 6-core Ryzen 5 5600G will be $259.

AMD announced the Radeon RX 6800M, 6700M, and 6600M discrete GPUs for laptops, promising better performance, efficiency, and battery-constrained performance. The Radeon RX 6800M is a 40 compute unit design (equivalent to the Radeon RX 6700 XT on desktop) with 12 GB of VRAM.

AMD's biggest announcements were the introduction of FidelityFX Super Resolution (FSR) and the demonstration of a 3D chiplet design. FSR uses a spatial scaling algorithm to upscale game graphics for higher frame rates at a given resolution. The algorithm competes with Nvidia's Deep Learning Super Sampling (DLSS), but will be released as open source and work with some older AMD GPUs, integrated graphics, as well as competing products from Nvidia and Intel (it was shown running on an Nvidia GTX 1060).
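Spatial upscaling raises frame rates because the GPU renders and shades fewer pixels and the upscaler reconstructs the output image. The sketch below compares pixel counts for a 4K output. The per-axis scale factors are an assumption taken from FSR 1.0's quality modes, not from the article, and real frame-rate gains will be smaller than the pixel ratio because not all per-frame work scales with resolution.

```python
# Render resolution and pixel savings for a spatial upscaler.
# Per-axis scale factors assumed from FSR 1.0's quality modes.
SCALE_FACTORS = {
    "Ultra Quality": 1.3,
    "Quality": 1.5,
    "Balanced": 1.7,
    "Performance": 2.0,
}


def render_resolution(out_w: int, out_h: int, scale: float) -> tuple:
    """Internal render size before upscaling, rounded to whole pixels."""
    return round(out_w / scale), round(out_h / scale)


def pixel_ratio(out_w: int, out_h: int, scale: float) -> float:
    """How many times more pixels the output has than the render target."""
    w, h = render_resolution(out_w, out_h, scale)
    return (out_w * out_h) / (w * h)


for mode, scale in SCALE_FACTORS.items():
    w, h = render_resolution(3840, 2160, scale)
    print(f"{mode}: {w}x{h}, {pixel_ratio(3840, 2160, scale):.2f}x fewer pixels")
```

In the assumed "Quality" mode, a 4K frame is rendered at 2560x1440, which is 2.25x fewer pixels to shade.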

AMD CEO Lisa Su also showed off a modified, delidded Ryzen 9 5900X CPU prototype, with "3D V-Cache technology". It was identical to the standard 5900X with the exception of through-silicon via (TSV) stacked L3 cache. This allowed the 5900X prototype to have 192 MB of total L3 cache instead of 64 MB (96 MB per 8-core chiplet). AMD claims it can run games with an average of +15% performance (simply due to the larger cache size), and some version of this will appear in products that are starting production at the end of 2021.

Related: TSMC "5nm", "3nm", Stacked Silicon, and More

Original Submission

AMD Unveils New Ryzen V-Cache Details at HotChips 33

AMD Unveils New Ryzen V-Cache Details at HotChips 33:

AMD gave us more information about its upcoming V-Cache at Hot Chips this year, the annual conference where semiconductor engineers from all over the industry come together to disclose details regarding their technical achievements in the past 12 months.

Earlier this year, AMD announced that it would not advance directly from Zen 3 to Zen 4. Instead, it would iterate on the Zen 3 core by stacking a full 64MB of 7nm L3 cache vertically on the core. AMD claims this can improve performance by up to 15 percent based on 1080p gaming results. The improvement in other applications is unknown.

Original Submission

AMD Aims to Increase Energy Efficiency of Epyc CPUs and Instinct AI Accelerators 30x by 2025

AMD wants to make its chips 30 times more energy-efficient by 2025

Today, [AMD] announced its most ambitious goal yet—to increase the energy efficiency of its Epyc CPUs and Instinct AI accelerators 30 times by 2025. This would help data centers and supercomputers achieve high performance with significant power savings over current solutions.

If it achieves this goal, the savings would add up to billions of kilowatt-hours of electricity saved in 2025 alone, meaning the power required to perform a single calculation in high-performance computing tasks will have decreased by 97 percent.
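The 97 percent figure follows from the 30x target if energy efficiency is measured as operations per unit of energy: energy per operation then falls to 1/30 of the baseline. A minimal sketch of that arithmetic:

```python
# Energy per operation after an N-fold efficiency gain.
def energy_reduction(efficiency_gain: float) -> float:
    """Fractional drop in energy per operation for a given efficiency multiplier."""
    return 1 - 1 / efficiency_gain


print(f"{energy_reduction(30):.1%}")  # 96.7%
```

The exact value, 96.7%, rounds to the quoted 97 percent.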

Increasing energy efficiency this much will involve a lot of engineering wizardry, including AMD's stacked 3D V-Cache chiplet technology. The company acknowledges the difficult task ahead of it, now that "energy-efficiency gains from process node advances are smaller and less frequent."

What does it mean?

In addition to compute node performance/Watt measurements, to make the goal particularly relevant to worldwide energy use, AMD uses segment-specific datacenter power utilization effectiveness (PUE) with equipment utilization taken into account. The energy consumption baseline uses the same industry energy per operation improvement rates as from 2015-2020, extrapolated to 2025. The measure of energy per operation improvement in each segment from 2020-2025 is weighted by the projected worldwide volumes multiplied by the Typical Energy Consumption (TEC) of each computing segment to arrive at a meaningful metric of actual energy usage improvement worldwide.
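One plausible reading of the weighting described above is a weighted average of per-segment improvements, where each segment's weight is its projected volume times its Typical Energy Consumption. The sketch below implements that reading only. The segment names and numbers are hypothetical placeholders, and AMD's PUE and utilization adjustments are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    volume: float       # projected worldwide units (hypothetical)
    tec_kwh: float      # Typical Energy Consumption per unit (hypothetical)
    improvement: float  # energy-per-operation improvement, 2020-2025 (x)


def weighted_improvement(segments: list[Segment]) -> float:
    """Average the per-segment improvements, weighted by volume x TEC."""
    weights = [s.volume * s.tec_kwh for s in segments]
    total = sum(weights)
    return sum(w * s.improvement for w, s in zip(weights, segments)) / total


# Hypothetical segments, for illustration only.
segments = [
    Segment("segment A", volume=1_000, tec_kwh=1.0, improvement=10.0),
    Segment("segment B", volume=1_000, tec_kwh=3.0, improvement=30.0),
]
print(weighted_improvement(segments))  # 25.0
```

Weighting by volume times energy consumption means segments that draw the most total energy dominate the headline number, which matches the stated aim of measuring actual worldwide energy use.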

See the 25x20 Initiative from a few years ago.

See also: NVIDIA CEO Jensen Huang to unveil new AI technologies and products at GTC Keynote in November

Original Submission

Frontier Supercomputer Breaks the ExaFLOPS Barrier, Tops the TOP500 List

The Frontier supercomputer at the Oak Ridge National Laboratory (ORNL) has exceeded 1.1 exaFLOPS (Rmax), leading the June 2022 TOP500 list as the world's fastest supercomputer and the first truly "exascale" system.

Frontier uses 9,408 64-core Epyc 7A53 CPUs and 37,632 AMD Instinct MI250X GPUs. It has 4.6 petabytes each of DDR4 and High Bandwidth Memory.
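The node counts are consistent with the MI250X memory size quoted earlier. This sketch assumes each MI250X carries 128 GiB of HBM (the "128 GB" figure read as binary gigabytes), which reproduces the 4.6 PB HBM total. The DDR4 capacity per node is not given here, so it is not computed.

```python
# Frontier composition from the figures above.
CPUS = 9_408
CORES_PER_CPU = 64
GPUS = 37_632
HBM_GIB_PER_GPU = 128  # MI250X, assumed binary GiB
GIB_PER_PIB = 1024 ** 2

gpus_per_cpu = GPUS / CPUS
cpu_cores = CPUS * CORES_PER_CPU
hbm_pib = GPUS * HBM_GIB_PER_GPU / GIB_PER_PIB

print(gpus_per_cpu, cpu_cores, hbm_pib)  # 4.0 602112 4.59375
```

Exactly four GPUs per CPU points to one Epyc driving four MI250X modules per node, and 4.59 PiB rounds to the quoted 4.6 PB.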

Frontier also reached #2 on the June 2022 Green500 list at 52.227 gigaFLOPS/Watt, behind the smaller Frontier Test & Development System:

Previously, Frontier had been characterized as a two peak exaflops system, but its first Top500 benchmark measures some 1.686 peak exaflops. (Oak Ridge said that there remains "much higher headroom on the GPUs and the CPUs" to achieve the two peak exaflops target.) Outside of Linpack and the Top500, the system benchmarks at 6.88 exaflops of mixed-precision performance on HPL-AI. The team ran out of time and was not able to submit an HPCG benchmark.
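Comparing the measured Linpack result with the quoted peak shows what fraction of theoretical peak Frontier reached. The mixed-precision HPL-AI result can be compared the same way.

```python
# Linpack efficiency (Rmax / Rpeak) and HPL-AI ratio, in exaflops.
RMAX = 1.102
RPEAK = 1.686
HPL_AI = 6.88

print(f"Rmax/Rpeak: {RMAX / RPEAK:.1%}")          # 65.4%
print(f"HPL-AI vs Linpack: {HPL_AI / RMAX:.2f}x")  # 6.24x
```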

[...] Frontier also achieved another win out of the gate: second place on the spring 2022 Green500 list, which ranks supercomputers by their flops per watt. The Oak Ridge team accomplished this by delivering those 1.102 Linpack exaflops in a 21.1-megawatt power envelope, an efficiency of 52.23 gigaflops per watt (which works out to one exaflops at 19.15 megawatts). This puts the system well within the 20-megawatt exascale power envelope target set by DARPA in 2008—a target that had been viewed with much skepticism over the ensuing 14 years. Frontier was only outpaced in efficiency by its own test and development system (Frontier TDS, aka "Crusher"), which delivered 62.68 gigaflops per watt.
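The efficiency figures in this paragraph can be reproduced from the Linpack result and the measured power draw, then checked against the 20-megawatt target:

```python
# Frontier Linpack efficiency from the quoted figures.
RMAX_FLOPS = 1.102e18  # 1.102 exaflops
POWER_W = 21.1e6       # 21.1 megawatts
TARGET_W = 20e6        # 20 MW exascale power target

gflops_per_watt = RMAX_FLOPS / POWER_W / 1e9
mw_per_exaflop = 1e18 / (gflops_per_watt * 1e9) / 1e6

print(f"{gflops_per_watt:.2f} GFLOPS/W")  # 52.23 GFLOPS/W
print(f"{mw_per_exaflop:.2f} MW per exaflop, within target: {mw_per_exaflop * 1e6 < TARGET_W}")
```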

Rmax at selected ranks, Nov. 2021 → June 2022:
#10: 30.05 petaflops → 46.10 petaflops
#100: 4.79 petaflops → 5.39 petaflops
#500: 1.65 petaflops → 1.65 petaflops (both are Lenovo C1040, Xeon E5-2673v4 20C 2.3GHz systems)
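The rank-by-rank change between the two TOP500 lists works out as follows:

```python
# Growth in Rmax at selected TOP500 ranks, Nov. 2021 -> June 2022 (petaflops).
RANKS = {
    10: (30.05, 46.10),
    100: (4.79, 5.39),
    500: (1.65, 1.65),
}


def growth(before: float, after: float) -> float:
    """Fractional change from one list to the next."""
    return after / before - 1


for rank, (before, after) in RANKS.items():
    print(f"#{rank}: {growth(before, after):+.1%}")
```

Rank #10 grew by about 53%, rank #100 by about 12.5%, and the entry bar at #500 did not move at all.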

Previously: New TOP500 List Released -- Fugaku Holds Top Spot, Exascale Remains Elusive; Green500 Released Too!
Top500: No Exascale, Fugaku Still Reigns, Polaris Debuts at #12

Original Submission

This discussion has been archived. No new comments can be posted.
  • (Score: -1, Redundant) by Anonymous Coward on Tuesday November 09 2021, @06:14AM (1 child)

    by Anonymous Coward on Tuesday November 09 2021, @06:14AM (#1194883)

    I hunger.

    (first post btw)

    • (Score: -1, Offtopic) by Anonymous Coward on Tuesday November 09 2021, @06:18AM

      by Anonymous Coward on Tuesday November 09 2021, @06:18AM (#1194885)

      Sinistar is a multidirectional shooter arcade game developed and manufactured by Williams Electronics.

      Sinistar was the first game to use stereo sound (in the sit-down version), with two independent front and back sound boards for this purpose. It also used a 49-way optical joystick that Williams produced specifically for this game.

  • (Score: 1, Funny) by Anonymous Coward on Tuesday November 09 2021, @06:25AM (3 children)

    by Anonymous Coward on Tuesday November 09 2021, @06:25AM (#1194887)

    When there can be computer articles/tech news for at least a week or more that don't mention Microsoft and/or Windows in it.

    Seriously, fuck Microsoft.

    • (Score: 2) by takyon on Tuesday November 09 2021, @06:30AM (1 child)

      by takyon (881) on Tuesday November 09 2021, @06:30AM (#1194888) Journal

      They wrote a good blog post about Milan-X, they have the chips on hand, and they are probably the biggest customer right now.

      [SIG] 10/28/2017: Soylent Upgrade v14
      • (Score: 0) by Anonymous Coward on Tuesday November 09 2021, @07:02PM

        by Anonymous Coward on Tuesday November 09 2021, @07:02PM (#1195013)

        Too bad someone didn't walk in their headquarters with a flame thrower and light those fuckers up. That would be a MS story worth posting!

    • (Score: 0) by Anonymous Coward on Tuesday November 09 2021, @03:37PM

      by Anonymous Coward on Tuesday November 09 2021, @03:37PM (#1194955)

      if overall news seems to focus spotlight like, then mostly something sinister is cooking in the dark corners.
      no news in e-mobility? more doom and gloom in energy sector.
      no genetics news?
      no ac-coupled bidirectional inverter news (a power-electronics computer in its own right).
      etc etc
      nature abhors a vacuum. it's not just a good idea, it's a law.

  • (Score: 2) by Opportunist on Tuesday November 09 2021, @09:03AM (1 child)

    by Opportunist (5545) on Tuesday November 09 2021, @09:03AM (#1194895)

    But how long 'til we can actually buy one without handing a scalper an arm and a leg for the privilege of owning a new CPU? It's great to hear what awesome new technology is available... only to later learn that it's not available.

    Well, we could at least be happy it's not as bad as with GPUs, maybe we should be thankful for what we got...

  • (Score: 2, Redundant) by Frosty Piss on Tuesday November 09 2021, @09:16AM

    by Frosty Piss (4971) on Tuesday November 09 2021, @09:16AM (#1194899)

    Buzzword bingo.

  • (Score: 2) by Mojibake Tengu on Tuesday November 09 2021, @09:35AM (2 children)

    by Mojibake Tengu (8598) Subscriber Badge on Tuesday November 09 2021, @09:35AM (#1194902) Journal

    Instinct is for animals, intelligent machines need sentience...

    Well, AMD doing good so far. High bandwidth memory together with huge memory size is a necessity for Prolog code.
    That's something ARM, even Apple M1, cannot compete with.

    The edge of 太玄 cannot be defined, for it is beyond every aspect of design
    • (Score: 2) by HiThere on Tuesday November 09 2021, @02:41PM (1 child)

      by HiThere (866) on Tuesday November 09 2021, @02:41PM (#1194926) Journal

      Prolog!? It's been a long time since I've heard of that one. The standard seems unchanged since around 2005, so unless somebody's written a *really good* library I don't expect to hear about it again soon (except in this thread).

      FWIW, I think even Lisp is better, and that is dying for lack of decent libraries. True, Prolog has some built-in capabilities for formal logic, but that's not really all that useful in the face of combinatorial explosions.

      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
      • (Score: 2) by Mojibake Tengu on Tuesday November 09 2021, @08:13PM

        by Mojibake Tengu (8598) Subscriber Badge on Tuesday November 09 2021, @08:13PM (#1195035) Journal

        SWI-Prolog has been in every self-respecting FOSS distro for ages. It's free, open source, actively developed for decades, and competitive with any commercial Prolog implementation. Its library is huge.

        Concerning "combinatorial explosions"... no language is miraculous enough for funny people who can't think while writing code.

        The edge of 太玄 cannot be defined, for it is beyond every aspect of design
  • (Score: 2, Interesting) by Anonymous Coward on Tuesday November 09 2021, @09:35AM (3 children)

    by Anonymous Coward on Tuesday November 09 2021, @09:35AM (#1194903)

    Power consumption of the world's first MCM GPU will be high, as it has a 560 Watt TDP.

    Ehm... what? That single GPU uses more power than my last three PC builds together. How much bitcoin/sec does this thing have to mine to make the effort a net positive?

    • (Score: 0) by Anonymous Coward on Tuesday November 09 2021, @02:05PM (1 child)

      by Anonymous Coward on Tuesday November 09 2021, @02:05PM (#1194916)

      More importantly, can it do machine learning? Has anyone used an AMD graphics card for machine learning?

    • (Score: 4, Interesting) by takyon on Tuesday November 09 2021, @02:24PM

      by takyon (881) on Tuesday November 09 2021, @02:24PM (#1194921) Journal

      It's not for you. In fact, the initial version is not a PCIe card and will be headed straight for the likes of supercomputers:

      For the MI250(X), OAM is all but necessary to make full use of the platform. From a power and cooling standpoint, OAM is designed to scale much higher than dual-slot PCIe cards, with the spec maxing out at 700W for a single card. Meanwhile from an I/O standpoint, OAM has enough high-speed pins to enable eight 16-bit links, which is twice as many links as what AMD could do with a PCIe card. For similar reasons, it’s also a major component in enabling GPU/CPU coherency, as AMD needs the high-speed links to run IF from the GPUs to the CPUs.

      That's not to say that the PCIe version won't have similarly high power consumption. The new 600 Watt PCIe Gen5 power connector (if it uses it) means they could draw up to 675 Watts.

      Basically, MCM is multiple GPUs pasted together. Suddenly doubling the compute/execution unit count while increasing the clock speeds by over 10% results in high power consumption. The TSMC N6 node it's made on doesn't help like N5 would, since N6 only increases density a little from N7.

      [SIG] 10/28/2017: Soylent Upgrade v14