Arm Announces Neoverse V1 & N2 Infrastructure CPUs: +50% IPC, SVE Server Cores
Amazon's Graviton2 64-core Neoverse N1 server chip is the first of what should become a wider range of designs that will be driving the Arm server ecosystem forward and actively assaulting the infrastructure CPU market share that's currently dominated by the x86 players such as Intel and AMD.
[...] Today, we're ready to take the next step towards the next generation of the Neoverse platform, not only revealing the CPU microarchitecture previously known as Zeus, but a whole new product category that goes beyond the Neoverse N-series: Introducing the new Neoverse V-series and the Neoverse V1 (Zeus), as well as a new roadmap insertion in the form of the Neoverse N2 (Perseus).
[...] In terms of generational performance uplift, it's akin to Arm throwing down the gauntlet to the competition, achieving a ground-breaking +50% IPC boost compared to the Neoverse N1 that we're seeing in silicon today. The performance uplift potential here is tremendous, as this is merely a same-process ISO-frequency upgrade, and actual products based on the V1 will in all likelihood also see additional performance gains thanks to increased frequencies through process node advancements.
If we take the conservatively clocked Graviton2 with its 2.5GHz N1 cores as a baseline, a theoretical 3GHz V1 chip would represent an 80% uplift in per-core single-threaded performance. Not only would such a performance uptick vastly exceed any current x86 competition in the server space in terms of per-core performance, it would be enough to match the best high-performance desktop chips from AMD and Intel today (though we have to remember it'll compete against next-gen Zen3 Milan and Sapphire Rapids products).
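The 80% figure is the IPC gain multiplied by the clock ratio. A minimal Python sketch of that arithmetic, assuming per-core performance scales linearly with both IPC and frequency (a simplification; the function name is made up for illustration):

```python
def relative_performance(ipc_gain, base_ghz, new_ghz):
    """Per-core performance relative to the baseline core.

    Assumes performance = IPC x frequency, with no scaling losses.
    """
    return (1.0 + ipc_gain) * (new_ghz / base_ghz)


# Figures from the article: +50% IPC, Graviton2 N1 at 2.5 GHz,
# theoretical V1 part at 3 GHz.
v1_vs_n1 = relative_performance(0.50, 2.5, 3.0)
print(f"V1 vs. N1: {v1_vs_n1:.2f}x ({(v1_vs_n1 - 1) * 100:.0f}% uplift)")
```

Real workloads rarely scale perfectly with clock speed, so this is best read as an upper bound on the quoted projection.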
[...] Alongside the Neoverse V1 platform, we've seen a roadmap insertion that previously wasn't there. The Perseus design will become the Neoverse N2, and will be the effective product-positioning successor to the N1. This new CPU IP represents a 40% IPC uplift compared to the N1, but still maintains the same design philosophy of maximising performance within the lowest power and smallest area.
Neoverse V1 is basically the server-oriented equivalent of the Cortex-X1 core, where performance is prioritized at the cost of less power efficiency and a greater die area (more cache, etc.). Neoverse N2 is more like (an unannounced successor of) Cortex-A78.
Also at TechPowerUp.
Related: Amazon Announces 64-core Graviton2 Arm CPU
Related Stories
The new Graviton2 SoC is a custom design by Amazon's own in-house silicon design teams and is a successor to the first-generation Graviton chip. The new chip quadruples the core count from 16 cores to 64 cores and employs Arm's newest Neoverse N1 cores. Amazon is using the highest performance configuration available, with 1MB L2 caches per core, with all 64 cores connected by a mesh fabric supporting 2TB/s aggregate bandwidth as well as integrating 32MB of L3 cache.
Amazon claims the new Graviton2 chip can deliver up to 7x higher performance than the first-generation-based A1 instances in total across all cores, up to 2x the performance per core, and memory access speeds up to 5x those of its predecessor. The chip comes in at a massive 30B transistors on a 7nm manufacturing node - if Amazon is using similar high density libraries to mobile chips (they have no reason to use HPC libraries), then I estimate the chip to fall around 300-350mm² if I was forced to put out a figure.
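For a sense of what that die-size estimate implies, here is a rough Python sketch that converts the quoted transistor count and the estimated 300-350mm² range into an average density. The helper name is hypothetical, and the result is only as good as the area guess above:

```python
TRANSISTORS = 30e9  # 30B transistors, as quoted above


def density_mtr_per_mm2(transistors, area_mm2):
    """Average density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


# Both ends of the article's estimated die-area range.
for area_mm2 in (300, 350):
    density = density_mtr_per_mm2(TRANSISTORS, area_mm2)
    print(f"{area_mm2} mm^2 -> {density:.1f} MTr/mm^2")
```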
The memory subsystem of the new chip is supported by 8 DDR4-3200 channels with support for hardware AES256 memory encryption. Peripherals of the system are supported by 64 PCIe4 lanes.
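Those interface counts translate into theoretical peak bandwidth as sketched below in Python. The 64-bit (8-byte) DDR4 channel width and the PCIe 4.0 figures (16 GT/s per lane, 128b/130b encoding) are standard values assumed here, not numbers given in the article, and the function names are illustrative:

```python
def ddr_peak_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (assumes 64-bit channels)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def pcie4_peak_gb_s(lanes, gt_per_s=16.0, encoding=128 / 130):
    """Theoretical peak PCIe 4.0 bandwidth in GB/s, per direction."""
    return gt_per_s * encoding / 8 * lanes


# Graviton2 as described: 8 channels of DDR4-3200 and 64 PCIe4 lanes.
print(f"DDR4-3200 x8 channels: {ddr_peak_gb_s(3200, 8):.1f} GB/s")
print(f"PCIe4 x64 lanes:       {pcie4_peak_gb_s(64):.1f} GB/s per direction")
```

These are ceiling figures; sustained bandwidth in practice is lower.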
Arm's New Cortex-A78 and Cortex-X1 Microarchitectures: An Efficiency and Performance Divergence
Today for Arm's 2020 TechDay announcements, the company is not just releasing a single new CPU microarchitecture, but two. The long-expected Cortex-A78 is indeed finally making an appearance, but Arm is also introducing its new Cortex-X1 CPU as the company's new flagship performance design. The move is not only surprising, but marks an extremely important divergence in Arm's business model and design methodology, finally addressing some of the company's years-long product line compromises.
[...] The new Cortex-A78 pretty much continues Arm's traditional design philosophy, that being that it's built with a stringent focus on a balance between performance, power, and area (PPA). PPA is the name of the game for the wider industry, and here Arm is pretty much the leading player on the scene, having been able to provide extremely competitive performance with low power consumption and small die areas. These design targets are the bread & butter of Arm as the company has an incredible range of customers who aim for very different product use-cases – some favoring performance while some others have cost as their top priority.
All in all (we'll get into the details later), the Cortex-A78 promises a 20% improvement in sustained performance under an identical power envelope. This figure is meant to be a product performance projection, combining the microarchitecture's improvements as well as the upcoming 5nm node advancements. The IP should represent a pretty straightforward successor to the already big jumps that were the A76 and A77.
[...] The Cortex-X1 was designed within the frame of a new program at Arm, which the company calls the "Cortex-X Custom Program". The program is an evolution of what the company had previously already done with the "Built on Arm Cortex Technology" program released a few years ago. As a reminder, that license allowed customers to collaborate early in the design phase of a new microarchitecture, and request customizations to the configurations, such as a larger re-order buffer (ROB), differently tuned prefetchers, or interface customizations for better integration into SoC designs. Qualcomm was the predominant beneficiary of this license, fully taking advantage of the core re-branding options.
[...] At the end of the day, what we're getting are two different microarchitectures – both designed by the same team, and both sharing the same fundamental design blocks – but with the A78 focusing on maximizing the PPA metric and having a big focus on efficiency, while the new Cortex-X1 is able to maximize performance, even if that means compromising on higher power usage or a larger die area.
While Cortex-A78 will only improve performance by around 7% from microarchitectural changes alone, Cortex-X1 will improve performance by up to 30% due to a wider design, doubling of most cache sizes, and other changes. Cortex-X1 cores are also expected to reach 3 GHz on a "5nm" node, delivering even more performance. The Cortex-X1 cores could use up to 50-100% more power than Cortex-A77/A78. Cores could be arranged in a 1+3+4 or 2+2+4 setup of Cortex-X1, Cortex-A78, and Cortex-A55 cores.
See also: Arm Announces The Mali-G78: Evolution to 24 Cores
(Score: 3, Informative) by takyon on Friday September 25 2020, @02:46PM (2 children)
https://www.nextplatform.com/2020/09/22/arm-expands-its-server-universe-with-updated-neoverse-roadmap/ [nextplatform.com]
https://www.phoronix.com/forums/forum/hardware/processors-memory/1208954-arm-begins-bringing-up-neoverse-n2-neoverse-v1-support-in-the-gnu-toolchain [phoronix.com]
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by Booga1 on Friday September 25 2020, @03:10PM (1 child)
I'm a little fuzzy on their talk of comparing IPC (instructions per clock) and frequency. They seem to be calling it "instructions per core." This feels more than a little misleading when trying to compare generational performance improvements from x86 to ARM.
Am I reading that right? Am I missing something?
(Score: 3, Interesting) by takyon on Friday September 25 2020, @03:44PM
There might be more accurate ways to say it, like "increased performance at a baseline clock speed".
Here they are estimating a 40% IPC gain for the N2 and 50% for the V1. But clock speeds can also increase, adding more performance. If the V1 also raises clocks by 20% (2.5 GHz to 3 GHz), the combined gain is 1.5 * 1.2 = 1.8, i.e. the 80% uplift quoted in the article.
The relationship between IPC, frequency, and performance can be complicated. At some point, scaling can slow down or nearly stop. You might not actually get +10% performance from sustained 5.5 GHz vs. 5 GHz, depending on the architecture. I think I heard that Zen+ scales better than Zen 2 at extreme clock speeds, but almost nobody will notice because you would need liquid nitrogen to reach those clocks anyway.
(Score: 1, Insightful) by Anonymous Coward on Friday September 25 2020, @06:28PM
that's all i really care about. what do i care if some huge scumbag company has faster chips otherwise?
(Score: 2) by shortscreen on Friday September 25 2020, @09:09PM (1 child)
So with a 50% IPC boost and 3GHz clock, the new ARM would match a 4.8GHz (or whatever it is) x86 CPU in single thread performance? Which means the current 2.5GHz ARM was already equivalent to a ~2.67GHz x86?
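That back-calculation can be checked with a short Python sketch. It assumes performance scales linearly with IPC and clock on both architectures, and it takes the 4.8GHz x86 figure as the poster's guess rather than a sourced number; the function name is hypothetical:

```python
def n1_equivalent_x86_ghz(x86_ghz_matched_by_v1, ipc_gain, n1_ghz, v1_ghz):
    """x86 clock the current N1 would match, given what the V1 is said to match.

    Assumes linear scaling with IPC and frequency on both sides.
    """
    v1_over_n1 = (1.0 + ipc_gain) * (v1_ghz / n1_ghz)
    return x86_ghz_matched_by_v1 / v1_over_n1


# 3 GHz V1 with +50% IPC is assumed to match a 4.8 GHz x86 core;
# the baseline is the 2.5 GHz N1 in Graviton2.
equiv = n1_equivalent_x86_ghz(4.8, 0.50, 2.5, 3.0)
print(f"2.5 GHz N1 ~ {equiv:.2f} GHz x86")
```

Under those assumptions the ~2.67GHz figure holds up; the linear-scaling premise is the weak point.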
(Score: 2) by takyon on Friday September 25 2020, @11:10PM
https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform/4 [anandtech.com]
Performance is comparable, to the point where adding 40-50% IPC could vault it over some x86 chips.
It doesn't really need to hit clock speeds that high because it's designed to have 64-128 cores (some designs will have 72 or 96 cores). All-core boosts for 28-core Xeon or 64-core Epyc don't exceed 4 GHz.
If the author is trying to say that a V1 will beat an i9-10900K or Zen 3 CPU in single-threaded workloads, good luck with that I guess.
(Score: -1, Troll) by Anonymous Coward on Saturday September 26 2020, @03:54AM
V1 N2 sounds like a virus name.
ARM pandemic?
Social distancing between computers? Would it finally kill off all these "social media" cocksuckers?*
*Except for SN, of course. I love SN, especially fnord.