Date: 2019-05-21
Ampere will switch from using Arm's "Neoverse" cores to custom ARM cores developed in-house:
Today's big reveal comes in regard to the microarchitecture choices that Ampere is going to be using starting in their next generation 2022 "Siryn" design, successor to the Altra Max, and relates to the CPU IP being used:
Starting with Siryn, Ampere will be switching over from Arm's Neoverse cores to their new in-house full custom CPU microarchitecture. This announcement admittedly caught us completely off-guard, as we had largely expected Ampere to continue to be using Arm's Neoverse cores for the foreseeable future. The switch to a new full custom microarchitecture puts Ampere on a completely different trajectory than we had initially expected from the company.
In fact, Ampere explains that the move towards a full custom microarchitecture core design was always the plan for the company since its inception, and that their custom CPU design had been in the works for the past 3+ years.
[...] Ampere's rationale for designing a full custom core from the ground up is that they claim they can achieve better performance and better power efficiency in datacentre workloads than Arm's "more general purpose" Neoverse designs can. This is quite an interesting claim to make, and it contrasts with Arm's projections and goals for their Neoverse cores. The recent Neoverse V1 and N2 cores were unveiled in more detail last month and are claimed to achieve significant generational IPC gains.
ARM CPU vendor Ampere announced an 80-core CPU called the Altra on Tuesday. If the core count didn't clue you in already, the Altra is aimed at data-center computing rather than home or even typical business needs. The Altra's 80 cores do not offer hyperthreading, so 80 cores here means 80 threads as well.
Before we go into too much detail about the Altra—which is currently sampling but is not yet generally available and does not have any third-party benchmarks—it's instructive to take a look slightly backward to its little sibling, the 32-core eMAG 8180.
The Altra is not Ampere's first entry into data-center ARM computing. Its last processor, the eMAG 8180, is a 32-core part running at up to 3.3GHz turbo. The eMAG 8180 is available in packet.net's c2.large.arm package, in the form of Lenovo's ThinkSystem HR330A 1u single-socket systems.
Kinvolk, a Berlin-based Linux development company, did some pretty extensive benchmarking of a single-socket eMAG 8180 system—comparing it to a 24-core AMD Epyc 7401P (24c/48t) and a dual-socket Xeon Gold 5120 (28c/56t total).
[...] Like the eMAG, the Altra does not offer SMT (Simultaneous Multi Threading), so its 80 cores mean 80 threads. Unlike the eMAG, the Altra is designed for either single or dual-socket operation—so we can expect to see 160-core Altra-powered systems later in 2020. We know that there will be multiple SKUs, with a TDP range the data sheet specifies at 45W to 210W. But we don't know their individual details.
Ampere's Product List: 80 Cores, up to 3.3 GHz at 250 W; 128 Core in Q4
The Ampere Altra range, as part of today's release, will offer parts from 32 cores up to 80 cores, up to 3.3 GHz, with a variety of TDPs up to 250 W. As we've described in our previous news items on the chip, this is an Arm v8.2 core with a few 8.3+8.5 features, offers support for FP16 and INT8, supports 8 channels of DDR4-3200 ECC at 2 DIMMs per channel, and up to 4 TiB of memory per socket in a 1P or 2P configuration. Each CPU will offer 128 PCIe 4.0 lanes, 32 of which can be used for socket-to-socket communications implemented with the CCIX protocol over PCIe. This means 50 GB/s in each direction, and 192 PCIe 4.0 lanes in a dual socket system for add-in cards. Each of the PCIe lanes can bifurcate down to x2.
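The lane and memory figures quoted above can be checked with some quick arithmetic. The following is a minimal Python sketch, not anything from Ampere or the article: the function names are made up for illustration, a 1P system is assumed to expose all 128 lanes, and the peak memory bandwidth assumes the standard 64-bit (8-byte) data path per DDR4 channel, ECC bits excluded. It gives theoretical peaks only.

```python
def usable_pcie_lanes(lanes_per_socket: int, interconnect_lanes: int, sockets: int) -> int:
    """Lanes left for add-in cards (hypothetical helper).

    Assumption: in a single-socket system no lanes are reserved for the
    socket-to-socket link; in a multi-socket system each CPU gives up
    `interconnect_lanes` lanes for it.
    """
    if sockets == 1:
        return lanes_per_socket
    return sockets * (lanes_per_socket - interconnect_lanes)


def peak_dram_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal) for one socket.

    Assumption: each channel moves `bus_bytes` bytes per transfer
    (8 bytes for a 64-bit DDR4 data bus, not counting ECC bits).
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


if __name__ == "__main__":
    # Figures taken from the text: 128 lanes per CPU, 32 used for the CCIX link.
    print("2P add-in lanes:", usable_pcie_lanes(128, 32, 2))
    # Figures taken from the text: 8 channels of DDR4-3200.
    print("Peak DRAM GB/s per socket:", peak_dram_bandwidth_gb_s(8, 3200))
```

With the article's numbers, the dual-socket lane count works out to 2 x (128 - 32) = 192, matching the quoted figure. The memory bandwidth figure is an upper bound that real workloads will not reach.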
[...] Previously Ampere had stated they were going for 80 cores at 3.0 GHz at 210 W, however the Q80-33 is pushing that frequency another 300 MHz for another 40 W, and we understand that the tapeout of silicon from TSMC performed better than expected, hence this new top processor.
[...] If that wasn't enough, Ampere dropped a sizeable nugget into our pre-announcement briefing. The company is set to launch a 128-core version of Altra later this year.
This will be a new silicon design, beyond Ampere's initial layout of 80 cores for Altra, however Ampere states that while they are using the same platform as the regular Altra, they have done extensive tweaking and optimizations within the mesh interconnect for Altra Max to hide the additional contention that might occur when using the same main memory speeds.
Altra Max will be socket and pin-compatible with Altra, also support dual socket deployments, and Ampere states that the silicon will be ready for early sampling with partners in Q4, and is looking to move into high volume in mid-2021.
Arm Announces Neoverse V1 & N2 Infrastructure CPUs: +50% IPC, SVE Server Cores
Amazon's Graviton2 64-core Neoverse N1 server chip is the first of what should become a wider range of designs that will be driving the Arm server ecosystem forward and actively assaulting the infrastructure CPU market share that's currently dominated by the x86 players such as Intel and AMD.
[...] Today, we're ready to take the next step towards the next generation of the Neoverse platform, not only revealing the CPU microarchitecture previously known as Zeus, but a whole new product category that goes beyond the Neoverse N-series: Introducing the new Neoverse V-series and the Neoverse V1 (Zeus), as well as a new roadmap insertion in the form of the Neoverse N2 (Perseus).
[...] In terms of generational performance uplift, it's akin to Arm throwing down the gauntlet to the competition, achieving a ground-breaking +50% IPC boost compared to the Neoverse N1 that we're seeing in silicon today. The performance uplift potential here is tremendous, as this is merely a same-process ISO-frequency upgrade, and actual products based on the V1 will in all likelihood also see additional performance gains thanks to increased frequencies through process node advancements.
If we take the conservatively clocked Graviton2 with its 2.5GHz N1 cores as a baseline, a theoretical 3GHz V1 chip would represent an 80% uplift in per-core single-threaded performance. Not only would such a performance uptick vastly exceed any current x86 competition in the server space in terms of per-core performance, it would be enough to match the current best high-performance desktop chips from AMD and Intel today (Though we have to remember it'll compete against next-gen Zen3 Milan and Sapphire Rapids products).
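The 80% figure follows from multiplying the IPC gain by the clock ratio. Here is a minimal Python sketch of that arithmetic; the function name is hypothetical, and it assumes per-core performance scales linearly with IPC times frequency, ignoring memory latency, bandwidth, and other effects that would reduce the gain on a real chip.

```python
def per_core_uplift(ipc_gain: float, base_ghz: float, new_ghz: float) -> float:
    """Fractional single-thread uplift, assuming performance = IPC x frequency.

    ipc_gain is the fractional IPC improvement (0.5 for +50%).
    This is an idealised model, not a benchmark prediction.
    """
    return (1 + ipc_gain) * (new_ghz / base_ghz) - 1


if __name__ == "__main__":
    # Figures from the text: +50% IPC over N1, 2.5 GHz Graviton2 baseline,
    # theoretical 3 GHz V1 part.
    uplift = per_core_uplift(0.5, 2.5, 3.0)
    print(f"Estimated per-core uplift: {uplift:.0%}")
```

That is, 1.5 x (3.0 / 2.5) = 1.8 times the baseline, or roughly an 80% uplift under this simple model.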
[...] Alongside the Neoverse V1 platform, we've seen a roadmap insertion that previously wasn't there. The Perseus design will become the Neoverse N2, and will be the effective product-positioning successor to the N1. This new CPU IP represents a 40% IPC uplift compared to the N1, however still maintains the same design philosophy of maximising performance within the lowest power and smallest area.
Neoverse V1 is basically the server-oriented equivalent of the Cortex-X1 core, where performance is prioritized at the cost of less power efficiency and a greater die area (more cache, etc.). Neoverse N2 is more like (an unannounced successor of) Cortex-A78.
Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon
While the talk in recent weeks has been about the performance of Apple's M1 ARM chip and then rumors there might be a 32 core chip in the pipe, there is already something much stronger: Ampere Altra has begun shipping and its flagship 80-core SoC with up to two sockets per server can easily take on the AMD EPYC 7742 "Rome" and Intel Xeon Platinum 8280 "Cascade Lake" performance across a variety of workloads. Here is our initial look at the Ampere Altra performance on Linux in our independent performance benchmarks.
[...] Prior to receiving the Ampere Altra Mount Jade server and prior to seeing the performance potential of Apple's M1 chip on the desktop side, I figured the Ampere Altra performance would be like that of prior ARM server chips, which in best-case scenarios may put up a good fight against Intel/AMD but do not outright exceed them in both raw performance and performance-per-Watt across a variety of workloads. After seeing the results I was very surprised by how well the Ampere Altra Q80-33 2P performs against the Xeon Platinum 8280 and EPYC 7742 servers. The performance exceeded my expectations: the Ampere Altra was able to collect wins not only in performance-per-Watt but in raw performance as well. Aside from software not yet optimized for the AArch64 architecture, the worst case was generally the Ampere Altra coming a bit behind the x86_64 competition, but even then it enjoyed much lower power consumption than the x86_64 processors tested.
