from the an-"Epyc"-battle dept.
We can continue to talk about Intel's excellent mesh topology and AMD's strong new Zen architecture, but at the end of the day, the "how" will not matter to infrastructure professionals. Depending on your situation, performance, performance-per-watt, and/or performance-per-dollar are what matters.
The current Intel pricing draws the first line. If performance-per-dollar matters to you, AMD's EPYC pricing is very competitive for a wide range of software applications. With the exception of database software and vectorizable HPC code, AMD's EPYC 7601 ($4200) offers performance ranging from slightly lower to slightly better than Intel's Xeon 8176 ($8000+). However, the real competitor is probably the Xeon 8160, which has four fewer cores (-14%) and slightly lower turbo clocks (-100 to -200 MHz). We expect that this CPU will likely offer 15% lower performance, and yet it still costs about $500 more ($4700) than the best EPYC. Of course, everything will depend on the final server system price, but it looks like AMD's new EPYC will put some serious performance-per-dollar pressure on the Intel line.
The Intel chip is indeed able to scale up to 8-socket systems, but frankly that market is shrinking fast, and dual-socket buyers could not care less.
Meanwhile, although we have yet to test it, AMD's single-socket offering looks even more attractive. We estimate that a single EPYC 7551P would indeed outperform many of the dual Silver Xeon solutions. Overall, the single-socket EPYC gives you about 8 more cores at similar clockspeeds than the 2P Intel setup, and AMD doesn't require explicit cross-socket communication - the server board gets simpler and thus cheaper. For price-conscious server buyers, this is an excellent option.
However, if your software is expensive, everything changes. In that case, you care less about the heavy price tags of the Platinum Xeons. For those scenarios, Intel's Skylake-EP Xeons deliver the highest single threaded performance (courtesy of the 3.8 GHz turbo clock), high throughput without much (hardware) tuning, and server managers get the reassurance of Intel's reliable track record. And if you use expensive HPC software, you will probably get the benefits of Intel's beefy AVX 2.0 and/or AVX-512 implementations.
AMD's flagship Epyc CPU has 32 cores, while the largest Skylake-EP Xeon CPU has 28 cores.
Quoted text is from page 23, "Closing Thoughts".
[Ed. note: Article is multiple pages with no single page version in sight.]
Previously: Google Gets its Hands on Skylake-Based Intel Xeons
Intel Announces 4 to 18-Core Skylake-X CPUs
AMD Epyc 7000-Series Launched With Up to 32 Cores
Intel's Skylake and Kaby Lake CPUs Have Nasty Microcode Bug
AVX-512: A "Hidden Gem"?
As part of an ongoing effort to differentiate its public cloud services, Google made good this week on its intention to bring custom Xeon Skylake chips from Intel Corp. to its Google Compute Engine. The cloud provider is the first to offer the next-gen Xeons, and is getting access ahead of traditional server-makers like Dell and HPE.
Google announced plans to incorporate the next-generation Intel server chips into its public cloud last November. On Friday (Feb. 24), Urs Hölzle, Google's senior vice president for cloud infrastructure, said the Skylake upgrade would deliver a significant performance boost for demanding applications and workloads ranging from genomic research to machine learning.
The cloud vendor noted that Skylake includes Intel Advanced Vector Extensions (AVX-512) that target workloads such as data analytics, engineering simulations and scientific modeling. When compared to previous generations, the Skylake extensions are touted as doubling floating-point performance "for the heaviest calculations," Hölzle noted in a blog post.
Recently, Intel was rumored to be releasing 10 and 12 core "Core i9" CPUs to compete with AMD's 10-16 core "Threadripper" CPUs. Now, Intel has confirmed these as well as 14, 16, and 18 core Skylake-X CPUs. Every CPU with 6 or more cores appears to support quad-channel DDR4:
|i7-7640X|4/4|$242|~$61/thread (fewer threads)|
Last year at Computex, the flagship Broadwell-E enthusiast chip was launched: the 10-core i7-6950X at $1,723. Today at Computex, the 10-core i9-7900X costs $999, and the 16-core i9-7960X costs $1,699. Clearly, AMD's Ryzen CPUs have forced Intel to become competitive.
Although the pricing of AMD's 10-16 core Threadripper CPUs is not known yet, the 8-core Ryzen R7 launched at $500 (available now for about $460). The Intel i7-7820X has 8 cores for $599, and will likely have better single-threaded performance than the AMD equivalent. So while Intel's CPUs are still more expensive than AMD's, they may have similar price/performance.
For what it's worth, Intel also announced quad-core Kaby Lake-X processors.
AMD has launched its Ryzen-based take on x86 server processors to compete with Intel's Xeon CPUs. All of the Epyc 7000-series CPUs support 128 PCIe 3.0 lanes and 8 channels (2 DIMMs per channel) of DDR4-2666 DRAM:
A few weeks ago AMD announced the naming of the new line of enterprise-class processors, called EPYC, and today marks the official launch with configurations up to 32 cores and 64 threads per processor. We also got an insight into several features of the design, including the AMD Infinity Fabric.
Today's announcement of the AMD EPYC product line sees the launch of the top four CPUs, focused primarily at dual-socket systems. The full EPYC stack will contain twelve processors, with three aimed at single-socket environments; the rest of the stack will be made available at the end of July. It is worth taking a few minutes to look at how these processors look under the hood.
On the package are four silicon dies, each one containing the same 8-core silicon we saw in the AMD Ryzen processors. Each silicon die has two core complexes, each of four cores, and supports two memory channels, giving a total maximum of 32 cores and 8 memory channels on an EPYC processor. The dies are connected by AMD's newest interconnect, the Infinity Fabric, which plays a key role not only in die-to-die communication, but also in processor-to-processor communication and within AMD's new Vega graphics. AMD designed the Infinity Fabric to be modular and scalable in order to support large GPUs and CPUs in the roadmap going forward, and states that within a single package the fabric is overprovisioned to minimize any issues with non-NUMA-aware software (more on this later).
With a total of 8 memory channels, and support for 2 DIMMs per channel, AMD quotes a maximum of 2TB of memory per socket, scaling up to 4TB in a dual-processor system. Each CPU will support 128 PCIe 3.0 lanes, suitable for six GPUs with full bandwidth support (plus IO) or up to 32 NVMe drives for storage. All the PCIe lanes can be used for IO devices, such as SATA drives or network ports, or as Infinity Fabric connections to other devices. There are also 4 IO hubs per processor for additional storage support.
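The headline figures in the quote fall straight out of the per-die and per-channel counts. A quick back-of-envelope check (the 128 GB module size is an assumption on our part, chosen to be consistent with the quoted 2TB-per-socket ceiling):

```c
/* Back-of-envelope totals for one EPYC 7000 package, using the counts
 * quoted above. GB_PER_DIMM = 128 is an assumed module size, not an
 * AMD figure; it is simply what matches the quoted 2TB/socket cap. */
enum {
    DIES              = 4,   /* Zeppelin dies per package */
    CCX_PER_DIE       = 2,   /* core complexes per die    */
    CORES_PER_CCX     = 4,   /* cores per core complex    */
    MEM_CHANNELS      = 8,   /* DDR4 channels per socket  */
    DIMMS_PER_CHANNEL = 2,
    GB_PER_DIMM       = 128  /* assumed module size       */
};

int cores_per_socket(void)
{
    return DIES * CCX_PER_DIE * CORES_PER_CCX;             /* 4*2*4 = 32 */
}

int max_mem_gb_per_socket(void)
{
    return MEM_CHANNELS * DIMMS_PER_CHANNEL * GB_PER_DIMM; /* 8*2*128 = 2048 GB */
}
```

The two helpers reproduce the quoted figures: 32 cores and 2048 GB (2 TB) per socket, doubling to 64 cores and 4 TB in a 2P system.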
AMD's slides at Ars Technica.
Arthur T Knackerbracket has found the following story:
During April and May, Intel started updating processor documentation with a new errata note, and over the weekend we learned why: Skylake and Kaby Lake silicon has a microcode bug.
The erratum is described in detail on the Debian mailing list, and affects Skylake and Kaby Lake Intel Core processors (in desktop, high-end desktop, embedded and mobile platforms), Xeon v5 and v6 server processors, and some Pentium models.
The Debian advisory says affected users need to disable hyper-threading "immediately" in their BIOS or UEFI settings, because the processors can "dangerously misbehave when hyper-threading is enabled."
Symptoms can include "application and system misbehaviour, data corruption, and data loss".
Henrique de Moraes Holschuh, who authored the Debian post, notes that all operating systems, not only Linux, are subject to the bug.
Imagine if we could use vector processing on something other than just floating point problems. Today, GPUs and CPUs work tirelessly to accelerate algorithms based on floating point (FP) numbers. Algorithms could definitely benefit from basing their mathematics on bits and integers (bytes, words) if we could just accelerate those too. FPGAs can do this, but the hardware and software costs remain very high. GPUs aren't designed to operate on non-FP data. Intel AVX introduced some support, and now Intel AVX-512 is bringing a great deal of flexibility to processors. I will share why I'm convinced that the "AVX512VL" capability in particular is a hidden gem that will let AVX-512 be much more useful for compilers and developers alike.
Fortunately for software developers, Intel has done a poor job keeping the "secret" that AVX-512 is coming to Intel's recently announced Xeon Scalable processor line very soon. Amazon Web Services has publicly touted AVX-512 on Skylake as coming soon!
It is timely to examine the new AVX-512 capabilities and their potential impact beyond the more regular HPC needs for floating-point-only workloads. The hidden gem in all this, which makes the shift to AVX-512 easier, is the "VL" (vector length) extensions, which allow AVX-512 instructions to behave like SSE or AVX/AVX2 instructions when that suits us. This is a clever and powerful addition to enable its adoption in a wider assortment of software more quickly. The VL extensions mean that programmers (and compilers) do not need to shift immediately from 256 bits (AVX/AVX2) to 512 bits to use the new bit/byte/word manipulations. This transitional benefit is useful not only for an interim period, but also for applications which find 256 bits more natural (perhaps a small, but important, subset of problems).
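As a concrete illustration of what the VL extensions buy, the hypothetical helper below (the name and layout are ours, not from the article) performs a per-lane masked byte add at 256-bit width: an AVX-512-class operation - lane masking on byte elements - executed at AVX2's vector length. The intrinsic path requires a compiler targeting AVX512VL and AVX512BW; the scalar branch has identical semantics and runs everywhere else.

```c
#include <stdint.h>
#if defined(__AVX512VL__) && defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Add bytes of a[] and b[] into dst[], but only in the 32 lanes whose
 * bit is set in `mask`. With AVX512VL + AVX512BW this is a single
 * masked vpaddb on a 256-bit register - no 512-bit vectors needed. */
void masked_add_u8x32(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      uint32_t mask)
{
#if defined(__AVX512VL__) && defined(__AVX512BW__)
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vd = _mm256_loadu_si256((const __m256i *)dst);
    /* masked add: lanes with a cleared mask bit keep dst's old value */
    vd = _mm256_mask_add_epi8(vd, (__mmask32)mask, va, vb);
    _mm256_storeu_si256((__m256i *)dst, vd);
#else
    /* scalar fallback with identical semantics */
    for (int i = 0; i < 32; i++)
        if (mask & (1u << i))
            dst[i] = (uint8_t)(a[i] + b[i]);
#endif
}
```

Before AVX-512, expressing this kind of masked update took a compare, a blend, and extra registers; with VL, a compiler can emit the masked form without forcing the surrounding loop up to 512 bits.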
Will it be enough to stave off "Epyc"?
AMD's Threadripper 1950X (TR 1950X?) will have 16 cores for $1,000, and the Threadripper 1920X will have 12 cores for $800. They will be available in early August:
Last night out of the blue, we received an email from AMD, sharing some of the specifications for the forthcoming Ryzen Threadripper CPUs to be announced today. Up until this point, we knew a few things – Threadripper would consist of two Zeppelin dies featuring AMD's latest Zen core and microarchitecture, and would essentially double up on the HEDT Ryzen launch. Double dies means double pretty much everything: Threadripper would support up to 16 cores, up to 32 MB of L3 cache, quad-channel memory support, and would require a new socket/motherboard platform called X399, sporting a massive socket with 4094 pins (and also marking an LGA socket for AMD). By virtue of being sixteen cores, AMD is seemingly carving a new consumer category above HEDT/High-End Desktop, which we've coined the 'Super High-End Desktop', or SHED for short.
[...] From what we do know, 16 Zen cores at $999 is about the ballpark price we were expecting. With the clock speeds of 3.4 GHz base and 4 GHz Turbo, this is essentially two Ryzen 7 1800X dies at $499 each stuck together, creating the $999 price (obviously it's more complicated than this). Given the frequencies and the performance of these dies, the TDP is likely in the 180W range, seeing as how the Ryzen 7 1800X was a 95W CPU with slightly higher frequencies. The 1950X runs at 4.0 GHz turbo and also has access to AMD's XFR – which will boost the processor when temperature and power allows – in jumps of +25 MHz: AMD would not comment on the maximum frequency boost of XFR, though given our experiences of the Ryzen silicon and previous Ryzen processor specifications, this is likely to be +100 MHz. We were not told if the CPUs would come with a bundled CPU cooler, although if our 180W prediction is in the right area, then substantial cooling would be needed. We expect AMD to use the same Indium-Tin solder as the Ryzen CPUs, although we were unable to get confirmation of this at this time.
[...] Comparing the two, and what we know, AMD is going to battle on many fronts. Coming in at $999 is going to be aggressive, along with an all-core turbo at 3.4 GHz or above: Intel's chip at $1999 will likely turbo below this. Both chips will have quad-channel DRAM, supporting DDR4-2666 in 1 DIMM per channel mode (and DDR4-2400 in 2 DPC), but there are some tradeoffs. Intel Core parts do not support ECC, and AMD Threadripper parts are expected to (awaiting confirmation). Intel has the better microarchitecture in terms of pure IPC, though it will be interesting to see the real-world difference if AMD is clocked higher. AMD Threadripper processors will have access to 60 lanes of PCIe for accelerators, such as GPUs, RAID cards and other functions, with another 4 reserved by the chipset: Intel will likely be limited to 44 for accelerators but have a much better chipset in the X299 for IO support and capabilities. We suspect AMD to run a 180W TDP, and Intel at 165W, giving a slight advantage to Intel perhaps (depending on workload), and Intel will also offer AVX512 support for its CPU whereas AMD has smaller FMA and AVX engines by comparison. The die-to-die latency of AMD's MCM will also be an interesting element to the story, depending exactly where AMD is aiming this product.
There are also some details for the Ryzen 3 quad-cores, but no confirmed pricing yet.
Meanwhile, Intel's marketing department has badmouthed AMD, calling 32-core Naples server chips "4 glued-together desktop die". That could have something to do with AMD's chips matching Intel's performance on certain workloads at around half the price.
Previously: CPU Rumor Mill: Intel Core i9, AMD Ryzen 9, and AMD "Starship"
Intel Announces 4 to 18-Core Skylake-X CPUs
Intel Core i9-7900X Reviewed: Hotter and More Expensive than AMD Ryzen 1800X for Small Gains
AMD Epyc 7000-Series Launched With Up to 32 Cores
A new update to the Intel document for software developers indicates that the company will begin to introduce various AVX-512 instruction set extensions to its consumer CPUs soon. This will start with the codenamed Cannon Lake (CNL) and Ice Lake (ICL) processors, made using 10 nm process technologies. The new extensions will enable future chips to improve performance in certain applications. One of the main questions about AVX-512 is which consumer programs will actually support it when these CNL and ICL processors hit the market. In addition to AVX-512, the upcoming processors will introduce a host of other new non-AVX-512 instructions.
According to the Intel Architecture Instruction Set Extensions and Future Features Programming Reference document, Intel's Cannon Lake CPUs will support AVX512F, AVX512CD, AVX512DQ, AVX512BW, and AVX512VL. This will bring the feature set of these CPUs to the current level of the Skylake-SP based processors. In addition, the Cannon Lake microarchitecture will support the AVX512_IFMA and AVX512_VBMI instructions, but at this point it is unclear whether the support will be limited to servers, or will also be featured in the consumer processors (the latter scenario seems likely based on the document's wording, but remains unconfirmed).
Intel originally promised to release Cannon Lake processors in the 2016 – 2017 timeframe, but delayed the introduction of its 10 nm process technology to 2018, thus postponing the CPU launch as well. Initially it was expected that the Cannon Lake CPUs would generally resemble the Kaby Lake and Coffee Lake chips with some refinements, but the addition of AVX-512 support means a rather tangible architecture improvement. For AVX-512, large chunks of data require massive memory bandwidth, which the Skylake-SP cores get thanks to large caches and more memory controllers. Keeping in mind memory bandwidth and power consumption factors, AVX-512 might not be supported by all Cannon Lake client CPUs, but only by those aimed at higher-performance machines (i.e., no AVX-512 for ULP mobile parts or entry-level desktop SKUs, but this is speculation at this point). Meanwhile, the good news is that by the time AVX-512-supporting Cannon Lake processors arrive, programs for client PCs that take advantage of the latest extensions will likely be available.
Cray is adding an AMD processor option to its CS500 line of clustered supercomputers.
The CS500 supports more than 11,000 nodes, which can use Intel Xeon SP CPUs, optionally accelerated by Nvidia Tesla GPUs or Intel Phi co-processors. Intel Stratix FPGA acceleration is also supported.
There can be up to 72 nodes in a rack, interconnected by EDR/FDR InfiniBand or Intel's OmniPath fabric.
Cray has now added an AMD Epyc 7000 option to the CPU mix:
- Systems provide four dual-socket nodes in a 2U chassis
- Each node supports two PCIe 3.0 x16 slots (200Gb network capability) and HDD/SSD options
- Epyc 7000 processors support up to 32 cores and eight DDR4 memory channels per socket
Top-of-the-line Epyc chips have 32 cores and 64 threads. An upcoming generation of 7nm Epyc chips is rumored to have up to 48 or 64 cores, using 6 or 8 cores per Core Complex (CCX) instead of the current 4.
Intel has announced the next family of Xeon processors that it plans to ship in the first half of next year. The new parts represent a substantial upgrade over current Xeon chips, with up to 48 cores and 12 DDR4 memory channels per socket, supporting up to two sockets.
These processors will likely be the top-end Cascade Lake processors; Intel is labelling them "Cascade Lake Advanced Performance," with a higher level of performance than the Xeon Scalable Processors (SP) below them. The current Xeon SP chips use a monolithic die, with up to 28 cores and 56 threads. Cascade Lake AP will instead be a multi-chip processor with multiple dies contained within a single package. AMD is using a similar approach for its comparable products; the Epyc processors use four dies in each package, with each die having 8 cores.
The switch to a multi-chip design is likely driven by necessity: as dies grow larger, the chance that any given die contains a defect rises, so using several smaller dies improves yields. Because Intel's 10nm manufacturing process isn't yet good enough for mass-market production, the new Xeons will continue to use a version of the company's 14nm process. Intel hasn't yet revealed what the topology within each package will be, so the exact distribution of those cores and memory channels between chips is as yet unknown. The enormous number of memory channels will demand an enormous socket, currently believed to be a 5903-pin connector.
Intel also announced tinier 4-6 core E-2100 Xeons with ECC memory support.
Meanwhile, AMD is holding a New Horizon event on Nov. 6, where it is expected to announce 64-core Epyc processors.
Related: AMD Epyc 7000-Series Launched With Up to 32 Cores
AVX-512: A "Hidden Gem"?
Intel's Skylake-SP vs AMD's Epyc
Intel Teases 28 Core Chip, AMD Announces Threadripper 2 With Up to 32 Cores
TSMC Will Make AMD's "7nm" Epyc Server CPUs
Intel Announces 9th Generation Desktop Processors, Including a Mainstream 8-Core CPU