from the revived-competition dept.
Intel has announced the next family of Xeon processors that it plans to ship in the first half of next year. The new parts represent a substantial upgrade over current Xeon chips, with up to 48 cores and 12 DDR4 memory channels per socket, supporting up to two sockets.
These processors will likely be the top-end Cascade Lake processors; Intel is labelling them "Cascade Lake Advanced Performance," with a higher level of performance than the Xeon Scalable Processors (SP) below them. The current Xeon SP chips use a monolithic die, with up to 28 cores and 56 threads. Cascade Lake AP will instead be a multi-chip processor with multiple dies contained with in a single package. AMD is using a similar approach for its comparable products; the Epyc processors use four dies in each package, with each die having 8 cores.
The switch to a multi-chip design is likely driven by necessity: as the dies become bigger and bigger it becomes more and more likely that they'll contain a defect. Using several smaller dies helps avoid these defects. Because Intel's 10nm manufacturing process isn't yet good enough for mass market production, the new Xeons will continue to use a version of the company's 14nm process. Intel hasn't yet revealed what the topology within each package will be, so the exact distribution of those cores and memory channels between chips is as yet unknown. The enormous number of memory channels will demand an enormous socket, currently believed to be a 5903 pin connector.
Intel also announced tinier 4-6 core E-2100 Xeons with ECC memory support.
Meanwhile, AMD is holding a New Horizon event on Nov. 6, where it is expected to announce 64-core Epyc processors.
Related: AMD Epyc 7000-Series Launched With Up to 32 Cores
AVX-512: A "Hidden Gem"?
Intel's Skylake-SP vs AMD's Epyc
Intel Teases 28 Core Chip, AMD Announces Threadripper 2 With Up to 32 Cores
TSMC Will Make AMD's "7nm" Epyc Server CPUs
Intel Announces 9th Generation Desktop Processors, Including a Mainstream 8-Core CPU
AMD has launched its Ryzen-based take on x86 server processors to compete with Intel's Xeon CPUs. All of the Epyc 7000-series CPUs support 128 PCIe 3.0 lanes and 8 channels (2 DIMMs per channel) of DDR4-2666 DRAM:
A few weeks ago AMD announced the naming of the new line of enterprise-class processors, called EPYC, and today marks the official launch with configurations up to 32 cores and 64 threads per processor. We also got an insight into several features of the design, including the AMD Infinity Fabric.
Today's announcement of the AMD EPYC product line sees the launch of the top four CPUs, focused primarily at dual socket systems. The full EPYC stack will contain twelve processors, with three for single socket environments, with the rest of the stack being made available at the end of July. It is worth taking a few minutes to look at how these processors look under the hood.
On the package are four silicon dies, each one containing the same 8-core silicon we saw in the AMD Ryzen processors. Each silicon die has two core complexes, each of four cores, and supports two memory channels, giving a total maximum of 32 cores and 8 memory channels on an EPYC processor. The dies are connected by AMD's newest interconnect, the Infinity Fabric, which plays a key role not only in die-to-die communication but also processor-to-processor communication and within AMD's new Vega graphics. AMD designed the Infinity Fabric to be modular and scalable in order to support large GPUs and CPUs in the roadmap going forward, and states that within a single package the fabric is overprovisioned to minimize any issues with non-NUMA aware software (more on this later).
With a total of 8 memory channels, and support for 2 DIMMs per channel, AMD is quoting a 2TB per socket maximum memory support, scaling up to 4TB per system in a dual processor system. Each CPU will support 128 PCIe 3.0 lanes, suitable for six GPUs with full bandwidth support (plus IO) or up to 32 NVMe drives for storage. All the PCIe lanes can be used for IO devices, such as SATA drives or network ports, or as Infinity Fabric connections to other devices. There are also 4 IO hubs per processor for additional storage support.
AMD's slides at Ars Technica.
Imagine if we could use vector processing on something other than just floating point problems. Today, GPUs and CPUs work tirelessly to accelerate algorithms based on floating point (FP) numbers. Algorithms can definitely benefit from basing their mathematics on bits and integers (bytes, words) if we could just accelerate them too. FPGAs can do this, but the hardware and software costs remain very high. GPUs aren't designed to operate on non-FP data. Intel AVX introduced some support, and now Intel AVX-512 is bringing a great deal of flexibility to processors. I will share why I'm convinced that the "AVX512VL" capability in particular is a hidden gem that will let AVX-512 be much more useful for compilers and developers alike.
Fortunately for software developers, Intel has done a poor job keeping the "secret" that AVX-512 is coming to Intel's recently announced Xeon Scalable processor line very soon. Amazon Web Services has publically touted AVX-512 on Skylake as coming soon!
It is timely to examine the new AVX-512 capabilities and their ability to impact beyond the more regular HPC needs for floating point only workloads. The hidden gem in all this, which enables shifting to AVX-512 more easily, is the "VL" (vector length) extensions which allow AVX-512 instructions to behave like SSE or AVX/AVX2 instructions when that suits us. This is a clever and powerful addition to enable its adoption in a wider assortment of software more quickly. The VL extensions mean that programmers (and compilers) do not need to shift immediately from 256-bits (AVX/AVX2) to 512-bits to use the new bit/byte/word manipulations. This transitional benefit is useful not only for an interim, but also for applications which find 256-bits more natural (perhaps a small, but important, subset of problems).
Will it be enough to stave off "Epyc"?
We can continue to talk about Intel's excellent mesh topology and AMD strong new Zen architecture, but at the end of the day, the "how" will not matter to infrastructure professionals. Depending on your situation, performance, performance-per-watt, and/or performance-per-dollar are what matters.
The current Intel pricing draws the first line. If performance-per-dollar matters to you, AMD's EPYC pricing is very competitive for a wide range of software applications. With the exception of database software and vectorizable HPC code, AMD's EPYC 7601 ($4200) offers slightly less or slightly better performance than Intel's Xeon 8176 ($8000+). However the real competitor is probably the Xeon 8160, which has 4 (-14%) fewer cores and slightly lower turbo clocks (-100 or -200 MHz). We expect that this CPU will likely offer 15% lower performance, and yet it still costs about $500 more ($4700) than the best EPYC. Of course, everything will depend on the final server system price, but it looks like AMD's new EPYC will put some serious performance-per-dollar pressure on the Intel line.
The Intel chip is indeed able to scale up in 8 sockets systems, but frankly that market is shrinking fast, and dual socket buyers could not care less.
Meanwhile, although we have yet to test it, AMD's single socket offering looks even more attractive. We estimate that a single EPYC 7551P would indeed outperform many of the dual Silver Xeon solutions. Overall the single-socket EPYC gives you about 8 cores more at similar clockspeeds than the 2P Intel, and AMD doesn't require explicit cross socket communication - the server board gets simpler and thus cheaper. For price conscious server buyers, this is an excellent option.
However, if your software is expensive, everything changes. In that case, you care less about the heavy price tags of the Platinum Xeons. For those scenarios, Intel's Skylake-EP Xeons deliver the highest single threaded performance (courtesy of the 3.8 GHz turbo clock), high throughput without much (hardware) tuning, and server managers get the reassurance of Intel's reliable track record. And if you use expensive HPC software, you will probably get the benefits of Intel's beefy AVX 2.0 and/or AVX-512 implementations.
AMD's flagship Epyc CPU has 32 cores, while the largest Skylake-EP Xeon CPU has 28 cores.
Quoted text is from page 23, "Closing Thoughts".
[Ed. note: Article is multiple pages with no single page version in sight.]
Previously: Google Gets its Hands on Skylake-Based Intel Xeons
Intel Announces 4 to 18-Core Skylake-X CPUs
AMD Epyc 7000-Series Launched With Up to 32 Cores
Intel's Skylake and Kaby Lake CPUs Have Nasty Microcode Bug
AVX-512: A "Hidden Gem"?
AMD released Threadripper CPUs in 2017, built on the same 14nm Zen architecture as Ryzen, but with up to 16 cores and 32 threads. Threadripper was widely believed to have pushed Intel to respond with the release of enthusiast-class Skylake-X chips with up to 18 cores. AMD also released Epyc-branded server chips with up to 32 cores.
This week at Computex 2018, Intel showed off a 28-core CPU intended for enthusiasts and high end desktop users. While the part was overclocked to 5 GHz, it required a one-horsepower water chiller to do so. The demonstration seemed to be timed to steal the thunder from AMD's own news.
Now, AMD has announced two Threadripper 2 CPUs: one with 24 cores, and another with 32 cores. They use the "12nm LP" GlobalFoundries process instead of "14nm", which could improve performance, but are currently clocked lower than previous Threadripper parts. The TDP has been pushed up to 250 W from the 180 W TDP of Threadripper 1950X. Although these new chips match the core counts of top Epyc CPUs, there are some differences:
At the AMD press event at Computex, it was revealed that these new processors would have up to 32 cores in total, mirroring the 32-core versions of EPYC. On EPYC, those processors have four active dies, with eight active cores on each die (four for each CCX). On EPYC however, there are eight memory channels, and AMD's X399 platform only has support for four channels. For the first generation this meant that each of the two active die would have two memory channels attached – in the second generation Threadripper this is still the case: the two now 'active' parts of the chip do not have direct memory access.
This also means that the number of PCIe lanes remains at 64 for Threadripper 2, rather than the 128 of Epyc.
Threadripper 1 had a "game mode" that disabled one of the two active dies, so it will be interesting to see if users of the new chips will be forced to disable even more cores in some scenarios.
AMD CEO Lisa Su has announced that second-generation "Rome" EPYC CPU that the company is wrapping up work on is being produced out at TSMC. This is a notable departure from how things have gone for AMD with the Zen 1 generation, as GlobalFoundries has produced all of AMD's Zen CPUs, both for consumer Ryzen and professional EPYC parts.
[...] As it stands, AMD seems rather optimistic about how things are currently going. Rome silicon is already back in the labs, and indeed AMD is already sampling the parts to certain partners for early validation. Which means AMD remains on track to launch their second-generation EPYC processors in 2019.
[...] Ultimately however if they are meeting their order quota from GlobalFoundries, then AMD's situation is ultimately much more market driven: which fab can offer the necessary capacity and performance, and at the best prices. Which will be an important consideration as GlobalFoundries has indicated that it may not be able to keep up with 7nm demand, especially with the long manufacturing process their first-generation DUV-based 7nm "7LP" process requires.
Related: TSMC Holds Groundbreaking Ceremony for "5nm" Fab, Production to Begin in 2020
Cray CS500 Supercomputers to Include AMD's Epyc as a Processor Option
AMD Returns to the Datacenter, Set to Launch "7nm" Radeon Instinct GPUs for Machine Learning in 2018
AMD Ratcheting Up the Pressure on Intel
More on AMD's Licensing of Epyc Server Chips to Chinese Companies
Among many of Intel's announcements today, a key one for a lot of users will be the launch of Intel's 9th Generation Core desktop processors, offering up to 8-cores on Intel's mainstream consumer platform. These processors are drop-in compatible with current Coffee Lake and Z370 platforms, but are accompanied by a new Z390 chipset and associated motherboards as well. The highlights from this launch is the 8-core Core i9 parts, which include a 5.0 GHz turbo Core i9-9900K, rated at a 95W TDP.
[...] Leading from the top of the stack is the Core i9-9900K, Intel's new flagship mainstream processor. This part is eight full cores with hyperthreading, with a base frequency of 3.6 GHz at 95W TDP, and a turbo up to 5.0 GHz on two cores. Memory support is up to dual channel DDR4-2666. The Core i9-9900K builds upon the Core i7-8086K from the 8th Generation product line by adding two more cores, and increasing that 5.0 GHz turbo from one core to two cores. The all-core turbo is 4.7 GHz, so it will be interesting to see what the power consumption is when the processor is fully loaded. The Core i9 family will have the full 2MB of L3 cache per core.
[...] Also featuring 8-cores is the Core i7-9700K, but without the hyperthreading. This part will have a base frequency of 3.6 GHz as well for a given 95W TDP, but can turbo up to 4.9 GHz only on a single core. The i7-9700K is meant to be the direct upgrade over the Core i7-8700K, and although both chips have the same underlying Coffee Lake microarchitecture, the 9700K has two more cores and slightly better turbo performance, but less L3 cache per core at only 1.5MB per.
AMD has announced the next generation of its Epyc server processors, with up to 64 cores (128 threads) each. Instead of an 8-core "core complex" (CCX), AMD's 64-core chips will feature 8 "chiplets" with 8 cores each:
AMD on Tuesday formally announced its next-generation EPYC processor code-named Rome. The new server CPU will feature up to 64 cores featuring the Zen 2 microarchitecture, thus providing at least two times higher performance per socket than existing EPYC chips.
As discussed in a separate story covering AMD's new 'chiplet' design approach, AMD EPYC 'Rome' processor will carry multiple CPU chiplets manufactured using TSMC's 7 nm fabrication process as well as an I/O die produced at a 14 nm node. As it appears, high-performance 'Rome' processors will use eight CPU chiplets offering 64 x86 cores in total.
Separating CPU chiplets from the I/O die has its advantages because it enables AMD to make the CPU chiplets smaller as physical interfaces (such as DRAM and Infinity Fabric) do not scale that well with shrinks of process technology. Therefore, instead of making CPU chiplets bigger and more expensive to manufacture, AMD decided to incorporate DRAM and some other I/O into a separate chip. Besides lower costs, the added benefit that AMD is going to enjoy with its 7 nm chiplets is ability to easier[sic] bin new chips for needed clocks and power, which is something that is hard to estimate in case of servers.
AMD also announced that Zen 4 is under development. It could be made on a "5nm" node, although that is speculation. The Zen 3 microarchitecture will be made on TSMC's N7+ process ("7nm" with more extensive use of extreme ultraviolet lithography).
AMD's Epyc CPUs will now be offered on Amazon Web Services.
AnandTech live blog of New Horizon event.
Intel has enjoyed a virtual monopoly in the server CPU arena for some time. However, AMD's EPYC series of processors, based on the latest iteration of Zen architecture, may change that. The first generation of these chipsets, Naples, managed to reduce Intel's market share to 99% shortly after its launch. This may sound less than impressive, but in a billion-dollar industry, it was possibly quite valuable to AMD.
The latest report on the server market by DRAMeXchange indicates that Intel's share is down to 98% by now. This represents a 100% improvement for AMD. Furthermore, the analysts estimate that the release of EPYC Rome-based silicon will result in further gains. They will ultimately result in a total market share of 5% for these CPUs by the end of 2019.
Intel is keeping AMD under 15%. For now:
Now it's easy to tell that Intel will still remain the dominant player in the market, retaining a 90-95% market share lead over AMD but Intel's Ex-CEO, Brian Krzanich, stated that his company wouldn't want AMD capturing 15-20% server market share. In fact, at the pace at which AMD is gaining their server market share, 15% doesn't really feel like a far cry from now.
[...] Looking at the market penetration rate, Intel's Purley platform has been adopted by 60% users in the server space and is expected to reach 65% in the coming year. On the other hand, AMD's EPYC Naples platform has been adopted by 70% and considering that AMD is keeping socket longevity intact with Rome, we can see the adoption rate further expanding after 7nm chips launch.
Previously: AMD Misses Q1 Earnings Target; Withdraws from High-Density Server Market
AMD Ratcheting Up the Pressure on Intel
More on AMD's Licensing of Epyc Server Chips to Chinese Companies
AMD's server marketshare hits 1% for the first time in 4 years
Intel has teased* plans to return to the discrete graphics market in 2020. Now, some of those plans have leaked. Intel's Xe branded GPUs will apparently use an architecture capable of scaling to "any number" of GPUs that are connected by a multi-chip module (MCM). The "e" in Xe is meant to represent the number of GPU dies, with one of the first products being called X2/X2:
Developers won't need to worry about optimizing their code for multi-GPU, the OneAPI will take care of all that. This will also allow the company to beat the foundry's usual lithographic limit of dies that is currently in the range of ~800mm2. Why have one 800mm2 die when you can have two 600mm2 dies (the lower the size of the die, the higher the yield) or four 400mm2 ones? Armed with One API and the Xe macroarchitecture Intel plans to ramp all the way up to Octa GPUs by 2024. From this roadmap, it seems like the first Xe class of GPUs will be X2.
The tentative timeline for the first X2 class of GPUs was also revealed: June 31st, 2020. This will be followed by the X4 class sometime in 2021. It looks like Intel plans to add two more cores [dies] every year so we should have the X8 class by 2024. Assuming Intel has the scaling solution down pat, it should actually be very easy to scale these up. The only concern here would be the packaging yield – which Intel should be more than capable of handling and binning should take care of any wastage issues quite easily. Neither NVIDIA nor AMD have yet gone down the MCM path and if Intel can truly deliver on this design then the sky's the limit.
AMD has made extensive use of MCMs in its Zen CPUs, but will reportedly not use an MCM-based design for its upcoming Navi GPUs. Nvidia has published research into MCM GPUs but has yet to introduce products using such a design.
Intel will use an MCM for its upcoming 48-core "Cascade Lake" Xeon CPUs. They are also planning on using "chiplets" in other CPUs and mixing big and small CPU cores and/or cores made on different process nodes.
*Previously: Intel Planning a Return to the Discrete GPU Market, Nvidia CEO Responds
Intel Discrete GPU Planned to be Released in 2020
Intel Announces "Sunny Cove", Gen11 Graphics, Discrete Graphics Brand Name, 3D Packaging, and More