from the core-elation? dept.
Intel published a developer guide confirming details of its upcoming Alder Lake processors.
Desktop "Alder Lake-S" processors will include up to 8 "Golden Cove" performance cores (P-cores), 8 "Gracemont" (Atom) efficiency cores (E-cores), and 32 graphics execution units (Gen 12.2 EUs). A smaller die will include only up to 6 P-cores and no E-cores, to be used in lower-end products such as a 6-core Intel Core i5-12400 or a quad-core i3.
Mobile "Alder Lake-P" processors will include up to 6 P-cores, 8 E-cores, and 96 graphics EUs. A smaller "ultra mobile" die will include up to 2 P-cores and 8 E-cores.
AVX-512 is physically present on Golden Cove cores, but disabled in Alder Lake.
The guide mainly focuses on software implementations for hybrid CPUs. It describes three levels of optimization for Alder Lake: no optimization, a "Good Scenario", and a "Best Scenario". According to the document, an unoptimized application will still have its workloads distributed across the hybrid cores, since Thread Director should handle that anyway, but some tasks may land on the wrong type of core if the scheduling algorithm does not recognize them.
In the "Good Scenario," Intel assumes that the application will be aware of the hybrid architecture. The primary tasks should target Performance cores, whereas non-essential and background threads with lower priority should target Efficient cores.
The "Best Scenario" goes into further detail about which workloads specifically should target Efficient cores: shader compilation, audio mixing, asset streaming, decompression, and any other non-critical work.
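As a rough illustration of the "Best Scenario" split, here is a minimal Python sketch of how a hybrid-aware application might label its own work. The names `CoreType`, `E_CORE_WORKLOADS`, and `preferred_core_type` are hypothetical and do not come from Intel's guide; only the workload categories are taken from the list above.

```python
# Hypothetical sketch: the names here are illustrative, not from Intel's guide.
from enum import Enum


class CoreType(Enum):
    PERFORMANCE = "P-core"
    EFFICIENT = "E-core"


# Workload categories the "Best Scenario" names as candidates for E-cores.
E_CORE_WORKLOADS = {
    "shader_compilation",
    "audio_mixing",
    "asset_streaming",
    "decompression",
}


def preferred_core_type(workload: str, critical: bool = True) -> CoreType:
    """Return the core type a hybrid-aware application might ask for."""
    # Listed background categories and any non-critical work go to E-cores.
    if workload in E_CORE_WORKLOADS or not critical:
        return CoreType.EFFICIENT
    # Everything else is treated as primary work for P-cores.
    return CoreType.PERFORMANCE


if __name__ == "__main__":
    jobs = [("render_main", True), ("audio_mixing", True), ("telemetry", False)]
    for name, critical in jobs:
        print(f"{name}: {preferred_core_type(name, critical).value}")
```

The sketch only expresses a preference. How that preference is passed to the operating system (thread priority, quality-of-service hints, or affinity) is platform-specific and is not shown here.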
Intel's Thread Director combines a microcontroller with software-based scheduling:
Intel's Thread Director controller puts an embedded microcontroller inside the processor such that it can monitor what each thread is doing and what it needs out of its performance metrics. It will look at the ratio of loads, stores, branches, average memory access times, patterns, and types of instructions. It then provides suggested hints back to the Windows 11 OS scheduler about what the thread is doing, whether it is important or not, and it is up to the OS scheduler to combine that with other information about the system as to where that thread should go. Ultimately the OS is both topologically aware and now workload aware to a much higher degree.
The microcontroller inside Thread Director monitors which instructions are power hungry, such as AVX-VNNI (for machine learning) or other AVX2 commands that often draw high power, and puts a big flag on those for the OS for prioritization. It also looks at other threads in the system and if a thread needs to be demoted, either due to not having enough free P-cores or for power/thermal reasons, it will give hints to the OS as to which thread is best to move. Intel states that it can profile a thread in as little as 30 microseconds, whereas a traditional OS scheduler may take 100s of milliseconds to make the same conclusion (or the wrong one).
On top of this, Intel says that Thread Director can also optimize for frequency. If a thread is limited in a way other than frequency, it can detect this and reduce frequency, voltage, and power. This will help the mobile processors, and when asked Intel stated that it can change frequency now in microseconds rather than milliseconds.
[...] On the question of Linux, Intel only went as far as to say that Windows 11 was the priority, and that they're working on upstreaming a variety of features in the Linux kernel but it will take time. An Intel spokesperson said more details would come closer to product launch, however these things will take a while, perhaps months and years, to get to a state that could be feature-parity equivalent with Windows 11.
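The division of labor in the excerpt above (hardware supplies hints, the OS scheduler makes the final placement) can be illustrated with a toy model. This is not Intel's or Microsoft's algorithm. The `ThreadHint` fields and the ranking rule in `place_threads` are assumptions made up for this sketch.

```python
# Toy model only: hint fields and ranking rule are assumptions, not Intel's design.
from dataclasses import dataclass


@dataclass
class ThreadHint:
    """Hypothetical per-thread hint handed from hardware to the OS."""
    name: str
    vector_heavy: bool  # flagged for power-hungry instructions (e.g. AVX2, AVX-VNNI)
    os_priority: int    # the OS's own view of importance; higher is more important


def place_threads(hints, free_p_cores):
    """Rank threads by hint and priority, then fill P-cores first."""
    ranked = sorted(
        hints,
        key=lambda h: (h.vector_heavy, h.os_priority),
        reverse=True,
    )
    placement = {}
    for index, hint in enumerate(ranked):
        # Threads that do not fit on the free P-cores are demoted to E-cores.
        placement[hint.name] = "P" if index < free_p_cores else "E"
    return placement


if __name__ == "__main__":
    hints = [
        ThreadHint("game", vector_heavy=True, os_priority=5),
        ThreadHint("indexer", vector_heavy=False, os_priority=1),
        ThreadHint("ui", vector_heavy=False, os_priority=8),
    ]
    print(place_threads(hints, free_p_cores=2))
```

With two free P-cores, the flagged thread and the highest-priority remaining thread stay on P-cores and the background indexer is demoted. That mirrors the "which thread is best to move" hint described in the excerpt, in a much simplified form.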
Upcoming Intel processors will support scalable AVX-512 instructions, which one former Intel employee calls a "hidden gem":
Imagine if we could use vector processing on something other than just floating point problems. Today, GPUs and CPUs work tirelessly to accelerate algorithms based on floating point (FP) numbers. Algorithms can definitely benefit from basing their mathematics on bits and integers (bytes, words) if we could just accelerate them too. FPGAs can do this, but the hardware and software costs remain very high. GPUs aren't designed to operate on non-FP data. Intel AVX introduced some support, and now Intel AVX-512 is bringing a great deal of flexibility to processors. I will share why I'm convinced that the "AVX512VL" capability in particular is a hidden gem that will let AVX-512 be much more useful for compilers and developers alike.
Fortunately for software developers, Intel has done a poor job keeping the "secret" that AVX-512 is coming to Intel's recently announced Xeon Scalable processor line very soon. Amazon Web Services has publicly touted AVX-512 on Skylake as coming soon!
It is timely to examine the new AVX-512 capabilities and their ability to impact beyond the more regular HPC needs for floating point only workloads. The hidden gem in all this, which enables shifting to AVX-512 more easily, is the "VL" (vector length) extensions which allow AVX-512 instructions to behave like SSE or AVX/AVX2 instructions when that suits us. This is a clever and powerful addition to enable its adoption in a wider assortment of software more quickly. The VL extensions mean that programmers (and compilers) do not need to shift immediately from 256-bits (AVX/AVX2) to 512-bits to use the new bit/byte/word manipulations. This transitional benefit is useful not only for an interim, but also for applications which find 256-bits more natural (perhaps a small, but important, subset of problems).
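Because AVX-512 support varies by product (it is disabled on Alder Lake, as noted above), software that wants the VL forms has to check for them at run time. The following is a minimal sketch that assumes a Linux system, where `/proc/cpuinfo` lists feature flags such as `avx512f` and `avx512vl`. The helper names `cpu_flags` and `avx512_vl_usable` are made up for this example.

```python
# Sketch assuming Linux /proc/cpuinfo; helper names are hypothetical.
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough for this simple check.
            return set(value.split())
    return set()


def avx512_vl_usable(flags: set) -> bool:
    """True if both the AVX-512 foundation and the VL extension are reported."""
    # VL applies AVX-512 instructions to 128- and 256-bit vectors,
    # so it is only meaningful alongside the foundation subset (F).
    return {"avx512f", "avx512vl"} <= flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or the file is unreadable
    print("AVX-512F + VL reported:", avx512_vl_usable(flags))
```

Other operating systems expose the same information differently, and compiled code would normally query CPUID directly. The flag-parsing approach is used here only because it is easy to read and test.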
Will it be enough to stave off "Epyc"?
AMD: Windows 11 May Cause Performance Dips of Up to 15% on Ryzen CPUs
Wccftech reported a few days ago about known issues appearing after users installed Microsoft's newest Windows 11 operating system: Oracle VirtualBox and Cốc Cốc browser compatibility issues, as well as Intel networking issues. Today, AMD reported that its Windows 11 compatible processors show reduced performance in certain applications after the new OS is installed.
AMD urges users to stick with Windows 10 as a workaround, hotfix coming
[...] Known changes to performance affect areas such as:
- Measured and functional L3 cache latency may increase by ~3X.
- UEFI CPPC2 ("preferred core") may not preferentially schedule threads on a processor's fastest core.
