
SoylentNews is people

posted by martyb on Sunday October 17 2021, @03:02AM
from the core-elation? dept.

Intel published a developer guide confirming details of its upcoming Alder Lake processors.

Desktop "Alder Lake-S" processors will include up to 8 "Golden Cove" performance cores (P-cores), 8 "Gracemont" (Atom) efficiency cores (E-cores), and 32 graphics execution units (Gen 12.2 EUs). A smaller die will include only up to 6 P-cores and no E-cores, to be used in lower-end products such as a 6-core Intel Core i5-12400 or a quad-core i3.

Mobile "Alder Lake-P" processors will include up to 6 P-cores, 8 E-cores, and 96 graphics EUs. A smaller "ultra mobile" die will include up to 2 P-cores and 8 E-cores.

AVX-512 is physically present on Golden Cove cores, but disabled in Alder Lake.

The guide mainly focuses on software implementations for hybrid CPUs. It describes three levels of optimization for Alder Lake: no optimization, a "Good Scenario", and the "Best Scenario". According to the document, an unoptimized application will still have its workloads distributed across the hybrid cores, since Thread Director handles that anyway, but some threads may land on the wrong type of core if the scheduling algorithm does not recognize the task.

In the "Good Scenario," Intel assumes that the application will be aware of the hybrid architecture. The primary tasks should target Performance cores, whereas non-essential and background threads with lower priority should target Efficient cores.

The "Best Scenario" goes into further detail about which workloads specifically should target Efficient cores: shader compilation, audio mixing, asset streaming, decompression, and any other non-critical work.
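As a rough illustration of the "Best Scenario" guidance, the sketch below tags a work item with a preferred core type. The category names come from the list above; the `CoreType` enum and the `preferred_core` helper are hypothetical names invented for this example and are not part of Intel's guide or any real API.

```python
from enum import Enum


class CoreType(Enum):
    PERFORMANCE = "P-core"
    EFFICIENT = "E-core"


# Workload categories the guide lists as E-core candidates.
E_CORE_WORKLOADS = {
    "shader compilation",
    "audio mixing",
    "asset streaming",
    "decompression",
}


def preferred_core(workload: str, critical: bool = True) -> CoreType:
    """Return the core type a workload should target (hypothetical helper)."""
    name = workload.strip().lower()
    # Listed background categories and any other non-critical work go to E-cores.
    if name in E_CORE_WORKLOADS or not critical:
        return CoreType.EFFICIENT
    # Everything else is treated as primary work for P-cores.
    return CoreType.PERFORMANCE
```

In a real application the result would feed whatever thread priority or affinity mechanism the operating system offers; that step is deliberately left out here.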

Intel's Thread Director combines a microcontroller with software-based scheduling:

Intel's Thread Director controller puts an embedded microcontroller inside the processor such that it can monitor what each thread is doing and what it needs out of its performance metrics. It will look at the ratio of loads, stores, branches, average memory access times, patterns, and types of instructions. It then provides suggested hints back to the Windows 11 OS scheduler about what the thread is doing, whether it is important or not, and it is up to the OS scheduler to combine that with other information about the system as to where that thread should go. Ultimately the OS is both topologically aware and now workload aware to a much higher degree.

Inside the microcontroller as part of Thread Director, it monitors which instructions are power hungry, such as AVX-VNNI (for machine learning) or other AVX2 commands that often draw high power, and puts a big flag on those for the OS for prioritization. It also looks at other threads in the system and if a thread needs to be demoted, either due to not having enough free P-cores or for power/thermal reasons, it will give hints to the OS as to which thread is best to move. Intel states that it can profile a thread in as little as 30 microseconds, whereas a traditional OS scheduler may take 100s of milliseconds to reach the same conclusion (or the wrong one).

On top of this, Intel says that Thread Director can also optimize for frequency. If a thread is limited in a way other than frequency, it can detect this and reduce frequency, voltage, and power. This will help the mobile processors, and when asked, Intel stated that it can now change frequency in microseconds rather than milliseconds.
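The split described above, where the hardware supplies hints and the OS makes the final placement, can be pictured with a toy model. Everything in this sketch is an assumption made for illustration: the `ThreadHint` record, its numeric `demand` field, and the `place_threads` function do not correspond to any real Thread Director or Windows interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadHint:
    name: str
    demand: int  # hypothetical hint: higher means the thread benefits more from a P-core


def place_threads(hints: List[ThreadHint], free_p_cores: int) -> Dict[str, str]:
    """Toy OS-side decision: hardware supplies hints, the OS picks the cores."""
    # Rank threads by hinted demand; sorted() is stable, so ties keep input order.
    ranked = sorted(hints, key=lambda h: h.demand, reverse=True)
    placement = {}
    for index, hint in enumerate(ranked):
        # The most demanding threads get the available P-cores; the rest
        # are demoted to E-cores.
        placement[hint.name] = "P-core" if index < free_p_cores else "E-core"
    return placement
```

A real scheduler would also weigh power, thermals, and priorities; this only shows why a per-thread hint makes the demotion choice easier when P-cores run short.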

[...] On the question of Linux, Intel only went as far as to say that Windows 11 was the priority, and that they're working on upstreaming a variety of features in the Linux kernel but it will take time. An Intel spokesperson said more details would come closer to product launch; however, these things will take a while, perhaps months or years, to reach feature parity with Windows 11.

See also: Intel 12th gen Alder Lake-P and Alder Lake-M mobile SKUs to enter production between Q4 2021 and Q1 2022; Up to 14 cores, Xe GT3, PCIe Gen5, and DDR5 on the anvil
Linux 5.16 To Add Intel Encrypted PXP, Alder Lake S Declared Stable & Ready
Alder Lake Support Added To Intel's TCC Driver In Linux 5.15
Scheduler Changes For Linux 5.15 - Still No Sign Of Any Intel Thread Director Optimizations

Previously: Windows 11 Bashes Some AMD Procs; Boosts Some Intel Core i7 Alder Lake



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Informative) by takyon on Sunday October 17 2021, @03:58PM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday October 17 2021, @03:58PM (#1187739) Journal

    is it possible to start each task/thread on the little cores and only escalate/move to the heavy cores later? or is a task/thread bound to the core it was started on?

    They can migrate, quickly.

    Actually, the AVX situation is probably due to the need for parity between the two core types, so that any thread can run on any core:

    https://en.wikipedia.org/wiki/Alder_Lake_(microprocessor)#CPU [wikipedia.org]

    Golden Cove high-performance CPU cores (P-core)
    * AVX-512 (including FP16), while physically present in the die, is disabled to match E-core.

    Gracemont high-efficiency CPU cores (E-core)
    * AVX2, FMA and AVX-VNNI to catch up with P-core.
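One way to see which of these extensions a given machine actually exposes is to read the feature flags Linux reports. The sketch below parses `/proc/cpuinfo`-style text; the flag spellings (`avx2`, `fma`, `avx_vnni`, `avx512f`) are the names Linux is assumed to use, and both function names are made up for this example.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # The first "flags" line is enough; this sketch assumes all cores match.
        if key.strip() == "flags":
            return set(value.split())
    return set()


def vector_support(flags: set) -> dict:
    """Report which of the vector extensions discussed above are advertised."""
    return {
        "avx2": "avx2" in flags,
        "fma": "fma" in flags,
        "avx_vnni": "avx_vnni" in flags,
        "avx512f": "avx512f" in flags,
    }
```

On a Linux x86 system this could be called as `vector_support(parse_cpu_flags(open("/proc/cpuinfo").read()))`. If AVX-512 is disabled as described, `avx512f` would be expected to come back `False` on every core.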

    x86 still has a long way to go before it could possibly match ARM efficiency, and the top Alder Lake is rumored to use a lot of energy if allowed to:

    Intel Core i9-12900K tested: toasty 93C under load, 250W of power used [tweaktown.com]

    The 3.9 GHz boost clock for Atom cores, while about 1.4 GHz lower than the performance cores, is higher than any other Intel Atom product ever, AFAIK.

    What is true is that the Atom cores have a performance-per-watt advantage, as well as a performance-per-area advantage. Intel can put 4 Atom cores in the place of 1 Golden Cove core.

    The next generation, Raptor Lake, will literally double down on this strategy by keeping 8 big cores, and doubling to 16 small cores. It's rumored that Intel will quadruple down with "Arrow Lake-S" [digitaltrends.com] (desktop), again keeping 8 big cores, but doubling again to 32 small cores, for a total of 40 cores, 48 threads.
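The core and thread totals follow from simple arithmetic, assuming each big core runs two threads and each small core runs one. That assumption is what makes 8 + 32 cores come out to 48 threads. The function name is just for this sketch.

```python
def core_and_thread_count(p_cores, e_cores, threads_per_p=2, threads_per_e=1):
    """Return (total cores, total threads) for a hybrid configuration."""
    cores = p_cores + e_cores
    threads = p_cores * threads_per_p + e_cores * threads_per_e
    return cores, threads


print(core_and_thread_count(8, 8))   # Alder Lake-S: (16, 24)
print(core_and_thread_count(8, 16))  # Raptor Lake: (24, 32)
print(core_and_thread_count(8, 32))  # rumored Arrow Lake-S: (40, 48)
```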

    Google is not the only major ARM user, just look at Apple [notebookcheck.net].

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]