SoylentNews is people

posted by janrinok on Friday August 17 2018, @09:10PM
from the doing-more-with-less dept.

Arm Unveils Client CPU Performance Roadmap Through 2020 - Taking Intel Head On

Today's roadmap now publicly discloses the codenames of the next two generations of CPU cores following the A76 – Deimos and Hercules. Both future cores are based on the new A76 micro-architecture and will introduce respective evolutionary refinements and incremental updates for the Austin cores.

The A76 is a 2018 product, and we should hear more about the first commercial 7nm devices towards the end of the year and in the coming months. Deimos is its 2019 successor, aimed at more widespread 7nm adoption. Hercules is said to be the next iteration of the microarchitecture, for 2020 products and the first 5nm implementations. This is as far into the future as Arm is willing to project in today's disclosure, as the Sophia team is working on the next big microarchitecture push, which I suspect will be the successor to Hercules in 2021.

Part of today's announcement is Arm's reiteration of the performance and power goals of the A76 against competing platforms from Intel. The metric used today was the performance of a SPECint2006 Speed run under Linux, compiled with GCC 7. The power figures represent the whole SoC "TDP", meaning CPU, interconnect and memory controllers: essentially the active platform power, much as we have been representing smartphone power in recent mobile deep-dive articles.

Here a Cortex A76 based system running at up to 3GHz is said to match the single-thread performance of an Intel Core i5-7300U running at its maximum 3.5GHz turbo operating speed, all while doing it within a TDP of less than 5W, versus "15W" for the Intel system. I'm not too happy with the power presentation done here by Arm as we kind of have an apples-and-oranges comparison; the Arm estimates here are meant to represent actual power consumption under the single-threaded SPEC workload while the Intel figures are the official TDP figures of the SKU – which obviously don't directly apply to this scenario.

Also at TechCrunch.

See also: Arm Maps Out Attack on Intel Core i5
ARM's First Client PC Roadmap Makes Bold Claims, Doesn't Back Them Up
ARM says its next processors will outperform Intel laptop chips

Related: ARM Based Laptop DIY Kit Ready to Hit the Shops
First ARM Snapdragon-Based Windows 10 S Systems Announced
Laptop and Phone Convergence at CES
Snapdragon 1000 ARM SoC Could Compete With Low-Power Intel Chips in Laptops


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 1) by anubi (2828) on Saturday August 18 2018, @07:27AM (#723059)

    After studying the intent behind the Parallax Propeller, I am surprised the big boys by now aren't using machines with at least a thousand cores.

    Each one running a single process. No multitasking. Context switching eats up a lot of time. Spawn off another process? NEW another core. Instantiate and delete processes very similar to memory management in C++.

    All this task switching consumes time. The optimization is in getting as many cores as possible running in parallel.
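
    The trade-off described above can be put into rough numbers. What follows is only a back-of-the-envelope sketch: the function names, the round-robin model and every figure (task count, quantum, switch cost) are assumptions made up for illustration, not measurements of any real machine.

```python
def time_shared_us(num_tasks, work_us, cores, quantum_us, switch_us):
    """Wall time (microseconds) when tasks time-slice on a few cores.

    Hypothetical model: tasks are spread evenly over the cores, and every
    quantum of useful work is followed by one context switch.
    """
    tasks_per_core = -(-num_tasks // cores)      # ceiling division
    if tasks_per_core <= 1:
        return work_us                           # nothing to switch between
    quanta_per_task = -(-work_us // quantum_us)  # ceiling division
    useful = tasks_per_core * work_us
    overhead = tasks_per_core * quanta_per_task * switch_us
    return useful + overhead


def time_dedicated_us(work_us):
    """Wall time when every task owns its own core: no switching at all."""
    return work_us


# Made-up workload: 1000 tasks of 100 ms each, 10 ms quantum, 10 us per switch.
shared = time_shared_us(1000, 100_000, cores=4, quantum_us=10_000, switch_us=10)
dedicated = time_dedicated_us(100_000)
print(shared, dedicated)
```

    Splitting the result into useful work and switch overhead shows which of the two dominates for a given set of inputs. How much switching really costs depends heavily on the quantum and on cache effects, which this model ignores.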

    I think Chip Gracey of Parallax has thunk up one helluva architecture in his Propeller chip. I find it to be a great programmable I/O processor... I can program it to whatever protocol I want: serial, I2C, SPI, ModBus, DMX, TCP, whatever! Even serial VGA ready to send to a monitor. Then change it later if I need to.

    I think I see the potential in his architecture, even though I do not understand the nitty-gritty of it yet.

    --
    "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
  • (Score: 1, Insightful) by Anonymous Coward on Saturday August 18 2018, @09:39AM (#723075)

    "All this task switching consumes time. The optimization is in getting as many cores as possible running in parallel."

    Memory/IO quickly becomes the bottleneck, no?
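
    A crude way to see the point is to cap the total memory traffic at whatever the shared memory system can deliver. The sketch below assumes every core wants the same bandwidth and that the shared bandwidth is one fixed number. Both are simplifications, and the figures are invented for illustration.

```python
def served_bandwidth(cores, per_core_demand_gbs, shared_bandwidth_gbs):
    """Return (aggregate GB/s served, fraction of each core's demand met).

    Hypothetical model: demand scales linearly with core count, while the
    shared memory system delivers at most shared_bandwidth_gbs in total.
    """
    demand = cores * per_core_demand_gbs
    served = min(demand, shared_bandwidth_gbs)
    return served, served / demand


def saturation_cores(per_core_demand_gbs, shared_bandwidth_gbs):
    """Largest core count that the shared bandwidth can still feed fully."""
    return shared_bandwidth_gbs // per_core_demand_gbs


# Invented figures: each core wants 1 GB/s, the memory system offers 50 GB/s.
for n in (10, 50, 1000):
    print(n, served_bandwidth(n, 1, 50))
print(saturation_cores(1, 50))
```

    Past the saturation point every extra core just gets a smaller slice of the same bandwidth, unless it can work mostly out of memory local to the core.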

    • (Score: 1) by anubi (2828) on Saturday August 18 2018, @10:58AM (#723090)

      Give each core some local memory... and some multiport shared memory. The multiport memory is for I/O, while the program and local variables live in local memory. A Harvard-like architecture.

      ( I am obviously not a professional chip designer, but when I saw Chip Gracey's design of the Propeller chip, I was pretty impressed. I'd love to know more about that chip. It's not a lack of information, as I actually have a rather nice book on the chip, published by McGraw-Hill from Parallax; it's a sheer lack of time for me to sit down with a few chips and code up a few thingies. What I would really like is to understand how to change the VGA driver to take I2C instead of serial TX/RX, so I can put it on my I2C line along with all my other interfaces. I'll just pick an unused address and use that, and write to it in a similar manner as I presently write to LCD displays. Then use the other cogs to emulate yet more UARTs, so that things that insist on RXD/TXD can be reached through yet more I2C addresses. I envision one emulating 4 UARTs, with one pre-assigned for VGA duty, answering to four consecutive I2C addresses. )
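
      The "four consecutive I2C addresses" idea boils down to a small address decoder. Below is a Python simulation of that dispatch logic only; it is not Propeller code and does not talk to real hardware. The base address 0x40, the class name and the choice of channel 0 for VGA are all hypothetical.

```python
BASE_ADDRESS = 0x40   # hypothetical unused 7-bit I2C address
NUM_CHANNELS = 4      # four consecutive addresses, one per emulated port
VGA_CHANNEL = 0       # assumption: the first address is the VGA text channel


class BridgeSim:
    """Simulated slave that answers to four consecutive I2C addresses."""

    def __init__(self, base=BASE_ADDRESS):
        self.base = base
        # One outgoing byte buffer per channel (VGA text or emulated UART).
        self.buffers = {ch: bytearray() for ch in range(NUM_CHANNELS)}

    def channel_for(self, address):
        """Map a 7-bit address to a channel number, or None if not ours."""
        offset = address - self.base
        if 0 <= offset < NUM_CHANNELS:
            return offset
        return None

    def write(self, address, payload):
        """Queue payload on the addressed channel; False means 'no ACK'."""
        channel = self.channel_for(address)
        if channel is None:
            return False
        self.buffers[channel].extend(payload)
        return True


bridge = BridgeSim()
bridge.write(0x40, b"hello VGA")   # lands on the VGA text channel
bridge.write(0x42, b"AT\r\n")      # lands on the third emulated port
print(bridge.channel_for(0x43), bridge.write(0x44, b"x"))
```

      On a real chip each buffer would be drained by whichever cog drives that port. The simulation only shows how one device can sort incoming writes by address.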

      --
      "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]