
posted by Fnord666 on Thursday March 23 2017, @07:23PM
from the big.Little-just-couldn't-decide dept.

ARM will replace the big.LITTLE cluster design with a new one that allows up to 8 CPU cores per cluster, different types of cores within a cluster, and anywhere from one to many (unlimited?) clusters:

The first stage of DynamIQ is a larger cluster paradigm - which means up to eight cores per cluster. But in a twist, there can be a variable core design within a cluster. Those eight cores could be different cores entirely, from different ARM Cortex-A families in different configurations.

Many questions come up here, such as how the cache hierarchy will allow threads to migrate between cores within a cluster (perhaps similar to how threads migrate between clusters on big.LITTLE today), even when cores have different cache arrangements. ARM did not go into that level of detail yet; however, we were told that more information will be provided in the coming months.

Each variable core-configuration cluster will be part of a new fabric, which uses additional power-saving modes and aims to provide much lower latency. The underlying design also allows each core to be controlled independently for voltage and frequency, as well as sleep states. Based on the slide diagrams, various other IP blocks, such as accelerators, should be able to be plugged into this fabric and benefit from that low latency. ARM cited examples such as safety-critical automotive decision-making that can benefit from this.

A tri-cluster smartphone design using 2 high-end cores, 2 mid-level cores, and 4 low-power cores could be replaced by one that uses all three types of core in the same single cluster. The advantage of that approach remains to be seen.
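To make the cluster changes above concrete, here is a minimal Python sketch of a single DynamIQ-style cluster holding mixed core types, each with its own frequency and sleep state. The core labels, frequencies, and eight-core limit check are illustrative assumptions; they do not reflect any actual ARM interface.

from dataclasses import dataclass, field

@dataclass
class Core:
    """One CPU core with independently controlled DVFS and sleep state."""
    kind: str            # e.g. "big", "mid", "little" -- hypothetical labels
    freq_mhz: int        # per-core frequency, as DynamIQ allows
    asleep: bool = False

@dataclass
class Cluster:
    """A DynamIQ-style cluster: up to eight cores, types may be mixed."""
    cores: list = field(default_factory=list)

    def add(self, core: Core) -> None:
        if len(self.cores) >= 8:
            raise ValueError("at most eight cores per cluster")
        self.cores.append(core)

# The 2 + 2 + 4 tri-cluster phone described above, folded into one cluster.
cluster = Cluster()
for _ in range(2):
    cluster.add(Core("big", 2800))       # high-end cores
for _ in range(2):
    cluster.add(Core("mid", 2100))       # mid-level cores
for _ in range(4):
    cluster.add(Core("little", 1500))    # low-power cores

# Per-core control: park one little core and boost one big core.
cluster.cores[-1].asleep = True
cluster.cores[0].freq_mhz = 3000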

More about ARM big.LITTLE.


Original Submission

Related Stories

ARM Cortex-A75, Cortex-A55, and Mali-G72 Announced

ARM has announced two new CPU cores, the Cortex-A75 and Cortex-A55. According to ARM, the A75 increases performance by around 22% over the A73 at the same level of power consumption. It can also scale to use more power per core (1-2 W rather than 0.75 W) which could slightly improve the performance of ARM laptops and tablets.

The smaller core, the Cortex-A55, increases performance by around 18% compared to the Cortex-A53, but also increases power consumption by 3%. Thus, power efficiency is about 14-15% better than the A53.
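As a rough back-of-the-envelope check (ours, not ARM's), performance per watt scales as the performance gain divided by the power increase, which lands on the quoted 14-15%:

# Sanity check of the Cortex-A55 vs. Cortex-A53 efficiency claim (perf per watt).
perf_gain = 1.18     # ~18% more performance than the A53
power_gain = 1.03    # ~3% more power than the A53
print(f"efficiency gain: {perf_gain / power_gain - 1:.1%}")   # -> 14.6%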

ARM's successor to big.LITTLE, DynamIQ, allows for up to 8 cores of any size (which for now means either the A75 or A55) inside of a single cluster. This means that a configuration including 1x Cortex-A75 and 7x Cortex-A55 cores would be possible, or even optimal according to ARM.

ARM also announced its Mali-G72 GPU, an incremental upgrade to the Mali-G71:

ARM says that the Mali-G72 will see a 25 percent boost to energy efficiency compared with the G71, meaning that SoC designers will have more power to play with to boost performance or increase battery life.

Similarly, the G72 offers 20 percent better performance density, meaning that manufacturers can pack more GPU cores into the same die area as before, giving further potential for a performance boost without an increase in cost. Previously ARM was targeting 16 to 20 Mali-G71 cores as the optimum for mobile, and expects to see the number push closer to the 32 shader core maximum supported by the G72 this time around.


Original Submission

Snapdragon 845 Announced

Snapdragon 845 is a newly announced Qualcomm ARM system-on-a-chip (SoC) built on a 10nm "Low Power Plus" process. It is the first SoC to implement ARM's new DynamIQ clustering scheme:

The Snapdragon 845 is a large step in terms of SoC architectures, as it's the first to employ ARM's DynamIQ CPU cluster organization. Quickly explained, DynamIQ enables the various CPU cores within an SoC to be hosted within the same cluster and cache hierarchy, as opposed to having separate discrete clusters with no shared cache between them (with coherency instead happening over an interconnect such as ARM's CCI). This is probably the largest architectural transition to date in modern consumer ARM smartphone SoCs.

[...] The Kryo 385 gold/performance cluster runs at up to 2.8GHz, a 14% frequency increase over the 2.45GHz of the Snapdragon 835's CPU cores. But we also have to remember that, given that the new CPU cores are likely based on the A75, we should expect IPC gains of up to 22-34% depending on the use-case, bringing the overall expected performance improvement to 25-39%. Qualcomm promises a 25-30% increase, so we're not far off from ARM's projections.

The silver/efficiency cluster runs at 1.8GHz, clocked slightly slower than the A53s on the Snapdragon 835; however, the maximum clock of the efficiency cluster is mainly determined by where it intersects the efficiency curve of the performance cluster. Nevertheless, the efficiency cores promise a 15% performance boost over their predecessors.

The Adreno 630 GPU should provide up to 30% better performance than the Snapdragon 835's Adreno 540 at the same level of power consumption. Snapdragon 845 devices can record (encode) 2160p60 10-bit H.265 video, compared to 2160p30 for Snapdragon 835.
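Those CPU figures combine multiplicatively. As a rough, unofficial sanity check, the ~14% clock increase times the 22% IPC estimate lands at the top of the quoted 25-39% range (the excerpt does not spell out how the lower bound was derived):

# Back-of-the-envelope check of the Snapdragon 845 CPU claims quoted above.
freq_ratio = 2.8 / 2.45    # Kryo 385 gold cluster vs. Snapdragon 835 (~14%)
ipc_gain = 1.22            # lower end of the assumed A75-class IPC gain
print(f"clock x IPC -> {freq_ratio * ipc_gain - 1:.0%} overall")   # -> 39%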

Also at The Verge, CNET, TechCrunch, BGR, and 9to5Google.

Previously: Qualcomm's Snapdragon 835 Detailed: 3 Billion Transistors on a 10nm Process


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by Snotnose on Thursday March 23 2017, @07:40PM (4 children)

    by Snotnose (1623) on Thursday March 23 2017, @07:40PM (#483360)

    Their high end chips had an old ARM (7?) for low level stuff, an ARM9 driving the phone, and an ARM11 for apps. All on 1 chunk of silicon. Haven't been there in 7-8 years, dunno what's in their new chips.

    --
    Why shouldn't we judge a book by its cover? It's got the author, title, and a summary of what the book's about.
    • (Score: 2) by FatPhil on Thursday March 23 2017, @07:44PM (2 children)

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Thursday March 23 2017, @07:44PM (#483366) Homepage
      If it was a phone SoC, it probably had up to 7 ARM cores on it. The modem would have had an ARM, the wi-fi would have had an ARM too, ...
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by DannyB on Thursday March 23 2017, @08:24PM (1 child)

        by DannyB (5839) Subscriber Badge on Thursday March 23 2017, @08:24PM (#483380) Journal

        That design does not seem very generalized.

        The workload should be able to shift to different cores if a significant number of ARM cores are destroyed within a Borg vessel.

        --
        People today are educated enough to repeat what they are taught but not to question what they are taught.
        • (Score: 4, Informative) by jmorris on Friday March 24 2017, @01:27AM

          by jmorris (4844) on Friday March 24 2017, @01:27AM (#483466)

          The radio is always kept isolated to prevent the insecure Android side from possibly being able to get at the physical interface of the radio. Putting in an entirely separate CPU, RAM, and flash, often with only a serial link to the main CPU, is secure, especially since the radio processor tends to only boot signed images. Some get cheap and use newer ARM CPUs' ability to partition off a really secure section of memory and run in a super better-than-ring-0 mode, but I bet the FCC doesn't like it and makes that known. The Wi-Fi is the same way: dedicated signed firmware on a dedicated CPU, usually connected by an internal USB link. Because those radios are basically software-defined radios that are physically capable of all sorts of fun things... IF we could get our hands on them. The FCC ain't having none of that.

          It really is crazy how many processing units a phone can stuff in. My old crappy Tegra 3 based phone has four fast ARM cores, one slow ARM core, and one ARM "AVP" core as a co-processor (it is actually the boot processor: it does the secure boot stuff and starts the main one, then idles with its own dedicated 256K block of static RAM to help, along with yet another undocumented specialty processing unit, play media files and do sleep/wake, etc.). Then there is a crypto processor that NVidia won't document in the tech manual, a couple of GPU cores, and the radio is on an entirely different chip made by Intel with dedicated RAM/flash. Same for BT, GPS, and NFC: they each have a small CPU in them, type unknown, and there is even a little one in the SIM card. It truly is amazing the computing plenty we take for granted.

    • (Score: 2) by Hairyfeet on Friday March 24 2017, @07:34AM

      by Hairyfeet (75) <{bassbeast1968} {at} {gmail.com}> on Friday March 24 2017, @07:34AM (#483555) Journal

      I think they are still making those, they are used in a couple of the $150-$200 BLU phones I've been looking at as well as some Alcatel One Touch models. IIRC the new ones are octocores and have 2 of the ARM 7s for low power tasks like checking email with the screen off, 2 of the ARM 9 for phone tasks, and 4 of the ARM 11s for the apps. Pretty impressive if you ask me.

      --
      ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.
  • (Score: 0) by Anonymous Coward on Thursday March 23 2017, @07:41PM (6 children)

    by Anonymous Coward on Thursday March 23 2017, @07:41PM (#483363)

    General-purpose programming languages cannot keep up with these special-purpose designs; have you read the latest C++/C thread model? It makes no sense! It's as understandable as Quantum Mechanics.

    Add to this increasing complexity the looming specter of proprietary computing "fabrics", and there's just no room for anything but massive, monopolistic, top-down, dictatorial, walled-garden, magical corporate overlording.

    Personal Computing is dead.

    • (Score: 3, Funny) by LoRdTAW on Thursday March 23 2017, @07:53PM

      by LoRdTAW (3755) on Thursday March 23 2017, @07:53PM (#483370) Journal

      You gotta learn to app! Only apps matter. You also need to let go of control and also IoT more.

    • (Score: 2, Disagree) by DannyB on Thursday March 23 2017, @08:55PM (3 children)

      by DannyB (5839) Subscriber Badge on Thursday March 23 2017, @08:55PM (#483395) Journal

      Don't think about language first. Think in terms of Map and Reduce frameworks. Look for problems that can be decomposed into much smaller pieces, ideally the class of problems that are "embarrassingly parallel". For example, computing each pixel of an image of the Mandelbrot set, or any other pixel computation where each pixel is computed independently of its neighbors (e.g., 3D rendering, many Photoshop filters, maybe video encode/decode). Or, if the setup/teardown overhead is too high for computing a single pixel, divide the image into blocks of pixels that are each computed iteratively: for example, break a 4096x4096 image into 256x256 pixel blocks and treat each block as a fundamental problem element (a sketch of this appears below).
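      A minimal sketch of that block decomposition, using Python's standard multiprocessing pool; the image size, block size, and iteration count are arbitrary choices for illustration:

from multiprocessing import Pool

WIDTH = HEIGHT = 1024      # image size; arbitrary for this sketch
BLOCK = 256                # each block is an independent unit of work
MAX_ITER = 100

def escape(cx, cy):
    """Standard Mandelbrot escape-time count for one pixel."""
    x = y = 0.0
    for i in range(MAX_ITER):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2 * x * y + cy
    return MAX_ITER

def render_block(origin):
    """Compute one BLOCK x BLOCK tile; no tile touches another tile's pixels."""
    ox, oy = origin
    tile = []
    for py in range(oy, oy + BLOCK):
        row = []
        for px in range(ox, ox + BLOCK):
            cx = (px - WIDTH / 2) * 3.0 / WIDTH
            cy = (py - HEIGHT / 2) * 3.0 / HEIGHT
            row.append(escape(cx, cy))
        tile.append(row)
    return origin, tile

if __name__ == "__main__":
    origins = [(x, y) for y in range(0, HEIGHT, BLOCK)
                      for x in range(0, WIDTH, BLOCK)]
    with Pool() as pool:               # one worker per available CPU by default
        tiles = pool.map(render_block, origins)
    print(f"rendered {len(tiles)} independent blocks")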

      Don't look at C / C++. Look at higher-level languages: Clojure, Erlang, and others. The overhead of the runtime substrates for the higher-level languages is not as important as the need to be able to easily scale the problem by simply throwing more CPUs at it. If my high-level language solution is ten times slower than your C++ code, but I can trivially throw more CPUs at my solution to scale up to any level I please, then I have a winner. You shouldn't even be thinking in terms of C++ thread models. Think "Forbidden Planet": that machine is going to provide whatever amount of power your monster needs to achieve its goals.

      The problem needs to be decomposable into small parts, ideally very small, hence "embarrassingly parallel" like many pixel-based problems. As a counterexample, I would offer the problem Spock asked of the Enterprise computer: compute to the last digit the absolute value of Pi. The idea was that more and more "banks" of the computer would work on the problem until capacity was exhausted. I have difficulty imagining how this particular problem could achieve that, since it is not obvious to me how the computation of an infinitely long Pi could be done in parallel.

      --
      People today are educated enough to repeat what they are taught but not to question what they are taught.
      • (Score: 2, Interesting) by Anonymous Coward on Thursday March 23 2017, @09:05PM (1 child)

        by Anonymous Coward on Thursday March 23 2017, @09:05PM (#483399)

        The understanding of the details on which those high-level frameworks are built is slowly being locked away in the walled gardens of giant corporations. The only way to solve problems will be to do so within their set of concepts, because it will be too complex to reverse-engineer just what's going on.

        The user is being pushed to increasingly higher levels of abstraction (as you note), which are attached to reality through carefully guarded industry secrets. The world of computing is ever more magical.

        • (Score: 2) by Scruffy Beard 2 on Friday March 24 2017, @08:46AM

          by Scruffy Beard 2 (6030) on Friday March 24 2017, @08:46AM (#483570)

          I have an idle long-term plan for that: Build an auditable computer from scratch. Would probably take decades though.

          It would involve fuse ROMs programmed through CRC-protected toggle switches, then using those ROMs to build peripherals like keyboards and monitors that you can trust.

          Would involve code correctness proofs as well. I am hoping that as complexity goes up, the formal proofs will greatly reduce debugging time.

          Goes off to start dreaming for reals.

      • (Score: 2) by jmorris on Friday March 24 2017, @01:38AM

        by jmorris (4844) on Friday March 24 2017, @01:38AM (#483470)

        Which is great if your program only needs to run a few times. Otherwise, if it runs ten times slower it requires ten times the electricity and ten times the data center capacity. So many people make that mistake, deploying scripting and other toy/fad/academic languages into production, and only when crunch time comes do they realize that the survival of the company now depends on replacing that hot mess with real code, before the hosting bills from chasing the load bankrupt them or the error messages start chasing off the users who are finally swarming in. Remember the fail whale; don't be Jack. They barely survived the mistake.

    • (Score: 2) by Bot on Friday March 24 2017, @06:35AM

      by Bot (3902) on Friday March 24 2017, @06:35AM (#483543) Journal

      If you think that's bad, consider what systemd will do on that platform. On ODROID (ARM), systemd does not reliably close the network connection before rebooting, so the ssh client is left waiting half of the time.

      The good news is that if you are into genetic algorithms you can write JavaScript on Chrome on systemd on DynamIQ and let the platform itself evolve the code. Skynet has to start somewhere.

      --
      Account abandoned.
  • (Score: 1, Redundant) by FatPhil on Thursday March 23 2017, @07:57PM (3 children)

    by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Thursday March 23 2017, @07:57PM (#483372) Homepage
    My understanding, from people who do little things like maintain (as in the real official maintainers of) the linux kernel for arm-based chip families, and power-management subsystems, is that big.LITTLE isn't even fully working yet, despite it being several years old, and that the more complicated designs (such as min.med.max, as covered here a couple of weeks ago) have no power-saving benefit over a big.LITTLE design.

    Previous slideware not delivering what was promised, therefore introduce new slideware with even bigger promises?

    ARM ain't what they used to be a half a decade or so ago, they're turning into proper bullshit artists.
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2) by bob_super on Thursday March 23 2017, @08:52PM (2 children)

      by bob_super (1357) on Thursday March 23 2017, @08:52PM (#483389)

      If you do industrial embedded designs, you customize your tasks and scheduler to properly use the right cores. BIG.little and similar schemes are great.

      If you do general computing, good luck figuring out the universal rule for Extremely Varied Apps With Greedy Coders ("I'll run the clock on the A72 so my user doesn't feel any lag").
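      For the embedded case, one concrete (if simplified) way to steer work onto the right cores on Linux is the standard CPU-affinity interface; in this sketch the mapping of core IDs to big and LITTLE cores is an assumption that varies by SoC:

import os

# Hypothetical numbering: cores 0-3 are LITTLE, cores 4-7 are big.
LITTLE_CORES = {0, 1, 2, 3}
BIG_CORES = {4, 5, 6, 7}

def pin_to(cores):
    """Restrict the calling process to the given CPU set (Linux only)."""
    os.sched_setaffinity(0, cores)   # pid 0 means "this process"

pin_to(LITTLE_CORES)    # background housekeeping stays on efficiency cores
# ... low-priority work ...
pin_to(BIG_CORES)       # latency-sensitive burst moves to performance cores
print("current affinity:", os.sched_getaffinity(0))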

      • (Score: 2) by FatPhil on Friday March 24 2017, @08:02AM (1 child)

        by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Friday March 24 2017, @08:02AM (#483563) Homepage
        Nit - it's big.LITTLE, not BIG.little. And yes, I know how you're supposed to make use of it in theory, but my point is that it's still not delivering all of the benefits that were promised. "Customising the scheduler" isn't something that you can just do on a lazy friday afternoon, teams of dozens of engineers working for years still haven't got it right. I know, I've worked with them. I have about 3 years of being a linux kernel developer, in an ARM SoC environment, on my CV.
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  • (Score: 1, Funny) by Anonymous Coward on Thursday March 23 2017, @08:00PM (3 children)

    by Anonymous Coward on Thursday March 23 2017, @08:00PM (#483373)

    Imagine a Beowulf ARM cluster.

    • (Score: 3, Funny) by DannyB on Thursday March 23 2017, @08:58PM

      by DannyB (5839) Subscriber Badge on Thursday March 23 2017, @08:58PM (#483396) Journal

      Sir, your imagination is limited.

      Imagine a Beowulf cluster of Beowulf clusters.

      It's Beowulf clusters all the way down. An infinite recursion. The final cluster of that infinite fuster cluck is built on ARM chips.

      --
      People today are educated enough to repeat what they are taught but not to question what they are taught.
    • (Score: 0) by Anonymous Coward on Thursday March 23 2017, @10:50PM

      by Anonymous Coward on Thursday March 23 2017, @10:50PM (#483425)

      Like this [clusterhat.com] or like this [networkworld.com]?

    • (Score: 0) by Anonymous Coward on Friday March 24 2017, @05:15AM

      by Anonymous Coward on Friday March 24 2017, @05:15AM (#483521)

      Dumb jokes never die, they just migrate to other websites.
