posted by cmn32480 on Wednesday March 23 2016, @03:27AM   Printer-friendly
from the hickory-dickory-dock dept.

Intel may finally be abandoning its "Tick-Tock" strategy:

As reported at The Motley Fool, Intel's latest 10-K / annual report filing would seem to suggest that the 'Tick-Tock' strategy of introducing a new lithographic process node in one product cycle (a 'tick') and then an upgraded microarchitecture the next product cycle (a 'tock') is going to fall by the wayside for at least the next two lithographic nodes, to be replaced with a three-element cycle known as 'Process-Architecture-Optimization'.

Intel's Tick-Tock strategy has been the bedrock of its microprocessor dominance over the last decade. In every other year of that tenure, Intel would upgrade its fabrication plants to produce processors with a smaller feature size, improving die area and power consumption and allowing slight optimizations of the microarchitecture; in the years between, it would launch a new set of processors based on a wholly new (sometimes paradigm-shifting) microarchitecture for large performance gains. However, due to the difficulty of implementing a 'tick' as process nodes shrink and grow more complex, and as reported previously with 14nm and the introduction of Kaby Lake, Intel's latest filing would suggest that 10nm will follow a similar pattern to 14nm, introducing a third stage to the cadence.

Year   Process   Name         Type
2016   14nm      Kaby Lake    Optimization
2017   10nm      Cannonlake   Process
2018   10nm      Ice Lake     Architecture
2019   10nm      Tiger Lake   Optimization
2020   7nm       ???          Process

This suggests that 10nm "Cannonlake" chips will be released in 2017, followed by a new 10nm architecture in 2018 (tentatively named "Ice Lake"), an optimization in 2019 (tentatively named "Tiger Lake"), and 7nm chips in 2020. This year's "optimization" will come in the form of "Kaby Lake", which could end up delivering underwhelming improvements, such as slightly higher clock speeds enabled by better yields of the previously-named "Skylake" chips. To be fair, Kaby Lake will supposedly add the following features alongside any CPU performance tweaks:

Kaby Lake will add native USB 3.1 support, whereas Skylake motherboards require a third-party add-on chip in order to provide USB 3.1 ports. It will also feature a new graphics architecture to improve performance in 3D graphics and 4K video playback. Kaby Lake will add native HDCP 2.2 support. Kaby Lake will add full fixed function HEVC Main10/10-bit and VP9 10-bit hardware decoding.

Previously: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed


Original Submission

  • (Score: 2) by bzipitidoo on Wednesday March 23 2016, @03:18PM

    by bzipitidoo (4388) on Wednesday March 23 2016, @03:18PM (#322113) Journal

    Replace CALL and RET with the basic RISC instructions that compose them. They are basically jumps with extra actions (stack manipulations) that aren't always needed or wanted.

    CALL and RET are perfect examples of why having several actions in a single instruction, CISC style, can be a bad idea. For instance, suppose that in calling a subroutine, parameters need to be passed and registers saved. The classic way to do that is a whole bunch of PUSH instructions. A big problem with that is that the stack pointer has to be updated each time PUSH or CALL is used. It would be faster to simply MOV everything needed onto the stack and update the stack pointer once. Jump instead of call.

    Even uglier is that the CALL has to come after the parameters are PUSHed, which puts the return address on top of the stack. Now, to get the parameters, the subroutine has to POP the return address and save it somewhere else, then POP the parameters, and finally PUSH the return address back onto the stack. That last action is solely to accommodate RET. The return address was already known at compile time, so why wait until a CALL instruction to put it on the stack? Why not put it on the stack first?

    So, what to do about all this gross inefficiency? If not pushing the return address first, put the parameters somewhere else: in registers, if there is enough room, or in a separate stack. PUSH, POP, CALL, and RET all use the same stack pointer, so it's a little extra work to set up and maintain a second stack. There are workarounds, of course, but it's best that those problems not exist at all. Just don't use CALL, RET, PUSH, and POP. Why do all this stack maintenance anyway? In case the subroutine is recursive, and the recursion is not tail-end. Otherwise, maintaining a stack for parameters is a waste of CPU cycles.

    Starting Score:    1  point
    Karma-Bonus Modifier   +1  

    Total Score:   2  
  • (Score: 3, Informative) by turgid on Wednesday March 23 2016, @08:27PM

    by turgid (4318) Subscriber Badge on Wednesday March 23 2016, @08:27PM (#322241) Journal

    For instance, suppose that in calling a subroutine, parameters need to be passed and registers saved. The classic way to do that is a whole bunch of PUSH instructions. A big problem with that is that the stack pointer has to be updated each time PUSH or CALL is used. It would be faster to simply MOV everything needed onto the stack, and update the stack pointer once. Jump instead of call.

    There's nothing to stop you doing that, and I seem to remember seeing some (compiler-generated?) code that did precisely that donkey's years ago.

    In fact, the rest of what you mention (the location of the return address on the stack) is merely calling convention as defined by the compiler in use (usually some variant of C or its descendants). Before the world went (mad on) C++ there were a variety of compiled high-level languages about and many of them had their own subroutine calling conventions. Pascal (and its friends) for one, was different.

    I don't think the x86 instruction set is quite as limiting as you think.

  • (Score: 2) by maxwell demon on Thursday March 24 2016, @05:19PM

    by maxwell demon (1608) Subscriber Badge on Thursday March 24 2016, @05:19PM (#322590) Journal

    A big problem with that is that the stack pointer has to be updated each time PUSH or CALL is used.

    Why is that a problem? Do you fear that the stack pointer wears out, or something?

    It would be faster to simply MOV everything needed onto the stack, and update the stack pointer once.

    Nobody stops you from doing that. But I doubt that your method would be faster, as the incrementing/decrementing of the stack pointer can be done in parallel to the memory read/write. Something which is at least harder to do if it is a separate instruction.

    Jump instead of call.

    And how do you encode the return address? Note that pushing the instruction pointer will store the address of the next instruction on the stack, which is the jump. Oh wait, you don't want to push either. So it's instead: store current IP to some register, add the call sequence length to it, store the result in memory, and jump to the subroutine. Sorry, I'm not convinced this is better than a simple CALL.

    Even uglier is that the CALL has to come after the parameters are PUSHed, which puts the return address on top of the stack. Now to get the parameters, the subroutine has to POP the return address and save it somewhere else, then POP the parameters, and finally PUSH the return address back onto the stack.

    Wait, WHAT? Please tell me which compiler targeting x86 does such nonsense, so I can avoid it like the plague. In case you were not aware: The stack is normal memory, which can simply be read and written like any other memory.

    So, what to do about all this gross inefficiency?

    "Doctor, doing that hurts." — "Then don't do it."

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by bzipitidoo on Thursday March 24 2016, @11:10PM

      by bzipitidoo (4388) on Thursday March 24 2016, @11:10PM (#322695) Journal

      Having to update the stack pointer half a dozen times instead of once will stall the pipeline. Those instructions cannot be executed in parallel or out of order.

      > And how do you encode the return address? Note that pushing the instruction pointer will store the address of the next instruction on the stack, which is the jump. Oh wait, you don't want to push either. So it's instead: store current IP to some register, add the call sequence length to it, store the result in memory, and jump to the subroutine.

      That is exactly how you do it. Storing the pre-jump IP can be built into the jump instruction. And why not? Math instructions set flags, why can't jump instructions set return points? The big difference between that sort of jump and a CALL is no stack.

      > Sorry, I'm not convinced this is better than a simple CALL.

      The jump is better. CALL is not simple, that's the problem. CALL does too much. It pushes an address to the stack and updates the stack pointer. Fine, if that will be used sometime later. Waste of time otherwise. And there are many cases in which the return address is not needed. Tail-end recursion is a big one. What sounds better to you, store the same return address 100,000 times because you insisted on using CALL in a tail-end recursive routine, or store it once?

      Of course I am aware that the stack can be set up anywhere in normal memory.

      > Please tell me which compiler targeting x86 does such nonsense

      They don't. Compilers are written to avoid generating specialized instructions, and might never use CALL, RET, PUSH, or POP at all. Instructions that aren't used shouldn't be in the instruction set.

      • (Score: 2) by maxwell demon on Friday March 25 2016, @03:42PM

        by maxwell demon (1608) Subscriber Badge on Friday March 25 2016, @03:42PM (#322930) Journal

        Having to update the stack pointer half a dozen times instead of once will stall the pipeline. Those instructions cannot be executed in parallel or out of order.

        Incrementing a register is a fast operation, which will surely be finished before the memory write is. Indeed, given that the instructions are decoded internally anyway, the processor may even transform the multiple updates into a single update, as it knows the semantics of the push. Think of it as on-the-fly compilation, which always profits from knowing more semantics.

        That is exactly how you do it. Storing the pre-jump IP can be built into the jump instruction. And why not? Math instructions set flags, why can't jump instructions set return points? The big difference between that sort of jump and a CALL is no stack.

        OK, so your issue is not really one instruction doing several things (in your suggested modified jump, it would both store the current value of IP to another register, and store another value to the IP, which really is two separate operations), but the fact that the processor uses a fixed data structure in memory (the stack).

        Tail-end recursion is a big one. What sounds better to you, store the same return address 100,000 times because you insisted on using CALL in a tail-end recursive routine, or store it once?

        If the compiler does not know/detect that you do tail recursion, it will have to emit instructions to save the address anyway. And if the compiler detects the tail recursion, it can simply replace the CALL with a JMP. So it doesn't realistically make a difference: In both cases you'll get the growing stack if the compiler does not detect the tail recursion, and a simple jump otherwise.

        Instructions that aren't used shouldn't be in the instruction set.

        Those instructions which aren't used today are in the instruction set for backwards compatibility. Just like there's still the FWAIT instruction which does not do anything at all in modern x86, but was absolutely necessary in the original 8086/8087 combination. And I don't see how their presence is a big problem.

        If you are designing a completely new instruction set, you've got the freedom to only include instructions which are useful with modern technology. But don't underestimate the value of backwards compatibility.

        I'd bet modern PCs still allow disabling the address line A20 through something that looks to the software like an old keyboard controller, despite no modern operating system relying on (or even working correctly with) a disabled A20 line. But you can still boot DOS on those computers.

        Are you aware that some instructions in the x86 instruction set exist only to allow mechanical translation of 8080 code into 8086 code? Indeed, I'd bet the main reason for the 8086 segmented memory model was to enable such translations (and unlike having some legacy instructions, the 8086 segmented memory model was a real disadvantage).

        --
        The Tao of math: The numbers you can count are not the real numbers.