Intel may finally be abandoning its "Tick-Tock" strategy:
As reported at The Motley Fool, Intel's latest 10-K / annual report filing would seem to suggest that the 'Tick-Tock' strategy of introducing a new lithographic process note in one product cycle (a 'tick') and then an upgraded microarchitecture the next product cycle (a 'tock') is going to fall by the wayside for the next two lithographic nodes at a minimum, to be replaced with a three element cycle known as 'Process-Architecture-Optimization'.
Intel's Tick-Tock strategy has been the bedrock of their microprocessor dominance of the last decade. Throughout the tenure, every other year Intel would upgrade their fabrication plants to be able to produce processors with a smaller feature set, improving die area, power consumption, and slight optimizations of the microarchitecture, and in the years between the upgrades would launch a new set of processors based on a wholly new (sometimes paradigm shifting) microarchitecture for large performance upgrades. However, due to the difficulty of implementing a 'tick', the ever decreasing process node size and complexity therein, as reported previously with 14nm and the introduction of Kaby Lake, Intel's latest filing would suggest that 10nm will follow a similar pattern as 14nm by introducing a third stage to the cadence.
Year | Process | Name | Type |
---|---|---|---|
2016 | 14nm | Kaby Lake | Optimization |
2017 | 10nm | Cannonlake | Process |
2018 | 10nm | Ice Lake | Architecture |
2019 | 10nm | Tiger Lake | Optimization |
2020 | 7nm | ??? | Process |
This suggests that 10nm "Cannonlake" chips will be released in 2017, followed by a new 10nm architecture in 2018 (tentatively named "Ice Lake"), optimization in 2019 (tentatively named "Tiger Lake"), and 7nm chips in 2020. This year's "optimization" will come in the form of "Kaby Lake", which could end up making underwhelming improvements such as slightly higher clock speeds, due to higher yields of the previously-nameed "Skylake" chips. To be fair, Kaby Lake will supposedly add the following features alongside any CPU performance tweaks:
Kaby Lake will add native USB 3.1 support, whereas Skylake motherboards require a third-party add-on chip in order to provide USB 3.1 ports. It will also feature a new graphics architecture to improve performance in 3D graphics and 4K video playback. Kaby Lake will add native HDCP 2.2 support. Kaby Lake will add full fixed function HEVC Main10/10-bit and VP9 10-bit hardware decoding.
Previously: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
(Score: 2) by maxwell demon on Thursday March 24 2016, @05:19PM
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by bzipitidoo on Thursday March 24 2016, @11:10PM
Having to update the stack pointer half a dozen times instead of once will stall the pipeline. Those instructions cannot be executed in parallel or out of order.
> And how do you encode the return address? Note that pushing the instruction pointer will store the address of the next instruction on the stack, which is the jump. Oh wait, you don't want to push either. So it's instead: store current IP to some register, add the call sequence length to it, store the result in memory, and jump to the subroutine.
That is exactly how you do it. Storing the pre-jump IP can be built into the jump instruction. And why not? Math instructions set flags, why can't jump instructions set return points? The big difference between that sort of jump and a CALL is no stack.
> Sorry, I'm not convinced this is better than a simple CALL.
The jump is better. CALL is not simple, that's the problem. CALL does too much. It pushes an address to the stack and updates the stack pointer. Fine, if that will be used sometime later. Waste of time otherwise. And there are many cases in which the return address is not needed. Tail-end recursion is a big one. What sounds better to you, store the same return address 100,000 times because you insisted on using CALL in a tail-end recursive routine, or store it once?
Of course I am aware that the stack can be set up anywhere in normal memory.
> Please tell me which compiler targeting x86 does such nonsense
They don't. Compilers are written to avoid generating specialized instructions. Might never use CALL, RET, PUSH, or POP at all. Instructions that aren't used shouldn't be in the instruction set.
(Score: 2) by maxwell demon on Friday March 25 2016, @03:42PM
Incrementing a register is a fast operation, which will surely be finished before the memory write is. Indeed, given that the instructions are decoded internally anyway, the processor may even transform the multiple updates into a single update, as it knows the semantics of the push. Think of it as on-the-fly compilation, which always profits of knowing more semantics.
OK, so your issue is not really one instruction doing several things (in your suggested modified jump, it would both store the current value of IP to another register, and store another value to the IP, which really is two separate operations), but the fact that the processor uses a fixed data structure in memory (the stack).
If the compiler does not know/detect that you do tail recursion, it will have to emit instructions to save the address anyway. And if the compiler detects the tail recursion, it can simply replace the CALL with a JMP. So it doesn't realistically make a difference: In both cases you'll get the growing stack if the compiler does not detect the tail recursion, and a simple jump otherwise.
Those instructions which aren't used today are in the instruction set for backwards compatibility. Just like there's still the FWAIT instruction which does not do anything at all in modern x86, but was absolutely necessary in the original 8086/8087 combination. And I don't see how their presence is a big problem.
If you are designing a completely new instruction set, you've got the freedom to only include instructions which are useful with modern technology. But don't underestimate the value of backwards compatibility.
I'd bet modern PCs still allow to disable the address line A20 through something that to the software looks like an old keyboard controller. Despite no modern operating system relying on (or even correctly working with) a disabled A20 line. But you can still boot DOS on those computers.
Are you aware that some instructions in the x86 instruction set only exists in order to allow mechanical translation of 8080 code into 8086 code? Indeed, I'd bet the main reason of the 8086 segmented memory model was to enable such translations (and unlike having some legacy instructions, the 8086 segmented memory model was a real disadvantage).
The Tao of math: The numbers you can count are not the real numbers.