Intel may finally be abandoning its "Tick-Tock" strategy:
As reported at The Motley Fool, Intel's latest 10-K / annual report filing would seem to suggest that the 'Tick-Tock' strategy of introducing a new lithographic process node in one product cycle (a 'tick') and an upgraded microarchitecture in the next product cycle (a 'tock') is going to fall by the wayside for at least the next two lithographic nodes, to be replaced with a three-element cycle known as 'Process-Architecture-Optimization'.
Intel's Tick-Tock strategy has been the bedrock of their microprocessor dominance of the last decade. Throughout that period, every other year Intel would upgrade its fabrication plants to produce processors with a smaller feature size, improving die area and power consumption and allowing slight optimizations of the microarchitecture, and in the years in between it would launch a new set of processors based on a wholly new (sometimes paradigm-shifting) microarchitecture for large performance upgrades. However, due to the difficulty of implementing a 'tick' at ever-decreasing process node sizes and the complexity that entails, as reported previously with 14nm and the introduction of Kaby Lake, Intel's latest filing would suggest that 10nm will follow a similar pattern to 14nm by introducing a third stage to the cadence.
Year | Process | Name | Type |
---|---|---|---|
2016 | 14nm | Kaby Lake | Optimization |
2017 | 10nm | Cannonlake | Process |
2018 | 10nm | Ice Lake | Architecture |
2019 | 10nm | Tiger Lake | Optimization |
2020 | 7nm | ??? | Process |
This suggests that 10nm "Cannonlake" chips will be released in 2017, followed by a new 10nm architecture in 2018 (tentatively named "Ice Lake"), an optimization in 2019 (tentatively named "Tiger Lake"), and 7nm chips in 2020. This year's "optimization" will come in the form of "Kaby Lake", which could end up delivering underwhelming improvements such as slightly higher clock speeds, due to higher yields of the previously-named "Skylake" chips. To be fair, Kaby Lake will supposedly add the following features alongside any CPU performance tweaks:
- Native USB 3.1 support, whereas Skylake motherboards require a third-party add-on chip in order to provide USB 3.1 ports.
- A new graphics architecture to improve performance in 3D graphics and 4K video playback.
- Native HDCP 2.2 support.
- Full fixed-function HEVC Main10/10-bit and VP9 10-bit hardware decoding.
Previously: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
(Score: 3, Insightful) by maxwell demon on Wednesday March 23 2016, @10:55AM
There's only one register needed to maintain the call stack: the stack pointer. The use of a second register (BP) is due to language calling conventions (which, due to their common usage, later got codified into the ENTER/LEAVE instructions). More exactly, it's because variable-length argument lists are stored on the stack, and C uses the same calling convention even for functions without a variable-length argument list.
Indeed, if the call stack were used for just what it was originally intended for, storing the return address and saved registers during a function call, it would not need to be accessible through the data segment in 32-bit mode, which would make it orders of magnitude harder for exploits to manipulate return addresses.
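To make the convention concrete, here's a rough sketch (NASM syntax, 32-bit, cdecl-style; the routine name sum2 and the argument values are made up for illustration) of why the frame pointer earns its keep: the callee addresses its arguments at fixed offsets from EBP, which keeps working even when the caller pushed a variable number of them and the callee pushes more of its own.

```nasm
caller:
    push dword 2          ; second argument
    push dword 1          ; first argument (cdecl pushes right to left)
    call sum2             ; pushes the return address and jumps
    add  esp, 8           ; caller cleans up; only it knows how many args it pushed
    ret

sum2:
    push ebp              ; save the caller's frame pointer
    mov  ebp, esp         ; establish our frame (roughly what ENTER 0,0 does)
    mov  eax, [ebp+8]     ; first argument, at a fixed offset from EBP
    add  eax, [ebp+12]    ; second argument; offsets don't move if we push more
    pop  ebp              ; restore the caller's frame (what LEAVE does)
    ret                   ; pop the return address and jump back
```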
I'm not sure what you'd want to replace CALL/RET with.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by bzipitidoo on Wednesday March 23 2016, @03:18PM
Replace CALL and RET with the basic RISC instructions that compose them. They are basically jumps with extra actions (stack manipulations) that aren't always needed or wanted.
CALL and RET are perfect examples of why having several actions in a single instruction, CISC style, can be a bad idea. For instance, suppose that in calling a subroutine, parameters need to be passed and registers saved. The classic way to do that is a whole bunch of PUSH instructions. A big problem with that is that the stack pointer has to be updated each time PUSH or CALL is used. It would be faster to simply MOV everything needed onto the stack and update the stack pointer once, and jump instead of call.

Even uglier is that the CALL has to come after the parameters are PUSHed, which puts the return address on top of the stack. Now to get the parameters, the subroutine has to POP the return address and save it somewhere else, then POP the parameters, and finally PUSH the return address back onto the stack. That last action is solely to accommodate RET. In the calling code the return address was already known at compile time, so why wait until a CALL instruction to put it on the stack? Why not put it on the stack first?

So, what to do about all this gross inefficiency? If not pushing the return address first, put the parameters somewhere else: in registers, if there is enough room, or in a separate stack. PUSH, POP, CALL, and RET all use the same stack pointer, so it's a little extra work to set up and maintain a second stack. There are workarounds of course, but it's best that there not be those problems. Just don't even use CALL, RET, PUSH, and POP. Why do all this stack maintenance anyway? In case the subroutine is recursive, and the recursion is not tail-end. Otherwise, maintaining a stack for parameters is a waste of CPU cycles.
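A rough sketch of the contrast (NASM syntax, 32-bit; the routine f and the argument values are hypothetical). Whether the second form is actually faster on a given core is an assumption here, not a measurement:

```nasm
; Classic style: ESP is adjusted by every PUSH and again by CALL.
    push dword 3
    push dword 2
    push dword 1
    call f                  ; pushes the return address, then jumps
    add  esp, 12            ; caller cleanup (cdecl)

; Alternative: reserve the space once, MOV the values in, one ESP update.
    sub  esp, 12            ; single stack-pointer adjustment for all arguments
    mov  dword [esp],   1
    mov  dword [esp+4], 2
    mov  dword [esp+8], 3
    call f                  ; CALL itself still writes the return address
    add  esp, 12
```

Going all the way, as described above, would also mean storing the return address with a MOV and using a plain JMP, at which point the callee can no longer end with RET.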
(Score: 3, Informative) by turgid on Wednesday March 23 2016, @08:27PM
There's nothing to stop you doing that, and I seem to remember seeing some (compiler-generated?) code that did precisely that donkey's years ago.
In fact, the rest of what you mention (the location of the return address on the stack) is merely calling convention as defined by the compiler in use (usually some variant of C or its descendants). Before the world went (mad on) C++ there were a variety of compiled high-level languages about, and many of them had their own subroutine calling conventions. Pascal (and its friends), for one, was different.
I don't think the x86 instruction set is quite as limiting as you think.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 2) by maxwell demon on Thursday March 24 2016, @05:19PM
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by bzipitidoo on Thursday March 24 2016, @11:10PM
Having to update the stack pointer half a dozen times instead of once will stall the pipeline. Those instructions cannot be executed in parallel or out of order.
> And how do you encode the return address? Note that pushing the instruction pointer will store the address of the next instruction on the stack, which is the jump. Oh wait, you don't want to push either. So it's instead: store current IP to some register, add the call sequence length to it, store the result in memory, and jump to the subroutine.
That is exactly how you do it. Storing the pre-jump IP can be built into the jump instruction. And why not? Math instructions set flags, so why can't jump instructions set return points? The big difference between that sort of jump and a CALL is that there's no stack involved.
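A sketch of the register-return-point idea (NASM syntax, 32-bit; the labels and the choice of EDX as the link register are mine, and x86 has no single instruction that fuses the LEA and the JMP, so this is two instructions standing in for a RISC-style branch-and-link):

```nasm
caller:
    mov  eax, 21
    lea  edx, [.back]     ; return point goes into a register: no stack write
    jmp  double_it        ; plain jump to the subroutine, ESP untouched
.back:
    ret                   ; EAX now holds 42; return normally to our own caller

double_it:
    add  eax, eax         ; trivial body
    jmp  edx              ; "return" is just an indirect jump through EDX
```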
> Sorry, I'm not convinced this is better than a simple CALL.
The jump is better. CALL is not simple, that's the problem. CALL does too much. It pushes an address to the stack and updates the stack pointer. Fine, if that will be used sometime later; a waste of time otherwise. And there are many cases in which the return address is not needed. Tail-end recursion is a big one. What sounds better to you: storing the same return address 100,000 times because you insisted on using CALL in a tail-end recursive routine, or storing it once?
Of course I am aware that the stack can be set up anywhere in normal memory.
> Please tell me which compiler targeting x86 does such nonsense
They don't. Compilers are written to avoid generating specialized instructions. They might never use CALL, RET, PUSH, or POP at all. Instructions that aren't used shouldn't be in the instruction set.
(Score: 2) by maxwell demon on Friday March 25 2016, @03:42PM
Incrementing a register is a fast operation, which will surely be finished before the memory write is. Indeed, given that the instructions are decoded internally anyway, the processor may even transform the multiple updates into a single update, as it knows the semantics of the push. Think of it as on-the-fly compilation, which always profits from knowing more semantics.
OK, so your issue is not really one instruction doing several things (in your suggested modified jump, it would both store the current value of IP to another register, and store another value to the IP, which really is two separate operations), but the fact that the processor uses a fixed data structure in memory (the stack).
If the compiler does not know/detect that you do tail recursion, it will have to emit instructions to save the address anyway. And if the compiler detects the tail recursion, it can simply replace the CALL with a JMP. So it doesn't realistically make a difference: In both cases you'll get the growing stack if the compiler does not detect the tail recursion, and a simple jump otherwise.
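For instance, a sketch of the two shapes the generated code can take (NASM syntax, 32-bit; the countdown routine is hypothetical, with the counter passed in ECX):

```nasm
countdown_naive:          ; recursion left as a CALL
    test ecx, ecx
    jz   .done
    dec  ecx
    call countdown_naive  ; one more return address on the stack per iteration
.done:
    ret                   ; and every one of them gets popped again on the way out

countdown_tail:           ; same routine with the tail call turned into a jump
    test ecx, ecx
    jz   .done
    dec  ecx
    jmp  countdown_tail   ; no return address stored; stack usage stays constant
.done:
    ret
```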
Those instructions which aren't used today are in the instruction set for backwards compatibility. Just like there's still the FWAIT instruction which does not do anything at all in modern x86, but was absolutely necessary in the original 8086/8087 combination. And I don't see how their presence is a big problem.
If you are designing a completely new instruction set, you've got the freedom to only include instructions which are useful with modern technology. But don't underestimate the value of backwards compatibility.
I'd bet modern PCs still allow disabling the address line A20 through something that, to the software, looks like an old keyboard controller. This despite no modern operating system relying on (or even working correctly with) a disabled A20 line. But you can still boot DOS on those computers.
Are you aware that some instructions in the x86 instruction set only exist in order to allow mechanical translation of 8080 code into 8086 code? Indeed, I'd bet the main reason for the 8086 segmented memory model was to enable such translations (and unlike having some legacy instructions, the 8086 segmented memory model was a real disadvantage).
The Tao of math: The numbers you can count are not the real numbers.