Intel may finally be abandoning its "Tick-Tock" strategy:
As reported at The Motley Fool, Intel's latest 10-K / annual report filing would seem to suggest that the 'Tick-Tock' strategy of introducing a new lithographic process node in one product cycle (a 'tick') and then an upgraded microarchitecture the next product cycle (a 'tock') is going to fall by the wayside for the next two lithographic nodes at a minimum, to be replaced with a three-element cycle known as 'Process-Architecture-Optimization'.
Intel's Tick-Tock strategy has been the bedrock of their microprocessor dominance over the last decade. Throughout that tenure, every other year Intel would upgrade their fabrication plants to produce processors with a smaller feature size, improving die area and power consumption and adding slight optimizations of the microarchitecture, and in the years between the upgrades would launch a new set of processors based on a wholly new (sometimes paradigm-shifting) microarchitecture for large performance upgrades. However, due to the difficulty of implementing a 'tick' as process node sizes keep shrinking and complexity keeps growing, as reported previously with 14nm and the introduction of Kaby Lake, Intel's latest filing would suggest that 10nm will follow a similar pattern as 14nm by introducing a third stage to the cadence.
Year | Process | Name | Type |
---|---|---|---|
2016 | 14nm | Kaby Lake | Optimization |
2017 | 10nm | Cannonlake | Process |
2018 | 10nm | Ice Lake | Architecture |
2019 | 10nm | Tiger Lake | Optimization |
2020 | 7nm | ??? | Process |
This suggests that 10nm "Cannonlake" chips will be released in 2017, followed by a new 10nm architecture in 2018 (tentatively named "Ice Lake"), an optimization in 2019 (tentatively named "Tiger Lake"), and 7nm chips in 2020. This year's "optimization" will come in the form of "Kaby Lake", which could end up making underwhelming improvements such as slightly higher clock speeds, due to higher yields of the previously named "Skylake" chips. To be fair, Kaby Lake will supposedly add the following features alongside any CPU performance tweaks:
Kaby Lake will add native USB 3.1 support, whereas Skylake motherboards require a third-party add-on chip in order to provide USB 3.1 ports. It will also feature a new graphics architecture to improve performance in 3D graphics and 4K video playback. Kaby Lake will add native HDCP 2.2 support. Kaby Lake will add full fixed function HEVC Main10/10-bit and VP9 10-bit hardware decoding.
Previously: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
Related Stories
Intel's "Tick-Tock" strategy of micro-architectural changes followed by die shrinks has officially stalled. Although Haswell and Broadwell chips have experienced delays, and Broadwell desktop chips have been overshadowed by Skylake, delays in introducing 10nm process node chips have resulted in Intel's famously optimistic roadmap missing its targets by about a whole year. 10nm Cannonlake chips were set to begin volume production in late 2016, but are now scheduled for the second half of 2017. In their place, a third generation of 14nm chips named "Kaby Lake" will be launched. It is unclear what improvements Kaby Lake will bring over Skylake.
Intel will not be relying on the long-delayed extreme ultraviolet (EUV) lithography to make 10nm chips. The company's revenues for the last quarter were better than expected, despite the decline of the PC market. Intel's CEO revealed the stopgap 14nm generation at the Q2 2015 earnings call:
"The lithography is continuing to get more difficult as you try and scale and the number of multi-pattern steps you have to do is increasing," [Intel CEO Brian Krzanich] said, adding, "This is the longest period of time without a lithography node change."
[...] But Krzanich seemed confident that letting up on the gas, at least for now, is the right move – with the understanding that Intel will aim to get back onto its customary two-year cycle as soon as possible. "Our customers said, 'Look, we really want you to be predictable. That's as important as getting to that leading edge'," Krzanich said during Wednesday's earnings call. "We chose to actually just go ahead and insert – since nothing else had changed – insert this third wave [with Kaby Lake]. When we go from 10-nanometer to 7-nanometer, it will be another set of parameters that we'll reevaluate this."
Intel Roadmap

Year | Old | New |
---|---|---|
2014 | 14nm Broadwell | 14nm Broadwell |
2015 | 14nm Skylake | 14nm Skylake |
2016 | 10nm Cannonlake | 14nm Kaby Lake |
2017 | 10nm "Tock" | 10nm Cannonlake |
2018 | N/A | 10nm "Tock" |
Intel's Senior Vice President Jim Keller (who previously helped to design AMD's K8 and Zen microarchitectures) gave a talk at the Silicon 100 Summit that promised continued pursuit of transistor scaling gains, including a roughly 50x increase in gate density:
Intel's New Chip Wizard Has a Plan to Bring Back the Magic (archive)
In 2016, a biennial report that had long served as an industry-wide pledge to sustain Moore's law gave up and switched to other ways of defining progress. Analysts and media—even some semiconductor CEOs—have written Moore's law's obituary in countless ways. Keller doesn't agree. "The working title for this talk was 'Moore's law is not dead but if you think so you're stupid,'" he said Sunday. He asserted that Intel can keep it going and supply tech companies ever more computing power. His argument rests in part on redefining Moore's law.
[...] Keller also said that Intel would need to try other tactics, such as building vertically, layering transistors or chips on top of each other. He claimed this approach will keep power consumption down by shortening the distance between different parts of a chip. Keller said that, using nanowires and stacking, his team had mapped a path to packing transistors 50 times more densely than possible with Intel's 10 nanometer generation of technology. "That's basically already working," he said.
The ~50x gate density claim combines ~3x density from additional pitch scaling (from "10nm"), ~2x from nanowires, another ~2x from stacked nanowires, ~2x from wafer-to-wafer stacking, and ~2x from die-to-wafer stacking.
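As a quick check on that arithmetic, the listed factors multiply out to 48x, which is presumably what gets rounded to "~50x". A minimal Python sketch, assuming the gains simply compound multiplicatively (the dictionary keys are labels made up for this illustration, not Intel terminology):

```python
from math import prod

# Approximate density multipliers quoted above, relative to Intel's "10nm" node.
# The key names are illustrative labels only.
DENSITY_FACTORS = {
    "pitch_scaling": 3,
    "nanowires": 2,
    "stacked_nanowires": 2,
    "wafer_to_wafer_stacking": 2,
    "die_to_wafer_stacking": 2,
}


def combined_density_gain(factors):
    """Multiply the individual gains, assuming they compound independently."""
    return prod(factors.values())


total = combined_density_gain(DENSITY_FACTORS)
print(f"combined gain: {total}x")
```

Whether the real gains would stack this cleanly is a separate question; the sketch only shows where the headline number comes from.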
Related: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
Intel's "Tick-Tock" is Now More Like "Process-Architecture-Optimization"
Moore's Law: Not Dead? Intel Says its 10nm Chips Will Beat Samsung's
Another Step Toward the End of Moore's Law
Intel says it was too aggressive pursuing 10nm, will have 7nm chips in 2021
[Intel's CEO Bob] Swan made a public appearance at Fortune's Brainstorm Tech conference in Aspen, Colorado, on Tuesday and explained to the audience in attendance that Intel essentially set the bar too high for itself in pursuing 10nm. More specifically, he pointed to Intel's overly "aggressive goal" of going after a 2.7x transistor density improvement over 14nm.
[...] Needless to say, the 10nm delays have caused Intel to fall well behind that transistor density doubling. Many have proclaimed Moore's Law as dead, but as far as Swan is concerned, Moore's Law is not dead. It apparently just needed to undergo an unexpected surgery.
"The challenges of being late on this latest [10nm] node of Moore's Law was somewhat a function of what we've been able to do in the past, which in essence was define the odds on scaling the infrastructure," Swan explains. Bumping up to a 2.7x scaling factor proved to be "very complicated," more so than Intel anticipated. He also says that Intel erred when it "prioritized performance at a time when predictability was really important."
"The short story is we learned from it, we'll get our 10nm node out this year. Our 7nm node will be out in two years and it will be a 2.0X scaling so back to the historical Moore's Law curve," Swan added.
Also at Fortune and Tom's Hardware.
Related:
Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
Intel's "Tick-Tock" is Now More Like "Process-Architecture-Optimization"
Moore's Law: Not Dead? Intel Says its 10nm Chips Will Beat Samsung's
Intel Releases Open Letter in Attempt to Address Shortage of "14nm" Processors and "10nm" Delays
Intel Says "7nm" Node Using Extreme Ultraviolet Lithography is on Track
Intel Promises "10nm" Chips by the End of 2019, and More
Intel Launches Coffee Lake Refresh, Roadmap Leaks Showing No "10nm" Desktop Parts Until 2022
Intel Shares "10nm" Ice Lake Processor Details
HP Boss: Intel Shortages are Steering Our Suited Customers to Buy AMD
Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan
(Score: 2) by Gravis on Wednesday March 23 2016, @04:30AM
It seems to me that if they really want to optimize their processors for speed, processing power or Watts per cycle, they should be using genetic optimization for their circuitry. Of course, it makes one ponder whether they are already doing that and just holding back the results so that they can make more money.
(Score: 2) by francois.barbier on Wednesday March 23 2016, @02:38PM
http://www.damninteresting.com/on-the-origin-of-circuits/ [damninteresting.com]
(Score: 3, Insightful) by RamiK on Wednesday March 23 2016, @04:31AM
This is an optimistic linear extrapolation. A pessimistic - yet equally unfounded - graph would follow an exponential decay function to reflect the known physical limit around 1nm on the y axis and time on the x axis.
Of course, outside the voodoo of statistics and magical thinking regarding node advancements, one could carefully measure the reduced purchasing power over this never-ending recession; Adjust for increased competition from ARM in the data center; Note the lukewarm sales figures of Windows Phones and Desktops; Survey the number of governments switching to open-source software over security and cost concern with long term "talks" regarding open-source hardware; Compensate for the lack of the past IBM precursor advancement in POWER production; etc... And come up with an actually meaningful projection regarding future Intel x86 CPUs sales.
Then again, one is too lazy and tired so one is going to sleep.
compiling...
(Score: 2) by bzipitidoo on Wednesday March 23 2016, @06:39AM
So when are they going to dump the terrible x86 architecture? Possibly the stupidest part was designing the math coprocessor around a stack. That'd be a huge optimization. If they're not willing to do that, could they at least drop some cruft? The whole CALL, RET, and use of 2 registers to maintain a call stack ought to just go away. Also drop the decimal math instructions. In short, the x86 architecture is way too CISC.
(Score: 1, Informative) by Anonymous Coward on Wednesday March 23 2016, @06:52AM
That stuff got ripped out of the 64-bit instruction set and ABI, except that call/ret are still used.
There are way worse examples, like the new opcode for strspn and strcspn. There's a new opcode for CRC32, and some for AES. The virtualization stuff is completely insane. According to this soylentnews article, we're getting VP9 video in an upcoming CPU!!!
If you want to dig up old crud, there are other horrid examples. The built-in hardware task switching in combination with the MMU is Turing-complete.
(Score: 3, Insightful) by maxwell demon on Wednesday March 23 2016, @10:55AM
There's only one register needed to maintain the call stack, the stack pointer. The use of a second register (BP) is due to language calling conventions (which due to their common usage later got codified into the instructions enter/leave). More exactly, it's because of variable-length argument lists being stored on the stack, and C using the same calling conventions also for functions without variable-length argument list.
Indeed, if the call stack were used for just what it was originally intended for, to store the return address and saved registers during a function call, it would not need to be accessible through the data segment in 32 bit mode, which would mean that it would be orders of magnitude harder for exploits to manipulate return addresses.
I'm not sure what you'd want to replace CALL/RET with.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by bzipitidoo on Wednesday March 23 2016, @03:18PM
Replace CALL and RET with the basic RISC instructions that compose it. They are basically jumps with extra actions, stack manipulations, that aren't always needed or wanted.
CALL and RET are perfect examples why having several actions in a single instruction, CISC style, can be a bad idea. For instance, suppose that in calling a subroutine, parameters need to be passed and registers saved. The classic way to do that is a whole bunch of PUSH instructions. A big problem with that is that the stack pointer has to be updated each time PUSH or CALL is used. It would be faster to simply MOV everything needed onto the stack, and update the stack pointer once. Jump instead of call. Even uglier is that the CALL has to come after the parameters are PUSHed, which puts the return address on top of the stack. Now to get the parameters, the subroutine has to POP the return address and save it somewhere else, then POP the parameters, and finally PUSH the return address back onto the stack. That last action is solely to accommodate RET. In the calling code the return address was already known at compile time, why wait until a CALL instruction to put it on the stack, why not put it on the stack first? So, what to do about all this gross inefficiency? If not push the return address first, put the parameters somewhere else, maybe in registers, if there is enough room, or in a separate stack. PUSH, POP, CALL, and RET all use the same stack pointer, so it's a little extra work to set up and maintain a 2nd stack. There are work arounds of course, but it's best that there not be those problems. Just don't even use CALL, RET, PUSH, and POP. Why do all this stack maintenance anyway? In case the subroutine is recursive, and the recursion is not tail-end. Otherwise, maintaining a stack for parameters is a waste of CPU cycles.
(Score: 3, Informative) by turgid on Wednesday March 23 2016, @08:27PM
There's nothing to stop you doing that, and I seem to remember seeing some (compiler-generated?) code that did precisely that donkey's years ago.
In fact, the rest of what you mention (the location of the return address on the stack) is merely calling convention as defined by the compiler in use (usually some variant of C or its descendants). Before the world went (mad on) C++ there were a variety of compiled high-level languages about and many of them had their own subroutine calling conventions. Pascal (and its friends) for one, was different.
I don't think the x86 instruction set is quite as limiting as you think.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 2) by maxwell demon on Thursday March 24 2016, @05:19PM
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by bzipitidoo on Thursday March 24 2016, @11:10PM
Having to update the stack pointer half a dozen times instead of once will stall the pipeline. Those instructions cannot be executed in parallel or out of order.
> And how do you encode the return address? Note that pushing the instruction pointer will store the address of the next instruction on the stack, which is the jump. Oh wait, you don't want to push either. So it's instead: store current IP to some register, add the call sequence length to it, store the result in memory, and jump to the subroutine.
That is exactly how you do it. Storing the pre-jump IP can be built into the jump instruction. And why not? Math instructions set flags, why can't jump instructions set return points? The big difference between that sort of jump and a CALL is no stack.
> Sorry, I'm not convinced this is better than a simple CALL.
The jump is better. CALL is not simple, that's the problem. CALL does too much. It pushes an address to the stack and updates the stack pointer. Fine, if that will be used sometime later. Waste of time otherwise. And there are many cases in which the return address is not needed. Tail-end recursion is a big one. What sounds better to you, store the same return address 100,000 times because you insisted on using CALL in a tail-end recursive routine, or store it once?
Of course I am aware that the stack can be set up anywhere in normal memory.
> Please tell me which compiler targeting x86 does such nonsense
They don't. Compilers are written to avoid generating specialized instructions. Might never use CALL, RET, PUSH, or POP at all. Instructions that aren't used shouldn't be in the instruction set.
(Score: 2) by maxwell demon on Friday March 25 2016, @03:42PM
Incrementing a register is a fast operation, which will surely be finished before the memory write is. Indeed, given that the instructions are decoded internally anyway, the processor may even transform the multiple updates into a single update, as it knows the semantics of the push. Think of it as on-the-fly compilation, which always profits from knowing more semantics.
OK, so your issue is not really one instruction doing several things (in your suggested modified jump, it would both store the current value of IP to another register, and store another value to the IP, which really is two separate operations), but the fact that the processor uses a fixed data structure in memory (the stack).
If the compiler does not know/detect that you do tail recursion, it will have to emit instructions to save the address anyway. And if the compiler detects the tail recursion, it can simply replace the CALL with a JMP. So it doesn't realistically make a difference: In both cases you'll get the growing stack if the compiler does not detect the tail recursion, and a simple jump otherwise.
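The point about tail recursion can be sketched in Python, which makes a convenient stand-in because CPython performs no tail-call elimination: the recursive form takes one stack frame per step (the "CALL" case), while the loop is what a compiler would produce after turning the tail call into a jump. The function names and the `stats` depth counter are invented for this illustration.

```python
def sum_to_recursive(n, acc=0, depth=1, stats=None):
    """Tail-recursive sum of 1..n.

    CPython does not eliminate tail calls, so every step allocates a new
    frame, much like a CALL pushing another return address.
    """
    if stats is not None:
        stats["max_depth"] = max(stats["max_depth"], depth)
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n, depth + 1, stats)


def sum_to_loop(n):
    """The same computation with the tail call replaced by a jump.

    The arguments are overwritten in place and control returns to the top
    of the loop, so the stack does not grow.
    """
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


stats = {"max_depth": 0}
print(sum_to_recursive(500, stats=stats), "frames used:", stats["max_depth"])
print(sum_to_loop(500))
print(sum_to_loop(100_000))  # far deeper than the recursive form can go by default
```

Both forms compute the same result; only the one that keeps "calling" pays for a growing stack, which is the trade-off being argued over above.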
Those instructions which aren't used today are in the instruction set for backwards compatibility. Just like there's still the FWAIT instruction which does not do anything at all in modern x86, but was absolutely necessary in the original 8086/8087 combination. And I don't see how their presence is a big problem.
If you are designing a completely new instruction set, you've got the freedom to only include instructions which are useful with modern technology. But don't underestimate the value of backwards compatibility.
I'd bet modern PCs still allow disabling the address line A20 through something that to the software looks like an old keyboard controller, despite no modern operating system relying on (or even correctly working with) a disabled A20 line. But you can still boot DOS on those computers.
Are you aware that some instructions in the x86 instruction set only exist in order to allow mechanical translation of 8080 code into 8086 code? Indeed, I'd bet the main reason for the 8086 segmented memory model was to enable such translations (and unlike having some legacy instructions, the 8086 segmented memory model was a real disadvantage).
The Tao of math: The numbers you can count are not the real numbers.
(Score: 1, Informative) by Anonymous Coward on Wednesday March 23 2016, @11:16AM
They did dump it, or did you miss the introduction of 64 bit?
As long as proprietary software and especially Windows is so stuck in 32 bit that you can't even get a pure 64 bit Windows they don't have much of a choice (as a result, tablets are stuck with pure 32 bit on a 64 bit processor, because combined 32+64 bit is just too large).
Decimal math, seriously? They take up no relevant space, what would be the point of removing them?
(Score: 2) by bzipitidoo on Wednesday March 23 2016, @03:43PM
They take up opcode space. Even if the additional transistors needed to support decimal math take trivial amounts of additional room on the CPU die, opcode space is still precious. To make room for these useless instructions, other, more useful instructions must be left out, or the average opcode size must be increased, which makes binaries larger, which makes cache misses more frequent.
(Score: 0) by Anonymous Coward on Thursday March 24 2016, @12:30AM
That opcode space got recycled for new 64-bit opcodes.
(Score: 0) by Anonymous Coward on Thursday March 24 2016, @05:10AM
Intel dropped it in the year 2001. All aboard! Uh, somebody...anybody aboard the uh, Itanium [wikipedia.org] train? Shucks, come on people, not that way.
(Score: 2, Interesting) by bitstream on Wednesday March 23 2016, @06:43AM
Intel (and AMD) will finally be pushed into a corner and be forced to offer something way better than the x86 instruction set? And more efficient parallelization, when possible.
Then there's an inherent inefficiency with 64 bits, because all the code will handle 32 extra bits and in many cases they are not needed. Relative addressing is sometimes really nice. So the instruction set should allow address references that are limited in range, when desired. The instruction sets of MIPS, PA-RISC or ARM are perhaps something to take a deeper look at?
Chips can perhaps also run faster with proper liquid cooling. Another aspect to look into. USB on chip is a serious waste of chip area. It's a polling, duplex colliding, overhead pile of shit.
(Score: 0) by Anonymous Coward on Wednesday March 23 2016, @08:42AM
Totally agree, you should definitely send your resume to Intel HR.
(Score: 0) by Anonymous Coward on Wednesday March 23 2016, @11:47AM
http://www.news.gatech.edu/2014/02/17/silicon-germanium-chip-sets-new-speed-record [gatech.edu]
(Score: 2) by bitstream on Wednesday March 23 2016, @02:50PM
1 transistor at 798 GHz and 4.3 kelvin.
1 transistor at 417 GHz and 293 kelvin.
10 000 000 transistors at 363 kelvin and at competitive price. Does it make it?
Interesting transistor nevertheless.
(Score: 0) by Anonymous Coward on Thursday March 24 2016, @10:07PM
Ten million transistors is enough for a Pentium III. Modern processors have billions of transistors.
https://commons.wikimedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2011.svg [wikimedia.org]
(Score: 2) by bitstream on Saturday March 26 2016, @06:03PM
10 million is the bottom limit to do any decent computing with the current demands. Given these incredible speeds, and thus likely heat, it may in fact pay to reduce the number of transistors and have fewer but really fast ones. A speed gain of 100x will likely make consumers accept that trade-off.
(though external memory speed might be a serious bottleneck)
(Score: 2) by Alfred on Wednesday March 23 2016, @01:43PM
We're not gaining GHz anymore, 10% gain each generation from optimizations doesn't matter except to a few, the x86 instruction set is crap and holds the architecture back, baked in DRM and privacy concerns.
I have no reason to upgrade my years old i7. It is only fully utilized for video transcoding. I don't need to overclock. If everyone understood that a more powerful machine doesn't compensate for a lack of talent or lack of frags then new CPU sales would not be enticing. Even if a new CPU saved 10 minutes of compile time chances are the guy wasted the 10 gained on facebook anyway.
My only reason to buy a new CPU is to add a whole new machine.
(Score: 2) by SDRefugee on Wednesday March 23 2016, @02:27PM
Hell, my primary machine has a Xeon 5500, quad-core, running Linux, and I can't see any reason to upgrade as long as this system still runs to my satisfaction... This constant upgrading pressure is weird and isn't happening in *my* world...
America should be proud of Edward Snowden, the hero, whether they know it or not..
(Score: 0) by Anonymous Coward on Wednesday March 23 2016, @03:16PM
I would upgrade just for the power usage alone. That is probably a monster of a box. Something more recent will probably beat it in all respects and use less power to do it.
(Score: 0) by Anonymous Coward on Wednesday March 23 2016, @03:05PM
Off the top of my head...
Modern CPUs have:
-- extensions for AES acceleration which are used everywhere.
-- much better memory plumbing and DDR4 support
-- integrated graphics that are actually decent
-- much lower TDPs and run A LOT cooler
-- higher clock speeds in some cases, not all.
I recently [6 months ago] picked up a Xeon X5650 to replace my ancient Nehalem i7-920. I love it. Went from 130W TDP to 95W and picked up two more cores and AES instructions and VT-d, etc. It was $80 too.
My next build will be in 6 months or so, once all the Xeon stuff is out and the prices settle a bit.
I love the thought of building a NUC-ish size server with i7-ish power and 40-60W TDP that will run cool and quiet on solid-state storage. It also means I can put it on a 500-600W UPS and it will run for quite a while when the mains go down.
(Score: 2) by Alfred on Wednesday March 23 2016, @04:38PM
Though yes, those are all things that new CPUs have over old ones, none of them is a sufficient reason for me to upgrade. Even as a group, the pull is not there. The benefit per unit cost is not good enough, and when two of those also require the added cost of a new mobo it gets worse. Saving 35W of TDP won't pay for itself, especially when I already run at 3% load 95% of the time.
I do like building and I have been through the phase of wanting the latest or more powerful. I grew out of that and know that no one will see my rig and anyone who does won't care. It's just a tool; I don't need a Porsche to drive to work.
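To put a rough number on the 35W point, here is a back-of-the-envelope Python sketch. Everything except the 35W figure is an assumption made for illustration: the electricity price, the upgrade cost, and the simplification that the TDP difference is only realized during the fraction of time the CPU is under full load (`duty_cycle`).

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_savings(watts_saved, price_per_kwh, duty_cycle=1.0):
    """Yearly cost saved, assuming the wattage gap only applies under load."""
    kwh_saved = watts_saved * HOURS_PER_YEAR * duty_cycle / 1000
    return kwh_saved * price_per_kwh


def payback_years(upgrade_cost, watts_saved, price_per_kwh, duty_cycle=1.0):
    """Years until the electricity savings cover the upgrade cost."""
    return upgrade_cost / annual_savings(watts_saved, price_per_kwh, duty_cycle)


# Hypothetical inputs: $0.12 per kWh and a $300 CPU + motherboard swap.
PRICE = 0.12
COST = 300

print(f"24/7 full load: {annual_savings(35, PRICE):.2f} per year, "
      f"payback in {payback_years(COST, 35, PRICE):.1f} years")
print(f"5% of the time under load: {annual_savings(35, PRICE, 0.05):.2f} per year, "
      f"payback in {payback_years(COST, 35, PRICE, 0.05):.0f} years")
```

Under these assumed numbers even the worst case takes years to break even, and a mostly idle machine effectively never does; different prices or a cheaper upgrade would shift the result.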
(Score: 0) by Anonymous Coward on Wednesday March 23 2016, @09:42PM
Awww :( I'll look at your "rig" if it'll make you feel better.
(Score: 2) by bitstream on Wednesday March 23 2016, @03:09PM
My thoughts too. They have to start improving the architecture itself, not just pumping the gazillionhertz, adding instructions for the latest fad (VP9, USB wtf?) or trying to do superficial improvements. Computations done with fewer gates using fewer cycles translate into less heat and the ability to push the clock closer to the real limit, because the clock domain(s) can be kept smaller and gates can be better kept from interfering with each other.
The frequency roof will probably be good for the chip design industry. They have to pay attention to what the chips do instead of relying on physics wizardry. On the other side, people writing software may perhaps be smarter about algorithms and not rely on yet another faster CPU to bail them out of sloppy coding practices.
Completely optical processing is likely the path to a serious performance boost in the range of 1000x. Provided a semitransparent gate can be accomplished at small geometries and high temperature.
(Score: 2) by bob_super on Wednesday March 23 2016, @05:22PM
TSMC is essentially leaping over 10nm, and straight into 7nm.
If they execute (not a small if, but early feedback is pretty good), that will give all their customers (including my former employer) a serious 2-year advantage over Intel process chips.
(Score: 2) by takyon on Wednesday March 23 2016, @09:18PM
Yeah, right. That 7nm node is going to get delayed.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by bob_super on Wednesday March 23 2016, @09:39PM
I'll take a citation for that, given that I know of actual test chips...
Good yields, and actual mass market? Could be delayed, sure, but I'm waiting to hear where you got the exclusive from.