from the Have-come-a-long-ways-from-the-4004 dept.
Intel will be using a few packaging technologies to connect CPU core "chiplets":
Intel revealed three new packaging technologies at SEMICON West: Co-EMIB, Omni-Directional Interconnect (ODI) and Multi-Die I/O (MDIO). These new technologies enable massive designs by stitching together multiple dies into one processor. Building upon Intel's 2.5D EMIB and 3D Foveros tech, the technologies aim to bring near-monolithic power and performance to heterogeneous packages. For the data-center, that could enable a platform scope that far exceeds the die-size limits of single dies.
[...] Compared to interposers, which can be reticle-sized (832mm2) or even larger, [EMIB (Embedded Multi-die Interconnect Bridge)] is just a small (hence, cheap) piece of silicon. It provides the same bandwidth and energy-per-bit advantages of an interposer compared to standard package traces, which are traditionally used for multi-chip packages (MCPs), such as AMD's Infinity Fabric. (To some extent, because the PCH is a separate die, chiplets have actually been around for a very long time.)
[...] Intel showed off a concept product that contains four Foveros stacks, with each stack having eight small compute chiplets that are connected via TSVs to the base die. (So the role of Foveros there is to connect the chiplets as if it were a monolithic die.) Each Foveros stack is then interconnected via two (Co-)EMIB links with its two adjacent Foveros stacks. Co-EMIB is further used to connect the HBM and transceivers to the compute stacks.
Evidently, the cost of such a product would be enormous, as it essentially contains multiple traditional monolithic-class products in a single package. That's likely why Intel categorized it as a data-centric concept product, aimed mainly at the cloud players that are more than happy to absorb those costs in exchange for the extra performance.
[...] When they are ready, these technologies will provide Intel with powerful capabilities for the heterogeneous and data-centric era. On the client side, the benefits of advanced packaging include smaller package size and lower power consumption (for Lakefield, Intel claims a 10x SoC standby power improvement at 2.6mW). In the data center, advanced packaging will help to build very large and powerful platforms on a single package, with performance, latency, and power characteristics close to what a monolithic die would yield. The yield advantage of small chiplets and the establishment of chipset ecosystem are major drivers, too.
Related: Intel Core i7-8809G with Radeon Graphics and High Bandwidth Memory: Details Leaked
Intel Announces "Sunny Cove", Gen11 Graphics, Discrete Graphics Brand Name, 3D Packaging, and More
Intel Promises "10nm" Chips by the End of 2019, and More
Intel Details Lakefield CPU SoC With 3D Packaging and Big/Small Core Configuration
Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan
An Intel website leaked some details of the Intel Core i7-8809G, a "Kaby Lake" desktop CPU with on-package AMD Radeon graphics and High Bandwidth Memory 2.0. While it is listed as an 8th-generation part, 8th-generation "Coffee Lake" CPUs for desktop users have up to 6 cores (in other words, Intel has been releasing multiple microarchitectures as "8th-generation"). The i7-8809G may be officially announced at the Consumer Electronics Show next week.
The components are linked together using what Intel calls "embedded multi-die interconnect bridge technology" (EMIB). The thermal design power (TDP) of the entire package is around 100 Watts:
Intel at the original launch did state that they were using Core-H grade CPUs for the Intel with Radeon Graphics products, which would mean that the CPU portion is around 45W. This would lead to ~55W left for graphics, which would be in the RX 550 level: 8 CUs, 512 SPs, running at 1100 MHz. It is worth nothing that AMD already puts up to 10 Vega CUs in its 15W processors, so with the Intel i7-8809G product Intel has likely has gone wider and slower: judging by the size of the silicon in the mockup, this could be more of a 20-24 CU design built within that 55W-75W window, depending on how the power budget is moved around between CPU and GPU. We await more information, of course.
It is rumored to include 4 GB of HBM2 on-package, while the CPU also supports DDR4-2400 memory. Two cheaper EMIB CPUs have been mentioned:
According to some other media, the 8809G will turbo to 4.1 GHz, while the graphics will feature 24 [compute units (CUs)] (1536 [stream processors (SPs)]) running at 1190 MHz while the HBM2 is 4GB and will run at 800 MHz. The same media are also listing the Core i7-8705G (20 CUs, 1000 MHz on 'Vega M GL', 700 MHz on HBM2) and a Core i7-8706G. None of the information from those sources is yet to be verified by AnandTech or found on an official Intel webpage.
Currently available AMD Ryzen Mobile APUs only include 8-10 Vega CUs. These are mobile chips with a maximum TDP of 25 W; no desktop Ryzen chips with integrated graphics have been announced yet.
Intel has announced new developments at its Architecture Day 2018:
Sunny Cove, built on 10nm, will come to market in 2019 and offer increased single-threaded performance, new instructions, and 'improved scalability'. Intel went into more detail about the Sunny Cove microarchitecture, which is in the next part of this article. To avoid doubt, Sunny Cove will have AVX-512. We believe that these cores, when paired with Gen11 graphics, will be called Ice Lake.
Willow Cove looks like it will be a 2020 core design, most likely also on 10nm. Intel lists the highlights here as a cache redesign (which might mean L1/L2 adjustments), new transistor optimizations (manufacturing based), and additional security features, likely referring to further enhancements from new classes of side-channel attacks. Golden Cove rounds out the trio, and is firmly in that 2021 segment in the graph. Process node here is a question mark, but we're likely to see it on 10nm and or 7nm. Golden Cove is where Intel adds another slice of the serious pie onto its plate, with an increase in single threaded performance, a focus on AI performance, and potential networking and AI additions to the core design. Security features also look like they get a boost.
Intel says that GT2 Gen11 integrated graphics with 64 execution units will reach 1 teraflops of performance. It compared the graphics solution to previous-generation GT2 graphics with 24 execution units, but did not mention Iris Plus Graphics GT3e, which already reached around 800-900 gigaflops with 48 execution units. The GPU will support Adaptive Sync, which is the standardized version of AMD's FreeSync, enabling variable refresh rates over DisplayPort and reducing screen tearing.
Intel's upcoming discrete graphics cards, planned for release around 2020, will be branded Xe. Xe will cover configurations from integrated and entry-level cards all the way up to datacenter-oriented products.
Like AMD, Intel will also organize cores into "chiplets". But it also announced FOVEROS, a 3D packaging technology that will allow it to mix chips from different process nodes, stack DRAM on top of components, etc. A related development is Intel's demonstration of "hybrid x86" CPUs. Like ARM's big.LITTLE and DynamIQ heterogeneous computing architectures, Intel can combine its large "Core" with smaller Atom cores. In fact, it created a 12mm×12mm×1mm SoC (compare to a dime coin which has a radius of 17.91mm and thickness of 1.35mm) with a single "Sunny Cove" core, four Atom cores, Gen11 graphics, and just 2 mW of standby power draw.
We've been on Intel's case for years to tell us when its 10nm parts are coming to the mass market. Technically Intel already shipped its first 10nm processor, Cannon Lake, but this was low volume and limited to specific geographic markets. This time Intel is promising that its first volume consumer processor on 10nm will be Ice Lake. It should be noted that Intel hasn't put a date on Ice Lake launching, but has promised 10nm on shelves by the end of 2019. It has several products that could qualify for that, but Ice Lake is the likely suspect.
At Intel's Architecture Day in December, we saw chips designated as 'Ice Lake-U', built for 15W TDPs with four cores using the new Sunny Cove microarchitecture and Gen11 graphics. Intel went into some details about this part, which we can share with you today.
The 15W processor is a quad core part supporting two threads per core, and will have 64 EUs of Gen11 graphics. 64 EUs will be the standard 'GT2' mainstream configuration for this generation, up from 24 EUs today. In order to drive that many execution units, Intel stated that they need 50-60 GB/s of memory bandwidth, which will come from LPDDR4X memory. In order for those numbers to line up, they will need LPDDR4X-3200 at a minimum, which gives 51.2 GB/s. [...] For connectivity, the chips will support Wi-Fi 6 (802.11ax) if the laptop manufacturer uses the correct interface module, but the support for Wi-Fi 6 is in the chip. The processor also supports native Thunderbolt 3 over USB Type-C, marking the first Intel chip with native TB3 support.
Intel Lakefield is based around Foveros technology which helps connect chips and chiplets in a single package that matches the functionality and performance of a monolithic SOC. Each die is then stacked using FTF micro-bumps on the active interposer through which TSVs are drilled to connect with solder bumps and eventually the final package. The whole SOC is just 12×12 (mm) which is 144mm2.
Talking about the SOC itself and its individual layers, the Lakefield SOC that has been previewed consists of at least four layers or dies, each serving a different purpose. The top two layers are composed of the DRAM which will supplement the processor as the main system memory. This is done through the PoP (Package on Package) memory layout which stacks two BGA DRAMs on top of each other as illustrated in the preview video. The SOC won't have to rely on socketed DRAM in this case which saves a lot of footprint on the main board.
The second layer is the Compute Chiplet with a Hybrid CPU architecture and graphics, based on the 10nm process node. The Hybrid CPU architecture has a total of five individual Cores, one of them is labeled as the Big Core which features the Sunny Cove architecture. That's the same CPU architecture that will be featured on Intel's upcoming 10nm Ice Lake processors. The Sunny Cove Core is optimized for high-performance throughput. There are also four small CPUs that are based on the 10nm process but optimized for power efficiency. The same die [has] Intel's Gen 11 graphics engine with 64 Execution Units.
[...] [Last] of all is the base die which serves as the cache and I/O block of the SOC. Labeled as the P1222 and based on a 22FFL process node, the base die comes with a low cost and low leakage design while providing a feature-rich array of I/O capabilities.
It would be nice to finally see some consumer CPUs with stacked DRAM, although the amount was not specified (8 GB?).
Intel's Senior Vice President Jim Keller (who previously helped to design AMD's K8 and Zen microarchitectures) gave a talk at the Silicon 100 Summit that promised continued pursuit of transistor scaling gains, including a roughly 50x increase in gate density:
In 2016, a biennial report that had long served as an industry-wide pledge to sustain Moore's law gave up and switched to other ways of defining progress. Analysts and media—even some semiconductor CEOs—have written Moore's law's obituary in countless ways. Keller doesn't agree. "The working title for this talk was 'Moore's law is not dead but if you think so you're stupid,'" he said Sunday. He asserted that Intel can keep it going and supply tech companies ever more computing power. His argument rests in part on redefining Moore's law.
[...] Keller also said that Intel would need to try other tactics, such as building vertically, layering transistors or chips on top of each other. He claimed this approach will keep power consumption down by shortening the distance between different parts of a chip. Keller said that using nanowires and stacking his team had mapped a path to packing transistors 50 times more densely than possible with Intel's 10 nanometer generation of technology. "That's basically already working," he said.
The ~50x gate density claim combines ~3x density from additional pitch scaling (from "10nm"), ~2x from nanowires, another ~2x from stacked nanowires, ~2x from wafer-to-wafer stacking, and ~2x from die-to-wafer stacking.
Related: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
Intel's "Tick-Tock" is Now More Like "Process-Architecture-Optimization"
Moore's Law: Not Dead? Intel Says its 10nm Chips Will Beat Samsung's
Another Step Toward the End of Moore's Law
Taiwan Semiconductor Manufacturing Company (TSMC) announced a number of node scaling details and technological advancements at its 2020 Technology Symposium:
TSMC's first "5nm" node (N5) has a lower defect rate than its initial "7nm" node did at the same point in its development cycle (high volume manufacturing, which N5 is now in). This is due in part to increasing use of extreme ultraviolet lithography (EUV). "5nm" will represent 11% of TSMC's sub-"16nm" wafer production in 2020.
TSMC's "3nm" node (N3) will continue to use FinFETs rather than gate-all-around (GAA) transistors, and is scheduled for volume production in mid-late 2022. Performance is expected to improve 10-15% at the same power (compared to N5), or power consumption will be reduced 25-30% for the same performance. Logic area density improvement will be 1.7x, but SRAM density will only increase by 1.2x, leading to a 1.27x overall density increase for chips that are 70% SRAM and 30% logic.
Intel's EMIB (Embedded Die Interconnect Bridge) connects "chiplets" together without using a full silicon interposer. TSMC has its own version that it is calling Local Si Interconnect (LSI), and it will be combined with other packaging technologies. TSMC has demonstrated 12-layer stacking of chips using through silicon vias (TSVs), although cooling or doing anything useful with them could be somebody else's job.
See also: TSMC Updates on Node Availability Beyond Logic: Analog, HV, Sensors, RF
TSMC Launches New N12e Process: FinFET at 0.4V for IoT
2023 Interposers: TSMC Hints at 3400mm2 + 12x HBM in one Package
TSMC and Graphcore Prepare for AI Acceleration on 3nm
TSMC Has Reportedly Secured Orders for Its 2nm Node – Samsung May Not Beat Its Foundry Rival Until 2030, Claims New Report
While Intel has been discussing a lot about its mainstream Core microarchitecture, it can become easy to forget that its lower power Atom designs are still prevalent in many commercial verticals. Last year at Intel's Architecture Summit, the company unveiled an extended roadmap showing the next three generations of Atom following Goldmont Plus: Tremont, Gracemont, and 'Future Mont'. Tremont is set to be launched this year, coming first in a low powered hybrid x86 design called Lakefield for notebooks, and using a new stacking technology called Foveros built on 10+ nm. At the Linley Processor Conference today, Intel unveiled more about the microarchitecture behind Tremont.
[...] The Atom core within a given family is usually identical (L2 [cache] configuration might change), and because of the SoC in play, it might get a different name based on the market where it was headed. Intel scrapped the smartphone program back with Broxton in 2016, and the tablet type of SoC has also gone away. With Lakefield, combining Core and Atom, it could be used in Tablets again for 2019/2020, but we will see it in Notebooks with the Surface Pro Neo and in networking/embedded markets as Snow Ridge.
[...] The interesting thing here in our briefing with Intel is that they specifically stated that Tremont was built with performance in mind, and the aim was for a sizeable uptick in the raw clock-for-clock throughput compared to the previous generation Atom, Goldmont Plus. Based on Intel's own metrics, namely using SPEC, Intel is going to claim an average 30% iso-frequency performance uplift in core performance for Tremont over Goldmont Plus. It's worth noting here that this data is from an early Tremont design we were told, and should represent minimum uplifts.
[...] A 30% average jump in performance is a sizeable jump for any generation-to-generation cadence. Just taking it as-is feels premature: aside from microarchitectural advancements and a jump to 10nm, there has to be something at play here – either the power budget of Atom has ballooned, or the die area. With Intel explicitly out of the gate stating that their focusing on performance, a cynic is going to suggested that something else has paid that price, and to that end Intel wasn't prepared to talk about power windows or die area, though they did point to the already announced Lakefield CPU, which has a 1 x Core + 4 x Tremont design
Intel Lakefield will be the first processors to feature the chipmaker's 3D Foveros packaging. Foveros is a technology that essentially allows Intel to stack chips one on top of the other, equivalent to what storage manufacturers are doing with some new types of 3D NAND (string stacking).
According to 3DMark's report, the unidentified processor is equipped with five cores, which concurs with the core configuration for Intel's Lakefield chips. As you recall, Lakefield utilizes a design that's similar to ARM's big.LITTLE architecture. Intel complements the powerful core with other slower and more energy-efficient cores.
In Lakefield's case, Intel plans to endow the processor with one Sunny Cove core and four accompanying Atom Tremont cores. The chipmaker will cook up Lakefield chips with a combination of manufacturing process. Intel uses the 10nm node for the compute die and the 22nm node for the base die.
I'd like to see configurations with 1 small core for every 4 big cores, with the small cores handling low-level and background tasks.
Previously: Intel Details Lakefield CPU SoC With 3D Packaging and Big/Small Core Configuration
AMD Plans to Stack DRAM and SRAM on Top of its Future Processors
Intel Reveals Three New Packaging Technologies for Stitching Multiple Dies Into One Processor