Intel Lakefield is based around Foveros technology which helps connect chips and chiplets in a single package that matches the functionality and performance of a monolithic SOC. Each die is then stacked using FTF micro-bumps on the active interposer through which TSVs are drilled to connect with solder bumps and eventually the final package. The whole SOC is just 12×12 mm, which is 144 mm².
Talking about the SOC itself and its individual layers, the Lakefield SOC that has been previewed consists of at least four layers or dies, each serving a different purpose. The top two layers are composed of the DRAM which will supplement the processor as the main system memory. This is done through the PoP (Package on Package) memory layout which stacks two BGA DRAMs on top of each other as illustrated in the preview video. The SOC won't have to rely on socketed DRAM in this case which saves a lot of footprint on the main board.
The second layer is the Compute Chiplet with a Hybrid CPU architecture and graphics, based on the 10nm process node. The Hybrid CPU architecture has a total of five individual Cores, one of them is labeled as the Big Core which features the Sunny Cove architecture. That's the same CPU architecture that will be featured on Intel's upcoming 10nm Ice Lake processors. The Sunny Cove Core is optimized for high-performance throughput. There are also four small CPUs that are based on the 10nm process but optimized for power efficiency. The same die [has] Intel's Gen 11 graphics engine with 64 Execution Units.
[...] [Last] of all is the base die which serves as the cache and I/O block of the SOC. Labeled as the P1222 and based on a 22FFL process node, the base die comes with a low cost and low leakage design while providing a feature-rich array of I/O capabilities.
It would be nice to finally see some consumer CPUs with stacked DRAM, although the amount was not specified (8 GB?).
Intel has announced new developments at its Architecture Day 2018:
Sunny Cove, built on 10nm, will come to market in 2019 and offer increased single-threaded performance, new instructions, and 'improved scalability'. Intel went into more detail about the Sunny Cove microarchitecture, which is in the next part of this article. To avoid doubt, Sunny Cove will have AVX-512. We believe that these cores, when paired with Gen11 graphics, will be called Ice Lake.
Willow Cove looks like it will be a 2020 core design, most likely also on 10nm. Intel lists the highlights here as a cache redesign (which might mean L1/L2 adjustments), new transistor optimizations (manufacturing based), and additional security features, likely referring to further enhancements from new classes of side-channel attacks. Golden Cove rounds out the trio, and is firmly in that 2021 segment in the graph. Process node here is a question mark, but we're likely to see it on 10nm and/or 7nm. Golden Cove is where Intel adds another slice of the serious pie onto its plate, with an increase in single threaded performance, a focus on AI performance, and potential networking and AI additions to the core design. Security features also look like they get a boost.
Intel says that GT2 Gen11 integrated graphics with 64 execution units will reach 1 teraflops of performance. It compared the graphics solution to previous-generation GT2 graphics with 24 execution units, but did not mention Iris Plus Graphics GT3e, which already reached around 800-900 gigaflops with 48 execution units. The GPU will support Adaptive Sync, which is the standardized version of AMD's FreeSync, enabling variable refresh rates over DisplayPort and reducing screen tearing.
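The teraflop figure can be sanity-checked with a short calculation. This is a rough sketch, not Intel's own methodology: it assumes each execution unit has 8 FP32 lanes and that each lane retires one fused multiply-add (counted as 2 FLOPs) per clock. The clock speeds used are illustrative assumptions, not figures from the announcement.

```python
# Rough peak-FP32 estimate for an Intel integrated GPU.
# Assumptions (not from the article): 8 FP32 lanes per EU, and one
# fused multiply-add (2 FLOPs) per lane per clock.
FP32_LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CLOCK = 2

FLOPS_PER_EU_PER_CLOCK = FP32_LANES_PER_EU * FLOPS_PER_LANE_PER_CLOCK


def peak_gflops(execution_units: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS."""
    return execution_units * FLOPS_PER_EU_PER_CLOCK * clock_ghz


def clock_needed_ghz(execution_units: int, target_gflops: float) -> float:
    """Clock (GHz) required to hit a target peak throughput."""
    return target_gflops / (execution_units * FLOPS_PER_EU_PER_CLOCK)


# Clock a 64 EU part would need to reach 1 TFLOPS (1000 GFLOPS).
print(f"64 EUs need ~{clock_needed_ghz(64, 1000):.2f} GHz for 1 TFLOPS")

# The same formula for 24 EU and 48 EU parts at a hypothetical 1.1 GHz.
for eus in (24, 48, 64):
    print(f"{eus} EUs @ 1.1 GHz: ~{peak_gflops(eus, 1.1):.0f} GFLOPS")
```

Under these assumptions, a 64 EU part reaches 1 teraflop at just under 1 GHz, and a 48 EU part lands in the 800-900 gigaflops range at clocks a little above 1 GHz, which is consistent with the figures quoted above.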
Intel's upcoming discrete graphics cards, planned for release around 2020, will be branded Xe. Xe will cover configurations from integrated and entry-level cards all the way up to datacenter-oriented products.
Like AMD, Intel will also organize cores into "chiplets". But it also announced FOVEROS, a 3D packaging technology that will allow it to mix chips from different process nodes, stack DRAM on top of components, etc. A related development is Intel's demonstration of "hybrid x86" CPUs. Like ARM's big.LITTLE and DynamIQ heterogeneous computing architectures, Intel can combine its large "Core" with smaller Atom cores. In fact, it created a 12mm×12mm×1mm SoC (compare to a U.S. dime, which has a diameter of 17.91mm and a thickness of 1.35mm) with a single "Sunny Cove" core, four Atom cores, Gen11 graphics, and just 2 mW of standby power draw.
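To put the size comparison in numbers, here is a small sketch that treats the SoC as a rectangular box and the dime as a cylinder. It takes 17.91 mm as the dime's diameter (the standard specification for the coin) and ignores edge reeding and relief, so the results are approximate.

```python
import math

# Package dimensions quoted for the hybrid SoC (mm).
SOC_SIDE_MM = 12.0
SOC_HEIGHT_MM = 1.0

# U.S. dime, modeled as a plain cylinder (mm).
DIME_DIAMETER_MM = 17.91
DIME_THICKNESS_MM = 1.35

soc_area = SOC_SIDE_MM ** 2                          # square footprint, mm^2
soc_volume = soc_area * SOC_HEIGHT_MM                # mm^3

dime_area = math.pi * (DIME_DIAMETER_MM / 2) ** 2    # circular face, mm^2
dime_volume = dime_area * DIME_THICKNESS_MM          # mm^3

print(f"SoC footprint:  {soc_area:.0f} mm^2")
print(f"Dime face area: {dime_area:.1f} mm^2")
print(f"Footprint ratio (SoC / dime): {soc_area / dime_area:.2f}")
print(f"Volume ratio    (SoC / dime): {soc_volume / dime_volume:.2f}")
```

By this estimate the package covers a little over half of a dime's face and occupies a bit over 40% of its volume.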
We've been on Intel's case for years to tell us when its 10nm parts are coming to the mass market. Technically Intel already shipped its first 10nm processor, Cannon Lake, but this was low volume and limited to specific geographic markets. This time Intel is promising that its first volume consumer processor on 10nm will be Ice Lake. It should be noted that Intel hasn't put a date on Ice Lake launching, but has promised 10nm on shelves by the end of 2019. It has several products that could qualify for that, but Ice Lake is the likely suspect.
At Intel's Architecture Day in December, we saw chips designated as 'Ice Lake-U', built for 15W TDPs with four cores using the new Sunny Cove microarchitecture and Gen11 graphics. Intel went into some details about this part, which we can share with you today.
The 15W processor is a quad core part supporting two threads per core, and will have 64 EUs of Gen11 graphics. 64 EUs will be the standard 'GT2' mainstream configuration for this generation, up from 24 EUs today. In order to drive that many execution units, Intel stated that they need 50-60 GB/s of memory bandwidth, which will come from LPDDR4X memory. In order for those numbers to line up, they will need LPDDR4X-3200 at a minimum, which gives 51.2 GB/s. [...] For connectivity, the chips will support Wi-Fi 6 (802.11ax) if the laptop manufacturer uses the correct interface module, but the support for Wi-Fi 6 is in the chip. The processor also supports native Thunderbolt 3 over USB Type-C, marking the first Intel chip with native TB3 support.
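The 51.2 GB/s figure follows from the transfer rate multiplied by the bus width. The sketch below assumes a 128-bit total memory interface, which is the width at which LPDDR4X-3200 yields exactly that number. The faster speed grades are included only for comparison and are not claims about what Ice Lake-U supports.

```python
# Peak theoretical DRAM bandwidth = transfers/s * bytes per transfer.
# Assumption (not stated in the article): a 128-bit total memory bus.
BUS_WIDTH_BITS = 128


def peak_bandwidth_gbs(mega_transfers_per_s: float,
                       bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer / 1000


def min_rate_mts(target_gbs: float,
                 bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Minimum transfer rate (MT/s) needed to reach a target bandwidth."""
    return target_gbs * 1000 / (bus_width_bits / 8)


for rate in (3200, 3733, 4266):
    print(f"LPDDR4X-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s")

# Transfer rates needed for the 50-60 GB/s window Intel quoted.
print(f"50 GB/s needs >= {min_rate_mts(50):.0f} MT/s")
print(f"60 GB/s needs >= {min_rate_mts(60):.0f} MT/s")
```

On these assumptions, LPDDR4X-3200 sits at the low end of the 50-60 GB/s window, and covering the top of the range would take a higher speed grade.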
AMD revealed at a recent high performance computing event that it is working on new designs that use 3D-stacked DRAM and SRAM on top of its processors to improve performance.
[...] Intel whipped the covers off its Foveros 3D chip stacking technology during its recent Architecture Day event and revealed it already has a leading-edge product ready to enter production. The package consists of a 10nm CPU and an I/O chip mated with TSVs (Through Silicon Via) that connect the die through vertical electrical connections in the center of the die. Intel also added a memory chip to the top of the stack using a conventional PoP (Package on Package) implementation.
Not to be left behind, AMD is also turning its eyes toward 3D chip stacking techniques, albeit from a slightly different angle. AMD SVP and GM Forrest Norrod recently presented at the Rice Oil and Gas HPC conference and revealed that the company has its own 3D stacking initiative underway.
[...] [True] 3D stacking consists of two die (in this case, memory and a processor) placed on top of each other and connected through vertical TSV connections that mate the die directly together. These TSV connections, which transfer data between the two die at the fastest speeds possible, typically reside in the center of the die. That direct mating increases performance and reduces power consumption (all data movement requires power, but direct connections streamline the process). 3D stacking also affords density advantages.
Where are the CPUs with attached High Bandwidth Memory?
At Computex 2019 in Taipei, AMD CEO Lisa Su gave a keynote presentation announcing the first "7nm" Navi GPU and Ryzen 3000-series CPUs. All of the products will support PCI Express 4.0.
Contrary to recent reports, AMD says that the Navi microarchitecture is not based on Graphics Core Next (GCN), but rather a new "RDNA" macroarchitecture ('R' for Radeon), although the extent of the difference is not clear. There is also no conflict with Nvidia's naming scheme; the 5000-series naming is a reference to the company's 50th anniversary.
AMD claims that Navi GPUs will have 25% better performance/clock and 50% better performance/Watt vs. Vega GPUs. AMD Radeon RX 5700 is the first "7nm" Navi GPU to be announced. It was compared with Nvidia's GeForce RTX 2070, with the RX 5700 outperforming the RTX 2070 by 10% in the AMD-favorable game Strange Brigade. Pricing and other launch details will be revealed on June 10.
| CPU | Cores / Threads | Frequency | TDP | Price |
|---|---|---|---|---|
| Ryzen 9 3900X | 12 / 24 | 3.8 - 4.6 GHz | 105 W | $499 |
| Ryzen 7 3800X | 8 / 16 | 3.9 - 4.5 GHz | 105 W | $399 |
| Ryzen 7 3700X | 8 / 16 | 3.6 - 4.4 GHz | 65 W | $329 |
| Ryzen 5 3600X | 6 / 12 | 3.8 - 4.4 GHz | 95 W | $249 |
| Ryzen 5 3600 | 6 / 12 | 3.6 - 4.2 GHz | 65 W | $199 |
The Ryzen 9 3900X is the only CPU in the list using two core chiplets, each with 6 of 8 cores enabled. AMD has held back on releasing a 16-core monster for now. AMD compared the Ryzen 9 3900X to the $1,189 Intel Core i9-9920X, the Ryzen 7 3800X to the $499 Intel Core i9-9900K, and the Ryzen 7 3700X to the Intel Core i7-9700K, with the AMD chips outperforming the Intel chips in certain single and multi-threaded benchmarks (wait for the reviews before drawing any definitive conclusions). All five of the processors will come with a bundled cooler, as seen in this list.
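For a quick way to compare the lineup, the sketch below computes launch price per core and per watt of TDP from the table above. The data is copied from the table; the metrics themselves are just an illustrative way to slice it and say nothing about real-world performance.

```python
# (name, cores, threads, TDP in watts, launch price in USD), from the table.
RYZEN_3000 = [
    ("Ryzen 9 3900X", 12, 24, 105, 499),
    ("Ryzen 7 3800X", 8, 16, 105, 399),
    ("Ryzen 7 3700X", 8, 16, 65, 329),
    ("Ryzen 5 3600X", 6, 12, 95, 249),
    ("Ryzen 5 3600", 6, 12, 65, 199),
]


def price_per_core(name: str) -> float:
    """Launch price divided by core count for the named CPU."""
    for cpu, cores, _threads, _tdp, price in RYZEN_3000:
        if cpu == name:
            return price / cores
    raise KeyError(name)


def cheapest_per_core() -> str:
    """Name of the CPU with the lowest price per core."""
    return min(RYZEN_3000, key=lambda row: row[4] / row[1])[0]


for cpu, cores, threads, tdp, price in RYZEN_3000:
    print(f"{cpu:14s} ${price / cores:6.2f}/core  "
          f"${price / threads:6.2f}/thread  {tdp / cores:5.2f} W/core")

print("Lowest price per core:", cheapest_per_core())
```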
Intel will be using a few packaging technologies to connect CPU core "chiplets":
Intel revealed three new packaging technologies at SEMICON West: Co-EMIB, Omni-Directional Interconnect (ODI) and Multi-Die I/O (MDIO). These new technologies enable massive designs by stitching together multiple dies into one processor. Building upon Intel's 2.5D EMIB and 3D Foveros tech, the technologies aim to bring near-monolithic power and performance to heterogeneous packages. For the data-center, that could enable a platform scope that far exceeds the die-size limits of single dies.
[...] Compared to interposers, which can be reticle-sized (832 mm²) or even larger, [EMIB (Embedded Multi-die Interconnect Bridge)] is just a small (hence, cheap) piece of silicon. It provides the same bandwidth and energy-per-bit advantages of an interposer compared to standard package traces, which are traditionally used for multi-chip packages (MCPs), such as AMD's Infinity Fabric. (To some extent, because the PCH is a separate die, chiplets have actually been around for a very long time.)
[...] Intel showed off a concept product that contains four Foveros stacks, with each stack having eight small compute chiplets that are connected via TSVs to the base die. (So the role of Foveros there is to connect the chiplets as if it were a monolithic die.) Each Foveros stack is then interconnected via two (Co-)EMIB links with its two adjacent Foveros stacks. Co-EMIB is further used to connect the HBM and transceivers to the compute stacks.
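The layout of that concept product can be modeled as a small graph. The sketch below is one reading of the description, not Intel's specification: it assumes the four Foveros stacks form a ring, with one Co-EMIB link to each of a stack's two neighbors. The HBM and transceiver links are left out because the excerpt does not say how many there are.

```python
# Toy model of the concept package described above.
# Assumption: the four stacks form a ring, one Co-EMIB link per neighbor.
NUM_STACKS = 4
CHIPLETS_PER_STACK = 8


def ring_links(n: int) -> set:
    """Undirected links of an n-node ring, as frozensets of node ids."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


links = ring_links(NUM_STACKS)

# Number of stack-to-stack links attached to each stack.
degree = {s: sum(1 for link in links if s in link) for s in range(NUM_STACKS)}

total_chiplets = NUM_STACKS * CHIPLETS_PER_STACK

print(f"Compute chiplets in package: {total_chiplets}")
print(f"Stack-to-stack Co-EMIB links: {len(links)}")
for stack, d in degree.items():
    print(f"  stack {stack}: {d} links to adjacent stacks")
```

Under this reading, the package holds 32 compute chiplets, and each stack has exactly two stack-to-stack links, matching the "two links with its two adjacent stacks" description.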
Evidently, the cost of such a product would be enormous, as it essentially contains multiple traditional monolithic-class products in a single package. That's likely why Intel categorized it as a data-centric concept product, aimed mainly at the cloud players that are more than happy to absorb those costs in exchange for the extra performance.
[...] When they are ready, these technologies will provide Intel with powerful capabilities for the heterogeneous and data-centric era. On the client side, the benefits of advanced packaging include smaller package size and lower power consumption (for Lakefield, Intel claims a 10x SoC standby power improvement at 2.6mW). In the data center, advanced packaging will help to build very large and powerful platforms on a single package, with performance, latency, and power characteristics close to what a monolithic die would yield. The yield advantage of small chiplets and the establishment of a chiplet ecosystem are major drivers, too.
Related: Intel Core i7-8809G with Radeon Graphics and High Bandwidth Memory: Details Leaked
Intel Announces "Sunny Cove", Gen11 Graphics, Discrete Graphics Brand Name, 3D Packaging, and More
Intel Promises "10nm" Chips by the End of 2019, and More
Intel Details Lakefield CPU SoC With 3D Packaging and Big/Small Core Configuration
Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan
Intel Lakefield will be the first processors to feature the chipmaker's 3D Foveros packaging. Foveros is a technology that essentially allows Intel to stack chips one on top of the other, equivalent to what storage manufacturers are doing with some new types of 3D NAND (string stacking).
According to 3DMark's report, the unidentified processor is equipped with five cores, which concurs with the core configuration for Intel's Lakefield chips. As you recall, Lakefield utilizes a design that's similar to ARM's big.LITTLE architecture. Intel complements the powerful core with other slower and more energy-efficient cores.
In Lakefield's case, Intel plans to endow the processor with one Sunny Cove core and four accompanying Atom Tremont cores. The chipmaker will cook up Lakefield chips with a combination of manufacturing processes. Intel uses the 10nm node for the compute die and the 22nm node for the base die.
I'd like to see configurations with 1 small core for every 4 big cores, with the small cores handling low-level and background tasks.
Previously: Intel Details Lakefield CPU SoC With 3D Packaging and Big/Small Core Configuration
AMD Plans to Stack DRAM and SRAM on Top of its Future Processors
Intel Reveals Three New Packaging Technologies for Stitching Multiple Dies Into One Processor
While Intel has been discussing a lot about its mainstream Core microarchitecture, it can become easy to forget that its lower power Atom designs are still prevalent in many commercial verticals. Last year at Intel's Architecture Summit, the company unveiled an extended roadmap showing the next three generations of Atom following Goldmont Plus: Tremont, Gracemont, and 'Future Mont'. Tremont is set to be launched this year, coming first in a low powered hybrid x86 design called Lakefield for notebooks, and using a new stacking technology called Foveros built on 10+ nm. At the Linley Processor Conference today, Intel unveiled more about the microarchitecture behind Tremont.
[...] The Atom core within a given family is usually identical (L2 [cache] configuration might change), and because of the SoC in play, it might get a different name based on the market where it was headed. Intel scrapped the smartphone program back with Broxton in 2016, and the tablet type of SoC has also gone away. With Lakefield, combining Core and Atom, it could be used in Tablets again for 2019/2020, but we will see it in Notebooks with the Surface Pro Neo and in networking/embedded markets as Snow Ridge.
[...] The interesting thing here in our briefing with Intel is that they specifically stated that Tremont was built with performance in mind, and the aim was for a sizeable uptick in the raw clock-for-clock throughput compared to the previous generation Atom, Goldmont Plus. Based on Intel's own metrics, namely using SPEC, Intel is going to claim an average 30% iso-frequency performance uplift in core performance for Tremont over Goldmont Plus. It's worth noting here that this data is from an early Tremont design we were told, and should represent minimum uplifts.
[...] A 30% average jump in performance is a sizeable jump for any generation-to-generation cadence. Just taking it as-is feels premature: aside from microarchitectural advancements and a jump to 10nm, there has to be something at play here – either the power budget of Atom has ballooned, or the die area. With Intel explicitly out of the gate stating that they're focusing on performance, a cynic is going to suggest that something else has paid that price, and to that end Intel wasn't prepared to talk about power windows or die area, though they did point to the already announced Lakefield CPU, which has a 1 x Core + 4 x Tremont design.
Yesterday, Samsung Electronics announced a new 3D IC packaging technology called eXtended-Cube, or "X-Cube", allowing chip-stacking of SRAM dies on top of a base logic die through TSVs.
Current TSV deployments in the industry mostly come in the form of stacking memory dies on top of a memory controller die in high-bandwidth-memory (HBM) modules that are then integrated with more complex packaging technologies, such as silicon interposers, which we see in today's high-end GPUs and FPGAs, or through other complex packaging such as Intel's EMIB.
Samsung's X-Cube is quite different to these existing technologies in that it does away with intermediary interposers or silicon bridges, and directly connects a stacked chip on top of the primary logic die of a design.
Samsung has built a 7nm EUV test chip using this methodology by integrating an SRAM die on top of a logic die. The logic die is designed with TSV pillars which then connect to µ-bumps with only 30µm pitch, allowing the SRAM-die to be directly connected to the main die without intermediary mediums. The company says this is the industry's first such design with an advanced process node technology.
[...] Stacking more valuable SRAM instead of DRAM on top of the logic chip would likely represent a higher value proposition and return-on-investment to chip designers, as this would allow smaller die footprints for the base logic dies, with larger SRAM cache structures being able to reside on the stacked die. Such a large SRAM die would naturally also allow for significantly more SRAM that would allow for higher performance and lower power usage for a chip.
3D SRAM is not a new idea, but this kind of stacking could become commonplace in CPUs within a few years. SRAM takes up a large amount of CPU die area, so stacking it into layers above or near cores could be beneficial.
Intel is splitting its high-end Tiger Lake mobile chip lineup to meet two TDP targets: 35 Watts and 45 Watts. Tiger Lake-H35 chips have been launched, with 4 cores, 8 threads, and 96 graphics "Xe" (Gen12) execution units. Later in Q1, Intel will launch 45 Watt TDP Tiger Lake with up to 8 cores, 16 threads.
Intel has also launched new "Jasper Lake" Celeron/Pentium chips on a "10nm" process node. Jasper Lake uses the Tremont Atom core previously used in Lakefield. TDPs range from 6-10 Watts. 16 GB of memory is explicitly supported, up from 8 GB of Gemini Lake Refresh (although boards like the ODROID-H2+ supported 32 GB, go figure). Graphics performance of the Pentium Silver N6005 should be substantially higher than its predecessor due to the use of Gen11 graphics and an increase to 32 execution units.
See also: Intel says the Iris Xe Max isn't really for gaming. They're not wrong
Intel Confirms 10nm Ice Lake Xeon Production Has Started
Intel Appoints Pat Gelsinger as New CEO, From Feb 15th
Intel Launches 11th Gen vPro For Tiger Lake Mobile CPUs, Adds CET Security Tech
An Interview with Intel CEO Bob Swan: Roundtable Q&A on Fabs and Future