
posted by janrinok on Saturday June 03 2023, @06:23PM

What Are Chiplets and Why They Are So Important for the Future of Processors:

[Editor's Comment: This is a long 'summary' but even so much has been removed from the original to comply with fair use restrictions. If you are interested I recommend reading the entire article. JR]

While chiplets have been in use for decades, they've been employed sparingly and for very specific purposes. Now, they're at the cutting edge of technology, with millions of people worldwide using them in desktop PCs, workstations, and servers.

[...] Chiplets are segmented processors. Instead of consolidating every part into a single chip (known as a monolithic approach), specific sections are manufactured as separate chips. These individual chips are then mounted together into a single package using a complex connection system.

This arrangement allows the parts that can benefit from the latest fabrication methods to be shrunk in size, improving their efficiency and making room for more components.

The parts of the chip that can't be significantly reduced or don't require reduction can be produced using older and more economical methods.

While the process of manufacturing such processors is complex, the overall cost is typically lower. Furthermore, it offers processor companies a more manageable pathway to expand their product range.

To fully understand why processor manufacturers have turned to chiplets, we must first delve into how these devices are made. CPUs and GPUs start their life as large discs made of ultra-pure silicon, typically a little under 12 inches (300 mm) in diameter and 0.04 inches (1 mm) thick.

This silicon wafer undergoes a sequence of intricate steps, resulting in multiple layers of different materials – insulators, dielectrics, and metals. These layers' patterns are created through a process called photolithography, where ultraviolet light is shone through an enlarged version of the pattern (a mask), and subsequently shrunk via lenses to the required size.

The pattern gets repeated, at set intervals, across the surface of the wafer and each of these will ultimately become a processor. Since chips are rectangular and wafers are circular, the patterns must overlap the disc's perimeter. These overlapping parts are ultimately discarded as they are non-functional.[...]
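
[Ed. note: As a rough illustration of that arithmetic, the sketch below uses the standard first-order dies-per-wafer approximation; the die sizes are hypothetical and not taken from the article.]

    import math

    def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
        # First term: wafer area divided by die area.
        # Second term: standard correction for partial dies lost around the circular edge.
        radius = wafer_diameter_mm / 2
        gross = math.pi * radius * radius / die_area_mm2
        edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
        return int(gross - edge_loss)

    # Hypothetical die sizes, for illustration only:
    for area_mm2 in (600, 150, 75):
        print(f"{area_mm2} mm^2 die: ~{dies_per_wafer(300, area_mm2)} whole dies per 300 mm wafer")

Smaller dies lose proportionally less to the wafer's curved edge, which is one reason the counts above scale better than a simple area ratio would suggest.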

[...] Unfortunately, while logic circuits continue to shrink with every major step forward in process node technology, analog circuits have barely changed and SRAM is starting to reach a limit too.

While logic still forms the largest portion of the die, the amount of SRAM in today's CPUs and GPUs has significantly grown in recent years. For example, AMD's Vega 20 chip used in its Radeon VII graphics card has a combined total of 5 MB of L1 and L2 cache. Just two GPU generations later, the Navi 21 has over 130 MB of assorted cache – a remarkable 25 times more than Vega 20.

We can expect these levels to continue to increase as new generations of processors are developed, but with memory not scaling down as well as the logic, it will become increasingly less cost-effective to manufacture all of the circuitry on the same process node.
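
[Ed. note: A toy cost comparison makes the point. All wafer prices and die areas below are made-up, illustrative figures; real foundry pricing is rarely public, and yield and packaging costs are ignored here.]

    import math

    # Hypothetical wafer prices (USD per 300 mm wafer) - purely illustrative.
    WAFER_COST = {"advanced": 17000.0, "mature": 6000.0}
    WAFER_AREA_MM2 = math.pi * 150.0 ** 2

    def cost_per_mm2(node: str) -> float:
        return WAFER_COST[node] / WAFER_AREA_MM2

    # Hypothetical die budget: logic shrinks on the advanced node, but SRAM barely
    # shrinks, so it needs roughly the same area on either node.
    logic_mm2, sram_mm2 = 80.0, 70.0

    monolithic = (logic_mm2 + sram_mm2) * cost_per_mm2("advanced")
    split = logic_mm2 * cost_per_mm2("advanced") + sram_mm2 * cost_per_mm2("mature")

    print(f"All circuitry on the advanced node: ~${monolithic:.0f} of silicon per chip")
    print(f"Logic advanced, SRAM on mature node: ~${split:.0f} of silicon per chip")

Even in this crude model, putting the non-shrinking memory on the cheaper node noticeably cuts the silicon bill, which is the economic argument being made here.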

In an ideal world, one would design a die where analog sections are fabricated on the largest and cheapest node, SRAM parts on a much smaller one, and logic reserved for the absolute cutting-edge technology. Unfortunately, this is not practically achievable. However, there exists an alternative approach.

Back in 1997, Intel launched the Pentium II, a successor to its original P5 (Pentium) processor. What set it apart from the usual fare at the time was that beneath the plastic shield sat a circuit board housing two chips: the main chip, containing all the processing logic and analog systems, and one or two separate SRAM modules serving as Level 2 cache.

Intel manufactured the primary chip, but the cache was sourced from other firms. This would become fairly standard for desktop PCs in the mid-to-late 1990s, until semiconductor fabrication techniques improved to the point where logic, memory, and analog could all be integrated into the same die.

While Intel continued to dabble with multiple chips in the same package, it largely stuck with the so-called monolithic approach for processors – i.e., one chip for everything. For most processors, there was no need for more than one die, as manufacturing techniques were proficient (and affordable) enough to keep it straightforward.

[...] For a technology vendor, using heterogeneous integration for a niche product is one thing, but employing it for the majority of their portfolio is another. This is precisely what AMD did with its range of processors. In 2017, the semiconductor giant released its Zen architecture in the form of the single-die Ryzen desktop CPU. Several months later, two multi-chip product lines, Threadripper and EPYC, debuted, with the latter boasting up to four dies.

With the launch of Zen 2 two years later, AMD fully embraced HI, MCM, SiP – call it what you will. They shifted the majority of the analog systems out of the processor and placed them into a separate die. These were manufactured on a simpler, cheaper process node, while a more advanced one was used for the remaining logic and cache.

And so, chiplets became the buzzword of choice.

[...] But if this design choice is so advantageous, why isn't Intel doing it? Why aren't we seeing it being used in other processors, like GPUs?

To address the first question, Intel is indeed adopting the full chiplet route, and it's on track to do so with its next consumer CPU architecture, called Meteor Lake. Naturally, Intel's approach is somewhat unique, so let's explore how it differs from AMD's approach.

Using the term tiles instead of chiplets, this generation of processors will split the previously monolithic design into four separate chips: a compute tile (the CPU cores and their caches), a graphics tile, an SOC tile, and an I/O tile.

High-speed, low-latency connections are present between the SOC and the other three tiles, and all of them are connected to another die, known as an interposer. This interposer delivers power to each chip and contains the traces between them. The interposer and four tiles are then mounted onto an additional board to allow the whole assembly to be packaged.

Unlike Intel, AMD does not use any special mounting die but has its own unique connection system, known as Infinity Fabric, to handle chiplet data transactions. Power delivery runs through a fairly standard package, and AMD also uses fewer chiplets. So why has Intel taken this more elaborate approach?

One challenge with AMD's approach is that it's not very suitable for the ultra-mobile, low-power sector. This is why AMD still uses monolithic CPUs for that segment. Intel's design allows them to mix and match different tiles to fit a specific need. For example, budget models for affordable laptops can use much smaller tiles everywhere, while AMD only has one size chiplet for each purpose.

The downside to Intel's system is that it's complex and expensive to produce, although it's too early to predict how this will affect retail prices. Both CPU firms, however, are fully committed to the chiplet concept. Once every part of the manufacturing chain is engineered around it, costs should decrease.

[...] To continue enhancing chip performance, engineers essentially have two avenues – add more logic, with the necessary memory to support it, or increase internal clock speeds. Regarding the latter, the average CPU hasn't changed significantly in this respect for years. AMD's FX-9590 processor, from 2013, could reach 5 GHz in certain workloads, while the highest clock speed in its current models is 5.7 GHz (with the Ryzen 9 7950X).

Intel recently launched the Core i9-13900KS, capable of reaching 6 GHz under the right conditions, but most of its models have clock speeds similar to AMD's.

However, what has changed is the amount of circuitry and SRAM. The aforementioned FX-9590 had 8 cores (and 8 threads) and 8 MB of L3 cache, whereas the 7950X3D boasts 16 cores, 32 threads, and 128 MB of L3 cache. Intel's CPUs have similarly expanded in terms of cores and SRAM.

Nvidia's first unified shader GPU, the G80 from 2006, consisted of 681 million transistors, 128 cores, and 96 kB of L2 cache in a chip measuring 484 mm² in area. Fast forward to 2022, when the AD102 was launched, and it now comprises 76.3 billion transistors, 18,432 cores, and 98,304 kB of L2 cache within 608 mm² of die area.
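
[Ed. note: Plugging the quoted G80 and AD102 figures into a quick calculation shows how much of that gain came from density rather than die size; only the numbers in the paragraph above are used.]

    # Transistor density comparison using the G80 and AD102 figures quoted above.
    g80_transistors, g80_area_mm2 = 681e6, 484
    ad102_transistors, ad102_area_mm2 = 76.3e9, 608

    g80_density = g80_transistors / g80_area_mm2        # ~1.4 million per mm^2
    ad102_density = ad102_transistors / ad102_area_mm2  # ~125 million per mm^2

    print(f"G80:   {g80_density / 1e6:.1f} M transistors/mm^2")
    print(f"AD102: {ad102_density / 1e6:.1f} M transistors/mm^2")
    print(f"Density gain: ~{ad102_density / g80_density:.0f}x, on a die only "
          f"{ad102_area_mm2 / g80_area_mm2:.2f}x larger")

Nearly all of the growth comes from density, not from a physically larger die.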

[...] Decades in the future, the average PC might be home to CPUs and GPUs the size of your hand, but peel off the heat spreader and you'll find a host of tiny chips – not three or four, but dozens of them, all ingeniously tiled and stacked together.

The dominance of the chiplet has only just begun.


Original Submission

  • (Score: 4, Interesting) by looorg on Saturday June 03 2023, @09:11PM (4 children)

    by looorg (578) on Saturday June 03 2023, @09:11PM (#1309638)

    It would seem that a sufficiently long time has passed so that what was old is now seen as new and revolutionary again. It used to be that different tasks in the machine had their own dedicated chip. Then things got more and more consolidated into fewer and fewer chips on the board, cost reduction $$$.

    (image in the middle of the article of some AMD CPU) ... here we can see that there are two dies in the package – the Core Complex Die (CCD) at the top, housing the cores and cache, and the Input/Output Die (IOD) at the bottom containing all the controllers (for memory, PCI Express, USB, etc.) and physical interfaces.

    But that somewhat appears to have reached the end of the line: size issues, heat issues, layering and stacking issues. Time to break it apart again. Or I guess instead of having one LARGE (expensive) die, let's have the processors be built with many small(er) ones. One little die for IO, one little die for CPU, one little die for GFX, one little die for ... (insert whatever you want your chip to do).

    In some regard, beyond whatever $ idea they have, it makes sense. Everything doesn't have to be built with the latest whizbang tech. Some parts gain nothing from it, except having more problems with heat and size and whatever. So they combine different dies and parts built with different techs into one package again.

    I wonder how long it will be until they go full circle and I can buy my own FPU as its own chip again, WROOOOOOMZ! If they can have them talk with each other without much latency they might as well offload things out of the package. It might reduce the insane heat and power issues. It won't matter to the customers since nobody but nerds build their own machines anymore.

    • (Score: 2) by JoeMerchant on Saturday June 03 2023, @09:19PM

      by JoeMerchant (3937) on Saturday June 03 2023, @09:19PM (#1309640)

      >I guess instead of having one LARGE (expensive) die

      I believe this gets into yield issues, much cheaper to scrap little chips instead of big ones when you have one failure per 10 square cm.
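
      [Ed: To put a rough number on that, a minimal sketch using the simple exponential (Poisson) yield model and the ~one defect per 10 square cm figure above; the die sizes are hypothetical, and salvage/binning of partly defective dies is ignored.]

          import math

          def defect_free_fraction(die_area_cm2: float, defects_per_cm2: float) -> float:
              # Simple Poisson yield model: fraction of dies expected to have zero defects.
              return math.exp(-die_area_cm2 * defects_per_cm2)

          D0 = 0.1            # ~one defect per 10 square cm, as above
          big_die_cm2 = 6.0   # hypothetical 600 mm^2 monolithic die
          chiplet_cm2 = 1.5   # hypothetical 150 mm^2 chiplet, four per product

          print(f"Monolithic die yield: {defect_free_fraction(big_die_cm2, D0):.0%}")
          print(f"Chiplet yield:        {defect_free_fraction(chiplet_cm2, D0):.0%}")

          # Expected silicon scrapped per 600 mm^2 of product built (no salvage assumed):
          waste_mono = (1 - defect_free_fraction(big_die_cm2, D0)) * 600
          waste_chip = (1 - defect_free_fraction(chiplet_cm2, D0)) * 150 * 4
          print(f"Scrapped: ~{waste_mono:.0f} mm^2 (monolithic) vs ~{waste_chip:.0f} mm^2 (chiplets)")

      Same total silicon, but a defect kills a whole 600 mm^2 die while only costing a 150 mm^2 chiplet.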

      >If they can have them talk with each other without much latency

      But they can't. And not all general purpose workloads massively parallelize well - though "AI" stuff mostly does.

      For me, I'm ready to see the next "power shrink" like we went through in 2005, when AMD was making chips that drew 20W and did the same workloads as Intel chips drawing 125+W. That AMD lead only lasted a couple of years, and lately we seem to be on the power creep up again. I don't necessarily need to see battery-level loads, though that _is_ nice, and the Pi Pico webserver I have been running in my yard 24-7 for the past 12 months on a solar cell with battery backup is pretty cool. Mostly I'd just like mainstream "competent" computers that don't need forced air cooling. It feels like there's a conspiracy in the industry to keep power draws up so cooling fans are required – thus keeping system lifetimes finite.

      --
      🌻🌻 [google.com]
    • (Score: 4, Informative) by takyon on Sunday June 04 2023, @08:03AM (2 children)

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday June 04 2023, @08:03AM (#1309713) Journal

      Smaller chiplets have better yields than large monolithic dies, and better utilization of the wafer. They can also be used across multiple product lines. Epyc, Threadripper, and Ryzen use the same chiplets. Give the worst chiplets to the peasants. This leads to a situation where there is no point disabling perfectly good 8-core chiplets to make dual- or quad-cores, outside of niche Epyc products.

      Chiplets can allow for more massive products, like multi-chip GPUs that push beyond the reticle limit. On the other hand, we've seen Cerebras just use the entire damn wafer and disable the parts that don't work [anandtech.com]. Heat might not be as big of an issue as you think for large chips.

      Some things don't scale as well on newer nodes, so just stick them on the older, cheaper node. Cache, for instance, which can also be 3D stacked relatively easily. There isn't a huge heat issue, but possibly a voltage issue with TSMC's current implementations.

      In Ryzen 7000, AMD put an iGPU into the central location of the I/O die. We'll have to see what they do with AI accelerators for desktop, but it can probably go in there too.

      Intel claims that its "tiles" are better than AMD's "chiplets":

      https://www.hardwaretimes.com/intel-believes-its-tiles-are-a-better-approach-to-the-mcm-design-than-amds-chiplets/ [hardwaretimes.com]
      https://www.pcgamer.com/intels-3d-chip-tech-is-perfect-so-it-doesnt-have-to-follow-amds-chiplet-design/ [pcgamer.com]

      But in reality, the Ponte Vecchio GPU with dozens of tiles is vaporware, Meteor Lake was delayed, etc. Maybe "tiles" are better but AMD's focus on modest, incremental improvements has paid off.

      I'd like to see AMD do something like stacked + unified L3/L4 across multiple chiplets. Zen 6 is the one to watch out for any major changes.

      Consumer computers are going to move towards one 3D package with everything in it. CPU, GPU, accelerators, memory, perhaps storage, and you have to replace the entire thing to upgrade it. Enthusiasts/nerds can get more memory, add-in cards, etc. and put it on the PCIe bus, with the understanding that there will be massive latency penalties from accessing these. That's not necessarily a problem, it's how most stuff works today after all.

      I believe most people will be using "mega APUs" akin to the rumored Strix Halo [notebookcheck.net] by the 2030s. Performance requirements are plateauing for a lot of stuff. Game consoles already use similar mega APUs and will continue to do so for as long as new generations are released. APU performance doesn't have to be wimpy, and latency will become a greater barrier to performance improvements in the long run.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by JoeMerchant on Saturday June 03 2023, @09:12PM

    by JoeMerchant (3937) on Saturday June 03 2023, @09:12PM (#1309639)

    Back in the 90s we were miniaturizing stuff and crossed paths with an advanced hearing aid company out of Switzerland – they were doing chiplets then, but they didn't end up in the low-cost hearing aids of the day. As I recall, they were doing some of the advanced communication gear like that used by the Secret Service and shown in just about every action movie of the past 30 years, with the actors holding their hands to their ears to talk or with little coily wires coming down. Except the reality in 1999 (for the good gear) was: no coily cords, no hands to ears, and a motto of "speak in a whisper, hear in a crowd."

    --
    🌻🌻 [google.com]
  • (Score: 2) by turgid on Saturday June 03 2023, @09:23PM

    by turgid (4318) Subscriber Badge on Saturday June 03 2023, @09:23PM (#1309641) Journal