Big problems from tiny memory cells:
According to a report from WikiChip, TSMC's SRAM scaling has slowed tremendously. We expect brand-new fabrication nodes to increase performance, cut power consumption, and increase transistor density. But while logic circuits have scaled well with recent process technologies, SRAM cells have lagged behind and have apparently almost stopped scaling at TSMC's 3nm-class production nodes. This is a major problem for future CPUs, GPUs, and SoCs, which will likely get more expensive because of slow SRAM cell area scaling.
When TSMC formally introduced its N3 fabrication technologies earlier this year, it said that the new nodes would provide 1.6x and 1.7x improvements in logic density when compared to its N5 (5nm-class) process. What it did not reveal is that SRAM cells of the new technologies barely scale compared to N5, according to WikiChip, which obtained the information from a TSMC paper published at the International Electron Devices Meeting (IEDM).
[...] Modern CPUs, GPUs, and SoCs use loads of SRAM for various caches, as it is extremely inefficient to fetch data from off-chip memory, especially for artificial intelligence (AI) and machine learning (ML) workloads. But even general-purpose processors, graphics chips, and smartphone application processors carry huge caches these days: AMD's Ryzen 9 7950X carries 81MB of cache in total, whereas Nvidia's AD102 uses at least 123MB of SRAM for the various caches Nvidia has publicly disclosed.
Going forward, the need for caches and SRAM will only increase, but with N3 (which is set to be used for only a few products) and N3E there will be no way to shrink the die area occupied by SRAM and so mitigate the higher costs of the new node compared to N5. Essentially, this means that die sizes of high-performance processors will increase, and so will their costs. Meanwhile, just like logic cells, SRAM cells are prone to defects. Chip designers will be able to mitigate larger SRAM cells to some degree with N3's FinFlex innovations (mixing and matching different kinds of FinFETs in a block to optimize it for performance, power, or area), but at this point we can only guess what fruits this will bring.
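To see why stalled SRAM scaling inflates die sizes, consider a back-of-the-envelope model. The ~1.6x logic density gain and near-1.0x SRAM gain for N3 over N5 come from the article; the die size and the 60/40 logic/SRAM split are purely illustrative assumptions, not TSMC data.

```python
# Sketch of how a die shrinks when logic scales but SRAM does not.
# Only the 1.6x (logic) and ~1.0x (SRAM) N5->N3 figures come from the
# article; the 100 mm^2 die and 60/40 area split are made-up examples.

def scaled_area(logic_mm2, sram_mm2, logic_gain, sram_gain):
    """Die area after applying per-block density gains (area = old / gain)."""
    return logic_mm2 / logic_gain + sram_mm2 / sram_gain

# Hypothetical 100 mm^2 N5 die: 60% logic, 40% SRAM caches.
logic, sram = 60.0, 40.0

ideal = scaled_area(logic, sram, 1.6, 1.6)   # if SRAM shrank like logic
real  = scaled_area(logic, sram, 1.6, 1.0)   # SRAM barely shrinks on N3

print(f"ideal N3 die:  {ideal:.1f} mm^2")   # 62.5 mm^2
print(f"actual N3 die: {real:.1f} mm^2")    # 77.5 mm^2
```

Under these toy numbers the die ends up roughly 24% larger than it would be with uniform scaling, and the gap widens as the SRAM fraction grows.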
[...] One way to mitigate the cost impact of slowing SRAM area scaling is to go with a multi-chiplet design and disaggregate larger caches into separate dies made on a cheaper node. This is something AMD already does with its 3D V-Cache, albeit for a slightly different reason (for now). Another way is to use alternative memory technologies such as eDRAM or FeRAM for caches, though these have their own peculiarities.
In any case, the slowing of SRAM scaling with FinFET-based nodes at 3nm and beyond looks like a major challenge for chip designers in the coming years.
(Score: 3, Informative) by takyon on Wednesday December 21, @12:12AM (2 children)
The solution is in the article: put in 3D cache, using a cheaper node like "6nm". AMD currently triples L3 cache on V-Cache CPUs, and I wouldn't be surprised if TSMC can add more layers to the cache chiplet. It would use more silicon and complex packaging, but even if nodes like "3nm" had good SRAM scaling, they could be 2-3x more expensive than "6nm".
Alternatively, there is the big/small strategy with less cache for some cores. Alder Lake used 1.25 MB of L2 cache per P-core and 2 MB per 4-core cluster of E-cores. Raptor Lake increases that to 2 MB per P-core and 4 MB per E-core cluster. Then L3 cache is shared between all cores, but I believe it's tied to core count, so removing some P- or E-cores removes some L3. They end up with 36 MB of L3 cache for the i9-13900K, similar to single-chiplet Ryzen but not dual-chiplet (32 MB per chiplet, inaccessible to the other chiplet's cores).
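The cache-budget arithmetic above can be spelled out. The i9-13900K's core counts (8 P-cores, 16 E-cores in clusters of 4) and 36 MB L3 are public specs; the per-core L2 figures are the ones quoted in the comment.

```python
# Total cache budget for the hybrid layout described above (i9-13900K).
# Core counts and the 36 MB L3 are public specs; per-core L2 sizes are
# the Raptor Lake figures quoted in the comment.

P_CORES, E_CORES, E_CLUSTER_SIZE = 8, 16, 4

l2_p = P_CORES * 2                        # 2 MB private L2 per P-core
l2_e = (E_CORES // E_CLUSTER_SIZE) * 4    # 4 MB shared L2 per E-cluster
l3_shared = 36                            # MB, shared by all cores

print(f"L2 total:    {l2_p + l2_e} MB")              # 32 MB
print(f"L2+L3 total: {l2_p + l2_e + l3_shared} MB")  # 68 MB
```

So the L2 alone nearly matches the shared L3, which is why trimming per-cluster L2 on the small cores is an attractive lever.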
Zen 4c is pretty much confirmed to remove some L3 cache, but not the L2, while increasing core count and using space-saving tricks to pack everything closer together. We might see it soon in a heterogeneous product like the rumored "Phoenix 2" aka "Little Phoenix" [twitter.com]. They can regress on L3 cache to some extent and it's not the end of the world; Renoir only had 8 MB total, 4 MB accessible per CCX, and it was still good.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by richtopia on Wednesday December 21, @03:50PM (1 child)
Every major player is moving to chiplets and die stacking, and it is super critical for moving the industry forward. I suspect this migration will be comparable to the transition to multi-core around 2005; it lets the industry get past a technical hurdle and will allow innovation to restart. So far TSMC and AMD have demonstrated success with the 5800X3D, but advanced packaging has many enabling technologies in development beyond what is seen in 3D V-Cache. From what I've seen in the industry, we will probably see a gradual transition from niche products to 100% 3D IC in 5 years. During that transition, the struggles described in the article will be real, especially for more traditional chips.
https://en.wikipedia.org/wiki/Three-dimensional_integrated_circuit [wikipedia.org]
(Score: 2) by takyon on Thursday December 22, @03:06AM
I think it will be early 2030s for fully 3D IC, unless competition from one side or the other (AMD, Intel, ARM) forces everyone to fast track it.
These companies don't want to deal with the fallout from improving performance by 10x in 1 year, which I think is plausible for monolithic 3D ICs including RAM/L4 cache inside the chip (is that what you meant by "100%"?).
(Score: 2) by ChrisMaple on Wednesday December 21, @05:03AM (1 child)
My experience is now 22 years old, so it may no longer apply. SRAM cells are specially designed by the chip fabricator using the repetitive nature of SRAMs to allow tighter spacing for a particular design than can be managed with random logic. It may be that at 3 nm these design tricks can no longer be effective, and SRAM has to obey rules more similar to those of random logic.
(Score: 3, Informative) by takyon on Wednesday December 21, @05:32AM
SRAM scaling has lagged well behind logic for the last several nodes, not just "3nm". For example, from N7 to N5 [wikichip.org], TSMC increased logic density by +70%, SRAM density by +35% (there are two kinds of SRAM cells, a performance-optimized version and a density-optimized one), and analog density by +20%. It's just that SRAM has hit nearly zero scaling with this particular TSMC node, leading to these "sky is falling" articles.
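Compounding those per-node gains shows why the gap matters even before it hits zero: SRAM steadily eats a larger share of the die. The density gains are the figures cited above (N7->N5: +70% logic, +35% SRAM; N5->N3: ~1.6x logic, ~1.0x SRAM); the 50/50 starting mix is an assumption for illustration.

```python
# How SRAM's share of a fixed design grows when it shrinks more slowly
# than logic. Per-node gains are the figures cited in the comment;
# the 50/50 N7 starting mix is a made-up example.

def shrink(logic_mm2, sram_mm2, logic_gain, sram_gain):
    """Apply one node transition's density gains to each block."""
    return logic_mm2 / logic_gain, sram_mm2 / sram_gain

logic, sram = 50.0, 50.0  # hypothetical N7 die, mm^2
for node, logic_gain, sram_gain in [("N5", 1.70, 1.35), ("N3", 1.60, 1.00)]:
    logic, sram = shrink(logic, sram, logic_gain, sram_gain)
    share = 100 * sram / (logic + sram)
    print(f"{node}: {logic + sram:.1f} mm^2, SRAM share {share:.0f}%")
```

Under these assumptions the SRAM share climbs from 50% to roughly two-thirds of the die in just two node hops, which is the cost pressure driving the chiplet/stacking workarounds.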
N3 is one of the last TSMC nodes to use FinFETs; they switch to gate-all-around transistors (GAAFETs) at N2. That node will have very bad scaling overall [anandtech.com], ">1.1x"* from N3E to N2, at least for the initial N2 version of GAAFETs. Follow-ups could deliver better density increases, but probably not much improvement for the SRAM portion.
TSMC tends to set less ambitious targets than its competitors (namely Intel), and then hits those targets consistently and mostly on time. It inserts in-between nodes onto its roadmaps that fix up the previous node and deliver small performance boosts nearly annually for big customers like Apple. That's what we're seeing with the likes of N5, N4, N4X, N3, N3E, N2, etc.
Nevertheless, we will still see double-digit performance and efficiency increases from these new nodes, which can be worth it despite skyrocketing wafer/fab costs. Chiplets and 3D stacking will help alleviate the SRAM problem; Intel [anandtech.com] and Samsung [anandtech.com] have also teased plans for stacked SRAM.