date 2020-12-22
from the not-scale-invariant dept.
Big problems from tiny memory cells:
According to a report from WikiChip, TSMC's SRAM scaling has slowed tremendously. When it comes to brand-new fabrication nodes, we expect them to increase performance, cut power consumption, and increase transistor density. But while logic circuits have been scaling well with recent process technologies, SRAM cells have been lagging behind and have apparently almost stopped scaling at TSMC's 3nm-class production nodes. This is a major problem for future CPUs, GPUs, and SoCs, which will likely get more expensive because of slow SRAM cell area scaling.
When TSMC formally introduced its N3 fabrication technologies earlier this year, it said that the new nodes would provide 1.6x and 1.7x improvements in logic density compared to its N5 (5nm-class) process. What it did not reveal is that the SRAM cells of the new technologies barely scale compared to N5, according to WikiChip, which obtained the information from a TSMC paper published at the International Electron Devices Meeting (IEDM).
[...] Modern CPUs, GPUs, and SoCs use large amounts of SRAM for various caches because they process large volumes of data and fetching data from memory is extremely inefficient, especially for artificial intelligence (AI) and machine learning (ML) workloads. But even general-purpose processors, graphics chips, and application processors for smartphones carry huge caches these days: AMD's Ryzen 9 7950X carries 81MB of cache in total, whereas Nvidia's AD102 uses at least 123MB of SRAM for the caches that Nvidia has publicly disclosed.
Going forward, the need for caches and SRAM will only increase, but with N3 (which is set to be used for a few products only) and N3E there will be no way to reduce the die area occupied by SRAM and offset the higher costs of the new node compared to N5. Essentially, it means that die sizes of high-performance processors will increase, and so will their costs. Meanwhile, just like logic cells, SRAM cells are prone to defects. To some degree chip designers will be able to compensate for the larger SRAM cells with N3's FinFlex innovations (mixing and matching different kinds of FinFETs in a block to optimize it for performance, power, or area), but at this point we can only guess what kind of fruits this will bring.
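To make the area argument concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the article: the function name, the 100 mm^2 die, and the 30% SRAM share are hypothetical, and it treats everything that is not SRAM as logic scaling by the quoted 1.6x figure, with SRAM assumed not to shrink at all.

```python
def scaled_die_area(area_mm2, sram_fraction, logic_density_gain,
                    sram_density_gain=1.0):
    """Estimate die area after porting a design to a denser node.

    area_mm2           -- die area on the old node
    sram_fraction      -- share of the old die occupied by SRAM (0..1)
    logic_density_gain -- e.g. 1.6 means logic needs 1/1.6 of its old area
    sram_density_gain  -- 1.0 means SRAM does not shrink (an assumption)
    """
    if not 0.0 <= sram_fraction <= 1.0:
        raise ValueError("sram_fraction must be between 0 and 1")
    sram_area = area_mm2 * sram_fraction
    logic_area = area_mm2 - sram_area  # simplification: all non-SRAM is logic
    return logic_area / logic_density_gain + sram_area / sram_density_gain


# Hypothetical die: 100 mm^2 on the old node, 30% of it SRAM.
old_area = 100.0
sram_fraction = 0.30
new_area = scaled_die_area(old_area, sram_fraction, logic_density_gain=1.6)
sram_share_after = (old_area * sram_fraction) / new_area

print(f"new area: {new_area:.2f} mm^2")          # new area: 73.75 mm^2
print(f"SRAM share after: {sram_share_after:.1%}")  # SRAM share after: 40.7%
```

Under these assumptions the die shrinks by only about 1.36x overall instead of 1.6x, and SRAM grows from 30% to roughly 41% of the die, so the more cache a design carries, the less it benefits from the new node.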
[...] One way to mitigate the cost impact of slowing SRAM area scaling is to move to a multi-chiplet design and disaggregate larger caches into separate dies made on a cheaper node. This is something that AMD does with its 3D V-Cache, albeit for a slightly different reason (for now). Another way is to use alternative memory technologies like eDRAM or FeRAM for caches, though these have their own peculiarities.
In any case, the slowing of SRAM scaling with FinFET-based nodes at 3nm and beyond looks set to be a major challenge for chip designers in the coming years.
(Score: 2) by takyon on Wednesday December 21, @12:12AM
The solution is in the article. Put in 3D cache, using a cheaper node like "6nm". AMD currently triples L3 cache on V-Cache CPUs, but I wouldn't be surprised if TSMC can add more layers to the cache chiplet. While it would use more silicon and complex packaging, even if nodes like "3nm" had good SRAM scaling, they could be 2-3x more expensive than "6nm".
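A rough sketch of that trade-off, in Python. Only the 2-3x cost ratio comes from the paragraph above; the 40 mm^2 cache area, the 1.3x area penalty on the older node, and the packaging overhead are made-up numbers for illustration, and yield is ignored entirely.

```python
def cache_cost_options(sram_area_new_mm2, old_node_area_penalty,
                       cost_ratio, packaging_cost):
    """Compare cache silicon cost: on-die (new node) vs. stacked (old node).

    Costs are in units of "old-node cost per mm^2".
    sram_area_new_mm2     -- cache area if built on the new node
    old_node_area_penalty -- how much larger the same cache is on the old node
    cost_ratio            -- new-node cost per mm^2 relative to the old node
    packaging_cost        -- extra cost of stacking, same units (hypothetical)
    Returns (on_die_cost, stacked_cost).
    """
    on_die = sram_area_new_mm2 * cost_ratio
    stacked = sram_area_new_mm2 * old_node_area_penalty + packaging_cost
    return on_die, stacked


# Hypothetical inputs: 40 mm^2 of cache, 1.3x larger on the old node,
# packaging overhead worth 10 mm^2 of old-node silicon.
for ratio in (2.0, 3.0):
    on_die, stacked = cache_cost_options(40.0, 1.3, ratio, 10.0)
    print(f"cost ratio {ratio:.0f}x: on-die {on_die:.1f}, stacked {stacked:.1f}")
```

With these invented numbers the stacked option is cheaper at both ends of the 2-3x range; the point is only that a modest area penalty on the older node can be outweighed by the per-area price gap.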
Alternatively, there is the big/small strategy with less cache for some cores. Alder Lake used 1.25 MB L2 cache per P-core, 2 MB per 4-core cluster of E-cores. Raptor Lake increases that to 2 MB per P-core, 4 MB per E-core cluster. The L3 cache is shared between all cores, but I believe its size is tied to the number of cores of either type, so removing some P- or E-cores removes some L3. They end up with 36 MB L3 cache for the i9-13900K, similar to a single-chiplet Ryzen but less than a dual-chiplet one (32 MB each, inaccessible to the other chiplet's cores).
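For a sense of scale, the per-core figures above can be totalled up. This is a small sketch, not anything from Intel: the class name is hypothetical, and the core counts (8 P-cores plus 2 E-core clusters for the top Alder Lake part, 8 P-cores plus 4 E-core clusters for the i9-13900K) are my assumption about the top desktop configurations rather than something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridL2:
    """L2 layout of a hybrid CPU: private per P-core, shared per E-core cluster."""
    p_cores: int
    l2_per_p_core_mb: float
    e_clusters: int            # each cluster holds 4 E-cores
    l2_per_e_cluster_mb: float

    def total_mb(self) -> float:
        return (self.p_cores * self.l2_per_p_core_mb
                + self.e_clusters * self.l2_per_e_cluster_mb)

    def per_core_mb(self) -> float:
        cores = self.p_cores + 4 * self.e_clusters
        return self.total_mb() / cores


# Core counts are assumed top-bin configurations (8P+8E and 8P+16E).
alder_lake = HybridL2(p_cores=8, l2_per_p_core_mb=1.25,
                      e_clusters=2, l2_per_e_cluster_mb=2.0)
raptor_lake = HybridL2(p_cores=8, l2_per_p_core_mb=2.0,
                       e_clusters=4, l2_per_e_cluster_mb=4.0)

print(alder_lake.total_mb())   # 14.0
print(raptor_lake.total_mb())  # 32.0
```

So under those assumed core counts the total L2 more than doubles between generations, even though each E-core still gets far less L2 than a P-core (1 MB versus 2 MB on the newer part).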
Zen 4c is pretty much confirmed to remove some L3 cache, but not the L2, while increasing cores and using space-saving tricks to pack it all closer together. We might see it in a heterogeneous product soon like the rumored "Phoenix 2" aka "Little Phoenix" [twitter.com]. They can regress on L3 cache to some extent and it's not the end of the world. Renoir only had 8 MB total, 4 MB accessible per CCX, and it was still good.
