posted by Fnord666 on Friday August 16 2019, @11:59AM
from the a-what? dept.

TSMC Shows Colossal Interposer, Says Moore's Law Still Alive

In its first blog post, TSMC states that Moore's Law is still alive and well, despite the prevailing sentiment of recent years to the contrary. The company also showed a colossal 2,500 mm² interposer carrying eight HBM memory chips and two large processors.

Godfrey Cheng, TSMC's new head of global marketing, wrote the blog post. He notes that Moore's Law is not about performance, but about transistor density. Performance was traditionally improved by raising clock speeds and refining architectures; today it is more often improved by increasing parallelism, which requires larger chips. That makes transistor density all the more important, because chip cost is directly proportional to area.

[...] "one possible future of great density improvements is to allow the stacking of multiple layers of transistors in something we call Monolithic 3D Integrated Circuits. You could add a CPU on top of a GPU on top of an AI Edge engine with layers of memory in between. Moore's Law is not dead, there are many different paths to continue to increase density."

[...] [System-technology co-optimization (STCO)] is done through advanced packaging, for which TSMC supports silicon-based interposers and fan-out-based chiplet integration. It also has techniques to stack chips on wafers, or to stack wafers on top of other wafers. As one example, TSMC showed a nearly 2,500 mm² silicon interposer – the world's largest – on top of which sit two 600 mm² processors and eight 75 mm² HBM memory chips. That makes for 1,800 mm² of compute and memory silicon on the interposer-based package, well over two times the conventional reticle size limit.
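For scale, a quick back-of-the-envelope check, assuming the commonly cited single-exposure reticle limit of roughly 26 mm × 33 mm ≈ 858 mm² (the article does not give the exact figure):

    #include <stdio.h>

    int main(void) {
        /* Chip figures from the article; the reticle limit is the commonly
           cited 26 mm x 33 mm exposure field, assumed here for comparison. */
        double processors = 2 * 600.0;      /* two 600 mm^2 processors  */
        double hbm        = 8 * 75.0;       /* eight 75 mm^2 HBM stacks */
        double interposer = 2500.0;         /* interposer area, mm^2    */
        double reticle    = 26.0 * 33.0;    /* ~858 mm^2                */

        double silicon = processors + hbm;  /* 1800 mm^2 of chips on top */

        printf("chips on interposer: %.0f mm^2 (%.1fx reticle limit)\n",
               silicon, silicon / reticle);
        printf("interposer itself:   %.0f mm^2 (%.1fx reticle limit)\n",
               interposer, interposer / reticle);
        return 0;
    }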

Related: Dual-Wafer Packaging (Wafer-on-Wafer) Could Double CPU/GPU Performance
Another Step Toward the End of Moore's Law
Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan


Original Submission

 
  • (Score: 4, Insightful) by bzipitidoo on Friday August 16 2019, @03:25PM (6 children)

    by bzipitidoo (4388) on Friday August 16 2019, @03:25PM (#881127) Journal

    Moore's Law may not be dead yet, but it will be. It's just a question of when and what ends it. Heat death, so to speak? I have read that Peltier cooling is what makes 3D possible. Or, we reach the minimum number of atoms required for the correct functioning of transistors as we know them? The speed of light is another hard limit that makes clock speed difficult to increase further. At 4GHz, an electrical signal can travel only 7.5 cm at most between ticks.
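    A quick sanity check on that last figure, using nothing beyond distance = c / f (the 4 GHz and 7.5 cm numbers are from the comment above; the rest is just arithmetic):

        #include <stdio.h>

        int main(void) {
            const double c = 2.998e8;                /* speed of light in vacuum, m/s */
            const double freqs[] = {1e9, 4e9, 10e9}; /* example clock rates, Hz */

            for (int i = 0; i < 3; i++) {
                /* Upper bound: how far any signal can travel in one clock period. */
                double cm = (c / freqs[i]) * 100.0;
                printf("%5.1f GHz: at most %4.1f cm per tick\n", freqs[i] / 1e9, cm);
            }
            return 0;
        }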

    Parallelism, in the form of concurrency, has been increasing for decades. (Actual parallelism with parallel algorithms is not so popular, and is still a pain to do.) Really, the personal computers of the early 1980s were an extreme in sequential computing. The CPU had to do everything, and I do mean everything: low-level disk controller commands, even clicking the speaker at whatever frequency it could manage to make audio. Graphics too were all CPU driven, at the individual pixel level. For a game to run animations and play sound effects took clever interweaving of CPU time, and compromise was necessary. Usually, the audio was grainy, tinny, and choppy. Making all that more concurrent was a no-brainer. It was only a matter of time before offloading specialized work to hardware such as the Sound Blaster audio card led to offloading more general-purpose work as well. Today, multi-core is a painfully obvious performance boost.

    One thing I wonder about is smarter memory. It seems RAM could have the ability to zero out a large block without burdening the CPU with setting each individual word to 0. Just turn off the DRAM refresh for a fraction of a second. It would boost speed and security. Maybe zeroing memory could be done with a DMA transfer from a virtual block of always-zero memory, sort of like /dev/zero?
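    Something close to that "virtual block of always-zero memory" already exists at the OS level: on Linux and other POSIX systems, anonymous mappings are backed by a shared zero page until first write, so a large region reads as zero without the CPU looping over it. A minimal sketch (assuming a POSIX system; this illustrates the zero-page idea, not the hypothetical in-DRAM zeroing):

        #include <stdio.h>
        #include <sys/mman.h>

        int main(void) {
            size_t len = 64UL * 1024 * 1024;    /* 64 MiB */

            /* Anonymous mapping: the kernel points every page at a shared zero
               page and copies it only on write, so no explicit clearing loop runs. */
            unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
            }

            printf("first byte: %u, last byte: %u\n", buf[0], buf[len - 1]);
            munmap(buf, len);
            return 0;
        }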

  • (Score: 0) by Anonymous Coward on Friday August 16 2019, @03:31PM (2 children)

    by Anonymous Coward on Friday August 16 2019, @03:31PM (#881130)

    The Commodore 64 microcomputer of the early 80s was more advanced than that, with separate chips for sound, graphics, and disk control. (The disk controller got its own 6502-variant CPU.)
    The IBM PC and Apple II were more "primitive" in the manner you describe.

    • (Score: 2) by bzipitidoo on Friday August 16 2019, @04:36PM (1 child)

      by bzipitidoo (4388) on Friday August 16 2019, @04:36PM (#881160) Journal

      True. It was mostly the Apple II I was thinking of. Yet they could make a real hash of offloading. The Commodore 64 disk drive was notorious for its extreme slowness; what a colossal waste of all that dedicated hardware that they couldn't make it any faster. The crazy thing even had its own power supply. It seems to be a case study in why just throwing more microcontrollers at a task doesn't necessarily make it go faster. Everyone else's disk drive, including the more primitive Apple II's, was much faster.

      The stock Apple DOS 3.3 had plenty of stupidities in the code that made its disk access needlessly slow. Like, it took a fraction of a second too long to read and process a sector, so that it just missed the start of the next sector and had to wait for an entire revolution of the disk to bring the next sector past the head again. In spite of that, it was still much faster than the Commodore 64. Lots of aftermarket DOSes for the Apple II fixed that issue. As I recall, booting took 45 seconds for stock Apple DOS, and 15 seconds for an aftermarket Apple DOS. The Commodore 64 needed several minutes to read the disk, and could not be improved with a simple software change.
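      Rough numbers for that missed-sector penalty (my own figures, assuming a roughly 300 RPM drive and the 16 sectors per track that DOS 3.3 used):

          #include <stdio.h>

          int main(void) {
              const double rpm = 300.0;                /* assumed spindle speed */
              const double ms_per_rev = 60000.0 / rpm; /* 200 ms per revolution */
              const int sectors_per_track = 16;        /* DOS 3.3 format        */

              /* Best case: every sector is picked up back to back in one revolution. */
              double best_ms = ms_per_rev;

              /* Worst case: each sector is processed a hair too slowly, so the drive
                 waits almost a full revolution for the next one to come around again. */
              double worst_ms = sectors_per_track * ms_per_rev;

              printf("best case:  %.0f ms per track\n", best_ms);
              printf("worst case: %.0f ms per track (%.0fx slower)\n",
                     worst_ms, worst_ms / best_ms);
              return 0;
          }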

      • (Score: 0) by Anonymous Coward on Friday August 16 2019, @08:24PM

        by Anonymous Coward on Friday August 16 2019, @08:24PM (#881261)

        If I recall correctly, the C64 disk drive was slow because of some bug in the drive chips that they never bothered to fix. (Time-to-market pressure, I think.)
        No matter. The problem was solved by plugging a cartridge into the computer's cartridge port that used a different routine for data transfer. Then the disk ran very fast.

        The point is, the Apple did an awful lot with very little hardware, but it was an evolutionary dead end. It was the ultimate hack, a very tightly integrated system that could not be evolved as separate components.

  • (Score: 2) by takyon on Friday August 16 2019, @04:42PM (2 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Friday August 16 2019, @04:42PM (#881162) Journal

    We can already see the next steps with clarity. From the TSMC blog post [tsmc.com]:

    To feed modern fast CPUs, GPUs and dedicated AI Processors, it is critical to provide memory that is both physically closer to the cores that are requesting the data for improved latency, in addition to supplying a higher bandwidth of data for the cores to process. This is what device level density provides. When memory is collocated closer to the logic cores, the system achieves lower latency, lower power consumption and higher overall performance.

    [...] Advanced packaging today brings memory close to the logic. Typically, logic cores are fed through standalone memory chips through interfaces such as DDR or GDDR. The physical distance between the memory device and the logic cores limit performance through increased latency. Bandwidth is also limited with discrete memory as they only offer limited interface width. Additionally, power consumption for discrete logic and memory also govern a device's overall performance, especially in applications such as smartphones or IOT devices as there is limited ability to dissipate the thermal energy radiated by discrete devices.

    [...] Tight integration of logic cores with memory through advanced packaging techniques are already available today from TSMC. The line between a semiconductor and a system solution is blurry as the new advanced packaging techniques are silicon wafer based. TSMC has pioneered advanced packaging techniques that allow our customers to deliver a complete system with a silicon-based interposer or fan-out-based chiplet integration. We also have advanced packaging techniques that allow us to stack chips on wafers or stack wafer on wafer prior to integration into packaged modules. These advanced packaging techniques allow TSMC customers to deliver much higher density and more performance. We will continue to drive innovation in advanced packaging technologies.

    Intel and AMD are working on 2.5D/3D designs that stack memory near or on top of the CPU, but the real winner is going to be:

    DARPA's 3DSoC Becoming a Reality [soylentnews.org]

    With that tight integration of memory and cores in layers, we could see CPUs that outperform today's flagship Intel/AMD chips, but with the power consumption of a smartphone SoC or RasPi Zero. That's before you take into account transistor advances [soylentnews.org]. The memory amounts of the very first 3DSoCs will probably be around 4 GB to 8 GB. If we can eventually use dense non-volatile universal memory in the same place, maybe we can have 1 TB or more there instead, and do all computing in memory.

    We are going to see performance increases of orders of magnitude. Single-board computers with 3DSoCs will meet the computing needs of the vast majority of users. We'll still see a trend towards embarrassingly parallel core/thread counts (64-core Threadripper, anyone?), but those who actually need that will probably be pushing zettaflops.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 1) by ChrisMaple on Sunday August 18 2019, @12:25AM (1 child)

      by ChrisMaple (6964) on Sunday August 18 2019, @12:25AM (#881612)

      I have doubts about those transistor advances, specifically the cited Air Channel Transistor with its claimed ten-thousand-fold performance boost. Field emission devices tend to erode at high currents, and high currents are necessary for the high speeds that charge interconnect capacitances -- and interconnect capacitances don't go away even when the transistors improve. The article claims that there's no power dissipation in the transistors because there's no material to get in the way of the electrons: that's blatantly false; an electron going through a potential (voltage) change always and inescapably involves power dissipation.
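      To put a number on that last point, the energy involved is just q times V, and sustained current through a voltage drop is power (the 0.7 V and 1 µA figures below are placeholder assumptions, not from the article):

          #include <stdio.h>

          int main(void) {
              const double q = 1.602e-19;   /* elementary charge, coulombs          */
              const double v = 0.7;         /* assumed potential drop, volts        */
              const double i = 1e-6;        /* assumed average device current, amps */

              double joules_per_electron = q * v;   /* energy per electron, E = qV */
              double watts               = i * v;   /* steady-state power, P = IV  */

              printf("energy per electron: %.2e J\n", joules_per_electron);
              printf("power at %.0e A across %.1f V: %.1e W\n", i, v, watts);
              return 0;
          }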

      Perhaps some day Air Channel Transistors will be a commercial reality in integrated circuits, but I'd be greatly surprised if they caused as much as a 3X speed improvement.

      • (Score: 2) by takyon on Sunday August 18 2019, @12:43AM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday August 18 2019, @12:43AM (#881617) Journal

        It's just one of a number of advancements that will be competing to leave the lab and enter the fab. Some of them will never be practical, but multiple could succeed.

        However, the concept of moving logic ever closer to memory is a done deal, and impossible to ignore.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]