SoylentNews is people

posted by Fnord666 on Friday August 16 2019, @11:59AM
from the a-what? dept.

TSMC Shows Colossal Interposer, Says Moore's Law Still Alive

In the company's first blog post, TSMC has stated that Moore's Law is still alive and well, despite the zeitgeist of recent times being the reverse. The company also showed a colossal 2500mm² interposer that includes eight HBM memory chips and two big processors.

Godfrey Cheng, TSMC's new head of global marketing, wrote the blog post. He notes that Moore's Law is not about performance, but about transistor density. While performance traditionally improved by increasing the clock speed and architecture, today it is more often improved by increasing parallelization, and hence requires increases in chip size. This enhances the importance of transistor density because chip cost is directly proportional to its area.

[...] "one possible future of great density improvements is to allow the stacking of multiple layers of transistors in something we call Monolithic 3D Integrated Circuits. You could add a CPU on top of a GPU on top of an AI Edge engine with layers of memory in between. Moore's Law is not dead, there are many different paths to continue to increase density."

[...] [System-technology co-optimization (STCO)] is done through advanced packaging, for which TSMC supports silicon-based interposers and fan-out-based chiplet integration. It also has techniques to stack chips on wafers, or stack wafers on top of other wafers. As one such example, TSMC showed a nearly-2500mm² silicon interposer – the world's largest – on top of which two 600mm² processors and eight 75mm² HBM memory chips are placed, which makes for 1800mm² of compute and memory silicon on top of the interposer-based package, well over two times the conventional reticle size limit.
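The area figures quoted above can be checked with a few lines of Python. This is only a sanity-check sketch: the constant and function names are made up for illustration, and the interposer is rounded up to 2500 mm² even though the article says "nearly".

```python
# Die areas in mm^2, taken from the figures quoted in the summary.
PROCESSOR_AREA_MM2 = 600
HBM_AREA_MM2 = 75
INTERPOSER_AREA_MM2 = 2500  # "nearly 2500", so treat this as an upper bound


def total_die_area(n_processors: int, n_hbm: int) -> int:
    """Sum the area of all dies placed on top of the interposer."""
    return n_processors * PROCESSOR_AREA_MM2 + n_hbm * HBM_AREA_MM2


silicon = total_die_area(n_processors=2, n_hbm=8)
coverage = silicon / INTERPOSER_AREA_MM2
print(f"{silicon} mm^2 of dies, roughly {coverage:.0%} of the interposer")
```

So the 1800 mm² figure is just 2 x 600 + 8 x 75, and the dies cover a bit under three quarters of the interposer.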

Related: Dual-Wafer Packaging (Wafer-on-Wafer) Could Double CPU/GPU Performance
Another Step Toward the End of Moore's Law
Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan


Original Submission

Related Stories

Dual-Wafer Packaging (Wafer-on-Wafer) Could Double CPU/GPU Performance 24 comments

The Taiwan Semiconductor Manufacturing Company (TSMC) has revealed a manufacturing technique (called wafer-on-wafer or WoW) that could allow CPUs and GPUs to take their first step towards vertical scaling:

Instead of one wafer per chip, future GPUs may include two or more wafers stacked vertically, which would double the performance without the need to develop new horizontal designs every 2 years. A dual wafer setup, for example, would be achieved by flipping the upper wafer over the lower one, binding both via a flip-chip package. Thus, future GPUs could include multiple wafers in one die and the operating system could detect it as a multi-processor graphics card, eliminating the need for SLI setups.

One shortcoming for this technology would be its lower manufacturing yields for sizes lower than 16 nm. If one of the stacked wafers does not pass the QA, the entire stack is discarded, leading to low yields and poor cost effectiveness. TSMC is currently working to improve this technology so that sub-12 nm processes could equally benefit from it.
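To see why stacking hurts yield, here is a minimal sketch that assumes each layer fails independently, so the yield of the stack is the product of the per-layer yields. The 80% figure is hypothetical and not a TSMC number.

```python
from math import prod


def stack_yield(layer_yields):
    """Probability that every layer in the stack is good.

    Assumes failures are independent between layers, which is a
    simplification made for this illustration.
    """
    return prod(layer_yields)


# Hypothetical per-layer yield, for illustration only.
single_layer = 0.80
two_high = stack_yield([single_layer] * 2)
four_high = stack_yield([single_layer] * 4)
print(f"1 layer: {single_layer:.0%}, 2 layers: {two_high:.0%}, 4 layers: {four_high:.0%}")
```

Under that assumption a process that yields 80% per layer drops to 64% for a two-high stack, which is why a single bad layer is so costly when the whole stack gets discarded.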

Not discussed is how to deal with the heat generated in such a stack.

See also: Here's why Intel and AMD's 7nm CPU revolution is so important to the future of PCs


Original Submission

Another Step Toward the End of Moore's Law 16 comments

At the end of March, two semiconductor manufacturing titans climbed another rung on the ladder of Moore's Law.

Taiwan Semiconductor (TSMC) announced that its 5nm process had entered risk production, while Samsung announced that its own 5nm manufacturing process was ready for sampling.

TSMC says its 5-nm process offers a 15 percent speed gain or a 30 percent improvement in power efficiency. Samsung is promising a 10 percent performance improvement or a 20 percent efficiency improvement.

Also, "both Samsung and TSMC are offering what they're calling a 6-nm process" as a kind of stepping stone for customers with earlier availability (H2 2019) vs 5nm production.

Unfortunately, but perhaps not unexpectedly, the playing field has narrowed significantly with the progression to 5nm foundry production.

GlobalFoundries gave up at 14 nm and Intel, which is years late with its rollout of an equivalent to competitors' 7 nm, is thought to be pulling back on its foundry services, according to analysts.

Samsung and TSMC remain because they can afford the investment and expect a reasonable return. Samsung was the largest chipmaker by revenue in 2018, but its foundry business ranks fourth, with TSMC in the lead. TSMC's capital expenditure was $10 billion in 2018. Samsung expects to nearly match that on a per-year basis until 2030.

Can the industry function with only two companies capable of the most advanced manufacturing processes? "It's not a question of can it work?" says [G. Dan Hutcheson, at VLSI Research]. "It has to work."

According to Len Jelinek, a semiconductor-manufacturing analyst at IHS Markit, "As long as we have at least two viable solutions, then the industry will be comfortable."

There may only be two left, but neither company is sitting still:

Intel's Jim Keller Promises That "Moore's Law" is Not Dead, Outlines 50x Improvement Plan 17 comments

Intel's Senior Vice President Jim Keller (who previously helped to design AMD's K8 and Zen microarchitectures) gave a talk at the Silicon 100 Summit that promised continued pursuit of transistor scaling gains, including a roughly 50x increase in gate density:

Intel's New Chip Wizard Has a Plan to Bring Back the Magic (archive)

In 2016, a biennial report that had long served as an industry-wide pledge to sustain Moore's law gave up and switched to other ways of defining progress. Analysts and media—even some semiconductor CEOs—have written Moore's law's obituary in countless ways. Keller doesn't agree. "The working title for this talk was 'Moore's law is not dead but if you think so you're stupid,'" he said Sunday. He asserted that Intel can keep it going and supply tech companies ever more computing power. His argument rests in part on redefining Moore's law.

[...] Keller also said that Intel would need to try other tactics, such as building vertically, layering transistors or chips on top of each other. He claimed this approach will keep power consumption down by shortening the distance between different parts of a chip. Keller said that, using nanowires and stacking, his team had mapped a path to packing transistors 50 times more densely than possible with Intel's 10 nanometer generation of technology. "That's basically already working," he said.

The ~50x gate density claim combines ~3x density from additional pitch scaling (from "10nm"), ~2x from nanowires, another ~2x from stacked nanowires, ~2x from wafer-to-wafer stacking, and ~2x from die-to-wafer stacking.
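Multiplying out the factors listed above shows where "roughly 50x" comes from. A minimal sketch; the dictionary keys are just labels for the items in the summary.

```python
from math import prod

# Approximate density multipliers as listed in the summary.
density_factors = {
    "pitch scaling from 10nm": 3,
    "nanowires": 2,
    "stacked nanowires": 2,
    "wafer-to-wafer stacking": 2,
    "die-to-wafer stacking": 2,
}

combined = prod(density_factors.values())
print(f"combined density gain: {combined}x")  # 3 * 2 * 2 * 2 * 2 = 48
```

The product is 48x, which gets rounded to 50x given that every individual factor is itself approximate.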

Related: Intel's "Tick-Tock" Strategy Stalls, 10nm Chips Delayed
Intel's "Tick-Tock" is Now More Like "Process-Architecture-Optimization"
Moore's Law: Not Dead? Intel Says its 10nm Chips Will Beat Samsung's
Another Step Toward the End of Moore's Law


Original Submission

TSMC's Chip-on-Wafer-on-Substrate (CoWoS) Connects Multiple Interposers 1 comment

TSMC & Broadcom Develop 1,700 mm² CoWoS Interposer: 2X Larger Than Reticles

TSMC and Broadcom have also been playing with the idea of oversized chips, and this week they've announced their plans to develop a supersized interposer to be used in Chip-on-Wafer-on-Substrate (CoWoS) packaging.

Overall, the proposed 1,700 mm² interposer is twice the size of TSMC's 858 mm² reticle limit. Of course, TSMC can't actually produce a single interposer this large all in one shot – that's what the reticle limit is all about – so instead the company is essentially stitching together multiple interposers, building them next to each other on a single wafer and then connecting them. The net result is that an oversized interposer can be made to function without violating reticle limits.
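As a rough illustration of the stitching idea, the sketch below computes the minimum number of reticle-sized fields needed to cover a given interposer area. It goes by area alone and ignores field shape and any overlap needed for stitching, so treat it as a lower bound; the function name is made up.

```python
import math

RETICLE_LIMIT_MM2 = 858  # TSMC reticle limit quoted in the article


def reticle_fields_needed(interposer_area_mm2: float) -> int:
    """Lower bound on the number of reticle-sized fields, by area alone."""
    return math.ceil(interposer_area_mm2 / RETICLE_LIMIT_MM2)


area = 1700
ratio = area / RETICLE_LIMIT_MM2
print(f"{area} mm^2 is {ratio:.2f}x the reticle limit, "
      f"so at least {reticle_fields_needed(area)} stitched fields")
```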

The new CoWoS platform will initially be used for a new processor from Broadcom for the HPC market, and will be made using TSMC's EUV-based 5 nm (N5) process technology. This system-in-package product features 'multiple' SoC dies as well as six HBM2 stacks with a total capacity of 96 GB. According to Broadcom's press release, the chip will have a total bandwidth of up to 2.7 TB/s, which is in line with what Samsung's latest HBM2E chips can offer.
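Dividing the quoted totals by the number of stacks gives the per-stack figures. This sketch assumes capacity and bandwidth are spread evenly across the six stacks, which the article does not state explicitly.

```python
# Totals quoted in the article for the Broadcom package.
HBM_STACKS = 6
TOTAL_CAPACITY_GB = 96
TOTAL_BANDWIDTH_GB_PER_S = 2700  # 2.7 TB/s expressed in GB/s

# Assumes an even split across stacks (an assumption for this sketch).
capacity_per_stack_gb = TOTAL_CAPACITY_GB / HBM_STACKS
bandwidth_per_stack_gb_per_s = TOTAL_BANDWIDTH_GB_PER_S / HBM_STACKS

print(f"{capacity_per_stack_gb:.0f} GB and "
      f"{bandwidth_per_stack_gb_per_s:.0f} GB/s per stack")
```

That works out to 16 GB and 450 GB/s per stack.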

Also at Guru3D.

Previously: TSMC Shows Off Gigantic Silicon Interposer


Original Submission

High Demand Reported for TSMC's Chip-on-Wafer-on-Substrate Packaging 8 comments

Report: TSMC CoWoS Production Line at Full Capacity as Demand Increases

Despite the downturn of events around the world, TSMC is witnessing a significant increase in demand for its Chip-on-Wafer-on-Substrate (CoWoS) packaging, according to DigiTimes' unnamed industry sources. The Taiwanese silicon manufacturer is purportedly running its CoWoS production lines at full capacity.

CoWoS is a 2.5D method of packaging multiple individual dies side-by-side on a single silicon interposer. The benefits are the ability to increase the density in small devices as you run into the limits of how big individual dies can be produced, better interconnectivity between dies and lower power consumption.

According to DigiTimes, AMD, Nvidia, HiSilicon, Xilinx and Broadcom have placed orders for the tech, with demand for high-performance computing chips, high bandwidth memory (HBM)-powered AI accelerators and ASICs during the past two weeks.

Examples of CoWoS packaged silicon are [...] AMD's Vega VII graphics cards, as well as Nvidia's V100 cards, which have HBM on the same silicon interposer where the GPU is. With the GPU and memory so close together, memory bandwidth is significantly higher on these chips compared to those using GDDR6 memory located elsewhere on the graphics card's PCB. Additionally, the PCB becomes much smaller.

Also at Wccftech.

Previously: TSMC Shows Off Gigantic Silicon Interposer
TSMC's Chip-on-Wafer-on-Substrate (CoWoS) Connects Multiple Interposers


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Friday August 16 2019, @01:50PM (6 children)

    by Anonymous Coward on Friday August 16 2019, @01:50PM (#881056)

    I'm glad someone besides me is working on this chip shrink stuff.
    I got an electrical engineering degree, but chip technology just seemed like the most incremental, tedious sort of drone job you could have.
    So I did software instead, which is not real engineering.

    • (Score: 1, Insightful) by Anonymous Coward on Friday August 16 2019, @02:41PM (5 children)

      by Anonymous Coward on Friday August 16 2019, @02:41PM (#881098)

      And here we have in 3 sentences the reason why all our software sucks: Because people go into it trying to avoid working hard and engineering properly.

      • (Score: 0) by Anonymous Coward on Friday August 16 2019, @03:08PM (3 children)

        by Anonymous Coward on Friday August 16 2019, @03:08PM (#881116)

        And for every real engineer trying to shrink dies and solve hard problems, there are 10 knock-offs reimagining the platform in a virtual environment that runs even slower than you thought possible.

        • (Score: 0) by Anonymous Coward on Friday August 16 2019, @03:22PM (2 children)

          by Anonymous Coward on Friday August 16 2019, @03:22PM (#881124)

          I have a 32c/64t 2990wx w 128 gb ram and am not kidding when I say I have come across websites that can grind it to a halt. What these people can achieve when it comes to inefficiency is just amazing.

          • (Score: 2) by takyon on Friday August 16 2019, @04:15PM (1 child)

            by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Friday August 16 2019, @04:15PM (#881150) Journal

            Are you going to upgrade to 64c/128t Threadripper 3, 256 GB RAM? That might allow acceptable browsing speeds.

            But rly though, the RasPi 4's wimpy ARM quad-core can load websites just fine. uBlock/uMatrix is there to help.

            --
            [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
            • (Score: 0) by Anonymous Coward on Friday August 16 2019, @04:40PM

              by Anonymous Coward on Friday August 16 2019, @04:40PM (#881161)

              I think they get race conditions but don't know where so just keep adding sleep calls everywhere.

      • (Score: 0) by Anonymous Coward on Friday August 16 2019, @03:26PM

        by Anonymous Coward on Friday August 16 2019, @03:26PM (#881128)

        I don't think you got the subtle humor at the end.
        Software was "not real engineering" before I went into it.
        Also, how do you think I am able to recognize that unless I have a sense of proper engineering to start with? (I graduated BSEE.)

        Most programmers never got an engineering degree. Did you?

        Not trying to pull rank, just wondering why you are.

  • (Score: 4, Insightful) by bzipitidoo on Friday August 16 2019, @03:25PM (6 children)

    by bzipitidoo (4388) Subscriber Badge on Friday August 16 2019, @03:25PM (#881127) Journal

    Moore's Law may not be dead yet, but it will be. It's just a question of when and what ends it. Heat death, so to speak? I have read that Peltier cooling is what makes 3D possible. Or, we reach the minimum number of atoms required for the correct functioning of transistors as we know them? The speed of light is another hard limit that makes clock speed difficult to increase further. At 4GHz, an electrical signal can travel only 7.5 cm at most between ticks.
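The 7.5 cm figure is simply the speed of light divided by the clock frequency. A quick check, with a made-up function name; real signals in wires travel slower than light in a vacuum, so this is an upper bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def max_distance_per_tick_cm(clock_hz: float) -> float:
    """Distance light covers in a vacuum during one clock period, in cm."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


distance_4ghz = max_distance_per_tick_cm(4e9)
print(f"at 4 GHz: {distance_4ghz:.2f} cm per tick")
```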

    Parallelism, in the form of concurrency, has been increasing for decades. (Actual parallelism with parallel algorithms is not so popular, and still a pain to do.) Really, the personal computers of the early 1980s were an extreme in sequential computing. The CPU had to do everything, and I do mean everything, even low level disk controller commands, and the individual speaker clicks at whatever frequency the CPU could manage, to make audio. Graphics too were all CPU driven, at the individual pixel level. For a game to run animations and play sound effects took clever interweaving of CPU utilization, and compromise was necessary. Usually, the audio was grainy, tinny, and choppy. Making all that more concurrent was a no-brainer. It was only a matter of time before all these specialized activities that were offloaded with such things as the Sound Blaster audio card led towards general activities. Today, multi-core is a painfully obvious performance boost.

    One thing I wonder about is smarter memory. Seems RAM could have the ability to zero out a large block, without burdening the CPU with the task of setting each individual word to 0. Just turn off the DRAM refresh for a fraction of a second. Would boost speed and security. Maybe zeroing memory can be done with a DMA transfer from a virtual block of always zero memory, sort of like /dev/zero?

    • (Score: 0) by Anonymous Coward on Friday August 16 2019, @03:31PM (2 children)

      by Anonymous Coward on Friday August 16 2019, @03:31PM (#881130)

      The Commodore 64 microcomputer of the early 80s was more advanced than that with separate chips for sound, graphics, and disk control. (The disk controller got its own 6502 variant CPU.)
      IBM PC and Apple II were more "primitive" in the manner you describe.

      • (Score: 2) by bzipitidoo on Friday August 16 2019, @04:36PM (1 child)

        by bzipitidoo (4388) Subscriber Badge on Friday August 16 2019, @04:36PM (#881160) Journal

        True. It was mostly the Apple II I was thinking of. Yet they could make a real hash of offloading. The Commodore 64 disk drive was notorious for its extreme slowness. What a colossal waste of all that dedicated hardware that they couldn't make it any faster. The crazy thing even had its own power supply. Seems to be a case study of why just throwing more microcontrollers at a task doesn't necessarily make it go faster. Everyone else's disk drive, including the more primitive Apple II's, was much faster.

        The stock Apple DOS 3.3 had plenty of stupidities in the code that made its disk access needlessly slow. Like, it took a fraction of a second too long to read and process a sector, so that it just missed the start of the next sector and had to wait for an entire revolution of the disk to bring the next sector past the head again. In spite of that, it was still much faster than the Commodore 64. Lots of aftermarket DOSes for the Apple II fixed that issue. As I recall, booting took 45 seconds for stock Apple DOS, and 15 seconds for an aftermarket Apple DOS. The Commodore 64 needed several minutes to read the disk, and could not be improved with a simple software change.
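The "just missed the next sector" effect can be shown with a toy timing model. Everything here is an assumption for illustration: the function is hypothetical, sectors are laid out consecutively, and the figures (16 sectors per track, 200 ms per revolution, a 1 ms gap between sectors) are round numbers rather than measured Apple II values.

```python
import math


def track_read_time_ms(sectors: int, rev_ms: float,
                       gap_ms: float, processing_ms: float) -> float:
    """Time to read one track of consecutively laid-out sectors.

    If processing a sector takes longer than the gap before the next
    sector header, the start of that sector is missed and the drive has
    to wait whole extra revolutions for it to come around again.
    """
    slot_ms = rev_ms / sectors  # time for one sector slot to pass the head
    extra_revs = max(0, math.ceil((processing_ms - gap_ms) / rev_ms))
    return sectors * (slot_ms + extra_revs * rev_ms)


# Illustrative parameters only.
fast = track_read_time_ms(sectors=16, rev_ms=200, gap_ms=1.0, processing_ms=0.5)
slow = track_read_time_ms(sectors=16, rev_ms=200, gap_ms=1.0, processing_ms=1.5)
print(f"keeps up: {fast:.0f} ms per track, just misses: {slow:.0f} ms per track")
```

In this model, being half a millisecond too slow turns a one-revolution track read into seventeen revolutions. Real boot times also include seeks and other work, which is consistent with the observed gap (45 versus 15 seconds) being much smaller than that.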

        • (Score: 0) by Anonymous Coward on Friday August 16 2019, @08:24PM

          by Anonymous Coward on Friday August 16 2019, @08:24PM (#881261)

          If I recall correctly, the C64 disk drive was slow because of some bug in the drive chips that they never bothered to fix. (Time to market pressure I think.)
          No matter. The problem was solved by plugging a cartridge into the computer's cartridge port that used a different routine for data transfer. Then the disk ran very fast.

          The point is, the Apple did an awful lot with very little hardware, but it was an evolutionary dead end. It was the ultimate hack, a very tightly integrated system that could not be evolved as separate components.

    • (Score: 2) by takyon on Friday August 16 2019, @04:42PM (2 children)

      by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Friday August 16 2019, @04:42PM (#881162) Journal

      We can already see the next steps with clarity. From the TSMC blog post [tsmc.com]:

      To feed modern fast CPUs, GPUs and dedicated AI Processors, it is critical to provide memory that is both physically closer to the cores that are requesting the data for improved latency, in addition to supplying a higher bandwidth of data for the cores to process. This is what device level density provides. When memory is collocated closer to the logic cores, the system achieves lower latency, lower power consumption and higher overall performance.

      [...] Advanced packaging today brings memory close to the logic. Typically, logic cores are fed through standalone memory chips through interfaces such as DDR or GDDR. The physical distance between the memory device and the logic cores limit performance through increased latency. Bandwidth is also limited with discrete memory as they only offer limited interface width. Additionally, power consumption for discrete logic and memory also govern a device's overall performance, especially in applications such as smartphones or IOT devices as there is limited ability to dissipate the thermal energy radiated by discrete devices.

      [...] Tight integration of logic cores with memory through advanced packaging techniques are already available today from TSMC. The line between a semiconductor and a system solution is blurry as the new advanced packaging techniques are silicon wafer based. TSMC has pioneered advanced packaging techniques that allow our customers to deliver a complete system with a silicon-based interposer or fan-out-based chiplet integration. We also have advanced packaging techniques that allow us to stack chips on wafers or stack wafer on wafer prior to integration into packaged modules. These advanced packaging techniques allow TSMC customers to deliver much higher density and more performance. We will continue to drive innovation in advanced packaging technologies.

      Intel and AMD are working on 2.5D/3D designs that stack memory near or on top of the CPU, but the real winner is going to be:

      DARPA's 3DSoC Becoming a Reality [soylentnews.org]

      With that tight integration of memory and cores in layers, we could see CPUs that outperform today's flagship Intel/AMD chips, but with the power consumption of a smartphone SoC or RasPi Zero. That's before you take into account transistor advances [soylentnews.org]. The memory amounts of the very first 3DSoCs will probably be around 4 GB to 8 GB. If we can eventually use dense non-volatile universal memory in the same place, maybe we can have 1 TB or more there instead, and do all computing in memory.

      We are going to see performance increases in the orders of magnitude. Single-board computers with 3DSoCs will meet the computing needs of the vast majority of users. We'll still see a trend towards embarrassingly parallel core/thread counts (64-core Threadripper, anyone?), but those who actually need that will probably be pushing zettaflops.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 1) by ChrisMaple on Sunday August 18 2019, @12:25AM (1 child)

        by ChrisMaple (6964) on Sunday August 18 2019, @12:25AM (#881612)

        I have doubts about those transistor advances, specifically the cited Air Channel Transistor with a performance boost of ten thousand. Field emission devices tend to erode at high currents, and high currents are necessary for the high speeds that charge interconnect capacitances -- and interconnect capacitances don't go away even when the transistors improve. The article claims that there's no power dissipation in the transistors because there's no material to get in the way of the electrons: that's blatantly false; an electron going through a potential (voltage) change always and inescapably involves power.

        Perhaps some day Air Channel Transistors will be a commercial reality in integrated circuits, but I'd be greatly surprised if they caused as much as a 3X speed improvement.

        • (Score: 2) by takyon on Sunday August 18 2019, @12:43AM

          by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Sunday August 18 2019, @12:43AM (#881617) Journal

          It's just one of a number of advancements that will be competing to leave the lab and enter the fab. Some of them will never be practical, but multiple could succeed.

          However, the concept of moving logic ever closer to memory is a done deal, and impossible to ignore.

          --
          [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]