
posted by Fnord666 on Monday November 23 2020, @06:04PM

The Trillion-Transistor Chip That Just Left a Supercomputer in the Dust:

So, in a recent trial, researchers pitted the chip—which is housed in an all-in-one system about the size of a dorm room mini-fridge called the CS-1—against a supercomputer in a fluid dynamics simulation. Simulating the movement of fluids is a common supercomputer application useful for solving complex problems like weather forecasting and airplane wing design.

The trial was described in a preprint paper written by a team led by Cerebras's Michael James and NETL's Dirk Van Essendelft and presented at the supercomputing conference SC20 this week. The team said the CS-1 completed a simulation of combustion in a power plant roughly 200 times faster than the Joule 2.0 supercomputer completed a similar task.

The CS-1 was actually faster than real time. As Cerebras wrote in a blog post, "It can tell you what is going to happen in the future faster than the laws of physics produce the same result."

The researchers said the CS-1's performance couldn't be matched by any number of CPUs and GPUs. And CEO and cofounder Andrew Feldman told VentureBeat that would be true "no matter how large the supercomputer is." At some point, scaling a supercomputer like Joule no longer produces better results on this kind of problem. That's why Joule's simulation speed peaked at 16,384 cores, a fraction of its total 86,400 cores.
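The scaling plateau described above can be illustrated with a toy strong-scaling model (the constants below are made up for illustration and are not Joule's actual characteristics): per-core work shrinks as cores are added, but communication overhead grows with core count, so speedup peaks well short of the full machine.

```python
# Toy strong-scaling model: speedup peaks, then declines, as cores are added.
# serial_frac and comm_cost are made-up illustrative constants, not real data.
def simulated_speedup(cores, serial_frac=1e-4, comm_cost=1e-9):
    parallel_time = (1 - serial_frac) / cores   # per-core work shrinks
    comm_time = comm_cost * cores               # communication grows with cores
    return 1.0 / (serial_frac + parallel_time + comm_time)

# The optimum lands at an interior core count, not at the largest machine.
best = max(range(1000, 200001, 1000), key=simulated_speedup)
```

With different constants the peak moves, but the qualitative shape is the point: past some core count, adding hardware slows the simulation down, which is consistent with Joule peaking at 16,384 of its 86,400 cores.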

Previously:
Cerebras More than Doubles Core and Transistor Count with 2nd-Generation Wafer Scale Engine
Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs
Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores



 
  • (Score: 4, Insightful) by EvilSS on Monday November 23 2020, @07:20PM (6 children)

    by EvilSS (1456) Subscriber Badge on Monday November 23 2020, @07:20PM (#1080767)
    If the numbers in the article are to be believed, then as long as they have a halfway decent yield, they should be making bank with these. According to the article, the system they beat is an order of magnitude more expensive (the article is a little vague on pricing: tens versus single-digit millions of dollars), and they had a 200x performance advantage over it. Hell, if the results pan out, I'm sure Los Alamos would be first in line to buy a truckload of them. If it's as good as they say for fluid dynamics, I imagine it could be tailored to do nuclear simulations as well.

    Not to mention Google, Facebook, Microsoft, and Amazon, who can use it (and, in three of those cases, sell time on it) for neural net training, which seems to be their primary target use case. Not to mention universities: a system like this for under $10M would open up buying these at schools that can't afford the big supercomputing systems. If it is even 1/10th as fast as they claim for that use case, it would be cheaper to rent one for a few minutes versus hours or even days on a GPU-based platform. If they can produce them at the scale needed to meet demand, and before competitors start popping up in the space, they should make money hand over fist.

    But, of course, we need to see more independent verification of their claims, and, as you suggest, their yield is the big "if" here. If they are tossing hundreds of wafers to get one working one, that would be a problem.
  • (Score: 3, Informative) by takyon on Monday November 23 2020, @08:13PM (2 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday November 23 2020, @08:13PM (#1080777) Journal

    IIRC, it's built to be tolerant of defective cores. Maybe there's a controller or some other small part that must be in perfect shape for it to work, but it could mean that almost every wafer is usable, the complete opposite of tossing out hundreds to get one good one.

    Another thing is that TSMC's "7nm" yield is very good in the first place. And a wafer costs about $9,346 [techpowerup.com].

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by HiThere on Monday November 23 2020, @10:43PM

      by HiThere (866) on Monday November 23 2020, @10:43PM (#1080818) Journal

      Sounds promising, but that ordinary yield figure assumes only a small area of the surface needs to be free of defects. If they need too much error correction (or longer inter-processor routing), that, in and of itself, could slow things down a lot. There may well be only a few "grade A" chips and a much larger number of grade B and C parts, which are slower or have fewer working processors.

      --
      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
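The yield concern raised above can be sketched with the standard Poisson yield model (the defect density and core areas below are placeholder values, not TSMC's or Cerebras's actual figures): the probability that a given block of silicon is defect-free falls off exponentially with its area, which is why an array of small, redundant cores can tolerate defect rates that would be fatal to one large monolithic die.

```python
import math

# Poisson yield model: P(block is defect-free) = exp(-D * A),
# where D is defect density (defects/mm^2) and A is block area (mm^2).
# The numbers below are placeholders, not real process data.
def good_fraction(defect_density, area_mm2):
    return math.exp(-defect_density * area_mm2)

small_core = good_fraction(0.1, 0.05)  # tiny redundant core: nearly always good
big_die = good_fraction(0.1, 600.0)    # one huge monolithic die: almost never clean
```

Under this model, the fraction of working cores per wafer is what separates a "grade A" part from a grade B or C one with more cores fused off.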
    • (Score: 2) by TheRaven on Tuesday November 24 2020, @11:20AM

      by TheRaven (270) on Tuesday November 24 2020, @11:20AM (#1080949) Journal

      Most modern CPUs are designed to be tolerant of defects to a degree. It's pretty easy if the defect is in the cache: you just disable part of the cache and sell the chip as a cheaper variant. Intel started doing this aggressively around the 486: if a chip had a defect in the FPU, it was sold as a 486SX; if it had a defect in the CPU, it was sold as a 487; if both passed tests, it was a 486DX. Around the Pentium 3 era, yields got high enough that they (and AMD) ended up selling higher-rated parts with lower model numbers, because that made more money than lowering the price of the high-end parts.

      This kind of thing is *much* easier with a regular layout. If you design your network on chip correctly, you can just route around any units that didn't work. IBM and Sony did this with the Cell: most of the chips made had a defect in one of the SPUs, and these were put in PlayStations with 7 SPUs. The ones with no defects were put in IBM server parts with 8 SPUs. The ones with a defect in the CPU were put on accelerator boards. If your 'chip' is a wafer full of cores in a regular layout with a NoC routing between them, you can power the whole thing up, test each core, and then configure your NoC switches to route around areas that don't work (including entire parts of the network if there's a fault in part of the network itself). The main difficulty is that each system you produce will have a subtly different topology, which will affect inter-core latency and may impact overall performance. Oh, and powering/cooling a chip that big is also nontrivial...

      --
      sudo mod me up
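The route-around-defects idea above can be sketched in a few lines. This assumes a simple 2D-mesh NoC with hypothetical coordinates (not Cerebras's actual fabric): after power-on test, defective cores are marked, and a breadth-first search finds a shortest path between working cores that avoids them.

```python
from collections import deque

# Sketch: shortest path between two working cores on a 2D-mesh NoC,
# avoiding cores that failed their power-on test. Purely illustrative.
def route(width, height, defective, src, dst):
    if src in defective or dst in defective:
        return None
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        cur = frontier.popleft()
        if cur == dst:                      # walk back to src to rebuild path
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < width and 0 <= ny < height
                    and nxt not in defective and nxt not in prev):
                prev[nxt] = cur
                frontier.append(nxt)
    return None                             # fault isolates dst: unreachable

path = route(4, 4, defective={(1, 1), (2, 1)}, src=(0, 0), dst=(3, 3))
```

Since the set of defective cores differs from wafer to wafer, each system ends up with slightly different routes, which is exactly the subtle topology (and latency) variation the comment describes.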
  • (Score: 5, Interesting) by driverless on Monday November 23 2020, @10:34PM (2 children)

    by driverless (4770) on Monday November 23 2020, @10:34PM (#1080817)

    I was at the conference where this was introduced. The consensus among the attendees, all of whom were experts in the field, was that it was yet another attempt at WSI, was an impressive proof-of-concept, and like every other time this has been tried would sink without trace after a year or two. No-one could see where this was going or who would buy it apart from one or two national labs to play with it for awhile.