
posted by martyb on Tuesday August 20 2019, @09:11PM   Printer-friendly
from the not-going-to-fit-in-a-cell-phone dept.

The five technical challenges Cerebras overcame in building the first trillion transistor chip

Superlatives abound at Cerebras, the until-today stealthy next-generation silicon chip company looking to make training a deep learning model as quick as buying toothpaste from Amazon. Launching after almost three years of quiet development, Cerebras introduced its new chip today — and it is a doozy. The "Wafer Scale Engine" is 1.2 trillion transistors (the most ever), 46,225 square millimeters (the largest ever), and includes 18 gigabytes of on-chip memory (the most of any chip on the market today) and 400,000 processing cores (guess the superlative).
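Those headline numbers imply some striking per-core figures. A quick back-of-envelope check, using only the quantities quoted above:

```python
# Back-of-envelope arithmetic on the quoted Wafer Scale Engine figures.
transistors = 1.2e12       # 1.2 trillion transistors
area_mm2 = 46_225          # 46,225 square millimeters
cores = 400_000
sram_bytes = 18e9          # 18 GB of on-chip memory

density = transistors / area_mm2              # ~26 million transistors per mm^2
per_core = transistors / cores                # ~3 million transistors per core
sram_per_core_kb = sram_bytes / cores / 1e3   # ~45 KB of SRAM per core

print(f"{density/1e6:.0f}M transistors/mm^2, "
      f"{per_core/1e6:.0f}M transistors/core, "
      f"~{sram_per_core_kb:.0f} KB SRAM/core")
```

The roughly 45 KB of on-chip memory per core is consistent with the "small cores" characterization in the comments below.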

It's made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry's big confabs for product introductions and roadmaps, with various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray at Fortune and read the white paper from Cerebras itself.

Also at BBC, VentureBeat, and PCWorld.


Original Submission

Related Stories

Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs 8 comments

Cerebras Unveils First Installation of Its AI Supercomputer at Argonne National Labs

At Supercomputing 2019 in Denver, Colo., Cerebras Systems unveiled the computer powered by the world's biggest chip. Cerebras says the computer, the CS-1, has the equivalent machine learning capabilities of hundreds of racks worth of GPU-based computers consuming hundreds of kilowatts, but it takes up only one-third of a standard rack and consumes about 17 kW. Argonne National Labs, future home of what's expected to be the United States' first exascale supercomputer, says it has already deployed a CS-1. Argonne is one of two announced U.S. National Laboratories customers for Cerebras, the other being Lawrence Livermore National Laboratory.

The system "is the fastest AI computer," says CEO and cofounder Andrew Feldman. He compared it with Google's TPU clusters (the 2nd of three generations of that company's AI computers), noting that one of those "takes 10 racks and over 100 kilowatts to deliver a third of the performance of a single [CS-1] box."

The CS-1 is designed to speed the training of novel and large neural networks, a process that can take weeks or longer. Powered by a 400,000-core, 1-trillion-transistor wafer-scale processor chip, the CS-1 should collapse that task to minutes or even seconds. However, Cerebras did not provide data showing this performance in terms of standard AI benchmarks such as the new MLPerf standards. Instead it has been wooing potential customers by having them train their own neural network models on machines at Cerebras.

[...] The CS-1's first application is in predicting cancer drug response as part of a U.S. Department of Energy and National Cancer Institute collaboration. It is also being used to help understand the behavior of colliding black holes and the gravitational waves they produce. A previous instance of that problem required 1024 out of 4392 nodes of the Theta supercomputer.

Also at TechCrunch, VentureBeat, and Wccftech.

Previously: Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores


Original Submission

Cerebras More than Doubles Core and Transistor Count with 2nd-Generation Wafer Scale Engine 20 comments

342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased

One of the highlights of Hot Chips from 2019 was the startup Cerebras showcasing its product – a large 'wafer-scale' AI chip that was literally the size of a wafer. The chip itself was rectangular, but it was cut from a single wafer, and contained 400,000 cores, 1.2 trillion transistors, 46,225 mm² of silicon, and was built on TSMC's 16 nm process.

[...] Obviously when doing wafer scale, you can't just add more die area, so the only way forward is to optimize die area per core and take advantage of smaller process nodes. That means on TSMC 7 nm there are now 850,000 cores and 2.6 trillion transistors. Cerebras had to develop new technologies to deal with multi-reticle designs, but it succeeded with the first generation and carried those lessons over to the new chip. We're expecting more details about this new product later this year.
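A quick sketch of the gen-over-gen scaling implied by those figures (numbers taken from this article and the first-generation story above):

```python
# Gen-over-gen scaling of the Wafer Scale Engine, from the quoted figures.
# Die area is pinned at roughly a full wafer, so all growth must come
# from the 16 nm -> 7 nm process shrink plus per-core area tuning.
gen1 = {"node": "16 nm", "cores": 400_000, "transistors": 1.2e12}
gen2 = {"node": "7 nm",  "cores": 850_000, "transistors": 2.6e12}

core_scale = gen2["cores"] / gen1["cores"]                    # 2.125x
transistor_scale = gen2["transistors"] / gen1["transistors"]  # ~2.17x
print(f"cores: {core_scale:.2f}x, transistors: {transistor_scale:.2f}x")
```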

Previously: Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores
Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs


Original Submission

The Trillion-Transistor Chip That Just Left a Supercomputer in the Dust 25 comments

The Trillion-Transistor Chip That Just Left a Supercomputer in the Dust:

So, in a recent trial, researchers pitted the chip—which is housed in an all-in-one system about the size of a dorm room mini-fridge called the CS-1—against a supercomputer in a fluid dynamics simulation. Simulating the movement of fluids is a common supercomputer application useful for solving complex problems like weather forecasting and airplane wing design.

The trial was described in a preprint paper written by a team led by Cerebras's Michael James and NETL's Dirk Van Essendelft and presented at the supercomputing conference SC20 this week. The team said the CS-1 completed a simulation of combustion in a power plant roughly 200 times faster than it took the Joule 2.0 supercomputer to do a similar task.

The CS-1 was actually faster than real time. As Cerebras wrote in a blog post, "It can tell you what is going to happen in the future faster than the laws of physics produce the same result."

The researchers said the CS-1's performance couldn't be matched by any number of CPUs and GPUs. And CEO and cofounder Andrew Feldman told VentureBeat that would be true "no matter how large the supercomputer is." At a point, scaling a supercomputer like Joule no longer produces better results in this kind of problem. That's why Joule's simulation speed peaked at 16,384 cores, a fraction of its total 86,400 cores.
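That ceiling is the classic strong-scaling problem: past some core count, communication costs swamp the shrinking per-core workload. A toy model makes the shape of the effect visible (illustrative only; the compute and communication constants are invented, chosen so the peak lands near the 16,384 cores reported for Joule, and are not measured values):

```python
# Toy strong-scaling model: per-step time = compute/n + communication*n.
# Constants are invented for illustration, not measured on Joule.
def step_time(n_cores, compute=1e6, comm=0.004):
    return compute / n_cores + comm * n_cores

core_counts = [2**k for k in range(10, 18)]   # 1,024 .. 131,072 cores
fastest = min(core_counts, key=step_time)
print(fastest)   # prints 16384: adding cores past this point slows the run
```

In this toy model the compute term falls as 1/n while the communication term grows linearly, so there is always an optimum beyond which more cores hurt.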

Previously:
Cerebras More than Doubles Core and Transistor Count with 2nd-Generation Wafer Scale Engine
Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs
Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores


Original Submission

Cerebras Packs 16 Wafer-Scale Chips Into Andromeda "AI" Supercomputer 4 comments

Hungry for AI? New supercomputer contains 16 dinner-plate-size chips

On Monday, Cerebras Systems unveiled its 13.5-million-core Andromeda AI supercomputer for deep learning, reports Reuters. According to Cerebras, Andromeda delivers over 1 exaflop (1 quintillion operations per second) of AI computational power at 16-bit half precision.

Andromeda is itself a cluster of 16 Cerebras CS-2 computers linked together. Each CS-2 contains one Wafer Scale Engine chip (often called "WSE-2"), which is currently the largest silicon chip ever made, at about 8.5 inches square and packed with 2.6 trillion transistors organized into 850,000 cores.
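Some quick arithmetic on those figures (a sketch only; note the press figure of 13.5 million cores rounds 16 × 850,000 = 13.6 million):

```python
# Sanity arithmetic on the quoted Andromeda figures.
total_flops = 1e18         # ~1 exaflop at FP16, per Cerebras
cores = 16 * 850_000       # 16 CS-2s x 850,000 cores = 13.6M cores
per_core_gflops = total_flops / cores / 1e9
print(f"~{per_core_gflops:.0f} GFLOPS (FP16) per core")
```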

Cerebras built Andromeda at a data center in Santa Clara, California, for $35 million. It's tuned for applications like large language models and has already been in use for academic and commercial work. "Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J and GPT-NeoX," writes Cerebras in a press release.

Previously: Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores
Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs
Cerebras More than Doubles Core and Transistor Count with 2nd-Generation Wafer Scale Engine
The Trillion-Transistor Chip That Just Left a Supercomputer in the Dust


Original Submission

This discussion has been archived. No new comments can be posted.
  • (Score: 0) by Anonymous Coward on Tuesday August 20 2019, @09:18PM (2 children)

    by Anonymous Coward on Tuesday August 20 2019, @09:18PM (#882786)

    So, a single chip out of the whole wafer? I assume they figured out a monster pizza-sized packaging, too? And how many pins?

    • (Score: 2) by takyon on Tuesday August 20 2019, @09:27PM (1 child)

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Tuesday August 20 2019, @09:27PM (#882791) Journal

      https://cdn.wccftech.com/wp-content/uploads/2019/08/cerebras-wse-nvidia-v100-featured-image.jpg [wccftech.com]

      I skimmed the white paper [cerebras.net] and there's nothing to answer your specific questions. But there's one bad typo:

      The Cerebras WSE has 18 Gigabytes of on chip memory and 9.6 bytes of memory bandwidth.

      I think that's supposed to be 9.6 petabytes per second.

      The TechCrunch article does get into packaging:


      The third challenge Cerebras confronted was handling thermal expansion. Chips get extremely hot in operation, but different materials expand at different rates. That means the connectors tethering a chip to its motherboard also need to thermally expand at precisely the same rate, lest cracks develop between the two.

      As Feldman explained, “How do you get a connector that can withstand [that]? Nobody had ever done that before, [and so] we had to invent a material. So we have PhDs in material science, [and] we had to invent a material that could absorb some of that difference.”

      Once a chip is manufactured, it needs to be tested and packaged for shipment to original equipment manufacturers (OEMs) who add the chips into the products used by end customers (whether data centers or consumer laptops). There is a challenge though: Absolutely nothing on the market is designed to handle a whole-wafer chip.

      “How on earth do you package it? Well, the answer is you invent a lot of shit. That is the truth. Nobody had a printed circuit board this size. Nobody had connectors. Nobody had a cold plate. Nobody had tools. Nobody had tools to align them. Nobody had tools to handle them. Nobody had any software to test,” Feldman explained. “And so we have designed this whole manufacturing flow, because nobody has ever done it.” Cerebras’ technology is much more than just the chip it sells — it also includes all of the associated machinery required to actually manufacture and package those chips.

      [...] Cerebras has a demo chip (I saw one, and yes, it is roughly the size of my head), and it has started to deliver prototypes to customers, according to reports. The big challenge, though, as with all new chips, is scaling production to meet customer demand.

      For Cerebras, the situation is a bit unusual. Because it places so much computing power on one wafer, customers don’t necessarily need to buy dozens or hundreds of chips and stitch them together to create a compute cluster. Instead, they may only need a handful of Cerebras chips for their deep-learning needs. The company’s next major phase is to reach scale and ensure a steady delivery of its chips, which it packages as a whole system “appliance” that also includes its proprietary cooling technology.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by All Your Lawn Are Belong To Us on Tuesday August 20 2019, @10:26PM

        by All Your Lawn Are Belong To Us (6553) on Tuesday August 20 2019, @10:26PM (#882826) Journal

        to original equipment manufacturers (OEMs) who add the chips into the products used by end customers (whether data centers or consumer laptops)

        I'd hate to hold a laptop with a chip like this in my lap.... :O ;)

        --
        This sig for rent.
  • (Score: 3, Interesting) by takyon on Tuesday August 20 2019, @09:18PM (7 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Tuesday August 20 2019, @09:18PM (#882787) Journal

    Why is this a good idea? Keeping everything on the same chip and on-chip memory reduces latency and increases energy efficiency.

Power consumption will be in the ballpark of 10 kW to 15 kW, not counting the cooling.

    Obviously, the wafer will have defects, but it can tolerate them and has redundant cores (ya think?).

This is the opposite of the chiplet approach, where you want small pieces of silicon to boost yields and create a wide variety of designs (from mobile to server) with the same chiplets. In this approach, you want as much performance as possible, and using the whole wafer gets you better performance than splitting it up. At least, that's the story; we won't know until someone actually benchmarks something on it.

    Apparently, it can run regular code, not just machine learning stuff. Keeping in mind that these are "small cores".

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 4, Insightful) by JoeMerchant on Tuesday August 20 2019, @09:28PM (1 child)

      by JoeMerchant (3937) on Tuesday August 20 2019, @09:28PM (#882792)

      It's hardly "local" - the wafer area is like a square 8.5" on a side.

      It probably needs to be maintained in a clean room its whole working life...

      --
      Україна досі не є частиною Росії Слава Україні🌻 https://news.stanford.edu/2023/02/17/will-russia-ukraine-war-end
    • (Score: 2) by takyon on Tuesday August 20 2019, @09:38PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Tuesday August 20 2019, @09:38PM (#882802) Journal

      I'll revise that to 15 kW, not sure about the cooling, since it seems the company is providing the cooling solution.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 1, Funny) by Anonymous Coward on Tuesday August 20 2019, @10:47PM (1 child)

      by Anonymous Coward on Tuesday August 20 2019, @10:47PM (#882834)

      This is the opposite of the chiplet approach,

      They can put several of them side to side and call it chiplets. Imagine a beowulf cluster of... nah forget it.

    • (Score: 2) by driverless on Wednesday August 21 2019, @08:29AM (1 child)

      by driverless (4770) on Wednesday August 21 2019, @08:29AM (#883022)

      Ah, they've finally gone public so it's OK to talk about it... yeah, it's a crazy device, even when they presented it the guy started with "every other wafer-scale project has failed", followed by endless questions about why theirs would be any different, and no clear answers. It's pretty outrageous, a single 10kW device with very special-case functionality that requires something the size of a small server rack to run, why would anyone buy this when you can use the space and power for a more conventional, and far more flexible, solution? I mean, from a geeky research-project basis it's pretty cool, but why? Their talk was mostly interruptions for questions about how this thing could be even remotely practical.

      • (Score: 2) by takyon on Wednesday August 21 2019, @10:16AM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 21 2019, @10:16AM (#883044) Journal

        Ah, they've finally gone public so it's OK to talk about it...

        What, were you at Hot Chips? Or...

        Anyway, I don't think it's so crazy. Obviously this is a niche product, but it could offer great performance/$ for big companies that need it.

        This thing exists because Moore Slaw Dead and there is a lot of hype money in AI/machine learning, for now and perhaps many years to come.

        It's possible that some of the IP here will make its way into other products. But you could also just use lots of chiplets, stacked memory, etc. on a big ass-interposer. And 3DSoC is going to revolutionize the industry by putting logic and memory as close as possible, and it will probably stay that way.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by JoeMerchant on Tuesday August 20 2019, @09:31PM (2 children)

    by JoeMerchant (3937) on Tuesday August 20 2019, @09:31PM (#882798)

    ~12kW (the equivalent of 8 to 10 standard household space heaters) radiating off of something smaller than a standard sheet of paper..? yep, that's gonna get hot.
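For what it's worth, the implied power *density* is less extreme than the total suggests. A quick check, assuming the ~12 kW spreads evenly across the full 46,225 mm² of silicon:

```python
# Power density implied by the figures above (rough check).
power_w = 12_000                 # ~12 kW, per the comment above
area_cm2 = 46_225 / 100          # 46,225 mm^2 = 462.25 cm^2
print(f"~{power_w / area_cm2:.0f} W/cm^2")   # prints ~26 W/cm^2
```

That per-area figure is roughly comparable to a mainstream CPU die; the hard part is the sheer total heat and extracting it from one contiguous plate, which is presumably why Cerebras ships its own cooling solution.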

    --
    Україна досі не є частиною Росії Слава Україні🌻 https://news.stanford.edu/2023/02/17/will-russia-ukraine-war-end
    • (Score: 4, Interesting) by FatPhil on Tuesday August 20 2019, @10:31PM (1 child)

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Tuesday August 20 2019, @10:31PM (#882829) Homepage
Same dissipation as 12 1 kW bar heaters - so it's gonna glow with a similar black-body spectrum.

      They'll never power this up, that's my betting. This is a press release. I'm sure they're looking for funding right now. All they seem to think they need to do is wave a 20cm wafer around and the hundreds of millions will come rushing in. There is a bubble that still needs pumping, it seems.
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by driverless on Wednesday August 21 2019, @08:37AM

        by driverless (4770) on Wednesday August 21 2019, @08:37AM (#883025)

        They've powered samples up, but see my previous post above.

  • (Score: 2) by legont on Tuesday August 20 2019, @11:28PM (1 child)

    by legont (4179) on Tuesday August 20 2019, @11:28PM (#882851)

    How many *coins per hour?

    --
    "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    • (Score: 2, Informative) by Anonymous Coward on Tuesday August 20 2019, @11:59PM

      by Anonymous Coward on Tuesday August 20 2019, @11:59PM (#882864)

      10 quadrillion Zimbabcoins

  • (Score: 2) by Rupert Pupnick on Wednesday August 21 2019, @01:48AM (2 children)

    by Rupert Pupnick (7277) on Wednesday August 21 2019, @01:48AM (#882901) Journal

    Presumably the cores are arrayed in some kind of neural network topology with memory distributed throughout. Would love to know more if anyone has any other relevant links.

    Thermal problem is huge as already pointed out by SNers. So bad, that this special “Z direction” cooling is required. They can’t use fluid flow parallel to the surface of the chip as in a traditional cooling design because the “downstream” edge of the chip would run too hot. If it’s silicon technology you can’t go above 150C anywhere on the chip.

    • (Score: 1) by NickM on Wednesday August 21 2019, @02:19AM (1 child)

      by NickM (2867) Subscriber Badge on Wednesday August 21 2019, @02:19AM (#882913) Journal
According to the Fortune article in the summary

      Wafers incur defects when circuits are burned into them, and those areas become unusable. Nvidia, Intel, and other makers of “normal” smaller chips can get around that by cutting out the good chips in a wafer and scrapping the rest. You can’t do that if the entire wafer is the chip. So Cerebras had to build in redundant circuits, to route around defects in order to still deliver 400,000 working cores, like a miniature internet that keeps going when individual server computers go down. The wafers were produced in partnership with Taiwan Semiconductor Manufacturing, the world’s largest chip manufacturer, but Cerebras has exclusive rights to the intellectual property that makes the process possible.
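The redundancy math works out surprisingly well. A toy binomial yield model shows why a small margin of spare cores is enough (illustrative only; the per-core failure rate and spare counts below are invented, not Cerebras's figures):

```python
import math

# Toy yield model: each core fails independently with probability p;
# the wafer is usable as long as no more than `spares` cores fail.
# Computed in log space to avoid floating-point underflow.
def p_wafer_good(n, p, spares):
    total = 0.0
    for k in range(spares + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total

# 400,000 cores at a 0.1% failure rate -> ~400 failed cores expected.
print(p_wafer_good(400_000, 0.001, 300))  # too few spares: yield near zero
print(p_wafer_good(400_000, 0.001, 500))  # ~0.1% extra cores: yield near one
```

Because defects are routed around rather than thrown away, over-provisioning by a fraction of a percent of cores moves wafer yield from essentially zero to essentially one, which is the "miniature internet that keeps going when individual servers go down" behavior described in the quote.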

      --
      I a master of typographic, grammatical and miscellaneous errors !
      • (Score: 2) by Rupert Pupnick on Wednesday August 21 2019, @10:11AM

        by Rupert Pupnick (7277) on Wednesday August 21 2019, @10:11AM (#883042) Journal

        Yeah, read that. Was asking about the ideal failure-free topology, not how easy it is to reconfigure when a piece fails.
