The five technical challenges Cerebras overcame in building the first trillion transistor chip
Superlatives abound at Cerebras, the until-today stealthy next-generation silicon chip company looking to make training a deep learning model as quick as buying toothpaste from Amazon. Launching after almost three years of quiet development, Cerebras introduced its new chip today — and it is a doozy. The "Wafer Scale Engine" is 1.2 trillion transistors (the most ever), 46,225 square millimeters (the largest ever), and includes 18 gigabytes of on-chip memory (the most of any chip on the market today) and 400,000 processing cores (guess the superlative).
It's made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry's big confabs for product introductions and roadmaps, with various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray at ZDNet and read the white paper from Cerebras itself.
Also at BBC, VentureBeat, and PCWorld.
(Score: 0) by Anonymous Coward on Tuesday August 20 2019, @09:18PM (2 children)
So, a single chip out of the whole wafer? I assume they figured out a monster pizza-sized packaging, too? And how many pins?
(Score: 2) by takyon on Tuesday August 20 2019, @09:27PM (1 child)
https://cdn.wccftech.com/wp-content/uploads/2019/08/cerebras-wse-nvidia-v100-featured-image.jpg [wccftech.com]
I skimmed the white paper [cerebras.net] and there's nothing to answer your specific questions. But there's one bad typo: I think that's supposed to be 9.6 petabytes per second.
The TechCrunch article does get into packaging:
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by All Your Lawn Are Belong To Us on Tuesday August 20 2019, @10:26PM
I'd hate to hold a laptop with a chip like this in my lap.... :O ;)
This sig for rent.
(Score: 3, Interesting) by takyon on Tuesday August 20 2019, @09:18PM (7 children)
Why is this a good idea? Keeping everything on one piece of silicon, with the memory on-chip, reduces latency and increases energy efficiency.
Power consumption will be in the ballpark of 10 kW to 15 kW, not counting the cooling.
Obviously, the wafer will have defects, but it can tolerate them and has redundant cores (ya think?).
This is the opposite of the chiplet approach, where you want small pieces of silicon to boost yields and to create a wide variety of designs (from mobile to server) with the same chiplets. Here, you want as much performance as possible, and using the whole wafer gets you better performance than splitting it up. At least, that's the story; we won't know until someone actually benchmarks something on it.
Apparently, it can run regular code, not just machine learning stuff. Keeping in mind that these are "small cores".
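The yield argument sketches out roughly like this under a simple Poisson defect model. All the numbers here (defect density, per-core area, spare fraction) are illustrative assumptions, not Cerebras figures:

```python
import math

def zero_defect_prob(area_cm2, d0):
    """Poisson yield model: probability that a die of the given area
    has no killer defects, for defect density d0 (defects/cm^2)."""
    return math.exp(-area_cm2 * d0)

D0 = 0.1  # assumed defect density, defects/cm^2 (illustrative only)

# A conventional ~450 mm^2 GPU-sized die yields reasonably well:
small_die = zero_defect_prob(4.50, D0)      # ~64%

# The whole 46,225 mm^2 wafer treated as one monolithic die would
# essentially never come out defect-free:
whole_wafer = zero_defect_prob(462.25, D0)

# With per-core redundancy, a defect only disables the small core it
# lands in. Assuming ~400,000 cores of ~0.1 mm^2 each:
cores, core_area_cm2 = 400_000, 0.1 / 100
expected_dead = cores * (1 - zero_defect_prob(core_area_cm2, D0))

print(f"450 mm^2 die yield:     {small_die:.1%}")
print(f"monolithic wafer yield: {whole_wafer:.1e}")
print(f"expected dead cores:    {expected_dead:.0f} of {cores:,}")
```

Under these assumptions only a few dozen of the 400,000 cores die, so a small spare pool plus a fabric that routes around dead cores is enough, which is presumably why wafer scale becomes feasible at all.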
(Score: 4, Insightful) by JoeMerchant on Tuesday August 20 2019, @09:28PM (1 child)
It's hardly "local" - the wafer area is like a square 8.5" on a side.
It probably needs to be maintained in a clean room its whole working life...
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://news.stanford.edu/2023/02/17/will-russia-ukraine-war-end
(Score: 2) by takyon on Tuesday August 20 2019, @09:39PM
Still better than the equivalent amount of split up chips.
They could also use algorithms that distribute data closer to other related data.
(Score: 2) by takyon on Tuesday August 20 2019, @09:38PM
I'll revise that to 15 kW, not sure about the cooling, since it seems the company is providing the cooling solution.
(Score: 1, Funny) by Anonymous Coward on Tuesday August 20 2019, @10:47PM (1 child)
They can put several of them side by side and call it chiplets. Imagine a Beowulf cluster of... nah, forget it.
(Score: 2) by takyon on Tuesday August 20 2019, @11:07PM
They should stack the suckers. But it would only work with something using a lot less power, like a neuromorphic architecture and non-volatile memory.
(Score: 2) by driverless on Wednesday August 21 2019, @08:29AM (1 child)
Ah, they've finally gone public, so it's OK to talk about it... yeah, it's a crazy device. Even when they presented it, the guy started with "every other wafer-scale project has failed", followed by endless questions about why theirs would be any different, and no clear answers. It's pretty outrageous: a single 10 kW device with very special-case functionality that requires something the size of a small server rack to run. Why would anyone buy this when you can use the space and power for a more conventional, and far more flexible, solution? I mean, from a geeky research-project basis it's pretty cool, but why? Their talk was mostly interruptions for questions about how this thing could be even remotely practical.
(Score: 2) by takyon on Wednesday August 21 2019, @10:16AM
What, were you at Hot Chips? Or...
Anyway, I don't think it's so crazy. Obviously this is a niche product, but it could offer great performance/$ for big companies that need it.
This thing exists because Moore's Law is dead and there is a lot of hype money in AI/machine learning, for now and perhaps many years to come.
It's possible that some of the IP here will make its way into other products. But you could also just use lots of chiplets, stacked memory, etc. on a big-ass interposer. And 3DSoC is going to revolutionize the industry by putting logic and memory as close together as possible, and it will probably stay that way.
(Score: 2) by JoeMerchant on Tuesday August 20 2019, @09:31PM (2 children)
~12 kW (the equivalent of 8 to 10 standard household space heaters) radiating off of something smaller than a standard sheet of paper? Yep, that's gonna get hot.
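The back-of-envelope arithmetic is worth doing (a sketch using the figures quoted in this thread; the 12-15 kW draw is the guess above, not an official spec, and the V100 comparison numbers are my own reference points):

```python
# Power density of the wafer, from the thread's numbers.
wafer_area_cm2 = 46_225 / 100            # ~462 cm^2

for power_w in (12_000, 15_000):
    print(f"{power_w / 1000:.0f} kW -> {power_w / wafer_area_cm2:.0f} W/cm^2")

# For comparison, an NVIDIA V100 puts ~300 W through an ~815 mm^2 die:
v100_density = 300 / 8.15
print(f"V100: ~{v100_density:.0f} W/cm^2")
```

Interestingly, per square centimeter this is in the same ballpark as (even a bit below) a big GPU; the hard part is that the entire 12-15 kW lands in a single package, which is why an off-the-shelf heatsink won't do.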
(Score: 4, Interesting) by FatPhil on Tuesday August 20 2019, @10:31PM (1 child)
They'll never power this up, that's my betting. This is a press release. I'm sure they're looking for funding right now. All they seem to think they need to do is wave a 20 cm wafer around and the hundreds of millions will come rushing in. There is a bubble that still needs pumping, it seems.
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
(Score: 2) by driverless on Wednesday August 21 2019, @08:37AM
They've powered samples up, but see my previous post above.
(Score: 2) by legont on Tuesday August 20 2019, @11:28PM (1 child)
How many *coins per hour?
"Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
(Score: 2, Informative) by Anonymous Coward on Tuesday August 20 2019, @11:59PM
10 quadrillion Zimbabcoins
(Score: 2) by Rupert Pupnick on Wednesday August 21 2019, @01:48AM (2 children)
Presumably the cores are arrayed in some kind of neural network topology with memory distributed throughout. Would love to know more if anyone has any other relevant links.
Thermal problem is huge, as already pointed out by SNers. So bad that this special "Z direction" cooling is required. They can't use fluid flow parallel to the surface of the chip, as in a traditional cooling design, because the "downstream" edge of the chip would run too hot. If it's silicon technology, you can't go above 150 °C anywhere on the chip.
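A toy heat-balance sketch of why a single in-plane pass fails (all flow and power numbers here are my assumptions, not Cerebras specs): coolant crossing the wafer accumulates the full heat load, so silicon at the exit edge sits on the hottest coolant, while a perpendicular "Z direction" feed delivers inlet-temperature coolant to every zone.

```python
P = 15_000      # W, assumed total dissipation
CP = 4186       # J/(kg*K), specific heat of water
M_DOT = 0.1     # kg/s, assumed total coolant flow
T_IN = 25.0     # deg C, coolant inlet temperature

# Single in-plane pass: a parcel of coolant at fraction x of the way
# across the wafer has already absorbed x * P, so its temperature is:
def coolant_temp(x):
    return T_IN + x * P / (M_DOT * CP)

print(f"coolant under upstream edge:   {coolant_temp(0.0):.1f} C")
print(f"coolant under downstream edge: {coolant_temp(1.0):.1f} C")

# A perpendicular feed delivers T_IN everywhere, so the ~36 C gradient
# across the die disappears (local film rise is ignored in this sketch).
```

You could fight the gradient with a much higher flow rate instead, but the pressure drop and pumping power grow quickly, so feeding each zone fresh coolant from above is the cleaner fix.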
(Score: 1) by NickM on Wednesday August 21 2019, @02:19AM (1 child)
I a master of typographic, grammatical and miscellaneous errors !
(Score: 2) by Rupert Pupnick on Wednesday August 21 2019, @10:11AM
Yeah, read that. Was asking about the ideal failure-free topology, not how easy it is to reconfigure when a piece fails.