date 2020-08-19
from the not-going-to-fit-in-a-cell-phone dept.
The five technical challenges Cerebras overcame in building the first trillion transistor chip
Superlatives abound at Cerebras, the until-today stealthy next-generation silicon chip company looking to make training a deep learning model as quick as buying toothpaste from Amazon. Launching after almost three years of quiet development, Cerebras introduced its new chip today — and it is a doozy. The "Wafer Scale Engine" is 1.2 trillion transistors (the most ever), 46,225 square millimeters (the largest ever), and includes 18 gigabytes of on-chip memory (the most of any chip on the market today) and 400,000 processing cores (guess the superlative).
It's made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry's big confabs for product introductions and roadmaps, with various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray at Fortune and read the white paper from Cerebras itself.
(Score: 0) by Anonymous Coward on Tuesday August 20, @09:18PM (1 child)
So, a single chip out of the whole wafer? I assume they figured out a monster pizza-sized packaging, too? And how many pins?
(Score: 2) by takyon on Tuesday August 20, @09:27PM
https://cdn.wccftech.com/wp-content/uploads/2019/08/cerebras-wse-nvidia-v100-featured-image.jpg [wccftech.com]
I skimmed the white paper [cerebras.net] and there's nothing to answer your specific questions. But there's one bad typo:
I think that's supposed to be 9.6 petabytes per second.
The TechCrunch article does get into packaging:
(Score: 3, Interesting) by takyon on Tuesday August 20, @09:18PM (3 children)
Why is this a good idea? Keeping everything on the same chip and on-chip memory reduces latency and increases energy efficiency.
Power consumption will be in the ballpark of 10 kW to 15 kW, not counting the cooling.
Obviously, the wafer will have defects, but it can tolerate them and has redundant cores (ya think?).
This is the opposite of the chiplet approach, where you want small pieces of silicon to boost yields and create a wide variety of designs (from mobile to server) with the same chiplets. In this approach, you want as much performance as possible, and using the whole wafer gets you better performance than splitting up the wafer. At least, that's the story; we won't know until someone actually benchmarks something on it.
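A rough way to see the yield trade-off is the simple Poisson yield model, where the chance of a defect-free die is exp(-area × defect density). This is only a sketch: the defect density and the chiplet area below are made-up illustrative values, not figures from Cerebras or any foundry, and only the 46,225 mm² area comes from the summary.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Probability that a die of the given area has zero defects (Poisson model)."""
    return math.exp(-area_cm2 * defects_per_cm2)


# 46,225 mm^2 is the die area quoted in the summary; 100 mm^2 = 1 cm^2.
WAFER_DIE_AREA_CM2 = 46_225 / 100

# Hypothetical values, chosen only to illustrate the shape of the trade-off.
ASSUMED_DEFECT_DENSITY = 0.1    # defects per cm^2 (assumption)
ASSUMED_CHIPLET_AREA_CM2 = 0.8  # a small chiplet-sized die (assumption)

chiplet_yield = poisson_yield(ASSUMED_CHIPLET_AREA_CM2, ASSUMED_DEFECT_DENSITY)
wafer_yield = poisson_yield(WAFER_DIE_AREA_CM2, ASSUMED_DEFECT_DENSITY)
expected_defects = WAFER_DIE_AREA_CM2 * ASSUMED_DEFECT_DENSITY

print(f"chiplet yield (defect-free):       {chiplet_yield:.3f}")
print(f"whole-wafer yield (defect-free):   {wafer_yield:.3e}")
print(f"expected defects on the wafer die: {expected_defects:.1f}")
```

Under these assumed numbers a perfect wafer-sized die essentially never happens, which is why the design has to tolerate defects with redundant cores rather than count on a clean wafer.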
Apparently, it can run regular code, not just machine learning stuff, keeping in mind that these are "small cores".
(Score: 2) by JoeMerchant on Tuesday August 20, @09:28PM (1 child)
It's hardly "local" - the wafer area is like a square 8.5" on a side.
It probably needs to be maintained in a clean room its whole working life...
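The 8.5" figure can be checked against the 46,225 square millimeter area in the summary. A quick sketch, assuming the die is a perfect square (the summary gives only the area, not the shape):

```python
import math

DIE_AREA_MM2 = 46_225  # area quoted in the summary
MM_PER_INCH = 25.4     # exact by definition

# Assumption: the die is square, so the side is the square root of the area.
side_mm = math.sqrt(DIE_AREA_MM2)
side_in = side_mm / MM_PER_INCH

print(f"side: {side_mm:.1f} mm, about {side_in:.2f} in")
```

That works out to a side just under 8.5 inches, so the estimate holds up.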
(Score: 2) by takyon on Tuesday August 20, @09:39PM
Still better than the equivalent number of split-up chips.
They could also use algorithms that distribute data closer to other related data.
(Score: 2) by takyon on Tuesday August 20, @09:38PM
I'll revise that to 15 kW, not sure about the cooling, since it seems the company is providing the cooling solution.
(Score: 2) by JoeMerchant on Tuesday August 20, @09:31PM
~12 kW (the equivalent of 8 to 10 standard household space heaters) radiating off of something smaller than a standard sheet of paper? Yep, that's gonna get hot.
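For a sense of scale, here is the back-of-the-envelope arithmetic behind that comparison. The 12 kW figure is the estimate from this thread, not a published spec, and the 1,500 W space heater rating is an assumption (a common rating for household units).

```python
POWER_W = 12_000             # thread estimate, not an official number
DIE_AREA_CM2 = 46_225 / 100  # 46,225 mm^2 from the summary, in cm^2
ASSUMED_HEATER_W = 1_500     # assumed rating of one household space heater

# US Letter paper is 8.5 in x 11 in; 1 in = 2.54 cm.
LETTER_AREA_CM2 = (8.5 * 2.54) * (11 * 2.54)

power_density_w_per_cm2 = POWER_W / DIE_AREA_CM2
heater_equivalents = POWER_W / ASSUMED_HEATER_W
die_fraction_of_letter = DIE_AREA_CM2 / LETTER_AREA_CM2

print(f"power density: {power_density_w_per_cm2:.1f} W/cm^2")
print(f"space heater equivalents: {heater_equivalents:.1f}")
print(f"die area as a fraction of a letter sheet: {die_fraction_of_letter:.2f}")
```

With a 1,500 W heater the count lands at 8, and the die covers roughly three quarters of a letter-sized sheet.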