SoylentNews is people

posted by martyb on Wednesday August 19 2020, @01:36AM
from the Amdahl's-law? dept.

342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased

One of the highlights of Hot Chips from 2019 was the startup Cerebras showcasing its product – a large 'wafer-scale' AI chip that was literally the size of a wafer. The chip itself was rectangular, but it was cut from a single wafer, and contained 400,000 cores, 1.2 trillion transistors, 46,225 mm² of silicon, and was built on TSMC's 16 nm process.

[...] Obviously, when doing wafer scale you can't just add more die area, so the only way forward is to optimize die area per core and take advantage of smaller process nodes. That means that on TSMC 7nm, there are now 850,000 cores and 2.6 trillion transistors. Cerebras has had to develop new technologies to deal with multi-reticle designs, but they succeeded with the first gen and transferred the learnings to the new chip. We're expecting more details about this new product later this year.
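The headline and scaling numbers are simple ratios of the figures above. The sketch below reproduces them; the world population of about 7.6 billion is an assumption, since the story does not say which estimate the "342 per person" headline used, and the 2nd-gen figures are the teased numbers rather than shipping specs.

```python
# Back-of-envelope check of the WSE figures quoted in the story.
# ASSUMPTION: a world population of about 7.6 billion; the story does not
# say which population estimate the "342 per person" headline used.
WORLD_POPULATION = 7.6e9

# Figures as stated: 1st gen on TSMC 16 nm, 2nd gen on TSMC 7 nm.
GEN1 = {"cores": 400_000, "transistors": 1.2e12}
GEN2 = {"cores": 850_000, "transistors": 2.6e12}


def transistors_per_person(transistors: float,
                           population: float = WORLD_POPULATION) -> float:
    """Transistor count spread evenly across the (assumed) world population."""
    return transistors / population


def ratio(new: dict, old: dict, key: str) -> float:
    """How many times larger new[key] is than old[key]."""
    return new[key] / old[key]


if __name__ == "__main__":
    print(f"transistors per person: {transistors_per_person(GEN2['transistors']):.0f}")
    print(f"core ratio:             {ratio(GEN2, GEN1, 'cores'):.3f}")
    print(f"transistor ratio:       {ratio(GEN2, GEN1, 'transistors'):.3f}")
    for name, gen in (("gen 1", GEN1), ("gen 2", GEN2)):
        print(f"{name} transistors per core: {gen['transistors'] / gen['cores']:,.0f}")
```

Under that population assumption the first line comes out at about 342. Cores grow 2.125x and transistors about 2.17x, so transistors per core stay close to 3 million in both generations.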

Previously: Cerebras "Wafer Scale Engine" Has 1.2 Trillion Transistors, 400,000 Cores
Cerebras Systems' Wafer Scale Engine Deployed at Argonne National Labs


  • (Score: 3, Interesting) by takyon on Wednesday August 19 2020, @01:36AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:36AM (#1038634) Journal

    https://www.hpcwire.com/2020/05/13/cerebras-argonne-supercomputer-fighting-covid-19/ [hpcwire.com]

    For Argonne and CS-1, brute force was not the name of the game. Instead, Argonne applied the CS-1’s AI capabilities to train machine learning models to churn through the lab’s massive molecular datasets (comprising existing FDA-approved drugs) and predict which of those molecules would have the best docking scores. The result, according to Cerebras: “hundreds of times” faster turnaround on the datasets at a fraction of the computational cost.

    The first iteration of the CS-1’s battle against COVID-19 was completed over the last few weeks. Now, Argonne and Cerebras are working on a new ML process for CS-1 that would treat the process as a computer vision problem, representing viral proteins and drug molecules as images rather than numbers.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by takyon on Wednesday August 19 2020, @01:36AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:36AM (#1038635) Journal
  • (Score: 2) by takyon on Wednesday August 19 2020, @01:36AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:36AM (#1038636) Journal

    AI Supercomputer at PSC to Combine Cerebras ‘World’s Largest Chip’ and HPE Superdome Flex [insidehpc.com]

    The Pittsburgh Supercomputing Center has won a $5 million award from the National Science Foundation to build Neocortex, an AI supercomputer that incorporates the Cerebras Systems Wafer Scale Engine technology introduced last year along with Hewlett Packard Enterprise’s shared memory Superdome Flex hardware.

  • (Score: 2) by takyon on Wednesday August 19 2020, @01:37AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:37AM (#1038637) Journal
  • (Score: 2) by takyon on Wednesday August 19 2020, @01:39AM (6 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:39AM (#1038638) Journal

    https://www.cerebras.net/product/#explorer-3 [cerebras.net]

    The CS-1 is an internally water-cooled system. Like a giant gaming PC on steroids, the CS-1 uses water to cool the WSE, and then uses air to cool the water. Water circulates through a closed loop internal to the system.

    Two hot-swappable pumps on the top right move water through a manifold across the back of the WSE, cooling the wafer and warming the water. Warm water is then pumped into a heat exchanger. This heat exchanger presents a large surface area for the cold air blown in by the four hot-swappable fans at the bottom of the CS-1. The fans move air from the cold aisle, cool the warm water via the heat exchanger, and exhaust the warm air into the warm aisle.
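The two-stage loop described above can be sized with the steady-state heat balance Q = m_dot * c_p * delta_T, where all the heat picked up by the water must in turn be carried off by the air. This is an illustrative sketch only: it takes the 15 kW figure RS3 cites in the replies as the heat load, uses textbook properties for water and air, and picks hypothetical temperature rises (10 K for the water, 15 K for the air). Cerebras has not published flow rates.

```python
# Rough sizing of a two-stage (water -> air) cooling loop with the
# steady-state heat balance Q = m_dot * c_p * delta_T.
# ASSUMPTIONS (not Cerebras data): the heat load, textbook fluid
# properties, and the temperature rise chosen for each stage.
WATER_CP = 4186.0     # J/(kg*K), liquid water near room temperature
WATER_DENSITY = 1.0   # kg/L, approximate
AIR_CP = 1005.0       # J/(kg*K), dry air
AIR_DENSITY = 1.2     # kg/m^3, near sea level at about 20 C


def mass_flow_kg_s(heat_w: float, cp: float, delta_t_k: float) -> float:
    """Mass flow needed to carry heat_w watts away with a delta_t_k rise."""
    return heat_w / (cp * delta_t_k)


def water_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Volumetric water flow in litres per minute."""
    return mass_flow_kg_s(heat_w, WATER_CP, delta_t_k) / WATER_DENSITY * 60.0


def air_m3_per_s(heat_w: float, delta_t_k: float) -> float:
    """Volumetric air flow in cubic metres per second."""
    return mass_flow_kg_s(heat_w, AIR_CP, delta_t_k) / AIR_DENSITY


if __name__ == "__main__":
    heat_w = 15_000.0  # ASSUMED heat load: the 15 kW figure cited in the thread
    print(f"water loop: {water_l_per_min(heat_w, 10.0):.1f} L/min for a 10 K rise")
    print(f"air side:   {air_m3_per_s(heat_w, 15.0):.2f} m^3/s for a 15 K rise")
```

With those assumptions it works out to roughly 21.5 L/min of water and about 0.83 m³/s of air. Halving either temperature rise doubles the flow that stage needs.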

    • (Score: 2) by RS3 on Wednesday August 19 2020, @02:03AM (5 children)

      by RS3 (6367) on Wednesday August 19 2020, @02:03AM (#1038650)

      15 kW power consumption for 1, uh, "chip".

      • (Score: 3, Funny) by Rosco P. Coltrane on Wednesday August 19 2020, @02:06AM (1 child)

        by Rosco P. Coltrane (4757) on Wednesday August 19 2020, @02:06AM (#1038651)

        On the plus side, if a chip fails QC, it can always be sold as a space heater.

        • (Score: 2) by RS3 on Wednesday August 19 2020, @03:44AM

          by RS3 (6367) on Wednesday August 19 2020, @03:44AM (#1038693)

          I hope it's mounted horizontally. Regardless of the cooling system, at those power levels the silicon's liable to slump.

      • (Score: 4, Informative) by takyon on Wednesday August 19 2020, @02:08AM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @02:08AM (#1038653) Journal

        Sometimes, to be energy efficient, you need to dump all the energy into one big ass chip. At least, according to Cerebras.

        They didn't give any wattage for the 2nd-generation chip, but maybe it will go down. The size should be around the same, and the TSMC "7nm" process node is substantially more efficient than "16nm".

        https://en.wikichip.org/wiki/7_nm_lithography_process#TSMC [wikichip.org]

        TSMC's original 7-nanometer N7 process was introduced in April 2018. Compared to its own 16-nanometer technology, TSMC claims its 7 nm node provides around 35-40% speed improvement or 65% lower power. Compared to the half-node 10 nm node, N7 is said to provide ~20% speed improvement or ~40% power reduction. In terms of density, N7 is said to deliver 1.6x and 3.3x improvement compared to N10 and N16 respectively. N7 largely builds on the company's prior FinFET processes. To that end, this is a fourth-generation FinFET, fifth-generation HKMG, gate-last, dual gate oxide process.
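The guess above that the 2nd-gen wattage may come down can be turned into a crude what-if using the N7 numbers quoted from Wikichip. Every input is an assumption rather than a Cerebras figure: the 15 kW RS3 cites is treated as the 1st-gen baseline, power is assumed to scale linearly with core count, and TSMC's iso-speed "65% lower power" claim is applied per core.

```python
# Crude what-if for the 2nd-gen WSE power draw. Every input is an assumption:
#   * the 15 kW figure cited in the thread is the 1st-gen baseline;
#   * power scales linearly with core count;
#   * TSMC's iso-speed "65% lower power" (N7 vs N16) applies per core.
GEN1_POWER_KW = 15.0
CORE_RATIO = 850_000 / 400_000       # 2.125x more cores
N7_POWER_FACTOR = 1.0 - 0.65         # 65% lower power at the same speed


def naive_gen2_power_kw(gen1_kw: float = GEN1_POWER_KW,
                        core_ratio: float = CORE_RATIO,
                        power_factor: float = N7_POWER_FACTOR) -> float:
    """Scale the baseline by core count, then apply the node's power saving."""
    return gen1_kw * core_ratio * power_factor


if __name__ == "__main__":
    print(f"naive 2nd-gen estimate: {naive_gen2_power_kw():.1f} kW")
```

That lands at roughly 11 kW, below 15 kW despite 2.125x the cores. Real designs rarely scale this cleanly (clock targets, SRAM and I/O all change), so treat it as a sanity check, not a prediction.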

      • (Score: 2) by driverless on Wednesday August 19 2020, @02:58AM (1 child)

        by driverless (4770) on Wednesday August 19 2020, @02:58AM (#1038681)

        I was at the Hot Chips presentation when they introduced this... the response from industry professionals was a near-universal WTF: it just doesn't make sense to pile everything onto one monster device when you can avoid the near-insurmountable engineering problems just by going with many smaller ones. Even Cerebras admitted that every time someone's tried wafer-scale integration (WSI) it's failed, "but this time it's different". I'm surprised to see they're still around; presumably they're relying on a couple of national-lab customers with near-infinite budgets to keep going.

  • (Score: 2) by takyon on Wednesday August 19 2020, @01:43AM (1 child)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @01:43AM (#1038641) Journal

    https://www.cerebras.net/wp-content/uploads/2019/08/Cerebras-Wafer-Scale-Engine-An-Introduction.pdf [cerebras.net]

    The 400,000 cores on the Cerebras WSE are connected via the Swarm communication fabric in a 2D mesh with 100 petabits per second of bandwidth. Swarm provides a hardware routing engine to each of the compute cores and connects them with short wires optimized for latency and bandwidth. The resulting fabric supports single-word active messages that can be handled by the receiving cores without any software overhead. The fabric provides flexible, all-hardware communication.

    Swarm is fully configurable. The Cerebras software configures all the cores on the WSE to support the precise communication required for training the user-specified model. For each neural network, Swarm provides a unique and optimized communication path. This is different than the approach taken by central processing units and graphics processing units that have one hard-coded on-chip communication path into which all neural networks are shoehorned.

    Swarm’s results are impressive. Typical messages traverse one hardware link with nanosecond latency. The aggregate bandwidth of the system is measured in tens of petabytes per second. Communication software such as TCP/IP and MPI is not needed, avoiding associated performance penalties. The energy cost of communication in this architecture is well below one picojoule per bit, which is nearly two orders of magnitude lower than central processing units or graphics processing units. As a result of the Swarm communication fabric, the WSE trains models faster and uses less power.
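A 2D mesh keeps every link between neighboring cores short, which is what lets a typical message cross a single hardware link. The toy model below illustrates the hop counts involved. It assumes a square grid and dimension-order (X-then-Y) routing; the Cerebras paper does not say which routing policy Swarm uses, and the real core grid need not be square. The last line converts the quoted 100 Pb/s into bytes.

```python
# Toy 2D-mesh model: hop counts under dimension-order (X-then-Y) routing.
# ASSUMPTIONS: square grid and XY routing; neither is confirmed for Swarm.
import math
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited from src to dst, moving along X first, then Y."""
    x, y = src
    path = []
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def mesh_diameter(cores: int) -> int:
    """Corner-to-corner hops on the largest square grid that fits `cores`."""
    side = math.isqrt(cores)
    return 2 * (side - 1)


PETABITS_PER_SECOND = 100  # aggregate fabric bandwidth quoted by Cerebras

if __name__ == "__main__":
    print(len(xy_route((0, 0), (1, 0))), "hop to an adjacent core")
    print(len(xy_route((0, 0), (3, 2))), "hops to (3, 2)")
    print(mesh_diameter(400_000), "hops corner to corner")
    print(PETABITS_PER_SECOND / 8, "petabytes per second")
```

Adjacent cores are one hop apart, but a corner-to-corner message on a roughly 632 x 632 grid would need about 1,262 hops, so the software that configures Swarm for each model presumably keeps communicating cores close together. 100 Pb/s is 12.5 PB/s.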

    • (Score: 0) by Anonymous Coward on Wednesday August 19 2020, @11:56AM

      by Anonymous Coward on Wednesday August 19 2020, @11:56AM (#1038774)

      I wonder if they considered the hypercube topology developed for The Connection Machine:
      https://en.wikipedia.org/wiki/Connection_Machine [wikipedia.org]
      Maybe Cerebras could configure the programmable hardware to emulate a really big Connection Machine... and run some of the software that was prototyped back then?
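For comparison with the 2D mesh Cerebras describes, the sketch below computes hop counts in a binary hypercube, the style of topology the Connection Machine was known for. It is a generic textbook model, not the CM's actual wiring: node IDs are integers, neighbors differ in exactly one bit, and the diameter equals the number of dimensions. The square-mesh figure uses the same simplification as a square grid of cores.

```python
# Generic binary-hypercube vs. square 2D-mesh comparison (textbook model).
import math
from typing import List


def hypercube_neighbors(node: int, dims: int) -> List[int]:
    """Neighbors of `node` in a dims-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(dims)]


def hypercube_dims(nodes: int) -> int:
    """Smallest d such that 2**d >= nodes (also the hypercube's diameter)."""
    return (nodes - 1).bit_length()


def mesh_diameter(nodes: int) -> int:
    """Corner-to-corner hops on the largest square grid that fits `nodes`."""
    side = math.isqrt(nodes)
    return 2 * (side - 1)


if __name__ == "__main__":
    n = 400_000
    d = hypercube_dims(n)
    print(f"hypercube: diameter {d}, {d} links per node")
    print(f"2D mesh:   diameter {mesh_diameter(n)}, 4 links per node")
```

The hypercube wins on diameter (19 hops versus about 1,262 for a 400,000-core grid) but needs 19 links per node, and many of them become long wires once the cube is laid out on a flat wafer; Cerebras's stated emphasis on short wires fits the mesh choice. Emulating a hypercube on the mesh is possible in principle, but each logical link would span many physical hops.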

  • (Score: 2) by mendax on Wednesday August 19 2020, @02:21AM (1 child)

    by mendax (2840) on Wednesday August 19 2020, @02:21AM (#1038655)

    Whoever said that Moore's law was dead apparently needs a reality check.

    --
    It's really quite a simple choice: Life, Death, or Los Angeles.
  • (Score: 0) by Anonymous Coward on Wednesday August 19 2020, @03:14AM (2 children)

    by Anonymous Coward on Wednesday August 19 2020, @03:14AM (#1038687)

    i like to lick it and stick it

    • (Score: 3, Funny) by mhajicek on Wednesday August 19 2020, @04:12AM

      by mhajicek (51) Subscriber Badge on Wednesday August 19 2020, @04:12AM (#1038705)

      Picturing you with a large silicon wafer stuck to your forehead.

      --
      The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
    • (Score: 0) by Anonymous Coward on Wednesday August 19 2020, @03:50PM

      by Anonymous Coward on Wednesday August 19 2020, @03:50PM (#1038851)

      Ritual troll display.

  • (Score: 0) by Anonymous Coward on Wednesday August 19 2020, @07:07AM (1 child)

    by Anonymous Coward on Wednesday August 19 2020, @07:07AM (#1038751)

    OK, so what's the yield on this thing? I guess they designed it to tolerate a good number of silicon defects?

    • (Score: 2) by takyon on Wednesday August 19 2020, @03:25PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday August 19 2020, @03:25PM (#1038833) Journal

      The design tolerates and routes around defective cores, and the stated core count (400k, now 850k) doesn't include the defective cores, from what I remember.

      TSMC is also getting great yields on "7nm". Depending on how much overprovisioning of the cores was done, maybe the yield rate is nearly 100%?
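The overprovisioning question can be framed as a simple binomial model: a wafer is usable if the number of defective cores does not exceed the number of spare cores. The sketch below computes that probability exactly, in log space. The per-core defect probability and the spare counts are hypothetical (Cerebras has not published either), and real defects cluster rather than striking cores independently.

```python
# Toy yield model for a defect-tolerant wafer. HYPOTHETICAL parameters;
# assumes each core fails independently with probability p_bad.
import math


def wafer_yield(needed: int, spares: int, p_bad: float) -> float:
    """P(at most `spares` of the needed + spares physical cores are defective).

    Exact binomial CDF, evaluated in log space because factors such as
    (1 - p_bad) ** 850_000 underflow ordinary floats. Requires 0 < p_bad < 1.
    """
    n = needed + spares
    log_p, log_q = math.log(p_bad), math.log1p(-p_bad)
    log_n_fact = math.lgamma(n + 1)
    terms = []
    for k in range(spares + 1):  # k = number of defective cores
        log_comb = log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        terms.append(math.exp(log_comb + k * log_p + (n - k) * log_q))
    return math.fsum(terms)


if __name__ == "__main__":
    # HYPOTHETICAL: 0.1% of cores defective, 850,000 good cores required.
    for spares in (500, 1_000, 2_000):
        print(spares, "spares ->", f"{wafer_yield(850_000, spares, 0.001):.4f}")
```

With a hypothetical 0.1% defect rate about 850 cores are expected to fail, so 500 spares would almost never be enough while 2,000 would almost always suffice. Yield behaves like a switch around the expected defect count, which is how modest overprovisioning could push it toward 100%.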

      Oh, and from my Wikichip link above, TSMC's "7nm" has 3.3x the density of "16nm". But this is only 2.125x the cores. So they have a lot of margin there. I bet they added more SRAM.
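The density comparison above is a ratio of ratios. The sketch below works it through, assuming (as the story implies) that the 2nd-gen wafer keeps roughly the 1st gen's 46,225 mm² of silicon, and treating TSMC's 3.3x figure as if it applied uniformly, which is optimistic because SRAM generally scales less than logic between nodes.

```python
# Density headroom for the 2nd-gen WSE, using figures quoted in this thread.
# ASSUMPTIONS: usable silicon area stays at the 1st-gen 46,225 mm^2, and
# TSMC's 3.3x N7-vs-N16 density claim applies to the whole design.
WAFER_AREA_MM2 = 46_225
DENSITY_GAIN_N7_VS_N16 = 3.3
GEN1_CORES, GEN2_CORES = 400_000, 850_000
GEN1_TRANSISTORS, GEN2_TRANSISTORS = 1.2e12, 2.6e12

core_ratio = GEN2_CORES / GEN1_CORES                     # 2.125
headroom_per_core = DENSITY_GAIN_N7_VS_N16 / core_ratio  # spare density per core
transistor_ratio = GEN2_TRANSISTORS / GEN1_TRANSISTORS

# Silicon area divided by stated core count (includes SRAM, fabric, spares, I/O).
area_per_core_gen1 = WAFER_AREA_MM2 / GEN1_CORES
area_per_core_gen2 = WAFER_AREA_MM2 / GEN2_CORES

print(f"core ratio {core_ratio:.3f}, density headroom per core {headroom_per_core:.2f}x")
print(f"stated transistor ratio {transistor_ratio:.2f}x")
print(f"area per core: {area_per_core_gen1:.4f} -> {area_per_core_gen2:.4f} mm^2")
```

Density alone would allow roughly 1.55x the transistors per core; the stated transistor totals (about 2.17x, against 2.125x the cores) are worth comparing with that when judging where the extra area went.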
