posted by janrinok on Tuesday April 15, @01:12AM   Printer-friendly
from the for-some-definitions-of-'more-powerful' dept.

Google's new Ironwood chip is 24x more powerful than the world's fastest supercomputer:

Google Cloud unveiled its seventh-generation Tensor Processing Unit (TPU), Ironwood, on Wednesday. This custom AI accelerator, the company claims, delivers more than 24 times the computing power of the world's fastest supercomputer when deployed at scale.

The new chip, announced at Google Cloud Next '25, represents a significant pivot in Google's decade-long AI chip development strategy. While previous generations of TPUs were designed for both training and inference workloads, Ironwood is the first built specifically for inference — the process of deploying trained AI models to make predictions or generate responses.

"Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements," said Amin Vahdat, Google's Vice President and General Manager of ML, Systems, and Cloud AI, in a virtual press conference ahead of the event. "This is what we call the 'age of inference' where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data."

The technical specifications of Ironwood are striking. When scaled to 9,216 chips per pod, Ironwood delivers 42.5 exaflops of computing power — dwarfing the 1.7 exaflops of El Capitan, currently the world's fastest supercomputer. Each individual Ironwood chip delivers peak compute of 4,614 teraflops.
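The quoted figures are internally consistent, as a quick back-of-the-envelope check shows (a sketch using only the numbers above):

```python
# Sanity-check of the pod-scale figures quoted above.
chips_per_pod = 9216
per_chip_tflops = 4614          # peak teraflops per Ironwood chip
el_capitan_exaflops = 1.7       # reported El Capitan figure

pod_exaflops = chips_per_pod * per_chip_tflops / 1e6  # teraflops -> exaflops
print(f"pod: {pod_exaflops:.1f} exaflops")            # pod: 42.5 exaflops
print(f"ratio: {pod_exaflops / el_capitan_exaflops:.1f}x")  # ratio: 25.0x
```

The ratio actually comes out just above 25x, consistent with the headline's "more than 24 times" — though, as commenters point out below, the two numbers come from very different benchmark conditions.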

Ironwood also features significant memory and bandwidth improvements. Each chip comes with 192GB of High Bandwidth Memory (HBM), six times more than Trillium, Google's previous-generation TPU announced last year. Memory bandwidth reaches 7.2 terabytes per second per chip, a 4.5x improvement over Trillium.
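The Trillium baseline isn't stated directly, but it can be backed out from the quoted multipliers (a sketch, assuming the 6x and 4.5x figures are exact):

```python
# Back out the Trillium (previous-generation) baseline from the multipliers.
ironwood_hbm_gb = 192
ironwood_bw_tb_s = 7.2   # HBM bandwidth per chip, TB/s

trillium_hbm_gb = ironwood_hbm_gb / 6          # "six times more than Trillium"
trillium_bw_tb_s = ironwood_bw_tb_s / 4.5      # "a 4.5x improvement"
print(trillium_hbm_gb, round(trillium_bw_tb_s, 2))  # 32.0 1.6
```

That implies roughly 32GB of HBM and about 1.6 TB/s per Trillium chip, which is also why the bandwidth figure must be in bytes rather than bits.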

Perhaps most importantly, in an era of power-constrained data centers, Ironwood delivers twice the performance per watt compared to Trillium, and is nearly 30 times more power efficient than Google's first Cloud TPU from 2018.

"At a time when available power is one of the constraints for delivering AI capabilities, we deliver significantly more capacity per watt for customer workloads," Vahdat explained.

The emphasis on inference rather than training represents a significant inflection point in the AI timeline. The industry has been fixated on building increasingly massive foundation models for years, with companies competing primarily on parameter size and training capabilities. Google's pivot to inference optimization suggests we're entering a new phase where deployment efficiency and reasoning capabilities take center stage.

This transition makes sense. Training happens once, but inference operations occur billions of times daily as users interact with AI systems. The economics of AI are increasingly tied to inference costs, especially as models grow more complex and computationally intensive.

During the press conference, Vahdat revealed that Google has observed a 10x year-over-year increase in demand for AI compute over the past eight years — a staggering factor of 100 million overall. No amount of Moore's Law progression could satisfy this growth curve without specialized architectures like Ironwood.
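The "factor of 100 million" is just the 10x annual growth compounded over eight years; a trivial check:

```python
# 10x year-over-year demand growth, compounded over eight years.
growth_factor = 10 ** 8
print(f"{growth_factor:,}")  # 100,000,000 — i.e. 100 million overall
```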

What's particularly notable is the focus on "thinking models" that perform complex reasoning tasks rather than simple pattern recognition. This suggests that Google sees the future of AI not just in larger models, but in models that can break down problems, reason through multiple steps and simulate human-like thought processes.

Google is positioning Ironwood as the foundation for its most advanced AI models, including Gemini 2.5, which the company describes as having "thinking capabilities natively built in."

At the conference, Google also announced Gemini 2.5 Flash, a more cost-effective version of its flagship model that "adjusts the depth of reasoning based on a prompt's complexity." While Gemini 2.5 Pro is designed for complex use cases like drug discovery and financial modeling, Gemini 2.5 Flash is positioned for everyday applications where responsiveness is critical.

The company also demonstrated its full suite of generative media models, including text-to-image, text-to-video, and a newly announced text-to-music capability called Lyria. A demonstration showed how these tools could be used together to create a complete promotional video for a concert.

Ironwood is just one part of Google's broader AI infrastructure strategy. The company also announced Cloud WAN, a managed wide-area network service that gives businesses access to Google's planet-scale private network infrastructure.


Original Submission

This discussion was created by janrinok (52) for logged-in users only.
  • (Score: 0, Funny) by Anonymous Coward on Tuesday April 15, @01:28AM (1 child)


    hope they all circle the bowl together.

    • (Score: 0) by Anonymous Coward on Tuesday April 15, @01:58AM


      Wake me up when I get my morningwood back.

  • (Score: 5, Touché) by HiThere on Tuesday April 15, @02:26AM (9 children)


    As stated, that's probably true. But "When deployed at scale" doesn't specify at what scale, and the rest (judging by the summary) doesn't specify on what problem set it has this increased capability.

    So it's probably true, but it's still marketer-speak.

    --
    Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 3, Interesting) by EJ on Tuesday April 15, @03:25AM (3 children)


      Did you read the article summary?

      "When scaled to 9,216 chips per pod..."

      • (Score: 2) by corey on Tuesday April 15, @04:46AM (2 children)


I think he means that “at scale” can be anything from 2 to infinity. In the sense that 3,000,000,000 Pentium 100s would beat the world’s fastest supercomputer. That’s also “at scale.” It’s very much marketing drivel.

What they should say is that the per-unit compute rating of these new Ironwood TPUs is an incremental increase over previous generations.

        • (Score: 2, Disagree) by EJ on Tuesday April 15, @05:10AM (1 child)


          I guess the point of saying "at scale" is so people don't NEED to know what it means. If I tell you that tires deployed "at scale" can hold the weight of a car, then you don't need to care that a car takes four tires.

          Sure, a technical-minded person may be curious about the specific numbers, but an executive just cares about the cost and overall performance. "Marketing-speak" has its purposes. Those purposes just aren't interesting to many of this site's posters.

          • (Score: 3, Insightful) by PiMuNu on Tuesday April 15, @08:00AM


            On the other hand, if I tell you tyres deployed "at scale" can hold up more weight than the best truck tyres, it doesn't tell you anything.

            Toy car tyres can hold up more weight than the best truck tyres, given enough of them.

    • (Score: 4, Touché) by driverless on Tuesday April 15, @07:54AM (2 children)


      What the headline is actually saying is that our apples have 24x the computing power of your oranges.

      • (Score: 3, Insightful) by PiMuNu on Tuesday April 15, @08:21AM


        > What the headline is actually saying is that our apples have 24x the computing power of your oranges.

        A bag of our apples has 24x the computing power of a differently sized bag of oranges.

      • (Score: 2) by VLM on Tuesday April 15, @04:27PM


I also like how they use the EE measurement "power," yet what it actually does that is measurable is fail to count how many "R"s are in "strawberry."

    • (Score: 2, Informative) by Anonymous Coward on Tuesday April 15, @09:54AM (1 child)


      But "When deployed at scale" doesn't specify at what scale, ...

      So it's probably true, but it's still marketer-speak.

The last phrase suggests to me you aren't actually interested in this info; otherwise you wouldn't be ignoring parts of the summary, like

      When scaled to 9,216 chips per pod, Ironwood delivers 42.5 exaflops of computing power — dwarfing El Capitan's 1.7 exaflops, currently the world's fastest supercomputer. Each individual Ironwood chip delivers peak compute of 4,614 teraflops.

If I'm wrong and you (or others) are interested, many of the links in the summary point to this page [blog.google]

      and the rest (judging by the summary) doesn't specify on what problem set it has this increased capability.

Leaving aside the Tensor Processing Unit [blog.google] link in the summary, you could also visit the Wikipedia entry for TPU [wikipedia.org].

      • (Score: 1, Touché) by Anonymous Coward on Tuesday April 15, @01:17PM


But how many "pods"? It still seems arbitrary.

  • (Score: 3, Funny) by Mojibake Tengu on Tuesday April 15, @05:57AM


    Gemini is exceptionally ingenious in finding elaborate excuses why she cannot perform a desired computation.

    I like her for just that.

    As for some useful tasks... well, there are other LLMs.

    --
    Rust programming language offends both my Intelligence and my Spirit.
  • (Score: 5, Insightful) by rpnx on Tuesday April 15, @06:05PM


    Google needs to release TPUs in discrete PCIe cards. Without doing that, there's no way TPU programming will gain widespread adoption. OpenXLA is not enough.
