Stories
Slash Boxes
Comments

SoylentNews is people

SoylentNews is powered by your submissions, so send in your scoop. Only 19 submissions in the queue.
posted by mrpg on Monday October 27, @07:11AM   Printer-friendly
from the nice dept.

Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system:

Alibaba Cloud claims its new Aegaeon pooling system reduces the number of Nvidia GPUs required to serve large language models by 82% during a multi-month beta test inside its Model Studio marketplace. The result, published in a peer-reviewed paper presented at the 2025 ACM Symposium on Operating Systems (SOSP) in Seoul, suggests that cloud providers may be able to extract significantly more inference capacity from existing silicon, especially in constrained markets like China, where the supply of Nvidia's latest H20s remains limited.

Unlike training-time breakthroughs that chase model quality or speed, Aegaeon is an inference-time scheduler designed to maximize GPU utilization across many models with bursty or unpredictable demand. Instead of pinning one accelerator to one model, Aegaeon virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool. This means one H20 could serve several different models simultaneously, with system-wide “goodput” — a measure of effective output — rising by as much as nine times compared to older serverless systems.

The system was tested in production over several months, according to the paper, which lists authors from both Peking University and Alibaba’s infrastructure division, including CTO Jingren Zhou. During that window, the number of GPUs needed to support dozens of different LLMs — ranging in size up to 72 billion parameters — fell from 1,192 to just 213.


Original Submission

 
This discussion was created by mrpg (5708) for logged-in users only, but now has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Insightful) by gnuman on Monday October 27, @10:55AM (1 child)

    by gnuman (5013) on Monday October 27, @10:55AM (#1422460)

    It's funny how a constrained environment brings up efficiencies.

    Starting Score:    1  point
    Moderation   +3  
       Insightful=3, Total=3
    Extra 'Insightful' Modifier   0  
    Karma-Bonus Modifier   +1  

    Total Score:   5  
  • (Score: 4, Insightful) by Booga1 on Monday October 27, @03:39PM

    by Booga1 (6333) on Monday October 27, @03:39PM (#1422484)

    Necessity is the mother of invention.

    I wonder if this will have any effect slowing down hardware demand.
    The rush for AI everything is crazy, and the amount of trust people are putting in it is even more insane.