Date: 2014-12-23
Arthur T Knackerbracket has processed the following story:
Semiconductors have been getting progressively hotter over the past few years as Moore's Law has slowed and more power is required to push higher performance gen over gen.
"That doesn't work anymore... That was back in the Moore's law era where the new node would give me the ability to pack in more transistors that are more performant and it wouldn't increase the energy… that's long gone."
This is a problem AMD has been exploring for years. The company launched the 30x25 initiative in 2021 with the goal of delivering a 30-fold improvement in compute efficiency from a 2020 baseline by 2025.
[...] As CEO Lisa Su illustrated so starkly in her ISSCC keynote earlier this year, given the current pace of technology, a zettaFLOP-class supercomputer is certainly possible within around 10 years, but it would require too much power to be practical. By her estimate, such a machine would require in excess of 500 MW to operate.
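To put that estimate in perspective, here is a rough back-of-the-envelope sketch of the efficiency it implies. It uses only the two figures quoted above (one zettaFLOPS and 500 MW); the function name is hypothetical, and because the estimate is "in excess of" 500 MW, the result is best read as an upper bound on the projected efficiency rather than a precise number.

```python
# Illustrative arithmetic only; flops_per_watt is a hypothetical helper name.
ZETTA = 1e21      # FLOPS in one zettaFLOPS
MEGAWATT = 1e6    # watts in one megawatt


def flops_per_watt(flops: float, power_watts: float) -> float:
    """Return sustained compute delivered per watt of power draw."""
    return flops / power_watts


# One zettaFLOPS at exactly 500 MW (the article says "in excess of" 500 MW,
# so the real projected efficiency would be somewhat lower than this).
efficiency = flops_per_watt(1 * ZETTA, 500 * MEGAWATT)
print(f"{efficiency / 1e9:.0f} GFLOPS per watt")  # prints: 2000 GFLOPS per watt
```

In other words, at 500 MW the machine would be delivering about 2 teraFLOPS for every watt it draws, and any additional power draw pushes that figure down.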
With AMD's deadline fast approaching, the chip biz has made significant progress, but it still has a long way to go, having achieved just a 13.5x improvement so far.
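A quick sketch of what those numbers mean in compounding terms follows. It assumes, purely for illustration, that the gains accrue at a steady rate across the five years from the 2020 baseline to 2025; that is a simplification and not something AMD has stated. The variable names are made up for this example.

```python
import math

# Figures from the article.
TARGET_GAIN = 30.0     # 30x goal
YEARS = 5              # 2020 baseline to 2025
ACHIEVED_GAIN = 13.5   # improvement reported so far

# Assumption: efficiency compounds evenly year over year.
annual_factor = TARGET_GAIN ** (1 / YEARS)

# How much more improvement is needed on top of what has been achieved.
remaining_factor = TARGET_GAIN / ACHIEVED_GAIN

# Progress measured on a logarithmic scale, the natural scale for compounding gains.
log_progress = math.log(ACHIEVED_GAIN) / math.log(TARGET_GAIN)

print(f"Required gain per year: {annual_factor:.2f}x")
print(f"Remaining gain needed:  {remaining_factor:.2f}x")
print(f"Progress on log scale:  {log_progress:.0%}")
```

Under that assumption, hitting 30x means nearly doubling efficiency every year, and going from 13.5x to 30x still requires a bit more than another doubling.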
This is an incredibly complex problem to solve and there is no one big lever you can pull to solve it, Papermaster explains. "We're on such an exponential curve of both compute and higher energy consumption that what [you] have to think about is what are the levers you have to bend the curve."
[...] One of the first ways AMD optimized power efficiency was by disaggregating compute from I/O and memory and then using the best available process tech for each. The thinking is that certain elements scale better with process shrinks than others. This is the reason AMD's fourth-gen Epyc CPUs use a 6nm process node for I/O and a 5nm node for the compute dies.
[...] This approach doesn't change the fact Moore's Law is slowing down. Packing more compute into a single package is going to require more power, but it does help to reduce the amount needed to move data around.
[...] Even so, hotter chips still pose a challenge with regard to thermal management. As we've previously reported, higher TDPs are already causing headaches for data center operators, especially those looking to deploy AI infrastructure at scale.
Papermaster argues these challenges aren't insurmountable and represent an opportunity with regard to next-gen thermal management and datacenter infrastructure.
"As they build up that datacenter, it's worth it for them to invest in advanced cooling. It's worth it for them to have a leading edge, new sources of renewable energy, and new geographic locations that are more ideal to place these datacenters," he said. "I think there's a whole new area of innovation in advanced cooling, better thermal materials, better heat removal systems."
And with these technologies, Papermaster expects AMD and others will be able to push power targets even higher. "I don't see that we're at max wattage by any means," he says.
However, beyond architectural, packaging and systems-level improvements, Papermaster emphasizes the opportunity presented by developing better software.
"The next frontier is getting a deeper partnership through the software stack. We've already started working closely with the leading edge AI practitioners… companies like Microsoft, like Oracle, Lamini and what we've done with Mosaic ML," he says. "Those kinds of partnerships really give us insights as to what we can do optimizing with the players who are providing the software solution."