Second GPU Cloudburst Experiment Yields New Findings
In late 2019, researchers at the San Diego Supercomputer Center (SDSC) and the Wisconsin IceCube Particle Astrophysics Center (WIPAC) caught the attention of the high-performance computing community and the top commercial cloud providers by completing a bold experiment: they marshalled essentially all of the GPUs (graphics processing units) available for sale worldwide for a brief run, demonstrating that it is possible to elastically burst to very large GPU scales in the cloud, even in this pre-exascale era of computing.
[...] Fast forward to February 4, 2020, when the same research team conducted a second experiment using a fraction of the funding left over from a modest National Science Foundation EAGER grant.
[...] "We drew several key conclusions from this second demonstration," said SDSC's Sfiligoi. "We showed that the cloudburst run can actually be sustained during an entire workday instead of just one or two hours, and have moreover measured the cost of using only the two most cost-effective cloud instances for each cloud provider."
The team managed to reach and sustain a plateau of about 15,000 GPUs, or roughly 170 fp32 PFLOPS[*], based on the peak fp32 throughput listed in NVIDIA's specifications. The cloud instances were provisioned from all major geographic regions, and the total integrated compute time was just over one fp32 exaFLOP-hour[*]. The total cost of the cloud run was roughly $60,000.
[*] fp32: floating point with 32-bit operands, aka single precision
FLOPS: floating-point operations per second
PFLOPS: petaFLOPS, 10^15 (i.e. 1,000,000,000,000,000) floating-point operations per second
exaFLOPS: 10^18 (i.e. 1,000,000,000,000,000,000) floating-point operations per second
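For readers who want to sanity-check the headline numbers, here is a minimal Python sketch of the arithmetic they imply; the per-GPU throughput, plateau-equivalent hours, and price per PFLOP-hour below are derived values, not figures quoted in the article.

    # Back-of-envelope check of the figures quoted above.
    gpus           = 15_000      # plateau size
    plateau_pflops = 170         # fp32 PFLOPS at the plateau
    total_eflop_h  = 1.0         # "just over one fp32 exaFLOP-hour"
    total_cost_usd = 60_000      # "roughly $60,000"

    # Implied average per-GPU throughput at the plateau (~11.3 TFLOPS)
    tflops_per_gpu = plateau_pflops * 1_000 / gpus

    # Plateau-equivalent hours implied by the integrated total (~5.9 h)
    plateau_hours = total_eflop_h * 1_000 / plateau_pflops

    # Implied price of compute (~$60 per fp32 PFLOP-hour)
    usd_per_pflop_hour = total_cost_usd / (total_eflop_h * 1_000)

    print(f"{tflops_per_gpu:.1f} TFLOPS/GPU, {plateau_hours:.1f} plateau-equivalent hours, "
          f"${usd_per_pflop_hour:.0f} per PFLOP-hour")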
Disclaimer: The original story was apparently submitted by a participant in the research.
(Score: 0) by Anonymous Coward on Thursday February 13 2020, @12:00PM (2 children)
Just how fast can you watch porn?
(Score: 2) by takyon on Thursday February 13 2020, @12:12PM (1 child)
Concurrent streams. How many cores does your brain have?
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 1, Touché) by Anonymous Coward on Thursday February 13 2020, @12:50PM
So much pr0n, so little worth watching.
Maybe these elastic cloud AIs can get to work creating pr0n worth watching.
(Score: 4, Interesting) by takyon on Thursday February 13 2020, @12:48PM
Most of the big discrete GPUs right now are (going to be) in the range of 5 to 20 TFLOPS FP32. For example, 8 TFLOPS for Radeon RX 5700 [techpowerup.com], 13.5 TFLOPS for RTX 2080 Ti [techpowerup.com], and 16.3 TFLOPS for Titan RTX [techpowerup.com]. AMD will have "Big Navi" this year and Nvidia should have "Ampere". One of them will probably make it to 20 TFLOPS.
The enterprise/HPC cards have more memory, better FP16 performance, etc. Tesla T4 [techpowerup.com] mentioned in TFA is 8.141 TFLOPS FP32, so it should take almost 21,000 of those to reach 170 PFLOPS.
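As a quick check of that figure, a minimal Python sketch of the division; only the 170 PFLOPS plateau from TFA and the 8.141 TFLOPS spec value are used:

    # How many Tesla T4s to reach the article's 170 fp32 PFLOPS plateau?
    t4_tflops_fp32 = 8.141   # peak fp32 TFLOPS per Tesla T4 (spec sheet value)
    target_pflops  = 170     # plateau reported in TFA
    print(f"{target_pflops * 1_000 / t4_tflops_fp32:,.0f} T4s needed")  # ~20,882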
Intel is about to enter the market, and will be focusing on supercomputing:
Intel’s Xe for HPC: Ponte Vecchio with Chiplets, EMIB, and Foveros on 7nm, Coming 2021 [anandtech.com]
Monstrous 500W Intel Xe MCM Flagship GPU Leaked In Internal Documents – 4 Xe Tiles Stacked Using Foveros 3D Packaging [wccftech.com]
The multi-chip module and HBM approach will enable massive performance in a single gigantic package. Although heat could be through the roof, it's worth it if it can be cooled effectively and there is a FLOPS/W improvement.
We will see GPUs that can hit 100 TFLOPS, and hopefully 1 PFLOPS (or a lot more if a 3DSoC approach gets adopted). You'll be able to rent exascale performance soon. FP32 is one thing, but GPUs will have to compete with stuff like tensor processors and the giant Cerebras Wafer Scale Engine for machine learning. Are these levels of performance irrelevant for gaming? Triple-monitor/wide 8K seems like the absolute end of the road, and 16K VR shouldn't need all of the performance of regular 16K due to headset eye tracking + foveated rendering. I guess we could boost refresh rates to 1000 Hz.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 0) by Anonymous Coward on Thursday February 13 2020, @01:53PM (7 children)
did they just run a benchmark program to test or did they actually throw some observation data that needed to be crunched at it?
how many kilowatt hours were sacrificed? how much energy was "abused" from cradle (design, manufacture, shipping) to grave (landfill)?
there will never be enough processing power to figure out why the planet is not covered with silicon that spits out electrons ... eh?
(Score: 4, Informative) by takyon on Thursday February 13 2020, @02:18PM (6 children)
No wasted electrons.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 1) by fustakrakich on Thursday February 13 2020, @05:05PM (5 children)
How many watts were consumed in that eight hours?
How many watts per FLOP?
Politics and criminals are the same thing..
(Score: 2) by takyon on Thursday February 13 2020, @05:23PM (4 children)
I'm just gonna be lazy and say:
1.46 megawatts (42 gigajoules consumed, or 10 tons of TNT)
9 picowatts
Are those right? Probably not. But they sound good.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
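As an aside, those guesses are roughly self-consistent if you assume an 8-hour run at the 170 PFLOPS plateau and about 100 W average draw per GPU; a minimal Python sketch (the run length and per-GPU power are assumptions, not figures from TFA):

    # Sanity check of the 1.46 MW / 42 GJ / ~10 t TNT / 9 pW guesses above.
    gpus          = 15_000
    watts_per_gpu = 97          # assumed average draw per GPU (not from TFA)
    hours         = 8           # assumed run length ("an entire workday")
    plateau_flops = 170e15      # 170 fp32 PFLOPS

    power_w     = gpus * watts_per_gpu        # ~1.46 MW
    energy_j    = power_w * hours * 3600      # ~42 GJ
    tnt_tons    = energy_j / 4.184e9          # 1 ton of TNT = 4.184 GJ -> ~10 t
    w_per_flops = power_w / plateau_flops     # ~8.6e-12 W, i.e. ~9 pW per FLOP/s

    print(f"{power_w/1e6:.2f} MW, {energy_j/1e9:.0f} GJ, "
          f"{tnt_tons:.0f} t of TNT, {w_per_flops*1e12:.1f} pW per FLOP/s")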
(Score: 1) by fustakrakich on Thursday February 13 2020, @05:32PM (3 children)
I am disappointed that this is not a subject of interest. A lot of power is being used to produce something hardly more useful than determining somebody's price/earnings ratios.
Politics and criminals are the same thing..
(Score: 2) by takyon on Thursday February 13 2020, @05:34PM (1 child)
https://www.top500.org/green500/ [top500.org]
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 1) by fustakrakich on Thursday February 13 2020, @06:21PM
Well, if it can cook minute rice in 15 seconds, I'm all for it.
They're still using a hell of a lot of juice.
Politics and criminals are the same thing..
(Score: 0) by Anonymous Coward on Friday February 14 2020, @07:58AM