Researchers from Julia Computing, UC Berkeley, Intel, the National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, and JuliaLabs@MIT have developed a new parallel computing method to dramatically scale up the process of cataloging astronomical objects. The improvement leverages 8,192 Intel Xeon processors in Berkeley Lab's new Cori supercomputer and Julia, the high-performance, open-source scientific computing language, to deliver a 225x increase in the speed of astronomical image analysis.
The code used for this analysis is called Celeste. It was developed at Berkeley Lab and uses statistical inference to mathematically locate and characterize light sources in the sky. When it was first released in 2015, Celeste was limited to single-node execution on, at most, hundreds of megabytes of astronomical images. For the Sloan Digital Sky Survey (SDSS), the dataset used in this research, the analysis identifies points of light in nearly 5 million images of approximately 12 megabytes each – a dataset of 55 terabytes.
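A quick sanity check of the quoted figures (a sketch using the article's round numbers; since the image count is "nearly" 5 million, the exact total lands a bit under the 60 TB upper bound, consistent with the quoted 55 TB):

```python
# Rough dataset-size check using the article's figures:
# nearly 5 million images at approximately 12 MB each.
images = 5_000_000          # upper-bound image count ("nearly 5 million")
mb_per_image = 12           # approximate size of each SDSS image
total_tb = images * mb_per_image / 1_000_000  # MB -> TB (decimal units)
print(f"upper bound: {total_tb:.0f} TB")      # prints "upper bound: 60 TB"
```

Sixty terabytes is the ceiling; the quoted 55 TB fits once "nearly 5 million" is taken at face value.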
Using the new parallel implementation, the research team increased the speed of its analysis by an estimated 225x. This enabled the processing of more than 20,000 images, or 250 gigabytes – an increase of more than three orders of magnitude over previous iterations.
The original paper is available.
(Score: 5, Informative) by VLM on Monday November 28 2016, @09:06PM
I read the original paper; it's kinda interesting.
This major improvement leverages 8,192 Intel Xeon processors
On page 5 of the paper, they report using 1,630 nodes with 128 GB of RAM each, and (from my glance at the paper) they're memory-I/O bound, so that's their real scaling limitation. Each node has two processors and each processor has 16 cores, so 32 cores are trying to talk over the same RAM bus to 128 GB of memory – a rough life for something as data-intensive as this project. Some nice crunchy fluid-dynamics floating-point work probably wouldn't saturate the memory bus.
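To see why 32 cores on one memory bus hurts, here's a back-of-the-envelope sketch. The ~120 GB/s aggregate node bandwidth is an illustrative assumption for a dual-socket Xeon node of that era, not a number from the paper:

```python
# Illustrative per-core memory-bandwidth share on one node.
# ASSUMPTION: ~120 GB/s aggregate bandwidth for a dual-socket node;
# this figure is not from the paper, only a typical ballpark.
sockets = 2
cores_per_socket = 16
cores_per_node = sockets * cores_per_socket          # 32 cores
node_bandwidth_gbs = 120.0                           # assumed aggregate
per_core_gbs = node_bandwidth_gbs / cores_per_node
print(f"~{per_core_gbs:.2f} GB/s per core")          # prints "~3.75 GB/s per core"
```

A few GB/s per core is easily saturated by a workload that streams image data through with little arithmetic per byte, which matches the memory-bound behavior described.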
The paper went into way more detail than I cared to study, but they basically hit an I/O wall, which is why they only got a 225x increase in throughput despite 1,630 nodes – or who really knows exactly how much parallelism, although it's clearly a whole lot.
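The efficiency complaint can be made concrete with one division – a sketch assuming the 1,630-node figure from the paper, the 225x speedup from the article, and a single node as the baseline:

```python
# Back-of-the-envelope parallel efficiency for the Celeste run.
# Assumes: 1,630 nodes (paper, p. 5), 225x speedup (article),
# single-node run as the 1x baseline.
nodes = 1630
speedup = 225
efficiency = speedup / nodes
print(f"parallel efficiency ~= {efficiency:.1%}")   # prints "parallel efficiency ~= 13.8%"
```

Roughly 14% – in the same neighborhood as the "about 10% or so" estimate below, and low enough to support the I/O-wall reading.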
Cray's marketing material
http://www.cray.com/products/computing/xc-series?tab=technology [cray.com]
claims
The Intel Xeon multi-core processors provide up to 8,448 cores and enable 297 teraflops per Cray XC liquid-cooled cabinet, and 99 teraflops per Cray XC air-cooled cabinet.
My guess is someone copied and pasted the 8,448 figure from the promotional material, and someone else went "Uh, no, that's not a power of 2," confused virtual cores with physical processors, and edited it to 8,192.
I'm not really sure where 8,192 came from, which is an interesting Sherlock Holmes puzzle.
The weirdest part of this whole project is that you're trying to process roughly a million ~15 MB files that are mostly independent aside from some minimal overlap. So naturally you'd give it to "the internet" as a massively parallel "SETI@home"-type task, right? Well, these guys used many hours of expensive supercomputer time instead, at about 10% or so efficiency. It's cool in a "hold my beer and watch this" way, but it's not really efficient or fast.
From what I know, Julia projects don't usually have a speedup factor that low.
(Score: 0) by Anonymous Coward on Monday November 28 2016, @09:12PM
Don't sweat the details, feel the buzz.