posted by Fnord666 on Monday November 28 2016, @08:46PM   Printer-friendly
from the julia-computing-not-julia-child dept.

Researchers from Julia Computing, UC Berkeley, Intel, the National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, and JuliaLabs@MIT have developed a new parallel computing method to dramatically scale up the process of cataloging astronomical objects. This major improvement leverages 8,192 Intel Xeon processors in Berkeley Lab's new Cori supercomputer and Julia, the high-performance, open-source scientific computing language, to deliver a 225x increase in the speed of astronomical image analysis.

The code used for this analysis is called Celeste. It was developed at Berkeley Lab and uses statistical inference to mathematically locate and characterize light sources in the sky. When it was first released in 2015, Celeste was limited to single-node execution on at most hundreds of megabytes of astronomical images. For the Sloan Digital Sky Survey (SDSS), the dataset used in this research, the analysis means identifying points of light in nearly 5 million images of approximately 12 megabytes each – a dataset of 55 terabytes.

Using the new parallel implementation, the research team increased the speed of its analysis by an estimated 225x. This enabled the processing of more than 20,000 images, or about 250 gigabytes – an increase of more than three orders of magnitude over previous iterations.
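
A quick back-of-the-envelope check of those figures, using the round numbers quoted above (the sizes are approximate, so the totals land near, not exactly on, the article's values):

    # Julia: rough consistency check of the sizes quoted in the summary
    images_total = 5_000_000                  # "nearly 5 million" SDSS images
    mb_per_image = 12                         # ~12 megabytes each
    println("full dataset ≈ ", images_total * mb_per_image / 1_000_000, " TB")  # ≈ 60 TB (article quotes ~55 TB)

    images_run = 20_000                       # images handled in the parallel run
    println("run size     ≈ ", images_run * mb_per_image / 1_000, " GB")        # ≈ 240 GB ("250 gigabytes")

    # vs. the old single-node limit of a few hundred MB: roughly three orders of magnitude
    println("scale-up     ≈ ", round(Int, images_run * mb_per_image / 250), "x")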

The original paper is available.


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Interesting) by Anonymous Coward on Monday November 28 2016, @09:05PM

    by Anonymous Coward on Monday November 28 2016, @09:05PM (#434219)

    While I pray every day for MATLAB to die a quick horrible death, the Julia people discredit themselves by continuing to promote disingenuous 10 squillion-fold speed-up gains. If you code the wrong way you can get disastrous performance in any language. No need for those for-loop shenanigans and "yeah, but if you didn't know..." explanations. Everyone knows.

    You already win on price and probably on performance too without any sleight of hand. Put real, indisputable comparisons out there that can't get knocked down by 30 seconds of research.

  • (Score: 5, Informative) by VLM on Monday November 28 2016, @09:06PM

    by VLM (445) Subscriber Badge on Monday November 28 2016, @09:06PM (#434220)

    I read the original paper; it's kinda interesting.

    This major improvement leverages 8,192 Intel Xeon processors

    Pg 5 of the paper: they paid for 1,630 nodes with 128 GB of RAM each, and they're memory-I/O bound (from my glance at the paper), so that's their actual scaling factor or limitation. Each node has two processors and each processor has 16 cores, so theoretically 32 cores will try to talk over the same RAM bus to 128 GB of RAM; it'll be a rough life for something as data-intensive as this project. Some nice crunchy fluid-dynamics floating-point work probably wouldn't saturate the memory bus.

    The paper went into waaay more detail than I cared to study, but they basically hit an I/O wall, which is why they only got a 225x increase in throughput despite 1,630x the nodes (or who really knows exactly how much parallelism, although it's clearly a whole lot).

    Cray's marketing material

    http://www.cray.com/products/computing/xc-series?tab=technology [cray.com]

    claims

    The Intel Xeon multi-core processors provide up to 8,448 cores and enable 297 teraflops per Cray XC liquid-cooled cabinet, and 99 teraflops per Cray XC air-cooled cabinet.

    My guess is someone copied the 8,448 figure from the promotional material, and someone else went "uh no, that's not a power of 2", confused virtual cores with physical processors, and edited it to 8,192.

    Not really sure where 8192 came from, which is an interesting Sherlock Holmes puzzle.

    The weirdest part about this whole project is you're trying to process roughly a million 15 MB files that are mostly kinda independent other than some minimal overlap. So naturally you'd give it to "the internet" as a massively parallel "SETI@home" type task, right? Well, these guys used many hours of expensive supercomputer time instead, at about 10% or so efficiency. It's cool from a "hold my beer and watch this" angle, but it's not really efficient or fast.
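
    Back-of-the-envelope on that efficiency figure (assuming the 225x is measured against a single node and the ~1,630-node count above is the right denominator, neither of which the paper necessarily states this way):

        # Julia: naive parallel-efficiency estimate from the numbers above
        nodes   = 1630        # node count quoted above
        speedup = 225         # reported throughput increase
        println("parallel efficiency ≈ ", round(100 * speedup / nodes, digits=1), " %")  # ≈ 13.8 %, i.e. "10% or so"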

    From what I know Julia stuff usually doesn't have that low of a speedup factor.

    • (Score: 0) by Anonymous Coward on Monday November 28 2016, @09:12PM

      by Anonymous Coward on Monday November 28 2016, @09:12PM (#434225)

      Don't sweat the details, feel the buzz.

  • (Score: 3, Insightful) by Anonymous Coward on Monday November 28 2016, @09:08PM

    by Anonymous Coward on Monday November 28 2016, @09:08PM (#434222)

    This is not some breakthrough in parallel computing. It's a breakthrough in parallelizing a particular statistical problem: they made a "fast numerical optimization routine for Bayesian posterior inference and a statistically efficient scheme for decomposing astronomical optimization problems into subproblems."

    When I hear "Parallel Computing Method" I think of things like SIMD, SIMT and SMP. This work is not a new way to do parallelism; it's applying parallelism to a problem where it previously didn't scale well. I'd like it if article titles mentioned the area where the progress was actually made (parallelizing Bayesian posterior inference, for example) rather than pointing at related fields; it gets confusing otherwise.
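
    To be concrete about what "decomposing into subproblems" buys you: the parallel machinery itself is ordinary. Something along the lines of this toy sketch (names and sizes made up, not the actual Celeste code) is the general shape:

        # Julia: toy sketch of an embarrassingly parallel decomposition
        using Distributed
        addprocs(4)                          # ordinary worker processes, nothing exotic

        @everywhere function solve_subproblem(region)
            # stand-in for a per-region posterior optimization; regions of sky
            # are (nearly) independent, so the workers don't need to talk
            sum(abs2, region)                # dummy computation
        end

        regions  = [rand(100, 100) for _ in 1:64]     # pretend sky regions
        catalogs = pmap(solve_subproblem, regions)    # map the subproblems across workers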

    • (Score: 1, Insightful) by Anonymous Coward on Monday November 28 2016, @09:27PM

      by Anonymous Coward on Monday November 28 2016, @09:27PM (#434232)

      Yeah, it looks like the article could be summarized as "Another scientific problem found to be embarrassingly parallelizable, and ported to a '90s Beowulf cluster computing framework".

      • (Score: 2) by bob_super on Monday November 28 2016, @09:54PM

        by bob_super (1357) on Monday November 28 2016, @09:54PM (#434246)

        Can you rephrase that using the standard newspaper unit, i.e. give us PS3-equivalent numbers?

  • (Score: 1, Insightful) by Anonymous Coward on Monday November 28 2016, @09:08PM

    by Anonymous Coward on Monday November 28 2016, @09:08PM (#434223)

    New method, or optimizing poorly written software?

  • (Score: 1) by claywar on Monday November 28 2016, @09:47PM

    by claywar (3069) on Monday November 28 2016, @09:47PM (#434240)

    Just imagine a Beowu... oh, just nevermind.