posted by martyb on Tuesday July 27 2021, @06:54AM

Will Approximation Drive Post-Moore's Law HPC Gains?:

“Hardware-based improvements are going to get more and more difficult,” said Neil Thompson, an innovation scholar at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL). [...] Thompson, speaking at Supercomputing Frontiers Europe 2021, likely wasn’t wrong: the proximate death of Moore’s law has been a hot topic in the HPC community for a long time.

[...] Thompson opened with a graph of computing power utilized by the National Oceanic and Atmospheric Administration (NOAA) over time. “Since the 1950s, there has been about a one trillion-fold increase in the amount of computing power being used in these models,” he said. But there was a problem: tracking a weather forecasting metric called mean absolute error (“When you make a prediction, how far off are you on that prediction?”), Thompson pointed out that “you actually need exponentially more computing power to get that [improved] performance.” Without those exponential gains in computing power, the steady gains in accuracy would slow, as well.
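
To make that metric concrete, here is a minimal sketch of mean absolute error; the forecast and observation values below are invented purely for illustration.

    # Minimal sketch of mean absolute error (MAE); the numbers are made up.
    def mean_absolute_error(predictions, observations):
        """Average of |prediction - observation| over all samples."""
        return sum(abs(p - o) for p, o in zip(predictions, observations)) / len(predictions)

    forecast_highs = [21.0, 19.5, 23.0, 18.0]   # hypothetical forecast temperatures (deg C)
    observed_highs = [20.1, 21.0, 22.4, 16.5]   # hypothetical observed temperatures (deg C)

    print(mean_absolute_error(forecast_highs, observed_highs))   # ~1.125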

Enter, of course, Moore’s law, and the flattening of CPU clock frequencies in the mid-2000s. “But then we have this division, right?” Thompson said. “We start getting into multicore chips, and we’re starting to get computing power in that very specific way, which is not as useful unless you have that amount of parallelism.” Separating out parallelism, he explained, progress had dramatically slowed. “This might worry us if we want to, say, improve weather prediction at the same speed going forward,” he said.

So in 2020, Thompson and others wrote a paper examining ways to improve performance over time in a post-Moore’s law world. The authors landed on three main categories of promise: software-level improvements; algorithmic improvements; and new hardware architectures.

This third category, Thompson said, is experiencing the biggest moment right now, with GPUs and FPGAs exploding in the HPC scene and ever more tailor-made chips emerging. Just five years ago, only four percent of advanced computing users used specialized chips; now, Thompson said, it was 11 percent, and in five more years, it would be 17 percent. But over time, he cautioned, gains from specialized hardware would encounter similar problems to those currently faced by traditional hardware, leaving researchers looking for yet more avenues to improve performance.

[...] The way past these mathematical limits in algorithm optimization, Thompson explained, was through approximation. He brought back the graph of algorithm improvement over time, adding in approximate algorithms – one 100 percent off, one ten percent off. “If you are willing to accept a ten percent approximation to this problem,” he said, you could get enormous jumps, improving performance by a factor of 32. “We are in the process of analyzing this data right now, but I think what you can already see here is that these approximate algorithms are in fact giving us very very substantial gains.”
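
The summary does not say which problems those approximate algorithms target, but the underlying trade-off can be sketched with a toy example: estimating a large sum from a random sample instead of touching every element. The 1/32 sampling fraction below is an assumption chosen to mirror the factor-of-32 figure, not a method from the talk.

    # Toy illustration of trading accuracy for speed: estimate a large sum
    # from a small random sample instead of reading every element.
    import random

    data = [random.random() for _ in range(1_000_000)]

    exact_sum = sum(data)                        # exact: touches every element

    sample_fraction = 1 / 32                     # hypothetical 32x less work
    k = int(len(data) * sample_fraction)
    sample = random.sample(data, k)
    approx_sum = sum(sample) * (len(data) / k)   # scale the sample estimate back up

    relative_error = abs(approx_sum - exact_sum) / exact_sum
    print(f"relative error: {relative_error:.4%}")   # typically well under 1% here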

Thompson presented another graph, this time charting the balance of approximate versus exact improvements in algorithms over time. “In the 1940s,” he said, “almost all of the improvements that people are making are exact improvements – meaning they’re solving the exact problem. … But you can see that as we approach these later decades, and many of the exact algorithms are starting to become already completely solved in an optimal way … approximate algorithms are becoming more and more important as the way that we are advancing algorithms.”

Journal Reference:
Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, et al. There’s plenty of room at the Top: What will drive computer performance after Moore’s law? [$], Science (DOI: 10.1126/science.aam9744)


Original Submission

  • (Score: 2, Funny) by shrewdsheep on Tuesday July 27 2021, @12:03PM (2 children)

    by shrewdsheep (5215) on Tuesday July 27 2021, @12:03PM (#1160328)

    We can now call our bugs approximations and will gain both in run-time (throw in a return statement here and there) and developer-time (just call it a day)!

    • (Score: 5, Touché) by acid andy on Tuesday July 27 2021, @01:15PM

      by acid andy (1683) on Tuesday July 27 2021, @01:15PM (#1160340) Homepage Journal

      Can I interest you in a job in marketing? ;)

      --
      If a cat has kittens, does a rat have rittens, a bat bittens and a mat mittens?
    • (Score: 2) by DannyB on Tuesday July 27 2021, @04:37PM

      by DannyB (5839) Subscriber Badge on Tuesday July 27 2021, @04:37PM (#1160400) Journal

      As long as we can get answers faster, we will be willing to sacrifice accuracy.

      I will notify the payroll department at once.

      --
      The lower I set my standards the more accomplishments I have.
  • (Score: 1, Interesting) by Anonymous Coward on Tuesday July 27 2021, @01:57PM (3 children)

    by Anonymous Coward on Tuesday July 27 2021, @01:57PM (#1160349)

    Turns out users considered it a bug, Intel had to recall all the chips, and that experiment in approximation ended with a whimper.

    • (Score: 2) by DannyB on Tuesday July 27 2021, @04:56PM

      by DannyB (5839) Subscriber Badge on Tuesday July 27 2021, @04:56PM (#1160407) Journal

      Back then, people actually cared about accuracy and correctness.

      Now speed is subsitute fo accurancy.

      --
      The lower I set my standards the more accomplishments I have.
    • (Score: 2) by takyon on Tuesday July 27 2021, @04:58PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Tuesday July 27 2021, @04:58PM (#1160409) Journal

      This sounds like something that will be done in specialized software for the most part.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by sjames on Thursday July 29 2021, @03:09AM

      by sjames (2882) on Thursday July 29 2021, @03:09AM (#1160890) Journal

      We are Pentium of Borg, you will be approximated.

  • (Score: 3, Funny) by DannyB on Tuesday July 27 2021, @04:55PM

    by DannyB (5839) Subscriber Badge on Tuesday July 27 2021, @04:55PM (#1160405) Journal

    How about this new patented invention . . . tailor taylor maid made for HPC . . .

    Pessimizing compilers!

    Now introducing the shiny gnu -P3 option which includes all pessimizations of both -P2 and -P1.

    -P1 generates code that is worse than the obvious translation an unaided simplistic six weak student project compiler would produce.

    -P2 introduces local pessimizations, including for example, loop-invariant code motion. Code outside the loop is moved inside the loop if it will not affect the overall result.

    -P3 introduces global pessimizations, such as cache scrambling which ensures code commonly used together is located far apart in the executable image. Possibly far enough to make relative address branching impossible unless "branch islands" are introduced to "hopscotch" to the function being called in relocatable code.
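
    In that spirit, a rough sketch of what the hypothetical -P2 "reverse loop-invariant code motion" would do to a program (the function names here are invented for the joke):

        # Joke illustration only: loop-invariant code motion, run in reverse.
        import math

        def normalize_optimized(values, scale):
            factor = math.sqrt(scale) / len(values)       # invariant: computed once, outside the loop
            return [v * factor for v in values]

        def normalize_pessimized(values, scale):
            result = []
            for v in values:
                factor = math.sqrt(scale) / len(values)   # same value, recomputed every iteration
                result.append(v * factor)
            return result

        # Same answer, more work per iteration.
        assert normalize_optimized([1.0, 2.0, 3.0], 4.0) == normalize_pessimized([1.0, 2.0, 3.0], 4.0)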

    --
    The lower I set my standards the more accomplishments I have.
  • (Score: 5, Interesting) by sjames on Tuesday July 27 2021, @06:52PM (4 children)

    by sjames (2882) on Tuesday July 27 2021, @06:52PM (#1160425) Journal

    Many models typically run on HPC are iterative in nature, so the errors are at least additive and often multiplicative or even exponential. So a 10% approximation on one iteration will result in an essentially randomized output by the thousandth iteration. There are probably a few models where errors tend to regress to the mean, in which approximation could work, but many more will rapidly diverge.
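
    A quick numerical sketch of that compounding effect, using a made-up multiplicative toy model rather than any real HPC code:

        # Made-up illustration: a 10% per-step relative error in an iterative model.
        import random

        def run_model(steps, per_step_error=0.0):
            state = 1.0
            for _ in range(steps):
                state *= 1.001                                  # the "true" update rule
                state *= 1.0 + random.uniform(-per_step_error,  # injected relative error
                                              per_step_error)
            return state

        exact = run_model(1000)
        approx = run_model(1000, per_step_error=0.10)
        print(exact, approx)   # the approximate run typically lands nowhere near the exact one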

    Lower-latency inter-processor communication is more likely to break those bottlenecks. No, I don't mean InfiniBand. InfiniBand is such a kitchen-sink standard that you end up running something over something over InfiniBand just to get something like reliable communication. By that point you've burned so many CPU cycles on protocols and drivers that latency and communications overhead eat any further gains in speed.

    • (Score: 0) by Anonymous Coward on Wednesday July 28 2021, @12:17AM (1 child)

      by Anonymous Coward on Wednesday July 28 2021, @12:17AM (#1160510)

      That really depends on what you are doing. For example, "AI" work is commonly done in half precision because speed is more important than absolute accuracy. For a long time, general calculations in 32 bits were good enough even with quadruple precision and double-double readily available. For some problems, any error is unacceptable; for others, an error of 10% or larger at the end is perfectly acceptable.
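
      As a small sketch of the precision gap being described, assuming NumPy is available:

          # Sketch: relative representation error of half vs. double precision.
          import numpy as np

          x = 1.0 / 3.0
          half = np.float16(x)     # ~3 decimal digits of precision
          double = np.float64(x)   # ~15-16 decimal digits of precision

          print(abs(float(half) - x) / x)     # on the order of 1e-4
          print(abs(float(double) - x) / x)   # 0.0 (x is already a double here)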

      • (Score: 2) by sjames on Wednesday July 28 2021, @02:09AM

        by sjames (2882) on Wednesday July 28 2021, @02:09AM (#1160534) Journal

        As I said, some models work with reduced precision. I have also seen models that sometimes terminate with numerical instability even with double or quad.

    • (Score: 0) by Anonymous Coward on Wednesday July 28 2021, @02:35PM (1 child)

      by Anonymous Coward on Wednesday July 28 2021, @02:35PM (#1160645)

      Many models typically run on HPC are iterative in nature, so the errors are at least additive and often multiplicative or even exponential. So a 10% approximation on one iteration will result in an essentially randomized output by the thousandth iteration. There are probably a few models where errors tend to regress to the mean, in which approximation could work, but many more will rapidly diverge.

      Citation needed.

      This statement assumes that the errors are self-compounding. It also, to a lesser extent, assumes that they are systemic.

      For example, imagine you are rendering a frame for a game. The fact that pixel 1 is rendered slightly more blue doesn't necessarily mean that pixel 2 will be rendered slightly more blue as well. If each pixel is calculated independently, there is no reason to think the errors would be additive or worse, and I can think of numerous people who would gladly sacrifice 1% or 10% of color fidelity to get a 32x FPS increase. (Note: this is a theoretical example. I imagine graphics rendering does have some cross-pixel interaction and may not see a 32x FPS increase due to the nature of the algorithms in question.)

      Second, if the errors are not systemic, there is the Law of Large Numbers at play as well. Do enough calculations, and statistically most of the errors will wash out. This is true for additive errors at least, although I don't know enough to comment on multiplicative or exponential operations.
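
      A toy version of that washing-out effect for independent, zero-mean additive errors (an illustration only, not a claim about any particular workload):

          # Illustration: independent zero-mean additive errors mostly cancel in a large sum.
          import random

          n = 100_000
          true_values = [1.0] * n
          noisy_values = [v + random.uniform(-0.1, 0.1) for v in true_values]   # up to 10% additive noise per term

          exact_total = sum(true_values)
          noisy_total = sum(noisy_values)

          print(abs(noisy_total - exact_total) / exact_total)   # typically on the order of 1e-4, not 10%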

      • (Score: 2) by sjames on Thursday July 29 2021, @03:02AM

        by sjames (2882) on Thursday July 29 2021, @03:02AM (#1160886) Journal

        I'm not talking about rendering a frame in a game. I'm talking about deciding if a steel beam in a building will fail in a 50 MPH wind, or how a fire will develop and spread through a building. Or if a disturbance in the tropics will organize and become a major hurricane or if it'll rain a bit and fade away.

        In other words, thousands and thousands of iterations, each taking the output of the previous iteration as its input.

        OTOH, ray tracing won't likely go through more than 20 iterations and no additional energy enters the system with each iteration. But nobody considers ray tracing to be a job for HPC these days.

        TL;DR, the difference between positive and negative feedback.
