
posted by Fnord666 on Thursday June 21 2018, @12:46AM   Printer-friendly
from the approaching-the-singularity dept.

IBM researchers use analog memory to train deep neural networks faster and more efficiently

Deep neural networks normally require fast, powerful graphics processing unit (GPU) hardware accelerators to support the needed high speed and computational accuracy, such as the GPU devices used in the just-announced Summit supercomputer. But GPUs are highly energy-intensive, making their use expensive and limiting their future growth, the researchers explain in a recent paper published in Nature.

Instead, the IBM researchers used large arrays of non-volatile analog memory devices (which use continuously variable signals rather than binary 0s and 1s) to perform computations. Those arrays allowed the researchers to create, in hardware, the same scale and precision of AI calculations that are achieved by more energy-intensive systems in software, but running hundreds of times faster and at hundreds of times lower power — without sacrificing the ability to create deep learning systems.

The trick was to replace conventional von Neumann architecture, which is "constrained by the time and energy spent moving data back and forth between the memory and the processor (the 'von Neumann bottleneck')," the researchers explain in the paper. "By contrast, in a non-von Neumann scheme, computing is done at the location of the data [in memory], with the strengths of the synaptic connections (the 'weights') stored and adjusted directly in memory."
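The in-memory idea can be sketched in a few lines (a minimal Python illustration, not code from the paper; the conductance values and names are made up):

```python
# Hypothetical sketch of the non-von Neumann scheme: weights live in a
# crossbar of analog conductances G[i][j]; applying input voltages V[j]
# yields output currents I[i] = sum_j G[i][j] * V[j] (Ohm's law plus
# Kirchhoff's current law), i.e. a matrix-vector product computed right
# where the data is stored, with no memory-to-processor traffic.

def crossbar_matvec(G, V):
    """Currents read out of an idealized analog crossbar."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

G = [[0.2, 0.5, 0.1],   # synaptic weights stored as conductances
     [0.4, 0.0, 0.3]]
V = [1.0, -1.0, 2.0]    # input activations applied as voltages

print(crossbar_matvec(G, V))  # one step of a neural-network layer
```

The whole multiply-accumulate happens in a single analog read, which is where the claimed speed and energy savings come from.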

Equivalent-accuracy accelerated neural-network training using analogue memory (DOI: 10.1038/s41586-018-0180-5) (DX)


  • (Score: 4, Disagree) by jmorris on Thursday June 21 2018, @03:04AM (16 children)

    by jmorris (4844) on Thursday June 21 2018, @03:04AM (#695971)

    Digital is reproducible. Build one system, calculate a result and if another installation of similar hardware runs the same data through the same software it should obtain the same result. Analog upsets that. Each machine becomes unique. Different runs on the same machine won't even produce the same result. The machines will be more like biological systems, each with unique tendencies, traits and behaviors.
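    The device-to-device uniqueness being described can be simulated (a hypothetical sketch; the weight values and noise level are invented for illustration):

    ```python
    # Illustrative only: the same nominal weights programmed into two analog
    # chips come out slightly different on each device, so each machine
    # computes a slightly different function, while a digital implementation
    # is bit-identical everywhere.
    import random

    NOMINAL = [0.2, 0.5, 0.1, 0.4]

    def program_device(seed, sigma=0.02):
        """Simulate programming NOMINAL weights with per-device analog error."""
        rng = random.Random(seed)
        return [w + rng.gauss(0.0, sigma) for w in NOMINAL]

    def infer(weights, x):
        return sum(w * xi for w, xi in zip(weights, x))

    x = [1.0, 2.0, 3.0, 4.0]
    digital_a = infer(NOMINAL, x)
    digital_b = infer(NOMINAL, x)            # identical on any conforming machine
    analog_a = infer(program_device(seed=1), x)
    analog_b = infer(program_device(seed=2), x)
    print(digital_a == digital_b, analog_a == analog_b)
    ```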

  • (Score: 2) by crafoo on Thursday June 21 2018, @03:35AM (11 children)

    by crafoo (6639) on Thursday June 21 2018, @03:35AM (#695989)

    You might find this paper interesting:
    http://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf [www.itu.dk]

    Rounding errors are effectively random numbers injected into your digital calculations, and they are not identical between machines, or even between different runs on the same machine.
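    The order-dependence underlying this claim is easy to demonstrate (standard Python doubles; note that for a fixed evaluation order the rounding is in fact deterministic):

    ```python
    # IEEE 754 rounding depends on how operations are grouped, so the "same"
    # sum can come out differently when the evaluation order changes:
    a, b, c = 0.1, 0.2, 0.3
    left = (a + b) + c
    right = a + (b + c)
    print(left, right, left == right)  # 0.6000000000000001 0.6 False
    ```

    Parallel code changes the grouping from run to run, which is how "the same" computation can produce different last bits.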

    • (Score: 2) by frojack on Thursday June 21 2018, @03:54AM (3 children)

      by frojack (1554) on Thursday June 21 2018, @03:54AM (#695997) Journal

      Yet, unless you are dealing in the noise, doing computations to N+6 places beyond the decimal point when your data is only accurate to N places, you just never see this stuff in actual use cases.

      --
      No, you are mistaken. I've always had this sig.
      • (Score: 4, Insightful) by c0lo on Thursday June 21 2018, @04:14AM (2 children)

        by c0lo (156) Subscriber Badge on Thursday June 21 2018, @04:14AM (#696007) Journal

        you just never find this stuff in actual use cases

        Never? Mate, even the simple use case of an N-body problem with small N, like launching a satellite towards an asteroid, takes those into account and provides a way of correcting the trajectory of that satellite.
        Weather simulations? Rife with accumulating rounding errors; you need to take them into consideration (e.g. by running those simulations a good number of times and treating the obtained results statistically).

        (What is it with you conservative people, that you are so in love with absolutes? What's wrong with you, is it so hard to accept your human fallibility and the fact that you can be wrong?)

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
        • (Score: 1) by ChrisMaple on Thursday June 21 2018, @07:39PM (1 child)

          by ChrisMaple (6964) on Thursday June 21 2018, @07:39PM (#696372)

          On a digital computer, with the same inputs, using the same program, with care taken to assure the same initial state, if the results are not always the same then the computer is defective. Rounding errors are always the same: they are deterministic.

          • (Score: 2) by c0lo on Thursday June 21 2018, @10:54PM

            by c0lo (156) Subscriber Badge on Thursday June 21 2018, @10:54PM (#696445) Journal

            On a digital computer, with the same inputs, using the same program, with care taken to assure the same initial state, if the results are not always the same then the computer is defective.

            False. Multithreading will easily break the assertion above without the computer being defective.

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 3, Touché) by jmorris on Thursday June 21 2018, @04:33AM (6 children)

      by jmorris (4844) on Thursday June 21 2018, @04:33AM (#696023)

      Now try reading it.

      Once an algorithm is proven to be correct for IEEE arithmetic, it will work correctly on any machine supporting the IEEE standard.

      There is rounding error because floats have fixed precision, but it is known and repeatable. How many implementations have you actually written, or dug around in the guts of and debugged? MC6809 and AVR for me: the oddball Microsoft Color BASIC format (8+32) on the MC6809 and single-precision IEEE on AVR. If it isn't deterministic, it is broken. Guess whose implementation had a rounding error in the first versions. Go ahead, guess.


      Yeah, it was Microsoft. It was improperly retaining junk in the least-significant bits when it put results back into variable storage, so numbers that would print the same, or compare the same when converted to strings, wouldn't compare as equal as numbers. Yes, all floats can suffer that in extreme corner cases, but this wasn't one of those; it was an implementation bug. I think it was Color BASIC 1.2 that finally squashed it.
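      The symptom described, numbers that print the same but compare unequal, is easy to reproduce with ordinary IEEE doubles (a Python illustration, not the Color BASIC bug itself):

      ```python
      # Two values that display identically at normal precision can still
      # differ in their trailing bits and compare unequal as numbers.
      x = 0.1 + 0.2
      y = 0.3
      print(f"{x:.6f}", f"{y:.6f}")  # both display as 0.300000
      print(x == y)                  # False: they differ in the last bits
      print(repr(x), repr(y))        # 0.30000000000000004 vs 0.3
      ```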
      • (Score: 1, Insightful) by Anonymous Coward on Thursday June 21 2018, @05:22AM (5 children)

        by Anonymous Coward on Thursday June 21 2018, @05:22AM (#696038)

        There are many dynamical systems which are chaotic, and long enough direct numerical simulations (DNS) of them are not reproducible.
        This doesn't mean that the DNS are wrong; it just means that individual trajectories of the dynamical system cannot be reproduced exactly.
        Other properties (statistical properties, stationary points, etc.) can be computed to within machine precision.
        In fact, the runs themselves are not reproducible, because we use threads and/or MPI, which do not guarantee reproducibility.

        • (Score: 2) by jmorris on Thursday June 21 2018, @07:04AM (4 children)

          by jmorris (4844) on Thursday June 21 2018, @07:04AM (#696073)

          I can't help it if you are having problems with multi-threaded and multi-processor code that doesn't synchronize correctly. Unless you are intentionally introducing real random numbers (and while some simulations indeed must, most prefer pseudo-random numbers for repeatability across runs), you should be getting 100% repeatable results. If you aren't, you need to debug some more. Math isn't generally supposed to be random. Digital computers aren't supposed to be random. The bits that aren't deterministic (cache effects, network, other I/O) are supposed to be abstracted away by the software to help you produce stable results. Unexplained little "random" variations will eventually bite yer ass at the wrong time.

          Of course, you might also be one of those rare cases that had to make a deliberate decision to sacrifice correctness on the altar of performance. Sometimes that is the right solution; it doesn't really matter how correct a weather forecast is if it takes two days to tell you tomorrow's weather.

          • (Score: 2) by c0lo on Thursday June 21 2018, @07:48AM

            by c0lo (156) Subscriber Badge on Thursday June 21 2018, @07:48AM (#696089) Journal

            I can't help it if you are having problems with multi-threaded and multi-processor code that doesn't synchronize correctly.

            Eh, what?
            Not all multithreaded apps need to be synchronized; many still produce valid results while accepting race conditions.

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 1) by anubi on Thursday June 21 2018, @07:53AM

            by anubi (2828) on Thursday June 21 2018, @07:53AM (#696092) Journal

            I get the idea they are only re-inventing the hybrid analog computer: having the analog part do what analog is really good at, and having the digital part do what digital is really good at. The old-school type (like the EAI 580 I used at university) was optimized for solving simultaneous differential equations with real-time inputs. The continuous analog functions were done in analog, whereas the digital part controlled and monitored the whole shebang.

            Except, like everything else, things have improved by many orders of magnitude. Comparing what they have to my EAI-580 would probably be akin to comparing an 8080 to a modern hex-core CPU.

            --
            "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
          • (Score: 1, Informative) by Anonymous Coward on Thursday June 21 2018, @08:08AM (1 child)

            by Anonymous Coward on Thursday June 21 2018, @08:08AM (#696096)

            You're trolling, right? Did you ever hear of the Lorenz system? Here you go: https://en.wikipedia.org/wiki/Lorenz_system [wikipedia.org]

            Numerical analysis says that it's best to use random round-off errors when integrating a generic dynamical system (otherwise you get systematic deviations from the original dynamical system).
            That means that every time you integrate N steps of the Lorenz system, you get a random sequence of N round-off errors applied to the calculations.
            When N becomes large enough, those errors build up into something big enough to completely change the trajectory, because the Lorenz system is chaotic.

            For weather and similar fluid-dynamics problems, this only gets worse.
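            The sensitivity being described can be checked numerically (a minimal Python sketch using the standard Lorenz parameters and simple explicit Euler integration; the step size and horizon are arbitrary choices for illustration):

            ```python
            # Two Lorenz trajectories starting 1e-12 apart (the scale of a
            # round-off error) end up macroscopically different.

            def lorenz_step(x, y, z, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
                # One explicit-Euler step of the Lorenz equations.
                return (x + dt * sigma * (y - x),
                        y + dt * (x * (rho - z) - y),
                        z + dt * (x * y - beta * z))

            def x_trajectory(x0, steps=10000):
                x, y, z = x0, 1.0, 1.0
                xs = []
                for _ in range(steps):
                    x, y, z = lorenz_step(x, y, z)
                    xs.append(x)
                return xs

            xa = x_trajectory(1.0)
            xb = x_trajectory(1.0 + 1e-12)   # perturbation at round-off scale
            gap = max(abs(p - q) for p, q in zip(xa, xb))
            print(gap)  # the 1e-12 difference has grown to the size of the attractor
            ```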

            • (Score: 2) by jmorris on Thursday June 21 2018, @08:15AM

              by jmorris (4844) on Thursday June 21 2018, @08:15AM (#696100)

              As I said, in most such cases you want pseudo-random numbers instead of real, crypto-quality randomness for those purposes, so you still have repeatability. Then you can run the same program again with the same inputs and get consistent outputs, which makes testing a lot easier.
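              The fixed-seed repeatability described above, sketched in Python (the function and seed values are invented for illustration):

              ```python
              # A seeded pseudo-random generator replays the identical
              # "random" sequence on every run, so a simulation that consumes
              # it is testable; reseeding changes the run.
              import random

              def noisy_measurements(seed, n=5):
                  rng = random.Random(seed)  # private generator, unaffected by other code
                  return [rng.gauss(0.0, 1.0) for _ in range(n)]

              run1 = noisy_measurements(seed=1234)
              run2 = noisy_measurements(seed=1234)
              run3 = noisy_measurements(seed=9999)
              print(run1 == run2)  # True: same seed, bit-identical sequence
              print(run1 == run3)  # False: different seed, different run
              ```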

  • (Score: 2) by c0lo on Thursday June 21 2018, @07:45AM (2 children)

    by c0lo (156) Subscriber Badge on Thursday June 21 2018, @07:45AM (#696088) Journal

    Digital is reproducible. Build one system, calculate a result and if another installation of similar hardware runs the same data through the same software it should obtain the same result.

    Yes, it's deterministic. No, it's not necessarily exactly reproducible.

    Example in computing: write a reasonably highly multithreaded app doing different operations on a shared set of IEEE float data, computations that would theoretically be commutative. Run it twice and I guarantee you will find differences due to rounding errors accumulating differently, caused by differences in when the scheduler activates the threads.
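    The effect can be reproduced without real threads by replaying two interleavings by hand (a minimal Python sketch; the values are chosen to make the rounding loss obvious):

    ```python
    # The same shared updates, applied in two interleavings that would be
    # equivalent over the reals, give different IEEE double results.
    updates = [1e16, 1.0, -1e16]

    order_a = (updates[0] + updates[1]) + updates[2]  # 1e16 + 1.0 absorbs the 1.0
    order_b = (updates[0] + updates[2]) + updates[1]  # cancels first, keeps the 1.0
    print(order_a, order_b)  # 0.0 1.0
    ```

    A real scheduler just picks one of these orders nondeterministically on each run.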

    --
    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 1, Insightful) by Anonymous Coward on Thursday June 21 2018, @08:12AM (1 child)

      by Anonymous Coward on Thursday June 21 2018, @08:12AM (#696098)

      Strictly speaking, IEEE floating point operations are not associative: addition and multiplication are commutative, but (a+b)+c need not equal a+(b+c), which is what makes the result depend on evaluation order.
      I guess what you want to say is that non-specialists confuse IEEE floating point numbers with real numbers (for which addition and multiplication ARE associative), and are then surprised at the difference in the results.
      Sorry if this seems nitpicky, but we should be pedantic when explaining confusing things (otherwise the confusion remains).

      • (Score: 2) by c0lo on Thursday June 21 2018, @08:27AM

        by c0lo (156) Subscriber Badge on Thursday June 21 2018, @08:27AM (#696103) Journal

        computations that would theoretically be commutative

        I should have said:

        write a reasonably highly multithreaded app doing different operations, theoretically commutative, on a shared set of IEEE float data

        ---

        IEEE floating point operations are not associative

        Exactly.

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 1) by shrewdsheep on Thursday June 21 2018, @07:52AM

    by shrewdsheep (5215) on Thursday June 21 2018, @07:52AM (#696091)

    The deep learning applications we are talking about are actually randomized algorithms. They *depend* on randomness, for example to create random starting positions that break symmetries in the network architecture. Also, the optimization process itself profits from randomization. Certainly, the whole process can be made reproducible by using pseudo-randomness with fixed seeds; however, this seems to miss the point. Deep networks will differ when trained from different starting seeds but will be mostly identical when it comes to performance, e.g. classification accuracy. Also, they usually run in single precision. This does not matter, as the higher layers of the network can compensate for errors made below, among them rounding errors.
    If we talk about the network weights (or parameters), analog computation could actually make the learning process *more* reproducible, as the rounding errors are taken out of the equation and there could be fewer points of convergence. This would of course only make sense if the parameters could be extracted in digital form from the analog network, which seems challenging but not impossible.
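    The seed-controlled reproducibility described here can be sketched with a toy network (a hypothetical pure-Python 2-2-1 net trained on XOR; the sizes, learning rate, and epoch count are arbitrary choices):

    ```python
    # Random initialization breaks the hidden-unit symmetry; with a fixed
    # seed the whole training run is bit-for-bit reproducible, while a
    # different seed yields different final weights.
    import math
    import random

    DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    def train(seed, epochs=2000, lr=0.5):
        rng = random.Random(seed)
        # 9 weights: two hidden units (2 inputs + bias each), one output (2 inputs + bias)
        w = [rng.uniform(-1.0, 1.0) for _ in range(9)]
        for _ in range(epochs):
            for (x1, x2), t in DATA:
                h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
                h2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
                o = sigmoid(w[6] * h1 + w[7] * h2 + w[8])
                # backpropagation for squared error, written out by hand
                do = (o - t) * o * (1.0 - o)
                dh1 = do * w[6] * h1 * (1.0 - h1)
                dh2 = do * w[7] * h2 * (1.0 - h2)
                grads = [dh1 * x1, dh1 * x2, dh1,
                         dh2 * x1, dh2 * x2, dh2,
                         do * h1, do * h2, do]
                w = [wi - lr * g for wi, g in zip(w, grads)]
        return w

    w_a = train(seed=7)
    w_b = train(seed=7)   # identical: the run is fully determined by the seed
    w_c = train(seed=8)   # different weights, usually similar behaviour
    print(w_a == w_b, w_a == w_c)
    ```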