
posted by Fnord666 on Thursday June 21 2018, @12:46AM   Printer-friendly
from the approaching-the-singularity dept.

IBM researchers use analog memory to train deep neural networks faster and more efficiently

Deep neural networks normally require fast, powerful graphics processing unit (GPU) hardware accelerators to support the needed high speed and computational accuracy — such as the GPU devices used in the just-announced Summit supercomputer. But GPUs are highly energy-intensive, making their use expensive and limiting their future growth, the researchers explain in a recent paper published in Nature.

Instead, the IBM researchers used large arrays of non-volatile analog memory devices (which use continuously variable signals rather than binary 0s and 1s) to perform computations. Those arrays allowed the researchers to create, in hardware, the same scale and precision of AI calculations that are achieved by more energy-intensive systems in software, but running hundreds of times faster and at hundreds of times lower power — without sacrificing the ability to create deep learning systems.

The trick was to replace conventional von Neumann architecture, which is "constrained by the time and energy spent moving data back and forth between the memory and the processor (the 'von Neumann bottleneck')," the researchers explain in the paper. "By contrast, in a non-von Neumann scheme, computing is done at the location of the data [in memory], with the strengths of the synaptic connections (the 'weights') stored and adjusted directly in memory."
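The in-memory idea can be sketched numerically. The following is a simplified illustration, not IBM's actual device: weights stored as conductances in a crossbar, inputs applied as voltages, and the matrix-vector product read off the row currents.

```python
# Simplified numerical model of an analog crossbar array (an illustration
# of the in-memory computing idea, not IBM's actual device): weights are
# stored as conductances G[i][j]; driving the columns with voltages V[j]
# makes each row wire collect a current I[i] = sum_j G[i][j] * V[j]
# (Ohm's law plus Kirchhoff's current law), i.e. the matrix-vector
# multiply happens where the data lives, with no memory/processor shuttling.

def crossbar_mvm(conductances, voltages):
    """Current collected on each row wire of the crossbar."""
    return [sum(g * v for g, v in zip(row, voltages))
            for row in conductances]

G = [[0.5, 1.0, 0.25],   # 2x3 weight matrix as conductances (arbitrary units)
     [2.0, 0.0, 1.5]]
V = [1.0, -1.0, 2.0]     # input activations encoded as voltages

print(crossbar_mvm(G, V))  # [0.0, 5.0]
```

The whole multiply costs one "read" of the array instead of fetching every weight across a bus, which is the energy win the paper is after.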

Equivalent-accuracy accelerated neural-network training using analogue memory (DOI: 10.1038/s41586-018-0180-5) (DX)


Original Submission

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Thursday June 21 2018, @01:08AM (1 child)

    by Anonymous Coward on Thursday June 21 2018, @01:08AM (#695906)

    FTFY

  • (Score: 3, Informative) by Uncle_Al on Thursday June 21 2018, @01:23AM (14 children)

    by Uncle_Al (1108) on Thursday June 21 2018, @01:23AM (#695911)

    woot!

    • (Score: 2) by takyon on Thursday June 21 2018, @01:24AM (11 children)

      by takyon (881) on Thursday June 21 2018, @01:24AM (#695913) Journal

      The Bot uprising is going to be so steampunk.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by RS3 on Thursday June 21 2018, @01:56AM (10 children)

        by RS3 (6367) on Thursday June 21 2018, @01:56AM (#695931)

        Everyone has been led to believe that digital is the end-all. We will be owned by analog bots.

        • (Score: 2) by bob_super on Thursday June 21 2018, @02:12AM

          by bob_super (1357) on Thursday June 21 2018, @02:12AM (#695944)

          We're currently owned by organic bots, after all.

        • (Score: 0) by Anonymous Coward on Thursday June 21 2018, @02:53AM (8 children)

          by Anonymous Coward on Thursday June 21 2018, @02:53AM (#695967)

          Unless you have a way of digitizing molecular structures then duh.

          • (Score: 2) by c0lo on Thursday June 21 2018, @04:03AM (6 children)

            by c0lo (156) Subscriber Badge on Thursday June 21 2018, @04:03AM (#696000) Journal

            Unless you have a way of digitizing molecular structures then duh.

            Given the discrete QM nature of the molecular structures, we are already there.
            A pity the same QM nature doesn't (yet) allow us to control the so many discrete states those structures can exhibit.

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
            • (Score: 2) by RS3 on Thursday June 21 2018, @04:07AM (1 child)

              by RS3 (6367) on Thursday June 21 2018, @04:07AM (#696002)

              D'oh! You beat me by 2 minutes.

              • (Score: 2) by c0lo on Thursday June 21 2018, @04:30AM

                by c0lo (156) Subscriber Badge on Thursday June 21 2018, @04:30AM (#696020) Journal

                Sorry, I just didn't realize there was a competition going.

                --
                https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
            • (Score: 2) by RS3 on Thursday June 21 2018, @04:22AM (3 children)

              by RS3 (6367) on Thursday June 21 2018, @04:22AM (#696014)

              A pity the same QM nature doesn't (yet) allow us to control the so many discrete states those structures can exhibit.

              I'm curious what you mean. I believe many things function by stimulating energy states, such as pretty much anything that gives off photons / light, including lasers, magnetrons, klystrons, TWTs (traveling-wave tubes), vacuum tubes (electronic "valves"), X-ray tubes, etc. So I'm guessing you mean something more advanced?

              • (Score: 2) by c0lo on Thursday June 21 2018, @04:36AM (2 children)

                by c0lo (156) Subscriber Badge on Thursday June 21 2018, @04:36AM (#696025) Journal

                In all those examples, you take advantage of a large ensemble of molecular structures, apply a sort of "macroscopic" filter over the output, and discard the rest. That isn't control, it's just selection.

                We aren't in the technological position to control the states of molecular structures at the individual level - and Heisenberg warned us against thinking that we'll ever be able to.

                --
                https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
                • (Score: 2) by RS3 on Thursday June 21 2018, @06:14AM (1 child)

                  by RS3 (6367) on Thursday June 21 2018, @06:14AM (#696056)

                  Okay, thanks. But aren't narrow spectrum light sources doing this?

                  But generally your answer is what I envisioned you were alluding to, at the most advanced level. So, one electric-field exciting wire per atom? Or maybe a very focused electron beam. Hey, don't scanning tunneling electron microscopes approach that?

                  So what would be the purpose, output, whatever, of doing it? New molecules and compounds? Superconductivity? Something relating to nuclear fusion? Gene editing? Medicines? Tricorders? All of the above and more?

                  • (Score: 2) by c0lo on Thursday June 21 2018, @06:45AM

                    by c0lo (156) Subscriber Badge on Thursday June 21 2018, @06:45AM (#696066) Journal

                    Hey, don't scanning tunneling electron microscopes approach that?

                    Measuring != controlling

                    So what would be the purpose, output, whatever, of doing it?

                    CPU at Ångström-unit scale? :)

                    --
                    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 2) by RS3 on Thursday June 21 2018, @04:05AM

            by RS3 (6367) on Thursday June 21 2018, @04:05AM (#696001)

            Pretty sure molecular structures are held together by bonds that obey laws described by quantum mechanics. But I'm not a chemist nor physicist. Well, a little.

    • (Score: 1, Informative) by Anonymous Coward on Thursday June 21 2018, @02:37AM (1 child)

      by Anonymous Coward on Thursday June 21 2018, @02:37AM (#695961)

      Weren't Perceptrons analog?

      [googles]

      Here's one paper from 2013 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574673/ [nih.gov]
      > This study examines an analog circuit comprising a multilayer perceptron neural network (MLPNN). This study proposes a low-power and small-area analog MLP circuit to implement in an E-nose as a classifier, such that the E-nose would be relatively small, power-efficient, and portable. The analog MLP circuit had only four input neurons, four hidden neurons, and one output neuron. The circuit was designed and fabricated using a 0.18 μm standard CMOS process with a 1.8 V supply. The power consumption was 0.553 mW, and the area was approximately 1.36 × 1.36 mm2. The chip measurements showed that this MLPNN successfully identified the fruit odors of bananas, lemons, and lychees with 91.7% accuracy.

      Marvin Minsky debunked Perceptrons with his famous book of the same name, but now it's looking more like the only problem with the early attempts was that they weren't big enough and/or deep enough.

      • (Score: 2) by fritsd on Thursday June 21 2018, @04:56PM

        by fritsd (4586) on Thursday June 21 2018, @04:56PM (#696286) Journal

        Minsky and Papert spotted that, if you use a linear activation function, then a Perceptron of whatever depth can be re-written as a simple multivariate linear equation.

        It's a pity that many people then gave up on Perceptrons until Rumelhart and McClelland (IIRC) rekindled the interest a long time after.
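        Their observation is easy to check numerically: with a linear activation, two stacked weight matrices collapse into one, so depth adds nothing. A minimal sketch (matrices made up for illustration):

```python
# With a linear activation, a two-layer perceptron W2(W1 x) collapses to a
# single linear map (W2 W1) x -- Minsky and Papert's point: depth without
# nonlinearity buys nothing. Matrices below are made up for illustration.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[1.0, 2.0], [0.0, -1.0]]   # first layer
W2 = [[0.5, 1.0], [2.0, 0.0]]    # second layer
x = [3.0, 4.0]

deep = matvec(W2, matvec(W1, x))      # two linear "layers"
shallow = matvec(matmul(W2, W1), x)   # one collapsed layer
print(deep == shallow)  # True
```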

  • (Score: 4, Disagree) by jmorris on Thursday June 21 2018, @03:04AM (16 children)

    by jmorris (4844) on Thursday June 21 2018, @03:04AM (#695971)

    Digital is reproducible. Build one system, calculate a result and if another installation of similar hardware runs the same data through the same software it should obtain the same result. Analog upsets that. Each machine becomes unique. Different runs on the same machine won't even produce the same result. The machines will be more like biological systems, each with unique tendencies, traits and behaviors.

    • (Score: 2) by crafoo on Thursday June 21 2018, @03:35AM (11 children)

      by crafoo (6639) on Thursday June 21 2018, @03:35AM (#695989)

      You might find this paper interesting:
      http://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf [www.itu.dk]

      Rounding errors are random numbers injected into your digital calculations, and they are not identical between machines, or even between different runs on the same machine.
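      A quick illustration of how identical inputs can produce different results depending only on evaluation order:

```python
# IEEE-754 rounding is deterministic, but addition is not associative:
# grouping the same additions differently can round to different results.
# This is where run-to-run variation usually comes from: the values
# don't change, the evaluation order does.

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c     # 0.6000000000000001
right = a + (b + c)    # 0.6
print(left == right)   # False
```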

      • (Score: 2) by frojack on Thursday June 21 2018, @03:54AM (3 children)

        by frojack (1554) on Thursday June 21 2018, @03:54AM (#695997) Journal

        Yet, unless you are dealing in the noise, and doing computations N+6 places beyond the decimal point when your data is only accurate to N places, you just never find this stuff in actual use cases.

        --
        No, you are mistaken. I've always had this sig.
        • (Score: 4, Insightful) by c0lo on Thursday June 21 2018, @04:14AM (2 children)

          by c0lo (156) Subscriber Badge on Thursday June 21 2018, @04:14AM (#696007) Journal

          you just never find this stuff in actual use cases

          Never? Mate, even a simple use case like an N-body problem with small N - say, launching a satellite towards an asteroid - takes those into account and provides for a way of correcting the trajectory of that satellite.
          Weather simulations? Rife with accumulating rounding errors; you need to take them into consideration (e.g. by running those simulations a good number of times and treating the obtained results statistically).

          (What is it with you conservative people, that you are so in love with absolutes? What's wrong with you - is it so hard to accept your human fallibility and the fact that you can be wrong?)

          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 1) by ChrisMaple on Thursday June 21 2018, @07:39PM (1 child)

            by ChrisMaple (6964) on Thursday June 21 2018, @07:39PM (#696372)

            On a digital computer, with the same inputs, using the same program, with care taken to assure the same initial state, if the results are not always the same then the computer is defective. Rounding errors are always the same: they are deterministic.

            • (Score: 2) by c0lo on Thursday June 21 2018, @10:54PM

              by c0lo (156) Subscriber Badge on Thursday June 21 2018, @10:54PM (#696445) Journal

              On a digital computer, with the same inputs, using the same program, with care taken to assure the same initial state, if the results are not always the same then the computer is defective.

              False. Multithreading will easily break the assertion above without the computer being defective.

              --
              https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 3, Touché) by jmorris on Thursday June 21 2018, @04:33AM (6 children)

        by jmorris (4844) on Thursday June 21 2018, @04:33AM (#696023)

        Now try reading it.

        Once an algorithm is proven to be correct for IEEE arithmetic, it will work correctly on any machine supporting the IEEE standard.

        There is rounding error because floats have a fixed precision but it is known and repeatable. How many implementations have you actually written or dug around in the guts of and debugged? MC6809 and AVR for me. The oddball Microsoft Color BASIC format (8+32) for the MC6809 and single precision IEEE on AVR. If it isn't deterministic it is broken. Guess whose implementation had a rounding error in the first versions. Go ahead, guess.


        Yeah, it was Microsoft. It was improperly retaining junk in the least-significant bits when it put results back into variable storage, so numbers that would print the same, or compare as equal when converted to strings, wouldn't compare as equal as numbers. Yeah, all floats can suffer that in extreme corner cases, but this wasn't one of those; it was an implementation bug. Think it was Color BASIC 1.2 that finally squashed it.
        • (Score: 1, Insightful) by Anonymous Coward on Thursday June 21 2018, @05:22AM (5 children)

          by Anonymous Coward on Thursday June 21 2018, @05:22AM (#696038)

          There are many dynamical systems which are chaotic, and long enough DNS (direct numerical simulations) are not reproducible for them.
          This doesn't mean that the DNS are wrong; it just means that individual trajectories of the dynamical system cannot be reproduced exactly.
          Other properties (statistical properties, stationary points, etc.) can be computed within machine precision.
          In fact, the runs themselves are not reproducible, because we use threads and/or MPI, which do not guarantee reproducibility.

          • (Score: 2) by jmorris on Thursday June 21 2018, @07:04AM (4 children)

            by jmorris (4844) on Thursday June 21 2018, @07:04AM (#696073)

            I can't help it if you are having problems with multi-threaded and multi-processor code that doesn't synchronize correctly. Unless you are intentionally introducing real random numbers (and while some simulations indeed must, most prefer pseudo-random numbers for repeatability across runs), you should be getting 100% repeatable results. If you aren't, you need to debug some more. Math isn't generally supposed to be random. Digital computers aren't supposed to be random. The bits that aren't deterministic (cache effects, network, other I/O) are supposed to be abstracted away by the software to help you produce stable results. Unexplained little "random" variations will eventually bite yer ass at the wrong time.

            Of course, you might also be one of those rare cases that had to make a deliberate decision to sacrifice correctness on the altar of performance. Sometimes that is the right solution; it doesn't really matter how correct a weather forecast is if it takes two days to tell you tomorrow's weather.

            • (Score: 2) by c0lo on Thursday June 21 2018, @07:48AM

              by c0lo (156) Subscriber Badge on Thursday June 21 2018, @07:48AM (#696089) Journal

              I can't help it if you are having problems with multi-threaded and multi-processor code that doesn't synchronize correctly.

              Eh, what?
              Not all multithreading apps need to be synchronized; many still produce valid results while accepting race conditions.

              --
              https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
            • (Score: 1) by anubi on Thursday June 21 2018, @07:53AM

              by anubi (2828) on Thursday June 21 2018, @07:53AM (#696092) Journal

              I get the idea they are only re-inventing the hybrid analog computer. Having the analog part do what analog is really good at, and having the digital do what the digital is really good at. The old-school type ( Like the EAI 580 I used at university ) was optimized for solving real-time input simultaneous differential equations. The continuous analog functions were done in analog, whereas the digital part controlled and monitored the whole shebang.

              Except, like everything else, things have improved by many orders of magnitude. Comparing what they have to my EAI-580 would probably be akin to comparing an 8080 to a modern hex-core CPU.

              --
              "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
            • (Score: 1, Informative) by Anonymous Coward on Thursday June 21 2018, @08:08AM (1 child)

              by Anonymous Coward on Thursday June 21 2018, @08:08AM (#696096)

              You're trolling, right? Did you ever hear of the Lorenz system? Here you go: https://en.wikipedia.org/wiki/Lorenz_system [wikipedia.org]

              Numerical analysis says that it's best to use random round-off errors when integrating a generic dynamical system (otherwise you get systematic differences from the original dynamical system).
              That means that every time you integrate N steps of the Lorenz system, you get a random sequence of N round-off errors applied to the calculations.
              When N becomes large enough, those errors build up into something big enough to completely change the trajectory, because the Lorenz system is chaotic.

              For weather and similar fluid dynamics problems, this only gets worse.
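              The build-up described here is easy to reproduce: integrate the Lorenz system twice with initial conditions differing at round-off scale and watch the trajectories separate. A rough sketch (plain Euler stepping; step size and step count are illustrative choices):

```python
# Lorenz system, classic parameters: two trajectories whose initial
# conditions differ by ~1e-12 (round-off scale) end up completely
# different after enough steps, because the system is chaotic.
# Plain Euler integration; step size chosen for illustration.

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-12, 1.0, 1.0)   # perturbation at round-off scale
for _ in range(10_000):       # integrate to t = 50
    a, b = lorenz_step(a), lorenz_step(b)

# By now the two trajectories have separated by many orders of magnitude
# more than the initial 1e-12 perturbation, while both stay on the attractor.
print(abs(a[0] - b[0]))
```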

              • (Score: 2) by jmorris on Thursday June 21 2018, @08:15AM

                by jmorris (4844) on Thursday June 21 2018, @08:15AM (#696100)

                As I said, in most such cases you want pseudo-random numbers instead of real crypto quality random for those purposes so you still have repeatability. Then you can run the same program again with the same inputs and get consistent outputs, which makes testing a lot easier.

    • (Score: 2) by c0lo on Thursday June 21 2018, @07:45AM (2 children)

      by c0lo (156) Subscriber Badge on Thursday June 21 2018, @07:45AM (#696088) Journal

      Digital is reproducible. Build one system, calculate a result and if another installation of similar hardware runs the same data through the same software it should obtain the same result.

      Yes, it's deterministic. No, it's not necessarily exactly reproducible.

      An example in computing: write a reasonably highly multithreaded app doing different operations on a shared set of IEEE float data - computations that would theoretically be commutative. Run it twice and I guarantee you will find differences, due to the rounding errors accumulating differently depending on the order in which the scheduler activates the threads.
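      The effect can be sketched without actual threads by imitating what a scheduler does to the combine order of partial sums (a toy illustration, not a real threaded app):

```python
# Toy model of what thread scheduling does to a floating-point reduction:
# the same values, combined in different groupings, can round to different
# totals. A real multithreaded sum hits this whenever the combine order
# depends on which thread finishes first.

data = [1.0, 1e16, -1e16]

# Single-threaded left-to-right sum: the 1.0 is absorbed into 1e16 and lost.
sequential = sum(data)                       # (1.0 + 1e16) - 1e16 == 0.0

# Two hypothetical workers computing partial sums, combined at the end:
worker_a = sum(data[:1])                     # 1.0
worker_b = sum(data[1:])                     # 1e16 + (-1e16) == 0.0
parallel = worker_a + worker_b               # 1.0

print(sequential, parallel)  # 0.0 1.0
```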

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 1, Insightful) by Anonymous Coward on Thursday June 21 2018, @08:12AM (1 child)

        by Anonymous Coward on Thursday June 21 2018, @08:12AM (#696098)

        IEEE floating point operations are not commutative (not even theoretically).
        I guess what you want to say is that non-specialists would confuse IEEE floating point numbers with real numbers (for which addition and multiplication ARE commutative), and then be surprised at the difference in the results.
        sorry if this seems nitpicky, but we should be pedantic when explaining confusing things (otherwise the confusion remains).

        • (Score: 2) by c0lo on Thursday June 21 2018, @08:27AM

          by c0lo (156) Subscriber Badge on Thursday June 21 2018, @08:27AM (#696103) Journal

          computations that would theoretically be commutative

          I should have said:

          write a reasonably highly multithreaded app doing different operations, theoretically commutative, on a shared set of IEEE float data

          ---

          IEEE floating point operations are not commutative

          Exactly.

          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 1) by shrewdsheep on Thursday June 21 2018, @07:52AM

      by shrewdsheep (5215) on Thursday June 21 2018, @07:52AM (#696091)

      The deep learning applications we are talking about are actually randomized algorithms. They *depend* on randomness, for example to create random starting positions that break symmetries in the network architecture. The optimization process itself also profits from randomization. Certainly, the whole process can be made reproducible by using pseudo-randomness with fixed seeds; however, this seems to miss the point. Deep networks will differ when run with different starting seeds but will be mostly identical when it comes to performance, e.g. classification accuracy. They also usually run at single precision. This does not matter, as the higher layers of the network can compensate for errors made below, among them rounding errors.
      If we talk about the network weights (or parameters), analog computation could actually make the learning process *more* reproducible, as the rounding errors are taken out of the equation and there could be fewer points of convergence. This would of course only make sense if the parameters could be extracted in digital form from the analog network, which seems challenging but not impossible.
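      The fixed-seed point can be sketched as follows (a toy illustration; init_weights is a made-up helper):

```python
# Fixing the PRNG seed makes "random" weight initialization exactly
# reproducible across runs; a different seed gives a different (but
# statistically equivalent) starting point. init_weights is a made-up
# helper for illustration.
import random

def init_weights(seed, n=8):
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

run1 = init_weights(seed=1234)
run2 = init_weights(seed=1234)   # same seed: bit-identical weights
run3 = init_weights(seed=5678)   # different seed: different starting point

print(run1 == run2, run1 == run3)  # True False
```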

  • (Score: 2) by requerdanos on Thursday June 21 2018, @03:09AM (2 children)

    by requerdanos (5997) Subscriber Badge on Thursday June 21 2018, @03:09AM (#695973) Journal

    non-volatile analog memory devices (which use continuously variable signals rather than binary 0s and 1s)

    Surely that should say that they use signals that vary by integer multiples of Planck's constant?

    • (Score: 0) by Anonymous Coward on Thursday June 21 2018, @04:19AM

      by Anonymous Coward on Thursday June 21 2018, @04:19AM (#696008)

      > ...they use signals that vary by integer multiples of Planck's constant

      Sure, have it your way.

      If you look, I think you will see that they use much larger step sizes than this. TFA isn't talking about "continuous" or "linear" analog computation. It's more likely 8 (or more?) voltage levels in each "memory/computation cell", where a traditional digital circuit has just two, for "on" and "off".
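      That multi-level picture can be sketched as a simple quantizer (all numbers illustrative - the 8 levels and the 1.8 V full scale are assumptions, not from TFA):

```python
# Toy quantizer for a "multi-level" analog memory cell: the cell stores
# one of a small number of discrete voltage levels (8 here) rather than a
# truly continuous value; a binary cell is just the 2-level special case.
# The 8 levels and the 1.8 V full scale are illustrative assumptions.

N_LEVELS = 8
V_MAX = 1.8

def write_cell(target_voltage, n_levels=N_LEVELS, v_max=V_MAX):
    """Store the discrete level index closest to the requested voltage."""
    step = v_max / (n_levels - 1)
    level = round(target_voltage / step)
    return max(0, min(n_levels - 1, level))

def read_cell(level, n_levels=N_LEVELS, v_max=V_MAX):
    """Recover the quantized voltage from the stored level index."""
    return level * v_max / (n_levels - 1)

stored = write_cell(0.7)        # nearest of the 8 levels to 0.7 V
print(stored, read_cell(stored))
```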

    • (Score: 2) by RS3 on Thursday June 21 2018, @06:17AM

      by RS3 (6367) on Thursday June 21 2018, @06:17AM (#696058)

      Surely that should say that they use signals that vary by integer multiples of Planck's constant?

      How will you ever know?

  • (Score: 1) by ChrisMaple on Thursday June 21 2018, @07:47PM

    by ChrisMaple (6964) on Thursday June 21 2018, @07:47PM (#696377)

    Using non-von Neumann as the headline description of IBM's device is almost irrelevant. A Harvard architecture is non-von Neumann, so what?
