
posted by martyb on Thursday October 26 2017, @06:01AM   Printer-friendly
from the quite-a-bit-better dept.

Researchers at MIT, Intel, and ETH Zurich have improved on-package DRAM performance by 33-50% using a new cache management scheme that they call Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation (Banshee):

The researchers developed a new data management scheme relying on a hash function they developed to reduce the metadata burden. Yu and his colleagues' new system, dubbed Banshee, adds three bits of data to each entry in the table. One bit indicates whether the data at that virtual address can be found in the DRAM cache, and the other two indicate its location relative to any other data items with the same hash index.

"In the entry, you need to have the physical address, you need to have the virtual address, and you have some other data," Yu says. "That's already almost 100 bits. So three extra bits is a pretty small overhead."

There's one problem with this approach that Banshee also has to address. If one of a chip's cores pulls a data item into the DRAM cache, the other cores won't know about it. Sending messages to all of a chip's cores every time any one of them updates the cache consumes a good deal of time and bandwidth. So Banshee introduces another small circuit, called a tag buffer, where any given core can record the new location of a data item it caches.
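
For concreteness, here is a minimal C sketch of the metadata being described. Only the three bits per table entry and the existence of a tag buffer come from the article; the field names, widths, and struct layout are assumptions for illustration, since the real mechanism lives in hardware and in the page tables rather than in a C struct.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the three extra bits Banshee adds to each
 * page-table/TLB entry (names and bit positions are assumptions). */
#define BANSHEE_CACHED_BIT  (1ULL << 0)                 /* data is resident in the DRAM cache */
#define BANSHEE_WAY_SHIFT   1
#define BANSHEE_WAY_MASK    (3ULL << BANSHEE_WAY_SHIFT) /* 2 bits: position among entries
                                                           sharing the same hash index */

static inline bool banshee_is_cached(uint64_t meta)
{
    return (meta & BANSHEE_CACHED_BIT) != 0;
}

static inline unsigned banshee_way(uint64_t meta)
{
    return (unsigned)((meta & BANSHEE_WAY_MASK) >> BANSHEE_WAY_SHIFT);
}

/* One tag-buffer record: a core notes the new location of a page it
 * just pulled into the DRAM cache, so the mapping change can be
 * propagated lazily instead of broadcast to every core immediately. */
struct tag_buffer_entry {
    uint64_t vpn;   /* virtual page number                */
    unsigned way;   /* new slot within the hashed set     */
    bool     valid;
};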

Also at MIT.

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation (arXiv)


Original Submission

  • (Score: 2) by MichaelDavidCrawford on Thursday October 26 2017, @06:20AM (4 children)

    by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Thursday October 26 2017, @06:20AM (#587724) Homepage Journal

    For a while I was puzzling over ways to reduce energy consumption through refactoring while calculating the same result.

    My colleagues over at Kuro5hin argued this is impossible - but I have a Physics degree and they don't. "Can you tell me why your box has to be plugged into the wall? It's not just to heat your home."

    Some of those refactorings could be very small and localized while achieving widespread benefit. For example, there are many ways to improve on Quicksort. The glibc qsort() was written on a SPARC workstation back around 1990. It does not use parallelism in any way.

    What that means is that if you invest in a lot of cores for your personal box, most of those cores sit idle because they're running thirty-year-old code.
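
    As a toy illustration of that kind of refactoring, here is a task-parallel quicksort sketch using OpenMP tasks. The cutoff value and the Lomuto partition are arbitrary choices for the example, not anything taken from glibc.

    #include <stddef.h>

    /* Toy task-parallel quicksort on ints; compile with gcc -fopenmp. */
    static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    static void qsort_tasks(int *a, long lo, long hi)
    {
        if (lo >= hi)
            return;

        /* Lomuto partition around the last element. */
        int pivot = a[hi];
        long i = lo;
        for (long j = lo; j < hi; j++)
            if (a[j] < pivot)
                swap_int(&a[i++], &a[j]);
        swap_int(&a[i], &a[hi]);

        /* Only spawn tasks for subranges big enough to be worth the overhead. */
        #pragma omp task if (i - lo > 10000)
        qsort_tasks(a, lo, i - 1);
        #pragma omp task if (hi - i > 10000)
        qsort_tasks(a, i + 1, hi);
        #pragma omp taskwait
    }

    void parallel_sort(int *a, size_t n)
    {
        #pragma omp parallel
        #pragma omp single nowait
        qsort_tasks(a, 0, (long)n - 1);
    }

    Whether this actually beats the single-threaded library sort depends on the data size and the machine, so treat it as a sketch rather than a drop-in replacement.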

    --
    Yes I Have No Bananas. [gofundme.com]
    • (Score: 1, Interesting) by Anonymous Coward on Thursday October 26 2017, @07:39AM (2 children)

      by Anonymous Coward on Thursday October 26 2017, @07:39AM (#587737)

      My "box" isn't box-shaped and it doesn't need to be plugged into the wall. We don't use 15-year-old NetBurst Pentium 4 spaceheater technology anymore. NetBurst is a great example of dead-end abandoned hardware design since Intel subsequently "refactored" the good old P6 into the modern chips we use today.

      Simple refactoring of high-level code to calculate the same result is a waste of effort. Instead, you should improve the underlying algorithms to reduce computational complexity. Express algorithms in high-level code and let the compiler generate low-level code that allocates hardware resources appropriately.

      Get a Cannon Lake [soylentnews.org] processor for your "rig" and watch GCC optimize 30-year-old code to use 512-bit vector registers.
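
      As a rough sketch of what that looks like (the flags and the vector width GCC actually picks depend on the target and its cost model, so this is illustrative, not a guarantee):

      #include <stddef.h>

      /* A plain scalar loop with no intrinsics and no SIMD in the source.
       * Built with something like
       *   gcc -O3 -march=cannonlake -mprefer-vector-width=512 -fopt-info-vec
       * GCC will typically auto-vectorize it, using 512-bit zmm registers
       * when it judges that profitable. */
      void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
      {
          for (size_t i = 0; i < n; i++)
              y[i] = a * x[i] + y[i];
      }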

      • (Score: 0) by Anonymous Coward on Thursday October 26 2017, @08:11AM (1 child)

        by Anonymous Coward on Thursday October 26 2017, @08:11AM (#587740)

        They have some extra bits to make memory lookups faster.

        I hope nobody gets 'special' patents on this stuff, since it appears to be basic engineering rather than a novel new device, concept, design, or feature.

        • (Score: 2) by c0lo on Thursday October 26 2017, @10:25AM

          by c0lo (156) Subscriber Badge on Thursday October 26 2017, @10:25AM (#587748) Journal

          I hope nobody gets 'special' patents on this stuff, since it appears to be basic engineering rather than a novel new device, concept, design, or feature.

          Guess where the control for that "tag buffer" will be implemented? ...No? ... Really?...
          Here's the hint: it's that place between all those cores on the chip, the one which, if you disable it [soylentnews.org], makes your CPU cache lose all its "memory of what is stored where".

          So... are you still worried about patents now?

          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 2) by DannyB on Thursday October 26 2017, @01:32PM

      by DannyB (5839) Subscriber Badge on Thursday October 26 2017, @01:32PM (#587791) Journal

      I suspect you are thinking too close to the bits and bytes and clock cycles. The thinking of C and C++ programmers. Don't get me wrong, this definitely has a place. Especially in microcontrollers and OSes. Even device drivers.

      Most programming in the world has to do with solving human domain problems. Boring things like accounting. Business applications specialized for every conceivable use: software for cabinet makers, lawyers' offices, doctors' offices, quick oil change shops, libraries, and endless other custom applications. Or exotic applications like weather forecasting and satellite tracking.

      In about 1990 I believed that in the future most languages would have garbage collection. And now here we are today.

      At present, the JVM (Java Virtual Machine) stands as an example of an industrial-strength runtime platform for multiple compiled languages. GC baked in. Portable, platform-neutral intermediate compiled code. A JIT with both a C1 and a C2 compiler (fast and optimizing, respectively). Continuous dynamic profiling, even in compiled code. Automatic detection of when a routine becomes "hot" enough to compile with C1, and later with C2, into direct machine code. A choice of GC algorithms, and plenty of GC tuning knobs and dials. Graphical monitoring tools that let you look into the runtime system and measure things. By "industrial strength" I mean you can have heaps of dozens or in some cases hundreds of gigabytes (not megabytes) and still have GC pauses of 10 ms. Most other dynamic languages (Python, Node, etc.) can't do that yet.

      What we need is a platform like LLVM that is extended with more of the capabilities of the JVM, and then hardware optimized for that. Since GC languages are pretty much the norm for the software that makes the world go round, how about we start optimizing processor and memory architectures around GC? What about some of the ideas of times past (a.k.a. the 1980s), like tagged memory? If you optimize for GC and have a standard runtime platform that provides this, maybe along with all the other features of the JVM runtime, we might have a standard, portable runtime platform for all modern languages.

      There are two invisible benefits of this. Two things the JVM and Microsoft's CLR have that most people, even those who use them, don't see.
      1. GC greases the wheels of library compatibility. While C++ may have lots of libraries, they are not all mutually compatible. One of the biggest impedance mismatches is differing memory management disciplines. Who owns what? Who is responsible for disposing of what? With GC, not only is the programmer freed from the burden of memory management (like an automatic transmission), but nobody has to care about who is responsible for disposing of something. If Joe's library returns a data structure, it will get disposed of when nothing references it any longer.
      2. By having a standard type system, including primitives, objects, and arrays, the barriers to using multiple programming languages, and libraries written in multiple languages, are torn down. Within a single running system, you might use libraries written in different programming languages, and everything is safely passed back and forth with standard types -- and nobody cares about memory management disciplines.

      While I believe this is the way of the future, something like the JVM and probably LLVM-based, I don't see much thought being given at the low level, or at the hardware level, to how to optimize for this.

      --
      The lower I set my standards the more accomplishments I have.
  • (Score: 2) by DannyB on Thursday October 26 2017, @01:07PM

    by DannyB (5839) Subscriber Badge on Thursday October 26 2017, @01:07PM (#587783) Journal

    BYTE 1978-July, pg 42
    Conversation overheard in local computer store:
    Customer: What's the difference between static and dynamic memory?
    Salesman: Static memory works, and dynamic memory doesn't.

    This was in a time when the biggest problem computers had was malcontent chips wiggling their way out of their sockets, and you had to push the miscreant chips back down into their sockets.

    --
    The lower I set my standards the more accomplishments I have.