date 2017-10-24
from the quite-a-bit-better dept.
Researchers at MIT, Intel, and ETH Zurich have improved on-package DRAM performance by 33-50% using a new cache management scheme that they call Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation (Banshee):
The researchers developed a new data management scheme that relies on a hash function of their own design to reduce the metadata burden. Yu and his colleagues' new system, dubbed Banshee, adds three bits of data to each entry in the address-mapping table. One bit indicates whether the data at that virtual address can be found in the DRAM cache, and the other two indicate its location relative to any other data items with the same hash index.
"In the entry, you need to have the physical address, you need to have the virtual address, and you have some other data," Yu says. "That's already almost 100 bits. So three extra bits is a pretty small overhead."
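To make the three extra bits concrete, here is a minimal sketch in Python. The bit layout (flag in bit 0, location in bits 1-2) and all names are assumptions made for illustration and are not taken from the Banshee design. Reading the two location bits as a slot number from 0 to 3 among items that share a hash index is likewise an interpretation of the description above.

```python
from typing import Tuple

# Hypothetical layout, for illustration only:
#   bit 0    -> 1 if the data is present in the DRAM cache
#   bits 1-2 -> slot (0-3) among items sharing the same hash index
CACHED_BIT = 0b001
WAY_SHIFT = 1
WAY_MASK = 0b11


def pack_extra_bits(cached: bool, way: int) -> int:
    """Pack the cached flag and the 2-bit slot number into a 3-bit value."""
    if not 0 <= way <= WAY_MASK:
        raise ValueError("way must fit in two bits (0-3)")
    return (way << WAY_SHIFT) | (CACHED_BIT if cached else 0)


def unpack_extra_bits(bits: int) -> Tuple[bool, int]:
    """Recover (cached, way) from the 3-bit value."""
    return bool(bits & CACHED_BIT), (bits >> WAY_SHIFT) & WAY_MASK


def overhead_fraction(base_entry_bits: int = 100, extra_bits: int = 3) -> float:
    """Relative size increase, using Yu's rough figure of ~100 bits per entry."""
    return extra_bits / base_entry_bits
```

With Yu's round number of about 100 bits per entry, overhead_fraction() comes out to 0.03, i.e. roughly a 3% larger entry.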
There's one problem with this approach that Banshee also has to address. If one of a chip's cores pulls a data item into the DRAM cache, the other cores won't know about it. Sending messages to all of a chip's cores every time any one of them updates the cache consumes a good deal of time and bandwidth. So Banshee introduces another small circuit, called a tag buffer, where any given core can record the new location of a data item it caches.
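The tag buffer idea can be pictured with a toy software model. This is only a sketch under stated assumptions: the class, its methods, the capacity, and the batch "drain" step are hypothetical names and behavior chosen for illustration, and the real tag buffer is a hardware circuit rather than a Python dictionary. It reuses the (cached, slot) pair from the three extra bits described in the summary.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[bool, int]  # (cached in DRAM cache?, slot number)


class TagBuffer:
    """Toy model: a small shared table of recent cache remappings."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity  # assumed size, illustrative only
        self.entries: Dict[int, Location] = {}

    def record(self, page: int, cached: bool, way: int) -> bool:
        """Record a page's new location; return True once the buffer is full."""
        self.entries[page] = (cached, way)
        return len(self.entries) >= self.capacity

    def lookup(self, page: int) -> Optional[Location]:
        """Return the buffered location for a page, or None if not buffered."""
        return self.entries.get(page)

    def drain(self, table: Dict[int, Location]) -> int:
        """Copy all buffered remappings into the table in one batch, then clear."""
        count = len(self.entries)
        table.update(self.entries)
        self.entries.clear()
        return count


def locate(page: int, table: Dict[int, Location], buf: TagBuffer) -> Location:
    """Prefer the buffer's fresher entry over a possibly stale table entry."""
    fresh = buf.lookup(page)
    if fresh is not None:
        return fresh
    return table.get(page, (False, 0))
```

The point of the sketch is that a core records one small entry instead of messaging every other core, and any core that consults the buffer first still sees the up-to-date location.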
Also at MIT.
Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation (arXiv)
(Score: 2) by MichaelDavidCrawford on Thursday October 26, @06:20AM
For a while I was puzzling over ways to reduce energy consumption through refactoring while calculating the same result.
My colleagues over at Kuro5hin argued this is impossible - but I have a Physics degree and they don't. "Can you tell me why your box has to be plugged into the wall? It's not just to heat your home."
Some of those refactorings could be very small and localized, while achieving widespread benefit. For example there are many ways to improve on QuickSort. The glibc qsort() was written on a SPARC workstation back around 1990. It does not use parallelism in any way.
What that means is that if you invest in a lot of cores for your personal box, most of those cores sit idle because the software you run is thirty-year-old code.
Donate To Soggy Jobs [soggy.jobs]