posted by martyb on Thursday April 24 2014, @06:33PM
from the page-processing-patch-provides-performance-plus-power-perks dept.

Mark D. Hill and his peers at the University of Wisconsin-Madison have been analyzing computing systems, looking for delays in their architecture and in the interfaces between their components.

Through careful analysis, Hill uncovers inefficiencies, sometimes major ones, in the workflows by which computers operate. Recently, he investigated the way computers implement virtual memory and determined that these operations can waste up to 50 percent of a computer's execution cycles.

The inefficiencies he found were due to the way computers had evolved over time. Memory had grown a million times larger since the 1980s, but the way it was used had barely changed. A legacy method called paging, created when memory was far smaller, was preventing processors from reaching their peak potential.
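To see where those cycles go, consider what a TLB (translation lookaside buffer) miss costs under conventional paging: the hardware must walk a multi-level page table, one dependent memory access per level. Below is a minimal C sketch of such a radix walk, assuming an x86-64-style four-level table with 4KB pages; the field names and flat in-memory layout are illustrative simplifications, not any real MMU's implementation.

    #include <stdint.h>

    #define LEVELS     4   /* four-level table, as on x86-64 */
    #define IDX_BITS   9   /* 9 index bits per level         */
    #define PAGE_SHIFT 12  /* 4KB base pages                 */

    /* Illustrative page-table entry: a next-level pointer (or final
     * frame number) plus a present bit. */
    typedef struct { uint64_t next; int present; } PTE;

    /* Return the physical frame for vaddr, or 0 on a fault. 'root'
     * stands in for the top-level table that the CR3 register points
     * at on x86-64. */
    uint64_t walk(PTE *root, uint64_t vaddr)
    {
        PTE *table = root;
        for (int level = LEVELS - 1; level >= 0; level--) {
            uint64_t idx = (vaddr >> (PAGE_SHIFT + level * IDX_BITS))
                           & ((1u << IDX_BITS) - 1);
            PTE entry = table[idx];  /* one dependent memory access per level */
            if (!entry.present)
                return 0;            /* page fault */
            if (level == 0)
                return entry.next;   /* physical frame number */
            table = (PTE *)(uintptr_t)entry.next;  /* next-level table */
        }
        return 0;
    }

Four dependent loads per miss is the point: when the TLB cannot keep up with a huge working set, the processor spends a large share of its time in walks like this rather than in application code.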

Hill designed a solution (PDF) that uses paging selectively, adopting a simpler address translation method for key parts of important applications. This brought cache misses down to less than 1 percent. In the age of the nanosecond, fixing such inefficiencies pays dividends: with such a fix in place, Facebook could buy far fewer computers to handle the same workload, saving millions.
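The "simpler address translation method" the summary describes boils down to mapping a key region of an application's memory as one contiguous segment, so that a hit in that region needs only a bounds check and an add instead of a full page walk. A hedged C sketch of the idea follows; the Segment fields and the page_walk() fallback are illustrative names, not taken from the paper.

    #include <stdint.h>

    /* Ordinary paged translation path, assumed to exist elsewhere. */
    extern uint64_t page_walk(uint64_t vaddr);

    typedef struct {
        uint64_t base;    /* first virtual address the segment covers  */
        uint64_t limit;   /* one past the last covered virtual address */
        uint64_t offset;  /* physical minus virtual displacement       */
    } Segment;

    uint64_t translate(const Segment *seg, uint64_t vaddr)
    {
        if (vaddr >= seg->base && vaddr < seg->limit)
            return vaddr + seg->offset;  /* fast path: compare + add, no walk */
        return page_walk(vaddr);         /* everything else stays paged */
    }

Paging is kept for everything outside the segment, which is why the approach is selective rather than a wholesale replacement.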

 
  • (Score: 4, Insightful) by maxwell demon (1608) on Thursday April 24 2014, @07:20PM (#35732)

    The paper is about GPU memory access. I guess most of Facebook's computing still runs on CPUs, so I strongly doubt it would help them much.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 5, Informative) by Angry Jesus (182) on Thursday April 24 2014, @08:33PM (#35773)

    > The paper is about GPU memory access.

    GPU access to RAM shared with the host CPU (rather than private to the GPU).

    FWIW, I expected that using large pages (e.g. 2MB rather than the traditional 4KB) would be very effective in reducing the GPU's TLB miss rate. The paper acknowledges this up front: the authors say that with their benchmarks they got basically the same improvement just by using 2MB pages, but they think that requiring the app developer to use large pages would be burdensome.

    Based on my own HPC experience, large pages are very easy to use with little downside. Typically all you have to do is make sure to allocate the memory in a contiguous block that is correctly aligned, as sketched below. It is a few extra lines of code up front, and then you can basically forget about it; it just works. So I'm thinking that while this paper is interesting, it will probably not have much of an impact.
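    For the curious, here is roughly what those few extra lines look like on Linux. This is a minimal sketch under my own assumptions (2MB huge pages requested via mmap's MAP_HUGETLB flag, falling back to madvise(MADV_HUGEPAGE) transparent huge pages if no hugepage pool is reserved); a real program would also unmap the region when done.

        #define _GNU_SOURCE
        #include <stddef.h>
        #include <sys/mman.h>

        #define HUGE_2MB (2UL * 1024 * 1024)

        /* Allocate 'bytes' of memory backed by 2MB pages if possible. */
        void *alloc_huge(size_t bytes)
        {
            /* Round the request up to a whole number of 2MB pages,
             * which also gives the required alignment. */
            size_t size = (bytes + HUGE_2MB - 1) & ~(HUGE_2MB - 1);

            void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p != MAP_FAILED)
                return p;

            /* No huge pages reserved? Fall back to ordinary pages and
             * hint the kernel to promote them transparently. */
            p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                return NULL;
            madvise(p, size, MADV_HUGEPAGE);
            return p;
        }

    Allocate your big arrays through something like that once at startup and every pointer into them behaves exactly as before; the only visible difference is the TLB miss rate.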