
SoylentNews is people

posted by CoolHand on Thursday May 21 2015, @11:17AM
from the wishing-our-memory-was-high-bandwidth dept.

Advanced Micro Devices (AMD) has shared more details about the High Bandwidth Memory (HBM) in its upcoming GPUs.

HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
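The wide-and-slow arithmetic can be checked with a rough sketch. The HBM figures (4 stacks, 1024 bits each, 1 Gbps per pin) come from the quoted text; the GDDR5 comparison assumes a 512-bit bus at 5 Gbps per pin, which is an assumption chosen to match the 320 GB/s this summary gives for the R9 290X. The function name is illustrative only.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# HBM example from the text: 4 stacks, each 1024 bits wide, at 1 Gbps per pin.
hbm_bus_bits = 4 * 1024
hbm_bandwidth = peak_bandwidth_gb_per_s(hbm_bus_bits, 1.0)

# GDDR5 comparison (assumed configuration): 512-bit bus at 5 Gbps per pin.
gddr5_bandwidth = peak_bandwidth_gb_per_s(512, 5.0)

print(f"HBM:   {hbm_bus_bits}-bit bus, {hbm_bandwidth:.0f} GB/s")
print(f"GDDR5: 512-bit bus, {gddr5_bandwidth:.0f} GB/s")
```

These are peak numbers only; they say nothing about latency or sustained throughput.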

AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. Although HBM1 triples memory bandwidth per watt compared to GDDR5, the memory in AMD's example uses a little less than half the power (30 W for the R9 290X down to 14.6 W) rather than a third, because it also delivers more total bandwidth. HBM stacks will also use 5-10% as much area to provide the same amount of memory that GDDR5 would. That could potentially halve the size of the GPU package:

By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm²), versus 110mm × 90mm (9900mm²) for R9 290X.
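As a quick sanity check on the "tripled" and "halve" claims, the sketch below recomputes both ratios using only the figures quoted in this summary. The variable names are illustrative, and the HBM package size is AMD's stated upper bound, so the real area ratio could be smaller.

```python
# Bandwidth (GB/s) and memory power (W) as quoted in the summary.
gddr5_bandwidth_gb_s, gddr5_power_w = 320.0, 30.0  # R9 290X
hbm_bandwidth_gb_s, hbm_power_w = 512.0, 14.6      # AMD's 4-stack example

gddr5_gb_s_per_watt = gddr5_bandwidth_gb_s / gddr5_power_w
hbm_gb_s_per_watt = hbm_bandwidth_gb_s / hbm_power_w
efficiency_gain = hbm_gb_s_per_watt / gddr5_gb_s_per_watt

# Package footprints from AMD's estimate, in square millimetres.
hbm_package_mm2 = 70 * 70
gddr5_package_mm2 = 110 * 90
area_ratio = hbm_package_mm2 / gddr5_package_mm2

print(f"GDDR5: {gddr5_gb_s_per_watt:.1f} GB/s per watt")
print(f"HBM:   {hbm_gb_s_per_watt:.1f} GB/s per watt")
print(f"Efficiency gain: {efficiency_gain:.2f}x")
print(f"Package area ratio: {area_ratio:.2f}")
```

The efficiency gain comes out a little above 3x and the area ratio a little under one half, which is consistent with the wording above.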

HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.

 
  • (Score: 2) by takyon on Thursday May 21 2015, @12:21PM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Thursday May 21 2015, @12:21PM (#185999) Journal

    The way I see it, HBM, which is TSV-stacked, will replace GDDR5 and there will never be a GDDR6.

    Everything that can be stacked will be stacked. V-NAND will solve/delay NAND endurance issues for years. Eventually processors will be stacked.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 4, Interesting) by bzipitidoo on Thursday May 21 2015, @12:57PM

    by bzipitidoo (4388) on Thursday May 21 2015, @12:57PM (#186005) Journal

    The big problem with stacking is heat. Would have happened years ago if not for that. But then, heat is a big problem everywhere in circuit design.

    Parallelism, going wide, is the way forward for now. Doubt we'll move up from 64-bit any time soon. There was a compelling reason to move from 32-bit, which is that it can address at most 4G of RAM. We're nowhere close to bumping up against the 64-bit limit of nearly 2x10^19 bytes. Instead, we've been seeing multi-core CPUs. Parallel programming as originally envisioned at the source code level hasn't really happened; people aren't using programming languages explicitly designed for parallelism. Instead we're seeing it at arm's length, in libraries such as OpenCL. Parallelism is the reason Google gained such a competitive advantage. They did it better, made that MapReduce library. The hunt is still on for other places to apply more width.
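For reference, the two address-space limits mentioned above can be worked out directly. This is a minimal sketch assuming byte-addressable memory and a flat address space; the variable names are illustrative.

```python
GIB = 2 ** 30  # bytes in one gibibyte

addressable_32 = 2 ** 32  # bytes reachable with a 32-bit address
addressable_64 = 2 ** 64  # bytes reachable with a 64-bit address

print(f"32-bit: {addressable_32 // GIB} GiB")
print(f"64-bit: {addressable_64:.3e} bytes")
```

The 64-bit figure is about 1.8x10^19 bytes, so "nearly 2x10^19" is a fair rounding.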

    • (Score: 3, Informative) by takyon on Thursday May 21 2015, @02:20PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Thursday May 21 2015, @02:20PM (#186024) Journal

      http://gtresearchnews.gatech.edu/newsrelease/half-terahertz.htm [gatech.edu]

      The silicon-germanium heterojunction bipolar transistors built by the IBM-Georgia Tech team operated at frequencies above 500 GHz at 4.5 Kelvins (451 degrees below zero Fahrenheit) - a temperature attained using liquid helium cooling. At room temperature, these devices operated at approximately 350 GHz.

      Just get SiGe transistors and clock them way down.

      Well, it's not that simple, but it's a start.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by Katastic on Thursday May 21 2015, @06:09PM

      by Katastic (3340) on Thursday May 21 2015, @06:09PM (#186130)

      I said the same thing on Slashdot a week or two ago and got zero up mods. Snarky bastards.

      The other issue is:

      >It may not be clocked high, but when it's that wide, it doesn't need to be.

      No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

      • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @08:53PM

        by Anonymous Coward on Thursday May 21 2015, @08:53PM (#186197)

        > No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

        Have you tried pipelining?

        • (Score: 2) by Katastic on Friday May 22 2015, @12:25AM

          by Katastic (3340) on Friday May 22 2015, @12:25AM (#186265)

          Pipelining, by definition, at best does not change latency and at worst significantly increases it. It cannot reduce latency.
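The latency versus throughput point can be illustrated with a toy timing model. This is a sketch under simple assumptions (equal-length stages, no stalls, an optional fixed per-stage latch overhead); the function and parameter names are made up for illustration and do not describe any real processor.

```python
def unpipelined(items: int, stages: int, stage_time: float) -> tuple[float, float]:
    """Each item passes through every stage before the next item starts."""
    latency = stages * stage_time
    total_time = items * latency
    return latency, total_time


def pipelined(items: int, stages: int, stage_time: float,
              latch_overhead: float = 0.0) -> tuple[float, float]:
    """A new item enters every cycle; each stage pays an optional latch overhead."""
    cycle = stage_time + latch_overhead
    latency = stages * cycle                      # time for one item, start to finish
    total_time = latency + (items - 1) * cycle    # later items finish one cycle apart
    return latency, total_time


# Nine items through nine one-unit stages.
plain_latency, plain_total = unpipelined(9, 9, 1.0)
pipe_latency, pipe_total = pipelined(9, 9, 1.0)
slow_latency, slow_total = pipelined(9, 9, 1.0, latch_overhead=0.1)

print(f"unpipelined: latency {plain_latency}, total {plain_total}")
print(f"pipelined:   latency {pipe_latency}, total {pipe_total}")
print(f"with latch overhead: latency {slow_latency:.1f}, total {slow_total:.1f}")
```

In this model pipelining cuts the total time for a batch, but the latency of any single item stays the same or grows once latch overhead is included.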

          Fun history: It's 50% of the reason the Pentium 4 NetBurst architecture was a complete failure and slower than the Pentium IIIs. They added a huge pipeline, with huge chances for stalls, but they thought they could hit 10 GHz with the P4 architecture so "it wouldn't matter."

          And then the 3-4 GHz barrier happened...

          Pentium 4s were heating up faster than any of their models predicted, so the primary advantage of the new architecture couldn't be utilized. As they manufactured things smaller and smaller, new "problems" that could be disregarded before all of a sudden became extremely important. Heat levels exploded exponentially.

  • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @05:04PM

    by Anonymous Coward on Thursday May 21 2015, @05:04PM (#186089)

    > Eventually processors will be stacked.

    I think we can expect to see gigabytes of RAM stacked on the CPUs for high-bandwidth, low-latency access. Like a sort of L4 cache.