

posted by CoolHand on Thursday May 21 2015, @11:17AM
from the wishing-our-memory-was-high-bandwidth dept.

Advanced Micro Devices (AMD) has shared more details about the High Bandwidth Memory (HBM) in its upcoming GPUs.

HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
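As a back-of-the-envelope check on that trade-off, here is a small Python sketch using only the figures quoted above (a 512-bit GDDR5 bus at 7 Gbps per pin versus four 1024-bit HBM stacks at 1 Gbps per pin); it shows how the wide, slow bus still comes out ahead on peak bandwidth:

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_gbps_per_pin: float) -> float:
    """Peak theoretical bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * rate_gbps_per_pin / 8

# Narrow and fast: a 512-bit GDDR5 bus at 7 Gbps per pin.
gddr5 = peak_bandwidth_gbs(512, 7)       # 448.0 GB/s

# Wide and slow: 4 HBM stacks, each 1024 bits wide, at 1 Gbps per pin.
hbm = peak_bandwidth_gbs(4 * 1024, 1)    # 512.0 GB/s

print(f"GDDR5,  512-bit @ 7 Gbps: {gddr5:.0f} GB/s")
print(f"HBM,   4096-bit @ 1 Gbps: {hbm:.0f} GB/s")
```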

AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. Although HBM1 roughly triples memory bandwidth per watt compared to GDDR5, the memory in AMD's example draws a little less than half the power (down from about 30 W for the R9 290X to 14.6 W) rather than a third, because the total bandwidth increases at the same time. HBM stacks will also occupy only 5-10% of the area that GDDR5 would need to provide the same amount of memory. That could potentially halve the size of the GPU package:

By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm2), versus 110mm × 90mm (9900mm2) for R9 290X.
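Plugging the quoted bandwidth and power figures into a quick calculation (these are the article's numbers, not independent measurements) recovers both the roughly tripled bandwidth per watt and the near-halving of memory power:

```python
# Bandwidth-per-watt comparison using the figures quoted in the summary.
gddr5_bw_gbs, gddr5_power_w = 320, 30.0   # R9 290X with GDDR5
hbm_bw_gbs, hbm_power_w = 512, 14.6       # AMD's 4-stack HBM example

gddr5_eff = gddr5_bw_gbs / gddr5_power_w  # ~10.7 GB/s per watt
hbm_eff = hbm_bw_gbs / hbm_power_w        # ~35.1 GB/s per watt

print(f"GDDR5: {gddr5_eff:.1f} GB/s/W, HBM: {hbm_eff:.1f} GB/s/W")
print(f"Bandwidth per watt ratio: {hbm_eff / gddr5_eff:.1f}x")    # ~3.3x
print(f"Memory power ratio: {hbm_power_w / gddr5_power_w:.2f}x")  # ~0.49x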

HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.

 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Funny) by FatPhil on Thursday May 21 2015, @12:11PM

    by FatPhil (863) <reversethis-{if.fdsa} {ta} {tnelyos-cp}> on Thursday May 21 2015, @12:11PM (#185998) Homepage
    In the 2000s, because of synchronisation issues, wide busses were rejected and narrow fast ones were favoured.
    In the 2010s, narrow fast ones are being replaced by wide busses.

    Fuckit, I ain't buying any more PC stuff until the boffins work out which is actually better.
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  • (Score: 2) by takyon on Thursday May 21 2015, @12:21PM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Thursday May 21 2015, @12:21PM (#185999) Journal

    The way I see it, HBM, which is TSV-stacked, will replace GDDR5 and there will never be a GDDR6.

    Everything that can be stacked will be stacked. V-NAND will solve/delay NAND endurance issues for years. Eventually processors will be stacked.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 4, Interesting) by bzipitidoo on Thursday May 21 2015, @12:57PM

      by bzipitidoo (4388) on Thursday May 21 2015, @12:57PM (#186005) Journal

      The big problem with stacking is heat. It would have happened years ago if not for that. But then, heat is a big problem everywhere in circuit design.

      Parallelism, going wide, is the way forward for now. Doubt we'll move up from 64-bit any time soon. There was a compelling reason to move from 32-bit, which is that it can address at most 4 GB of RAM. We're nowhere close to bumping up against the 64-bit limit of nearly 2x10^19. Instead, we've been seeing multi-core CPUs. Parallel programming as originally envisioned at the source code level hasn't really happened; people aren't using programming languages explicitly designed for parallelism. Instead we're seeing it at arm's length, in libraries such as OpenCL. Parallelism is the reason Google gained such a competitive advantage: they did it better, made that MapReduce library. The hunt is still on for other places to apply more width.
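      A minimal sketch of that "arm's length" style, with Python's standard multiprocessing library standing in for the OpenCL/MapReduce approach mentioned above; the parallelism lives entirely in a library call rather than in the language itself:

      ```python
      from multiprocessing import Pool

      def square(x):
          # The "map" step: independent per-item work the library can farm out to cores.
          return x * x

      if __name__ == "__main__":
          with Pool() as pool:                      # worker count defaults to the CPU core count
              mapped = pool.map(square, range(10))  # parallel map, via a library call
          total = sum(mapped)                       # the "reduce" step, done serially here
          print(mapped, total)                      # [0, 1, 4, ..., 81] 285
      ```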

      • (Score: 3, Informative) by takyon on Thursday May 21 2015, @02:20PM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Thursday May 21 2015, @02:20PM (#186024) Journal

        http://gtresearchnews.gatech.edu/newsrelease/half-terahertz.htm [gatech.edu]

        The silicon-germanium heterojunction bipolar transistors built by the IBM-Georgia Tech team operated at frequencies above 500 GHz at 4.5 Kelvins (451 degrees below zero Fahrenheit) - a temperature attained using liquid helium cooling. At room temperature, these devices operated at approximately 350 GHz.

        Just get SiGe transistors and clock them way down.

        Well, it's not that simple, but it's a start.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by Katastic on Thursday May 21 2015, @06:09PM

        by Katastic (3340) on Thursday May 21 2015, @06:09PM (#186130)

        I said the same thing on Slashdot a week or two ago and got zero up mods. Snarky bastards.

        The other issue is:

        >It may not be clocked high, but when it's that wide, it doesn't need to be.

        No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

        • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @08:53PM

          by Anonymous Coward on Thursday May 21 2015, @08:53PM (#186197)

          No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

          Have you tried pipelining?

          • (Score: 2) by Katastic on Friday May 22 2015, @12:25AM

            by Katastic (3340) on Friday May 22 2015, @12:25AM (#186265)

            Pipelining by definition: at best does not change latency, and at worst, significantly increases latency. It cannot reduce latency.
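            A rough sketch of that distinction, with made-up numbers (the 100 ns latency, 64-byte lines, and 16 requests in flight are illustrative only): pipelining multiplies throughput, but each individual request still waits the full latency.

            ```python
            # Throughput vs. latency for pipelined memory requests (illustrative numbers only).
            latency_ns = 100      # time for any single request to complete
            line_bytes = 64       # bytes returned per request
            in_flight = 16        # requests kept in the pipeline at once

            # One request at a time: bandwidth is limited by the round trip.
            serial_gbs = line_bytes / latency_ns                  # 0.64 GB/s

            # Pipelined: 16x the throughput, yet each request still takes 100 ns.
            pipelined_gbs = in_flight * line_bytes / latency_ns   # 10.24 GB/s

            print(f"serial: {serial_gbs:.2f} GB/s, pipelined: {pipelined_gbs:.2f} GB/s; "
                  f"per-request latency unchanged at {latency_ns} ns")
            ```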

            Fun history: It's 50% of the reason the Pentium 4 Netburst architecture was a complete failure and slower than the Pentium III's. They added a huge pipeline, with huge chances for stalls, but they thought they could hit 10 GHz with the P4 architecture so "it wouldn't matter."

            And then the 3-4 GHz barrier happened...

            Pentium 4s were heating up faster than any of their models predicted, so the primary advantage of the new architecture couldn't be utilized. As they manufactured things smaller and smaller, new "problems" that could be disregarded before all of a sudden became extremely important. Heat levels exploded.

    • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @05:04PM

      by Anonymous Coward on Thursday May 21 2015, @05:04PM (#186089)

      > Eventually processors will be stacked.

      I think we can expect to see gigabytes of RAM stacked on the CPU for high-bandwidth, low-latency access, like a sort of L4 cache.

  • (Score: 3, Funny) by c0lo on Thursday May 21 2015, @12:59PM

    by c0lo (156) Subscriber Badge on Thursday May 21 2015, @12:59PM (#186006) Journal

    Fuckit, I ain't buying any more PC stuff until the boffins work out which is actually better.

    Better - wide fast busses.
    Best - CPU with 2^36 registers 128bit wide - ought to be enough for anybody.

    --
    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 4, Interesting) by bob_super on Thursday May 21 2015, @03:54PM

    by bob_super (1357) on Thursday May 21 2015, @03:54PM (#186051)

    The problem with a wide bus on the PCB is the amount of space it takes, and the complexity of synch across the lanes.
    Serial is easier to route if you embed the clock, because skew doesn't matter. But latency takes a huge hit, which stinks for CPU random accesses (cache miss).

    The advantage of HBM is that it combines the low latency of a wide bus (fill a cache line all in the same cycle), the lower power of not going down to the board, and the simplicity of having the chip manufacturer guarantee the interface's signal integrity between dies on the non-socketed substrate. The main remaining problems are memory size and total power.
    Current FPGA tech, which uses pretty big dies, can accommodate more than 20k connections between dies. By staying around a GHz, to avoid SERDES latency/power, you can get absolutely massive bandwidth without going off-chip.
    We just couldn't do that ten years ago. Packaging signal integrity and practical ball/pad count limits meant having to go serial and add power-hungry SERDES/EQ if you wanted low BER and more bandwidth.
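    For a sense of scale, taking the 20k die-to-die connections above at roughly 1 Gb/s each (the single-data-rate assumption is mine, not the parent's):

    ```python
    # Aggregate die-to-die bandwidth from many slow links (connection count from the parent
    # comment; one bit per connection per clock is an assumption).
    connections = 20_000
    rate_gbps_per_link = 1.0                            # ~1 Gb/s per connection at ~1 GHz

    total_gbs = connections * rate_gbps_per_link / 8    # 2,500 GB/s, i.e. ~2.5 TB/s
    print(f"{total_gbs:,.0f} GB/s of aggregate bandwidth without ever leaving the package")
    ```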

  • (Score: 3, Interesting) by Hairyfeet on Thursday May 21 2015, @04:42PM

    by Hairyfeet (75) <bassbeast1968NO@SPAMgmail.com> on Thursday May 21 2015, @04:42PM (#186081) Journal

    Actually this is not surprising, as the 00s were all "speed is key" until first Intel and then everybody else smacked right into the thermal wall. Everybody remembers what a dog slow giant piece of shit the P4 Netburst arch was, and most act like it was a "what were they thinking?" brainfart, but it actually wasn't: the entire design was based on the "speed is key" mindset of the time, and with that design 10GHz was the goal. In fact you could take the Pentium D 805 and OC that thing to Celeron 300A levels, we're talking from stock 2.8GHz all the way to 4.2GHz-4.5GHz with a $70 sealed liquid cooler. The downside of course was that those things were cranking out so much heat it would have turned your PC room into a sauna, so Intel realized it was a dead end design-wise.

    The same thing is now happening to the GPU makers. The skinny that has been going around the GPU forums is that both AMD and Nvidia have had designs that would easily be 3-4 times as fast as what is on the market, but using the current designs you'd need a 2kW PSU to feed the things and a radiator the size of a Honda Civic just to keep the suckers cool. Both companies have realized that the "speed is key" design philosophy has reached a dead end, because the faster you go the hotter you get, end of story. I wouldn't be surprised to see the same happen to pretty much every component in the PC as they all hit the thermal wall, because no matter the part, they will reach a point where the speed simply doesn't justify the amount of heat being produced.

    If I had to hazard a guess I'd say the next part to go this route would probably be SSDs, as they have already reached the point where they are having to slap the chips on PCIe cards just to get more speed than the last model; once they can't get any more speed even with PCIe they will have no choice but to go back to the drawing board and find another way to make gains over previous gens.

    --
    ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.
    • (Score: 2) by SlimmPickens on Thursday May 21 2015, @10:49PM

      by SlimmPickens (1056) on Thursday May 21 2015, @10:49PM (#186239)

      Isn't the PCIe thing just because it's a better bus for random access memory than SATA? It has nothing to do with a thermal wall.

      • (Score: 2) by Hairyfeet on Friday May 22 2015, @01:13AM

        by Hairyfeet (75) <bassbeast1968NO@SPAMgmail.com> on Friday May 22 2015, @01:13AM (#186272) Journal

        It doesn't change the laws of thermodynamics, and the faster you go? The hotter you get. If you have tried one of the latest PCIe SSDs you'll know they are already a lot warmer than the SATA versions, and that is only on a 4x bus... what do you think the temps will reach at 8x? 16x? Eventually you WILL run into the thermal wall; it's not an "if", it's a "when".

        --
        ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.