
SoylentNews is people

posted by CoolHand on Thursday May 21 2015, @11:17AM   Printer-friendly
from the wishing-our-memory-was-high-bandwidth dept.

Advanced Micro Devices (AMD) has shared more details about the High Bandwidth Memory (HBM) in its upcoming GPUs.

HBM in a nutshell takes the wide-and-slow paradigm to its fullest. Rather than building an array of high-speed chips around an ASIC to deliver 7 Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clock speeds way down – to just 1 Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
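The arithmetic behind "wide and slow" is easy to check. A minimal back-of-the-envelope sketch (the helper function name is ours, not AMD's; the GDDR5 figures assume the R9 290X's 512-bit bus at 5 Gbps per pin):

```python
def bandwidth_gbytes(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width (bits) * per-pin rate (Gbps) / 8."""
    return bus_width_bits * gbps_per_pin / 8

# HBM example from the article: 4 stacks x 1024-bit, at just 1 Gbps per pin
hbm = bandwidth_gbytes(4 * 1024, 1.0)   # -> 512.0 GB/s

# GDDR5 on the R9 290X: narrower 512-bit bus, but clocked at 5 Gbps per pin
gddr5 = bandwidth_gbytes(512, 5.0)      # -> 320.0 GB/s
```

A quarter of the per-pin speed, but eight times the bus width, nets out to 1.6x the bandwidth.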

AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. Although HBM1 roughly triples memory bandwidth per watt compared to GDDR5, the memory in AMD's example draws a little less than half the power (14.6 W, down from 30 W for the R9 290X) rather than a third, because total bandwidth also rises. HBM stacks will also occupy only 5-10% of the area that GDDR5 would need to provide the same amount of memory. That could potentially halve the size of the GPU package:

By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm2), versus 110mm × 90mm (9900mm2) for R9 290X.
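The bandwidth-per-watt claim above can be verified from the article's own numbers (a quick sanity check, not an AMD calculation):

```python
# Bandwidth per watt, using the figures quoted in the article
gddr5_eff = 320 / 30.0    # R9 290X (GDDR5): ~10.7 GB/s per watt
hbm_eff   = 512 / 14.6    # AMD's 4-stack HBM example: ~35.1 GB/s per watt

ratio = hbm_eff / gddr5_eff   # ~3.3x, consistent with "roughly tripled"
```

The efficiency triples even though absolute power only halves, because the HBM configuration is also delivering 60% more bandwidth.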

HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.

  • (Score: 4, Interesting) by bob_super on Thursday May 21 2015, @03:54PM

    The problem with a wide bus on the PCB is the amount of space it takes, and the complexity of sync across the lanes.
    Serial is easier to route if you embed the clock, because skew doesn't matter. But latency takes a huge hit, which stinks for CPU random accesses (cache miss).

    The advantage of HBM is to combine the low latency of the wide bus (fill a cache line all in the same cycle), the lower power of not going down to the board, and the simplicity of having the chip manufacturer guarantee the interface's signal integrity between dies on the non-socketed substrate. The main remaining problems are memory size and total power.
    Current FPGA tech, which uses pretty big dies, can accommodate more than 20k connections between dies. By staying around a GHz, to avoid SERDES latency/power, you can get absolutely massive bandwidth without going off-chip.
    We just couldn't do that ten years ago. Packaging's signal integrity and practical ball/pad count limits meant having to go serial and add power-hungry SERDES/EQ if you wanted low BER and more bandwidth.
