SoylentNews is people

posted by CoolHand on Thursday May 21 2015, @11:17AM
from the wishing-our-memory-was-high-bandwidth dept.

Advanced Micro Devices (AMD) has shared more details about the High Bandwidth Memory (HBM) in its upcoming GPUs.

HBM in a nutshell takes the wide & slow paradigm to its fullest. Rather than building an array of high speed chips around an ASIC to deliver 7Gbps+ per pin over a 256/384/512-bit memory bus, HBM at its most basic level involves turning memory clockspeeds way down – to just 1Gbps per pin – but in exchange making the memory bus much wider. How wide? That depends on the implementation and generation of the specification, but the examples AMD has been showcasing so far have involved 4 HBM devices (stacks), each featuring a 1024-bit wide memory bus, combining for a massive 4096-bit memory bus. It may not be clocked high, but when it's that wide, it doesn't need to be.
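As a rough check of the wide-and-slow arithmetic, peak bandwidth is just bus width times per-pin data rate. The sketch below uses a hypothetical helper, `peak_bandwidth_gbs` (not from AMD or the quoted article), and plugs in only the figures quoted above: 7 Gbps per pin over 256/384/512-bit GDDR5 buses versus 1 Gbps per pin over a 4096-bit HBM bus.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: (pins * Gbit/s per pin) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Narrow-and-fast: 7 Gbps per pin over the bus widths named in the quote.
gddr5 = {width: peak_bandwidth_gbs(width, 7.0) for width in (256, 384, 512)}

# Wide-and-slow: four 1024-bit HBM stacks at 1 Gbps per pin.
hbm_4_stack = peak_bandwidth_gbs(4 * 1024, 1.0)

for width, bandwidth in gddr5.items():
    print(f"GDDR5 {width}-bit @ 7 Gbps: {bandwidth:.0f} GB/s")
print(f"HBM 4096-bit @ 1 Gbps: {hbm_4_stack:.0f} GB/s")
```

Even at one-seventh of the per-pin rate, the 4096-bit bus comes out ahead of the widest GDDR5 configuration in this comparison.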

AMD will be the only manufacturer using the first generation of HBM, and will be joined by NVIDIA in using the second generation in 2016. HBM2 will double memory bandwidth over HBM1. The benefits of HBM include increased total bandwidth (from 320 GB/s for the R9 290X to 512 GB/s in AMD's "theoretical" 4-stack example) and reduced power consumption. HBM1 roughly triples memory bandwidth per watt compared to GDDR5, but because AMD's example also delivers more bandwidth, its memory still uses a little less than half the power rather than a third (14.6 W, down from 30 W for the R9 290X). HBM stacks will also take up only 5-10% as much area as the GDDR5 chips needed to provide the same amount of memory. That could potentially halve the size of the GPU package:

By AMD's own estimate, a single HBM-equipped GPU package would be less than 70mm × 70mm (4900mm²), versus 110mm × 90mm (9900mm²) for R9 290X.
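The efficiency and area claims can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this summary; the variable names are illustrative, and the 14.6 W and 30 W values are AMD's estimates rather than measurements.

```python
# Figures quoted in the summary (AMD's own estimates).
gddr5_bw_gbs, gddr5_power_w = 320.0, 30.0   # R9 290X memory
hbm_bw_gbs, hbm_power_w = 512.0, 14.6       # "theoretical" 4-stack HBM example

# Bandwidth per watt for each memory type, and the improvement factor.
gddr5_eff = gddr5_bw_gbs / gddr5_power_w
hbm_eff = hbm_bw_gbs / hbm_power_w
eff_ratio = hbm_eff / gddr5_eff

# Absolute memory power, as a fraction of the GDDR5 figure.
power_ratio = hbm_power_w / gddr5_power_w

# Package area estimates in square millimetres.
hbm_pkg_mm2 = 70 * 70
r9_290x_pkg_mm2 = 110 * 90
area_ratio = hbm_pkg_mm2 / r9_290x_pkg_mm2

print(f"GB/s per watt: GDDR5 {gddr5_eff:.1f}, HBM {hbm_eff:.1f} ({eff_ratio:.2f}x)")
print(f"HBM memory power as a fraction of GDDR5: {power_ratio:.2f}")
print(f"HBM package area as a fraction of R9 290X: {area_ratio:.2f}")
```

The roughly 3.3x gain in bandwidth per watt and the slightly-under-half figures for power and area all follow from the quoted numbers.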

HBM will likely be featured in high-performance computing GPUs as well as accelerated processing units (APUs). HotHardware reckons that Radeon 300-series GPUs featuring HBM will be released in June.

Related Stories

AMD Teases x86 Improvements, High Bandwidth Memory GPUs 19 comments

Today was Advanced Micro Devices' (AMD) 2015 Financial Analyst Day. The last one was held in 2012. Since then, the company has changed leadership, put its APUs in the major consoles, and largely abandoned the high-end chip market to Intel. Now AMD says it is focusing on gaming, virtual reality, and datacenters. AMD has revealed details of upcoming CPUs and GPUs at the event:

Perhaps the biggest announcement relates to AMD's x86 Zen CPUs, coming in 2016. AMD is targeting a 40% increase in instructions-per-clock (IPC) with Zen cores. By contrast, Intel's Haswell (a "Tock") increased IPC by about 10-11%, and Broadwell (a "Tick") increased IPC by about 5-6%. AMD is also abandoning the maligned Bulldozer modules with Clustered Multithreading in favor of a Simultaneous Multithreading design, similar to Intel's Hyperthreading. Zen is a high priority for AMD to the extent that it is pushing back its ARM K12 chips to 2017. AMD is also shifting focus away from Project Skybridge, an "ambidextrous framework" that combined x86 and ARM cores in SoCs. Zen cores will target a wide range of designs from "top-to-bottom", including both sub-10W TDPs and up to 100W. The Zen architecture will be followed by Zen+ at some point.

On the GPU front, AMD's 2016 GPUs will use FinFETs. AMD plans to be the first vendor to use High Bandwidth Memory (HBM), a 3D/stacked memory standard that enables much higher bandwidth (hence the name) and saves power. NVIDIA also plans to use HBM in its Pascal GPUs slated for 2016. The HBM will be positioned around the processor, as the GPU's thermal output would make cooling the RAM difficult if it were on top. HBM is competing against the similar Hybrid Memory Cube (HMC) standard.

Intel, AMD, and Diapers at Computex 2015 16 comments

Computex 2015 has seen the enthusiastic promotion of Internet of Things devices, including a "smart diaper" seen on the show floor and a "smart vase" monitoring air quality featured in Intel's keynote.

The 6th generation of Intel Core processors, the 14nm "Tock" Skylake, was shown off in a 10mm thick all-in-one design with 4K resolution, but no new details about the CPUs were given. Sales of another form factor, the 2-in-1, were said to have increased 75% year-on-year, and they are expected to be more affordable this year. Intel also plans to increase the performance of its Atom-based Compute Sticks and release a more powerful Core M version this year.

Intel's Broadwell Xeon server chips will be featuring Iris Pro graphics. For example, the Xeon E3-1200v4 includes Iris Pro P6300, resulting in a chip suitable for video transcoding. More details are at The Platform. Two socketed 65W Broadwell desktop processors with Iris Pro 6200 graphics have been announced. Both chips have 128 MB of on-package eDRAM acting as L4 cache. Other Broadwell desktop and laptop chips have been announced, and should be available within the next two months (followed by the first Skylake mobile chips in September).

Intel wants to bring wireless power and connectivity to Skylake laptops and tablets. Some Skylake devices will use WiGig (802.11ad) for data transfer, WiDi for display transfer, and Rezence magnetic resonance wireless charging. The extent to which PC vendors will commit to these cable-cutting wireless standards across Skylake devices remains to be seen. Intel also formally announced the merger of the Power Matters Alliance (PMA) and the Alliance for Wireless Power (A4WP).

Intel has deprecated the current Mini DisplayPort connector for Thunderbolt and adopted USB Type-C as the Thunderbolt 3 connector. Intel intended for the Thunderbolt interface to be used over USB ports in the first place back in 2011, but was blocked by the USB consortium at the time. Now that USB Type-C supports "USB Alternate Mode" functionality, the time has come for Intel to ditch MiniDP, the connector for 100 million Thunderbolt devices (many, but peanuts compared to USB). Thunderbolt 3 doubles the maximum bandwidth to 40 Gbps, four times that of USB 3.1. Power consumption is halved, and the connector can drive two external 4K displays simultaneously or a single 5K display, at 60 Hz.

AMD has announced a launch date for graphics cards employing high-bandwidth memory (HBM): June 16th at E3. The NVIDIA GeForce GTX 980 Ti GPU was unveiled and reviewed the day before Computex. AMD's Carrizo APUs for laptops have been launched, at least on paper. AMD is explicitly targeting the $400-700 laptop segment with 15 W Carrizo chips. AMD has demoed FreeSync-over-HDMI, although hardware support remains scarce.

Broadcom and Qualcomm have unveiled 802.11ac MU-MIMO "Wave 2" products with 4x4 antenna configurations. Eight-antenna access points are capable of reaching an aggregate capacity of 6.77 Gbps. Broadcom also announced a 1 Watt gigabit Ethernet chip supporting the Energy Efficient Ethernet standard 802.3az, targeting European Code of Conduct energy efficiency requirements.

AMD Radeon 300 Series Launches, "Fury" to Come 9 comments

AMD has launched its 300 series GPUs. The new GPUs are considered "refreshes" of the "Hawaii" architecture, although there are some improvements. For example, the Radeon R7 360 has 2 GB of VRAM instead of the 1 GB of the Radeon R7 260, as well as a slightly higher memory clock. Radeon R9 390X and Radeon R9 390 boost clock speeds and double VRAM to 8 GB compared to the 4 GB of the 290X and 290, but will launch at a higher price than the older GPUs currently sell for. Is the VRAM boost worth it?

While one could write a small tome on the matter of memory capacity, especially in light of the fact that the Fury series only has 4GB of memory, ultimately the fact that the 390 series has 8GB now is due to a couple of factors. The first of which is the fact that 4GB Hawaii cards require 2Gb GDDR5 chips (2x16), a capacity that is slowly going away in favor of the 4Gb chips used on the Playstation 4 and many of the 2015 video cards. The other reason is that it allows AMD to exploit NVIDIA's traditional stinginess with VRAM; just as with the 290 series versus the GTX 780/770, this means AMD once again has a memory capacity advantage, which helps to shore up the value of their cards versus what NVIDIA offers at the same price.

Meanwhile with the above in mind, based on comments from AMD product managers, it sounds like the use of 4Gb chips also plays a part in the [20%] memory clockspeed increases we're seeing on the 390 series. Later generation chips don't just get bigger, but they get faster and operate at lower voltages as well, and from what we've seen it looks like AMD is taking advantage of all of these factors.

More interesting will be the Radeon R9 Fury X and Radeon R9 Fury, which use the new "Fiji" architecture. These will be AMD's first GPUs to ship with 4 GB of High Bandwidth Memory (HBM). Fury X is a water-cooled card that will launch June 24th for $649. Fury is an air-cooled version with fewer stream processors and texture units (lower yields) than the Fury X. It will launch on July 14th at $549. AMD claims that the new Fiji GPUs have 1.5 times the performance per watt of the R9 290X, partially due to the decrease in power needed by stacks of HBM vs. GDDR5 memory.

Later this summer, AMD will launch a 6" Fiji card with HBM called "Nano". AMD will launch a "Dual" card sometime in the fall, presumably the equivalent of two Fury X GPUs.

Samsung Announces Mass Production of HBM2 DRAM 10 comments

Samsung has announced the mass production of dynamic random access memory (DRAM) packages using the second generation High Bandwidth Memory (HBM2) interface.

AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year. GPUs could include 8 GB of HBM2 using half of the die space used by AMD's Fury X, or just one-quarter of the die space if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be slightly physically larger than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm²) HBM2 package, compared to 5.48 mm × 7.29 mm (39.94 mm²) HBM1 packages.

The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, which is more than 36 times that of an 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.

Samsung's new DRAM package features 256GBps of bandwidth, which is double that of a HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
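The press release figures can be cross-checked with simple arithmetic. In this sketch the 32-bit interface width of a GDDR5 chip is an assumption not stated in the release (it is the width that makes 9 Gbps per pin come out to the quoted 36 GBps), and all names are illustrative.

```python
# Capacity: four 8-gigabit core dies per package, 8 bits per byte.
core_dies = 4
die_capacity_gbit = 8
package_capacity_gbyte = core_dies * die_capacity_gbit / 8

# GDDR5 chip bandwidth: assumed 32 data pins at the quoted 9 Gbps per pin.
gddr5_pins = 32
gddr5_chip_gbs = gddr5_pins * 9.0 / 8

# HBM2 package bandwidth as quoted, and the implied HBM1 figure (half).
hbm2_package_gbs = 256.0
hbm1_package_gbs = hbm2_package_gbs / 2

speedup_vs_gddr5_chip = hbm2_package_gbs / gddr5_chip_gbs

print(f"Package capacity: {package_capacity_gbyte:.0f} GB")
print(f"GDDR5 chip: {gddr5_chip_gbs:.0f} GB/s")
print(f"HBM2 vs GDDR5 chip: {speedup_vs_gddr5_chip:.2f}x")
```

The ratio works out to just over 7, matching the "more than seven-fold" wording.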

TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.

Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.

AMD Teases x86 Improvements, High Bandwidth Memory GPUs
AMD Shares More Details on High Bandwidth Memory
Samsung Mass Produces 128 GB DDR4 Server Memory

SK Hynix Adds 4 GB HBM2 Memory Stacks to Catalog 11 comments

SK Hynix will begin mass production of 4 GB HBM2 memory stacks soon:

SK Hynix has quietly added its HBM Gen 2 memory stacks to its public product catalog earlier this month, which means that the start of mass production should be imminent. The company will first offer two types of new memory modules with the same capacity, but different transfer-rates, targeting graphics cards, HPC accelerators and other applications. Over time, the HBM2 family will get broader.

SK Hynix intends to initially offer its clients 4 GB HBM2 4Hi stack KGSDs (known good stack dies) based on 8 Gb DRAM devices. The memory devices will feature a 1024-bit bus as well as 1.6 GT/s (H5VR32ESM4H-12C) and 2.0 GT/s (H5VR32ESM4H-20C) data-rates, thus offering 204 GB/s and 256 GB/s peak bandwidth per stack.
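The per-stack bandwidth of the two speed bins follows directly from the 1024-bit bus and the quoted transfer rates. A minimal sketch, with part numbers and rates taken from the quote above and illustrative variable names:

```python
BUS_WIDTH_BITS = 1024  # per HBM2 stack, as quoted

# Part number -> data rate in GT/s (one bit per pin per transfer).
speed_bins = {
    "H5VR32ESM4H-12C": 1.6,
    "H5VR32ESM4H-20C": 2.0,
}

# Peak bandwidth per stack in GB/s.
peak_gbs = {part: BUS_WIDTH_BITS * rate / 8 for part, rate in speed_bins.items()}

for part, bandwidth in peak_gbs.items():
    print(f"{part}: {bandwidth:.1f} GB/s per stack")
```

The slower bin works out to 204.8 GB/s, which the quote states as 204 GB/s.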

Samsung has already manufactured 4 GB stacks. Eventually, there will also be 2 GB and 8 GB stacks available.

Previously: AMD Shares More Details on High Bandwidth Memory
Samsung Announces Mass Production of HBM2 DRAM

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Funny) by FatPhil on Thursday May 21 2015, @12:11PM

    by FatPhil (863) <> on Thursday May 21 2015, @12:11PM (#185998) Homepage
    In the 2000s, because of synchronisation issues wide busses were rejected and narrow fast ones were favoured.
    In the 2010s, narrow fast ones are being replaced by wide busses.

    Fuckit, I ain't buying any more PC stuff until the boffins work out which is actually better.
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2) by takyon on Thursday May 21 2015, @12:21PM

      by takyon (881) <{takyon} {at} {}> on Thursday May 21 2015, @12:21PM (#185999) Journal

      The way I see it HBM, which is TSV stacked, will replace GDDR5 and there will never be a GDDR6.

      Everything that can be stacked will be stacked. V-NAND will solve/delay NAND endurance issues for years. Eventually processors will be stacked.

      [SIG] 10/28/2017: Soylent Upgrade v14 []
      • (Score: 4, Interesting) by bzipitidoo on Thursday May 21 2015, @12:57PM

        by bzipitidoo (4388) on Thursday May 21 2015, @12:57PM (#186005) Journal

        The big problem with stacking is heat. Would have happened years ago if not for that. But then, heat is a big problem everywhere in circuit design.

        Parallelism, going wide, is the way forward for now. Doubt we'll move up from 64bit any time soon. There was a compelling reason to move from 32bit, which is that it can address at most 4G of RAM. We're nowhere close to bumping up against the 64bit limit of nearly 2x10^19. Instead, we've been seeing the multi core CPUs. Parallel programming as originally envisioned at the source code level hasn't really happened, people aren't using programming languages explicitly designed for parallelism. Instead we're seeing it at arm's length, in libraries such as OpenCL. Parallelism is the reason Google gained such a competitive advantage. They did it better, made that MapReduce library. The hunt is still on for other places to apply more width.
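For scale, the address-space figures in the comment above can be reproduced as follows. This is only an illustration of the arithmetic and assumes byte-addressable memory.

```python
# Number of distinct byte addresses for each pointer width.
addr32 = 2 ** 32
addr64 = 2 ** 64

print(f"32-bit: {addr32 // 2**30} GiB")
print(f"64-bit: {addr64:.3e} bytes, or {addr64 // 2**60} EiB")
print(f"64-bit space is {addr64 // addr32:,} times larger")
```

The 64-bit total is about 1.8 x 10^19 bytes, consistent with the "nearly 2x10^19" in the comment.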

        • (Score: 3, Informative) by takyon on Thursday May 21 2015, @02:20PM

          by takyon (881) <{takyon} {at} {}> on Thursday May 21 2015, @02:20PM (#186024) Journal


          The silicon-germanium heterojunction bipolar transistors built by the IBM-Georgia Tech team operated at frequencies above 500 GHz at 4.5 Kelvins (451 degrees below zero Fahrenheit) - a temperature attained using liquid helium cooling. At room temperature, these devices operated at approximately 350 GHz.

          Just get SiGe transistors and clock them way down.

          Well, it's not that simple, but it's a start.

          [SIG] 10/28/2017: Soylent Upgrade v14 []
        • (Score: 2) by Katastic on Thursday May 21 2015, @06:09PM

          by Katastic (3340) on Thursday May 21 2015, @06:09PM (#186130)

          I said the same thing on Slashdot a week or two ago and got zero up mods. Snarky bastards.

          The other issue is:

          >It may not be clocked high, but when it's that wide, it doesn't need to be.

          No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

          • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @08:53PM

            by Anonymous Coward on Thursday May 21 2015, @08:53PM (#186197)

            > No, no, no, no and no. Latency does not scale with bus width. You can't get 9 women pregnant and expect a baby every month.

            Have you tried pipelining?

            • (Score: 2) by Katastic on Friday May 22 2015, @12:25AM

              by Katastic (3340) on Friday May 22 2015, @12:25AM (#186265)

              Pipelining by definition: at best does not change latency, and at worst, significantly increases latency. It cannot reduce latency.
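A toy model makes the latency-versus-throughput distinction concrete. It is a sketch under idealized assumptions (no stalls, every stage padded to the slowest one, a fixed per-stage latch overhead); the function name and the numbers are hypothetical.

```python
def pipeline_metrics(stage_times_ns, latch_overhead_ns=0.0):
    """Return (latency_ns, results_per_ns) for an ideal, stall-free pipeline.

    The clock period is set by the slowest stage plus latch overhead.
    One item takes one period per stage; once full, the pipeline
    completes one item per period.
    """
    cycle_ns = max(stage_times_ns) + latch_overhead_ns
    latency_ns = cycle_ns * len(stage_times_ns)
    throughput = 1.0 / cycle_ns
    return latency_ns, throughput


# The same 10 ns of work done in one step, or split into five 2 ns stages
# with 0.2 ns of latch overhead per stage.
unpipelined = pipeline_metrics([10.0])
pipelined = pipeline_metrics([2.0] * 5, latch_overhead_ns=0.2)

print("unpipelined (latency ns, results per ns):", unpipelined)
print("pipelined   (latency ns, results per ns):", pipelined)
```

In this model the five-stage version finishes more than four times as many items per nanosecond, yet each individual item takes slightly longer to get through.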

              Fun history: It's 50% of the reason the Pentium 4 Netburst architecture was a complete failure and slower than the Pentium III's. They added a huge pipeline, with huge chances for stalls, but they thought they could hit 10 GHz with the P4 architecture so "it wouldn't matter."

              And then the 3-4 GHz barrier happened...

              Pentium 4's were heating up faster than any of their models predicted. So the primary advantage of their new architecture couldn't be utilized. As they manufactured things smaller and smaller, new "problems" that could be disregarded before all of a sudden became extremely important. Heat levels exploded exponentially.

      • (Score: 0) by Anonymous Coward on Thursday May 21 2015, @05:04PM

        by Anonymous Coward on Thursday May 21 2015, @05:04PM (#186089)

        > Eventually processors will be stacked.

        I think we can expect to see gigabytes of ram stacked on the cpus for high-bandwidth, low-latency access. Like a sort of L4 cache.

    • (Score: 3, Funny) by c0lo on Thursday May 21 2015, @12:59PM

      by c0lo (156) Subscriber Badge on Thursday May 21 2015, @12:59PM (#186006) Journal

      > Fuckit, I ain't buying any more PC stuff until the boffins work out which is actually better.

      Better - wide fast busses.
      Best - CPU with 2^36 registers 128bit wide - ought to be enough for anybody.

    • (Score: 4, Interesting) by bob_super on Thursday May 21 2015, @03:54PM

      by bob_super (1357) on Thursday May 21 2015, @03:54PM (#186051)

      The problem with a wide bus on the PCB is the amount of space it takes, and the complexity of sync across the lanes.
      Serial is easier to route if you embed the clock, because skew doesn't matter. But latency takes a huge hit, which stinks for CPU random accesses (cache miss).

      The advantage of HBM is to combine the low latency of the wide bus (fill a cache line all in the same cycle), lower power of not going down to the board, and simplicity of having the chip manufacturer guarantee the interface's signal integrity between dies on the non-socketed substrate. The main remaining problem is mem size, and total power.
      Current FPGA tech, which uses pretty big dies, can accommodate more than 20k connections between dies. By staying around a GHz, to avoid SERDES latency/power, you can get absolutely massive bandwidth without going off-chip.
      We just couldn't do that ten years ago. Packaging signal integrity and practical ball/pad count limits meant having to go serial and add power-hungry SERDES/EQ if you wanted low BER and more bandwidth.

    • (Score: 3, Interesting) by Hairyfeet on Thursday May 21 2015, @04:42PM

      by Hairyfeet (75) <{bassbeast1968} {at} {}> on Thursday May 21 2015, @04:42PM (#186081) Journal

      Actually this is not surprising, as the 00s were all "speed is key" until first Intel and then everybody else smacked right into the thermal wall. Everybody remembers what a dog slow giant piece of shit the P4 Netburst arch was and most act like it was a "what were they thinking?" brainfart, but it actually wasn't: the entire design was based on the "speed is key" mindset of the time and with that design 10GHz was the goal. In fact you could take the Pentium D 805 and OC that thing to Celeron 300A levels, we're talking from stock 2.8GHz all the way to 4.2GHz-4.5GHz with a $70 sealed liquid cooler. The downside of course was those things were cranking out so much heat it would have turned your PC room into a sauna, so Intel realized it was a dead end design wise.

      The same thing is now happening to the GPU makers. The skinny that has been going around the GPU forums is that both AMD and Nvidia have had designs that would easily be 3-4 times as fast as what is on the market, but using the current designs you'd need a 2 kW PSU to feed the things and a radiator the size of a Honda Civic just to keep the suckers cool. Both companies have realized that the "speed is key" design philosophy has reached a dead end because the faster you go the hotter you get, end of story. I wouldn't be surprised to see the same happen to pretty much every component in the PC as they all hit the thermal wall, because no matter the part they will reach a point where the speed simply doesn't justify the amount of heat being produced.

      If I had to hazard a guess I'd say the next part to go this route would probably be SSDs, as they have already reached the point where they are having to slap the chips on PCIe cards just to get more speed than the last model. Once they can't get any more speed even with PCIe they will have no choice but to go back to the drawing board and find another way to make gains over previous gens.

      ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.
      • (Score: 2) by SlimmPickens on Thursday May 21 2015, @10:49PM

        by SlimmPickens (1056) on Thursday May 21 2015, @10:49PM (#186239)

        Isn't the PCIe thing just because it's a better bus for random access memory than SATA? Has nothing to do with a thermal wall.

        • (Score: 2) by Hairyfeet on Friday May 22 2015, @01:13AM

          by Hairyfeet (75) <{bassbeast1968} {at} {}> on Friday May 22 2015, @01:13AM (#186272) Journal

          It doesn't change the laws of thermodynamics and the faster you go? The hotter you get. If you have tried one of the latest PCIe SSDs you'll know they are already a lot warmer than the SATA versions and that is only on a 4x bus....what do you think the temps will reach at 8x? 16x? Eventually you WILL run into the thermal wall, it's not an "if", it's a "when".

          ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.