
posted by martyb on Friday March 22 2019, @01:41AM
from the AKA-L4-cache dept.

Samsung HBM2E 'Flashbolt' Memory for GPUs: 16 GB Per Stack, 3.2 Gbps

Samsung has introduced the industry's first memory that corresponds to the HBM2E specification. The company's new Flashbolt memory stacks increase performance by 33% and double both per-die and per-package capacity. Samsung introduced its HBM2E DRAM at GTC, indicating that the GPU market is a target market for this memory.

Samsung's Flashbolt KGSDs (known good stacked dies) are based on eight 16-Gb memory dies interconnected using TSVs (through-silicon vias) in an 8-Hi stack configuration. Every Flashbolt package features a 1024-bit bus with a 3.2 Gbps data transfer rate per pin, offering up to 410 GB/s of bandwidth per KGSD.

Samsung positions its Flashbolt KGSDs for next-gen datacenter, HPC, AI/ML, and graphics applications. By using four Flashbolt stacks with a processor featuring a 4096-bit memory interface, developers can get 64 GB of memory with a 1.64 TB/s peak bandwidth, a great advantage for capacity- and bandwidth-hungry chips. With two KGSDs, they get 32 GB of DRAM with an 820 GB/s peak bandwidth.
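
The quoted figures follow from simple interface arithmetic. Here is a back-of-the-envelope sketch in Python (the helper names are ours and purely illustrative, not anything from Samsung):

    # Peak bandwidth of one HBM stack in GB/s:
    # bus width (bits) x per-pin rate (Gbps) / 8 bits per byte.
    def stack_bandwidth_gbs(pin_rate_gbps, bus_width_bits=1024):
        return bus_width_bits * pin_rate_gbps / 8

    # Capacity of one stack in GB: per-die density (Gbit) x dies / 8.
    def stack_capacity_gb(die_gbit, stack_height):
        return die_gbit * stack_height / 8

    # Flashbolt: 1024-bit bus at 3.2 Gbps/pin, eight 16-Gb dies per stack.
    assert stack_bandwidth_gbs(3.2) == 409.6    # ~410 GB/s per KGSD
    assert stack_capacity_gb(16, 8) == 16.0     # 16 GB per stack

    # Four stacks on a 4096-bit interface; two stacks on a 2048-bit one.
    print(4 * stack_bandwidth_gbs(3.2) / 1000)  # 1.6384 -> "1.64 TB/s", 64 GB
    print(2 * stack_bandwidth_gbs(3.2))         # 819.2  -> "820 GB/s", 32 GB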

Also at Tom's Hardware.

Previously: Samsung Increases Production of 8 GB High Bandwidth Memory 2.0 Stacks
JEDEC Updates High Bandwidth Memory Standard With New 12-Hi Stacks


Original Submission

Related Stories

Samsung Increases Production of 8 GB High Bandwidth Memory 2.0 Stacks 1 comment

In response to increased demand, Samsung is increasing production of the densest HBM2 DRAM available:

Samsung on Tuesday announced that it is increasing production volumes of its 8 GB, 8-Hi HBM2 DRAM stacks due to growing demand. In the coming months the company's 8 GB HBM2 chips will be used for several applications, including those for consumers, professionals, AI, and parallel computing. Meanwhile, AMD's Radeon Vega graphics cards for professionals and gamers will likely be the largest consumers of HBM2 in terms of volume. And while AMD is traditionally an SK Hynix customer, the timing of this announcement relative to AMD's launches certainly suggests that AMD is likely a Samsung customer this round as well.

Samsung's 8 GB HBM Gen 2 memory KGSDs (known good stacked die) are based on eight 8-Gb DRAM devices in an 8-Hi stack configuration. The memory components are interconnected using TSVs and feature over 5,000 TSV interconnects each. Every KGSD has a 1024-bit bus and offers up to 2 Gbps data rate per pin, thus providing up to 256 GB/s of memory bandwidth per single 8-Hi stack. The company did not disclose power consumption and heat dissipation of its HBM memory components, but we have reached out [to] Samsung for additional details.
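
The 256 GB/s figure is the same interface arithmetic at the older 2.0 Gbps pin rate; a one-line check (illustrative only):

    # HBM2 per-stack bandwidth: 1024-bit bus at 2.0 Gbps/pin, 8 bits per byte.
    print(1024 * 2.0 / 8)   # 256.0 GB/s, matching the quoted figure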

Previously:
Samsung Announces Mass Production of HBM2 DRAM
CES 2017: AMD Vega GPUs and FreeSync 2
AMD Launches the Radeon Vega Frontier Edition


Original Submission

JEDEC Updates High Bandwidth Memory Standard With New 12-Hi Stacks 3 comments

JEDEC Updates Groundbreaking High Bandwidth Memory (HBM) Standard

JEDEC Solid State Technology Association, the global leader in the development of standards for the microelectronics industry, today announced the publication of an update to JESD235 High Bandwidth Memory (HBM) DRAM standard.

[...] JEDEC standard JESD235B for HBM leverages Wide I/O and TSV technologies to support densities up to 24 GB per device at speeds up to 307 GB/s. This bandwidth is delivered across a 1024-bit wide device interface that is divided into 8 independent channels on each DRAM stack. The standard can support 2-high, 4-high, 8-high, and 12-high TSV stacks of DRAM at full bandwidth to allow systems flexibility on capacity requirements from 1 GB to 24 GB per stack.

This update extends the per pin bandwidth to 2.4 Gbps, adds a new footprint option to accommodate the 16 Gb-layer and 12-high configurations for higher density components, and updates the MISR polynomial options for these new configurations.

Some existing High Bandwidth Memory products already had a per pin bandwidth of 2.4 Gbps. However, the increase in stack size and density could allow a product with 96 GB of DRAM using just four stacks (16 Gb DRAM × 12 × 4), up from 32 GB (8 Gb DRAM × 8 × 4).
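
The capacity jump follows directly from die density and stack height. A small illustrative sketch of that arithmetic (the helper name is ours):

    # Total DRAM in GB across identical HBM stacks:
    # Gbit per die x dies per stack x stacks / 8 bits per byte.
    def package_capacity_gb(die_gbit, stack_height, num_stacks):
        return die_gbit * stack_height * num_stacks / 8

    assert package_capacity_gb(16, 12, 1) == 24.0  # 24 GB max per 12-Hi stack
    assert package_capacity_gb(16, 12, 4) == 96.0  # 96 GB from four such stacks
    assert package_capacity_gb(8, 8, 4) == 32.0    # vs. 32 GB from four 8-Hi stacks

    # Per-stack bandwidth at the updated 2.4 Gbps pin rate:
    print(1024 * 2.4 / 8)                          # 307.2 -> "up to 307 GB/s"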

This update apparently applies to HBM2 and is not considered a third or fourth generation of HBM.

Also at Wccftech and AnandTech.

Previously: Samsung Increases Production of 8 GB High Bandwidth Memory 2.0 Stacks


Original Submission

SK Hynix Announces HBM2E Memory for 2020 Release 2 comments

[HBM is High Bandwidth Memory. -Ed.]

SK Hynix Announces 3.6 Gbps HBM2E Memory For 2020: 1.8 TB/sec For Next-Gen Accelerators

SK Hynix this morning has thrown their hat into the ring as the second company to announce memory based on the HBM2E standard. While the company isn't using any kind of flashy name for the memory (à la Samsung's Flashbolt), the idea is the same: releasing faster and higher density HBM2 memory for the next generation of high-end processors. Hynix's HBM2E memory will reach up to 3.6 Gbps, which, as things currently stand, will make it the fastest HBM2E memory on the market when it ships in 2020.

As a quick refresher, HBM2E is a small update to the HBM2 standard to improve its performance, serving as a mid-generational kicker of sorts to allow for higher clockspeeds, higher densities (up to 24GB with 12 layers), and the underlying changes that are required to make those happen. Samsung was the first memory vendor to announce HBM2E memory earlier this year, with their 16GB/stack Flashbolt memory, which runs at up to 3.2 Gbps. At the time, Samsung did not announce a release date, and to the best of our knowledge, mass production still hasn't begun.

[...] [SK Hynix's] capacity is doubling, from 8 Gb/layer to 16 Gb/layer, allowing a full 8-Hi stack to reach a total of 16GB. It's worth noting that the revised HBM2 standard actually allows for 12-Hi stacks, for a total of 24GB/stack; however, we've yet to see anyone announce memory quite that dense.
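
The headline 1.8 TB/s is the aggregate for four stacks; a quick sketch of where it comes from (assuming the standard 1024-bit HBM interface per stack):

    # SK Hynix HBM2E: per-stack and four-stack peak bandwidth.
    pin_rate_gbps = 3.6
    bus_width_bits = 1024

    per_stack_gbs = bus_width_bits * pin_rate_gbps / 8
    print(per_stack_gbs)             # 460.8 GB/s per stack
    print(4 * per_stack_gbs / 1000)  # 1.8432 TB/s -> the "1.8 TB/sec" headline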

See also: HBM2E: The E Stands For Evolutionary

Previously: JEDEC Updates High Bandwidth Memory Standard With New 12-Hi Stacks
Samsung Announces "Flashbolt" HBM2E (High Bandwidth Memory) DRAM packages


Original Submission

GlobalFoundries and SiFive Partner on High Bandwidth Memory (HBM2E)

GlobalFoundries and SiFive to Design HBM2E Implementation on 12LP/12LP+

GlobalFoundries and SiFive announced on Tuesday that they will be co-developing an implementation of HBM2E memory for GloFo's 12LP and 12LP+ FinFET process technologies. The IP package will enable SoC designers to quickly integrate HBM2E support into designs for chips that need significant amounts of bandwidth.

The HBM2E implementation by GlobalFoundries and SiFive includes the 2.5D packaging (interposer) designed by GF, with the HBM2E interface developed by SiFive. In addition to HBM2E technology, licensees of SiFive also gain access to the company's RISC-V portfolio and DesignShare IP ecosystem for GlobalFoundries' 12LP/12LP+, which will enable SoC developers to build RISC-V-based devices [using] GloFo's advanced fab technology.

GlobalFoundries and SiFive suggest that the 12LP+ manufacturing process and the HBM2E implementation will be primarily used for artificial intelligence training and inference applications for edge computing, with vendors looking to optimize for TOPS-per-milliwatt performance.

2.5D/3D packaging.

Related: Samsung Announces "Flashbolt" HBM2E (High Bandwidth Memory) DRAM packages
SK Hynix Announces HBM2E Memory for 2020 Release
GlobalFoundries Develops "12LP+" Fabrication Process
Qualcomm Invests in RISC-V Startup SiFive
SiFive Announces a RISC-V Core With an Out-of-Order Microarchitecture


Original Submission

High Bandwidth Memory Could Increase to 16 Layers, and More 7 comments

SK Hynix has licensed technology that could enable the production of 16-layer High Bandwidth Memory (HBM) stacks. Bandwidth could also be increased by a superior interconnect density:

SK Hynix has inked a new broad patent and technology licensing agreement with Xperi Corp. Among other things, the company licensed the DBI Ultra 2.5D/3D interconnect technology developed by Invensas. The latter was designed to enable building up to 16-Hi chip assemblies, including next-generation memory, and highly-integrated SoCs that feature numerous homogeneous layers.

Invensas' DBI Ultra is a proprietary die-to-wafer hybrid bonding interconnect technology that supports from 100,000 to 1,000,000 interconnects per mm², using interconnect pitches as small as 1 µm. According to the company, the much greater number of interconnects can offer dramatically increased bandwidth vs. conventional copper pillar interconnect technology, which only goes as high as 625 interconnects per mm². The small interconnects also offer a shorter z-height, making it possible to build a stacked chip with 16 layers in the same space as conventional 8-Hi chips, allowing for greater memory densities.
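
Those density figures map onto a simple pitch-to-area relationship. A small sketch (assuming a uniform square grid, which is our simplification):

    # Interconnect density for a square grid with the given pitch (micrometres).
    def interconnects_per_mm2(pitch_um):
        return (1000 / pitch_um) ** 2   # 1000 um per mm, squared for area

    print(interconnects_per_mm2(1.0))   # 1,000,000 per mm^2 at a 1 um pitch
    print(interconnects_per_mm2(40.0))  # 625 per mm^2 -- the quoted copper-pillar
                                        # figure is consistent with a ~40 um pitch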

12-Hi stacks have been specified, but have only recently reached development/production.

JEDEC (Joint Electron Device Engineering Council) has updated the HBM2 standard to accommodate 3.2 Gbps/pin speeds. This is in line with Samsung's "Flashbolt" HBM2E memory (although SK Hynix and Samsung may push speeds further, to 3.6 or even 4.2 Gbps/pin), which will enter mass production soon. JEDEC has not adopted the "HBM2E" nomenclature used by Samsung, SK Hynix, and others.

Micron has announced that it is shipping LPDDR5 DRAM to customers. LPDDR5 will be used in the Xiaomi Mi 10, ZTE Axon 10s Pro 5G, and Samsung Galaxy S20 (yes, even 16 GB of it).


Original Submission

  • (Score: 2) by jmorris on Friday March 22 2019, @02:00AM (3 children)

    by jmorris (4844) on Friday March 22 2019, @02:00AM (#818261)

    So now we are talking about loading the basic 1K page in a single memory access. And still it will present challenges because of the speed paired with the huge capacity. Imagine trying to do a pm-suspend: you will need to push up to 64GB to somewhere unless they are going to self-refresh it and keep some power to the CPU going.

    So at this rate it is reasonable to ask when the first CPU package ships with 1TB of RAM. Pair that with a few TB of NVMe on a nearby slot and we really are in a new world. The AI people will be able to put it all to use so we now have the "killer app" to drive such a ramp up in computing performance.

    • (Score: 5, Interesting) by takyon on Friday March 22 2019, @02:27AM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Friday March 22 2019, @02:27AM (#818271) Journal

      4-8 GB is probably reasonable to start out. High Bandwidth Memory or TSV-connected DRAM isn't too cheap, even if it is getting cheaper. 4 GB could be a good baseline; everything from a sub-10-watt mobile APU to high-end desktop and server chips should come with at least that much, if not a lot more. And that much shouldn't add $100 to the cost of the chip.

      Intel Announces "Sunny Cove", Gen11 Graphics, Discrete Graphics Brand Name, 3D Packaging, and More [soylentnews.org]
      Intel Details Lakefield CPU SoC With 3D Packaging and Big/Small Core Configuration [soylentnews.org]
      AMD Plans to Stack DRAM and SRAM on Top of its Future Processors [soylentnews.org]

      Later, you'll see the real deal with DRAM very tightly integrated into the chip (check the table on slide/page 5):

      https://www.darpa.mil/attachments/3DSoCProposersDay20170915.pdf [darpa.mil]

      Only 4 GB is proposed there, but it could be hundreds of times as fast as today's chips.

      That's the new world, and it has at least a 1,000x performance increase waiting for us. Maybe 1 million - 1 billion times performance increase after we throw other technologies [soylentnews.org] in and layer up. And I'm just talking about classical CPUs; neuromorphic chips could probably become brain-like very easily if scaled vertically.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by FatPhil on Friday March 22 2019, @02:33AM (1 child)

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Friday March 22 2019, @02:33AM (#818275) Homepage
      "basic 1K page in a single memory access"

      You're confusing bits with bytes.

      And regarding AI, RAM is a huge waste of energy; the best way of getting the best bang per buck is with small compute nodes that don't have access to shared RAM. Much like multicore DSPs of old, and, surprise surprise, many modern AI-oriented chip architectures. So this fast RAM won't enable significantly more powerful AI technologies, it will just be a latency reduction on the tech we already have. If the same number of transistors had been put into compute nodes, then you might be looking at actual computational improvements. ("I made AI a million times faster! How? I waited 17 doublings from Moore's Law, and improved the algorithm speed by a factor of ten!" - true story)
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by jmorris on Friday March 22 2019, @02:47AM

        by jmorris (4844) on Friday March 22 2019, @02:47AM (#818279)

        In the current version it would max out at equal to the 512-byte sector, but we know this ain't gonna stop. So the cache line, the memory bus and the page size are gonna be one and the same at 1K, then they will all move up to 4K to match modern storage devices. But there is probably a limit, even inside a package, to how wide they can go without introducing more problems than they can deal with, so huge pages accessed in parallel are probably out. In a way it is going to simplify things.

  • (Score: 2) by FatPhil on Friday March 22 2019, @02:20AM (2 children)

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Friday March 22 2019, @02:20AM (#818269) Homepage
    "but there's currently no information on when the new technology will reach the market."

    33% faster is roughly how long on a Moore's-Law-like progression? (Just over 6 months? Why does that remind me of the Oscar Wilde quote 'Fashion is a form of ugliness so intolerable that we have to alter it every six months.'?)
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves