SK Hynix is almost ready to produce GDDR6 memory with higher-than-expected per-pin bandwidth:
In a surprising move, SK Hynix has announced its first memory chips based on the yet-unpublished GDDR6 standard. The new DRAM devices for video cards have a capacity of 8 Gb and run at a 16 Gbps per-pin data rate, which is significantly higher than both standard GDDR5 and Micron's unique GDDR5X format. SK Hynix plans to produce its GDDR6 ICs in volume by early 2018.
GDDR5 memory has been used for top-of-the-range video cards since summer 2008, nearly nine years now. Throughout its active lifespan, GDDR5 more than doubled its data rate, from 3.6 Gbps to 9 Gbps, while its per-chip capacity increased 16-fold, from 512 Mb to 8 Gb. In fact, numerous high-end graphics cards, such as NVIDIA's GeForce GTX 1060 and 1070, still rely on GDDR5, which is not going anywhere even after the 2016 launch of Micron's GDDR5X with up to 12 Gbps per pin. It appears GDDR6 will be used for high-end graphics cards starting in 2018, just two years after the introduction of GDDR5X.
Previously: Samsung Announces Mass Production of HBM2 DRAM
DDR5 Standard to be Finalized by JEDEC in 2018
Related Stories
Samsung has announced the mass production of dynamic random access memory (DRAM) packages using the second generation High Bandwidth Memory (HBM2) interface.
AMD was the first and only company to introduce products using HBM1. AMD's Radeon R9 Fury X GPUs featured 4 gigabytes of HBM1 using four 1 GB packages. Both AMD and Nvidia will introduce GPUs equipped with HBM2 memory this year. Samsung's first HBM2 packages will contain 4 GB of memory each, and the press release states that Samsung intends to manufacture 8 GB HBM2 packages within the year. GPUs could include 8 GB of HBM2 using half the number of packages used by AMD's Fury X, or just one-quarter as many if 8 GB HBM2 packages are used next year. Correction: HBM2 packages may be physically larger than HBM1 packages. For example, SK Hynix will produce a 7.75 mm × 11.87 mm (91.99 mm²) HBM2 package, compared to 5.48 mm × 7.29 mm (39.95 mm²) HBM1 packages.
The 4GB HBM2 package is created by stacking a buffer die at the bottom and four 8-gigabit (Gb) core dies on top. These are then vertically interconnected by TSV holes and microbumps. A single 8Gb HBM2 die contains over 5,000 TSV holes, more than 36 times as many as an 8Gb TSV DDR4 die, offering a dramatic improvement in data transmission performance compared to typical wire-bonding based packages.
Samsung's new DRAM package features 256GBps of bandwidth, double that of an HBM1 DRAM package. This is equivalent to a more than seven-fold increase over the 36GBps bandwidth of a 4Gb GDDR5 DRAM chip, which has the fastest data speed per pin (9Gbps) among currently manufactured DRAM chips. Samsung's 4GB HBM2 also enables enhanced power efficiency by doubling the bandwidth per watt over a 4Gb-GDDR5-based solution, and embeds ECC (error-correcting code) functionality to offer high reliability.
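Those bandwidth figures can be sanity-checked with the usual formula: per-pin data rate × interface width ÷ 8 bits per byte. Below is a minimal sketch in C; the interface widths (32 bits per GDDR5 chip, 1024 bits per HBM2 stack) are standard values assumed for illustration, not taken from the press release.

    #include <stdio.h>

    /* Peak bandwidth in GB/s = per-pin data rate (Gbps) * interface width (bits) / 8 */
    static double peak_bandwidth_gbs(double gbps_per_pin, int width_bits)
    {
        return gbps_per_pin * width_bits / 8.0;
    }

    int main(void)
    {
        /* A 4Gb GDDR5 chip: 9 Gbps per pin over a 32-bit interface -> 36 GB/s  */
        printf("GDDR5 chip: %.0f GB/s\n", peak_bandwidth_gbs(9.0, 32));
        /* An HBM2 stack: 2 Gbps per pin over a 1024-bit interface -> 256 GB/s */
        printf("HBM2 stack: %.0f GB/s\n", peak_bandwidth_gbs(2.0, 1024));
        return 0;
    }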
TSV refers to through-silicon via, a vertical electrical connection used to build 3D chip packages such as High Bandwidth Memory.
Update: HBM2 has been formalized in JEDEC's JESD235A standard, and Anandtech has an article with additional technical details.
Previously:
AMD Teases x86 Improvements, High Bandwidth Memory GPUs
AMD Shares More Details on High Bandwidth Memory
Samsung Mass Produces 128 GB DDR4 Server Memory
JEDEC has announced that it expects to finalize the DDR5 standard by next year. It says that DDR5 will double bandwidth and density, and increase power efficiency, presumably by lowering the operating voltages again (perhaps to 1.1 V). Availability of DDR5 modules is expected by 2020:
You may have just upgraded your computer to use DDR4 recently or you may still be using DDR3, but in either case, nothing stays new forever. JEDEC, the organization in charge of defining new standards for computer memory, says that it will be demoing the next-generation DDR5 standard in June of this year and finalizing the standard sometime in 2018. DDR5 promises double the memory bandwidth and density of DDR4, and JEDEC says it will also be more power-efficient, though the organization didn't release any specific numbers or targets.
The DDR4 SDRAM specification was finalized in 2012, and DDR3 in 2007, so DDR5's arrival is to be expected (cue the Soylentils still using DDR2). One way to double the memory bandwidth of DDR5 is to double the DRAM prefetch to 16n, matching GDDR5X.
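To see why doubling the prefetch can double bandwidth: the per-pin data rate is roughly the internal DRAM array clock multiplied by the prefetch depth. The sketch below uses illustrative round clock figures, not numbers from any JEDEC draft.

    #include <stdio.h>

    /* Per-pin data rate (MT/s) ~= internal DRAM array clock (MHz) * prefetch depth */
    static int data_rate_mts(int array_clock_mhz, int prefetch)
    {
        return array_clock_mhz * prefetch;
    }

    int main(void)
    {
        /* DDR4 with an 8n prefetch: a 400 MHz array gives 3200 MT/s (DDR4-3200) */
        printf("DDR4,  8n prefetch: %d MT/s\n", data_rate_mts(400, 8));
        /* DDR5 with a 16n prefetch: the same 400 MHz array gives 6400 MT/s      */
        printf("DDR5, 16n prefetch: %d MT/s\n", data_rate_mts(400, 16));
        return 0;
    }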
Graphics cards are beginning to ship with GDDR5X. Some graphics cards and Knights Landing Xeon Phi chips include High Bandwidth Memory (HBM). A third generation of HBM will offer increased memory bandwidth, density, and more than 8 dies in a stack. Samsung has also talked about a cheaper version of HBM for consumers with a lower total bandwidth. SPARC64 XIfx chips include Hybrid Memory Cube. GDDR6 SDRAM could raise per-pin bandwidth to 14 Gbps, from the 10-14 Gbps of GDDR5X, while lowering power consumption.
Samsung's second generation ("1y-nm") 8 Gb DDR4 DRAM dies are being mass produced:
Samsung late on Wednesday said that it had initiated mass production of DDR4 memory chips using its second generation '10 nm-class' fabrication process. The new manufacturing technology shrinks the die size of the new DRAM chips and improves their performance as well as energy efficiency. To do that, the process uses new circuit designs featuring air spacers (for the first time in the DRAM industry). The new DRAM ICs (integrated circuits) can operate at a 3600 Mbit/s per-pin data rate (DDR4-3600) at standard DDR4 voltages and have already been validated with major CPU manufacturers.
[...] Samsung's new DDR4 chip produced using the company's 1y nm fabrication process has an 8-gigabit capacity and supports a 3600 MT/s data transfer rate at 1.2 V. The new D-die DRAM runs 12.5% faster than its direct predecessor (known as Samsung C-die, rated for 3200 MT/s) and is claimed to be up to 15% more energy efficient as well. In addition, the latest 8Gb DDR4 ICs use a new in-cell data sensing system that determines the data stored in each cell more accurately, which helps to increase the level of integration (i.e., make cells smaller) and therefore shrink die size.
Samsung says that the new 8Gb DDR4 chips feature an "approximate 30% productivity gain" when compared to similar chips made using the 1x nm manufacturing tech.
UPDATE 12/21: Samsung clarified that the productivity gain means an increase in the number of chips per wafer. Since the capacity of Samsung's C-die and D-die is the same, the increase in the number of dies equals the increase in the number of bits per wafer. Therefore, the key takeaway from the announcement is that the 1y nm technology and the new in-cell data sensing system enable Samsung to shrink die size and fit more DRAM dies on a single 300-mm wafer. Meanwhile, the overall 30% productivity gain results in lower per-die costs at the same yield and cycle time (though this does not mean that IC costs are 30% lower) and increases DRAM bit output.
The in-cell data sensing system and air spacers will be used by Samsung in other upcoming types of DRAM, including DDR5, LPDDR5, High Bandwidth Memory 3.0, and GDDR6.
Also at Tom's Hardware.
Previously: Samsung Announces "10nm-Class" 8 Gb DRAM Chips
Related: Samsung Announces 12Gb LPDDR4 DRAM, Could Enable Smartphones With 6 GB of RAM
Samsung Announces 8 GB DRAM Package for Mobile Devices
Samsung's 10nm Chips in Mass Production, "6nm" on the Roadmap
Samsung Increases Production of 8 GB High Bandwidth Memory 2.0 Stacks
IC Insights Predicts Additional 40% Increase in DRAM Prices
Samsung has announced the mass production of 16 Gb GDDR6 SDRAM chips with a higher-than-expected pin speed. The chips could see use in upcoming graphics cards that are not equipped with High Bandwidth Memory:
Samsung has beaten SK Hynix and Micron to be the first to mass produce GDDR6 memory chips. Samsung's 16Gb (2GB) chips are fabricated on a 10nm process and run at 1.35V. The new chips have a whopping 18Gb/s pin speed and will be able to reach a transfer rate of 72GB/s. Samsung's current 8Gb (1GB) GDDR5 memory chips, besides having half the density, work at 1.55V with up to 9Gb/s pin speeds. In a pre-CES 2018 press release, Samsung briefly mentioned the impending release of these chips. However, the speed on release is significantly faster than the earlier stated 16Gb/s pin speed and 64GB/s transfer rate.
18 Gbps exceeds what the JEDEC standard calls for.
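The quoted 72GB/s follows from the same per-chip arithmetic as above: 18 Gbps across a chip's 32-bit interface. A hypothetical card wiring eight such chips to a 256-bit bus would land at 576 GB/s; the sketch below works that out, with the 256-bit card configuration being an assumption for illustration rather than an announced product.

    #include <stdio.h>

    int main(void)
    {
        double pin_gbps  = 18.0;  /* per-pin data rate from the announcement         */
        int    chip_bits = 32;    /* standard per-chip GDDR6 interface width         */
        int    chips     = 8;     /* hypothetical card: eight chips on a 256-bit bus */

        double per_chip = pin_gbps * chip_bits / 8.0;   /* 72 GB/s  */
        double per_card = per_chip * chips;             /* 576 GB/s */

        printf("per chip: %.0f GB/s, per 256-bit card: %.0f GB/s\n", per_chip, per_card);
        return 0;
    }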
Also at Engadget and Wccftech.
Related: GDDR5X Standard Finalized by JEDEC
DDR5 Standard to be Finalized by JEDEC in 2018
SK Hynix to Begin Shipping GDDR6 Memory in Early 2018
Samsung's Second Generation 10nm-Class DRAM in Production
Micron Spills on GDDR6X: PAM4 Signaling For Higher Rates, Coming to NVIDIA's RTX 3090
It would seem that Micron this morning has accidentally spilled the beans on the future of graphics card memory technologies – and outed one of NVIDIA's next-generation RTX video cards in the process. In a technical brief that was posted to their website, dubbed "The Demand for Ultra-Bandwidth Solutions", Micron detailed their portfolio of high-bandwidth memory technologies and the market needs for them. Included in this brief was information on the previously-unannounced GDDR6X memory technology, as well as some information on what seems to be the first card to use it, NVIDIA's GeForce RTX 3090.
[...] At any rate, as this is a market overview rather than a technical deep dive, the details on GDDR6X are slim. The document links to another, still-unpublished document, "Doubling I/O Performance with PAM4: Micron Innovates GDDR6X to Accelerate Graphics Memory", that would presumably contain further details on GDDR6X. Nonetheless, even this high-level overview gives us a basic idea of what Micron has in store for later this year.
The key innovation for GDDR6X appears to be that Micron is moving from POD135 coding on the memory bus – a binary (two-state) coding format – to four-state coding in the form of Pulse-Amplitude Modulation 4 (PAM4). In short, Micron would be doubling the number of signal states on the GDDR6X memory bus, allowing it to transmit twice as much data per clock.
[...] According to Micron's brief, they're expecting to get GDDR6X to 21Gbps/pin, at least to start with. This is a far cry from doubling GDDR6's existing 16Gbps/pin rate, but it's also a data rate grounded in the limitations of PAM4 and DRAM. PAM4 lets the bus run at a lower symbol rate than binary coding would need for the same total data rate, but accurately distinguishing four states instead of two is conversely a harder task for the receiver. So a smaller jump isn't too surprising.
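To make the "twice as much data per clock" point concrete: binary (POD/NRZ) signaling carries one bit per symbol, while PAM4 maps two bits onto one of four amplitude levels, so the same symbol rate moves twice the bits. The toy encoder below is purely illustrative; real GDDR6X adds details such as level coding and link training that are not modeled here.

    #include <stdio.h>
    #include <stdint.h>

    /* Encode one byte as PAM4 symbols: each symbol carries 2 bits (4 levels, 0-3),
     * so a byte needs 4 symbols where a binary (NRZ/POD) link would need 8. */
    static void pam4_encode_byte(uint8_t byte, uint8_t symbols[4])
    {
        for (int i = 0; i < 4; i++)
            symbols[i] = (byte >> (2 * (3 - i))) & 0x3;  /* most significant pair first */
    }

    int main(void)
    {
        uint8_t syms[4];
        pam4_encode_byte(0xB4, syms);  /* 0xB4 = 10 11 01 00 in binary */
        printf("PAM4 levels: %u %u %u %u\n", syms[0], syms[1], syms[2], syms[3]);
        /* prints "2 3 1 0": four symbols instead of the eight a binary link needs */
        return 0;
    }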
The leaked Ampere-based RTX 3090 seems to be Nvidia's attempt to compete with AMD's upcoming RDNA2 ("Big Navi") GPUs without lowering the price of the usual high-end "Titan" GPU (Titan RTX launched at $2,499). Here are some of the latest leaks for the RTX 30 "Ampere" GPU lineup.
Previously: GDDR5X Standard Finalized by JEDEC
SK Hynix to Begin Shipping GDDR6 Memory in Early 2018
Samsung Announces Mass Production of GDDR6 SDRAM
Related: PCIe 6.0 Announced for 2021: Doubles Bandwidth Yet Again (uses PAM4)
(Score: 3, Interesting) by Scrutinizer on Monday May 01 2017, @12:58PM
I recall that SK Hynix was one of the manufacturers whose DDR4 modules were much more resistant to Rowhammer attacks [soylentnews.org] than those from Crucial/Micron, based upon test results from thirdio.com [thirdio.com]. (G.Skill was the other manufacturer with highly resistant DDR4 modules.)
Some of their fancy GDDR6 memory might go well with the card needed to drive my future Facebook-branded face mask [oculus.com].
(Score: 2) by fyngyrz on Monday May 01 2017, @01:07PM (14 children)
My first reaction was "graphics cards? Why can't I have this stuff on the CPU?"
CPUs have been faster than main memory for years. Without main memory that's as fast as the instruction / data cycle time, things are slower than they otherwise could be for that CPU. The problems, I suppose, would be related to getting the info in and out of the CPU at a speed to match the new RAM, and a bus to carry it... but this would certainly be useful if it could be done and commercialized affordably.
(Score: 2) by Runaway1956 on Monday May 01 2017, @02:20PM (11 children)
In my experience, memory has been "fast enough" for a long time now. Given sufficient memory, an ancient AMD K6 450 MHz CPU can actually run Windows XP fairly well. That is one gig of PC-100, in this case. With limited memory, a 1 GHz Athlon struggled to run basically the same installation.
My computer at home has 24 gig of memory, which is basically obsolete. The computer at work is quite modern and up to date, but it drags ass all day, every day. It only has 1 gig of memory, and runs Windows 7, with a buttload of background tasks. At home, I never wait for anything, except internet. At work, waiting for anything to load is painful. Microsoft needs to shitcan the paging system, and make it clear that computers require 8 gig of memory to run today. In the not-distant future, they're going to need 16 gig minimum, because everything just grows larger and more bloated all the time. Within a decade, we may need to raise the minimum to 64 gig if you want to run a Microsoft OS. (I expect Linux to stay reasonably responsive with far less memory, just because Linux still runs well on old hardware right now.)
Given the opportunity, yes, I'll take faster memory, faster CPU, faster everything. But, more than anything, I want a machine with ADEQUATE MEMORY! Ideally, your entire operating system, as well as most of your apps can reside in memory, and never make use of virtual memory. The slowest memory on the market today is more than fast enough to make almost any computer sing, if only enough memory is installed.
(Score: 3, Interesting) by takyon on Monday May 01 2017, @02:57PM (3 children)
How is a machine with only 1 GB of RAM considered "quite modern"? I would struggle to find anything under 4 GB on the market today, and they would be landfill laptops and Chromebooks with an absolute minimum of 2 GB of RAM.
You can probably find someone with a free 4 GB DDR3 module. If not, buy it yourself for $20, put it in the work computer, and take it away when you leave. Nobody will notice and you can put it in your museum later.
(Score: 2) by VLM on Monday May 01 2017, @03:09PM
Aside from all this desktop stuff, in the server and virtualization host market, nobody ever said their memory bus was too fast.
From an engineering standpoint it should be possible to make optimizations such that sequential reading is the default and fast, at the expense of totally random access. Remember row and column strobes for DRAM addressing in the '80s? You could extend that concept way beyond 2 dimensions (not physically, of course) such that sequential access would usually require 1, sometimes 2, address-segment writes per cycle for a graphics display, while hitting a totally random address would take something like 8 address-segment loads. Imagine 128 parallel data lines, 8 bits of address, and a whole bunch of segment strobes to load up the address 8 bits at a time. I wonder if there's also some weird dual-porting stuff going on such that it wouldn't really be your first choice for a CPU.
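Below is a toy model of the scheme the parent describes, assuming a 64-bit address presented to the DRAM 8 bits at a time through segment latches: a sequential step usually touches only the lowest segment (one strobe), while a random jump can require reloading all eight.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy model: the address reaches the DRAM 8 bits at a time through 8 segment
     * latches. Count how many latches must be rewritten to move from the
     * previously latched address to the new one. */
    static int segment_loads(uint64_t prev_addr, uint64_t new_addr)
    {
        int loads = 0;
        for (int seg = 0; seg < 8; seg++) {
            uint64_t mask = 0xFFull << (8 * seg);
            if ((prev_addr & mask) != (new_addr & mask))
                loads++;
        }
        return loads;
    }

    int main(void)
    {
        printf("sequential step: %d segment load(s)\n",
               segment_loads(0x1000, 0x1001));                    /* typically 1 */
        printf("random jump:     %d segment load(s)\n",
               segment_loads(0x1000, 0xDEADBEEF12345678ull));     /* up to 8     */
        return 0;
    }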
(Score: 2) by Runaway1956 on Monday May 01 2017, @04:47PM
It belongs to contractors, who have spec'd the machines to their own needs/wants. Seriously, machines that are - ohhhh - I guess they are three years old now - with decent CPUs, smallish hard drives, no optical drives, and - only ONE gig of memory. Don't ask me why, or how. They were built cheap, and that's how they run. Our own IT people weren't smart enough to realize they were getting ripped off. The machines in the offices aren't as bad as our machines in the work spaces, but they are still pretty crappy.
(Score: 2) by JoeMerchant on Monday May 01 2017, @10:15PM
This ^^^ - my father bought me a cast-off iMac from his University for $50; the only thing really "wrong" with it was that it only had 1GB of RAM - and it was upgrade-capped to 2GB by Apple... We spent another $40 (years ago) to get 2GB of RAM for it and it became downright usable - for single users doing single tasks.
(Score: 2) by LoRdTAW on Monday May 01 2017, @03:15PM
For the most part, it certainly is. Basic desktop usage really doesn't push memory bandwidth so much as memory usage (code bloat). We also have plenty of CPU. Even a modest dual-core i3 can handle most desktop use, including low-end gaming.
It's not Microsoft's fault. They publish a minimum as well as an optimal configuration spec. The problem lies with the OEMs, who make tons of money upselling you on a few GB of RAM.
Dell loves pulling this shit by offering dozens of different models which are nothing more than the same PC in multiple fixed configurations. "Oh, you want more than 8GB RAM *AND* a core i3 in your Optiplex? Too bad. Buy the i5 model for another $150 AND pay another $100 for the 8GB model. Or buy that nearly $900 i3 that's about $400 over the base i3." What I wind up doing is buying the base i3 model with 4GB, looking up the part number of the OEM RAM module, and buying a matching 4GB module for $30-50. Or I buy a compatible 8GB kit from Crucial and use the extra 4GB OEM stick in another desktop to double that one to 8GB as well. You can save around $200+ per workstation doing this. Of course that's fine for small shops like mine with fewer than 20 desktops. If you order by the tens or hundreds, it's less practical and you wind up going with the upsell to eliminate the labour.
Dell (along with many others) was notorious for selling low-end home PCs with barely enough RAM to boot the damn OS, as was the case with my friend's P4 Dell in the mid-00s with 256MB RAM and XP Home. Ran like shit. Opening a web browser was a 30+ second ordeal as the disk churned and burned, shuffling stuff out of RAM into the page file. I had him buy a compatible 1GB kit and it was like a whole new computer. And 1GB should have been the XP minimum, not 256MB like MS said.
(Score: 2) by kaszz on Monday May 01 2017, @05:01PM (4 children)
The paging system is sometimes what makes the difference between being able to use a program at all or not. If you install enough primary memory it won't be a problem for you, but that's no reason to take the option away from other people.
However, if people did shitcan Microsoft software, a lot of memory problems would not occur. Besides, all these requirements for gigs of primary memory are an indicator of really bad programming, with some exceptions like CAD etc.
(Score: 2) by Runaway1956 on Monday May 01 2017, @05:54PM (3 children)
It almost seems like an arms race. Memory gets cheaper, and more plentiful, but shoddy programmers seem to be determined to squander all of that memory.
(Score: 2) by kaszz on Monday May 01 2017, @06:39PM (2 children)
I have noticed the same trend too. But there are ways to avoid it. Open source OS, open source applications.
And I suspect C++ and other languages have a lot to do with this. On top of people who have no business designing software being shepherded into "programming".
The CPU trend will be interesting though.. they seem to not clock faster than circa 4.5 GHz. So programmers have to be smarter about that resource or see competition running them over.
(Score: 0) by Anonymous Coward on Tuesday May 02 2017, @02:36AM (1 child)
I'm not a programmer, so keeping that in mind: is there something wrong with C++ or the way it's being used? What are the better alternatives?
(Score: 2) by kaszz on Tuesday May 02 2017, @03:47AM
When you write software in C++ (object oriented), it will too often implicitly suck in a lot of stuff, the namespace can get overloaded, and some programmers like to allocate while free() is less popular.
Depending on task, use C.
(Score: 2) by shortscreen on Monday May 01 2017, @08:22PM
Pentium 3s were terribly memory-speed limited. In some cases there was hardly any difference between a P3 at 800MHz and one at 1.4GHz if they both had PC-133 SDRAM. This was when Intel started building their shitty graphics into the motherboard chipset, so refreshing the display would eat up precious memory bandwidth. Changing the screen mode to a lower resolution and color depth to reduce bus contention would speed things up measurably. Check out these benchmark results from Super PI:
Pentium 3 Coppermine 933MHz (discrete video card) - 2m17s
Pentium 3 Tualatin 800MHz (i830 graphics) - 2m35s
Pentium 3 Tualatin 1.33GHz (i830 graphics) - 2m20s
1.33GHz with lower latency 3-2-2 RAM - 2m06s
1.33GHz with screen mode set to 800x600 16-bit - 1m57s
Athlon XPs were also somewhat limited. Although they used DDR, clock speeds eventually got up to 2.3GHz or so on a 200MHz bus, so relative to CPU speed the latency got to be even worse than on Pentium 3s (but with a 64-byte line size instead of 32).
Athlon 64s and Pentium Ms greatly reduced this bottleneck by lowering latency and improving cache hit rates, respectively.
The weird thing is that the MOVSD instruction has never been optimized enough for a simple block copy to achieve anything close to theoretical memory throughput. It's always limited by the CPU instead (and by read-before-write memory access patterns). I guess new CPUs have a fancier way to do block copies, although I have not tried it.
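For the curious, the "fancier way" on newer x86 CPUs is presumably the enhanced REP MOVSB (ERMSB) path, where microcode turns the byte-granular instruction into wide, cache-line-sized transfers. A minimal GCC/x86-64-specific sketch comparing it against plain memcpy is below; whether it actually wins depends heavily on the CPU generation and buffer size.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Block copy via "rep movsb" (x86-64, GCC/Clang inline asm). On CPUs with
     * ERMSB the microcode issues wide transfers internally, so this can match
     * or beat a hand-rolled MOVSD loop. */
    static void copy_rep_movsb(void *dst, const void *src, size_t n)
    {
        asm volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    }

    int main(void)
    {
        size_t n = 64u * 1024 * 1024;
        char *src = malloc(n), *dst = malloc(n);
        if (!src || !dst) return 1;
        memset(src, 0xA5, n);

        copy_rep_movsb(dst, src, n);   /* the "fancy" microcoded path   */
        memcpy(dst, src, n);           /* libc's hand-tuned block copy  */

        printf("copied %zu bytes, last byte 0x%02X\n", n, (unsigned char)dst[n - 1]);
        free(src);
        free(dst);
        return 0;
    }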
(Score: 3, Interesting) by sjames on Monday May 01 2017, @03:21PM
You can, just as soon as the lumbering behemoths get around to producing CPUs that support it. It makes sense for Hynix to target GPUs first as they're more likely to adopt it faster.
Beyond that though, for a new memory tech, GPUs used for actual graphics tend to be far more tolerant of single-bit errors than CPUs. This gives a little time to tune the manufacturing and get the kinks out.
(Score: 2) by kaszz on Monday May 01 2017, @04:55PM
The problem may be related to access patterns. If the CPU's access pattern is random enough, then a memory that can only deliver long sequential accesses quickly will not improve anything.
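A crude illustration of that point: summing an array in order lets the memory system stream and prefetch, while chasing a randomly shuffled permutation makes every load wait out the full memory latency, so raw sequential bandwidth stops mattering. The sketch below sets up the two access patterns; timing code is left out for brevity, and rand() is crude but good enough here.

    #include <stdio.h>
    #include <stdlib.h>

    #define N (1u << 24)   /* 16M elements, far larger than any CPU cache */

    int main(void)
    {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        /* Build a random single-cycle permutation (Sattolo's algorithm): each
         * element stores the index of the next element to visit. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;                 /* j < i */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        /* Sequential walk: hardware prefetchers can stream this near peak bandwidth. */
        size_t seq_sum = 0;
        for (size_t i = 0; i < N; i++) seq_sum += next[i];

        /* Dependent random walk: every load waits on the previous one's latency. */
        size_t pos = 0, hops = 0;
        do { pos = next[pos]; hops++; } while (pos != 0);

        printf("sequential sum %zu, random walk hops %zu\n", seq_sum, hops);
        free(next);
        return 0;
    }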
(Score: 2) by boltronics on Monday May 01 2017, @01:15PM (1 child)
Will be interesting to compare with HBM2, or even HBM1 for that matter (as found on the Fury X). My basic understanding was that GDDR5X is slower than HBM1, but it was going to be too costly for Nvidia to re-architect for HBM1 so they went with GDDR5X anyway since it was similar to the older GDDR5?
No surprise then that the RX400 and RX500 series only use GDDR5, given their mid-range, high-value aims, but perhaps the upcoming high-end Vega range will be using HBM2? We shouldn't have to wait long to find out.
(Score: 4, Interesting) by takyon on Monday May 01 2017, @02:21PM
NVIDIA did skip HBM1, which is not nearly as capable as HBM2. I hear that HBM2 (which should be HBM3 before long) is still more expensive than GDDR5X or GDDR6. GDDR6 enables rather high memory bandwidth, so the benefits of HBM aren't necessarily as apparent. HBM however does take up less die space and has lower power consumption.
Vega will have HBM. [google.com]
The rumor mill (just noticed) is explicitly saying that NVIDIA Volta will use GDDR6 [wccftech.com]. Compare to the AnandTech article in the summary which just says "SK Hynix is not disclosing the name of its partner among GPU developers".
(Score: 2) by kaszz on Monday May 01 2017, @04:52PM (1 child)
The GDDR5X [micron.com] interface seems to rely on a PLL clock, selectable data bus inversion and a quad data rate interface for rates up to 12 Gbit/s using single-ended mode. So how does GDDR6 accomplish 16 Gbit/s?
(btw, good schematic on page 13)
Another observation is that GDDR5X is designed to operate at 1.35 V. That is a voltage level with a very thin noise margin. And..
Note: The operating range and AC timings of a faster speed bin are a superset of all slower speed bins. Therefore it is safe to use a faster bin device as a drop-in replacement of a slower bin device when operated within the frequency range of the slower bin device.
(page 3)
DRAM already in your machine may actually be faster than its marking tells you.
The circuit board impedance matching, length matching, reflection avoidance, shielding etc.. pain in the %#¤.
(Score: 2) by takyon on Monday May 01 2017, @10:40PM
FTA:
Therefore voltage for GDDR6 will likely fall somewhere between 1.2 V and 1.4 V.