I was looking at CPUs the other day and noticed that there is an AMD Ryzen 7 5800X and a Ryzen 7 5800X3D. The 3D version has three times the level-three cache (96MB versus 32MB) of the non-3D version, but runs several hundred megahertz slower. cpubenchmark.net has almost identical CPU Mark scores for the two, but some articles I've seen say that the 3D version has far better performance for gaming workloads.
Are there any other workloads the 3D version is better at? Does this tell us anything about the way our software is written in particular, or does it just tell us that games are not particularly good at exploiting more cores?
There must come a point of diminishing returns where adding more cache makes little difference, but it is interesting that tripling the cache makes a significant difference.
(Score: 1, Insightful) by Anonymous Coward on Monday April 10, @05:57PM (4 children)
> Are there any other workloads the 3D version is better at?
Anything without heavy task switching and resulting cache flushes?
(Score: 2) by turgid on Monday April 10, @06:11PM (2 children)
I was hoping someone might have some more insight into real workloads that benefit from the bigger cache, and preferably some measurements. Memory latency is a big problem, apparently, which is why going to more cores and threads is generally a win. 96MB is a lot of cache. My first PeeCee only had 32MB RAM (big at the time) and it didn't even use any swap for the first four months I had it.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 1, Interesting) by Anonymous Coward on Wednesday April 12, @09:08AM (1 child)
Even with the bloat that comes with modern systems, you can easily get an entire OS and userspace in that cache. I wonder at what point core count and NUMA concerns will come into play and we will have to start putting L4 cache on things. When that day comes, I can't wait for people to announce that they are running L4 in their L4.
(Score: 3, Interesting) by takyon on Wednesday April 12, @02:44PM
This just dropped:
https://www.tomshardware.com/news/intel-14th-gen-meteor-lake-cpus-may-embrace-an-l4-cache [tomshardware.com]
Nothing solid yet, but I want to see an L4 arms race as soon as possible.
Intel should also put its mobile silicon (with larger graphics) on the desktop socket to compete with AMD's 'G' APUs.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 1, Insightful) by Anonymous Coward on Wednesday April 12, @09:02AM
Depending on the exact memory state, it could actually be the opposite. The three key differences are that the 3D version has lower clocks, lower memory bandwidth, and a much larger L3 cache. Those specs favor any situation where the working set sizes are such that locality of reference within the memory hierarchy matters more than instruction execution speed.
There are a number of places this can come into play. The example in the summary is one, because the state of a AAA game can be so large that even though the faster processor gets things done quicker, it is constantly stuck waiting for memory. Other areas that can benefit from the locality of frequently used data are index operations and other workloads with a large shared or working set.
Then there are the ones in the middle. The 3D version will run embarrassingly parallel tasks much faster when their working sets are close to the "perfect" size, but it will be much slower on loads that are closer to the wrong size. Heavy task switching and workloads with frequent cache flushes may also perform faster, because the larger L3 cache can keep more of the working set in cache or because other tasks do not have to wait for the flush to higher parts of the hierarchy. But if the workload doesn't meet those requirements, it will be slower.
And, of course, there are places where it will be slower. Stream and pipeline processing is one, because memory bandwidth can easily become the limiting factor. Complex calculations on large or small data can be slower because instruction execution speed can easily become the limiting factor. Another is coprocessor-limited or large working set processes. Again, it all depends on whether the locality of reference and working set sizes of the tasks benefit more from extra cache than from raw speed.
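To put rough numbers on that trade-off, here is a toy model of time per instruction as compute time plus time stalled on L3 misses. It is only a sketch: the clock speeds, hit rates, miss penalty, and the function name `ns_per_instruction` are all assumptions picked for illustration, not measurements of either chip.

```python
# Toy model: time per instruction = compute time + time stalled on L3 misses.
# Every number here is an illustrative assumption, not a measurement.

def ns_per_instruction(clock_ghz, l3_hit_rate,
                       cpi_compute=0.5,             # assumed cycles/instruction with no L3 misses
                       l3_accesses_per_instr=0.02,  # assumed share of instructions that reach L3
                       dram_penalty_ns=70.0):       # assumed extra latency per L3 miss
    compute_ns = cpi_compute / clock_ghz
    stall_ns = l3_accesses_per_instr * (1.0 - l3_hit_rate) * dram_penalty_ns
    return compute_ns + stall_ns

if __name__ == "__main__":
    # (label, hit rate with the small L3, hit rate with the large L3) -- all assumed
    scenarios = [
        ("fits in both caches", 0.99, 0.99),
        ("fits only in the large cache", 0.60, 0.95),
        ("too big for either cache", 0.10, 0.10),
    ]
    for label, small_hit, large_hit in scenarios:
        fast_small = ns_per_instruction(4.7, small_hit)  # higher clock, smaller L3
        slow_large = ns_per_instruction(4.5, large_hit)  # lower clock, larger L3
        print(f"{label}: {fast_small:.3f} ns vs {slow_large:.3f} ns")
```

In this model the higher-clocked part wins whenever both hit rates are the same, and the big-cache part only pulls ahead in the middle case where the working set fits in one L3 but not the other.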
(Score: 3, Informative) by takyon on Monday April 10, @06:20PM (2 children)
If you want to find non-gaming workloads that benefit from 3D cache, there's one place to go: Phoronix.
AMD Ryzen 7 5800X3D On Linux: Not For Gaming, But Very Exciting For Other Workloads [phoronix.com]
If you look at an AnandTech review of the 5800X3D or 7800X3D, it seems like there are no workloads that benefit from the cache vs. slightly higher clock speeds. Michael Larabel is obviously more thorough when it comes to non-gaming tests.
In some cases the differences are minor or the 5800X wins, and in some cases the 5800X3D is more than twice as fast.
In retrospect, it's obvious that there would be non-gaming workloads that benefit from tripled cache, because AMD makes Epyc chips with 3D cache on every core chiplet: Milan-X, soon Genoa-X [techpowerup.com].
In gaming, the tripling of cache was said to bring a +15% average performance increase. Possibly closer to 10% depending on the review you look at. It's easy to cherry pick one way or another when reviewing it because the results are all over the place.
Some of these games are using cache and/or cores inefficiently and can get an absolutely huge benefit. Some games just inherently benefit from having as much fast cache as possible to reduce DRAM accesses, notably simulation games like Factorio and Dwarf Fortress.
It's possible to pair slower RAM with the 5800X3D with less of a penalty, because the extra cache reduces the necessary memory bandwidth.
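A back-of-the-envelope sketch of why that works: each L3 miss pulls one cache line from DRAM, so memory traffic scales with the miss rate. The lookup rate and hit rates below are made-up numbers and `dram_gb_per_s` is a hypothetical helper; the 64-byte line size is the usual x86 value.

```python
# Rough DRAM traffic estimate: every L3 miss fetches one cache line from memory.
# The lookup rate and hit rates are illustrative assumptions, not measurements.

CACHE_LINE_BYTES = 64  # typical x86 cache line size

def dram_gb_per_s(l3_lookups_per_s, l3_hit_rate, line_bytes=CACHE_LINE_BYTES):
    misses_per_s = l3_lookups_per_s * (1.0 - l3_hit_rate)
    return misses_per_s * line_bytes / 1e9

if __name__ == "__main__":
    lookups = 500e6  # assumed L3 lookups per second for some workload
    for label, hit_rate in [("smaller L3", 0.70), ("larger L3", 0.90)]:
        print(f"{label}: about {dram_gb_per_s(lookups, hit_rate):.1f} GB/s of DRAM traffic")
```

If the larger cache lifts the hit rate from 70% to 90%, the miss rate falls from 30% to 10%, so the same workload needs roughly a third of the DRAM bandwidth.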
On AMD's side, you usually get a set amount of cache per CCD (32 MiB). On Intel's side, the available cache gradually climbs as you go up the stack. For example:
https://en.wikipedia.org/wiki/Raptor_Lake [wikipedia.org]
13600K: 24 MiB L3
13700K: 30 MiB L3
13900K: 36 MiB L3
This is partially responsible for Intel's top chips outperforming lower-end ones, while with AMD a 7600X/7700X can often tie the 7950X.
(Score: 2) by turgid on Monday April 10, @06:56PM
I've been meaning to write some properly multi-threaded code for a number of years now. I suppose I'd better get around to it one of these days!
(Score: 2) by hendrikboom on Monday April 10, @08:47PM
Bigger cache would help almost any large Lisp program.
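Pointer-heavy programs are a plausible reason: following links through a heap is a chain of dependent loads, so once the structure outgrows the cache every hop waits on memory. Below is a minimal pointer-chasing sketch for anyone who wants to try measuring it. The function names are my own, and interpreter overhead in Python will hide much of the effect, so treat it as a template for a C or Rust port rather than a real benchmark.

```python
import random
import time

def single_cycle(n, seed=0):
    """Sattolo's algorithm: a random permutation of range(n) forming one big cycle."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, so an element never maps to itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps):
    """Follow the 'next' links; each hop depends on the result of the previous one."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i

def ns_per_hop(n, steps=200_000):
    """Average wall-clock time per hop, in nanoseconds, for a table of n links."""
    nxt = single_cycle(n)
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps * 1e9

if __name__ == "__main__":
    # Table sizes chosen to span well below and well above typical L3 sizes.
    for n in (1_000, 100_000, 5_000_000):
        print(f"{n:>9} links: {ns_per_hop(n):.1f} ns per hop")
```

The single-cycle permutation matters: it guarantees the walk touches every element in an unpredictable order, so the prefetcher cannot help and the whole table counts as working set.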