AMD released Threadripper CPUs in 2017, built on the same 14nm Zen architecture as Ryzen, but with up to 16 cores and 32 threads. Threadripper was widely believed to have pushed Intel to respond with the release of enthusiast-class Skylake-X chips with up to 18 cores. AMD also released Epyc-branded server chips with up to 32 cores.
This week at Computex 2018, Intel showed off a 28-core CPU intended for enthusiasts and high end desktop users. While the part was overclocked to 5 GHz, it required a one-horsepower water chiller to do so. The demonstration seemed to be timed to steal the thunder from AMD's own news.
Now, AMD has announced two Threadripper 2 CPUs: one with 24 cores, and another with 32 cores. They use the "12nm LP" GlobalFoundries process instead of "14nm", which could improve performance, but are currently clocked lower than previous Threadripper parts. The TDP has been pushed up to 250 W from the 180 W TDP of Threadripper 1950X. Although these new chips match the core counts of top Epyc CPUs, there are some differences:
At the AMD press event at Computex, it was revealed that these new processors would have up to 32 cores in total, mirroring the 32-core versions of EPYC. On EPYC, those processors have four active dies, with eight active cores on each die (four for each CCX). On EPYC however, there are eight memory channels, and AMD's X399 platform only has support for four channels. For the first generation this meant that each of the two active die would have two memory channels attached – in the second generation Threadripper this is still the case: the two now 'active' parts of the chip do not have direct memory access.
This also means that the number of PCIe lanes remains at 64 for Threadripper 2, rather than the 128 of Epyc.
Threadripper 1 had a "game mode" that disabled one of the two active dies, so it will be interesting to see if users of the new chips will be forced to disable even more cores in some scenarios.
(Score: 0) by Anonymous Coward on Friday June 08 2018, @03:07PM (1 child)
Why not use OpenCL to run it on a GPU?
(Score: 2) by DannyB on Monday June 11 2018, @02:27PM
So many things to do, so little time. I'm sure you know the story.
The project is written in Java. There are two (that I know of) projects for Java to support OpenCL in Java. I have looked into it. It is a higher bar to jump over. I might try it with a small project first. It's a matter of time and energy to do it. I'm interested in trying it.
I have to write a C kernel, and there are examples, and have that code available as a "string". (Eg, baked into the code, retrieved from a configuration file, database, etc.) I have to think about the problem very differently to organize it for OpenCL. It is a very different programming model than conventional CPUs. Basically OpenCL is parallelism at a far finer grained approach than the "work units" I described. The work done my my "work units", and thus the code, could be arbitrarily complex. As long as work units are all independent of one another. The very same code to do the work, works on a single cpu core, or multiple cores, if you have them. With OpenCL, I need to have two sets of code to maintain. The OpenCL version, and at least a single-core version for when OpenCL is not available on a given runtime. (Remember, my Java program, the binary, runs on any machine, even ones not invented yet.)
Thus, there is a philosophical issue. What I would rather see is More Cores Please. Conventional cores. Conventional architecture programming. It seems that if you had several hundred cores that were more general purpose rather than specialized for graphics, this would STILL benefits graphics. But in a much more general way.
Let me give an example of a problem that would require serious thinking for OpenCL. A Mandelbrot set explorer. My current Mandelbrot set explorer (in Java) uses arbitrary precision. Thus it does not "peter out" once you dive deep enough to exhaust the precision of a double (eg, 64 bit float). By allowing arbitrary precision math, you can dive deeper and deeper. A Mandelbrot explorer is another embarrassingly parallel problem. "Work units" could even be distributed out to other computers on a network. You just need to launch a JAR file on each node. (And those nodes don't even have to be the same CPU architecture or OS.) In a single kernel in OpenCL, I would need to be able to iterate X number of times on a pixel, using arbitrary precision math, within the bounds of how kernels work. Multiple parameters, each parameter being a buffer (an array) where different concurrent kernels operate on different elements in the parameter buffers.
It seems that with all the silicon we have now, maybe it's time to start building larger numbers of general purpose cores. This would much more rapidly produce benefits in far more every day applications than Open CL. IMO.
Don't put a mindless tool of corporations in the white house; vote ChatGPT for 2024!