Submitted via IRC for takyon
Ampere is launching two versions of its first ARM-based 64-bit server processor today in a challenge to Intel's dominance of data center chips.
Intel dominates about 99 percent of the server chip market with its x86-based processors, but Ampere is targeting power-efficient, high-performance, and high-memory capacity features with its Ampere eMAG processors for data centers.
Renee James, former president of Intel and CEO of Ampere, said in an interview with VentureBeat that customers can now order the chip from the company's website. The chips are aimed at hyperscale cloud and edge computing, using the ARMv8-A cores. The chips target big data and in-memory databases.
[...] Based on SPECint benchmark performance, Ampere's eMAG processor can deliver about twice the performance of the Intel Xeon Gold 6130 processor at about the same price, the company said. The eMAG with 32 cores at 3.3 GHz will sell for $850, and the version with 16 cores at 3.3 GHz will sell for $550.
[...] Ampere designed its own cores; the processor features eight DDR4-2667 memory controllers, 42 lanes of PCIe 3.0 for high-bandwidth I/O, and a 125W TDP for power efficiency, and is built on a 16-nanometer FinFET process at contract manufacturer TSMC.
Previously: Former Intel President Launches New Chip Company With Backing From Carlyle Group
(Score: 4, Insightful) by DannyB on Friday September 21 2018, @02:43PM
I've seen that future coming for over ten years and have been preparing for it. I've even advocated for that approach here on SN.
I happen to (mostly) work in Java. The first significant concurrent code I dabbled with was in Clojure (on the JVM). Clojure has a great concurrency story.
Next I used some of the newer Java frameworks that make it easy to use worker pools and fork/join. For certain operations I see substantial improvements in performance, as well as all the cores lighting up. As with anything that needs to perform well, you allocate all your structures up front, then run your performance-critical code, which (ideally) should not generate any garbage. The same lesson applies to other GC languages.
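A minimal sketch of that pattern: preallocate the buffers up front, then run the hot loop through a ForkJoinPool so every core lights up and the critical section allocates nothing per element. The class name, array size, and the squaring workload are my own illustrative choices, not anything specific the comment mentions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class WarmPath {
    // Squares in[] into the preallocated out[] in parallel.
    // The hot loop only writes into existing arrays, so it
    // creates no garbage for the GC to collect.
    static void squareAll(double[] in, double[] out, ForkJoinPool pool) throws Exception {
        pool.submit(() ->
            IntStream.range(0, in.length).parallel()
                     .forEach(i -> out[i] = in[i] * in[i])
        ).get();
    }

    public static void main(String[] args) throws Exception {
        // Allocate everything before the performance-critical section.
        final double[] in  = new double[1 << 20];
        final double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) in[i] = i;

        // A parallel stream submitted to a custom ForkJoinPool runs
        // its work in that pool rather than the common pool.
        ForkJoinPool pool = new ForkJoinPool();
        squareAll(in, out, pool);
        System.out.println(out[3]); // prints 9.0
        pool.shutdown();
    }
}
```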
Your JVM code hits three performance stages as it "warms up":
1. interpreted bytecode, until the dynamic profiler notices it is consuming significant CPU time
2. compiled to native by the C1 compiler (fast compiler, generates simple native code)
3. the previous step also queued your code to be recompiled shortly by the C2 compiler (slow compiler, generates highly optimized native code)
The native code from C2 is targeted at your specific hardware, e.g., whatever microprocessor you have, its instruction set extensions, etc. These are things an ahead-of-time compiler cannot do while remaining cross-platform to all processors.
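You can watch those stages yourself. If you run a tiny hot method like the sketch below with `java -XX:+PrintCompilation Warmup`, the compilation log shows the method first compiled at a C1 tier (levels 1-3) and later recompiled at level 4 by C2. The class name, workload, and iteration count here are arbitrary choices of mine, just enough to cross the typical JIT thresholds.

```java
public class Warmup {
    // A deliberately simple hot method: sums 0..n-1.
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call the method enough times to trigger C1 compilation,
        // then C2 recompilation of the now-hot code.
        for (int i = 0; i < 200_000; i++) total += sum(100);
        System.out.println(total);
    }
}
```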
Every performance optimization is a great weight lifted from my shoulders.