Submitted via IRC for TheMightyBuzzard
Ampere, a new chip company run by former Intel president Renee James, came out of stealth today with a brand-new highly efficient Arm-based server chip targeted at hyperscale data centers.
The company's first chip is a custom core Armv8-A 64-bit server operating at up to 3.3 GHz with 1TB of memory at a power envelope of 125 watts. Although James was not ready to share pricing, she promised that the chip would offer unsurpassed price/performance that would exceed any high performance computing chip out there.
The company has a couple of other products in the works as well, which it will unveil in the future.
Source: TechCrunch
(Score: 3, Interesting) by takyon on Wednesday February 07 2018, @06:52PM (6 children)
What's the core count? How does it beat Xeons, Epyc, and other ARM server chips (such as the 48-core Qualcomm Centriq)?
The story also implies that this unreleased ARM chip is affected by Spectre:
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by DannyB on Wednesday February 07 2018, @08:27PM (4 children)
It doesn't necessarily have to beat Xeons. It just needs to have reasonable performance, and a lower cost to achieve the same level of performance as the Xeon. The foregoing statement only holds true for applications where you have LOTS of chips [1].
It probably does need to beat Centriq, or come close. Again, cost matters: initial capital, operating cost, and floor space per square meter.
[1] E.g., not a desktop, where it simply is not acceptable to replace one chip with, say, two chips, or 8 cores with 16 cores. Single-thread performance matters there. It matters less for large cluster applications where you simply tell Kubernetes to add a few hundred extra nodes to service your workload.
The lower I set my standards the more accomplishments I have.
(Score: 2) by TheRaven on Thursday February 08 2018, @09:49AM (3 children)
sudo mod me up
(Score: 2) by DannyB on Thursday February 08 2018, @03:11PM
Interesting idea. Never thought of that.
Maybe have one socket (with one or more cores) per security domain.
The idea being that even if you can use Spectre / Meltdown to peek at kernel memory, you can only learn secrets related to the security domain your attack code is executing in. On a multi-tenant cloud system, you can't learn secrets about other customers. Or on a Google-like system, you might have successfully attacked, say, Blogger nodes, but you wouldn't ever see processes from, say, YouTube or Gmail, to contrive an example.
The lower I set my standards the more accomplishments I have.
(Score: 2) by frojack on Thursday February 08 2018, @09:20PM (1 child)
It might be easier to stop speculative execution by simply not building it into the processor in the first place.
I'd like to see what percentage of typical job time is saved by speculative execution.
If it were all that great why not build that functionality into the compilers, and spend an extra two minutes in optimization at compile time and avoid the risk?
If it's so significant, just figure out how much faster the clock speed needs to be to make up for it.
No, you are mistaken. I've always had this sig.
(Score: 2) by TheRaven on Monday February 12 2018, @12:35PM
On a modern Intel chip, you have up to around 180 instructions in flight at a time. The typical heuristic is that you have, on average, a branch every 7 instructions. Every instruction between the branch being issued and the instruction before it that provides the branch condition reaching writeback is speculative. This means that, on average, around 96% of your instructions are speculatively executed.
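A rough sketch of the arithmetic behind that 96% figure. The model is an assumption on my part, not something stated above: only the oldest instructions in the window, up to the first unresolved branch, count as non-speculative. The function name is made up for illustration.

```python
def speculative_fraction(window: int, branch_interval: int) -> float:
    """Fraction of in-flight instructions behind an unresolved branch.

    Assumed model: the oldest `branch_interval` instructions in the
    window precede the first unresolved branch and are non-speculative;
    everything younger is speculative.
    """
    if window <= 0 or branch_interval <= 0:
        raise ValueError("window and branch_interval must be positive")
    non_speculative = min(window, branch_interval)
    return (window - non_speculative) / window


if __name__ == "__main__":
    # 180 instructions in flight, one branch every 7 instructions
    print(f"{speculative_fraction(180, 7):.1%}")
```

With 180 in flight and a branch every 7 instructions this gives 173/180, which rounds to the 96% quoted above.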
On simpler pipelines, the number is a lot lower. A simple 7-stage in-order pipeline is only speculatively executing around 50% of its instructions. So, if you disable speculative execution entirely then you'll take a 50% performance hit on simple (read: slow) pipelines or around a 96% performance hit on high-end pipelines in the worst case. It isn't quite that bad in the average case, because (as these vulnerabilities showed) speculative execution isn't perfect, so you won't see a difference between not doing speculation and the cases where you'd see incorrect speculation. I'd expect that on a simple in-order core you'd only see around a 30% performance hit and on a high-end Intel core around an 80% hit.
That said, we only do speculative execution because most code is written in languages like C that don't provide enough high-level parallelism to keep a CPU busy. If you were to design a CPU to run a language with an abstract machine like Erlang, then you could get away without speculative execution by running instructions from another thread.
If the compiler could statically determine branch targets, then it wouldn't bother inserting branches. You can do the classical GPU approach and execute both branches and then discard the results that you don't want, but then you end up seeing performance drop by 50% for each conditional branch.
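To make the "execute both branches and discard" approach concrete, here is a minimal sketch. It assumes both paths cost the same and uses an integer mask in place of a hardware select; `predicated_abs` and `useful_fraction` are illustrative names, not anything a real GPU toolchain exposes.

```python
def predicated_abs(x: int) -> int:
    """Compute |x| by evaluating both paths and masking out one."""
    negative = int(x < 0)   # 1 if the "then" path is wanted, else 0
    neg_path = -x           # result of the taken-if-negative path
    pos_path = x            # result of the other path
    # Both paths were computed; the mask keeps one and discards the other.
    return negative * neg_path + (1 - negative) * pos_path


def useful_fraction(nested_branches: int) -> float:
    """Share of executed work that is kept, assuming equal-cost paths."""
    if nested_branches < 0:
        raise ValueError("nested_branches must be non-negative")
    return 0.5 ** nested_branches


if __name__ == "__main__":
    print(predicated_abs(-3), predicated_abs(4))
    print(useful_fraction(3))
```

Under that equal-cost assumption, each extra level of nesting halves the useful work again, which is where the 50% per conditional branch comes from.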
The clock would have to be faster than you can build, and a lot faster than you can cool (I forget the exact relationship between power consumption and clock rate, it's either square or cube - this is why hardly anything runs at over 2GHz). For a modern Intel chip to reach the same performance without speculative execution, you'd need to go around 10-20GHz, which no one has come close to being able to build (at least, not in anything that didn't run briefly with liquid nitrogen poured on it before burning out).
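A back-of-the-envelope sketch of that estimate. It takes the roughly 80% hit for a high-end core from earlier in this comment; the 2 and 4 GHz baseline clocks are my own assumption, and the power exponent is tried as both 2 and 3 since I'm not sure which applies. The function names are made up for illustration.

```python
def required_clock_ghz(base_ghz: float, slowdown_fraction: float) -> float:
    """Clock needed to recover throughput after losing `slowdown_fraction`."""
    if not 0.0 <= slowdown_fraction < 1.0:
        raise ValueError("slowdown_fraction must be in [0, 1)")
    return base_ghz / (1.0 - slowdown_fraction)


def power_multiplier(clock_ratio: float, exponent: float) -> float:
    """Relative power if power scales as clock_ratio ** exponent."""
    return clock_ratio ** exponent


if __name__ == "__main__":
    hit = 0.80                      # ~80% hit on a high-end core
    for base in (2.0, 4.0):         # assumed baseline clocks in GHz
        print(f"{base} GHz -> {required_clock_ghz(base, hit):.0f} GHz")
    ratio = 1.0 / (1.0 - hit)       # about a 5x faster clock
    for exponent in (2, 3):         # square or cube, whichever it is
        print(f"power x{power_multiplier(ratio, exponent):.0f}")
```

An 80% hit means about a 5x faster clock, so a 2 to 4 GHz part lands in the 10 to 20 GHz range, at something like 25x (square) or 125x (cube) the power.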
sudo mod me up