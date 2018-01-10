from the doesn't-raid-fix-this? dept.
Arthur T Knackerbracket has found the following story:
In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.
"We've seen at least two serious bugs in Intel CPUs in the last quarter, and it's almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we've moved past that."
Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase.
This month's Meltdown and Spectre security flaws that affect chip designs from AMD, Arm, and Intel to varying degrees support that claim. But there are many other examples.
(Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @05:18PM (2 children)
This kind of Microsoft spam is not needed on the main page. Save that shite for personal journals. We really don't gain anything from hearing from Microsoft's marketeers unless they are paying for product placement. But I don't see that anywhere in the summary. So until then leave out the spam. If the topic is based in reality then it will have been covered already elsewhere. Cite those sources instead.
(Score: 0, Insightful) by Anonymous Coward on Wednesday January 31, @06:02PM
(Score: 4, Insightful) by MrGuy on Wednesday January 31, @06:07PM
Your premise is mistaken. TFA is an article on The Register. The Register article begins with a two-sentence quote from a Microsoft engineer, and a one-sentence summary of his point, but that's it. The rest is original reporting.
I'd object to this article because it's so darn elementary - yes, chips can have bugs, and Spectre/Meltdown aren't the only chip bugs out there. The article is a few quotes sprinkled with a list of a few recent flaws. But there's no interesting analysis. It's basically "here are some recent bugs." It would be awesome to have an article making a case WHY bugs might be more frequent now than in the past - other than the quote from Luu, the article offers no real support for that position. This article seems like it was written by someone who doesn't really understand the subject and has nothing really to say (some would argue that's hardly unique on El Reg) - when your best argument is a three-year-old blog post from someone ELSE who might know what they're talking about, you should be asking why this article matters...
(Score: 1, Interesting) by Anonymous Coward on Wednesday January 31, @05:27PM (12 children)
This is true. I did some work on x64 (Itanium) and later the Pentium D. Most of the problems in our labs were due to faulty hard disks (the spinning kind). I headed Linux validation for servers around 2000--hopefully that helped?
(Score: 0) by Anonymous Coward on Wednesday January 31, @05:42PM (4 children)
x64 is marketing speak for x86-64. Itanic was a whole different disaster.
(Score: 0) by Anonymous Coward on Wednesday January 31, @06:02PM (1 child)
Yep, I meant ia64. God sucks as a validation engineer for my brain.
(Score: 1, Funny) by Anonymous Coward on Wednesday January 31, @08:00PM
He put those bugs in there so that he could exploit them later.
(Score: 1, Interesting) by Anonymous Coward on Wednesday January 31, @06:41PM (1 child)
x86-64 was AMD's original name
x86_64 was chosen by Linux people, based on the above
IA-32e was Intel's fucked-up name for it, since IA-64 was already taken by Itanium
AMD64 was AMD's response to Intel's attempt to claim the architecture as IA-32e
x64 was Microsoft's attempt to pick something simple and neutral
(Score: 0) by Anonymous Coward on Wednesday January 31, @09:40PM
Fuck this. ARM64 for everything.
(Score: 2) by JoeMerchant on Wednesday January 31, @06:51PM (6 children)
Race to the bottom? I mean, can we at least have a reasonable option for a validated processor that works, and works correctly, instead of one that runs 10% faster but has bugs? Put another way, if there were 2 notebook PCs at NewEgg, identical in every way except that one had 2.4GFlops effective throughput on a typical task load - with 99.999% validated design, and another with 1.8GFlops performance on the same test, but with 99.99999% validated design - isn't there a market for the more reliable machine?
(Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @06:57PM (2 children)
It's nowhere near that simple. They paid for a lot more expensive people (like me) for Xeon and Itanium validation than consumer stuff. Try ECC I guess? Don't overclock (and especially don't over-volt) your stuff! I ran a lab at Intel that did high temperature, high voltage stress tests on consumer (Pentium D), and we saw lots of errors. They basically died over a few months.
(Score: 3, Insightful) by JoeMerchant on Wednesday January 31, @10:09PM (1 child)
Well, on the one hand, you (and I) are "expensive," but when that cost is spread out over millions of copies it's not nearly as much, and I guess what worries me the most is the dismantlement of the validation program, because those things are a lot harder to set up than they are to keep running.
(Score: 0) by Anonymous Coward on Thursday February 01, @01:27AM
Intel has beefed up validation after various issues--we didn't lack for money in the department. You mention spreading cost out--that's why server chips are so expensive. You have expensive people like me validating chip designs that are sold in smaller quantities than the latest Android.
(Score: 2) by MostCynical on Wednesday January 31, @09:57PM
what the market needs is a new certification:
"This chip has been validated by the NSA"
Or, for pcs and laptops, just a nice "NSA Certified" sticker.
(Score: tau, Irrational)
(Score: 1) by khallow on Thursday February 01, @02:35AM (1 child)
How would validation catch the Spectre [wikipedia.org] bug? It's derived from subtle observation of memory caching and timing delays of the cache queues. Can't validate what you don't know you need to validate. Even if the CPU manufacturers fully fix this one, how will we validate all possible interactions of the internal components of the CPU?
(Score: 2) by JoeMerchant on Thursday February 01, @04:03AM
In our industry we have a fancy acronym that means: get a bunch of people who know something about the issues, force them to sit in a room and seriously consider them at least long enough to write a report and file it. Lately, there's a lot of handwringing around cybersecurity, and I'm constantly pinged by the junior guys who get worried about X, Y, or Z - and 9 times out of 10 it's nothing, but once in a while they bring up a good point, and some of those good points are things like Spectre - things nobody had considered before. Our development process on a single product goes on for a couple of years, the process calls for these cybersecurity design reviews periodically throughout those years, and over that time people do actually come up with this stuff. So, our reports analyze X, Y, and Z, and either write them off as adequately handled, or shut down the project until they are.
The real problem is culture - like the Shuttle launch culture that couldn't be stopped for handwringing over ice in the O-rings, or a big corporate culture that doesn't want to pay its own engineers to discover vulnerabilities in the product early enough to fix them before the rest of the world.
I just gave a mini-speech today that included: "it needs to be tested, if we don't test it our customers will."
No, you can't - but, as world-leading experts in the field you should be able to figure out most of the things you need to validate before the world figures them out for you. In the case of processors that serve separate users partitioned by hypervisor, the industry could have thought of (and likely did think of) this exploit before the hacker community. As soon as they thought of it, they should have fed (and likely did not feed) that knowledge back into the design process to work out effective fixes for the next generation of processors.
(Score: 0) by Anonymous Coward on Wednesday January 31, @05:37PM (6 children)
It seems to be mostly an intel issue at this point. I really never had any opinion on cpus/gpus either way but seeing the recent PR attempt to muddy the waters has turned me anti-intel. Maybe people who matter to them aren't thinking the same, but it seems like a dangerous strategy. They are clearly not to be trusted.
(Score: 2) by MrGuy on Wednesday January 31, @06:21PM (2 children)
Citation needed.
First of all, Spectre and Meltdown are different. You can read details here [meltdownattack.com]
Spectre is a flaw where "speculative execution" can leak information (this is where a processor executes a branch of code that MIGHT be needed, but only in theory stores the result if it matters). The problem with speculative execution is that it's not checked whether a given command SHOULD be executed (for example, if the program has the right access level to execute the code). However, this security issue wasn't seen as a problem, because (in theory) the result of the speculatively executed code would be thrown away if it couldn't be used. So, it might be a mechanism to let untrusted code access core kernel memory (which is Very Bad), but it was thought to be acceptable because nobody could see the result. The problem is that CPU caching could "leak" those results and be visible to other code.
Spectre affects pretty much ALL manufacturers' chips - the official paper [spectreattack.com] explicitly references Intel, AMD, and ARM architectures as being affected.
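For anyone who wants to see the shape of it, here is a minimal sketch of the kind of bounds-checked read that Spectre variant 1 targets. It is not an exploit, since there is no timing measurement in it, and the names (array1, array2, victim_function) and the 512-byte stride are illustrative assumptions rather than anything taken from TFA.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data: array1 is what the program is allowed to read. */
static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t array1_size = 16;

/* Probe array: one widely spaced slot per possible byte value, so each
   value of array1[x] touches a different cache line. */
static uint8_t array2[256 * 512];

/* Architecturally, an out-of-bounds x never gets past the check and the
   function returns 0. The concern is that a mispredicted branch may let
   the CPU run the body speculatively anyway; the result is discarded,
   but the array2 line it touched can stay in the cache. */
uint8_t victim_function(size_t x) {
    uint8_t y = 0;
    if (x < array1_size) {
        y = array2[array1[x] * 512];
    }
    return y;
}
```

The check itself is correct C. The leak, as far as I understand it, comes from which cache line got warmed, which other code can then infer by timing its own reads.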
Meltdown is different - it's a side-channel attack on kernel memory that relies on using the side effects of certain legal, carefully crafted code and information about the location and layout of memory to "leak" information, including kernel memory. Meltdown does not depend on branch prediction the way Spectre does; it relies on the out-of-order execution of instructions that follow a read the program isn't allowed to make.
The proof of concept attack for Meltdown detailed officially [meltdownattack.com] only works against Intel hardware, but the paper specifically cautions that there's no reason to expect that AMD wouldn't be susceptible to a similar attack.
(Score: 0) by Anonymous Coward on Wednesday January 31, @07:05PM
All people really care about is meltdown since patching for spectre seems to have minimal impact on performance. It is to the point where meltdown mitigations are being needlessly enabled for amd processors just to not make intel look so bad[1]. AMD says:
https://www.amd.com/en/corporate/speculative-execution [amd.com]
I had no preference either way until I investigated this topic and saw what looks like a massive shady pro-intel propaganda campaign.
[1] https://www.phoronix.com/scan.php?page=article&item=linux-retpoline-benchmarks&num=1 [phoronix.com]
(Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @09:12PM
No, Meltdown is not applicable to AMD processors. AMD has already stated they do bounds checking when userland asks to read kernel memory to prevent this sort of thing. Something Intel inexplicably didn't think of or totally screwed up.
Also, there is a "near zero" chance that Spectre variant 2 can be exploited on AMD processors. It sounds like both AMD and Intel are equally impacted regarding variant 1. Spectre is far more difficult to take advantage of in general.
So yes, this is primarily an Intel problem.
(Score: 3, Informative) by HiThere on Wednesday January 31, @07:23PM (1 child)
Depends.
Meltdown, the currently known dangerous one, is definitely Intel and possibly a few other Intel designed chips.
Spectre, the one that is *relatively* harmless, so far, is present in both Intel and AMD... except a few really low-end models.
Meltdown has currently known exploits that can work through the browser if you allow Javascript. It also has several other exploit modes.
Spectre doesn't *yet* have any known useful exploits. But it almost certainly will.
P.S.: I'm not an expert here, there are several classes of Spectre, and I can't distinguish between them. If you're interested there's lots of info on the web, but unless you're working in the field distinguishing between them doesn't seem useful to me.
Put not your faith in princes.
(Score: 0) by Anonymous Coward on Wednesday January 31, @07:28PM
The reason to distinguish between them for the average person is the performance impact of the mitigation. Everyone expects a constant stream of bugs/vulns these days anyway, but not that patching for them will slow everything down to half speed or whatever. That is where intel has the main problem (according to what I've read).
(Score: 2) by Reziac on Thursday February 01, @03:52AM
Back when I was keeping track, and when both released Errata (functionally, the list of known bugs), AMD's errata list was generally about 3 times as long as Intel's. AMD dealt with this by not releasing any more errata lists.
(Score: 3, Informative) by takyon on Wednesday January 31, @06:58PM
http://www.zdnet.com/article/amd-vs-spectre-our-new-zen-2-chips-will-be-protected-says-ceo/ [zdnet.com]
https://wccftech.com/amd-zen-2-cpus-fix-spectre-exploit/ [wccftech.com]
The 12nm Zen+ is coming out this year, 7nm Zen 2 coming out next year presumably. Some were predicting that Spectre would be lingering in upcoming chip generations since it can just be addressed with a patch, but that's mostly not the case.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 3, Insightful) by maxwell demon on Wednesday January 31, @08:03PM (3 children)
Those reasons are excusable.
This one isn't.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 0) by Anonymous Coward on Wednesday January 31, @08:28PM
I'd add to that that the former two are only an issue because of the third.
(Score: 2, Interesting) by shrewdsheep on Wednesday January 31, @09:40PM (1 child)
In defence of the chip designers, cycle-by-cycle emulation of new chip designs is becoming more and more difficult due to complexity. Emulating just a single boot-up can take weeks.
(Score: 2) by maxwell demon on Wednesday January 31, @09:59PM
That is an argument for the same effort being less effective. It is not an argument for reducing the effort. Quite the opposite!
The Tao of math: The numbers you can count are not the real numbers.
(Score: 3, Informative) by requerdanos on Wednesday January 31, @11:43PM
Intel is taking a lot of heat lately, but all the first-run Ryzen processors from AMD have a bug that causes random segfaults, especially when compiling under linux (a not uncommon occurrence if one likes to linux).
Here is an actual tech support letter I received from AMD. Some identifying information has been changed or obscured, otherwise it's 100% as I received it.
That's right, their answer was basically "pics or it didn't happen." I am working to comply with their request. Also, they sent me this before linux 4.15 was released, wanting to know what temperature was reported--and 4.15 is the first kernel version to feature Ryzen CPU temperature reporting.
(Score: 2) by MichaelDavidCrawford on Thursday February 01, @12:04AM
I thought my client's source resulted in bad machine code. I did not at first know the AMD64 ABI so I read up on it.
The ABI has this mad as a cut snake feature, the "red zone", in which subroutines can use up to 128 bytes of stack below the stack pointer without adjusting the stack pointer.
But that same doc informed me that the red zone had to be disabled for Linux kernel code.
I puzzled over how to do that in Mac OS X then discovered Xcode's kernel mode setting. Just enable kernel mode and the AMD64 madness happily goes away.
Whoever set up that Xcode project clearly was not a kernel or driver developer.
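A rough sketch of what the red zone means in practice, assuming GCC or Clang on x86-64. The function name and the file name leaf.c are made up for illustration, and whether the compiler actually places the locals in the red zone is its own decision.

```c
#include <stdint.h>

/* A leaf function: it calls nothing else, so under the System V AMD64
   ABI the compiler may keep `scratch` in the 128 bytes below the stack
   pointer without ever adjusting %rsp. */
uint32_t sum_bytes(const uint8_t *p) {
    uint32_t scratch[4] = {p[0], p[1], p[2], p[3]};
    return scratch[0] + scratch[1] + scratch[2] + scratch[3];
}

/* Userland build:      cc -O2 -c leaf.c
   Kernel-style build:  cc -O2 -mno-red-zone -c leaf.c
   With -mno-red-zone the compiler must move the stack pointer before
   using stack space, so nothing live sits below it. */
```

The usual reason kernels turn it off is that an interrupt can push its frame onto the current stack, right over whatever the interrupted function had parked below the stack pointer.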
127.0.0.1 www.hosted-pixel.com # I Am Absolutely Serious
