from the starting-off-the-new-year-right dept.
Spotted over on HN:
The mysterious case of the Linux Page Table Isolation patches (archive)
tl;dr: there is presently an embargoed security bug impacting apparently all contemporary CPU architectures that implement virtual memory, requiring hardware changes to fully resolve. Urgent development of a software mitigation is being done in the open and recently landed in the Linux kernel, and a similar mitigation began appearing in NT kernels in November. In the worst case the software fix causes huge slowdowns in typical workloads. There are hints the attack impacts common virtualization environments including Amazon EC2 and Google Compute Engine, and additional hints the exact attack may involve a new variant of Rowhammer.
Turns out 2018 might be more interesting than first thought. So grab some popcorn and keep those systems patched!
UPDATE 2: (martyb)
This still-developing story is full of twists and turns. It seems that Intel chips are definitely implicated (AFAICT anything post Pentium Pro). There have been various reports, and denials, that AMD and ARM are also affected. There are actually two vulnerabilities being addressed. Reports are that a local user can access arbitrary kernel memory and that, separately, a process in a VM can access contents of other virtual machines on a host system. These discoveries were embargoed for release until January 9th, but were pre-empted when The Register first leaked news of the issues.
At this time, manufacturers are scrambling to make statements on their products' susceptibility. Expect a slew of releases of urgent security fixes for a variety of OSs, as well as mandatory reboots of VMs on cloud services such as Azure and AWS. Implications are that there is going to be a performance hit on most systems, which may have cascading follow-on effects for performance-dependent activities like DB servers.
To get started, see the very readable and clearly-written article at Ars Technica: What’s behind the Intel design flaw forcing numerous patches?.
Google Security Blog: Today's CPU vulnerability: what you need to know.
Google Project Zero: Reading privileged memory with a side-channel, which goes into detail as to what problems are being addressed as well as including CVEs:
Arthur T Knackerbracket has found the following story:
Qualcomm has confirmed its processors have the same security vulnerabilities disclosed this week in Intel, Arm and AMD CPU cores.
The California tech giant picked the favored Friday US West Coast afternoon "news dump" slot to admit at least some of its billions of Arm-compatible Snapdragon system-on-chips and newly released Centriq server-grade processors are subject to the Meltdown and/or Spectre data-theft bugs.
[...] Qualcomm declined to comment further on precisely which of the three CVE-listed vulnerabilities its chips were subject to, or give any details on which of its CPU models may be vulnerable. The paper describing the Spectre data-snooping attacks mentions that Qualcomm's CPUs are affected, while the Meltdown paper doesn't conclude either way.
[...] Apple, which too bases its iOS A-series processors on Arm's instruction set, said earlier this week that its mobile CPUs were vulnerable to Spectre and Meltdown – patches are available or incoming for iOS. The iGiant's Intel-based Macs also need the latest macOS, version 10.13.2 or greater, to kill off Meltdown attacks.
(Score: 4, Interesting) by jmorris on Tuesday January 02 2018, @03:31AM (7 children)
If this thing really does impact the cloud and the only mitigation imposes up to a 50% performance penalty, many interesting questions arise.
1. If this could be fixed with a microcode update it would be. And if it is really a Rowhammer attack, none of this is going to stop it, only mitigate. So even a CPU recall isn't likely to help. Perhaps new hardened memory modules, probably overvolted and slowed down? Big memory shortage going on currently. Hmm.
2. I'd wonder if this will cause a reevaluation of the wisdom of cramming VMs belonging to different entities on the same host but that would be silly. Of course not. If people were capable of thinking those thoughts they would have never done it in the first place.
3. I see a huge surge of new rackspace being populated and filled. [disclaimer]This is not investment advice, consider the risks of any investment strategy, etc.. [/disclaimer]
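A rough back-of-the-envelope check on point 3, assuming the mitigation cost behaves like a uniform fractional loss of throughput per host. The 50% figure is the worst case floated above, not a measurement, and `hosts_needed` is a hypothetical helper written for this sketch:

```python
import math

def hosts_needed(current_hosts: int, penalty: float) -> int:
    """Hosts required to keep the same total throughput after each host
    loses `penalty` (a fraction in [0, 1)) of its performance."""
    if not 0 <= penalty < 1:
        raise ValueError("penalty must be in [0, 1)")
    # Each host now delivers (1 - penalty) of its old throughput, so the
    # fleet has to grow by a factor of 1 / (1 - penalty).
    return math.ceil(current_hosts / (1 - penalty))

print(hosts_needed(1000, 0.50))  # worst case quoted in the comment
print(hosts_needed(1000, 0.05))  # a much milder penalty
```

Under that assumption a 50% penalty doubles the hardware needed for the same load, which is where the "surge of new rackspace" guess comes from.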
(Score: 2, Informative) by Anonymous Coward on Tuesday January 02 2018, @05:35AM (4 children)
JEDEC has already standardized a hardware mitigation for rowhammer. It is called Targeted Row Refresh and is currently an optional part of the LPDDR4 standard, although manufacturers have added it to other memory modules (which is technically a violation of the standard). Basically, the way it works is that a memory module specifies the maximum number of times a "row" or its neighbors can be accessed between refreshes. If the threshold is met, the row cannot be accessed again until after a refresh (which the module can choose to force at that point).
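The counting scheme described above can be sketched as a toy model. This follows the comment's description only, not the JEDEC text: the class name, the threshold value and the choice to count per row (rather than per row plus neighbours) are all assumptions made for illustration.

```python
from collections import defaultdict

class ToyTrrBank:
    """Toy model: count activations per row and trigger a targeted
    refresh once a maximum activate count (MAC) is reached."""

    def __init__(self, mac: int):
        self.mac = mac                        # hypothetical threshold
        self.activations = defaultdict(int)   # activations since last refresh
        self.targeted_refreshes = 0

    def activate(self, row: int) -> None:
        self.activations[row] += 1
        if self.activations[row] >= self.mac:
            self.refresh_neighbours(row)

    def refresh_neighbours(self, row: int) -> None:
        # Refreshing the adjacent rows restores their charge, so the
        # count for the hammered row can restart from zero.
        self.targeted_refreshes += 1
        self.activations[row] = 0

    def periodic_refresh(self) -> None:
        # The normal refresh cycle resets every counter.
        self.activations.clear()

bank = ToyTrrBank(mac=1000)
for _ in range(2500):          # hammer a single row
    bank.activate(42)
print(bank.targeted_refreshes, bank.activations[42])
```

In this model the hammered row never accumulates more than `mac` activations between refreshes of its neighbours, which is the property the mitigation relies on. Whether a given module actually enforces it is a separate question.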
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @08:14AM (1 child)
Which modules? ECC and double refresh are ineffective mitigations. [futureplus.com]
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @05:23PM
For example, https://www.skhynix.com/products.do?lang=eng&ct1=36&ct2=37&rc=com [skhynix.com] But for now you really need to check the specs and test the modules yourself. Some manufacturers leave it disabled (as it is optional) in LPDDR4 and others tack it on to DDR4. Also, some implementations, like Micron's, have been shown to set limits that are too high or not to implement TRR and MAC properly.
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @08:42AM (1 child)
Ineffective [arxiv.org]
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @04:52PM
Sigh, didn't read the paper. They didn't try TRR (which is citation 33). They instead dismissed it on page 4 with:
And the other citation [60] tested a device with TRR and MAC disabled for performance reasons, as it is an optional extension to LPDDR4.
(Score: 3, Interesting) by Dr Spin on Tuesday January 02 2018, @09:54AM (1 child)
Rowhammer vulnerability implies DEFECTIVE HARDWARE. If your CPU is vulnerable to this, it is goods not of merchantable quality, i.e. unfit for the purpose for which it is sold (computing), and should be replaced without charge by the manufacturer (Intel). If not, it's probably because you live in a country without adequate protection for consumers (e.g. USA) and (possibly violent*) protests may be required to get the law changed.
*Ask the NRA for legal advice
Warning: Opening your mouth may invalidate your brain!
(Score: 2) by Dr Spin on Tuesday January 02 2018, @10:04AM
OP: I had thought the Rowhammer failure was in the cache memory - it appears to be in main memory (at least in this context). In that case it is the memory manufacturer selling duff product, not the CPU manufacturer.
Warning: Opening your mouth may invalidate your brain!
(Score: 1, Interesting) by Anonymous Coward on Tuesday January 02 2018, @03:41AM
You only have to guess 9 bits of the address space at a time if you can go level by level.
You could guess kernel addresses if faulting on an unmapped kernel address takes a different amount of time than faulting on a mapped kernel address. An address in a big unmapped area might fault when relatively significant bits are checked, while an address that is valid might not fault until permission bits are checked.
Stuff related to performance monitoring and user-chosen LDT entries changed quite a bit. One or both of these may have been used to reveal addresses. Performance monitoring hardware has the ability to have the CPU write a log into a buffer; this could be mapped for the user. There is a way to map the LDT and/or a Linux-specific variation for the user as well.
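To put numbers on the "9 bits at a time" remark, here is a minimal sketch assuming standard x86-64 4-level paging with 4 KiB pages (four 9-bit table indices plus a 12-bit page offset). The per-level yes/no oracle is the comment's speculation, not a confirmed attack, and `table_indices` is a name made up for this example.

```python
LEVELS = ("PML4", "PDPT", "PD", "PT")   # x86-64 4-level paging tables
BITS_PER_LEVEL = 9
PAGE_OFFSET_BITS = 12

def table_indices(vaddr: int) -> dict:
    """Split a virtual address into its four 9-bit page-table indices."""
    indices = {}
    for depth, name in enumerate(LEVELS):
        shift = PAGE_OFFSET_BITS + BITS_PER_LEVEL * (len(LEVELS) - 1 - depth)
        indices[name] = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
    return indices

# Worst-case number of guesses to locate one mapped 4 KiB page:
all_at_once = 2 ** (BITS_PER_LEVEL * len(LEVELS))    # guess all 36 bits together
level_by_level = len(LEVELS) * 2 ** BITS_PER_LEVEL   # 512 guesses per level

print(table_indices(0xFFFFFFFF81000000))
print(all_at_once, level_by_level)
```

If timing reveals at which level a walk fails, the search drops from 2**36 candidates to at most 4 * 512, which is why a small timing difference matters so much here.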
(Score: 5, Insightful) by Knowledge Troll on Tuesday January 02 2018, @03:49AM (4 children)
The comments on the HN posting [ycombinator.com] are quite interesting. One of them referenced a patch that excludes the mitigation on AMD CPUs [lkml.org] because of the implementation that AMD uses:
(Score: 2) by RS3 on Tuesday January 02 2018, @03:04PM (1 child)
So buy AMD stock now, right?
(Score: 3, Interesting) by DannyB on Tuesday January 02 2018, @07:54PM
Buy AMD processors now.
Intel processors suffer from the vulnerability. If AMD processors do not, then I also observate . . .
Intel is strongly associated with the Management Engine, a huge vulnerability disguised as a "feature".
Gee, could this vulnerability be deliberate? A very deep obscure way to hack Linux and possibly other OSes?
Paid for by Americans for Renewable Complaining and Sustainable Whining.
People today are educated enough to repeat what they are taught but not to question what they are taught.
(Score: 2) by LoRdTAW on Tuesday January 02 2018, @06:31PM
And an update: https://lkml.org/lkml/2017/12/27/2 [lkml.org]
(Score: 2) by takyon on Wednesday January 03 2018, @06:17AM
Suddenly, any performance advantage an Intel chip had over AMD just disappeared.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 1, Troll) by Anonymous Coward on Tuesday January 02 2018, @04:34AM (8 children)
It screws over the pleb users by intentionally leaving them unable to defend themselves.
If your organization can't function if a single component is compromised and needs to be temporarily shut down, then your organization is incompetent. It's only a shame that this self-reinforcing bullshit has led to a culture where almost everybody has such single points of failure. If researchers were truly responsible and publicly disclosed flaws forthwith following discovery/confirmation then this culture of "we literally can't do business without this one specific library/OS" wouldn't have emerged and a bug in amd64 would lead to nothing more than the RISC-V servers being spun up until a patch was released.
As it stands we all just keep running with the vulnerability, hoping nobody else has figured it out, and that not having an alternative database/web-server/processor won't be viewed as grossly incompetent as it ought be.
Redundancy is the solution to actually responsible immediate and public disclosure.
Having fifty identical servers running fifty copies of Apache in fifty countries isn't redundancy, it's a single point of failure in the ISA/OS/webserver/&c.
(Score: 3, Interesting) by Anonymous Coward on Tuesday January 02 2018, @05:32AM
But how are monoculture pushers going to get lots of money if you spread over different vendors?
Oh, the humanity, the software makers would have to stick to common things and agree on them, instead of steamrolling everything with own policy. And they would have to test 32, 64, little endian, big endian... and above all, compete with others. You damn communists! /s
Yes, that is a veiled hit at RH and systemd (you probably had the MS one in mind already... worst monoculture). We have lost enough CPU archs already (SPARC, Alpha, and the not yet dead but not much used Power), and we seem to be on the path to killing some OSes (after all the old classic Unix) or just making them poor copies of the one True Carmine Penguin, second class citizens. That must stop, or the "correct" plague will be big trouble.
(Score: 2, Insightful) by Anonymous Coward on Tuesday January 02 2018, @05:36AM (4 children)
So AWS, Azure etc. should be able to handle shutting down all their DRAM? I dislike single points of failure, but realistically avoiding that means avoiding standardization and really overcomplicates things.
(Score: 4, Interesting) by Anonymous Coward on Tuesday January 02 2018, @06:06AM (2 children)
There ought exist at least two types of memory which share an interface yet are implemented differently enough that vulnerabilities are very unlikely to be shared. If this was the case then they could literally just shut down the machines with the vulnerable kind, swap those sticks out, and bring them back up. Same interface, different internal details.
(Score: 2) by shortscreen on Tuesday January 02 2018, @10:08AM (1 child)
Oh! I know! What if they switch back to RAMBUS?
(Score: 4, Funny) by LoRdTAW on Tuesday January 02 2018, @01:39PM
Too late. They already patented the vulnerability and are working to release it in their next spec.
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @12:29PM
Yes. Yes they should. This is the cloud we're talking about. Just compute around it like all their ads say they do.
(Score: 2) by Wootery on Tuesday January 02 2018, @11:14AM (1 child)
RISC-V servers? Aren't we getting rather ahead of ourselves?
(Score: 2, Funny) by Anonymous Coward on Tuesday January 02 2018, @12:31PM
I'm willing to take that risc ;-)
(Score: 4, Insightful) by arcz on Tuesday January 02 2018, @04:48AM (25 children)
(Score: 4, Insightful) by Azuma Hazuki on Tuesday January 02 2018, @05:04AM (4 children)
This could be thought of as a logical bug, no? In other words, everything is syntactically correct and working as expected, but real-world usage produces unwanted results? I'm not a programmer and most of the linked article was juuuuuust within my comprehension, but the whole thing had me making painful noises throughout. This is *bad.*
I agree with the analyst that this is probably something that affects virtualization and VM separation, which is why it's being worked on in such secrecy and with such haste. The scary thought is that this is a hardware thing, an emergent behavior from the interplay of software and hardware, rather than just buggy code...
I am "that girl" your mother warned you about...
(Score: 2, Interesting) by Anonymous Coward on Tuesday January 02 2018, @07:19AM (3 children)
Perhaps it just is not possible for mere humans to think of every possible angle of attack and test for it.
Bring on AI?
(Score: 2) by takyon on Tuesday January 02 2018, @07:27AM
https://blogs.microsoft.com/ai/ai-for-security-microsoft-security-risk-detection-makes-debut/ [microsoft.com]
https://www.theregister.co.uk/2017/02/15/rsa_crypto_panel/ [theregister.co.uk]
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by unauthorized on Tuesday January 02 2018, @03:02PM
There is no such thing as an AI in the real world, at least if you define AI as a human-designed construct capable of independently interpreting an arbitrary set of data and generating useful new ideas from it.
Oh, it absolutely is possible, it just costs a lot more. If you choose to only buy the latest and greatest, you get the trailblazing product you paid for. There is no market interest in building safe hardware or developing safe software.
(Score: 2) by Azuma Hazuki on Tuesday January 02 2018, @10:28PM
In theory it is, though you very soon end up in "infinite monkeys" territory. In practice I think you're right. Though, what about "formally validated" hardware?
I am "that girl" your mother warned you about...
(Score: 2, Interesting) by Anonymous Coward on Tuesday January 02 2018, @05:48AM (11 children)
> This isn't even a bug at all. It's a side-channel attack. Side channel attacks are not bugs. Not every security issue results from a bug. Stupid summary.
The idea is that there is likely an undisclosed hardware bug that can be exploited if you know the physical addresses (Like Row-hammer). These physical addresses can be recovered via the side channel attack these patches are mitigating.
Leaking physical addresses might not be considered a bug, but that does not mean side channel attack vulnerability is not a bug! You can't categorically say side channel data leakage bugs are not bugs: If the linux kernel leaked the root password to user space via some timing side channel, it would be a bug.
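A small illustration of how a timing side channel can leak a secret even though every result returned is "correct". This is a generic sketch, not anything from the kernel: `leaky_equal` is a hypothetical function, and the step count stands in for the running time an attacker would measure.

```python
import hmac

def leaky_equal(secret: bytes, guess: bytes) -> tuple:
    """Early-exit comparison. Also returns how many bytes were examined,
    as a stand-in for the time the comparison takes."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps   # bails out at the first mismatch
    return True, steps

secret = b"hunter2"
print(leaky_equal(secret, b"xxxxxxx"))   # wrong from the first byte
print(leaky_equal(secret, b"hunxxxx"))   # first three bytes correct

# The standard library's constant-time comparison avoids the early exit.
print(hmac.compare_digest(secret, b"hunxxxx"))
```

Both guesses are rejected, but the second takes longer to reject, so an attacker can recover the secret one byte at a time. The function never returns the wrong answer, yet most people would still call the leak a bug.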
(Score: 2) by arcz on Tuesday January 02 2018, @08:45AM (10 children)
(Score: 3, Insightful) by Dr Spin on Tuesday January 02 2018, @10:01AM (8 children)
"It's not a bug, it's a feature" is not a defence. If a plane falls out of the sky because it is Thursday, whether or not the manufacturer told you their planes are not reliable on Thursdays, it is not an adequate defence - even if they claim that they tested it in February, and it often works on Thursdays in February.
If the hardware shows Rowhammer vulnerability: it is faulty, and should be returned to the manufacturer, no software workarounds.
Warning: Opening your mouth may invalidate your brain!
(Score: 2) by Wootery on Tuesday January 02 2018, @11:25AM (5 children)
Way to ignore the way imperfections scale in computer systems.
An imperfection in the design of a shovel causes some percentage of your customers to (rightly) ask for a refund when their shovel breaks.
CPUs are not like shovels. Your position appears to be that unless the CPU is perfect, all customers are entitled to a refund. This is clearly absurd.
Want a screw that will never rust or fail unexpectedly? You can get those, they're called medical-grade, and they cost vastly more than ordinary screws. Want a monitor with a guarantee of zero dead pixels or stuck pixels? The same thing applies.
You seem to want to hold consumer CPUs to the medical grade standard, without trading off against performance or cost. That just isn't realistic. If you regulate consumer-grade CPUs the way you regulate medical equipment, you kill the consumer CPU industry overnight. Formally-verified CPUs can indeed be made... at incredible expense.
(Score: 2) by Wootery on Tuesday January 02 2018, @11:28AM (2 children)
Oops, forgive the double reply: my example at the end there doesn't hold, as formal verification doesn't tend to help with side-channel issues like this. Real-world CPU perfection is even harder than that!
(Score: 2) by arcz on Wednesday January 03 2018, @05:11AM (1 child)
(Score: 1, Informative) by Anonymous Coward on Wednesday January 03 2018, @04:06PM
This is the specific issue we're discussing...
It seems they actually discovered a bug not in the x86 prefetch instructions (which could be fixed with microcode) but in Intel's speculative execution.
As to side-channel attacks, you should at least read up on anc [vusec.net] if not all 10 of the papers in the guy's PhD thesis - "Software-based Microarchitectural Attacks". [gruss.cc]
(Score: 2) by arcz on Wednesday January 03 2018, @05:05AM (1 child)
(Score: 2) by Wootery on Wednesday January 03 2018, @10:23AM
That pricepoint still doesn't justify an expectation of flawless perfection, and they aren't being sold as critical-systems CPUs.
If you want a CPU appropriate for critical systems, there's a market for them. Most CPUs don't qualify, as they make different tradeoffs.
(Score: 2) by maxwell demon on Tuesday January 02 2018, @10:20PM (1 child)
OK, Company A produces wooden pedestrian bridges, advertised and sold as pedestrian bridges. Company B builds railroads and uses one of Company A's wooden pedestrian bridges as railroad bridge. The moment the first train drives over that bridge, it crashes down. Is it now Company A's fault that the bridge didn't withstand the weight of the train?
The Tao of math: The numbers you can count are not the real numbers.
(Score: 0) by Anonymous Coward on Wednesday January 03 2018, @05:02PM
No, it's Obama's fault.
(Score: 1, Informative) by Anonymous Coward on Tuesday January 02 2018, @10:06AM
Pedantic. A design error [securityfocus.com] is commonly called a bug; for these KAISER patches to exclude AMD [lkml.org] must mean there's something much more serious here...
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @12:06PM (4 children)
This is security sensationalism for you. Because all the truly severe stuff is a thing of the past everything that remains gets blown out of proportion.
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @12:44PM (3 children)
x86 prefetch instructions allow unprivileged processes to fetch privileged memory to cache but it appears there's a more severe and specific attack on Intel microarchitecture. Kindly do tell, what proportion is appropriate here?
(Score: 2) by LoRdTAW on Tuesday January 02 2018, @01:45PM
This is what happens when the poster doesn't understand the article, feels stupid, gets mad about feeling stupid and says something dumb.
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @04:30PM (1 child)
Severe: Remote code execution via malformed ping that anybody can do in their sleep (old school sploit shit).
Modern: Some cryptic stuff that nobody is going to bother doing on a large scale (low hanging fruit and all) except in highly targeted attacks (and if you're at the receiving end of one of these it doesn't matter what you do, you will get pwnt).
(Score: 0) by Anonymous Coward on Tuesday January 02 2018, @05:06PM
"math is hard". Let me guess, you were an expert in the ping of death for a whole weekend 20 years ago and now only really use your computer for online banking?
(Score: 2) by FatPhil on Wednesday January 03 2018, @04:01PM (2 children)
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
(Score: 2) by arcz on Thursday January 04 2018, @07:58PM (1 child)
(Score: 2) by FatPhil on Sunday January 07 2018, @03:10PM
Rowhammer is a modification attack.
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
(Score: 1, Insightful) by Anonymous Coward on Tuesday January 02 2018, @04:02PM (2 children)
Do we have a snazzy name and logo yet for this issue?
(Score: 2, Funny) by Anonymous Coward on Tuesday January 02 2018, @04:19PM
The only issue we know about is a side channel attack based on page table faults, it's pure speculation about Intel's speculative execution. Thomas Gleixner has already proposed an excellent snazzy name "Forcefully Unmap Complete Kernel With Interrupt Trampolines" [lkml.org] and the logo could be something like this? [wikimedia.org]
(Score: 2) by takyon on Wednesday January 03 2018, @06:19AM
Name: SellIntelStock
Logo: AMD logo
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by takyon on Wednesday January 03 2018, @06:47AM (1 child)
https://soylentnews.org/submit.pl?op=viewsub&subid=24128&title=Patch+for+Intel+Speculative+Execution+Vulnerability+Could+Reduce+Performance+by+5+to+35%25
If you have any other stuff you want in there, let me know.
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 0) by Anonymous Coward on Wednesday January 03 2018, @04:10PM
takyon The cat has slipped the proverbial bag, you may as well link the papers? [soylentnews.org]