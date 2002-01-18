from the starting-off-the-new-year-right dept.
The mysterious case of the Linux Page Table Isolation patches (archive)
tl;dr: there is presently an embargoed security bug impacting apparently all contemporary CPU architectures that implement virtual memory, requiring hardware changes to fully resolve. Urgent development of a software mitigation is being done in the open and recently landed in the Linux kernel, and a similar mitigation began appearing in NT kernels in November. In the worst case the software fix causes huge slowdowns in typical workloads. There are hints the attack impacts common virtualization environments including Amazon EC2 and Google Compute Engine, and additional hints the exact attack may involve a new variant of Rowhammer.
Turns out 2018 might be more interesting than first thought. So grab some popcorn and keep those systems patched!
(Score: 4, Interesting) by jmorris on Tuesday January 02, @03:31AM (3 children)
If this thing really does impact the cloud and the only mitigation imposes up to a 50% performance penalty, many interesting questions arise.
1. If this could be fixed with a microcode update it would be. And if it is really a Rowhammer attack, none of this is going to stop it, only mitigate. So even a CPU recall isn't likely to help. Perhaps new hardened memory modules, probably overvolted and slowed down? Big memory shortage going on currently. Hmm.
2. I'd wonder if this will cause a reevaluation of the wisdom of cramming VMs belonging to different entities on the same host but that would be silly. Of course not. If people were capable of thinking those thoughts they would have never done it in the first place.
3. I see a huge surge of new rackspace being populated and filled. [disclaimer] This is not investment advice, consider the risks of any investment strategy, etc. [/disclaimer]
(Score: 2, Informative) by Anonymous Coward on Tuesday January 02, @05:35AM (2 children)
JEDEC has already standardized a hardware mitigation for rowhammer. It is called Targeted Row Refresh and is currently an optional part of the LPDDR4 standard, although manufacturers have added it to other memory modules (which is technically a violation of the standard). Basically, the way it works is that a memory module specifies the maximum number of times a "row" or its neighbors can be accessed between refreshes. If the threshold is met, the row cannot be accessed again until after a refresh (which the module can choose to force at that point).
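As a rough illustration of the counting idea described in this comment, here is a toy Python model. It is only a sketch of the behaviour as worded above, not the JEDEC state machine: the class name, the threshold value, and the choice to refresh the whole array at once are all assumptions made for the example.

```python
class RowCounterModel:
    """Toy model: count activations per row neighbourhood, force a refresh at a limit.

    Hypothetical names and numbers; real modules track and refresh far more selectively.
    """

    def __init__(self, num_rows, max_activations):
        self.max_activations = max_activations
        self.counts = [0] * num_rows
        self.forced_refreshes = 0

    def refresh(self):
        # A refresh restores the charge in every row, so all counters restart.
        self.counts = [0] * len(self.counts)

    def access(self, row):
        # An activation disturbs the row itself and its immediate neighbours.
        neighbourhood = [r for r in (row - 1, row, row + 1)
                         if 0 <= r < len(self.counts)]
        # If any affected row has hit the limit, refresh before serving the access.
        if any(self.counts[r] >= self.max_activations for r in neighbourhood):
            self.refresh()
            self.forced_refreshes += 1
        for r in neighbourhood:
            self.counts[r] += 1


model = RowCounterModel(num_rows=8, max_activations=3)
for _ in range(10):
    model.access(4)  # hammer a single row
print(model.forced_refreshes, max(model.counts))
```

In this model no counter ever exceeds the limit, which is the property the mitigation is after. Whether real hardware achieves that is exactly what the replies dispute.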
(Score: 0) by Anonymous Coward on Tuesday January 02, @08:14AM
Which modules? ECC and double refresh are ineffective mitigations. [futureplus.com]
(Score: 0) by Anonymous Coward on Tuesday January 02, @08:42AM
Ineffective [arxiv.org]
(Score: 1, Interesting) by Anonymous Coward on Tuesday January 02, @03:41AM
You only have to guess 9 bits of the address space at a time if you can go level by level.
You could guess kernel addresses if faulting on an unmapped kernel address takes a different amount of time than faulting on a mapped kernel address. An address in a big unmapped area might fault when relatively significant bits are checked, while an address that is valid might not fault until permission bits are checked.
Stuff related to performance monitoring and user-chosen LDT entries changed quite a bit. One or both of these may have been used to reveal addresses. Performance monitoring hardware has the ability to have the CPU write a log into a buffer; this could be mapped for the user. There is a way to map the LDT and/or a Linux-specific variation for the user as well.
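To put numbers on the "9 bits at a time" remark at the top of this comment, here is a small Python sketch. It assumes x86-64 4-level paging with 4 KiB pages (a 48-bit virtual address made of four 9-bit table indices plus a 12-bit page offset). The function names and the sample address are illustrative only, and no actual probing is done.

```python
# Assumption: x86-64 4-level paging, 4 KiB pages, 48-bit virtual addresses.
LEVELS = ("PML4", "PDPT", "PD", "PT")
BITS_PER_LEVEL = 9
PAGE_OFFSET_BITS = 12


def split_address(vaddr):
    """Return ({level name: 9-bit index}, 12-bit page offset) for an address."""
    indices = {}
    for depth, name in enumerate(LEVELS):
        shift = PAGE_OFFSET_BITS + BITS_PER_LEVEL * (len(LEVELS) - 1 - depth)
        indices[name] = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
    return indices, vaddr & ((1 << PAGE_OFFSET_BITS) - 1)


def worst_case_probes():
    """Compare guessing all index bits at once with guessing one level at a time."""
    all_at_once = 2 ** (BITS_PER_LEVEL * len(LEVELS))
    level_by_level = len(LEVELS) * 2 ** BITS_PER_LEVEL
    return all_at_once, level_by_level


print(split_address(0xFFFFFFFF81000000))
print(worst_case_probes())
```

If each level can be confirmed independently, the search drops from 2^36 candidate pages to at most 4 x 512 probes, which is why a per-level timing difference would matter so much.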
(Score: 4, Interesting) by Knowledge Troll on Tuesday January 02, @03:49AM
The comments on the HN posting [ycombinator.com] are quite interesting. One of them referenced a patch that excludes the mitigation on AMD CPUs [lkml.org] because of the implementation that AMD uses:
(Score: 1, Informative) by Anonymous Coward on Tuesday January 02, @04:34AM (3 children)
It screws over the pleb users by intentionally leaving them unable to defend themselves.
If your organization can't function if a single component is compromised and needs to be temporarily shut down, then your organization is incompetent. It's only a shame that this self-reinforcing bullshit has led to a culture where almost everybody has such single points of failure. If researchers were truly responsible and publicly disclosed flaws forthwith following discovery/confirmation then this culture of "we literally can't do business without this one specific library/OS" wouldn't have emerged and a bug in amd64 would lead to nothing more than the RISC-V servers being spun up until a patch was released.
As it stands we all just keep running with the vulnerability, hoping nobody else has figured it out, and that not having an alternative database/web-server/processor won't be viewed as grossly incompetent as it ought be.
Redundancy is the solution to actually responsible immediate and public disclosure.
Having fifty identical servers running fifty copies of Apache in fifty countries isn't redundancy, it's a single point of failure in the ISA/OS/webserver/&c.
(Score: 2, Interesting) by Anonymous Coward on Tuesday January 02, @05:32AM
But how are monoculture pushers going to get lots of money if you spread over different vendors?
Oh, the humanity, the software makers would have to stick to common things and agree on them, instead of steamrolling everything with own policy. And they would have to test 32, 64, little endian, big endian... and above all, compete with others. You damn communists! /s
Yes, that is a veiled hit at RH and systemd (you probably had the MS one in mind already... worst monoculture). We have lost enough CPU archs already (SPARC, Alpha, and the not yet dead but little-used Power), and we seem to be on the path to killing some OSes (after all the old classic Unix) or just making them poor copies of the one True Carmine Penguin, second-class citizens. That must stop, or the "correct" plague will be big trouble.
(Score: 1, Interesting) by Anonymous Coward on Tuesday January 02, @05:36AM (1 child)
So AWS, Azure etc. should be able to handle shutting down all their DRAM? I dislike single points of failure, but realistically avoiding that means avoiding standardization and really overcomplicates things.
(Score: 3, Interesting) by Anonymous Coward on Tuesday January 02, @06:06AM
There ought exist at least two types of memory which share an interface yet are implemented differently enough that vulnerabilities are very unlikely to be shared. If this was the case then they could literally just shut down the machines with the vulnerable kind, swap those sticks out, and bring them back up. Same interface, different internal details.
(Score: 4, Insightful) by arcz on Tuesday January 02, @04:48AM (5 children)
(Score: 2) by Azuma Hazuki on Tuesday January 02, @05:04AM (2 children)
This could be thought of as a logical bug, no? In other words, everything is syntactically correct and working as expected, but real-world usage produces unwanted results? I'm not a programmer and most of the linked article was juuuuuust within my comprehension, but the whole thing had me making painful noises throughout. This is *bad.*
I agree with the analyst that this is probably something that affects virtualization and VM separation, which is why it's being worked on in such secrecy and with such haste. The scary thought is that this is a hardware thing, an emergent behavior from the interplay of software and hardware, rather than just buggy code...
(Score: 0) by Anonymous Coward on Tuesday January 02, @07:19AM (1 child)
Perhaps it just is not possible for mere humans to think of every possible angle of attack and test for it.
Bring on AI?
(Score: 2) by takyon on Tuesday January 02, @07:27AM
https://blogs.microsoft.com/ai/ai-for-security-microsoft-security-risk-detection-makes-debut/ [microsoft.com]
https://www.theregister.co.uk/2017/02/15/rsa_crypto_panel/ [theregister.co.uk]
(Score: 2, Interesting) by Anonymous Coward on Tuesday January 02, @05:48AM (1 child)
> This isn't even a bug at all. It's a side-channel attack. Side channel attacks are not bugs. Not every security issue results from a bug. Stupid summary.
The idea is that there is likely an undisclosed hardware bug that can be exploited if you know the physical addresses (like Rowhammer). These physical addresses can be recovered via the side-channel attack these patches are mitigating.
Leaking physical addresses might not be considered a bug, but that does not mean side-channel attack vulnerability is not a bug! You can't categorically say side-channel data leakage bugs are not bugs: if the Linux kernel leaked the root password to user space via some timing side channel, it would be a bug.
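The password hypothetical above can be shown with a user-space toy in Python. This is a sketch under stated assumptions, not anything the kernel actually does: `leaky_equals` and `recover_secret` are made-up names, and the number of bytes examined stands in for elapsed time so that the demo is deterministic.

```python
import hmac


def leaky_equals(secret, guess):
    """Early-exit comparison. Returns (match, bytes_examined) as a timing proxy."""
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps
    return True, steps


def recover_secret(oracle, length, alphabet):
    """Recover the secret one byte at a time from the leaked step count."""
    known = b""
    for position in range(length):
        padding = bytes([alphabet[0]]) * (length - position - 1)
        # The correct byte makes the comparison run longer (or succeed outright).
        best = max(alphabet, key=lambda c: oracle(known + bytes([c]) + padding))
        known += bytes([best])
    return known


SECRET = b"hunter2"  # hypothetical secret for the demo
ALPHABET = b"abcdefghijklmnopqrstuvwxyz0123456789"
found = recover_secret(lambda g: leaky_equals(SECRET, g), len(SECRET), ALPHABET)
print(found)

# hmac.compare_digest is designed not to leak where the first mismatch occurs.
print(hmac.compare_digest(SECRET, found))
```

The code is functionally correct in the sense that it never returns a wrong answer, yet it still gives the secret away in at most 7 x 36 guesses. That is the sense in which a side-channel leak can reasonably be called a bug.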
(Score: 2) by arcz on Tuesday January 02, @08:45AM
