
posted by janrinok on Saturday June 11 2016, @05:49PM   Printer-friendly
from the simple-but-smart dept.

El Reg published an article describing Control-flow Enforcement Technology (CET) [PDF], a clever technique Intel is considering implementing in future CPU designs to prevent certain types of malware infection, namely infections that use return-oriented programming (ROP) and jump-oriented programming (JOP) to implement exploits:

CET works by introducing a shadow stack – which only contains return addresses, is held in system RAM, and is protected by the CPU's memory management unit. When a subroutine is called, the return address is stashed on the thread's stack, as per normal, and also in the shadow stack. When the processor reaches a return instruction, the processor ensures the return address on the thread stack matches the address on the shadow stack.

If they don't match, then an exception is raised, allowing the operating system to catch and stop execution. Therefore, if exploit code starts tampering with the stack to chain together malicious instructions to install malware or otherwise compromise a system, these alterations will be detected and the infiltration halted before any damage can be done.

Given that these are two of the major techniques used by exploit authors to perform arbitrary code execution, being able to block such attempts through hardware could make digital life a little bit safer.
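
For readers who want the mechanism spelled out, here is a minimal software simulation of the check described above. The real check happens in hardware on CALL/RET; all names, sizes, and details below are illustrative, not Intel's:

    /* Toy model of a shadow-stack check: a call records the return address
       twice, and a return refuses to proceed if the two copies disagree. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define STACK_DEPTH 64

    static uintptr_t thread_stack[STACK_DEPTH], shadow_stack[STACK_DEPTH];
    static int ts_top = 0, ss_top = 0;

    /* CALL: the return address goes on both the ordinary stack and the shadow stack. */
    static void do_call(uintptr_t return_addr) {
        thread_stack[ts_top++] = return_addr;
        shadow_stack[ss_top++] = return_addr;
    }

    /* RET: the two copies must agree, or an exception is raised for the OS to handle. */
    static uintptr_t do_ret(void) {
        uintptr_t ret_addr = thread_stack[--ts_top];
        if (ret_addr != shadow_stack[--ss_top]) {
            fprintf(stderr, "control-protection fault: return address was tampered with\n");
            exit(1);
        }
        return ret_addr;
    }

    int main(void) {
        do_call(0x401000);
        thread_stack[0] = 0xdeadbeef;  /* simulated ROP-style overwrite of the return address */
        do_ret();                      /* detected: the shadow copy still says 0x401000 */
        return 0;
    }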


Original Submission

  • (Score: 2) by frojack on Saturday June 11 2016, @06:15PM

    by frojack (1554) on Saturday June 11 2016, @06:15PM (#358378) Journal

    If the stack can be shadowed, and managed totally in protected system space, then why store return addresses in the normal stack at all?

    (Other than that programmers have come to expect the stack to contain things in a certain order, which these processors could handle with dummy null entries in place of the return addresses, popped but ignored on ret. Since it's only done by the processor, the software would run on other processors as well. Of course, stack peeking would detect this.)

    On the other hand, if there are legitimate uses for stack manipulation, say intentionally returning to the address of two jumps ago rather than the last jump by popping entries from the stack, wouldn't this prevent such flow control?

    --
    No, you are mistaken. I've always had this sig.
    • (Score: 0) by Anonymous Coward on Saturday June 11 2016, @06:33PM

      by Anonymous Coward on Saturday June 11 2016, @06:33PM (#358380)

      compares are much cheaper than copies.

      • (Score: 2) by frojack on Saturday June 11 2016, @06:43PM

        by frojack (1554) on Saturday June 11 2016, @06:43PM (#358383) Journal

        compares are much cheaper than copies.

        All the more reason not to do BOTH.

        --
        No, you are mistaken. I've always had this sig.
    • (Score: 2) by Fnord666 on Saturday June 11 2016, @06:43PM

      by Fnord666 (652) on Saturday June 11 2016, @06:43PM (#358384) Homepage

      If the stack can be shadowed, and managed totally in protected system space, then why store return addresses in the normal stack at all?

      I think this might be a matter of detection of the issue vs. just automatically and transparently fixing it.

    • (Score: 1) by gdwatson on Saturday June 11 2016, @06:44PM

      by gdwatson (6071) on Saturday June 11 2016, @06:44PM (#358385)

      On the other hand, if there are legitimate uses for stack manipulation, say intentionally returning to the address of two jumps ago rather than the last jump by popping entries from the stack, wouldn't this prevent such flow control?

      Yes, and I'd like to know how they'd address it. Tail-call elimination is important in the implementation of some programming languages; it uses just this trick.

      I suppose you could use indirect jumps to implement subroutine calls by hand, as people have done on CPUs without call/ret.
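
      For reference, a tail call looks like the example below; with optimization, compilers commonly lower the call to bar() into a plain jump, so bar() returns directly to foo()'s caller. Whether and how that interacts with a shadow-stack check depends on the compiler and the hardware; the function names here are purely illustrative.

          #include <stdio.h>

          static int bar(int x) {
              return x + 1;
          }

          static int foo(int x) {
              return bar(x * 2);   /* tail position: often compiled as jmp, not call/ret */
          }

          int main(void) {
              printf("%d\n", foo(20));   /* prints 41 */
              return 0;
          }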

    • (Score: 3, Insightful) by Dunbal on Saturday June 11 2016, @08:03PM

      by Dunbal (3515) on Saturday June 11 2016, @08:03PM (#358406)

      Of course one of the most common exploits today is the trojan. Ain't nothing going to stop that program when the user has explicitly given it control.

  • (Score: 3, Informative) by BsAtHome on Saturday June 11 2016, @06:51PM

    by BsAtHome (889) on Saturday June 11 2016, @06:51PM (#358386)

    So, how are setjmp() and longjmp() going to work? It seems like the OS must intervene and handle the exception to allow these calls to succeed. Also, handling stack unwinding may need to be moved into the OS for corner cases of structured exception handling.

    Even when that works, you would need controls to enable the handling of stack smashing, whether the smashing is intentional or malicious. It may prove very hard to distinguish between the two. If you allow it in one circumstance, what prevents it in any other? Only for debugging, under very controlled circumstances, would it be an advantage. But then, who does proper debugging these days?
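
    For concreteness, this is the setjmp()/longjmp() pattern in question: the longjmp() skips the intermediate frames without executing their returns, which is exactly what a naive shadow stack would object to. This is standard C; how CET actually accommodates it is not shown here.

        #include <setjmp.h>
        #include <stdio.h>

        static jmp_buf env;

        static void inner(void) {
            longjmp(env, 1);        /* unwinds past inner() and middle() in one step */
        }

        static void middle(void) {
            inner();
            puts("never reached");
        }

        int main(void) {
            if (setjmp(env) == 0) {
                middle();           /* call chain: main -> middle -> inner */
            } else {
                puts("back in main via longjmp");
            }
            return 0;
        }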

    • (Score: 3, Interesting) by frojack on Saturday June 11 2016, @07:06PM

      by frojack (1554) on Saturday June 11 2016, @07:06PM (#358392) Journal

      But is there ever a valid reason for stack smashing?

      It's an honest question, because I've only ever heard the phrase with regard to a form of attack.

      --
      No, you are mistaken. I've always had this sig.
      • (Score: 2) by BsAtHome on Saturday June 11 2016, @07:14PM

        by BsAtHome (889) on Saturday June 11 2016, @07:14PM (#358394)

        Well, "smashing" may not be the best description when you intentionally manipulate the stack pointer and contents. However, the point is that it is a shear impossible task to distinguish between intentional and non-intentional. The CPU does not know what the programmer intended to do and is in a terrible position to make decisions.

        Therefore, you normally have two philosophies: you always allow stack changes, or you never allow stack changes. Tacking a "fix" onto a system that allows stack changes is like trying to empty the sea with a small bucket.

        • (Score: 2) by frojack on Saturday June 11 2016, @07:32PM

          by frojack (1554) on Saturday June 11 2016, @07:32PM (#358398) Journal

          practically impossible to distinguish between intentional and unintentional manipulation. The CPU does not know what the programmer intended to do

          True, I believe we are in agreement here. Smashing isn't what either of us is thinking about. Instead we mean the corner cases that some programmers and some compilers may use to "optimize" returns when they know they are 18 calls deep and don't want to backtrack through all of them to get back to where they started. So they start popping the stack.

          The description in TFS suggests this new check only happens on a RET instruction, but clearly it must also trap the instructions that push return addresses onto the stack.
          The intentional manipulation of the stack would use some form of push / pop / peek operations, none of which would be shadowed by the processor.

          --
          No, you are mistaken. I've always had this sig.
          • (Score: 2) by krishnoid on Saturday June 11 2016, @08:32PM

            by krishnoid (1156) on Saturday June 11 2016, @08:32PM (#358414)

            How about a combination of:

            • a compiler hint/optimization indicating that this function may want to break out of multiple levels, and
            • a shadow stack that checks the return address on the main stack against the one on the shadow stack, or allows searching the list for any nested call above it?

            This way (I think -- I'm rusty on this), presuming assembly code from a structured programming language, and not hand-coded:

            • in most cases, return() could only return to the previous caller,
            • with a hint, return() could be allowed to break out in a structured way to previous or (going out on a limb here) "otherwise acceptable" scopes, with the additional overhead of a return address list search in those less-common cases,
            • and in either case, be prevented from jumping to arbitrary code

            I suspect the corner programming cases for this would be even more corner-y, but I think (?) this would cover many situations.
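
            A rough sketch of that "search the shadow stack" fallback, with nothing taken from Intel's actual CET specification; the idea is just that a hinted return may match any deeper entry and discard the ones above it:

                #include <stdbool.h>
                #include <stdint.h>
                #include <stdio.h>

                #define SS_DEPTH 64
                static uintptr_t shadow[SS_DEPTH];
                static int ss_top = 0;

                /* Normal return: only the top shadow entry is an acceptable target. */
                static bool ret_strict(uintptr_t target) {
                    if (ss_top > 0 && shadow[ss_top - 1] == target) {
                        ss_top--;
                        return true;
                    }
                    return false;   /* mismatch: raise a control-protection fault */
                }

                /* Hinted multi-level return: search downward and unwind to the match. */
                static bool ret_hinted(uintptr_t target) {
                    for (int i = ss_top - 1; i >= 0; i--) {
                        if (shadow[i] == target) {
                            ss_top = i;   /* drop every entry above the matching frame */
                            return true;
                        }
                    }
                    return false;         /* still not a legal return target */
                }

                int main(void) {
                    shadow[ss_top++] = 0x1000;   /* three nested calls */
                    shadow[ss_top++] = 0x2000;
                    shadow[ss_top++] = 0x3000;
                    printf("strict return to 0x2000: %s\n", ret_strict(0x2000) ? "ok" : "fault");
                    printf("hinted return to 0x1000: %s\n", ret_hinted(0x1000) ? "ok" : "fault");
                    return 0;
                }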

      • (Score: 2) by Aiwendil on Saturday June 11 2016, @09:34PM

        by Aiwendil (531) on Saturday June 11 2016, @09:34PM (#358430) Journal

        Polymorphic/self-modifying code.

        Also, it's pretty rare at a high level but a fairly common trick at a low level (or when you work under restrictions in terms of code size or speed); in the world of "let's throw more hardware at it" there is very little reason for it.

        Also, it could be handy if you want to install Linux on a gaming console ;)

        • (Score: 0) by Anonymous Coward on Saturday June 11 2016, @10:33PM

          by Anonymous Coward on Saturday June 11 2016, @10:33PM (#358442)

          They're trying to plug the rooting hole, knowing every chip manufacturer will follow suit and license Intel's patent.

          This is all about corporations retaining control of our computers, not 'providing security for the owner of the system.'

          Intel already has unerasable signed ME code that has bus-level access to the network at all times. Imagine when the next BIOS updates come out that require a crypto-signed BIOS in order to even boot (they already did it on some of the "maker" embedded boards; see coreboot for references. It required getting your key/BIOS signed by Intel in order to boot on your own device.)

          It is not much of a stretch to imagine this on all consumer-available hardware in the near future. It doesn't even require the government to make obvious laws disallowing it; the steady march of progress takes care of that, as long as the back-room pacts ensure the hardware is suitably locked down over time.

          I scoffed at Continuum when it came on the air as being too fictiony. By the time it had ended it didn't seem too farfetched.

      • (Score: 4, Interesting) by coolgopher on Sunday June 12 2016, @02:53AM

        by coolgopher (1157) on Sunday June 12 2016, @02:53AM (#358492)

        But is there ever a valid reason for stack smashing?

        In a former project we were tasked with implementing replacement firmware for certain comms cards deployed to hundreds of thousands of units. The new firmware was needed for regulatory compliance, and due to cost or time reasons, more than just corners had been cut from the original firmware. Rather than actually doing anything comms-related, said firmware pretty much only supported a framed serial "update protocol". And when I say "supported", I mean that in the sense of "if on a sunny day with the stars properly aligned and the right number of goats sacrificed, it may be possible to communicate with the old firmware to do an upgrade". Said state was really only available after the card had been reset, but the hardware designers had for some unfathomable reason not included a way to reset or power-cycle the comms board (w.t.f.?!). There were further undocumented hardware errata that made it even more challenging to talk to the board, but that's for another story.

        So, what does one do when faced with a board in an unknown state with no obvious way of rebooting it? Well, as it turns out, the serial receive routine for the framed messages wasn't doing proper bounds checking on the length field, so with a carefully crafted message it was possible to execute a stack smash against the board. From memory I used a two-pronged attack: first, an attempt to overwrite the return address with the address of the reset vector, and second, as a fallback, triggering an access to an unmapped memory region, which would result in an unhandled exception and, via that, also a reset. This allowed us to successfully do firmware upgrades on boards in nearly any state (the only exception being if they had interrupts disabled, but that was never observed).

        This is the one and only time I've legitimately used a stack smash. Of course, if people had done their job properly originally (hardware and firmware), this would never have been needed. Makes for a fun anecdote though.
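
        The bug class behind that attack, in miniature: a framed-message receive routine that trusts the frame's length field. Names, sizes, and the framing below are invented for illustration only.

            #include <stdint.h>
            #include <string.h>

            #define MAX_PAYLOAD 64

            static void handle_frame(const uint8_t *frame) {
                uint8_t payload[MAX_PAYLOAD];
                uint8_t len = frame[0];            /* attacker-controlled length field */

                /* Missing check: if (len > MAX_PAYLOAD) return; */
                memcpy(payload, frame + 1, len);   /* can overflow into the saved return
                                                      address, e.g. redirecting it to the
                                                      reset vector as in the anecdote */
                (void)payload;                     /* ... parse payload ... */
            }

            int main(void) {
                uint8_t ok_frame[5] = { 4, 'p', 'i', 'n', 'g' };
                handle_frame(ok_frame);            /* benign frame: the length fits */
                /* A hostile frame would declare a length well over MAX_PAYLOAD and
                   place a chosen address in the bytes that overflow the buffer. */
                return 0;
            }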

    • (Score: 0) by Anonymous Coward on Saturday June 11 2016, @09:19PM

      by Anonymous Coward on Saturday June 11 2016, @09:19PM (#358425)

      One way could be a linker switch that would set a "do not enforce shadow stack" flag in the executable's header, which could be used by the developer of any app using setjmp/longjmp or coroutines. I suspect Intel would want to make it possible for kernel and toolchain developers to implement something along those lines, without specifying or mandating anything.

      Of course, whatever policy the app developer chose would apply to any DLLs or shared libraries loaded by the app, which could be too restrictive if an open-ended load API such as dlopen or LoadLibraryEx was used.

  • (Score: 2, Insightful) by Anonymous Coward on Saturday June 11 2016, @06:53PM

    by Anonymous Coward on Saturday June 11 2016, @06:53PM (#358387)

    How is this better than the implementations using existing software like this one?: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100346.html [llvm.org]

    X86 is a complex mess with lots of junk already. I know Intel is in the business of bloating this legacy mess further, but from the big-picture perspective, adding more and more badly interacting custom CPU features in an attempt to exploit their monopoly position doesn't really seem like great technical progress.

    If you can't implement this kind of thing efficiently in software using nice, simple, composable general-purpose pieces, it's time to get a new instruction set, not add several more layers of kludges. We as software engineers really need to get our act together, make cross compilation easier, and allow the world to move away from the x86 legacy mess to allow real competition in ISA design, if we want good processors.

  • (Score: 5, Informative) by Anonymous Coward on Saturday June 11 2016, @07:13PM

    by Anonymous Coward on Saturday June 11 2016, @07:13PM (#358393)

    Well, that's exactly how I implemented the stacks in my VM a decade ago. Except: rather than comparing to a shadow stack, just put the damn pointers on the shadow stack and use them directly, without the stupid-ass compare! x86 had already made manipulation of the code pointer impossible except through opcodes. It obviously also needed to make the instruction stream modifiable only through opcodes (just make call and ret use their own stack in read-only memory, whose read-only restriction those opcodes can ignore). It's not rocket science, so don't pretend it's amazing or even cool. This is plainly obvious to anyone presented with the problem. The "compare to shadow stack" is done, rather than just using the fucking shadow stack pointer, to provide backwards compatibility with languages which want to unwind the stack and don't know about the new flow-control opcodes. To solve the problem without duplicating the code pointers, you'll also want to be able to switch contexts and do a read-only peek at the values on the shadow stack; this is to support exceptions and debuggers unwinding the stack.

    The real solution is to separate code pointers from the parameter stack. Like my VM, and now Intel's "new" plan, FORTH also keeps two different stacks for execution and parameters, but if the hardware doesn't support having a second stack for execution then FORTH too can get stack-smashed by a buffer overrun. I've been saying for decades that if you put your code pointers and data pointers on the same stack, you're doing it wrong. So, Intel isn't so clever after all: SPARC also had a "shadow stack" that you spill registers onto when your compiler encounters too many intermediate values at once, and FORTH properly implemented on SPARC is already stack-smash and ROP proof.

    Fuck Intel for patenting shit with decades of prior art and trying to pass it off as "new" and "innovative" when they haven't even fixed the core problem: all code pointers should be in CODE memory. The Von Neumann architecture (CODE = DATA) is insecure. Code is not data. Or, more correctly: all data is code -- software is a VM that runs DATA as opcodes. For example, a font library is a VM for rendering font opcode. Or, a JPEG decoder library is a VM that runs JPEG opcode and outputs pixel raster data. The "VMs" (programs) all need to have available the same protections that the hardware allows OS-level code. Uniformity in the application of security features at every level of design is the mathematically provable solution to the security problem. If we can mark pages as read-only to stop self-modifying code, then we also need to be able to mark individual words of records read-only in order to stop modification of execution flow. Hint: the flow of execution is the program. Stop treating opcode as if it's just data. Stop treating programs as if they're not (virtual) machines, and give them the capabilities that the CPU has itself: the ability to determine what is and isn't writable memory. Once you do this you see what innovations must occur to solve the security issue.

    For my next trick I will tell you how to create a new form of exploit that you can use instead of Return Oriented Programming: I call it "Methodic Oriented Programming". By manipulating a heap overflow into overwriting a C++ object's VTABLE reference (or C function pointer references), you can reprogram which code gets called when a method is called on an object (similar to how return-oriented programming changes what code gets called when you return from a function). To write a Methodic Oriented Program, you examine the order in which methods are called on objects (or the order in which the function pointers are called). That is your instruction stream, so you arrange to supply bad object pointers to code that will call the methods in that particular order, with each pointing at the code you want executed, similar to return-oriented programming.
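
    The same idea in plain C, with a function pointer standing in for a C++ vtable slot; names and layout are invented for the sketch, and a real exploit would overflow between heap allocations rather than inside one struct:

        #include <stdio.h>
        #include <string.h>

        struct widget {
            char name[16];
            void (*on_click)(void);     /* plays the role of a vtable slot */
        };

        static void legit_handler(void) { puts("legit handler"); }
        static void evil_handler(void)  { puts("attacker's code runs here"); }

        int main(void) {
            struct widget w = { "button", legit_handler };

            /* Oversized "input": 16 filler bytes followed by a pointer to the
               attacker's code, copied over a field that only holds 16 bytes. */
            void (*evil)(void) = evil_handler;
            unsigned char attack[16 + sizeof evil];
            memset(attack, 'A', 16);
            memcpy(attack + 16, &evil, sizeof evil);
            memcpy(&w, attack, sizeof attack);   /* the overflow clobbers on_click */

            w.on_click();   /* "method" dispatch now runs the attacker's handler */
            return 0;
        }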

    Finally, I'll tell you how I defeat MOP exploits in my old-ass VM design (which could also be implemented in hardware, since I have a Verilog implementation). RAM needs to be able to become read-only or read/write at the word level. Sections of RAM need to be able to be marked as "formatted". When the formatted flag is set, a bitmap tells the CPU which words are read-only and which are read/write. The record size of formatted RAM can be limited to the number of bits in the formatting bitmap, or you can have the read-only words be restricted to the first 32 or 64 words of "formatted" memory, and the rest be read/write. Since memory access is already virtualized, this really isn't that much more expensive to implement in hardware.

    My VM implementation uses a single-bit bitmap for its record format data. I store all my V-tables in RAM pages marked read-only, but the objects still need to reference which type they are, and thus they need a pointer in the object to the VTABLE for their type. Because that pointer in the object instance can currently be overwritten on x86, it can be modified to say the V-table is at any arbitrary location, and this is how the Methodic Oriented Programming exploits work. So, my VM just needs a single word in each object instance to be marked read-only. I don't use pointers to other objects in my language implementation, I use handles instead... so I don't have to worry about function pointers being modified. The multiple-bit bitmap method for read-only record attribute formatting at the word level is what can allow pointers to other objects to be protected, and thus prevent MOP.
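
    A toy version of that word-level read-only scheme, purely hypothetical and not corresponding to any real x86 or Intel feature: a "formatted" region carries a bitmap saying which words may be written, and a store to a protected word faults.

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define REGION_WORDS 64

        struct formatted_region {
            uintptr_t words[REGION_WORDS];
            uint64_t  readonly_bitmap;     /* bit i set => words[i] is read-only */
        };

        static void region_store(struct formatted_region *r, int i, uintptr_t value) {
            if (r->readonly_bitmap & (1ULL << i)) {
                fprintf(stderr, "fault: write to read-only word %d\n", i);
                exit(1);
            }
            r->words[i] = value;
        }

        int main(void) {
            struct formatted_region obj = { .readonly_bitmap = 1ULL << 0 };
            obj.words[0] = 0x1000;         /* "V-table pointer", written once at creation
                                              through a privileged path not modelled here */
            region_store(&obj, 1, 42);     /* ordinary field: allowed */
            region_store(&obj, 0, 0x666);  /* attempt to redirect the V-table: faults */
            return 0;
        }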

    Modifying a pointer to an object on the heap can result in a bad pointer to what isn't an object, and thus what isn't a V-table, and thus what isn't the right function. So, you have to mark read-only not only the reference to the object's type data / V-table, but also make the pointers to other objects read-only. And if you do that, then you need opcodes for getting and setting object references which ensure that the protected fields are being set to values from a V-table and not arbitrary data. If you do all that, then you eliminate all exploit vectors (except JTOP: "jump-table-oriented programming", heh).

    TL;DR: Just posting to provide more prior art in case Intel's next "innovation" is also patented. Foiled again, Intel. The game is still afoot, and will remain so thanks to your shit implementation of TPM.

    • (Score: 3, Informative) by RamiK on Saturday June 11 2016, @07:29PM

      by RamiK (1813) on Saturday June 11 2016, @07:29PM (#358397)

      And 3 decades ago there were LISP machines which did all of that and more in the hardware.

      --
      compiling...
    • (Score: 2) by Scruffy Beard 2 on Saturday June 11 2016, @09:39PM

      by Scruffy Beard 2 (6030) on Saturday June 11 2016, @09:39PM (#358431)

      Over a decade ago, I started trying to compile a list of "Safe" file formats that do not store code.

      • Container formats like AVI can store arbitrary data.
      • PDF files, which originally became popular because they act like paper, now support JavaScript.
      • PS files, which PDF replaced, are Turing-complete.
      • Many "office" formats support macros.
      • I eventually concluded that even text files can execute code in the right circumstances (shell scripts and .desktop files come to mind).
      • (Score: 3, Informative) by maxwell demon on Sunday June 12 2016, @05:40AM

        by maxwell demon (1608) on Sunday June 12 2016, @05:40AM (#358543) Journal

        Actually I think Turing-completeness is not the right criterion. The worst thing a Turing-complete but otherwise completely incapable language can do is to waste memory and CPU time; for both there exist time-proven ways to handle it.

        The danger comes when the format can interact with anything that could be used to cause damage. This includes access to the file system, access to the internet, and even the ability to output arbitrary text on standard output. [stackexchange.com]

        --
        The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by maxwell demon on Sunday June 12 2016, @05:23AM

      by maxwell demon (1608) on Sunday June 12 2016, @05:23AM (#358540) Journal

      and thus they need a pointer in the object to the VTABLE for their type.

      No, they don't. You could have a global table mapping object addresses to VTABLE addresses. Yes, it would be less efficient, but it would eliminate the VTABLE pointer inside the object. As a bonus, by deleting the entry on object destruction, an already-destructed object could no longer be used even if the full data happens to still lie in memory, as the VTABLE could no longer be found.
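
      One possible reading of that suggestion, sketched with a linear lookup for brevity (a real implementation would use a hash map); all names here are made up:

          #include <stddef.h>
          #include <stdio.h>

          struct vtable { void (*describe)(void); };

          static void describe_circle(void) { puts("circle"); }
          static const struct vtable circle_vt = { describe_circle };

          /* Global object -> vtable association, instead of a pointer inside the object. */
          #define MAX_OBJECTS 128
          static struct { const void *obj; const struct vtable *vt; } vt_map[MAX_OBJECTS];
          static int vt_count = 0;

          static void register_object(const void *obj, const struct vtable *vt) {
              vt_map[vt_count].obj = obj;
              vt_map[vt_count].vt  = vt;
              vt_count++;
          }

          static const struct vtable *lookup_vtable(const void *obj) {
              for (int i = 0; i < vt_count; i++)
                  if (vt_map[i].obj == obj)
                      return vt_map[i].vt;
              return NULL;   /* destroyed (or never registered) objects simply fail here */
          }

          int main(void) {
              int circle_storage;   /* stands in for the object's data */
              register_object(&circle_storage, &circle_vt);

              const struct vtable *vt = lookup_vtable(&circle_storage);
              if (vt) vt->describe();   /* dispatch without a vtable pointer in the object */
              return 0;
          }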

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 3, Funny) by ilPapa on Saturday June 11 2016, @08:16PM

    by ilPapa (2366) on Saturday June 11 2016, @08:16PM (#358411) Journal

    I saw the headline in my RSS feed and mis-read it as "stank-hopping"

    This is my disappointed face.

    --
    You are still welcome on my lawn.
  • (Score: 2) by fritsd on Saturday June 11 2016, @08:28PM

    by fritsd (4586) on Saturday June 11 2016, @08:28PM (#358412) Journal

    I found that too complicated for my poor head..

    So.. the normal stack is for Microsoft telemetry monitoring [soylentnews.org], and this shadow stack is for Intel/NSA security processor monitoring?

    I'm sure it's very cunning.

  • (Score: 2) by Gravis on Saturday June 11 2016, @08:31PM

    by Gravis (4596) on Saturday June 11 2016, @08:31PM (#358413)

    While this seems like a total waste of transistors and just another bullet point for marketing, it may also restrict the use of cooperative threading, because it's intended to prevent tampering with the stack. libco [byuu.org] is a cooperative threading library used in emulators, which achieve much higher accuracy as a result. If an exception is raised every time it switches cothreads, you are going to have to be able to tell the OS to ignore the exception, or else not be able to use some of the most brilliant programs, because being clever is prohibited by Intel.
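
    For illustration, here is a minimal cooperative-threading example using POSIX ucontext rather than libco's own API; each cothread gets its own stack, and switching between them is exactly the kind of stack manipulation a shadow-stack check would have to be taught about.

        #include <stdio.h>
        #include <ucontext.h>

        static ucontext_t main_ctx, co_ctx;
        static char co_stack[64 * 1024];   /* the cothread's private stack */

        static void cothread(void) {
            puts("cothread: step 1");
            swapcontext(&co_ctx, &main_ctx);   /* yield back to main */
            puts("cothread: step 2");
        }

        int main(void) {
            getcontext(&co_ctx);
            co_ctx.uc_stack.ss_sp   = co_stack;
            co_ctx.uc_stack.ss_size = sizeof co_stack;
            co_ctx.uc_link          = &main_ctx;   /* where to go when cothread() returns */
            makecontext(&co_ctx, cothread, 0);

            swapcontext(&main_ctx, &co_ctx);   /* run until the cothread yields */
            puts("main: resumed");
            swapcontext(&main_ctx, &co_ctx);   /* let the cothread finish */
            puts("main: done");
            return 0;
        }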

    • (Score: 2) by RamiK on Sunday June 12 2016, @02:28AM

      by RamiK (1813) on Sunday June 12 2016, @02:28AM (#358486)

      On the list of downsides, you can throw in significant overhead for lots of coroutines (CSP style or just Duff's device) and split stacks. From there, you can execute one of two attacks: either flood the shadow stack space with endless coroutines, or attack the hardware scheduler garbage-collecting all those endless stacks by recursing the return pointers back and forth through jump tables.

      Extra credit for doing both at the same time.

      The good thing about this is that it will likely trip over exceptions, making everything so damn fragile that people will stop abusing exceptions and start writing good code.

      --
      compiling...
      • (Score: 0) by Anonymous Coward on Sunday June 12 2016, @03:46AM

        by Anonymous Coward on Sunday June 12 2016, @03:46AM (#358510)

        The good thing about this is that it will likely trip over exceptions, making everything so damn fragile that people will stop abusing exceptions and start writing good code.

        I guess Intel's plot truly is cunning, then.

  • (Score: 1, Interesting) by Anonymous Coward on Sunday June 12 2016, @08:39AM

    by Anonymous Coward on Sunday June 12 2016, @08:39AM (#358578)

    We don't know for certain what the ideal hardware would be for AI to run. We may need hardware to allow random code generation and execution and so on. Intel and others are busy taking away freedom one bit at a time.

    So in the end, AI programs implemented by independent developers will not be allowed to run. To run modern AI, one will need to use the government-mandated chip without limitations. Naturally, this chip will only be 'licensed' to friends of the government, like illegal spy agencies and official torture factories.

    All this is done in the name of security. There is no security for the people any more and it is getting worse.