
posted by Fnord666 on Wednesday September 13 2017, @06:03AM
from the going-back dept.

Return-oriented programming (ROP) is now a common technique for compromising systems via a stack-smashing vulnerability. Although restrictions on executing code on the stack have put an end to most simple stack-smashing attacks, that does not mean stack smashing is no longer a threat. There are various schemes in use for defeating ROP attacks. A new mechanism called "RETGUARD" is being implemented in OpenBSD and is notable for its relative simplicity. It uses a simple return-address transformation to disrupt ROP chains, and takes the form of a patch to the LLVM compiler that adds a new flag.
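
If the transformation is the XOR-with-stack-pointer scheme from the original RETGUARD proposal (an assumption; the summary above only calls it a "return-address transformation"), the whole idea fits in a few lines. A minimal C sketch with stand-in values rather than a real stack:

    /* Sketch of the RETGUARD idea: the compiler-emitted prologue XORs the
     * saved return address with the stack pointer, and the epilogue XORs it
     * again just before the ret, roughly:
     *
     *   prologue:  xorq %rsp, (%rsp)
     *   ...
     *   epilogue:  xorq %rsp, (%rsp)
     *              retq
     *
     * A ROP chain that overwrites the slot with a gadget address gets that
     * address XORed with %rsp at the epilogue, so the ret lands on garbage
     * unless the attacker also knows the stack pointer. */
    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint64_t ret_addr = 0x401234;        /* pretend saved return address */
        uint64_t rsp      = 0x7ffdc0de0000;  /* pretend stack pointer */

        ret_addr ^= rsp;                     /* prologue: slot is now garbled */
        printf("garbled:  %#" PRIx64 "\n", ret_addr);

        ret_addr ^= rsp;                     /* epilogue: restored for ret */
        printf("restored: %#" PRIx64 "\n", ret_addr);
        return 0;
    }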


Original Submission

 
  • (Score: 3, Informative) by Wootery on Wednesday September 13 2017, @09:26AM (18 children)

    by Wootery (2341) on Wednesday September 13 2017, @09:26AM (#567145)

    This is prevented by Intel's hardware solution, 'CET', if Security Now has informed me correctly. [grc.com]

    So Intel with CET is creating a shadow stack that the programmer, the normal system, has no control over. And the only data that are pushed and popped are the ones that have implied use of the stack. That is, a call instruction has an implied use of the stack because the return from the call is automatically pushed on the stack.

    Similarly, the return instruction from a subroutine has implied use of the stack because the return instruction always gets the address it's going to return to from the stack.

    So what the shadow stack does is it obeys the same implied activities for calls and returns. But the programmer has no access to it. That is, when you're pushing things on the stack, you're only pushing them on the visible stack.

    When you pop them, you're only popping them from the visible stack. But when you do a call, the visible stack and the shadow stack both store the return address. And here's the key. When you do a return, the system verifies that the shadow stack's return address matches the visible stack's return address.

    If they don't match, something is wrong, and the process is terminated. So that completely prevents malicious use or even mistaken use. This will catch bugs faster than anyone's business, immediately catch the bug.

    But it will also immediately shut down anyone trying to use stack manipulation, buffer overruns, in order to get their own code to execute, to disrupt the function of a return instruction, to have that return instruction go somewhere else because it won't match what the shadow stack has because the shadow stack they have no control over, and it will always have the original true and correct return address.

    If the system tries to return to an address that the shadow stack doesn't agree with, it immediately produces a system exception and stops running. So it's just beautiful.
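
    A toy C model of the check described above (purely illustrative; the real shadow stack lives behind the MMU and microcode, with no program access at all):

      /* Toy model of a CET-style shadow stack: every call pushes the return
       * address to both stacks; every return compares the two copies and
       * kills the process on a mismatch. */
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define DEPTH 128
      static uint64_t stack[DEPTH];   /* visible stack: attacker-reachable */
      static uint64_t shadow[DEPTH];  /* shadow stack: no program access */
      static int sp, ssp;

      static void do_call(uint64_t ret_addr) {
          stack[sp++]   = ret_addr;   /* explicit push, as `call` does */
          shadow[ssp++] = ret_addr;   /* implied push to the shadow stack */
      }

      static uint64_t do_ret(void) {
          uint64_t visible = stack[--sp];
          uint64_t hidden  = shadow[--ssp];
          if (visible != hidden) {    /* the verification on every return */
              fprintf(stderr, "control-protection fault\n");
              abort();                /* "the process is terminated" */
          }
          return visible;
      }

      int main(void) {
          do_call(0x401000);
          stack[sp - 1] = 0xdeadbeef; /* simulated stack smash */
          do_ret();                   /* mismatch: aborts instead of returning */
          return 0;
      }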

  • (Score: 2) by Virindi on Wednesday September 13 2017, @09:36AM (17 children)

    by Virindi (3484) on Wednesday September 13 2017, @09:36AM (#567147)

    And here's the key...

    This will catch bugs faster than anyone's business, immediately catch the bug.

    But it will also immediately shut down anyone trying to use...

    So it's just beautiful.

    What is all this weasel hype wording? Is it not enough to just describe technical details anymore?

    Cool that they have something like that in hardware, though. But the quoted description makes it sound like it is not in memory but only on the CPU? That would introduce some issues, such as a maximum stack depth.

    • (Score: 2) by Wootery on Wednesday September 13 2017, @09:55AM (16 children)

      by Wootery (2341) on Wednesday September 13 2017, @09:55AM (#567152)

      What is all this weasel hype wording? Is it not enough to just describe technical details anymore?

      Well, it's from a podcast transcript, not an Intel whitepaper.

      makes it sound like it is not in memory but only on the cpu? That does introduce some issues, like for instance, maximum stack depth.

      Looking here [intel.com], it looks like it's in memory, with protections to prevent tampering:

      There are restrictions to write operations to shadow stack to make it harder for adversary to modify return address on both copies of stack implemented by changes to page tables. Thus limiting shadow stack usage to call and return operations for purpose of storing return address only.

      The page table protections for shadow stack are also designed to protect integrity of shadow stack by preventing unintended or malicious switching of shadow stack and/or overflow and underflow of shadow stack.

      It also makes me think Gibson was probably a bit optimistic when he went on to say there'd be zero performance cost because it's "all in the hardware". It's not all on-chip; it's going to create cache pressure and memory traffic.

      • (Score: 2) by Virindi on Wednesday September 13 2017, @10:10AM (15 children)

        by Virindi (3484) on Wednesday September 13 2017, @10:10AM (#567154)

        It also makes me think Gibson was probably a bit optimistic when he went on to say there'd be zero performance cost because it's "all in the hardware". It's not all on-chip, it's going to create cache pressure/memory traffic.

        Indeed. It is clear that Intel has no choice but to store the return addresses in two places for backwards compatibility. However, if this were implemented in the compiler (and ABI), the return address would only have to be written to ONE place: the return-address stack. It would use fewer memory accesses in exchange for a few simple register operations (adding to and subtracting from the secondary stack register). Of course, on a modern CPU those are small potatoes compared to memory accesses.

        So it seems to me that it would actually be faster to implement it in SOFTWARE than in hardware. At least, given Intel's requirements. Obviously if they could reinvent the whole system, hardware could be faster.

        • (Score: 2) by Wootery on Wednesday September 13 2017, @10:25AM (14 children)

          by Wootery (2341) on Wednesday September 13 2017, @10:25AM (#567157)

          The question is how to prevent corruption of the return-address stack. How can you do that in software? We have to worry about buffer overflows and bad pointer arithmetic, and about ROP-style shenanigans with the instruction pointer.

          Without banning pointer arithmetic, how can you prevent a malicious C program from messing with the return pointer? We could manage a shadow stack in software, and compare return address pointers before returning, but now we're back to square one except using software. We'd also somehow have to prevent corruption of the shadow stack, and again it's starting to look like a hardware solution is the way.

          LLVM SafeStack seems to be about creating two different stacks. [llvm.org] They concede limitations:

          protection against arbitrary memory write vulnerabilities is probabilistic and relies on randomization and information hiding. The randomization is currently based on system-enforced ASLR and shares its known security limitations
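
          For what it's worth, the flag is real and easy to try: clang's -fsanitize=safe-stack. A toy victim (my example, not from the SafeStack docs) to show what moves where:

            /* Build with: clang -fsanitize=safe-stack overflow.c -o overflow
             * SafeStack moves address-taken buffers like buf to a separate
             * "unsafe" stack, so this overflow tramples other unsafe-stack
             * data instead of the return address on the regular stack. */
            #include <string.h>

            void victim(const char *input) {
                char buf[16];         /* address-taken: goes to the unsafe stack */
                strcpy(buf, input);   /* classic unchecked copy */
            }

            int main(int argc, char **argv) {
                if (argc > 1)
                    victim(argv[1]);
                return 0;
            }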

          • (Score: 2) by Virindi on Wednesday September 13 2017, @10:44AM (5 children)

            by Virindi (3484) on Wednesday September 13 2017, @10:44AM (#567162)

            I was merely suggesting moving the return pointer to someplace other than the data stack (to a return pointer stack). There would be no return pointer on the data stack. The idea is that by indexing into one stack, it would be hard to make that resolve to another stack, given guard pages, a 64-bit address space, and random stack locations. In this scenario there is no comparison of return pointers because there is only one return pointer.

            In my opinion, having two return pointers is only a marginal improvement. It is like a form of ECC where you fully duplicate all the data in the hope that the attacker can't modify both copies. Okay, I guess, but it still relies on exactly the same thing and adds almost nothing. Consider what the attacker must accomplish: in the separate-stack scenario, the attacker must find a separate stack someplace else in memory. When the return addresses are DUPLICATED to a second stack, the attacker must find a separate stack someplace else in memory (an identical problem) as well as overwrite the standard return address at a known place on the same stack as the data. The former problem is identical to my scenario, and the latter is nearly trivial.

            how can you prevent a malicious C program from messing with the return pointer?

            Note that all I am trying to defend against here is an attacker who can write data into a variable on the data stack with a chosen offset. This is a common scenario when a structure is stored on the stack and proper checks are not present when accessing it. I have not seen anything in this discussion about an attacker who can already execute arbitrary code; if they can do so why use ROP?
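
            Concretely, that scenario is something like this deliberately broken fragment (made-up names; the struct and index stand in for any unchecked stack access):

              /* A structure on the data stack, written through an
               * attacker-chosen index with no bounds check. A large idx
               * walks past r.fields into the saved return address; with
               * return addresses on a separate stack, the same bug could
               * only corrupt other data. */
              #include <stddef.h>
              #include <stdint.h>

              struct record {
                  uint64_t fields[4];
              };

              void update(size_t idx, uint64_t value) {
                  struct record r;        /* near the saved return address */
                  r.fields[idx] = value;  /* missing: if (idx >= 4) reject */
              }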

            • (Score: 2) by Wootery on Wednesday September 13 2017, @04:00PM (4 children)

              by Wootery (2341) on Wednesday September 13 2017, @04:00PM (#567260)

              I was merely suggesting moving the return pointer to someplace other than the data stack (to a return pointer stack).

              We could go even further and have four: one for arguments, one for returned values, one for temporaries, and one for return-to (instruction pointer) addresses.

              Ideally what we'd want is for the return-to stack to be inaccessible except for by the call/return instructions. This would be similar to what Intel's CET does, but without the duplication. (Of course, CET's hands are tied for backward compatibility.)

              The former problem is identical to my scenario and the latter is nearly trivial.

              Ignoring the write-prevention, yes. Presumably a software solution could do something similar, especially with kernel support; CET 'just' does something with page tables after all.

              I have not seen anything in this discussion about an attacker who can already execute arbitrary code; if they can do so why use ROP?

              You're right, that would be a very different question.

              • (Score: 2) by Virindi on Wednesday September 13 2017, @08:49PM (3 children)

                by Virindi (3484) on Wednesday September 13 2017, @08:49PM (#567452)

                Ignoring the write-prevention, yes. Presumably a software solution could do something similar, especially with kernel support; CET 'just' does something with page tables after all.

                You're right, not sure why I wasn't thinking about that. Clearly it is helpful for the page containing the return-address stack to have write protection against "normal" data-write instructions.

                • (Score: 2) by Wootery on Thursday September 14 2017, @08:14AM (2 children)

                  by Wootery (2341) on Thursday September 14 2017, @08:14AM (#567691)

                  I figure you could do a similar thing in software (i.e. block 'ordinary' instructions from accessing that stack) using syscalls, unless I'm missing something.
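
                  Something like this, as a sketch (mmap/mprotect are real POSIX calls; the rstack_* helpers are made-up names for what a compiler would have to emit around every call):

                    /* A dedicated return-address stack on its own pages,
                     * readable but sealed against "ordinary" writes except
                     * inside the call helper. */
                    #include <stddef.h>
                    #include <stdint.h>
                    #include <sys/mman.h>

                    #define RSTACK_BYTES 4096
                    static uint64_t *rstack;
                    static size_t rtop;

                    void rstack_init(void) {
                        rstack = mmap(NULL, RSTACK_BYTES, PROT_READ,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                    }

                    /* What a protected "call" would do: unseal, push, reseal.
                     * The two mprotect syscalls per call are the catch. */
                    void rstack_push(uint64_t ret_addr) {
                        mprotect(rstack, RSTACK_BYTES, PROT_READ | PROT_WRITE);
                        rstack[rtop++] = ret_addr;
                        mprotect(rstack, RSTACK_BYTES, PROT_READ);
                    }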

                  • (Score: 2) by Virindi on Thursday September 14 2017, @09:25AM (1 child)

                    by Virindi (3484) on Thursday September 14 2017, @09:25AM (#567706)

                    How? It would have to be written to every time you make a call. I am not aware of an easy mechanism to achieve this.

                    I double-checked the Intel reference manual, and attempting a call instruction when the next stack location is on a non-writable page results in a processor exception. So to implement this in software, you'd have to have the OS handle that exception, check whether a call instruction was actually executed (and not a normal memory write), emulate the push, then resume. That seems like a heck of a lot of overhead for every call!

                    Or am I missing something?

                    • (Score: 2) by Wootery on Thursday September 14 2017, @01:49PM

                      by Wootery (2341) on Thursday September 14 2017, @01:49PM (#567793)

                      You're right, there would be an awful amount of overhead there.

                      I don't think there's any efficient mechanism to restrict access to a page to only certain functions.

          • (Score: 3, Informative) by ledow on Wednesday September 13 2017, @11:07AM (7 children)

            by ledow (5567) on Wednesday September 13 2017, @11:07AM (#567169) Homepage

            Not everything can - or should - be done in software.

            To be honest, if I were designing a chip architecture nowadays, I would have a very clear separation of data and code, and I would have a lot of fine-grained control over what is writable, what is usable as a pointer, and what instructions are allowed to operate on. I would also completely separate operational details (where a process sits in memory, or where the stack is stored and how) from the software that is running. Sure, the OS is software, but there's no need to propagate such details even to an OS if you provide the correct hardware functionality.

            That there is a way to corrupt a stack, as any user or process, is unacceptable if we're talking about security. Hardware needs to enforce that (i.e. locking out regions of memory, etc.). Relying on software that changes, and that is itself the source of the compromise, to then defend against that compromise is quite a ridiculous notion.

            There's no reason that processes even need to see actual real memory pointers either, or why they should ever be able to tweak them out of their given bounds, or certainly use them as a target to jump to for execution.

            In an era where we can compile raw, unchanged C to ECMAScript and have it work, there's no reason that we should still be using architectures that jump through all kinds of security hoops to boot but then leave everything up to the software trying to avoid touching itself.

            I'm even tempted to say - and I know someone will say that some System X does that, or that it would have overheads, or that it would make IPC difficult - that process management should be hardware-controlled and only software-directed. "Hey, processor, please spin up a new thread using the new code memory region I requested earlier, with this entry point and these limits" - and then that code cannot escape those limits, be they memory regions, hardware access, CPU usage, priority or whatever. The controlling software cannot use or interfere with those regions from that point onwards, but can request, say, process sleeping, prioritisation, communication or termination (and only the parent process is able to do that, traced right back to the boot process).

            Every OS reinvents the wheel to spin up a process and control it, and when we get it wrong things like this happen where processes can escape or modify other processes and their memory.

            Why is any process able to see the contents of an address range containing even its own code, or anything other than an unexecutable data area starting at literally address zero (which is offset and bounds-checked by the hardware to translate it to the real hardware address)?

            I get the "legacy code" bit, but when we went 64-bit we should have just changed the way the entire thing works, or at least add it as a processor mode option that can be switched into.

            • (Score: 2) by Virindi on Wednesday September 13 2017, @11:26AM (2 children)

              by Virindi (3484) on Wednesday September 13 2017, @11:26AM (#567175)

              process management should be hardware-controlled and only software-directed

              Hardware is not a magic bullet. The behavior you are asking it to implement would be quite complex, and it would have vulnerabilities just as if the OS were controlling it. Except now that behavior is both much less transparent and much harder to update.

              You are essentially asking for the OS to be implemented in CPU microcode. I, for one, have a problem with that on many levels! I'd rather the complex tasks of security on my system be visible to me, with hardware based operations which are as simple as possible. (Yes, the security model in x86 has gotten out of hand. But still it is not as bad as the scenario you suggest.)

              • (Score: 2) by ledow on Wednesday September 13 2017, @12:02PM (1 child)

                by ledow (5567) on Wednesday September 13 2017, @12:02PM (#567184) Homepage

                I'd rather the complex tasks of security be extremely visible.

                As in "process killed hard because it tried to do something it shouldn't be allowed to do, and hardware gave it zero choice but to immediately terminate." Not just "oh, just set the service to auto-restart silently in the background if it dies and make sure it runs as admin".

                Software has proven woefully inadequate at this. It's still possible to BSOD the latest OS. OS control basically consists of a user "asking" the OS to kill a process, which then goes through a complicated rigmarole of processing before deciding that the process should die, and then spends half its life cleaning up after it. I'd rather it just died. Literally... boom... not a single instruction more executed, memory removed from its possession, every pending action or callback gone forever, possibly with the controlling process notified.

                I don't see how the hardware would have vulnerabilities such as this (others, sure). If the address space of the stack is LITERALLY NOT AVAILABLE to the process, no matter what it asks for, then that's the end-game.

                If you'd argued for it making debugging harder, yes, we'd need to put in debug modes into the processors for that.
                If you'd argued that it would affect performance - of course it would. But so does all the stack-tricks and ASLR and so on.
                If you'd argued that you want control of the microcode because of the processor snooping on things it shouldn't, I could agree, but that's a game we've already lost in the conventional OS world too.

                If the argument is "I can't see into my processes, or what stack was used, or how the processor set up that thread", then that's precisely the point, I feel.

                Too much is reliant on a bit of software that can be changed and overwritten detecting whether other bits of software have been changed or overwritten.

                • (Score: 2) by maxwell demon on Wednesday September 13 2017, @07:19PM

                  by maxwell demon (1608) on Wednesday September 13 2017, @07:19PM (#567402) Journal

                  As in "process killed hard because it tried to do something it shouldn't be allowed to do, and hardware gave it zero choice but to immediately terminate." Not just "oh, just set the service to auto-restart silently in the background if it dies and make sure it runs as admin".

                  You obviously have no idea what you are talking about. When a process is "silently restarted," it is killed hard. And some other process notices that the process is gone, and starts a new process for the same program. And there's zero the hardware can do against this, except possibly disallow the OS from starting a new process. But hardware that does that would be 100% useless, as you couldn't do anything with it.

                  --
                  The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 2) by Pino P on Wednesday September 13 2017, @05:33PM

              by Pino P (4721) on Wednesday September 13 2017, @05:33PM (#567312) Journal

              Why is any process able to see the contents of an address range containing even its own code

              I can think of three reasons.

              Program loader: The operating system process that loads application code into RAM has to see the code it is loading. In iOS, for example, this process is privileged and unique, and it verifies digital signatures.

              JIT engine: A process using just-in-time recompilation has to see the code it's building. In iOS, for example, this process is privileged and unique: the only JIT recompiler allowed to run is the WebKit JavaScript engine.

              Literal pools: The ARM instruction set has a limited range for immediate values: an 8-bit value rotated right by an even number of bits from 0 to 30. The workarounds are to split a large constant into a series of immediate load and add instructions, or to load large constants from a literal pool [wikipedia.org] placed between functions and accessed using a PC-relative addressing mode. This may be impractical to avoid on ARM, the instruction set used by older versions of iOS. AArch64 changes this by allowing 1 MB offsets [arm.com], which in principle would allow literal pools to end up in separate MMU pages from code.
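
              For the curious, the classic A32 encodability test is compact: a value is a valid data-processing immediate if some even rotation turns it into a byte. A sketch of mine, not from any particular codebase:

                /* True if v is encodable as an A32 data-processing
                 * immediate: an 8-bit value rotated right by an even
                 * amount. Constants failing this test force the split-
                 * into-several-instructions or literal-pool workarounds. */
                #include <stdbool.h>
                #include <stdint.h>

                static bool is_arm_immediate(uint32_t v) {
                    for (unsigned rot = 0; rot < 32; rot += 2) {
                        /* rotate left by rot undoes a rotate-right encoding */
                        uint32_t c = (v << rot) | (v >> ((32 - rot) & 31));
                        if (c <= 0xff)
                            return true;
                    }
                    return false;
                }

              For example, is_arm_immediate(0xff000000) is true (0xFF rotated right by 8), while 0x12345678 is not, since its significant bits span more than 8 positions.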
            • (Score: 2) by meustrus on Wednesday September 13 2017, @05:50PM (1 child)

              by meustrus (4961) on Wednesday September 13 2017, @05:50PM (#567330)

              Why is any process able to see the contents of an address range

              Because the available abstractions historically never perform well enough for the few performance-critical applications. Bitmap processing is one area where going one pixel at a time is really slow with most libraries, and this has typically been solved by giving developers access to the memory underlying the bitmap.

              It probably isn't a law that abstractions will always perform worse. But we used to have access to the underlying hardware, and a lot of high-performing code was developed with this access. When the new abstraction performs poorly, the easy solution is to go back to the way it used to be, so that the high-performing code can be reused regardless of its lack of safety.
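
              As a sketch of that trade-off (Bitmap and set_pixel are made-up names): the checked per-pixel call is what the abstraction offers; the raw loop over exposed memory is what the performance-critical code keeps demanding:

                #include <stddef.h>
                #include <stdint.h>

                typedef struct {
                    uint32_t *pixels;        /* raw 32-bit RGBA buffer */
                    size_t width, height;
                } Bitmap;

                /* The safe abstraction: bounds-checked, one call per pixel. */
                void set_pixel(Bitmap *b, size_t x, size_t y, uint32_t rgba) {
                    if (x < b->width && y < b->height)
                        b->pixels[y * b->width + x] = rgba;
                }

                /* The exposed-memory path: one tight loop, no per-pixel
                 * checks or call overhead, and no safety either. */
                void fill(Bitmap *b, uint32_t rgba) {
                    size_t n = b->width * b->height;
                    for (size_t i = 0; i < n; i++)
                        b->pixels[i] = rgba;
                }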

              I get the "legacy code" bit, but when we went 64-bit we should have just changed the way the entire thing works

              Intel tried to do just that; they looked at the x86 instruction set, saw cruft, and set about designing Itanium to fix the shortcomings discovered since x86 was created and extended. But designing things right takes time, and while Intel was busy with that exercise, AMD rushed a bolted-on 64-bit extension to x86 to market. No longer first to market, and incompatible with existing x86 machine code, Itanium was at a severe disadvantage by the time it came out.

              Market forces will always prevent us from having nice things. And the market prefers performance and backwards compatibility over correctness, even when "not correct" means "not secure". It's why Linux and Windows dominate despite many more correct alternatives existing.

              --
              If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
              • (Score: 2) by Wootery on Thursday September 14 2017, @08:18AM

                by Wootery (2341) on Thursday September 14 2017, @08:18AM (#567695)

                Market forces will always prevent us from having nice things.

                I don't know about that - ARM is doing ok. MIPS, POWER, SPARC, SuperH, Alpha... not so much. I hope there's a future for RISC-V.

                In the GPU world they're essentially free to re-architect their hardware every generation, as everything is JIT compiled.

            • (Score: 2) by maxwell demon on Wednesday September 13 2017, @07:12PM

              by maxwell demon (1608) on Wednesday September 13 2017, @07:12PM (#567394) Journal

              (sure, the OS is software, but there's no need to propagate such details to an OS, even, if you provide the correct hardware functionality).

              So you think hardware is never buggy? If an OS has a bug, I can just install an update. If hardware has a bug, well, bad luck.

              --
              The Tao of math: The numbers you can count are not the real numbers.