SoylentNews is people

posted by janrinok on Saturday June 11 2016, @05:49PM
from the simple-but-smart dept.

El Reg published an article describing Control-flow Enforcement Technology (CET) [PDF], a clever technique Intel is considering implementing in future CPU designs to prevent certain types of malware infections: those that use return-oriented programming (ROP) and jump-oriented programming (JOP) to implement exploits:

CET works by introducing a shadow stack – which only contains return addresses, is held in system RAM, and is protected by the CPU's memory management unit. When a subroutine is called, the return address is stashed on the thread's stack, as per normal, and also in the shadow stack. When the processor reaches a return instruction, the processor ensures the return address on the thread stack matches the address on the shadow stack.

If they don't match, then an exception is raised, allowing the operating system to catch and stop execution. Therefore, if exploit code starts tampering with the stack to chain together malicious instructions to install malware or otherwise compromise a system, these alterations will be detected and the infiltration halted before any damage can be done.
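The call/return check described above can be sketched as a toy simulation. This is only an illustration, not Intel's design: `ToyCPU`, `ControlFlowViolation`, and the addresses are hypothetical names made up for the example, and the "protection" of the shadow stack is merely assumed rather than enforced.

```python
class ControlFlowViolation(Exception):
    """Stands in for the exception the CPU would raise on a mismatch."""


class ToyCPU:
    """Hypothetical toy model: a writable thread stack plus a shadow stack."""

    def __init__(self):
        self.stack = []          # ordinary thread stack; buggy code can overwrite it
        self._shadow_stack = []  # return addresses only; assumed unreachable by normal stores

    def call(self, return_address):
        # A call stashes the return address on both stacks.
        self.stack.append(return_address)
        self._shadow_stack.append(return_address)

    def ret(self):
        # A return pops both copies and compares them before jumping.
        from_stack = self.stack.pop()
        from_shadow = self._shadow_stack.pop()
        if from_stack != from_shadow:
            raise ControlFlowViolation(
                f"return address {from_stack:#x} does not match shadow copy {from_shadow:#x}"
            )
        return from_stack


cpu = ToyCPU()
cpu.call(0x401000)
normal_return = cpu.ret()      # untouched stack: the return proceeds

cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF     # simulated stack smash redirecting the return
try:
    cpu.ret()
    detected = False
except ControlFlowViolation:
    detected = True            # the tampering is caught at the return
```

In this sketch the overwritten return address is never used: the mismatch is raised first, which is the point at which an operating system would step in.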

Given that these are two of the major techniques used by exploit authors to perform arbitrary code execution, being able to block such attempts through hardware could make digital life a little bit safer.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 5, Informative) by Anonymous Coward on Saturday June 11 2016, @07:13PM

    by Anonymous Coward on Saturday June 11 2016, @07:13PM (#358393)

    Well, that's exactly how I implemented the stacks in my VM a decade ago. Except: Rather than comparing to a shadow stack, just put the damn pointers on the shadow stack and use them directly without the stupid ass compare! x86 had already made manipulation of the code pointer impossible except through opcodes. It obviously also needed to make the instruction stream modifiable only through opcodes (just make call and ret use their own stack in memory that is read-only to everything except those opcodes). It's not rocket science, so don't pretend it's amazing or even cool. This is plainly obvious to anyone presented with the problem. The "compare to shadow stack" is done, rather than just using the fucking shadow stack pointer, to provide backwards compatibility with languages which want to unwind the stack and don't know about the new flow control opcodes. To solve the problem without duplicating the code pointers, you'll also want to be able to switch contexts and do a read only peek at the values on the shadow stack; this is to support exceptions and debuggers unwinding the stack.

    The real solution is to separate code pointers from parameter stack. Like my VM, and now Intel's "new" plan, FORTH also keeps two different stacks for execution and parameters, but if the hardware doesn't support having a second stack for execution then FORTH too can get stack smashed by buffer overrun. I've been saying for decades that if you put your code pointers and data pointers on the same stack you're doing it wrong. So, Intel isn't so clever after all: SPARC also had a "shadow stack" that you blow registers onto when your compiler encounters too many intermediary values at once, and FORTH properly implemented on SPARC is already stack smash and ROP proof.

    Fuck Intel for patenting shit with decades of prior art and trying to pass it off as "new" and "innovative" when they haven't even fixed the core problem: All code pointers should be in CODE memory. The Von Neumann Architecture (CODE = DATA) is insecure. Code is not data. Or, more correctly: All data is code -- Software is a VM that runs DATA as opcodes. For example, a font library is a VM for rendering font opcode. Or, a JPEG decoder library is a VM that runs JPEG opcode and outputs pixel raster data. The "VM"s (programs) all need to have available the same protections that hardware allows OS level code. Uniformity in application of security features at every level of design is the mathematically provable solution to the security problem. If we can mark pages as read only to stop self modifying code, then we also need individual words of records being able to be marked read only in order to stop modification of execution flow. Hint: The flow of execution is the program. Stop treating opcode as if it's just data. Stop treating programs as if they're not (virtual) machines, and give them the capabilities that the CPU has itself: The ability to determine what is and isn't writable memory. Once you do this you see what innovations must occur to solve the security issue.

    For my next trick I will tell you how to create a new form of exploit that you can use instead of Return Oriented Programming: I call it "Methodic Oriented Programming". By manipulating a heap overflow into overwriting a C++ object's VTABLE reference (or C function pointer references), you can reprogram which code gets called when a method is called on an object (similar to how return oriented programming changes what code gets called when you return from a function). To write a Methodic Oriented Program you examine the order that methods are called on objects (or the order in which the function pointers are called). That is your instruction stream so you arrange to supply bad pointers to objects to code that will call the methods in the particular order, which you have pointing to the code you want executed, similar to return oriented programming.

    Finally, I'll tell you how I defeat MOP exploits in my old ass VM design (which could also be implemented in hardware, since I have a Verilog implementation). RAM needs to be able to become read only or read/write at the word level. Sections of RAM need to be able to be marked as "formatted". When the formatted flag is set, a bitmap tells the CPU which words are read only and which can be read/write. The record size of formatted RAM can be limited to the number of bits in the formatting bitmap, or you can have the "read only" words only be within the first 32 or 64 words of "formatted" memory, and the rest be read/write. Since memory access is already virtualized this really isn't that much more expensive to implement in hardware.
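    A rough sketch of the word-level bitmap idea, written as a toy model rather than anything from the poster's VM or Verilog: `FormattedRecord`, `ReadOnlyWordError`, and the convention that bit i covers word i are all assumptions made for illustration.

```python
class ReadOnlyWordError(Exception):
    """Raised when a store targets a word the bitmap marks read only."""


class FormattedRecord:
    """Hypothetical 'formatted' memory record guarded by a per-word bitmap."""

    def __init__(self, words, read_only_bitmap):
        self._words = list(words)
        # Assumed convention: bit i set means word i is read only.
        self._read_only_bitmap = read_only_bitmap

    def _check_index(self, index):
        if not 0 <= index < len(self._words):
            raise IndexError(f"word {index} is outside the record")

    def is_read_only(self, index):
        self._check_index(index)
        return bool((self._read_only_bitmap >> index) & 1)

    def load(self, index):
        self._check_index(index)
        return self._words[index]

    def store(self, index, value):
        # Every ordinary store consults the bitmap first.
        if self.is_read_only(index):
            raise ReadOnlyWordError(f"word {index} is read only")
        self._words[index] = value


# Word 0 plays the role of a type/V-table reference; words 1 and 2 are plain data.
record = FormattedRecord([0x5000, 1, 2], read_only_bitmap=0b001)
record.store(2, 99)            # data words stay writable
try:
    record.store(0, 0xBAD)     # attempt to redirect the protected word
    blocked = False
except ReadOnlyWordError:
    blocked = True
```

    Here the protected word keeps its original value while the data words remain freely writable, which is the split the bitmap is meant to express.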

    My VM implementation uses a single bit bitmap for its record format data. I store all my V-tables in RAM pages marked read only, but the objects still need to reference which type they are, and thus they need a pointer in the object to the VTABLE for their type. Because that pointer in the object instance can currently be overwritten in x86 it can be modified to say the V-table is at any arbitrary location, and this is how the Methodic Oriented Programming exploits work. So, my VM just needs a single word in each object instance to be marked read only. I don't use pointers to other objects in my language implementation, I use handles instead... so I don't have to worry about function pointers being modified. The multiple bit bitmap method for read only record attribute formatting at the word level is what can allow pointers to other objects to be protected, and thus prevent MOP.

    Modifying a pointer to an object on the heap can result in a bad pointer to what isn't an object and thus what isn't a V-table, and thus what isn't the right function. So, you have to mark read-only not only the reference to the object type data / V-table but also make the pointers to other objects read only. And if you do that then you need opcodes for getting and setting object references which ensure that the protected fields are being set to values from a V-Table and not arbitrary data. If you do all that then you eliminate all exploit vectors (except JTOP : "Jump table oriented programming", heh).
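    One way to picture the checked get/set opcodes is the sketch below. It is an assumption-laden illustration, not the poster's implementation: `KNOWN_VTABLES`, `set_vtable`, `get_vtable`, and the addresses are hypothetical, and a real design would enforce the check in the CPU or VM rather than in a library function.

```python
class InvalidReferenceError(Exception):
    """Raised when a protected field would receive an unregistered value."""


# Hypothetical registry of legitimate V-table addresses, fixed at load time.
KNOWN_VTABLES = frozenset({0x5000, 0x5040})


class Instance:
    """Toy object whose type reference can only change through set_vtable."""

    def __init__(self, vtable_address):
        self._vtable = None
        set_vtable(self, vtable_address)


def set_vtable(instance, address):
    """Stand-in for a dedicated 'set protected reference' opcode."""
    if address not in KNOWN_VTABLES:
        raise InvalidReferenceError(f"{address:#x} is not a registered V-table")
    instance._vtable = address


def get_vtable(instance):
    """Stand-in for the matching 'get protected reference' opcode."""
    return instance._vtable


obj = Instance(0x5000)
set_vtable(obj, 0x5040)        # switching to another registered table is allowed
try:
    set_vtable(obj, 0x41414141)  # arbitrary attacker-chosen data is refused
    rejected = False
except InvalidReferenceError:
    rejected = True
```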

    TL;DR: Just posting to provide more prior art in case Intel's next "innovation" is also patented. Foiled again, Intel. The game is still afoot, and will remain so thanks to your shit implementation of TPM.

  • (Score: 3, Informative) by RamiK on Saturday June 11 2016, @07:29PM

    by RamiK (1813) on Saturday June 11 2016, @07:29PM (#358397)

    And 3 decades ago there were LISP machines which did all of that and more in the hardware.

    --
    compiling...
  • (Score: 2) by Scruffy Beard 2 on Saturday June 11 2016, @09:39PM

    by Scruffy Beard 2 (6030) on Saturday June 11 2016, @09:39PM (#358431)

    Over a decade ago, I started trying to compile a list of "Safe" file formats that do not store code.

    • Container formats like AVI can store arbitrary data.
    • PDF files, which originally became popular because they act like paper, now support JavaScript.
    • PS files, which PDF replaces, are Turing-complete.
    • Many "office" formats support Macros.
    • I eventually concluded that even text files can execute code in the right circumstances (shell scripts, .desktop files come to mind).
    • (Score: 3, Informative) by maxwell demon on Sunday June 12 2016, @05:40AM

      by maxwell demon (1608) on Sunday June 12 2016, @05:40AM (#358543) Journal

      Actually I think Turing-completeness is not the right criterion. The worst thing a Turing-complete but otherwise completely incapable language can do is to waste memory and CPU time; for both there exist time-proven ways to handle it.

      The danger comes when the format can interact with anything that could be used to cause damage. This includes access to the file system, access to the internet, and even the ability to output arbitrary text on standard output. [stackexchange.com]

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2) by maxwell demon on Sunday June 12 2016, @05:23AM

    by maxwell demon (1608) on Sunday June 12 2016, @05:23AM (#358540) Journal

    and thus they need a pointer in the object to the VTABLE for their type.

    No, they don't. You could have a global table mapping object addresses to VTABLE addresses. Yes, it would be less efficient, but it would eliminate the VTABLE pointer inside the object. As a bonus, by deleting the entry on object destruction, an already-destructed object could no longer be used even if the full data happens to still lie in memory, as the VTABLE could no longer be found.

    --
    The Tao of math: The numbers you can count are not the real numbers.