
posted by n1 on Monday July 06 2015, @09:32PM   Printer-friendly
from the human-obsolescence dept.

MIT computer scientists have devised a new system that repairs dangerous software bugs by automatically importing functionality from other, more secure applications.

Remarkably, the system, dubbed CodePhage, doesn’t require access to the source code of the applications whose functionality it’s borrowing. Instead, it analyzes the applications’ execution and characterizes the types of security checks they perform. As a consequence, it can import checks from applications written in programming languages other than the one in which the program it’s repairing was written.

Once it’s imported code into a vulnerable application, CodePhage can provide a further layer of analysis that guarantees that the bug has been repaired.

[...] Sidiroglou-Douskos and his coauthors — MIT professor of computer science and engineering Martin Rinard, graduate student Fan Long, and Eric Lahtinen, a researcher in Rinard’s group — refer to the program CodePhage is repairing as the “recipient” and the program whose functionality it’s borrowing as the “donor.” To begin its analysis, CodePhage requires two sample inputs: one that causes the recipient to crash and one that doesn’t. A bug-locating program that the same group reported in March, dubbed DIODE, generates crash-inducing inputs automatically. But a user may simply have found that trying to open a particular file caused a crash.
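The recipient/donor workflow described above can be sketched in miniature (all names here are hypothetical illustrations; the real CodePhage operates on compiled binaries, not Python source):

```python
# Toy illustration of the recipient/donor repair workflow.
# Hypothetical names throughout; a sketch, not CodePhage itself.

def recipient_parse(data: bytes) -> int:
    # A deliberately buggy "recipient": crashes on empty input.
    return data[0]  # IndexError when data is empty

def donor_check(data: bytes) -> bool:
    # A check the "donor" performs that the recipient lacks.
    return len(data) > 0

def patched_parse(data: bytes) -> int:
    # The repaired recipient: the donor's check, imported and
    # expressed in the recipient's own language.
    if not donor_check(data):
        return -1  # reject the crash-inducing input gracefully
    return recipient_parse(data)

benign = b"\x07hello"   # the input that doesn't crash
crashing = b""          # the crash-inducing input (cf. DIODE)

assert patched_parse(benign) == 7
assert patched_parse(crashing) == -1  # no crash after the repair
```

The two sample inputs play the roles described in the article: one exercises the normal path, the other triggers the crash the imported check must prevent.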

[...] “The longer-term vision is that you never have to write a piece of code that somebody else has written before,” Rinard says. “The system finds that piece of code and automatically puts it together with whatever pieces of code you need to make your program work.”

“The technique of borrowing code from another program that has similar functionality, and being able to take a program that essentially is broken and fix it in that manner, is a pretty cool result,” says Emery Berger, a professor of computer science at the University of Massachusetts at Amherst. “To be honest, I was surprised that it worked at all.”


Original Submission

  • (Score: 2) by hendrikboom on Monday July 06 2015, @09:39PM

    by hendrikboom (1125) Subscriber Badge on Monday July 06 2015, @09:39PM (#205848) Homepage Journal

    Should I read this while believing the usual AI meaning of the words "efficient" and "effective"?

    • (Score: 2, Informative) by Anonymous Coward on Tuesday July 07 2015, @01:49AM

      by Anonymous Coward on Tuesday July 07 2015, @01:49AM (#205951)

      You should read this as copyright infringement, especially in the wake of Oracle v. Google.

      If you (or your code, AI, tamagotchi, whatever) copy compiled code from another executable that is not software you developed and own the rights to, then you are illegally copying/stealing code from another software company.

      • (Score: 4, Touché) by penguinoid on Tuesday July 07 2015, @06:38AM

        by penguinoid (5331) on Tuesday July 07 2015, @06:38AM (#206005)

        My code is made of ones and zeros. If I see any ones or zeros in your code, expect to hear from my lawyer, you thief.

        --
        RIP Slashdot. Killed by greedy bastards.
  • (Score: 0) by Anonymous Coward on Monday July 06 2015, @09:52PM

    by Anonymous Coward on Monday July 06 2015, @09:52PM (#205852)

    Don't all programmers try to reuse code? In fact, don't programming languages come with header or object files that already contain usable code that people can then incorporate into their software? Aren't there already (open and closed source) functions available that people can download and use in their code instead of having to write new code from scratch? Isn't the whole idea of object-oriented programming that you can avoid having to reinvent the wheel, so to speak?

    • (Score: 2, Informative) by jummama on Monday July 06 2015, @10:02PM

      by jummama (3969) on Monday July 06 2015, @10:02PM (#205858)

      Most programmers are humans though, not AI constructs dealing with machine code to write patches.

    • (Score: 4, Insightful) by davester666 on Tuesday July 07 2015, @07:36AM

      by davester666 (155) on Tuesday July 07 2015, @07:36AM (#206018)

      Yes, this is new. This is compiled application A reverse-engineering application B, finding that it has bugs, then reverse-engineering application C, noticing that it doesn't have those same bugs, and replacing some of the code in application B with code from application C in an attempt to fix those bugs.

      I can see two things happening:
      -this company being sued for having their customers commit copyright violations and violating their licensing terms
      -this company being sued by their customers for introducing other bugs into their apps

      • (Score: 0) by Anonymous Coward on Tuesday July 07 2015, @08:07AM

        by Anonymous Coward on Tuesday July 07 2015, @08:07AM (#206030)

        -this application analysing itself, considering itself vulnerable and therefore incorporating new code into itself, thus changing its own functionality and finally evolving into Skynet. :-)

  • (Score: 2, Touché) by Anonymous Coward on Monday July 06 2015, @09:56PM

    by Anonymous Coward on Monday July 06 2015, @09:56PM (#205856)

    e.g. fixing up Slashcode by using Rehash?

  • (Score: 4, Insightful) by frojack on Monday July 06 2015, @10:17PM

    by frojack (1554) on Monday July 06 2015, @10:17PM (#205862) Journal

    So you've patched your executable with no clear idea of what exactly was done, and no way to maintain the now-running executable.
    The next patch you have to install requires that you do this auto-patch all over again.

    And when they say it brings in code from some random other application written in any other language, I have to assume they
    mean it's looking for machine-code (assembly) patterns, or maybe library/API calls, which means you may now need additional libraries just to run the code.
    Virtually all compiled programs are rendered to a massive chain of API/library calls when you get down to the machine-code level.

    I can't think of a dumber way to maintain code, when the system leaves you with NO maintainable code (unless your mother tongue is x86 assembly).

    --
    No, you are mistaken. I've always had this sig.
    • (Score: 5, Informative) by RobotMonster on Monday July 06 2015, @10:22PM

      by RobotMonster (130) on Monday July 06 2015, @10:22PM (#205864) Journal

      Yeah - I agree 100%. Alan Turing had something to say about this... He called it The Halting Problem [wikipedia.org].

    • (Score: 1, Interesting) by Anonymous Coward on Monday July 06 2015, @10:29PM

      by Anonymous Coward on Monday July 06 2015, @10:29PM (#205869)

      Not how it works:

      "At each of the locations it identifies, CodePhage can dispense with most of the constraints described by the symbolic expression — the constraints that the recipient, too, imposes. Starting with the first location, it translates the few constraints that remain into the language of the recipient and inserts them into the source code. Then it runs the recipient again, using the crash-inducing input."
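A rough sketch of what that translated constraint could look like once inserted into the recipient's source (hypothetical names and a hypothetical constraint; the real system derives the constraint symbolically from the donor and emits it in the recipient's own language):

```python
# Suppose symbolic analysis of the donor yields the one remaining
# constraint:  offset + length <= len(buffer)
# which the recipient never enforces. Translated into the recipient's
# source, it becomes a guard at the candidate insertion point.
# Hypothetical example, not actual CodePhage output.

def read_field(buffer: bytes, offset: int, length: int) -> bytes:
    # Inserted check (the translated donor constraint):
    if offset + length > len(buffer):
        raise ValueError("field extends past end of buffer")
    return buffer[offset:offset + length]
```

Running the recipient again with the crash-inducing input, as the quote describes, then confirms the guard rejects it instead of crashing.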

      • (Score: 0) by Anonymous Coward on Monday July 06 2015, @10:33PM

        by Anonymous Coward on Monday July 06 2015, @10:33PM (#205873)

        So, what, it just identifies a potential crash point and notifies the programmer that the program may freeze or crash at a particular point? Don't compilers already try to do this in many instances?

      • (Score: 3, Informative) by VortexCortex on Monday July 06 2015, @11:57PM

        by VortexCortex (4067) on Monday July 06 2015, @11:57PM (#205906)

        In other words, it's like a mechanic that just swaps parts and doesn't fix the root of the problem because it doesn't know what the problem is. You can even ask to have some engine noise fixed, but since it wasn't major damage from a crash it wouldn't even know something was wrong. "It runs. Everything seems fine to me." What's that REEEEEE! sound then?

        For instance, today I discovered a buffer overflow that I can bootstrap into a remote code execution exploit for the video container parser used by Firefox. I can "fix" the problem by swapping in a different decoder, but that doesn't really fix the core issue. I can also "fix" the problem by backtracing from the crash and adding a check to the specific video decoder to prevent the buffer overflow. "CodePhage can provide a further layer of analysis that guarantees that the bug has been repaired." CodePhage can't guarantee shit, and any security researcher using such absolute terms should be dismissed as insane.

        Either of these "fixes" will prevent the crash caused by that one video input, but my exploit-bearing video actually represents a whole class of exploits in the video decoding system. That means I can pull off the same exploit with a number of video payloads, and this "automatic code repair" would not likely repair the correct piece of code if used in this real-world instance. Worse than not fixing the bug, it'll give a false sense of security with its "layer of analysis" (a fancy way to say unit test), which only shows that the one input video no longer crashes the browser, while the browser remains vulnerable.

        Since automatic bug-fixer software doesn't understand what the programmer intended, it can in fact introduce other bugs. What if the function that gets swapped in assumed that other parameter bounds had been checked by a prior function in the donor code-base, thus opening the recipient code up to a slew of potential exploits? If a programmer looked at the source they'd be able to grasp the issue more completely, since they can see the intent of the code as well as what's really causing the bug, and create a more complete fix. Until AI can read and understand source code and comments as well as a human, "automatic code repair" won't be worth a damn.

        Input fuzzing already gives us a way to find crashes and point programmers at the issue. It's not like programmers are having a hard time fixing bugs once they have an input that reliably causes a crash, so automatic code repair is a crappy solution to a problem that no one has. Furthermore, not all bugs result in crashes -- most of them don't. However, many of the bugs that don't cause crashes can be leveraged to create exploits. In that case, not even input fuzzing will reveal the exploit vector, and thus CodePhage's guarantee isn't worth the bits it was emitted with. In fact, it's a prerequisite for an exploit NOT to crash the program in order to be successfully deployed.

        For instance, the root of a bug might be that decoding one buffer can unintentionally modify another data buffer (not just a stack smash), and that other buffer might then contain data that was already fully validated and range-checked, but which is now corrupted and invalid. Cascades like this are common among exploit vectors, so incorrect fixes let malware authors reuse the same exploit with slight modification. In fact, I wouldn't be surprised if some malware authors point out a quick fix that only appears to patch up the problem, preventing one version of the malware from running while allowing them to continue using the actual vector via a new version of the malware the day after Patch Tuesday... Go ahead and add a bunch of checks to ensure the modified secondary buffer doesn't cause crashes, but that won't fix the primary buffer overrun, and subsequent refactoring (or just recompiling) can reintroduce the exploit vector.

        Instead of a crash, what could have been a red flag might have been some pink and yellow artefacts creeping across a few frames of video. That's all I needed to dig into the code and find a vector to exploit: a failed (visual) unit test. When looking in a debugger at the code that was running right before the crash I later induced, I realized that spot wasn't where the bounds check was needed -- all of that code was operating perfectly as intended, and I needed to look elsewhere if I wanted to create a fix... The crash payload did not indicate the source of the vector. All one needs is the effect of the bug in order to craft an exploit; a correct bug fix can be much harder to track down. This is why, IMO, white-hat hackers who submit fully correct patches with their proof-of-concept exploits generally have more skill than black-hat hackers who only demonstrate exploits.
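The "one crashing input stands in for a whole class" point can be sketched with a toy parser (all names hypothetical; nothing here is Firefox's actual code, and the narrow "patch" is a deliberate straw man of input-specific fixing):

```python
# Toy parser: trusts a length byte in the payload, so any payload
# claiming more than 4 bytes overruns the buffer. Hypothetical code.

def parse(payload: bytes) -> bytes:
    buf = bytearray(4)
    n = payload[0]               # attacker-controlled length byte
    for i in range(n):           # overruns buf whenever n > 4
        buf[i] = payload[1 + i]
    return bytes(buf)

def parse_patched(payload: bytes) -> bytes:
    # A narrow "fix" keyed to the one crash-inducing input a tool saw.
    # It blocks that input but not the class it belongs to.
    if payload == b"\x08AAAAAAAA":
        raise ValueError("bad input")
    return parse(payload)
```

The patched version rejects the original proof-of-concept input, yet a trivially modified payload (`b"\x08BBBBBBBB"`) still overruns the buffer, illustrating why a per-input check is a false sense of security.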

        • (Score: 2) by TheLink on Tuesday July 07 2015, @03:28AM

          by TheLink (332) on Tuesday July 07 2015, @03:28AM (#205971) Journal
          And what if code A has exploit P, and code B has exploit Q, and code C has exploit R?

          Does the system keep swapping the code around as each exploit is encountered?

          And where does the system get the code in the first place? What happens if your program using that system copies and uses GPLv3 code?
    • (Score: 2) by soylentsandor on Friday July 10 2015, @09:11PM

      by soylentsandor (309) on Friday July 10 2015, @09:11PM (#207647)

      From TFA (yes, I know... I'm sorry):

      (...) it translates the few constraints that remain into the language of the recipient and inserts them into the source code.

      So it would seem it ought to be verifiable after all.

  • (Score: 0) by Anonymous Coward on Monday July 06 2015, @10:19PM

    by Anonymous Coward on Monday July 06 2015, @10:19PM (#205863)

    So you are using a piece of code that has been "hacked" by this system to improve it.
    1) You call up the vendor for support: "You've been hacked! The CRC/MD5 sums do not match. We will not support you until you reload your system(s)."
    2) It was working, and now it doesn't. You call the PHAGE support line to correct the error their work caused, and they tell you to reload and try again.

    So who is going to pay for the downtime?

    Most likely you, since practically every EULA today disclaims any warranty that the code will do what it's supposed to do.

    • (Score: 1, Interesting) by Anonymous Coward on Monday July 06 2015, @10:30PM

      by Anonymous Coward on Monday July 06 2015, @10:30PM (#205870)

      I think the point is for software developers to use it to correct their code before signing it.

      • (Score: 1) by tftp on Monday July 06 2015, @10:59PM

        by tftp (806) on Monday July 06 2015, @10:59PM (#205882) Homepage

        I'm not sure how it would know that a for loop from 1 to x.length()-1 should start from 2, and not from 0, because the first positions in my array are reserved -- and the code inspector cannot know that unless it wants to read the protocol description for a certain piece of machinery.

        In my experience there isn't that much in a typical application that can be instantly recognized and reused -- and that isn't already made into a library. Lots of code, for example, is spent on handling GUI controls, and no AI can know what I want to happen when I click this or that button.
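The reserved-slot situation described above can be sketched as follows (hypothetical protocol and names; the point is that the correct start index lives in an external spec, not in the code):

```python
# Hypothetical example of programmer intent a repair tool cannot infer:
# an external protocol reserves the first two slots of each frame
# (say, a device ID and a checksum byte), so iteration must start at
# index 2 -- nothing in the code itself says why.

RESERVED_SLOTS = 2  # per the (hypothetical) protocol spec

def sum_payload(frame: list) -> int:
    total = 0
    for i in range(RESERVED_SLOTS, len(frame)):  # NOT range(len(frame))
        total += frame[i]
    return total & 0xFF
```

To an automated tool, starting at 0 looks equally valid -- or even more "correct" -- which is exactly the problem.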

        • (Score: 0) by Anonymous Coward on Monday July 06 2015, @11:10PM

          by Anonymous Coward on Monday July 06 2015, @11:10PM (#205885)

          I completely agree that a problem with something like this is that it's impossible for the code to properly anticipate the intent of the programmer. I was just trying to clarify that the intent of this code corrector wouldn't be for someone to fix already-signed code, but for someone to find bugs in code before signing it.

          Perhaps the compiler itself has a bug, or at least there is a bug that doesn't manifest within the source code but only after the code has been compiled, due to how the compiler works. What something like this could do is attempt to find such bugs. However, I would argue that this isn't anything new: existing compilers already attempt to identify potential crash points in a program, and compiler bugs are just as likely as bugs in this software itself.

          A programmer may misunderstand how a compiler interprets a piece of code, but the proper solution isn't to create a separate program that tries to find potential crashes; it's to ensure the compiler itself properly identifies them. This is essentially trying to solve one layer of problems (i.e. compiler problems, programmer problems that can be identified by good compilers, bad programmers who don't know how to avoid bad code) with another (yet another layer of code that code must pass through to be checked for bugs).

      • (Score: 0) by Anonymous Coward on Tuesday July 07 2015, @03:29AM

        by Anonymous Coward on Tuesday July 07 2015, @03:29AM (#205972)

        Actually, I work with systems that sign the source code, so any tampering is detected. That has saved my butt many times. It also helps in knowing exactly what makes it into an object module: if a source file is rolled back (hence an old signature and date/time), the object auto-magically recompiles and is restamped with the latest date/time of all the parts that made it up. Much better than make, with its forward-moving date/time only.

        On another note, they are installing changes based on other works. So GPL code could be "copied" into private source, corrupting ownership/copyright.

    • (Score: 1, Insightful) by Anonymous Coward on Monday July 06 2015, @10:41PM

      by Anonymous Coward on Monday July 06 2015, @10:41PM (#205876)

      Say bye-bye to your jobs

  • (Score: 3, Insightful) by mrchew1982 on Monday July 06 2015, @10:56PM

    by mrchew1982 (3565) on Monday July 06 2015, @10:56PM (#205880)

    Single-celled organisms do this kind of thing all the time: they import random genetic material to try to overcome environmental stress or even to outcompete other organisms. It's a big part of why life is so resilient, thrives in unlikely places, and evolved/evolves so quickly.

    If artificial intelligence is going to take hold and thrive (terrifying though that may be), at some point it has to be able to repair its own codebase or even evolve new functionality. I view this as progress on that front, not as a way to fix simple code bloopers and remove maintainers. Of course, given that this is a researcher looking for grant money, I doubt that it's even half as automated or efficient as they claim.

    • (Score: 1, Insightful) by Anonymous Coward on Tuesday July 07 2015, @03:42AM

      by Anonymous Coward on Tuesday July 07 2015, @03:42AM (#205976)

      The outcome of fixing a program is that the program should do what it's supposed to.

      The outcome of patching DNA in biology is to not do what it did before and to do something different in the future.

      Somehow, I don't see these as even similar.

      It seems to me that this proposed system is doing black-box testing and patching. Speaking as a person with experience in high-reliability software (aerospace stuff): you don't do just black-box testing, you do clear-box testing so that all possible execution paths are covered. Maybe I just don't understand what they are proposing.
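The black-box vs. clear-box distinction can be sketched with a hypothetical function: black-box testing probes inputs without seeing the source, while clear-box (white-box) testing must cover every visible branch.

```python
# Hypothetical example. A black-box suite built only from "typical"
# inputs may never exercise the b == 0 branch; a clear-box suite must,
# because the branch is visible in the source.

def divide_report(a: int, b: int) -> str:
    if b == 0:                    # path a black-box sample set may miss
        return "undefined"
    return str(a // b)
```

This is the gap the commenter is pointing at: a system that validates a repair only against sampled inputs can declare success without ever executing the path that still misbehaves.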

      • (Score: 3, Insightful) by penguinoid on Wednesday July 08 2015, @07:01AM

        by penguinoid (5331) on Wednesday July 08 2015, @07:01AM (#206365)

        Main difference is that each copy of a biological unit goes through rigorous unit testing. No exceptions.

        --
        RIP Slashdot. Killed by greedy bastards.
  • (Score: 2) by MichaelDavidCrawford on Tuesday July 07 2015, @12:14AM

    by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Tuesday July 07 2015, @12:14AM (#205916) Homepage Journal

    I expect malware - at least some malware - uses data and executable code that it finds on its infected host. So when I set up a server I do "apt-get remove" for all the packages that my box does not really require.

    Especially the development tools. If I need to build a binary for my server, I build it locally then scp it to the server.

    --
    Yes I Have No Bananas. [gofundme.com]
    • (Score: 2) by cafebabe on Saturday July 18 2015, @08:36AM

      by cafebabe (894) on Saturday July 18 2015, @08:36AM (#210718) Journal

      Malware may consist of threaded code [wikipedia.org] which may utilize an arbitrary tail end of any subroutine. Address space randomization is supposed to make this more difficult but there exist workarounds to indirect via arbitrary offsets. Anyhow, the result is that almost any piece of executable code can be used against you. I agree that unused code should be removed to reduce attack surface. However, the remaining risk is quite large.

      --
      1702845791×2
  • (Score: 2) by PizzaRollPlinkett on Tuesday July 07 2015, @10:58AM

    by PizzaRollPlinkett (4512) on Tuesday July 07 2015, @10:58AM (#206068)

    How does this software know other applications are more secure? If we had software to determine the security of an application, we could just apply it to the current application to tell us what parts we need to make more secure, and skip the other steps.

    --
    (E-mail me if you want a pizza roll!)
  • (Score: 2) by lizardloop on Tuesday July 07 2015, @01:06PM

    by lizardloop (4716) on Tuesday July 07 2015, @01:06PM (#206095) Journal

    This reminds me of a module a university lecturer taught me. One of his pet concepts was "the librarian" that all software companies should have. It would be the librarian's job to ensure good code was preserved in an archive somewhere and that any potential duplication of that code was avoided; i.e., if you were working on a project, the librarian should be constantly suggesting existing code in the archive that might be applicable to what you're doing, so that you don't "reinvent" things accidentally. I've thought for a long time that this could be automated, and I've made my own very meagre experiments in this regard. This seems like another step in that direction, and I look forward to further developments.

  • (Score: 2) by Geezer on Tuesday July 07 2015, @03:32PM

    by Geezer (511) on Tuesday July 07 2015, @03:32PM (#206141)

    I can see it now... trustednsaexploit.exe replacing security code in *.* with "good" code from youarepwned.foo