SoylentNews Comments | Automatic Code Repair

Automatic Code Repair

posted by n1 on Monday July 06 2015, @09:32PM

from the human-obsolescence dept.

"exec" writes:

MIT computer scientists have devised a new system that repairs dangerous software bugs by automatically importing functionality from other, more secure applications.

Remarkably, the system, dubbed CodePhage, doesn’t require access to the source code of the applications whose functionality it’s borrowing. Instead, it analyzes the applications’ execution and characterizes the types of security checks they perform. As a consequence, it can import checks from applications written in programming languages other than the one in which the program it’s repairing was written.
Once it’s imported code into a vulnerable application, CodePhage can provide a further layer of analysis that guarantees that the bug has been repaired.
[...] Sidiroglou-Douskos and his coauthors — MIT professor of computer science and engineering Martin Rinard, graduate student Fan Long, and Eric Lahtinen, a researcher in Rinard’s group — refer to the program CodePhage is repairing as the “recipient” and the program whose functionality it’s borrowing as the “donor.” To begin its analysis, CodePhage requires two sample inputs: one that causes the recipient to crash and one that doesn’t. A bug-locating program that the same group reported in March, dubbed DIODE, generates crash-inducing inputs automatically. But a user may simply have found that trying to open a particular file caused a crash.
[...] “The longer-term vision is that you never have to write a piece of code that somebody else has written before,” Rinard says. “The system finds that piece of code and automatically puts it together with whatever pieces of code you need to make your program work.”
“The technique of borrowing code from another program that has similar functionality, and being able to take a program that essentially is broken and fix it in that manner, is a pretty cool result,” says Emery Berger, a professor of computer science at the University of Massachusetts at Amherst. “To be honest, I was surprised that it worked at all.”
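The repair loop the summary describes (one crashing input, one benign input, a check transferred from a donor) can be sketched roughly as follows. This is a toy illustration and not CodePhage's actual implementation: every name in it (`recipient_parse`, `donor_check`, `patched_parse`, `validate`) is hypothetical, and the "donor check" is written by hand here instead of being extracted from a running binary.

```python
# Toy illustration only; all names and the 16-bit size limit are assumptions.

def recipient_parse(width: int, height: int) -> bytearray:
    """Recipient: sizes its buffer with a wrapped 16-bit product, no checks."""
    size = (width * height) & 0xFFFF      # simulated integer overflow
    buf = bytearray(size)
    buf[width * height - 1] = 0xFF        # IndexError once the size has wrapped
    return buf


def donor_check(width: int, height: int) -> bool:
    """Stand-in for the constraint a more careful donor application enforces."""
    return width > 0 and height > 0 and width * height <= 0xFFFF


def patched_parse(width: int, height: int) -> bytearray:
    """Recipient with the donor's check inserted in front of the risky code."""
    if not donor_check(width, height):
        raise ValueError("rejected by transferred check")
    return recipient_parse(width, height)


def validate(crash_input: tuple, benign_input: tuple) -> bool:
    """Re-run both sample inputs against the patched recipient."""
    try:
        patched_parse(*crash_input)
        return False                      # crashing input was not rejected
    except ValueError:
        pass                              # rejected cleanly: what we want
    except IndexError:
        return False                      # still crashes
    # The benign input must behave exactly as it did before the patch.
    return patched_parse(*benign_input) == recipient_parse(*benign_input)
```

Calling `validate((300, 300), (16, 16))` re-runs the one known crashing input and the one known benign input. Note that passing this check only says something about those two inputs, which is the point several commenters below argue about.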


Left with an unmaintainable mess (Score: 4, Insightful)

by frojack (1554) on Monday July 06 2015, @10:17PM (#205862) Journal

So you've patched your executable with no clear idea of what exactly was done, and no way to maintain the resulting executable.
The next patch you have to install requires that you do this auto-patch all over again.

And when they say it brings in code from some random other application written in any other language, I have to assume they mean it's looking for machine code (assembly) patterns, or maybe library/API calls, which means you may now need additional libraries just to run the code.
Virtually all compiled programs are reduced to a massive chain of API/library calls when you get down to the machine code level.

I can't think of a dumber way to maintain code than a system that leaves you with NO maintainable code (unless your mother tongue is x86 assembly).

--
No, you are mistaken. I've always had this sig.


Re:Left with an unmaintainable mess (Score: 5, Informative)

by RobotMonster (130) on Monday July 06 2015, @10:22PM (#205864) Journal

Yeah - I agree 100%. Alan Turing had something to say about this... He called it The Halting Problem [wikipedia.org].

Re:Left with an unmaintainable mess (Score: 1, Interesting)

by Anonymous Coward on Monday July 06 2015, @10:29PM (#205869)

Not how it works:
"At each of the locations it identifies, CodePhage can dispense with most of the constraints described by the symbolic expression — the constraints that the recipient, too, imposes. Starting with the first location, it translates the few constraints that remain into the language of the recipient and inserts them into the source code. Then it runs the recipient again, using the crash-inducing input."

- Re:Left with an unmaintainable mess (Score: 0)
  
  by Anonymous Coward on Monday July 06 2015, @10:33PM (#205873)
  
  So, what, it just identifies a potential crash point and notifies the programmer that the program may freeze or crash at a particular point? Don't compilers already try to do this in many instances?
  
- Re:Left with an unmaintainable mess (Score: 3, Informative)
  
  by VortexCortex (4067) on Monday July 06 2015, @11:57PM (#205906)
  
  In other words, it's like a mechanic that just swaps parts and doesn't fix the root of the problem because it doesn't know what the problem is. You can even ask to have some engine noise fixed, but since it wasn't major damage from a crash it wouldn't even know something was wrong. "It runs. Everything seems fine to me." What's that REEEEEE! sound then?
  For instance, today I discovered a buffer overflow that I can bootstrap into a remote code execution exploit for the video container parser used by Firefox. I can "fix" the problem by swapping in a different encoder, but it doesn't really fix the core issue. I can also "fix" the problem by back-tracing from the crash and adding a check to the specific video decoder to prevent the buffer overflow. "CodePhage can provide a further layer of analysis that guarantees that the bug has been repaired." CodePhage can't guarantee shit, and any security researcher using such absolute terms should be dismissed as insane. Either of these "fixes" will prevent the crash caused by that one video input, but my example exploit-bearing video actually represents a whole class of exploits in the video decoding system. That means I can pull off the same exploit with a number of video payloads, and this "Automatic Code Repair" would likely not repair the correct piece of code if used in this real-world instance.
  Worse than not fixing the bug, it'll give a false sense of security with its "layer of analysis" (a fancy way to say unit test) that only shows the one input video doesn't crash the browser anymore, while the browser remains vulnerable. Since automatic bug-fixing software doesn't understand what the programmer intended, it can, in fact, introduce other bugs. What if the function that gets swapped in assumed other parameter bounds had been checked by a prior function in the donor code base, thus opening up the recipient code to a slew of potential exploits? If a programmer looked at the source, they'd be able to grasp the issue more completely, since they can see the intent of the code as well as what's really causing the bug, and be able to create a more complete fix. Until AI can read and understand source code and comments as well as a human, "automatic code repair" won't be worth a damn.
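The "one input stops crashing, the class of bugs remains" argument above can be made concrete with a small sketch. This is a made-up toy decoder, not the Firefox code the comment refers to: `decode`, `crashes`, the chunk types "A" and "B", and `FRAME_SIZE` are all hypothetical names chosen for illustration.

```python
# Toy decoder: two chunk types share the same missing bounds check.
# Everything here is hypothetical and only illustrates the argument.

FRAME_SIZE = 8


def decode(chunk_type: str, length: int, patched: bool = False) -> bytearray:
    frame = bytearray(FRAME_SIZE)
    if chunk_type == "A":
        # The narrow fix: a check added right where the known crash occurred.
        if patched and length > FRAME_SIZE:
            raise ValueError("chunk too long")
        for i in range(length):
            frame[i] = 1
    elif chunk_type == "B":
        # Same flaw, but nobody reported a crashing input for this path.
        for i in range(length):
            frame[i] = 2
    return frame


def crashes(chunk_type: str, length: int, patched: bool) -> bool:
    """True if decoding dies with an out-of-bounds write."""
    try:
        decode(chunk_type, length, patched)
        return False
    except IndexError:
        return True
    except ValueError:
        return False   # cleanly rejected, which counts as "fixed"
```

Here the reported input `("A", 9)` crashes before the patch and is rejected after it, so a test built from that single input passes, while `("B", 9)` still crashes the patched decoder. A fix at the root would validate `length` once, before dispatching on the chunk type.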
  Input fuzzing already gives us a way to find crashes and point programmers at the issue. It's not like programmers are having a hard time fixing bugs once they have an input that reliably causes a crash, so automatic code repair is a crappy solution to a problem that no one has. Furthermore, not all bugs result in crashes -- most of them don't; however, many of the bugs that don't cause crashes can be leveraged to create exploits. In this case, not even input fuzzing will reveal the exploit vector, and thus CodePhage's guarantee isn't worth the bits it was emitted with. In fact, it's a prerequisite for an exploit NOT to crash the program in order to be successfully deployed. For instance, the root of a bug might be that decoding one buffer can modify another data buffer unintentionally (not just a stack smash), and that other buffer might then contain data that was already fully validated and range checked, but which is now corrupted and invalid. Cascades like this are common among exploit vectors, thus incorrect fixes allow malware authors to reuse the same exploit with slight modification -- in fact, I wouldn't be surprised if some malware authors point out a quick fix that only appears to patch up the problem, preventing one version of the malware from running while allowing them to continue using the actual vector via a new version of the malware the day after Patch Tuesday... Go ahead and add a bunch of checks to ensure the modified secondary buffer doesn't cause crashes, but that won't fix the primary buffer overrun, and subsequent refactoring (or just recompiling) can reintroduce the exploit vector.
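A minimal sketch of the "corrupts an already-validated buffer without crashing" scenario, assuming a made-up memory layout: `HEADER`, `PIXELS`, `decode_pixels`, `header_valid`, and `fuzz` are hypothetical names, and the fuzzer is a bare random loop, not any real fuzzing tool.

```python
import random

# Hypothetical layout: one backing store holding two adjacent 4-byte buffers.
HEADER = slice(0, 4)   # assumed already validated: every byte must be < 10
PIXELS = slice(4, 8)


def decode_pixels(mem: bytearray, offset: int, data: bytes) -> None:
    """Writes data into the pixel buffer. Bug: only the upper bound is checked."""
    start = PIXELS.start + offset
    if start + len(data) > PIXELS.stop:
        raise ValueError("write past end of pixel buffer")
    for i, b in enumerate(data):
        mem[start + i] = b   # a negative offset lands inside HEADER


def header_valid(mem: bytearray) -> bool:
    return all(b < 10 for b in mem[HEADER])


def fuzz(trials: int, seed: int = 0) -> tuple:
    """Returns (crashes, silent header corruptions) over random inputs."""
    rng = random.Random(seed)
    crashes = corruptions = 0
    for _ in range(trials):
        mem = bytearray([1, 2, 3, 4, 0, 0, 0, 0])
        offset = rng.randint(-4, 4)
        data = bytes(rng.randint(10, 255) for _ in range(rng.randint(1, 4)))
        try:
            decode_pixels(mem, offset, data)
        except ValueError:
            continue                 # cleanly rejected input, not a crash
        except Exception:
            crashes += 1
            continue
        if not header_valid(mem):
            corruptions += 1         # no crash, but validated data is now bad
    return crashes, corruptions
```

In this toy, no input in the fuzzed range ever raises an unexpected exception, so a harness that only counts crashes sees nothing wrong. The corruption only shows up because the loop also re-checks the header invariant after each call, which is an extra oracle a crash-driven tool would not have.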
  Instead of a crash, what could have been a red flag might have been some pink and yellow artefacts creeping across a few frames of video. That's all I needed to dig into the code and find a vector to exploit: a failed (visual) unit test. When looking in a debugger at the code that was running right before the crash I later induced, I realized that spot wasn't where the bounds check was needed -- all of that code was operating perfectly as intended; I needed to look elsewhere if I wanted to create a fix... The crash payload did not indicate the source of the vector. All one needs is the effect of the bug in order to craft an exploit; a correct bug fix can be much more difficult to track down. This is why, IMO, white hat hackers who submit fully correct patches with their proof-of-concept exploits generally have more skill than black hat hackers who only demonstrate exploits.
  
  - Re:Left with an unmaintainable mess (Score: 2)
    
    by TheLink (332) on Tuesday July 07 2015, @03:28AM (#205971) Journal
    
    And what if code A has exploit P, and code B has exploit Q, and code C has exploit R?
    
    Does the system keep swapping the code around as each exploit is encountered?
    
    And where does the system get the code in the first place? What happens if your program using that system copies and uses GPLv3 code?
    
Re:Left with an unmaintainable mess (Score: 2)

by soylentsandor (309) on Friday July 10 2015, @09:11PM (#207647)

From TFA (yes, I know... I'm sorry):
"(...) it translates the few constraints that remain into the language of the recipient and inserts them into the source code."
So it would seem it ought to be verifiable after all.
