
posted by takyon on Wednesday January 09 2019, @02:52PM   Printer-friendly
from the starving-programmers dept.

Bruce Schneier thinks the problem of finding software vulnerabilities seems well-suited for machine-learning (ML) systems:

Going through code line by line is just the sort of tedious problem that computers excel at, if we can only teach them what a vulnerability looks like. There are challenges with that, of course, but there is already a healthy amount of academic literature on the topic -- and research is continuing. There's every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.

Finding vulnerabilities can benefit both attackers and defenders, but it's not a fair fight. When an attacker's ML system finds a vulnerability in software, the attacker can use it to compromise systems. When a defender's ML system finds the same vulnerability, he or she can try to patch the system or program network defenses to watch for and block code that tries to exploit it.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.
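
To make "teaching a system what a vulnerability looks like" concrete, here is a minimal sketch in the supervised-learning style the article alludes to. It assumes scikit-learn is available; the snippets, labels, and character n-gram features are invented purely for illustration, and a real system would need far richer features and vastly more labeled data.

# Toy sketch: learn "what a vulnerability looks like" from labeled snippets.
# The data below is made up and far too small to generalize.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

snippets = [
    'strcpy(buf, user_input);',                          # unbounded copy
    'gets(line);',                                       # unbounded read
    'sprintf(msg, "%s", user_input);',                   # no length limit
    'strncpy(buf, user_input, sizeof(buf) - 1);',
    'fgets(line, sizeof(line), stdin);',
    'snprintf(msg, sizeof(msg), "%s", user_input);',
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = vulnerable-looking, 0 = safe-looking

# Character n-grams stand in for whatever features a real system would learn.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(3, 5)),
    LogisticRegression(),
)
model.fit(snippets, labels)

for candidate in ['strcpy(dest, argv[1]);',
                  'snprintf(dest, sizeof(dest), "%s", argv[1]);']:
    print(candidate, "->", model.predict([candidate])[0])

The point of the toy is only that "looks like a vulnerability" has to be pinned down as features and labels before any ML system can learn it, and that is exactly where the hard part lies.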


Original Submission

 
  • (Score: 3, Interesting) by choose another one on Wednesday January 09 2019, @04:31PM (2 children)


    Makes sense - ML is getting very good at learning games given only the rules and the "win" notification, see e.g. AlphaGo Zero.

    if we can only teach them what a vulnerability looks like

    This is where I slightly disagree (as I do with his space shuttle example*): teaching is not necessary. If (when) the attack gets in, what it did tells you what a vulnerability looks like. ML will give us a more "intelligent", direct and targeted version of fuzzing - likely much faster, or at least more comprehensive for the same test time.
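
    As a minimal sketch of that "rules plus a win notification" idea, here is a crash-oracle fuzz loop. The target function and the seed are invented purely so there is something to crash; a real setup would run the actual program under test.

    import random

    def target(data: bytes) -> None:
        # Invented stand-in for the program under test.
        if len(data) >= 4 and data[:4] == b"FUZZ":
            raise RuntimeError("boom")  # the hidden "vulnerability"

    def mutate(seed: bytes) -> bytes:
        # Flip a bit, insert a byte, or delete a byte at random.
        data = bytearray(seed)
        op = random.choice(("flip", "insert", "delete"))
        if op == "flip" and data:
            i = random.randrange(len(data))
            data[i] ^= 1 << random.randrange(8)
        elif op == "insert":
            data.insert(random.randrange(len(data) + 1), random.randrange(256))
        elif op == "delete" and data:
            del data[random.randrange(len(data))]
        return bytes(data)

    # The only feedback is the "win" signal: did the target fall over?
    seed = b"FUZA"  # deliberately close to the crashing input, to keep the demo short
    for _ in range(200_000):
        candidate = mutate(seed)
        try:
            target(candidate)
        except Exception:
            print("crash found with input:", candidate)
            break

    Nobody teaches the loop what the bug is; the crashing input it prints is itself the description of what this particular vulnerability looks like.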

    But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

    There is another bit missing here: if the war is between the attackers and the developers, the attacker has massive financial rewards, while the developer who finds bugs (which then need fixing, costing time and missed deadlines) gets essentially no reward. That disparity is going to extend to purchasing power for ML tools - until they become cheap as chips, it will be mostly attackers who can justify investing in them, and many, many devs will not have access. It is still going to be a very one-sided war for some time.

    [ *The shuttle wasn't an exception to prioritising fast and cheap over good; it simply failed to prioritise _any_ of those, arguably because it actually prioritised slow and expensive (maximum consumption of govt. money) over everything else. I think you could cover the entire development cost of the Falcon program, plus a flight, for the cost of one "reusable" shuttle refurbishment. ]

  • (Score: 5, Insightful) by hopdevil on Wednesday January 09 2019, @05:25PM (1 child)


    People tend to think ML will solve their problem, especially if they don't understand how ML works. Or even what their problem is. Sorry, Schneier, but this is out of your league.

    I know of several bug-finding "systems", some of which use algorithms... and they are all terrible. The false positive rate is through the roof, wasting developer time and forcing annoying coding practices just to avoid the bug reports. And they don't find the actual bugs.

    if we can only teach them what a vulnerability looks like

    If we can only teach humans what a vulnerability looks like. That is the first issue with applying ML to this (or any) problem: if you can't define the outcome in a statistically significant way, you will get garbage out. ML isn't magic; think of it as a messy statistics framework.
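
    A tiny illustration of that garbage-in, garbage-out point, on synthetic data that has nothing to do with real code: if the labels carry no definable signal, no classifier will recover one.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))    # stand-in "features extracted from code"
    y = rng.integers(0, 2, size=1000)  # labels with no real definition behind them

    scores = cross_val_score(LogisticRegression(), X, y, cv=5)
    print(round(scores.mean(), 3))     # hovers around 0.5, i.e. coin flipping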

    The second issue is that the "win condition" would need to be an actual win condition, but in pure static code analysis you cannot really get this -- there is no feedback loop. To have an actual true positive for a vulnerability, you need to have actually crossed some protected boundary. What is that boundary? Usually this is very unclear or subtle at best, even to the developer that wrote the code.

    As an example, think about doing what Schneier is suggesting. You would need to compile each block of code independently (and together) and test each variable. That sounds like fuzzing, which technically works, except where it doesn't. Maybe the ML can help the fuzzer by tracing which code gets executed and which values the input needs to contain to reach more lines of code, but then you still aren't using ML to find vulnerabilities, just throwing shit until something sticks. You will miss a lot, and ML algorithms are computationally very expensive.
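
    Here is a rough sketch of that "trace which code gets executed and reach more lines" idea, i.e. coverage-guided mutation. The target function is invented; real tools such as AFL do this with compiled-in instrumentation rather than a Python line tracer.

    import random
    import sys

    def target(data: bytes) -> None:
        # Invented stand-in for the code under test: a bug hidden behind nested checks.
        if len(data) > 0 and data[0] == ord("A"):
            if len(data) > 1 and data[1] == ord("B"):
                if len(data) > 2 and data[2] == ord("C"):
                    raise RuntimeError("deep bug reached")

    def run_with_coverage(data: bytes) -> frozenset:
        # Execute target() and record which of its lines actually ran.
        lines = set()
        def tracer(frame, event, arg):
            if event == "line" and frame.f_code is target.__code__:
                lines.add(frame.f_lineno)
            return tracer
        sys.settrace(tracer)
        try:
            target(data)
        finally:
            sys.settrace(None)
        return frozenset(lines)

    corpus, seen = [b"\x00"], set()
    for _ in range(100_000):
        data = bytearray(random.choice(corpus))
        data[random.randrange(len(data))] = random.randrange(256)  # mutate one byte
        data.append(random.randrange(256))                         # and grow a little
        try:
            covered = run_with_coverage(bytes(data))
        except RuntimeError:
            print("bug reached with input:", bytes(data))
            break
        if not covered.issubset(seen):  # new lines executed: this input earned its keep
            seen |= covered
            corpus.append(bytes(data))

    Even the toy shows both halves of the point: coverage feedback steers the search far better than blind guessing, yet it is still throwing inputs at the code rather than understanding the vulnerability, and every execution costs time.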

    Microsoft has probably made the best progress so far, but their solution requires a crazy amount of tweaking and isn't doing just static code analysis. They have published some papers, but for the most part the technology is just not there yet.

    Also, these guys came pretty close: https://www.darpa.mil/program/cyber-grand-challenge [darpa.mil]

    Note: I am a security researcher. I have worked with many people and systems attempting to wield this sorcery.
    • (Score: 0) by Anonymous Coward on Wednesday January 09 2019, @08:34PM


      We did pretty darn well in the Cyber Grand Challenge. Our code would find bugs, then exploit them for attack while patching them for defense. It really works.

      That said, there is so much variety in the real world that humans aren't going away any time soon. We still hire lots of people to manually go over disassembled binary executables and crash dumps. Email me at users.sf.net, account name albert, if you are a US citizen and want to do that. There is no shortage of need for people with low-level skills who can make sense of binary blobs and register state.