Machine Learning to Detect Software Vulnerabilities

So do I. (Score: 0) by Anonymous Coward on Wednesday January 09 2019, @04:02PM

by Anonymous Coward on Wednesday January 09 2019, @04:02PM (#784152)

I've been quietly suggesting this to a few friends for the past 3-4 years who were into ML.

It's the next big leap forward in cost reducing bug fixing and reverse engineering.

Makes sense... Makes sense... (Score: 3, Interesting) by choose another one on Wednesday January 09 2019, @04:31PM (2 children)

on Wednesday January 09 2019, @04:31PM (#784157)

Makes sense - ML is getting very good at learning games given only the rules and the "win" notification, see e.g. AlphaGo Zero.

if we can only teach them what a vulnerability looks like

This is where I slightly disagree (as well as with his space shuttle example*), teaching is not necessary - if (when) it gets in, what it did tells you what a vulnerability looks like. ML will give us a more "intelligent" direct and targeted, and likely much faster or at least more comprehensive for the same test time, version of fuzzing.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

There is another bit missing here - if the war is between the attackers and the developers, the attacker has massive financial rewards, the developer that finds bugs that then require fixing costing time and missed deadlines gets essentially no reward. This disparity is going to extend to the purchasing power for ML tools - until they become cheap-as-chips it will be mostly attackers who can justify investing in them, and many many devs will not have access. Still going to be a very one-sided war for some time.

[ *the shuttle wasn't an exception to prioritising fast and cheap over good, it simply failed to prioritize _any_ of those, arguably because it actually prioritized slow and expensive (maximum consumption of govt. money) over everything else. I think you could actually cover the entire development costs of the falcon program, and a flight, for the cost of one "reusable" shuttle refurbishment. ]

Re:Makes sense... Re:Makes sense... (Score: 5, Insightful) by hopdevil on Wednesday January 09 2019, @05:25PM (1 child)

by hopdevil (3356) on Wednesday January 09 2019, @05:25PM (#784189)

People tend to think ML will solve their problem, especially if they don't understand how ML works. Or even what their problem is. Sorry Schneier, but this is out of your league.
I know of several bug finding "systems", some of which use algorithms... and they are all terrible. The false positive rate is through the roof, wasting more developer time and forcing annoying coding practices to avoid the bug report. And they don't find the actual bugs.
if we can only teach them what a vulnerability looks like
If we can only teach humans what a vulnerability looks like. The first issue with applying ML in solving this (or any) problem; if you can't define the outcome in a statistically significant way, you will get garbage out. ML isn't magic, think of it as a messy statistics framework.
The second issue is that the "win condition" would need to be an actual win condition, but in pure static code analysis you cannot really get this -- there is no feedback loop. To have an actual true positive for a vulnerability, you need to have actually crossed some protected boundary. What is that boundary? Usually this is very unclear or subtle at best, even to the developer that wrote the code.
As an example, think doing what Schneier is suggesting. You would need to compile each block of code independently (and together) and test each variable. Sounds like fuzzing, which technically works, except where it doesn't. Maybe then the ML can help the fuzzer by tracing which code is getting executed and which values need to be in the input to reach more lines of code, but still you aren't using ML for vulnerabilities but throwing shit until something sticks. You will miss a lot, and ML algorithms are very expensive computationally.
Microsoft has probably the best progress thus far, but their solution requires a crazy amount of tweaking and isn't doing just static code analysis. They have some papers published but for the most part the technology is just not there yet.
Also, these guys came pretty close: https://www.darpa.mil/program/cyber-grand-challenge [darpa.mil]
*SPOILER* (click to show) *SPOILER* (click to hide)
Note: I am a security researcher. I have worked with many people and systems attempting to wield this sorcery.

Parent
- More than "close", but not replacing humans yet. (Score: 0) by Anonymous Coward on Wednesday January 09 2019, @08:34PM
  
  by Anonymous Coward on Wednesday January 09 2019, @08:34PM (#784270)
  
  We did pretty darn well in the Cyber Grand Challenge. Our code would fix bugs, then exploit them for attacking while patching them for defense. It really works.
  That said, there is so much variety in the real world. Humans aren't going away any time soon. We still hire lots of people to manually go over disassembled binary executables and crash dumps. Email me at users.sf.net, account name albert, if you are a US citizen and want to do that. There is no shortage of need for the people with low-level skills who can make sense of binary blobs and register state.
  
  Parent

Not feeling all of this Not feeling all of this (Score: 3, Insightful) by fyngyrz on Wednesday January 09 2019, @04:52PM (7 children)

by fyngyrz (6567) on Wednesday January 09 2019, @04:52PM (#784168) Journal

the ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

I don't want something to go in and "fix" things; I'm very happy to have something point them out, but I want to do (or at least confirm) the fixes myself so I know exactly how they integrate (or don't) with what I was trying to accomplish. Not to mention learning to anticipate them and not cause them in the first place. Having such a tool is obviously valuable. Depending on it seems like a recipe for disaster to me.

--
We should start referring to "age" as "levels."
So when you're LVL 80, you're awesome.

How about . . . (Score: 2) by DannyB on Wednesday January 09 2019, @09:25PM

by DannyB (5839)

on Wednesday January 09 2019, @09:25PM (#784280) Journal

Machine Learning that identifies patterns unique to tech-illiterate, gullible, naive, and highly exploitable USERS.

Wouldn't those targets be just as valuable, perhaps more so, than software vulnerabilities.

You can protect against an implementation error or a design flaw. But can you really protect from an idiot? (yes. Yes. I said Yes. Ooops, I didn't mean to delete that! It must be the vendor's fault! Blame Canada! Etc)

By looking at enough social media data it might be possible to spot (1) suckers who can be conned out of money sent to nigeria, and (2) walking security exploits that will send their password to "the IT guy" who called them to help fix a problem they didn't know existed.

--
To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.

Code is but maths, we can PROVE it works (Score: 0) by Anonymous Coward on Thursday January 10 2019, @02:41AM

by Anonymous Coward on Thursday January 10 2019, @02:41AM (#784427)

In the end the practical solution to bugs is approaching from the other end! Write perfect code, simple as that. The problem with the traditional poke'n'hope approch is you can prove a bug exists but you can never prove by testing that bugs don't exist...

Yes doing it the formal math way will be expensive but it will be GOOD. No more software fuckups, ever. Those tend to be expensive as well and occur at inconvenient times... Having said that the hardware fuckups will be there with us for all eternity... :)

If you want to throw in machine learning, teach the box to do formal proofs. The how part is left as an exercise for the reader.

SoylentNews

SoylentNews is people

Navigation

Sections

SoylentNews

Log In