
SoylentNews is people

posted by Fnord666 on Friday August 23 2019, @10:44AM   Printer-friendly
from the static-code-analysis dept.

Submitted via IRC for SoyCow2718

Facebook doesn't have the most stellar privacy and security track record, especially given that many of its notable gaffes were avoidable. But with billions of users and a gargantuan platform to defend, it's not easy to catch every flaw in the company's 100 million lines of code. So four years ago, Facebook engineers began building a customized assessment tool that not only checks for known types of bugs but can fully scan the entire codebase in under 30 minutes—helping engineers catch issues in tweaks, changes, or major new features before they go live.

The platform, dubbed Zoncolan, is a "static analysis" tool that maps the behavior and functions of the codebase and looks for potential problems in individual branches, as well as in the interactions of various paths through the program. Having people manually review endless code changes all the time is impractical at such a large scale. But static analysis scales extremely well, because it sets "rules" about undesirable architecture or code behavior, and automatically scans the system for these classes of bugs. See it once, catch it forever. Ideally, the system not only flags potential problems but gives engineers real-time feedback and helps them learn to avoid pitfalls.

"Every time an engineer makes a proposed change to our codebase, Zoncolan will start running in the background, and it will either report to that engineer directly or it will flag to one of our security engineers who's on call," says Pieter Hooimeijer, a security engineering manager at Facebook. "So it runs thousands of times a day, and found on the order of 1,500 issues in calendar year 2018."

Source: https://www.wired.com/story/facebook-zoncolan-static-analysis-tool/?verso=true



  • (Score: 3, Interesting) by jmichaelhudsondotnet on Friday August 23 2019, @11:19AM (5 children)

    by jmichaelhudsondotnet (8122) on Friday August 23 2019, @11:19AM (#884035) Journal

    What if I told you Facebook was just one big bug?

    #arrestzuckerberg

    • (Score: 2) by inertnet on Friday August 23 2019, @12:57PM (3 children)

      by inertnet (4071) Subscriber Badge on Friday August 23 2019, @12:57PM (#884075) Journal

      I wonder if even a single line of his original code is still among those millions.

      • (Score: 0) by Anonymous Coward on Friday August 23 2019, @01:09PM

        by Anonymous Coward on Friday August 23 2019, @01:09PM (#884078)

        what's original code?
        first commit, version 0.0.1 ...?

      • (Score: 3, Insightful) by jmichaelhudsondotnet on Friday August 23 2019, @04:31PM

        by jmichaelhudsondotnet (8122) on Friday August 23 2019, @04:31PM (#884216) Journal

        The most important thing, as far as the people up top are concerned, is that it is impossible to audit, so the carefully placed backdoors will survive for as long as possible. They hope for generations, I bet.

        Anyone who is still using facegag, after what we know about what the company really is (a cult), is delusionally trying to cling to a bygone era.

      • (Score: 0) by Anonymous Coward on Saturday August 24 2019, @09:32AM

        by Anonymous Coward on Saturday August 24 2019, @09:32AM (#884643)

        do you mean what he stole from the twins?
        yeah right

    • (Score: 2) by DannyB on Friday August 23 2019, @01:50PM

      by DannyB (5839) Subscriber Badge on Friday August 23 2019, @01:50PM (#884098) Journal

      What if I told you that Facebook eats each bug that it catches when no Elmer's glue is handy?

      Incidentally, Facebook also tries to catch bugs in their code.

      --
      The anti vax hysteria didn't stop, it just died down.
  • (Score: 3, Funny) by Bot on Friday August 23 2019, @02:03PM

    by Bot (3902) on Friday August 23 2019, @02:03PM (#884102) Journal

    I can see Mt. Zoncolan from here BTW
    https://goo.gl/maps/dVQ9naoyX4igJp3D9 [goo.gl]

    --
    Account abandoned.
  • (Score: 4, Interesting) by fadrian on Friday August 23 2019, @02:27PM (4 children)

    by fadrian (3194) on Friday August 23 2019, @02:27PM (#884112) Homepage

    Static analysis tools are useful. That being said, they often cast too broad a net and report too many false positives. In addition, they take a lot of fiddling to tune and/or to get to shut up when the aforementioned false positives occur. You'll spend a lot more time with these things than you expected to when you use them. But, all in all, more useful than not.

    --
    That is all.
    • (Score: 2) by DannyB on Friday August 23 2019, @05:08PM (3 children)

      by DannyB (5839) Subscriber Badge on Friday August 23 2019, @05:08PM (#884239) Journal

      One of the best static analysis tools is the compiler for a language that has strong type discipline and other safety features. The language design becomes one of the major safety features, enforced by the compiler.

      --
      The anti vax hysteria didn't stop, it just died down.
      • (Score: 2) by FatPhil on Friday August 23 2019, @05:25PM (1 child)

        by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Friday August 23 2019, @05:25PM (#884248) Homepage
        Heresy! That forces the programmer to explicitly say what he wants. That might do terrible things like permit reviewers to see if what the code does matches what it's supposed to do.

        >quack<!
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
        • (Score: 2) by DannyB on Friday August 23 2019, @05:33PM

          by DannyB (5839) Subscriber Badge on Friday August 23 2019, @05:33PM (#884253) Journal

          You're right! Better to depend on unit testing which demonstrates that the code does the right thing, at least under certain conditions, rather than that it does the right thing in principle.

          Not that I'm against unit testing. It is just best kept for testing higher-level things that compilers cannot (yet) check, because we cannot (yet) express those ideas in a language.

          --
          The anti vax hysteria didn't stop, it just died down.
      • (Score: 2) by darkfeline on Saturday August 24 2019, @02:07AM

        by darkfeline (1030) on Saturday August 24 2019, @02:07AM (#884502) Homepage

        I believe Facebook uses PHP, so...

        --
        Join the SDF Public Access UNIX System today!
  • (Score: 4, Interesting) by DannyB on Friday August 23 2019, @02:29PM (8 children)

    by DannyB (5839) Subscriber Badge on Friday August 23 2019, @02:29PM (#884113) Journal

    The platform, dubbed Zoncolan, is a "static analysis" tool that maps the behavior and functions of the codebase and looks for potential problems

    <no-sarcasm>
    So Facebook uses a dynamic language without the "bondage and discipline" static analysis of safer languages like Pascal or Java, C#, and others. (I first heard that B&D term applied to Pascal in the 80's.)

    Of all the dynamic languages for web development, they pick PHP. The only good thing about PHP is that, thank God, at least it is not Perl.

    Because: Reasons. All of the static vs. dynamic arguments have been said before and won't be repeated now. But one that I'll mention (cue whiny voice...): dynamic code should be tested with enough unit tests to be bug free.

    Who could have guessed that dynamic languages, while fantastical for quick and dirty projects, interactive development, etc., are perhaps unfit for porpoise when it comes to large and long-lived codebases? Shocker.

    So what do they do? (See the quoted portion above.) They now try to retrofit static analysis and the associated disciplines that should have been part of the compiler in the first place.

    And a thing about Unit Tests: The static analysis of the compiler should be YOUR FIRST LEVEL of unit tests. The B&D compiler's "annoying" type testing is actually what you might be sadly attempting to test with some unit tests. Or later with a retrofit static analysis step, which is the LMAO irony here.

    I'll use a mishmash of language syntaxes here to make a point in a simple way.

    TYPE
    Color = (Red, Green, Blue)
    Weekday = (Mon, Tue, Wed, Thur, Fri, Sat, Fun, Sun)
    Colors = SET OF Color
    Weekdaze = SET OF Weekday

    Width = Integer
    Height = Integer
    Qty = Integer
    XCoord = Integer
    YCoord = Integer

    VAR
    colors : Colors
    workdaze : Weekdaze
    weakendaze : Weekdaze

    Code:

    colors = {Red}
    workdaze = {Mon, Tue, Wed, Thur, Fri}
    weakendaze = {Sat, Fun, Sun}

    if( Color.Wed IN workdaze ) { . . . }

    Now the three variables colors, workdaze and weakendaze are simple integers that represent a set of bits indicating which colors or days are in that set. So that IF statement test is a single machine instruction doing a bit test on the integer variable. So all of this high levelness isn't exactly inefficient.
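The set-as-bitmask idea above can be sketched with Python's standard `enum.Flag`. This is only an illustration under stated assumptions: the names `Weekday`, `workdays`, `weekend` and `is_workday` are made up here, the joke day from the pseudocode is left out, and Python does the membership check at run time rather than in a compiler.

```python
from enum import Flag, auto


class Weekday(Flag):
    # Each member gets its own bit: MON = 1, TUE = 2, WED = 4, ...
    MON = auto()
    TUE = auto()
    WED = auto()
    THU = auto()
    FRI = auto()
    SAT = auto()
    SUN = auto()


# A "set of Weekday" is the bitwise OR of its members, stored as one integer.
workdays = Weekday.MON | Weekday.TUE | Weekday.WED | Weekday.THU | Weekday.FRI
weekend = Weekday.SAT | Weekday.SUN


def is_workday(day: Weekday) -> bool:
    # Membership is a bit test on the underlying integer value.
    return day in workdays
```

Calling `is_workday(Weekday.WED)` tests one bit of `workdays.value`, which is the same representation the comment describes for Pascal-style sets.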

    And the types Color and Weekday, while being enum types, are represented by simple integer constants but, unlike in C, are NOT integers, nor are they at all compatible with integers. A statement like:

    Color x = Color.Red

    simply assigns zero to x. But x is not an integer, nor is it in any way compatible with an integer value, unless you type cast it, which you should not do.
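A rough Python analogue of that enum behaviour, as a sketch rather than the language the comment has in mind: a plain `enum.Enum` member carries an integer value but is not itself an integer. The class `Color` mirrors the pseudocode; the helper `try_add_one` is a hypothetical name added for the demonstration, and the rejection happens at run time here instead of at compile time.

```python
from enum import Enum


class Color(Enum):
    RED = 0
    GREEN = 1
    BLUE = 2


def try_add_one(c: Color):
    """Return c + 1 if arithmetic were allowed, or None when it is rejected."""
    try:
        return c + 1  # plain Enum members do not support integer arithmetic
    except TypeError:
        return None
```

The explicit escape hatches are `Color.RED.value` (enum to integer) and `Color(0)` (integer to enum), which play the role of the type cast the comment warns against.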

    Similarly, Qty and Width are both integers, but are NOT compatible with each other. I can't accidentally assign a Qty to a Width. When calling a Point object constructor:

    XCoord x = 32
    YCoord y = 48
    Point p = new Point( x, y )

    I cannot accidentally confuse the x and y coordinates as: new Point( y, x ), because they are type incompatible. Similarly I could not accidentally pass a Width value to a parameter of type Height.
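In Python the closest equivalent is `typing.NewType`, sketched below. One important caveat: Python itself does not enforce the distinction. The assumption is that an external static checker such as mypy is run over the code, in which case swapping the arguments would be expected to be flagged before the program runs. The `Point` dataclass is a stand-in for the constructor in the pseudocode.

```python
from dataclasses import dataclass
from typing import NewType

# Distinct names for the checker; both are plain ints at run time.
XCoord = NewType("XCoord", int)
YCoord = NewType("YCoord", int)


@dataclass(frozen=True)
class Point:
    x: XCoord
    y: YCoord


x = XCoord(32)
y = YCoord(48)
p = Point(x, y)

# Point(y, x) would still execute in plain Python, but a static checker
# is expected to reject it because a YCoord is not an XCoord.
```

This is the retrofit situation the comment complains about: the check exists, but only if a separate analysis step is bolted on.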

    The code reads clearly, is at a much higher level, and is efficient. The compiler and many other tools can reason about your code, especially a powerful IDE that is smart enough to predict what you are about to type. It can offer choices smartly, knowing when the only things you could possibly type are Red, Green or Blue. Unlike in PHP, function signatures have to match when a caller calls a callee. If I assign a function to a variable or parameter and pass it around, the type information is still present when it is called (within the compiler, not necessarily at runtime, depending on implementation and language), so the compiler can whine and complain if you don't call a function with the right parameters, even if the function you are calling is in a variable, was passed in as a parameter, or was the return value of some other function. (Functions passed around are just pointers when you get down to machine code.)

    Most of what I have just described was available in the early 1980's. By the mid to late 1980's in Pascal you could even do things along the lines of:

    Memory = ^ARRAY OF Byte

    Memory memory = (Memory) 0

    Later on . . .

    memory[ 0x3F82492C ] = 31
    if( memory[ videoBuffer + 8 ] = 60 ) { . . . }

    So you could do low level things like in C or assembler.

    <Rant>
    Yet people whined that it is not as efficient as C. Meanwhile, we've had decades of bugs in languages like C that have cost untold amounts of money and created vast industries attempting to fix problems caused by applications written in way too low level a language. Even hecking device drivers could be written in a language like the one I just described. Pascal had RECORDs, which were as good as C structs and could be pointed at any location where you had certain structures in memory.

    Next let me get started on GC (garbage collection). While GC is not for certain types of code (e.g., boot loaders, device drivers, microcontrollers), it is fantastic for application code. GC magically eliminates three entire classes of bugs. They just disappear!
    1. Failing to dispose of a pointer
    2. Double disposing of a pointer
    3. Using a pointer after what it points to has been deallocated

    These bugs just vanish in a greasy black ball of flaming smoke from heck! God only knows how much time and money has been wasted by these.
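A small Python sketch of the point about not disposing by hand. It assumes CPython, which combines reference counting with a cycle collector, so it is not the concurrent, multi-core kind of collector discussed further down. The `Node` class, `make_cycle` and `probe` are hypothetical names used only for this demonstration.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    # Two objects that reference each other: reference counting alone
    # cannot free them, and nobody ever calls a 'dispose' on them.
    a = Node("a")
    b = Node("b")
    a.peer = b
    b.peer = a
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(a)


probe = make_cycle()
gc.collect()  # ask the cycle collector to run now
collected = probe() is None  # the weak reference is dead once the cycle is reclaimed
```

There is no free call to forget, repeat, or make too early, which is the sense in which the three bug classes listed above go away for application code.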

    Modern GC, in the 21st century, has now been the subject of DECADES of research. GCs on multiple cpu cores can be more efficient than storage management done by hand. But I won't belabor that point. I'll just say it has been shown to be true, even though once upon a time GC was costly.

    One other thing about GC. It is always done nowadays on separate CPU cores because we have multiple cores. So the cost of deallocating is paid OUT OF LINE of your primary application code. Where non-GC code would have all these 'dispose' calls, those calls disappear and now cost zero cycles on the CPU executing the main application, making it faster. Some other cpu, not affecting the application performance, does the 'dispose' of objects that get deallocated.
    </Rant>

    Q. How do you know when a language is too low level?
    A. When it forces you to think about things that are IRRELEVANT to the problem you are trying to solve!
    </no-sarcasm>

    Let's all get back to using PHP now.

    --
    The anti vax hysteria didn't stop, it just died down.
    • (Score: 3, Touché) by DannyB on Friday August 23 2019, @02:32PM

      by DannyB (5839) Subscriber Badge on Friday August 23 2019, @02:32PM (#884115) Journal

      Ugh . . .
      if( Color.Wed IN workdaze ) { . . . }

      Drat . . .

      if( Weekday.Wed IN workdaze ) { . . . }

      But the compiler would have complained.

      --
      The anti vax hysteria didn't stop, it just died down.
    • (Score: 1, Informative) by Anonymous Coward on Friday August 23 2019, @04:00PM (5 children)

      by Anonymous Coward on Friday August 23 2019, @04:00PM (#884183)

      GCs on multiple cpu cores can be more efficient than storage management done by hand.

      Having read similar statements about compilers doing code optimization for literally three decades and never seeing it come true in observable reality, I am inclined to take this "can" with a similar mineful of salt.

      While a machine easily beats a human who does things mechanically like another machine, it does not have high-level understanding of the task which a human can and should apply. For example when managing memory, humans can use hierarchical allocators like talloc, pool allocators, region allocators, obstacks, freelists for object reuse, etc.
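One of the techniques named above, a free list for object reuse, can be sketched as follows. Python is used only for brevity; the technique is normally applied in C or C++, and the class `BufferPool` and its methods are hypothetical names, not any particular library's API.

```python
class BufferPool:
    """Hand out fixed-size buffers and recycle them through a free list."""

    def __init__(self, size: int):
        self._size = size
        self._free = []        # buffers returned by callers, ready for reuse
        self.allocations = 0   # how many buffers were actually created

    def acquire(self) -> bytearray:
        if self._free:
            return self._free.pop()  # reuse instead of allocating
        self.allocations += 1
        return bytearray(self._size)

    def release(self, buf: bytearray) -> None:
        buf[:] = bytes(self._size)   # reset contents so stale data does not leak
        self._free.append(buf)
```

The application-level knowledge is that all buffers have the same size and a short lifetime, which is exactly the kind of hint a general-purpose allocator or collector does not get.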

      • (Score: 2) by DannyB on Friday August 23 2019, @04:50PM (3 children)

        by DannyB (5839) Subscriber Badge on Friday August 23 2019, @04:50PM (#884226) Journal

        Even if you don't accept GC as being more efficient overall, my point still stands that all of the deallocation happens on a different cpu core than the main application. Thus the execution of the primary task sees zero cpu cycles spent on 'dispose'. More cpu cores are cheap and getting cheaper as we speak.

        I would point out two state of the art GCs. (This is now talking about JDK, the Java ecosystem.)
        1. Red Hat's Shenandoah GC
        2. Oracle's ZGC
        Both are open source and part of OpenJDK, no matter which provider you get your OpenJDK from (and there are plenty). These two GCs can handle Terabytes of heap with 1 ms GC pause times.

        Even if you just plain don't like GC for some reason, it is a part of just about all new modern languages. Unless they are intended for really low-level work. Most programming in the world is done at a high enough level to use GC languages.

        --
        The anti vax hysteria didn't stop, it just died down.
        • (Score: 0) by Anonymous Coward on Friday August 23 2019, @05:55PM (1 child)

          by Anonymous Coward on Friday August 23 2019, @05:55PM (#884267)

          my point still stands that all of the deallocation happens on a different cpu core than the main application

          And consequently brings all the thread-safety song and dance to everything memory related. Even for algorithms happily running on a single core. Maybe you think all those nice "Shenandoah*Barrier" things come free?

          Meanwhile a human can, if it is preferable, do all allocations for the worker threads from the main one prior to launching them, and this way avoid the overhead on memory management even when doing multithreaded processing.

          • (Score: 2) by DannyB on Monday August 26 2019, @03:25PM

            by DannyB (5839) Subscriber Badge on Monday August 26 2019, @03:25PM (#885667) Journal

            A single threaded application benefits from having GC done on a separate thread. All of the 'dispose' cpu cycles of a single-thread app are suddenly removed from that app and spent in a different thread.

            --
            The anti vax hysteria didn't stop, it just died down.
        • (Score: 0) by Anonymous Coward on Friday August 23 2019, @06:49PM

          by Anonymous Coward on Friday August 23 2019, @06:49PM (#884284)

          nim's GC sounds like it's pretty dang efficient.

      • (Score: 0) by Anonymous Coward on Saturday August 24 2019, @02:10AM

        by Anonymous Coward on Saturday August 24 2019, @02:10AM (#884503)

        You don't use GC because it is MORE EFFICIENT (a doubtful claim), you use GC because it makes writing programs so much faster and much less error prone!
        GC imposes an overhead compared to manual memory allocation/deallocation in that GC tends to use more memory. For most cases, WELL WORTH IT.

    • (Score: 2) by krishnoid on Friday August 23 2019, @09:25PM

      by krishnoid (1156) on Friday August 23 2019, @09:25PM (#884362)

      These bugs just vanish in a greasy black ball of flaming smoke from heck! God only knows how much time and money has been wasted by these.

      Unless you don't use GC. Then that greasy black ball of flaming smoke is actually more like a perpetual inferno [tumblr.com].

  • (Score: 3, Interesting) by Anonymous Coward on Friday August 23 2019, @03:05PM (8 children)

    by Anonymous Coward on Friday August 23 2019, @03:05PM (#884143)

    Am I the only one wondering how the fuck they have 100M lines of code?

    • (Score: 2) by digitalaudiorock on Friday August 23 2019, @03:53PM (1 child)

      by digitalaudiorock (688) on Friday August 23 2019, @03:53PM (#884172)

      I was just about to ask the same! That is patently fucking absurd.

      • (Score: 2) by DannyB on Friday August 23 2019, @05:37PM

        by DannyB (5839) Subscriber Badge on Friday August 23 2019, @05:37PM (#884255) Journal

        I bet the NSA doesn't think it is so absurd. It seems ideally suited for placing backdoors and other goodies.

        --
        The anti vax hysteria didn't stop, it just died down.
    • (Score: 2) by DannyB on Friday August 23 2019, @04:56PM

      by DannyB (5839) Subscriber Badge on Friday August 23 2019, @04:56PM (#884229) Journal

      In dynamic languages with loose type discipline, it is hard to refactor a large code base. Where one would normally see common patterns and refactor out the commonality, this may not be so easy to do in a large project using a dynamic language. In a strongly typed language with a good IDE, the tools do a lot of this work and analysis for you.

      Thus, a lot of reinventing of similar wheels probably happens.

      Speculation: Since the typing is loose, probably a lot of other semantics are loose as well. Probably functions that obviously seem to do something, have weird edge or corner cases and are poorly documented as to parameters and results. It's easier to just roll your own version of this function to be sure it is done 'right'.

      --
      The anti vax hysteria didn't stop, it just died down.
    • (Score: 2) by Megahard on Friday August 23 2019, @06:20PM (2 children)

      by Megahard (4782) on Friday August 23 2019, @06:20PM (#884274)

      Apollo 11: 145K lines of code
      Mars Curiosity Rover: 500K lines of code
      Facebook: 100M lines of code

      Something is seriously wrong here.

      • (Score: 2) by krishnoid on Friday August 23 2019, @09:22PM

        by krishnoid (1156) on Friday August 23 2019, @09:22PM (#884361)

        You got a problem with that? Then invent your own Facebook! With blackjack, and hookers! And send it to the moon!

      • (Score: 2) by DannyB on Monday August 26 2019, @03:27PM

        by DannyB (5839) Subscriber Badge on Monday August 26 2019, @03:27PM (#885670) Journal

        For Apollo 11, a "line" of code was probably one machine instruction.

        --
        The anti vax hysteria didn't stop, it just died down.
    • (Score: 2) by krishnoid on Friday August 23 2019, @09:21PM

      by krishnoid (1156) on Friday August 23 2019, @09:21PM (#884360)

      I can't answer how, but I can answer why. This way the bugs get confused, trapped, and just plain exhausted when trying to get anywhere, and can never make it out into the real world.

    • (Score: 0) by Anonymous Coward on Saturday August 24 2019, @02:05AM

      by Anonymous Coward on Saturday August 24 2019, @02:05AM (#884500)

      100M across *all* of their stuff: The web site, mobile apps on Android & iOS, desktop apps for Windows & Mac, etc. And for each of those, translations in most(?) of the world's major languages.
