posted by Fnord666 on Friday August 23 2019, @10:44AM
from the static-code-analysis dept.

Submitted via IRC for SoyCow2718

Facebook doesn't have the most stellar privacy and security track record, especially given that many of its notable gaffes were avoidable. But with billions of users and a gargantuan platform to defend, it's not easy to catch every flaw in the company's 100 million lines of code. So four years ago, Facebook engineers began building a customized assessment tool that not only checks for known types of bugs but can fully scan the entire codebase in under 30 minutes—helping engineers catch issues in tweaks, changes, or major new features before they go live.

The platform, dubbed Zoncolan, is a "static analysis" tool that maps the behavior and functions of the codebase and looks for potential problems in individual branches, as well as in the interactions of various paths through the program. Having people manually review endless code changes all the time is impractical at such a large scale. But static analysis scales extremely well, because it sets "rules" about undesirable architecture or code behavior, and automatically scans the system for these classes of bugs. See it once, catch it forever. Ideally, the system not only flags potential problems but gives engineers real-time feedback and helps them learn to avoid pitfalls.
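
As a rough, hypothetical illustration (Zoncolan's actual rule language is not public), one such class of bugs is untrusted input (a "source") reaching a sensitive operation (a "sink") without sanitization:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    class TaintedQueryExample {
        // userSuppliedId stands in for any attacker-controlled value (the "source").
        ResultSet lookup(Connection db, String userSuppliedId) throws Exception {
            Statement st = db.createStatement();
            // The "sink": SQL built by string concatenation -- the kind of
            // source-to-sink path a taint rule flags before the change ships.
            return st.executeQuery("SELECT * FROM users WHERE id = " + userSuppliedId);
        }
    }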

"Every time an engineer makes a proposed change to our codebase, Zoncolan will start running in the background, and it will either report to that engineer directly or it will flag to one of our security engineers who's on call," says Pieter Hooimeijer, a security engineering manager at Facebook. "So it runs thousands of times a day, and found on the order of 1,500 issues in calendar year 2018."

Source: https://www.wired.com/story/facebook-zoncolan-static-analysis-tool/?verso=true


Original Submission

 
  • (Score: 4, Interesting) by DannyB on Friday August 23 2019, @02:29PM (8 children)

    by DannyB (5839) Subscriber Badge on Friday August 23 2019, @02:29PM (#884113) Journal

    The platform, dubbed Zoncolan, is a "static analysis" tool that maps the behavior and functions of the codebase and looks for potential problems

    <no-sarcasm>
    So Facebook uses a dynamic language without the "bondage and discipline" static checking of safer languages like Pascal, Java, C#, and others. (I first heard that B&D term applied to Pascal in the '80s.)

    Of all the dynamic languages for web development, they pick PHP. The only good thing about PHP is that, thank God, at least it is not Perl.

    Because: Reasons. All of the static vs. dynamic arguments have been said before and won't be repeated now. But one that I'll mention (cue whiny voice...): dynamic code should be tested with enough unit tests to be bug free.

    Who could have guessed that dynamic languages, while fantastical for quick-and-dirty projects, interactive development, etc., are perhaps unfit for porpoise when it comes to large, long-lived codebases? Shocker.

    So what do they do? (See the quoted portion above.) They now try to retrofit static analysis and the associated disciplines that should have been part of the compiler in the first place.

    And a thing about unit tests: the compiler's static analysis should be YOUR FIRST LEVEL of unit tests. The B&D compiler's "annoying" type checking is exactly what you might otherwise be sadly attempting to cover with hand-written unit tests -- or later with a retrofitted static analysis step, which is the LMAO irony here.

    I'll use a mishmash of language syntaxes here to make a point in a simple way.

    TYPE
    Color = (Red, Green, Blue)
    Weekday = (Mon, Tue, Wed, Thur, Fri, Sat, Fun, Sun)
    Colors = SET OF Color
    Weekdaze = SET OF Weekday

    Width = Integer
    Height = Integer
    Qty = Integer
    XCoord = Integer
    YCoord = Integer

    VAR
    colors : Colors
    workdaze : Weekdaze
    weakendaze : Weekdaze

    Code:

    colors = {Red}
    workdaze = {Mon, Tue, Wed, Thur, Fri}
    weakendaze = {Sat, Fun, Sun}

    if( Color.Wed IN workdaze ) { . . . }

    Now the three variables colors, workdaze, and weakendaze are simple integers whose bits indicate which colors or days are in each set. So that IF test compiles down to a single machine instruction doing a bit test on the integer variable, which means all of this high-levelness isn't exactly inefficient.
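
    For comparison, a rough Java rendering of the same idea (my sketch, not part of the code above) is an enum plus EnumSet, which the JDK documents as a bit-vector-backed set, so the membership test stays cheap:

    import java.util.EnumSet;

    class SetTypeDemo {
        enum Weekday { MON, TUE, WED, THUR, FRI, SAT, SUN }

        public static void main(String[] args) {
            // EnumSet is documented to be implemented as a bit vector for
            // enums with 64 or fewer constants, so contains() is a bit test.
            EnumSet<Weekday> workdaze = EnumSet.of(Weekday.MON, Weekday.TUE,
                    Weekday.WED, Weekday.THUR, Weekday.FRI);

            if (workdaze.contains(Weekday.WED)) {
                System.out.println("Wednesday is a work day");
            }

            // EnumSet.of(Weekday.MON, 3)  -- does not compile: 3 is not a Weekday
        }
    }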

    And the types Color and Weekday, while being enum types, are represented as simple integer constants; but unlike in C, they are NOT integers, nor are they at all compatible with integers. A statement like:

    Color x = Color.Red

    simply assigns zero to x. But x is not an integer, nor is it in any possible way compatible with an integer value, unless you type cast it, which you should not do.

    Similarly, Qty and Width are both integers, but they are NOT compatible with each other. I can't accidentally assign a Qty to a Width. When calling a Point constructor:

    XCoord x = 32
    YCoord y = 48
    Point p = new Point( x, y )

    I cannot accidentally swap the x and y coordinates by writing new Point( y, x ), because they are type-incompatible. Similarly, I could not accidentally pass a Width value to a parameter of type Height.
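
    In Java terms, a rough sketch of the same trick (my own hypothetical types, nothing standard) uses tiny wrapper records, so the two coordinates simply cannot be swapped:

    // Hypothetical wrapper types: an int dressed up so the compiler can tell them apart.
    record XCoord(int value) { }
    record YCoord(int value) { }
    record Point(XCoord x, YCoord y) { }

    class CoordinateDemo {
        public static void main(String[] args) {
            XCoord x = new XCoord(32);
            YCoord y = new YCoord(48);

            Point p = new Point(x, y);     // fine
            // Point q = new Point(y, x);  // does not compile: arguments are the wrong types
            System.out.println(p);
        }
    }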

    The code reads so clearly, is at a much higher level, and is still efficient. The compiler and many other tools can reason about your code -- especially a powerful IDE, which is smart enough to predict what you are about to type and can offer choices smartly, knowing when the only things you could possibly type are Red, Green, or Blue.

    Unlike PHP, function signatures have to match when a caller calls a callee. If I assign a function to a variable or parameter and pass it around, the type information is still present when it is called (within the compiler, not necessarily at runtime, depending on implementation and language), so the compiler can whine and complain if you don't call the function with the right parameters -- even if the function you are calling came in through a variable or a parameter, or was the return value of some other function. (Functions passed around are just pointers when you get down to machine code.)
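
    And the "function in a variable keeps its signature" part looks roughly like this in Java (again a sketch of mine, reusing the hypothetical XCoord/YCoord/Point records from the previous sketch):

    import java.util.function.BiFunction;

    class FunctionValueDemo {
        // The variable's type carries the parameter types with it, so the compiler
        // checks the call even though the "function" arrived as an argument.
        static Point build(BiFunction<XCoord, YCoord, Point> maker, XCoord x, YCoord y) {
            return maker.apply(x, y);   // maker.apply(y, x) would be rejected here
        }

        public static void main(String[] args) {
            BiFunction<XCoord, YCoord, Point> maker = Point::new;
            System.out.println(build(maker, new XCoord(32), new YCoord(48)));
        }
    }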

    Most of what I have just described was available in the early 1980s. By the mid-to-late 1980s, in Pascal, you could even do things along the lines of:

    Memory = ^ARRAY OF Byte

    Memory memory = (Memory) 0

    Later on . . .

    memory[ 0x3F82492C ] = 31
    if( memory[ videoBuffer + 8 ] = 60 ) { . . . }

    So you could do low-level things, just as in C or assembler.

    <Rant>
    Yet people whined that it was not as efficient as C. Meanwhile, we've had decades of bugs in languages like C that have cost untold amounts of money and created vast industries attempting to fix problems caused by applications written in way too low-level a language. Even hecking device drivers could be written in a language like the one I just described. Pascal had RECORDs, which were as good as C structs and could be overlaid on any memory location where such a structure lived.

    Next, let me get started on GC (garbage collection). While GC is not for certain types of code (e.g., boot loaders, device drivers, microcontrollers), it is fantastic for application code. GC magically eliminates three entire classes of bugs. They just disappear!
    1. Failing to dispose of a pointer
    2. Double disposing of a pointer
    3. Using a pointer after what it points to has been deallocated

    These bugs just vanish in a greasy black ball of flaming smoke from heck! God only knows how much time and money has been wasted by these.

    Modern GC, here in the 21st century, has been the subject of DECADES of research. GCs on multiple cpu cores can be more efficient than storage management done by hand. But I won't belabor that point; I'll just say it has been shown to be true, even though once upon a time GC was costly.

    One other thing about GC: nowadays it is done on separate CPU cores, because we have multiple cores. So the cost of deallocating is paid OUT OF LINE of your primary application code. Where non-GC code would have all these 'dispose' calls, those calls disappear and now cost zero cycles on the CPU executing the main application -- making it faster. Some other CPU, without affecting application performance, does the 'dispose' of objects that get deallocated.
    </Rant>

    Q. How do you know when a language is too low level?
    A. When it forces you to think about things that are IRRELEVANT to the problem you are trying to solve!
    </no-sarcasm>

    Let's all get back to using PHP now.

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 3, Touché) by DannyB on Friday August 23 2019, @02:32PM

    by DannyB (5839) Subscriber Badge on Friday August 23 2019, @02:32PM (#884115) Journal

    Ugh . . .
    if( Color.Wed IN workdaze ) { . . . }

    Drat . . .

    if( Weekday.Wed IN workdaze ) { . . . }

    But the compiler would have complained.

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 1, Informative) by Anonymous Coward on Friday August 23 2019, @04:00PM (5 children)

    by Anonymous Coward on Friday August 23 2019, @04:00PM (#884183)

    GCs on multiple cpu cores can be more efficient than storage management done by hand.

    Having read similar statements about compilers doing code optimization for literally three decades, and never seeing them come true in observable reality, I am inclined to take this "can" with a similar mineful of salt.

    While a machine easily beats a human who does things mechanically like another machine, it does not have the high-level understanding of the task which a human can and should apply. For example, when managing memory, humans can use hierarchical allocators like talloc, pool allocators, region allocators, obstacks, freelists for object reuse, etc.
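
    For example, the freelist-for-object-reuse idea is tiny to sketch (this sketch is mine, in Java only to match the rest of the thread; the names are made up):

    import java.util.ArrayDeque;

    // Minimal freelist: reuse buffers instead of allocating fresh ones each time --
    // the kind of whole-program decision a human makes and a GC does not.
    class BufferPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final int bufferSize;

        BufferPool(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        byte[] acquire() {
            byte[] buf = free.poll();
            return (buf != null) ? buf : new byte[bufferSize];
        }

        void release(byte[] buf) {
            free.push(buf);   // back on the freelist for the next caller
        }
    }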

    • (Score: 2) by DannyB on Friday August 23 2019, @04:50PM (3 children)

      by DannyB (5839) Subscriber Badge on Friday August 23 2019, @04:50PM (#884226) Journal

      Even if you don't accept GC as being more efficient overall, my point still stands that all of the deallocation happens on a different cpu core than the main application. Thus the execution of the primary task sees zero cpu cycles spent on 'dispose'. More cpu cores are cheap and getting cheaper as we speak.

      I would point out two state-of-the-art GCs. (This is now talking about the JDK, the Java ecosystem.)
      1. Red Hat's Shenandoah GC
      2. Oracle's ZGC
      Both are open source and part of OpenJDK, no matter which provider you get your OpenJDK from (and there are plenty). These two GCs can handle terabytes of heap with 1 ms GC pause times.
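
      For reference, switching to either collector is a command-line flag on a recent OpenJDK (exact availability depends on version and build; older builds may also need -XX:+UnlockExperimentalVMOptions). The app name and heap size here are just placeholders:

      # hypothetical application class and heap size, shown only to illustrate the flags
      java -XX:+UseShenandoahGC -Xmx16g MyApp
      java -XX:+UseZGC -Xmx16g MyApp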

      Even if you just plain don't like GC for some reason, it is part of just about all new modern languages, unless they are intended for really low-level work. Most programming in the world is done at a high enough level to use GC languages.

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
      • (Score: 0) by Anonymous Coward on Friday August 23 2019, @05:55PM (1 child)

        by Anonymous Coward on Friday August 23 2019, @05:55PM (#884267)

        my point still stands that all of the deallocation happens on a different cpu core than the main application

        And consequently it brings all the thread-safety song and dance to everything memory-related, even for algorithms happily running on a single core. Maybe you think all those nice "Shenandoah*Barrier" things come for free?

        Meanwhile, a human can, if it is preferable, do all allocations for the worker threads from the main one prior to launching them, and in this way avoid memory-management overhead even when doing multithreaded processing.
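
        A minimal sketch of that pattern (mine, hypothetical and framework-free): the main thread allocates everything up front, and each worker only touches the buffer it was handed:

        import java.util.ArrayList;
        import java.util.List;

        class PreallocatedWorkers {
            public static void main(String[] args) throws InterruptedException {
                int workers = 4;
                List<Thread> threads = new ArrayList<>();

                for (int i = 0; i < workers; i++) {
                    // Allocated up front, on the main thread.
                    double[] scratch = new double[1 << 20];
                    Thread t = new Thread(() -> {
                        // The worker never allocates or frees; it only uses its own buffer,
                        // so there is no memory-management coordination between threads.
                        for (int j = 0; j < scratch.length; j++) {
                            scratch[j] = Math.sqrt(j);
                        }
                    });
                    threads.add(t);
                    t.start();
                }
                for (Thread t : threads) {
                    t.join();
                }
            }
        }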

        • (Score: 2) by DannyB on Monday August 26 2019, @03:25PM

          by DannyB (5839) Subscriber Badge on Monday August 26 2019, @03:25PM (#885667) Journal

          A single-threaded application benefits from having GC done on a separate thread. All of the 'dispose' CPU cycles of a single-threaded app are suddenly removed from that app and spent on a different thread.

          --
          To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
      • (Score: 0) by Anonymous Coward on Friday August 23 2019, @06:49PM

        by Anonymous Coward on Friday August 23 2019, @06:49PM (#884284)

        Nim's GC sounds like it's pretty dang efficient.

    • (Score: 0) by Anonymous Coward on Saturday August 24 2019, @02:10AM

      by Anonymous Coward on Saturday August 24 2019, @02:10AM (#884503)

      You don't use GC because it is MORE EFFICIENT (a doubtful claim); you use GC because it makes writing programs so much faster and much less error-prone!
      GC imposes an overhead compared to manual memory allocation/deallocation in that it tends to use more memory. For most cases, WELL WORTH IT.

  • (Score: 2) by krishnoid on Friday August 23 2019, @09:25PM

    by krishnoid (1156) on Friday August 23 2019, @09:25PM (#884362)

    These bugs just vanish in a greasy black ball of flaming smoke from heck! God only knows how much time and money has been wasted by these.

    Unless you don't use GC. Then that greasy black ball of flaming smoke is actually more like a perpetual inferno [tumblr.com].