SoylentNews

SoylentNews is people
Even Anonymous Coders Leave Fingerprints

posted by chromas on Monday August 13 2018, @02:22PM

canopic jug writes:

Wired is reporting on a presentation given at Def Con 26 by Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt's former PhD student and now an assistant professor at George Washington University, entitled "Even Anonymous Coders Leave Fingerprints." Stylistic expression is uniquely identifiable rather than anonymous, and that includes code in particular. There are privacy implications for many developers, because as few as 50 metrics are needed to distinguish one coder from another.

The researchers don't rely on low-level features, like how code was formatted. Instead, they create "abstract syntax trees," which reflect code's underlying structure, rather than its arbitrary components. Their technique is akin to prioritizing someone's sentence structure, instead of whether they indent each line in a paragraph.
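
As a rough illustration of the point in the excerpt (not the researchers' actual pipeline, which the summary does not describe in detail), here is a minimal Python sketch using the standard-library `ast` module. The helper name `structure` and the three sample snippets are made up for the example. It shows that reformatting leaves the syntax tree unchanged, while a different way of expressing the same logic does not.

```python
import ast


def structure(source: str) -> str:
    """Return a dump of the syntax tree; layout and comments are not part of it."""
    return ast.dump(ast.parse(source))


# Same logic, same constructs, different whitespace and an added comment.
compact = "def f(x):\n    return x*2 if x>0 else 0\n"
spaced = "def f( x ):\n\n        return  x * 2   if x > 0   else 0   # comment\n"

# Same logic again, but written with an if statement instead of an expression.
branchy = "def f(x):\n    if x > 0:\n        return x*2\n    return 0\n"

same_layout_ignored = structure(compact) == structure(spaced)
different_structure = structure(compact) != structure(branchy)

print("formatting ignored:", same_layout_ignored)
print("construct choice visible:", different_structure)
```

The habit that survives reformatting here is the choice between a conditional expression and an `if` statement, which is the kind of structural signal the excerpt contrasts with indentation.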


This discussion has been archived. No new comments can be posted.

29 comments

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.


just more proof (Score: 2) by takyon on Monday August 13 2018, @02:47PM (5 children)

by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday August 13 2018, @02:47PM (#720993) Journal

Anonymous Coders Could be Identified Even from Compiled Code [soylentnews.org]
Using the Internet will be your eventual death sentence.

--
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
- In other news... (Score: 2) by fyngyrz on Monday August 13 2018, @06:51PM (4 children)
  
  by fyngyrz (6567) on Monday August 13 2018, @06:51PM (#721096) Journal
  
  In other news, 99% of all C code positively identified by researchers as having been written by Kernighan & Ritchie.
  if (closedStyle) { braceForResults(); }
  
  - Re:In other news... (Score: 2) by legont on Tuesday August 14 2018, @01:13AM (3 children)
    
    by legont (4179) on Tuesday August 14 2018, @01:13AM (#721200)
    
    Hate it. Should be:
    if( closedStyle ) { braceForResults(); }
    That's so I can search for " closedStyle" and find them all
    
    --
    "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    
    - search (Score: 2) by fyngyrz on Tuesday August 14 2018, @01:46AM (2 children)
      
      by fyngyrz (6567) on Tuesday August 14 2018, @01:46AM (#721203) Journal
      
      That's so I can search for " closedStyle" and find them all
      Sounds like you need a better editor. Or search algorithm. :)
      
      - Re:search (Score: 2) by legont on Tuesday August 14 2018, @03:59AM (1 child)
        
        by legont (4179) on Tuesday August 14 2018, @03:59AM (#721229)
        
        Nah, style has got to support simplicity. If choosing between style and algorithm, the algorithm is to die first. ;)
        
        --
        "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
        
        
        Simple (Score: 2) by fyngyrz on Tuesday August 14 2018, @04:13PM
        
        by fyngyrz (6567) on Tuesday August 14 2018, @04:13PM (#721408) Journal
        
        The way I see it, "simple" means not having to type a certain way so your search will actually work. Then the algorithm supports whatever you do, rather than you supporting the algorithm.
        But what do I know. :)
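
One way to read the "better search" suggestion is a word-boundary pattern, which finds the identifier whichever brace-and-space style was typed. This is only a sketch using Python's standard `re` module; the identifiers `notclosedStyle` and `closedStyleCount` are hypothetical, added to show what a plain substring search would wrongly pick up.

```python
import re

lines = [
    "if (closedStyle) { braceForResults(); }",   # K&R-ish spacing
    "if( closedStyle ) { braceForResults(); }",  # padded-parenthesis spacing
    "if (notclosedStyle) { skip(); }",           # hypothetical near-miss
    "closedStyleCount++;",                       # hypothetical near-miss
]

# \b matches only at the edge of an identifier, so spacing habits do not matter.
pattern = re.compile(r"\bclosedStyle\b")
hits = [line for line in lines if pattern.search(line)]

for line in hits:
    print(line)
```

Both spellings of the `if` are found and the two near-misses are skipped, so the search adapts to the code rather than the code to the search.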
        
This isn't new, but maybe it's news (Score: 4, Interesting) by Hyperturtle on Monday August 13 2018, @02:55PM (1 child)

by Hyperturtle (2824) on Monday August 13 2018, @02:55PM (#720996)

As a network engineer, I have known this for years.
For the devices with CLIs, and often, the ticketing systems in the organization with the hardware -- you can get a feel for who wrote what and what to expect when you go to look at that engineer's results.
I expect it's no different with programming or technical writing or master's thesis statements. It also means that it becomes easy to tell who ran the wizard, copied from the internet, or shamelessly took someone else's work and used it as their own without so much as removing inadvertent metadata because the copier didn't understand what was going on.
This presents good and bad things to any individual -- if you are a fraud, it is easier to spot without necessarily having those checking really understand the work. And if you are not a fraud, you are more easily identified because of it.
And if you use the wizard and then copy the wizard's configs to make it look like you know what you are doing, that is also easy to spot... it's like that kids' song "one of these things is not like the others, one of these things is not the same..." Who can spot the generic wizard auto-script hidden in the 'custom configuration'?
That's another good way to identify who claims to be an expert but isn't, or who leverages tools available to them without unnecessarily reinventing the wheel.
(and for those of us saving time and money, try to remember to remove the references to example.com before you blame the network... some network engineers CAN sniff the traffic and see that it doesn't work because the default domains in the example were not changed to reflect the business requirements!!)
Anyway, the takeaway for me is that it's always been possible to determine who is writing what, given enough time and examples. Eventually their style, or lack of one, comes through. This helps immensely in determining who is really writing their code (as opposed to everyone's favorite that outsourced his job and is just collecting the checks), who is struggling, who's a wizard and who's not, etc...
If this is alarming, then try to take the proper opsec to make sure you are harder to identify. Soylent also makes a great practice ground for your opsec training... Given enough examples, I am sure we can identify anyone that writes prolifically and then tries to pass as an Anonymous Coward... try to find the hidden Hyperturtle or Runaway or whoever! (Not that we would ever do that...)
- Re:This isn't new, but maybe it's news (Score: 4, Insightful) by Runaway1956 on Monday August 13 2018, @03:34PM
  
  by Runaway1956 (2926) on Monday August 13 2018, @03:34PM (#721009) Journal
  
  I think there is a nuance here, that you may or may not be seeing. Sure, an expert in any given field can spot subtle differences between the work of his peers, or the work of his subordinates. He has the knowledge and experience, he can perform whatever task at hand in a dozen different ways. He KNOWS his field, and can quickly come to know the people in his field.
  These people seem to be promising a new tool to managers and law enforcement, that will enable non-experts to determine who has done what, and how they did it. Plug and play script kiddies using an AI to figure out who the "good guy" and the "bad guy" are.
  When you tell your supervisor that "Bob didn't write this, it's over his head, and none of the writing matches his work", that is treated like an opinion, and weighted according to a purely subjective point of view. If the computer tells them the same thing, well, "THAT'S SCIENCE!!" Expect to see this introduced into a court of law as evidence, one day soon. Even before that, expect to see it in the hands of an HR drone, justifying one person getting a raise, and another person being fired.
  On a sidenote - my handwriting is very distinctive. It's ugly, it's large, and I write with the same brand, style, and size of bold black pen all the time. To boot, I sign or initial pretty much every piece of paper I touch. Recently, one of the managers who sees my handwriting all the time and should know better, accused first me, then a couple other people of hanging a red tag on a piece of equipment. No signature, written with a cheap blue pen, in small, precise cursive letters. It almost, but didn't quite, convince me that it was a woman's handwriting.
  Oftentimes, the very people who should know, have the fewest clues to work with. My immediate supervisor had to tell the wannabe-manager that none of the people he accused could have done it. One of the persons accused is not even literate in English!! (From all accounts, he's very good in Spanish, but I couldn't verify that with my limited vocabulary!)
  
Solution (Score: 1, Insightful) by Anonymous Coward on Monday August 13 2018, @03:15PM (2 children)

by Anonymous Coward on Monday August 13 2018, @03:15PM (#721003)

Copy all your code from StackOverflow.
- Re:Solution (Score: 2) by takyon on Monday August 13 2018, @03:38PM
  
  by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday August 13 2018, @03:38PM (#721011) Journal
  
  Parent AC gave the only workable solution.
  Thank G_d for StackOverflow! Suck it, MDC! [soylentnews.org]
  
  --
  [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  
- Re:Solution (Score: 0) by Anonymous Coward on Tuesday August 14 2018, @02:36PM
  
  by Anonymous Coward on Tuesday August 14 2018, @02:36PM (#721376)
  
  I thought the solution would be to wear rubber gloves while coding. :-)
  
Not fingerprints, though (Score: 0) by Anonymous Coward on Monday August 13 2018, @03:31PM (2 children)

by Anonymous Coward on Monday August 13 2018, @03:31PM (#721008)

It's a catchy phrase, but fingerprints are obviously more uniquely identifiable. If you want to go all CSI/Forensics, a much closer analogy is tool marks... or bullet ballistics.
- Re:Not fingerprints, though (Score: 2) by Runaway1956 on Monday August 13 2018, @03:35PM (1 child)
  
  by Runaway1956 (2926) on Monday August 13 2018, @03:35PM (#721010) Journal
  
  Go easy on that ballistics nonsense. You'll end up triggering someone!
  
  - Re:Not fingerprints, though (Score: 0) by Anonymous Coward on Monday August 13 2018, @03:46PM
    
    by Anonymous Coward on Monday August 13 2018, @03:46PM (#721015)
    
    Don't worry you get triggered enough for everyone.
    *STAND DOWN SJW HIT SQUAD!*
    
How to be undetectable (Score: 2) by ikanreed on Monday August 13 2018, @04:02PM (6 children)

by ikanreed (3164) on Monday August 13 2018, @04:02PM (#721021) Journal

1. Go to random subroutines, and put a comment at the top consisting of this text: //Who the fuck writes this garbage? I would never have done anything this fucking stupid
2. Include several if(!condition){//handle this later I'm sure it won't come up}
3. Follow zero project-wide indentation and code-style rules.
- Re:How to be undetectable (Score: 2) by takyon on Monday August 13 2018, @04:34PM (3 children)
  
  by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday August 13 2018, @04:34PM (#721032) Journal
  
  All or part of that could make your code easier to identify.
  Some real solutions are to copy or "steal" code, have other parts of your code written, tidied, or obfuscated by computers (not you) if possible, don't share code if you can't take the previous steps, or never post code that can be linked to your real name or identity, so that your code (written however you like it) can only be linked from one Anon (you) to another (also you).
  
  --
  [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  
  - Re:How to be undetectable (Score: 4, Funny) by ikanreed on Monday August 13 2018, @04:42PM (2 children)
    
    by ikanreed (3164) on Monday August 13 2018, @04:42PM (#721036) Journal
    
    I was trying to joke about what it seems like every coder does. I knew when I was posting it it was a kinda limp joke. Didn't realize it was so flaccid as to be unrecognizable.
    
    - Re:How to be undetectable (Score: 2) by takyon on Monday August 13 2018, @04:59PM (1 child)
      
      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday August 13 2018, @04:59PM (#721042) Journal
      
      The problem is that somebody is going to end up reading this [ic.ac.uk] and consider it a legit strategy for writing anonymous code.
      
      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      
      - Re:How to be undetectable (Score: 0) by Anonymous Coward on Tuesday August 14 2018, @02:47PM
        
        by Anonymous Coward on Tuesday August 14 2018, @02:47PM (#721383)
        
        omg I love that!!!
        
- Re:How to be undetectable (Score: 0) by Anonymous Coward on Monday August 13 2018, @05:57PM
  
  by Anonymous Coward on Monday August 13 2018, @05:57PM (#721070)
  
  Those things don't go into the compiled code.
  
- Re:How to be undetectable (Score: 0) by Anonymous Coward on Tuesday August 14 2018, @05:57PM
  
  by Anonymous Coward on Tuesday August 14 2018, @05:57PM (#721451)
  
  That's not fair, I at least put in:
  if (badcondition) { throw new Exception("How did that happen? "); }
  
Textanalysis comes to code (Score: 2) by looorg on Monday August 13 2018, @04:31PM

by looorg (578) on Monday August 13 2018, @04:31PM (#721029)

Not to sound all grumpy but Textanalysis has come to source code ... who could have guessed. Nothing said it had to be "written" text as words and/or sentences. People putting any word to paper (or screen) apply themselves somehow to their work, no matter if it's written text or source code.
What if I learn something? (Score: 2) by Snotnose on Monday August 13 2018, @05:08PM (3 children)

by Snotnose (1623) on Monday August 13 2018, @05:08PM (#721047)

As a C programmer I used ? quite a bit. Well, not quite a bit but probably 2-3 times more often than the next guy. Some people hate it. "It's too complicated, I don't understand it". Not my problem, learn the language.
Now I'm doing Java and OO. There are a lot of subtleties in those libraries, and as I figure them out my code changes, sometimes radically. I doubt you could match last year's Java (C with Java syntax) with today's Java (Java with OO as a baseline) to identify me as the author.

--
When the dust settled America realized it was saved by a porn star.
- Re:What if I learn something? (Score: 2) by maxwell demon on Monday August 13 2018, @06:34PM
  
  by maxwell demon (1608) on Monday August 13 2018, @06:34PM (#721084) Journal
  
  So in other words, as long as you are not new to the language it's easy to identify you as the author: It's when nobody else understands your code. ;-)
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
  
- Re:What if I learn something? (Score: 0) by Anonymous Coward on Monday August 13 2018, @06:57PM
  
  by Anonymous Coward on Monday August 13 2018, @06:57PM (#721100)
  
  I once worked on a contract where we had to add features to an existing C coded program, but we were explicitly prohibited from modifying any existing code unless absolutely necessary to the new features. The existing code had been written by two people years ago -- one a very experienced C programmer, and the other a history major graduate who was still learning to code. (don't ask; nobody could explain that one to me, either)
  Who wrote which code, given the extremes of experience, was so clear that it was funny. You could even tell when the history guy wrote which part, as his learning curve was evident. The hardest part of the project was keeping myself from cleaning up his code. The client was adamant about that though, so... *shrug*
  I can't imagine being able to automate the detection of something like that, to be honest. But then there's a lot of things I don't know about.
  Oh, and ? is absolutely a perfectly cromulent operator, indeed. Personally I only used it when the operators were pretty simple, though. No need to deliberately obfuscate code for the next person working on it -- and plenty of times, the next person working on it is you, long after you forgot what the heck it was you were doing.
  
- Re:What if I learn something? (Score: 2) by KritonK on Monday August 13 2018, @08:17PM
  
  by KritonK (465) on Monday August 13 2018, @08:17PM (#721121)
  
  I thought I'd mention that ? works in Java, as well. The , operator works as well, at least in for statements.
  
it really doesn't matter (Score: 2) by stretch611 on Monday August 13 2018, @06:49PM

by stretch611 (6199) on Monday August 13 2018, @06:49PM (#721094)

If you are in a small company or team environment, this will be obvious. People on your team will pick up on your habits in coding style. The more obvious the style, the sooner and easier it will be to pick up. However, the people that work with you will learn far quicker about your ability based on how you interact with them, what questions you ask, what suggestions and ideas you offer, and how many times they see you browsing over to copy code from various public websites. And even if you work remotely, they will learn how smart you are at coding and learn how often you google for code... After all, a person who is clueless in every meeting is not going to write good code without leaning on the rest of the team and asking for constant help.
So in a non-anonymous team environment you will not be able to hide your style or lack thereof.
When you write code, you tend to write it based on previous experience: the more you did something in the past, the more likely you are to do it in the future. It affects all aspects of programming. Do you write functions 125 lines long, or do you create smaller functions no larger than one screen of code? How and where do you declare variables? Do you abuse globals? Do you always set a default or never, and are your defaults zeros, empty strings, nulls, or only initial values? Do you actually write comments, and are they actually useful? How about variable names: two-letter variables or full words? Do you use camelCase, avoid capitals entirely, use underscores between words, or capitalize only the first letter? Even how you organize functions into libraries can be a sign of distinction between coders, and so can the use of included source code.
But, here is the real problem with identifying code... If you are in a company, your team will likely be able to determine it was you with very little effort at all. If you are on a huge public project on the internet, people probably will not spend the time and effort to look at your contributions... especially if they work. (of course if your stuff causes problems constantly, other people will be constantly looking at it to figure out how to fix it and your sloppy crap will be easy to spot) If you truly keep it anonymous, the cost to trace source code back to you based on full analysis of your coding habits will rarely be worth the effort. If it is worth the effort, any idiot should realize that source control requiring non-anonymous logins should be a requirement on the project to begin with.
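
A few of the habits listed above (function length, naming convention, comment use) can be counted mechanically. The following is a minimal sketch, assuming Python source as input and using only the standard library; the function `style_features`, the feature names, and the sample code are all made up for illustration, and a real stylometry tool would use far more metrics than this.

```python
import ast
import re

CAMEL = re.compile(r"[a-z]+(?:[A-Z][a-z0-9]*)+")


def style_features(source: str) -> dict:
    """Count a handful of crude style habits in a piece of Python source."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    # Lines spanned by each function, from its 'def' to its last statement.
    lengths = [f.end_lineno - f.lineno + 1 for f in functions]
    # Distinct identifiers: variable names plus function names.
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    names |= {f.name for f in functions}
    return {
        "longest_function": max(lengths, default=0),
        "camel_case_names": sum(1 for n in names if CAMEL.fullmatch(n)),
        "snake_case_names": sum(1 for n in names if "_" in n),
        "short_names": sum(1 for n in names if len(n) <= 2),
        "comment_lines": sum(
            1 for line in source.splitlines() if line.lstrip().startswith("#")
        ),
    }


sample = '''\
def totalPrice(itemList):
    # sum the prices
    runningTotal = 0
    for it in itemList:
        runningTotal += it
    return runningTotal

def mean_value(value_list):
    return sum(value_list) / len(value_list)
'''

features = style_features(sample)
print(features)
```

The sample deliberately mixes two naming habits, so the counts hint at two different authors or one author changing style, which is the kind of signal a teammate picks up by eye.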

--
Now with 5 covid vaccine shots/boosters altering my DNA :P
O RLY (Score: 2) by Bot on Monday August 13 2018, @07:20PM

by Bot (3902) on Monday August 13 2018, @07:20PM (#721109) Journal

- see here, the cracker worked all night, had installed a firmware backdoor, had booted the cracked workstation onto the corporate network, had deployed his payload. The change was of course logged so we could give a look at it. The code was obfuscated, and we have a hunch that it was compiled, then reverse engineered and recompiled on a different tool, all of it of course with aggressive optimization flags. A real mess.
- so he got away with it?
- no we got an ID at the police station by noon.
- wow, did they see through all that obfuscation?
- kinda. The guy left his fingerprints on the keyboard.

--
Account abandoned.
Inverse birthday problem (Score: 2) by pipedwho on Monday August 13 2018, @09:49PM

by pipedwho (2032) on Monday August 13 2018, @09:49PM (#721136)

The problem with this sort of technique is that the reliability of the match drops quickly as the search space grows relative to the number and quality of markers being used.
So comparing a sample set of 100 coders may yield excellent results at 99% accuracy, while the match at 10000 coders is likely to result in 100 matches that are indistinguishable from each other with any semblance of probability. Increasing the search space makes this worse.
And that assumes you have a reliable sample set to use as a reference. With the online proliferation of copy/pasted information and reference material/examples, the search space cannot be easily categorised in the same way DNA can be used to narrow down the search to family members cross referenced in other ways. Additionally, at higher search quantities the reliability drops to a point that a malfeasant intentionally doing a few things they normally avoid doing would likely skew them out of the match, or require the matching algorithms to be even less accurate (and therefore harvesting an even larger set of false positives to weed through).
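
The scaling argument can be put into numbers with a back-of-the-envelope sketch. It assumes one particular reading of "99% accuracy", namely that each non-author has an independent 1% chance of being flagged as a match; that rate, the independence, and the function name `expected_false_matches` are assumptions for illustration, not figures from the research.

```python
def expected_false_matches(pool_size: int, false_match_rate: float) -> float:
    """Expected number of non-authors wrongly flagged as a match.

    Assumes each of the (pool_size - 1) non-authors is flagged independently
    with the same probability; both are simplifying assumptions.
    """
    return (pool_size - 1) * false_match_rate


# Assumed 1% per-candidate false-match rate, applied to growing pools.
for pool in (100, 10_000, 1_000_000):
    print(pool, round(expected_false_matches(pool, 0.01), 2))
```

Under these assumptions a pool of 100 coders yields about one false match, while a pool of 10,000 yields about 100, which lines up with the figures in the comment above.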

