SoylentNews is people

posted by chromas on Monday August 13 2018, @02:22PM

Wired is reporting on a presentation given at Def Con 26 by Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt's former PhD student and now an assistant professor at George Washington University, entitled Even Anonymous Coders Leave Fingerprints. Stylistic expression is uniquely identifiable rather than anonymous, and that applies to code in particular. This has privacy implications for many developers, because as few as 50 metrics are needed to distinguish one coder from another.

The researchers don't rely on low-level features, like how code was formatted. Instead, they create "abstract syntax trees," which reflect code's underlying structure, rather than its arbitrary components. Their technique is akin to prioritizing someone's sentence structure, instead of whether they indent each line in a paragraph.
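To make the distinction between formatting and structure concrete, here is a minimal sketch using Python's standard `ast` module. It is not the researchers' pipeline; the function name `ast_features` and the two features it extracts (node-type counts and maximum tree depth) are assumptions chosen purely for illustration.

```python
import ast
from collections import Counter


def ast_features(source: str) -> dict:
    """Extract two simple structural features from Python source.

    Hypothetical illustration only: counts of each AST node type and
    the maximum depth of the tree. Whitespace and indentation width
    do not survive parsing, so they cannot influence the result.
    """
    tree = ast.parse(source)
    node_counts = Counter(type(node).__name__ for node in ast.walk(tree))

    def depth(node: ast.AST) -> int:
        # Depth of a node is 1 plus the depth of its deepest child.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    return {"node_counts": node_counts, "max_depth": depth(tree)}


# Same logic, different spacing and indentation width.
snippet_a = "def f(x):\n    return x+1\n"
snippet_b = "def f(x):\n        return x + 1\n"
# Same behaviour, different structural choice (lambda instead of def).
snippet_c = "f = lambda x: x + 1\n"

print(ast_features(snippet_a) == ast_features(snippet_b))
print(ast_features(snippet_a) == ast_features(snippet_c))
```

The first two snippets differ only in formatting and yield identical features, while the third expresses the same behaviour with a different construct and yields different ones. That structural choice is the kind of signal that survives reformatting.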

  • (Score: 2) by pipedwho (2032) on Monday August 13 2018, @09:49PM (#721136)

    The problem with this sort of technique is that the reliability of the match drops quickly as the search space grows relative to the number and quality of markers being used.

    So comparing a sample set of 100 coders may yield excellent results at 99% accuracy, while a match against 10000 coders at the same accuracy is likely to return around 100 candidates that cannot be distinguished from each other with any semblance of probability. Increasing the search space further makes this worse.
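    The arithmetic behind that estimate can be sketched as follows. This assumes "99% accuracy" means every non-author independently has a 1% chance of being wrongly flagged as a match; that reading, and the function name `expected_false_matches`, are assumptions for illustration, not figures from the research.

```python
def expected_false_matches(candidates: int, false_positive_rate: float) -> float:
    """Expected number of wrongly flagged coders in a pool.

    Assumes exactly one true author in the pool and an independent,
    identical false-positive rate for every other candidate.
    """
    return (candidates - 1) * false_positive_rate


# Expected false matches grow linearly with the size of the pool.
for pool_size in (100, 10_000, 1_000_000):
    print(pool_size, expected_false_matches(pool_size, 0.01))
```

    Under that assumption a pool of 100 yields about one false match, and a pool of 10000 yields about 100, which is the figure quoted above.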

    And that assumes you have a reliable sample set to use as a reference. With the online proliferation of copy/pasted information and reference material/examples, the search space cannot be easily categorised in the way DNA can be used to narrow a search down to family members who are then cross-referenced in other ways. Additionally, at larger search sizes the reliability drops to the point that a malfeasant intentionally doing a few things they normally avoid would likely skew themselves out of the match, or require the matching algorithms to be even less strict (and therefore harvest an even larger set of false positives to weed through).
