Wired is reporting on a presentation given at Def Con 26 by Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt's former PhD student and now an assistant professor at George Washington University, entitled Even Anonymous Coders Leave Fingerprints. Stylistic expression is uniquely identifiable rather than anonymous, and that includes code in particular. This has privacy implications for many developers, because as few as 50 metrics are needed to distinguish one coder from another.
The researchers don't rely on low-level features, like how code was formatted. Instead, they create "abstract syntax trees," which reflect code's underlying structure, rather than its arbitrary components. Their technique is akin to prioritizing someone's sentence structure, instead of whether they indent each line in a paragraph.
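To make the distinction between formatting and structure concrete, here is a minimal sketch using Python's standard `ast` module. It is not the researchers' pipeline: the sample snippets, the `node_type_counts` helper, and the choice of node-type counts as a "feature" are all assumptions made for illustration. It only shows that whitespace changes leave the syntax tree untouched, while a different way of writing the same logic does not.

```python
import ast
from collections import Counter


def node_type_counts(source: str) -> Counter:
    """Count how often each AST node type appears in the source (hypothetical feature)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


# Same logic, different whitespace: formatting is a low-level feature.
compact = "def total(xs):\n    return sum(x*x for x in xs)\n"
spaced = "def total( xs ):\n\n        return sum( x * x   for x in xs )\n"

# Same result, different structure: an explicit loop instead of a generator.
loop = "def total(xs):\n    t = 0\n    for x in xs:\n        t += x * x\n    return t\n"

# ast.dump omits line/column attributes by default, so layout does not matter.
same_tree = ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced))

# Node types present in one version but not in the other.
only_in_loop = set(node_type_counts(loop)) - set(node_type_counts(compact))
only_in_compact = set(node_type_counts(compact)) - set(node_type_counts(loop))

print("identical trees despite formatting:", same_tree)
print("node types only in loop version:", sorted(only_in_loop))
print("node types only in generator version:", sorted(only_in_compact))
```

Reformatting the code would not change these counts, but a habitual preference for loops over generator expressions would show up in them, which is the kind of structural habit the summary describes.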
(Score: 2) by looorg on Monday August 13 2018, @04:31PM
Not to sound all grumpy, but text analysis has come to source code ... who could have guessed. Nothing said it had to be "written" text as words and/or sentences. People putting any word to paper (or screen) apply themselves somehow to their work, no matter if it's written text or source code.