
SoylentNews is people

posted by martyb on Wednesday March 21 2018, @04:35PM
from the rigid-coding-guidelines++ dept.

Anonymous coders can be identified using stylometry and machine learning techniques applied to executable binaries:

Source code stylometry – analyzing the syntax of source code for clues about the author – is an established technique used in digital forensics. As the US Army Research Laboratory (ARL) puts it, "Stylometry research has proven that anonymous code contributors can be de-anonymized to reveal the original author, provided the author has published code before."
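To make "analyzing the syntax of source code for clues about the author" concrete, here is a minimal Python sketch of a few layout and lexical habits a stylometry tool might measure. The feature set is an illustrative assumption, not the one used by ARL or the researchers quoted here. Published work in this area also uses syntactic features, such as abstract-syntax-tree node frequencies, which this toy omits. The function name `style_features` and the `KEYWORDS` list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword list for C-like languages (an assumption for illustration).
KEYWORDS = ("for", "while", "if", "else", "switch", "do", "return")


def style_features(source: str) -> dict[str, float]:
    """Extract a few illustrative layout/lexical style features from source text."""
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    n = max(len(nonblank), 1)

    # Indentation habit: share of indented lines that start with a tab.
    indented = [ln for ln in nonblank if ln[0] in " \t"]
    tab_indented = sum(1 for ln in indented if ln[0] == "\t")

    # Brace placement: "{" alone on a line (Allman) vs. at the end of a statement (K&R).
    own_line_braces = sum(1 for ln in nonblank if ln.strip() == "{")
    trailing_braces = sum(
        1 for ln in nonblank if ln.rstrip().endswith("{") and ln.strip() != "{"
    )
    total_braces = own_line_braces + trailing_braces

    # Crude comment density: lines starting with // or /*.
    comment_lines = sum(1 for ln in nonblank if ln.lstrip().startswith(("//", "/*")))

    # Relative frequency of selected keywords among identifier-like tokens.
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(tokens)
    total_tokens = max(len(tokens), 1)

    features = {
        "tab_indent_ratio": tab_indented / max(len(indented), 1),
        "own_line_brace_ratio": own_line_braces / max(total_braces, 1),
        "avg_line_length": sum(len(ln) for ln in nonblank) / n,
        "comment_line_ratio": comment_lines / n,
    }
    for kw in KEYWORDS:
        features[f"kw_{kw}"] = counts[kw] / total_tokens
    return features


# Two functionally identical programs written in different styles.
knr = 'int main() {\n    for (int i = 0; i < 3; i++) {\n        puts("hi");\n    }\n    return 0;\n}\n'
allman = 'int main()\n{\n\tfor (int i = 0; i < 3; i++)\n\t{\n\t\tputs("hi");\n\t}\n\treturn 0;\n}\n'
print(style_features(knr)["own_line_brace_ratio"])     # → 0.0
print(style_features(allman)["own_line_brace_ratio"])  # → 1.0
```

The two programs compile to the same behavior, yet the features separate them cleanly. That is the basic intuition behind source stylometry: habits that do not affect what the program does still leave a fingerprint.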

The technique can help identify virus makers as well as unmask the creators of anti-censorship tools and other outlawed programs. It has the potential to pierce the privacy that many programmers assume they have.

Source code is designed to be human-readable, but binaries – typically produced by compiling or assembling source code – have fewer characteristics that may suggest authorship. Toolchains can be instructed to strip out variable names, function names and other symbols and metadata – which may say something about the author – and alter the structure of code through optimization.

Nonetheless, the researchers – Aylin Caliskan, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt and Arvind Narayanan – building on work described in a 2011 paper, demonstrate that binary files can be analyzed using machine-learning and stylometric techniques.
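For binaries, one kind of feature used in this line of work is the frequency of short instruction sequences recovered by a disassembler. Unlike variable and function names, instruction mnemonics survive symbol stripping, although optimization settings change them. The sketch below is a toy illustration of that idea, not the researchers' pipeline, which combines several feature types with a trained classifier. It assumes each binary has already been disassembled into a list of mnemonics (for example from `objdump -d` output). It then turns each list into normalized mnemonic-bigram frequencies and attributes an unknown binary to the candidate whose average profile is most similar under cosine similarity. The function names, the bigram choice, the nearest-centroid classifier, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def bigram_profile(mnemonics: list[str]) -> dict[tuple[str, str], float]:
    """Normalized frequencies of adjacent mnemonic pairs (hypothetical feature)."""
    pairs = Counter(zip(mnemonics, mnemonics[1:]))
    total = sum(pairs.values()) or 1
    return {pair: count / total for pair, count in pairs.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(profiles: list[dict]) -> dict:
    """Average several sparse profiles into one per-author profile."""
    summed: Counter = Counter()
    for p in profiles:
        summed.update(p)  # Counter.update adds values key by key
    return {k: v / len(profiles) for k, v in summed.items()}


def attribute(unknown: list[str], known: dict[str, list[list[str]]]) -> str:
    """Return the candidate whose centroid profile is most similar to `unknown`."""
    target = bigram_profile(unknown)
    scores = {
        author: cosine(target, centroid([bigram_profile(s) for s in samples]))
        for author, samples in known.items()
    }
    return max(scores, key=scores.get)


# Made-up mnemonic sequences standing in for disassembled training binaries.
known = {
    "alice": [["push", "mov", "sub", "mov", "call", "leave", "ret"],
              ["push", "mov", "sub", "call", "leave", "ret"]],
    "bob": [["xor", "test", "jz", "lea", "add", "jmp", "ret"],
            ["xor", "test", "jnz", "lea", "add", "ret"]],
}
unknown = ["push", "mov", "sub", "mov", "call", "call", "leave", "ret"]
print(attribute(unknown, known))  # → alice
```

Nearest-centroid matching keeps the example short. With hundreds of candidate programmers, as in the evaluation quoted below, a real system would need far richer features and a stronger classifier.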

If you want to remain an anonymous coder, you'd better not contribute anything under your own name publicly:

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries (arXiv:1512.08546 [cs.CR])

We evaluate our approach on data from the Google Code Jam, obtaining attribution accuracy of up to 96% with 100 and 83% with 600 candidate programmers. We present an executable binary authorship attribution approach, for the first time, that is robust to basic obfuscations, a range of compiler optimization settings, and binaries that have been stripped of their symbol tables. We perform programmer de-anonymization using both obfuscated binaries, and real-world code found "in the wild" in single-author GitHub repositories and the recently leaked Nulled.IO hacker forum. We show that programmers who would like to remain anonymous need to take extreme countermeasures to protect their privacy.



  • (Score: 2) by cocaine overdose on Thursday March 22 2018, @12:25AM

    Sure, run this in bash:

    sed 's:m..........................:m:;s:......$:de:;s:...::;s:....$:&:;s:..:n:;s:.........$::' <<< 'Re:Look at me, I know how to bullshit a paper'