
SoylentNews is people

posted by martyb on Wednesday March 21 2018, @04:35PM
from the rigid-coding-guidelines++ dept.

Anonymous coders can be identified using stylometry and machine learning techniques applied to executable binaries:

Source code stylometry – analyzing the syntax of source code for clues about the author – is an established technique used in digital forensics. As the US Army Research Laboratory (ARL) puts it, "Stylometry research has proven that anonymous code contributors can be de-anonymized to reveal the original author, provided the author has published code before."

The technique can help identify virus makers as well as unmask the creators of anti-censorship tools and other outlawed programs. It has the potential to pierce the privacy that many programmers assume they have.

Source code is designed to be human-readable, but binaries – typically produced by compiling or assembling source code – have fewer characteristics that may suggest authorship. Toolchains can be instructed to strip out variable names, function names and other symbols and metadata – which may say something about the author – and alter the structure of code through optimization.

Nonetheless, the researchers – Aylin Caliskan, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt and Arvind Narayanan – building on work described in a 2011 paper, demonstrate that binary files can be analyzed using machine-learning and stylometric techniques.

If you want to remain an anonymous coder, you'd better not contribute anything under your own name publicly:

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries (arXiv:1512.08546 [cs.CR])

We evaluate our approach on data from the Google Code Jam, obtaining attribution accuracy of up to 96% with 100 and 83% with 600 candidate programmers. We present an executable binary authorship attribution approach, for the first time, that is robust to basic obfuscations, a range of compiler optimization settings, and binaries that have been stripped of their symbol tables. We perform programmer de-anonymization using both obfuscated binaries, and real-world code found "in the wild" in single-author GitHub repositories and the recently leaked Nulled.IO hacker forum. We show that programmers who would like to remain anonymous need to take extreme countermeasures to protect their privacy.
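To make the idea of stylometric attribution concrete, here is a toy sketch. It is not the pipeline from the paper: it works on source text rather than binaries, and it swaps in character trigram frequencies with cosine similarity as a simple stand-in for the paper's machine-learning approach. The function names (`ngram_profile`, `cosine`, `attribute`) and the authors "alice" and "bob" are hypothetical, made up for illustration.

```python
from collections import Counter
import math


def ngram_profile(code, n=3):
    """Relative frequency of each character n-gram in a source string."""
    grams = [code[i:i + n] for i in range(len(code) - n + 1)]
    total = len(grams) or 1  # avoid division by zero on very short input
    return {g: c / total for g, c in Counter(grams).items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(unknown, known_samples):
    """Return (best_author, scores) for an unlabeled code string.

    known_samples maps a (hypothetical) author name to a list of code
    strings that author is known to have published.
    """
    profile = ngram_profile(unknown)
    scores = {
        author: cosine(profile, ngram_profile("\n".join(samples)))
        for author, samples in known_samples.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical corpus: two authors with visibly different habits
# (snake_case and space indentation vs. camelCase and tab indentation).
KNOWN = {
    "alice": [
        "def add_items(item_list):\n"
        "    total_sum = 0\n"
        "    for item in item_list:\n"
        "        total_sum += item\n"
        "    return total_sum\n"
    ],
    "bob": [
        "def addItems(itemList):\n"
        "\ttotalSum=0\n"
        "\ti=0\n"
        "\twhile i<len(itemList):\n"
        "\t\ttotalSum=totalSum+itemList[i]\n"
        "\t\ti=i+1\n"
        "\treturn totalSum\n"
    ],
}

# An "anonymous" snippet written in the first author's style.
UNKNOWN = (
    "def max_item(item_list):\n"
    "    best_item = item_list[0]\n"
    "    for item in item_list:\n"
    "        if item > best_item:\n"
    "            best_item = item\n"
    "    return best_item\n"
)

best_author, similarity = attribute(UNKNOWN, KNOWN)
print(best_author, similarity)
```

Even this crude profile picks up on indentation and naming habits. The point of the paper is that enough of such habits survive compilation, optimization and symbol stripping to be learned from the binary itself.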



 
  • (Score: 2) by maxwell demon (1608) on Thursday March 22 2018, @12:35PM (#656566)

    Note that if such programmer identification is used in forensics, such a code transformation program could also be used to create false evidence against someone: analyze code written by the target, then take some malware and optimize it until the algorithm "recognizes" it as the target's work. Spread the malware a little, then run the analysis on it (with the predictable result) and have the target arrested as the "identified" author.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2) by DannyB (5839) on Thursday March 22 2018, @02:29PM (#656605)

    Exactly what I had in mind when I said: Imagine the possibilities!

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.