Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution

Authors: Lucy Simko (Paul G. Allen School of Computer Science & Engineering, University of Washington), Luke Zettlemoyer (Paul G. Allen School of Computer Science & Engineering, University of Washington), Tadayoshi Kohno (Paul G. Allen School of Computer Science & Engineering, University of Washington)

Volume: 2018
Issue: 1
Pages: 127–144
DOI: https://doi.org/10.1515/popets-2018-0007


Abstract: Source code attribution classifiers have recently become powerful. We consider the possibility that an adversary could craft code with the intention of causing a misclassification, i.e., creating a forgery of another author’s programming style in order to hide the forger’s own identity or blame the other author. We find that it is possible for a non-expert adversary to defeat such a system. In order to inform the design of adversarially resistant source code attribution classifiers, we conduct two studies with C/C++ programmers to explore the potential tactics and capabilities both of such adversaries and, conversely, of human analysts doing source code authorship attribution. Through the quantitative and qualitative analysis of these studies, we (1) evaluate a state-of-the-art machine classifier against forgeries, (2) evaluate programmers as human analysts/forgery detectors, and (3) compile a set of modifications made to create forgeries. Based on our analyses, we then suggest features that future source code attribution systems might incorporate in order to be adversarially resistant.

Keywords: Authorship Attribution, Source Code Attribution, Machine Learning, Adversarial Stylometry, Privacy, Computer Security

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs license.