Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning

Authors: Muhammad Ikram (Data61, CSIRO and UNSW, Sydney, Australia), Hassan Jameel Asghar (Data61, CSIRO, Sydney, Australia), Mohamed Ali Kaafar (Data61, CSIRO, Sydney, Australia), Anirban Mahanti (Data61, CSIRO, Sydney, Australia), Balachandar Krishnamurthy (ATT Research Lab, New York, USA)

Volume: 2017
Issue: 1
Pages: 79–99
DOI: https://doi.org/10.1515/popets-2017-0006


Abstract: Numerous tools have been developed to aggressively block the execution of popular JavaScript programs in Web browsers. Such blocking also affects functionality of webpages and impairs user experience. As a consequence, many privacy preserving tools that have been developed to limit online tracking, often executed via JavaScript programs, may suffer from poor performance and limited uptake. A mechanism that can isolate JavaScript programs necessary for proper functioning of the website from tracking JavaScript programs would thus be useful. Through the use of a manually labelled dataset composed of 2,612 JavaScript programs, we show how current privacy preserving tools are ineffective in finding the right balance between blocking tracking JavaScript programs and allowing functional JavaScript code. To the best of our knowledge, this is the first study to assess the performance of current web privacy preserving tools in determining tracking vs. functional JavaScript programs. To improve this balance, we examine the two classes of JavaScript programs and hypothesize that tracking JavaScript programs share structural similarities that can be used to differentiate them from functional JavaScript programs. The rationale of our approach is that web developers often “borrow” and customize existing pieces of code in order to embed tracking (resp. functional) JavaScript programs into their webpages. We then propose one-class machine learning classifiers using syntactic and semantic features extracted from JavaScript programs. When trained only on samples of tracking JavaScript programs, our classifiers achieve accuracy of 99%, where the best of the privacy preserving tools achieve accuracy of 78%. The performance of our classifiers is comparable to that of traditional two-class SVM. 
One-class classification, where a training set of only tracking JavaScript programs is used for learning, has the advantage that it requires fewer labelled examples, which can be obtained via manual inspection of public lists of well-known trackers. We further test our classifiers and several popular privacy preserving tools on a larger corpus of 4,084 websites with 135,656 JavaScript programs. The output of our best classifier on this data differs from that of the tools under study by between 20% and 64%. We manually analyse a sample of the JavaScript programs for which our classifier is in disagreement with all other privacy preserving tools, and show that our approach is not only able to enhance user web experience by correctly classifying more functional JavaScript programs, but also discovers previously unknown tracking services.
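
To make the one-class idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a much simpler stand-in for the paper's classifiers: each script is reduced to bigrams of normalized tokens (identifiers, strings and numbers are replaced by placeholders so that only structure remains), a centroid is built from tracking samples alone, and a script is flagged as tracking when its cosine similarity to that centroid clears a threshold. The class name `CentroidOneClass`, the `slack` parameter, the keyword list and the toy snippets are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small keyword list; everything else is an identifier.
KEYWORDS = {"var", "function", "return", "if", "else", "for", "while", "new", "this"}

# Strings, identifiers, numbers, two-character operators, then single punctuation.
TOKEN_RE = re.compile(r"""'[^']*'|"[^"]*"|[A-Za-z_$][\w$]*|\d+|\|\||&&|[^\s\w]""")


def normalize(token):
    """Replace literals and identifiers by placeholders, keeping only structure."""
    if token[0] in "'\"":
        return "STR"
    if token.isdigit():
        return "NUM"
    if token[0].isalpha() or token[0] in "_$":
        return token if token in KEYWORDS else "ID"
    return token


def features(source, n=2):
    """Count n-grams of normalized tokens in a JavaScript source string."""
    tokens = [normalize(t) for t in TOKEN_RE.findall(source)]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidOneClass:
    """Toy one-class model: learns from tracking samples only (assumption:
    a centroid-similarity rule stands in for a one-class SVM)."""

    def fit(self, tracking_sources, slack=0.9):
        vectors = [features(s) for s in tracking_sources]
        self.centroid = Counter()
        for vec in vectors:
            self.centroid.update(vec)
        # Accept anything at least `slack` times as similar as the least
        # typical training sample.
        self.threshold = slack * min(cosine(v, self.centroid) for v in vectors)
        return self

    def predict(self, source):
        """Return True if the script looks like the tracking class."""
        return cosine(features(source), self.centroid) >= self.threshold


# Toy training data: tracking-style snippets only, no functional examples.
TRACKING_SAMPLES = [
    "var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-1']); _gaq.push(['_trackPageview']);",
    "var _paq = _paq || []; _paq.push(['setSiteId', '1']); _paq.push(['trackPageView']);",
    "var _hsq = _hsq || []; _hsq.push(['trackPageView']);",
]

clf = CentroidOneClass().fit(TRACKING_SAMPLES)
unseen_tracker = "var _x = _x || []; _x.push(['setId', '7']); _x.push(['track']);"
menu_code = "function toggleMenu(id) { document.getElementById(id).classList.toggle('open'); }"
print(clf.predict(unseen_tracker), clf.predict(menu_code))
```

The unseen snippet uses different names and values from every training sample but shares their structure, which mirrors the stated rationale that tracking code is often borrowed and lightly customized. A real system would need far richer features and a properly tuned decision boundary than this toy threshold.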

Keywords: Machine learning, one-class SVM, PU learning, measurements, JavaScript, tracking, privacy, usability, security

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.