Unveiling Web Fingerprinting in the Wild Via Code Mining and Machine Learning

Valentino Rizzo; Stefano Traverso; Marco Mellia

Authors: Valentino Rizzo (Ermes Cyber Security S.R.L., Turin, Italy), Stefano Traverso (Ermes Cyber Security S.R.L., Turin, Italy), Marco Mellia (Politecnico di Torino & Ermes Cyber Security S.R.L., Turin, Italy)

Volume: 2021
Issue: 1
Pages: 43–63
DOI: https://doi.org/10.2478/popets-2021-0004

Abstract: Fueled by advertising companies’ need to accurately track users and their online habits, the practice of web fingerprinting has grown in recent years, with severe implications for users’ privacy. In this paper, we design, engineer and evaluate a methodology that combines the analysis of JavaScript code with machine learning for the automatic detection of web fingerprinters. We apply our methodology to a dataset of more than 400,000 JavaScript files accessed by about 1,000 volunteers during a one-month-long experiment to observe the adoption of fingerprinting in a real scenario. We compare approaches based on both static and dynamic code analysis to automatically detect fingerprinters and show that they provide different, complementary angles. This demonstrates that studies based on either static or dynamic code analysis alone provide only a partial view of actual fingerprinting usage on the web. To the best of our knowledge, we are the first to perform this comparison with respect to fingerprinting. Our approach achieves 94% accuracy with a short decision time. With it, we spot more than 840 fingerprinting services, of which 695 are unknown to popular tracker blockers. These include new actual trackers as well as services that use fingerprinting for purposes other than tracking, such as anti-fraud and bot recognition.

Keywords: Tracking, Fingerprinting, Machine Learning, Static Code Analysis, Dynamic Code Analysis

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.