When Speakers Are All Ears: Characterizing Misactivations of IoT Smart Speakers

Authors: Daniel J. Dubois (Northeastern University), Roman Kolcun (Imperial College London), Anna Maria Mandalari (Imperial College London), Muhammad Talha Paracha (Northeastern University), David Choffnes (Northeastern University), Hamed Haddadi (Imperial College London)

Volume: 2020
Issue: 4
Pages: 255–276
DOI: https://doi.org/10.2478/popets-2020-0072

artifact

Download PDF

Abstract: Internet-connected voice-controlled speakers, also known as smart speakers, are increasingly popular due to their convenience for everyday tasks such as asking about the weather forecast or playing music. However, such convenience comes with privacy risks: smart speakers need to constantly listen in order to activate when the “wake word” is spoken, and are known to transmit audio from their environment and record it on cloud servers. In particular, this paper focuses on the privacy risk from smart speaker misactivations, i.e., when they activate, transmit, and/or record audio from their environment when the wake word is not spoken. To enable repeatable, scalable experiments for exposing smart speakers to conversations that do not contain wake words, we turn to playing audio from popular TV shows from diverse genres. After playing two rounds of 134 hours of content from 12 TV shows near popular smart speakers in both the US and in the UK, we observed cases of 0.95 misactivations per hour, or 1.43 times for every 10,000 words spoken, with some devices having 10% of their misactivation durations lasting at least 10 seconds. We characterize the sources of such misactivations and their implications for consumers, and discuss potential mitigations.

Keywords: smart speakers, voice assistants, privacy, IoT, voice command, voice recording, wake word

Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.