Contextualizing Interpersonal Data Sharing in Smart Homes

A key feature of smart home devices is monitoring the environment and recording data. These devices provide security via motion-detection video alerts, cost savings via thermostat usage history, and peace of mind via functions like auto-locking doors or water leak detectors. At the same time, the sharing of this information in interpersonal relationships, though necessary, is currently accomplished on an all-or-nothing basis. This can easily lead to oversharing in a multi-user environment. Although prior work has studied people's perceptions of information sharing with vendors or ISPs, the sharing of household data among users who interact personally is less well understood. Interpersonal situations make data sharing much more context-based and, thus, more complicated. In this paper, we use themes from the theory of contextual integrity in an online survey (n = 1,992) to study how people perceive data sharing with others in smart homes and to inform future designs and research. Our results show that data recipients in a smart home can be reduced to three major groups, and that data types matter more than device types. We also found that the types of access control desired by users can vary from scenario to scenario. Depending on whom they are sharing data with and what data is shared, participants expressed varying levels of comfort when presented with different types of access control (e.g., explicit approval versus time-limited access). Taken together, these results provide strong evidence that a more dynamic access control system is needed and that such a system can be designed to be more usable.


INTRODUCTION
Individuals can use web portals or smartphone apps to view the current status and recent events captured by smart devices in their home, including the current indoor temperature, whether lights are on or off, captured video footage, and past conversations with a voice assistant. Monitoring their home enables them to better understand their daily routines, confirm everything is in order while away, and maintain accountability if something goes wrong [28].
When multiple people share a smart home, whether as residents, landlords, or guests, equal access to the data may raise privacy concerns. There have been several reports of smart home devices being turned into surveillance tools that enable abuse and stalking [46]. Yet blocking everyone's access to smart-home data is also impractical [47]. For example, when smart home devices are shared between romantic partners or roommates, one may expect all device data to be equally accessible. Similarly, babysitters, handypersons, and in-home caregivers may need the data to perform their tasks more effectively. Likewise, in the event of a robbery or theft, it may be helpful or necessary to share video footage with law enforcement or insurance companies as evidence.
This potentially leaves us in an uncomfortable situation: either share all data with someone (complete privacy loss) or share no data with them (complete utility loss). Many studies explore users' perceptions of institutional entities that receive smart home data, such as manufacturers, ISPs, and governments [7,9,38,52]. These parties, although often essential, have no personal connection with smart home users, so users' perceptions and expectations of them may differ from those of other smart home users with whom the owner personally interacts. Studies of data sharing with individuals personally known to the resident, on the other hand, often focus narrowly on a particular device (e.g., voice assistants) or population (e.g., Airbnb, domestic workers, older adults) [3,34]. They do not place these scenarios in a more general setting where multiple relationships, devices, and contexts coexist. With big tech companies building cross-vendor platforms for smart homes (e.g., Google Home [21], Samsung's SmartThings [42], Apple's HomeKit [6]), a smart home system needs to accommodate and shift settings between different scenarios, which requires a more holistic privacy setting or access control mechanism.
Understanding how users' preferences and decisions vary across contexts is the first step toward building a holistic, usable system, but it is challenging, as a context often contains multiple factors at play during one's decision process. Prior work has demonstrated the need for more expressive context-aware access control systems that can encompass different scenarios in a smart home [13,23,38]. Unfortunately, these systems often face usability challenges in practice, where people find them too complicated to use [51].
Therefore, this paper aims to inform more efficient and effective designs of data-sharing mechanisms for smart homes by exploring people's data-sharing preferences and how these preferences change across contexts (e.g., data recipients, data types, and sharing principles). To analyze these questions, we designed a vignette survey based on the Contextual Integrity (CI) privacy framework [39]. Specifically, we used the survey to answer three main research questions:
RQ1: How do smart-home users perceive the sharing of smart-home data with different people?
RQ2: Do smart-home users' preferences for data sharing depend more on device types or data types? How would device and data types change users' opinions?
RQ3: What access control mechanism can support users' preferences about data sharing in various situations?
We use the CI framework to disambiguate these contexts. We chose the CI framework because it is a well-established framework that breaks a context down into a set of parameters, making it easier to analyze a situation systematically [39]. Using the theory of CI, we identify and decouple three factors relevant to data sharing in a smart home: (1) data recipient (i.e., who will receive the data), (2) data type (i.e., what type of data is shared), and (3) sharing principles (i.e., under what circumstances the data would be shared). For each factor, we compiled a list of potential variants based on past literature, actual home IoT devices, and market statistics. This led to 2,178 combinations, covering 11 types of data recipients, 18 types of data from 6 different smart home devices, and 11 types of access control mechanisms. We recruited 1,992 participants from Prolific [26].
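As a rough illustration of how the 2,178 combinations arise (11 × 18 × 11), the sketch below crosses the three factors in Python. It assumes the recipient and device-data lists from Table 2; the eleven sharing-principle labels are hypothetical placeholders (`principle_1` to `principle_11`), not the study's actual wording.

```python
from itertools import product

# Data recipients as listed in Table 2 (11 types).
RECIPIENTS = [
    "Spouse/Significant other", "Kids (12-year-old)", "Guests", "Neighbors",
    "Roommates", "Landlords", "Handyperson", "Babysitters",
    "Insurance agency representatives", "Local law enforcement",
    "In-home caretakers",
]

# Data types per device as listed in Table 2 (18 device-data pairs, 6 devices).
DEVICE_DATA = {
    "Door locks": ["Home occupancy", "Visitor", "Usage history"],
    "TVs/Streaming devices": ["Audio clips", "Watching history", "Usage history"],
    "Security cameras": ["Audio clips", "Home occupancy", "Video clips", "Usage history"],
    "Thermostats": ["Home occupancy", "Utility usage", "Usage history"],
    "Voice assistants": ["Audio clips", "Home occupancy", "Usage history"],
    "Lightbulbs": ["Home occupancy", "Usage history"],
}

# Hypothetical placeholder labels for the 11 access control mechanisms.
SHARING_PRINCIPLES = [f"principle_{i}" for i in range(1, 12)]


def device_data_pairs():
    """Flatten the device -> data-type mapping into (device, data type) pairs."""
    return [(device, data) for device, types in DEVICE_DATA.items() for data in types]


def all_conditions():
    """Full crossing: recipients x device-data pairs x sharing principles."""
    return list(product(RECIPIENTS, device_data_pairs(), SHARING_PRINCIPLES))


if __name__ == "__main__":
    print(len(RECIPIENTS), len(device_data_pairs()), len(all_conditions()))
```

Running it should print `11 18 2178`. Because each participant was randomly assigned a single recipient and data type, this full crossing describes the design space rather than any one participant's survey.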
We found that data recipients are the most crucial factor in participants' decision process. Based on participants' responses, the 11 types of data recipients can be roughly divided into three categories: long-term residents (spouses/significant others, kids, and roommates), domestic workers (in-home caretakers, handypersons, and babysitters), and incidental users (guests, local law enforcement, landlords, neighbors, and insurance representatives). Participants are generally willing to share data with all long-term residents. They may be hesitant to share data with domestic workers, but they are still more comfortable sharing data with them than with incidental users. In other words, participants seem willing to share more with long-term relationships than with transient ones.
On the device and data type side, we found that participants felt similarly about sharing the same type of data from different devices (e.g., sharing occupancy data from smart door locks and smart thermostats), but their attitudes are more divided across data types. Participants are generally unwilling to share video, audio, and home occupancy data, while caring less about utility data (e.g., power or heat consumption). The sensitivity of device usage history (status changes or commands), however, varies by device type. The usage history of smart door locks is considered the most sensitive, as it contains the lock status of the door, while the usage history of smart thermostats is considered the least sensitive.
Depending on data recipients, data types, and people's initial opinions about data sharing, one might prefer different access control mechanisms (e.g., in what situations someone can or cannot access the data, for how long, and so on). We found that in almost all situations, participants would be very upset if someone gained access without explicit approval. Beyond explicit approval, whether data recipients, especially domestic workers, are on site could greatly change one's opinion about data sharing. Remote access for them may be undesirable unless otherwise specified. Device location, on the other hand, could be a crucial contextual factor for roommates' access, as shared devices are likely to be placed in common areas, while each person's own devices will be in their rooms.
To sum up, we make the following important contributions in this paper:
• Following the theory of contextual integrity, we conducted a large-scale vignette survey (n = 1,992) to gauge people's preferences for interpersonal data sharing in a smart home context.

BACKGROUND & RELATED WORK
Privacy in a smart home can be highly contextual [7,23,38]. People's privacy preferences can vary between different scenarios, and it is hard for anyone to navigate the sea of contexts that can influence one's privacy attitudes.
In this section, we first introduce the Contextual Integrity framework, a theoretical privacy framework that helps privacy researchers structure privacy-related contexts. It helps us disentangle the contextual factors involved in interpersonal data sharing. We then discuss past research on privacy settings designs and contextual access control, and how our work differs from it. A direct comparison between our work and prior studies can be seen in Table 1.

Table 1 (comparison of this work with prior studies; columns: interpersonal data recipients, data or device, interpersonal contexts)

Contextual integrity in smart homes
The Contextual Integrity (CI) framework is one of the most well-established frameworks for contextualizing one's privacy attitudes [39].
The CI framework is a normative model "for evaluating the flow of information between agents (individuals and other entities), with a particular emphasis on explaining why certain patterns of flow provoke public outcry in the name of privacy (and why some do not)" [10]. The CI framework has been widely used to study privacy norms in various scenarios, and the smart home is no exception. Several studies have used the CI framework to study people's privacy preferences for sharing data in a smart home. For example, prior work has studied people's attitudes towards sharing data with manufacturers, third parties, ISPs, governments, and advertisers [3,7-9,38]. Other studies explore privacy attitudes towards personal relationships, but their scope is limited to one or a few particular types of users [7,35] or a particular device [3]. Therefore, we conduct a study that systematically examines the privacy norms of various types of users, devices, data, and access control mechanisms. Our work compares the norms of various scenarios to better understand the similarities and variations across circumstances. Such findings can guide future smart home designers to develop systems more effectively.

Smart home privacy research and access control
Understanding people's perceptions of smart-home devices in their daily lives is becoming an increasingly important field of research as the market grows. Previous research found that current smart-home systems frequently fail to meet users' privacy demands and expectations [27,45]. Some studies have thus designed smart home interfaces and systems to enhance data transparency and enable better privacy settings [14,16,36,44]. Through surveys, interviews, and co-design workshops, researchers have found that users have a strong need for data control [12,49]. Access control is needed to let users control how their data can be accessed. Prior work has demonstrated that traditional access control methods often fail to recognize interpersonal dynamics and contexts in a home setting [13,23]. Several research initiatives have thus explored adding contextual factors to access control design for smart homes [24,43,51]. Although technically feasible [43], a context-aware access control system can be complicated to use and can fail to capture the desired contexts, resulting in erroneous system behaviors and user frustration [24,51]. For example, Zeng et al. conducted a study among a small sample of homes consisting only of long-term residents (e.g., immediate family members, couples, and respectful roommates). Given these trusted relationships and frequent device usage, creating detailed access rules was unnecessary [51]. The reality, however, is much more complicated. Roommates can be strangers matched through leasing companies, and one's relationship with a spouse or significant other may turn sour and become less trusting [11]. With the rise of integrated multi-vendor, multi-user home IoT platforms facilitated by vendor-agnostic technologies like Matter [15], more people, such as landlords, handypersons, and domestic workers, will also be involved in smart home scenarios. Context-aware access control is still needed in those situations.
A question thus arises: how can we make contextual access control systems more usable? Finer granularity can better capture users' intentions, but making it too fine can easily render the system unusable. A context-aware system therefore needs to be simplified. To understand how to simplify it, we need a broader understanding of how various factors (e.g., data recipients, data types, sharing principles) affect people's comfort with data sharing and how they interact with each other, which is the goal of this paper. Some prior studies do not focus on interpersonal data sharing [9,13,38], and those that do explore it only to a limited extent, making it hard to identify similarities and differences between scenarios, which is key to simplification. Abdi et al. studied a range of interpersonal data recipients comparable to ours, but their exploration of data types (only data from voice assistants) and contextual factors is limited, so it does not construct a holistic view of access control for smart homes [3]. Similarly, He et al. explored various data recipients and data types from different devices, but their focus is more on device control than data sharing; they also did not examine how contextual factors relate to data recipients and data types [23]. Without understanding the relationships among these design aspects, all contextual factors are treated as equally important and left to users' choice, leading to an over-complicated design. Our work, in contrast, attempts to understand which contexts, or sharing principles, are more critical to users' decision-making process. These findings can eventually inform a more efficient design for context-aware access control in smart homes.

METHODS
Our study centers on a vignette survey about participants' comfort with data sharing in various contexts. Using the Contextual Integrity (CI) framework as a guideline, we vary parameters such as data recipients, data types, and sharing principles to gain a more holistic view of the norms of interpersonal data sharing in smart homes. In this section, we detail our application of the CI framework to the survey design, the recruitment process, and the analysis methods.

Survey design
According to the CI framework, five parameters may affect people's perception of privacy: data senders, data recipients, data types, data subjects, and transmission principles. When designing the survey, we first defined each parameter as follows.

Data senders.
A data sender is the person who decides to share data with other smart-home users or, in our context, the one who grants other users access to some part of the data. We assume the smart home's primary user is our survey's main data sender. The primary user here is the one who lives with, controls, and manages the smart home devices. It does not matter whether they are the homeowner or a tenant, but we do assume they control with whom they share their smart-home data. Participants enter the survey as the data sender and are introduced to this role through a hypothetical scenario, detailed in Section 3.2.
Users other than the primary user can also be data senders. However, assigning participants to different roles in the same survey can easily cause confusion. A between-subjects design could solve this problem, but it would require a sample size too large for us to collect. Therefore, rather than placing participants in other roles, we only measure participants' perceptions of other users acting as data senders.
3.1.2 Data recipients.
Since our work focuses on interpersonal data sharing, we only consider data recipients who may interact with the primary user of the smart home directly.
We first consulted prior work on privacy issues in interpersonal relationships to form a representative list of relationships that may occur in a smart home [3,7,23]. We also held a brainstorming session to account for potential smart home users not previously studied. This process yielded 11 types of data recipients, shown in Table 2.

Table 2: Conditions and types studied in the survey.
Data recipients: Spouse/Significant other [23]; Kids (12-year-old) [3,23]; Guests [3,13]; Neighbors [3,13,23]; Roommates [3,13,23]; Landlords [50]; Handyperson [5,13]; Babysitters [13,23]; Insurance agency representatives; Local law enforcement [3,7]; In-home caretakers [13,35]
Devices and data types:
Door locks [3,7,13,25,34]: Home occupancy; Visitor; Usage history
TVs/Streaming devices [34]: Audio clips; Watching history; Usage history
Security cameras [3,13,34,38]: Audio clips; Home occupancy; Video clips; Usage history
Thermostats [3,7,13,19,34]: Home occupancy; Utility usage; Usage history
Voice assistants [7,13,19,34]: Audio clips; Home occupancy; Usage history
Lightbulbs [13,34]: Home occupancy; Usage history

Device and data types.
Based on the selected devices, we identified the data types these devices produce by consulting prior work [3,23] and existing smart home devices. We set several inclusion criteria for data types. We only consider data types directly collected through the given device's sensors. Therefore, account information, such as email addresses, photos, home addresses, and names, is not part of the study. In addition, data collected by the device's manufacturer but not shown to users, such as GPS data from users' smartphones, is excluded to avoid confusion. With these criteria established, all team members discussed which data types should be included in the survey until they reached agreement. This left us with 18 data types from the six selected devices. The full list of data types for each device is shown in Table 2. The definition of each device and data type can be found in Appendix C.

Sharing principles.
Our survey assumes that data sharing in smart home systems is governed by access control. Under this assumption, sharing data with someone is equivalent to granting them access to that data. Therefore, we treat sharing principles as contextual factors that can be built into the access control components of a smart home system. For example, one's access to data can depend on whether the data was captured in a private or common area, making the device's location a contextual factor for access control.
We consider two types of contexts: system contexts and social contexts. For system contexts, we consulted prior literature on potential contexts for access control [22-24,38]. To keep the study at a manageable scale and focus on the contexts most informative for smart home design, we only consider contexts that are practical for today's smart home systems, such as notice and choice (either no consent needed or consent on each use), time (a window of visible events), location (access conditioned on device location or accessor location), and content (access to content that either is or is not about the accessor) [22,24,38].
For social contexts, we mainly considered two types: verbal promises (a verbal promise not to share data) and legal bonds (a legal promise not to share data) [51]. We acknowledge that this is not a complete list, as social contexts can easily differ from situation to situation. For example, although the purpose of data access has been shown to be a powerful indicator of people's willingness to share data [1,7,38], it is not included in our study, as it is harder to apply to interpersonal relationships, whose needs and purposes are often more varied than those of collective entities like third-party services.
The full list of contexts we considered and their definitions can be found in the survey texts we included in Appendix A.
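To make the idea of sharing principles as access control conditions concrete, the following Python sketch evaluates one access request against a per-recipient policy. Everything here is hypothetical: the `Event`, `Policy`, and `Request` types, their field names, and the rule that every configured condition must hold are illustrative assumptions, not a design proposed in this study or an API of any smart home platform. Only a subset of the contexts above is modeled (e.g., the "no one is in the home" condition is omitted).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class Event:
    """One piece of recorded smart-home data (hypothetical schema)."""
    data_type: str
    timestamp: datetime
    device_location: str          # e.g., "common" or "private"
    subjects: Set[str]            # people the event is about


@dataclass
class Policy:
    """Contextual conditions attached to one recipient's access (hypothetical)."""
    recipient: str
    data_types: Set[str]
    require_approval: bool = True                 # notice and choice
    window: Optional[timedelta] = None            # time: only recent events are visible
    device_locations: Optional[Set[str]] = None   # location of the device
    accessor_must_be_home: bool = False           # location of the accessor
    about_accessor: Optional[bool] = None         # content: True/False restricts, None ignores
    promise: Optional[str] = None                 # social context: "verbal" or "legal"


@dataclass
class Request:
    recipient: str
    event: Event
    now: datetime
    accessor_home: bool
    approved: bool = False
    promises: Set[str] = field(default_factory=set)


def is_access_allowed(policy: Policy, req: Request) -> bool:
    """Grant access only if every condition configured in the policy holds."""
    e = req.event
    if req.recipient != policy.recipient or e.data_type not in policy.data_types:
        return False
    if policy.require_approval and not req.approved:
        return False
    if policy.window is not None and req.now - e.timestamp > policy.window:
        return False
    if policy.device_locations is not None and e.device_location not in policy.device_locations:
        return False
    if policy.accessor_must_be_home and not req.accessor_home:
        return False
    if policy.about_accessor is not None and (req.recipient in e.subjects) != policy.about_accessor:
        return False
    if policy.promise is not None and policy.promise not in req.promises:
        return False
    return True


if __name__ == "__main__":
    now = datetime(2023, 7, 1, 18, 0)
    policy = Policy("babysitter", {"Video clips"}, window=timedelta(hours=4),
                    accessor_must_be_home=True)
    clip = Event("Video clips", now - timedelta(hours=1), "common", {"kid"})
    print(is_access_allowed(policy, Request("babysitter", clip, now, True, approved=True)))   # True
    print(is_access_allowed(policy, Request("babysitter", clip, now, False, approved=True)))  # False: not on site
```

The conjunctive rule is simply the easiest to read; a real system could combine conditions in other ways, and choosing which conditions to expose at all is the kind of simplification question this study aims to inform.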

Survey instrument
Our survey revolves around three factors: the data recipient, the data type, and the sharing principle. These factors allow us to assess privacy in context within the smart home setting. We iteratively piloted the survey to assess timing, fatigue, and response quality. The full text of the survey can be found in Appendix A. Here, we briefly describe the survey.
The survey started by informing participants that they would be commenting on scenarios involving smart-home devices. Participants were initially selected based on their experience with these devices, as recorded in their crowdsourcing platform profiles, and this experience was confirmed in the survey before participants gave consent (for an overview of these devices, see Table 2).
Participants were asked to imagine that they currently owned one of these devices (e.g., a smart door lock) and that this device was rolling out a new feature allowing fine-grained control over the sharing of device information.For example, the participant could allow a handyperson to see who has opened or locked a smart door lock, but not what the passcode to the door is.Each participant was then randomly assigned a relationship and device type when asked about their comfort in sharing data.For example, how comfortable would you be (five-point Likert) when sharing your {viewing history} from your {smart TV} with your {babysitter}?
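The vignette assignment can be sketched as a question template filled with randomly drawn values. This Python snippet is an illustration under assumptions: it uses a small subset of Table 2 with wording we chose, draws the device first and then one of its data types, and does not model the Prolific pre-screening that ensured participants owned their assigned device.

```python
import random

# Small, hypothetical subset of Table 2, phrased as it might appear in the question.
QUESTION_DEVICES = {
    "smart door lock": ["home occupancy", "visitor", "usage history"],
    "smart thermostat": ["home occupancy", "utility usage", "usage history"],
    "security camera": ["audio clips", "home occupancy", "video clips", "usage history"],
}
QUESTION_RECIPIENTS = ["spouse/significant other", "roommate", "babysitter", "neighbor", "landlord"]

TEMPLATE = ("How comfortable would you be sharing your {data} "
            "from your {device} with your {recipient}?")


def assign_vignette(rng: random.Random) -> dict:
    """Draw a device, then one of its data types, then a recipient, and fill the template."""
    device = rng.choice(sorted(QUESTION_DEVICES))
    data = rng.choice(QUESTION_DEVICES[device])
    recipient = rng.choice(QUESTION_RECIPIENTS)
    return {
        "device": device,
        "data": data,
        "recipient": recipient,
        "question": TEMPLATE.format(data=data, device=device, recipient=recipient),
    }


if __name__ == "__main__":
    print(assign_vignette(random.Random(42))["question"])
```

Passing an explicit `random.Random` instance keeps assignments reproducible for a given seed, which is convenient when piloting or auditing the survey flow.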
We then addressed questions related to sharing principles. If a participant said they were uncomfortable or somewhat uncomfortable with the initial scenario, we asked a series of follow-up questions about how that discomfort might change under different sharing principles. These questions formed a matrix (five-point Likert) grounded in five themes: notice and choice (no consent needed or request consent on each use), time window of use (access restricted to a time period), location (access only when the device is in a certain place, only if the accessor is in the home, or only if no one is in the home), content (access only when the accessed data is about the home's occupant or only when it is not), and externally enforced restricted access (only after a verbal or legal promise not to disclose data). The selection of these conditions is described in the previous section (Section 3.1). We asked this matrix of questions for both a day-to-day scenario and an emergency scenario.
On the other hand, if the participant said they were comfortable or somewhat comfortable, this indicated little privacy concern in that setting, allowing us to take a slightly different approach. We first asked a similar matrix of questions about sharing principles. These questions were grouped around notice and choice (approval or no approval needed to see data), location (the device is in a private location, the accessor is not in the home, or no one or someone is in the home), and further sharing (the accessor can share data with others). We then examined onward sharing among relationships by asking whether the participant would be comfortable if the accessor shared this information with each of the 11 relationships from Table 2. If the participant said they were neither comfortable nor uncomfortable, we also asked this set of relationship-based questions and additionally asked the participant (free text) to describe any scenarios in which they might be uncomfortable sharing the data in this setting.
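The branching described above can be summarized as a routing function from the initial Likert response to follow-up question blocks. The block names are hypothetical labels, and the text does not fully specify whether neutral respondents also saw a sharing-principle matrix; this sketch assumes they did not.

```python
from typing import List

UNCOMFORTABLE = {"Uncomfortable", "Somewhat uncomfortable"}
COMFORTABLE = {"Comfortable", "Somewhat comfortable"}
NEUTRAL = "Neither comfortable nor uncomfortable"


def follow_up_blocks(response: str) -> List[str]:
    """Return the (hypothetically named) follow-up blocks for an initial Likert response."""
    if response in UNCOMFORTABLE:
        # Restriction matrix (consent, time, location, content, promises),
        # asked for both a day-to-day and an emergency scenario.
        return ["restrictive_principles_day_to_day", "restrictive_principles_emergency"]
    if response in COMFORTABLE:
        # Matrix on approval, location, and further sharing,
        # then onward sharing to each of the 11 relationships.
        return ["permissive_principles", "further_sharing_by_relationship"]
    if response == NEUTRAL:
        # Assumption: relationship questions plus a free-text prompt only.
        return ["further_sharing_by_relationship", "free_text_uncomfortable_scenarios"]
    raise ValueError(f"unexpected response: {response!r}")
```

Keeping the routing in one pure function makes it easy to check that every scale point leads somewhere and that unexpected values fail loudly.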
We ended the survey with an assessment of mobile privacy preferences via the Mobile Users' Information Privacy Concerns (MUIPC) [48] and demographic questions.

Recruitment
We recruited participants via Prolific [26], an online crowdsourcing platform, between June and August 2023. Prior studies have found that data collected on Prolific is more reliable and of higher quality than data from other platforms, such as Amazon Mechanical Turk and CloudResearch: on data quality measures such as honesty, attention, reliability, and comprehension, Prolific consistently delivered the highest quality of data [40,41]. We estimated the survey would take 10 minutes. Participants were paid $2 for successfully completing the survey. Our university's Institutional Review Board (IRB) reviewed and approved the study.
After receiving IRB approval, we used Prolific's pre-screeners to ensure that participants were at least 18 years old, resided in the U.S., had an approval rate over 95% on Prolific, and owned the smart home device assigned to them when they entered the survey.
Table 3 shows the demographics of our participants. Our participants are roughly gender-balanced, and over 60% of them are between 25 and 44 years old. They are more educated than the general U.S. public: 69.5% reported having a bachelor's degree or above. The average completion time of our survey was 10.6 minutes, with a median of 8.5 minutes.

Analysis
We placed two attention checks in our survey. Anyone who failed both attention checks was automatically removed from the analysis. We used a five-point Likert scale to measure participants' comfort in sharing data with someone. During the analysis, we first assigned numeric values from -2 to 2 to the five scale points, where -2 represents "uncomfortable" and 2 represents "comfortable". We then calculated the mean of participants' responses, which we refer to as the average comfort score in the rest of the paper.
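A minimal Python sketch of this scoring step, assuming each response is stored as a dictionary with hypothetical field names (`comfort`, `attention_1_passed`, `attention_2_passed`, and a grouping key such as `recipient`). It drops responses that failed both attention checks, codes the Likert labels from -2 to 2, and averages them per group; picking the lowest-scoring group mirrors how the regression baselines are chosen in the analysis.

```python
from collections import defaultdict
from statistics import mean

# Five-point Likert scale coded from -2 (uncomfortable) to 2 (comfortable).
LIKERT_CODES = {
    "Uncomfortable": -2,
    "Somewhat uncomfortable": -1,
    "Neither comfortable nor uncomfortable": 0,
    "Somewhat comfortable": 1,
    "Comfortable": 2,
}


def passes_screening(response: dict) -> bool:
    """Exclude a response only if it failed both attention checks."""
    return response["attention_1_passed"] or response["attention_2_passed"]


def average_comfort_scores(responses, key="recipient"):
    """Mean coded comfort per group (e.g., per data recipient or data type)."""
    scores = defaultdict(list)
    for r in responses:
        if passes_screening(r):
            scores[r[key]].append(LIKERT_CODES[r["comfort"]])
    return {group: mean(values) for group, values in scores.items()}


def lowest_comfort_group(averages):
    """Group with the lowest average comfort score, e.g., a regression baseline."""
    return min(averages, key=averages.get)
```

Treating the coded Likert points as interval values is what makes a mean meaningful here; the nonparametric tests described next do not rely on that assumption.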
In addition, for all Likert-scale questions, we used Kruskal-Wallis tests first to determine if there was a significant difference among various groups.If the test result showed a significant difference among groups, we then performed pairwise Mann-Whitney U tests with Bonferroni correction to identify which groups were significantly different.For cases where we would like to examine if a correlation between two variables existed, we used Chi-square tests.
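The two-stage procedure can be sketched with SciPy's `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`. This is an illustrative sketch, not the authors' analysis code; it assumes two-sided tests and applies the Bonferroni correction by multiplying each pairwise p value by the number of comparisons (capped at 1).

```python
from itertools import combinations

from scipy.stats import kruskal, mannwhitneyu


def kruskal_then_pairwise(groups, alpha=0.05):
    """Kruskal-Wallis omnibus test, then Bonferroni-corrected pairwise Mann-Whitney U tests.

    groups: dict mapping a group name to its coded Likert scores (-2..2).
    Returns the omnibus p value and a list of (a, b, adjusted p, significant).
    """
    omnibus_p = kruskal(*groups.values()).pvalue
    if omnibus_p >= alpha:
        return omnibus_p, []  # no overall difference, so no post-hoc comparisons
    pairs = list(combinations(groups, 2))
    results = []
    for a, b in pairs:
        p = mannwhitneyu(groups[a], groups[b], alternative="two-sided").pvalue
        p_adj = min(1.0, p * len(pairs))  # Bonferroni: scale by the number of comparisons
        results.append((a, b, p_adj, p_adj < alpha))
    return omnibus_p, results
```

With 11 data recipients there are 55 pairwise comparisons, so each raw p value is multiplied by 55; Bonferroni is conservative at that scale, which is worth keeping in mind when a pair narrowly misses significance.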
We also used logistic regression to model people's (normalized) comfort with data sharing. We first used data recipients, data types (of different devices), MUIPC, and demographic information as independent variables. Neighbors and Camera: Video were chosen as the baselines for data recipients and data types because they received the lowest average comfort scores among all groups. For each demographic category, we picked the largest group as the baseline. After finding that MUIPC and demographic information had an insignificant effect on the outcome (Appendix B), we removed them from the model and reran the logistic regression with data recipients and data types as the independent variables. We then used the model to rank data recipients and data types by their contribution to people's comfort in data sharing. For all statistical tests, we set α = 0.05.
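A self-contained sketch of a logistic regression with treatment (dummy) coding, where each odds ratio is exp(β) for a level relative to its baseline. It assumes comfort has been binarized (1 for "somewhat comfortable" or "comfortable", 0 otherwise); the text describes the outcome only as normalized, so this coding is our assumption. The model is fit by Newton-Raphson in NumPy to keep the example light; in practice a library such as statsmodels (`statsmodels.api.Logit`) would typically be used and would also report confidence intervals and p values, which this sketch omits.

```python
import numpy as np


def dummy_code(values, baseline):
    """Treatment coding: one 0/1 column per non-baseline level."""
    levels = sorted(set(values) - {baseline})
    X = np.array([[1.0 if v == level else 0.0 for level in levels] for v in values])
    return X.reshape(len(values), len(levels)), levels


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Logistic regression via Newton-Raphson; returns [intercept, coefficients...]."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def odds_ratios(factors, comfortable):
    """factors: list of (values, baseline) pairs, e.g. recipients and data types.
    comfortable: 0/1 outcome per response. Returns {level: odds ratio vs. its baseline}."""
    blocks, names = [], []
    for values, baseline in factors:
        X, levels = dummy_code(values, baseline)
        blocks.append(X)
        names.extend(levels)
    beta = fit_logit(np.hstack(blocks), np.asarray(comfortable, dtype=float))
    return dict(zip(names, np.exp(beta[1:])))
```

For example, `odds_ratios([(recipients, "Neighbors"), (data_types, "Camera: Video")], y)` would return one odds ratio per non-baseline recipient and data type, and sorting by value gives a ranking like the one described above. Newton-Raphson diverges under perfect separation (e.g., a group in which every response is comfortable), so such groups would need regularization or merging.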
Although our survey collected some qualitative data, we performed only ad hoc analysis of it due to the volume of collected data. Therefore, we do not present qualitative results in this paper beyond several quotes; our results focus on quantitative analyses instead.

Limitations
Our study has limitations that are typical of user studies. For one, participants on crowdsourcing platforms are typically younger, more educated, and more technologically savvy than the general population [4,17,18,29,30]. Additionally, participants may have been biased in their responses due to social desirability (e.g., attempting to provide likable answers) or demand effects (i.e., inferring a study purpose when responding) [31,32]. To reduce these biases, we avoided mentioning "privacy" in the study's introduction and instead framed the study as a survey on preferences for data sharing. We also used neutral statements throughout the survey instrument (Appendix A), used gender-neutral names in our vignettes, and provided participants with the option to select "n/a" when necessary. Participants were filtered for familiarity with smart devices; we considered the resulting narrower participant pool worth the gain in ecological validity from knowing that participants were familiar with the devices they commented on. Our participants were also limited to the United States, so our findings may not generalize to other countries with different cultures. For example, views on familial relationships in some cultures differ from those common in the US, which may lead to different levels of comfort with smart home data sharing.
Our study also covered several, but not all, smart-home devices, data types, and transmission principles. Despite our efforts to be comprehensive about realistic smart-home situations, different contexts may produce different outcomes. We leave these to future work.
Finally, like all self-reported privacy research, our results may suffer from the privacy paradox: participants' actual behaviors may be inconsistent with their self-reported attitudes [33]. We thus discuss our results with care in Section 6.

RESULTS: DATA SHARING PREFERENCES
In this section, we detail our findings about how people's comfort in data sharing changes based on data recipients (RQ1), device types, and data types (RQ2).

Overview
After collecting responses from 1,992 participants, we found that data recipients influence participants' comfort in data sharing the most, as shown in Figure 1. Over half of the participants are comfortable sharing data with people who live with them, such as spouses or significant others, kids, roommates, and in-home caretakers. They are least comfortable sharing data with their neighbors.
The effect of device or data types on people's decision process is less prominent than that of data recipients. We rarely observe significant differences in participants' attitudes toward the same type of data collected by different devices, while their opinions differ across types of data collected by the same device. These results indicate that people's privacy attitudes are influenced more by data types than by device types, which suggests potential to simplify data-sharing mechanisms for smart homes.

Data recipients
We analyzed how participants' comfort with data sharing varies across smart home users using pairwise Mann-Whitney U tests and logistic regression. The main results of the latter are shown in Table 4. Based on participants' general comfort with data sharing, we can roughly divide potential users in a smart home into three categories: long-term residents, domestic workers, and incidental users.

4.2.1 Participants are comfortable sharing data with long-term residents.
We found that participants are generally comfortable sharing data with spouses (or significant others), children, and roommates. Over 60% of the participants chose "somewhat comfortable" or "comfortable" when asked about sharing data with these three types of users. We also ran a logistic regression on participants' comfort in data sharing, as shown in Table 4. These data recipients are the only ones with an odds ratio over 10, meaning that the odds of participants being comfortable sharing data with them are more than ten times the odds for neighbors. Given that these data recipients stay in the smart home longer than any other user types, we categorized them as long-term residents.
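Because the model is a logistic regression, an odds ratio compares odds, not probabilities. For a recipient r, holding the data type fixed and with neighbors as the baseline, the odds ratio and its coefficient β_r are related as follows:

```latex
\mathrm{OR}_r
  = \frac{P(\text{comfortable} \mid r) \,/\, \bigl(1 - P(\text{comfortable} \mid r)\bigr)}
         {P(\text{comfortable} \mid \text{neighbors}) \,/\, \bigl(1 - P(\text{comfortable} \mid \text{neighbors})\bigr)}
  = e^{\beta_r}
```

As a purely hypothetical illustration (these are not values from Table 4): if 20% of participants were comfortable sharing a given data type with neighbors (odds 0.25), an odds ratio of 10 would correspond to odds of 2.5, that is, a probability of 2.5/3.5 ≈ 71%, rather than a tenfold increase in probability.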
Contrary to prior studies [23,38,51], which found that children are often subject to stricter access control policies, our participants expressed comfort in sharing smart home data with their kids. Two-thirds (66.7%) of participants whose assigned data recipient was a kid expressed comfort with data sharing, comparable to those assigned roommates (66% expressed comfort, p = 1.000). One possible reason is that prior studies focused more on access control for device control than on data sharing. The former raises concerns about children misusing devices and causing safety hazards, while the latter is relatively harmless.
We also found that participants are significantly more comfortable sharing data with their spouses than with any other party listed, including roommates (p = 0.001), even though our scenario explicitly stated that the participants share the smart home device with their roommates. This suggests a preference for data separation even when smart home devices are equally shared.

4.2.2 Participants are more comfortable sharing data with domestic workers than with other incidental users. We fit a logistic regression model to participants' self-reported comfort with data sharing when presented with a randomly selected pair of a data recipient and a data type. For the two categorical independent variables, data recipients and data types, we selected neighbors and security camera video data as the baselines, respectively, because they had the lowest average comfort scores. Table 4 shows the odds ratios, 95% confidence intervals, and p-values for all data recipients and data types. Participants' data-sharing attitudes toward handypersons, babysitters, and guests differ significantly from those toward neighbors, whereas no significant differences were observed for the other incidental users (i.e., non-long-term residents), such as local law enforcement, insurance agency representatives, and landlords.
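To make the odds ratios in Table 4 easier to read, the model can be written out explicitly. This is a sketch under an assumption the text does not state: that comfort is coded as a binary outcome (comfortable or somewhat comfortable versus uncomfortable or somewhat uncomfortable). The coefficient symbols beta_r and gamma_d are introduced here for illustration only.

```latex
\begin{align*}
\log\frac{\Pr(\text{comfortable})}{1-\Pr(\text{comfortable})}
  &= \beta_0
   + \sum_{r \neq \text{neighbors}} \beta_r \,\mathbb{1}[\text{recipient} = r]
   + \sum_{d \neq \text{camera video}} \gamma_d \,\mathbb{1}[\text{data type} = d], \\
\mathrm{OR}_r = e^{\beta_r}
  &= \frac{\operatorname{odds}(\text{comfortable} \mid \text{recipient} = r,\ d)}
          {\operatorname{odds}(\text{comfortable} \mid \text{recipient} = \text{neighbors},\ d)}.
\end{align*}
```

Under this reading, an odds ratio above 10 for a recipient means that, for the same data type, the odds of a participant reporting comfort are more than ten times the odds for neighbors; it does not mean that participants are ten times as comfortable.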
It is unsurprising that some participants feel comfortable sharing data with guests, as guests come from the primary user's social circle. The remaining incidental users are all strangers, or acquaintances at best; interestingly, however, participants are generally more comfortable sharing data with domestic workers, such as handypersons and babysitters, than with other types of incidental users. One key difference may be that domestic workers are actively hired by the primary user for help, which provides a reason for sharing data with them. Although other incidental users may also have a reason for requesting the data, it is less clear whether offering data without restriction would actually help the primary user. For example, an insurance company may deny the primary user's compensation claim based on the shared data. A landlord or the police may even use the data against the primary user, making the latter less comfortable allowing such access.

Device types vs. data types
Among all smart home devices, participants are most comfortable sharing data from a smart thermostat (mean comfort score μ = 0.412), followed by a smart lightbulb (μ = 0.145). Unsurprisingly, participants feel least comfortable sharing data from a smart security camera (μ = −0.535), closely followed by a voice assistant (μ = −0.512).
It is worth noting that the statistics for a device are largely driven by the types of data the device collects. People are most comfortable sharing utility data (e.g., electricity and heat) and usage history data (e.g., what time the device is on or off), with mean comfort scores of 0.645 and 0.099, respectively. It is also unsurprising that participants are generally uncomfortable sharing video (μ = −0.918) and audio data (μ = −0.724).

4.3.1 Data types matter more than device types. Much prior work has studied privacy norms at the granularity of devices. Many smart home systems (e.g., SmartThings) also group data by device. A single smart home device, however, can collect different types of data, and different devices may collect data of the same type. This raises the question of whether people's privacy attitudes vary more across device types or across data types.
We found that device types had a limited impact on people's comfort with data sharing. Three types of data in our study are collected by multiple smart home devices: occupancy, audio clips, and usage history (e.g., when the device is on or off, with no video or audio attached when applicable). Among these three data types, we observed significant differences across devices only for usage history, which we discuss in the next subsection.
In contrast, for five out of six smart home devices, we observed significant differences among the data types collected by one device. The smart door lock is the only device for which we did not observe a significant difference among the types of collected data (p = 0.690). As expected, people were less comfortable sharing audio or video data than other types of data: 67.2% of our participants expressed discomfort with sharing audio, and 70.9% with sharing video. We also found that participants were often uncomfortable sharing occupancy data. Across all devices, 59.4% of participants expressed discomfort sharing occupancy data with others, which is significantly higher than for data types like usage history (44.5%, p < 0.001) or utility data (28.2%, p < 0.001).

4.3.2 The sensitivity of usage history depends on device types. As mentioned previously, participants' comfort in sharing the usage history of a device is significantly associated with the device's type (p < 0.001). Among all devices, participants consider the usage history of smart door locks the most sensitive, as it indicates whether or not the door is locked. On the other hand, although smart security cameras are often viewed as the most privacy-sensitive smart home device, participants are fairly neutral about sharing the camera's usage history (whether the camera was triggered, with no video attached), with an average comfort score of −0.027. One participant wrote: "...Just a camera being activated seems not that important unless no one is supposed to be home. A pet could also cause this. It doesn't tell you much otherwise, ... " Participants are most comfortable sharing the usage history of a smart thermostat. Compared to the usage history of smart thermostats (μ = 0.591), participants are significantly less comfortable sharing the usage history of smart door locks (μ = −0.339, p = 0.001) and voice assistants (μ = −0.236, p = 0.003). The other two data types show no significant differences when collected by different devices (p = 0.927 for audio data, p = 0.191 for occupancy).

RESULTS: SHARING PRINCIPLES
As previously discussed, participants are generally uncomfortable sharing smart home data with others unless they are long-term residents. Therefore, it is crucial for smart home systems to deploy proper access control mechanisms, not only for device control but also for data access. The question is: how should we design the access control system for smart homes? In this section, we discuss contexts and access control mechanisms that may influence people's data-sharing preferences (RQ3).

Explicit approval is necessary
Whether the primary user has given explicit approval for access is the most critical factor in data sharing, as Figure 2 suggests. On average, 51% of participants who originally expressed discomfort with data sharing reported that they would be at least somewhat comfortable sharing data with someone as long as explicit approval was given. This shift was especially notable when the data recipient was a kid or a domestic worker, suggesting that participants acknowledge these parties may need access to the data, but that such access must be granted explicitly.
Conversely, 47% of participants who were originally comfortable or somewhat comfortable sharing data changed their answer to uncomfortable or somewhat uncomfortable if explicit approval was not given. This shift is widespread when the data recipient is a domestic worker, but less so for kids. This further suggests that explicit approval is essential for granting domestic workers access to smart home data, whereas for kids it may be more a matter of personal preference.

Contexts that cause discomfort
For participants who originally reported being at least somewhat comfortable sharing data with the data recipient, we presented a list of contexts that may raise concerns. The results show that the contexts that cause discomfort differ depending on who the data recipient is and which data type is presented.
5.2.1 Data from private areas are off-limits even for long-term residents. Although participants are often comfortable sharing data with other residents in the home even without explicit approval, data from devices in private areas are likely exceptions. This is especially true when the data recipient is the primary user's roommate. As shown in Figure 2a, 60% of the participants who originally reported that they were comfortable sharing data with a roommate stated that they would be at least somewhat uncomfortable if the device is in a private area; 42% reported the same for kids. Access to data from private areas causes discomfort for more people than access without explicit approval does. For roommates, access to data from private areas is significantly more upsetting than access without explicit approval (p = 0.021); we did not observe this for other types of residents. One possible explanation is that people expect long-term residents to have access to the data for a longer period, making unintended data sharing more likely.
5.2.2 Unlimited access duration is unacceptable for domestic workers. Most participants believe that anyone other than long-term residents should only have temporary access to the data. This belief is most pronounced for handypersons and babysitters: 72% of participants who were originally comfortable sharing data with a handyperson would switch sides if there were no time limit on their access. Notably, the number of participants worried about domestic workers accessing data remotely is comparable to the number worried about unlimited access duration. This indicates that although temporary access is crucial for domestic workers, it does not have to be defined by time; whether the domestic worker is present in the home is equally important.

5.2.3 People who trust landlords may not mind them having remote access. According to Table 4, landlords are among the least welcome users in a smart home system, almost on par with neighbors. That said, 15.1% of participants asked about landlords as data recipients reported being at least somewhat comfortable sharing data with them (Figure 1). In those cases, participants are generally okay with landlords accessing their data remotely, as long as the landlord has been given explicit approval and the access is temporary. Only 26% of these participants said that remote access would cause them discomfort (Figure 2a).
The same holds for local law enforcement. Although only 20.1% of participants are positive about local law enforcement having access to their data, those who are positive mostly accept remote access as well. The same cannot be said for domestic workers, neighbors, and guests, who are seen as unlikely to need access when they are not present in the primary user's home.

Contexts that mitigate discomfort
In general, far fewer participants changed their opinion when they already felt uncomfortable sharing data with someone. Besides explicit approval, the most dominant contextual factor, we found that a few other contextual factors can sway a considerable number of participants.

Data from common areas can be shared between roommates.
Only 27.2% of our participants reported being uncomfortable or somewhat uncomfortable sharing data with their roommates, and 43% of them would likely reconsider their decision if the device is in a common area. This echoes our conclusion in Section 5.2.1: device location is the most crucial contextual factor to consider when sharing a device with a roommate.

5.3.2 Recipient's presence matters more than access duration. In Section 5.2.2, we discussed how unlimited access duration makes participants less comfortable sharing data with domestic workers. How can this concern be mitigated?
Figure 2c shows that, other than explicit approval, the recipient's presence is the leading contextual factor that mitigates participants' concerns about sharing data with domestic workers and incidental users. When the data recipients were domestic workers, around one-third of the participants changed their opinion to a more positive one once they learned that the data would only be shared while the recipients were on site. This is likely because domestic workers who are in the home are presumably working, which gives them a clear purpose for accessing the data.

5.3.3 The absence of others at home rarely matters for data sharing. In almost all cases, accessing the data when nobody is home did little to mitigate people's privacy concerns, and in some cases even aggravated them. The only exception may be the primary user's spouse or significant other: 30% of participants became more comfortable sharing data with their spouse or significant other when nobody was home. Interestingly, participants also welcomed this contextual factor slightly more for security cameras than for other devices: 17% switched their answer to comfortable or somewhat comfortable when the device was a camera, while at most 10% did so for every other device. The reason may be that they do not want their spouses or significant others to spy on them.

Emergency requires different access control mechanisms
If participants reported being uncomfortable or somewhat uncomfortable sharing data, we further asked how they would respond if access were needed for an emergency (e.g., fire, theft, medical issues). In line with previous research [7,35], we found that participants are more likely to allow access during an emergency, as shown in Figure 2e and Figure 2f. More interestingly, besides explicit approval, participants would also prefer to grant emergency access based on when the data was recorded. This preference for using capture time as a contextual factor was not observed in day-to-day scenarios (Figure 2b and Figure 2c). In hindsight, this makes sense: emergencies are often one-time incidents, so participants can easily pinpoint when the emergency occurred and make decisions accordingly. Day-to-day scenarios, on the other hand, are more likely to involve routines or incidents that recur.
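One way to operationalize capture-time scoping is to restrict an emergency grant to recordings captured around the incident. The sketch below is a minimal illustration, assuming each recording carries a capture timestamp; `Recording`, `emergency_scope`, and the 15-minute margin are hypothetical choices, not mechanisms or values from the study.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Recording(NamedTuple):
    """Hypothetical record of one captured clip or event."""
    device: str
    captured_at: datetime


def emergency_scope(recordings: List[Recording],
                    incident_start: datetime,
                    incident_end: datetime,
                    margin: timedelta = timedelta(minutes=15)) -> List[Recording]:
    """Keep only recordings captured within `margin` of the incident window.

    The margin is an illustrative buffer (an assumption), so that data from
    just before or after the emergency remains visible to the recipient.
    """
    lo, hi = incident_start - margin, incident_end + margin
    return [r for r in recordings if lo <= r.captured_at <= hi]
```

A day-to-day grant would simply not carry such a window, which mirrors the observation that capture time mattered to participants only in the emergency scenario.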

Data subjects matter
If participants stated that they were uncomfortable sharing data with the given recipient, we further asked how comfortable they would be if the shared data involved the data recipient. We asked this question only when the data type was video or audio clips, as other data types may not be associated with a specific data subject (e.g., home occupancy). The results are shown in Figure 3.
In all situations, at least 20% of participants expressed comfort in sharing video and audio data if the data recipients were involved in it. For data recipients such as spouses (or SOs), roommates, and in-home caretakers, over 40% of participants believe it is fair for these recipients to access the data if they are involved. Interestingly, our participants do not think the same for children, even though they are also long-term residents. Our guess is that because young children (12 years old in our study) are under their parents' supervision, parents' decisions about whether children should have access to video or audio data have little to do with whether the children appear in it.
Figure 3: The number of participants who switched from "Uncomfortable/Somewhat uncomfortable" to "Comfortable/Somewhat comfortable" after learning that the data involves the data recipient.
In addition, 46% of participants are willing to share video data if the data recipient is involved, a larger share than for audio data. We speculate on two potential reasons. First, participants may believe audio data is less sensitive than video data, so whether the data recipient is involved matters less. Second, audio data may be less likely than video data to be recorded accidentally (e.g., a voice assistant often needs a wake word to start recording). Users of these devices should be fully aware of the data collection when they actively interact with them, and thus there is less need for transparency.

The possibility of delegation
In this section, we discuss how the participants perceive delegation: letting others grant access to more people. For participants who initially stated that they were comfortable with data sharing, we further asked how they would feel if the data recipients could share the data with someone else. Conversely, for participants who initially stated that they were uncomfortable sharing data with the data recipient, we further asked how they would feel if the data recipient verbally promised, or was legally bound, not to share the data.
5.6.1 Participants do not like the idea of delegation, even for spouses/SOs. Although we anticipated that participants would dislike the idea, we were still surprised that it clearly emerged as the most influential contextual factor making people unwilling to share access in all situations (Figure 2a and Figure 2b). Even for the most trusted type of data recipient, spouses or significant others, 51% of participants stated that they would be upset if their spouses or SOs shared the data with someone else. Quotes from the participants shed light on the reasoning behind this: some participants worry that their spouse or significant other would share the data without discussing it with them first. For example, when asked about situations where they would feel uncomfortable sharing data with their spouse, one participant wrote: "Unless I have a reason not to trust Blaire, like she gave the use of the service to someone that shouldn't have it or without mutual agreement, ... " Another participant, who shared the same concern, wrote: "I would not want Blaire to share the data with anyone else without explicit permission."
5.6.2 Social interactions rarely mitigate participants' concerns. As mentioned in Section 3.1, we considered two types of social interactions: verbal promises and legal obligations. In general, social interactions are less effective at mitigating participants' discomfort with data sharing than many other contextual factors (compare Figure 4 with Figures 2 and 3). Even when the data recipient is legally bound not to disclose the data, only 22% of participants, on average across all groups, changed their initial opinion. This number drops to 5.5% for verbal promises.
First, whether the data recipients are legally bound not to disclose the data or only verbally promise not to, it means little if the primary user (here, the participant) believes that the recipients do not need the data in the first place.
Second, legal obligations are much preferred over verbal promises. For data recipients such as local law enforcement, handypersons, and in-home caretakers, over 20% more participants changed their minds when the data recipients were legally prohibited from disclosing the data than when they only promised verbally. Similar to tech support for computers and smartphones, a legal responsibility not to disclose the data may ease people's concerns about data sharing, especially when participants may have to give away some data for other benefits (e.g., granting a handyperson access to get a device fixed).

DISCUSSION
In this section, we discuss the lessons learned from our results and how they can inform future smart home designs. These lessons apply both to manufacturers who want to build more context-aware access control for their own devices and to standards makers building a unified smart home ecosystem.

Data sharing and the length of the stay
As discussed in Section 4.2, the 11 types of data recipients included in the survey can be roughly categorized into long-term residents, domestic workers, and incidental users. Based on participants' responses to each category, it seems that the longer someone stays in the home, the more comfortable participants feel sharing data with them.
There could be several reasons for this. First, longer-term users may have a greater need for access to the data, making their requests for data access more reasonable. Second, if someone is not trusted by the primary user, they are unlikely to be in the home all the time. Thus, how long someone stays in the home could be a measurable proxy for intention (why they need access) and trust, the two factors recognized as the main motivators for granting access. This proxy could not only justify granting access to longer-term residents, but also indicate that someone is leaving (e.g., a roommate moving out, or a breakup with a significant other), nudging the primary user to revoke that person's access. Although more research is needed to confirm this hypothesis, it offers a potentially new perspective on how to set default policies and how to make the system adapt to changes in people's relationships and lives.
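As a rough illustration of using length of stay as a signal, the sketch below compares how often someone was seen at home in an earlier window with a recent window, and suggests a revocation nudge when a former regular stops appearing. It assumes the home can log which days a person was present; the function names, window lengths, and thresholds are hypothetical and would need the further research the text calls for.

```python
from datetime import date, timedelta
from typing import Set


def presence_ratio(days_present: Set[date], end: date, window_days: int) -> float:
    """Fraction of the `window_days` days ending on `end` (inclusive) with recorded presence."""
    start = end - timedelta(days=window_days - 1)
    return sum(1 for d in days_present if start <= d <= end) / window_days


def should_nudge_revocation(days_present: Set[date], today: date,
                            recent_days: int = 14, baseline_days: int = 90,
                            resident_threshold: float = 0.7,
                            absent_threshold: float = 0.1) -> bool:
    """Suggest (never enforce) revoking access when a former regular stops appearing.

    All thresholds are illustrative assumptions, not values from the study.
    """
    # The baseline window ends just before the recent window begins.
    baseline_end = today - timedelta(days=recent_days)
    was_regular = presence_ratio(days_present, baseline_end, baseline_days) >= resident_threshold
    recently_absent = presence_ratio(days_present, today, recent_days) < absent_threshold
    return was_regular and recently_absent
```

The function only returns a suggestion for the primary user; consistent with participants' preference for explicit approval, it does not revoke anything by itself.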

Device control vs. data access
Comparing our results with prior research, we noticed that people's desired access control policies differ for device control and data access. For example, multiple studies have found that people often restrict their children's ability to control devices, in case they mess up the system settings or do something unwanted or unsafe [19,20,23]. In our study, however, people are generally okay with their kids viewing the data, except for video data. Similarly, smart thermostats have long been a source of conflict over control [37], yet people do not care much about who can view smart thermostat data. Therefore, viewing data and operating a device should be treated as two different categories of access control. A user should be able to specify separately whether someone may operate a device and whether they may view its data, which also makes the granted privileges more transparent.
In addition, simply separating the viewing and operating privileges may not be enough. Matter, a new IoT standard [15], already defines different levels of access, including view, proxy-view, operate, manage, and administer, where each privilege subsumes the capabilities of the prior ones. Although it might be reasonable to assume that someone who manages or administers devices also needs to view and operate them, comparing our findings with prior work suggests that viewing and operating should be parallel privileges: someone can allow others to use a device without letting them see the history data it records. For example, it is common for guests to use a voice assistant in someone's home, but letting them see others' past conversations with it may not be acceptable. Therefore, we believe that viewing and operating privileges should be granted and revoked independently.
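The difference between a subsuming hierarchy and parallel privileges can be sketched in a few lines. `Level` is a simplified reading of the ordering described above (proxy-view is omitted), and `Priv` is a hypothetical independent-flag model; neither is the Matter SDK's actual API.

```python
from enum import Flag, IntEnum


class Level(IntEnum):
    """Hierarchical model: each level subsumes the ones below it (simplified, hypothetical)."""
    VIEW = 1
    OPERATE = 2
    MANAGE = 3
    ADMINISTER = 4


def hierarchical_allows(granted: Level, needed: Level) -> bool:
    # Granting OPERATE implicitly grants VIEW, so "operate without viewing" is impossible.
    return granted >= needed


class Priv(Flag):
    """Parallel model: viewing history and operating are independent bits (hypothetical)."""
    NONE = 0
    VIEW_HISTORY = 1
    OPERATE = 2


def parallel_allows(granted: Priv, needed: Priv) -> bool:
    # Every requested bit must have been granted explicitly.
    return (granted & needed) == needed
```

With the parallel model, the voice assistant example becomes a guest holding `Priv.OPERATE` alone: the guest may issue commands, while `parallel_allows(Priv.OPERATE, Priv.VIEW_HISTORY)` is `False`. Under the hierarchy, any level at or above operate implicitly includes view.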

The design of temporary access
It is not surprising that temporary access is commonly desired, and many of today's smart home devices and systems already offer various kinds of temporary access. For example, prior studies have found that smart door locks have four types of users: owners, residents, recurring guests, and temporary guests [22,25], which closely mirrors the categories we proposed in Section 4.2. Recognizing that access should be temporary by design is merely the first step toward a more privacy-respecting system. The next step is understanding what constitutes "temporary access" and how to design a system that better matches people's mental models.
In our study, there are two main types of contextual factors that determine when someone can or cannot have access: access duration and the presence of different parties at home. We found that the data recipient's presence influences people's decisions more than access duration does (Section 5.3.2). This indicates that it may be more intuitive for people to create policies based on the data recipient's presence than to decide how long, or at what times, the recipient should have access. Indeed, people likely want to specify an access duration precisely to limit access to the time when the recipient is on site. Presence-based access also covers cases where a domestic worker or a guest does not have a regular visit schedule, saving the trouble of changing their access time window or granting access explicitly for every visit.
That being said, we do not propose removing access duration as an access control mechanism. The benefit of specifying a time window for access is that it is deterministic: the primary user knows exactly when someone will or will not have access. Granting access based on someone's presence offers no such guarantee. A data recipient with malicious intentions could simply show up unannounced and gain access. Therefore, although presence-based access may be more convenient, it is only appropriate in trusted relationships or for non-sensitive data.
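The two mechanisms can coexist in one grant. The sketch below assumes that some component of the home reports whether the recipient is on site (how presence is detected is left open); `TemporaryGrant` and its fields are hypothetical names for illustration, not an existing smart home API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class TemporaryGrant:
    """Hypothetical temporary-access grant combining a time window and presence."""
    # Deterministic window [start, end); None means no time restriction.
    window: Optional[Tuple[datetime, datetime]] = None
    # If True, access additionally requires the recipient to be on site.
    require_presence: bool = False

    def allows(self, now: datetime, recipient_on_site: bool) -> bool:
        # A grant with neither restriction would be unlimited, which participants
        # found unacceptable for non-residents, so it is denied here.
        if self.window is None and not self.require_presence:
            return False
        if self.window is not None:
            start, end = self.window
            if not (start <= now < end):
                return False
        if self.require_presence and not recipient_on_site:
            return False
        return True
```

For a trusted babysitter without a fixed schedule, `TemporaryGrant(require_presence=True)` may suffice; for less trusted recipients or sensitive data, combining a window with presence keeps the deterministic bound that the paragraph above argues for.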

Contexts recommendations for explicit approval
Explicit access control decisions are often criticized for burdening users with repeated permission requests and overly complex mechanisms [2]. Prior work has thus tried to simplify access control by automating the decision process for users [9]. Our results, however, show that people actually desire explicit approval and can even become upset when none is requested (Section 5.1).
Although this result could be attributed to the privacy paradox, it still reflects people's fear of losing control over their own homes. As noted by Colnago et al., people's concern about losing their autonomy can outweigh their desire for convenience [14]. Letting the system make decisions on users' behalf is thus not a solution, especially when the model's predictions are difficult to explain.
Therefore, we believe explicit approval is necessary, but certain contexts could be prioritized during the approval process to reduce the complexity of the interface. For example, long-term residents often need access to data, so a one-time explicit approval may be enough for them. The main context to consider is the location of the device, especially when the data recipient is a roommate. For one-time or rare visitors, explicit approval could be required for every visit, with each grant being temporary unless otherwise specified. Given how infrequently they visit, this places little burden on the primary user. The system could also save the previous access configuration so that future approvals are simple and quick. Recurring visitors are trickier to handle: depending on whether their visits are regular, one could either use their presence as the trigger for access or set up recurring time windows.
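The recommendations above can be summarized as a small rule table. This is a hypothetical sketch: the category names follow the paragraph, the returned plans are plain strings for readability, and mapping irregular recurring visitors to presence-based access (and regular ones to recurring windows) is our assumption, drawn from the temporary-access discussion.

```python
from enum import Enum


class Recipient(Enum):
    """Recipient groups used in the recommendations above."""
    LONG_TERM_RESIDENT = "long-term resident"
    RECURRING_VISITOR = "recurring visitor"
    RARE_VISITOR = "one-time or rare visitor"


def approval_plan(kind: Recipient, *, is_roommate: bool = False,
                  device_in_private_area: bool = False,
                  regular_schedule: bool = False) -> str:
    """Return a hypothetical explicit-approval plan for one recipient and one device."""
    if kind is Recipient.LONG_TERM_RESIDENT:
        # Device location is the key context when the resident is a roommate.
        if is_roommate and device_in_private_area:
            return "no access unless explicitly approved for this device"
        return "one-time explicit approval"
    if kind is Recipient.RARE_VISITOR:
        # Infrequent visits keep per-visit prompts cheap; a saved config speeds them up.
        return "explicit approval per visit, temporary, reuse saved configuration"
    # Recurring visitors: assumed mapping of regular schedules to recurring windows,
    # and irregular schedules to presence-triggered access.
    if regular_schedule:
        return "recurring time windows"
    return "presence-triggered access"
```

For example, a roommate asking for data from a camera in the primary user's bedroom would get `approval_plan(Recipient.LONG_TERM_RESIDENT, is_roommate=True, device_in_private_area=True)`, which keeps that device's data off-limits by default.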

Technical solutions vs. social solutions
Technical solutions alone are often insufficient to solve human-centered problems, as people intuitively rely on social norms to make decisions [51]. How to make smart home systems acknowledge social interactions and use them for access control is an interesting research direction.
Although our study found verbal promises of little use in people's data-sharing decisions, legal restrictions did mitigate some people's concerns. For some types of data recipients in our study, such as in-home caretakers, it would not be surprising if they were already under obligations not to share information about the care recipient. If the smart home system could obtain and verify such legal obligations, users could be assured that the shared data will be handled appropriately, and these obligations could become one of the contextual factors for access control in smart homes.

A SURVEY TEXT
[ validate device, consent, introduction ] Imagine you own some smart home devices, and have an app (Smart Home app) installed on your smartphone that helps you control and monitor all your smart home devices. You could add others through the app. Once a user installs the Smart Home app on their smartphone and is added to your home, they can see all the device activities through the app. Here, for the demonstration purpose, let's assume you want to add a new user, Alex, to the Smart Home app. As shown in the figures below, you can invite Alex to your smart home through the Smart Home app. After being added, Alex needs to install the Smart Home app on their smartphone and accept the invitation. Upon acceptance, Alex can now see all the activities that happened on these smart home devices through the Smart Home app. You can safely assume that the Smart Home app is the only way for Alex to see past events that happened on these devices.

Figure 1: The distribution of participants' comfort with sharing different data with various stakeholders in a smart home. The two-level y-axis details each row's device and data types. Each row shows participants' responses regarding the device and data type listed on the left, while each column shows their responses regarding the recipient type listed on the top.

Figure 2: The proportion of participants who switched sides (from "Uncomfortable/Somewhat uncomfortable" to "Comfortable/Somewhat comfortable", and vice versa) when we added more contexts to the original question about their comfort in sharing a particular type of data with a certain person. The contexts are listed on the y-axis of all the heatmaps, and the number in each cell denotes the proportion of people who changed their opinion in the way noted by the title of each figure. The darker the color, the more participants changed their opinion after seeing the contextual factors listed on the y-axis.

Figure 4: The number of participants who switched from "Uncomfortable/Somewhat uncomfortable" to "Comfortable/Somewhat comfortable" after learning that the data recipient is legally bound (or verbally promised) not to disclose the data.

Figure 5: Images shown to participants to explain fine-grained access control.
The app recently rolled out a new feature that enables more fine-grained control over what activities other users can see. For example, as shown in the following figure, Alex can see when the smart door is locked or unlocked, and who has gained or lost access to the smart door. However, Alex cannot see if the passcode on the door has been changed or not. Now, please imagine you have added the following user, {Name}, to the Smart Home App. {Name} is your {relationship}. Assume you have a {device}. This device is shared between you two. The {device} can be unlocked by your smartphone or other Internet-connected devices (based on GPS location), or by typing in a pin code.
• We performed an in-depth analysis of the survey results, showing how data recipients, data types, and sharing principles are associated, and how together they affect people's access control decisions.
• We summarized a list of design implications for future smart home systems, informing designers and researchers of potential simplifications for future context-adaptive systems.

Table 1: Comparison with prior work. Non-interpersonal data recipients (e.g., manufacturers, advertisers) and contexts (e.g., encryption, data storage) are not counted in the table.

Table 2: List of data recipients and data types we included in the survey. Full definitions can be found in Appendix C.
3.1.3 Device and data types. Our work considers several data types collected by our selection of IoT devices. We initially chose seven IoT devices for our study based on popularity and controversy (i.e., how often they are discussed in prior work): smart locks, smart thermostats, smart TVs, smart light bulbs, voice assistants, security cameras, and smart plugs. However, we removed smart plugs from the selection, because their privacy implications arise from the devices plugged into the outlet rather than from the smart plug itself.

Table 3: Demographics of the survey participants.

Table 4: Logistic regression for participants' comfort in sharing various data with different data recipients. We use neighbors as the baseline for data recipients and cameras' video data as the baseline for data types, because they received the lowest average comfort scores among all groups. The larger the odds ratio, the more comfortable participants feel about the listed condition. The model omits participants' MUIPC responses and demographics because they had no significant impact on the outcome. A full model with all measured variables can be found in Appendix B.