Privacy Rarely Considered: Exploring Considerations in the Adoption of Third-Party Services by Websites

Modern websites frequently use and embed third-party services to facilitate web development, connect to social media, or for monetization. This often introduces privacy issues as the inclusion of third-party services on a website can allow the third party to collect personal data about the website's visitors. While the prevalence and mechanisms of third-party web tracking have been widely studied, little is known about the decision processes that lead to websites using third-party functionality and whether efforts are being made to protect their visitors' privacy. We report results from an online survey with 395 participants involved in the creation and maintenance of websites. For ten common website functionalities we investigated if privacy has played a role in decisions about how the functionality is integrated, if specific efforts for privacy protection have been made during integration, and to what degree people are aware of data collection through third parties. We find that ease of integration drives third-party adoption but visitor privacy is considered if there are legal requirements or respective guidelines. Awareness of data collection and privacy risks is higher if the collection is directly associated with the purpose for which the third-party service is used.


INTRODUCTION
Contemporary websites often use third-party services for certain functionality, design, or media resources. The underlying reasons are as multifaceted as the purposes for which external resources are used in web development. Web content is often monetized via online advertising and marketing [50], which frequently involves the inclusion of advertising networks to target ads to website visitors' presumed interests and web analytics to measure the success of online marketing campaigns. User expectations regarding the look and functionality of websites, paired with time and resource constraints in web development, were also found to drive the adoption of third-party resources [19], such as design frameworks, contact forms, and external media hosting. This reliance on third parties can come at the cost of website visitors' privacy. By embedding external resources, websites provide third-party vendors with the opportunity to collect personal data about the website's visitors, such as their IP address, visited pages, and access to long-term identifiers the third party may have stored in visitors' browsers [50]. This data collection potentially allows them to track people across the Web, learn large shares of their browsing histories, and use this information to infer interests or demographics.
Considering that third-party resources are often automatically retrieved in the background without visible indication, this may be at odds with privacy legislation. For example, the European Union's General Data Protection Regulation (GDPR) [20], in effect since 2018, demands that processing of personal data is grounded on one of six legal bases, including user consent, is transparently communicated, and a "privacy by design and by default" approach is followed. Privacy risks of third-party website resources have been pointed out by courts and technical guides, noting, for example, that use of the most prevalent third-party service [16,34,39], Google Analytics, is only compliant with privacy law with IP anonymization [80]. Recent years have also seen the introduction of more privacy-friendly ways to embed externally hosted media or social media functionality [30,31]. Still, post-GDPR measurements have shown little change in the prevalence of third-party web tracking [14,76,88], and practices that are already "quite pervasive" [19] may be hard to change. In early 2022, several European courts and data protection authorities directed attention towards the privacy implications of third-party use through decisions that declared the use of certain services a GDPR violation: the Austrian and French data protection boards for Google Analytics [59,67], the Belgian one for IAB Europe's Transparency and Consent Framework (TCF), the basis for many third-party consent providers [7], and a German court for Google Fonts [43], with more decisions expected to follow [8].
Website creators are a crucial part of the third-party tracking ecosystem, as they are the ones who integrate third parties into websites and enable them to track visitors' behavior across the Web. Thus, the lack of change in third-party use on websites under the GDPR raises the question to what extent people tasked with the creation and maintenance of websites are aware of the privacy risks of third-party use and whether visitors' privacy is considered both in the decision that leads to the selection of third-party services and in the integration itself. Though prior work has studied the history [45,91] and prevalence [16] of third-party web tracking and its underlying mechanisms, little is known about the decision processes behind the use of third-party services on websites and whether website visitors' privacy is considered in the process.
Previous work that has studied developer behavior in adopting [65] and updating [71,72] third-party libraries focused on smartphone apps, e. g., investigating developers' privacy considerations in their use of mobile advertising networks [55,82], their awareness of data collection through third-party tools for unspecified types of functionality including ads and analytics [4], and their adoption of alternative APIs that preserve location privacy [37]. Third-party services and libraries for websites differ from those for the mobile ecosystem in their availability for a greater variety of purposes, the potential for higher technical complexity, and higher sophistication of advertising ecosystems [36,46,87]. Websites also lack apps' distribution through a centralized platform, whose requirements may shape developers' understanding of privacy aspects, including what data is considered sensitive [85]. On the Web, the omnipresence of consent notices that implement IAB Europe's TCF [32] and often list a site's third-party vendors could have led to higher awareness of data collection through third parties on websites compared to the mobile space, where consent prompts are much less prevalent [41].
In this work, we address this research gap with findings from a mixed-methods online study with 395 participants involved in the design, development, deployment, maintenance, or management of websites. We combine survey answers with web privacy measurements and investigate how ten website functionalities frequently associated with use of third-party services have been integrated into websites and how visitors' privacy was considered in the process. We go beyond prior work by exploring privacy considerations between different types of functionality that may not be equally prone to third-party use [50], as well as factors that influence the adoption of first- vs. third-party solutions to integrate a functionality. More specifically, we make the following contributions:
• We extend web privacy research on the prevalence of third-party services by contrasting their use with first-party integrations for different purposes, regarding their prevalence, factors that drive use of first- vs. third-party solutions, and consideration of alternatives. We find that the decision in favor of third-party services, as in the mobile domain [71], is driven by ease of integration, features, cost, and familiarity with a service, while privacy rarely is a decisive factor. However, we find use of privacy-friendly integration for web analytics and programming/design resources, and self-hosting tends to be the primarily considered alternative to third-party solutions, rather than another third party.
• Like work on cryptographic APIs [1] and mobile ad networks [55], we find that changes to a service's default configuration are rarely reported. However, participants who did adjust defaults often did so in response to privacy-related court rulings or guidelines by data protection authorities.
• We find higher awareness of data collection pertaining to a third-party service's core functionality, such as financial information for payment or behavioral data for analytics, whereas awareness is lacking for data collected in less prominent contexts, particularly the transmission of IP addresses and device information.
• From a methodological perspective we contribute to the ongoing discussion about ethics in security and privacy research by discussing implications and lessons learned from using public GitHub data to recruit people involved with web development, a method previously used by developer-centered research [1,2,26,56,71,73,74,81,84,92].
Our findings show the need for researchers and the web development community to raise awareness of the privacy risks associated with third-party use on websites, as well as the need for clearer regulatory guidance and requirements for privacy-friendly defaults.

THIRD-PARTY SERVICES IN WEB DEVELOPMENT
Advantages of third-party use in web development differ by actor: Web developers benefit from ease of integration as often all that is required is to copy and paste HTML or JavaScript snippets from the vendor's website [69]; potentially faster website load times through use of content delivery networks (CDNs) or caching in visitors' browsers if widely used [66,69]; and the fact that many popular third-party services are available free of charge. The latter often comes at the cost of the third-party vendor collecting data about the website's visitors for monetization through advertising [19,50]. Independent of the functionality a third-party service provides to the website, requesting a remotely hosted resource via HTTP inherently involves the transmission of the website visitors' IP address, which some jurisdictions consider personal information [17], to the remote server, along with device information in the browser's user agent and the currently viewed page. The third party can use these to infer additional information about individuals, such as other websites they visit that also include the third-party service [49]. A mitigation is to host the remote resource locally, if possible [19,50].
Other privacy risks and mitigations depend on the type of functionality provided. As our study is centered around common use cases for third-party services in web development, we started by identifying these through review and comparison of existing categorizations in the literature and by web tracking projects. We found such classifications in the works of Sørensen and Kosta [76], Libert and Nielsen [50], by WhoTracks.me [38], Third Party Web [34,35], and DuckDuckGo's Tracker Radar [15]. While categorizations differ in granularity and focus, we identified large overlap from the perspective of website owners. We did not consider categories that apply only in a first-party context (e. g., hosting, distribution) or only make sense combined with other categories (e. g., tag management). We ended up with ten common website functionalities, shown in Table 1 with associated privacy risks and possible mitigations. The latter are generally possible on two levels: selection how to integrate the desired functionality (self-implemented, locally or remotely hosted third-party service) and efforts in integration of the selected solution to configure it in a more privacy-friendly way.

Table 1: Common website functionalities with examples of third-party (3P) services, associated privacy risks, and possible mitigations.

Web analytics
Privacy risks: Extensive data collection; data sharing with others (e. g., ad networks); tracking of browsing behavior across the Web due to widespread use [16,34,39]
Mitigations: Configured to collect less data [25]; services that collect less data or can be self-hosted (e. g., Matomo) [19]

Embedded media
Description: Non-text content (e. g., videos, audio files, slideshows, interactive maps) embedded into web pages.
Third-party examples: Hosting: YouTube (videos), Google Maps (maps); embedding code by hosting 3P
Privacy risks: Data transmitted upon page load, not only upon interaction with remotely hosted embedded content
Mitigations: Self-hosting, two-click solutions [31], YouTube-nocookie [19]

Customer interaction
Description: Mechanisms that enable specific website-visitor interactions (e. g., contact forms, comments, chat).
Third-party examples: Google Forms, Facebook Comments, Disqus
Privacy risks: Various personal data transmitted; leakage of this data to third parties, including ad networks, even before submission [75,79]; Disqus: data sharing with ad networks by default without notice [6,28]
Mitigations: Plugins for content management system (CMS)

User login / authentication
Description: Allows users to create accounts on the website and log in.
Third-party examples: Single sign-on with credentials from popular services (e. g., Apple, Google, Twitter, Facebook)
Privacy risks: Providers can learn on which other sites people use their credentials and when [40]
Mitigations: CMS-provided integration, privacy-friendly identity providers

Payment
Description: Allows visitors to pay for services and goods offered on the website.
Third-party examples: Varies between regions [9], e. g., PayPal, Venmo, Alipay
Privacy risks: Sharing of sensitive personal and financial information with payment provider [68] and possibly other 3Ps uninvolved in transaction [62]
Mitigations: Limited by prevalence and practicality

Social media
Privacy risks: Data transmission upon page load; in EU, liability of site owners for data processing by SM companies through buttons/widgets [5]
Mitigations: Limited (3P by definition required): two-click mechanisms [30,31,61], static profile links

Website protection
Description: Mechanisms to protect against (distributed) denial-of-service attacks, spam, or data scraping.
Third-party examples: Google reCAPTCHA, services based on text / behavioral analysis, security proxies (Cloudflare)
Privacy risks: Wide range of behavioral data collected to distinguish humans from bots [13,19,60]
Mitigations: Against non-targeted spam: honeypots, easy math or language questions [13]

1 General risks are (i) transmission of visitors' IP address and user agent to the third-party service, which can allow the latter to track people across the Web, especially if the service is widely used [49]; and (ii) the third party potentially requiring visitors to accept extensive privacy policies [13,19].
2 Always viable are self-implementation (except for payment and some social media integration) and using a third-party service that collects less personal information.

RELATED WORK
Previous work has studied the prevalence and evolution of third-party web tracking and developers' privacy behaviors in third-party use in the mobile app ecosystem.
Evolution of Third-Party Web Tracking. Web tracking has been studied extensively, including the prevalence of third-party tracking services on websites. Tracking has been identified since 1996 and has since increased in prevalence and complexity [45], with the most popular services covering up to 75 % of websites in 2015 [91] and hundreds of different known tracking services [70] whose use increases with website popularity, and visible differences between regions and website types [33]. Large-scale investigations confirmed that more than half of websites leak user data or load third-party scripts [49]. The GDPR going into effect in May 2018 increased the prevalence of cookie consent notices, while actual tracking practices did not change much [14] or changes could not be directly attributed to the GDPR [76]. While there were clear differences between website visits from US and European users, implying that companies collect less data from the latter [11], previous research overall did not find significant positive changes due to the GDPR.
Developers' Privacy Considerations. Developers' considerations of users' privacy have been studied in different contexts, but there are few insights into why specific third-party services are used in web development. Previous work found that developers of mobile apps are often unaware of third-party data collection [4], and therefore tend to collect more data than necessary. Furthermore, developers showed a limited perception of privacy threats, often based on their organization's guidelines [29]. Mhaidli et al. investigated how and why mobile app developers use and choose ad networks and whether they consider associated risks for users [55]. They found that developers see advertisements as the only viable way to monetize their apps and consider ad networks to be responsible for protecting app users' privacy, not themselves. Tahaei et al. confirmed this and showed that app developers find existing privacy information and controls confusing and hard to use [82]. Other studies investigated public forums to see how developers deal with privacy regulations and changes to them, finding that they mostly try to uphold standards defined by large companies [85] or are focused on recent changes or events [48] when discussing privacy. When asked to solve privacy-focused tasks, developers tend to use better-documented alternatives and copy examples, which could be adopted by privacy-friendly services [37]. They often struggle with embedding privacy into their application due to a lack of knowledge, privacy contradicting app requirements, or task complexity [63,74]. Another problem is third-party vendors' competing business interests, leading them to employ dark patterns that steer developers towards privacy-unfriendly defaults [83].

METHOD
To investigate the privacy practices and decision processes behind third-party use on websites, we conducted a mixed-methods study consisting of an online survey with 395 people involved in the creation and administration of websites, paired with an analysis of participants' websites, if provided in the survey.

Survey Design
Our survey was inspired by the work of Mhaidli et al. [55] and consisted of five parts. It was conducted in English and implemented on a self-hosted LimeSurvey instance. To prevent early priming about privacy, we framed the survey as exploring practices in the selection and use of web technologies on websites and only introduced questions about privacy and data collection practices in Part 4. Appendix A contains the full survey.
Part 1 assessed participants' background regarding their work on websites, including experience with the functionalities in Table 1.
To provide context for the rest of the survey, Part 2 asked participants to think of one specific website they had recently worked on and to only keep this website in mind for subsequent questions. Participants could optionally provide the website's URL (Q2-0). The survey consent form explained that this information would be used to check which web technologies were present on the website. At this point we investigated the methodological question if requiring participants to provide a website had an effect on dropout rates: We made Q2-0 mandatory for half of GitHub-recruited participants (see Section 4.2) but could not find evidence that this had an impact on dropout rates or willingness to provide a website. Part 2 proceeded to ask about website metadata, including the country it was based in, the participant's role with regard to the website, and which of the ten functionalities in Table 1 were present on the site (Q2-6). To balance level of detail and survey length, we chose to display more detailed questions only for up to three functionalities. For this, Q2-7 asked, for each functionality indicated to be present in Q2-6, to what degree the participant had been involved in the decision of how this functionality should be integrated (selection), in the integration process itself, and in maintenance or management of the integrated solution. From the functionalities for which any kind of involvement had been indicated, three were randomly selected, for which Parts 3 and 4 would be shown.
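The selection step described above can be sketched as follows. This is a minimal illustration, not the survey's actual LimeSurvey logic: the function name `select_functionalities`, the dictionary layout, and the three involvement keys are assumptions made for the example.

```python
import random

def select_functionalities(involvement, k=3, rng=None):
    """Pick up to k functionalities for which any involvement was reported.

    `involvement` maps a functionality name to a dict of booleans for the
    three involvement types asked about in Q2-7 (hypothetical key names).
    """
    rng = rng or random.Random()
    # A functionality is eligible if at least one involvement type is set.
    eligible = [name for name, levels in involvement.items()
                if any(levels.values())]
    # Sample without replacement; fewer than k are returned if fewer qualify.
    return rng.sample(eligible, min(k, len(eligible)))


answers = {
    "web analytics": {"selection": True, "integration": False, "maintenance": False},
    "payment": {"selection": False, "integration": False, "maintenance": False},
    "embedded media": {"selection": False, "integration": True, "maintenance": True},
    "advertising": {"selection": True, "integration": True, "maintenance": False},
    "user login": {"selection": False, "integration": False, "maintenance": True},
}
chosen = select_functionalities(answers, rng=random.Random(0))
```

Functionalities without any reported involvement (here, payment) can never be selected, which mirrors why some respondents saw fewer than three instances of Parts 3 and 4.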
Part 3 investigated how a functionality was integrated in terms of first- vs. third-party solutions and, if applicable, embedding mechanism. It also asked about the underlying decision process including reasons for selection and considered alternatives, information sources, and the people involved. Part 4 explored participants' understanding of the data collected through third-party services and efforts made to protect visitors' privacy in the integration process.
Finally, Part 5 asked demographic questions and if participants had received training or educated themselves on data protection or privacy. At the end, participants were debriefed about the study's privacy focus and given the option to either withdraw from the study or to submit their answers. Six participants withdrew here.
To assure survey quality, we first conducted "think-aloud" cognitive interviews with seven web developers and two content creators, recruited via convenience sampling. After each interview, we addressed identified issues and repeated this process until no further issues emerged. A pilot launch of the survey with 101 participants recruited from GitHub (see Section 4.2) did not yield evidence of any remaining issues, so we proceeded with data collection.

Recruitment
Our recruitment approach was guided by the goal to obtain different perspectives on website functionality integration. We leveraged two recruitment channels to reach a diverse sample: websites' contact information to reach individuals in a range of website-related roles, and GitHub to reach web developers. People were eligible to participate if they were at least 18 years old, worked on websites in some capacity (e. g., website design, development, deployment, maintenance, management), and were comfortable taking the survey in English. Participation was voluntary and uncompensated.
To cover a diverse range of websites in recruitment, we searched the top 100,000 popular website domains on the Tranco list [44] for email addresses related to a website's technical administration. We visited each domain on the Tranco 100K in October 2020 using OpenWPM 0.13 [16] and searched the homepage for links assumed to lead to subpages containing privacy policies, terms of service, and contact information. We identified these using a list of key phrases compiled through manual inspection of 10 websites randomly sampled for each of the top 20 website languages in the Tranco list. We downloaded the corresponding subpages and the homepage and searched them for email addresses with a regular expression. Since websites often list contacts responsible for the content (e. g., editors on news pages, politicians on government sites) rather than administration, we excluded subpages with more than four email addresses. After removing duplicates, invalid email addresses, and subpages with more than four addresses, we were left with 109,862 unique email addresses for 53,496 websites.
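The extraction and filtering step can be illustrated with a short sketch. The regular expression, the names `EMAIL_RE` and `collect_contacts`, and the choice to count unique lowercased addresses per page are assumptions for illustration; the exact pattern and counting rule used in the study are not stated.

```python
import re

# Deliberately simple pattern (an assumption, not the study's expression).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAX_ADDRESSES_PER_PAGE = 4  # pages listing more addresses are skipped

def extract_emails(page_text):
    """Return the set of lowercased email addresses found in a page."""
    return {match.lower() for match in EMAIL_RE.findall(page_text)}

def collect_contacts(pages):
    """Collect unique addresses from a mapping of page URL to page text.

    Pages with more than MAX_ADDRESSES_PER_PAGE addresses are excluded,
    since they likely list content contacts rather than administrators.
    """
    contacts = set()
    for text in pages.values():
        emails = extract_emails(text)
        if len(emails) > MAX_ADDRESSES_PER_PAGE:
            continue
        contacts |= emails
    return contacts


pages = {
    "https://example.org/contact": "Write to admin@example.org or Admin@Example.org.",
    "https://example.org/team": "a@x.com b@x.com c@x.com d@x.com e@x.com",
}
contacts = collect_contacts(pages)
```

In this example only the contact page contributes an address; the team page is dropped because it lists five.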
Previous work studying web developers' security and privacy practices has used public GitHub repositories to recruit developers on a large scale [1,2,26,56,71,73,74,81,84,92]. We also used this approach because it allowed us to recruit people likely involved with web development without hand-picking them, as would have been the case for one-by-one contact on platforms such as LinkedIn. Though prior work is not always clear on where exactly on GitHub users' email addresses were collected (options include commit email addresses and users' profile pages), from discussions with authors of some previous studies we know that the use of commit email addresses is common. Following this previously used method, we analyzed commits made into public GitHub repositories in August 2020 to identify e-mail addresses of people working on websites, as indicated by the respective commit including file extensions related to web development (.js, .php, .css, .html, .htm). Anticipating a low response rate, we sent invitations to 37,000 email addresses, in addition to 12,000 contacted during pilot testing.
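The file-extension criterion for identifying web-related commits can be sketched as below. Only this check is illustrated; the function name `touches_web_files` and the case-insensitive comparison are assumptions for the example.

```python
from pathlib import PurePosixPath

# Extensions named in the text as indicators of web development work.
WEB_EXTENSIONS = {".js", ".php", ".css", ".html", ".htm"}

def touches_web_files(changed_paths):
    """Return True if any changed file has a web-related extension."""
    return any(
        PurePosixPath(path).suffix.lower() in WEB_EXTENSIONS
        for path in changed_paths
    )


is_web_commit = touches_web_files(["src/app.js", "README.md"])
```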

Research Ethics
Prior to conducting the study we looked into opportunities for ethical and data protection review at our institutions. At the time this study was designed, conducted, and evaluated, the authors were affiliated with Leibniz University Hannover (LUH) and Ruhr University Bochum (RUB), both located in Germany, and the University of Michigan (U-M) in the US. RUB only had an IRB for research in psychology, which was not meant to be mandatorily consulted by security and privacy researchers. LUH's IRB only targeted project proposals, not individual research papers. The co-author from U-M did not directly work with raw response data or interact with participants and confirmed with U-M's IRB that their oversight and approval was therefore not required. Nevertheless, we followed best practices for research conduct and transparency. To ensure GDPR compliance of our study, we consulted RUB's and LUH's data protection officers. They both independently considered our study design and specifically the approach for GitHub recruitment to be covered by the GDPR's research privilege.
In Q2-0 we required some participants to provide the URL of a website they had worked on, following Mhaidli et al.'s study design [55]. We explained in the initial consent form that this data would only be used to check the website for the presence of third-party services. Participants required to fill this field were able to drop out or proceed without penalty by entering arbitrary input.
Regarding recruitment, we carefully considered the implications of sending email invitations to website contacts and GitHub developers at a large scale. As mentioned above, the two consulted DPOs considered this recruitment approach to be GDPR-compliant. We contacted each email address only once (i. e., we did not send any confirmations or reminders) and gave email recipients a one-click option to opt out of further contact. Still, we received a small number of emails with negative sentiments from people who were not aware that their public GitHub commits contained their email address. Upon this feedback we put up a page on our institution's website that explained our study, why the GitHub-recruited recipient's email address was visible in commits into public repositories, and what steps could be taken to hide it. Despite these efforts, one recipient filed a complaint with our state's data protection authority, upon which we immediately stopped recruitment via GitHub, rather than waiting for the outcome. Three months later the DPA informed us that they did not consider the GDPR's research privilege to apply, because GitHub users, who are often unaware of their commit email addresses being publicly available, do not expect to be contacted via these addresses for the purpose of scientific research. We discuss the concrete problem with GitHub's mechanics for email addresses in more detail in Section 7.4. The DPA advised us to refrain from future recruitment via public GitHub commits but did not take formal action.
When we designed and launched the study, ethical concerns with recruitment via public GitHub commits were not obvious: The method was established in the community [1,2,26,56,73,74,92], even post-GDPR [71,81,84], and had passed ethical or IRB review at different universities in the US, Europe, Australia, and at the NIST Human Subjects Protection Office. As such we followed established research practice at the time and sought consultation/approval regarding GDPR from two data protection officers from different institutions, who independently concluded the recruitment method to be covered by the GDPR's research privilege. In hindsight, we agree with participants' and the DPA's concerns regarding GitHub recruitment, which is why we decided to fully discuss our experience in this paper. We consider this aspect of our work a valuable lesson learned for the community in how legal or ethical assessment of established study methods can, and should, evolve. Section 7.4 discusses implications for future work.
We want to stress that all participants whose data is reported in this paper provided their information with informed consent, obtained both at the beginning of the survey and at the end after debriefing about the study's privacy focus. The issue pointed out by the DPA lies with the recruitment method, not with the data we received from the willing and consenting survey participants.

Data Cleaning
Across all recruitment phases, 2,177 people opened the survey link, 667 proceeded past the welcome page, and 452 completed the survey. Out of these, we removed 41 that had not seen Parts 3 and 4 due to a lack of reported involvement, nine who selected contradictory levels of involvement, and seven who provided multiple websites. To increase data quality, we examined response times. Average completion time was 20:42 minutes. We did not observe any suspicious patterns and thus did not remove any answers. This left us with a total of 395 valid responses. Two authors inspected all open-response "Other" answers and re-coded answers that matched existing closed-ended options after discussion and mutual agreement. For website analysis, one author inspected all provided URLs (Q2-0) and removed all answers that were not URLs (e. g., "client confidential") or could not be resolved to a website.
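The exclusion counts reported above add up to the final sample size, which a few lines of arithmetic confirm. The dictionary keys are shorthand labels introduced for this check only.

```python
completed = 452  # participants who completed the survey

# Exclusions reported in the data cleaning step.
excluded = {
    "did not see Parts 3 and 4": 41,
    "contradictory involvement levels": 9,
    "provided multiple websites": 7,
}

valid_responses = completed - sum(excluded.values())
```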

Data Analysis
Two of the authors applied thematic analysis [10] to the answers to open-ended questions. First they independently reviewed the data to identify recurring themes and created individual codebook drafts for each question. Next, they discussed these drafts and merged them into a first joint codebook. All data was then jointly coded by both researchers, who discussed problematic cases until an agreement was reached, which at times required refining codes' definitions and scopes and, thus, revisiting previously coded answers. We did not compute inter-rater reliability, as the number of responses was small enough to not require splitting up between multiple researchers [54]. Each open-ended response could be assigned one or more codes, as participants often mentioned more than one relevant talking point. Appendix B contains the final codebooks.
To assess to what extent participants' responses about websites' integrated functionalities matched actual practice, we checked the provided websites with OpenWPM [16]. For each provided URL, we accessed the front page, searched it for links to subpages, and visited up to 100 unique pages randomly selected from these to ensure we gained a complete picture [87]. We performed crawls from Germany, California, and India to cover possible differences between jurisdictions [11,32,90]. For each page, we collected all HTTP(S) requests and compared the list of found third-party services with those mentioned in the respective survey response, using the WhoTracks.me [39] categorization as a basis. Finally, we compiled metadata on the provided websites: top-level domains (TLDs), website topics based on the McAfee Real-Time Database [53], and popularity based on the same Tranco list we used for recruitment.
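A simplified sketch of the two crawl-side steps, sampling subpages and separating third-party from first-party requests, might look as follows. It does not use OpenWPM or the WhoTracks.me data. The helper `naive_site` approximates a site by the last two host labels, which is an assumption that fails for suffixes such as .co.uk; a real analysis would rely on the Public Suffix List.

```python
import random
from urllib.parse import urlsplit

def naive_site(url):
    """Approximate the registrable domain by the last two host labels.

    Assumption for illustration only: wrong for multi-label public
    suffixes (e.g., example.co.uk).
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def third_party_sites(page_url, request_urls):
    """Return the sites contacted by a page other than the page's own site."""
    first_party = naive_site(page_url)
    return {naive_site(u) for u in request_urls} - {first_party, ""}

def sample_subpages(links, limit=100, rng=None):
    """Randomly pick up to `limit` unique subpage links."""
    rng = rng or random.Random()
    unique = sorted(set(links))
    return rng.sample(unique, min(limit, len(unique)))


requests_seen = [
    "https://cdn.example.org/site.css",
    "https://www.google-analytics.com/analytics.js",
    "https://fonts.googleapis.com/css?family=Roboto",
]
found = third_party_sites("https://www.example.org/", requests_seen)
```

The resulting set of third-party sites could then be compared with the services a participant named in the survey.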
For data analysis we mainly rely on descriptive statistics because the variance in response counts per website functionality would cause statistical tests to often be underpowered. Where statistical tests were appropriate and possible, we used Fisher's exact tests to check if differences between categories were significant and corrected for multiple tests with the Benjamini-Hochberg procedure.
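For readers unfamiliar with the two procedures, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test for a 2x2 table and of Benjamini-Hochberg adjusted p-values. It is not the analysis code used in the study; the function names and example numbers are assumptions, and in practice one would typically use an established statistics library.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = fisher_exact_2x2(3, 0, 0, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A difference would then be reported as significant if its adjusted p-value falls below the chosen threshold.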

RESULTS
Our results show that, as in other domains, user privacy is rarely considered in web development. Yet, we do find influence of regulators' guidelines for some types of functionality, and self-hosting is a prominently considered alternative to third-party use. We also find a widespread lack of awareness that third-party use implies transmission of IP addresses and device metrics to the third party.

Sample
We first describe the sample of 395 participants and 361 websites they provided to support the main part of the survey. This is consistent with demographics surveys of people working with web technologies, whose large majority are men, typically in the 24-34 age range, holding a bachelor's degree in technical fields [12,27,78,94].
Participants' work with websites (Q1-2) was most frequently in a full-time position (41.8 %), though freelancing and part-time employment were also common, as was non-paid work (hobbyist 31.4 %). In the last three years, participants had mostly worked on 2-5 websites (43.8 %; Q1-1). As for previous experience with the ten website functionalities (Q1-3), all but one participant reported at least one functionality, with a mean of 5.28 (std 2.37, median 5). Experience with front-end programming or design libraries (83.0 %) and user login or authentication (80.5 %) was most common, while the fewest participants had worked with privacy plugins (29.9 %) and advertising (23.0 %). Participants held on average 3.4 different website-specific roles (std 2.58, min 1, max 13, median 3; Q2-1) and most often worked as (web) developer, programmer, or software engineer (85.3 %). Other frequently reported roles include administrator/web operator, user experience design, content creator or contributor, and product or project manager. Most participants worked alone (35.7 %) or in teams of sizes 2-5 (35.7 %) (Q2-2). 42.0 % had received prior privacy training. The most common resources of such training were self-study (38.6 % of participants with training), employer training, courses at a university or school, and other non-online courses, including certifications such as CISSP. Table 6 in Appendix D has detailed data about participants' demographics and background in their work with websites.

Websites Provided by Participants.
In Q2-0, we asked participants to provide a website they had recently worked on that would serve as a reference for Parts 3 and 4 of the survey. Data cleaning left us with 361 unique valid websites, for which we compiled descriptive statistics. The most frequently occurring TLDs were .com, .org, and .de, followed by domains associated with web development, such as .github.io or .dev. Thematic classifications by McAfee were available for 264 (83.8 %) domains, the most common being Business, Internet Services, and Education/Reference. 141 registered domains (44.8 %) appeared on the Tranco top 1-million list, with a mean ranking of 104,767 (min 5, max 958,899, std 168,620.3, median 46,695). Overall we find that participants mainly reported international sites aimed at providing services or information, but also a substantial number of smaller and/or personal sites hosted on popular platforms and a multitude of other thematic categories, creating a diverse sample of websites.
Participants named 72 different countries as the seat of the company behind the website (Q2-3).
Coding of the open-ended answers to Q2-4 revealed that the websites were mostly targeted at a global or multi-regional audience; Table 7 in Appendix D also lists the most popular individual target regions. Almost half of the websites (44.8 %) were reported not to have a website-specific revenue model (Q2-5). On average they relied on 0.91 sources of revenue (std 1.03, min 0, max 5, median 1). Most common were products/services sold on websites (20.5 %), subscriptions/membership (17.5 %), and revenue streams not explicitly listed in Q2-5 (14.4 %). Table 7 in Appendix D contains the full website statistics.

Privacy Considerations in Selection
To find out whether privacy played a role in the decision on how to integrate a desired functionality, we investigated what functionalities were present on participants' websites, whether they were integrated via first- or third-party solutions, and the underlying decision process, including considered alternatives, consulted information sources, and the people involved.

Integrated Functionalities.
In Q2-6 we asked participants which of the ten functionalities in Table 1 were present on their website. Participants' websites used on average 5.2 of them (sd 2.3, min. 1, max. 10, median 5). In its "present" column, Table 2 lists how often each functionality was mentioned. The numbers show that the reported prevalence of functionalities differs greatly. Most commonly used were programming or design resources (355 / 89.9 % of websites), customer interaction tools (268 / 67.8 %), and web analytics (251 / 63.5 %).
Table 2: Reported functionalities on websites (Q2-6; n = 395), participants' involvement with them (Q2-7; relative to "present"), and, based on that, how often they were randomly assigned survey parts 3 and 4.

To assess the number of third parties the websites actually use, we combined the data collected from three server locations to ensure that no configurations dependent on visitors' IP or region biased our results. Out of 361 unique websites provided we were not able to access 10. On average, each website contacted 6.2 third-party domains (min 0, max 144, std 6.95, median 3) and 80 sites made no requests to third parties at all. For 76 sites we found mismatches between Q2-6 responses and third parties observed on the website. The most common observation was a request to Google's advertising domain doubleclick.com (42 cases), followed by site analytics (14), CDNs (12), customer interaction (6), and embedded media (5). The rest belonged to other functionalities not covered by the survey. The high prevalence of requests to advertising domains despite the fact that developers had not reported the use of advertising (confirmed by manual inspection) can be explained by third parties loading additional services [87]. The majority of requests went to doubleclick.com, contacted by locally hosted Google Analytics scripts. Other cases involved social media bookmarking services like AddThis or ShareThis that contact various advertising domains.
In the other direction, 136 responses reported functionalities for which website analysis did not find obvious requests to matching third parties. The majority of these cases concern scripts for customer interaction (64), embedded media (70), or social media integration (46). Besides methodological limitations outlined in Section 6, the explanation was often that the functionality was hosted locally, e. g., via CMS plugins, as reported in Section 5.2.2.
Last, we compared the hosting strategies against privacy-friendly recommendations [19]. Table 3 lists results for selected services. We found that for many common third-party services like analytics, videos, and maps the main strategy was to embed the well-known services. For example, 158 websites made use of Google Analytics, while only 15 used the privacy-friendly alternative Matomo. Of those 15, 4 were found to be using both, e. g., on subsites. For more technical functionalities like programming and design resources we observed more variation in first- vs. third-party hosting. While we found only six websites that used privacy-friendly font hosting sites (such as Fork Awesome or Fontello [19]), 86 hosted additional fonts on their own server. For the widely used web programming library jQuery the results were reversed: The majority (138) self-hosted the script, while 72 used CDNs to serve the files. Again there were sites using both strategies, for example, when a library was used multiple times by different components or plugins.
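The counting of third-party domains across several crawl locations described in this subsection can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study: the function names, the example URLs, and the "last two DNS labels" heuristic are assumptions made for illustration, and a real measurement would resolve registered domains with the Public Suffix List.

```python
from urllib.parse import urlsplit


def site_key(host: str) -> str:
    """Naive stand-in for the registered domain: the last two DNS labels.

    Assumption: this heuristic is wrong for suffixes such as .co.uk or
    .github.io; a real analysis would consult the Public Suffix List.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_sites(page_url: str, request_urls: list[str]) -> set[str]:
    """Return the sites a page contacted that differ from the page's own site."""
    own = site_key(urlsplit(page_url).hostname or "")
    found: set[str] = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host and site_key(host) != own:
            found.add(site_key(host))
    return found


def combine_vantage_points(page_url: str, crawls: dict[str, list[str]]) -> set[str]:
    """Union the third parties seen from every crawl location."""
    combined: set[str] = set()
    for request_urls in crawls.values():
        combined |= third_party_sites(page_url, request_urls)
    return combined


# Hypothetical request logs for one site, recorded from two locations.
crawls = {
    "location_a": ["https://www.example.org/app.js",
                   "https://code.jquery.com/jquery.min.js"],
    "location_b": ["https://cdn.example.org/site.css",
                   "https://www.google-analytics.com/analytics.js"],
}
print(sorted(combine_vantage_points("https://example.org/", crawls)))
```

Taking the union over locations means that a third party served only to visitors from one region still counts for the site, which matches the stated goal of avoiding region-dependent bias.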

Prevalence of First-Party vs. Third-Party Solutions.
Q3-2 investigated how the different functionalities were integrated into websites. We focused on the hosting location (first-party solution, third-party software installed locally on the website's own system, or third-party service remotely included from the vendor's server). For embedded media and social media, we also investigated (Q3-2c/2d) how remote resources were embedded into the website: via self-written code, code provided by the third party, or an embedding method provided by another third party (such as social media plugins that support multiple social media sites). Figure 1 shows the prevalence of each hosting and embedding type. We observe that websites predominantly self-host solutions for customer interaction (user comments, contact forms, chat, etc.), privacy popups and forms, and embedded audio. Remotely hosted third-party solutions are dominant for analytics, payment, and hosting of embedded video and map content, while prevalence of the different hosting types was more varied in the other categories.
As shown in Figure 1(b), remotely hosted media are typically embedded using the code provided by the hosting service. Social media share buttons and embedded feeds, whose functionality implies the requirement to access an API provided by the social network, more or equally often use one of the two third-party embedding variants. By contrast, buttons or links to the website's social media profiles, which do not trigger an action specific to the social network, are more frequently integrated via first-party solutions.
Q3-2 also asked participants to specify which concrete service the website used. Coding revealed the following categories of functionalities to have a clear market leader: advertising (Google Ads / AdSense / DoubleClick for Publishers [63.6 % of participants who used a third party and provided an answer]), analytics (Google Analytics, 65.7 %, followed by Matomo, 10.3 %), embedded videos (YouTube, 90 %), embedded maps (Google Maps, 62.5 %). We observed a more varied use of third-party services for programming and design resources (top 3: Bootstrap (18.2 %), React (17.5 %), jQuery (14.7 %)). For website protection, participants equally often mentioned web security libraries, which they considered self-hosted third-party services, and Google's reCAPTCHA as the most popular remote third-party service (12.1 % for both).
Overall, our findings match expectations: Third-party use seems more prevalent for website functionalities that (mostly) require third parties to be involved, such as payment services or social media integration, or that were deemed to be complex to self-host or implement, such as analytics or video and map resources [50].
As for the concrete third-party services used, web tracking research has repeatedly identified Google's services to be the most prevalent third-party services on the Web [16,38,88]. Still, we measured some efforts at privacy-friendly configuration of Google services.

Decision Process.
Next, we investigated how people had arrived at these solutions to integrate different website functionalities.
People Involved in the Selection Process. We learned about who was involved in the selection process in two ways. For participants involved in the selection of how to integrate a functionality (Q2-7), we evaluated their roles with regard to the website (Q2-1). Across all categories, people involved in selection predominantly had technical roles. For given roles we also observed higher involvement in the selection of functionalities that closely relate to that role, such as customer support for customer interaction or sales for advertising. Q3-8 asked participants not involved in selection who had made that decision. Here participants most frequently referred to developers, with the notable exception of privacy popups or forms, for which the decision often lay with the legal team, data protection officers, or management. This is also the functionality where participants reported the lowest involvement rates (see Table 2). Figures 5 and 6 in Appendix C have details for both questions.
Resources Used for Selection. Across all categories, participants mainly relied on official websites and documentation to select how to integrate a given functionality (Q3-6); also frequently named were the website's team, online articles, and forums. The same information sources were reported as most commonly consulted in the selection of ad networks for mobile apps [55]. Also confirming the findings of previous work [4,55], terms of service or privacy policies were rarely consulted, except for payment, privacy plugins, and advertising (16.7 % for each). Figure 7 in Appendix C has detailed numbers. This suggests that not even functionality where people directly enter sensitive information, such as customer interaction, prompts developers to look up a third-party service's data processing practices. This could be due to the complexity and length of these documents, which reinforces the need for third-party services to present their key privacy practices in a condensed, easy-to-understand, and accessible form [4].

Reasons for the Selection of Existing Solutions.
Coding of the open-ended answers to Q3-3 identified reasons why the respective integration solutions had been selected for each functionality. Figure 2 investigates the reported reasons for two mutually exclusive groups: purely self-hosted solutions, whether first-party or via a locally hosted third party, where collected data is expected to stay on the website's host system, vs. solutions that only rely on remote third-party hosting and thus can involve information being sent to a third-party server. Figure 2(a) shows the prevalence of each code for each of these integration types, aggregated across all functionalities. We find that the most prevalent decision factors for either integration type are ease of integration and features, though these play a bigger role in the adoption of pure third-party solutions. The "Other" category mainly comprises generic answers such as "I just like it" (P323-Social) or "it's the best" (P188-Login), which explains its relatively high prevalence. Beyond these general factors for adoption, we observed that some mainly occurred for certain functionalities, such as revenue for advertising, legal considerations for privacy plugins, security for login/authentication, familiarity for programming/design and analytics, and popularity for payment. Privacy aspects were rarely mentioned, except for analytics ("I wanted something very minimalistic, non-intrusive" [P353], "I care about users privacy" [P83]). These observations confirm findings in the mobile space that third-party adoption is driven by the goal to save time and effort through code reuse [71]. They additionally show that these factors can fuel the reasoning both for and against third-party use and that there are differences between functionalities.
Consideration of Alternatives. Participants involved in the selection of a functionality were asked in Q3-4 whether they had considered alternatives to their chosen integration solution. Figure 3 shows that across all categories, this was answered negatively by a large share of participants, from 16.7 % (advertising) to 50.7 % (analytics). A similarly low rate was reported in the work of Mhaidli et al., who found only two out of nine interview participants to have made some effort in considering and comparing different mobile ad networks before settling on one [55]. Rather, participants were found to select a network based on some "vague awareness" of what was popular and commonly used with good experience. We found similar sentiments in our data for functionality with a clear market leader, notably the prevalent use of analytics, for which the outstanding popularity of Google Analytics was confirmed by our measurements (Table 3). The answers to Q3-2 suggest that people consider it the "default" solution and do not even think about possible alternatives. Except for payment, which is only practical with the involvement of third parties, most considered alternatives were first-party solutions, even for functionalities considered difficult to self-host such as video content or (targeted) advertising [50]. This could again hint at people rarely choosing between different third-party services but rather deciding between either self-implementing a functionality or using a specific third-party service.
For embedded and social media, participants also had the option to indicate whether they had considered embedding mechanisms from other sources. Of the 62 people who had been asked this question for social media integration, 12 (19.4 %) had considered using code provided by the social networks and 4 (6.5 %) had considered code by another third party. The embedded media category was shown to 86 participants, 9 of whom (10.5 %) had considered self-written embedding code, 3 (3.5 %) code provided by the resource-hosting third party, and 4 (4.7 %) code by another third party.
As for the reasons why alternatives were considered or not (Q3-5), Figure 2(b) and (c) investigate this for self-hosting vs. pure remote third-party use. We observe that, as for the selection of the current solution (a), ease of integration is a prominent factor both to consider and not to consider alternatives. Somewhat unexpectedly, for pure use of third parties this reason and resources appear to be factors to research rather than to not consider possible alternatives. This could hint towards users of third-party services not always being content with what those offer and decision processes being complex. However, the most important factor not to consider alternatives appears to be familiarity with the selected solution, for self-hosted solutions even more so than for use of remote third-party services. The "Other" responses to this question mainly comprised satisfaction with the current solution, low priority of the respective functionality, or mere statements that it was unnecessary to look for alternatives ("It wasn't required" [P316-Privacy]; "The first way worked" [P241-Analytics]).

Privacy Considerations in Integration
Beyond the selection phase, we investigated participants' privacy practices in the stage of integrating the selected solution.

Resources Used for Integration.
For integration itself, the answers to Q3-7 paint a similar picture as the resources for selection (Q3-6). Again, the main sources of information were official websites/documentation and the website's team. Online articles and forums are less often used for actual integration compared to the selection phase. Terms of service and privacy policies again were rarely consulted. Though not directly comparable in answer space, the 20 % of privacy plugin users who consulted terms of service or a privacy policy are in the same dimension as the legal information sources used to integrate consent forms for advertising in mobile apps [81] (14.1 % for "Legal policies (e. g., GDPR)" and 9.9 % for "legal teams"). Figure 8 in Appendix C shows detailed data for Q3-7.

Privacy Protection Efforts.
When asked in Q4-2 if they had employed specific measures to protect website visitors' privacy when configuring their solution to implement a functionality, participants' answers did not vary significantly (p > 0.05, Fisher's exact test) across functionalities. For all of them, about a quarter of participants reported to have employed privacy protection mechanisms, another quarter stated to not have used them, about one third did not know, and the rest did not provide an answer. Table 4 shows what privacy protection efforts participants reported to have made in the configuration of their solution. Participants frequently referred to data minimization ("I don't really collect user information, and when I do, I keep it to a minimum to get the job done" [P361-Programming]) and secure transfer ("encryption and [TLS]" [P84-Inter]). Another prominent theme in the answers was first- vs. third-party selection, including self-hosting as a means to protect visitors' privacy ("Remove tracking from social media buttons by replacing them with a similar button" [P385-Social]), careful selection of the third party with privacy in mind ("I chose a font service that I believed would respect user privacy" [P136-Progr]), and using settings offered by the third-party service to collect less data. Prominent themes in individual categories are security for login/authentication (32.5 %) and customer interaction (28.1 %); anonymization, data minimization (22.2 % for both), and third-party settings (30.6 %) for analytics. The explanation for the repeated occurrence of security mechanisms, including access control, is that developers often conflate privacy with security [29,85]. Across all categories, only 24 answers to Q4-3a explained the motivation behind the measures to protect visitors' privacy. 20 named regulatory requirements, mostly from privacy law but, in the case of payment providers, also industry regulations. Two participants mentioned an unspecified "requirement" for analytics and another two a self-commitment to privacy (for analytics and social media).
Table 5 shows the reported reasons not to make privacy-protecting configurations. Most frequently, the solution was perceived not to collect any personal data, which was especially prevalent for programming/design (39.1 %; "because the third party does not collect anything" [P109-Progr]), embedded media (18.6 %), and social media (34.8 %); in the latter case, the responses often referred to first-party integrations of profile buttons or links ("they're just links" [P98-Socia], "simply images, wrapped in anchor tags" [P289-Socia]). Other prominent themes were trust in the third party to adequately protect users' privacy ("I thought the default setup already protects the visitors' privacy enough" [P243-Analy], "I trusted [Cloudflare] to not collect excessive information" [P321-Prote]) and the perception that it was impossible to do anything about collected data ("there is nothing I can do in GA to change the data Google collects" [P396-Analy]), particularly for analytics (27.6 %). Trust in third-party vendors and the perceived inability to do something about the data collection were also recurring sentiments in why developers of mobile apps stick to a service's default configuration [55]. Finally, some answers simply deemed privacy protection unnecessary ("I don't care about privacy because 'data is king'" [P295-Payment]), prominently for programming/design (39.1 %) and embedded media (18.6 %).
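For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Fisher's exact test for a 2x2 table. It is not the study's analysis script: comparing answers across all ten functionalities needs the generalization to larger tables (the Freeman-Halton extension), and the counts in the usage line are hypothetical and not taken from the survey.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    # Sum over all tables that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are two functionalities, columns are
# "used privacy measures" vs. "did not".
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

A p-value above 0.05, as in this toy table, means the answer distribution in the two rows cannot be distinguished at the conventional significance level.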

Awareness of Third-Party Data Collection
Q4-1 more closely investigated the assumed lack of awareness of third-party data collection. For the third-party users of each functionality (as by Q3-2), Figure 4 shows the percentages who thought that the service collected specific types of data. We observe that participants had a solid understanding of data collection implied by a service's core functionality. For example, a majority of participants reported that third-party privacy popups/forms collect cookies, that payment services require contact and financial information, or that advertising and analytics collect device information and user online activities. However, beyond this, participants' understanding of data collection was limited. This is especially evident in the case of IP addresses and device information: As HTTP(S) requests to a remote resource involve transmission of a user's IP address and user agent, this information is always available to the third party. More indirect is the opportunity for the third party to derive additional information via these technical parameters, such as tracking users across sites that use that service and learning their browsing behavior. It appears that many participants embed third-party software and either do not know or are uncertain of the true extent of data collection by the third party. This is supported by the responses to Q3-9 that let participants rate the integrated solution with regard to different metrics. Between 48 % (advertising) and 75.71 % (website protection) of participants reported to be Satisfied or Very Satisfied with the privacy offered by their integrated solution, while only up to 8.73 % (analytics) expressed some degree of dissatisfaction. This suggests that data collection by third parties is often either accepted or unknown.
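To make the point about IP addresses and device information concrete, the sketch below lists what a server can read from a single resource request. It is an illustration under assumptions: the function name, the example request, and the addresses are hypothetical, and whether headers such as Referer or Cookie are present depends on the browser and its settings.

```python
def visible_to_third_party(peer_ip: str, raw_request: str) -> dict[str, str]:
    """List data a server receives with one HTTP request for a resource."""
    lines = raw_request.split("\r\n")
    _method, path, _version = lines[0].split(" ", 2)
    headers: dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "ip_address": peer_ip,  # known from the connection, not from a header
        "user_agent": headers.get("user-agent", ""),
        "referer": headers.get("referer", ""),  # embedding page, if sent
        "cookie": headers.get("cookie", ""),    # stored identifiers, if sent
        "resource": path,
    }


# Hypothetical request for a font file hosted by a third party.
request = (
    "GET /font.woff2 HTTP/1.1\r\n"
    "Host: fonts.example.net\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Referer: https://shop.example.org/checkout\r\n"
    "\r\n"
)
print(visible_to_third_party("203.0.113.7", request))
```

Even a request for a static file, with no script executed and no cookie set, hands the third party the visitor's address and user agent.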

LIMITATIONS
Our study has some limitations. First, we aimed to recruit a diverse sample and we are confident that it provides a wide range of perspectives on third-party adoption but may not include every type of website or third-party user. Websites and third-party services are not easy to categorize, and therefore participants might have interpreted our categories differently (see Table 1). However, we provided examples and aimed for a sensible compromise between lengthy explanations and too much room for interpretation.
Second, a limitation of any survey is self-reported data. We cannot verify to what degree participants were actually involved with the provided website or if they consistently answered for the same site. Analyzing self-reported information is common in research involving developers [1,55,63,74] and manual inspection of survey responses suggests that participants answered consistently. Our survey was voluntary and uncompensated, which might have introduced bias, especially since experts tend to be well-paid and hard to reach. However, a lack of compensation was found to yield higher motivation or engagement in developer studies [1,2,26,56].
Further limitations apply to our website analysis. As data collection took multiple weeks, it is possible that in some instances websites changed between participants' responses and website analysis. Additional discrepancies might have been introduced due to our categorization differing slightly from WhoTracks.me, third parties using the same domain for multiple purposes, or participants not knowing or naming the functionalities on their website.

DISCUSSION
Our findings provide insights into how web developers and people in similar roles select how to integrate a desired functionality, how they configure the selected solution, and whether they are aware of the privacy risks associated with third-party services. For selection, we find the prevalence of third-party use to vary by functionality. In configuration, specific efforts to protect website visitors' privacy mostly appear to be made if mandated by technical guidelines on privacy law. Based on these findings, we discuss the need to raise awareness of the privacy risks of third-party use on websites and to promote adoption of privacy-friendlier alternatives. On the methodological level, our work is a case study for how the perception of research methods previously deemed acceptable can change over time.

Lack of Awareness of 3P Data Collection
Our research confirms the previously suggested lack of awareness [19] of the extent to which the use of third-party functionality on websites can pose risks to visitors' privacy. While developers appear to be aware of data collection closely tied to the main purpose of a third-party service, they often seem to not know or ignore the possibility that their visitors' personal data could be collected for other purposes, or simply trust the third-party service to not collect data or to employ adequate privacy protection. For analytics, our results hint at a somewhat higher privacy awareness than for other functionalities. This could be due to data collection simply being the main objective of web analytics, or due to prominent and recent guidelines on GDPR-compliant use of web analytics [42,80]. Similarly, concrete legal requirements have led to the adoption of privacy notices and forms, while developers appear to find it difficult to implement the more generic "privacy by design" approach promoted by the GDPR or the NIST Privacy Framework [57]. Public discussion and additional guidelines could help raise awareness for the privacy risks of other types of third-party services on websites, and on operationalizing "privacy by design" for website development and integration, ideally addressing a wide range of website-related roles. Measures to raise awareness would also need to communicate risks beyond the immediate control of developers, as third-party services often connect and share data with each other without users' knowledge [88], and different understandings of the sensitivity of the collected data, such as IP addresses.
Referring developers to a service's privacy policy is insufficient to communicate its privacy risks. While privacy policies can be expected to contain information about the data collected by a third-party service, our results show that they are rarely used when selecting or configuring services. This is unsurprising given that privacy policies are notoriously hard to understand, and the GDPR, a law pursuing greater transparency, has even led to an increase in the length of online privacy policies [14]. As an additional aid, privacy labels for mobile apps have recently been introduced into Apple's and Google's app stores [3,21]. With web development not taking place inside such a closed ecosystem, there are no centralized platforms developers could turn to for advice and comparison of different services that integrate a given functionality. For those who use common CMSes, their plugin repositories could introduce similar labels, placing privacy information more prominently than in a legal document. Alternatively, IDEs [47] and CMS editors could help assess the number of third-party requests in website code or problematic configurations for popular services and display advice.
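The kind of check an IDE or CMS editor could run, as suggested above, can be sketched as a static scan of a page's HTML for resources loaded from other hosts. This is an illustrative assumption about how such tooling might work, not an existing tool: the class and function names are hypothetical, comparing full host names is a simplification, and resources added at runtime by scripts are not detected.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ExternalResourceFinder(HTMLParser):
    """Collect hosts of embedded resources that differ from the page's host."""

    # Tag/attribute pairs that usually make the browser fetch a resource on
    # page load. Simplification: some <link> types (e.g. rel="canonical")
    # are not fetched, and plain <a href> links are deliberately left out.
    FETCHING = {("script", "src"), ("img", "src"),
                ("iframe", "src"), ("link", "href")}

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.own_host = urlsplit(page_url).hostname
        self.external_hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.FETCHING:
                # Resolve relative and protocol-relative references first.
                host = urlsplit(urljoin(self.page_url, value)).hostname
                if host and host != self.own_host:
                    self.external_hosts.add(host)


def external_hosts(page_url: str, html: str) -> set[str]:
    finder = ExternalResourceFinder(page_url)
    finder.feed(html)
    return finder.external_hosts


# Hypothetical page source.
page = (
    '<script src="https://code.jquery.com/jquery.min.js"></script>'
    '<img src="/logo.png">'
    '<a href="https://twitter.com/example">Profile</a>'
)
print(external_hosts("https://example.org/", page))
```

An editor plugin could show this set next to the markup, so that each embedded host is a visible decision rather than a side effect of pasted code.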

Promoting Privacy Engineering
Our work confirms earlier findings from the mobile ad ecosystem that developers often feel resigned and unable to effect change in a third-party ecosystem governed by the exchange of revenue or functionality for access to website visitors' personal information: Previous work found sentiments that users' personal data would be collected by platforms and vendors, irrespective of the developer's decisions [55], and both developers [55] and third-party vendors [83] deem the respective other party responsible for the protection of users' data. One option to break this cycle of blame and instigate change would be to assure developers that they can indeed make a difference through privacy-conscious integration of functionality [55]; after all, it is developers and end users that made these vendors that prevalent and powerful through use and promotion of their services. While in the past it was often browser vendors and developers of privacy-enhancing extensions who fueled advancements in website visitors' privacy, such as the option to block third-party cookies, relegating privacy protection to the browser comes at the risk of breaking websites and could overwhelm users with configuration options and prompts. Thus, promoting privacy-by-design with website creators would be a more holistic approach that can ensure that privacy is considered from the beginning of the web development process, desired website functionality works as expected, and the burden is not placed on website visitors. A website that practices data minimization and privacy by design could even render annoying consent notices unnecessary for the benefit of both websites and visitors.
We found notable involvement of DPOs or legal experts only for privacy popups or forms, i. e., functionality added for the administration of a website's existing data processing practices. This could be an indicator that privacy is still regarded as something to be "added later" instead of being considered throughout the development process. Moreover, web development is often done in small teams or by single persons without a privacy professional at hand. When the decision is in the hands of developers and made in early stages of the development process, our results show that ease of integration and familiarity with solutions are the driving factors for adoption. This does not necessarily mean that developers do not care about privacy, but it is simply not an important concern given deadlines and limited resources in small teams [51]. While at the beginning of development it is often unclear what user data the final (web) application will need [4], this does not preclude the involvement of privacy considerations from the beginning. Iterative privacy impact and risk assessment processes that continuously evaluate functional requirements against privacy implications could help ensure that the desired functionality is implemented using the least amount of personal data, thus complying with frameworks that follow a data minimization or privacy-by-default approach.

Promoting Privacy-Friendlier Alternatives
While advice to self-host [66,69] or use privacy-friendly alternatives to popular third-party services [19] has increased in recent years, we found that only few participants heeded such advice. Others reported not knowing alternatives to the solution they used or did not have the time or resources to look for them. This should be interpreted as a challenge to better promote privacy-friendly alternatives for both the developers of these services and the privacy and security research community at large. We found ease of integration, features, and cost to be among the most frequently reported factors that cause developers to adopt a certain solution, requirements currently easiest to satisfy by a service available free of charge that instead monetizes visitor data. It remains a major challenge to reconcile the demand for usability, features, and lowest possible cost if monetization of visitor data is not an option.
On the configuration level, privacy-friendlier options do exist but are often hidden or obscured by dark patterns [82]. For example, YouTube's setting for "privacy-enhanced mode" is only revealed when one scrolls down in the "Embed" dialog while the standard embed code is directly visible. Vendors could encourage use of the privacy-friendly configuration by making it more prominent or even the default, though there is no incentive for this if the service's business model is based on monetization of personal data, as is often the case with third-party services offered free of monetary cost. Privacy laws and court rulings were identified as drivers of privacy-related settings in ad networks [83], analytics services [59,67], and cookie consent notices [18]. Thus, public policy measures and regulatory guidance could go one step further and require vendors to make the privacy-friendly option the default.
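As an illustration of the YouTube example, the embed code produced in privacy-enhanced mode points at the youtube-nocookie.com domain instead of youtube.com. The helper below is a hypothetical sketch that rewrites a standard embed URL accordingly; the function name and the `VIDEO_ID` placeholder are assumptions, and what data is still collected in this mode is determined by the vendor.

```python
from urllib.parse import urlsplit, urlunsplit

STANDARD_HOSTS = {"www.youtube.com", "youtube.com"}
PRIVACY_HOST = "www.youtube-nocookie.com"


def to_privacy_enhanced(embed_url: str) -> str:
    """Rewrite a standard YouTube embed URL to the privacy-enhanced domain.

    URLs that are not YouTube embed URLs are returned unchanged.
    """
    parts = urlsplit(embed_url)
    if parts.hostname in STANDARD_HOSTS and parts.path.startswith("/embed/"):
        return urlunsplit(parts._replace(netloc=PRIVACY_HOST))
    return embed_url


# VIDEO_ID is a placeholder, not a real video.
print(to_privacy_enhanced("https://www.youtube.com/embed/VIDEO_ID?start=30"))
```

A CMS plugin could apply such a rewrite by default, which would turn the hidden option into the standard behavior without any action by the author.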

Section 4.3 described how recruitment via email addresses in public GitHub commit metadata came under the scrutiny of our state's data protection authority. We now discuss what part of the process had raised concerns with the DPA, what this means for future recruitment in privacy and security research, and what could be done in advance to decrease the likelihood of facing similar problems.

Recruiting Developers on GitHub.
Email recipients who asked how we found their email address on GitHub often pointed out that they had set their email address to "private" on their GitHub user profiles. While this setting hides the address from the public profile, it does not affect the visibility of the email address in commits to public GitHub repositories. Any given commit into a public repository has a corresponding *.patch file, available at https://github.com/<user>/<repository>/commit/<commit_hash>.patch. The second line in this file shows the author of the commit, along with their email address. This is due to the core concept behind GitHub's public repositories, where all commits, including metadata, are public. The documentation [24] describes how users can configure Git(Hub) to use their GitHub-provided "noreply" email address, which will remove their real email address from the commit metadata but still associate their contributions with their GitHub account.
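The mechanics described above can be made concrete with a short sketch that developers could use to check what one of their own commits exposes. It is an illustration under stated assumptions rather than the tooling used in our study: it assumes the patch text follows the usual Git patch layout with a "From: Name <address>" header on the second line, and that GitHub-provided addresses end in @users.noreply.github.com. The function names and the sample patch are hypothetical.

```python
from email.utils import parseaddr

# Suffix assumed to identify GitHub-provided "noreply" addresses.
NOREPLY_SUFFIX = "@users.noreply.github.com"


def commit_author_email(patch_text: str) -> str:
    """Return the address in the patch's 'From:' header, or '' if absent."""
    for line in patch_text.splitlines():
        if line.startswith("From: "):
            _, address = parseaddr(line[len("From: "):])
            return address
        if not line.strip():
            break  # the header block ends at the first blank line
    return ""


def uses_noreply_address(patch_text: str) -> bool:
    """Check whether the commit author address is a GitHub noreply address."""
    return commit_author_email(patch_text).lower().endswith(NOREPLY_SUFFIX)


if __name__ == "__main__":
    # Hypothetical patch header for illustration only.
    sample = (
        "From 0123abc Mon Sep 17 00:00:00 2001\n"
        "From: Jane Doe <jane@example.org>\n"
        "Date: Mon, 1 Jan 2024 12:00:00 +0000\n"
        "Subject: [PATCH] Fix typo\n"
        "\n"
        "---\n"
    )
    print(commit_author_email(sample), uses_noreply_address(sample))
```

If the check reports a personal address, the profile-level "private" setting alone has not removed it from the public commit metadata; only commits made after switching to the noreply address are affected by that change.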
Email feedback showed that many GitHub users are not aware of these mechanics and settings. This was also the issue at the core of the DPA's assessment, which argued that GitHub users pushing commits into public repositories did not expect to be contacted via their commit email addresses for the purpose of scientific research, and this lack of awareness constituted a legitimate interest of the user that outweighed public interest in scientific research. In addition, users of GitHub's API are bound by GitHub's terms of service and privacy statement [23]. GitHub's privacy policy considers a user email address public information (unless made private as described above) but proceeds to limit its use "for the purpose for which [the] user authorized it" [22]. Following the DPA's argument, this likely does not include being contacted for the purpose of participation in scientific research. It remains for the community to decide what influence such company policies should have on the question of what is considered ethical in privacy and security research, and, looking further ahead, how to handle company policies on data use that contradict what is permissible under applicable law.
For future recruitment of study participants we recommend, as also suggested by the DPA, to only use contact information that has visibly been made public by the individuals themselves with the intention of allowing the general public to contact them. GitHub's email address mechanics and users' lack of knowledge about them had neither been mentioned nor addressed by previous work that used public GitHub repositories for recruitment. We hope that our experience can inform the ongoing debate about ethics in privacy and security research and the search for alternatives to reach diverse sets of developers in a reliable, ethical, and affordable way.

The Need for A Priori Community-Based Ethics Review.
It has long been best practice in human subjects research to obtain prior review via an institutional review board (IRB) or a similar entity to ensure that participation in the study does not cause undue harm to humans. However, in practice, many institutions, especially outside the US, do not have such a review board, or review is not always mandatory, as was the case for our study. But even if prior IRB review had been available, it remains doubtful whether it could have prevented the complaint to the DPA. The main goal of IRB review is to ensure that a study complies with human subjects regulations, not to provide a comprehensive ethics and legal assessment. In fact, we took additional steps to get GDPR assessments from our institutions' DPOs before running the study. The challenge is that in privacy and security research, a deep ethics and legal review would often require specific technical domain knowledge (e. g., of GitHub's handling of commit email addresses), an understanding of the associated risks, and their legal evaluation. These are aspects that are often not covered by IRB guidelines or board members' background due to their differing function. Legal assessment in particular can be subject to rapid evolution through new laws and court rulings, requiring involvement of legal experts who keep up with this constant change.
Recently, the privacy and security research community has identified this need for thorough ethical review, and multiple venues have set up ethics committees that can be involved in the review process if a submission raises ethical concerns with reviewers. This work went through this very process, and we highly value the thorough ethics review we received, which concluded that we adequately addressed our study's ethical implications. While ethical review after submission is an important step in ensuring that published privacy and security research did not cause undue harm to the people whose behavior and systems were studied, it effectively comes too late, at a time when any potential harm would have already been caused. Hence, the community needs to consider how to provide ethical guidance before potentially harmful research is carried out, for example, by means of a "standing ethics review board" of expert volunteers that can complement institutional review in the study design phase. Such a priori ethics review would (1) help prevent unethical privacy and security research before it occurs, (2) provide researchers with experience and confidence in how to address ethical implications, and (3) minimize the sometimes arbitrary and ad-hoc assessments of a study's ethical implications by reviewers. An existing example is the Tor Research Safety Board [86]; providing committees of domain experts that cover the whole privacy and security field would pose a major challenge. Hence, such a priori review would not have to be mandatory for all submissions but could become a valued community resource.

CONCLUSION
We report findings from an online survey with 395 people working with websites on how common website functionalities are implemented, in particular if third-party services are used and whether and how respective privacy implications have been considered.
While we observe that the selection process is influenced by a variety of factors, we find that factors such as a third-party service's popularity and ease of integration often fuel adoption decisions. By contrast, website visitors' privacy only plays a notable role in web analytics, a functional category which has been explicitly addressed by data protection authorities. Except for privacy popups and forms, data protection officers and legal counsel are rarely involved in the decision processes that lead to the integration of third-party services into websites, despite potential privacy implications.

Your Background
First we would like to learn about your background and your work on websites. Throughout this survey, by "work on websites" we mean your involvement to some degree in the design, development, deployment, maintenance, and/or management of a website.

Integration of Website Functionalities (category-specific)
In Part 3 we would like to ask you a few questions about the integration of some of the functionalities you indicated to have worked with on the website. You will be shown these questions for at most three different functionalities, regardless of how many you have selected in the previous question.
(For up to three categories randomly selected from those in which the participant indicated involvement in the previous question, they are asked the following questions.) You indicated that you have been involved to some degree in the integration of [FUNCTIONALITY (examples)] on the website. Now we would like to ask you a few more questions about how this functionality has been integrated.

Revenue: (Not) using this solution affects revenue and conversion, and therefore income.
Performance: (Not) using this service affects site performance, e. g., loading times or server computation load.
Ease of Integration: It is very easy/hard to implement or integrate the solution.
Ease of Use: It is very easy/hard to use the solution (once it has been integrated).
Customization: The solution can(not) be easily customized to the participant's needs.
Features: The solution (does not) offer(s) specific features that the participant deems important for their use case.
Cost: It would be cheap/expensive to use the solution.
Resources: The solution was cheap in non-monetary resources, such as time or workforce.
Popularity: The solution is very popular, widespread, or even a market leader.
Availability: The solution is easily accessible, e. g., because it is already in use.
Familiarity: Friends, colleagues, or the participant themselves know or use the service, allowing the participant to benefit from this experience.
Privacy: Privacy was a relevant reason; the service was used because it, e. g., allowed privacy-increasing configurations.
Security: Security was a relevant reason; the service was used because it, e. g., allowed security-increasing configurations.
Dependence: (In)dependence on/from libraries or services that, e. g., might suffer from outages or be abandoned by their developers in the future.
Legal: The service was used due to legal requirements to, e. g., add a privacy policy or cookie banner.
Other: Other concrete reasons not covered by the codes above.
No answer: The participant did not provide an answer to the question, either by filling in nothing, something incomprehensible, or not providing an answer to the question (e. g., instead repeating what they did, not why).
Type of Effort Made to Protect Website Visitors' Privacy (Q4-3a)
No Personal Data: No personal data is collected.
Data Minimization: Only the necessary personal data is collected; data collection is as minimal as possible.
Self-Hosting: Services are self-hosted; all data stays within the respective organization.
3P Selection: Third-party services are carefully selected; there was a conscious decision for/against certain third parties.
3P Setting: Third-party services are configured in ways that increase privacy, e. g., by limiting the amount of collected data, encrypting data, etc.
User Consent: Users were informed that their data would be available to third parties and gave their consent to this data processing before the functionality was loaded.
Transparency: Privacy policies or similar information on data practices are available to users.
Data Access: Access to the data/server is limited; access is controlled.
Anonymization: Data is anonymized and cannot be used to identify certain individuals.
Security: Security practices to avoid known attacks or vulnerabilities (e. g., to avoid XSS) are in place, which increase privacy by decreasing the probability of data leaks.
Other: Other concrete reasons not covered by the codes above.
No answer: The participant did not provide an answer to the question, either by filling in nothing, something incomprehensible, or not providing an answer to the question.

Reasons to Protect Website Visitors' Privacy (Q4-3a)
Regulatory: Some regulatory framework, e. g., law or industry standards, mandates privacy protection measures.
Requirement: An unspecified requirement, e. g., by the customer, mandates privacy protection measures.
Self-Commitment: The participant applied privacy protection measures out of intrinsic motivation, without external influence.

Reasons Not to Protect Website Visitors' Privacy (Q4-3b)
No Data Collected: The solution does not collect any personal data, so there is no need for privacy protection.
Data Minimization: Only strictly necessary data is collected, so there was/is no need for privacy protection.
Self-Hosting: The service is self-hosted, and there is no need for additional measures as access is limited and no external services are involved.
Trust in 3P: Trust in the third party to employ adequate measures to protect visitors' privacy.
Impossible: Data collection cannot be controlled or limited; it is impossible to increase privacy.
Website Purpose: The website's purpose makes privacy protection unnecessary, e. g., because its main content is only accessible in a logged-in state.
Priorities: Functionality (by adding third-party services) has a higher priority than increasing privacy by avoiding these services.
Payoff: Privacy measures involve too much effort in terms of, e. g., workload, cost, or time.
Unnecessary: It is not necessary to increase privacy. Answers with this code include no explanation, but often indicate a lack of awareness, care, or external requirements.
Lack of Knowledge: Participants are not able to adjust settings due to, e. g., a lack of knowledge or skill with the service.
Other: Other concrete reasons not covered by the codes above.
No answer: The participant did not provide an answer to the question, either by filling in nothing, something incomprehensible, or not providing an answer to the question.