Mitigating Inference Risks with the NIST Privacy Framework

The NIST Privacy Framework describes itself as a comprehensive approach to organization-wide privacy program management. However, inferences can yield sensitive information about identities or attributes from nonsensitive information, and privacy governance must protect this information. Although many people and organizations are expanding their definitions of privacy to include inferences, our gap analysis reveals that the framework’s mapped controls are insufficient for managing inference-driven risk. The framework does not direct enough organizational attention to privacy inference risk to support its stated claim of comprehensive risk management. Applying the framework to past incidents in which ostensibly protected information was re-inferred, we analyze how organizations can better mitigate inference-based privacy violations. Finally, we recommend detailed improvements to the framework’s controls to better account for inferences. Our recommendations encompass augmenting and mapping additional privacy risk controls to increase implementing organizations’ awareness of inference risks, updating controls that depend on protecting specific PII categories, and enhancing organizations’ proficiency in translating legal and policy requirements into technical implementations.


INTRODUCTION
Organizations face substantial challenges in practical privacy risk management. Although privacy enhancing technologies (PETs) can support better privacy outcomes, many organizations struggle to identify the nature and scope of their privacy risk and, consequently, to apply risk controls (including PETs). Risk governance within organizations is often ad hoc and focused on legal compliance [2,85]. Beyond cybersecurity, few tools exist to structure risk assessment and mitigation into repeatable practices. The preeminent privacy-focused risk management tool is the U.S. National Institute of Standards and Technology (NIST) Privacy Framework [56]. We believe a risk-based approach to governance can improve substantive privacy outcomes. Through exemplary applications, however, we show that the current NIST Privacy Framework attends mostly to risks borne of unauthorized information flows and cybersecurity failures while under-attending to the problem of managing data inferences, a key problem on which the privacy research literature focuses. This gap limits both practical privacy risk management and the extent to which PETs can be brought to bear on practical problems.
Organizations face increasing levels of risk when using sensitive data. Despite the many legally mandated data protection practices, technocratic decisions, like placing a privacy team in an advisory role rather than an embedded role, hinder privacy efforts [85]. Such decisions impede organizational risk professionals and leaders from foreseeing compliance problems. The NIST Privacy Framework is meant to help organizations close this gap. How well does the framework outline a risk management program that identifies and mitigates privacy risks of all sorts, arising both for individuals and for organizations?
The problem of bringing privacy risk governance into practice is important because it drives data handling decisions during product design and deployment. Much privacy research has focused on the risk of re-identification of anonymized data, explicitly rejecting the popular notion that redacting personally identifiable information (PII) prevents re-identifying an individual in de-identified data [15,64,71]. As such, the community has largely refocused its efforts on provable guarantees and quantifiable metrics, either of formal indistinguishability properties like differential privacy [24] or of system-level information-hiding properties [33,84]. Privacy practitioners face the problem of how to avoid disclosing sensitive facts about individuals and organizations.
The NIST Privacy Framework [56] describes itself as a comprehensive approach to organization-wide privacy program management through enterprise risk management. It aspires to enable dialogue among executives, managers, and practitioners so as to organize, assess, plan, and execute a privacy program in any organization, customized to that organization's needs. The framework asserts that cybersecurity can exist without data privacy safeguards but that protecting privacy is not possible without effective cybersecurity [56]. To help privacy programs put these aims into practice, the framework maps categorized risks to controls [55]. However, cybersecurity and the framework's mapped controls fall short of sufficiently managing the risk of attackers inferring attributes [43], e.g., as in mosaic theory [4].
We analyze how well the NIST Privacy Framework identifies and controls the risks of sensitive data inferences by applying it to four inference incidents. To our knowledge, this is the first published work analyzing the effectiveness of the NIST Privacy Framework against real incidents of unexpected, sensitive inferences. In general, we find that the framework does not direct organizations' attention to enough of the inference risks that they should consider mitigating, depends too much on an obsolete practice (i.e., protecting privacy by protecting PII categories), and offers limited guidance for translating policies into technical implementations. We further make recommendations for improving the framework, focusing on its mapped controls. We consider inference risks to take one of two distinct forms and establish a taxonomy to describe them:
• Re-identification Inferences: associating anonymized data with the original individuals, regardless of the presence of so-called PII [64].
• Operational Inferences: inferring sensitive attributes about individuals and organizations.
Note that even with perfect disclosure controls to inhibit re-identification inferences, operational inferences unrelated to PII may still be possible, depending on the threat model [29,67].
After reviewing related work in §2, we consider in §3 how the NIST Privacy Framework provides organizations with a measure of their privacy risk management, especially regarding inferences. Our primary contribution, in §4, is to validate the framework's capacity to identify and mitigate inference risks by applying it to four incidents. In §5, we propose recommendations for how organizations could have disrupted these incidents' structural and systemic causes. Our recommendations to organizations and for improving the NIST Privacy Framework can help organizations better mitigate inference-based privacy violations. Finally, we conclude in §6.

RELATED WORK
Privacy has become too complex for an individual to handle [39].Instead, organizations must develop a comprehensive privacy risk governance program, and the NIST Privacy Framework exists to guide this effort.Definitions of privacy diverge, however, which complicates privacy protections [8,23,48].Because legal requirements often drive organizations' privacy programs, we first review how the law accounts for inferences and individuals' rights related to inferences made about them.We then consider the information industry's competing interests in inferring sensitive information for profit.Complementary to the framework, we examine how PETs and privacy enhancing techniques can help mitigate inferences.

Accounting for Sensitive Inferences in Law
Some notable U.S. privacy and privacy-related laws do not mention inferences, though we expect that newer privacy laws will incorporate them. Researchers have reported this shortfall in the Health Insurance Portability and Accountability Act (HIPAA) [79,82], the Children's Online Privacy Protection Act (COPPA) [42,80], and the Family Educational Rights and Privacy Act (FERPA) [81,89]. Because the authors of some cybersecurity and privacy controls mapped to the NIST Privacy Framework derived their content from laws [58], we explore how laws account for inferences, including individuals' rights. The impact of inference attacks as a privacy risk is more significant than the law's current treatment of them reflects. Of note, nearly all legal references to inferences that we found relate to re-identification, not operational inferences.
Holder [40] analyzes the GDPR and CCPA, the most progressive privacy laws available, for inferences and finds that these laws are ambiguous or contradictory, largely leaving aspects concerning inferences open to interpretation. He proposes that inferences drawn from seemingly innocuous data can be sensitive and should be protected by law, and furthermore that a hierarchy of harms determines the sensitivity of personal data [40], reinforcing that some privacy violation harms are intangible [75]. In other words, private data, whether inferred or not, needs protection to minimize the harm it can cause to people.
Wachter and Mittelstadt [83] recommend improving the GDPR to protect against inferences by establishing an individual's "right to reasonable inferences," i.e., a "right on how to be seen," as a complement to the "right to be forgotten." They examine whether inferences are personal data, which would entail the "right to know about, rectify, delete, object to, or port" [83], except that deletion can come with prohibitions on (re)creating new inferences. This approach risks merely adding another PII category rather than considering individuals' personal privacy, which we address in §5.3.
They also investigate organizations' facilitation of subject access requests, i.e., individuals' rights to their inferred personal data.If allowing individuals to access their inferences would reveal an organization's trade secrets or intellectual property, then Wachter and Mittelstadt recommend that organizations not be obligated to comply [83].On the other hand, Edwards and Veale [25] argue that the exception to divulging trade secrets or intellectual property should not be needed because a "right to an explanation" may not actually yield the expected result.Instead, they assert that the rights to deletion and porting are far more important.
As privacy laws evolve to incorporate inferences and individuals' rights pertaining thereto, organizations will need to adapt accordingly. As such, we highlight potential definitions and arguments discussed in our community and, as part of our recommendations in §5.4, include steps for enhancing organizations' proficiency in translating legal requirements into policy and technical implementations.

Inferences for Marketing
Although many business models depend on inferences, including determining creditworthiness, insurance risk, and suitability for matching with an employer, mate, etc., perhaps the most prolific field is marketing [49]. For example, Google [47], Facebook [7], Procter & Gamble [21], and other companies have yielded huge returns on their targeted advertising investments. While identifying people's potential shopping interests (based on inferences from their metadata or activities) is not typically a malicious act, there is consternation regarding whether society condones this type of inference [73,85]. After its reputation experienced a temporary dip, Target [38] and other organizations learned to conceal how well they can infer individuals' attributes. Nevertheless, organizations continue to direct their advertising at specific users because it is more effective than generalized advertising [7,73]. Although targeted advertising often involves proprietary methods and data that are never intended to be released beyond an organization's authorized analysts, organizations often share inferences with their contracted third-party analysts, circumventing sharing restrictions [44]. Our analysis shows that businesses depending on the NIST Privacy Framework may not be identifying and mitigating inference risks as well as expected.

Privacy Enhancing Technologies and Techniques
The NIST Privacy Framework's mapped controls recognize various PETs and privacy enhancing techniques to help in the practical work of mitigating inferences.Even so, de-identification is nontrivial, especially for mitigating inferences [50].Narayanan, Huey, and Felten [51] explore this challenge and present a "precautionary approach" to safeguarding privacy in data sets, positing that future re-identification capabilities may be unknowable.Nevertheless, technologies and techniques that place the burden of provable privacy on the data set owner may help in lasting ways for organizations desiring to protect their constituents' privacy.

Differential Privacy to De-identify Data. Differential privacy is one of the most prevalently discussed PETs today [19]. Garfinkel, Abowd, and Powazek [31] report on the U.S. Census Bureau's work in implementing differential privacy for the 2020 census, highlighting the challenges experienced. Surprisingly, their three recommendations for furthering the implementation of differential privacy, a technology, into the Bureau's work are not technical; instead, they relate to organizational communications, bridging management and technical personnel [31]. This parallels many of the cybersecurity and privacy controls in NIST Special Publication (SP) 800-53, which, like PL-8 Security and Privacy Architectures (described in §4.2.4), affect multiple levels of implementing organizations, not just technical aspects [58].
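To make the underlying idea concrete, the following is a minimal Python sketch of the textbook Laplace mechanism for a single counting query. It is not the Census Bureau's production system, which is far more elaborate; the query, the count of 42, the parameter values, and the function names `laplace_noise` and `dp_count` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale parameter.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a counting query with the Laplace mechanism.

    Adding or removing one person's record changes a count by at most 1
    (its sensitivity), so noise with scale sensitivity / epsilon gives
    epsilon-differential privacy for this single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: residents of one small block who report a given attribute.
rng = random.Random(2020)
print(round(dp_count(42, epsilon=0.5, rng=rng), 2))
```

A smaller ε yields larger noise and stronger privacy at the cost of accuracy, so choosing ε is a decision for management as much as for technical staff.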

Applying Contextual Integrity to Inferences. Nissenbaum explores the concept of privacy protection in the vein of contextual integrity (CI). By definition, CI "is preserved when information flows generated by an action or practice conform to legitimate contextual informational norms; it is violated when they are breached" [59]. Analysts identify each flow by the five-tuple {Subject, Sender, Recipient, Information Type, Transmission Principle}. By analogy to a food chain, primitive data are at the bottom of the chain but are elevated when consumed by analytic processes, becoming inferences. From inferences, analysts can form additional, higher-level inferences or predict the instantiation of primitive data. For example, Target inferred that a customer was pregnant by analyzing her shopping history (primitive data up to inference) and predicted which products she might consequently purchase to support her while pregnant (inference down to probable primitive data) [38]. Organizations' privacy programs should consider inferences' context from the subjects' perspectives so as to better protect individuals' privacy. Martin and Nissenbaum [46] use CI to explore the important question of privacy in public, which relates to our question because it helps conceptualize how people may support or oppose inferences because of their data source or destination. Investigating the preservation and violation of CI, they discover how people can have privacy concerns regarding publicly available records in some contexts (i.e., inappropriate flows) but not others (i.e., appropriate flows). In their study, participants express the greatest concern for privacy violations when the source of the information flows was data brokers. Participants perceive data brokers as having increased capability to "extract knowledge that is attractive to other stakeholders in various sectors" [46], that is, to infer sensitive information. That sensitive information can change hands without a direct, observable flow increases the threat vectors to privacy risk. Knowing this should prompt data brokers to guard better against privacy violations from the inferences that they generate.
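A CI flow can be sketched as a small data structure. In the following minimal Python sketch, the retail norms, role names, and the `conforms` and `infer` helpers are hypothetical, and checking a flow by exact match against a set of norms is a simplification of what is, in CI, a normative judgment. The sketch shows how an inference, modeled as a new flow carrying a derived information type, can violate norms that the original flow satisfied, using the Target example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow, described by the CI five-tuple."""
    subject: str
    sender: str
    recipient: str
    information_type: str
    transmission_principle: str


# Hypothetical norms for a retail context. Each norm permits one
# (sender, recipient, information type, transmission principle) combination.
RETAIL_NORMS = {
    ("customer", "retailer", "purchase history", "to complete a sale"),
    ("retailer", "payment processor", "payment details", "to complete a sale"),
}


def conforms(flow: Flow, norms: set) -> bool:
    """CI is preserved only if the flow matches an entrenched contextual norm."""
    key = (flow.sender, flow.recipient, flow.information_type,
           flow.transmission_principle)
    return key in norms


def infer(source: Flow, derived_type: str, recipient: str, principle: str) -> Flow:
    """Model an inference as a new flow about the same subject: the receiver of
    the primitive data sends onward a higher-level, derived information type."""
    return Flow(source.subject, source.recipient, recipient, derived_type, principle)


sale = Flow("customer A", "customer", "retailer", "purchase history",
            "to complete a sale")
inferred = infer(sale, "pregnancy status", "marketing team",
                 "to target advertising")
print(conforms(sale, RETAIL_NORMS))      # True
print(conforms(inferred, RETAIL_NORMS))  # False
```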

NIST PRIVACY FRAMEWORK
The ever-increasing frequency of privacy incidents raised people's awareness of their need for better privacy protections. With cybersecurity compliance alone proving insufficient, NIST published its Privacy Framework [56]. Much of the work in protecting individuals' and organizations' privacy involves cybersecurity, and many potential adopters were already using the NIST Cybersecurity Framework [53]. The Privacy Framework's authors acknowledge that they based its structure on the Cybersecurity Framework [56]. Beyond structure, the Privacy Framework's close relation to the Cybersecurity Framework is also evident in content: 57 of 100 privacy subcategories are the same as, or have a similar intended effect to, a cybersecurity subcategory [54], and both frameworks rely on the same list of mapped controls [55,58]. NIST acknowledges that checklist-like compliance requirements lend themselves readily to assessments that focus on compliance (not the framework's purpose) rather than on "achieving a positive outcome for privacy" [8]. Because inferences remain possible, the framework itself illustrates that privacy cannot be protected through checklist compliance.
While the NIST Privacy Framework [56] has achieved substantive goals for some adopters [57], our gap analysis reveals that the framework's mapped controls fall short in identifying and mitigating inference risks.Our recommendations to update the framework's mapped controls to account better for inferences will help organizations mitigate this risk.

Framework Structure and Implementation
The NIST Privacy Framework's guiding method for organization-wide privacy program management relies heavily on communication throughout the organization. It aspires to enable dialogue among executives, managers, and practitioners so as to organize, assess, plan, and execute a privacy program in any organization, customized to that organization's needs [56]. Because no statute or regulation requires adopting the framework, and because each organization applies it in its own way, the framework is unsuitable for privacy assessment and compliance. Beyond compliance, many organizations have still benefited from it [57]. Even so, we find that it is ineffective for helping organizations mitigate inference risks.
Organizations will be most effective in implementing any control (and, indeed, the whole framework) as a whole-of-organization effort, cognizant of interactions between the system and component levels. Protecting privacy is a sociotechnical endeavor; approaching it strictly from a technical standpoint will lead to failure [45]. For example, SA-8 (33) spans policy, personnel training, procedures, and technical components. Thus, a one-dimensional implementation could forgo the value that the control offers in its other dimensions.
The Privacy Framework [56] describes organizations' risk posture in terms of profiles, a core, and implementation tiers.The idea is for organizations to iterate through the framework's core subcategories in their profile, and select applicable mapped controls for each.They can self-assess their competency with the implementation tiers.
The framework encourages organizations to generate profiles to model their privacy activities and risk tolerance: one representing their current posture and at least one target profile modeling their desired privacy risk management goals.Organizations can use these profiles as a way to self-evaluate their requirements vis-à-vis posture.Each profile maps to the core's functions, categories, and subcategories.
The core embodies the highest levels of an organization's privacy activities as functions: identify, govern, control, communicate, and protect. Each function is divided into overarching, programmatic-level desired privacy outcomes called categories and then into technical- and management-level desired privacy outcomes called subcategories. Each subcategory maps to various controls in NIST SP 800-53 [58] that serve as suggestions for mitigating the risk to that subcategory.
Organizations can assess their own capability to manage privacy risk using the Framework's implementation tiers, a non-compulsory progression of privacy program management proficiency.The four tiers are partial, risk informed, repeatable, and adaptive [56].
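As a rough illustration of how current and target profiles support self-evaluation, the following Python sketch computes the gap between two profiles and groups it by core function. The profiles are hypothetical, and the two-letter function prefixes are assumed to follow the same pattern as CT (Control) in identifiers such as CT.DP-P3. The sketch ignores category-level outcomes and implementation tiers.

```python
# Assumed function prefixes used in Privacy Framework subcategory identifiers.
FUNCTIONS = {
    "ID": "Identify-P",
    "GV": "Govern-P",
    "CT": "Control-P",
    "CM": "Communicate-P",
    "PR": "Protect-P",
}


def function_of(subcategory_id: str) -> str:
    """Map an identifier such as "CT.DP-P3" to its core function."""
    return FUNCTIONS[subcategory_id.split(".", 1)[0]]


def profile_gaps(current: set, target: set) -> dict:
    """Group target-profile subcategories not yet met in the current profile."""
    gaps: dict = {}
    for subcategory in sorted(target - current):
        gaps.setdefault(function_of(subcategory), []).append(subcategory)
    return gaps


# Hypothetical profiles: the organization already processes data to limit
# identification (CT.DP-P2) but not yet to limit inferences (CT.DP-P3).
current_profile = {"CT.DP-P2"}
target_profile = {"CT.DP-P2", "CT.DP-P3"}
print(profile_gaps(current_profile, target_profile))
# {'Control-P': ['CT.DP-P3']}
```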
NIST first published SP 800-53 [58] in 2005 and updated it in 2020 to its current version, revision 5, which incorporates privacy controls, providing the framework with hundreds of security and privacy controls grouped into 20 "families," such as "Planning (PL)." Each family consists of base controls, each of which may have more specific control enhancements, identified by parenthetical suffixes to the control number. For example, AU-16(3) means "Audit and Accountability (AU) Control #16 (Enhancement #3)." A control enhancement's name has the format: base control name, the pipe character '|', and the enhancement name. AU-16(3)'s name is "Cross-Organizational Audit Logging | Disassociability" [58]. Control names are necessarily succinct and do not, by themselves, describe their controls sufficiently. NIST maps most framework subcategories to specific controls and control enhancements [55]. Because mapped controls are suggestions to help achieve the subcategory posture and some overlap in their effects, each adopting organization needs to determine which controls would contribute positively to its privacy goals. Selecting a control enhancement implies selecting its base control as well. Each control may list related controls and references to help practitioners choose the best controls. There is a balance between applying every mapped control to avoid deciding against one, which is simple but potentially wasteful [9], and failing to select an applicable control. The subjectivity necessary to implement the NIST Privacy Framework well makes it unsuitable as a compliance or certification framework.
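The identifier convention, together with the rule that selecting an enhancement implies selecting its base control, can be captured in a few lines. The following is a hypothetical Python helper, not a NIST tool; its regular expression covers only the identifier shapes that appear in this paper (a two-letter family, a control number, and an optional parenthesized enhancement, with or without a space as in SA-8 (33)).

```python
import re
from typing import NamedTuple, Optional

# Two-letter family, control number, optional "(enhancement)".
CONTROL_ID = re.compile(r"^([A-Z]{2})-(\d+)\s*(?:\((\d+)\))?$")


class ControlId(NamedTuple):
    family: str
    number: int
    enhancement: Optional[int]

    def __str__(self) -> str:
        base = f"{self.family}-{self.number}"
        return base if self.enhancement is None else f"{base}({self.enhancement})"

    @property
    def base(self) -> "ControlId":
        """The base control that an enhancement belongs to."""
        return ControlId(self.family, self.number, None)


def parse_control(text: str) -> ControlId:
    match = CONTROL_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not an SP 800-53 control identifier: {text!r}")
    family, number, enhancement = match.groups()
    return ControlId(family, int(number), int(enhancement) if enhancement else None)


def with_base_controls(selected) -> list:
    """Selecting an enhancement implies selecting its base control."""
    controls = {parse_control(c) for c in selected}
    return sorted(str(c) for c in controls | {c.base for c in controls})


print(with_base_controls(["AU-16(3)", "SA-8 (33)", "PL-8"]))
# ['AU-16', 'AU-16(3)', 'PL-8', 'SA-8', 'SA-8(33)']
```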

Inferences in the Framework
Guarding against inference incidents is a complex endeavor, yet the framework has only one subcategory that refers to inferences, whether by name or conceptually. It is also the framework's only mention of "inference": the Control (CT) Function, Disassociated Processing (DP) Category, Privacy Subcategory #3 (P3), or CT.DP-P3.
CT.DP-P3: Data are processed to limit the formulation of inferences about individuals' behavior or activities (e.g., data processing is decentralized, distributed architectures). [56]
This definition applies to operational inferences, dealing with attributes rather than re-identification. Because many initially think of re-identification as the inference risk to be mitigated, some may question whether the NIST Privacy Framework should incorporate operational inferences. Consider that page 4 of the framework explains that problematic data actions, which lead to exposure of private data, impact individuals "singly or in groups (including at a societal level)" [56]. Furthermore, the California Attorney General (AG) opines that inferences about identified individuals are personal data [5]. As such, to mitigate the risk of operational inferences, organizations must account for what others can infer about the individuals and groups whose data they process.
Subcategory CT.DP-P3 maps directly to nine controls and control enhancements, listed in Table 1 [55].Although most of these controls focus more on technical than organizational aspects, three of the mapped controls (PL-8, PM-7, and SA-17) pertain to high-level actions that management can employ to bolster the organization's privacy risk management activities and give practitioners greater authority to execute policy.
The framework addresses re-identification in subcategory CT.DP-P2 but focuses on data processing so as to "limit the identification of individuals" [56].It does not address inferences directly.Of its eight mapped controls, three also map from CT.DP-P3 (AC-23, SA-8 (33), and SI-19) [55].The other five controls relate to re-identification not necessarily involving inferences.

ANALYSIS
Past incidents demonstrate how organizations have suffered harms from inferences and reveal opportunities for learning how to mitigate them. We select four incidents that occurred before NIST published its Privacy Framework [56], knowing that the involved organizations could not have relied on the framework. This forms a baseline from which we evaluate how well the NIST Privacy Framework could have helped in mitigating inference risk. To understand the framework's value for mitigating inferences, we assess how well it could have helped mitigate the incidents described in §4.1. Extending our qualitative analysis, we find in §4.2 that the framework is largely insufficient to help organizations mitigate inference risk in general.

Inference Incidents
We refer to our four incidents as the EdX, NYC Taxi, Strava Heat Map, and Pizza Index incidents. Our taxonomy categorizes inference incidents as either re-identification or operational incidents. We considered all of the following incidents but chose those in §§4.1.1-4.1.4 to convey the essential incident structure for each inference type, enabling their generalizability to other inference incidents. The limitations of our selection process include the lack of an exhaustive search for inference incidents, including those not documented publicly, and the possibility of edge cases not covered.
Table 1: Controls from NIST SP 800-53 [58] mapped [55] to Privacy Framework [56] subcategory CT.DP-P3, and our summaries thereof. Adapted from [58]. Each control number is serialized, for example, AU-16 (3)
The New York City (NYC) Taxi and Limousine Commission (TLC) passenger re-identification incident in §4.1.2 exemplifies attackers' ability to combine anonymized data with publicly available information from sources beyond the affected organization's control. Here, stalkers can map out patterns of life without ever being near the victim. This combination of information follows the re-identification inference pattern of the Netflix Prize [52], AOL search history [3], and Massachusetts Group Insurance Commission [70] data set incidents. Our other re-identification incident, EdX in §4.1.1, involves how a future employer could use EdX course performance as a hiring discriminator.
Of our operational inference incidents, the Strava incident in §4.1.3 demonstrates how an aggregation of decisions about protecting nonsensitive data can enable sensitive data inferences, a problem identified in the access control and privacy literature for decades [64,66]. Similar inferences from smart meter activity (inferring providers' potentially unjust controlled blackouts) were possible because an electric company failed to protect sensitive data [76]. Similar to the Pizza Index incident in §4.1.4 (in which Domino's inferred attributes of the Department of Defense (DoD) and its members), the Cambridge Analytica scandal involved inferring Facebook users' political affinity (an attribute), resulting in penalties for Facebook [27,44]. Note that the victim is often not the primary organization involved in operational inferences.
4.1.1 Re-identification Inferences: MIT and Harvard EdX Data. In 2014, the Massachusetts Institute of Technology (MIT) and Harvard published a data set of student online course performance on their EdX platform. They adopted an anonymization approach using quasi-identifier (QI)-based de-identification to achieve k-anonymity, facially consistent with the requirements of FERPA. Nevertheless, Cohen [14] was able to re-identify anonymized records by executing downcoding and predicate singling-out (PSO) attacks on the high-dimensional data set. He further showed that individuals without specialized education, experience, or tools, including "a prospective employer, a casual acquaintance, and an EdX classmate," could re-identify students' records [14].
Cohen speculated that the EdX experts de-identified the data set one attribute after another, followed by selective row deletion. This "disjointed," rather than atomic, de-identification process degraded the initially established k-anonymity [14]. Furthermore, the widespread technique of applying QI-based de-identification to high-dimensional data sets can preserve enough information to enable re-identification, whereas differential privacy techniques can overcome this [15]. If FERPA's authors had incorporated re-identification inference threats, or if the EdX data set de-identification had used an atomic rather than a disjointed process, re-identification would have been much more difficult.
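The following toy Python sketch, with hypothetical records and column names, illustrates how a disjointed process can degrade k-anonymity. A table generalized to 2-anonymity on its quasi-identifiers becomes 1-anonymous after a later row deletion that does not re-check them, and a single predicate then singles out one student's record. It does not reproduce Cohen's downcoding attack, and singling out is simplified to one predicate matching exactly one row; the hypothetical `release` helper shows the atomic alternative of verifying k on the final table.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values()) if classes else 0


def singled_out(rows, predicate):
    """A predicate singles out a record if exactly one row satisfies it."""
    matches = [row for row in rows if predicate(row)]
    return matches[0] if len(matches) == 1 else None


def release(rows, quasi_identifiers, k):
    """Atomic check: verify k-anonymity on the exact table being released."""
    achieved = k_anonymity(rows, quasi_identifiers)
    if achieved < k:
        raise ValueError(f"release is only {achieved}-anonymous, need {k}")
    return rows


QIS = ("birth_decade", "country")
# Hypothetical, already generalized course records: 2-anonymous on QIS.
generalized = [
    {"birth_decade": "1980s", "country": "US", "grade": 0.9},
    {"birth_decade": "1980s", "country": "US", "grade": 0.4},
    {"birth_decade": "1990s", "country": "IN", "grade": 0.7},
    {"birth_decade": "1990s", "country": "IN", "grade": 0.2},
]
print(k_anonymity(generalized, QIS))  # 2

# A later, separate step deletes one row without re-checking the QIs.
# release(after_deletion, QIS, 2) would raise ValueError at this point.
after_deletion = generalized[:3]
print(k_anonymity(after_deletion, QIS))  # 1
victim = singled_out(after_deletion, lambda r: r["country"] == "IN")
print(victim["grade"])  # 0.7
```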
4.1.2 Re-identification Inferences: NYC Taxi Data. In response to a Freedom of Information Law request, the NYC TLC released its taxi data from 2013 [20]. The TLC anonymized the data by hashing the PII fields, including driver license and medallion numbers. Because these fields have a small, finite range of possible values, a dictionary attack quickly reversed the hashes. From an inference perspective, though, analysts can re-identify passengers and infer patterns of life, whether favorable, benign, or unbecoming, by combining the data with information that is publicly available or otherwise known. Published examples of privacy inference re-identification include stalking celebrities and re-identifying a man who visited multiple strip clubs [77].
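The following Python sketch shows why hashing a field with a small value space does not anonymize it. The medallion format (one digit, one letter, two digits) and the example value are hypothetical, and MD5 stands in for any unsalted, keyless hash. A keyed hash (HMAC) defeats the precomputed table as long as the key stays secret.

```python
import hashlib
import hmac
import itertools
import string


def md5_hex(value: str) -> str:
    """Unsalted MD5: deterministic and keyless, so anyone can recompute it."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def candidate_medallions():
    """Enumerate every ID in a hypothetical format: digit, letter, digit, digit."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


# The attacker hashes the whole (small) value space once: 10*26*10*10 = 26,000.
lookup = {md5_hex(m): m for m in candidate_medallions()}

released = md5_hex("7B42")  # what a hashed release would publish
print(lookup[released])     # 7B42


def keyed_pseudonym(value: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): without the key, no lookup table can be built."""
    return hmac.new(secret_key, value.encode("ascii"), hashlib.sha256).hexdigest()


print(keyed_pseudonym("7B42", b"hypothetical-secret") in lookup)  # False
```

A keyed pseudonym is still consistent across records, so it continues to link all of one driver's trips; it stops the dictionary attack but not the pattern-of-life inferences described here.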
Beyond the shortfall in de-identifying the data, which exhibits a disparity between the law and technical compliance [28,60], the opportunity for inferences poses a greater risk to individuals and a greater liability to the TLC. Tockar [77] demonstrated how differential privacy techniques could have helped prevent inference-based re-identification. However, Douriez et al. [20] demonstrated that differential privacy is insufficient after transforming the taxi data set into a moving object database of trajectories. Similar re-identification opportunities exist today [10,18,30].
4.1.3 Operational Inferences: Strava Heat Map. The Strava fitness app collects shared geolocation tracking data from its users and aggregates them to generate a global heat map of workout locations, with brightness proportional to popularity [37], enabling operational inferences. Only days after Strava published its heat map in 2018, people noticed stark workout paths in the Afghanistan wilderness. An easy inference identifies them as coalition forward operating bases, inhabited by deployed military personnel. If the bases are at well-known locations, this inference reveals no new information. On the other hand, the heat map also helped reveal a covert Central Intelligence Agency (CIA) site near Camp Lemonnier in Djibouti, an inference that could be fatal to those agents.
Strava de-identified the data for its heat map by aggregation. An analyst looking only at the heat map would be hard-pressed to re-identify any individual. Nevertheless, many considered some of the inferences that observers drew from the heat map to be privacy violations [37]. Strava's position was that its heat map included only tracked workouts that users had shared with it, in accordance with each user's privacy settings. Strava also noted that users could opt out of sharing [37], thus blaming its users for revealing covert military and paramilitary bases. In other words, individuals' behaviors unexpectedly harmed their organizations, and Strava shifted responsibility for that harm back onto the individuals' inadvertent disregard for their organizations' operational security (OPSEC) policies, which might not have been specific enough to this risk. Still, perhaps the best place to limit the spread of sensitive information is at its source. Many communities use OPSEC principles, but the U.S. Government (USG), especially the DoD, has the most explicit publicly available program policy [67].
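The following Python sketch, using hypothetical coordinates, grid size, and threshold, illustrates why aggregation alone does not prevent operational inferences. It assumes a safeguard of suppressing grid cells with fewer than three distinct users, which is not necessarily what Strava did. The safeguard hides the lone hiker, but the remote site, visited by several users, survives suppression and stands out precisely because no published activity surrounds it.

```python
import math
from collections import defaultdict

CELL_DEGREES = 0.01  # hypothetical grid resolution


def cell_of(lat: float, lon: float) -> tuple:
    """Grid cell containing a coordinate."""
    return (math.floor(lat / CELL_DEGREES), math.floor(lon / CELL_DEGREES))


def heat_map(points, min_users: int) -> dict:
    """Count distinct users per cell, suppressing cells with too few users."""
    users = defaultdict(set)
    for user_id, lat, lon in points:
        users[cell_of(lat, lon)].add(user_id)
    return {cell: len(ids) for cell, ids in users.items() if len(ids) >= min_users}


def isolated_hotspots(heat: dict) -> list:
    """Published cells with no published neighbor: activity with no context."""
    isolated = []
    for (i, j) in heat:
        neighbors = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di, dj) != (0, 0)]
        if not any(n in heat for n in neighbors):
            isolated.append((i, j))
    return isolated


# Hypothetical shared workouts as (user, latitude, longitude).
points = [(f"runner{u}", 40.005 + 0.01 * k, -73.995)  # busy city park path
          for u in range(10) for k in range(3)]
points += [(f"soldier{u}", 31.5055, 64.1055) for u in range(5)]  # remote site
points += [("hiker", 32.0055, 65.0055)]  # a lone user, suppressed below

heat = heat_map(points, min_users=3)
print(isolated_hotspots(heat))  # [(3150, 6410)]
```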
4.1.4 Operational Inferences: Pentagon Pizza Orders. In 1998, the Washington Post interviewed the owner of 59 Domino's franchises in the DC area. He had established the "Washington Pizza Index," in which the number of pizza orders correlated directly with the level of USG activity [69], enabling operational inferences. Furthermore, based on the franchise's area, he could conjecture whether the pizza was headed for the White House, congressional offices, or the Pentagon. Although this incident is relatively old, we select it because of its straightforward analysis and the availability of relevant public information. As of the date of the interview, the record for the greatest number of pizzas ordered was held by people working in the Pentagon during the Persian Gulf War [69].
During a war, one might expect the DoD to work longer hours, so an unusually high volume of pizza orders may be normal. However, extra hours devoted to planning in the days leading up to a major operation may reveal too much, potentially jeopardizing the operation by signaling its imminence to adversaries. Likewise, a corporate push toward the launch of a new product could create similar inference opportunities. Nor are the signals limited to food orders: parking lot occupancy, employee shuttle activity, and similar indicators can likewise enable unwanted inferences. These operational inferences are a matter of OPSEC. As with the Strava incident, it may be best to limit the spread of sensitive information at its source.
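A Pizza Index style inference needs little more than a trailing baseline. The Python sketch below, with made-up nightly order counts, an assumed 14-day baseline window, and an assumed threshold of three standard deviations, flags the days on which orders spike far above recent norms. The same few lines would apply equally to parking lot occupancy or shuttle ridership counts, which is why such signals fall under OPSEC.

```python
from statistics import mean, stdev

def flag_surges(daily_orders, baseline_days=14, threshold=3.0):
    """Return indices of days whose order count exceeds the mean of the
    preceding `baseline_days` days by more than `threshold` standard deviations."""
    flagged = []
    for day in range(baseline_days, len(daily_orders)):
        window = daily_orders[day - baseline_days:day]
        mu, sigma = mean(window), stdev(window)
        if sigma > 0 and (daily_orders[day] - mu) / sigma > threshold:
            flagged.append(day)
    return flagged

if __name__ == "__main__":
    # Hypothetical nightly order counts for one delivery area.
    orders = [20, 22, 19, 21, 20, 23, 18, 20, 21, 22, 19, 20, 21, 20, 21, 60, 20]
    print(flag_surges(orders))
```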
4.1.5 Lessons Observed. The themes that would have better mitigated re-identification inferences were largely technical, whereas those that would have better mitigated operational inferences were largely sociotechnical. To identify and mitigate inference risks, one must understand them. In aggregate, these observations suggest that the framework's authors likely did not consider such inference risks. The following factors contributed to the re-identification of people in de-identified data:
• retaining too many QIs (EdX, Netflix, MA Health)
• an inability to translate legal requirements unambiguously into technical implementations (EdX, NYC)
• disjointed, rather than atomic, de-identification (EdX)
• a lack of understanding of de-identification methods (NYC, AOL)
Sociotechnical aspects, including human behavior and configuration decisions, were the primary contributors to enabling operational inferences. Note that the harmed organizations (coalition forces, CIA, and DoD) in these operational inference incidents were involuntarily harmed by the behavior of their members (deployed and headquarters forces) and of third-party organizations (Strava, Domino's, and the Washington Post). Both incidents were unexpected OPSEC failures. The common root of these observations is a failure to think through the potential consequences of actions, including:
• risks of sensitive inferences from personal data were not identified (all operational inferences)
• interaction of design choices with end users' defaults concealed risks (all operational inferences)
• irregular behavior in aggregate spurred actions to reveal sensitive information (Strava, Pizza, Smart Meter)
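The first lesson, retaining too many QIs, can be checked before release. As a minimal sketch, assuming records held as Python dictionaries with hypothetical column names loosely modeled on the fields EdX retained, the function below groups records by a chosen set of QIs and returns the smallest group size (the k of k-anonymity) together with the records that are unique on those columns. A passing check is necessary but not sufficient: it says nothing about external data an attacker might link against.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return (k, unique_records) for a list of dict records.

    k is the size of the smallest group of records sharing the same values on
    the quasi-identifier columns; unique_records are those alone in their group.
    """
    if not records:
        return 0, []

    def qi_key(record):
        return tuple(record[q] for q in quasi_identifiers)

    group_sizes = Counter(qi_key(r) for r in records)
    k = min(group_sizes.values())
    unique_records = [r for r in records if group_sizes[qi_key(r)] == 1]
    return k, unique_records

if __name__ == "__main__":
    # Hypothetical learner records; column names are illustrative only.
    learners = [
        {"country": "US", "gender": "f", "year_of_birth": 1990, "education": "b"},
        {"country": "US", "gender": "f", "year_of_birth": 1990, "education": "m"},
        {"country": "US", "gender": "m", "year_of_birth": 1985, "education": "b"},
        {"country": "US", "gender": "m", "year_of_birth": 1985, "education": "b"},
    ]
    print(k_anonymity(learners, ["country", "gender"])[0])
    k, unique = k_anonymity(learners, ["country", "gender", "year_of_birth", "education"])
    print(k, len(unique))
```

Adding columns to the QI set can only split groups further, so each retained field should be justified against the k it costs.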

Framework Effectiveness via Controls Mapped for Inferences
This section assesses how effectively NIST Privacy Framework subcategory CT.DP-P3, and the SP 800-53 controls to which it maps, mitigate the risks involved in the inferences described in §4.1. Table 1 lists these controls and their enhancements. For each control, we assess whether it or its enhancements apply and would have helped perceive and mitigate risk. By "applicable," we mean that if the control had been implemented properly, it would have meaningfully contributed to mitigating risk in that incident. If the control is applicable, we estimate its degree or likelihood of implementation. If we find that it was partially or fully implemented, or likely implemented, we analyze why it fell short in mitigating the inference. We summarize our findings in Table 2.
To generate its heat map, Strava executed an aggregate query on its database, so none of the PETs suggested by this control would have helped prevent the operational inferences. Arguably, aggregation did reduce data mining opportunities in the heat map, but operational inferences were still possible. Similarly, for the Pizza Index, management deliberately disclosed the inference publicly during a newspaper interview [69]. This control is therefore not applicable to operational inferences.
AC-23 has no control enhancements.
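Differential privacy recurs throughout this section, so a concrete form may help. The sketch below is the Laplace mechanism for a counting query, written in Python for illustration only: it assumes each person changes a count by at most 1 (sensitivity 1) and adds noise with scale 1/ε. The ε value and the cell counts are hypothetical, and Python's `random` module is not suitable for a real deployment, which should rely on a vetted library that handles floating-point subtleties. Noise of this size barely perturbs a count in the hundreds, consistent with the assessment above: such a mechanism limits what can be learned about any one contributor, but it would not hide that a remote location is busy.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale): the difference of two i.i.d. exponential
    variables with mean `scale` is Laplace-distributed with that scale."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: release a count with epsilon-differential privacy,
    assuming one person changes the true count by at most `sensitivity`."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(2018)  # seeded only so the demo is repeatable
    # Hypothetical distinct-user counts for two heat-map cells.
    for cell, count in {"remote_cell": 500, "quiet_cell": 2}.items():
        print(cell, round(noisy_count(count, epsilon=1.0, rng=rng), 1))
```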

4.2.2 AU-16(3): Cross-Organizational Audit Logging | Disassociability. AU-16 focuses primarily on protecting users' identities when coordinating auditing with external organizations, for example by linking tables through blinded keys to disassociate logs from their data.
Although the principles of this control's disassociability enhancement could apply, no organization implemented it, and auditing does not specifically pertain to the incidents in §4.1. As with AC-23, privacy-enhancing cryptography (any of several schemes that enable data processing without revealing the data [1,6]) may have helped mitigate the re-identification inference incidents. Specifically, private information retrieval [72] techniques may have helped in both the EdX and taxi incidents if, for example, researchers had been offered a query interface in lieu of the public release of the whole "anonymized" data sets. This control would not have helped mitigate the operational inferences, so it is not applicable to them.
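Private information retrieval can be hard to picture. One classic construction is the two-server, information-theoretic XOR scheme due to Chor et al., sketched below in Python under the assumption that two non-colluding servers hold identical copies of a database of equal-length byte records. The client sends each server a subset of indices that, on its own, is uniformly random; XORing the two answers yields record i, while neither server alone learns i. The records and function names are hypothetical. Note that PIR hides which record is requested; limiting what a researcher can learn from the records returned would still depend on complementary controls.

```python
import secrets

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, subset):
    """Server side: XOR together the records whose indices are in `subset`."""
    answer = bytes(len(db[0]))  # all-zero bytes, the XOR identity
    for index in subset:
        answer = xor_bytes(answer, db[index])
    return answer

def make_queries(n, i):
    """Client side: a uniformly random subset of range(n) for server 1 and the
    same subset with index i toggled for server 2. Each is uniform on its own."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference flips whether i is included
    return s1, s2

def retrieve(server1_db, server2_db, i):
    """Fetch record i; records in both subsets cancel, leaving only record i."""
    s1, s2 = make_queries(len(server1_db), i)
    return xor_bytes(server_answer(server1_db, s1), server_answer(server2_db, s2))

if __name__ == "__main__":
    records = [b"rec0", b"rec1", b"rec2", b"rec3"]  # hypothetical equal-length rows
    print(retrieve(records, records, 2))
```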

4.2.3 IA-8(6): Identification and Authentication (Non-Organizational Users) | Disassociability. IA-8 focuses primarily on interactions between an organization's information systems and its external users. Because each incident involved publicly released data, non-organizational users did not authenticate to access the data. The base control therefore does not apply, though the principles of IA-8(6) still could.
Interestingly, each of the organizations in the inference incidents partially implemented a principle of this control, namely, seeking to "make identity attributes less visible to transmitting parties" [58], but their de-identification attempts were either ineffective at, or not applicable to, mitigating the inferences. This is further evidence of the ineffectiveness of protecting privacy solely by PII categories. In the EdX data, the QIs remaining after de-identification contributed to analysts' re-identification inference capability. For the taxi data set, the ability to combine publicly available information with the released de-identified data enabled passenger re-identification. Strava's and the Pizza Index's (perhaps inadvertent) disassociation of individuals' identities through aggregation, akin to striving for k-anonymity with a sufficiently large k, may be an application of a principle of this control, but it was not applicable to, and could not have mitigated, the subsequent operational inferences.
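The taxi re-identification follows a generic linkage pattern: join released records with an outside source on attributes both share. The Python sketch below is illustrative only; the field names, the coarse location labels, the one-minute tolerance, and the rows are hypothetical rather than drawn from the TLC release. When exactly one released trip matches a public sighting, such as a dated photo of a named person entering a cab, the pseudonymous trip, including its drop-off and fare, becomes attributable to that person.

```python
from datetime import datetime, timedelta

def link(released_trips, public_sightings, tolerance=timedelta(minutes=1)):
    """Match de-identified trips to public sightings on pickup location and time.

    A sighting that matches exactly one released trip re-identifies that trip.
    """
    matches = []
    for sighting in public_sightings:
        candidates = [
            trip for trip in released_trips
            if trip["pickup_loc"] == sighting["loc"]
            and abs(trip["pickup_time"] - sighting["time"]) <= tolerance
        ]
        if len(candidates) == 1:
            matches.append((sighting["name"], candidates[0]))
    return matches

if __name__ == "__main__":
    # Hypothetical released trips (no names) and one hypothetical public sighting.
    trips = [
        {"pickup_time": datetime(2013, 7, 8, 19, 12), "pickup_loc": "A", "dropoff_loc": "X", "fare": 12.5},
        {"pickup_time": datetime(2013, 7, 8, 19, 40), "pickup_loc": "A", "dropoff_loc": "Y", "fare": 8.0},
        {"pickup_time": datetime(2013, 7, 8, 19, 12), "pickup_loc": "B", "dropoff_loc": "Z", "fare": 20.0},
    ]
    sightings = [{"name": "Person P", "time": datetime(2013, 7, 8, 19, 12, 30), "loc": "A"}]
    for name, trip in link(trips, sightings):
        print(name, "->", trip["dropoff_loc"], trip["fare"])
```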

4.2.4 PL-8: Security and Privacy Architectures. NIST SP 800-53 describes security and privacy architectures as system-level manifests comprising three components: requirements for protecting organizational information and PII, a description of how the architectures support the enterprise architecture (explained in control PM-7 in §4.2.5), and assumptions regarding the system and its dependencies.
Here, the presence of an architecture proved insufficient to mitigate the inferences, indicating organizational difficulty with translating intent into policies and policies into technical implementations.
For re-identification inferences, the data controller's architecture protects the data. While a private organization's system-level security and privacy architecture may not be publicly available, we found that EdX had a privacy policy dated February 6, 2012, that was still in effect when the 2013 academic year began [26], the year from which the EdX data set was drawn. The policy comprehensively covered the entire student experience, including that private data may be made available publicly "to the extent permitted by FERPA" [26]. As such, EdX likely had some type of system-level security and privacy architecture from which it derived this privacy policy, indicating apparent compliance, yet re-identification inferences were still possible. Similarly, NYC law governed the TLC's handling of PII. Specifically, chapter 5 of title 10 of the administrative code [12] defines PII and lists requirements for publicly disclosing that a security breach occurred; however, in 2013, there was no statutory requirement for city agencies to have privacy policies or architectures, nor is there public evidence that the TLC had them. Therefore, the TLC likely had some awareness of its need to protect individuals' privacy but likely did not have a security and privacy architecture. This analysis points to a gap in translation. Without additional internal details from these organizations, we speculate that the breakdown occurred either in translating intent into policies, which would indicate that the policies were insufficient, or in translating policies into technical implementations.
For operational inferences, PL-8 differs from the other controls described thus far because the relevant architecture is the victim organization's rather than the data controller's. In the cases of both Strava and the Pizza Index, note that the U.S. DoD (generalized here from the "Coalition Forces" that would have been in Afghanistan in 2018) was the affected party, not Strava, the data controller, or any individual. The DoD has both a system-level security architecture [74] and a privacy program [63], but neither has a role in preventing the inference of sensitive information, which the OPSEC program [78] governs. Apparently, the DoD OPSEC program did not translate these types of operational inference risks into guidance that DoD employees took into account.
Although subcategory CT.DP-P3 maps to the PL-8 base control, framework adopters may choose to implement either of PL-8's two control enhancements:
(1) Defense in Depth. This enhancement focuses on applying a defense-in-depth posture [34] while developing and administering security and privacy architectures. Because the EdX and NYC Taxi data sets were made public, i.e., released outside an organization's control, layering additional defenses would not have mitigated these inferences. In the Strava and Pizza Index incidents, however, defense in depth could have informed an OPSEC program through additional safeguards that limit the signals coming from these DoD sites.
(2) Supplier Diversity. This enhancement focuses primarily on addressing the potential issues of a monoculture [86] and would not have applied in these incidents because a monoculture did not contribute to them.
4.2.5 PM-7: Enterprise Architecture. In contrast with PL-8, PM-7 is a higher-level control that encompasses security and privacy, among a host of other organizational concerns, integrating systems that might each have their own security and privacy architecture. In these incidents, the presence of an architecture proved insufficient to mitigate inference risk, indicating organizational difficulty with translating policies into technical implementations. As with a system-level security and privacy architecture, the existence of an organization's enterprise architecture may not be public knowledge. EdX did not indicate on its website that it had an enterprise architecture, but it probably had some type of overarching business model that would meet the spirit of this control. A 2016 job posting on NYC's website enabled us to infer that the city has an enterprise architecture [13], though, because the city is a large and disparate organization, this may be a miscategorization. Even so, without access to these architectures, we are unable to determine how well they account for privacy risks. In contrast, the DoD, the generalized victim of the operational inference incidents, has a publicly available enterprise architecture [11]; however, it is not actually a specific architecture but a framework for developing architectures within the DoD [17]. It mentions risk management and guiding security and information assurance requirements but not privacy [11], so we gauge this as a partial implementation. Without additional details from these organizations, we can only speculate whether the breakdown occurred in translating intent into policies or in translating policies into technical implementation.
"Offloading" is PM-7's sole control enhancement.It recommends that organizations move all non-essential supporting functions to separate the functions from critical systems and data, applying a "least authority" principle [68].There is also an implied assumption that the component or contracted organizations performing these functions would be experts therein and thus have better quality and more efficient security and privacy safeguards specific to those functions, thus decreasing the likelihood of privacy violation.While the degree to which EdX offloads non-essential supporting functions is unclear, governments typically delegate or contract many functions so it is likely that both NYC and DoD offloaded functions to some degree.For all of the above, there is no indication that offloading contributed to the incidents.4.2.6 SA-8(33): Security and Privacy Engineering Principles | Minimization.SA-8 and its enhancements serve as an avenue through which organizations can incorporate security and privacy engineering principles.It also applies a lens by which the framework reminds its adopters to operationalize these principles at all stages of a system's life cycle.SA-8 (33) focuses on minimizing PII, a concept we discuss further in §5.3.
The EdX case demonstrates how protecting privacy via confidentiality fails. EdX collected students' level of education, gender, and year of birth, each as an optional field, and inferred a country of residence from each user's public Internet Protocol (IP) address [14]. According to its privacy policy [26], EdX uses these data for at least nine different purposes, nearly all of which relate to some form of data analysis. Because they were optional, none of these PII fields mattered for enrolling in and completing a course, the core function of the EdX platform. As such, EdX did not apply this security control. These extra PII fields enabled re-identification inferences by adding QIs that could be combined [14].
The NYC TLC partially implemented SA-8(33): its data contained unnecessary PII fields for drivers but not for passengers. These fields enabled analysts to learn the annual income of re-identified taxi drivers [20]; however, minimizing passenger PII proved ineffectual at preventing re-identification because publicly available information could be combined with the precise time and location data in the data set. Interestingly, unlike normal taxi service, app-based ride-share services (e.g., Lyft and Uber) require passenger identification to support payment and "other purposes." For Strava, minimization via aggregation did not prevent operational inferences [37]. Even without PII present in or contributing to the heat map, operational inferences were still possible because of publicly available information about current events [36]. On the other hand, there is no indication that Domino's retained its customers' PII beyond transaction completion, and minimizing such PII would not have mitigated the operational inferences anyway. For the other three incidents, however, the ability to combine publicly available information with the de-identified data sets enabled these inferences.
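Minimization in the sense of SA-8(33) becomes mechanical once each processing purpose declares the fields it needs. The Python sketch below assumes a hypothetical purpose-to-field allowlist loosely modeled on the EdX fields discussed above; any field that no declared purpose requires (here, gender, year of birth, and education level) is dropped before storage. As the taxi and Strava cases show, this reduces PII but does not by itself prevent inferences from the fields that remain.

```python
# Hypothetical allowlist: the fields each declared processing purpose needs.
PURPOSE_FIELDS = {
    "enrollment": {"user_id", "email", "course_id"},
    "completion": {"user_id", "course_id", "grade"},
}

def minimize(record, purposes):
    """Return a copy of `record` keeping only fields some declared purpose needs."""
    allowed = set().union(*(PURPOSE_FIELDS[p] for p in purposes))
    return {field: value for field, value in record.items() if field in allowed}

if __name__ == "__main__":
    # Hypothetical sign-up record including optional demographic fields.
    signup = {"user_id": 7, "email": "student@example.org", "course_id": "C101",
              "grade": None, "gender": "f", "year_of_birth": 1990,
              "education": "bachelor's"}
    print(minimize(signup, ["enrollment", "completion"]))
```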

4.2.7 SA-17: Developer Security and Privacy Architecture and Design. This control guides adopting organizations in setting requirements for external system developers. Since we have no indication that any of the organizations in the incidents in §4.1 employed or contracted external developers, this control is not applicable to these cases. Instead, control PL-8, analyzed in §4.2.4, applies to internal developers.
4.2.8 SC-2(2): Separation of System and User Functionality | Disassociability. SC-2 focuses primarily on separating user-level functionality from system- or privileged-level functionality. Given the public nature of the data sets and inferences in the incidents, this base control would not have applied. The principles of SC-2(2) could still apply, though. For the one organization that appeared to have implemented this control enhancement (i.e., EdX), the remaining quasi-identifiers (QIs) undermined its effectiveness at preventing inferences.
For re-identification inferences, EdX's privacy policy [26] clearly communicated that it tracks users' interactions with the website; however, course enrollments and the number of forum posts per course (both QIs) were the only interaction data included in the data set [14]. In other words, we suspect that EdX did not normally separate its users' interaction data (so as not to hinder internal data analysis) but did so partially to release a de-identified data set. The NYC taxi data controllers did not implement this control. In reaching this conclusion, we considered drivers' system interaction state information to be the association of each specific driver with each route completed.
For operational inferences, Strava's data aggregation to generate the heat map removed all users' state information from the data, but this had no effect on preventing the revelation of clandestine locations. Thus, SC-2(2) is not applicable to the Strava operational inference. Had the DoD OPSEC program alerted Pentagon employees to the risk, they could have omitted information from their pizza orders (i.e., the delivery address) by ordering take-out or dining in, especially at a Domino's farther away than the one closest to the Pentagon. Under this practice, the Domino's franchise owner would have faced a greater burden in associating the orders with the Pentagon.
4.2.9 SI-19: De-Identification. EdX, NYC, and Strava each partially implemented SI-19 but, as previously explained, de-identified data insufficiently to prevent inferences. On the other hand, the Domino's franchises' owner de-identified his customers but not their association with the USG, which, in isolation, is not necessarily a sensitive correlation. Arguably, though, exposing this operational inference was the whole purpose of his interview with the Washington Post [69].
Although subcategory CT.DP-P3 maps to the SI-19 base control, framework adopters may choose to implement any of SI-19's eight control enhancements:
(1) Collection. De-identify the data a priori by limiting collection to only the necessary fields. We analyze this concept of minimization under control SA-8(33) in §4.2.6.
(2) Archiving. To protect data stored long-term, this control enhancement urges de-identifying data before archiving them, so that private data whose intended utility was temporary need not be, and will not be, archived. Data archiving did not play a role in the presented incidents because the goal was data release.
(3) Release.De-identifying data before releasing it outside of the organization is the fundamental goal; however, the presented incidents demonstrated that it is a challenging endeavor.See also our discussion of disassociability and control IA-8(6) in §4.2.3.
(4) Removal, Masking, Encryption, Hashing, or Replacement of Direct Identifiers. In the EdX data set, fields that could serve as direct identifiers were either removed or replaced prior to its public release [14], but enough information remained to serve as QIs. The NYC TLC hashed direct identifiers but did not key or salt the hash as the control recommends, which resulted in the re-identification of taxi drivers [77]. Strava removed direct identifiers through aggregation. We analyze the Pizza Index de-identification from the Domino's perspective with this base control and from the Pentagon perspective with control SC-2(2) in §4.2.8.
(5) Statistical Disclosure Control.This control enhancement applies to situations in which multiple versions of a data set enable analysts to infer specific attributes because of changes from one version to the next.For example, an analyst could capture images of the Strava heat map at various times to create a longitudinal data set and make inferences.For the other incidents, this control is not applicable because only one version of each data set was released.
(6) Differential Privacy. This is one of the most promising de-identification PETs for mitigating re-identification inferences. We briefly present some of the foremost challenges to implementing differential privacy in §2.3.1 and analyze its potential success with control AC-23 in §4.2.1.
(7) Validated Algorithms and Software. Many available privacy-preserving technologies help guard against inferences in specific use cases. For example, Hasan and Fritz [35] devise a method that protects students from having their genders inferred from their online coursework interactions, and Zhang et al. [90] develop a method to anonymize data generated by human-wearable devices, inhibiting analysts' ability to infer age, gender, height, and weight while preserving the data well enough to categorize activities, e.g., walking and running. None of the presented incidents employed a validated de-identification algorithm like either of these examples.
(8) Motivated Intruder.Analogous to a penetration test for an information system, which is covered by control CA-8 Penetration Testing, this control enhancement involves attempting to re-identify de-identified data.There is no indication that any party in the presented incidents applied this control.
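Enhancement (4) turns on a detail that is easy to miss: an unkeyed hash of an identifier drawn from a small space can be reversed by hashing every possible value. The Python sketch below assumes a hypothetical four-character identifier format (digit, uppercase letter, two digits, i.e., 26,000 candidates) and uses MD5 as a stand-in for any unkeyed hash. The dictionary attack recovers the identifier immediately, whereas a keyed HMAC pseudonym cannot be tested against candidates without the secret key. The key handling shown is an assumption; in practice the key would be generated and stored under the organization's key management controls.

```python
import hashlib
import hmac
import itertools
import secrets
import string

def candidate_ids():
    """Every ID in a hypothetical format: digit, uppercase letter, two digits."""
    for d1, letter, d2, d3 in itertools.product(
            string.digits, string.ascii_uppercase, string.digits, string.digits):
        yield f"{d1}{letter}{d2}{d3}"

def unkeyed_pseudonym(identifier):
    """Unkeyed hash: anyone can recompute it for any candidate identifier."""
    return hashlib.md5(identifier.encode()).hexdigest()

def keyed_pseudonym(identifier, key):
    """HMAC-SHA256 pseudonym: recomputing it requires the secret key."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()

def dictionary_attack(pseudonym, pseudonymize):
    """Apply `pseudonymize` to every candidate until one matches."""
    for candidate in candidate_ids():
        if pseudonymize(candidate) == pseudonym:
            return candidate
    return None

if __name__ == "__main__":
    released = unkeyed_pseudonym("7B42")
    print(dictionary_attack(released, unkeyed_pseudonym))

    key = secrets.token_bytes(32)           # held by the data controller only
    released_keyed = keyed_pseudonym("7B42", key)
    attacker_key = secrets.token_bytes(32)  # the attacker can only guess a key
    print(dictionary_attack(released_keyed,
                            lambda c: keyed_pseudonym(c, attacker_key)))
```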

RECOMMENDATIONS
Based on our analysis of how well NIST's Privacy Framework helps organizations mitigate privacy inferences, summarized in Table 2, we recommend that privacy programs incorporate the following inference-related components. We also propose how to incorporate these components into the framework [56]. To better mitigate an attacker's ability to combine anonymized data with other information, we seek to increase the coverage of inferences in mapped controls in §§5.1-5.2. The aim is to perceive risk and engage controls to offset it by increasing organizations' awareness of inference risks and their ability to mitigate potential exposures. First, we examine existing inference-related controls that are not mapped to subcategory CT.DP-P3, described in §3.2. Then, we recommend augmenting specific controls with inference-relevant verbiage and references and mapping those controls to CT.DP-P3. In §5.3, we propose updating controls pertaining to PII to account for the obsolescence of safeguarding PII categories alone to protect privacy. Finally, we discuss challenges associated with translating legal requirements and policy into implementable technical solutions in §5.4. Although we recommend only small changes to the framework's controls and their mappings to subcategory CT.DP-P3, making inference risk much more visible will improve organizations' risk awareness and ability to mitigate problems.²

Inference-Related Controls not Mapped
In the course of our analysis, we found additional existing inference-related controls in NIST SP 800-53 [58] that were not mapped to CT.DP-P3 [55]. Specifically, we found two controls containing words derived from "infer" and four other inference-related controls, none of which is mapped to the inferences subcategory.
Of the two controls containing derivatives of "infer," we analyzed SI-19(5) in §4.2.9. The other control, PL-4(1) "Rules of Behavior | Social Media and External Site/Application Usage Restrictions," would have helped mitigate the Strava operational inference. It relates to social media users' interactions pertaining to the organization's information. Had deployed coalition forces considered their location to be organizational information (why else would they be in a foreign combat zone?), they could not, by policy, have shared their tracked workouts without violating this control [37]. An OPSEC risk analysis could determine whether this information was appropriate to release. PL-4(1) is not mapped to any NIST Privacy Framework subcategory [55] whatsoever but should be mapped to CT.DP-P3.
Although numerous controls in the NIST SP 800-53 [58] can help mitigate violations of privacy in general, the following four additional controls would help organizations mitigate inferences.
• AC-4(9) Information Flow Enforcement | Human Reviews. Sometimes humans can predict and mitigate the potential for inferences, as was the U.S. Census Bureau's standard practice prior to adopting differential privacy [31]. In the NYC taxi incident, humans with the requisite expertise could possibly have foreseen the re-identification of taxi drivers and the ability to infer passengers' identities.
We recommend mapping each of the above four controls to NIST Privacy Framework subcategory CT.DP-P3.

Other Controls for Addressing Inferences
In addition to inference-related controls that were not mapped to CT.DP-P3, we found controls that appear unrelated to inferences but are in fact relevant to them. We recommend augmenting these controls, which pertain to organizational literacy of inferences, privacy violation disclosures, and organizations' privacy policies, and creating one new control to address inferences.
To mitigate an inference-based compromise, one must first recognize and understand the threat. While there are many proprietary methods for developing threat models, Linking, Identifying, Nonrepudiation, Detecting, Data Disclosure, Unawareness, Non-compliance (LINDDUN) specifically applies to privacy and, more recently, accounts for inferences [87]. To achieve this in the framework, we recommend bolstering control PM-16 "Threat Awareness Program" to incorporate inference-related threats. In the discussion section of PM-16's description, we propose the following modification, in which we append the clause "or infer private organizational information" to this quote:
Because of the constantly changing and increasing sophistication of adversaries, especially the advanced persistent threat (APT), it may be more likely that adversaries can successfully breach or compromise organizational systems or infer private organizational information. [58]
Then, PM-16 will directly support mapping CT.DP-P3 to a new control enhancement to base control AT-2 "Literacy Training and Awareness," which currently has six control enhancements [58]. This seventh control enhancement, drafted in Figure 1, would focus entirely on inferences, including how they can occur and have occurred and how to mitigate their potential to damage the organization. Increasing organizations' literacy on inferences naturally leads to more effective execution of other controls, including AC-1 "Policy and Procedures," PM-28 "Risk Framing," and RA-8 "Privacy Impact Assessments," to which multiple privacy framework subcategories map. In addition, these organizations are likely to develop and maintain more effective architectural controls, PL-8 and PM-7, to which multiple subcategories also map, including CT.DP-P3 [55].
Inference training is especially crucial for the personnel involved in executing controls AC-22 "Publicly Accessible Content" and AC-24 "Access Control Decisions" because of their role in publicizing data from which threat actors could make damaging inferences.
In addition to increasing awareness of inferences in general, organizations' procedures for disclosing sensitive information, which relate to incident response, also need modification. Therefore, we also recommend expanding the following disclosure-related controls with our italicized additions to incorporate inferences.
"A privacy impact assessment is an analysis of how personally identifiable information and sensitive organizational information are handled to ensure that handling conforms to applicable privacy requirements, determine the privacy risks associated with an information system or activity, and evaluate ways to mitigate privacy risks." [58] Other subcategories of the NIST Privacy Framework map to these controls [55], which contribute to the organization-wide privacy-preserving effort advocated by the framework itself [56].
Organizations should also incorporate inference concepts into their privacy program documentation. For example, an organization can increase its privacy transparency by publishing a privacy policy that clearly addresses the sources used in making inferences and communicates the following, as suggested by Wachter and Mittelstadt [83]:
• Its intent to infer sensitive data from non-sensitive data.
• Its intended purpose for the inference and how the source data are relevant, which should align ethically with their use, e.g., not inferring a predilection for gambling to target advertising of gambling opportunities.
• The assumed and proven accuracy and reliability of its inference methods.
Control PM-20(1) "Dissemination of Privacy Program Information | Privacy Policies on Websites, Applications, and Digital Services" addresses this well, stipulating that organizations' privacy policies be relatively easy to understand and enable informed consent [58]. "Notice and consent" is the prevalent legal standard [85], but critics argue that transparency is actually more important. Emphasizing transparency rather than compelled consent to privacy policies would lead to "inaccessible [privacy-exposing application programming interfaces (APIs)], coding to prevent scraping, automatic deletion of data, and blocking of cookies" [85].

PII Protection Requirements Are Obsolete
Over a decade ago, researchers established that protecting privacy by protecting only information in PII categories is ineffective [64,71]. Cohen further demonstrates the insufficiency of k-anonymity and other QI-based anonymization techniques [15] on the EdX data set [14]. QI-based de-identification relies on "unstated assumptions on the data distribution" [14] and, likewise, is insufficient to mitigate re-identification, even when employed in FERPA-compliant anonymization. The gap between legal requirements and technical implementations enables workarounds, both unintentional and intentional, to the de-identification process [15,28,60].
Incorporating the concept of inference generation from lingering QIs into controls like SI-19(6) "Differential Privacy" would prompt organizations to mitigate this inference vulnerability. Protecting PII categories is still important; however, we found that multiple controls' dependence on using PII and/or QIs as the sole safeguard(s) against privacy violations limited their effectiveness. As such, we recommend updating all controls in NIST SP 800-53 [58] that depend on identifying PII categories to determine what data to safeguard. These updates are in the same spirit as Holder's recommendations to update the GDPR and CCPA language for inferences [40, pp. 1352-1353]. They would also improve framework subcategory CT.DP-P2, described in §3.2.
NIST SP 800-53 contains 83 controls that mention PII, listed in Figure 2 [58]. Propitiously, we found 4 (5%) that already have wording sufficient to incorporate inferences. In addition, 17 (20%) controls mention PII simply as an example of information that is sensitive to organizations and associated individuals. We do not recommend changing these controls to account for inferences because they already indicate that PII categories are not the sole privacy concern. Similarly, we found that 22 (27%) controls are adequate as they are because they describe safeguarding PII, which is still important, without depending directly on identifying and protecting its categorized data as the sole method of privacy protection. Finally, 40 (48%) controls need modification to incorporate inferences. In these controls, appending "and other sensitive information that could lead to re-identification of individuals" to instances of "personally identifiable information" would be minimally sufficient.
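The tallies above are internally consistent: 4 + 17 + 22 + 40 = 83, and the rounded shares (5%, 20%, 27%, 48%) sum to 100%. The proposed minimal wording change is also mechanical enough to script. The Python sketch below appends the recommended phrase after each occurrence of "personally identifiable information," skipping occurrences that already carry it so that repeated runs change nothing. The sample sentence is hypothetical rather than quoted from SP 800-53, and the sketch deliberately ignores the abbreviation "PII," which a real revision would also need to handle.

```python
import re

PHRASE = "personally identifiable information"
ADDITION = (" and other sensitive information that could lead to"
            " re-identification of individuals")

# Match PHRASE only where it is not already followed by ADDITION.
_PATTERN = re.compile(re.escape(PHRASE) + "(?!" + re.escape(ADDITION) + ")")

def augment(control_text):
    """Append the proposed inference wording after each occurrence of PHRASE."""
    return _PATTERN.sub(lambda m: m.group(0) + ADDITION, control_text)

# Consistency check of the tallies reported for the 83 PII-related controls.
TALLY = {"already sufficient": 4, "PII as an example": 17,
         "adequate as is": 22, "needs modification": 40}
assert sum(TALLY.values()) == 83
SHARES = {label: round(100 * n / 83) for label, n in TALLY.items()}

if __name__ == "__main__":
    sample = ("Limit the processing of personally identifiable information"
              " to authorized purposes.")
    print(augment(sample))
    print(SHARES)
```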

Translating Legal Requirements into Technical Implementations
As a society, we generally value privacy; however, there are many concepts of privacy, including what it is and entails, and diverse opinions regarding sharing, trust, obfuscation, decisions to control one's privacy, and protection from those who might intrude into or exploit one's private matters [62,85]. In an effort to protect people's privacy, legislatures make laws requiring organizations to protect it. Legislatures often define principles and standards in laws rather than clear rules; whereas standards often cannot be computed in logic, computers apply rules very well [28,61]. This distinction matters because organizations need to translate legal requirements into policies for the laws to have their intended effect. In turn, organizations must translate those policies into technical implementations. Strategic ambiguity and delegation of detail in laws can also be beneficial, though, because of the flexibility they afford. Another issue with legal and policy requirements that can arise with NIST standards and guides is checklist compliance. Many of the controls in NIST SP 800-53 [58] mention the term "law" or "legal," but none explicitly describes translating law to the organizational policy or technical levels. Most uses of "law" refer to complying with "applicable laws" or to "law enforcement." Most instances of "legal" recommend seeking legal counsel for matters addressed by a control [58]. In consideration of these usages, we do not recommend any modification to the NIST Privacy Framework or controls to improve organizations' ability to translate law into policy and technical implementations.
Instead, legislatures and organizations can take actions to help overcome translation challenges, which could also reduce litigation. Legislatures can work with scientists to establish a formal, mathematically provable definition of privacy, or at least implementable requirements [22,60,61]. People holding the International Association of Privacy Professionals (IAPP)'s Certified Information Privacy Professional (CIPP) certification, which exists for the purpose of "putting privacy law and policy to work" [41], may benefit both legislatures and organizations. Although Pasquale [65] envisions a future in which artificial intelligence (AI) replaces much of the tedious labor in the legal profession, we are not there yet. For now, the improvisational narrative and discourse needed to address cases not previously examined still require human involvement because such cases have no precedent in the training data [65]. As such, systems at all levels (legislative, policy, technical, and judicial) that implement legal logic standards must be adaptable to new laws, or else they risk obsolescence in a "technological-legal lock-in" that stifles legal evolution [16] and, to a lesser extent, industry.
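The contrast between rules and standards drawn in this subsection can be made concrete. In the minimal Python sketch below, the 365-day retention rule and both function names are hypothetical and not drawn from any statute: a clear rule reduces to a predicate a computer can evaluate, whereas a standard such as "reasonable safeguards" has no mechanical test until an organization translates it into policy and then into rules of its own.

```python
from datetime import date, timedelta

# Hypothetical rule, for illustration only: "Delete records 365 days after collection."
RETENTION_LIMIT = timedelta(days=365)


def violates_retention_rule(collected_on: date, today: date) -> bool:
    """A clear rule maps directly onto a computable predicate."""
    return today - collected_on > RETENTION_LIMIT


def satisfies_reasonable_safeguards(system_description: str) -> bool:
    """A standard (e.g., "reasonable safeguards") has no mechanical test.

    An organization must first translate it into policy and then into concrete
    rules; until then, evaluating it requires human judgment.
    """
    raise NotImplementedError("standard must be translated into policy and rules first")


if __name__ == "__main__":
    print(violates_retention_rule(date(2020, 1, 1), date(2021, 6, 1)))  # True
    print(violates_retention_rule(date(2021, 1, 1), date(2021, 6, 1)))  # False
```

Keeping the rule's parameter in a single named constant is one way to keep such an implementation adaptable when the underlying law changes, which bears on the lock-in concern raised above.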
Checklist compliance (completing a checklist for the sake of compliance and believing that the subject is now secure) can arise easily within organizations striving to meet legal or policy compliance requirements. In analyzing controls in §§4.2.2, 4.2.3, 4.2.5, 4.2.6, and 4.2.8, we described how applying the principle behind a control is more effective than implementing it verbatim, which amounts to checklist compliance. Checklists have their place and, when developed and implemented well (a challenging endeavor), can be highly useful for guiding professionals through almost any complex process [32]. However, even NIST recognizes that compliance requirements lend themselves readily to assessments that focus on compliance rather than on "achieving a positive outcome for privacy" [8]. For example, the EdX data were treated per FERPA de-identification requirements, but re-identification was still possible [14]. An appropriate checklist for the NIST Privacy Framework might instead list overarching goals: communicate within the organization, identify risk, mitigate risk, and repeat continuously. Such a checklist could serve as a risk communication tool. Privacy impact assessments (PIAs) present another opportunity for checklist compliance as organizations strive to comply with applicable laws and government regulations; PIAs help protect organizations by their mere existence as a paper trail but, by themselves, provide neither security nor privacy [85]. In summary, we caution organizations to avoid checklist compliance for privacy.
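The EdX example shows that satisfying a de-identification checklist does not guarantee the intended privacy outcome. The Python sketch below contrasts a checklist step (dropping direct identifiers) with an outcome check (counting records whose quasi-identifier combination is unique, i.e., an equivalence class of size 1 in k-anonymity terms). The field names, function names, and records are synthetic assumptions; they do not reproduce the EdX schema or FERPA's actual requirements.

```python
from collections import Counter

# Synthetic schema for illustration; not the EdX data set's actual fields.
DIRECT_IDENTIFIERS = {"name", "email"}
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def checklist_deidentify(records):
    """Checklist step: remove direct identifiers and nothing else."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
            for r in records]


def count_unique_on_qis(records, qis=QUASI_IDENTIFIERS):
    """Outcome check: count records whose quasi-identifier combination is unique.

    Each such record is a candidate for re-identification by linking the
    released data with an auxiliary data set on the same quasi-identifiers.
    """
    keys = [tuple(r[q] for q in qis) for r in records]
    sizes = Counter(keys)
    return sum(1 for key in keys if sizes[key] == 1)


if __name__ == "__main__":
    raw = [
        {"name": "A", "email": "a@example.org", "zip": "02139", "birth_year": 1990, "gender": "F"},
        {"name": "B", "email": "b@example.org", "zip": "02139", "birth_year": 1990, "gender": "F"},
        {"name": "C", "email": "c@example.org", "zip": "02139", "birth_year": 1985, "gender": "M"},
        {"name": "D", "email": "d@example.org", "zip": "94110", "birth_year": 1972, "gender": "F"},
    ]
    released = checklist_deidentify(raw)
    # The checklist step is satisfied, yet 2 of 4 records remain unique on QIs.
    print(count_unique_on_qis(released))  # 2
```

Running an outcome check like this on every release, rather than once, is one way to follow the "identify risk, mitigate risk, and repeat continuously" loop instead of a one-time checklist.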

CONCLUSION
As organizations face growing complexity in protecting privacy and managing the risks of using sensitive data, resources that help organizations identify the nature and scope of their privacy risk also need to evolve. The practical need to comply with privacy laws and to meet people's expectations that organizations protect their privacy led us to ask how we could improve the NIST Privacy Framework [56], a prominent guide to implementing organizational privacy programs. We focus on mitigating inference-based privacy violations, taxonomically defining inferences as either re-identification or operational inferences. To determine how to improve both organizations' defenses against privacy inferences and the NIST Privacy Framework, we apply the framework to past incidents of re-identification and operational inferences. This analysis revealed shortcomings in the framework's capacity to identify inference risk and to recommend offsetting mitigations. Our recommendations include increasing organizations' awareness of inferences by expanding the mapping of inference-related controls and augmenting selected other controls to account for inferences; updating controls that treat protecting specific PII categories or quasi-identifiers (QIs) as sufficient for protecting privacy; and improving organizations' ability to translate legal requirements into policy and policy into technical implementations. Further analyses of the NIST Privacy Framework's effectiveness would contribute to this field of research, especially if conducted by framework-implementing organizations on privacy incidents or near-incidents involving inferences.

Figure 2: Controls Mentioning "Personally Identifiable Information" in the NIST SP 800-53 [58]. … means "Audit and Accountability (AU) Protection #16 (Enhancement #3)," and its name has the format "Base Control Name | Control Enhancement Name."
Remove elements of private data from data sets and evaluate re-identification viability.
… by this taxonomy. Nevertheless, our selections exemplify and are representative of privacy inference incidents in general.

Table 2: Summary of Our Analysis of the Effectiveness of NIST SP 800-53 Controls [58] Mapped [55] to Privacy Framework [56] Subcategory CT.DP-P3 Against the Incidents Presented in §4.1. Each cell's symbol in the middle four columns represents the degree to which the control was implemented. The last column indicates the lesson observed.

Data Mining Protection. As indicated in Table 2, none of the organizations implemented this control. Properly applied, differential privacy would have safeguarded against both the EdX and taxi re-identification inference incidents; Cohen and Tockar each claimed as much [14,77]. Limiting database queries or enabling accountability notifications for atypical database queries or accesses would not have helped because the data sets were made public. Applying homomorphic encryption [88], a PET suggested by this control, would not have helped either because the plaintext data were public; publishing encrypted data would have had no value.
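As one illustration of what applying differential privacy could look like, the Python sketch below releases a noisy count through the Laplace mechanism instead of publishing raw records. A mechanism M is ε-differentially private if, for all data sets D and D′ differing in one individual's record and all output sets S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. Adding or removing one record changes a count by at most 1, so Laplace noise with scale 1/ε suffices. The function names, the synthetic trip records, and the ε value are assumptions for illustration; this is not the method Cohen or Tockar proposed. Python's `random` module is neither cryptographically secure nor hardened against known floating-point attacks on naive Laplace samplers, so this sketch is unsuitable for production use.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    The sensitivity of a count is 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Synthetic trips; the zone labels are invented for illustration.
    trips = [{"pickup_zone": z} for z in ["A"] * 40 + ["B"] * 60]
    rng = random.Random(2024)
    noisy = dp_count(trips, lambda t: t["pickup_zone"] == "A", epsilon=0.5, rng=rng)
    print(f"noisy count of zone-A pickups: {noisy:.1f}")
```

A smaller ε adds more noise and gives stronger protection, and each additional query released about the same data consumes more of the overall privacy budget.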

• AU-13 Monitoring for Information Disclosure. "Examples of organizational information include personally identifiable information retained by the organization or proprietary information generated by or inferred about the organization."
• PM-21 Accounting of Disclosures. "Develop and maintain an accurate accounting of disclosures of personally identifiable information and incidents of re-identified individuals in purportedly anonymized organizational data, including. . ."
• RA-8 Privacy Impact Assessments. "A privacy impact assess-