User-Controlled Privacy: Taint, Track, and Control

We develop the first language-based, Privacy by Design approach that provides support for a rich class of privacy policies. The policies are user-defined, rather than programmer-defined, and support fine-grained information flow restrictions (considering individual application inputs and outputs) with temporal constraints. Our approach, called Taint, Track, and Control (TTC), combines dynamic information-flow control and runtime verification to enforce these policies in the presence of malicious users and developers. We provide TTC’s semantics and proofs of correct enforcement, formalized in the Isabelle/HOL proof assistant. We also implement our approach in a web development framework and port three baseline applications from previous work into this framework for evaluation. Overall, we find that our approach enforces expressive user-defined privacy policies with practical runtime performance.


INTRODUCTION
Motivation and Problem Statement. Over the last decade, new legislation, such as the European Union's General Data Protection Regulation (GDPR), has paved the way for a more extensive recognition of individuals' right to privacy. While, from a legal viewpoint, the GDPR has set a de facto global standard [68], the level of effective privacy enjoyed by users of online applications remains unsatisfactory. Amid reports of widespread non-compliance, the gap between expectations and real-world practice has only become more evident.
Privacy by Design (PbD), the principle of "design[ing] and develop[ing] products with a built-in ability to demonstrate compliance" [74], offers a promising path to improving the status quo. With Privacy by Design, privacy policies are specified that define who can use which data, when, and for which purposes. Incentives for practitioners to adopt PbD include developing or maintaining a competitive advantage in privacy-critical markets [51] and avoiding the costs resulting from major privacy breaches and non-compliance [42,66]. Moreover, Article 25 GDPR makes it obligatory for data controllers to use 'state-of-the-art' techniques to implement data protection principles [14]. Whenever technologies for ensuring compliance are available "at a reasonable price," controllers are legally obliged to use these (or similar) technologies [55].
Privacy policies can be specified by developers (e.g., at the code level) or end-users (e.g., through a policy management interface). An example of a user-specified policy that addresses the key GDPR concern of purpose limitation is policy φ1 in Figure 1, which states that Alice's personal data shall never be used for marketing. Assuming that Alice's personal data is only input by herself, a conservative interpretation of this policy is that Alice's inputs and any data derived from those inputs shall never be used for marketing purposes.

[Figure 1: Example privacy policies for Alice]
    Policy  Textual description
    φ0      "Alice's personal data can be used for any purpose"
    φ1      "Alice's personal data shall never be used for marketing purposes"
    φ2      "Alice's posts in the Minitwit app shall never be used for marketing purposes; after one week, they shall only be used for service purposes; for analytics purposes, they can only be sent to trustedanalytics.com, but not to any other party"
    φ3      "Alice's personal data shall only be shown to herself"

Privacy policies can be enforced using a language-based approach [74]: developers write applications in a specially designed programming language that guarantees the enforcement of policies either statically or dynamically. Language-based PbD can in turn be implemented as an extension of traditional information-flow control (IFC) [53,74], leveraging the similarity between the two approaches: both PbD and IFC must restrict data flows and guarantee that, by default, the protection enjoyed by data items extends to any other data derived from them. But despite these similarities, existing IFC designs cannot be directly reused for PbD, since the assumptions made by these designs do not match the requirements of privacy laws. In particular, the following requirements stand out:
[R1] User-specified policies. Privacy policies must be specified by individual users, rather than developers. In the GDPR, this reflects that the allowed usage of data depends on end-user consent, which must be freely given [1, Art. 7].
[R2] Per-input policies. The policy language must distinguish between different user inputs to support fine-grained restrictions on data usage, e.g., to provide the additional protection enjoyed by special data categories [1, Art. 9]. Users must be able to define different restrictions for each of their inputs.
[R3] Time-dependent policies. The policy language must allow users to define restrictions that apply only for a specific time period. For example, the GDPR has the notion of a storage period [1, Rec. 45] during which data can be used and retained.
Policy φ2 in Figure 1 exemplifies these requirements: user Alice demands that her posts in a microblogging platform are never used for marketing purposes, that only a single trusted third party can be sent information about the posts for analytics purposes, and that after a week, the messages can only be used for service purposes. These limitations extend to all data derived from her posts. The policy is user-specific (only concerning Alice's inputs), per-input (only applying to Alice's posts), and time-dependent (as additional constraints apply after a week).
State of the Art. Requirements [R1-3] are out of the scope of most previous work on IFC. Existing approaches usually provide little or no support for time-dependent policies [R3], and define restrictions on information flow in terms of the data model [5,18,21,25,45,64,77,78], rather than single inputs [R2]. Moreover, and critically, their focus is on developer-specified, rather than user-specified, policies [R1]. In some works [21,45,77,78], developers can build custom user interfaces where users can select from different predefined policies, but this requires additional effort and does not provide users with formal guarantees at the level of individual inputs.

[Figure 2: The TTC architecture (diagram labels: Users, Wrapper, Enforcer). Grayed-out components are inactive in the corresponding phase. The operations in Track take place in any order. Blue: user-provided. Red: developer-provided.]
To enforce user-defined policies [R1], dynamic approaches appear most promising. Indeed, first-order temporal policies [R2-3] are well supported by the runtime enforcement (RE) techniques developed within the runtime verification community [56,69]. These techniques consider timestamped event traces modeling executions of an application at runtime (e.g., events represent specific actions, like the application's inputs and outputs). The policy to be enforced is defined as a set of such traces, i.e., a trace property. To ensure compliance, some of the application's actions (typically, outputs) can be suppressed. Concretely, a Policy Enforcement Point (PEP) is informed before each such action is performed; the PEP reports candidate events to a Policy Decision Point (PDP), which then checks the compliance of the resulting trace with the policy. Based on the PDP's verdict, the PEP either allows or suppresses the action.
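The PEP/PDP interaction described above can be sketched as follows. This is a minimal illustration in Python; the class and method names are ours, not taken from any specific RE tool, and the policy is modeled simply as a predicate over traces.

```python
# Minimal PEP/PDP sketch (illustrative names, not from a specific RE tool).
# The PDP checks whether extending the compliant trace with a candidate
# event keeps it compliant; based on the verdict, the PEP either performs
# or suppresses the action.

class PolicyDecisionPoint:
    def __init__(self, policy):
        self.policy = policy      # a predicate over traces
        self.trace = []           # compliant trace seen so far

    def check(self, event):
        candidate = self.trace + [event]
        if self.policy(candidate):
            self.trace = candidate
            return True           # PEP may perform the action
        return False              # PEP must suppress the action

# Example policy: no output may ever go to "evilanalytics.com".
def no_output_to_evil(trace):
    return all(not (e[0] == "out" and e[1] == "evilanalytics.com")
               for e in trace)

pdp = PolicyDecisionPoint(no_output_to_evil)
assert pdp.check(("out", "trustedanalytics.com", "Analytics"))
assert not pdp.check(("out", "evilanalytics.com", "Analytics"))
```

Note that a rejected event is not appended to the trace, so the stored trace remains compliant by construction, mirroring the definition of enforcement used below.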
To cover [R1-3], a temporal first-order policy language [7,52] can be chosen, and the overall policy to be enforced can be expressed as the conjunction of individual policies specified by end-users.
Our solution. In this paper, we develop the first language-based PbD approach that combines IFC and trace-based runtime enforcement to satisfy the requirements [R1-3]. We support a broad class of applications, which we call online applications, that run and continuously interact with users by collecting and processing their data (Section 2). This class includes (but is not restricted to) web applications, services, or other applications with an event-driven behavior. Our approach, called Taint, Track, and Control (TTC), brings together (i) an enforcer for metric first-order temporal policies [56] serving as the PDP, (ii) an interpreter for a programming language implementing a variant of dynamic IFC, (iii) a wrapper that controls inputs and outputs and serves as the PEP, and (iv) persistent storage, such as a database or file system. Once the enforcer is loaded with a policy and the interpreter with a program defining the application's functionality, user inputs are handled in three phases (see Figure 2):
Taint: The wrapper receives a new user input and tags it with a fresh unique taint (UT) (1). The UT is a unique bitstring identifying one specific input during its lifetime. The wrapper informs the enforcer about the new UT by emitting an event (2). The input and UT are sent to the interpreter (3).
Track: The semantics implemented by the interpreter ensures that, at all times, every data item in memory is tagged with the UTs of all inputs that have influenced its current value. The interpreter interacts with persistent storage, to which UTs are propagated.
Control: Any output that the interpreter attempts to perform must go through the wrapper (1). Upon receiving an output request, the wrapper generates events capturing the attempted output and all user inputs that influenced it. To identify the latter, it uses the UTs of the candidate output. These events are sent to the enforcer (2), which can accept or reject the output (3). If its verdict is positive, the wrapper emits the output (4).
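The wrapper's orchestration of the Taint and Control phases can be sketched as follows. All names and data layouts here are illustrative assumptions (e.g., the UT is simplified to a (counter, argument-name) pair, and the enforcer is modeled as a predicate over event lists), not the paper's concrete implementation.

```python
# Sketch of the wrapper's Taint and Control phases (illustrative structure;
# component interfaces and the simplified UT format are assumptions).
import itertools

_fresh = itertools.count()

def taint(user, fn, args, enforcer_log):
    """Taint phase: tag each input with a fresh unique taint (UT)."""
    tagged = {}
    for name, val in args.items():
        ut = (next(_fresh), name)         # simplified UT: (counter, arg name)
        tagged[name] = (val, {ut})
        enforcer_log.append(("In", user, fn, ut))
    return tagged

def control(value, uts, recipient, purpose, enforcer_log, policy):
    """Control phase: emit Out/Itf events; suppress output if non-compliant."""
    events = [("Out", recipient, purpose)] + [("Itf", ut) for ut in sorted(uts)]
    if policy(enforcer_log + events):
        enforcer_log.extend(events)
        return value                      # output is emitted
    return None                           # output is suppressed

# Example policy: no output for purpose "Marketing".
never_marketing = lambda tr: all(not (e[0] == "Out" and e[2] == "Marketing")
                                 for e in tr)

log = []
tagged = taint("Alice", "user_timeline", {"username": "Bob"}, log)
val, uts = tagged["username"]
assert control(val, uts, "Alice", "Service", log, never_marketing) == "Bob"
assert control(val, uts, "ads.example.com", "Marketing", log, never_marketing) is None
```

The Track phase, which propagates UTs through computation and persistent storage, sits between the two calls; its tag-propagation discipline is sketched in Section 4.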
TTC guarantees the enforcement of users' privacy policies against an adversary that can impersonate other users or tamper with application code. The latter covers the realistic threat scenario of a malicious developer with no direct access to the application's production data. In particular, we make the following contributions:
(1) We introduce information-flow traces capturing inputs, outputs, and interference between them, and we specify privacy policies covering [R1-3] using such traces (Section 3).
(2) We introduce TTC, the first language-based PbD approach for enforcing such privacy policies in online applications. We provide formal semantics and correctness proofs checked in the Isabelle/HOL proof assistant (Section 4, Appendix A).
(3) We implement WebTTC, a proof-of-concept web development framework that incorporates TTC. WebTTC applications enforce user-defined privacy policies (Section 5).
(4) We demonstrate TTC's ease of use and efficiency by presenting and evaluating our development framework. We port applications from previous works to our framework and show their practical runtime performance (Section 6).
This work focuses on consent-based PbD. As consent is at the core of the GDPR's design [68] and omnipresent in web services [15], tackling consent-based PbD is an important first step towards more comprehensive enforcement of legal requirements. Still, requirements [R1-3] do not cover the full range of legal bases for processing defined in the GDPR: e.g., necessity for fulfilling a contract or the controller's "legitimate interest" can also be invoked [1, Art. 6]. Therefore, in the conclusion (Section 8), we discuss how our work could be extended to support a more comprehensive fragment of privacy laws.

[Figure 3: Excerpt from the code of Minitwit]
    12  send("trustedanalytics.com", "Analytics", username)
    13  send("evilanalytics.com", "Analytics", username)
    14  return render("timeline.html",
    15      {"posts": ("Service", posts),
    16       "ad": ("Marketing", ad)})

ENFORCEMENT IN ONLINE APPLICATIONS
In this section, we first introduce the application model that we address (Section 2.1). We then refine it into a model of applications that run in an architecture featuring both an interpreter and an enforcer (Section 2.2). Finally, we define our adversary model (Section 2.3).

Modeling Online Applications
In this paper, we will consider applications that repeatedly receive queries from users, collect user data (inputs), process it, and send data back to users (outputs). This covers web applications as well as other types of applications such as microservices. We generically refer to such applications as online applications. Further, we assume that each output of an online application is labeled with a GDPR-style purpose, such as "Service" (if the output is necessary for offering the service for which the application was primarily developed), "Marketing" (e.g., for ads), or "Analytics".

2.1.1 An Example. The code in Figure 3, taken from a simple microblogging platform, exemplifies the behavior of online applications. Users can call a function user_timeline that takes a username and displays the user's posts. The first part of the function's code (l. 6-10) extracts username's posts from the database into a variable posts. It then generates a personalized ad in the variable ad based on the posts' content (l. 11). Next, username is sent to two third parties (l. 12-13) for analytics purposes. Finally, the page is rendered using posts (for purpose Service: the application's primary function is to allow users to see each other's posts) and ad (for purpose Marketing) and returned to the caller (l. 14-16).
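Following the description above, user_timeline can be reconstructed in Python roughly as follows. Only the send and render calls and the purpose labels come from the Figure 3 excerpt; the helpers (query_posts, generate_ad) are stubs we introduce for illustration.

```python
# Hedged reconstruction of user_timeline; helper functions are stubs.
sent = []

def send(host, purpose, data):            # third-party output (stub)
    sent.append((host, purpose, data))

def render(template, context):            # page rendering (stub)
    return (template, context)

def generate_ad(posts):                   # personalized-ad generation (stub)
    return "ad based on " + posts[0]

def query_posts(username):                # database access (stub)
    return ["hello from " + username]

def user_timeline(username):
    posts = query_posts(username)                          # l. 6-10
    ad = generate_ad(posts)                                # l. 11
    send("trustedanalytics.com", "Analytics", username)    # l. 12
    send("evilanalytics.com", "Analytics", username)       # l. 13
    return render("timeline.html",                         # l. 14-16
                  {"posts": ("Service", posts),
                   "ad": ("Marketing", ad)})

page = user_timeline("Bob")
assert sent[0] == ("trustedanalytics.com", "Analytics", "Bob")
assert page[1]["ad"][0] == "Marketing"
```

The key point for what follows is that every output site carries an explicit purpose label ("Analytics", "Service", "Marketing").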
2.1.2 Model and Assumptions. More formally, we consider applications that execute the following input-output loop. First, an input is received by an application endpoint from some user. This input is processed, modifying the application's internal state and resulting in a sequence of outputs to different users, which include both users of the application and third parties such as other controllers. After producing its outputs, the application returns to a state where it can accept further inputs. We model the execution of such applications as deterministic sequences of transitions of one of the following three forms:

    s →^{i(u, f, {n_j ↦ v_j}_{1≤j≤k}, τ)} s'        s →^{o(u, f, p, v, τ)} s'        s →^{•} s'

These respectively correspond to reading k simultaneous input arguments, producing a single output, and performing non-IO transitions. In the above transitions, u is the user producing the input or receiving the output; the v_j (resp. v) are the input (resp. output) values; f is the function performing IO; and τ ∈ N is the timestamp according to some global clock. For inputs, (n_j)_{1≤j≤k} are the (distinct) arguments of f to which the inputs are passed. For outputs, p is the output's purpose. Non-IO transitions are marked with the special symbol •.

[Figure 4: Excerpt from the code of Minitwit, II]
    12  send("trustedanalytics.com", "Analytics", username)
    13  send("evilanalytics.com", "Analytics", username)
    14  return render("timeline.html",
    15      {"posts": ("Service", posts),
    16       "ad": ("Marketing", ad)})
We assume that we have a fixed initial state s_0 and monotonic timestamps, i.e., timestamps do not decrease along consecutive transitions.

Example 2.1. Assume that at the timepoint τ_0, the user Alice calls user_timeline (l. 5) with username Bob. Her input has label i(Alice, user_timeline, {username ↦ Bob}, τ_0).
Performing a • transition, the function computes the list of Bob's posts and the corresponding ad (l. 6-11). It then produces four outputs (l. 12-16), whose values involve the values of the variables posts and ad after line 16. Since the relation → is deterministic, the application's state only depends on the sequence of timestamped inputs that it receives. This sequence can be represented as I = (i(u_k, f_k, {n_{k,j} ↦ v_{k,j}}_j, τ_k))_k, where the u_k are users, the f_k are functions, the n_{k,j} are the names of input arguments, the v_{k,j} are the input values, and τ_{k+1} > τ_k for all k. When, starting in the state s with input sequence I, the application can reach the new state s' with a remaining (non-processed) input sequence I', we write s, I ⇒ s', I'.

Privacy-Enforcing Applications
Next, we refine the previous formalism into a model of applications whose behavior results from the interplay between an interpreter and an enforcer. We first recall standard definitions of traces and enforcers, and then present our model.

Traces and Enforcers.
A signature is a triple Σ = (D, A, ι), where D ⊇ N is an infinite set of constant symbols, A is a finite set of actions, and ι : A → N is an arity function. An event over the signature Σ is an expression of the form a(d_1, ..., d_{ι(a)}), where a ∈ A and d_i ∈ D, that encodes an action and its parameters. A trace stores the full history of the actions performed by the application; it is a finite timestamped sequence of sets of events, i.e., a sequence σ = (τ_i, D_i)_{1≤i≤n} where the τ_i are strictly increasing timestamps, and the D_i are finite sets of events. The concatenation of two traces σ and σ' is denoted by σ • σ', assuming that the first timestamp of σ' is larger than the last timestamp of σ. The set of events (resp. traces) over a signature Σ is denoted by E_Σ (resp. T_Σ). Finally, a policy φ is a subset φ ⊆ T_Σ. Any trace σ in φ is called compliant with φ.
For the purpose of enforcement, the system's actions are all observable, and must be classified according to whether they can only be observed (only-observable actions, Obs ⊆ A), or can also be suppressed at the PEP (suppressable actions, Sup ⊆ A).¹ An enforcer for a policy φ is a function that takes a trace σ that complies with φ and a new timestamped set of events, examines the set, and returns a subset of these events that should be suppressed in order for the trace to remain compliant. More formally, given a trace σ and a pair (τ, D) such that σ • (τ, D) is a trace and σ complies with φ, an enforcer for φ returns a subset D' ⊆ D of suppressable events such that σ • (τ, D \ D') complies with φ. The set D' is called the enforcer's verdict. Any policy φ such that the empty trace complies with φ and an enforcer exists for φ is called enforceable.

Example 2.2. If the action capturing the outputs of a system is suppressable, the policy "the system shall perform no output to Alice" is enforceable; a corresponding enforcer would simply suppress any tentative output to Alice. On the other hand, the policy "the system shall perform an output to Alice" is not enforceable, since the enforcer can suppress, but not cause, outputs.
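An enforcer in the above sense can be sketched as follows. This is our own brute-force illustration (a real enforcer would be policy-specific and incremental): given a compliant trace and a new timestamped set of events, it returns a smallest subset of suppressable events whose removal keeps the extended trace compliant.

```python
# Brute-force enforcer sketch (illustrative; not the paper's algorithm).
from itertools import combinations

SUPPRESSABLE = {"out"}    # action names that the PEP may suppress

def make_enforcer(policy):
    def enforcer(trace, ts, events):
        sup = [e for e in events if e[0] in SUPPRESSABLE]
        for r in range(len(sup) + 1):              # prefer suppressing less
            for drop in combinations(sup, r):
                if policy(trace + [(ts, events - set(drop))]):
                    return set(drop)               # the verdict D'
        raise ValueError("policy not enforceable on this trace")
    return enforcer

# Policy from Example 2.2: "the system shall perform no output to Alice".
no_out_alice = lambda tr: all(e != ("out", "Alice")
                              for _, evs in tr for e in evs)

enf = make_enforcer(no_out_alice)
assert enf([], 1, {("out", "Alice"), ("out", "Bob")}) == {("out", "Alice")}
assert enf([], 2, {("in", "Alice")}) == set()
```

The second call illustrates that only-observable events (here, inputs) are never part of a verdict, matching the definition above.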

Applications.
The rest of the paper considers applications whose behavior is uniquely determined by the interplay of two components: an interpreter component and an enforcer component.
The interpreter component is loaded with a program P, provided by developers, which encodes the application's functionality.
The enforcer component provides an enforcer E for an enforceable policy φ over some signature Σ. To support user-defined policies, the privacy policy φ is assumed to be of the form ⋂_{u∈U} φ_u, where U is the set of users and, for all u ∈ U, φ_u denotes the individual policy of user u. The intersection ⋂_{u∈U} φ_u is the property expressing the simultaneous compliance with all (φ_u)_{u∈U}.
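Since policies are sets of traces, the conjunction of all users' policies is simply their intersection: a trace complies with the overall policy iff every individual policy accepts it. A minimal sketch (policies modeled as predicates; the example policies are illustrative):

```python
# The overall policy is the intersection (conjunction) of all users'
# individual policies: a trace complies iff every user's policy accepts it.
def conjoin(user_policies):
    return lambda trace: all(p(trace) for p in user_policies.values())

# Illustrative per-user policies over simplified event traces.
alice = lambda tr: ("out", "Alice", "Marketing") not in tr
bob   = lambda tr: ("out", "Bob", "Analytics") not in tr

policy = conjoin({"Alice": alice, "Bob": bob})
assert policy([("out", "Alice", "Service")])
assert not policy([("out", "Alice", "Marketing")])
```

With no users, conjoin yields the vacuously true policy, matching the convention that the empty intersection over subsets of T_Σ is T_Σ itself.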
A privacy-enforcing application is an online application whose behavior is fully determined by (i) user inputs, (ii) the program executed by the interpreter, and (iii) users' privacy policies. The inputs to a privacy-enforcing application are provided by users and are thus only-observable, while the outputs are performed by the application and are thus suppressable. We obtain a family of transition relations for the entire application of the form →_{P,φ}, where P ranges over the set of programs and φ over the set of policies. We write ⇒_{P,φ} for the relation ⇒ that we obtain as in Section 2.1.2 by setting → to →_{P,φ}.

¹ Previous work has also considered the case when some actions can be observed and caused by the enforcer [10]. We focus on the only-observable and suppressable actions.

Adversary Model
In this paper, we consider the enforcement of users' privacy policies against both malicious users (end-users or third-parties) and malicious developers that can tamper with program code.
We consider an adversary that can impersonate both users and developers. As a user u, the adversary can interact with the application, observe its outputs, and set and read the policy φ_u. As a developer, she can observe and set the program P. The adversary, however, can neither observe nor directly influence the content of persistent storage, nor the policies φ_{u'} of the users u' she does not impersonate. Neither can she observe or modify the internal state of the interpreter, wrapper, and enforcer components, or tamper with the network. In practice, the assumption that the adversary cannot directly tamper with the behavior of the TTC components is sufficient as long as (malicious) developers have no access to the production infrastructure, which is managed by a distinct group of trusted privacy-compliance specialists.
Furthermore, for purpose-based usage to be correctly enforced, we assume that a trusted party (e.g., privacy-compliance specialists or external auditors) has certified that the purposes labeling the various outputs generated by P are appropriately set. This should apply irrespective of whether the adversary has impersonated the developers or not. This assumption may seem strong, but it is unavoidable once concepts such as 'purpose of processing' are introduced; currently, such checks can only be performed by privacy or legal experts, and are not readily automatable. In practice, this requirement can be fulfilled via appropriate organizational measures, such as compulsory validation of annotations by compliance teams whenever production code is deployed or modified.
Finally, we consider a termination- and timing-insensitive setup in which the time intervals between inputs and outputs do not convey information about the values of inputs, and every function is assumed to terminate after a fixed duration. To this end, we assume that every function performs a fixed number of outputs at fixed time intervals after receiving any set of inputs, and that this interval is smaller than the interval between consecutive inputs.

TRACES AND PRIVACY POLICIES
In this section, we develop TTC's policy language in three steps. First, we show how to encode the IO behavior of applications by collecting i and o labels into input-output (IO) traces (Section 3.1). Second, we use sets of IO traces to define information-flow traces (Section 3.2) that capture interference between inputs and outputs. Finally, we show how to use these information-flow traces to specify privacy policies covering requirements [R1-3] (Section 3.3).

Input-Output Traces
The complete input-output (IO) trace of an execution is obtained by considering the sequence of all such pairs (τ_i, D_i). In the following, we label the relation ⇒_{P,φ} introduced in Section 2.2.2 with the IO trace produced by the execution: we write s, I ⇒^σ_{P,φ} s', I' when s, I ⇒_{P,φ} s', I' and σ is the IO trace produced by the sequence of transitions from s to s'.

Information-Flow Traces
We now define information-flow traces, which will provide our ground truth on which to specify privacy policies. Our definition aims at (i) minimizing the amount of information to be stored in the trace, to improve performance while preserving privacy, and (ii) giving a precise meaning to what "data derived from an input" means. To address (i), we replace the values stored in the IO traces by unique input and output identifiers. To address (ii), we introduce a new Itf event based on a notion of (non)interference [46] capturing the influence of an input on an output. The resulting signature Σ_IF is shown in Figure 5.

Inputs and Outputs.
In information-flow traces, the In event encodes an input and the Out event an output. Since timestamps increase with every IO transition, each input can be uniquely identified by its timestamp τ and argument name n. We can use the pair t = (τ, n), which we call a unique taint (UT), as a unique input identifier. UTs allow us to refer to individual inputs specifically and are independent of the input's value. Similarly, each output can be uniquely identified by its timestamp. We obtain the following rules:
• Each in(u, f, n_j, v_j) in the IO trace at timestamp τ becomes In(u, f, (τ, n_j)) in the information-flow trace;
• Each out(u, f, p, v) in the IO trace at timestamp τ becomes Out(u, f, p, τ) in the information-flow trace;
where the dots are placeholders for the Itf events discussed next. Note that a distinct n argument no longer shows up in In, since n becomes part of the input identifier t = (τ, n).
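The two translation rules above can be sketched in code as follows. The tuple layouts of the in/out events are our assumption based on the rules (user, function, argument name or purpose, value); the key points are that values are dropped and inputs are identified by their UT.

```python
# Sketch of the IO-trace -> information-flow-trace translation for In/Out
# events (Itf events are added separately; event field layout is an
# assumption based on the rules above).
def to_if_events(ts, io_events):
    result = []
    for e in io_events:
        if e[0] == "in":                    # in(user, fn, arg_name, value)
            _, user, fn, name, _value = e
            result.append(("In", user, fn, (ts, name)))    # UT = (ts, name)
        elif e[0] == "out":                 # out(user, fn, purpose, value)
            _, user, fn, purpose, _value = e
            result.append(("Out", user, fn, purpose, ts))  # id = timestamp
    return result

evs = to_if_events(7, [("in", "Alice", "user_timeline", "username", "Bob")])
assert evs == [("In", "Alice", "user_timeline", (7, "username"))]
```

Note how the input value "Bob" does not appear in the resulting event, reflecting that UTs are independent of the input's value.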

Interference.
To reason about derived data, information-flow traces also contain events of the form Itf(t, o), where t is a UT and o an output identifier (i.e., an output timestamp). Informally, Itf(t, o) is present in the trace if, and only if, observing the value of the output with identifier o allows us to learn at least one bit of information about the value of the input with UT t. This definition of Itf allows us to capture (non)interference [46] in a fine-grained way, encoding the influence of a single input on a single output. In contrast, conventional IFC approaches group inputs according to users and, further, according to security levels.
Example 3.3. Assume that, in Example 2.1, the table post contains a single post by user Bob. The text of this post, an input from a previous function call, is tagged with a UT t_1. Consider first the application without proactive checks (Figure 3). At timestamp τ_0, user_timeline retrieves Bob's posts in the variable posts. Two inputs have influenced the content of posts: the one marked by t_1, and Alice's latest input to user_timeline (with value username = "Bob"), which has UT (τ_0, username). Thus, the information-flow trace must contain Itf(t_1, o_3) and Itf((τ_0, username), o_3) at the timestamp o_3 of the output of posts. Further, username is sent to the two analytics third parties, which receive the outputs with identifiers o_1 and o_2. Through generate_ad, the two inputs that influenced posts also influence ad, which is output with the identifier o_4. Hence, the information-flow trace contains Itf((τ_0, username), o_1), Itf((τ_0, username), o_2), Itf(t_1, o_3), Itf((τ_0, username), o_3), Itf(t_1, o_4), and Itf((τ_0, username), o_4), together with the In and Out events from Example 3.2.
A formal definition of the relation "input t = (τ, n) interferes with output o = τ_1 given the input sequence I," written (τ, n) ⇝_{P,φ,I} τ_1, is given in Appendix A.1. Using this relation, we can state the last rule defining our information-flow trace:
• Let I be an input sequence, P a program, and φ a policy. Assume that we execute the application defined by →_{P,φ} on I. Event Itf((τ, n), τ_1) is in the information-flow trace at timestamp τ_1 if, and only if, (τ, n) ⇝_{P,φ,I} τ_1.
We write trace_{P,φ}(I, σ) to state that the processing of all inputs in I using →_{P,φ} generates the information-flow trace σ.
Our Itf event plays a role analogous to the leak modality of SecLTL [32] by capturing a hyperproperty [23]: the occurrence of the Itf event depends on the existence of an alternative execution of the program in which a different input value leads to a different output value. In general, deciding whether such an alternative execution exists is impossible; hence, the information-flow trace cannot be fully computed at runtime. In Section 4, we will see how this trace can be soundly overapproximated using dynamic IFC techniques.

Privacy Policies
We can finally define a privacy policy as a policy over information-flow traces, i.e., a subset of T_{Σ_IF}. We say that the online application A = (S, s_0, →_{P,φ}) enforces the privacy policy φ ⊆ T_{Σ_IF} iff for any input sequence I and trace σ such that trace_{P,φ}(I, σ), we have σ ∈ φ. In our application model, inputs are only-observable, while outputs are suppressable. Interference, which results from the program's behavior, can only be observed. Hence Obs = {In, Itf} and Sup = {Out}.
[R2] Restrictions can be defined down to the level of single inputs by specifying privacy policies that prohibit outputs involving interference from specific inputs identified by their UT.
[R3] Privacy policies can be temporal, since traces are.
Example 3.4. A way to specify a privacy policy as defined above is to use a temporal logic. Metric First-Order Temporal Logic (MFOTL) [7,19] provides temporal operators such as ♦_I ('once in the past within time interval I') and □ ('always') that can be used to formalize the policies in Figure 1. A full description of MFOTL and a formalization of the policies from Figure 1 in MFOTL can be found in Appendix C.

Discussion. In this section, we have shown how information-flow traces can be used to specify a general class of user-defined privacy policies. In general, the richness of this policy language and its explicit use of interference can challenge users. Hence the question naturally arises of how best to design appropriate interfaces for collecting user consent, and what concrete (high-level) policy language should be exposed to users in such interfaces. Interface and policy language design might involve standard techniques such as specification patterns [36], graphical representations [62], or natural language generation [79], and might benefit from a transdisciplinary effort. The policies collected through such interfaces could then be converted into an expressive formal language such as MFOTL. We see designing such interfaces and languages as an important, but distinct, problem. Hence, in the rest of this paper, we focus on the enforcement of policies expressed as MFOTL formulae, which provide the required expressivity and tool support.
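As an illustration, policy φ1 from Figure 1 might be sketched in MFOTL roughly as follows. This is our own hedged rendering over the predicates In, Out, and Itf introduced above, not necessarily the exact formalization of Appendix C:

```latex
% Hedged sketch of policy \varphi_1: whenever an output o is made for
% purpose Marketing, no input t of Alice may interfere with o.
\square\, \forall u, f, o, t.\;
  \bigl( \mathsf{Out}(u, f, \mathit{Marketing}, o) \wedge \mathsf{Itf}(t, o)
    \wedge \blacklozenge\, \exists f'.\; \mathsf{In}(\mathit{Alice}, f', t) \bigr)
  \rightarrow \bot
```

Policies like φ2 would additionally use metric intervals (e.g., ♦_{[0,1\text{week}]}) to express the time-dependent constraints.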

TAINT, TRACK, AND CONTROL
We now introduce our Taint, Track, and Control (TTC) approach. We first present a high-level overview of TTC (Section 4.1). Then, we proceed with a description of TTC's formal semantics: we define TTC programs (Section 4.2), introduce a new data structure called tagged values (Section 4.3), describe the state space on which TTC applications operate (Section 4.4), and then give the semantic rules for each of the Taint, Track, and Control steps (Sections 4.5-4.7). Finally, we establish the guarantees that TTC provides (Section 4.8).

TTC: a High-Level Overview
At its core, TTC consists of (a) an architecture for privacy enforcement and (b) an enforcement strategy carried out in three phases, called Taint, Track, and Control respectively.
The TTC architecture has the components shown in Figure 2: an enforcer serving as a PDP, an interpreter, a wrapper orchestrating the interaction between the interpreter, enforcer, and users, and serving as a PEP, and a persistent storage module.
Enforcement in this architecture is structured in three phases:
Taint: A set of user inputs is received by the wrapper. Each user input is tagged with a fresh UT, and the corresponding In events are forwarded to the enforcer. Then, the interpreter is loaded with the code of the function called by the user and passed the (tagged) user inputs.
Track: This code is executed by the interpreter, updating the content of permanent storage and computing a tentative output value. For every data item in both working memory and permanent storage, the interpreter additionally maintains a UT history (see next section) that conservatively approximates the set of user inputs that have influenced its current value. At the end of the Track phase, the interpreter returns an output value to the wrapper.
Control: The wrapper computes (an overapproximation of) the Out and Itf events corresponding to the new tentative output. The UT history of the output is used to identify a superset of all inputs that interfered with the output, and hence to generate the right Itf events. These events are forwarded to the enforcer, which responds with a verdict. Finally, the output is emitted if and only if the enforcer's verdict is empty, i.e., if and only if the new output complies with the policy.
After a Control phase, the system can either start a new Track phase or return to the Taint phase to process the next set of arguments. The UT histories of data in permanent storage are persisted, thereby tracking the influence of inputs over their entire lifetime.
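The UT-history maintenance performed in the Track phase can be sketched as follows. This is an illustrative structure (the paper's tagged values are defined formally in Section 4.3): every value carries the set of UTs of all inputs that influenced it, and operations take the union of their operands' histories.

```python
# Sketch of UT-history propagation in the Track phase: every value carries
# the set of UTs of the inputs that influenced it; operations propagate
# histories by union (illustrative structure).
class Tagged:
    def __init__(self, value, uts):
        self.value, self.uts = value, frozenset(uts)

    def binop(self, op, other):
        # The result is influenced by every input that influenced an operand,
        # so the histories are unioned (a conservative approximation).
        return Tagged(op(self.value, other.value), self.uts | other.uts)

post = Tagged("hello", {(3, "text")})       # Bob's earlier post
name = Tagged("Bob", {(7, "username")})     # Alice's current input
page = post.binop(lambda a, b: a + " by " + b, name)
assert page.uts == {(3, "text"), (7, "username")}
```

Persisting these histories alongside stored data is what lets TTC track the influence of an input across function calls, as in Example 3.3.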
Example 4.1. Let us revisit Example 2.1 in the TTC context. In the Taint phase, the wrapper receives the input username = Bob from user Alice querying function user_timeline. It tags it with the UT t = (τ_0, username) and sends the corresponding In event to the enforcer. It then forwards the tagged input to the interpreter. In the Track phase, the interpreter computes the first output (to be sent to trustedanalytics.com), which it sends back to the wrapper. Then, in the Control phase, the wrapper generates the corresponding Out and Itf events, which it sends to the enforcer. Depending on the enforcer's verdict, the corresponding output is (or is not) forwarded to trustedanalytics.com. After the end of the Control phase, a new Track phase starts, this time producing the second output (to evilanalytics.com). This Track phase is followed by a new Control phase handling the output to evilanalytics.com. Two other iterations follow to generate and handle the two outputs to Alice. After the last output has been produced, a new Taint phase can take place to process the next user input.
In practice, developers should handle data that they cannot output gracefully, rather than simply having outputs suppressed. This requires proactive checks that allow programs to obtain enforcer verdicts ahead of time and react to them appropriately. With proactive checks, developers can, e.g., remove data items from a page being built if these items cannot be displayed under the current policy. To support such proactive checks, the interpreter must be able to query the enforcer directly during the Track phase.
Example 4.2. A version of the code from Figure 3 with proactive checks is shown in Figure 4. Proactive enforcer calls are performed via the primitive check. If the content of posts cannot be used for marketing, a non-personalized ad is generated (l. 11). The function filter_check (l. 1-5) filters out messages that cannot be shown to the caller user for service purposes; it is used (l. 10) to remove from the page all posts that the caller is not allowed to see. These operations occur during the Track phase.

TTC Programs
A TTC program is a mapping f ↦ P(f) from function names to non-empty, finite sequences of tuples of the form (code_k, out_y_k, out_u_k, out_p_k, time_k). Given a function name f, the tuple sequence P(f) specifies the behavior of f. More precisely, each tuple in the sequence encodes a block of non-IO source code code_k followed by a single output statement that attempts to output the content of variable out_y_k to user out_u_k for purpose out_p_k. The complete source code of the function is obtained by concatenating the code_k (1 ≤ k ≤ |P(f)|) and their respective output statements. Each out_u_k can take any value in U or the special value me denoting the user that performed the latest input. The integer time_k ∈ ℕ is the fixed duration of executing code_k. In the following, the various components of P(f)_k will be denoted by P(f)_k.code, ..., P(f)_k.time, respectively. Note that this definition fits into the time-insensitive framework described in Section 2.3 above.
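Assuming the tuples carry the components named above, a TTC program can be pictured as a simple table; the encoding and the concrete blocks below are hypothetical:

```python
# Hypothetical encoding of a TTC program: each function name maps to a
# non-empty sequence of (code, out_y, out_u, out_p, time) tuples.
from dataclasses import dataclass

@dataclass
class Block:
    code: str   # non-IO source code
    out_y: str  # variable whose content is output
    out_u: str  # recipient user, or "me" for the latest caller
    out_p: str  # purpose of the output
    time: int   # fixed duration of executing `code`

# Illustrative program with two blocks (code bodies elided).
program = {
    "user_timeline": [
        Block(code="...", out_y="stats", out_u="trustedanalytics.com",
              out_p="Analytics", time=1),
        Block(code="...", out_y="page", out_u="me",
              out_p="Service", time=1),
    ],
}
```

Executing user_timeline would thus run the first block, attempt its output, then run the second block, and attempt the final output to the caller.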
To simplify the presentation, we consider non-IO source code written in a Turing-complete language called TTCWhile that manipulates integer values. We fix a set O of binary operator symbols and a mapping Ω : O → (D × D → D) interpreting these symbols. For illustrative purposes, we assume that O contains at least the two symbols == and + such that Ω(==) = (d1, d2) ↦ if d1 = d2 then 1 else 0 and Ω(+) = (d1, d2) ↦ if {d1, d2} ⊆ ℕ then d1 + d2 else 0. The syntax of TTCWhile programs is given by the following grammar:

b ::= ε | s b
s ::= x = e; | while e {b} | x = check y u p;
e ::= d | x | e1 ⊕ e2

where d ∈ D is a constant, ⊕ ∈ O a binary operator symbol, x and y variable names, u a user name, and p a purpose.
A source code, represented by the non-terminal b ("block"), is a (possibly empty) sequence of statements. Statements include assignments of expressions (x = e), while loops (while e {b}), and the special check instruction (x = check y u p). Expressions can be constructed from constants (d), variables (x), and binary operators (e1 ⊕ e2). The instruction x = check y u p puts 1 into x if the content of the variable y can be output to the user u for the purpose p, and 0 otherwise; the user that performed the last input can be referred to as me. From assignments and while loops, we define if statements as syntactic sugar, writing if e {b} as shorthand for t = e; while t {b; t = 0}.
For the rest of this section, we fix a TTCWhile program, an enforceable policy φ ⊆ T_{Σ_IF}, and choose an enforcer E for φ.

Tagged Values
Tagged values are a new data structure designed to support fine-grained information-flow tracking and proactive checks.
A tagged value is a pair t = ⟨d, h⟩ of a value d ∈ D and a finite list h = [{ℓ11, ..., ℓ1n1}, ..., {ℓm1, ..., ℓmnm}] of finite sets of UTs. The list h is called a UT history. The UT history h represents all inputs that have influenced the value d, as well as dependencies between these influences. More specifically, whenever a variable x contains a tagged value with the history h = [{ℓ1}, {ℓ2, ℓ3}], we require the following properties to hold: (1) only inputs with UTs ℓ1, ℓ2, and ℓ3 may have influenced x's value; (2) no input has influenced that ℓ1 tags x; (3) the input with UT ℓ1 has influenced that ℓ2 and ℓ3 tag x.
In practice, such a history can be obtained, e.g., by executing the code x = 10; if y {x = u + v} when y, u, and v contain the inputs ⟨1, [{ℓ1}]⟩, ⟨0, [{ℓ2}]⟩, and ⟨1, [{ℓ3}]⟩, respectively. In this case: (1) the values of the inputs y, u, v with UTs ℓ1, ℓ2, ℓ3, respectively, have all influenced the final value of x; (2) the value of the input y with UT ℓ1 has influenced the final value of x irrespective of the value of any other input; (3) the value of the input y with UT ℓ1 has influenced whether u and v (with UTs ℓ2 and ℓ3) influence x: if the if block is executed, u and v influence x, whereas there is no such influence if the if block is not executed.
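Assuming a straightforward encoding of tagged values as value/history pairs (the class and UT names below are hypothetical), the example state of x reads:

```python
# Hypothetical encoding of tagged values: a value paired with a UT history,
# i.e. a list of finite sets of UTs, ordered so that inputs tagged in
# earlier sets influence whether later sets are present.
from dataclasses import dataclass

@dataclass
class Tagged:
    value: int
    history: list  # list of sets of UTs

    def uts(self):
        # Property (1): an overapproximation of all influencing inputs.
        return set().union(*self.history) if self.history else set()

# State of x after running  x = 10; if y {x = u + v}  with
# y = <1, [{l1}]>, u = <0, [{l2}]>, v = <1, [{l3}]>:
x = Tagged(value=1, history=[{"l1"}, {"l2", "l3"}])

# l1 tags x unconditionally (property 2); l2 and l3 tag x only because
# the input tagged l1 drove the branch (property 3).
```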
When property (1) above holds, the Itf events resulting from outputting t must all be of the form Itf(ℓ, ·) for some ℓ in h. This allows for a sound overapproximation of Itf events at runtime.
Unlike conventional dynamic IFC techniques based on tagging data with labels from a fixed security lattice, our notion of tagged values allows for arbitrarily fine-grained, per-input information-flow tracking. The use of UT histories fulfilling properties (1)-(3), rather than unordered UT sets that can fulfill only (1), is motivated by the need to perform proactive checks while still keeping track of information flows. As we will demonstrate in Section 4.6.2 below, UT histories allow for such checks to be performed on outputs that combine several inputs by providing an order in which individual UTs can be checked.

State Space
TTC applications operate on a state space S_TTC = S_W × S_I × S_E × S_M, where S_W, S_I, S_E, and S_M are the state spaces of the wrapper, the interpreter, the enforcer, and persistent storage (also called memory), respectively. The states of the wrapper and interpreter are represented as record types; their structure will be introduced in the next sections. The enforcer state stores a trace approximation: a trace τ ∈ T_{Σ_IF} = S_E that contains at least as many Itf events as the actual information-flow trace of the system. Finally, the memory state is a mapping from variable names to tagged values.

Taint
The semantics of the Taint phase is given by the taint rule, which relies on three auxiliary functions (init_I, usr, and the construction of the single-UT enforcement oracle c, all discussed below). The taint rule can be performed only when the wrapper state has its step parameter set to Taint. First (Step 1 in Figure 2, highlighted in yellow), the next set of inputs is retrieved. The caller user u, the called function f, and the input timestamp t are read from the wrapper's state. Moreover, the timestamp t is incremented by P(f)_1.time to obtain the timestamp t′ = t + P(f)_1.time of the next tentative output, and the field i, which encodes the index of the tuple of P(f) currently being executed, is set to 1. Second (Step 2 in Figure 2), the trace approximation is updated with the In events corresponding to the last input. Third (Step 3 in Figure 2), the interpreter's state is initialized using the auxiliary function init_I: its field code receives the source code P(f)_1.code of f; its program counter history, discussed in the next section, is initialized to []; the constant u_me is set to u; finally, the single-UT enforcement oracle c, which will be used to execute check instructions in the Track phase, is set up from the enforcer E and the current context. The function c takes as arguments a user u′, a purpose p, and a UT ℓ; it returns 1 if one can perform an output to u′ for purpose p at timestamp t′ with a value influenced by ℓ without violating φ. To detect violations, c calls the enforcer E. The auxiliary function usr makes it possible to refer to the caller user u in c using the special constant me. Finally, the arguments are set in memory, each tagged with a fresh UT (t, x), where x is the argument's name.

Track
In the Track step, the interpreter executes the code stored in its code field. We first present the semantics of assignments and loops (Section 4.6.1), then the semantics of checks (Section 4.6.2), and finally state the track rule itself (Section 4.6.3).

TTCWhile Semantics.
The small-step semantics of TTCWhile is shown in Figure 6. The five rules assign, while_true, while_false, pop, and check (Figure 6b) are of the form ⟨b, γ, σ⟩ ⇝_c ⟨b′, γ′, σ′⟩, where b is a source code, γ is a program counter history, and σ is a memory state. The program counter history is a list of UT histories. It keeps track of all inputs that have influenced the control flow: when entering a while loop, the UT history of the loop condition is added to γ; when leaving the loop, this history is removed from γ. Three rules econst, evar, and ebinop of the form [[e]](σ) = ⟨d, h⟩ (Figure 6a) evaluate expressions to tagged values over a given memory state σ.
The evaluation rules econst and evar are straightforward. Rule ebinop evaluates expressions of the form e1 ⊕ e2. For this, the two subexpressions are recursively evaluated to tagged values ⟨d1, h1⟩ and ⟨d2, h2⟩, which are combined into ⟨Ω(⊕)(d1, d2), h1 ⋓ h2⟩, where ⋓ merges the two UT histories. The assign rule defines the semantics of assignments x = e. First, [[e]](σ) is evaluated to some ⟨d, h⟩. Then, σ(x) receives a new tagged value with value d and UT history all(γ) • h, where all(γ) concatenates all UT histories in γ into a single set (see Figure 6c). Thus the new UT history of σ(x) contains all UTs in the history of [[e]](σ), as well as all UTs in the program counter history. This reflects that the inputs that influenced the value of σ(x) are those that influenced the result of evaluating e or the control flow. Prepending all(γ) to h records that the presence of the UTs from h in σ(x) depends on the inputs with UTs in all(γ), which have influenced the control flow.
The while_true and while_false rules define the semantics of loops while e {b}. If [[e]](σ) = ⟨d, h⟩ and d ≠ 0, then while_true is applicable. This rule adds b at the front of the code to be executed and prepends h to γ, recording that the inputs corresponding to the UTs in h now influence the control flow. To cover any implicit flows that the loop's execution can cause, while_true additionally adds the UTs in h to σ(x) for all variables x that can be modified by b. The set of such variables is computed using the auxiliary function lhs (see Figure 6c). Finally, a special pop instruction is inserted after the loop body. Its semantics is defined by the pop rule, which removes the top element of the program counter history after the while loop completes. If d = 0, the rule while_false is applied instead. In contrast to while_true, it does not add b to the code to be executed.
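The propagation performed by assign, while_true, and pop can be sketched as follows. The encoding of tagged values as (value, history) pairs and all function names are hypothetical, and binary operators are assumed to merge their operands' histories:

```python
# Toy rendition of UT propagation for assign and while (hypothetical;
# a tagged value is a (value, history) pair, a history a list of UT sets,
# and `pc` the program counter history).

def all_uts(pc):
    """Flatten the program counter history into one set (the paper's all())."""
    out = set()
    for hist in pc:
        for s in hist:
            out |= s
    return out

def assign(mem, pc, var, value, history):
    # Control-flow UTs are prepended to the expression's history: their
    # presence conditions the presence of the later sets.
    ctrl = all_uts(pc)
    mem[var] = (value, ([ctrl] if ctrl else []) + history)

def enter_while(mem, pc, cond_history, modified_vars):
    # while_true: the condition's history now influences control flow...
    pc.insert(0, cond_history)
    # ...and implicitly taints every variable the body may modify (lhs()).
    ctrl = all_uts(pc)
    for v in modified_vars:
        val, hist = mem.get(v, (0, []))
        mem[v] = (val, ([ctrl] if ctrl else []) + hist)

def leave_while(pc):
    pc.pop(0)  # the pop rule

# x = 10; if y {x = u + v}  with y tagged l1, u tagged l2, v tagged l3:
mem = {"y": (1, [{"l1"}]), "u": (0, [{"l2"}]), "v": (1, [{"l3"}])}
pc = []
assign(mem, pc, "x", 10, [])
enter_while(mem, pc, mem["y"][1], modified_vars=["x"])   # branch taken
assign(mem, pc, "x", mem["u"][0] + mem["v"][0], [{"l2", "l3"}])
leave_while(pc)
# x now carries the history [{'l1'}, {'l2', 'l3'}] from Section 4.3
```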
The formal noninterference property satisfied by this semantics is given in Appendix A (Lemma A.25).

Checks. The check rule in Figure 6b defines the semantics of x = check y u p. It uses the function chk in Figure 6c, calling it with the arguments (u, p, h), where h is the UT history of σ(y). The function chk is defined so that this call returns 1 if, and only if, c(u, p, ℓ) = 1 for every UT ℓ contained in h, where c is the single-UT enforcement oracle defined in the previous section. In other words, chk(u, p, h) = 1 if all inputs that have influenced y can be output to u for purpose p. This matches the intuitive semantics of check.

While defining the value of chk(u, p, h) using c(u, p, ℓ) is easy, choosing the right definition for the UT history of chk(u, p, h) is more difficult. This is because this definition must both (a) account for all information flows between user inputs and the results of c(u, p, ℓ) and (b) ensure that the result of the check can still be branched on to proactively react to (especially negative) enforcer verdicts.
Regarding (a): In general, the output value of chk can be influenced by user inputs in such a way that concrete information about inputs can be learnt. An example of this is given in Appendix A.2.
Regarding (b): We expect check to be used to proactively check whether an output is allowed: typically, after executing x = check y u p, an if branch will be used to generate different output values depending on whether y can be output or not. But if x is tagged by any UT ℓ for which c(u, p, ℓ) = 0, then no output value computed within the if branch may be output, removing the practical benefit expected from the check primitive. Therefore, we do not want chk(u, p, h) to be tagged by any UT for which c(u, p, ·) returns 0. In Appendix A.2, we show that the following algorithm, together with our definition of UT histories, defines a chk(u, p, h) that fulfills the constraints expressed in both (a) and (b): • call c(u, p, ℓ) for every UT ℓ in h, in the order of the history; • if any call returns 0, then return 0, otherwise return 1; • tag the return value with all UTs for which c returned 1 and that are not in the last set of the history, in the same order as in the original history. The definition of chk in Figure 6c formalizes this. If c(u0, p0, ℓ1) = 0, then we have chk(u0, p0, h) = ⟨0, []⟩: the first check performed is on ℓ1, which immediately fails. By invariant (2) of UT histories (see Section 4.3), we know that the presence of ℓ1 in the first set of h has not been influenced by the value of any input. Hence, the output of chk(u0, p0, h) does not depend on the value of any input, and the empty UT history in ⟨0, []⟩ correctly accounts for all information flows (a). Moreover, since the history is empty, condition (b) is trivially fulfilled.
Note that the algorithm we described requires an order on the set of UTs tagging a given value in memory that reflects the dependencies between the influences of the various inputs.This motivates the introduction of UT histories.
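Under the same hypothetical encoding of UT histories as ordered lists of UT sets, the three-step algorithm can be sketched as follows (the name check_result and the stop-at-first-failure reading are our own):

```python
# Hypothetical sketch of the check primitive's result computation: query
# the single-UT oracle c(u, p, ut) on the history's UTs in order, stop at
# the first failure, and tag the verdict only with UTs that passed and do
# not belong to the last set of the history.

def check_result(c, u, p, history):
    last = history[-1] if history else set()
    passed = []                 # UT sets retained for the result's history
    for s in history:
        ok = {ut for ut in s if c(u, p, ut)}
        if ok != s:             # some UT in this set failed the check
            return (0, passed)
        kept = s - last         # UTs in the last set are never retained
        if kept:
            passed.append(kept)
    return (1, passed)

# Example oracle: only l1 and l3 may be output.
allow = {"l1", "l3"}
c = lambda u, p, ut: ut in allow
```

A failure on the first set yields the untagged verdict ⟨0, []⟩, matching the case analysis above; a later failure keeps the UTs that already passed.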

track Rule. The rule for the Track phase is the track rule, where ⇝*_c denotes the transitive closure of ⇝_c. After executing the code in the interpreter's code field (premise 1), the wrapper moves the system into a Control state, updating the memory state (premise 2).

Control
The rule defining the semantics of the Control phase is the control rule, which computes an event set containing the Out event of the tentative output together with the Itf events derived from its UT history. First, the recipient user u′, the purpose p, and the output value d of the tentative output defined by the program are retrieved (Step 1 in Figure 2). Second, an overapproximation of the set of events caused by the output is computed, and the enforcer's verdict v is obtained (Steps 2-3 in Figure 2). Third, depending on whether v = ∅ (the output complies with φ) or v ≠ ∅ (the output violates φ), the output is either performed or suppressed, with the trace approximation being updated accordingly (Step 4 in Figure 2). Finally, the system returns to a Track state incrementing i if i < |P(f)|, or to a Taint state if i = |P(f)|; the corresponding updates in the state of the wrapper and interpreter use the auxiliary functions next_W and next_I (Step 5, not shown in Figure 2).

Guarantees
We now state and discuss sufficient conditions on φ that guarantee the correct enforcement of the policy φ by any application implementing our TTC semantics. These conditions are (1) Itf-monotonicity and (2) independence of past outputs.
(1) We say that a policy φ is α-monotonic for some action α ∈ A iff, for any trace that violates φ, adding additional α events cannot restore compliance with φ; formally, every trace obtained from a violating trace by adding α events also violates φ. In general, TTC cannot enforce policies that are not Itf-monotonic, since UTs overapproximate information flows.
(2) We say that a policy φ is independent of past outputs if compliance with φ depends only on past In events, and not on past Out and Itf events. If we allow for malicious developers and users, policies that depend on past outputs cannot, in general, be enforced using TTC. This is shown in Appendix B. To support such policies, one would need to adopt a coarser propagation of UTs in the presence of implicit flows, or to store the UTs of all inputs that affected the content of the trace approximation. Both approaches risk an explosion in the number of UTs to be stored ('label creep'). Hence, in this paper, we focus on policies that are independent of past outputs.

Figure 7: Syntax of PythonTTC programs (grammar omitted; primitives include len, keys, sql, render, redirect, get_session, set_session, get, post, send, check, and me).
Example 4.8. All policies in Figure 1 are independent of past outputs, since only In events appear below temporal operators.

If φ is enforceable, Itf-monotonic, and independent of past outputs, and the online application A_TTC = (S_TTC, s_{0,TTC}, →) terminates, then A_TTC enforces the privacy policy φ ⊆ T_{Σ_IF}.
Finally, we discuss the usefulness of the check instruction for different classes of policies. The check instruction enables the program to obtain the enforcer's verdict in advance for individual inputs or data items. For instance, in the code from Figure 3, the HTML representations of a sequence of events are incrementally generated and added to the page only when check succeeds. The programmer's intent is that, by doing so, she can ensure that the page sent to the user will be accepted by the enforcer. However, this is only the case if the enforcer's verdict on a UT history h ⋓ h′ is no stricter than the conjunction of its verdicts on h and h′, respectively. Otherwise, the output can fail despite the checks succeeding; this would not affect the application's security, but may reduce its usability. A policy of the form Φ, however, always fulfills the above condition, and can therefore be chosen as a general form of policies amenable to both enforcement and checking.
Formal Proofs. All definitions and theorems from Sections 2-4 have been machine-checked using the Isabelle/HOL proof assistant. The formalization, which has approximately 4,300 lines of Isar code, is openly available [57]. More details are provided in Appendix A.

IMPLEMENTATION
To show how our approach can be used, we have implemented WebTTC, a web programming framework with TTC semantics.
WebTTC features a Python-like language called PythonTTC that extends TTCWhile with support for basic web programming primitives. PythonTTC's syntax is presented in Figure 7. Figures 3 and 4 show examples of PythonTTC code. Beyond standard Python features (functions, dictionaries, lists, sets; l. 1-8 and 9-13), PythonTTC provides output primitives (l. 3) that generate one event per argument. Suppression of one of these events causes the corresponding argument to be set to a default error value. The check primitive from TTCWhile is also available (l. 19), and the constant me returns the identifier of the current user (l. 19). Note that despite using a Python-like syntax, PythonTTC cannot make use of external Python libraries, since these libraries may, in general, rely on Python features not supported by PythonTTC. Instead, WebTTC supports web programming directly as part of PythonTTC.
WebTTC compiles PythonTTC programs deployed by programmers into Python 3 code. It provides an out-of-the-box interface in which users can log in and specify their privacy policies, and implements the wrapper through which they can interact with individual applications. We use SQLite 3.31.1 for persistent storage and the state-of-the-art MFOTL enforcer EnfPoly [56] for enforcement. WebTTC consists of approximately 3,500 lines of Python code. We provide a ready-to-use image [58] containing all artifacts and tests.
The framework discharges our model's assumptions as follows: (1) The assumptions about the privacy policies (independence of past outputs, monotonicity, enforceability) are verified by restricting policies to a 'safe' syntactic fragment of MFOTL on which these requirements are fulfilled [7, 56]. (2) To keep the number of outputs in each function fixed, sending data to third parties is disallowed in if and then blocks. (3) The intervals between inputs are kept larger than the functions' running time by processing the next input only after the previous function terminates. (4) Termination is ensured by introducing a timeout on function execution. A more powerful alternative would be requiring (automated) termination proofs from developers. Proving the termination of programs is a well-researched task (see, e.g., [44]) that is largely orthogonal to this work.
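A timeout on function execution can be implemented, e.g., with POSIX alarms. This is a hypothetical sketch and not necessarily WebTTC's mechanism:

```python
# Hypothetical sketch of bounding function execution with a timeout via
# SIGALRM (Unix-only, main thread only); WebTTC's actual mechanism may
# differ.
import signal

class Timeout(Exception):
    """Raised when the watched function exceeds its time budget."""

def run_with_timeout(fn, seconds):
    def on_alarm(signum, frame):
        raise Timeout()
    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)   # schedule SIGALRM in `seconds` seconds
    try:
        return fn()
    finally:
        signal.alarm(0)     # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

With this sketch, a non-terminating function raises Timeout once its budget elapses, after which the framework can report an error instead of hanging.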

EVALUATION
Beyond showcasing the feasibility of TTC, our framework provides a basis for demonstrating how TTC can be used to enforce nontrivial user-provided privacy policies in real-world applications. We evaluate WebTTC by answering the following research questions: RQ1. Is WebTTC sufficiently expressive to develop realistic web applications? How is the size of the code base impacted? RQ2. How much runtime overhead does WebTTC incur? How does it scale with the database size and policy choice? RQ3. How does the runtime performance of WebTTC compare to the performance of state-of-the-art IFC languages?

RQ1: Generality. We have implemented the following benchmark applications from previous work within our framework: • A conference management system (Conf) [77]; • A HIPAA-style health record manager (HIPAA) [77]; • The microblogging app Minitwit [75]; and • Minitwit+, a Minitwit extension with personalized ads.
For each of these applications, we have implemented a Flask/Python baseline application without security, using pure SQL (for Conf and Minitwit) and the SQLAlchemy ORM (for HIPAA).
For the two applications implemented in Jacqueline [77], we also consider the original implementation. To the best of our knowledge, the Riverbed [75] implementation of Minitwit is not publicly available.
For each application, every argument of call or render was annotated with a purpose in {Service, Marketing, Analytics} as in Figure 3.In the considered applications, this process was straightforward, as all webpage components and calls to third-parties had a clearly identifiable purpose (e.g., a block containing ads has purpose Marketing, a list of other users' posts has purpose Service).However, we recall that correct purpose annotation is critical and generally requires certification by trusted legal experts (see Section 2.3).
We have been able to implement the original application functionality of all four applications in WebTTC. Table 1 shows the size of the code base for each of these implementations. Each PythonTTC implementation requires under 20 lines of security code, which consists mostly of applications of check in views showing lists of objects. The number of lines of code needed to implement the application's functionality grows by about 10% with respect to the unsecured baseline, and is comparable to Jacqueline's.
In the experience of this paper's authors, porting applications from the Flask/Python baseline to WebTTC was a relatively smooth process. The similarity between the two interfaces allowed us to convert most code straightforwardly without altering its structure. The main differences between the Flask and WebTTC source code are the introduction of checks in functions displaying lists of objects that may be subject to different permissions, and the addition of purpose annotations. Some additional testing with various user policies was also required to ensure that the check statements had been added adequately. However, since programmers need not design the policies themselves, we expect the overall process of porting applications to WebTTC to be less time-consuming and more mechanical than with conventional IFC frameworks.
RQ2: Enforcement overhead. We assess the total runtime overhead of TTC by comparing the latency of selected function calls from our four test applications to the latency of the Flask/Python implementation without security. We thus obtain a conservative estimate of the overhead of TTC. For each application, we selected (i) a function displaying one relevant entity, if available (a paper in Conf, an individual in HIPAA), (ii) a function displaying a list of entities (all papers for Conf, all individuals for HIPAA, the latest 30 posts in Minitwit), and (iii) a function adding one entity, if available (a paper in Conf, a post in Minitwit). For each function, we measure the page latency using the Python requests library on a high-end laptop (Intel Core i5-1135G7, 32 GB RAM) over N = 100 repetitions, while varying the number n of entities, the number m of users, and the users' policies. Large values of n cover the scenario where the application has been used for an extended period of time. The range of values for n and m is taken from Yang et al. [77]. We set each user's policy to one of π0, π1, π2, π3, where, for i ∈ {0, ..., 3}, πi is identical to policy φi in Figure 1, except that Alice is replaced by the respective user, and posts are replaced by papers in Conf and individuals in HIPAA. We denote by O_Flask the ratio between the latencies of WebTTC and Flask.
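The measurement methodology can be sketched as follows; the workload functions are hypothetical stand-ins, whereas the real experiment times HTTP requests issued with the requests library against running WebTTC and Flask instances:

```python
# Hypothetical sketch of the latency/overhead measurement methodology.
import statistics
import time

def measure_ms(fn, n=100):
    """Median latency of fn over n repetitions, in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def overhead_vs_flask(webttc_fn, flask_fn, n=100):
    # O_Flask: the ratio between WebTTC's and Flask's latencies.
    return measure_ms(webttc_fn, n) / measure_ms(flask_fn, n)
```

In the actual experiment, webttc_fn and flask_fn would each issue one HTTP GET or POST to the corresponding application endpoint.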
The results of our experiments are presented in Tables 2a-b. The latency is constant with respect to n and m for functions of type (i) and (iii), as well as for timeline. In these functions, a fixed number of elements is displayed, and WebTTC only adds a constant number of checks, leading to a constant, low overhead between 1× and 2.5×. The latency of these functions remains consistently below 15 ms, irrespective of the choice of n, m, and the policies. For view_papers and view_index, we observe both a latency linear in n and an increase in latency for large n and m, with the overhead versus Flask being 1-10× for view_papers, or even higher for view_index. The high overhead is caused by the large number of checks and the more complex UT propagation induced when combining many inputs into a single output. These functions show all entities stored in the database, a 'stress test' [77] that is unlikely to materialize in applications featuring appropriate pagination.
Provided that pagination is introduced, our experiments thus show that the overhead of the TTC application with respect to a baseline without security is moderate, and that its performance is at least similar to that of conventional IFC systems.With a slowdown of 1 to 2.5× and latency below 15 ms for all values of the parameters tested, the usability of the case study applications is not significantly impacted by the additional runtime costs.
RQ3: Comparison with the state of the art. In the previous section, we have given a conservative estimate of the overhead of TTC with respect to a baseline without security. Next, we compare TTC's runtime performance on Conf and HIPAA to that of Jacqueline [77], a state-of-the-art Python-based IFC web framework [78]. For all values of n and m tested, WebTTC exhibits a latency equal to or lower than Jacqueline's (see Tables 2a-b). Better performance is also observed in those functions for which WebTTC exhibits a significant overhead with respect to the baseline without security.
Since Jacqueline is based on a different, developer-specified policy model, we additionally converted Conf's and HIPAA's original IFC policies from [77] into MFOTL privacy policies φ_Conf and φ_HIPAA that lead to equivalent information-flow restrictions (see Appendix D for details). The results of these additional experiments, presented in Table 2d, confirm that for all values of n and m, WebTTC exhibits a latency equal to or lower than Jacqueline's.
Table 3 summarizes previous work with respect to support for [R1-3], existence of formal correctness proofs for a core of the language ('Proofs'), and the enforcement of noninterference, as opposed to just access control ('Noninterference').Riverbed [75] is, to our knowledge, the only approach tackling user-defined data usage policies [R1].However, Riverbed does not consider noninterference, and comes with no formal proofs.More critically, its policy language has very limited expressivity: a user can only (dis)allow that all their data is aggregated with data of other users or written to permanent storage.As a result, users can neither associate custom policies to different inputs they provide nor exchange information with users that have different policies.
The dynamic IFC systems of Jeeves [78], Jacqueline [77], Hails [45], LWeb [64], and Estrela [11] rely on policies defined at the data-model level. Hails [45] and LWeb [64] are based on Haskell monads, Estrela [11] on modifying database queries, and Jeeves [78] and Jacqueline [77] on a custom λ-calculus. In Jeeves and Jacqueline, sensitive values are evaluated either to their actual value or to a default value according to the content of some level variable, which is set according to the application's policy. In all these systems, the fields or rows of databases can be assigned confidentiality labels, whose semantics may depend on the values of the data and on the context of execution. Such approaches support fine-grained policies, but not arbitrary per-input policies as in requirement [R2], as they model interference from the database to outputs rather than from inputs to outputs. In particular, labels are not persisted between user queries.
Jif [61], SIF [21], and JSLINQ [5] support static reasoning about interference between inputs and outputs in a fine-grained way, thus fulfilling [R2]. Additionally, in SIF [21] and JSLINQ [5], programs can dynamically create new security labels encoding data usage permissions. This can be used to let users select between different policy options. However, even then, the policy space available to users is still strictly limited by the developers' choices, and users obtain no formal guarantees that the preferences they express are correctly taken into account. Hence, these systems fall short of fulfilling [R1]. In contrast, SELinks [25] uses a type system to guarantee that access to sensitive data is always guarded by appropriate policy checks. In SELinks, arbitrary functions can be assigned to security labels to define custom policies, a mechanism that could possibly be extended to support user-defined policies; however, SELinks does not provide noninterference guarantees. None of the above approaches supports time-dependent policies [R3].
Other Approaches.While we focus on the enforcement of IFC in server-side code, IFC techniques are also used in the browser [9,12,13,17,27,30,54,73] to avoid leaks caused by client-side code.
Another line of research has considered temporal IFC from the point of view of runtime verification and (trace-based) runtime enforcement. Hyperproperty extensions of temporal logic such as HyperLTL [22] and SecLTL [31] can be used to monitor noninterference. However, hyperproperty monitors usually take the set of all system traces as an input [2, 16, 24, 38-40, 50], a set which cannot be computed in reasonable time for general programs.
Differential privacy [35] is a promising approach to enforcing privacy regulations [26], providing strong statistical privacy guarantees.However, being statistical, these guarantees may be practically insufficient or of limited usability depending on the data type, the size of datasets, and the queries considered [33,34,76].
The combination of fully homomorphic encryption [43] with trusted computing allows for outsourcing complex computations with reduced information leakage [41,63].However, these approaches are still prohibitively slow for general computation, while efficient alternatives [41] sacrifice the coverage of implicit flows.

CONCLUSION
In this paper, we have presented Taint, Track, and Control (TTC), the first language-based PbD approach that enforces user-defined, fine-grained, temporal privacy policies in online applications against both malicious peers and developers.We have introduced a notion of information-flow traces that can be used to define expressive privacy policies; defined the formal semantics of TTC and proven its correctness; and implemented and evaluated WebTTC, a framework that allows for the development of private-by-design applications.
The work described in this paper can be extended to address further aspects of privacy regulations. In addition to requiring consent-based usage, the GDPR recognizes users' "right to be forgotten." Enforcing such a right requires timely erasure of data at rest, i.e., causation of system actions [56]. Hence, adding support for causation to our TTC framework would allow it to cover a larger set of the GDPR requirements. The GDPR also allows programmers to disregard user consent when data is anonymized, or when other legal grounds (e.g., legitimate interest) are met. One may therefore consider extending TTC with controlled declassification of user inputs.
Extending TTC (and WebTTC) to support more features of modern software systems, like communicating components, provides another avenue for future research. With more features and broader GDPR coverage, we also plan to conduct studies involving external developers and users, exploring how TTC can help develop a new generation of user-centric, private-by-design software systems.

Definition A.7. Let ℓ be a UT. The relations =_ℓ ⊆ H² and ≈_ℓ ⊆ T² are defined in the formalization; we lift ≈_ℓ to (Var → T)² pointwise. We prove a series of technical lemmata, of which the most important are: Lemma A.8 (equiv_ind_history, equiv_ind_tv). For any UT ℓ, the relations =_ℓ and ≈_ℓ are equivalence relations.
In general, the output value of  can be influenced by user inputs. Consider the following code  1 , where u 1 and p 1 are arbitrary:

This program first checks x, assigns a to z if x is non-zero, and then checks z. The two verdicts are written into u and v, respectively. Then, it stores the result of the comparison u == v in w. We claim that if w is 0 after executing  1 on  0 , the initial value of x must be nonzero. The proof is by contraposition: if x is initially 0, then the if block is never executed; instead, the UTs of x are added to the UTs of z, which can be modified within the if block. Hence, when z is checked, it contains exactly the same UTs as x contained initially, and the values of u and v must be equal. Hence, if w is 0, then x cannot have been 0 initially, i.e., x influences w through the use of check. Therefore, our definition of  must ensure that, after executing  1 , w is tagged with the initial UT of x.
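The contraposition argument above can be replayed on a small executable model. The following is a minimal sketch, not the paper's formal semantics: tagged values pair a plain value with a set of UTs, branching on a tagged value propagates its taints to whatever is assigned (the implicit flow), and check is assumed to return 1 iff all taints on its argument may be output. The labels l3 and ly, the variable names, and the check signature are our own illustrative choices.

```python
# Illustrative model only (not the paper's semantics): tagged values are
# (value, set-of-UTs) pairs; branching on a tagged value propagates its
# taints to variables assigned in (or guarded by) the branch.

def check(tagged, allowed):
    """Assumed semantics: return 1 iff every UT on `tagged` may be output."""
    _value, taints = tagged
    return 1 if taints <= allowed else 0

def run_c1(x0, allowed):
    """Simulate the example program c1 on initial value x0 for x."""
    x = (x0, {"l3"})                   # x carries UT l3 (a user's input)
    a = (7, {"ly"})                    # a carries a different UT, ly
    z = (0, set())
    if x[0] != 0:                      # branch taken: z := a, plus pc taint
        z = (a[0], a[1] | x[1])
    else:                              # branch not taken: implicit flow only,
        z = (z[0], z[1] | x[1])        # x's taints still reach z
    u = check(x, allowed)              # first verdict
    v = check(z, allowed)              # second verdict
    w = 1 if u == v else 0             # w stores the comparison u == v
    return w
```

With allowed = {"l3"}, observing w == 0 reveals that x was initially nonzero, matching the claim above: if x starts at 0, z ends up with exactly x's taints and the two verdicts agree.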
Example A.19. With the code  1 introduced above, our definition gives

In the former case, ℓ 3 is not allowed to be output, and w must contain 0 after executing  1 irrespective of the value of x, as both calls to check return 0. Hence, w is not influenced by the value of x. In the latter case, ℓ 3 is allowed to be output, but the initial content of y may not be. In this case, the first check may yield  1 = 1 and the second  2 = 0, and we may learn that x was initially nonzero by observing that w contains 0. But since, after executing  1 , the variable w now contains ⟨ 1 =  2 , [{ℓ 3 }]⟩, this information flow is correctly tagged by the UTs. In both cases,  (u 1 , p 1 , [{ℓ 3 }]) is only tagged with ℓ 3 if  (u 1 , p 1 , ℓ 3 ) = 1.

A.3 TTCWhile (TTCWhile.thy)
We first need to define a few properties of programs:

Definition A.20 (Termination, terminates'). We say that a code  terminates on a program counter history  and a memory state  for a given choice of c, written terminates'    c, iff there exist  ′ and  ′ such that , ,  →*c ,  ′ ,  ′ .
Definition A.21 (pops).For any code , we denote by pops() the number of pop statements that appear in  (excluding potential pop statements inside while blocks).
Definition A.22 (Well-formedness, wf_code).We say that a code  is well-formed, written wf_code , iff it contains no pop statements inside its while blocks.
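Definitions A.21 and A.22 are purely syntactic, so they can be illustrated on a toy abstract syntax. The sketch below uses our own minimal AST (guards and assignments omitted, since neither definition inspects them), not the paper's Isabelle datatype: pops counts pop statements outside while bodies, and wf_code rejects any pop under a while.

```python
# Illustrative only: a toy While AST of our own, not the paper's
# Isabelle definitions. Guards/assignments are omitted because
# pops and wf_code are purely syntactic.
from dataclasses import dataclass

@dataclass
class Skip:
    pass

@dataclass
class Pop:
    pass

@dataclass
class Seq:
    first: object
    second: object

@dataclass
class If:
    then_branch: object
    else_branch: object

@dataclass
class While:
    body: object

def pops(c):
    """Definition A.21: pop statements in c, excluding any under a while."""
    if isinstance(c, Pop):
        return 1
    if isinstance(c, Seq):
        return pops(c.first) + pops(c.second)
    if isinstance(c, If):
        return pops(c.then_branch) + pops(c.else_branch)
    return 0  # Skip and While contribute nothing

def contains_pop(c):
    """True iff a pop occurs anywhere in c, including inside loops."""
    if isinstance(c, Pop):
        return True
    if isinstance(c, Seq):
        return contains_pop(c.first) or contains_pop(c.second)
    if isinstance(c, If):
        return contains_pop(c.then_branch) or contains_pop(c.else_branch)
    if isinstance(c, While):
        return contains_pop(c.body)
    return False

def wf_code(c):
    """Definition A.22: no pop statement inside any while body."""
    if isinstance(c, Seq):
        return wf_code(c.first) and wf_code(c.second)
    if isinstance(c, If):
        return wf_code(c.then_branch) and wf_code(c.else_branch)
    if isinstance(c, While):
        return not contains_pop(c.body)
    return True  # Skip and Pop are well-formed on their own
```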
We prove that evaluating the same TTCWhile expression against two indistinguishable memory states produces indistinguishable tagged values:

A.4 TTC Semantics (TTC.thy)

We prove our enforcement property in two steps. First, we fix a set of objects and assumptions about them (a locale, in Isabelle parlance) and prove that, if these assumptions hold, the semantics of TTC guarantees correct enforcement. Second, we show that the assumptions in the locale hold for TTCWhile. Sufficient enforceability conditions are established in [56].
Using the operators ♦ and □, the policies from Figure 1 can be expressed as follows, where  expresses the condition under which v consents that the data she input to function  ′ can be output to user  by function  for purpose .
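For concreteness, a policy of this general shape might read as follows. This is an illustration only: the variable names and the consent condition φ are ours, while the predicates Out, Itf, In and the past-time diamond follow the HIPAA policy of Appendix D.2.

```latex
% Illustrative only: variable names and \varphi are ours; the
% predicates Out, Itf, In and the past-time diamond mirror the
% HIPAA policy of Appendix D.2.
\forall u, f, p, d, d', f', m.\;
  \mathsf{Out}(u, f, p, d) \wedge \mathsf{Itf}(d, d')
    \wedge \Diamond\, \mathsf{In}(f', d', m)
  \;\Rightarrow\; \varphi(u, f, p, f')
```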

D CONVERSION OF JACQUELINE IFC POLICIES D.1 Conf
In the Conf application from [77], the authors enforce the following restrictions:

• The email of users is only visible to the chair or the user itself;
• The author of papers is visible if the phase of the submission process is 'final', or if the current user is the author, a chair, or a PC member, and there is no conflict with the paper;
• The coauthors, review assignments, paper versions, reviews, and comments are visible only if the current user is the author, a chair, or a PC member, and there is no conflict with the paper.

We assume that the phase is not final, that there are no conflicts, and that user identifiers are of the form "0", "1", . . ., where "0" is a chair, "1", . . ., "10" are PC members, and "11", . . . are regular users. In this setup, the following TTC policy imposes the same information-flow restrictions as in the original application:

D.2 HIPAA
In the HIPAA application from [77], the authors enforce the following restrictions:

• The data from model Individual is visible to the individual;
• The location from model HospitalVisit is visible to the patient, the hospital, and users with profile type 6;
• The patient ID from model Treatment is visible to the patient and the prescribing entity;
• The patient ID from model Diagnosis is visible to the patient and the recognizing entity;
• The shared information from model BusinessAssociateAgreement is visible to the covered entity and the business associate;
• The standard, first party, second party, and purpose from model Transaction are visible to the first party and second party;
• The email from model UserProfile is visible to the user itself.

We perform tests with Individual only, hence we consider:

‹u› HIPAA = ∀ ′ , , , , ,  , . Out( ′ , , , ) ∧ Itf(, ) ∧ ♦ In( , , ) ⇒  = "Individual" ∧  = ‹u›.
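The HIPAA policy above is a trace property, so it can be checked by runtime verification over a log of input and output events. The following is a minimal sketch under our own event encoding; WebTTC's actual monitor and the Itf relation (which the framework obtains from taint tracking) are not reconstructed here, but supplied as a plain mapping.

```python
# Illustrative only: our own event encoding, not WebTTC's monitor.
# Every output influenced by some past input must stem from model
# "Individual" and be owned by the user u herself.

def satisfies_hipaa(trace, influences, u):
    """trace: ordered list of ("In", model, data_id, owner) and
              ("Out", recipient, data_id) events.
       influences: maps an output data_id to the set of input data_ids
       that influenced it (standing in for the Itf relation).
       Returns True iff every influencing past input comes from model
       "Individual" owned by u."""
    inputs_seen = {}                      # data_id -> (model, owner)
    for event in trace:
        if event[0] == "In":
            _, model, data_id, owner = event
            inputs_seen[data_id] = (model, owner)
        else:                             # an "Out" event
            _, _recipient, data_id = event
            for src in influences.get(data_id, set()):
                if src in inputs_seen:    # the past-time ♦ In(...) part
                    model, owner = inputs_seen[src]
                    if model != "Individual" or owner != u:
                        return False      # policy violated
    return True
```

For example, an output influenced only by alice's Individual record satisfies the policy for u = "alice", while one influenced by a HospitalVisit record does not.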

Table 1: Size of code base for each application