Why Privacy-Preserving Protocols Are Sometimes Not Enough: A Case Study of the Brisbane Toll Collection Infrastructure

The use of Electronic Toll Collection (ETC) systems is on the rise, as these systems have a significant impact on reducing operational costs. Toll service providers (TSPs) access various information, including drivers’ IDs and monthly toll fees, to bill drivers. While this is legitimate, such information could be misused for other purposes violating drivers’ privacy, most prominently to infer drivers’ movement patterns. To this end, privacy-preserving ETC (PPETC) schemes have been designed to minimize the amount of information leaked while still allowing drivers to be charged. We demonstrate that merely applying such PPETC schemes to current ETC infrastructures may not ensure privacy. This is due to the (inevitable) minimal information leakage, such as monthly toll fees, which can potentially result in a privacy breach when combined with additional background information, such as road maps and statistical data. To show this, we provide a counterexample using the case study of Brisbane’s ETC system. We present two attacks: the first, a variant of the presence disclosure attack, tries to disclose the toll stations visited by a driver during a billing period as well as the frequency of visits. The second, a stronger attack, aims to discover cycles of toll stations (e.g., the ones passed during a commute from home to work and back) and their frequencies. We evaluate the success rates of our attacks using real parameters and statistics from Brisbane’s ETC system. In one scenario, the success rate of our toll station disclosure attack can be as high as 94%. This scenario affects about 61% of drivers. In the same scenario, our cycle disclosure attack can achieve a success rate of 51%. It is remarkable that these high success rates can be achieved by using only minimal information as input, which is, e.g., available to a driver’s payment service provider or bank, and by following very simple attack strategies without exploiting optimizations.
As a further contribution, we analyze how the choice of various parameters, such as the set of toll rates, the number of toll stations, and the billing period length, impacts a driver’s privacy level with respect to our attacks.


INTRODUCTION
From 2019 to 2030, the global ETC market is predicted to grow at a compound annual rate of 8.3 percent, reaching about 18.5 billion U.S. dollars by 2030 [6]. Also, in 2019, the EU issued a directive to make ETC systems in Europe fully interoperable [14]. Thus, a careful analysis of privacy concerns arising from such an important technology is crucial.
Road-infrastructure vs. autonomous-device based ETC. We can distinguish two important (post-payment) ETC technologies. In road-infrastructure based ETC, on-board units (OBUs) mounted inside a vehicle interact with road-side units (RSUs) to determine tolls. In deployed systems, typically, OBUs simply send over encrypted user IDs when passing an RSU, which are then used by the toll service provider (TSP) to update the user's balance. In (academic) proposals for privacy-preserving ETC systems, more complex cryptographic protocols are executed by OBUs and RSUs. Based on the user's final balance at the end of a billing period, the TSP issues an invoice to the user. In autonomous-device based ETC, road-infrastructure devices such as RSUs are not needed. Instead, the OBU determines the toll for road segments autonomously using location and time information from GPS. The OBU and the TSP interact in order to compute the final balance of a user. In this paper, we primarily focus on (privacy-preserving) road-infrastructure based ETC. However, as our attacks are fairly generic and only require minimal information as input (see later), we expect that they are also applicable to disclose a driver's (cycles of) road segments in an autonomous-device based ETC.
Privacy concerns. TSPs, often private companies, store sensitive information to charge drivers. This data typically includes drivers' identities and home addresses, wallet balances (the total toll drivers owe to the TSP by the end of a billing period), and the locations and times at which drivers pass toll stations. Naturally, privacy concerns arise because such sensitive user data is in the hands of (private) TSPs. The work [32] provides an overview of the resulting privacy issues. For instance, a TSP could misuse the data to infer drivers' movement patterns and places of interest. Also, third parties, including law enforcement agencies and insurance companies, may try to persuade the TSP to grant access to the data for prosecution or commercial purposes. This demonstrates that the collected data could be misused for purposes beyond billing drivers.
PPETC leaking minimal information. To address these issues, several privacy-preserving ETC (PPETC) schemes have been proposed so far, e.g., [2,10,15,31,34] (cf. Section 10 for an overview). Such schemes aim to disclose only the minimal information to the TSP necessary to charge drivers but protect the anonymity and unlinkability of a driver's transactions. Typically, this includes the drivers' identities, home addresses, and final wallet balances, which are needed to issue invoices at the end of a billing period, as well as a database of anonymous and unlinkable transaction records (each containing at least location and time information), which are provided by the RSUs.
Research question: real-world privacy provided by PPETC. As privacy protection in current PPETC schemes does not mean that there is no information leakage at all, but that certain minimal information (required to ensure the core functionality) is still leaked, it is interesting to see whether this information is already sufficient to violate privacy in a practical scenario. Besides the information leaked by the protocols, the deployment of a PPETC scheme in a real-world scenario also fixes certain parameters and provides additional background information, all of which is relevant to a driver's privacy: the pricing scheme, number of toll stations, road infrastructure information, statistical data about driver behavior, etc. To the best of our knowledge, the impact of such information on the "real-world privacy" of a PPETC scheme has not been thoroughly analyzed yet. A driver's privacy certainly depends on the complexity of the subset sum problem (SSP) [28], which, in our case, is concerned with finding toll prices that add up to a driver's wallet balance (monthly total toll). By solving the SSP, an adversary can learn the toll stations a driver passed during a billing period [2,15,34]. Although the general SSP is NP-complete, there are variants that can be solved in polynomial time [33]. Moreover, we can expect that in a real-world scenario, the parameters relevant for its time complexity (number of toll prices) will usually be small values.
Our Contribution. In this paper, we provide evidence that a driver's privacy may not be preserved in general when a PPETC system is used to replace an existing ETC in practice. We do so by focusing on Brisbane's ETC infrastructure as a case study, using its real parameter settings (set of toll rates, number of toll stations), real statistical data (distribution of wallet balances, maximal number of visited toll stations), and real background information (road infrastructure). Our attacker is weaker than the one typically considered by PPETC schemes, the latter being the TSP colluding with all RSUs, as it exploits less information: it does not require the transaction records provided by RSUs as input. In fact, the minimal information used by our adversary is, e.g., available to the payment service provider (e.g., Apple Pay [1]) or bank of a TSP's customer. This makes our attacks generic and broadly applicable.
We present two generic attacks. The first one, which we call the "toll station disclosure attack" (TSD), is a variant of the presence disclosure attack [36]. It tries to discover the toll stations visited by a driver during a billing period as well as the number of visits. The second attack, called the "cycle disclosure attack" (CD), aims to identify the cycles made by a driver during a billing period as well as their frequencies. A cycle starts and ends at the driver's home location and passes through one or more toll stations. In comparison to the first attack, the adversary discloses more information (e.g., the order in which toll stations are visited) since the cycles already include the visited toll stations. The first attack is based on solving a variant of the SSP. The second attack exploits the output of the first one.
We evaluate our attacks based on information and statistics from the Brisbane ETC setting. This evaluation shows that our attacks achieve considerably high success rates. More precisely, the first attack achieves a success rate of 94% and the second one a success rate of 51% for 61% of the drivers. Although these success rates are already high and answer our research question, we explore certain heuristics to further improve them by assessing the plausibility of solutions to the underlying SSP or removing implausible solutions. Note that additionally exploiting transaction records may (only) lead to better success rates or stronger attacks (but also to a stronger attacker model), which we leave as future work. As a further contribution, we study how certain parameter choices (toll rates, number of toll stations, length of billing period) affect the success rate of our attacks and give some recommendations to transportation system engineers based on our findings.

BACKGROUND
Here, we define various terms needed for our study.
Transaction: A transaction is a tuple consisting of a toll location, a toll price, and a time. The set of transactions consists of all transactions made by all drivers within a billing period. All transactions are anonymous.
Billing period: The billing period is a system parameter fixed by the TSP. It consists of a start time and an end time. All transactions in this interval belong to the set of transactions associated with this billing period.
Toll price: This is a parameter fixed by the TSP. We define the set of toll prices as \(P = \{p_1, p_2, \ldots, p_n\}\), where \(p_i \in \mathbb{D}\) is a fixed toll fee that is assigned to toll station \(s_i\). This study focuses on static toll pricing schemes with fixed prices, in contrast to dynamic toll pricing schemes where toll fees vary based on factors like the day of the week or time of day.
Graph: The map of a city, including toll stations and toll roads, is represented by a directed graph \(G = (V, E)\). In \(G\), the vertex set \(V\) represents the set \(S\) of toll stations, i.e., \(V = S\). Each edge represents a toll road. The graph may include other information available on the city's map, such as the distance between two toll stations.
Set of frequencies: The frequency \(f_i\) is the number of times a driver visits the toll station \(s_i\) in a billing period. We define the set \(F = \{f_1, f_2, \ldots, f_n\}\), where \(f_i \in \mathbb{N}_0\) is the frequency corresponding to toll station \(s_i\). Note that \(\mathbb{N}_0\) is the set of non-negative integers, and \(f_i = 0\) means the driver has not visited the toll station \(s_i\).
Drivers' identity: The set \(ID = \{id_1, id_2, \ldots, id_m\}\) denotes a subset of unique identities (e.g., passport numbers) of TSP customers. This information is needed to associate a monthly toll fee (wallet) with its corresponding driver's identity and home address in order to issue an invoice at the end of a billing period. For simplicity, we assume that each customer coincides with the driver who uses the TSP's services. Hence, each identity \(id \in ID\) can be linked with both a driver and a customer. Assuming that each driver only uses one fixed vehicle, each \(id\) can be associated with the driver's vehicle. This assumption is valid if an OBU inside a vehicle is fixed and cannot be attached to another vehicle. Hence, there is a one-to-one relationship between the identity \(id\) and a driver/customer and the corresponding vehicle.
Drivers' home address: A subset of home addresses, associated with \(ID\), is denoted as \(H = \{(id_1, h_1), (id_2, h_2), \ldots, (id_m, h_m)\}\), where \(id_i \in ID\) and \(h_i\) in each tuple correspond to a subscribed driver's identity and their home address, respectively. A driver provides the TSP and payment provider with their home address, to which invoices are sent.
Cycle: In this study, we consider a cycle in graph \(G\) as a circuit, which is a non-empty trail in which the first and last vertices are equal [42]. A cycle includes one or more toll roads on which at least one toll station is located. The cycle may also encompass one or more roads without any toll stations.
Subset sum problem: The SSP is a well-known NP-complete problem, which is defined as follows [28]. We consider the set \(A = \{a_i : 1 \le i \le n,\ a_i \in \mathbb{N}_0\}\) and the value \(b \in \mathbb{N}_0\), i.e., a non-negative integer. The aim is to find the \(x_i\) such that
\[ \sum_{i=1}^{n} a_i x_i = b. \tag{1} \]
Linear diophantine equation: A linear diophantine equation is a linear equation whose solutions are restricted to integers [35]. Each variable in the equation has degree at most one. Equation 1 is an example of such an equation.
Solving the equation is an integer optimization problem, in which the variables take integer values [39]. The SSP can be interpreted as solving Equation 1, where \(a_i, b \in \mathbb{N}_0\).
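To make the connection between the SSP and Equation 1 concrete, the following minimal Python sketch enumerates all non-negative integer solutions of a linear diophantine equation by bounded recursion. It is only an illustration: the function name `solve_diophantine`, the coefficient values, and the target are hypothetical and not taken from the paper, and the exhaustive search is a stand-in for a proper integer-programming solver.

```python
from typing import Iterator, List, Sequence, Tuple


def solve_diophantine(coeffs: Sequence[int], target: int) -> Iterator[Tuple[int, ...]]:
    """Yield every non-negative integer vector x with sum(coeffs[i] * x[i]) == target.

    Assumes all coefficients are positive integers (e.g., prices in cents).
    """
    def extend(i: int, remaining: int, partial: List[int]) -> Iterator[Tuple[int, ...]]:
        if i == len(coeffs):
            if remaining == 0:
                yield tuple(partial)
            return
        # x_i can be at most remaining // coeffs[i]; larger values overshoot.
        for count in range(remaining // coeffs[i] + 1):
            partial.append(count)
            yield from extend(i + 1, remaining - count * coeffs[i], partial)
            partial.pop()

    yield from extend(0, target, [])


# Hypothetical coefficients in cents; these are NOT Brisbane's real toll rates.
EXAMPLE_COEFFS = [250, 400, 550]
solutions = list(solve_diophantine(EXAMPLE_COEFFS, 800))
print(solutions)
```

With these illustrative values, a total of 800 can be formed either as 2 × 400 or as 250 + 550, so the equation has two solutions. Working in integer cents avoids floating-point rounding when comparing sums.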
Wallet balance: The wallet balance is the total toll fee for which the TSP issues an invoice to a driver; it should be paid by the end of the billing period. Note that each wallet is associated with a driver's identity so that the TSP can charge the driver. The driver's wallet \(w\) is the sum of the toll prices of all toll stations visited by the driver within a billing period. Given \(w\), the following linear diophantine equation holds:
\[ w = \sum_{i=1}^{n} p_i f_i, \tag{2} \]
where \(p_i\) is the toll price of the toll station \(s_i\), and \(f_i\) is the corresponding frequency. We define a subset of wallet balances of drivers, associated with \(ID\), as \(W = \{(id_1, w_1), (id_2, w_2), \ldots, (id_m, w_m)\}\), where \(w_i \in \mathbb{N}_0\) is associated with a driver with identity \(id_i \in ID\). For each \(id_i \in ID\), there is exactly one tuple with the first component \(id_i\) in \(W\).
Trace: Intuitively, a trace records the history of the toll stations (\(s_i\)) visited by a driver and their corresponding frequencies (\(f_i\)) in a billing period. We represent a trace as the set of tuples \(tr = \{(s_1, f_1), (s_2, f_2), \ldots, (s_n, f_n)\}\), where \(f_i\) is the frequency associated with the toll station \(s_i\). Given \(tr\), one or more of the \(f_i \in F\) may equal zero, meaning the driver has not visited the corresponding toll station \(s_i\). Having defined the trace, we define the correct trace and the plausible trace.
Correct trace: The correct trace is the trace actually made by a driver in an ETC system. It corresponds to the toll stations visited by a driver and the frequency of the visited toll stations during the considered billing period.
Plausible trace: A plausible trace of a driver with wallet \(w\) is a trace \(tr = \{(s_1, f_1), (s_2, f_2), \ldots, (s_n, f_n)\}\) in which the toll prices associated with the toll stations inside the tuples and the corresponding frequencies satisfy Equation 2. A plausible trace is not necessarily the correct trace but a candidate for being the correct trace. The adversary uses an attack to obtain the set of plausible traces \(\{tr_1, tr_2, \ldots, tr_k\}\).
Success rate: We define the success rate as the probability that an attacker selects the correct trace of a driver uniformly at random from all plausible traces. Note that the uniform selection of the correct trace from the set can be considered a baseline strategy, as the adversary makes no distinction among the traces (they all have an equal probability of being the correct trace). However, the attack could exploit various strategies in which the adversary distinguishes among the traces in the set (each having a different probability of being the correct trace). In this case, different strategies may yield different success rates, each of which can be compared with the success rate of the baseline strategy. Through this comparison, we can measure the extent to which a strategy performs better or worse than the baseline strategy.
The attack's effectiveness: Two factors mainly impact the attack's effectiveness and, accordingly, affect a driver's privacy:
(1) The number of plausible traces: If there are many plausible traces, privacy is preserved, as the success rate becomes very small.
(2) The computational complexity: If it is computationally infeasible to find plausible traces, privacy is preserved.
The tables of notations and acronyms are shown in Appendix A.

THREAT MODEL
In this work, we consider a passive adversary with access to a subset \(ID\) of driver IDs, subsets \(H\) and \(W\) of home addresses and wallet balances, respectively, associated with those drivers, as well as the set \(P\) of all toll prices and the graph \(G\) of the ETC system. This knowledge of the adversary is denoted by the set \(K = \{ID, H, W, P, G\}\) in the following. Note that this is a small amount of required knowledge, which, in the real world, would be, e.g., already available to payment service providers (e.g., Google Pay [17]) or banks that TSP customers use to make toll payments to the TSP: A payment service provider can certainly identify toll payments to a TSP in its records. These records also contain the corresponding wallet balances. Furthermore, it knows the IDs and home addresses of its customers associated with these payment records, as this information is usually needed to set up an account and issue valid account statements. Finally, the graph, i.e., the map of the toll station infrastructure, as well as the pricing, can be considered public information available from the internet (e.g., the website of the TSP).
Certainly, \(K\) is a subset of the information available to a TSP. In fact, the designers of PPETC schemes such as P4TC [15] also consider a stronger adversary, namely the TSP colluding with all RSUs. Such an adversary would additionally receive all anonymous transaction records (consisting of locations, timestamps, and fares, but no IDs) from the RSUs. By means of our attacks, we show that even our weaker adversary, which does not obtain those transaction records, can have a significant success rate. Exploiting more information may only lead to stronger attacks or higher success rates.
In the following, we consider adversaries aiming to achieve the following two goals:
Toll station disclosure (TSD) goal. This goal is a variant of the presence disclosure goal, defined as "to find out if a given user or a set of users are present at some place(s)" [36]. The TSD goal has two subgoals: firstly, to learn the toll stations visited by a driver and, secondly, to determine the frequency with which the driver visited each toll station within a given billing period. The frequency of visited toll stations could reveal a driver's points of interest (POIs) [16,21,38], e.g., supermarkets, restaurants, tourist spots, and hotels [30,44]. Note that this goal is equivalent to finding the correct trace defined in Section 2 and denoted as \(tr = \{(s_1, f_1), (s_2, f_2), \ldots, (s_n, f_n)\}\).
Cycle disclosure (CD) goal. The intuition behind this goal is to discover regular activities of drivers, represented by cycles [3,4]. A cycle starts from a driver's home, passes through one or more toll stations, and returns to the home. Besides learning cycles, another subgoal is to determine the frequency of each individual cycle made in the billing period. Regarding this goal, a driver's trace is defined as \(tr = \{(c_1, f_1), (c_2, f_2), \ldots, (c_k, f_k)\}\), where the \(c_i\) are the different cycles the driver made in a billing period. Note that CD is a stronger goal than TSD, as it aims to disclose a driver's full trajectories instead of only the visited toll stations.

THE TSD ATTACK
This section presents the TSD attack to achieve the TSD goal. The pseudo-code and details of the attack are discussed in Appendix B. Finally, we theoretically compute the attack's success rate. We present the attack as follows.

Procedure of the attack
The adversary (A) uses an attack based on solving the SSP to obtain the plausible traces of a driver. The attack uses as its input the set \(\{ID, W, P, G\}\). To find the correct trace of a driver with the wallet \(w\), the adversary needs to obtain the driver's set of plausible traces, which contains the correct trace. A plausible trace is denoted as \(tr = \{(s_1, f_1), (s_2, f_2), \ldots, (s_n, f_n)\}\); hence, to create a plausible trace, A needs the set of toll stations, i.e., \(S = \{s_1, s_2, \ldots, s_n\}\), and the set of corresponding frequencies, i.e., \(F = \{f_1, f_2, \ldots, f_n\}\). Note that the set \(S\) is derived from the graph \(G = (V, E)\), where \(S = V\). To obtain the set \(F = \{f_1, f_2, \ldots, f_n\}\), A needs to solve the SSP, for which it creates the following linear diophantine equation and solves it via DOcplex (i.e., IBM Decision Optimization CPLEX Modeling [12]), which uses depth-first search as the default algorithm [22]. Recall that the SSP can be interpreted as a linear diophantine equation.
\[ \sum_{i=1}^{n} p_i x_i = w \tag{3} \]
The equation holds as explained in the wallet's definition in Section 2. Each \(x_i\) in Equation 3 represents the frequency \(f_i\) with which the driver visited the toll station \(s_i\). A solution of Equation 3 results in the set \(F\). Note that \(f_i \neq 0\) in the set \(F\) means that the corresponding toll station \(s_i\) is visited at least once. Then, A verifies whether the visited toll stations in the set \(S\) are connected. For instance, if two toll stations are visited without visiting the intermediate toll station (assuming that this is the only connection between the two toll stations), the solution is discarded. If connectivity holds, A creates the corresponding plausible trace \(tr\) using the sets \(S\) and \(F\). The algorithm for checking connectivity in a graph is fully discussed in Appendix C. It should be highlighted that one solution of Equation 3 leads to one plausible trace; hence, since the equation may have more than one solution, the attack results in a set of plausible traces \(\{tr_1, tr_2, \ldots, tr_k\}\). The correct trace is among the plausible traces. The details of the attack and the corresponding pseudo-code are provided in Appendix B.
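The procedure above can be sketched in a few lines of Python. This is a hedged illustration, not the paper's Appendix B pseudo-code: it replaces DOcplex with plain exhaustive enumeration, and it assumes that "connected" means the visited stations induce a connected subgraph of an undirected road map (the actual check is described in Appendix C, which is not reproduced here). The station names, prices, and road map are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Trace = Tuple[Tuple[str, int], ...]  # ((station, frequency), ...)


def frequency_vectors(prices: List[int], wallet: int) -> List[Tuple[int, ...]]:
    """All non-negative integer x with sum(prices[i] * x[i]) == wallet."""
    if not prices:
        return [()] if wallet == 0 else []
    head, tail = prices[0], prices[1:]
    return [(k,) + rest
            for k in range(wallet // head + 1)
            for rest in frequency_vectors(tail, wallet - k * head)]


def is_connected(visited: Set[str], roads: Dict[str, Set[str]]) -> bool:
    """Assumed check: visited stations induce a connected undirected subgraph."""
    if not visited:
        return True
    start = next(iter(visited))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        # Only move along roads whose other end was also visited.
        for nxt in roads.get(node, set()) & visited:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == visited


def plausible_traces(stations: List[str], prices: List[int],
                     roads: Dict[str, Set[str]], wallet: int) -> List[Trace]:
    """Solve the wallet equation, then keep only connected solutions."""
    traces: List[Trace] = []
    for freqs in frequency_vectors(prices, wallet):
        visited = {s for s, f in zip(stations, freqs) if f > 0}
        if is_connected(visited, roads):
            traces.append(tuple(zip(stations, freqs)))
    return traces


# Hypothetical toy network A - B - C with prices in cents (not Brisbane data).
STATIONS = ["A", "B", "C"]
PRICES = [250, 400, 550]
ROADS = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}  # symmetric adjacency
traces = plausible_traces(STATIONS, PRICES, ROADS, 800)
baseline_success_rate = 1 / len(traces) if traces else 0.0
```

In this toy example the wallet of 800 has two arithmetic solutions, but the one visiting A and C without B is discarded by the connectivity check, leaving a single plausible trace and a baseline success rate of 1.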

Heuristic-based approaches
We explained in Section 2 (see success rate) that the adversary uses a baseline strategy to guess the correct trace uniformly at random from the plausible traces. Here, we offer several heuristics that the TSD attack can utilize to guess the correct trace non-uniformly. The presented heuristics are based on drivers' behavior.
The first heuristic. This heuristic is based on drivers' behavior, i.e., they tend to visit a restricted number of toll stations within a given billing period. For example, the toll roads in Brisbane are mainly used for activities such as taking a holiday/getaway, going to the airport, and social activities [41]. This behavior implies that drivers exhibit a constrained toll station usage pattern tied to their regular activities. Concerning this heuristic, A considers the maximum number (\(\mathit{thresh}\)) of toll stations a driver visits in a billing period. The threshold can be determined through statistical data [41]. Considering a set of plausible traces associated with a driver, the heuristic assigns a probability of zero to plausible traces in which the number of toll stations exceeds the threshold and assigns equal probabilities to the rest of the plausible traces. Finally, A ignores the traces with a probability of zero (this is why we consider this strategy non-uniform) and selects a trace uniformly from the remaining ones.
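A minimal Python sketch of this filtering step follows. The trace representation, the function names, and the example traces are assumptions made for illustration; the paper's own pseudo-code is in Appendix F.

```python
from typing import List, Tuple

Trace = Tuple[Tuple[str, int], ...]  # ((station, frequency), ...)


def visited_count(trace: Trace) -> int:
    """Number of distinct toll stations with a non-zero frequency."""
    return sum(1 for _, freq in trace if freq > 0)


def first_heuristic(traces: List[Trace], thresh: int) -> List[Trace]:
    """Drop traces visiting more than `thresh` stations (probability zero)."""
    return [t for t in traces if visited_count(t) <= thresh]


def uniform_success_rate(traces: List[Trace]) -> float:
    """Chance of a uniform guess being right, assuming the correct trace is in the list."""
    return 1 / len(traces) if traces else 0.0


# Hypothetical plausible traces for one wallet balance.
T1: Trace = (("A", 2), ("B", 0), ("C", 0))
T2: Trace = (("A", 1), ("B", 1), ("C", 0))
T3: Trace = (("A", 1), ("B", 1), ("C", 1))
candidates = [T1, T2, T3]

kept = first_heuristic(candidates, thresh=2)
rate_before = uniform_success_rate(candidates)
rate_after = uniform_success_rate(kept)
```

Here the threshold removes the trace that visits all three stations, so the uniform guess improves from one in three to one in two. The improvement only holds if the driver's correct trace respects the threshold; otherwise the heuristic discards it.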
The second heuristic. This heuristic, similar to the first one, is based on drivers' behavior, i.e., visiting a limited number of toll stations, with the difference that the heuristic assigns a certain probability to each plausible trace in a set of plausible traces. The probability can be computed using statistical data that describe the distribution of the number of toll stations visited by drivers in an ETC system (in a billing period). For instance, A may assign a relatively low probability to plausible traces that involve a large number of toll stations. For example, in the case study of Brisbane, it would be less likely for a driver to visit all nine toll points in the city. On the other hand, A can assign a relatively high probability to plausible traces that involve a relatively small number of toll points. For example, employees who regularly commute from Yatala to Ipswich would likely pass through only a limited number of toll points on their route (see the map in Figure 1). Finally, A selects a trace with the highest assigned probability. The details and pseudocode are discussed in Appendix F.
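One possible reading of this heuristic is sketched below in Python. The distribution `COUNT_PROB` over the number of visited stations is entirely made up, and normalizing the weights over the candidate set is an assumption of this sketch rather than the procedure given in Appendix F.

```python
from typing import Dict, List, Optional, Tuple

Trace = Tuple[Tuple[str, int], ...]  # ((station, frequency), ...)


def visited_count(trace: Trace) -> int:
    """Number of distinct toll stations with a non-zero frequency."""
    return sum(1 for _, freq in trace if freq > 0)


def second_heuristic(traces: List[Trace],
                     count_prob: Dict[int, float]) -> Tuple[Optional[Trace], List[float]]:
    """Weight each trace by the likelihood of its station count; return the best one."""
    weights = [count_prob.get(visited_count(t), 0.0) for t in traces]
    total = sum(weights)
    if total == 0:
        return None, []
    probs = [w / total for w in weights]  # normalize over the candidates
    best_index = max(range(len(traces)), key=probs.__getitem__)
    return traces[best_index], probs


# Hypothetical distribution: P(driver visits k distinct stations per billing period).
COUNT_PROB = {1: 0.6, 2: 0.3, 3: 0.1}

T1: Trace = (("A", 2), ("B", 0), ("C", 0))
T2: Trace = (("A", 1), ("B", 1), ("C", 0))
T3: Trace = (("A", 1), ("B", 1), ("C", 1))
best, probabilities = second_heuristic([T1, T2, T3], COUNT_PROB)
```

With the illustrative distribution, the trace visiting a single station receives the largest weight and is selected.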
The third heuristic. This heuristic exploits the regularity in drivers' behavior by utilizing the yearly historical information of wallet balances. The heuristic considers a driver and all their corresponding monthly wallet balances over a year, denoted as \(w_j\), \(1 \le j \le 12\), where \(j\) represents each month of the year. Using the sets of plausible traces corresponding to the \(w_j\), the heuristic creates a set of clusters, each of which includes the potential traces a driver could have made within a year. Each cluster is assigned a probability based on the similarity of the traces within it, indicating the likelihood that those traces are associated with the driver. To measure the similarity among the traces, A can use a similarity metric such as the Euclidean distance. Clusters with higher probabilities contain traces that demonstrate more similarity and are thus more likely to be associated with a driver displaying regularity in their behavior. Finally, A can select a cluster with the highest probability (see Appendix F for the details and pseudo-code).
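The idea can be illustrated with the following Python sketch, which makes several simplifying assumptions not stated in the text: a cluster is taken to be one plausible frequency vector per month, similarity is scored by the sum of pairwise Euclidean distances, and the most regular cluster is found by brute force (feasible only for small candidate sets). The monthly candidates are invented.

```python
from itertools import combinations, product
from math import dist
from typing import List, Optional, Sequence, Tuple

FreqVector = Tuple[int, ...]  # frequencies per toll station, fixed station order


def cluster_spread(cluster: Sequence[FreqVector]) -> float:
    """Sum of pairwise Euclidean distances; smaller means more regular behavior."""
    return sum(dist(a, b) for a, b in combinations(cluster, 2))


def most_regular_cluster(
    monthly_candidates: List[List[FreqVector]],
) -> Optional[Tuple[FreqVector, ...]]:
    """Pick one plausible trace per month so that the chosen traces are most similar."""
    best: Optional[Tuple[FreqVector, ...]] = None
    best_score = float("inf")
    for cluster in product(*monthly_candidates):
        score = cluster_spread(cluster)
        if score < best_score:
            best, best_score = cluster, score
    return best


# Hypothetical plausible frequency vectors for three consecutive months.
MONTHS = [
    [(0, 2, 0), (1, 0, 1)],
    [(0, 2, 0)],
    [(1, 0, 1), (0, 2, 0)],
]
chosen = most_regular_cluster(MONTHS)
```

In this toy input, only the vector (0, 2, 0) is plausible in every month, so the cluster repeating it has zero spread and is selected. A real instantiation would need a more scalable search than the full Cartesian product.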

EVALUATION OF THE TSD ATTACK
In Section 5.1, we introduce the parameter settings used in the evaluation of the attack. In Sections 5.2 and 5.3, we evaluate the TSD attack and the first heuristic, respectively.

Parameter settings
In this section, we discuss the parameter settings and evaluate the attack based on the success rate metric (see Equation 4). To evaluate the attack, we use real parameters and statistics provided by Transurban Queensland, including toll prices, the number of toll stations, the billing period length, and statistical information on the distribution of wallet balances [5,40,41]. Transurban is the operator of all toll roads in Brisbane. The toll stations and toll roads are shown on the map in Figure 1, where the yellow circles represent the toll stations in Brisbane. In the following, we discuss the attack input, i.e., the set \(\{ID, W, P, G\}\), and the other parameters needed to obtain the plausible traces. The parameter settings are shown in Table 1. Our evaluation is performed on Windows Server 2019 Standard (64-bit) with 96.0 GB RAM and an x64-based 3.70 GHz Intel(R) Xeon(R) E-2288G CPU.
• Drivers' identities (\(ID\)): Drivers' identities are not publicly available; however, we can compute the success rate regardless of drivers' identities, as a driver's identity plays no role in the computation of the correct trace or of the success rate. The identity is only used for assigning the correct [...] Using the ranges, we can generate drivers' plausible wallet balances.
Plausible wallet balance: A driver's plausible wallet balance is a wallet in the range \([w_{min}, w_{max}]\) that could potentially be associated with the driver. To generate a driver's plausible wallet balances, we create the following inequality, whose solutions yield all plausible wallet balances:
\[ w_{min} \le \sum_{i=1}^{n} p_i x_i \le w_{max} \tag{5} \]
Note that the idea of the inequality comes from Equation 3.
To create the inequality, we use the wallet ranges shown in Figure 2. For example, given the range [$10, $20], the corresponding inequality is as follows:
\[ 10 \le \sum_{i=1}^{9} p_i x_i \le 20 \]
[...] range. The numbers are shown in Table 1. We further use the generated plausible wallets to compute the average success rate.
• Upper bound: We set the upper bound to \(\lceil w / \min(P) \rceil\), meaning that \(x_i\), \(1 \le i \le n\) (see Equation 3), cannot exceed this upper bound.
• Billing period: We consider a billing period of one month, as the wallets reported in [40] are based on one month.
• Number of variables (\(|S|\)): The number of variables in Equation 3 and Inequality 5 equals 9, i.e., the number of toll stations.
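The generation of plausible wallet balances within a range can be sketched as follows. This Python illustration uses a simple reachability table over integer cents instead of solving Inequality 5 with a solver, and it assumes that the per-variable upper bound is the wallet divided by the smallest toll price, rounded up. The prices and the range are hypothetical, not Brisbane's rates.

```python
from typing import List, Sequence


def plausible_wallets(prices: Sequence[int], low: int, high: int) -> List[int]:
    """All amounts in [low, high] expressible as a non-negative integer
    combination of the given prices (all values in cents, 0 <= low <= high)."""
    reachable = [False] * (high + 1)
    reachable[0] = True  # the empty trace sums to zero
    for amount in range(1, high + 1):
        reachable[amount] = any(
            p <= amount and reachable[amount - p] for p in prices
        )
    return [a for a in range(low, high + 1) if reachable[a]]


def upper_bound(prices: Sequence[int], wallet: int) -> int:
    """Assumed bound on each frequency: ceil(wallet / smallest price)."""
    return -(-wallet // min(prices))  # integer ceiling division


# Hypothetical toll prices in cents and a hypothetical wallet range.
PRICES = [250, 400, 550]
wallets = plausible_wallets(PRICES, 1000, 1200)
bound = upper_bound(PRICES, 1200)
```

With these values, every multiple of 50 cents between 1000 and 1200 turns out to be reachable, and no station can be visited more than five times for a wallet of 1200.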

Our evaluation
Having discussed the parameter settings, we proceed to evaluate the TSD attack using the notion of the average success rate (ASR).

Computation of average success rate.
Having discussed the parameter settings, we aim to compute the attack's success rate (see Equation 4); however, as said earlier, the adversary does not have access to drivers' real wallets (\(W\)) to run the attack. To tackle the problem, we employ the computed plausible wallet balances discussed earlier and use the notion of average success rate (ASR).
The ASR metric captures the attack's average success in finding a driver's correct trace over all of the driver's potentially plausible wallets. To compute the ASR, we take the following steps.
• Step 1: Considering drivers associated with the wallet range \([w_{min}, w_{max}]\), we compute a driver's plausible wallet balances using Inequality 5. Then, for each plausible wallet balance, we run the TSD attack and compute its corresponding success rate.
• Step 2: We compute the average of all success rates corresponding to the plausible wallet balances from Step 1. The resulting value is the attack's ASR for finding the correct trace of a driver associated with the range \([w_{min}, w_{max}]\). Note that for all drivers within the range \([w_{min}, w_{max}]\), the attack's ASR has the same value, since the range results in the same plausible wallet balances and, accordingly, the same success rates.
Note that the two steps are repeated for each of the three ranges shown in Figure 2. The figure shows that approximately 86% of drivers (the sum of all proportions) pay less than $40 a month; hence, we are motivated to evaluate the attack's success rate for this large proportion of drivers. We do not consider the remaining drivers (14% of all drivers), whose wallet balances are above $40, since the corresponding ASR is negligible.
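The two steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it counts plausible traces with a standard coin-change dynamic program, it omits the connectivity check of the TSD attack, and it uses hypothetical prices and a hypothetical wallet range in cents.

```python
from typing import List, Sequence


def count_solutions(prices: Sequence[int], high: int) -> List[int]:
    """ways[a] = number of non-negative integer solutions of sum(p_i * x_i) == a."""
    ways = [0] * (high + 1)
    ways[0] = 1
    for p in prices:  # classic coin-change counting, one price at a time
        for amount in range(p, high + 1):
            ways[amount] += ways[amount - p]
    return ways


def average_success_rate(prices: Sequence[int], low: int, high: int) -> float:
    """Step 1: success rate 1/#traces per plausible wallet; Step 2: their mean."""
    ways = count_solutions(prices, high)
    rates = [1 / ways[w] for w in range(low, high + 1) if ways[w] > 0]
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical toll prices and wallet range in cents (not Brisbane data).
PRICES = [250, 400, 550]
asr = average_success_rate(PRICES, 500, 800)
```

In this toy setting, five wallet balances in the range are plausible; four of them have a unique solution and one has two solutions, so the average success rate is 0.9. Adding the connectivity check could only reduce the number of traces per wallet and therefore raise the per-wallet success rate.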

5.2.2 The evaluation results. Based on the evaluation, we discuss the distribution of success rates and the corresponding ASRs, which are shown in Table 1.

Distribution of success rates.
To analyze the effectiveness of the TSD attack, we illustrate the distribution of success rates obtained in Step 1. Each box plot in Figure 3 [...]
• The ASR for 61.38% of drivers, a large proportion, is roughly 94%, which is considerably high. As said, this proportion of drivers is associated with the wallet range of [$0, $10].
• The ASR for 14.31% of drivers is approximately 54%, which is still considerably high. This proportion of drivers is associated with the wallet range of [$10, $20].
• The ASR for the small proportion (11.12%) of drivers is nearly 5%, which is relatively low.
• The attack achieves these high ASRs, i.e., 94% and 54%, without utilizing drivers' home addresses and transactions. Furthermore, these ASRs are achieved using the baseline strategy without incorporating any heuristics.

5.2.3 Discussion on the attack's effectiveness. As said in Section 2, the attack's effectiveness depends on the number of plausible traces and the complexity of the attack, which hinges on the difficulty of solving Equation 3 (which is generally NP-complete). Our evaluation demonstrates that the TSD attack performs effectively, as solving the equation is feasible. Figure 4 shows that the distribution of wallet balances significantly impacts the number of plausible traces and the runtime. Figure 4a shows the number of traces for plausible wallet balances below $40, associated with 86% of all drivers. The graph shows that for wallets below $10 (93 red points), the number of plausible traces is one or two, meaning that a large proportion of drivers (61.38%) have very low privacy. The number of traces for wallet balances between $10 and $20 (721 blue points) is between one and sixteen. The privacy of drivers with wallets in this range is also at risk of violation. For balances between $20 and $40 (2000 green points), the number of traces increases considerably, ranging between one and 320. Drivers with balances close to $20 have relatively lower privacy than drivers with balances close to $40. Overall, the graph shows that the wallet balance noticeably impacts the number of traces and, accordingly, a driver's privacy.
Figure 4b shows the runtime of the TSD attack, which depends on the difficulty of solving Equation 3. The runtime for wallets below $10 is between 20 and 60, and for wallets between $10 and $20 between 20 and 80 (in the time unit of Figure 4b). For wallets between $20 and $40, the runtime increases considerably, from 30 to 260. Although the graph shows an increasing trend, solving Equation 3 remains feasible in a short time. This shows that, although solving a linear Diophantine equation is NP-complete in general, the equation can be solved quickly given the real settings of Brisbane's ETC system.
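To make the counting argument concrete, the following minimal sketch enumerates all non-negative integer solutions of a linear Diophantine equation of the kind discussed above, where toll prices multiplied by visit frequencies sum to the wallet balance. It is an illustration only: it uses brute-force enumeration rather than the DOcplex solver employed in our evaluation, it omits the connectivity check of the attack, the function name plausible_frequency_vectors is hypothetical, and the toll prices (in cents) are made-up values rather than Brisbane's tariffs.

```python
def plausible_frequency_vectors(prices_cents, balance_cents):
    """Return all non-negative integer vectors f with sum(p_i * f_i) == balance.

    Prices must be positive integers (cents avoid floating-point issues).
    """
    solutions = []

    def extend(index, remaining, partial):
        # All stations assigned: keep the vector only if the balance is met exactly.
        if index == len(prices_cents):
            if remaining == 0:
                solutions.append(tuple(partial))
            return
        price = prices_cents[index]
        # Try every frequency that does not overshoot the remaining balance.
        for freq in range(remaining // price + 1):
            partial.append(freq)
            extend(index + 1, remaining - freq * price, partial)
            partial.pop()

    extend(0, balance_cents, [])
    return solutions


# Hypothetical toll prices (in cents) for three toll stations.
prices = [250, 400, 550]
low = plausible_frequency_vectors(prices, 800)    # small wallet balance
high = plausible_frequency_vectors(prices, 4000)  # larger wallet balance
print(len(low), len(high))
```

With these made-up prices, the small balance admits only a couple of frequency vectors, whereas the larger balance admits many more, which mirrors the trend visible in Figure 4a.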

Evaluation of the first heuristic
Due to the page limitation, we evaluate only the first heuristic (see Section 4.2) with the parameter settings of Brisbane's ETC system (see Table 6 in Appendix F). We highlight that presenting a more sophisticated strategy for guessing the correct trace would not necessarily strengthen our argument, as our main goal is to provide a counterexample, which we have already achieved. See Appendix F for the details of the heuristic.
For the evaluation, we analyze how the attack's ASR and the average percentage decrease change with the different thresholds that the adversary can consider in the Brisbane case study. The percentage decrease expresses the reduction in the number of plausible traces (due to ignoring traces with zero probability) as a percentage of the original size of the set of plausible traces. The average percentage decrease is defined analogously to the ASR (see Section 5.2.1): it is the average of the percentage decreases over all of a driver's plausible wallet balances within a wallet range. Table 2 shows that both metrics trend upward as the threshold decreases. This trend is particularly noticeable for the larger wallet ranges ([$20, $40], [$40, $60]), where the heuristic improves the ASR over the lower ASRs obtained without it (see the "without heuristic" column). The upward trend is less noticeable for the smaller wallet ranges ([$0, $10], [$10, $20]); however, the corresponding ASRs are already high without the heuristic, i.e., 94% and 51.16% (see the "without heuristic" column).
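The percentage decrease and its average can be sketched as follows. This is a minimal illustration, not the evaluation code: the function names are hypothetical, and the (before, after) sizes of the sets of plausible traces are made-up numbers.

```python
def percentage_decrease(original_size, reduced_size):
    """Reduction in the number of plausible traces, in percent of the original size."""
    if original_size <= 0:
        raise ValueError("the original set of plausible traces must be non-empty")
    return 100.0 * (original_size - reduced_size) / original_size


def average_percentage_decrease(size_pairs):
    """Average the percentage decrease over the plausible wallet balances of a range."""
    decreases = [percentage_decrease(before, after) for before, after in size_pairs]
    return sum(decreases) / len(decreases)


# Hypothetical (before, after) set sizes for three plausible wallet balances.
example = [(16, 4), (2, 2), (8, 6)]
print(average_percentage_decrease(example))
```

A balance whose set of plausible traces is not reduced by the heuristic contributes a decrease of zero, which pulls the average down for small wallet ranges where few traces exist in the first place.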

THE CD ATTACK
As defined in Section 3, the CD goal is to find the cycles a driver made in a billing period and their corresponding frequencies. A natural approach is to solve an SSP to find the cycles whose prices sum to the driver's wallet balance, where the price of a cycle is the sum of the toll prices of the toll stations included in the cycle. However, this approach fails (see Appendix I for details): the adversary would need to solve a linear equation with numerous variables, which is computationally infeasible for a sufficiently large number of toll stations. We therefore introduce the CD attack, which exempts the adversary from solving the SSP. We theoretically elaborate on the algorithm's success rate and then evaluate it with the real parameter settings of Brisbane's ETC system.

Procedure of the attack
In this section, we present the CD attack. See Appendix J for pseudo-code and details and Appendix K for an example.
Basic idea. To achieve the CD goal, the adversary exploits the information used for the TSD goal together with additional information, i.e., drivers' home addresses. The attack's core idea is that A exploits the visited toll stations and their corresponding frequencies, which are already obtained from the TSD attack as a set of (toll station, frequency) pairs. The visited toll stations are the building blocks of the cycles made by a driver; toll stations that have not been visited do not contribute to the formation of cycles. In the following, we first explain the algorithm for finding cycles, namely "find_cycle_algo", which is utilized by the CD attack. We then outline the CD attack in three steps.
Algorithm find_cycle_algo. We present an algorithm for finding cycles. The algorithm takes a set of distinct toll stations, a home location, a graph, and a strategy as inputs and returns the cycle passing through all the toll stations and the home location (if any). The algorithm can use various strategies to obtain the cycles, such as the shortest distance and the shortest time [8]. For example, a driver may follow the shortest-distance strategy, taking the shortest possible route to their destination. The algorithm and its pseudo-code are shown in Appendix H.
Table 2: Each row corresponds to a wallet range and to thresholds between 3 and 8. As the threshold decreases, the metrics show an upward trend, especially for wallet ranges that include higher wallet balances.
Step 1: Create partitions. A uniformly selects a plausible trace obtained via the TSD attack as a candidate for the correct trace. Given this candidate, i.e., a set of (toll station, frequency) pairs, A first creates a multiset in which each visited toll station is repeated as many times as its frequency. It then computes all possible partitions by dividing the visited toll stations in the multiset into segments, each containing distinct visited toll stations. The rationale is that the toll stations in each segment, along with the driver's home location, could form a cycle that the driver has made. A segment must satisfy two conditions: (1) no toll station is repeated within a segment, and (2) the order of toll stations in a segment does not matter. All the segments in a partition could thus correspond to the cycles a driver has made. The driver's correct trace (i.e., cycles and their corresponding frequencies) is associated with exactly one of the partitions. To compute the partitions, A uses a partition function, explained in the following example.
Example: Given a driver's correct trace in which toll stations 1, 2, and 3 are visited once, twice, and once, respectively, the partition function computes the multiset {1, 2, 2, 3}. The function then divides the visited toll stations in the multiset into six different partitions that respect the two conditions above: (1) {1}, {2}, {2}, {3}; (2) {1, 2}, {2}, {3}; (3) {1, 3}, {2}, {2}; (4) {2, 3}, {1}, {2}; (5) {1, 2}, {2, 3}; and (6) {1, 2, 3}, {2}. The toll stations in each segment, along with the driver's home location (h), lead to a cycle (if any) that the driver could have made. The cycle is found by the algorithm find_cycle_algo.
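The partition function of this example can be sketched as follows. The sketch is a simple exhaustive enumeration under the two stated conditions, not the pseudo-code of Appendix J; the function name partitions_into_segments and the station labels "s1" to "s3" are hypothetical.

```python
def partitions_into_segments(trace):
    """Enumerate all partitions of the visit multiset into segments.

    trace: list of (toll_station, frequency) pairs.
    A segment never repeats a toll station, and order inside a segment is
    irrelevant, so every partition is stored in a canonical sorted form.
    """
    multiset = sorted(station for station, freq in trace for _ in range(freq))
    results = set()

    def place(index, segments):
        if index == len(multiset):
            canonical = tuple(sorted(tuple(sorted(seg)) for seg in segments))
            results.add(canonical)
            return
        station = multiset[index]
        # Option 1: add the station to an existing segment that lacks it.
        for segment in segments:
            if station not in segment:
                segment.add(station)
                place(index + 1, segments)
                segment.remove(station)
        # Option 2: open a new segment for the station.
        segments.append({station})
        place(index + 1, segments)
        segments.pop()

    place(0, [])
    return sorted(results)


# The trace of the example: s1 once, s2 twice, s3 once.
parts = partitions_into_segments([("s1", 1), ("s2", 2), ("s3", 1)])
for number, partition in enumerate(parts, start=1):
    print(number, partition)
```

For the trace of the example, the enumeration yields six partitions, in line with the count stated in the text. The number of partitions grows quickly with the number and frequency of visited toll stations, which is why an upper bound based on it is conservative.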
Step 2: Transform each partition to a plausible trace. In this step, A transforms every segment of each partition into its corresponding cycle (if any), using the aforementioned algorithm find_cycle_algo. Hence, each partition is transformed into a plausible trace consisting of cycles and their corresponding frequencies, i.e., a set of (cycle, frequency) pairs. Each plausible trace is then stored in a set of plausible traces. Note that the toll stations in some segment of a partition may not lead to a cycle due to their connectivity, in which case the partition does not lead to a plausible trace. We emphasize that each plausible trace in this set is linked to one of the partitions obtained in Step 1.
Step 3: Selection of the correct trace. Finally, A uniformly guesses a plausible trace as the correct trace from the set of plausible traces obtained in Step 2. The success rate of this guess is computed as follows.
6.1.1 Computation of the success rate
To obtain the success rate, we need to compute the probability of selecting the correct trace in Step 3. This requires A to make two consecutive correct guesses, as outlined below:
(1) In Step 1, A has to guess the correct trace among the n plausible traces obtained by the TSD attack. Let j' denote the index of an arbitrary but fixed plausible trace chosen as the candidate. Since the guess is uniform, the probability that it is correct is Pr[1st] = 1/n.
(2) Given this candidate, A uses the CD attack to obtain the corresponding set of m plausible traces, consisting of cycles and their corresponding frequencies (see Step 2). Then, A uniformly selects one trace from this set as the correct trace, under the condition that its first guess is correct. The probability of this selection being correct is Pr[2nd | 1st] = 1/m.
We now use Bayes' theorem [23] to compute the probability of the second guess being correct, i.e., Pr[2nd] (Formula 6):
Pr[2nd] = Pr[2nd | 1st] · Pr[1st] / Pr[1st | 2nd]
The terms Pr[1st] and Pr[2nd | 1st] equal 1/n and 1/m, respectively. The term Pr[1st | 2nd] in Formula 6 represents the probability of the first guess being correct given that the second guess is correct, which equals one, because the second guess can only be correct if the first guess is correct. Substituting these values into Formula 6 yields the success rate, i.e., the probability of the second guess being correct, 1/(n · m).
The index j' indicates that this success rate is computed with respect to the j'-th plausible trace as the candidate for the correct trace in Step 1. Since each of the n plausible traces has a chance of being the correct trace, we compute the success rate with respect to each of them using Formula 7. Note that the value of m depends on the chosen candidate. The final success rate is obtained by averaging over all these per-candidate success rates (shown in Formula 7), as all candidates have the same probability of being the correct trace.
Formula 8 shows that the success rate of the CD attack correlates with the success rate of the TSD attack, i.e., Pr[1st] = 1/n.
Discussion on the attack's effectiveness. As noted in Section 2, the attack's effectiveness depends on two main factors. (1) The number of plausible traces: if n and m in Formula 8 (the sizes of the sets of plausible traces obtained by the TSD attack and the CD attack, respectively) are large, the success rate becomes negligible, preserving a driver's privacy. (2) Computational complexity: the complexity of the CD attack mainly depends on the complexity of the TSD attack (discussed in Section 4.1) and on the complexity of the functions used for creating the partitions and cycles. The details are discussed in Appendix J.
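The two-stage guess can be illustrated numerically. The sketch below follows one reading of the derivation above: for each of the n TSD candidates, the per-candidate success rate is 1/(n · m_j), where m_j is the number of plausible cycle traces derived from candidate j, and the final rate is the average over all candidates. The function name cd_success_rate and the example counts are hypothetical, and every m_j is assumed to be at least one.

```python
from fractions import Fraction


def cd_success_rate(cycle_trace_counts):
    """Average two-stage success rate over all TSD candidates.

    cycle_trace_counts[j] is m_j, the number of plausible cycle traces
    obtained when candidate j is assumed to be the correct TSD trace.
    """
    n = len(cycle_trace_counts)
    # Pr[1st] = 1/n and Pr[2nd | 1st] = 1/m_j, so each candidate yields 1/(n * m_j).
    per_candidate = [Fraction(1, n * m_j) for m_j in cycle_trace_counts]
    return sum(per_candidate) / n


# Hypothetical case: two TSD candidates with 1 and 3 plausible cycle traces.
print(cd_success_rate([1, 3]))
```

Exact fractions are used so that the dependence on n and each m_j stays visible; replacing m_j by a larger upper bound can only lower the computed rate.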

EVALUATION OF THE CD ATTACK
Here, we discuss the parameter settings used in the CD attack and evaluate the attack using the success rate discussed in Section 6.1.1.
Parameter settings. We employ the same real parameter settings of Brisbane's ETC system discussed in Section 5.1. To achieve the CD goal, the CD attack requires the information set used by the TSD attack and, in addition, the set of drivers' home addresses. Concerning the latter, we do not have access to the home locations of drivers registered with the Brisbane ETC system, as this information is not publicly available. Consequently, the algorithm for finding cycles cannot be run, since it relies on the home locations. In addition, this algorithm requires the strategy for finding a cycle, as drivers use different strategies for selecting a path, and we do not have access to this information either. Nonetheless, we can evaluate the CD attack as follows.
Our evaluation. We evaluate the CD attack using the notion of ASR explained in Section 5.2.1, with the difference that here the ASR is computed based on the success rate in Formula 8. According to the formula, the success rate depends on n and m, the sizes of the sets of plausible traces obtained by the TSD attack and the CD attack, respectively. To obtain the latter set, the home location and the strategy are needed to find the cycles (see Step 2 of Section 6.1). As noted earlier, we do not have access to this information; nevertheless, we compute the ASR using an upper bound on m, via the inequality m ≤ number of all partitions. The inequality holds because, as explained in Step 2 of Section 6.1, each partition may or may not lead to a plausible trace. Hence, we use the total number of partitions as an upper bound on m and compute the ASR accordingly for each individual wallet range. Since the success rate decreases as m grows, the resulting ASR is a conservative estimate.
The evaluation results. We discuss the distribution of success rates and the ASRs shown in Table 1.
Distribution of success rates. To investigate how successfully the CD attack performs, Figure 5 shows the distribution of success rates: each box plot illustrates the distribution of success rates corresponding to all plausible wallet balances within one wallet range.
• Regarding the CD attack, the ASR for approximately 61.38% of drivers is around 51%, which is relatively high. However, it is considerably lower than the ASR of 94% achieved by the TSD attack.
• The ASR for approximately 14.31% of drivers is relatively low, i.e., 11%, which is lower than the ASR of 54% achieved by the TSD attack.
• The ASR for the small proportion of drivers (11.12%) is approximately 0.28%, which is negligible compared to the ASR of 5% in the TSD attack.
• The comparison above demonstrates that the CD attack has a lower ASR than the TSD attack. However, the CD attack provides the adversary with more information, such as the driver's cycles, whereas the TSD attack only discloses the visited toll stations.

IMPACT OF PARAMETER SETTINGS ON PRIVACY
We discuss the parameters impacting the TSD attack's success rate and, accordingly, a driver's privacy. Then, we evaluate the success rate for various parameter settings. To this end, we consider the Brisbane case study with different parameter settings and complexities. This approach offers useful insights into the privacy level of a driver under different settings. For instance, we explore toll price ranges other than Brisbane's, creating diverse complexities. Appendix G provides the details of our experiments.
Parameters impacting privacy. The attack's success rate depends on the number of plausible traces, which, in turn, is influenced by the parameters in Equation 3. These parameters are the toll prices, the wallet balance, the length of the billing period, and the number of toll stations.
Analysis of different parameter settings on privacy. We investigate the impact of different settings of each parameter on the attack's success rate and, consequently, a driver's privacy. To accomplish this, we consider the Brisbane case study and the corresponding parameter settings outlined in Section 5.1. We create scenarios with varying levels of complexity for each parameter, allowing us to assess the attack's success rate in different settings. To this end, while keeping the other parameters fixed, we vary the settings of a specific parameter, such as the toll prices, to examine its impact on privacy. Appendix G and Table 7 in the appendix provide further details on our experiments and the parameter settings.
• Toll prices: As the toll price range widens, the ASR increases from zero to almost 96% in Figure 6a, from zero to almost 73% in Figure 6b, and from zero to about 15% in Figure 6c. We conclude that a driver's privacy is more at risk given a larger toll price range. This manifests itself more evidently for wallets with low balances (see Figure 6a). The figures show that when the assigned toll prices are equal (selected from the range [1, 1]), the ASR approaches zero, thereby preserving a driver's privacy (see Appendix G for the details of our experiment).
• Wallet balance: The wallet balance strongly impacts the number of plausible traces and, accordingly, a driver's privacy. Overall, small wallet balances lead to higher ASRs. This poses a potential risk to a driver's privacy.
• It should be noted that the overall impact of the parameters on privacy also holds for the CD attack, since the success rates of the two attacks are correlated (see Formula 8).

DISCUSSION
We discuss the limitations and future work, and then we give several recommendations concerning PPETC schemes.

Limitations and Future work
• Our TSD attack utilizes a subset of the information available to the TSP in order to achieve the TSD goal. This attack can serve as a baseline for other attacks that aim to achieve the same goal but utilize more information. For instance, such attacks could incorporate drivers' home addresses and transactions, which are not utilized in our attack. The CD attack additionally uses drivers' home addresses to achieve the CD goal. Similarly, this attack can be used as a baseline for attacks that exploit additional information, such as transactions. In future work, we can analyze how leveraging such additional information would impact a driver's privacy compared to our attacks.
• According to [36], if the adversary has already found some of the locations a user visited, completing the user's remaining locations becomes easier. This is achievable if the adversary has access to the user's mobility profile, which represents how probable it is for a specific user to move from one location to another in a specific period [36]. The adversary could, e.g., use drivers' identities to find drivers' mobility profiles. Knowing such information would help the adversary complete the locations the driver passed. In future work, given that the adversary has obtained a driver's mobility profile, we can investigate to what extent the adversary can complete the driver's trajectories.
• As future work, we raise the following research question: "To what extent is an adversary successful in converting the cycles obtained by the CD attack to a chain of trips?" A trip made by a driver starts from a source and ends at a destination, each at a specific time, and the driver stays at the destination for a while. Each trip takes place for a main purpose, such as work, shopping, or education. We are interested in finding the trips that constitute a cycle. The adversary could, e.g., benefit from contextual information to guess the trips made by a driver [11]. For example, given that a cycle passes through a toll station close to a shopping center, the adversary can conclude that the driver made a trip to the shopping center with a certain probability. By inferring the trips, the adversary learns more information about a driver's behavior.
• For evaluating the TSD attack, since we did not have access to drivers' wallet balances, we generated all plausible wallet balances based on the wallet ranges provided by statistics. Then, we computed the attack's ASR (see Section 5.2.1). Although our evaluation demonstrates a high ASR, access to drivers' actual wallet balances would lead to a more accurate success rate in finding a driver's correct trace.
• This study focuses on static toll pricing and excludes PPETC schemes involving dynamic pricing. Dynamic pricing entails toll prices that fluctuate according to time or other parameters. Incorporating dynamic pricing would reduce the efficacy of our attacks, as it introduces additional variables into Equation 3 when solving the SSP, in contrast to a scenario where prices remain constant. In fact, each time-based toll price requires the addition of a corresponding variable to the equation, thereby reducing the success rate.
• In this work, our attacks do not employ transactions, including times (stored at the TSP), to achieve the goals. The times are the moments when vehicles pass through toll stations. We do not use times in our attacks for the following reasons. Firstly, based on our threat model, the given counterexample is stronger if it considers a weaker attacker. Secondly, although there are tracking algorithms [13,43] for tracking vehicles based on their times and locations, these algorithms operate under certain conditions, some of which do not hold in our case study. The most important condition is that vehicles report their times and locations periodically and at short time intervals [20,43] (e.g., below one second [43]). In our case study, the times corresponding to each vehicle are separated by long intervals, since the distances between two consecutive toll stations are quite long. Hence, in this study, we take advantage of other information for the purpose of tracking, including wallets, toll prices, the city's ETC graph, and home addresses. Finally, to evaluate attacks exploiting timestamps, we would also need realistic traces, including timestamps, for the Brisbane setting. However, we do not have access to real traces, and for generating realistic synthetic traces, we lack the necessary statistics to feed into a simulator, which we would also have to develop. This is why we refrained from exploiting timestamps. As future work, we are interested in how to exploit timestamps as additional information to achieve stronger attacks and higher success rates.
• In Section 1, we categorized PPETC schemes into two categories: (1) privacy-preserving road-infrastructure based ETC schemes and (2) privacy-preserving autonomous-device based ETC schemes. The focus of this study is on the first category. In future work, we can investigate the privacy of schemes in the second category, such as [2,31,34], where the OBU device is used to compute the monthly toll fees with the help of times and locations received via GPS. We are interested in determining whether our attacks can be applied to the schemes in the second category. Since our attacks are fairly generic and only require minimal information as input, we expect that they are also applicable to disclosing a driver's road segments in an autonomous-device based ETC.

Recommendation
To deploy a PPETC system, toll engineers should consider the following recommendations based on our analysis in Section 8:
(1) Various parameters impact a driver's privacy, including the range of toll prices, the wallet balances, the length of the billing period, and the number of toll stations, all of which, except the wallet balance, are set by the TSP. A wallet balance depends on the toll prices of the toll stations visited by the driver and on the frequency of the visits. The length of the billing period also indirectly affects a wallet's balance and, accordingly, a driver's privacy. Hence, when deploying PPETC schemes, toll engineers should remember that varying one parameter could influence the other parameters and, accordingly, a driver's privacy.
(2) Choosing a sufficiently large billing period would help the TSP reduce the risk of violating a driver's privacy (Figure 6e). A large billing period leads to larger wallet balances, as drivers would likely visit more toll stations or visit toll stations more frequently within a longer period, and larger wallet balances provide better privacy for drivers (Figure 6d). This consideration should align with the TSP's financial policies.
(3) A wide range of toll prices in an ETC system increases the risk of violating a driver's privacy. It is advisable to choose a toll price range that results in a negligible success rate while not conflicting with the TSP's financial policies. For example, if the toll prices of all toll stations are equal, the success rate is close to zero (see Figures 6a, 6b, and 6c).

RELATED WORK
In this section, we discuss the typical information leaked by PPETC schemes. Then, we elaborate on the attacks regarding PPETC systems.
Privacy in ETC schemes: We first discuss the information that is leaked by PPETC schemes and available to the TSP. The privacy-preserving scheme in [25] stores toll fees and hashes of the locations at the TSP so as to hide them from the server. Apart from this, a small percentage of the locations and toll fees must be revealed to the TSP to detect cheating. In [34], the scheme "VPriv" stores the tagged location-time data sent by an OBU at the TSP. The driver computes the total toll fee and sends it to the TSP. The notion of privacy in "VPriv" means that the privacy of the scheme should be the same as the privacy in a scheme in which the TSP stores just location-time data without any identifying information (tags), and the total toll fee is received from an oracle without executing any protocol. In [26], the TSP stores the OBU's ID and a set of signed, encrypted, and committed tuples (location, time, toll fee). This information is transmitted by the OBU at the end of each billing period. Two schemes are presented in [10]: (1) the SPEcTRe spot-record scheme and (2) the SPEcTRe no-record scheme. The first scheme reveals the same data as the scheme in [34]. The second scheme does not store drivers' private information but can detect cheaters.

Attacks on privacy in ETC:
To the best of our knowledge, supported by the survey [24], the work [7] is the only relevant study that proposes an attack concerning post-payment PPETC schemes. In [7], the authors present an attack to find drivers' possible traces in a toy example of an ETC system. A trace is a set of trips a driver has made within a billing period. A trip made by a driver includes anonymous transactions belonging to the driver. The authors assume an adversary that already has access to anonymous trips as its input, which is a strong assumption. Besides the trips, the adversary has access to wallet balances, trip prices, the city map, and contextual information such as the maximal speed of cars. The adversary uses an attack based on solving the SSP, whose solutions lead to a driver's set of traces containing the correct trace. To solve the SSP, [7] utilizes Pisinger's algorithm [33] due to its linear computation time. However, it is important to note that the values corresponding to the set of traces must adhere to specific restrictions. In our approach, we employ DOcplex [12] as the solver, which, by default, has no such restrictions and is more convenient and straightforward to use than Pisinger's algorithm. The attack in [7] uses various heuristics based on the connectivity of trips to reduce the number of traces. Unlike our study, the work lacks a detailed and precise threat model, e.g., regarding the information available to an adversary, and its goal is not clearly determined. This work, as opposed to our study, aims to perform an attack by exploiting any ETC data at the adversary's disposal, while our objective is to perform an attack with a weaker adversary using only the minimal ETC data, which makes our attack more applicable. The main point overlooked in this work is the significance of parameter settings in determining the complexity of the attack, which depends on the hardness of the SSP. In contrast, our study examines the impact of various parameters and their settings, including toll price ranges, wallet balances, and the number of toll stations, on the difficulty of the SSP (see Sections 5.2.3 and 8). By analyzing the effect of different parameter settings on the SSP's complexity, we obtain valuable insights into selecting parameter settings that can enhance privacy. The attack presented in [7] is only applied to a toy scenario using synthetic data within a billing period of 10 days, and the authors do not discuss how their evaluation results can be applied to a real-world scenario. We emphasize that the attack in [7] relies on a very strong assumption (which is not justifiable according to [20,37] with respect to our setting). Therefore, we cannot regard it as a suitable baseline for the purpose of benchmarking. The comparison between our work and [7] is summarized in the table shown in Appendix L.
In [34], the authors briefly mention an attack based on the SSP and argue that if the billing period is large enough, it is unlikely for an adversary to learn the toll stations a driver visits. However, unlike ours, their work primarily focuses on designing a PPETC scheme and does not provide a detailed attack to evaluate a driver's privacy. Our study demonstrates, contrary to their expectation, that our attack can achieve a significant success rate even for a one-month billing period, which is large enough. Similarly, the work [2] briefly acknowledges the possibility of an attack by solving the SSP in PPETC schemes. However, since their aim is to design a PPETC scheme, they do not present any attacks, unlike ours.

CONCLUSION
This research focuses on the issue of driver privacy in post-payment privacy-preserving road-infrastructure based ETC schemes. We offer a counterexample using the case study of Brisbane's ETC system, illustrating that these PPETC schemes are sometimes not sufficient to provide privacy. We present two attacks that employ a subset of all information available to the TSP with real parameter settings of Brisbane's ETC system. The first attack aims to achieve the toll station disclosure goal, and the results show that the attack's ASR is considerably high, i.e., 94%, affecting the privacy of 61% of all drivers. The second attack aims to achieve the cycle disclosure goal, where an adversary learns more information. The evaluation shows that the attack's ASR is high, i.e., 51% for 61% of all drivers. These high ASRs are achievable by a weak adversary employing a baseline strategy without exploiting any heuristics. Our attack evaluation shows that solving the SSP (which is generally NP-complete), upon which the schemes' privacy is built, is feasible. We then present various parameters impacting a driver's privacy, such as toll prices and wallets, and analyze how different settings of individual parameters impact a driver's privacy. Finally, we give recommendations to toll engineers for deploying privacy-preserving ETC systems.

A NOTATIONS
The notations and acronyms are shown in Tables 3 and 4.

B THE TSD ATTACK
We formalize the attack in three steps and present the attack's pseudo-code in Algorithm 1. To find the set of plausible traces of a driver with a given identity and associated wallet balance, A performs the following steps:
Step 1. The adversary creates the following vectors and equation.
• Toll price vector: The adversary creates the vector of toll prices (line 3). The function creating it takes as input the set of toll prices and the vector's size, which equals the size of the set of toll stations (line 2). The vector's elements represent toll prices, and the index of an element denotes the toll station's identity.
• Frequency vector: It creates a vector (line 4) in which each element is an unknown variable representing the frequency corresponding to a toll station. The index of an element indicates the toll station's identity.
[Algorithm 1: pseudo-code of the TSD attack.]
Step 2. In this step, A creates the set of plausible traces. To this end, the attack uses a solver function (line 7) that solves the equation. For solving, the function uses DOcplex, i.e., IBM Decision Optimization CPLEX Modeling [12], and stores all solutions in a solution set (line 7).
Having obtained the solutions, for each solution in this set (line 8), the attack processes the solution (line 9) and derives the set of visited toll stations, i.e., the toll stations whose corresponding frequency is non-zero. Then, it uses the connectivity-check algorithm to check the connectivity between the visited toll stations (line 11). This algorithm and its pseudo-code are discussed in detail in Appendix C. It takes the set of visited toll stations, the graph, the set of intermediate toll stations, and the set of main toll stations as its inputs. If it returns false, the attack fetches the next solution from the solution set (line 8); otherwise, if it returns true, the attack creates the trace (line 12) and stores it in the set of plausible traces. Each plausible trace corresponds to a solution of Equation 3. Note that the set of plausible traces is empty if Equation 3 has no solution. The correct trace is in the set of plausible traces.
The algorithm first takes the set of visited toll stations and the graph as inputs and creates the induced subgraph by removing from the graph all vertices that do not belong to the set of visited toll stations. The induced subgraph includes every edge of the original graph that only uses vertices from this set. The algorithm then creates the set of visited intermediate toll stations by computing the intersection of the set of visited toll stations and the set of intermediate toll stations (line 3). For each visited intermediate toll station, its corresponding main toll station(s) must also be in the set of visited toll stations; otherwise, the intermediate toll station is unreachable. Hence, for each visited intermediate toll station (line 4), the algorithm creates the set of its corresponding main toll stations (line 5) and checks whether this set is a subset of the set of visited toll stations. If not, the set of visited toll stations is not valid, and the algorithm returns false. Otherwise, if the corresponding main toll stations exist for every intermediate toll station, it checks the connectivity of the induced subgraph (line 10). This check tests whether the directed induced subgraph is weakly connected, meaning that the underlying undirected graph is connected. We say an undirected graph is connected if it has a path connecting any two vertices [9]. The check is based on the Depth First Search (DFS) algorithm. If the graph is connected, the algorithm returns true; otherwise, the graph is not connected, and the algorithm returns false.
[Algorithm 2: pseudo-code of the connectivity check.]
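The check described above can be sketched in a few lines. This is a minimal illustration rather than the pseudo-code of Algorithm 2: the function name visited_stations_are_valid, the edge-list representation of the graph, and the mapping main_stations_of (from each intermediate toll station to its main toll stations) are assumptions made for the example, as is the choice to treat an empty set of visited stations as invalid.

```python
def visited_stations_are_valid(visited, edges, main_stations_of):
    """Check reachability of intermediate stations and weak connectivity.

    visited: iterable of visited toll stations.
    edges: iterable of directed (source, target) pairs of the ETC graph.
    main_stations_of: dict mapping an intermediate station to its main stations.
    """
    visited = set(visited)
    if not visited:
        return False  # assumption: an empty trace is not considered valid

    # Every visited intermediate station needs all of its main stations.
    for station in visited:
        required = set(main_stations_of.get(station, ()))
        if not required <= visited:
            return False

    # Induced subgraph on the visited stations, with edges made undirected.
    neighbours = {station: set() for station in visited}
    for source, target in edges:
        if source in visited and target in visited:
            neighbours[source].add(target)
            neighbours[target].add(source)

    # Iterative depth-first search from an arbitrary visited station.
    start = next(iter(visited))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in neighbours[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == visited


# Hypothetical graph: a -> b -> c -> d, with intermediate station i behind b.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "i")]
main_stations_of = {"i": {"b"}}
print(visited_stations_are_valid({"a", "b", "c"}, edges, main_stations_of))
print(visited_stations_are_valid({"a", "c"}, edges, main_stations_of))
```

Treating every directed edge as undirected before the search is exactly what distinguishes weak connectivity from strong connectivity, so a one-way road still connects its two endpoints for the purpose of this check.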

D COMPUTATION OF THE SUCCESS RATE
This algorithm computes the success rate of the TSD attack, and its pseudo-code is shown in Algorithm 3. It takes as inputs the driver's identity and the set of plausible traces obtained from the attack. It creates the correct trace (see Section 2) of the driver (line 2). Then, it checks whether the correct trace is contained in the set of plausible traces (line 3). If not, it sets the success rate to null (line 6); otherwise, it computes the success rate (line 4). It should be noted that the success rate is formally computed in Section 4.1.
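The control flow of this computation can be sketched in a few lines. The sketch assumes that the adversary guesses uniformly, so that the success rate is one divided by the number of plausible traces (as stated for the first heuristic in Appendix F.1); the formal definition of Section 4.1 is not reproduced here. The name `success_rate` and the representation of a trace as a frozenset of (station, frequency) pairs are hypothetical.

```python
def success_rate(correct_trace, plausible_traces):
    """Success rate of a uniform guess, or None if the correct trace is missing.

    Traces are modelled as frozensets of (station, frequency) pairs
    (an assumption made for this example).
    """
    if correct_trace not in plausible_traces:
        return None  # the attack cannot succeed: success rate is null
    return 1 / len(plausible_traces)


# Toy data: three plausible traces, one of which is the correct one.
correct = frozenset({("s1", 2), ("s2", 1)})
candidates = {correct, frozenset({("s3", 4)}), frozenset({("s1", 1), ("s3", 2)})}
rate = success_rate(correct, candidates)
```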

E DISTRIBUTION OF SUCCESS RATES
As a concrete example, we illustrate the distribution of SRs in the first box plot, which concerns the wallet range [$0, $10]. We compute all plausible wallet balances that fall into the range [$0, $10] (see Step 1 in Section 5.2.1), shown in the second row of Table 5. This row includes 93 tuples, each pairing an index i with the i-th plausible wallet balance. The third row includes 93 tuples, each pairing an index i with the SR (in percent) corresponding to the i-th plausible wallet balance in the second row. The ASR equals the average of all SRs in the third row, i.e., 94% (see Step 2 in Section 5.2.1). The results are shown in Table 5.

F HEURISTIC ALGORITHMS
The details and pseudo-code of the heuristics are discussed in the following.

F.1 The first heuristic
We present a heuristic algorithm that the TSD attack (see Algorithm 1) can apply. The heuristic contributes to a higher success rate in certain settings. It takes the set of plausible traces (obtained by Algorithm 1) and a threshold as inputs and outputs the updated set of plausible traces. The pseudo-code of the first heuristic algorithm is shown in Algorithm 4. To apply the heuristic, the TSD attack should call the heuristic algorithm between lines 14 and 15 of Algorithm 1.
The threshold is the maximum number of toll points a driver has visited during a billing period. From the set of plausible traces, the heuristic algorithm keeps the traces whose length is less than or equal to the threshold and discards the traces whose length is greater than the threshold. Note that the length of a plausible trace equals the number of visited toll stations in the trace, i.e., the number of (toll station, frequency) pairs it contains. Hence, for each plausible trace in the set, the heuristic checks whether the length of the trace exceeds the threshold. If it does, the algorithm removes the trace from the set. Finally, it outputs the updated set of plausible traces.
Then, the adversary uniformly guesses the correct trace from the updated set of plausible traces. Note that this heuristic helps to increase the attack's success rate, which equals one divided by the number of plausible traces, since the updated set becomes smaller after removing implausible traces, i.e., the traces with a length greater than the threshold.
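The filtering step of the first heuristic and its effect on a uniform guess can be illustrated as follows. This is a sketch under stated assumptions, not Algorithm 4 itself: the name `apply_threshold` and the modelling of a trace as a dictionary from toll station to frequency are hypothetical, and the toy traces are invented for the example.

```python
def apply_threshold(plausible_traces, threshold):
    """Keep only traces that visit at most `threshold` distinct toll stations.

    Each trace is a dict {station: frequency} (hypothetical representation),
    so len(trace) is the number of visited toll stations.
    """
    return [trace for trace in plausible_traces if len(trace) <= threshold]


# Toy set of plausible traces of lengths 1, 2, and 4.
traces = [
    {"s1": 3},
    {"s1": 1, "s2": 1},
    {"s1": 1, "s2": 1, "s3": 1, "s4": 1},
]
updated = apply_threshold(traces, threshold=3)

rate_before = 1 / len(traces)   # uniform guess over all plausible traces
rate_after = 1 / len(updated)   # uniform guess over the updated set
```

In this toy case one of the three traces is discarded, so the success rate of a uniform guess rises from 1/3 to 1/2. If every trace exceeded the threshold, the updated list would be empty and the success rate would be undefined.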
The heuristic evaluation. To evaluate the heuristic, we employ the parameter settings discussed in Section 5.1 for Brisbane's ETC system, with the following differences. For each wallet range, we only consider those plausible wallet balances for which the updated set of plausible traces (i.e., the output of the heuristic algorithm) does not become empty for thresholds ranging from 3 to 8. This is because the success rate is not defined for an empty set of plausible traces. Note that the set becomes empty if the length of every trace in the set exceeds the threshold (see Algorithm 4). Moreover, we do not perform our analysis for the thresholds 1 and 2, since for many plausible wallet balances the updated set of plausible traces becomes empty, which leaves only a few plausible wallet balances for our analysis. This makes the computation of the two metrics inaccurate. We highlight that the threshold values are those that an adversary can potentially deduce from the statistical data in the case study of Brisbane. The details of the parameter settings are presented in Table 6.
Impact of the heuristic on the reduction of plausible traces. For each wallet range, we vary the threshold from 3 to 8 and analyze how it impacts the reduction of plausible traces. The results are shown in Table 2.
Regarding the wallet range ($0, $10], the reduction of plausible traces is the same, i.e., 0%, for all thresholds. The reason is that the corresponding plausible traces all have a length of at most 3, so no threshold reduces the number of plausible traces. Nevertheless, the success rate is already considerably high (94%) prior to applying the heuristic. For the wallet range ($10, $20], the reduction increases from 0% to 24.1% (from threshold 8 to 3). For the wallet range ($20, $40], the reduction increases from 0% to 81.33% (from threshold 8 to 3). For the wallet range ($40, $60], the reduction increases from 0.27% to 97.21% (from threshold 8 to 3). Overall, the reduction shows an upward trend as the threshold decreases. Moreover, for a fixed threshold between 3 and 8, the reduction increases significantly from one wallet range to the next larger one. For instance, when the threshold is set to 3, the reductions are 0%, 24.1%, 81.33%, and 97.21% (see the second column). This increasing trend can be attributed to the fact that larger wallet balances encompass a greater number of plausible traces, many of which are ultimately discarded by the heuristic.

Impact of the heuristic on the ASR. We analyze how the heuristic, for a given threshold, impacts the ASR for each wallet range. Recall that the ASR is the average of all success rates corresponding to all plausible wallet balances within the wallet range. The results are shown in Table 2. For each wallet range, we vary the threshold from 3 to 8 and analyze how it impacts the ASR by discarding implausible traces. Furthermore, we calculate the "percentage increase" to quantify the extent to which the ASR associated with threshold 3 has increased compared to the ASR without applying any heuristic. For the wallet range ($0, $10], the ASR has the same value of 94% for all thresholds. This is because the corresponding plausible traces all have a length of at most 3, so no threshold reduces the number of plausible traces. For the wallet range ($10, $20] (for thresholds from 8 to 3), the ASR increases from 51.16% to 65.84%, resulting in a percentage increase of (65.84 − 51.16)/51.16 ≈ 29%. For the wallet range ($20, $40], the ASR increases from 4.69% to 22.05%, resulting in a percentage increase of (22.05 − 4.69)/4.69 ≈ 370%. For the wallet range ($40, $60], the ASR increases from 0.15% to 5.2%, leading to a percentage increase of (5.2 − 0.15)/0.15 ≈ 3367%, which is significantly high. Overall, the ASR shows an upward trend as the threshold decreases. Besides, the percentage increases demonstrate that as the wallet ranges get larger, the corresponding "percentage increase" grows significantly. This is because the plausible wallet balances within a larger wallet range lead to a larger number of plausible traces, many of which are discarded by the heuristic, leading to a relatively large increase.
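The "percentage increase" figures quoted above follow from the usual relative-change formula, (after − before) / before. The short check below recomputes them from the ASR values given in the text; the function name `percentage_increase` is chosen for the example only.

```python
def percentage_increase(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


# ASR without the heuristic vs. ASR with threshold 3, as quoted in the text.
increase_10_20 = percentage_increase(51.16, 65.84)
increase_20_40 = percentage_increase(4.69, 22.05)
increase_40_60 = percentage_increase(0.15, 5.2)
```

Rounded to whole percentages, the three values are 29, 370, and 3367, matching the figures in the text.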

F.2 The second heuristic
The heuristic algorithm takes a set of plausible traces and a distribution function as inputs. It outputs a set containing the plausible traces together with their corresponding probabilities. The pseudo-code of the heuristic is shown in Algorithm 5. To apply the heuristic, the TSD attack should call the heuristic algorithm between lines 14 and 15 of Algorithm 1. The algorithm performs as follows. For each trace in the set of plausible traces, it computes a probability using the distribution function. The distribution function describes the probability that a given trace is the correct one; it takes the number of visited toll stations in a trace and outputs the corresponding probability. The algorithm then assigns the probability to the trace, creating a (trace, probability) tuple, and stores the tuple in the output set. The adversary may use different strategies to select the correct trace from the output set. For example, it can select the trace with the highest probability (i.e., non-uniformly). If two or more traces are assigned the same highest probability, the adversary can uniformly select one of them.
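A possible reading of the second heuristic, including the highest-probability strategy with uniform tie-breaking, is sketched below. It is not Algorithm 5: the names `rank_traces` and `pick_trace`, the dictionary representation of traces, and the example distribution are all assumptions, and the distribution values are invented purely for illustration.

```python
import random


def rank_traces(plausible_traces, distribution):
    """Pair every trace with the probability assigned to its length.

    `distribution` maps the number of visited toll stations to a probability.
    """
    return [(trace, distribution(len(trace))) for trace in plausible_traces]


def pick_trace(ranked, rng=random):
    """Select a trace with the highest probability; break ties uniformly."""
    best = max(prob for _, prob in ranked)
    candidates = [trace for trace, prob in ranked if prob == best]
    return rng.choice(candidates)


# Hypothetical distribution: shorter traces are assumed to be more likely.
example_distribution = {1: 0.7, 2: 0.3}.get

ranked = rank_traces([{"s1": 1}, {"s1": 1, "s2": 2}], example_distribution)
guess = pick_trace(ranked)
```

With the invented distribution, the one-station trace receives the highest probability and is therefore selected.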

F.3 The third heuristic
The third heuristic algorithm takes as inputs , , , and the list of yearly wallet balances (_) associated with a driver.Then, it outputs a set of tuples, including a cluster and its corresponding probability.Each of the clusters includes the potential traces a driver might have made within a year.The pseudo-code of the heuristic is shown in Algorithm 6.

Algorithm 6: The third heuristic algorithm.

The algorithm performs as follows. For each wallet balance in the list of yearly wallet balances, it computes the corresponding set of plausible traces via the TSD attack (see Algorithm 1) and stores this set in a list. The algorithm then creates all possible clusters, where each cluster combines one trace from each set of plausible traces in the list. To enumerate these combinations, the algorithm uses the Cartesian product of the sets in the list. The resulting clusters are stored in the set of clusters. For each cluster in this set, the algorithm computes the similarity distance among the traces in the cluster and stores it in the set of similarity distances. The Euclidean distance between two plausible traces that assign the frequencies f_1, f_2, ..., f_n and f′_1, f′_2, ..., f′_n to the same n toll stations is calculated as

d = √((f_1 − f′_1)² + (f_2 − f′_2)² + ... + (f_n − f′_n)²).

Note that each pair of frequencies inside the parentheses corresponds to the same toll station. The formula indicates that if the frequencies within each pair of parentheses are close to each other, the distance will be closer to zero.
A normalization function takes the set of similarity distances and converts its elements into the corresponding probabilities, which are stored in a list. Each value in the list is the probability that the driver could have made the traces inside the corresponding cluster. Then, for each cluster in the set of clusters, the algorithm assigns the cluster its corresponding probability, creates the (cluster, probability) tuple, and stores the tuple in the output set. Finally, the adversary may use different strategies for selecting the correct cluster from the output set. By the correct cluster, we mean the cluster that includes the correct traces a driver made within a one-year billing period. One example of a strategy is that the adversary selects the cluster with the highest assigned probability (i.e., non-uniformly). If two or more clusters share the highest probability, the adversary uniformly selects one of them. It should be noted that the cluster with the highest probability may not always be the correct one, because the plausible traces inside a cluster may be similar merely by coincidence. With this heuristic, the adversary achieves a stronger goal than the TSD goal: the adversary learns all plausible traces associated with a driver within a year.
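The clustering idea can be sketched as follows. Only the Cartesian product and the Euclidean distance are taken from the description above; two further choices are assumptions made for this example, because the text does not specify them: the distance of a cluster is taken as the sum of pairwise distances between its traces, and distances are turned into probabilities by normalizing the weights 1/(1 + d). The names `euclidean` and `cluster_probabilities` are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def euclidean(trace_a, trace_b, stations):
    """Euclidean distance between two traces given as {station: frequency}."""
    return sqrt(sum((trace_a.get(s, 0) - trace_b.get(s, 0)) ** 2 for s in stations))


def cluster_probabilities(trace_sets, stations):
    """Build one cluster per combination of traces and score each cluster.

    trace_sets : list with one collection of plausible traces per billing period
    Returns a list of (cluster, probability) tuples.
    """
    clusters = list(product(*trace_sets))  # one trace from every billing period
    distances = [
        # Assumption: cluster distance = sum of pairwise trace distances.
        sum(euclidean(a, b, stations) for a, b in combinations(cluster, 2))
        for cluster in clusters
    ]
    # Assumption: similar clusters (small distance) get a larger weight.
    weights = [1 / (1 + d) for d in distances]
    total = sum(weights)
    return [(c, w / total) for c, w in zip(clusters, weights)]


# Two billing periods: one plausible trace in the first, two in the second.
periods = [[{"s1": 2}], [{"s1": 2}, {"s2": 4}]]
scored = cluster_probabilities(periods, stations=["s1", "s2"])
best_cluster = max(scored, key=lambda item: item[1])[0]
```

In this toy input the cluster that repeats the same trace in both periods has distance zero and therefore receives the highest probability. As the text notes, such a cluster is not guaranteed to be the correct one.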

G IMPACT OF PARAMETERS ON PRIVACY
We analyze the impact of different settings for each parameter on a driver's privacy.To achieve this, we vary the settings of a specific parameter, such as toll prices, while keeping the settings for the remaining parameters fixed.We then observe how the success rate changes with respect to the different settings.Table 7 displays the settings for the parameters we are analyzing.
Parameter: toll prices. To analyze the impact of toll prices on a driver's privacy, we consider ten different ranges of toll prices [1, j], where j ranges from one to ten, creating a diversity of range spans. We consider the different wallet ranges shown in Figure 2, and for each wallet range, we perform the following experiment to compute the ASR as explained in Section 5.2.1. For each toll price range [1, j], we select nine toll prices uniformly at random from the range (one price for each toll station in Brisbane) and assign them to the corresponding toll stations. Then, we compute the ASR. We repeat this experiment 50 times, each time with newly selected toll prices from the range [1, j]. Repeating the experiment reduces the impact of random fluctuations and lets us observe a range of results and determine the average outcome, which provides more reliable ASRs. We record the resulting ASRs in a set, which yields a box plot (in green) for the toll price range [1, j]. The ASRs corresponding to the wallet ranges [$1, $10], [$10, $20], and [$20, $40] are shown in Figures 6a, 6b, and 6c, respectively. The second row of Table 7 shows all settings for this parameter.
Parameter: wallet balance. To analyze the impact of the wallet balance on a driver's privacy, we consider the three different wallet ranges shown in Figure 2. Then we compute the success rate (see Section 4.1) for all plausible wallets within each wallet range. The red points in Figure 6d show the success rate for all plausible wallets in the range [$0, $10]. The success rate associated with most wallets is 100%, and for some wallets it is 50% or 33%. The blue points correspond to all plausible wallets in the range [$10, $20]. The density of blue points demonstrates that for a significant number of wallet balances, the success rates are 100%, 50%, and 33%, and for the rest, the success rate is between 6% and 33%. The green points concern all plausible wallets in the range [$20, $40]. For the wallets between $20 and $25, the success rate is between 3% and 100%, and for the wallets between $25 and $40, the success rate is between 0% and 20%. The values for the toll prices and the length of the billing period are fixed. The parameter settings are summarized in Table 7.
Parameter: billing period length. To analyze the impact of the length of the billing period, we consider different lengths, i.e., from one week to eight weeks. In each billing period, drivers' wallet range can be

airport. Or, given that the driver visited a toll station close to an industrial area, he might have exited the road to work in a factory.
The computational complexity of Algorithm 7. The complexity of the algorithm depends on the complexity of the function that computes the cycle, which varies with the strategy the function uses. As an example, we compute the complexity for the shortest-distance strategy, i.e., selecting a route that minimizes the overall distance traveled to reach a destination. In this case, the function is a variant of the algorithm used for solving the traveling salesman problem (TSP). This variant asks the following question: "Given a set of cities and the roads connecting them, what is the shortest cycle that visits each city at least once and returns to the origin city?" [19]. The TSP is NP-hard, and both exact and approximate algorithms exist for it [29]. The complexity of an exact algorithm using dynamic programming is O(n² · 2ⁿ), where n is the graph order, i.e., the number of vertices in the graph. Although this complexity is exponential in the graph order, the exact algorithms can efficiently compute the cycle in our application, as the graph order is very small. The graph order, in our case, equals the number of toll stations visited by a driver in a city, which is typically a small number. The example in Appendix K uses this strategy, i.e., the shortest distance.
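To make the dynamic-programming approach concrete, the sketch below uses the standard Held-Karp recurrence, which runs in time quadratic in the number of vertices times an exponential factor and is practical only for small graphs. It is an illustration under assumptions, not the paper's Algorithm 7: the function name `shortest_cycle_length` is hypothetical, and the input is assumed to be a matrix of pairwise shortest-path distances, so that a tour over this matrix corresponds to a cycle in the road graph that visits each station at least once.

```python
from itertools import combinations


def shortest_cycle_length(dist):
    """Length of the shortest tour that starts and ends at vertex 0.

    dist : n x n matrix of pairwise shortest-path distances (assumed input).
    """
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: cheapest path from 0 through the vertices in `mask`, ending at j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)  # same subset without the end vertex j
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # all vertices except the start vertex 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Four stations arranged on a square with unit-length sides.
square = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
length = shortest_cycle_length(square)
```

For the square, the shortest cycle walks around the perimeter and has length 4.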

I SSP-CD ATTACK
We introduce the SSP-CD attack to achieve the CD goal (the pseudocode is shown in Algorithm 8).We employ a similar idea used to accomplish the TSD goal where the idea was to solve the SSP to obtain the set of plausible traces, where a plausible trace is defined as  = {( 1 ,  1 ), ( 2 ,  2 ), . . ., (  ,   )} (see Section 2).
Concerning the CD goal, similarly, A first needs to find the set of plausible traces of a driver, where a plausible trace now consists of pairs of a cycle and its corresponding frequency. Then, A uniformly guesses the correct trace from the set. Hence, to create a plausible trace, A needs two items: (1) the set of cycles and (2) the corresponding frequencies, for which it performs the following steps:
(1) To obtain the set of cycles in the graph, A computes all different combinations of toll stations that could potentially be constituents of a cycle. To this end, A stores the graph's nodes in a set (line 2). Then, it creates the set of all different combinations of toll stations that can be formed from these nodes (line 3).
(2) To obtain the frequencies, A needs to solve the SSP, for which it creates a linear Diophantine equation. The adversary creates an equation similar to Equation 3, where the cycle price is used instead of the toll price (line 11). With π_i denoting the price of the i-th cycle, x_i its unknown frequency, m the number of cycles, and w the wallet balance, the equation reads:

π_1 · x_1 + π_2 · x_2 + ... + π_m · x_m = w    (10)

In Equation 10, the cycle price π_i is the sum of the toll prices of the toll stations along the i-th cycle. The set of cycle prices is denoted as Π = {π_1, π_2, ..., π_m}. The interpretation of Equation 10 is that the sum of the prices of the cycles made by a driver in a billing period equals the wallet balance. Each x_i in the equation represents the frequency with which the driver made the i-th cycle. Then, A solves the equation and stores the solutions in the set of solutions (line 12).
The issue with this idea is that the total number of different cycles (which equals the number of variables in Equation 10) in a city's graph grows exponentially in the number of toll stations. This is because the number of cycles is correlated with the number of combinations of toll stations (see the first step), i.e., the binomial coefficient C(n, k) for combinations of k out of n toll stations (Formula 11). The exponential number of variables in Equation 10 makes solving the equation computationally infeasible. While this idea may be viable for a small number of toll stations in a sparse graph, it is generally infeasible for a large number of toll stations and a dense graph. Hence, in Section 6.1, we present a new idea that exploits the TSD goal so as to achieve the CD goal without needing to create and solve Equation 10.
I.0.1 The computational complexity of the attack: The complexity mainly depends on two factors: the complexity of the algorithm used to obtain the cycles and the complexity of solving Equation 10. We consider the graph to be a complete undirected graph and compute all Hamiltonian cycles in the graph, resulting in the worst-case number of cycles. To obtain the total number of cycles, for each combination of k toll points, we compute the number of Hamiltonian cycles, which is (k − 1)!/2 [18]. Based on Formula 11, the number of combinations of k out of n toll stations is the binomial coefficient C(n, k). Therefore, the total number of cycles in the graph is obtained by multiplying the number of combinations by the associated number of cycles, (k − 1)!/2, and summing over all cycle sizes k ≥ 3, as given by the following formula:

Σ_{k=3}^{n} C(n, k) · (k − 1)!/2    (12)

Formula 12 shows the total number of Hamiltonian cycles in the graph, which is at least exponential. For example, for a graph including 10 toll stations, the total number of Hamiltonian cycles equals 556014, which is the number of variables in Equation 10. This number of variables makes solving the equation computationally infeasible. In Section 6.1, we present a new idea to handle the infeasibility problem of solving Equation 10 due to many variables.
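The figure of 556014 cycles for 10 toll stations can be reproduced by summing, over every combination size k of at least three stations, the number of combinations of k out of n stations multiplied by the (k − 1)!/2 Hamiltonian cycles on k vertices. The function name `hamiltonian_cycle_count` is chosen for this example, and the lower summation bound of three (the smallest possible cycle) is an assumption that the quoted number confirms.

```python
from math import comb, factorial


def hamiltonian_cycle_count(n):
    """Number of cycles in a complete undirected graph with n vertices.

    For every subset of k >= 3 vertices there are (k - 1)! / 2 distinct
    Hamiltonian cycles, and there are comb(n, k) such subsets.
    """
    return sum(comb(n, k) * factorial(k - 1) // 2 for k in range(3, n + 1))


variables_for_10_stations = hamiltonian_cycle_count(10)
```

For n = 10 this evaluates to 556014, the number of variables that Equation 10 would contain in the worst case.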

J THE CD ATTACK
We discuss in detail the CD attack presented to achieve the CD goal.The attack takes as inputs the sets ,  , and outputs the correct trace guessed uniformly by A. The pseudo-code is shown in Algorithm 9.The full example of the attack is illustrated in Appendix K.The main idea of the attack is that A exploits a driver's correct trace, obtained by the TSD attack, to create the corresponding plausible traces, including cycles and their associated frequencies.In six stages, we explain the CD attack.
(1) Create the multiset.The CD attack executes the TSD attack to obtain the correct trace of driver  denoted as _ = {( 1 ,  1 ), . . ., (  ,   )}, including the visited toll stations and their associated frequencies.Given the _, A creates a multiset, namely __ out of the correct trace including all the visited toll stations.Each toll station   included in the correct trace is repeated   times in the multiset.For example, if we consider _ = {( 1 , 2), ( 2 , 3)}, it results in the multiset __ = { 1 ,  1 ,  2 ,  2 ,  2 }.A utilizes the function _ (line 3) to generate the multiset.
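The multiset construction of this stage maps naturally onto a counter. In the sketch below, the name `trace_to_multiset` and the string identifiers "s1" and "s2" are assumptions made for the example; the trace mirrors the example above, with one station visited twice and another visited three times.

```python
from collections import Counter


def trace_to_multiset(trace):
    """Turn a trace of (station, frequency) pairs into a multiset of stations."""
    return Counter(dict(trace))


correct_trace = {("s1", 2), ("s2", 3)}
multiset = trace_to_multiset(correct_trace)
expanded = sorted(multiset.elements())  # every station repeated by its frequency
```

Here `expanded` lists the first station twice and the second station three times, which is the multiset described in the text.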

Figure 2: The proportion of drivers across wallet ranges [40].

Figure 3 illustrates the distribution of success rates corresponding to all plausible wallet balances within each wallet range. The figure shows that almost all of the success rates (SRs) within the range of [$0, $10] are 100% (shown in the first box plot). For the range of [$10, $20], half of the SRs are between 50% and 100%, and the other half are between 7% and 50% (shown in the second box plot). Finally, for the range of [$20, $40], almost all of the SRs are below 12% (shown in the third box plot). As a concrete example, we illustrate the distribution of SRs in the first box plot in Appendix E.

Figure 3: Each box plot shows the distribution of SRs across all plausible wallet balances within the corresponding wallet range.

Figure 4: The number of plausible traces and the runtime are two parameters impacting the attack's effectiveness. The values of these parameters increase significantly as the wallet balance gets larger.
Figure 5 illustrates the distribution of success rates corresponding to all plausible wallet balances within each wallet range. The figure shows that half of the success rates (SRs) corresponding to the range of [$0, $10] are nearly 50%, while the other half fall between 10% and 50% (shown in the first box plot). For the range of [$10, $20], half of the SRs fall between 5.50% and 35% (shown in the second box plot). Finally, for the range of [$20, $40], a few of the SRs are below 10%, and the rest are close to zero (shown in the third box plot).

Figure 5: Each box plot shows the distribution of SRs across all plausible wallet balances within the corresponding wallet range.

Figure 6 shows the impact of the settings of different parameters on the success rate. The red, blue, and green points/graphs concern the wallet ranges [$0, $10], [$10, $20], and [$20, $40], respectively. The results are summarized as follows:
• Toll prices: Each of the toll price ranges [1, j], 1 ≤ j ≤ 10, on the x-axis of Figures 6a, 6b, and 6c indicates the span from which toll prices are chosen and assigned to the respective toll stations in Brisbane. Figures 6a, 6b, and 6c demonstrate that the ASR increases as the toll price range gets larger. To compare the box plots, we consider their medians.
• Wallet balance: Figure 6d shows that drivers with wallet balances below $35 (more than 75% of all drivers) are more at risk of a privacy violation than those with wallets between $35 and $40.
• Billing period length: Figure 6e shows that a short billing period leads to a high ASR. Even for a relatively long billing period of eight weeks (two months), the attack has an ASR of about 10%, which is considerable.
• Number of toll stations: Figure 6f illustrates that the attack's ASR remains high for a low number of toll stations. While the ASR decreases as the number of toll stations increases, it is still significantly high, approximately 60%, for 20 toll stations and the wallet range [$0, $10] (as shown by the red graph).

Figure 6: The figures demonstrate how different settings of a parameter (toll prices, wallet balances, billing period length, and the number of toll stations) impact the attack's ASR or success rate and, accordingly, a driver's privacy.

(1) In Step 1, it creates the required vectors and, accordingly, the linear equation.
(2) In Step 2, it solves the equation to create all plausible traces associated with the driver.
(3) In Step 3, A selects a plausible trace (as the correct trace) uniformly from the set of plausible traces.
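The three steps can be sketched as a small enumeration of the non-negative integer solutions of a linear equation in which each toll price is multiplied by an unknown visit frequency and the products sum to the wallet balance. This is an illustrative brute-force search, not the paper's Algorithm 1: the names `enumerate_traces` and `guess_trace`, the use of integer cents, and the toy prices are assumptions, and the connectivity check applied by the actual attack is omitted.

```python
import random


def enumerate_traces(prices, wallet):
    """List all traces {station: frequency} whose total toll equals `wallet`.

    prices : dict {station: price in cents}, all prices positive integers
    wallet : billed amount in cents
    """
    stations = sorted(prices)
    solutions = []

    def search(index, remaining, current):
        if remaining == 0:
            solutions.append(dict(current))  # unvisited stations are left out
            return
        if index == len(stations):
            return
        station = stations[index]
        for freq in range(remaining // prices[station] + 1):
            if freq:
                current[station] = freq
            search(index + 1, remaining - freq * prices[station], current)
            current.pop(station, None)

    search(0, wallet, {})
    return solutions


def guess_trace(traces, rng=random):
    """Step 3: pick one plausible trace uniformly at random."""
    return rng.choice(traces)


# Toy instance: two stations priced at $2.00 and $3.00, monthly bill of $6.00.
traces = enumerate_traces({"a": 200, "b": 300}, wallet=600)
guess = guess_trace(traces)
```

In the toy instance there are exactly two plausible traces (three passes of the cheaper station, or two passes of the more expensive one), so a uniform guess succeeds with probability 1/2.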

(3) For each solution in the set of solutions, A creates the set of frequencies (line 14). Having obtained the set of cycles (from the previous steps) and the set of frequencies, the adversary computes the plausible trace and stores it in the set of plausible traces (line 15). Note that one solution of Equation 3 leads to one plausible trace, and since the equation may have more than one solution, this leads to a set of plausible traces. Finally, A guesses the correct trace uniformly from the set of plausible traces (line 17).

Algorithm 8: The SSP-CD attack.

Table 2: The two metrics indicate the impact of the first heuristic on the reduction of plausible traces and on the average success rate (ASR), respectively, during a one-month billing period.