Ruffle: Rapid 3-party shuffle protocols

Secure shuffle is an important primitive that finds use in several applications such as secure electronic voting, oblivious RAMs, secure sorting, to name a few. For time-sensitive shuffle-based applications that demand a fast response time, it is essential to design a fast and efficient shuffle protocol. In this work, we design secure and fast shuffle protocols relying on the techniques of secure multiparty computation. We make several design choices that aid in achieving highly efficient protocols. Specifically, we consider malicious 3-party computation setting with an honest majority and design robust ring-based protocols. Our shuffle protocols provide a fast on-line (i.e., input-dependent) phase compared to the state-of-the-art for the considered setting. To showcase the efficiency improvements brought in by our shuf-fle protocols, we consider two distinct applications of anonymous broadcast and secure graph computation via the GraphSC paradigm. In both cases, multiple shuffle invocations are required. Hence, going beyond standalone shuffle invocation, we identify two distinct scenarios of multiple invocations and provide customised protocols for the same. Further, we showcase that our customized protocols not only provide a fast response time, but also provide improved overall run time for multiple shuffle invocations. With respect to the applications, we not only improve in terms of efficiency, but also work towards providing improved security guarantees, thereby out-performing the respective state-of-the-art works. We benchmark our shuffle protocols and the considered applications to analyze the efficiency improvements with respect to various parameters


INTRODUCTION
Shuffle is a technique of rearranging the elements of an ordered set. Performing a shuffle in a privacy-preserving manner entails randomly permuting the elements of the ordered set while ensuring * This work was done during the author's affiliation at Indian Institute of Science. that the permutation, as well as the elements in the ordered set, are not known on clear. Secure shuffle finds wide-spread use as a primitive in various applications such as electronic voting [24,44], secure sorting [25,26], oblivious RAM [7,12], GraphSC paradigm [6,43], anonymous broadcast [20], to name a few. Several works in the literature [6,13,20,34,35] provide a secure shuffle protocol using the cryptographic technique of secure multiparty computation (MPC). This technique enables a set of parties to jointly compute a function on their private inputs while guaranteeing that no subset of at most < parties, controlled by an adversary, learns anything other than the function output.
An essential factor to be considered when designing a secure shuffle protocol is its response time (which accounts for the time taken from submission of the input, it's processing, to delivery of the output). To minimize the response time, we focus on designing secure shuffle protocols in the preprocessing paradigm, which allow offloading heavy input-independent computations to a preprocessing phase, thereby obtaining a very fast input-dependent online phase. Although secure shuffle is used in various applications, we use the representative example of anonymous broadcast to motivate the need for a fast online phase (quick response time). An anonymous broadcast system allows a set of clients to broadcast their messages such that none learns about the association between a message and the identity of its sender. Such a system finds application in use cases like live anonymous polling/feedback. Further, note that the output is required in real-time due to the live nature of the event. For such time-sensitive applications, it is important to have a system which provides a fast response time. Since secure shuffle forms an integral part of anonymous broadcast, a fast protocol for shuffle is essential to facilitate a fast anonymous broadcast system. A similar argument applies to other real-time applications as well where secure shuffle is used, such as the GraphSC paradigm [6,43] to securely evaluate breadth first search (BFS) for contact tracing, PageRank for fraud detection, etc. To further enhance the overall efficiency, we consider working in the small-party setting that is known to provide efficient, customized solutions [15,19,29,30,39,46,47], following the footsteps of prior works on shuffle such as [6,20]. We make several other design choices to enhance the efficiency of the designed secure shuffle protocols. Our contributions are detailed next which elaborate on these choices.

Our contributions
Keeping efficiency in mind, we design ring-based maliciously secure shuffle protocols in the threshold-optimal setting of 3-party computation (3PC), assuming an honest majority (i.e., < /2). We design our protocols in the preprocessing model and focus on attaining a fast online phase. Our protocols are robust and provide the security of guaranteed output delivery (GOD, also known as robustness) where honest parties are guaranteed to receive the correct output regardless of any adversarial behaviour. We note that this security guarantee is achieved at no additional (amortized) cost in comparison to weaker security notions such as security with fairness 1 or abort security 2 .
We showcase the efficiency improvements brought in by our shuffle protocols for the specific applications of anonymous broadcast and GraphSC paradigm. In the process, we identify the need to handle multiple invocations of shuffle and hence design optimizations that are tailor-made to cater to this. We also work towards improving the security guarantees offered in the considered applications.
Secure shuffle. The shuffle protocol takes as input the secret shares of the elements of the ordered set that require to be shuffled. The output comprises secret shares of the shuffled elements. The random secret permutation used for shuffling is defined during the run of the protocol. The works of [6,20] provide a secure shuffle protocol that offers the weaker notion of abort security 3 . We design a new shuffle protocol, Ruffle, whose highlight is the improved online efficiency in comparison both [20] and [6]. At a high level, Ruffle leverages the secret sharing semantics to offload the inputindependent computations to a preprocessing phase. This allows achieving a super fast online phase by restricting computations to be performed only on the input-dependent shares. In fact, Ruffle also has a better overall run time than the shuffle protocol of [20] owing to our better round as well as communication complexity. Finally, we note that Ruffle is designed to offer the improved security guarantee of robustness in comparison to prior works.
Anonymous broadcast. Recall that the application allows clients to securely shuffle their input messages. When realizing this via MPC, the clients rely on a set of servers to perform the secure shuffle on their behalf. Note that an anonymous broadcast system can run perpetually, i.e., client messages are received continuously, and the system is responsible for shuffling every consecutive set of well-formed messages. Thus, an anonymous broadcast system in fact requires multiple sequential invocations of shuffle which can be captured by the following generic scenario.
Let T 1 , T 2 , . . . , T be ordered sets that are required to be shuffled under random secret permutations, say 1 , 2 , . . . , , respectively. Consider the scenario where these shuffles are performed sequentially such that +1 (T +1 ) is invoked after (T ), and T +1 is independent of (T ) 4 . We refer to this as Independent-Shuffles scenario where multiple independent shuffles are required with the constraint that they are invoked sequentially.
While Ruffle is designed to handle the case of a single shuffle invocation, we extend it and design Ruffle-1 to handle the scenario of Independent-Shuffles. Ruffle-1 is designed to leverage the independence of the shuffles in Independent-Shuffles, to facilitate performing the necessary preprocessing steps in parallel. Our use of Ruffle-1, allows us to design a more efficient shuffle-based anonymous broadcast system in comparison to the state-of-the-art system of Clarion [20]. Further, the drawback of Clarion is that it does not offer the property of censorship resistance. This property guarantees that a malicious server should not be allowed to discard an honest client's message by claiming it to be malformed. Hence, apart from improving efficiency, our system also guarantees censorship resistance. Our system also aims to minimize communication/computation overhead at client.
GraphSC paradigm. This paradigm [6,43] provides a highly efficient and scalable solution for securely evaluating graph algorithms. Unlike the case of Independent-Shuffles, this paradigm requires performing a composition of shuffles which can be captured by the following generic scenario.
Unlike the previous scenario of independent shuffles, in this case we are interested in determining the composition of shuffles such that T = ( −1 (. . . 1 (T))), where T is the input to be shuffled. Such a composition of shuffles generates a sequence of intermediate shuffled sets, where the th ordered set is denoted as T = (. . . 1 (T)). In this way, the composition of permutations induces a sequential nature to the shuffle invocations with +1 (T ) being invoked after (T −1 ) since T = (T −1 ). We refer to this as the Composed-Shuffles scenario where the permutations are required to be composed such that the output of one shuffle invocation is fed as the input to the next 5 .
Recall that Ruffle-1 is designed to leverage the independence of the shuffles to facilitate parallel preprocessing. This is in contrast to Composed-Shuffles where the shuffles are no longer independent. Thus, Ruffle-1 is not apt for Composed-Shuffles, and hence we design Ruffle-2 to specifically cater to this scenario. Ruffle-2 strategically breaks the sequential dependence on shuffles in the preprocessing. This enables performing the preprocessing phase in parallel for the shuffles. Although Ruffle-2 can be used in the scenario of Independent-Shuffles, we note that the design of Ruffle-2 for breaking the dependency in the preprocessing comes at the cost of slightly increased preprocessing communication compared to the preprocessing of Ruffle-1. Hence, the use of Ruffle-1 is apt for Independent-Shuffles and Ruffle-2 for Composed-Shuffles. We showcase the improvements that can be attained in the GraphSC paradigm of [6] via the Ruffle-2. Note that our shuffle protocols enable reusing and inverting the underlying secret permutation, which is an essential property needed in applications such as GraphSC paradigm.
. In summary, the current work provides efficient solutions for secure shuffle while accounting for different scenarios. A comparison of our shuffle protocols with that of [6,20] is given in Table 1. Since all protocols have a common structure for the online 5 The scenario can indeed be generalized such that +1 can be invoked on some function of (T ), rather than on (T ) itself.
: number of elements to be shuffled, where each element is an ℓ-bit string; (= 48): statistical security parameter; : order of field. [20] uses a 128-bit field ‡: Although [20] does not have an explicit preprocessing phase, we observe that the shuffle correlation and other randomness can be preprocessed. Hence, we explicitly distinguish between preprocessing and online to provide a fair comparison. *: The preprocessing for [6] only involves the generation of randomness, non-interactively. ‡ ‡: See §B for a discussion on security guarantees of [6]. †: The communication for verification comprises broadcasting 2 hashes and 2 bits, the cost of which gets amortized over multiple shuffle instances. * * : Ruffle-2 for Independent-Shuffles additionally requires communicating 3 ℓ bits. * * * : Ruffle-1 for Composed-Shuffles instead requires (5 + log 2 ) rounds. phase that comprises steps for semi-honest shuffle followed by its verification, the cost of these is reported in Table 1. The improved online phase of both, Ruffle-1 and Ruffle-2, supplemented by their parallel preprocessing in their respective scenarios, results in both of them improving in terms of overall run time in comparison to shuffle protocols in [6,20] for multiple shuffles (i.e., ≥ 2). Further, as shown in the table, Ruffle-1 becomes prohibitively expensive for the Composed-Shuffles case, because it incurs an factor inflation in the preprocessing round complexity (see the highlighted entry). Similarly, Ruffle-2 is inapt for Independent-Shuffles, due to the inflation of 3 ℓ bits in its preprocessing communication complexity (see the highlighted entry). The complexity of Ruffle is captured by the complexity of Ruffle-1 when = 1.
Benchmarks. We benchmark the performance of our shuffle protocols, Ruffle, Ruffle-1 and Ruffle-2. We establish how the protocols, Ruffle-1, Ruffle-2 are apt for their respective scenarios of Independent-Shuffles, Composed-Shuffles. Further, we showcase the improvements brought in by our shuffle protocols in the applications of anonymous broadcast and securely evaluating BFS in the GraphSC paradigm. The summary of the improvements is: • Solitary shuffle. When considering a single invocation, Ruffle improves over [20] and [6] in the online run time. Further, with respect to [20], Ruffle is also better in terms of overall communication as well as overall run time.
• Multiple-sequential shuffles. By considering multiple sequential shuffles, we establish the improvements of our shuffle protocols with respect to overall run time in comparison to [6] 6 . Beginning with as low as two sequential shuffle invocations, both Ruffle-1 (for Independent-Shuffles) and Ruffle-2 (for Composed-Shuffles) outperform [6].
• Anonymous broadcast. The server-side complexity of our anonymous broadcast system outperforms [20] in every aspect. The clientside computation also sees improvements. The improvements we observe is not only attributed to our shuffle protocol but also to the improvements we bring in to the other components of the system. • BFS via GraphSC. Our implementation of secure BFS evaluation in the GraphSC paradigm outperforms that of [6]. Further, we also 6 We stick to comparing with [6] since the shuffle in [6] outperforms that in [20].
showcase how the performance of our BFS varies with the number of processors in the multiprocessor setting, described in [43]. Unlike in anonymous broadcast, here the reported gain is only due to the improved shuffle protocol.
Organization. The rest of the paper is organized as follows. §2 describes the related work and §3 provides the preliminaries. §4 describes our shuffle protocol, Ruffle. This is followed by the applications of anonymous broadcast and GraphSC paradigm in §5, which also describe the protocols Ruffle-1 and Ruffle-2. The benchmark results appear in §6, followed by conclusion in §7. Additional preliminaries appear in §A. Details of the shuffle protocol of [6] are recollected in §B. This is followed by additional details of anonymous broadcast and BFS in GraphSC paradigm in §C, §D, respectively. Additional benchmark details and security proofs are provided in §E and §F, respectively.

RELATED WORK
One of the first techniques proposed for shuffling is that of mix networks (or mix-nets) [17,22,49,52]. It comprises a sequence of mixes, where each mix receives a set of messages, shuffles them, and forwards them to the next mix. Unlinkability of a message to its sender is guaranteed if at least one mix is honest. To guarantee security when a mix is malicious, it must be ensured that a verifiable shuffle is performed by each mix [1,3,8,27,44]. This further adds to the expense of a mix-net. Not only are mix-netbased solutions computationally expensive, but they are also vulnerable to traffic analysis attacks [48,50]. Hence, several works in the literature explore MPC-based techniques for secure shuffling [23,28,35,38,41,42]. Some of these solutions rely on securely performing sort [35,42], while some others consider securely evaluating a permutation network [23,28,38,41]. These techniques require at least (log ) rounds for shuffling elements, which proves to be expensive for time-sensitive applications. The works of [6,20] which appeared concurrent to each other consider performing a 3PC shuffle protocol in the honest majority setting. In the semi-honest 3PC honest-majority setting, [6] presents a shuffle protocol which is an adaption of the shuffle protocol of [35] to the 3-party setting. This semi-honest protocol requires three rounds of interaction. Note that, [6] contributes to making this semi-honest protocol secure in the presence of a malicious adversary by augmenting with a verification phase to ensure the correctness of the semi-honest shuffle, which additionally requires 2 + log 2 rounds. Further, [6] also provides a 2 round semi-honest protocol but leaves open the question of attaining malicious security for the same. Clarion [20] also gives a 2-round 3PC honest-majority shuffle protocol which builds on the semi-honest 2-party protocol of [13]. 
To guarantee malicious security, they add integrity checks by having MACs appended to the elements to be shuffled. The resulting maliciously secure protocol requires 6 rounds overall. Clarion also extends its shuffle protocol to the -party dishonest majority setting that guarantees malicious security, which additionally requires maliciously secure OTs (oblivious transfer) in the preprocessing phase. It improves over the protocol in [37] in terms of efficiency, however, it lacks in terms of security guarantees where the latter provides fairness in the preprocessing phase and GOD only in the online phase, for the setting of < /3.

PRELIMINARIES
We work in the 3-party setting. Let P = { 0 , 1 , 2 } denote the set of three parties that are connected via a pairwise private and authentic channel. Let A be a static, probabilistic, polynomialtime adversary which corrupts at most one party maliciously. Our protocols are proven secure against a computationally bounded A in the standalone simulation-based security model of MPC, using the real-world/ideal-world simulation paradigm [36]. Parties use a one-time key setup [9,14,29,39,47] to establish common random keys for a pseudo-random function (PRF) between them. This is modelled as a functionality F setup (Fig. 7). This enables each subset of parties to non-interactively sample a common random ℓ-bit string v ∈ Z 2 ℓ . Parties also have access to a collision-resistant hash function, H(·), and a non-interactive commitment scheme, Com(·). Formal details appear in §A.
Secret sharing semantics. Inspired from [29,51], we use the following secret sharing semantics.
Non-interactively generating [·]-shares of a common v ∈ Z 2 ℓ held by , . To generate [v], parties need to define three shares is held by parties , ∈ P. Observe that this can be consists of all possible (bijective functions) rearrangements of elements in N and hence comprises ! permutations. Note that permutations can be composed similar to composition of functions, and thus forms a group with respect to composition (•) operation.
satisfies group properties of closure, associativity, and presence of identity. However, permutations are not commutative under composition but are invertible.
Sampling a random permutation denotes choosing a random ∈ . We next describe how parties , can do this noninteractively using the shared key established via F setup . , noninteractively generate common random values say v 1 , v 2 , . . . , v ∈ Z 2 ℓ where ℓ >> log 2 . The parties tag each of the values v with its index to obtain a list = {(v , x )} =1 , where x = . Each party then locally sorts this list of tuples based on the first entry v of each tuple to obtain a sorted list Joint message passing (jmp) primitive [29]. This primitive allows two parties to deliver a common message to a third party where one sender sends the message while the other sends its hash to the receiver. In the process, either the recipient receives the correct message or, if there is an inconsistency in the received messages, parties instead proceed to identify a trusted third party (TTP) 7 . The TTP is then responsible for performing the required computation on clear and guarantee delivery of output. Several works [19,29,30] rely on this primitive or its variation to ensure GOD. We let " , jmp v to " denote invocation of jmp with , as senders, as receiver, and v being the message to be sent. Formal protocol for jmp appears in Fig. 8.
Output reconstruction [29]. To enable reconstruction of a ·shared value v ∈ Z 2 ℓ , parties proceed as follows. During the preprocessing phase, in addition to generating [ v ], parties also generate commitments on each of the [·]-shares of v . Looking ahead, these commitments aid in guaranteeing the correct reconstruction of v in the online phase. To generate the commitments, each pair , ∈ P computes Com([ v ] ) on the value [ v ] using the common randomness. , jmp Com([ v ] ) to . After the jmp invocations, it is guaranteed that each party in P either possess the correct commitment on each [·]-share of v , or a TTP is identified and subsequent computation proceeds via the TTP 8 . Next, in the online phase to reconstruct v, observe that each party misses the share [ v ] which is held by the other two parties , ∈ P \ . Hence, , send the opening of Com( [ v ] ) to . Since at most one party among , can be malicious, even if the malicious party sends an incorrect opening, is guaranteed to receive the correct opening from the honest party (the correct opening can be identified owing to the property of the commitment scheme which outputs a ⊥ for incorrect ones). Party uses the correct opening to obtain the missing share . Thus, reconstruction will not fail if a malicious party tries to disrupt it by sending an incorrect message, resulting in robust reconstruction.
On the security of our protocols. While our protocol provides the strongest security of robustness, we note that depending on the application scenario, one may choose the desired level of security. Specifically, robustness is attained by relying on a TTP to carry out the computation on the honest party's inputs (in the clear) if misbehaviour is detected. Hence, if the application under consideration cannot tolerate revealing the inputs to a TTP even though the TTP is known to be honest, the application can settle for the weaker security notion of fairness (which is stronger than abort security achieved in prior systems). The fair version of our protocols can be derived from the robust version by making the following changes-(i) use of the fair version of jmp instead of the robust version, (ii) terminating the protocol when a party aborts instead of proceeding with TTP identification and, (iii) relying on a fair reconstruction protocol. We remark that even for this weaker security notion of fairness, our protocols are on par with the robust protocols in terms of efficiency, and hence, are more efficient than prior works. Finally, we would like to note that an alternative to the TTP-based approach of achieving robustness, is the recent notion of security with friends and foes proposed in [5]. This notion allows attaining robustness without relying on the TTP. We refer an interested reader to [5] for further details pertaining to the same.

3PC SHUFFLE
We begin with defining the ideal functionality for shuffle in Fig. 1. Let a table T denote a set of ordered rows where each row consists of an ℓ-bit string. Let denote the size of T or the number of rows in T. Secure shuffle operation takes as input · -shares of table T, i.e., · -shares of each of the ℓ-bit string that constitutes a row in T. The output is random · -shares of a table T , which consists of rows of T in a randomly permuted order.
Without loss of generality, let ∈ P denote the party corrupted by adversary S. F Shuffle interacts with parties in P and S. It receives as input · -shares of the input table T from all parties. Let T denote the randomly shuffled input table. F Shuffle also receives from S its · -shares of T , i.e. it receives To , To , To where , , denote parties in P. F Shuffle proceeds as follows.
• Reconstruct input T using · -shares of the honest parties.
• Sample a random permutation from the space of all permutations, and generate T = (T).
Functionality F Shuffle

Ruffle
Given that the input table T is · -shared, there exists a T , T ∈ Z 2 ℓ such that T = T ⊕ T is held by all parties in P, and T is [·]-shared, 12 where , ∈ P hold [ T ] ∈ Z 2 ℓ . Let be the random permutation used to shuffle the rows of T. Observe that, T = (T) = ( T ⊕ T ) = ( T ) ⊕ ( T ). To respect the · -sharing semantics for T , we require Observe, however, that this approach leaks the secret permutation to all the parties, since they all hold T and will now also hold ( T ) on clear, from which one can recover . To keep private, we observe that it suffices to mask ( T ) with some randomness R ∈ Z 2 ℓ , and hence, define this masked value as T o , i.e., T o = ( T )⊕R. Further, to ensure that the relation Looking ahead R gets defined during the generation of T o . Thus, in what follows, we first describe steps to generate [ ( T )], followed by steps to generate T o and then [R].
where is a random secret permutation (independent of T), can be generated during preprocessing. For this, we employ the protocol of [6]. The protocol takes as input [·]-shares of a table, and outputs [·]-shares of the table shuffled using a random secret permutation . It also outputs a flag that indicates correctness of [·]-shares of shuffled table 9 . At a high level, the protocol of [6] relies on the semi-honest 3PC shuffle protocol from [35] which guarantees privacy against a malicious adversary. [6] then augments this with a robust Set-Equality protocol to verify the correctness of the semi-honest shuffle. The semi-honest shuffle comprises three invocations of Shuffle-Pair protocol. In each instance of Shuffle-Pair, a random permutation is applied to the input (of the Shuffle-Pair), where the permutation is known to a distinct pair of parties and is hidden from the third. The output of the current Shuffle-Pair is fed as input to the next Shuffle-Pair. The composition of all three permutations, thus, makes up the random secret permutation used to shuffle the input table. Since each party is aware of only two permutations, the final permutation remains private. Formal details of Shuffle-Pair protocol are recollected in §B. Each invocation of Shuffle-Pair is followed by a Set-Equality protocol which outputs a flag ∈ {0, 1} indicating whether the table output by the Shuffle-Pair is indeed a random permutation of the input to this Shuffle-Pair. In this way, the output of the shuffle protocol is guaranteed to be correct if all instances of Shuffle-Pair are verified to be correct. Generation of T o = ( T ) ⊕ R. As part of the Shuffle-Pair instances performed during the preprocessing, parties generate 12 , 01 , 02 . The goal now is to generate Observe that, unlike during preprocessing, the table to be shuffled is now held by all three parties on clear, while the permutation is still private. 
Further, each party misses exactly one permutation that is held by the other two parties. We Ruffle Proceedings on Privacy Enhancing Technologies 2023 (3) leverage these observations in designing our shuffle protocol to attain a highly efficient online phase. We explain case-by-case how each ∈ P obtains T o . Generating T o towards 1 . Recall that 1 misses 02 . If 1 is given 02 ( T ), it can locally compute ( T ) = 12 • 01 ( 02 ( T )) using its knowledge of 12 • 01 . However, as mentioned earlier, since 1 holds T on clear, knowledge of 02 ( T ) leaks the permutation 02 to it. Hence, we instead provide it with 02 ( T ) ⊕ R ′ , where the randomness R ′ masks 02 ( T ) and prevents leakage of 02 . For this, observe that 02 is held by both 0 , 2 . We let 0 , 2 sample a random R 02 ∈ Z 2 ℓ , and compute and send 02 ( T ⊕ R 02 ) to 1 . Here, 02 (R 02 ) serves as the random mask R ′ . Further, note that since at most one among 0 , 2 can be malicious, making both send the value to 1 enables the latter to check the consistency of the received messages and detect misbehaviour, if any. Since the message from the second sender only aids in verifying the consistency of the received messages, to save on communicating an entire table, it suffices for one sender to send the value and the other to send the hash of it 10 . On receiving a consistent 02 ( T ⊕ R 02 ) (which also guarantees its correctness, as otherwise the received messages would have been inconsistent), 1 can compute T o using the received value and the knowledge of permutations 12 , 01 . Note that 02 (R 02 ) serves as a mask to hide from 1 . Looking ahead, similar masks are required in T o to keep hidden from 2 and 0 . This results in additionally introducing the random masks 12 ( 01 (R 01 )) and 12 (R 12 ), respectively. To ensure that all parties have the same T o and use the same randomness for masking ( T ), T o is defined as where R 12 , R 01 ∈ Z 2 ℓ , and R is jointly sampled by , ∈ P.
At the end of first round, since 1 holds R 12 , R 01 , 12 , 01 , and 02 ( T ⊕ R 02 ), it can compute T o using Eq. (1). Generating T o towards 2 . Observe that 2 lacks 01 that prevents it from computing 12 ( 01 ( 02 ( T ⊕ R 02 ))). On the other hand, if provided with the value 01 ( 02 ( T ⊕ R 02 )), then 2 can obtain 12 ( 01 ( 02 ( T ⊕ R 02 ))) by applying 12 on it. However, similar to the case described earlier, this leaks the permutation 01 to 2 . To fix this leakage, we first mask 02 ( T ⊕ R 02 ) with the random value R 01 and then apply 01 on this masked value and communicate it to the hash). This completes the generation of T o towards all the parties. Observe the need for using R 12 as a mask while computing T o . Analogous to the cases for 1 , 2 , absence of R 12 leaks the permutation 12 to 0 . Further, note that although 2 can compute the correct T o only after the second round, it receives 12 required for computing T o in the first round itself. Hence, communication of T o from 1 , 2 towards 0 can happen in the second round. A pictorial view of the messages exchanged is given in Fig. 2.
. Subsequent to the above discussion and as evident from Eq. (1), R is defined as R = (R 02 ) ⊕ 12 ( 01 (R 01 )) ⊕ 12 (R 12 ), whose [·]-shares are required to be generated in the preprocessing phase. Observe that 1 , 2 hold 12 (R 12 ) on clear. Hence, as described in §3, [ 12 (R 12 )] can be generated non-interactively. A naive approach of generating [·]-shares of remainder terms requires three invocations of Shuffle-Pair for generating [ 12 ( 01 ( 02 ( T )))], three for generating [ 12 ( 01 ( 02 (R 02 )))], and two for [ 12 ( 01 (R 01 ))]. However, all these remainder terms in Eq. (2), need an application of 01 followed by an application of 12 . Hence, instead of separately computing these terms via multiple Shuffle-Pair instances, we club these terms together in such a way that we require only three Shuffle-Pair instances to compute T o . Elaborately, given that R 02 is held by 0 , 2 on clear, parties can non-interactively generate its [·]-shares. Further, given parties invoke Shuffle-Pair with 02 as the secret permutation to generate [ 02 ( T ⊕ R 02 )]. Since [R 01 ] can also be generated noninteractively, the remainder terms in (2) can be alternatively expressed as, 12 ( 01 ( 02 ( T ))) ⊕ 12 ( 01 ( 02 (R 02 ))) ⊕ 12 ( 01 (R 01 )) 1. Guaranteeing output delivery. Note that in the solution described above, an adversary can misbehave, resulting in an abort (i.e., failure of shuffle). However, to attain GOD and obtain as output the randomly shuffled input table irrespective of the adversarial behaviour, one can proceed as follows. Inspired by the techniques of [9,11,29,30], we rely on a trusted third party (TTP) based approach. Elaborately, if shuffle fails, we work towards identifying an honest party in P that is designated as a TTP. Parties robustly reconstruct the input table to TTP, who performs the shuffle operation on the clear table, and sends the output (shares of the randomly shuffled input table) to all. 
We next describe how a TTP can be identified whenever shuffle fails in the preprocessing or the online phase. Identifying a TTP if shuffle fails during the preprocessing phase. The preprocessing phase involves three sequential invocations of the semi-honest Shuffle-Pair protocol, where in each invocation only two parties communicate a message (see §B for details). Each invocation of Shuffle-Pair is followed by a robust Set-Equality protocol that verifies the correctness of the Shuffle-Pair, and outputs a flag indicating that the shuffle failed if some misbehaviour was detected in this Shuffle-Pair instance. We make the following observation that aids in identifying a TTP: if any invocation of Set-Equality outputs a flag indicating that Shuffle-Pair failed, it must be due to a misbehaviour by one of the two (communicating) parties in the corresponding Shuffle-Pair instance. This is because the Set-Equality protocol is robust against any misbehaviour (owing to the use of a robust 3PC for the same), and hence the shuffle can fail only due to a misbehaviour in Shuffle-Pair. Further, since at most one among the three parties is malicious, this guarantees that the (non-communicating) residual party is honest and can be designated as the TTP. Identifying a TTP if shuffle fails during the online phase. Each of the three messages exchanged in the online phase has the following communication pattern: there exist two senders who possess the message to be sent to the receiver, where one sender sends the message while the other sends its hash. Since this resembles the communication pattern of [19,29], we use the techniques therein to identify a TTP if any party receives an inconsistent (message, hash) pair. At a high level, if the received message and hash do not match at the receiver, it broadcasts a complaint accusing the senders. It also broadcasts the received messages.
This is followed by the senders broadcasting a complaint against the receiver if the latter's broadcast message is inconsistent with the message the sender sent. Depending on the publicly available complaints, parties can unanimously determine a pair of parties that are in conflict with each other, one of which is guaranteed to be corrupt. Since there is at most one malicious corruption among the three parties, the third party that is not part of this conflict is guaranteed to be honest and can be designated as the TTP. The formal steps are provided in Fig. 4, and correctness follows from [19].

Preprocessing:
- Each pair of parties Pi, Pj ∈ P non-interactively sample Rij ∈ Z_{2^ℓ} and a random permutation πij.
- Parties in P follow the steps in Fig. 3 to generate T_o.
- Identifying TTP when shuffle fails: If a flag indicates a failure, all parties set TTP to be the non-communicating party in the corresponding Shuffle-Pair protocol. When multiple flags indicate failure, break the tie deterministically and use one flag.

Online:
• Shuffle (Round 1):
- P0 computes and sends π01(π02(T̃ ⊕ R02) ⊕ R01) to P2.
• Verification (Round 3) a : For each receiver Pk ∈ P, let Pi, Pj denote the senders, where Pi sends the message and Pj sends its hash. Pk checks if the received values are consistent. If not, it broadcasts ("accuse", Pi, Pj, v, h), where v and h are the values sent by Pi and Pj, respectively.
- Else if h is different from the hash of the value sent by Pi to Pk, then Pi broadcasts ("accuse", Pk). Set TTP = Pj. The above steps follow analogously for Pj.
- Else if H(v) ≠ h and neither Pi nor Pj accuses Pk, set TTP = Pk.
• One-time computation through TTP: If TTP is set, all parties robustly reconstruct the input table towards the TTP, who randomly shuffles the input and sends the shuffled table to all parties.
a Note that these can be performed as soon as the messages required for detecting inconsistency are available.
Protocol Π_Ruffle

Figure 4: Secure shuffle protocol
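The accusation logic in the verification step of Fig. 4 can be sketched as follows. This is a cleartext sketch (the function name and return convention are illustrative), under the reconstruction that the residual sender becomes the TTP when a sender rightly disputes the broadcast, and the receiver becomes the TTP when both senders stand by the broadcast values:

```python
import hashlib

def H(v: bytes) -> bytes:
    return hashlib.sha256(v).digest()

def identify_ttp(v_recv, h_recv, v_sent_by_Pi, h_sent_by_Pj, names=("Pi", "Pj", "Pk")):
    """Pi sent the value, Pj sent the hash, Pk received (v_recv, h_recv).
    Returns (consistent, ttp)."""
    Pi, Pj, Pk = names
    if H(v_recv) == h_recv:
        return True, None       # (message, hash) pair is consistent
    # Pk broadcasts ("accuse", Pi, Pj, v_recv, h_recv); the senders react:
    if v_recv != v_sent_by_Pi:
        return False, Pj        # Pi accuses Pk: conflict (Pi, Pk), Pj is honest
    if h_recv != h_sent_by_Pj:
        return False, Pi        # Pj accuses Pk: conflict (Pj, Pk), Pi is honest
    return False, Pk            # neither accuses: conflict (Pi, Pj), Pk is honest

# Example: a corrupt Pk claims it received a hash Pj never sent.
# Pj disputes the broadcast hash, so the conflict is (Pj, Pk) and Pi is the TTP.
ok, ttp = identify_ttp(b"msg", b"garbage", b"msg", H(b"msg"))
assert (ok, ttp) == (False, "Pi")
```

In every branch the designated TTP lies outside the conflicting pair, so with at most one corruption the TTP is guaranteed honest, matching the argument above.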
Proof of security. The security of our shuffle protocol follows from the fact that the secret permutation remains hidden, since each party knows only two of the three permutations. Moreover, throughout the protocol, parties only receive random messages. Hence, security follows easily. A detailed proof of security appears in §F.

APPLICATIONS
Here, we focus on two applications of shuffle: (i) anonymous broadcast and (ii) the GraphSC paradigm. Recall that anonymous broadcast, being a perpetually running system, requires a shuffle protocol designed to handle Independent-Shuffles. On the other hand, the GraphSC paradigm requires a shuffle protocol that caters to Composed-Shuffles. While Ruffle was described with respect to a solitary shuffle invocation in §4.1, it can easily be extended to handle Independent-Shuffles; the resulting protocol is termed Ruffle-1. However, due to the sequential dependence present in the preprocessing phase of Ruffle-1, it is not efficient for Composed-Shuffles. Hence, we enhance Ruffle and design an alternative protocol, Ruffle-2, that is tailor-made to handle Composed-Shuffles. We describe the applications and the respective shuffle protocols they use next.

Anonymous broadcast
Anonymous broadcast, as the name suggests, enables a set of clients to anonymously broadcast their messages while guaranteeing that none learns the association between a message and the identity of its sender. Instead of requiring the clients to send their messages to a centralized server, which can output the randomly shuffled messages back to the clients, we rely on a distributed solution to guarantee client privacy. At a high level, to achieve anonymous broadcast, the clients secret-share their messages to a set of three servers (the three parties in P, henceforth interchangeably called servers), who invoke a secure shuffle protocol on the same and reconstruct the shuffled output. The solution must guarantee the following desirable properties when at most one server and any number of clients are corrupt. 1. Confidentiality: A coalition of malicious clients and a server should not learn the permutation used to shuffle the messages. 2. Integrity: A client's message should remain intact. 3. Security against malicious clients: The system should discard malformed messages sent by malicious clients. 4. Robustness against a malicious server: A malicious server should not be able to abort the computation and halt the system. 5. Censorship resistance: A malicious server should not be able to discard an honest client's message from the system. Many works in the literature take a mix-net [17] based approach to anonymous broadcast [4,20,31-33,37], while others [2,18,21,45] are based on the DC-networks proposed in [16]. The recent work of Clarion [20] improves in efficiency over these and provides an alternative shuffle-based anonymous broadcast system. Hence, [20] provides the most efficient shuffle-based anonymous broadcast system in the 3-server setting.
Our protocol offers the following improvements over Clarion: (i) censorship resistance, which is missing in [20]; (ii) usage of a more efficient shuffle; (iii) more efficient steps for verifying the consistency of a client's input message; and (iv) the improved security guarantee of robustness, whereas [20] only provides security with abort. We now describe our anonymous broadcast system in the 3-server setting. 5.1.1 Our anonymous broadcast system. The protocol proceeds in the following steps. 1. Input sharing and consistency check: Each client wanting to broadcast a message receives randomness, using which it generates [·]-shares of its message. On receiving shares of a client's message, the servers verify whether these are malformed; if so, they discard the message. 2. Shuffle: Assuming the messages pass verification, the servers securely shuffle the table of messages using Ruffle-1, described in §5.1.3. 3. Output reconstruction: On receiving the output shares after executing Ruffle-1, the servers reconstruct the shuffled table using the steps described in §3. The shuffled table is then broadcast to the clients.
Since steps for output reconstruction (step 3) were already described, we next elaborate on input sharing and consistency check (step 1), and the shuffle protocol (step 2).

Input sharing and consistency check.
This comprises a preprocessing phase and an online phase, as elaborated below.
Preprocessing phase: Let the ℓ-bit client message be denoted as m. Consider the sharing semantics of our shuffle protocol: a shared message m comprises components, held across the servers in P, whose XOR equals m. To enable the client to generate these shares towards the servers while minimizing the computation as well as the communication at the client, we proceed along similar lines as [29]. We let the servers generate [·]-shares of a random mask non-interactively.

Online phase: The client, on receiving this preprocessing data from the servers, masks its message m via XOR and communicates the masked value to the servers. To ensure that each server receives the same value and to guarantee that the client has not misbehaved, each server broadcasts the value received from the client. If there is a majority among the broadcast values, the client's message is accepted, and each server sets its value to be the majority value. Else, the client's message is deemed malformed and discarded from this instance of the anonymous broadcast protocol.
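The consistency check above reduces to a majority vote over the three broadcast values. A minimal sketch (the function name is illustrative):

```python
from collections import Counter

def accept_client_value(broadcast_values):
    """Each of the 3 servers broadcasts the value it received from the client.
    Accept the majority value if one exists; otherwise the message is
    deemed malformed and discarded (None)."""
    value, count = Counter(broadcast_values).most_common(1)[0]
    return value if count >= 2 else None

assert accept_client_value([b"m", b"m", b"m"]) == b"m"   # honest run
assert accept_client_value([b"m", b"m", b"x"]) == b"m"   # one corrupt server
assert accept_client_value([b"a", b"b", b"c"]) is None   # malformed client input
```

Since at most one of the three servers is corrupt, the two honest servers always agree on the value an honest client sent, so an honest client's message can never be voted out; this is the observation behind the censorship-resistance argument in §C.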

5.1.3 Ruffle-1. The Ruffle protocol described in §4.1 was for a single invocation of shuffle. For the scenario of Independent-Shuffles, where several independent shuffles are to be performed sequentially, we design Ruffle-1 as follows. In its preprocessing phase, Ruffle-1 performs the required instances of the preprocessing of Ruffle in parallel, whereas in its online phase, it sequentially executes the online phase of Ruffle once per shuffle. As seen in Table 1, Ruffle-1 has a better complexity than that of [6,20].
A comparison of the concrete cost of our anonymous broadcast with Clarion, together with a justification of how our system attains the desirable properties, appears in §C.

Secure graph computation
The GraphSC paradigm [43] expresses any graph algorithm as a message-passing algorithm. The graph is stored as a list of nodes and edges, where each entry in the list is associated with data or state. One round of the message-passing algorithm involves updating the state of the nodes in the graph via the primary operations of Scatter and Gather, which realise sending and receiving messages across the edges, respectively. To ensure data obliviousness, before each Scatter/Gather operation can be invoked, the list representing the graph must be securely sorted. [6] takes a step towards improving performance by replacing every secure sort operation with a secure shuffle. Thus, the entire graph algorithm reduces to performing shuffles and invocations of the Scatter and Gather operations across multiple rounds (O(|V| + |E|) of them). Note that the shuffle in each round takes as input the result obtained in the previous round to update the state of the nodes. Hence, these sequential shuffles performed in different rounds constitute the scenario of Composed-Shuffles. As will be explained next, since Ruffle-1 results in a prohibitively high complexity when employed for Composed-Shuffles, we design Ruffle-2, which is tailor-made for this scenario. A detailed explanation of the GraphSC paradigm, along with the representative use case of the BFS algorithm, is given in §D. We next describe Ruffle-2.
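For intuition, Scatter and Gather are linear scans over the combined vertex-edge list once it has been appropriately ordered. A cleartext toy for one BFS-style round (field names and ordering conventions are illustrative; the real protocol runs these scans on secret-shared, shuffled data):

```python
INF = 10**9  # stand-in for "unvisited"

def scatter(by_source):
    """List ordered so each vertex precedes its outgoing edges:
    copy level + 1 onto every outgoing edge."""
    level = INF
    for entry in by_source:
        if entry["is_vertex"]:
            level = entry["level"]
        else:
            entry["msg"] = level + 1 if level < INF else INF
    return by_source

def gather(by_dest):
    """List ordered so incoming edges precede their destination vertex:
    each vertex takes the min over the messages it received."""
    best = INF
    for entry in by_dest:
        if entry["is_vertex"]:
            entry["level"] = min(entry["level"], best)
            best = INF
        else:
            best = min(best, entry.get("msg", INF))
    return by_dest

# Path a -> b: after one Scatter/Gather round, b is at level 1.
a = {"is_vertex": True, "level": 0}
b = {"is_vertex": True, "level": INF}
e = {"is_vertex": False}          # edge a -> b
scatter([a, e, b])                # a precedes its out-edge e
gather([a, e, b])                 # e precedes its destination b
assert b["level"] == 1
```

The two orderings (by source, then by destination) are what the secure sort, or in [6] and this work the secure shuffle, must establish obliviously before each pass, which is why one shuffle is consumed per round.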

Ruffle-2.
For scenarios that demand the composition of several shuffles (i.e., Composed-Shuffles), observe that the preprocessing phase of Ruffle-1, which comprises one instance of the preprocessing of Ruffle per shuffle, can no longer execute in parallel, but has to be performed sequentially. This is because in Composed-Shuffles, the output of one shuffle operation, say T1, constitutes the input to a subsequent shuffle operation, which, say, outputs T2 = π′(T1). Hence, only once T1 is generated as output from the first (preprocessing phase) instance of Ruffle can T̃2 = T1 ⊕ R be generated (see §4.1 for the definition of T̃). This sequential dependency in the preprocessing phase of Ruffle-1, when deployed for Composed-Shuffles, makes its run time proportional to the number of sequential shuffles. However, it is desirable to facilitate the generation of the necessary preprocessing data in parallel, and hence decouple the generation of preprocessing data from the pattern of shuffle invocations. This can significantly reduce the preprocessing phase's cost. Hence, in the following, we design an alternative protocol, Ruffle-2, that breaks this dependence and is tailor-made to handle Composed-Shuffles.
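The chaining arises because sequential shuffles compose: shuffling with one permutation and then another is equivalent to a single shuffle with the composed permutation, but the intermediate table is still needed before the next shuffle's preprocessing can begin. A quick cleartext check of the composition (index-list convention is illustrative):

```python
import random

def apply_perm(p, t):
    # Row i of the output is row p[i] of the input.
    return [t[p[i]] for i in range(len(t))]

rng = random.Random(1)
n = 6
T = [rng.getrandbits(16) for _ in range(n)]
p1 = rng.sample(range(n), n)
p2 = rng.sample(range(n), n)

T1 = apply_perm(p1, T)      # first shuffle
T2 = apply_perm(p2, T1)     # second shuffle consumes T1

# With this convention, applying p1 then p2 equals one shuffle with p1 ∘ p2:
composed = [p1[p2[i]] for i in range(n)]
assert T2 == apply_perm(composed, T)
```

In Ruffle-1 the dependence on T1 serializes the preprocessing instances; removing exactly this dependence is what Ruffle-2 is designed for.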
Let T be the input table which has to be shuffled to obtain T′ = π(T). Recall from Ruffle (Fig. 4) that the output can be expressed as T_o = π(T̃) ⊕ R, where R serves as a random mask for π(T̃). We next describe how this can be generated. In the preprocessing phase, observe that [·]-shares of π(T̃) ⊕ R can be generated as described in §4.1. Thus, parties can

BENCHMARKS
We empirically evaluate the performance of our shuffle protocols under various parameters and application scenarios, and compare them against their state-of-the-art counterparts. Benchmark environment. Benchmarking is performed over LAN using n1-standard instances of Google Cloud with 2.3 GHz Intel Xeon E5 v3 (Haswell) processors and 240 GB of RAM. The machines have a bandwidth of 16 Gbps. For a fair comparison, we implement all the protocols in Python, including those of [20] and [6]. Thus, the costs reported for prior works are higher than those reported in the original works (see §E). Hence, the concrete improvements over [20] and [6] reported next capture the relative improvements with respect to the underlying protocols. That is, we do not account for the system-level optimizations that may have been included as part of the implementations in the original works of [20] and [6]. However, we note that the reported communication costs are invariant to the implementation. Our code accounts for multi-threading with 64 threads. We instantiate the communication layer between the parties using the PyTorch library. We use the Crypto library for AES and hashlib for generating SHA-256 hashes. We note that our code is developed for benchmarking, is not optimized for industry-grade use, and a C++ based implementation can give better performance.

Ruffle, Proceedings on Privacy Enhancing Technologies 2023 (3)

Benchmark parameters. We follow the standard practice and benchmark honest executions (with verification), as done in [10,29,39,47]. We consider the run time and communication of protocols as the parameters for comparison. We account for the online as well as the total (preprocessing + online) cost when doing so. To capture the combined effect of both these parameters, we additionally report online throughput (TP).

Shuffle
We begin by comparing Ruffle to the shuffle protocols of [20] and [6] for the case of a single invocation of shuffle. Table 2 reports the online phase comparison to capture the fast response time and the communication involved. Observe that Ruffle clearly outperforms both [6,20]. Concretely, we observe improvements of up to 15× in run time and 2.5× in communication over [20]. When compared to [6], Ruffle has an improvement of up to 11.2× in run time and 2.5× in communication. The improvements in run time and communication are reflected in a high throughput, which captures the number of such single invocations that can be performed in parallel. The improvements in throughput range up to 5.5× and 2.2× with respect to [20] and [6], respectively. When considering the overall cost, we note that Ruffle fares better than [20], but its cost is slightly higher than, yet comparable to, that of [6]. We report this in Table 3 for completeness. We remark that Ruffle-1 has the same complexity as Ruffle for the case of a single shuffle, while Ruffle-2 is not apt for a single shuffle invocation due to its higher preprocessing cost. To capture the improvements of Ruffle-1 and Ruffle-2, we benchmark their performance for multiple sequential shuffle invocations, i.e., the scenarios of Independent-Shuffles and Composed-Shuffles. Recall that Ruffle-1 is apt for Independent-Shuffles while Ruffle-2 is apt for Composed-Shuffles. Since [6] outperforms [20] (as evident from Table 2 and Table 3), we restrict ourselves to comparing Ruffle-1 and Ruffle-2 in their respective settings against [6]. Further, to capture the improvements Ruffle-2 brings over Ruffle-1, we also report the cost of performing Ruffle-1 in the scenario of Composed-Shuffles. The comparison for a varying number of shuffle invocations is reported in Fig. 5 (and Table 4). We make the following observations:

• The cost of [6] remains the same for Independent-Shuffles and Composed-Shuffles since it is indifferent to both.
• We infer the following with respect to the online complexity. Irrespective of the scenario and the number of shuffle invocations, recall from Table 1 that Ruffle-1 and Ruffle-2 are comparable, since their online phases are the same except for the extra computation required in Ruffle-2. Hence, as expected, Ruffle-1 (and thereby Ruffle-2) outperforms [6] by up to 10×.
• We infer the following with respect to the overall run time. For a single shuffle invocation, both Ruffle-1 (i.e., Ruffle for a single shuffle) and Ruffle-2 have a slightly higher run time than [6]. However, starting from as few as two invocations, Ruffle-1 begins to outperform [6] for Independent-Shuffles. This is justified as follows: since Ruffle-1's online phase is faster than that of [6], performing the preprocessing for all the shuffles in parallel improves the overall complexity. This improvement is not seen in [6], since the shuffles that can be performed in parallel during our preprocessing must be performed sequentially in [6], which adds to its overhead. We see improvements of up to 6.4× in this case. On the other hand, for Composed-Shuffles, [6] continues to outperform Ruffle-1 for the following reason. The composition of shuffles induces a sequential nature in the preprocessing phase of Ruffle-1 (which indeed is the complete protocol of [6]). The computations performed additionally in the online phase of Ruffle-1 render its overall complexity slightly higher than that of [6]. To break this chain of sequential shuffles in the preprocessing phase, Ruffle-2 was designed; it outperforms Ruffle-1 (and thereby [6]), with improvements of up to 4.7× with respect to [6].
• To capture the effect of both total run time and total communication, we additionally report the monetary cost in Table 4, which is the price paid for performing the secure shuffle computation.

Anonymous broadcast

We benchmark the complete anonymous broadcast system, which accounts for the input sharing of the table of client messages, its secure shuffle, and the reconstruction of the shuffled result. Observe from Table 5 that our anonymous broadcast system outperforms [20] in every aspect. This can be attributed not only to the use of our efficient shuffle protocol, but also to the simplicity of our input sharing and consistency check, and output reconstruction. On the other hand, [20] relies on several MAC verifications and encryption operations, which render its system less efficient. The improvements we observe with respect to online and total time are up to 29× and 13×, respectively, whereas those for online and total communication are up to 2×.
The effect of varying the client message size on run time and communication with respect to the servers is reported in Table 6. Our system outperforms [20] in terms of both. Concretely, with respect to online and total time, we see improvements of up to 39× and 9×, respectively. With respect to online and total communication, we see improvements of up to 1.2× and 1.3×. Table 5 and Table 6 do not account for the time/communication required to share a client's input. Hence, to showcase the overhead of input sharing on both the client and the server, we report these costs in Table 7. Since this overhead depends on the client message size, Table 7 also accounts for the same. Recall that our system additionally requires the client to wait to receive the preprocessing data from the servers. Despite this, the time for which a client has to remain online in our system is 18× lower in comparison to [20]. The higher cost of [20] can be attributed to the PRG (pseudorandom generator) invocations and the encryption of the message followed by MAC tag computation at the client. This is unlike our system, which relies on simple operations such as XOR. On the other hand, since [20] requires the clients to communicate with only two servers instead of the three servers required in our case, it has less client-to-server communication. The reduced time a client has to remain online comes at the cost of server-to-client communication, which is absent in [20]. This, we note, is a small price to pay. Thus, our realization of the anonymous broadcast system not only provides improved efficiency but also offers censorship resistance and attains the improved security guarantee of GOD.

Secure graph computation
We benchmark the application of BFS, as described in §D, via the GraphSC paradigm of [6]. We rely on the robust MPC framework of SWIFT [29] wherever needed. To overcome the linear dependence on the size of the input graph, we cast our secure evaluation of BFS in the multiprocessor setting as described in [43]. This allows us to obtain a solution with a round complexity of O((|V| + |E|)/p + log(|V| + |E|)), where p is the number of processors, using the parallel variants of the Scatter and Gather primitives described in [43]. The formal details of the Scatter and Gather primitives required for the specific case of BFS are described in §D. Since the GraphSC paradigm requires the composition of shuffles (Composed-Shuffles), we implement BFS using Ruffle-2. The improvements that Ruffle-2 brings over the shuffle of [6] have already been established in Table 4. We now compare our implementation of BFS using Ruffle-2 with the BFS implementation of [6]; the results are reported in Table 8. We see improvements of up to 11.5× in the online run time in comparison to [6], while also outperforming it in terms of the total run time. Note that these improvements were observed in the 32-processor setting. Since the number of processors affects the run time, we estimate the cost for a varying number of processors in Fig. 6.

CONCLUSION
We design secure shuffle protocols which not only provide a fast online phase but also improve on the overall run time when considering two or more sequential shuffle invocations. We showcase the significant improvements that arise when using our secure shuffle protocols in the applications of anonymous broadcast and secure BFS computation via the GraphSC paradigm, and thereby provide solutions that improve over the respective state-of-the-art works.

Figure 6: Estimated cost for a graph of size 10^4 (= |L|) in the 32-processor setting.

With secure shuffle being an integral part of various other applications, it would be interesting to see how our shuffle protocols can be used to bring about improvements therein. Since applications may demand a varying number of parties, going ahead, we believe it is an important question to design secure shuffle protocols for the arbitrary n-party setting. A naive extension of our shuffle protocols to the n-party setting would result in an exponential blow-up in the number of permutations held by every party. This would in turn adversely affect the communication and round complexity. Hence, the challenge is to circumvent these issues and design efficient solutions for the n-party setting.

A PRELIMINARIES
Shared key setup. Let F : {0, 1}^κ × {0, 1}^κ → X be a pseudorandom function (PRF), with X = Z_{2^ℓ}. To enable parties to sample common random values non-interactively, the following keys (for the PRF F) are established between the parties: each pair of parties Pi, Pj ∈ P knows a common key kij, and all parties in P know kP. Pi, Pj can now sample a common value r ∈ Z_{2^ℓ} non-interactively by computing F_kij(id), where id denotes a counter maintained by Pi, Pj, which is updated after every PRF invocation. The ideal functionality for robustly establishing these common keys among the parties, as described in [29], appears in Fig. 7.
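A minimal sketch of non-interactive common sampling (SHA-256 is used here as a stand-in for the PRF, which the implementation instantiates with AES; names are illustrative):

```python
import hashlib
import struct

def prf(key: bytes, counter: int, ell: int = 64) -> int:
    """Hash-based PRF stand-in returning an element of Z_{2^ell}."""
    digest = hashlib.sha256(key + struct.pack(">Q", counter)).digest()
    return int.from_bytes(digest, "big") % (1 << ell)

# Pi and Pj hold a common key k_ij and a synchronized counter, so both
# derive the same random value without any communication.
k_ij, counter = b"k_ij-established-via-F_setup", 0
r_at_Pi = prf(k_ij, counter)
r_at_Pj = prf(k_ij, counter)
assert r_at_Pi == r_at_Pj
counter += 1  # the counter is updated after every PRF invocation
```

Keeping the counters synchronized is what lets every "non-interactively sample" step in the protocols above cost zero communication.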
F_setup interacts with the parties in P and the adversary S. F_setup picks random keys kij for i, j ∈ {0, 1, 2}, i < j, and kP. Let ys denote the keys corresponding to party Ps. Then ys = (k01, k02 and kP) when s = 0.
Output: Send (Output, ys) to every Ps ∈ P.
Functionality F_setup

Commitment scheme. Let Com(x) denote the commitment of a value x. The commitment scheme possesses two properties: hiding and binding. The former ensures the privacy of the value x given its commitment Com(x), while the latter prevents a corrupt party from opening the commitment to a different value x′ ≠ x. Note that providing an incorrect opening for a commitment results in outputting ⊥. We instantiate the commitment scheme via a hash function H(·) (as described in [40]), whose security can be proved in the random-oracle model: the commitment is defined as Com(x; r) = H(x||r), and its opening is defined as (x||r).
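A sketch of this hash-based commitment (assuming a fixed-length randomness r so that x||r parses unambiguously; illustrative code):

```python
import hashlib
import secrets

def commit(x: bytes, r: bytes) -> bytes:
    # Com(x; r) = H(x || r): hiding comes from the random r,
    # binding from collision resistance of H.
    return hashlib.sha256(x + r).digest()

def open_commitment(c: bytes, x: bytes, r: bytes):
    # An incorrect opening yields ⊥ (None here).
    return x if commit(x, r) == c else None

r = secrets.token_bytes(32)
c = commit(b"v", r)
assert open_commitment(c, b"v", r) == b"v"   # correct opening
assert open_commitment(c, b"w", r) is None   # binding: wrong value rejected
```

The 32-byte r makes the commitment hiding against a computationally bounded adversary; opening simply reveals (x, r) for recomputation.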
Joint message passing. The robust protocol for Π jmp appears in Fig. 8. We refer the reader to [19] for further details.
We note that, at any point during the run of the protocol, if an invocation of Π_jmp results in identifying a TTP, parties do not execute the rest of the protocol steps, and the remaining computation is carried out by the TTP, which guarantees the delivery of the correct function output to all parties. Elaborately, on identifying a TTP, all parties send their inputs on clear to the TTP, who evaluates the necessary function on these clear inputs. It then sends the computed output to all the parties. In this way, reliance on Π_jmp to send protocol messages ensures that if any malicious party misbehaves to prevent the parties from obtaining the output, a TTP is identified and the output is obtained via the TTP. We remark that most protocols in the 3-party setting that guarantee output delivery follow this TTP-based approach [9,11,29].
Send Phase: Pi sends msg = v to Pk.
Verify Phase: Pj sends msg = H(v) to Pk, who checks if the hash is consistent with the value sent by Pi. If the values are not consistent, parties proceed as follows to identify a TTP.
• If neither Pi nor Pj accuses Pk, then parties set TTP = Pk.
Protocol Π_jmp(v, Pi, Pj, Pk)

The shuffle protocol of [6] requires three invocations of Shuffle-Pair, each of which is followed by a Set-Equality protocol to verify the correctness of Shuffle-Pair. In the following, we first provide the Shuffle-Pair protocol, followed by the Set-Equality protocol.
Shuffle-Pair. Let table T be [·]-shared. Shuffle-Pair enables parties to generate [T′], where T′ = πij(T) and πij is a random permutation held by Pi, Pj ∈ P. We describe the protocol with respect to P0, P1, who hold the permutation π01, in Fig. 9. The protocol for the other pairs of parties can be worked out analogously.
Set-Equality. The input to the Set-Equality protocol is a pair of tables (T, T′), each comprising the same number of rows with a bit string in each row; a table can thus be viewed as a matrix of bits. Here, T′ is obtained by randomly shuffling T. The protocol returns as output a 1 if the two tables are different, and a 0 otherwise. The protocol chooses random subsets of rows and columns, and verifies that the XOR of the bits in the intersection of the chosen rows and columns is the same for both tables. This verification is performed κ times to ensure a low probability of cheating. In order to choose the subset of rows randomly, each row of T is extended by κ random and secret bits before the shuffle. Consequently, after the shuffle, every row in T′ has the same κ-sized suffix it had in T. The rows for the ℓth test are picked based on the bit in the ℓth column of the κ-sized extension, a row being chosen if the corresponding bit is 1. Let bℓ be the bit vector that denotes this selection of rows (i.e., the ℓth column of the extension). Similarly, let cℓ be the public bit vector that denotes the choice of columns picked for the ℓth test. The verification test compares the XOR of the values in the intersection of the chosen rows and columns in T and T′ to check the correctness of the shuffle. This is captured by the operations described in Eq. (3). The output of the ℓth check is a bit eℓ such that, if the tables are different, some eℓ, for ℓ ∈ {1, . . . , κ}, is non-zero with high probability. To detect if any eℓ is non-zero, a flag is computed as the OR of all these eℓ's, followed by reconstructing flag. A non-zero flag indicates a misbehaviour in the Shuffle-Pair instance for which Set-Equality is performed.
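A cleartext sketch of this probabilistic check (the real protocol evaluates it on secret shares inside a robust MPC; parameter and function names are illustrative):

```python
import random

def extend(table, kappa, rng):
    """Append kappa secret random selection bits to every row."""
    return [row + [rng.getrandbits(1) for _ in range(kappa)] for row in table]

def set_equality_flag(T_ext, T_shuf, n_cols, kappa, rng):
    """Return 1 if any of the kappa intersection tests detects a difference.
    Row selection for test l uses column n_cols + l (the secret extension);
    column selection is a fresh public random subset per test."""
    total = n_cols + kappa
    for l in range(kappa):
        cols = [c for c in range(total) if rng.getrandbits(1)]
        xors = []
        for table in (T_ext, T_shuf):
            acc = 0
            for row in table:
                if row[n_cols + l]:          # row chosen for this test
                    for c in cols:
                        acc ^= row[c]
            xors.append(acc)
        if xors[0] != xors[1]:
            return 1                          # tables differ
    return 0

rng = random.Random(7)
n_rows, n_cols, kappa = 16, 8, 40
T = [[rng.getrandbits(1) for _ in range(n_cols)] for _ in range(n_rows)]
T_ext = extend(T, kappa, rng)
T_shuf = rng.sample(T_ext, len(T_ext))        # an honest row-permutation
assert set_equality_flag(T_ext, T_shuf, n_cols, kappa, rng) == 0
# A tampered row is then caught with probability roughly 1 - (3/4)^kappa.
```

Since XOR over the selected intersection is invariant under any row permutation, an honest shuffle always passes, while each test independently catches a modification with constant probability, giving the claimed soundness after κ repetitions.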
We note that performing the steps of Set-Equality by relying on a robust MPC yields a robust Set-Equality protocol, as done in the work of [6].

Preprocessing: 1. P0, P2 randomly sample [T′]02 ∈ Z_{2^ℓ}, and P1, P2 randomly sample [T′]12 ∈ Z_{2^ℓ}, non-interactively, using the common keys established via F_setup.

Figure 9: Shuffle-Pair [6,35]

Regarding the security of the shuffle protocol of [6]. As per the discussion above, observe that their protocol correctly determines the output shares of the shuffled table for each party in case no misbehaviour occurs during any of the three Shuffle-Pairs. Since Set-Equality is performed via a robust MPC, it guarantees that any malicious act in the corresponding Shuffle-Pair is detected, since some eℓ, for ℓ ∈ {1, . . . , κ}, will be non-zero with high probability. Such a misbehaviour in the Shuffle-Pair is indicated by the flag bit being set to 1. In such an event, observe that the parties will not possess the correct output shares and the protocol cannot proceed further. Thus, the protocol only provides security with abort (considering the robust ideal functionality of shuffle defined in Fig. 1). Moreover, observe that a malicious party can also misbehave such that it learns the output shares but deprives the honest parties of the correct shares. This is possible if a malicious party aborts in the last Shuffle-Pair invocation. Elaborately, consider the last Shuffle-Pair among the three invocations used for obtaining a random shuffle. Without loss of generality, the protocol proceeds exactly as given in Fig. 9, with P0 and P1 being the communicating parties. In case P1 is corrupt, it may receive P0's message, obtain the output shares of the shuffled table, and then abort. Since the honest party P0 does not receive the corresponding message from P1, it does not learn its output shares. Hence, we believe [6]'s protocol gives only security with abort.
Complexity of the shuffle protocol of [6]. Observe that the protocol of [6] requires three invocations of Shuffle-Pair, each of which is followed by a Set-Equality protocol that verifies the correctness of the semi-honest Shuffle-Pair. While Shuffle-Pair requires only one round of communication, the Set-Equality protocol requires 2 + log2 κ rounds, where κ is the security parameter. Thus, the overall round complexity is 3 + 2 + log2 κ. With respect to the communication complexity, it requires communicating 6ℓ bits per table row for the three Shuffle-Pair instances, plus additional communication for evaluating the Set-Equality protocol, which involves computing Boolean multiplications, computing the OR of the test bits, and a robust reconstruction.

C ANONYMOUS BROADCAST
Regarding the threat model. An anonymous broadcast system with a powerful adversary corrupting an arbitrary number of clients faces inherent issues. In any set of messages submitted by the clients, a significant subset may belong to the adversary. As a result, honest clients' messages can always be linked to their senders with some probability. This probability depends on the size of the anonymity set for a message, which is the number of honest users who submit their messages in the same round. In the worst case, if an adversary is powerful enough to corrupt all but one of the clients in the system, then we can no longer protect the honest client from deanonymization. For our system, we assume that the anonymity set is large enough in each instance of the protocol and that complete deanonymization attacks cannot happen.
Attaining the desirable properties. Confidentiality: Observe that this property is attained since the messages to be shuffled, as well as the permutation, remain hidden from the servers and clients owing to the secure shuffle protocol. Censorship resistance: The only way a malicious server can cheat to discard an honest client's message m is by broadcasting an incorrect value during the input consistency check. Since at most one server among the three is malicious, there will always be agreement among the honest servers with respect to the correct value. Thus, our protocol does not allow discarding an honest client's message, thereby attaining censorship resistance. Integrity and robustness against a malicious adversary: Observe that due to the robust input sharing, shuffle, and output reconstruction protocols, our system is robust against any malicious behaviour and guarantees the integrity of honest clients' messages.
Comparison of our anonymous broadcast system with Clarion [20]. We look at each of the steps in an anonymous broadcast system and draw a comparison in terms of communication cost, round complexity, and the level of security provided. System guarantees: As described earlier, Clarion provides security with abort, while our protocol attains the strongest security notion of GOD. Clarion provides no way of distinguishing between a malicious act by a server and one by a client, and the system rejects a client's request whenever the verification for input consistency fails. Specifically, since Clarion only provides security with abort, it allows a malicious server to make the input consistency check (with respect to an honest client's input) fail by aborting the computation. Thus, an honest client may be dropped due to misbehaviour by a malicious server, and hence Clarion does not guarantee censorship resistance. On the other hand, as described earlier, our scheme guarantees censorship resistance.
Input sharing: Our scheme requires 3 commitments to be communicated among the servers in the preprocessing phase. In the online phase, our scheme requires 9 commitments to be communicated from the servers to the client. Additionally, each server sends the openings for two commitments to the client. The next step requires the client to communicate an ℓ-bit string m to each server. Finally, to verify that the client has sent a consistent m, each server engages in a broadcast of a message of length ℓ bits 15. Clarion's preprocessing involves PRG (pseudo-random generator) invocations, encryption of the message, and tag computation at the client side. At the servers, one of the servers is responsible for transferring (ℓ + 1) Beaver's triples to the other two shuffling parties. Each message of length ℓ is divided into B blocks, where B = ℓ/|b| and the block size is |b| = 128 bits for Clarion. To send a message in the online phase, a client communicates (B + 3) blocks to each of the shuffling servers. Moreover, since a client can launch an in-protocol denial-of-service attack by sending incorrect MACs (message authentication codes), the servers proceed to verify the correctness of the client's message by performing (ℓ + 1) multiplications using Beaver's triples.
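The block arithmetic for Clarion's client communication can be sketched as follows. The helper name is our own, and we use a ceiling division to cover messages whose length is not a multiple of the 128-bit block size:

```python
import math

def clarion_client_blocks(ell, block_bits=128):
    """A message of ell bits is split into B = ceil(ell / |b|)
    blocks (|b| = 128 bits for Clarion); in the online phase the
    client then sends B + 3 blocks to each shuffling server."""
    B = math.ceil(ell / block_bits)
    return B, B + 3
```

For a 1024-bit message this gives B = 8 blocks, so the client sends 11 blocks to each shuffling server.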
Shuffle: The preprocessing phase of our shuffle protocol requires one invocation of the 3PC shuffle protocol of [6] and entails 6nℓ bits of communication for the Shuffle-Pairs. Additionally, it requires the computation of 2κ Boolean multiplications, the OR of κ bits (κ denotes the security parameter), and one robust reconstruction. The online phase of our shuffle requires communicating 3nℓ bits and, at the end of 2 rounds, guarantees that misbehaviour, if any, will be detected by a party. If misbehaviour is detected, 2 additional rounds are required to broadcast complaints that enable determining a TTP, via which GOD is attained. The 3-server honest-majority shuffle protocol of Clarion is also cast in the preprocessing paradigm. Its preprocessing requires communicating n(2B + 3) blocks, along with the Beaver's triples, to each of the 2 shuffling servers. The online phase takes 2 rounds and 2n(2B + 3) blocks of communication. Clarion additionally needs a verification phase to ensure that the servers have not misbehaved during shuffling, which includes verifying the MACs associated with the client messages. Hence, 4 additional rounds are performed, since no proof of correctness of the shuffle is generated. These 4 rounds involve revealing the ciphertext-share multiplications, revealing hashes of the shares of the difference between the computed and the actual tag, and then revealing the actual shares to check that the difference is 0.
15 These two phases of preprocessing and online translate to a real-world scenario as follows. Consider an anonymous blogging site. The user of the system, on creating a pseudonym, receives a set of pre-computed randomness, which makes up the preprocessing phase, and facilitates the generation of a certain number of articles in the online phase, which will be posted anonymously.
Output reconstruction: In our protocol, output reconstruction requires a single round, in which the openings of the commitments generated on the missing [·]-share of T are communicated. On the other hand, Clarion takes 2 rounds of communication and guarantees only security with abort.

D GRAPHSC PARADIGM AND BREADTH FIRST SEARCH
We describe the GraphSC paradigm proposed in [43] and improved in [6], which provides a highly efficient and scalable solution for securely evaluating graph algorithms. The paradigm relies on expressing graph algorithms as message-passing algorithms where each node updates its state by repeatedly sending/receiving messages on its edges over multiple rounds. The sending and receiving of messages are realized via the primitive operations of Scatter and Gather. The paradigm provides secure realizations of these primitives such that a message-passing algorithm expressed as a composition of these primitives can be evaluated securely. The GraphSC paradigm takes as input a directed graph G(V, E) such that each node u ∈ V is represented as a tuple (u, u, d) and an edge (u, v) ∈ E as (u, v, d). Note that for a tuple corresponding to an edge, the first entry denotes the source node, and the second entry denotes the destination node. Both these entries are the same for a tuple representing a node. Further, d is a binary string that encodes the data/state associated with a tuple (node/edge). Thus, we let L denote the input G comprising the tuples described above. The Gather primitive requires L to be sorted based on the destination node v such that all tuples corresponding to incoming edges at v appear before the tuple representing node v. Thus, a linear scan of L allows a node to gather information from its incoming edges. Analogously, Scatter requires L to be sorted based on the source node u, such that tuples corresponding to outgoing edges from u appear after the tuple for node u. This enables each node to scatter information on outgoing edges via a linear scan. Thus, the paradigm requires securely sorting L appropriately before each invocation of a Scatter and Gather primitive. In this way, two invocations of a secure sort are required for each round of message passing.
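The two tuple orderings can be illustrated in plaintext. The helpers below are our own sketch of the encoding described above, using the fact that a tuple represents a node exactly when its first two entries coincide:

```python
def graphsc_list(nodes, edges):
    # Each node u with data d -> (u, u, d); each edge (u, v) with
    # data d -> (u, v, d), as in the GraphSC tuple encoding.
    return [(u, u, d) for u, d in nodes] + [(u, v, d) for u, v, d in edges]

def source_sorted(L):
    # Scatter order: sort by source; a node tuple (u == v) comes
    # before the tuples of its outgoing edges.
    return sorted(L, key=lambda t: (t[0], 0 if t[0] == t[1] else 1))

def dest_sorted(L):
    # Gather order: sort by destination; incoming-edge tuples come
    # before the node tuple itself.
    return sorted(L, key=lambda t: (t[1], 1 if t[0] == t[1] else 0))
```

For a graph with nodes 1, 2, 3 and edges (1, 2), (3, 2), the source-sorted list places node 1 before edge (1, 2), while the destination-sorted list places both edges into node 2 before node 2's own tuple.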
[6] improves the performance of [43] by relying on a secure shuffle primitive instead of repeatedly performing a secure sort. Specifically, [6] introduces the following sequence of operations to securely evaluate a message-passing graph algorithm. As a one-time initialization, the input L is securely shuffled using a random secret permutation π to obtain L′ = π(L). Another invocation of secure shuffle with a permutation σ translates L′ to L″ = σ(L′). Note that the translation must be such that the parties are able to generate new random shares of L″ from L′, and vice versa, while ensuring both π, σ remain hidden. Having fixed the random permutations π and σ, each time a Scatter is required, the parties consider the shares of L′ and sort its entries based on the source node, as mentioned earlier. Rather than relying on a secure sort for the same, the parties perform a partially-insecure sort wherein the entries in L′ remain secret-shared, but the outputs of the comparisons made during the sort are revealed to the parties. In this way, the mapping of the entries in L′ to the source-sorted list is made public to the parties. In an alternative view, performing a sort to obtain a source-sorted list is equivalent to picking a specific permutation ρ that defines the sorted order. Hence, performing a partially-insecure sort translates to performing a secure sort followed by revealing ρ in public. Note that this does not leak any information since the randomly shuffled input does not permit one to determine any associations between entries in the original graph G and the sorted output. Additionally, the above sort needs to be performed only once to determine the mapping (ρ). Subsequent generation of the source-sorted list from L′ can be performed locally by relying on the publicly known ρ. Similarly, for Gather, the parties use the shares of L″ to obtain the mapping of its entries to the destination-sorted list using a partially-insecure sort, once.
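The shuffle-then-reveal idea can be demonstrated on plaintext data. The sketch below is a toy stand-in: real protocols operate on secret shares, whereas here the list is in the clear and only the information flow is mimicked — the point is that the sorting permutation derived from an already-shuffled list is safe to publish.

```python
import random

def shuffle_then_sort_mapping(L, seed=0):
    """Toy (plaintext) illustration: shuffle L with a secret
    permutation pi, then derive the public mapping rho that sorts
    the shuffled list. Publishing rho reveals nothing about L's
    original order, since pi is random and stays hidden."""
    rng = random.Random(seed)
    pi = list(range(len(L)))
    rng.shuffle(pi)                       # secret permutation pi
    L_shuf = [L[i] for i in pi]           # L' = pi(L)
    rho = sorted(range(len(L_shuf)), key=lambda i: L_shuf[i])
    return L_shuf, rho                    # rho may be made public

def apply_mapping(L_shuf, rho):
    # Once rho is public, the sorted order is recovered locally,
    # with no further secure sort needed.
    return [L_shuf[i] for i in rho]
```

The one-time cost of deriving rho is thus amortized over all subsequent rounds, which only apply the public mapping locally.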
Recall that the translation from the source-sorted list to the destination-sorted list, and vice versa, requires that the permutations π, σ used for generating L′, L″, respectively, remain the same. Hence, the secure shuffle protocol should facilitate reusing the permutations as well as their inverses. We remark that this property is guaranteed by our shuffle protocol, which is hence an apt fit for GraphSC-based applications. A pictorial representation of the various translations described above, with an example graph, is provided later in §D.1 for clarity. To give a complete picture of securely evaluating message-passing graph algorithms, we describe the representative case of the breadth-first search (BFS) algorithm. We showcase how this captures the need for Composed-Shuffles, and thereby the use of Ruffle-2.
Breadth-first search. We consider a secure evaluation of the BFS algorithm to determine all nodes that are within a given distance k from a source node, while ensuring the private graph remains hidden. Additionally, we wish to determine the distance of each such node from the source node u. The input L is assumed to be such that the data item d_u corresponding to the source node u is set to 0, while that of every other entry in L is set to a very high value, denoted ∞. The data item d_v corresponding to node v thus encodes the distance of this node from the source node u. The parties use L as defined above and proceed as follows.
One-time setup phase: Given that the parties hold [·]-shares of L, they invoke Ruffle-2 to generate L′, where L′ = π(L). They also generate L″ from L′ via Ruffle-2, where L″ = σ(L′). Further, the parties generate the mapping from L′ to [·]-shares of the source-sorted list and from L″ to [·]-shares of the destination-sorted list, as described earlier. This completes the necessary setup.
Message-passing phase: To determine the nodes within distance k from the given source node, the parties evaluate the BFS protocol in k rounds, with each round comprising the following operations.
1. Scatter: Taking L′ as the input, obtain the required source-sorted list using the publicly known mapping. Perform a linear scan of the source-sorted list wherein each node v sends out the data d_v + 1 on all of its outgoing edges. Using the publicly known mapping from the source-sorted list back to π(L), obtain L′ (which now has updated data entries owing to Scatter).
2. Obtaining L″: Transform the shares of the updated L′ into shares of L″ using the Ruffle-2 protocol with permutation σ.
3. Gather: Taking L″ as the input, obtain the destination-sorted list using the publicly known mapping. Perform a linear scan of the destination-sorted list wherein each node v gathers the data entries over all of its incoming edges and updates its data d_v to the minimum value among them. Using the publicly known mapping from the destination-sorted list back to σ(L′), obtain L″ (which now has updated data entries owing to Gather).

4. Obtaining L′: Transform the shares of the updated L″ into shares of L′ using the Ruffle-2 protocol with permutation σ⁻¹.
In this way, each round of message passing propagates distance information through the edges and allows nodes to update their distance from the source node as the minimum among the propagated values. Thus, determining the target nodes and their corresponding distances takes k rounds. Since each round requires two invocations of secure shuffle, the overall BFS computation requires 2k sequential invocations of Ruffle-2. The formal protocols for Scatter and Gather are included for completeness in §D.2.
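A plaintext analogue of this round-based Scatter/Gather loop can be sketched as follows (our own sketch with k as the round bound; the secure version replaces each step with the shuffles and secret-shared data described above):

```python
INF = float('inf')

def bfs_distances(num_nodes, edges, source, k):
    """Plaintext analogue of k rounds of Scatter/Gather BFS:
    every round, each node scatters dist + 1 on its outgoing
    edges, then each node gathers the minimum incoming value."""
    dist = {v: INF for v in range(num_nodes)}
    dist[source] = 0
    for _ in range(k):
        buff = {}
        for (u, v) in edges:                      # Scatter step
            buff.setdefault(v, []).append(dist[u] + 1)
        for v, vals in buff.items():              # Gather step
            dist[v] = min(dist[v], min(vals))
    return dist
```

After k rounds, every node within distance k of the source holds its correct BFS distance, matching the per-round min-update described above.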

D.1 Pictorial example
The transformations involved in a GraphSC application are depicted with the help of an example graph in Fig. 10. For this, we consider a graph with 3 nodes (v_1, v_2, v_3) and 2 edges (e_12, e_32). For ease of understanding, the entries in the input L are also named accordingly. However, note that the parties in P are only aware of the size of L and not its entries in the clear. The same holds for the other lists L′, L″, each of whose entries is only known in secret shares among the parties in P. The arrows in red denote invocations of our secure shuffle protocol. On the other hand, the arrows in blue depict the position of each element in L′ relative to its position in the source-sorted list. In this way, the blue arrows collectively represent the publicly known permutation that can be obtained by revealing the result of each comparison made in the secure sort protocol applied to the secret-shared entries of L′. The same applies to the blue arrows between L″ and the destination-sorted list.

D.2 Scatter-Gather primitives
We now describe the Scatter and Gather primitives for breadth-first search in the GraphSC paradigm. As described earlier, the input to the algorithm is a directed graph where every node and edge in the graph is encoded as a tuple. Let this graph be denoted by G and the i-th tuple in the graph be denoted as G[i]. Every vertex tuple is encoded as (u, u, isVertex, buff, dist) and every edge tuple is encoded as (u, v, isVertex, buff, dist), with u, v being the source and destination vertices of an edge. Thus, the entries isVertex, buff, dist are the components of the data part of each tuple. The isVertex value in a tuple indicates whether the tuple belongs to a vertex and hence is set to 1 for a vertex tuple and 0 for an edge tuple. The dist value holds the minimum distance of a vertex tuple from a given source node. The buff value is used as a buffer for communicating between vertices and edges. During Scatter, the value (dist + 1) corresponding to each vertex is transferred to the buff value of all its outgoing edges by performing a linear scan of all the entries in G. During Gather, the minimum of all the buff values among the incoming edges to a vertex is used to assign the minimum distance of that vertex. The steps required to perform such a Scatter and Gather are provided formally in Fig. 11. This provides the clear-text algorithm for the primitives; the secure variants can be easily obtained by using secure operations for additions and comparisons, as well as secure realizations of if-else conditions, as done in [6,43]. Further, we note that, for simplicity, the input to both primitives is denoted as G. However, the assumption is that the graph G is source-sorted for Scatter and destination-sorted for Gather. Further, the steps in Fig. 11 are provided for a single-processor setting.
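The linear-scan logic of the two primitives can be sketched in clear text as follows. The dictionary-based tuple representation is our own choice, mirroring the (u, v, isVertex, buff, dist) encoding above; it is not the encoding of Fig. 11 verbatim.

```python
INF = float('inf')

def scatter(G):
    """Linear scan of a source-sorted G: remember the current
    vertex's dist, and write dist + 1 into the buff of each of
    its outgoing edges (which follow the vertex tuple)."""
    cur = INF
    for t in G:
        if t['isVertex']:
            cur = t['dist']
        else:
            t['buff'] = cur + 1

def gather(G):
    """Linear scan of a destination-sorted G: accumulate the
    minimum buff over the incoming edges (which precede the
    vertex tuple), then fold it into the vertex's dist."""
    acc = INF
    for t in G:
        if t['isVertex']:
            t['dist'] = min(t['dist'], acc)
            acc = INF  # reset for the next vertex's incoming edges
        else:
            acc = min(acc, t['buff'])
```

One Scatter pass over the source-sorted list followed by one Gather pass over the destination-sorted list implements a single round of message passing.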
They can be easily translated to the multi-processor setting using the work of [43], thereby providing a solution with a number of rounds in the order of (|V| + |E|)/P + log(|V| + |E|), where P is the number of processors used for running the parallel algorithm. This is unlike the linear algorithm, which has a round complexity in the order of |V| + |E|, as described in Fig. 11.

E BENCHMARK DETAILS
With respect to the application of anonymous broadcast, we observe that our Python-based implementation of [20] has an overhead compared to the public (Go-based) implementation of [20]. We believe this is due to the use of a Python-based implementation. For example, the time taken to generate a random 16000-byte value is 9.881 × 10⁻⁵ seconds in Clarion's Go-based implementation, whereas the same task takes 6.915 × 10⁻³ seconds in our Python-based implementation, under the same system configuration. We believe that similar trends apply to the GraphSC framework of [6] as well. We could not obtain the exact values since [6] does not provide a public implementation of their work. Further, we note that the higher cost reported here is also due to the fact that the variant of BFS we consider is different from the one considered in [6]. The latter considered the simpler case of determining all nodes reachable within a given number of rounds, which only requires Boolean operations. Making their protocol compliant with arithmetic operations, as required for our BFS implementation, introduces additional overheads.
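A microbenchmark of the kind quoted above can be sketched as follows. Here os.urandom serves as a stand-in randomness source; the cited implementations may generate randomness differently, so this only illustrates the measurement methodology, not the reported numbers.

```python
import os
import timeit

def time_random_bytes(num_bytes=16000, repeats=100):
    """Rough microbenchmark of generating a random value of the
    given size, analogous to the 16000-byte measurement quoted
    above; returns average seconds per generation."""
    total = timeit.timeit(lambda: os.urandom(num_bytes), number=repeats)
    return total / repeats
```

Running the same measurement in both language runtimes under identical system configuration is what allows attributing the gap to the implementation language.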

F SECURITY OF OUR PROTOCOLS
Lemma F.1. Whenever a TTP is identified, it is always honest.
Proof. Consider the instances in our shuffle protocol (Fig. 4) where we proceed to identify a TTP. We will show that in each such case, a malicious party is never identified as the TTP.
Preprocessing phase. During this phase, a TTP is identified whenever a Shuffle-Pair fails. Without loss of generality, let the first failed Shuffle-Pair instance be with respect to the pair (P_i, P_j), in which case TTP = P_k, for distinct i, j, k.
Since P_i and P_j are the parties involved in the communication, a failure occurs when either of them sends an incorrect message. Hence, the malicious party lies in the set {P_i, P_j}. Thus, P_k, which is identified as the TTP, is honest, as only one out of the three parties can be malicious.
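The TTP selection rule for a failed Shuffle-Pair amounts to picking the uninvolved third party. A minimal sketch, with hypothetical party labels:

```python
PARTIES = {"P0", "P1", "P2"}

def ttp_for_failed_pair(pi, pj):
    """If the Shuffle-Pair between (pi, pj) fails, the cheater
    lies in {pi, pj}; with at most one corruption, the remaining
    party is guaranteed honest and is chosen as the TTP."""
    (ttp,) = PARTIES - {pi, pj}
    return ttp
```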
Online phase. During the online phase, consider the first instance where a receiver P_i broadcasts ("accuse", P_j, P_k, v, h), where P_j, P_k are the senders who send v, h, respectively, to P_i, and h is expected to equal H(v). We consider all corruption scenarios and prove that the TTP identified in each such case is an honest party.
P_j is corrupt: In this case, when P_i broadcasts ("accuse", P_j, P_k, v, h), it is due to an incorrect message sent by P_j to P_i. Party P_k will not broadcast an "accuse" message, since P_i, P_k are both honest and there will not be a mismatch in their messages. Thus, the malicious P_j will never be identified as the TTP. If P_j broadcasts an "accuse" message, then the TTP is set as P_k, which is an honest party. Else, if P_j does not broadcast an "accuse" message, then the TTP is set as P_i, which is also an honest party. - P_k is corrupt: A similar argument follows for this case.
P_i is corrupt: In this case, since P_j, P_k are honest, if P_i broadcasts ("accuse", P_j, P_k, v, h), then P_i has wrongfully accused P_j and P_k. If, in the broadcast message, h = H(v), then the other parties know that P_i is the malicious party and set the TTP to one of the honest parties. Else, if h ≠ H(v), at least one of P_j or P_k will broadcast an "accuse" message if the value broadcast by P_i differs from what they sent. Hence, the TTP ∈ {P_j, P_k} will be set as an honest party. □
Lemma F.2. The shuffle protocol Π_Ruffle (Fig. 4) securely realizes the functionality F_Shuffle (Fig. 1) against a malicious adversary that corrupts at most one party in P, in the F_setup-hybrid model.
[Fig. 12 (simulation steps for a corrupt P_0, spilled here from the figure): Preprocessing: (1) Using the keys commonly held with A (generated as part of F_setup), sample the common randomness. Online: (1) Let T_o^01, T_o^02 denote the partial shares of T_o generated towards A during preprocessing, and let the remaining share T_o^12 ∈ Z_2ℓ be sampled randomly. Invoke the ideal functionality F_Shuffle with A's [·]-shares of the table, namely T_o^01, T_o^02, T_o^12.]
Proof. Let A denote the real-world adversary and S_ΠRuffle denote the corresponding ideal-world adversary. At a high level, S_ΠRuffle begins by emulating F_setup, during which common keys are established with A; these keys are used to sample the common randomness required throughout the protocol. Thus, S_ΠRuffle is aware of all the randomness used by A, using which it can extract the input of A (specifically, S_ΠRuffle knows the [·]-shares of T held by A) as well as verify the correctness of the messages sent by A. Following this, it simulates the steps of the shuffle protocol. The simulation steps for a corrupt P_0 are provided in Fig. 12, where the corresponding simulator is denoted as S^0_ΠRuffle. Analogously, the corruption of P_1 or P_2 can also be simulated.
Observe that, in the real world, during the preprocessing phase, A receives messages that are computed as part of Shuffle-Pair and Set-Equality, where the messages are masked using randomness to hide the missing permutation as well as any information regarding the missing share at A. During the online phase, A receives a value v_12 and H(v_12), where v_12 is randomized using the random mask R_12. In this way, observe that the messages received by A in the real world are random and uniform. In the ideal world too, A receives messages that are sampled randomly from the uniform distribution. Moreover, S^0_ΠRuffle can verify the correctness of the messages sent by A, as described earlier. This allows S^0_ΠRuffle to also simulate all the accuse messages as done in the real world. In this way, the real-world and ideal-world executions are indistinguishable. □