
A Security Framework for Privacy-preserving Data Aggregation in Wireless Sensor Networks

ALDAR C-F. CHAN
National University of Singapore
CLAUDE CASTELLUCCIA
INRIA

A formal treatment of the security of concealed data aggregation (CDA) and the more general private data aggregation (PDA) is given. While there exist a handful of constructions, rigorous security models and analyses for CDA or PDA are still lacking. Standard security notions for public key encryption, including semantic security and indistinguishability against chosen ciphertext attacks, are refined to cover the multi-sender nature and aggregation functionality of CDA and PDA in the security model. The proposed security model is sufficiently general to cover most application scenarios and constructions of privacy-preserving data aggregation. An impossibility result on achieving security against adaptive chosen ciphertext attacks in CDA/PDA is shown. A generic CDA construction based on public key homomorphic encryption is given, along with a proof of its security in the proposed model. The security of a number of existing schemes is analyzed in the proposed model.

Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General—Security and Protection; C.2.1 [Computer-Communication Networks]: Network Architecture and Design—Wireless Sensor Networks

General Terms: Security, cryptography

Additional Key Words and Phrases: Privacy-preserving data aggregation, concealed data aggregation, provable security, semantic security, adaptive chosen ciphertext attacks.

1. INTRODUCTION

Supporting efficient in-network data aggregation while preserving data privacy has emerged as an important requirement in numerous wireless sensor network applications [He et al. 2007; Acharya et al. 2005; Castelluccia et al. 2005; Girao et al. 2005; Westhoff et al. 2006; Armknecht et al. 2008].

An earlier version of this paper appeared in ESORICS 2007 as [Chan and Castelluccia 2007]. This is an extended version with a substantially generalized and extended security model. New work includes the impossibility result on security against adaptive chosen ciphertext attacks, revised security analyses in the generalized model, and new analyses on one CDA scheme [Armknecht et al. 2008] and two general private data aggregation schemes [He et al. 2007].
Aldar C-F. Chan's work was performed, in part, at INRIA.
Authors' address:
Aldar C-F. Chan, Department of Computer Science, School of Computing, National University of Singapore, Singapore. Email: [email protected]
Claude Castelluccia, INRIA, Zirst - 655 avenue de l'Europe, 38334 Saint Ismier Cedex, France. Email: [email protected]

Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0730-0301/20YY/0100-0001 $5.00


As a key approach to fulfilling this requirement of private data aggregation, concealed data aggregation (CDA), in which multiple source nodes send encrypted data to a sink along a convergecast tree with ciphertext aggregation performed en route, is an active research problem [Acharya et al. 2005; Castelluccia et al. 2005; Girao et al. 2005; Westhoff et al. 2006; Peter et al. 2008; Armknecht et al. 2008]. The problem of CDA was first investigated by Westhoff et al. [Girao et al. 2005; Westhoff et al. 2006], who also posed a number of design requirements unique to CDA, on top of the privacy-preservation requirement in general private data aggregation (PDA). In brief, CDA can be considered as a subset of efficient PDA schemes with additional design criteria.

The main difference between CDA and general PDA lies in the aggregation topology. In CDA, the aggregation topology has to be a tree, whereas aggregation can be performed over arbitrary topologies in private data aggregation.1 Since a tree (which is a graph without loops) is the most efficient aggregation topology (wherein each node only needs to transmit a single intermediate aggregate or a single data message), the capability to work on an aggregation tree is actually a strength of CDA, compared to PDA. In some PDA schemes such as [He et al. 2007], a non-tree topology has to be adopted in order to protect the data privacy of individual nodes. Another difference is that, in CDA, the aggregation function is a public algorithm and the (encryption and aggregation) algorithms do not depend on the aggregation topology, while, in some PDA schemes, an aggregating node needs some secret to perform aggregation and the algorithms used may rely on knowledge of the aggregation topology. For example, [He et al. 2007] proposes two PDA schemes based on non-tree topologies, one using a tree with complete subgraphs attached to all its leaves and the other adopting a graph-sum of a number of trees. Besides, secret keys are needed as input to the aggregation algorithms of both schemes. Be it CDA or general PDA, the main goal is to ensure privacy, including both end-to-end aggregate privacy and individual node privacy.

Although assuring end-to-end aggregate authenticity (or integrity) is also an essential security requirement for in-network aggregation to work correctly in the presence of active attacks, the primary design goal of CDA/PDA is still privacy. This paper mainly focuses on the security model for privacy in CDA and PDA; work on authenticity in data aggregation [Hu and Evans 2003; Przydatek et al. 2003; Chan et al. 2006; Manulis and Schwenk 2007; 2009] is thus out of scope, but the discussion here is complementary to these attempts in general. Nevertheless, it should be noted that, while the notions of privacy and authenticity are orthogonal and usually treated independently in one-to-one communication (with symmetric-key encryption [Bellare et al. 1994] for privacy protection and message authentication codes [Bellare et al. 1996] for authenticity), it is not necessarily true that the two notions can always be treated separately in the context of privacy-preserving data aggregation.

The privacy goal is two-fold. First, the privacy of data has to be guaranteed end-to-end, that is, only the sink could learn the final aggregation result, and only a negligible amount of information about the final aggregate should be leaked out to any eavesdropper or node along the path. Each node should only have knowledge of its own data, but no information about the data of other nodes. Second, to reduce the communication overhead, the data from different source nodes have to be efficiently combined by intermediate nodes (i.e. aggregation) along the path. Nevertheless, these intermediate nodes should not learn any information about the final aggregate or individual nodes' data in an ideal scheme. It appears that these two goals are in conflict. As a result, a deliberate study of the security definitions and rigorous analyses of CDA and PDA schemes are necessary. While there exist a handful of constructions of CDA [Acharya et al. 2005; Castelluccia et al. 2005; Girao et al. 2005; Westhoff et al. 2006; Armknecht et al. 2008] and PDA [He et al. 2007] achieving various levels of privacy-efficiency tradeoff, a rigorous treatment of the security definitions, notions and analyses of CDA or PDA is still lacking despite its importance for verifying the correctness and evaluating the security strength of the proposed schemes.

1 Using non-tree topologies implicitly implies that, in general private data aggregation, the contribution from a node may be sent in portions via multiple distinct paths to the sink, with each portion aggregated into the final aggregate through a different path.


This work aims to fill the gap based on the paradigm of provable security [Goldwasser and Micali 1984; Bellare 1997].

While there has been a solid foundation in cryptography for both private-key [Shannon 1949; Luby 1996; Katz and Yung 2006] and public-key [Goldwasser and Micali 1984; Micali et al. 1988; Dolev et al. 2000] encryption, a refinement of such standard security models is needed to cover the salient features of CDA/PDA. First, a CDA or PDA scheme can be based on private key or public key cryptography; that is, the encryption function could be public or private. Second, CDA and PDA are many-to-one (multi-sender, single-receiver) while cryptosystems in the literature are either one-to-one [Katz and Yung 2006; Goldwasser and Micali 1984] or one-to-many [Shoup and Gennaro 2002; Fiat and Naor 1993]. Third, CDA and most PDA schemes include the aggregation functionality on encrypted data, whose adversarial model needs a new definition. This paper extends the standard security notions of semantic security [Goldwasser and Micali 1984] and indistinguishability against chosen-ciphertext attacks [Micali et al. 1988] to the CDA/PDA setting and analyzes existing schemes [Castelluccia et al. 2005; Westhoff et al. 2006; Armknecht et al. 2008; He et al. 2007]. The notion of forward security in the context of CDA/PDA is also considered.

1.1 Related Work

Westhoff et al. [Westhoff et al. 2006; Girao et al. 2005] gave the first CDA construction, based on the Domingo-Ferrer private key homomorphic encryption [Domingo-Ferrer 2002], and coined the term CDA. The scheme allows additive aggregation. Castelluccia et al. constructed a Vernam-cipher-like CDA scheme for additive aggregation in [Castelluccia et al. 2005] and subsequently improved the construction in [Castelluccia et al. 2009]. A key distribution architecture is proposed in [Armknecht et al. 2008] to address the communication overhead problem of CMT-based schemes (requiring node identities to be sent to the sink). In [Acharya et al. 2005], Westhoff et al. gave a private aggregation scheme for comparing encrypted data; however, the security of the proposed scheme is not clearly defined. [Peter et al. 2008] provides a comprehensive survey and comparative study of CDA schemes. All these schemes are based on a tree as the aggregation topology. In [He et al. 2007], two general private data aggregation schemes based on non-tree topologies for additive aggregation are given; however, using a general aggregation topology does not seem to improve the privacy of the proposed schemes over CDA schemes like [Castelluccia et al. 2009]. Based on the provable security paradigm, Manulis et al. [Manulis and Schwenk 2007; 2009] proposed a security model for integrity protection in data aggregation, which is orthogonal to the security model in this paper; [Manulis and Schwenk 2007; 2009] focuses on the correctness and soundness of the aggregation process while this paper focuses on aggregate and node privacy. It is fair to say that, despite the existence of these CDA and PDA constructions, a rigorous security model and analysis for privacy-preserving data aggregation are still missing in the literature.

1.2 Our Contributions

The contribution of this paper is four-fold. First, the main contribution is the formalization of CDA and PDA security. To the best of our knowledge, this is the first paper in the literature to provide a formal security model and cryptographic analysis for privacy-preserving data aggregation. We extend the standard security notions of encryption schemes to cover the CDA/PDA scenario. Our security model is general enough to cover most CDA/PDA designs and application scenarios. More specifically, it covers both private-key and public-key based CDA/PDA constructions and takes into account the possibility of insider attacks due to compromised nodes. Both public and private aggregation algorithms are included. The proposed security model and framework make no assumption on the aggregation topology and are thus applicable to schemes based on arbitrary aggregation topologies.


The model also includes the case in which global randomness for encryption is prescribed beforehand or chosen by the sink and broadcast to the source nodes [Castelluccia et al. 2005]. Compared to an earlier version [Chan and Castelluccia 2007], the security model is generalized in a number of ways: first, the model in [Chan and Castelluccia 2007] is only applicable to CDA while the model here also covers PDA; second, [Chan and Castelluccia 2007] only considers protecting the privacy of the final aggregate, whereas this paper considers privacy protection of the final aggregate and the inputs of all the non-compromised nodes; third, the forward security notion is added.

Second, we show that achieving security (indistinguishability or node privacy) against adaptive chosen ciphertext attacks seems impossible for CDA. The notion defined is not a contrived one but rather a natural extension of the counterpart notion widely adopted in public key encryption [Micali et al. 1988].

Third, we give a generic CDA construction based on any public key homomorphic encryption scheme. The construction is provably secure and based on a minimal set of assumptions. More specifically, provided that the underlying homomorphic encryption is semantically secure, the generic CDA construction achieves semantic security against any coalition with up to (n − 1) compromised nodes and is forward secure for the case of n compromised nodes, where n is the total number of nodes in the system.

Fourth, based on the CDA/PDA security model proposed in this paper, we analyze five existing schemes — three CDA schemes (WGA [Westhoff et al. 2006], CMT [Castelluccia et al. 2005] and AWGH [Armknecht et al. 2008]) and two PDA schemes (CPDA [He et al. 2007] and SMART [He et al. 2007]). We also propose a modification to CPDA to improve its computation efficiency while preserving the security guarantee of the original scheme. We believe this work could provide useful guidelines for practitioners to choose a proper scheme to fit the security requirements of their applications.

The rest of the paper is organized as follows. In Section 2, we give a brief introduction of the provable security paradigm and the notations used in this paper. The definition of CDA/PDA and the related security notions are presented in Sections 3 and 4 respectively. In Section 5, a generic CDA construction is given. The security of existing schemes is analyzed in Section 6, followed by a comparative summary of the schemes discussed and a conclusion in Sections 7 and 8 respectively.

2. PRELIMINARIES

2.1 Provable Security

The idea of provable security was introduced in the seminal work of Goldwasser and Micali [Goldwasser and Micali 1984] and subsequently generalized by Bellare and Rogaway [Bellare and Rogaway 1993; 1995; Bellare et al. 1994; Bellare et al. 1995] to include a number of practical considerations such as the formal model for block ciphers and the random oracle model. The paradigm of provable security is adopted in this paper to specify the security goals, to define the security model, and to analyze the security strength of CDA and PDA constructions. A good reference on provable security can be found in [Koblitz and Menezes 2007; Bellare 1997]. The proofs in this paper are done through a sequence of games with slight differences in the transition between consecutive games [Shoup 2004].

2.2 Notations

We follow the notations for algorithms and probabilistic experiments originating in [Goldwasser et al. 1988]. A detailed exposition can be found there. We denote by z ← A(x, y, . . .) the experiment of running a probabilistic algorithm A on inputs x, y, . . ., generating output z. The probability distribution induced by the output of A is denoted by {A(x, y, . . .)}. The notation x ← D means randomly picking a sample x from the probability distribution D; if no probability function is specified for D, we assume x is uniformly picked from the sample space. As usual, PPT denotes probabilistic polynomial time. ⊥ denotes an output failure of an algorithm, which may occur when the input to the algorithm is invalid.


We denote by N the set of non-negative integers, and by R the set of real numbers. An empty set is denoted by φ and an empty element by ε. We usually denote the security parameter λ in unary by 1λ; this corresponds to a λ-bit key being used. The following definition of negligible functions [Goldreich 2001] is adopted in this paper.

DEFINITION 2.1. A positive-valued function ε : N → R is negligible in λ if for every positive polynomial p(·) there exists a value N such that for all λ > N, ε(λ) < 1/p(λ).

When we say a certain quantity is negligible in λ, we mean the quantity is a negligible function in λ.
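For instance, ε(λ) = 2^{-λ} is negligible: for every positive polynomial p(·) there is an N beyond which 2^λ > p(λ), so that

\[ \varepsilon(\lambda) = 2^{-\lambda} < \frac{1}{p(\lambda)} \quad \text{for all } \lambda > N. \]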

3. DEFINITIONS

A typical CDA/PDA scheme includes a sink R and a set U of n source nodes (which are usually sensor nodes) where U = {si : 1 ≤ i ≤ n}. Denote the set of node identities by ID; in the simplest case, ID = [1, n].

In the following discussion, hdr ⊆ ID is a header indicating the nodes contributing to a final aggregate (which the sink receives) or an intermediate aggregate (usually in encrypted form). For an intermediate aggregate, some auxiliary information ai ∈ {0, 1}∗ along with hdri provides aggregation instructions for an aggregating node; this information is rarely needed, only in some general PDA schemes2 such as [He et al. 2007], and is not used in CDA. For instance, ai indicates whether the aggregation process is in the first or second step in [He et al. 2007]. Besides, aux ∈ {0, 1}∗ represents some auxiliary global information about the aggregation topology which some schemes (such as the hop-by-hop aggregation scheme described in [Castelluccia et al. 2005])3 rely on for encryption or aggregation; it is not needed in CDA. All these abstract constructs or data structures (including hdr, ai, aux) are needed in the definitions so as to make the security model in this paper sufficiently general to cover most privacy-preserving data aggregation schemes. Nevertheless, generating them should not be treated as a requirement for actual implementations. In schemes wherein such data structures are meaningless, they can be treated as the empty element ε; we usually exclude these data structures from the notation when a scheme in question does not need them.

Given a security parameter λ, a CDA/PDA scheme is a 4-tuple of polynomial time algorithms with details as follows.

Key Generation (KG). Let KG(1λ, n) → (dk, ek1, ek2, . . . , ekn) be a probabilistic algorithm. Then, eki (with 1 ≤ i ≤ n) is the encryption key assigned to source node si and dk is the corresponding decryption key given to the sink R.

Encryption (E). Eeki(mi; aux) → (hdri, ci)/⊥ is a probabilistic encryption algorithm taking a plaintext mi (from a certain message space M) and an encryption key eki as input to generate a ciphertext ci and a header hdri ⊂ ID according to the auxiliary information input aux (about the aggregation topology and instructions on how to divide mi). hdri indicates the identity of the node performing encryption, that is, hdri = {i} for identity i. ⊥ represents an encryption failure which may occur when aux specifies an invalid aggregation topology. We sometimes denote the encryption function by Eeki(mi, r; aux) to explicitly show by a string r the random coins used in the encryption process.

2 In such schemes, an aggregating node differentiates between incoming intermediate aggregates, which are processed with different procedures.
3 In hop-by-hop aggregation, each pair of neighboring nodes shares a secret key. An aggregation tree is formed. Each node encrypts its data (using its parent's key) and sends the encrypted data to its parent. When an aggregating node receives a number of encrypted aggregates, it decrypts each one of them, aggregates the resulting plaintexts, re-encrypts the combined aggregate (using its parent's key) and passes the encrypted aggregate to its parent.


Decryption (D). Given an (encrypted) aggregate c and its header hdr ⊆ ID (which indicates the source nodes included in the aggregation), Ddk(hdr, c) → m/⊥ is a deterministic algorithm which takes the decryption key dk, hdr and c as inputs and returns the plaintext aggregate m, or possibly ⊥ if c is an invalid ciphertext. We sometimes denote the decryption function by Ddk(hdr, c, r) to explicitly indicate the global random coins r that may be used in the decryption process of some CDA/PDA schemes, in particular, those based on private key cryptography such as [Castelluccia et al. 2005].

Aggregation (Agg). With a specified associative aggregation function f, the corresponding aggregation algorithm Aggf(ekl, (hdri, ci; ai), (hdrj, cj; aj); aux) → (hdrl, cl; al)/⊥ aggregates two encrypted intermediate aggregates ci and cj with headers hdri and hdrj respectively (where ai and aj are the auxiliary information providing aggregation instructions) to create a combined encrypted aggregate cl, a new header hdrl = hdri ∪ hdrj and new auxiliary information al, where ekl is the encryption key of the aggregating node. One of the input 3-tuples can be the empty element ε for modeling the scenario that a source node (in general private data aggregation) extracts a specific part of its ciphertext to be sent to one of its parents. Note that the aggregation algorithm does not need the decryption key dk as input; depending on whether it needs the secret encryption key ekl of the aggregating node (in symmetric-key based schemes only) as input, it could be a public or private algorithm. The aggregation algorithm in CDA is public while its counterpart in hop-by-hop aggregation, using the aggregating node's secret key as input, is private.

Depending on applications, the aggregation function f could be any associative function; for instance, f can be the sum, product, max, etc. Leveraging the associativity property, we abuse the notation in this paper: we denote the composition of multiple copies of f simply by f(m1, m2, . . . , mi) irrespective of the order of aggregation and call it the f-aggregate on m1, m2, . . . , mi; to be precise, it should be written as f(f(f(m1, m2), . . .), mi) with a certain aggregation order.

It is intentional to include the description of the header hdr in the above definition so as to make the security model as general as possible (to cover CDA schemes requiring headers in their operations [Castelluccia et al. 2005; Castelluccia et al. 2009]). Nonetheless, generating headers or including headers as input to algorithms should not be treated as a requirement in the actual construction or implementation of CDA or PDA algorithms. For constructions which do not need headers (such as the generic construction given in Section 5), all hdr's can simply be treated as the empty set φ in the security model and the discussions in this paper still apply. By the same token, aux and ai are merely notational constructs to make the security model sufficiently general to cover schemes based on non-tree aggregation topologies and schemes which differentiate aggregation inputs respectively. They can be treated as the empty element when analyzing schemes not needing them. For such cases (such as in CDA), we simply exclude aux and ai from the notations in this paper.

In the aggregation algorithm, we do not require hdri and hdrj to be non-overlapping. The reason is that, in private data aggregation with a non-tree aggregation topology, the contribution of a node may reach the sink in parts via multiple paths. In other words, both ci and cj may have incorporated portions of the same node's contribution. On the contrary, for CDA schemes, we require that hdri ∩ hdrj = φ (the empty set). Suppose ci and cj are the ciphertexts for plaintext aggregates mi and mj respectively in a CDA scheme. The output cl of the aggregation algorithm Aggf is the ciphertext for the aggregate f(mi, mj), namely, Ddk(hdrl, cl) → f(mi, mj). Such an input-output relationship may not hold for the aggregation done by an arbitrary node in general PDA [He et al. 2007].

Normally, the auxiliary information ai is not needed for most schemes, the aggregation algorithm of which is homogeneous (i.e. an aggregating node does not differentiate aggregates and all received intermediate aggregates are processed in the same way).


For example, all CDA schemes need no such information. However, in a few PDA schemes such as [He et al. 2007] or hop-by-hop aggregation [Castelluccia et al. 2005], different modes or procedures could be applied to received aggregates. For instance, in a PDA scheme called SMART [He et al. 2007], ai indicates whether the aggregation process is in the first or second aggregation step, wherein different processing procedures are applied. ai models the additional side information that an aggregating node can obtain by extracting different fields (such as the sender's identity) in the header of a received packet in the actual implementation.

Similarly, aux models global auxiliary information such as the aggregation topology adopted in the current session. In detail, aux can be represented by a labeled, directed graph, with the direction of an edge in the graph representing the direction of information flow in the aggregation session and the labels storing possible additional aggregation instructions. This information is normally not needed in most schemes such as CDA. But in some PDA designs which are not topology-decoupled, the encryption or aggregation algorithms are topology dependent. For instance, the aggregation algorithm of hop-by-hop aggregation needs information (like where the aggregates are received from and where the result is sent to) to choose a correct secret key for proper processing. The main advantage of modeling this kind of auxiliary information as the input parameter aux is that the aggregation process (the focus of the security model) can be isolated from the algorithmic details of topology formation in specific schemes. The security model in this paper assumes that an adversary can freely manipulate and be in full control of the underlying topology formation process. The goal is that, even so, the privacy of the aggregation process is still assured.

3.1 Correctness

To define the correctness of a CDA/PDA scheme, we denote the encrypted result of an aggregation session by AGf(aux, ek1, ..., eki, ..., ekn, (hdr1, c1), ..., (hdri, ci), ..., (hdrn, cn)), which is an algorithmic procedure modeling the aggregation process.4 In essence, AGf() is a composite procedure composed of a sequence of invocations of the aggregation function Aggf, with the sequence determined according to the information in aux. A full description of AGf() is as follows:

Algorithm AGf(aux, ek1, ..., eki, ..., ekn, (hdr1, c1), ..., (hdri, ci), ..., (hdrn, cn))

(1) Initialize a1, a2, ..., an based on aux.

(2) With the sink (of the aggregation session) as the root, do a breadth-first search on the underlying graph of aux to obtain an order of the nodes si1, ..., sij, ..., sin where si1 is the sink. Those nodes not in aux are put at the back of the order arbitrarily.

(3) While EndSession(aux, a1, a2, ..., an) = FALSE:
        For j = n, ..., 1:
            For each sl ∈ Child(sij),
                set (hdrij, cij; aij) ← Aggf(ekij, (hdrij, cij; aij), (hdrl, cl; al); aux).

(4) Output (hdri1, ci1).

Note that, in the AGf() algorithm depicted above, EndSession(aux, a1, a2, ..., an) is a subroutine determining whether the current aggregation session is complete.

4Although all nodes’ inputs are included as inputs to AGf (), the algorithm AGf () only takes the inputs of those nodes includedin aux. That is, AGf () has a filtering function based on aux; for an aggregation session involving a subset S of nodes, themembership of S for that session can be extracted from aux; those not included in S will be excluded by AGf (). In fact, aux ispart of the input to the encryption algorithm E; if a node is not in S, E returns ⊥; the resulting ciphertext is neglected by AGf ().


While each node invokes the aggregation function Aggf once in CDA, in some PDA schemes such as [He et al. 2007], a node may invoke Aggf multiple times before an aggregation session is complete. The subroutine Child(si) returns the set of all child nodes of the input node si. The correctness of CDA/PDA is defined as follows.

DEFINITION 3.1. A CDA/PDA scheme (KG, E, D, Agg) is correct for the aggregation function f if, for every n, every λ, every key tuple (dk, ek1, ek2, ..., ekn) ← KG(1λ, n), every possible aggregation topology aux ∈ {0, 1}∗, and every possible message tuple (m1, m2, ..., mn) ∈ Mn, the following holds:

Pr[ (hdr1, c1) ← Eek1(m1; aux); . . . ; (hdrn, cn) ← Eekn(mn; aux);
    (hdr, c) ← AGf(aux, ek1, ek2, ..., ekn, (hdr1, c1), (hdr2, c2), ..., (hdrn, cn));
    m ← Ddk(hdr, c) :
    m = f(m1, m2, ..., mn) ] = 1

Informally, the correctness definition can be interpreted as follows: when the encryption and decryption functions, E and D respectively, are performed with correct headers and matched keys (that is, generated by the same instance of the key generation algorithm KG), and all the invocations of the aggregation algorithm Aggf are run properly (with correct headers and auxiliary information) and all the intermediate results are passed on correctly according to aux, the decryption of the result of the aggregation process should give back the f-aggregate of all the data input to the encryption. We require, in the definition, that the probability of getting back the f-aggregate of all the inputs be 1. Note that most of the algorithms defined in a CDA/PDA scheme (KG, E, D, Agg) are probabilistic and an aggregation session can hence be treated as a probabilistic experiment. It is thus appropriate to define the correctness property of CDA/PDA by a probability statement.

3.2 Operation Examples

The syntax of the algorithms is defined in this way to provide a model sufficiently general to cover the operation of a wide range of PDA schemes. In a CDA scheme, many of the defined parameters are irrelevant because, in CDA, the aggregation topology has to be a tree, the algorithms are not topology dependent, and the aggregation algorithm is public. The family of CDA schemes is a special instance of the above definitions with its algorithmic syntax defined in the following reduced forms:

Key Generation. KG(1λ, n)→ (dk, ek1, ek2, . . . , ekn)

Encryption. Eeki(mi)→ (hdri, ci)

Decryption. Ddk(hdr, c) → m/⊥

Aggregation. Aggf((hdri, ci), (hdrj, cj)) → (hdrl, cl)
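As an illustration of this reduced interface (our own toy sketch, not a construction from the cited literature), the following Python fragment instantiates the four algorithms for additive aggregation using simple additive masking modulo a hypothetical modulus MOD; reusing the per-node masks across sessions would of course be insecure, so this is purely for exposition.

# Toy CDA instance for additive aggregation (f = sum), for illustration only.
# KG hands each node a random mask; the sink's dk is the full table of masks.
# E adds the node's mask to its reading; Agg adds ciphertexts and merges headers;
# D subtracts the masks of exactly the nodes listed in hdr.
import secrets

MOD = 2**32   # hypothetical message/ciphertext modulus

def kg(n):
    eks = {i: secrets.randbelow(MOD) for i in range(1, n + 1)}   # ek_i per node
    dk = dict(eks)                                               # sink keeps all masks
    return dk, eks

def enc(ek_i, i, m):
    return {i}, (m + ek_i) % MOD          # (hdr_i, c_i)

def agg(hc1, hc2):
    (hdr1, c1), (hdr2, c2) = hc1, hc2
    assert not (hdr1 & hdr2), "CDA requires disjoint headers"
    return hdr1 | hdr2, (c1 + c2) % MOD   # (hdr_l, c_l)

def dec(dk, hdr, c):
    return (c - sum(dk[i] for i in hdr)) % MOD

dk, eks = kg(3)
readings = {1: 10, 2: 20, 3: 12}
cts = [enc(eks[i], i, m) for i, m in readings.items()]
hdr, c = cts[0]
for hc in cts[1:]:
    hdr, c = agg((hdr, c), hc)
assert dec(dk, hdr, c) == sum(readings.values()) % MOD

The final assertion is exactly the correctness requirement of Definition 3.1 specialized to f = sum.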

To illustrate how this operational model can cover a variety of CDA/PDA schemes, we discuss the operation of two schemes with respect to the above definitions.

Typical CDA Operation. The operation of CDA runs as follows. In initialization, the sink R runs KG to generate a set of encryption keys {eki : 1 ≤ i ≤ n} and the corresponding decryption key dk and distributes each one of the encryption keys to the corresponding source, say, eki to si. Depending on constructions, the encryption keys eki could be private or public, but the decryption key dk has to be private in all cases.

At a certain instant, the sink selects a subset S ⊆ U of the n source nodes to report their data. Each si ∈ S uses its encryption key eki to encrypt its data represented by the plaintext mi, giving a ciphertext ci. We do not pose restrictions on whether global or local random coins should be used for encryption. If each source node generates its random coins individually, the random coins are said to be local; if the random coins are chosen by the sink and broadcast to all source nodes, they are global.


Global random coins are usually public. When global random coins are used, we do not pose any restriction on the reuse of randomness, even though, in practice, each global random coin is treated as a nonce, that is, used once only. The generic construction given in Section 5 uses local random coins whereas the CMT scheme [Castelluccia et al. 2005] uses a global nonce.

Usually, the source nodes form a convergecast tree over which the encrypted data are sent. To save communication cost, aggregation is done en route to the sink whenever possible. When a node si in the tree receives x ciphertexts, say (hdri1, ci1), . . . , (hdrix, cix), from its child nodes5 (with identities i1, . . . , ix ∈ S), it aggregates these ciphertexts along with its own ciphertext (hdri, ci) by running Aggf successively. The convergecast tree structure ensures that any pair of these headers has an empty intersection. Suppose ci1, . . . , cix are the ciphertexts for the plaintext aggregates mi1, . . . , mix. The resulting ciphertext is (hdrl, cl) where hdrl = hdri1 ∪ . . . ∪ hdrix ∪ hdri and cl is the encryption of the aggregate f(mi1, . . . , mix, mi). Note that Aggf is public.

Eventually, a number of encrypted aggregates arrive at the sink, which combines them by running Aggf to obtain a single encrypted aggregate csink and then applies the decryption algorithm to csink to get back the plaintext aggregate f(. . . , mi, . . .) with si ∈ S. We require that the CDA scheme be correct in the sense that, when the encryption and decryption are performed with matched keys and correct headers and all the aggregations are run properly, the decryption should give back an f-aggregate of all the data applied to the encryption (Definition 3.1).
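This convergecast flow can be sketched as a post-order fold over the tree, in which every node forwards exactly one (hdr, c) pair to its parent; the children map, the own table and the additive stand-in for Aggf below are hypothetical illustrations rather than part of any cited scheme.

# Convergecast sketch: each node combines its own ciphertext with the aggregates
# received from its children and forwards a single (hdr, c) pair upward.
def convergecast(node, children, own, agg):
    """children: node -> list of child nodes; own: node -> (hdr, c); agg: public Agg_f."""
    hdr, c = own[node]
    for child in children.get(node, []):
        hdr, c = agg((hdr, c), convergecast(child, children, own, agg))
    return hdr, c

# Toy usage with additive "ciphertexts" as stand-ins for a real CDA scheme.
children = {"sink": ["s1", "s2"], "s1": ["s3"]}
own = {"sink": (set(), 0), "s1": ({1}, 11), "s2": ({2}, 22), "s3": ({3}, 33)}
add = lambda x, y: (x[0] | y[0], x[1] + y[1])
print(convergecast("sink", children, own, add))   # ({1, 2, 3}, 66)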

Operation of a General PDA Scheme — SMART [He et al. 2007]. The operation of SMART is as follows. In initialization, the sink R runs KG to generate and distribute key rings for all nodes so that each node si shares a common secret key with each other node. That is, we assume all the bootstrapping processes (key predistribution, key discovery and path key establishment as named in [Eschenauer and Gligor 2002]) are completed in the initialization stage. The key ring of node si is represented by eki and dk is the union set of all key rings. In a reporting epoch, a multi-level tree is formed covering all nodes and each node si forms a 1-level tree with (J − 1) other nodes (where J is a design parameter). Denote this set of logical neighbors of si by Si = {s(i)1, . . . , s(i)i′−1, s(i)i′+1, . . . , s(i)J}. Note that s(i)i′ is si itself and 1 ≤ i′ ≤ J. While the aggregation over the multi-level tree is from-leaf-to-root, the aggregation over the 1-level tree is from-root-to-leaf.6 All this topology information is represented by aux. The aggregation is done in two steps.

In the first aggregation step, aggregation is done over the 1-level tree. Each node si runs E, which performs the following on si's input data mi: (1) split mi into J pieces mi1, mi2, . . . , miJ such that ∑_{j=1}^{J} mij = mi; (2) encrypt each mij (where j ≠ i′ and 1 ≤ j ≤ J) with the secret key shared with node s(i)j ∈ Si (aux provides the necessary information to select the key), resulting in (J − 1) ciphertexts cij (j ≠ i′, 1 ≤ j ≤ J); si keeps mii′ and we could take mii′ as cii′. The overall ciphertext is ci = ci1||ci2|| . . . ||ciJ. For each j ∈ [1, J], si runs Aggf(({i}, ci; aij), ε; aux), which extracts cij from ci and outputs auxiliary information a′ij for cij. aij specifies the intended next aggregating node s(i)j for Aggf. si then sends the 3-tuple (hdri, cij; a′ij) to s(i)j, where hdri = {i}. Suppose s(i)j is sl. When s(i)j runs Aggf on (hdri, cij; a′ij) and other received 3-tuples, a′ij indicates that the first aggregation step procedures should be applied and instructs Aggf to first decrypt cij using the key indicated by hdri and aux and then aggregate the results into the output aggregate cl. Note that cl is not encrypted.

5 It is possible that some of these ciphertexts are already the encryption of aggregated data rather than the encryption of a single plaintext.
6 When si's first-hop parent is viewed as the root of a different 1-level tree and si as a leaf, the aggregation is from-leaf-to-root.


10 ·hdri’s and al tells the aggregating node in the second aggregation step to use the prescribed proceduresfor the second aggregation step to aggregate c l.

In the second aggregation step, aggregation is done over the multi-level tree from-leaf-to-root. Each node sends its result (hdrl, cl; al) from the first step to its parent in the tree. Suppose an aggregating node sl′ receives two 3-tuples (hdri, ci; ai) and (hdrj, cj; aj). sl′ runs the same aggregation algorithm Aggf (as in the first step) on the two received 3-tuples. This time, the auxiliary information ai and aj would instruct Aggf to apply the second aggregation step procedure to perform aggregation — no decryption would be performed on ci and cj as they are plaintexts; Aggf simply aggregates ci and cj in plaintext and outputs the resulting aggregate cl′, also in plaintext. The output is (hdrl′, cl′; al′) where hdrl′ = hdri ∪ hdrj and al′ would tell the next aggregating node to apply the same aggregation procedure without doing decryption. The same aggregation process (in the plaintext domain) continues until the final aggregate reaches the sink. The decryption function D is a dummy function (which passes the input to the output) since the final aggregate is in plaintext. We require all PDA schemes to be correct; that is, if all encryption and decryption are done with matched keys and the aggregation is done correctly, the resulting final aggregate should be a correct f-aggregate of all the inputs.
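For intuition, the slice-then-mix arithmetic of the two steps can be sketched as follows; this is a simplified illustration of [He et al. 2007], not its actual specification: J, the toy neighbor topology and the modulus are hypothetical, and the hop encryption of each slice under a pairwise key is omitted since the receiver decrypts it before adding.

# SMART-style sketch: slice each datum into J shares, exchange shares among logical
# neighbors, and add up the per-node partial sums (the second, plaintext-domain step).
import secrets

MOD = 2**32   # hypothetical modulus for the additive arithmetic
J = 3         # number of slices per node (design parameter in SMART)

def slice_datum(m, j):
    """Split m into j random slices that sum to m modulo MOD (first aggregation step)."""
    parts = [secrets.randbelow(MOD) for _ in range(j - 1)]
    parts.append((m - sum(parts)) % MOD)
    return parts

data = {1: 5, 2: 7, 3: 9}                      # toy readings of three mutual neighbors
slices = {i: slice_datum(m, J) for i, m in data.items()}

# Slice k of node i goes to neighbor ((i + k - 1) mod 3) + 1; k = 0 stays with node i.
partial = {i: 0 for i in data}
for i in data:
    for k, s in enumerate(slices[i]):
        receiver = (i + k - 1) % 3 + 1
        partial[receiver] = (partial[receiver] + s) % MOD

# Second step: the plaintext partial sums are added up the multi-level tree at the sink.
assert sum(partial.values()) % MOD == sum(data.values()) % MOD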

4. SECURITY NOTIONS

Three types of oracle queries (adversary interaction with the system) are allowed in the security model, namely, the encryption oracle OE, the decryption oracle OD and the aggregation oracle OA. Their details are as follows:

Encryption Oracle OE(i, m, aux). For fixed encryption and decryption keys, on input an encryption query 〈i, m, aux〉, the encryption oracle retrieves si's encryption key eki and runs the encryption algorithm on m according to the input auxiliary information aux and replies with the ciphertext Eeki(m, r; aux) and its header hdr. In case global random coins are used, the random coins r are part of the query input to OE.

Decryption Oracle OD(hdr, c). For fixed encryption and decryption keys, on input a decryption query 〈hdr, c〉 (where hdr ⊆ ID), the decryption oracle retrieves the decryption key dk and runs the decryption algorithm D and sends the result Ddk(hdr, c) as the reply.

Aggregation Oracle OA(i, (hdr(1), c(1); a(1)), (hdr(2), c(2); a(2)), aux). For fixed encryption and decryption keys, on input an aggregation query 〈i, (hdr(1), c(1); a(1)), (hdr(2), c(2); a(2)), aux〉, the aggregation oracle retrieves si's encryption key eki and runs the aggregation algorithm Aggf on the two input 3-tuples — (hdr(1), c(1); a(1)) and (hdr(2), c(2); a(2)) — according to the provided auxiliary information aux and replies with the output 3-tuple (hdr(3), c(3); a(3)) of Aggf.

The encryption oracle is needed in the security model since the encryption algorithm in some CDA or PDA schemes could use private keys, for example, [Castelluccia et al. 2005; Westhoff et al. 2006; Armknecht et al. 2008; He et al. 2007]. In case the encryption algorithm does not use any secret information, an adversary can freely generate the ciphertext of any message of his choice without relying on the encryption oracle. Similarly, the aggregation oracle is needed because the aggregation algorithm is private in some PDA schemes, for example, the hop-by-hop aggregation scheme and [He et al. 2007]. In schemes whose aggregation function is public, such as all CDA schemes, the adversary can freely perform aggregation without relying on the aggregation oracle.
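Schematically, the three oracles are just stateless wrappers around one fixed key tuple; the sketch below assumes a hypothetical scheme object exposing enc/dec/agg and is not an interface from any cited work.

# Oracle wrappers over fixed keys (dk, ek_1, ..., ek_n) of an abstract CDA/PDA scheme.
class Oracles:
    def __init__(self, scheme, dk, eks):
        self.scheme, self.dk, self.eks = scheme, dk, eks

    def O_E(self, i, m, aux=None):        # encryption oracle O_E(i, m, aux)
        return self.scheme.enc(self.eks[i], m, aux)

    def O_D(self, hdr, c):                # decryption oracle O_D(hdr, c)
        return self.scheme.dec(self.dk, hdr, c)

    def O_A(self, i, t1, t2, aux=None):   # aggregation oracle, needed only if Agg_f is private
        return self.scheme.agg(self.eks[i], t1, t2, aux)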

A number of privacy notions are formulated in this section. Each notion is denoted in the format GOAL-ATTACK. The prefix GOAL defines the security goal to be fulfilled and the suffix ATTACK specifies the type of attacks to be withstood.


4.1 Security against Chosen Ciphertext Attacks (IND-CCA2)

To define security (more precisely, indistinguishability) against adaptive chosen ciphertext attacks (denoted by IND-CCA2), we use the following game played between a challenger and an adversary, assuming there is a set U of n source nodes. If no PPT adversary, even in collusion with at most t compromised nodes (with t < n), can win the game with non-negligible advantage (as defined below), we say the CDA/PDA scheme is t-IND-CCA2-secure.

DEFINITION 4.1. A CDA or PDA scheme is t-secure (indistinguishable) against adaptive chosen ciphertext attacks (t-IND-CCA2-secure) if the advantage of winning the following game is negligible in the security parameter λ for all PPT adversaries.

Setup. The challenger runs KG to generate a decryption key dk and n encryption keys {eki : 1 ≤ i ≤ n}, one for each node si ∈ U.

Collusion Choice. The adversary chooses to corrupt at most t source nodes. When a node si is corrupted, its secret key eki is exposed to the adversary. The adversary can capture these t nodes adaptively in the Query 1 phase below (but no capture is allowed in the Query 2 phase).

Query 1. The adversary can issue to the challenger three types of queries:
—Encryption Query 〈ij, mj, auxj〉. The challenger responds with Eekij(mj; auxj).
—Decryption Query 〈hdrj, cj〉. The challenger responds with Ddk(hdrj, cj).
—Aggregation Query 〈ij, (hdr(1)j, c(1)j; a(1)j), (hdr(2)j, c(2)j; a(2)j), auxj〉. The challenger responds with Aggf(ekij, (hdr(1)j, c(1)j; a(1)j), (hdr(2)j, c(2)j; a(2)j); auxj).

In case global random coins are used, the adversary is allowed to choose and submit his choices of random coins for both encryption and decryption queries. Depending on whether the encryption keys have to be kept secret, encryption queries may or may not be needed. Similarly, aggregation queries are not necessary if the aggregation algorithm is public in a CDA/PDA scheme. The adversary can choose t nodes to corrupt. The node captures can be interleaved with oracle queries. Denote the set of corrupted nodes and the set of their identities by S′ and I′ respectively. After this query phase, the adversary possesses the subset of t encryption keys {ekj : sj ∈ S′}.

Challenge. Once the adversary decides that the first query phase is over, it chooses an aggregation topology aux and selects a subset S of d source nodes (whose identities are in the set I) such that |S\S′| > 0, and outputs aux and two different sets of plaintexts M0 = {m0k : k ∈ I} and M1 = {m1k : k ∈ I} to be challenged. The constraint is that m0k = m1k, ∀k ∈ I′. This constraint implies that the plaintext inputs of compromised nodes have to be the same in both M0 and M1.
The challenger flips a coin b ∈ {0, 1} to select between M0 and M1. The challenger then encrypts each mbk ∈ Mb with ekk and aggregates the resulting ciphertexts in the set {Eekk(mbk; aux) : k ∈ I} to form the ciphertext C of the aggregate xb, that is, Ddk(I, C) = xb, and gives (I, C) (along with the public communication transcript of the current aggregation session) as a challenge to the adversary. In case global random coins are used for encryption, the challenger chooses and passes them to the adversary. If a nonce is used, the global random coins should be chosen different from those used in the Query 1 phase and no encryption query on the same nonce should be allowed in the Query 2 phase.

Query 2. The adversary is allowed to make more queries (all three types of queries) as previously done in the Query 1 phase, but no decryption or private aggregation query can be made on the challenged ciphertext (I, C). Nevertheless, the adversary can still make a decryption query on the header I corresponding to the set S, except that the ciphertext has to be chosen different from the challenged ciphertext C. Similarly, no encryption query on the nonce used in the challenge is allowed.


Guess. Finally, the adversary outputs a guess b′ ∈ {0, 1} for b.

Result. The adversary wins the game if b′ = b. The advantage of the adversary is defined as:

AdvA = | Pr[b′ = b] − 1/2 |.

The definition game specifies the desired properties possessed by a CDA/PDA scheme which is reasonably secure in practice. In the definition game, the adversary is challenged with the task of distinguishing two aggregation sessions (for two possibly different sets of node inputs). The two sets M0, M1 of inputs are chosen by the adversary. That is, the adversary chooses two sets of inputs; when given the ciphertext of the final aggregate and the public communication transcript (consisting of the ciphertexts of all the inputs and intermediate aggregates) of the aggregation session for one of the two input sets, the adversary has to tell which case has occurred. For a particular scheme, when an adversary cannot tell the two cases apart with probability of success non-negligibly greater than 1/2, this means, in essence, that he can learn no information about the final aggregate or the input of any non-compromised node from the ciphertexts or public transmissions in an aggregation session of the scheme.

Note that the above definition assures both end-to-end aggregate privacy and privacy of all individual nodes' inputs. If an adversary is able to tell which of the two possible final aggregates in the challenge is part of the public communication transcript — including the ciphertexts for the final aggregate, the inputs from all individual nodes and all intermediate aggregates — given in the challenge phase, he is able to tell apart the two aggregation cases, thus breaking the indistinguishability goal. Similarly, if an adversary is able to tell which of the two possible inputs of a non-compromised node is included in the communication transcript, he is able to tell apart the two aggregation cases and break the indistinguishability goal.

The adversary is assumed to be capable of controlling the aggregation topology. The adversary is also allowed to adaptively compromise nodes and capture their secret keys. When a node is compromised, the adversary is assumed to have obtained all the secret information possessed by the captured node, including all other information or capabilities which can be derived from this information. In most cases, such as schemes based on private key cryptography [Westhoff et al. 2006; Castelluccia et al. 2005], the adversary is capable of recovering information about the plaintext from a ciphertext encrypted under the encryption key of a compromised node.

Note that, in CDA or PDA, what an adversary is interested in is the information about the final aggregate and the input data of non-compromised nodes. In other words, the definition game should be defined in such a way that winning the game would imply the adversary learns information about one of them. That is, the security goal of an acceptable CDA/PDA scheme is to prevent an adversary from learning information about the final aggregate or the input of any non-compromised node through the capabilities derived from the compromised nodes. Recall that the ability to learn information about the plaintext from a given ciphertext has usually been formulated as the capability to tell apart the ciphertexts of two given plaintexts since [Goldwasser and Micali 1984].

In most application scenarios of CDA or PDA, assuming the adversary is capable of learning the input of a compromised node from its transmission or ciphertext is a reasonable assumption, although there exist schemes (such as the generic CDA construction given in Section 5) which perform better than this.7 As a result, it is sensible to impose a constraint on the adversary's input choices (M0, M1) such that the input of a compromised node has to be the same in both M0 and M1 (that is, m0k = m1k for all k ∈ I′), since learning information about the input of a compromised node seems irrelevant to the security goal.

7 In the generic CDA construction, no efficient adversary can distinguish the ciphertexts of two given plaintexts encrypted under a compromised node's encryption key if the homomorphic encryption used is semantically secure.


The above-mentioned constraint ensures that an adversary wins the game only because he has learnt some information about the inputs of the non-compromised nodes or the final aggregate. Without such a constraint imposed on the adversary's choice of (M0, M1) in the definition game, it may be trivial for an adversary to win the game without learning any information about the inputs of the non-compromised nodes or the final aggregate, in particular, for schemes based on private key cryptography. To illustrate this point, suppose the game is defined without such a constraint. In CDA/PDA schemes based on private key cryptography, an adversary can recover the plaintext of a ciphertext encrypted under a compromised node's encryption key eki; if the adversary submits a choice for M0, M1 such that m0i ≠ m1i, then, by merely working on the part of the communication transcript corresponding to the ciphertexts encrypted under eki, the adversary could learn no information about the inputs of the non-compromised nodes or the final aggregate but still win the game; winning this game only implies that the adversary learns information about the inputs of the compromised nodes. It is clear that a definition which does not exclude such cases, which are trivial for an adversary to win (without gaining new information other than what he originally possesses through the compromised nodes), is meaningless. Nevertheless, there exist schemes, such as the generic construction in Section 5, which remain secure in the definition game defined without the constraint (m0k = m1k for all k ∈ I′) imposed on the adversary's choice; that is, no adversary can win the definition game even without the mentioned constraint.

For similar reasons, no node capture is allowed after the challenge is released. Otherwise, it may be trivial for the adversary to win, as it is assumed that an adversary has knowledge of the plaintext of a given ciphertext encrypted under the encryption key of a compromised node. Of course, there are schemes which could remain secure against adversaries allowed to capture nodes after the challenge phase. The generic construction given in Section 5 is an example. Such schemes are said to be forward secure (Section 4.5) because, when a node is compromised, all its previous transmissions remain secret.

The IND-CCA2 notion can be relaxed in two directions, resulting in two weaker notions — semanticsecurity (IND-CPA) and node privacy (NP). The notion of node privacy considers a less ambitious goalwhich only protects privacy of the inputs of individual nodes while information of the final aggregateand intermediate aggregates may possibly be leaked out. Whereas, the notion of semantic securityassumes a weaker adversary which can only obtain the encryptions of chosen plaintexts but not thedecryptions of chosen ciphertexts (as allowed in the IND-CCA2 notion) while the goal is still to achieveindistinguishability (IND). Since these two relaxations are orthogonal, the weaker adversary assumptionin the semantic security notion can also be applied to the node privacy notion, leading to two types ofnode privacy notions — node privacy against adaptive chosen ciphertext attacks (NP-CCA2) and nodeprivacy against chosen plaintext attacks (NP-CPA).

4.2 Semantic Security (IND-CPA)

Semantic security, which is equivalent to indistinguishability against chosen plaintext attacks (IND-CPA) [Goldwasser and Micali 1984], is defined by the same game as in the definition of IND-CCA2 security in Section 4.1 except that no query to the decryption oracle O_D or the aggregation oracle O_A (if the aggregation algorithm is private) is allowed in either query phase. If the aggregation algorithm is public, no query to the aggregation oracle O_A is necessary for the adversary to obtain the desired result; if the aggregation algorithm relies on a private key ek_i, no query to O_A is allowed. Similar to the definition in Section 4.1, a CDA/PDA scheme is said to be t-IND-CPA-secure when it can still achieve semantic security against any PPT adversary corrupting at most t compromised nodes.

For a CDA scheme to be useful, it should at least achieve semantic security. In the notion of semantic security, the main resource available to an adversary is the encryption oracle O_E. In some schemes, such as [Westhoff et al. 2006; Castelluccia et al. 2005], the adversary may not know the encryption keys, meaning he might not have access to the encryption oracle in the real environment. Nevertheless, in sensor networks, he can still obtain the encryption of any plaintext of his choice by manipulating the sensing environment and recording the sensed value using his own sensors. Hence, chosen plaintext attacks remain a real threat to CDA/PDA.

4.3 Individual Node Privacy (NP-CCA2 and NP-CPA)

Recall that the indistinguishability goal in the IND-CCA2 and IND-CPA notions is formulated based on the adversary's inability to distinguish between two possible aggregation sessions (on two sets of inputs chosen by the adversary) when given the public communication transcript. The indistinguishability goal thus ensures complete privacy of the final aggregate, the inputs of all non-compromised nodes, and all the intermediate aggregates of an aggregation session. More specifically, it assures: (1) no information about the final aggregate or any intermediate aggregate is leaked to any eavesdropper or intermediate aggregating node; (2) each node only has information about its own data and learns nothing about the data of any other node.

When an adversary is able to distinguish between two possible values for the final aggregate, any one of the intermediate aggregates, or any one of the inputs of the non-compromised nodes, corresponding to two possible aggregation sessions, he already gains sufficient information to tell apart the two aggregation sessions, and the indistinguishability goal is not achievable. That is, the indistinguishability goal ensures the highest level of privacy. In many real cases, privacy of the final aggregate could commonly be breached, namely, there could be considerable leakage of information about the final aggregate;8 however, it may still be possible to preserve the privacy of the inputs of part of the non-compromised nodes, which is the node privacy goal. In other words, a CDA/PDA scheme achieving the indistinguishability goal (as described in Sections 4.1 and 4.2) would also achieve the node privacy goal, but the reverse is not necessarily true. In general, the notions of indistinguishability and node privacy are not equivalent. For instance, the PDA schemes in [He et al. 2007] achieve node privacy but not indistinguishability. The relation between the two notions is given in Section 4.6.

Intuitively, it appears that a scheme achieving the node privacy goal also fulfills the indistinguishability goal, provided that the encryption (used to encrypt input data) is secure. In the context of CDA/PDA, this intuition is not true in general. It holds only under the assumption that the protocol itself does not disclose any intermediate aggregate to aggregating nodes, which may possibly be compromised. However, inherent in some designs, such as the PDA schemes in [He et al. 2007], an aggregating node may legitimately have access to the intermediate aggregate it processes; by compromising specific nodes, an adversary could access particular intermediate aggregates which could help him tell which one of two given aggregation scenarios actually happens for a given public communication transcript, thus breaking the indistinguishability goals described in Sections 4.1 and 4.2. It is therefore not true in general that node privacy implies indistinguishability in CDA/PDA.

Node privacy is a less ambitious goal guaranteeing that the data input of a particular node remains secret/private while the final aggregate may be totally exposed to the public. In fact, node privacy is a commonly-adopted design goal for privacy-preserving data aggregation in the literature [Abdelzaher et al. 2007; Ganti et al. 2008; LeMay et al. 2007]. The crux of these applications is that the input privacy of individual nodes of an aggregation session must be preserved while the final aggregate has to be shared with the public as a subscription service. We formalize the notion of node privacy in this section.

8 For instance, when an adversary is able to learn the value of an intermediate aggregate, say, through a compromised node, he may not know the precise value of the final aggregate but he is able to estimate a bound for it, such as a lower bound of the final aggregate value if the aggregation function is sum, average or maximum, assuming inputs are non-negative.


There are two notions of node privacy, namely, node privacy against adaptive chosen ciphertext attacks (NP-CCA2) and node privacy against chosen plaintext attacks (NP-CPA). The same simulation game (except the Challenge phase) as for the IND-CCA2 notion in Section 4.1 defines the NP-CCA2 notion. Similarly, the NP-CPA notion is defined by the simulation game for the IND-CPA notion in Section 4.2, with the Challenge phase modified.

The modification of the Challenge phase is as follows: the adversary selects a target node s_i ∉ S′ and outputs two different plaintexts m0, m1 (where m0 ≠ m1) and a set S of other nodes with identities in I (such that |S\(S′ ∪ {s_i})| > 0) to request a challenge; the challenger flips a coin b ∈ {0, 1} to select between m0 and m1; the challenger encrypts m_b with ek_i to obtain the ciphertext c_b = E_{ek_i}(m_b; aux); the challenger randomly selects a set of arbitrary data inputs for the nodes in S\{s_i} and runs an aggregation session on this selected set and m_b based on the aggregation topology aux chosen by the adversary; the resulting final aggregate x_b = f(..., m_b, ...) and c_b (the encryption of m_b), along with the public communication transcript of the aggregation session for the nodes in {s_i} ∪ S on m_b and the set of arbitrary messages, are given to the adversary as a challenge. The adversary's task is to guess b. A scheme is said to be t-NP-CCA2-secure or t-NP-CPA-secure if no PPT adversary (in collusion with at most t compromised nodes) can win the corresponding modified game with non-negligible advantage.

4.4 One-wayness

One-wayness is the weakest security notion for encryption. A CDA or PDA scheme is t-secure in one-wayness if no PPT attacker, corrupting at most t nodes, is able (with non-negligible probability of success) to recover the plaintext aggregate matching a given ciphertext. To define one-wayness formally, we use the same game as in Section 4.1 except that no query is allowed and the adversary can make no choice in the challenge phase but is given the header and ciphertext of a certain aggregate/message x (encrypted using at least one encryption key not held by the adversary) and asked to recover x.

4.5 Forward Security

Intuitively, a CDA or PDA scheme is forward secure if the security guarantee of a notion is preserved for all previous aggregation sessions (prior to a complete capture) even if all the nodes are captured. For instance, in a forward secure scheme achieving semantic security, even if an adversary captures all the nodes in the system, he would still be unable to learn any information about the input (of a previously non-compromised node) or the final aggregate of a past aggregation session (prior to the complete capture) whose transmissions were recorded by the adversary. The concept of forward security is applicable to all the notions discussed above. That is, a forward secure scheme can achieve semantic security, node privacy or one-wayness. The same set of games is used. A scheme is said to achieve a particular security notion (semantic security, node privacy or one-wayness) with forward security when no PPT adversary can win the corresponding definition game with the following modification: the adversary is allowed to corrupt all the remaining (n − t) nodes in the Query 2 phase.

4.6 Relations between Notions

The relations between different security notions for CDA and PDA are summarized in Figure 1. An arrow represents an implication between notions; more specifically, an arrow from notion A to notion B means that a construction achieving notion A would also achieve notion B. As a result, we only need to prove the strongest notion achievable by a scheme.

Fig. 1. Relations between security notions for CDA and PDA. [Figure: implication arrows among IND-CCA2, IND-CPA (semantic security), NP-CCA2, NP-CPA and one-wayness.]

Although IND-CCA2 security is the most desired goal for CDA and PDA schemes, it may not be achievable, as summarized by Theorem 4.2 below. Intuitively, this impossibility result is a direct consequence of the aggregation functionality, which gives an adversary the ability to modify ciphertexts. In other words, CDA/PDA schemes are inherently malleable. The implications between the notions of non-malleability [Dolev et al. 2000] and indistinguishability [Naor and Yung 1990] against adaptive chosen ciphertext attacks, namely, that indistinguishability is not achievable in any malleable cryptosystem in the presence of adaptive chosen ciphertext attacks, are well-known for both public-key [Bellare et al. 1998] and private-key [Katz and Yung 2006] encryption. Although the impossibility result of Theorem 4.2 may possibly be viewed as a special case of this well-known result for encryption schemes, for the sake of prudence and completeness, we believe it is appropriate to include the impossibility result for CDA/PDA in this paper. The reason is as follows: the semantics of CDA/PDA differ considerably from those of encryption, be it public-key or private-key, and such differences could sometimes lead to different conclusions in cryptography. For instance, despite the high similarity between public-key and private-key encryption, the implication between the notions of non-malleability and indistinguishability is slightly different in the two cases: in public-key encryption, the two notions are always equivalent [Bellare et al. 1998], whereas, in private-key encryption, the two notions are equivalent only if the adversary has access to an encryption oracle, and otherwise indistinguishability implies non-malleability (but not the reverse) [Katz and Yung 2006]. The impossibility result for CDA/PDA is summarized as follows.

THEOREM 4.2. Indistinguishability against adaptive chosen ciphertext attacks (IND-CCA2 security) defined in Definition 4.1 and node privacy against adaptive chosen ciphertext attacks (NP-CCA2 security) cannot be achieved by any CDA or PDA construction with a public aggregation algorithm (i.e. Agg_f uses no secret input) for t ≥ 0.

PROOF. We show below a successful attack that breaks the node privacy notion. Since node privacy is implied by indistinguishability, by a standard contrapositive argument, if the former is broken, so is the latter. The capability to access the decryption oracle O_D is so powerful that a successful attack does not even need information from the part of the transcript corresponding to nodes in S (defined in the node privacy game). That is, with O_D, an adversary can break the node privacy goal by concentrating on the ciphertext of the victim node and neglecting the rest of the transcript given in the challenge. For the sake of clarity, we exclude this part of the transcript for the aggregation session involving nodes in S in the following discussion; without loss of generality, |S| could be taken as 0.

We assume the adversary chooses a particular node, say, s_i, to attack in the Challenge phase of the node privacy game. The adversary chooses m0, m1 and obtains a challenge ({i}, C) where C is the ciphertext of either m0 or m1. Let the message chosen by the challenger be m ∈ {m0, m1}. The adversary wins if he can tell whether C is a ciphertext of m0 or m1. After receiving the challenge, the adversary chooses m′ such that f(m0, m′) ≠ f(m1, m′). The adversary can easily obtain from another node s_j (where j ≠ i) the ciphertext C′ for m′ for the same aggregation session. This can be done either through a captured node or via the encryption oracle O_E (in the Query 2 phase if allowed). If the aggregation algorithm Agg_f is public, the adversary can freely aggregate any pair of ciphertexts, including the pair ({i}, C) and ({j}, C′). Suppose C′′ is the resulting output when running Agg_f on ({i}, C) and ({j}, C′). Note that {i, j} ≠ I in the challenge and C′′ ≠ C; that is, C′′ is a valid query to the decryption oracle. Besides, C′′ is a valid ciphertext for the aggregate f(m, m′) (where m is either m0 or m1). Through the decryption oracle O_D in the Query 2 phase, the adversary can obtain the plaintext M whose ciphertext under the header hdr = {i, j} is C′′. It can be seen that M = f(m, m′). The adversary can compute f(m0, m′) and f(m1, m′); by comparing M with these values, the adversary can tell whether m is equal to m0 or m1, thus winning the challenge in the node privacy game. That means NP-CCA2 security is not achievable with a public aggregation algorithm. This in turn implies that IND-CCA2 security cannot be achieved by any scheme with a public aggregation algorithm.
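To make the attack concrete, the following sketch (a toy illustration, not any particular published scheme) replays it against a hypothetical CMT-like additive cipher in which each node holds a secret key, aggregation is public modular addition, and the decryption oracle O_D is modelled as an ordinary function; the scheme, node identities and helper names are illustrative assumptions.

# Sketch of the Theorem 4.2 attack against a hypothetical additive cipher.
# enc() plays a node's encryption, agg() the public Agg_f (f = sum), and
# decrypt_oracle() the decryption oracle O_D held by the sink.
import random

p = 2**32
keys = {i: random.randrange(p) for i in range(1, 4)}   # node identities 1..3

def enc(i, m):                 # ciphertext of node i
    return ({i}, (m + keys[i]) % p)

def agg(c1, c2):               # public aggregation algorithm
    (h1, v1), (h2, v2) = c1, c2
    return (h1 | h2, (v1 + v2) % p)

def decrypt_oracle(c):         # models O_D
    hdr, v = c
    return (v - sum(keys[i] for i in hdr)) % p

# Challenge phase of the node privacy game: victim node i = 1.
m0, m1 = 17, 42
b = random.randrange(2)
challenge = enc(1, [m0, m1][b])

# Adversary: pick m' with f(m0, m') != f(m1, m'), obtain its ciphertext from
# node j = 2 (e.g. via the encryption oracle), aggregate, and query O_D.
m_prime = 5
c_double_prime = agg(challenge, enc(2, m_prime))       # valid query, != challenge
M = decrypt_oracle(c_double_prime)                     # equals f(m_b, m')
guess = 0 if M == (m0 + m_prime) % p else 1
assert guess == b                                      # adversary always wins

The adversary never touches the victim node's key; the public aggregation algorithm and a single decryption query suffice.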

Note that Theorem 4.2 does not rule out the possibility of achieving IND-CCA2 security through stateful decryption with nonces; that is, the decryption algorithm keeps track of the legitimate ciphertext of the final aggregate for each nonce used and denies the decryption of all other ciphertexts for the same nonce. However, there are obvious difficulties in implementing such an idea.

5. A GENERIC CDA CONSTRUCTION

A generic construction of semantically secure CDA (using local random coins) is given, based on any semantically secure public-key homomorphic encryption scheme. The result is not surprising but could be useful; in particular, the resulting construction can provide forward privacy for data in previous aggregation sessions when all the nodes in the system are compromised. Asymmetric-key homomorphic encryption is used in this construction, compared to the symmetric-key homomorphic encryption in the WGA construction [Westhoff et al. 2006]. Asymmetric-key encryption is necessary in order to withstand possible insider attacks from compromised nodes.

5.1 Public Key Homomorphic Encryption

A public key homomorphic encryption scheme is a 4-tuple (KG, E, D, A). The key generation algorithm KG receives the security parameter 1^λ as input and outputs a pair of public and private keys (pk, sk). E and D are the encryption and decryption algorithms. Given a plaintext x and random coins r, the ciphertext is E_pk(x, r) and D_sk(E_pk(x, r)) = x. The homomorphic property allows one to operate on the ciphertexts using the poly-time algorithm A without first decrypting them; more specifically, for any x, y, r_x, r_y, A can generate from E_pk(x, r_x) and E_pk(y, r_y) a new ciphertext of the form E_pk(x ⊗ y, s) for some random coins s. The operator ⊗ could be addition, multiplication or others depending on the specific scheme; for instance, it is multiplication for RSA [Rivest et al. 1978] or El Gamal [El Gamal 1985] and addition for Paillier [Paillier 1999].
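As a concrete illustration of the (KG, E, D, A) interface, the sketch below uses textbook RSA with toy parameters; textbook RSA is deterministic and therefore not semantically secure, so this only demonstrates how the algorithm A combines ciphertexts under a multiplicative ⊗, not how to instantiate the construction of Section 5.2.

# Toy illustration of the (KG, E, D, A) interface with textbook RSA, whose
# homomorphism is multiplicative: A(E(x), E(y)) decrypts to x*y mod n.
def keygen():                       # KG: toy parameters, not for real use
    p, q, e = 61, 53, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)           # (pk, sk)

def enc(pk, x):                     # E
    n, e = pk
    return pow(x, e, n)

def dec(sk, c):                     # D
    n, d = sk
    return pow(c, d, n)

def A(pk, cx, cy):                  # homomorphic operation on ciphertexts
    n, _ = pk
    return (cx * cy) % n

pk, sk = keygen()
cx, cy = enc(pk, 6), enc(pk, 7)
assert dec(sk, A(pk, cx, cy)) == 42      # x ⊗ y with ⊗ = multiplication mod n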

As observed in previous work in the literature, due to the homomorphic property, achieving IND-CCA2 security may be impossible for homomorphic encryption. The notion of security against CCA1 attacks is not often considered in practice. Hence, semantic security, or the equivalent notion of IND-CPA security, appears to be the de facto security notion for homomorphic encryption schemes. In brief, the IND-CPA notion for homomorphic encryption can be described by the following game:

Setup. The challenger runs KG(1^λ) to generate a pair of public and private keys, gives the public key to the adversary but keeps the private key.

Query 1. The adversary can freely encrypt any message of his choice using the public key. Using the public algorithm A, the adversary can freely aggregate any pair of ciphertexts.

Challenge. The adversary chooses two different messages m0, m1 and gives them to the challenger, which flips a coin b ∈ {0, 1} and gives E_pk(m_b; r) to the adversary as the challenge. The adversary's task is to guess b.

Query 2. The adversary can perform more encryptions and aggregations as he wishes.


Guess. Eventually, the adversary has to output a guess b′ for b, and his advantage of winning the game is defined as |Pr[b′ = b] − 1/2|.

DEFINITION 5.1. A public key homomorphic encryption scheme is said to be semantically secure or IND-CPA secure if the advantage of winning the above game is negligible in the security parameter λ for all PPT adversaries.

5.2 CDA based on Public Key Homomorphic Encryption

Assume there are n source nodes in total. Suppose there exists a semantically secure public-key homomorphic encryption scheme (KG^HE, E^HE, D^HE, A^HE) with homomorphism on the operator ⊗. We can construct a semantically secure CDA scheme, tolerating up to (n − 1) compromised nodes, with an aggregation function of the form f(m_i, m_j) = m_i ⊗ m_j. The construction is as follows. (The headers are included in the description for completeness; they are not needed in the construction. In fact, all these hdr_i's are the empty set φ.)

Key Generation (KG). Run KG^HE(1^λ) to generate (pk, sk). Set the CDA decryption key dk = sk and each one of the CDA encryption keys to be pk, that is, ek_i = pk, ∀i ∈ [1, n].

Encryption (E). Given plaintext data m_i, toss the random coins r_i needed for E^HE and output c_i = E^HE_pk(m_i, r_i). Set the header hdr_i = φ. Output (hdr_i, c_i).

Decryption (D). Given an encrypted aggregate c and its header hdr, run D^HE using the private key sk to decrypt c and output x = D^HE_sk(c) as the plaintext aggregate.

Aggregation (Agg). Given two CDA ciphertexts (hdr_i, c_i) and (hdr_j, c_j), the aggregation can be done using the homomorphic property of the encryption. Generate c_l = A^HE(c_i, c_j) and hdr_l = hdr_i ∪ hdr_j. Output (hdr_l, c_l).

Correctness. Without loss of generality, we consider the case with only two plaintexts m_i and m_j and ignore the header part as it is always equal to φ. The corresponding ciphertexts for m_i and m_j are c_i = E^HE_pk(m_i, r_i) and c_j = E^HE_pk(m_j, r_j) for some random coins r_i, r_j. If the aggregation is done using Agg as described above, the aggregation result c_l should be equal to E^HE_pk(m_i ⊗ m_j, s) for some s. In essence, this value is E^HE_pk(f(m_i, m_j), s). By the correctness property of the homomorphic encryption scheme, D^HE_sk(c_l) gives back m_i ⊗ m_j, which is the aggregate f(m_i, m_j).
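The construction above is mechanical enough to be captured in a few lines. The sketch below assumes a homomorphic encryption object exposing keygen(), enc(pk, m), dec(sk, c) and add(pk, c1, c2) (hypothetical interface names); any semantically secure homomorphic scheme with this shape, such as the toy Paillier sketch after Example 5.3, could be plugged in.

# Minimal sketch of the generic CDA construction on top of a homomorphic
# encryption object `he` with keygen()/enc()/dec()/add() (assumed interface).
# Headers are kept only for completeness; they are always the empty set.
class GenericCDA:
    def __init__(self, he):
        self.he = he

    def key_gen(self, n):
        pk, sk = self.he.keygen()
        return [pk] * n, sk                  # ek_i = pk for all i, dk = sk

    def encrypt(self, ek_i, m_i):
        return (frozenset(), self.he.enc(ek_i, m_i))      # (hdr_i, c_i)

    def aggregate(self, ek, ct_i, ct_j):
        (hdr_i, c_i), (hdr_j, c_j) = ct_i, ct_j
        return (hdr_i | hdr_j, self.he.add(ek, c_i, c_j))

    def decrypt(self, dk, ct):
        _hdr, c = ct
        return self.he.dec(dk, c)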

Security. The security of this construction is summarized as follows, with the proof in Appendix A.

THEOREM 5.2. For a total of n nodes in the system, the generic CDA construction is semantically secure against any collusion of at most (n − 1) compromised nodes and provides forward security/privacy for the data inputs and aggregates of all previous aggregation sessions when all the n nodes are compromised, assuming that the underlying public key homomorphic encryption scheme is semantically secure (according to Definition 5.1).

When all n nodes in the system are captured, the generic CDA construction still provides a strong security guarantee: without knowing the randomness used in generating a ciphertext, the adversary would not be able to tell apart the ciphertexts of two given messages (that is, the adversary could gain no information about the plaintext from a given ciphertext) even with access to all the encryption keys stored in the n nodes. Note that what an adversary gains from a captured node in the generic CDA construction is the public key of the homomorphic encryption, which provides no additional information (not known to the adversary before).

For the case of n captured nodes, what the generic CDA construction withstands is the following: even if an adversary compromises all the n nodes in the system, he still cannot gain any information about the aggregate of a past aggregation session from previously recorded transmissions between nodes. In other words, given the ciphertext c of an aggregate m which was sent in an aggregation session before the adversary captured all the n nodes (i.e. when there was at least one non-compromised node), the adversary (after capturing all the n nodes) would still have no additional advantage in gaining information about m from c with the help of all the n compromised nodes, unless these nodes keep a history of previously sent plaintext data or used random coins. Of course, the adversary would have access to all the aggregates afterwards by reading from the captured nodes' memory directly. In conclusion, the generic CDA construction provides forward security (privacy protection of all past data before a complete compromise of participants), which is not achievable in the other CDA or PDA schemes. This stronger privacy protection usually comes at the cost of higher battery consumption (for ciphertext transmissions) and a larger computational overhead, since most public key homomorphic encryption schemes have a significantly larger ciphertext than typical symmetric key encryption schemes.

Example. To demonstrate the use of the generic CDA construction, we show an example below.

EXAMPLE 5.3. To obtain a CDA scheme supporting additive aggregation, an additively homomorphic encryption scheme, say, the Paillier cryptosystem [Paillier 1999], is chosen. For 2^80 security, we need a 1024-bit modulus n (a product of two large primes) for the Paillier cryptosystem. The ciphertext length is 2048 bits. The security of the Paillier cryptosystem is based on the Composite Residuosity assumption, and so is that of the resulting CDA scheme.
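A minimal Paillier sketch is given below to make the example concrete; the primes are tiny toy values (nothing near the 1024-bit modulus the example calls for) and the code is for illustration only.

# Toy Paillier sketch: enc(pk, x) * enc(pk, y) mod N^2 decrypts to x + y mod N,
# which is exactly the additive aggregation the CDA scheme needs.
import math, random

def keygen(p=1789, q=2003):                     # toy primes, illustration only
    N = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow((pow(N + 1, lam, N * N) - 1) // N, -1, N)
    return N, (N, lam, mu)                      # (pk, sk), with g = N + 1

def enc(pk, m):
    N = pk
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N * N) * pow(r, N, N * N)) % (N * N)

def add(pk, c1, c2):                            # homomorphic addition (Agg)
    N = pk
    return (c1 * c2) % (N * N)

def dec(sk, c):
    N, lam, mu = sk
    return ((pow(c, lam, N * N) - 1) // N * mu) % N

pk, sk = keygen()
c = add(pk, enc(pk, 20), enc(pk, 22))
assert dec(sk, c) == 42

Plugged into the generic construction, add() realizes Agg and the sum of the nodes' inputs is recovered by a single decryption at the sink.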

6. SECURITY ANALYSIS OF EXISTING SCHEMES

We analyze five schemes in the literature in the proposed security model. They are WGA [Westhoff et al. 2006], CMT and its hashed variant [Castelluccia et al. 2005; Castelluccia et al. 2009], AWGH [Armknecht et al. 2008], CPDA [He et al. 2007] and SMART [He et al. 2007]. All five schemes are designed to support additive aggregation. WGA, CMT and AWGH are CDA schemes using a tree as the aggregation topology, whereas CPDA and SMART are general PDA schemes based on non-tree aggregation topologies. Whenever possible, we state the strongest security notion achieved by these schemes. The security claims in this section depict the guarantee provided by the relevant schemes in the worst case scenario (corresponding to the easiest task for the adversary).

6.1 WGA

WGA [Westhoff et al. 2006] uses Domingo-Ferrer's symmetric-key homomorphic encryption [Domingo-Ferrer 2002] as a building block. Each node uses the same encryption key ek and the sink's decryption key dk = ek. The security of WGA is summarized by the following theorem, with the proof given in Appendix B.

THEOREM 6.1. When there is no compromised node (that is, t = 0), if the underlying symmetric-key homomorphic encryption is semantically secure, then WGA achieves semantic security or is 0-IND-CPA-secure. WGA is insecure when t > 0.

6.2 CMT and its Hashed Variant

CMT [Castelluccia et al. 2005] can be considered as a practical modification of the Vernam cipher or one-time pad [Vernam 1926] to allow plaintext addition to be done in the ciphertext domain. Basically, two modifications are made in CMT. First, the exclusive-OR operation is replaced by an addition operation. By choosing a proper modulus, multiplicative aggregation is also possible.9 Second, instead of uniformly picking a key at random from the key space, the key is generated by a deterministic algorithm, namely, a pseudorandom function [Goldreich et al. 1986]. As a result, the information-theoretic security (requiring the key to be at least as long as the plaintext) of the Vernam cipher is replaced with a security guarantee in the computational-complexity-theoretic setting in CMT.

9 CMT can support either additive or multiplicative aggregation but not both at the same time. For multiplicative aggregation, there is no gain in the communication overhead since the ciphertext for an aggregate of n pieces of data has to be at least n times the data length. It is more efficient to pass each piece of (encrypted) data to the sink without aggregation done en route.

In the following discussion, let F = {F_λ}_{λ∈N} be a pseudorandom function (PRF) family, where F_λ = {f_s : {0, 1}^λ → {0, 1}^λ}_{s∈{0,1}^λ} is a collection of functions indexed by a key s ∈ {0, 1}^λ. For details on PRFs, [Goldreich 2001] has a comprehensive description. Loosely speaking, given a function f_s from a PRF ensemble with an unknown key s, any PPT distinguisher procedure allowed to get the values of f_s(·) at (polynomially many) arguments of its choice should not be able to tell (with non-negligible advantage in λ) whether the answer to a new query (with an argument not queried before) is supplied by f_s or randomly picked from {0, 1}^λ.

Suppose there are n nodes in the system. The basic CMT scheme operates as follows.10 Let p be a sufficiently large integer used as the modulus. Assume the key length is λ bits; then p could be 2^λ. Besides, global random coins are used in CMT, that is, the sink chooses and broadcasts a public nonce to all nodes. While the random coins can only be used once (a new set of random coins for each new session), there is no randomness requirement on the random coins (nonce).

Key Generation (KG). Randomly pick K ∈ {0, 1}^λ and set it as the decryption key dk. For each i ∈ [1, n], ek_i = f_K(i) is the encryption key for node s_i with identity i.

Encryption (E). Given an encryption key ek_i, plaintext data m_i and a broadcast nonce r from the sink, output c_i = (m_i + f_{ek_i}(r)) mod p. Set the header hdr_i = {i}. Output (hdr_i, c_i). Note: each r has to be used for one aggregation session only.

Decryption (D). Given the ciphertext (hdr, c) of an aggregate and the nonce r used in encryption, generate ek_i = f_K(i), ∀i ∈ hdr. Output the plaintext aggregate x = (c − Σ_{i∈hdr} f_{ek_i}(r)) mod p.

Aggregation (Agg). Given two CDA ciphertexts (hdr_i, c_i) and (hdr_j, c_j), compute c_l = (c_i + c_j) mod p and hdr_l = hdr_i ∪ hdr_j, and output (hdr_l, c_l).

The basic CMT scheme is semantically secure. As can be seen, a ciphertext of the basic CMT scheme has to be λ bits long, where λ is the security parameter (the key length) of the underlying PRF.11 In order to maintain a small ciphertext size, it is shown in [Castelluccia et al. 2009] that the output of the PRF can be hashed down by some hash function h : {0, 1}^λ → {0, 1}^l, where l is the size of the maximum plaintext aggregate. Note that the parameter l should still be chosen large enough to ensure a reasonably low probability of success for a random guess. For a given plaintext m_i, a nonce r and an encryption key ek_i, the ciphertext of the hashed CMT is c_i = (m_i + h(f_{ek_i}(r))) mod p′, where |p′| = l. The decryption algorithm is modified accordingly to hash the output of the PRF and then subtract the hash values from the ciphertext. [Castelluccia et al. 2009] shows that the semantic security of the basic CMT scheme is preserved if the hash function h satisfies the following property: {t ← {0, 1}^λ : h(t)} is a uniform distribution over {0, 1}^l.
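The following sketch implements the basic CMT scheme with HMAC-SHA256 standing in for the PRF f_s (an assumption; the scheme only requires some PRF) and notes the hash-down of the hashed variant in a comment; the key size and modulus are illustrative.

# Sketch of basic CMT with HMAC-SHA256 as the PRF f_s (an assumed choice).
# A nonce must never be reused across aggregation sessions.
import hmac, hashlib, os

LAMBDA = 32                              # key length in bytes (lambda = 256 bits)
P = 2 ** (8 * LAMBDA)                    # modulus p = 2^lambda

def prf(key, data):
    return int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), 'big')

def key_gen(n):
    K = os.urandom(LAMBDA)                                   # decryption key dk
    ek = {i: hmac.new(K, str(i).encode(), hashlib.sha256).digest()
          for i in range(1, n + 1)}                          # ek_i = f_K(i)
    return K, ek

def encrypt(ek_i, i, m_i, nonce):
    return ({i}, (m_i + prf(ek_i, nonce)) % P)               # (hdr_i, c_i)

def aggregate(ct_i, ct_j):
    (h_i, c_i), (h_j, c_j) = ct_i, ct_j
    return (h_i | h_j, (c_i + c_j) % P)

def decrypt(K, ct, nonce):
    hdr, c = ct
    pads = sum(prf(hmac.new(K, str(i).encode(), hashlib.sha256).digest(), nonce)
               for i in hdr)
    return (c - pads) % P

# Hashed variant: replace prf(...) by h(prf(...)) with h mapping onto l bits,
# e.g. h = lambda t: t % (2 ** l), and reduce the modulus to p' with |p'| = l.

K, ek = key_gen(3)
nonce = os.urandom(16)
ct = aggregate(encrypt(ek[1], 1, 10, nonce), encrypt(ek[2], 2, 32, nonce))
assert decrypt(K, ct, nonce) == 42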

The security of the basic CMT scheme and its hashed variant is summarized by the following two theorems, the proofs of which can be found in Appendices C and D respectively. It should be emphasized that, to achieve such security strength in CMT-based schemes, it is essential that the global random coins or nonce never be re-used.

10 The description is slightly different from the original scheme [Castelluccia et al. 2005] as the procedure to generate encryption keys from a PRF is filled in.
11 This restriction is a result of using a strict definition for PRFs in this paper.


THEOREM 6.2. The basic CMT scheme is semantically secure against any collusion with at most (n − 1) compromised nodes (that is, (n − 1)-IND-CPA-secure), assuming F_λ = {f_s : {0, 1}^λ → {0, 1}^λ}_{s∈{0,1}^λ} is a pseudorandom function and a new nonce is used for each session.

THEOREM 6.3. The hashed CMT scheme is (n − 1)-IND-CPA-secure, assuming F_λ = {f_s : {0, 1}^λ → {0, 1}^λ}_{s∈{0,1}^λ} is a pseudorandom function, a new nonce is used for each session, and the hash function h : {0, 1}^λ → {0, 1}^l satisfies the following criterion: {t ← {0, 1}^λ : h(t)} is a uniform distribution over {0, 1}^l.

6.3 AWGH

For correct decryption in the presence of possible silent nodes, it is a requirement in CMT-based schemes [Castelluccia et al. 2005; Castelluccia et al. 2009] that the sink knows the exact list of participating nodes in an aggregation session, leading to an O(n) communication overhead for transmitting node identities to the sink, where n is the number of nodes in the system. To relieve this burden, [Armknecht et al. 2008] focuses on scenarios wherein there is little incentive for an attacker to physically compromise sensor nodes, and proposes an architecture, composed of a newly defined cryptographic primitive called bihomomorphic encryption and a tailored key management scheme, which avoids sending identities to the sink, significantly reducing the communication overhead to O(log n).

AWGH works on a fixed aggregation topology.12 The basic idea of AWGH is that each node s_i stores beforehand a number of ciphertexts, one set for each of its child nodes, and uses them to replenish missing contributions (due to silent nodes) in an aggregation session when necessary. Each of these pre-stored ciphertexts is supposed to be the encryption of an intermediate aggregate (whose value r is fixed and known to the sink only) to be used by s_i in an aggregation session as if it were the encryption of an aggregate incorporating all the contributions from nodes in the subtree rooted at the corresponding child node of s_i. When a child node s_j goes silent in an aggregation session, s_i uses the corresponding pre-stored ciphertext to substitute the missing contributions from s_j and all the nodes in the subtree rooted at s_j, and takes r as the aggregate of these missing contributions. In essence, the aggregation is performed as if all nodes participate in all aggregation sessions and, in each session, the sink receives the ciphertext of a final aggregate which is the aggregate of the actual contributions (from reporting nodes) and a multiple of the value r (replenishing the contributions from silent nodes). As a result, it is not necessary for the sink to know the identities of the actual reporting nodes or silent nodes, and only a count, keeping track of the number of times r is incorporated into the final aggregate, needs to be sent to the sink, which can then decrypt the encrypted aggregate correctly by assuming all nodes participate in each aggregation session.

The security guarantee of AWGH is wholly determined by the security strength of the bihomomorphic encryption used, more specifically, by the amount of correlated information between ciphertexts encrypted under the same key and how easily this information can be extracted. While in principle any bihomomorphic encryption can be used, [Armknecht et al. 2008] only gives one instantiation, obtained by replacing the exclusive-OR operation of the Vernam one-time pad [Vernam 1926] with an addition operation, and it is the only instantiation available in the literature. As a consequence, AWGH inherits the security constraint of one-time pads, considerably undermining its security when compromised nodes exist. The authors of [Armknecht et al. 2008] also do not recommend the use of AWGH in scenarios where physically compromising sensor nodes is feasible ([Armknecht et al. 2008], Section 10.2.4). As compared to CMT-based schemes, AWGH has a significantly lower communication overhead, but its resilience against compromised nodes is also much lower.

12 More precisely, AWGH works on different subtrees of a fixed topology graph wherein the edges between nodes do not change over the lifetime of a sensor network, except for possible deletions due to depleted nodes. Without loss of generality, we assume all aggregation sessions are carried out over the same topology in this section.

In the following, we base our discussion and analysis on the AWGH instantiation given in [Armknecht et al. 2008]. The instantiation gives two protocols for key updating. When key updating is performed properly, keys used in other sessions should reveal minimal or practically no information about the keys used in a particular session. We therefore skip the details of the key update protocols and treat the keys used in any session as independent from those used in other sessions. AWGH operates as follows.

Suppose there are n nodes in the system. Let p be a sufficiently large integer used as the modulus, as in CMT. Assume p is λ bits long. Note that p should be chosen such that p > n. Let T be the set of all possible aggregation sessions and ST(s_j) denote the subset of nodes in the subtree rooted at s_j.

Key Generation (KG). Randomly pick r ∈ Z_p^*. For each session t ∈ T and each i ∈ [1, n], randomly pick an encryption key ek_i^(t) to be used in session t by node s_i with identity i. Give the sink a set of decryption keys {dk^(t) : t ∈ T} where dk^(t) = Σ_{i=1}^{n} ek_i^(t) mod p. Give each s_i the set K_i = {ek_i^(t) : t ∈ T} and a set of ciphertext sets V_i = {V_j : s_j is a child node of s_i} where V_j = {v_j^(t) : t ∈ T} and each v_j^(t) = (Σ_{s_q∈ST(s_j)} ek_q^(t) + r) mod p.13 Note that the encryption key of s_i includes K_i and V_i.

Encryption (E). For a session t, given an encryption key ek_i^(t) and plaintext data m_i, compute v_i = (m_i + ek_i^(t)) mod p and set cnt_i = 0. The ciphertext is c_i = (v_i, cnt_i). Note that cnt_i is used to count the number of times r is incorporated in a final aggregate, and hdr = φ in AWGH.

Decryption (D). Given the ciphertext c = (v, cnt) of an aggregate, retrieve the necessary session key dk^(t). Output the plaintext aggregate x = (v − dk^(t) − cnt · r) mod p.

Aggregation (Agg). Given aux and two CDA ciphertexts (hdr_i, c_i; a_i) and (hdr_j, c_j; a_j) where hdr_i = hdr_j = φ, c_i = (v_i, cnt_i) and c_j = (v_j, cnt_j), compute v_l = (v_i + v_j) mod p and cnt_l = (cnt_i + cnt_j) mod p. Set c_l = (v_l, cnt_l) and hdr_l = φ. Combine and update a_i, a_j based on aux to give a_l. Output (hdr_l, c_l; a_l). Note that the two ciphertexts (hdr_i, c_i; a_i) and (hdr_j, c_j; a_j) could be either received from a reporting child node of the aggregating node s_l or retrieved from the pre-stored ciphertext set V_l stored at s_l to substitute missing contributions from silent nodes.

Based on the topology information in aux, each aggregating node s_l can determine which of its child nodes are silent in an aggregation session after receiving the ciphertexts from all its child nodes which report in that session. s_l then runs Agg successively to aggregate all the contributions from the reporting nodes and the pre-stored ciphertexts selected to replenish the missing contributions of the silent nodes. Note that a_i and a_j contain information about the identities of s_l's child nodes whose contributions have been incorporated in the two ciphertexts c_i and c_j; at the end of running Agg, a_l includes all the child nodes' identities originally in a_i and a_j.
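A compressed sketch of the AWGH instantiation for a single session is given below; the key update protocols, the header bookkeeping in aux and the distributed key generation are omitted, and the topology, key values and variable names are illustrative assumptions.

# Compressed sketch of AWGH for one session.  Keys ek_i, the value r, and the
# pre-stored ciphertexts V[j] are set up as in KG; cnt counts how often r is used.
import random

p = 2 ** 64
n = 7                                    # node identities 1..n; s_1 aggregates for the sink
subtree = {2: {2, 4, 5}, 3: {3, 6, 7}}   # toy topology: children of s_1 are s_2, s_3

r = random.randrange(1, p)
ek = {i: random.randrange(p) for i in range(1, n + 1)}
dk = sum(ek.values()) % p                                  # sink's session key
V = {j: (sum(ek[q] for q in subtree[j]) + r) % p for j in subtree}   # pre-stored at s_1

def enc(i, m):                    # a node's contribution
    return ((m + ek[i]) % p, 0)   # (v_i, cnt_i)

def agg(c1, c2):
    return ((c1[0] + c2[0]) % p, (c1[1] + c2[1]) % p)

def dec(c):
    v, cnt = c
    return (v - dk - cnt * r) % p

# Session where the whole subtree of s_3 is silent: s_1 substitutes (V[3], 1).
data = {1: 5, 2: 7, 4: 11, 5: 6}
c = enc(1, data[1])
c = agg(c, agg(agg(enc(2, data[2]), enc(4, data[4])), enc(5, data[5])))
c = agg(c, (V[3], 1))             # replenish the missing contributions
assert dec(c) == sum(data.values()) % p   # sink recovers the reporting nodes' aggregate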

The security of the AWGH instantiation [Armknecht et al. 2008] is summarized by the following theorem, with its proof and the details of the potential attacks given in Appendix E.

THEOREM 6.4. Suppose the encryption keys used in different aggregation sessions are independent. If the modified Vernam cipher is used to instantiate the bihomomorphic encryption and the same value of r is used system-wide for all pre-stored ciphertexts, then the resulting AWGH instantiation achieves semantic security or is 0-IND-CPA-secure when there is no compromised node (i.e. t = 0), assuming the adversary has no knowledge of the encryption keys used. When there exists at least one compromised node (i.e. t > 0), the AWGH instantiation becomes insecure with respect to both indistinguishability and node privacy. By compromising all the child nodes and one of the grandchild nodes of the sink, an adversary is able to recover the final aggregate of any aggregation session.

13 Since distributing the V_j's requires knowledge of the topology of a deployment, [Armknecht et al. 2008] performs distributed key generation after the network has been deployed, which starts with choosing dk^(t) and then splits it to form each share ek_i^(t). Nonetheless, the two key generation methods arrive at identically the same joint probability distribution for the assigned keys.

Recall that the security of AWGH hinges on the security strength of the bihomomorphic encryption used. AWGH might potentially achieve a similar level of security as CMT-based schemes do, namely, semantic security against (n − 1) compromised nodes, if a sufficiently secure instantiation of bihomomorphic encryption could be constructed. However, replacing the exclusive-OR operation of the Vernam one-time pad by an addition operation, as done in [Armknecht et al. 2008], is the only available instantiation in the literature, and finding other lightweight instantiations of bihomomorphic encryption remains an open problem. It is thus fair to conclude that, despite the resulting efficiency, AWGH is not suitable for use in scenarios wherein node compromises are expected.

It is widely known that the Vernam one-time pad achieves perfect secrecy in the information-theoretic sense, but with the constraint that a key cannot be used to encrypt two different messages. This security guarantee and limitation is inherited by the bihomomorphic encryption used in [Armknecht et al. 2008]. However, embedded in the AWGH architecture is the fact that each key could potentially be used twice to encrypt different messages: a newly sensed value and the system-wide value r (in the pre-stored ciphertexts for replenishing missing contributions). Consequently, when an adversary is able to compromise a node, say s_i, and obtain all the ciphertexts for r (encrypted under different unknown keys) stored at s_i, he has learnt some information. In fact, this information is sufficient for him to tell apart two aggregation scenarios. More precisely, this information allows the adversary to determine, from their ciphertexts, the difference of the two plaintext intermediate aggregates obtained at any two child nodes of s_i in any aggregation session, even though the two child nodes are not compromised. As a result, it is easy for the adversary to tell apart two given aggregation scenarios. When the adversary also compromises the parent node of s_i, he would be able to recover all the aggregate keys of the child nodes of s_i.
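The observation in the preceding paragraph can be checked directly; the sketch below is a simplified setting (the full attack is left to Appendix E) showing how the pre-stored ciphertexts read from a compromised s_i reveal the difference of two children's plaintext intermediate aggregates. All values and names are illustrative.

# Sketch: an adversary who compromises s_i learns the pre-stored ciphertexts
# v_j = sum(ek_q) + r for each child s_j, so differences of the children's
# transmitted values leak differences of the plaintext intermediate aggregates.
import random

p = 2 ** 64
ek = {q: random.randrange(p) for q in range(1, 7)}
r = random.randrange(1, p)
subtree = {1: {1, 2, 3}, 4: {4, 5, 6}}          # two child subtrees of s_i

# Pre-stored at the compromised s_i (read from its memory by the adversary):
v = {j: (sum(ek[q] for q in subtree[j]) + r) % p for j in subtree}

# Intermediate aggregates transmitted by the two (non-compromised) children:
agg_plain = {1: 100, 4: 58}                      # unknown to the adversary
c = {j: (agg_plain[j] + sum(ek[q] for q in subtree[j])) % p for j in subtree}

# Adversary's computation: (c_1 - v_1) - (c_4 - v_4) = agg_1 - agg_4 mod p.
diff = ((c[1] - v[1]) - (c[4] - v[4])) % p
assert diff == (agg_plain[1] - agg_plain[4]) % p   # difference of plaintexts leaked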

6.4 SMART and CPDA

SMART (Slice-Mix-AggRegaTe) and CPDA (Cluster-based Private Data Aggregation) [He et al. 2007] are general private data aggregation schemes using non-tree aggregation topologies. SMART uses the graph-sum of a number of trees, whereas CPDA uses a tree with a complete subgraph (for each cluster) attached to each of its leaves. For both schemes, the aggregation is done in two steps with different aggregation procedures applied. CPDA and SMART differ in the first step. In the second step, normal plaintext aggregation [Madden et al. 2002] is performed over an aggregation tree on data which are the results of the first step. As can be seen, neither scheme protects the secrecy of the final aggregate, and hence neither fulfills the semantic security notion in Section 4.2. The operations of the first aggregation step of the two schemes are as follows.

[SMART] Suppose each data input is represented as a number in some finite group Z_p (with integer p) and the arithmetic is done mod p. In the first aggregation step of SMART, each node s_i splits its private data input m_i into J random pieces m_ij′ (1 ≤ j′ ≤ J) such that Σ_{j′=1}^{J} m_ij′ = m_i. Node s_i keeps one piece, say, m_ii′ (where 1 ≤ i′ ≤ J), and sends the other (J − 1) pieces m_ij′ (where 1 ≤ j′ ≤ J and j′ ≠ i′) through a secure channel, via symmetric key encryption, to (J − 1) other nodes (denoted by s_j's) which are picked beforehand by s_i as its parents; s_i shares a distinct secret key with each s_j. Each node s_j receives a number of data pieces, decrypts them and sums them together to obtain a partial sum M_j (note that the nodes may not receive the same number of pieces). Each node s_j forms a node of the aggregation tree in the second aggregation step with input M_j.
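A minimal sketch of this first SMART step follows; the secure channels and the choice of parents are abstracted away, and the parameters are illustrative. Each node keeps its own input minus the random pieces it sends out, which is an equivalent way of choosing J pieces that sum to m_i.

# Sketch of the first aggregation step of SMART (slicing and mixing).
import random

p = 2 ** 32
J = 3                                             # number of slices per node
nodes = [1, 2, 3, 4, 5]
data = {i: random.randrange(100) for i in nodes}

received = {i: [] for i in nodes}                 # slices held by each node
for i in nodes:
    parents = random.sample([j for j in nodes if j != i], J - 1)
    slices = [random.randrange(p) for _ in range(J - 1)]
    received[i].append((data[i] - sum(slices)) % p)        # piece kept by s_i
    for j, s in zip(parents, slices):                      # sent over secure channels
        received[j].append(s)

partial_sums = {i: sum(received[i]) % p for i in nodes}    # inputs M_j to step two
assert sum(partial_sums.values()) % p == sum(data.values()) % p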


[CPDA] Similar to SMART, the arithmetic is done in Z_p. The first aggregation step of CPDA involves a number of non-overlapping clusters, each having at least w nodes. Without loss of generality, assume each cluster has exactly w nodes. These clusters partition the whole set of n nodes, and each node shares a distinct secret key with each of the other (w − 1) nodes in the same cluster. Nodes in the same cluster jointly perform a private computation of the sum of their inputs, based on (w, w) threshold secret sharing [Shamir 1979]; the construction is similar to Pedersen's threshold cryptosystem [Pedersen 1991] involving w parallel runs of a secret sharing scheme. In an (l, k) threshold secret sharing scheme, a secret is split into l shares and any k shares can reconstruct the secret, but any (k − 1) (or less) of them leak no information about the secret; that is, in a (w, w) threshold secret sharing scheme, a secret is split into w shares, all of which are needed to reconstruct the secret.

We focus on the operations of a single cluster. Without loss of generality, assume the w nodes in the cluster are from the subset {s_i : 1 ≤ i ≤ w} ⊂ U. Note that this choice of cluster membership aims to facilitate a clearer discussion and is not a restriction on the actual implementation of CPDA; the discussion below readily applies to other clusters (with different membership) by simply re-assigning node identities or assigning each node a logical identity solely used in its cluster. In a CPDA cluster, each node s_i splits its data input m_i into w shares m_ij (1 ≤ j ≤ w) using (w, w) threshold secret sharing; s_i keeps one share, say, m_ii, and gives a distinct share to each of the other (w − 1) nodes in the same cluster, say, m_ij to s_j (where 1 ≤ j ≤ w and j ≠ i), through a secure channel via symmetric key encryption. Each node s_j has m_jj (a share of its own input) and receives (w − 1) shares {m_ij : 1 ≤ i ≤ w; i ≠ j} from the nodes in the same cluster; due to the homomorphic property of threshold secret sharing, the partial sum M_j = Σ_{i=1}^{w} m_ij of these w shares forms a share of the sum M = Σ_{i=1}^{w} m_i of the data inputs from the w nodes in the same cluster. Each node sends its partial sum M_j to a cluster head. When all these w partial sums are available, the cluster head can reconstruct the sum M (for the cluster) using Lagrange interpolation. The cluster head then forms a node of the aggregation tree in the second aggregation step with input M.
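The sketch below walks through this first CPDA step for one cluster, using (w, w) Shamir secret sharing over a prime field and Lagrange interpolation at zero; the field size, cluster size and helper names are illustrative assumptions, and the secure channels are again abstracted away.

# Sketch of the first aggregation step of CPDA for one cluster of w nodes.
import random

p = 2 ** 31 - 1                                   # prime modulus
w = 4
data = {i: random.randrange(1000) for i in range(1, w + 1)}

def share(m):                                     # (w, w) Shamir shares of m
    coeffs = [m] + [random.randrange(p) for _ in range(w - 1)]
    return {j: sum(c * pow(j, k, p) for k, c in enumerate(coeffs)) % p
            for j in range(1, w + 1)}

# Each node s_i shares m_i; node s_j keeps the partial sum M_j of the shares it holds.
shares = {i: share(data[i]) for i in data}
M = {j: sum(shares[i][j] for i in data) % p for j in range(1, w + 1)}

# The cluster head reconstructs the cluster sum by Lagrange interpolation at x = 0.
def lagrange_at_zero(points):
    total = 0
    for j, y in points.items():
        num, den = 1, 1
        for k in points:
            if k != j:
                num = num * (-k) % p
                den = den * (j - k) % p
        total = (total + y * num * pow(den, -1, p)) % p
    return total

assert lagrange_at_zero(M) == sum(data.values()) % p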

6.4.1 A Modification for CPDA. As mentioned by the authors [He et al. 2007], the secret sharing and reconstruction steps in CPDA cause a considerable computation overhead. However, we notice that the use of secret sharing is not necessary in their design. A typical threshold secret sharing scheme has two design goals: secret protection and resilience to missing/lost shares in secret reconstruction; hence, in general usage, an (l, k) threshold scheme with l > k is chosen. The (w, w) threshold scheme used in CPDA aims solely to protect the secret input and is a very inefficient means to achieve such a goal. In fact, the secret sharing scheme can be replaced by the splitting technique used in SMART, that is, each node (with data input m_i) chooses (w − 1) random numbers, sends these numbers to the other (w − 1) nodes in the same cluster, and uses m_i minus the sum of these random numbers as its own share. The security of the original scheme is preserved with this method, and CPDA and SMART would then have a similar computational overhead.

With this modification, CPDA and SMART are essentially the same mechanism with different aggregation topologies. Nodes in CPDA are restricted to sending their shares to the (w − 1) nodes in the same cluster, while nodes in SMART can choose any (J − 1) nodes out of the entire set of nodes to send their shares to; this fact implies that SMART generally provides better node privacy than CPDA for a similar communication overhead, assuming the adversary captures nodes randomly. Besides, the first aggregation step of SMART is complete after each node sends its shares, and the second aggregation step uses a tree covering the entire set of nodes, whereas nodes in CPDA have to send their partial sums to a cluster head to complete the first aggregation step, and the second aggregation step uses a tree covering the cluster heads only. If sending the partial sums to a cluster head in CPDA is viewed as part of the second aggregation step, then the first aggregation step is essentially the same for both schemes (regardless of the different restrictions on choosing parent nodes, which lead to different aggregation topologies in the first step) and the second aggregation step of both schemes is merely plaintext aggregation over the whole set of nodes using different aggregation topologies.

6.4.2 Security of SMART and CPDA. Intuitively, SMART (with a less structured aggregation topology) seems more secure than CPDA, as an adversary needs to capture more nodes in SMART for a particular node's secret to be revealed. This is true only when the adversary has limited means to control the aggregation topology around the victim node. When the adversary is in full control of the topology formation process, he can limit or even reject nodes choosing the victim node as their parent. In our model, the adversary is assumed to be capable of manipulating the aggregation topology formation; hence, both schemes achieve the same level of security. The security of CPDA and SMART is summarized by the following theorems, whose proofs are in Appendix F.

THEOREM 6.5. Suppose the adversary is capable of manipulating the formation of the underlying aggregation topology. A CPDA scheme using a cluster size of w achieves node privacy against any collusion with at most (w − 2) compromised nodes using chosen plaintext attacks (that is, it is (w − 2)-NP-CPA-secure) but is insecure when t ≥ (w − 1).

THEOREM 6.6. Suppose the adversary is capable of manipulating the formation of the underlying aggregation topology. A SMART scheme with parameter J achieves node privacy against any collusion with at most (J − 2) compromised nodes using chosen plaintext attacks (that is, it is (J − 2)-NP-CPA-secure) but is insecure when t ≥ (J − 1).

7. COMPARISONS

The properties of the schemes discussed in this paper are summarized and compared in Table I. The security notion shown is the strongest security notion achievable by a scheme. The communication overhead is measured in the number of bits transmitted in an aggregation session by a node.

Scheme        Achievable notion   Max. tolerable collusion size t   Forward security   Communication overhead (transmission load per node)*
generic CDA   IND-CPA             n − 1                             √                  l_asy (∼10^3)
WGA           IND-CPA             0                                 ×                  Domingo-Ferrer ciphertext size (∼10^3)
basic CMT     IND-CPA             n − 1                             ×                  PRF key length + |hdr| · l_ID (∼10^2)
hashed CMT    IND-CPA             n − 1                             ×                  l_agg + |hdr| · l_ID (∼10^2)
AWGH          IND-CPA             0                                 ×                  l_agg + l_ID (∼10^1)
CPDA          NP-CPA              w − 2                             ×                  l_agg + (w − 1) · l_sym (∼10^2 − 10^3)
SMART         NP-CPA              J − 2                             ×                  l_agg + (J − 1) · l_sym (∼10^2 − 10^3)

Note: * The typical average transmission load (in bits) per node per aggregation session is given in parentheses.
l_asy: the ciphertext length of typical asymmetric-key encryption
l_sym: the ciphertext length of typical symmetric-key encryption
l_ID: the number of bits needed to represent a node identity
l_agg: the maximum length of a data aggregate

Table I. A comparison between the schemes discussed in this paper.

Out of the schemes discussed, the generic CDA construction given in Section 5 provides the best privacy but usually has a large overhead for each node to transmit, as all existing public key homomorphic encryption schemes have a large ciphertext. Note that the Domingo-Ferrer cipher used in WGA [Westhoff et al. 2006] is based on a composite modulus which is as large as those used in factoring-based asymmetric key cryptosystems. Hence, despite offering a much weaker security guarantee, WGA does not gain much in communication overhead compared to the generic CDA construction. The CMT type of schemes [Castelluccia et al. 2005; Castelluccia et al. 2009] could be the most efficient schemes achieving semantic security. For a balanced w-ary aggregation tree over n reporting nodes, the header |hdr| sent by a node in an aggregation session of CMT contains an average of roughly (log_w n + 1) identities, leading to an average communication overhead of ((log_w n + 1) · l_ID + λ) bits, where l_ID is the number of bits needed to represent a node identity and λ is the key length of the PRF needed for a certain level of security. A typical value of λ (in bits) is of the order 10^2. For simplicity, we could take l_ID = log_2 n. Then, for a sensor network with 1,000 to 100,000 nodes, the overhead (in bits) due to the transmission of node identities is also of the order 10^2. The overall overhead (in bits) per node in CMT is thus of the order 10^2. For the same scenario, the hashed CMT [Castelluccia et al. 2009] would give approximately a 30−50% reduction in the communication overhead (depending on n). Note that the communication overhead for CMT given above is an average value over all nodes, and a node in CMT may actually transmit more bits than others. While the total communication overhead of an aggregation session is evenly shared among all nodes in the other schemes, in CMT or hashed CMT a node closer to the root of the aggregation tree (that is, the sink) could have a much higher transmission load than those closer to a leaf. It should be noted that the communication overhead for each node in AWGH [Armknecht et al. 2008] is constant, irrespective of the total number of nodes in the system or where the node is located in the aggregation tree. This overhead is also considerably lower than in all the other schemes. However, under the current state of the art on bihomomorphic encryption constructions, the efficiency of AWGH is achieved at the expense of the security guarantee against compromised nodes.
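A small back-of-the-envelope check of the CMT figures quoted above is given below; the parameter values (n, w, λ) are illustrative assumptions.

# Back-of-the-envelope check of the CMT overhead estimate quoted in the text.
import math

n = 10_000          # nodes in the network (assumed)
w = 4               # arity of the balanced aggregation tree (assumed)
lam = 128           # PRF key length in bits (assumed)
l_id = math.ceil(math.log2(n))                 # bits per node identity

avg_hdr_ids = math.log(n, w) + 1               # ~ log_w(n) + 1 identities per header
cmt_bits = avg_hdr_ids * l_id + lam            # average CMT transmission per node
print(f"average CMT overhead: ~{cmt_bits:.0f} bits per node per session")
# -> a few hundred bits, i.e. of the order 10^2, as stated in the text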

8. CONCLUSIONS

In this paper, we give a rigorous treatment of privacy-preserving data aggregation, covering both CDA and general private data aggregation. More specifically, we extend standard privacy notions in cryptography to cover the CDA/PDA scenario, which is a multi-sender, single-receiver system supporting aggregation. We show that achieving indistinguishability or node privacy against adaptive chosen ciphertext attacks is impossible if the aggregation algorithm is public. We also give a generic CDA construction based on any semantically secure public key homomorphic encryption scheme and prove that it achieves semantic security and forward security. Finally, we analyze the security of five existing constructions (WGA, CMT, AWGH, CPDA and SMART) in the proposed model and suggest a modification to CPDA to make it more computationally efficient while preserving the original security property. We believe the proposed security model and framework provide essential security guidelines for future designs of CDA and PDA, while acting as a comprehensive tool for analyzing CDA/PDA designs.

APPENDIX

A. PROOF OF THEOREM 5.2: SECURITY OF THE GENERIC CDA CONSTRUCTION FROM PUBLIC KEY HOMOMORPHIC ENCRYPTION

The proof is done through a sequence of games with slight differences in the transitions between consecutive games (as in the approach suggested by Shoup [Shoup 2004]). In the following, we explicitly mention the differences between games without repeating parts which are identical.

Instead of working on the game defined for CDA semantic security (in Section 4.2), we show that the advantage of winning a slightly modified game Game is negligible for all PPT algorithms. Recall that in the challenge phase of the semantic security game, the adversary chooses a set S of nodes with identities in I and outputs two sets of plaintexts, namely, M_0 = {m0k : k ∈ I} and M_1 = {m1k : k ∈ I}, along with an aggregation topology aux. The constraint is that |S\S′| > 0 and m0k = m1k for all k ∈ I′, where S′ is the set of all compromised nodes and I′ is the set of their identities. The adversary is then challenged with a ciphertext (I, C) of the aggregate x_b = f(..., m_bk, ...) where b ∈ {0, 1}, along with the communication transcript including the ciphertexts of all messages in M_b and their intermediate aggregates. Normally, the adversary is allowed to compromise nodes only before the challenge phase; when forward security is considered, the adversary is allowed to compromise all the remaining nodes after receiving the challenge.

[Game.] The difference between Game and the semantic security game is as follows. First, in the semantic security game, the adversary is allowed to compromise at most t < n nodes in the Query 1 phase, that is, |S′| ≤ t, whereas, in Game, the adversary can compromise all n nodes of the system at the beginning of the game. Second, Game does not have the constraint (m_{0k} = m_{1k} for all k ∈ I′) on the adversary's choice. We prove the following two claims.

CLAIM A.1. If a public key homomorphic encryption scheme is used, then the statement that no PPT adversary can win Game with non-negligible advantage implies that the generic CDA construction achieves semantic security against any collusion of at most (n − 1) nodes and forward security when all n nodes are compromised.

CLAIM A.2. If the underlying public key homomorphic encryption is semantically secure, no PPT adversary can win Game with non-negligible advantage.

Combining the two claims, it is clear that, if the homomorphic encryption used is semantically secure, the generic CDA construction is semantically secure against any collusion of at most (n − 1) nodes and achieves forward security when all n nodes are compromised. The proofs are as follows.

PROOF OF CLAIM A.1. We prove by contradiction. Assume a public key homomorphic encryption scheme is used, so that possessing the public key would not allow an adversary to tell apart the ciphertexts of two different messages. Suppose there exists an efficient adversary A which, in collusion with at most t compromised nodes, can break the semantic security and forward security of the generic CDA construction. We show in the following how A can be used to construct another adversary A′ to win Game with non-negligible advantage, for all 0 ≤ t < n.

Algorithm A′

Setup. Receive the public key pk from the challenger and pass it to A. Note that all n nodes are compromised by A′.

Query 1. Pass all the oracle queries from A to the challenger of Game and relay the responses back to A. For any node compromise requested by A, pass the information of the requested node to A. Let S′ and I′ denote the sets of nodes compromised by A and their identities, respectively. Then |S′| = |I′| ≤ t.

Challenge. In the challenge phase, receive from A an aggregation topology aux, a set S of target nodes with identities in I, and two sets of plaintext messages, namely, M_0 = {m_{0k} : k ∈ I} and M_1 = {m_{1k} : k ∈ I}, such that m_{0k} = m_{1k}, ∀k ∈ I′. Note that 1 ≤ |I| ≤ n. Pass M_0, M_1 as a challenge request for Game. When a challenge (the transcript consisting of the ciphertexts of all the messages in M_b and their intermediate aggregates, where b ∈ {0, 1}) is received, pass it to A.

Query 2. Same as the Query 1 phase. Allow A to corrupt the remaining (n − t) nodes. Recall that all n nodes have been compromised by A′ at the beginning.

Guess. When A outputs b′, output b′ as a guess for b.


It is clear that if A is PPT, so is A′. M_0 and M_1 form a valid challenge request for Game because there is no constraint imposed on the adversary's choice in Game. It is obvious that, for all 0 ≤ t < n, the advantage for A′ to win Game is at least equal to the advantage for A to win the semantic security and forward security game after colluding with at most t compromised nodes. If the former is negligible, so is the latter, implying the security of the generic CDA construction as claimed in Theorem 5.2.

PROOF OF CLAIM A.2. Suppose the underlying homomorphic encryption is semantically secure according to Definition 5.1. In other words, the advantage Adv^{HE-IND-CPA}_A(λ) of any PPT adversary A in successfully breaking the semantic security of the homomorphic encryption is bounded by an advantage bound negligible in λ. We denote this advantage bound by ε_HE.

For notational simplicity, we write E^{HE}_{pk}(m) (instead of E^{HE}_{pk}(m, r)) to denote the encryption of a message m under the public key pk without explicitly denoting the randomness r used in the encryption process; we assume no randomness reuse in encryption, that is, given the ciphertexts of any two messages, the random coins used to encrypt them are independently generated.

Note that, since a public key homomorphic encryption scheme is adopted in the generic CDA construction, simulating the encryption or aggregation oracles in the following games is actually not necessary. In fact, simulating an encryption oracle or an aggregation oracle is fairly straightforward: to answer an encryption or aggregation query, one can run the publicly known encryption or aggregation algorithm of the homomorphic encryption scheme to generate a valid reply; this reply looks indistinguishable from one in the real attack environment.
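To make the setting concrete, the following sketch instantiates the generic CDA construction with textbook Paillier as one example of a semantically secure, additively homomorphic public key encryption scheme; the scheme choice, the toy primes and the parameter sizes are illustrative assumptions only and are far too small for real use. It also shows why the oracles above are publicly simulatable: encryption and aggregation use only the public key.

```python
import math, random

# Toy Paillier parameters -- illustration only; a real deployment would use
# primes of at least 1024 bits and a vetted library.
p, q = 1009, 1013                     # small primes (INSECURE, demo only)
n = p * q
n2 = n * n
g = n + 1                             # standard choice g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # inverse of L(g^lambda mod n^2) mod n

def encrypt(m):
    """E_pk(m): additively homomorphic encryption of m in Z_n (public key only)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def aggregate(c1, c2):
    """Public aggregation: multiplying ciphertexts adds the plaintexts mod n."""
    return (c1 * c2) % n2

def decrypt(c):
    """Sink-side decryption with the private values (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

readings = [17, 4, 26, 9]                     # per-node sensor readings
ciphertexts = [encrypt(m) for m in readings]  # each node encrypts under pk only
agg = ciphertexts[0]
for c in ciphertexts[1:]:
    agg = aggregate(agg, c)                   # any aggregator can do this publicly
assert decrypt(agg) == sum(readings) % n      # the sink recovers the sum
```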

We will focus on showing that no efficient adversary can win Game with non-negligible advantage. This will be done through a sequence of games defined as follows.

[Game’.] The difference between Game and Game’ is that, in the challenge phase of the latter, the adversary is given only the set of ciphertexts for all messages in M_b, namely, the set \mathcal{E} = {E^{HE}_{pk}(m_{bk}) : k ∈ I} (where b can be 0 or 1), while the adversary of the former is also given the communication transcript (consisting of the ciphertexts of all the intermediate aggregates of the challenged aggregation session) in addition to \mathcal{E}. Since the aggregation algorithm A^{HE} of the homomorphic encryption is public, any adversary A″ in Game’ can perfectly simulate a valid communication transcript for an adversary A′ in Game by simply invoking A^{HE} on the ciphertexts in \mathcal{E}. This simulated communication transcript would look indistinguishable from that in a real attack and allow A″ to make use of any efficient adversary A′ (successfully winning Game) to win Game’. It is straightforward to see that the advantages of the two adversaries in winning Game and Game’, respectively, are equal.

[Game”.] The only difference between Game’ and Game” is that, in the former, there is no constraint imposed on the size of S, whereas, in the latter, it is required that |S| = n for a valid challenge request. That is, in Game”, |M_0| = |M_1| = n. It is obvious that, if no PPT adversary can win Game”, then no PPT adversary can win Game’ either.

Up to this point, it is clear that, if the advantage of winning Game” is negligible for all PPT adversaries, then the generic CDA construction achieves semantic security against any collusion with at most (n − 1) compromised nodes and forward security when all n nodes are compromised. In the following, we define a sequence of very similar games Game_i (0 ≤ i ≤ n) and show the reduction between each consecutive pair. The main idea follows a standard hybrid argument [Goldreich 2001].

[Game_0 – Game_{n−1}.] Game_i is defined by the same game as Game” except that a constraint is imposed on the acceptable choices of M_0 and M_1. In Game”, the adversary can choose any two sets of plaintext data (which could differ at all element positions) as a challenge request; in other words, it is allowed that m_{0k} ≠ m_{1k}, ∀k ∈ I in Game”. Note that |I| = n. Nevertheless, m_{0k} and m_{1k} could still be equal for some values of k in Game”. On the contrary, in Game_i, we require that the first i elements of M_0 and M_1 be the same, that is, m_{0k} = m_{1k} for 1 ≤ k ≤ i. For (i + 1) ≤ k ≤ n, m_{0k} and m_{1k} could be equal or different.

[Game_n.] Game_n differs from Game_{n−1} in that the two plaintext sets M_0, M_1 in Game_n are identical; consequently, the two resulting aggregates are equal.

Let W_i denote the event that the adversary wins Game_i, and let Adv^{Game_i}_{D_i} denote the advantage of an adversary D_i in winning Game_i; when D_i is omitted from the notation, the quantity denotes the advantage bound for all PPT adversaries. There are three claims about this sequence of games.

(1) Game_0 is exactly Game”.

(2) For Game_n, M_0 = M_1. Hence, Pr[W_n] = 1/2 and the advantage Adv^{Game_n} = 0 for all PPT adversaries.

(3) For any two consecutive games, say Game_{i−1} and Game_i, the following holds, where ε_HE is the advantage bound of breaking the semantic security of the underlying homomorphic encryption:
\[
\mathrm{Adv}^{Game_{i-1}} \le \mathrm{Adv}^{Game_i} + \varepsilon_{HE}, \quad \forall\, 1 \le i \le n.
\]

It is straightforward to see why the first claim is true. For the second claim, in Game_n, the adversary is challenged with the task of distinguishing between the ciphertexts of two identical sets of messages. In other words, the two sets of ciphertexts (in the challenge) have identical distributions and no algorithm should be able to tell them apart. As a result, the probability of success should not be better than a random guess, that is, Pr[W_n] = 1/2. The last claim means the following: if there exists some efficient adversary D_{i−1} which can win Game_{i−1} with non-negligible advantage, then there exists either an efficient adversary D_i which can win Game_i with non-negligible advantage or an efficient adversary D^{HE}_{i−1} which can break the semantic security of the underlying homomorphic encryption with non-negligible advantage. Under the assumption that the homomorphic encryption scheme used is semantically secure, such an adversary D^{HE}_{i−1} does not exist.

We prove claim (3) by contradiction. Assume the underlying homomorphic encryption is semantically secure, that is, Adv^{HE-IND-CPA}_{D'} < ε_HE (negligible) for all PPT adversaries D′. Suppose there exists a PPT adversary D_{i−1} which can successfully win Game_{i−1} with non-negligible advantage Adv^{Game_{i-1}}_{D_{i-1}}. We show below how to construct another adversary D_i from D_{i−1} to win Game_i, assuming D_{i−1} always outputs a guess.

Algorithm D_i

Setup. Receive the public key pk from the challenger and pass it to D_{i−1}.

Query 1. Since no private key is needed for encryption or aggregation in the homomorphic encryption scheme, the public algorithms E^{HE} and A^{HE} can be used to answer any encryption or aggregation queries.

Challenge. In the challenge phase, receive from D_{i−1} two sets of plaintext messages, namely,
\[
\begin{aligned}
M^{(i-1)}_0 &= \{m_1, m_2, \ldots, m_{i-1}, m_{0i}, m_{0(i+1)}, \ldots, m_{0n}\}, \quad\text{and}\\
M^{(i-1)}_1 &= \{m_1, m_2, \ldots, m_{i-1}, m_{1i}, m_{1(i+1)}, \ldots, m_{1n}\}.
\end{aligned}
\]
Note that the first (i − 1) elements of M^{(i-1)}_0 and M^{(i-1)}_1 are the same. To create a valid challenge request for Game_i, D_i needs to generate two sets M^{(i)}_0 and M^{(i)}_1 of plaintexts such that the first i elements of the two sets are equal. This can be done by replacing m_{1i} ∈ M^{(i-1)}_1 with m_{0i} ∈ M^{(i-1)}_0; that is, the challenge request submitted for Game_i is
\[
\begin{aligned}
M^{(i)}_0 &= \{m_1, m_2, \ldots, m_{i-1}, m_{0i}, m_{0(i+1)}, \ldots, m_{0n}\}, \quad\text{and}\\
M^{(i)}_1 &= \{m_1, m_2, \ldots, m_{i-1}, m_{0i}, m_{1(i+1)}, \ldots, m_{1n}\}.
\end{aligned}
\]
D_i then receives from its challenger of Game_i a set \mathcal{E}_b of ciphertexts (encrypted with unknown random coins), where b could be 0 or 1:
\[
\mathcal{E}_b = \{E^{HE}_{pk}(m_1), E^{HE}_{pk}(m_2), \ldots, E^{HE}_{pk}(m_{i-1}), E^{HE}_{pk}(m_{0i}), E^{HE}_{pk}(m_{b(i+1)}), \ldots, E^{HE}_{pk}(m_{bn})\}.
\]
The task of D_i is to guess whether b = 0 or b = 1. D_i passes \mathcal{E}_b as the challenge for D_{i−1}.

Query 2. Same as the Query 1 phase.

Guess. When D_{i−1} outputs b′, D_i outputs b′ as its guess for b.

Assuming D_{i−1} always outputs a guess, if D_{i−1} is PPT, so is D_i. We compute the probability of success for D_i, denoted by Pr_{D_i}[W_i], as follows. Let \mathcal{M}_0, \mathcal{M}_1, \mathcal{C}_0 and \mathcal{C}_1 denote the following subsets:
\[
\begin{aligned}
\mathcal{M}_0 &= \{m_1, m_2, \ldots, m_{i-1}, m_{0(i+1)}, \ldots, m_{0n}\}, \\
\mathcal{M}_1 &= \{m_1, m_2, \ldots, m_{i-1}, m_{1(i+1)}, \ldots, m_{1n}\}, \\
\mathcal{C}_0 &= \{E^{HE}_{pk}(m_1), E^{HE}_{pk}(m_2), \ldots, E^{HE}_{pk}(m_{i-1}), E^{HE}_{pk}(m_{0(i+1)}), \ldots, E^{HE}_{pk}(m_{0n})\}, \\
\mathcal{C}_1 &= \{E^{HE}_{pk}(m_1), E^{HE}_{pk}(m_2), \ldots, E^{HE}_{pk}(m_{i-1}), E^{HE}_{pk}(m_{1(i+1)}), \ldots, E^{HE}_{pk}(m_{1n})\}.
\end{aligned}
\]
Note that M^{(i-1)}_0 = \mathcal{M}_0 ∪ {m_{0i}} and M^{(i-1)}_1 = \mathcal{M}_1 ∪ {m_{1i}}, while M^{(i)}_0 = M^{(i-1)}_0 and M^{(i)}_1 = \mathcal{M}_1 ∪ {m_{0i}}. Note also that \mathcal{E}_b = \mathcal{C}_b ∪ {E^{HE}_{pk}(m_{0i})}, whereas a valid challenge for D_{i−1} is \mathcal{C}_b ∪ {E^{HE}_{pk}(m_{bi})} for b ∈ {0, 1}. The probability of success for D_i is as follows:
\begin{align*}
\Pr_{D_i}[W_i] &= \Pr[D_i(\mathcal{E}_b) = b] = \Pr[D_{i-1}(\mathcal{E}_b) = b]
= \tfrac{1}{2}\{\Pr[D_{i-1}(\mathcal{E}_0) = 0] + \Pr[D_{i-1}(\mathcal{E}_1) = 1]\} \\
&= \tfrac{1}{2}\bigl\{\Pr[D_{i-1}(\mathcal{C}_0 \cup \{E^{HE}_{pk}(m_{0i})\}) = 0] + \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1]\bigr\} \\
&= \tfrac{1}{2}\bigl\{\Pr[D_{i-1}(\mathcal{C}_0 \cup \{E^{HE}_{pk}(m_{0i})\}) = 0] + \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1] \\
&\qquad\; + \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1] - \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1]\bigr\} \\
&= \Pr_{D_{i-1}}[W_{i-1}] + \tfrac{1}{2}\bigl\{\Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1] - \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1]\bigr\}.
\end{align*}

The last step makes use of the following fact:
\[
\Pr_{D_{i-1}}[W_{i-1}] = \tfrac{1}{2}\bigl\{\Pr[D_{i-1}(\mathcal{C}_0 \cup \{E^{HE}_{pk}(m_{0i})\}) = 0] + \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1]\bigr\}.
\]
Rearranging terms, we have
\[
\Pr_{D_{i-1}}[W_{i-1}] = \Pr_{D_i}[W_i] + \tfrac{1}{2}\bigl\{\Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1] - \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1]\bigr\}.
\]
Subtracting 1/2 from both sides and taking absolute values, we have
\[
\bigl|\Pr_{D_{i-1}}[W_{i-1}] - \tfrac{1}{2}\bigr| \le \bigl|\Pr_{D_i}[W_i] - \tfrac{1}{2}\bigr| + \tfrac{1}{2}\bigl|\Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1] - \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1]\bigr|,
\]
that is,
\[
\mathrm{Adv}^{Game_{i-1}}_{D_{i-1}} \le \mathrm{Adv}^{Game_i}_{D_i} + \varepsilon_{i-1},
\]
where ε_{i−1} = (1/2)|Pr[D_{i−1}(\mathcal{C}_1 ∪ {E^{HE}_{pk}(m_{1i})}) = 1] − Pr[D_{i−1}(\mathcal{C}_1 ∪ {E^{HE}_{pk}(m_{0i})}) = 1]|.

It is clear that, if m_{0i} = m_{1i}, then ε_{i−1} = 0, since \mathcal{C}_1 ∪ {E^{HE}_{pk}(m_{0i})} and \mathcal{C}_1 ∪ {E^{HE}_{pk}(m_{1i})} are identical probability distributions and no algorithm should be able to tell them apart. For m_{0i} ≠ m_{1i}, we argue that ε_{i−1} ≤ ε_HE (that is, overall, ε_{i−1} ≤ ε_HE for all choices of m_{0i}, m_{1i}). The reason is that, if this quantity is not negligible, we can use D_{i−1} to construct an algorithm D^{HE}_{i−1} to break the semantic security of the underlying homomorphic encryption with non-negligible advantage. The construction of D^{HE}_{i−1} is as follows: after receiving M^{(i-1)}_0 and M^{(i-1)}_1 from D_{i−1}, D^{HE}_{i−1} passes m_{0i}, m_{1i} as a challenge request to the challenger in the semantic security game; D^{HE}_{i−1} generates the ciphertexts in \mathcal{C}_1 using the public encryption algorithm E^{HE}_{pk}; when a challenge c_b = E^{HE}_{pk}(m_{bi}) (where b is 0 or 1) is received, D^{HE}_{i−1} passes \mathcal{C}_1 ∪ {c_b} as the challenge for D_{i−1}; when D_{i−1} outputs a guess b′, D^{HE}_{i−1} outputs b′ as its own guess. It can be shown that Adv^{HE-IND-CPA}_{D^{HE}_{i-1}} = ε_{i−1}, where Adv^{HE-IND-CPA}_{D^{HE}_{i-1}} is the advantage of D^{HE}_{i−1} in successfully breaking the semantic security of the homomorphic encryption; the derivation is as follows:
\begin{align*}
\mathrm{Adv}^{HE\text{-}IND\text{-}CPA}_{D^{HE}_{i-1}}
&= \Bigl|\Pr_{D^{HE}_{i-1}}[\mathrm{Success}] - \tfrac{1}{2}\Bigr| \\
&= \Bigl|\tfrac{1}{2}\bigl\{\Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{0i})) = 0] + \Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{1i})) = 1]\bigr\} - \tfrac{1}{2}\Bigr| \\
&= \tfrac{1}{2}\Bigl|\bigl\{1 - \Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{0i})) = 1] + \Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{1i})) = 1]\bigr\} - 1\Bigr| \\
&= \tfrac{1}{2}\Bigl|\Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{1i})) = 1] - \Pr[D^{HE}_{i-1}(E^{HE}_{pk}(m_{0i})) = 1]\Bigr| \\
&= \tfrac{1}{2}\Bigl|\Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{1i})\}) = 1] - \Pr[D_{i-1}(\mathcal{C}_1 \cup \{E^{HE}_{pk}(m_{0i})\}) = 1]\Bigr| = \varepsilon_{i-1}.
\end{align*}
The advantage of any PPT algorithm (including D^{HE}_{i−1}) in breaking the semantic security game is bounded by ε_HE (which is negligible under the assumption that the homomorphic encryption scheme is semantically secure). Consequently, ε_{i−1} = Adv^{HE-IND-CPA}_{D^{HE}_{i-1}} ≤ ε_HE is negligible. Overall, we have
\[
\mathrm{Adv}^{Game_{i-1}}_{D_{i-1}} \le \mathrm{Adv}^{Game_i}_{D_i} + \varepsilon_{HE}, \quad 1 \le i \le n.
\]

Starting from Game_0 and iterating over i, we arrive at the following:
\[
\mathrm{Adv}^{Game_0}_{D_0} \le \mathrm{Adv}^{Game_n}_{D_n} + n \cdot \varepsilon_{HE}. \qquad (1)
\]
That is, if there exists an efficient (PPT) adversary D_0 which can win Game_0 with non-negligible advantage Adv^{Game_0}_{D_0}, then from D_0 we can always obtain another efficient (PPT) adversary D_n which can win Game_n with advantage Adv^{Game_n}_{D_n} satisfying Equation (1). Following claim (2), Adv^{Game_n} = 0 for all PPT adversaries including D_n, so we have the following:
\[
\mathrm{Adv}^{Game_0}_{D_0} \le n \cdot \varepsilon_{HE}.
\]
If Adv^{Game_0}_{D_0} is non-negligible, then ε_HE must be non-negligible (a contradiction). As a result, if the underlying homomorphic encryption is semantically secure (that is, ε_HE is negligible), then Adv^{Game_0}_{D_0} must be negligible for all D_0; that is, no PPT adversary can win Game” with non-negligible advantage. This in turn implies that the advantage of winning Game is negligible for all PPT adversaries. This concludes the proof.

B. PROOF OF THEOREM 6.1: SECURITY OF WGA

Semantic security of a symmetric key homomorphic encryption scheme is defined by the same game as for public key homomorphic encryption in Section 5.1, except for the following: first, a private key is generated in the Setup phase (that is, no public key is given to the adversary); second, the adversary cannot freely encrypt messages; encryption has to be done via an encryption oracle which returns the encryption of a message submitted by the adversary. We call this game Game_SKHE.


PROOF. Suppose there are n nodes in total. We first derive a slightly different game Game’ from Game_SKHE. The only difference between Game’ and Game_SKHE is in the Challenge phase: the adversary in Game’ submits two sets M_0, M_1 of messages (such that |M_0| = |M_1| = n), while the adversary in Game_SKHE submits a single pair of messages. Using a standard hybrid argument [Goldreich 2001], it can be shown that Adv^{Game’} < n · ε_SKHE, where Adv^{Game’} is the maximum advantage of winning Game’ achievable by PPT adversaries and ε_SKHE is the advantage bound of breaking the semantic security of the symmetric key homomorphic encryption. Hence, if the symmetric key homomorphic encryption is semantically secure (that is, ε_SKHE is negligible), then no PPT adversary can win Game’ with non-negligible advantage.

We show in the following that, if no PPT adversary can win Game’ with non-negligible advantage, then WGA is semantically secure when no compromised node exists. We prove by contradiction. Assume no PPT adversary can win Game’ with non-negligible advantage. Suppose there exists an adversary A which can break the semantic security of WGA with advantage Adv^{WGA-IND-HE}_A. Then it is trivial that A can be used as a subroutine of another algorithm A′ to win Game’ with non-negligible advantage Adv^{Game’}_{A′}. The construction of A′ is as follows. First, relay all encryption queries from A; any encryption oracle query from A can be answered easily by A′ using the query result from the encryption oracle of Game’; in other words, the view of A in this simulation is indistinguishable from that in the real attack. Second, when A submits a topology aux and two sets M_0, M_1 of messages for a challenge, if |M_0| = |M_1| < n, randomly pick messages to fill up M_0, M_1 to form M′_0, M′_1 such that |M′_0| = |M′_1| = n, and extend aux to form aux′; this is possible because the aggregation algorithm is public in WGA. Pass M′_0, M′_1 and aux′ as a challenge request for Game’. When a challenge is received, extract the part of the transcript corresponding to aux and M_0, M_1 and pass it as a challenge for A. When A outputs a guess b′, output b′ as a guess for b.

It is obvious that, if A is PPT, so is A′, and the two advantages are equal. In other words, if Adv^{WGA-IND-HE}_A is non-negligible, so is Adv^{Game’}_{A′}, implying that A′ can win Game’ with non-negligible advantage (a contradiction). This concludes the proof. Overall, we can conclude that, if the symmetric key homomorphic encryption is semantically secure, then no PPT adversary can win Game’ with non-negligible advantage, which, in turn, implies that WGA is semantically secure when there is no compromised node. However, once as few as one node is compromised, the adversary knows the decryption key and can gain knowledge of all future aggregates by passive eavesdropping; not even one-wayness can be achieved if compromised nodes exist in WGA.

C. PROOF OF THEOREM 6.2: SECURITY OF BASIC CMT

To prove the security of basic CMT, we make use of the indistinguishability property of a pseudorandom function (PRF), stated as follows.

Indistinguishability of a Pseudorandom Function (PRF). Let F_λ = {f_s : {0, 1}^λ → {0, 1}^λ}_{s ∈ {0,1}^λ} be a family of keyed functions. Let Γ_{λ,λ} denote the set of all functions from {0, 1}^λ to {0, 1}^λ. Informally, F_λ is said to be pseudorandom if it is hard to distinguish a random function drawn from F_λ from a random function drawn from Γ_{λ,λ}. More formally, for all PPT algorithms A (with oracle access to f_s or to a function in Γ_{λ,λ}), the following quantity is negligible in λ:
\[
\bigl|\Pr[s \leftarrow \{0,1\}^\lambda : A^{f_s}() = 1] - \Pr[f' \leftarrow \Gamma_{\lambda,\lambda} : A^{f'}() = 1]\bigr|.
\]
We denote the maximum value of this quantity (over all PPT algorithms) by ε_PRF, which is negligible. In fact, this quantity is equal to two times the PRF advantage, as the following derivation shows:
\begin{align*}
&\bigl|\Pr[s \leftarrow \{0,1\}^\lambda : A^{f_s}() = 1] - \Pr[f' \leftarrow \Gamma_{\lambda,\lambda} : A^{f'}() = 1]\bigr| \\
&\quad= \bigl|\Pr[s \leftarrow \{0,1\}^\lambda : A^{f_s}() = 1] - 1 + \Pr[f' \leftarrow \Gamma_{\lambda,\lambda} : A^{f'}() = 0]\bigr| \\
&\quad= \bigl|2 \cdot \Pr^{PRF}_A[\mathrm{Success}] - 1\bigr| = 2 \cdot \bigl|\Pr^{PRF}_A[\mathrm{Success}] - \tfrac{1}{2}\bigr| = 2 \cdot \mathrm{Adv}^{PRF}_A(\lambda).
\end{align*}
Overall, we have
\[
\mathrm{Adv}^{PRF}_A(\lambda) \le \tfrac{1}{2} \cdot \varepsilon_{PRF}, \quad \text{for all PPT algorithms } A. \qquad (2)
\]
The indistinguishability property also implies the following. Assume f_K : {0, 1}^λ → {0, 1}^λ is taken from a PRF family with an unknown, randomly picked key K. Then, for a fixed input argument x, the output f_K(x) is computationally indistinguishable from a number randomly picked from {0, 1}^λ to any PPT distinguisher who has knowledge of x and of polynomially many 2-tuples (x_i, f_K(x_i)) with x_i ≠ x. That is, the following is negligible in λ for all PPT algorithms A:
\[
\bigl|\Pr[K \leftarrow \{0,1\}^\lambda : A^{f_K}(f_K(x)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : A^{f_K}(r) = 1]\bigr|,
\]
where x is an input not queried before. This quantity is also bounded by ε_PRF.

Instead of directly proving security in the CDA semantic security game defined in Section 4.2, we prove a slightly different game Game”. The transition from the original semantic security game to Game” involves two modifications. Denote the game after the first modification by Game’. First, in the original game, the adversary is given a complete transcript of an aggregation session, while, in Game’, the adversary is only given the set of ciphertexts for the input messages. Second, in the original game or Game’, the adversary can choose two different sets M_0, M_1 of messages as a challenge request without any constraint imposed on the size of M_0, M_1, whereas, in Game”, |M_0| = |M_1| = n.

With the first modification, for all 0 ≤ t < n, Game’ is equivalent to the original game for the case of basic CMT since the aggregation algorithm of CMT is public. When given a set of ciphertexts for the input messages, it is easy to generate (through the public aggregation algorithm) a communication transcript for an aggregation session with an arbitrary topology. Then, for the second modification, it is straightforward to see that if no PPT adversary can win Game” with non-negligible advantage, then no PPT adversary can win Game’ with non-negligible advantage either. The reason is that it is always possible to derive a valid challenge request for Game” from a valid challenge request for Game’ (or the original semantic security game) and to extract part of the returned challenge from Game” to generate a valid challenge for Game’.

Note that Game” with different values of t represents a number of versions of the game. A set of similar games, denoted by Game_t for 0 ≤ t < n, can be derived from these versions of Game”. The difference between Game_t and the corresponding version of Game” is that, in Game_t, the adversary compromises exactly t nodes, whereas, in the corresponding version of Game”, the number of nodes the adversary can compromise is any value from 0 to t. If no PPT adversary can win Game_t for all 0 ≤ t < n with non-negligible advantage, it can be concluded that no PPT adversary can win (with non-negligible advantage) the version of Game’ with t = (n − 1), which in turn implies that the basic CMT scheme is semantically secure against any PPT adversary in collusion with at most (n − 1) compromised nodes. We prove the following two claims about Game_0, Game_1, ..., Game_{n−1}.

CLAIM C.1. No PPT adversary can win Game_{n−1} with non-negligible advantage if f is a pseudorandom function and a new nonce is used for each aggregation session.

CLAIM C.2. No PPT adversary can win Game_{t−1} with non-negligible advantage for 1 ≤ t < n if f is a pseudorandom function and no PPT adversary can win Game_t with non-negligible advantage.


It is clear that combining the two claims leads to the security claim for basic CMT as stated in Theorem 6.2, that is, basic CMT is semantically secure against any collusion with at most (n − 1) compromised nodes if f is a PRF and a new nonce is used for each aggregation session. Since the aggregation algorithm of basic CMT is public, no aggregation oracle is needed in the following.
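As a concrete reference for the reductions below, here is a minimal sketch of the basic CMT operations (encryption c_i = m_i + f_{ek_i}(nonce) mod p, public mod-p aggregation, and sink-side removal of the key stream). HMAC-SHA256 standing in for the PRF f, the 61-bit prime modulus and the toy key handling are assumptions made purely for illustration, not parameters mandated by the scheme.

```python
import hashlib, hmac, secrets

p = 2**61 - 1                                   # plaintext/ciphertext group Z_p

def f(key: bytes, nonce: bytes) -> int:
    """PRF f_ek(r), realised here with HMAC-SHA256 and reduced into Z_p."""
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % p

n = 5
keys = {i: secrets.token_bytes(16) for i in range(1, n + 1)}   # ek_i, shared with the sink
readings = {i: 20 + i for i in range(1, n + 1)}                # m_i
nonce = b"session-42"                                          # fresh nonce per session

# Each node i sends (hdr = {i}, c_i) with c_i = m_i + f_ek_i(nonce) mod p.
ciphertexts = {i: (readings[i] + f(keys[i], nonce)) % p for i in keys}

# Aggregation is public: headers are merged and ciphertexts added mod p.
hdr = set(ciphertexts)
agg = sum(ciphertexts.values()) % p

# The sink knows every ek_i; it strips the key stream of the nodes listed in hdr.
recovered = (agg - sum(f(keys[i], nonce) for i in hdr)) % p
assert recovered == sum(readings.values()) % p
```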

PROOF OF CLAIM C.1. Without loss of generality, we prove the security of a modified version in which each encryption key is uniformly picked from {0, 1}^λ, as opposed to keys generated by a pseudorandom function as in basic CMT. We then justify why the inference applies to the actual CMT implementation. Assume the PRF output fits in Z_p.

We prove by contradiction. Suppose there exists a PPT adversary D which, in coalition with exactly (n − 1) compromised nodes, can win Game_{n−1} with non-negligible advantage Adv^{Game_{n-1}}_D. We show in the following how D can be used to construct an algorithm D′ which can tell apart the output of f from a random number with non-negligible advantage. Assume the PRF key K is unknown to D′.

Algorithm D′

Setup. Allow the adversary D to choose any (n − 1) nodes to corrupt. Randomly pick (n − 1) encryption keys ek_i ∈ {0, 1}^λ and pass them to D. Assume node n is uncorrupted. The encryption key for node n is taken to be K (the PRF key D′ is being challenged with); that is, K is unknown to D′.

Query. Upon receiving an encryption query ⟨i_j, m_j⟩ with nonce r_j, return c_j = (f_{ek_{i_j}}(r_j) + m_j) mod p if i_j ≠ n. Otherwise, pass r_j to the PRF challenger to obtain f_K(r_j) and reply with c_j = (f_K(r_j) + m_j) mod p.

Challenge. Receive from the adversary D two sets of messages M_0 = {m_{01}, m_{02}, ..., m_{0n}} and M_1 = {m_{11}, m_{12}, ..., m_{1n}} in the challenge phase. Note that m_{0k} = m_{1k} for 1 ≤ k ≤ n − 1. Randomly pick a number w and output it to the PRF challenger to ask for a challenge. Note that w is the nonce used for CDA encryption in the challenge for D. The PRF challenger flips a coin b ∈ {0, 1} and returns t_b, which is f_K(w) when b = 0 and a number randomly picked from {0, 1}^λ when b = 1. Flip a coin d ∈ {0, 1}, and return to D the set C_d of challenge ciphertexts for M_d, that is,
\[
C_d = \{f_{ek_1}(w) + m_{d1},\; f_{ek_2}(w) + m_{d2},\; \ldots,\; f_{ek_{n-1}}(w) + m_{d(n-1)},\; t_b + m_{dn}\}.
\]
Guess. D returns its guess b′. Return b″, which is 0 when b′ = d and 1 otherwise.

Obviously, if D is PPT, D′ is also PPT, assuming D always outputs a guess. Since m_{0k} = m_{1k} for all k ∈ [1, n − 1], the only difference between C_0 and C_1 is the n-th element, which is t_b + m_{0n} in C_0 and t_b + m_{1n} in C_1. Denote the first (n − 1) elements of C_d by C_{1,n−1} and m_{dn} by X_d; the challenge passed to D can then be expressed as (c_d, C_{1,n−1}) where c_d = X_d + t_b. Note that C_{1,n−1} contains exactly the same elements for both d = 0 and d = 1. For simpler notation, in the following discussion we omit C_{1,n−1} and denote the output of D on input (c_d, C_{1,n−1}) simply by D(c_d). The probability of success for D′ in distinguishing f_K(w) from a random number is then:
\begin{align*}
\Pr^{PRF}_{D'}[\mathrm{Success}] &= \Pr[b'' = b] = \tfrac{1}{2}\{\Pr[b''=0 \mid b=0] + \Pr[b''=1 \mid b=1]\} \\
&= \tfrac{1}{4}\{\Pr[b''=0 \mid b=0, d=0] + \Pr[b''=0 \mid b=0, d=1] \\
&\qquad\; + \Pr[b''=1 \mid b=1, d=0] + \Pr[b''=1 \mid b=1, d=1]\} \\
&= \tfrac{1}{4}\{\Pr[D(t_0+X_0)=0] + \Pr[D(t_0+X_1)=1] + \Pr[D(t_1+X_0)=1] + \Pr[D(t_1+X_1)=0]\} \\
&= \tfrac{1}{4}\{\Pr[D(t_0+X_0)=0] + \Pr[D(t_0+X_1)=1] + 1 + \Pr[D(t_1+X_0)=1] - \Pr[D(t_1+X_1)=1]\} \\
&= \tfrac{1}{4}\{2\Pr^{Game_{n-1}}_{D}[\mathrm{Success}] + 1 + (\Pr[D(t_1+X_0)=1] - \Pr[D(t_1+X_1)=1])\}.
\end{align*}
Note that t_0 + X_0 and t_0 + X_1 are valid CMT ciphertexts for the two challenged plaintexts m_{0n} and m_{1n}, respectively. In the last step, we make use of the fact that the probability of success for D to win Game_{n−1} is
\[
\Pr^{Game_{n-1}}_{D}[\mathrm{Success}] = \tfrac{1}{2}\Pr[D(t_0+X_0)=0] + \tfrac{1}{2}\Pr[D(t_0+X_1)=1].
\]

Rearranging terms, we have
\[
4\Pr^{PRF}_{D'}[\mathrm{Success}] + \Pr[D(t_1+X_1)=1] - \Pr[D(t_1+X_0)=1] = 2\Pr^{Game_{n-1}}_{D}[\mathrm{Success}] + 1,
\]
that is,
\[
4\bigl(\Pr^{PRF}_{D'}[\mathrm{Success}] - \tfrac{1}{2}\bigr) + \Pr[D(t_1+X_1)=1] - \Pr[D(t_1+X_0)=1] = 2\bigl(\Pr^{Game_{n-1}}_{D}[\mathrm{Success}] - \tfrac{1}{2}\bigr).
\]
Taking absolute values on both sides and substituting Adv^{PRF}_{D'} = |Pr^{PRF}_{D'}[Success] − 1/2| and Adv^{Game_{n-1}}_D = |Pr^{Game_{n-1}}_D[Success] − 1/2|, we have
\[
2\,\mathrm{Adv}^{PRF}_{D'} + \tfrac{1}{2}\bigl|\Pr[D(t_1+X_1)=1] - \Pr[D(t_1+X_0)=1]\bigr| \ge \mathrm{Adv}^{Game_{n-1}}_{D}.
\]
Since t_1 is a randomly picked number, the two distributions {t_1 + X_0} and {t_1 + X_1} are identically distributed. That is, no algorithm should be able to distinguish the two distributions and, for any PPT algorithm D,
\[
\Pr[D(t_1+X_0)=1] = \Pr[D(t_1+X_1)=1].
\]
Hence,
\[
2 \cdot \mathrm{Adv}^{PRF}_{D'}(\lambda) \ge \mathrm{Adv}^{Game_{n-1}}_{D}(\lambda).
\]
From Equation (2), 2 · Adv^{PRF}_{D'} ≤ ε_PRF. That is, Adv^{Game_{n-1}}_D(λ) ≤ ε_PRF. If Adv^{Game_{n-1}}_D is non-negligible in λ, so is ε_PRF. As a result, if D can win Game_{n−1} with non-negligible advantage, D′ could distinguish between the output of the pseudorandom function f and a random number (a contradiction to the indistinguishability property of a PRF).

The above security argument applies to the actual implementation of basic CMT since the view of the adversary D in the above simulation is in essence the same as that in the actual scheme. For each of the (n − 1) corrupted nodes, the encryption key is f_{K′}(i) (1 ≤ i ≤ n − 1) for some randomly picked master key K′. By the property of a PRF, f_{K′}(i) is indistinguishable from a randomly picked key (as used in the above simulation game) for all PPT distinguishing algorithms. For the uncorrupted node, its output for encryption is now f_{f_{K′}(n)}(x) instead of f_K(x) (with randomly picked K) as used in the above simulation game. It can be shown by a contrapositive argument that, for fixed n and given x, the two distributions are computationally indistinguishable, that is,
\[
\{K' \leftarrow \{0,1\}^\lambda : (x, f_{f_{K'}(n)}(x))\} \stackrel{c}{\equiv} \{K \leftarrow \{0,1\}^\lambda : (x, f_K(x))\}.
\]
The argument is as follows. Assume f is a PRF, that is, A = {K′ ← {0, 1}^λ : f_{K′}(n)} is indistinguishable from B = {K ← {0, 1}^λ : K} for all PPT distinguishers knowing n. If there exists a PPT distinguisher D which can distinguish between X = {K′ ← {0, 1}^λ : (x, f_{f_{K′}(n)}(x))} and Y = {K ← {0, 1}^λ : (x, f_K(x))}, we can use D to distinguish between A and B: when we receive a challenge s which could be from A or B, we send x and f_s(x) as a challenge for D. If s belongs to A, then (x, f_s(x)) belongs to X, and if s belongs to B, then (x, f_s(x)) belongs to Y. We could thus distinguish A from B (a contradiction).

PROOF OF CLAIM C.2. We prove by contradiction. Assume f is a PRF and no PPT adversary can win Game_t with non-negligible advantage. Suppose there exists a PPT adversary D_{t−1} which can win Game_{t−1} with non-negligible advantage Adv^{Game_{t-1}}_{D_{t-1}}. D_{t−1} can be used to construct another adversary D_t to win Game_t with non-negligible advantage Adv^{Game_t}_{D_t}. The construction of D_t is as follows.

Algorithm D_t

Setup. Allow D_{t−1} to choose any (t − 1) nodes to corrupt. D_t then requests from its challenger to corrupt these (t − 1) nodes plus one additional node. Without loss of generality, assume that D_{t−1} corrupts the nodes with identities 1, 2, ..., (t − 1) and D_t corrupts the nodes with identities 1, 2, ..., (t − 1), t. Pass the (t − 1) encryption keys ek_i (for 1 ≤ i ≤ t − 1) to D_{t−1} but keep ek_t.

Query. Upon receiving an encryption query ⟨i_j, m_j⟩ with nonce r_j, return c_j = (f_{ek_{i_j}}(r_j) + m_j) mod p if 1 ≤ i_j ≤ t. Otherwise, pass ⟨i_j, r_j, m_j⟩ to the encryption oracle of D_t's game to obtain an encryption c_j of m_j under ek_{i_j} with nonce r_j, and pass c_j to D_{t−1}.

Challenge. Receive from D_{t−1} two sets of messages, namely, M_0 = {m_{01}, m_{02}, ..., m_{0n}} and M_1 = {m_{11}, m_{12}, ..., m_{1n}}. Note that a valid challenge request from D_{t−1} satisfies m_{0k} = m_{1k} for all k ∈ [1, t − 1]. To derive a valid challenge request M′_0, M′_1 for Game_t, replace m_{1t} in M_1 by m_{0t} from M_0. That is, M′_0 = M_0 and M′_1 = M_1 ∪ {m_{0t}} \ {m_{1t}}. When a challenge C_b (which is the set of ciphertexts for the messages in M′_b, where b ∈ {0, 1}) is received along with the used nonce w, pass w, C_b as the challenge for D_{t−1}.

Guess. When D_{t−1} returns its guess b′, return b′.

In the following discussion, denote by E_w(M) the set of ciphertexts for a set M of messages encrypted with nonce w, assuming each element m ∈ M is indexed by the identity of the node whose encryption key is used to encrypt m in E_w(M). For example, E_w(M_0) = E_w({m_{01}, m_{02}, ..., m_{0n}}) = {f_{ek_1}(w) + m_{01}, f_{ek_2}(w) + m_{02}, ..., f_{ek_n}(w) + m_{0n}}. That is, C_b = E_w(M′_b).

It is clear that if D_{t−1} is PPT, so is D_t, assuming D_{t−1} always outputs a guess. Let W_t and W_{t−1} denote the respective events that the adversary wins Game_t and Game_{t−1}. The probability of success of D_t is as follows; the derivation is very similar to that in the proof of Claim C.1.
\begin{align*}
\Pr_{D_t}[W_t] &= \tfrac{1}{2}\{\Pr[D_t(E_w(M'_0)) = 0] + \Pr[D_t(E_w(M'_1)) = 1]\} \\
&= \tfrac{1}{2}\{\Pr[D_{t-1}(E_w(M_0)) = 0] + \Pr[D_{t-1}(E_w(M'_1)) = 1]\} \\
&= \tfrac{1}{2}\{\Pr[D_{t-1}(E_w(M_0)) = 0] + \Pr[D_{t-1}(E_w(M_1)) = 1] \\
&\qquad\; + \Pr[D_{t-1}(E_w(M'_1)) = 1] - \Pr[D_{t-1}(E_w(M_1)) = 1]\} \\
&= \Pr_{D_{t-1}}[W_{t-1}] + \tfrac{1}{2}\{\Pr[D_{t-1}(E_w(M'_1)) = 1] - \Pr[D_{t-1}(E_w(M_1)) = 1]\}.
\end{align*}
Note that M′_0 = M_0, and that E_w(M_0) and E_w(M_1) form a valid challenge request for Game_{t−1}. As a result, the following fact is used to derive the last step:
\[
\Pr_{D_{t-1}}[W_{t-1}] = \tfrac{1}{2}\{\Pr[D_{t-1}(E_w(M_0)) = 0] + \Pr[D_{t-1}(E_w(M_1)) = 1]\}.
\]
Rearranging terms and taking absolute values on both sides, we have
\[
\mathrm{Adv}^{Game_{t-1}}_{D_{t-1}} \le \mathrm{Adv}^{Game_t}_{D_t} + \varepsilon_{t-1},
\]
where ε_{t−1} = (1/2)|Pr[D_{t−1}(E_w(M_1)) = 1] − Pr[D_{t−1}(E_w(M′_1)) = 1]|.

Recall that M′_1 = M_1 ∪ {m_{0t}} \ {m_{1t}}, that is, M′_1 and M_1 differ in a single position. In other words, E_w(M_1) and E_w(M′_1) have only one different element. In essence, ε_{t−1} is the advantage of D_{t−1} in distinguishing the two distributions {m_{0t} + f_{ek_t}(w)} and {m_{1t} + f_{ek_t}(w)} when given fixed m_{0t}, m_{1t}, since all other positions of E_w(M_1) and E_w(M′_1) have identical elements in both cases. Note that D_{t−1} does not possess ek_t. Using the same technique as in the proof of Claim C.1, we argue that, if f is a PRF, ε_{t−1} is negligible and, more specifically, ε_{t−1} ≤ ε_PRF. The details are as follows.


For fixed m_{0t}, m_{1t} and a randomly picked r ∈ {0, 1}^λ, the two distributions {m_{0t} + r} and {m_{1t} + r} are identically distributed, that is, no algorithm can tell them apart, implying that
\[
\Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{0t} + r) = 1] = \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{1t} + r) = 1].
\]
Substituting this into the expression for ε_{t−1}, we have
\begin{align*}
\varepsilon_{t-1} &= \tfrac{1}{2}\bigl|\Pr[D_{t-1}(m_{0t} + f_{ek_t}(w)) = 1] - \Pr[D_{t-1}(m_{1t} + f_{ek_t}(w)) = 1]\bigr| \\
&= \tfrac{1}{2}\bigl|\Pr[D_{t-1}(m_{0t} + f_{ek_t}(w)) = 1] - \Pr[D_{t-1}(m_{1t} + f_{ek_t}(w)) = 1] \\
&\qquad\; - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{0t} + r) = 1] + \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{1t} + r) = 1]\bigr| \\
&\le \tfrac{1}{2}\bigl\{\bigl|\Pr[D_{t-1}(m_{0t} + f_{ek_t}(w)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{0t} + r) = 1]\bigr| \\
&\qquad\; + \bigl|\Pr[D_{t-1}(m_{1t} + f_{ek_t}(w)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{1t} + r) = 1]\bigr|\bigr\}.
\end{align*}
Note that f_{ek_t}(w) is the output of a PRF on input w under an unknown key ek_t and r is a random number. It is trivial that
\[
\bigl|\Pr[D_{t-1}(m_{0t} + f_{ek_t}(w)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{0t} + r) = 1]\bigr| \le \varepsilon_{PRF}
\]
and
\[
\bigl|\Pr[D_{t-1}(m_{1t} + f_{ek_t}(w)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m_{1t} + r) = 1]\bigr| \le \varepsilon_{PRF};
\]
otherwise, it would be straightforward to construct an algorithm from D_{t−1} to tell apart the output of a PRF from a random number. Consequently, we have ε_{t−1} ≤ ε_PRF and hence
\[
\mathrm{Adv}^{Game_{t-1}}_{D_{t-1}} \le \mathrm{Adv}^{Game_t}_{D_t} + \varepsilon_{PRF}.
\]

If there exists an adversary D_{t−1} which can win Game_{t−1} with non-negligible advantage, then Adv^{Game_{t-1}}_{D_{t-1}} is non-negligible. This in turn implies that Adv^{Game_t}_{D_t} is non-negligible unless ε_PRF is non-negligible (which contradicts the assumption that f is a PRF). Hence, if no PPT adversary can win Game_t with non-negligible advantage and f is a PRF, then Adv^{Game_{t-1}}_{D_{t-1}} must be negligible, implying that no PPT adversary can win Game_{t−1} with non-negligible advantage.

Combining Claims C.1 and C.2, it can be concluded that the basic CMT scheme is semantically secure against any collusion with at most (n − 1) compromised nodes, assuming f is a PRF and a new nonce is used for each aggregation session.

D. PROOF OF THEOREM 6.3: SECURITY OF HASHED CMT

PROOF. Only a few modifications to the security proof of the basic CMT scheme in Appendix C are needed in order to prove the security of the hash variant.

First, all ciphertexts are now generated using the hashed values of the pseudorandom function outputs or of the replies from the challenger. With this change, we denote the ciphertext c for any message m under encryption key ek_i with nonce w by c = m + h(f_{ek_i}(w)).

Second, in the proof of Claim C.1, the challenge passed to D would be c_d = X_d + h(t_b). The derivation of the advantage expressions is then essentially the same as that for basic CMT.

Third, the proofs of both Claims C.1 and C.2 for basic CMT rely on the fact that, for fixed X_0, X_1, the two distributions {t_1 ← {0, 1}^λ : t_1 + X_0} and {t_1 ← {0, 1}^λ : t_1 + X_1} are identical. To prove the corresponding claims for hashed CMT, we need the following distributions to be identical:
\[
\{t_1 \leftarrow \{0,1\}^\lambda : h(t_1) + X_0\}, \qquad \{t_1 \leftarrow \{0,1\}^\lambda : h(t_1) + X_1\}.
\]
If h fulfills the mentioned requirement, then the distribution {t_1 ← {0, 1}^λ : h(t_1)} is uniform over {0, 1}^l. Consequently, the above two distributions are identical.

Fourth, the proof of Claim C.2 for basic CMT requires that, for a fixed message m and a random r ∈ {0, 1}^λ, the following holds if f is a PRF:
\[
\bigl|\Pr[D_{t-1}(m + f_{ek_t}(w)) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m + r) = 1]\bigr| \le \varepsilon_{PRF}.
\]
To prove the corresponding claim for hashed CMT, it is required that
\[
\bigl|\Pr[D_{t-1}(m + h(f_{ek_t}(w))) = 1] - \Pr[r \leftarrow \{0,1\}^\lambda : D_{t-1}(m + h(r)) = 1]\bigr| \le \varepsilon_{PRF}.
\]
This requirement is fulfilled in hashed CMT since h is a public function; otherwise, it would be straightforward to construct an algorithm from D_{t−1} to distinguish the output of a PRF from a random number with non-negligible advantage. This concludes the proof that hashed CMT is semantically secure.

E. PROOF OF THEOREM 6.4: SECURITY OF AWGH

No computational assumption is needed when proving the semantic security of AWGH. The only assumption needed is that the adversary has no knowledge about the secret keys used in an aggregation session. We assume the adversary is able to choose nodes to capture after the aggregation topology is known and prove the following two claims regarding AWGH security.

CLAIM E.1. The AWGH instantiation using the modified Vernam one-time pad and the same value of r system-wide is semantically secure when there is no compromised node.

CLAIM E.2. The same AWGH instantiation (as described in Claim E.1) is insecure when there exist compromised nodes. At minimum, a PPT adversary needs to compromise just one node to break both the indistinguishability and node privacy goals under chosen plaintext attacks when he is able to choose which node to compromise. If all the child nodes of the sink are compromised, given one additional compromised node of his choice (more precisely, a grandchild node of the sink), the adversary can recover the final aggregate from the public communication transcript of any aggregation session.

PROOF OF CLAIM E.1. When given the public communication transcript of an aggregation session, anyone can extract the ciphertext for any input from a node participating in the session. If the key used by each participating node is random and the adversary has no knowledge about any of the keys used, all the ciphertexts look like random numbers in Z_p.

Since the keys used in any aggregation session are assumed to be independent of those used in other sessions, being given the encryptions of plaintexts chosen by the adversary (which belong to previous sessions) does not help the adversary learn information about the session being challenged. We can therefore focus on a single aggregation session without considering chosen plaintext attacks.

We prove Claim E.1 by contradiction. Assume the adversary has no knowledge about any of the keys used in the challenge session. If the adversary is able to tell, with non-negligible advantage, which one of two given aggregation scenarios (involving the same set S of nodes but two different sets M_0, M_1 of input messages) a given communication transcript corresponds to, then, with non-negligible probability, he can also correctly recover all the secret encryption keys involved in the aggregation, that is, all the keys belonging to nodes in S. The main reason is that the messages in M_0 and M_1 are known to the adversary, and the set of input messages M_b = {m_{bi} : s_i ∈ S} is either M_0 or M_1. The communication transcript allows the adversary to retrieve the ciphertext c_i of each node s_i ∈ S, and c_i = m_{bi} + ek_i mod p, where ek_i is s_i's secret key. Note that ek_i = c_i − m_{bi} mod p. If the adversary can guess the set of input messages M_b correctly, he can substitute m_{bi} and c_i to recover ek_i. This concludes the proof.

PROOF OF CLAIM E.2. In the AWGH instantiation given in [Armknecht et al. 2008], besides its own secret key ek_i, an internal node s_i in the aggregation tree also stores a number of ciphertexts, each of which is the encryption of a system-wide value r under the aggregate key K_j of a child node s_j of s_i, where the aggregate key K_j = Σ_{s_q ∈ ST(s_j)} ek_q mod p, with ST(s_j) being the set of nodes in the subtree rooted at s_j. s_i stores one such ciphertext for each of its child nodes. Consequently, an adversary compromising a node also gains information about the aggregate keys of all of its child nodes. In fact, knowing the information stored in one compromised node is sufficient to allow an adversary to tell apart two given aggregation scenarios. With two compromised nodes such that the second one is a child node of the first, the adversary can uniquely determine all the aggregate keys of the child nodes of the second compromised node. Details of these attacks are as follows.

Case 1: One compromised node. Suppose the adversary has captured a node s_i. Then he learns ek_i (the secret key of s_i) and {v^{(r)}_j : s_j is a child node of s_i}, where v^{(r)}_j is the pre-stored ciphertext of r encrypted using the aggregate key of s_j (a child node of s_i). More specifically, for each of s_i's child nodes, say s_j, v^{(r)}_j = r + K_j mod p, where K_j is the aggregate key of s_j. Since the same value of r needs to be used for all pre-stored ciphertexts in the current instantiation [Armknecht et al. 2008], from any two pre-stored ciphertexts, say v^{(r)}_j and v^{(r)}_{j′} for nodes s_j and s_{j′} respectively, the adversary can obtain (K_j − K_{j′}) mod p, which is equal to (v^{(r)}_j − v^{(r)}_{j′}) mod p.

In the challenge aggregation session, if both s_j and s_{j′} are not silent, then the ciphertexts v_j, v_{j′} sent by them (and obtainable from the public transcript) are respectively v_j = K_j + Σ_{s_q ∈ ST(s_j)} m_q mod p and v_{j′} = K_{j′} + Σ_{s_q ∈ ST(s_{j′})} m_q mod p, where m_q denotes the message input of node s_q. Computing CD = (v_j − v_{j′}) − (v^{(r)}_j − v^{(r)}_{j′}) mod p gives the sum-difference of the input messages, that is, SD = Σ_{s_q ∈ ST(s_j)} m_q − Σ_{s_q ∈ ST(s_{j′})} m_q mod p. By computing SD for the two given aggregation scenarios in the challenge and comparing the results with CD, the adversary is able to tell which of the two given aggregation scenarios the communication transcript (given in the challenge) corresponds to, thus breaking the indistinguishability goal. Note that the adversary is allowed to choose the two sets of inputs in the challenge; hence, the SD values for the two scenarios can be made arbitrarily different at his will.
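The equality CD = SD exploited above can be checked numerically. The sketch below assumes, purely for illustration, that s_j and s_j' are leaf nodes (so each aggregate key is the node's own key) and uses arbitrary toy values; none of the names or numbers come from the AWGH paper itself.

```python
import secrets

p = 2**61 - 1
K_j, K_jp = secrets.randbelow(p), secrets.randbelow(p)   # aggregate keys of s_j, s_j'
r = secrets.randbelow(p)                                 # system-wide value r

# Pre-stored ciphertexts held by the compromised parent s_i.
v_r_j, v_r_jp = (r + K_j) % p, (r + K_jp) % p

# Ciphertexts observed in the challenge session for inputs m_j and m_jp.
m_j, m_jp = 37, 12
v_j, v_jp = (K_j + m_j) % p, (K_jp + m_jp) % p

# The adversary's computation: CD equals the sum-difference SD of the inputs.
CD = ((v_j - v_jp) - (v_r_j - v_r_jp)) % p
assert CD == (m_j - m_jp) % p   # SD, computable by the adversary for both scenarios
```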

The above attack can be readily extended to break the node privacy goal as well, since the adversary is allowed to choose his victim and the compromised nodes, as well as the nodes participating in the challenge session. Without loss of generality, suppose the victim node is a leaf node and the adversary has already compromised its parent node and chooses the victim node and one of its siblings to participate in the challenge aggregation session. Note that the adversary does not need to compromise the sibling node. Using the parent node's information and the public transcript, he can compute the CD value, which gives (m − m′) mod p, where m and m′ are the unknown inputs of the victim node and its sibling. From the final aggregate x revealed as part of the challenge in the node privacy game, the adversary can determine (m + m′) mod p. Together with CD, the adversary can therefore uniquely find m, the unknown input of the victim node, thus breaking the node privacy goal.

Case 2: Two or more compromised nodes. Given the information stored in two compromised nodes s_i, s_j such that s_j is a child node of s_i, the adversary can recover the aggregate keys of all the child nodes of s_j, that is, K_{w_1}, K_{w_2}, ..., where s_{w_1}, s_{w_2}, ... are the child nodes of s_j. The reason is as follows: from node s_i, the adversary learns ek_i and v^{(r)}_j = (r + K_{w_1} + K_{w_2} + ... + ek_j) mod p; from node s_j, the adversary learns ek_j and v^{(r)}_{w_1}, v^{(r)}_{w_2}, ..., where v^{(r)}_{w_1} = (r + K_{w_1}) mod p, v^{(r)}_{w_2} = (r + K_{w_2}) mod p, and so on. This set of independent linear equations is of sufficient rank to allow the adversary to uniquely determine K_{w_1}, K_{w_2}, ... and r.

From this point on, for each additional captured node s_{j′}, with the determined value of r, the adversary can learn all the aggregate keys of the child nodes of s_{j′}. Note that s_{j′} can be any node and does not need to be a child node of s_i or s_j. As a result, by compromising all the child nodes of the sink and one of the child nodes of these nodes, an adversary can recover the final aggregate of any aggregation session.


F. PROOF OF THEOREMS 6.5 AND 6.6: SECURITY OF CPDA AND SMART

Since the final aggregate is transmitted in plaintext in CPDA and SMART, neither scheme is semantically secure, even when no compromised node exists. We prove that the two schemes achieve node privacy if the number of captured nodes satisfies t < (w − 1) in CPDA and t < (J − 1) in SMART. The security proofs for CPDA and SMART are almost the same, as the adversary is assumed capable of manipulating the aggregation topology formation. We therefore only show the proof for CPDA.

We assume the adversary is able to choose nodes to capture after deciding on a victim node, and to control the aggregation topology. In other words, all the captured nodes could be in the same cluster as the victim in CPDA or could be the parent nodes of the victim in SMART. Assume the symmetric key encryption (with encryption algorithm Enc, used for communication between nodes in a cluster) is semantically secure, that is, without knowing the private key, it is difficult for a PPT adversary to distinguish between the ciphertexts of any pair of known messages even if he can obtain the ciphertexts of polynomially many plaintexts of his choice.

PROOF. We prove by contradiction. Assume the underlying symmetric key encryption scheme is semantically secure. Suppose there exists an adversary A which, compromising (w − 2) nodes, can break the node privacy of CPDA for a particular node with non-negligible advantage. We show how to construct another algorithm A′ from A to recover the plaintext of a ciphertext encrypted with an unknown key K.

When there are more than (w − 2) compromised nodes, the adversary can break the node privacy goal since he is assumed able to control the topology formation. We focus on a single cluster. Suppose all the (w − 2) compromised nodes are in the same cluster as the victim node. Without loss of generality, let s_1 be the victim node and s_2 be the remaining non-compromised node in the cluster. Denote the set of compromised nodes by S′, that is, S′ = {s_i : 3 ≤ i ≤ w}. Let k_{ij} denote the encryption key shared between nodes s_i and s_j; then k_{12} = K (the unknown key). Note that k_{ij} = k_{ji}. The construction of A′ is as follows.

Algorithm A′

Setup. Randomly pick all the secret keys {k_{ij} : 1 ≤ i, j ≤ w, i ≠ j, (i, j) ≠ (1, 2)} used in the cluster except the key K shared between s_1 and s_2. A′ is challenged with the key K. Distribute the key rings of the (w − 2) compromised nodes in S′ to A.

Query. All the encryption queries from A can be answered by running the encryption algorithm if the key in question belongs to a node in S′, or by passing the query to the challenger of A′ if K is queried.

Challenge. The challenge phase consists of the following steps.

(1) Receive from A two messages x_0, x_1. A's challenge is to decide whether s_1's input is x_0 or x_1, and A′ is supposed to give A the communication transcript of an aggregation session as the challenge. Ask for a challenge ciphertext c of an unknown plaintext m encrypted under K, that is, c = Enc_K(m). The task of A′ is to find m.

(2) Randomly pick m_3, m_4, ..., m_w to simulate the inputs of the (w − 2) compromised nodes (which are s_3, s_4, ..., s_w) and run the first CPDA aggregation step on these (w − 2) inputs to simulate the part of the transcript involving the transmissions from these (w − 2) nodes.

(3) Let the inputs of s_1, s_2 be unknown. In fact, the input m_1 of node s_1 is set to be either x_0 or x_1. To simulate the rest of the transcript, for 3 ≤ j ≤ w, randomly pick m_{1j} and m_{2j} as the respective shares of m_1 and m_2 to be encrypted and transmitted to node s_j; the encryption can be done since the keys {k_{1j}, k_{2j} : 3 ≤ j ≤ w} are known to A′.

(4) Set the share m_{12} (from s_1 to s_2) to be the unknown message m. Randomly pick m′ as the share m_{21}, and request the encryption oracle for the unknown key K to return a ciphertext c′ for m′. c and c′ complete the transcript of the first aggregation step.

(5) To simulate the partial sums M_j for 3 ≤ j ≤ w, sum up all m_{ij} where 1 ≤ i ≤ w, 3 ≤ j ≤ w; these are the numbers picked in steps (2) and (3). Randomly pick two numbers to simulate M_1, M_2.

(6) The transcript for the other clusters and for the second aggregation step can be generated with ease. Pass the complete simulated transcript to A.

Guess. When A returns its guess b, set m_1 = x_b. Note that
\[
\begin{cases}
M_1 = \sum_{i=3}^{w} m_{i1} + m' + \bigl(m_1 - \sum_{j=3}^{w} m_{1j} - m\bigr), \\[4pt]
M_2 = \sum_{i=3}^{w} m_{i2} + m + \bigl(m_2 - \sum_{j=3}^{w} m_{2j} - m'\bigr).
\end{cases}
\]
These are two linear equations in two unknowns, namely, m and m_2 (since all other numbers are randomly picked, the two equations are independent with overwhelming probability). The unknown message m can hence be found by solving the two equations.
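The solving step amounts to simple modular algebra: the first equation determines m directly and the second then determines m_2. The following toy check, with hypothetical values fabricated to be mutually consistent, is only meant to illustrate that algebra.

```python
# Toy numeric check of the extraction step (w = 5, so s_3..s_5 are compromised).
p = 2**61 - 1

m_i1 = {3: 11, 4: 23, 5: 5}        # shares m_{i1} sent to s_1 by compromised nodes
m_i2 = {3: 7, 4: 19, 5: 31}        # shares m_{i2} sent to s_2 by compromised nodes
m_1j = {3: 2, 4: 9, 5: 14}         # shares of m_1 picked by A' in step (3)
m_2j = {3: 6, 4: 8, 5: 21}         # shares of m_2 picked by A' in step (3)
m_prime, m1 = 42, 99               # m' (share m_21) and m_1 = x_b

# Ground truth, used here only to fabricate consistent M_1, M_2 for the check.
m_true, m2_true = 57, 123
M1 = (sum(m_i1.values()) + m_prime + (m1 - sum(m_1j.values()) - m_true)) % p
M2 = (sum(m_i2.values()) + m_true + (m2_true - sum(m_2j.values()) - m_prime)) % p

# Solving: the first equation yields m, the second then yields m_2.
m = (sum(m_i1.values()) + m_prime + m1 - sum(m_1j.values()) - M1) % p
m2 = (M2 - sum(m_i2.values()) - m + sum(m_2j.values()) + m_prime) % p
assert (m, m2) == (m_true, m2_true)
```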

If A is PPT, so is A′. Assuming a node's input is uniformly distributed over Z_p, the view simulated by A′ and the view in a real attack are indistinguishable. If A has a non-negligible advantage in distinguishing whether s_1's input is x_0 or x_1 after seeing the transcript of an aggregation session (i.e., breaking the node privacy notion), then A′ has a non-negligible probability of successfully finding m from Enc_K(m) under an unknown key K (a contradiction to the assumption that Enc is semantically secure).

It is trivial to extend the above proof to SMART. The modifications are as follows: first, replace the (w − 1) neighboring nodes of the victim node s_1 in CPDA by the (J − 1) parent nodes of s_1 in SMART and assume s_2 is the uncompromised parent; second, s_2 sends no share to s_1 if s_1 is not its parent, while s_1 still uses the challenge ciphertext c as the encryption of its share to s_2; third, the partial sum output of s_2 would be a randomly picked number plus all the inputs from s_2's children. The key K shared between s_1 and s_2 is again assumed to be secret and the challenge for A′ is to recover the message in c. A′ has knowledge of all shared keys in the network except K. In other words, A′ can decrypt all ciphertexts from the compromised nodes and from the children of both s_1 and s_2.

ACKNOWLEDGMENT

Aldar C-F. Chan would like to acknowledge the Lee Kuan Yew Postdoctoral Fellowship and the AcRF research grant R-252-000-331-112 provided by the Singapore Ministry of Education, which financially supported the work presented in this paper.

The work presented in this paper was supported in part by the European Commission within the STREP UbiSec&Sens and WSAN4CIP projects. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsement of the UbiSec&Sens or WSAN4CIP project or the European Commission.

REFERENCES

ABDELZAHER, T., ANOKWA, Y., BODA, P., BURKE, J., ESTRIN, E., GUIBAS, L., KANSAL, A., MADDEN, S., AND REICH,J. 2007. Mobiscopes for human space. IEEE Pervasive Computing 6, 2.

ACHARYA, M., GIRAO, J., AND WESTHOFF, D. 2005. Secure comparison of encrypted data in wireless sensor networks. InWiOpt 2005.

ARMKNECHT, F., WESTHOFF, D., GIRAO, J., AND HESSLER, A. 2008. A lifetime-optimized end-to-end encryption schemefor sensor networks allowing in-network processing. Computer Communications 31, 4 (Mar.), 734–749.

BELLARE, M. 1997. Practice-oriented provable-security. In International Workshop on Information Security (ISW 97), Springer-Verlag LNCS vol. 1396.

16Since all other numbers are randomly picked, with overwhelming probability, the two equations are independent.

ACM Transactions on Sensor Networks, Vol. V, No. N, Month 20YY.

Page 42: A Security Framework for Privacy-preserving Data ... · (with symmetric-key encryption [Bellare et al. 1994] for privacy protection and message authentication codes [Bellare et al

42 ·BELLARE, M., CANETTI, R., AND KRAWCZYK, H. 1996. Keying hash functions for message authentication. In Advances in

Cryptology — CRYPTO 1996, Springer-Verlag LNCS vol. 1109. 1–15.

BELLARE, M., DESAI, A., POINTCHEVAL, D., AND ROGAWAY, P. 1998. Relations among notions of security for public-key encryption schemes. In Advances in Cryptology — CRYPTO 1998, Springer-Verlag LNCS vol. 1462. 26–45.

BELLARE, M., GUERIN, R., AND ROGAWAY, P. 1995. XOR MACs: New methods for message authentication using finite pseudorandom functions. In Advances in Cryptology — CRYPTO 1995, Springer-Verlag LNCS vol. 963. 15–28.

BELLARE, M., KILIAN, J., AND ROGAWAY, P. 1994. The security of cipher block chaining. In Advances in Cryptology — CRYPTO 1994, Springer-Verlag LNCS vol. 839. 341–358.

BELLARE, M. AND ROGAWAY, P. 1993. Random oracles are practical: A paradigm for designing efficient protocols. In ACM Conference on Computer and Communication Security (CCS 93). 62–73.

BELLARE, M. AND ROGAWAY, P. 1995. Entity authentication and key distribution. In Advances in Cryptology — CRYPTO 1993, Springer-Verlag LNCS vol. 950. 92–111.

CASTELLUCCIA, C., CHAN, A. C.-F., MYKLETUN, E., AND TSUDIK, G. 2009. Efficient and provably secure aggregation of encrypted data in wireless sensor networks. ACM Transactions on Sensor Networks 5, 3 (May).

CASTELLUCCIA, C., MYKLETUN, E., AND TSUDIK, G. 2005. Efficient aggregation of encrypted data in wireless sensor networks. In MobiQuitous'05. 1–9.

CHAN, A. C.-F. AND CASTELLUCCIA, C. 2007. On the privacy of concealed data aggregation. In European Symposium on Research in Computer Security (ESORICS 2007), Springer-Verlag LNCS vol. 4734. 390–405.

CHAN, H., PERRIG, A., AND SONG, D. 2006. Secure hierarchical in-network aggregation in sensor networks. In ACM Conference on Computer and Communication Security (CCS 06).

DOLEV, D., DWORK, C., AND NAOR, M. 2000. Nonmalleable cryptography. SIAM Journal on Computing 30, 2, 391–437. Preliminary version in 23rd ACM STOC, 1991.

DOMINGO-FERRER, J. 2002. A provably secure additive and multiplicative privacy homomorphism. In ISC'02, Springer-Verlag LNCS vol. 2433. 471–483.

EL GAMAL, T. 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory IT-31, 4 (July), 469–472.

ESCHENAUER, L. AND GLIGOR, V. D. 2002. A key management scheme for distributed sensor networks. In ACM Conference on Computer and Communication Security (CCS 2002). 41–47.

FIAT, A. AND NAOR, M. 1993. Broadcast encryption. In Advances in Cryptology — CRYPTO 1993, Springer-Verlag LNCS vol. 773. 480–491.

GANTI, R., PHAM, N., TSAI, T.-E., AND ABDELZAHER, T. 2008. PoolView: Stream privacy for grassroots participatory sensing. In the 6th ACM Conference on Embedded Networked Sensor Systems (Sensys).

GIRAO, J., WESTHOFF, D., AND SCHNEIDER, M. 2005. CDA: Concealed data aggregation in wireless sensor networks. In IEEE ICC'05.

GOLDREICH, O. 2001. Foundations of Cryptography: Part 1. Cambridge University Press.

GOLDREICH, O., GOLDWASSER, S., AND MICALI, S. 1986. How to construct random functions. Journal of the ACM (JACM) 33, 4, 792–807.

GOLDWASSER, S. AND MICALI, S. 1984. Probabilistic encryption. Journal of Computer and System Sciences 28, 2, 270–299.

GOLDWASSER, S., MICALI, S., AND RIVEST, R. 1988. A digital signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing 17, 2, 281–308.

HE, W., LIU, X., NGUYEN, H., NAHRSTEDT, K., AND ABDELZAHER, T. 2007. PDA: Privacy-preserving data aggregation in wireless sensor networks. In IEEE INFOCOM 2007.

HU, L. AND EVANS, D. 2003. Secure aggregation for wireless networks. In Workshop on Security and Assurance in Ad hoc Networks.

KATZ, J. AND YUNG, M. 2006. Characterization of security notions for probabilistic private-key encryption. Journal of Cryptology 19, 1, 67–95.

KOBLITZ, N. AND MENEZES, A. 2007. Another look at “provable security”. Journal of Cryptology 20, 1, 3–37.

LEMAY, M., GROSS, G., GUNTER, C. A., AND GARG, S. 2007. Unified architecture for large-scale attested metering. In HICSS-40.

LUBY, M. 1996. Pseudorandomness and Cryptographic Applications. Princeton University Press, Princeton, NJ, USA.

MADDEN, S. R., FRANKLIN, M. J., HELLERSTEIN, J. M., AND HONG, W. 2002. TAG: a Tiny AGgregation service for ad-hoc sensor networks. In OSDI'02. 131–146.




MANULIS, M. AND SCHWENK, J. 2007. Provably secure framework for information aggregation in sensor networks. In International Conference on Computational Science and Its Applications (ICCSA) 2007, Springer-Verlag LNCS vol. 4705, Part I. 603–621.

MANULIS, M. AND SCHWENK, J. 2009. Security model and framework for information aggregation in sensor networks. ACM Transactions on Sensor Networks 5, 2 (Mar.).

MICALI, S., RACKOFF, C., AND SLOAN, B. 1988. The notions of security of probabilistic cryptosystems. SIAM Journal on Computing 17, 2, 412–426.

NAOR, M. AND YUNG, M. 1990. Public-key cryptosystems provably secure against chosen-ciphertext attacks. In ACM Symposium on Theory of Computing (STOC 1990). 427–437.

PAILLIER, P. 1999. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology — EUROCRYPT 1999, Springer-Verlag LNCS vol. 1592. 223–238.

PEDERSEN, T. P. 1991. A threshold cryptosystem without a trusted party. In Advances in Cryptology — EUROCRYPT 1991,Springer-Verlag LNCS vol. 547. 522–526.

PETER, S., WESTHOFF, D., AND CASTELLUCCIA, C. 2008. A survey on the encryption of convergecast traffic with in-network processing. IEEE Transactions on Dependable and Secure Computing 5, 4 (October-December).

PRZYDATEK, B., SONG, D., AND PERRIG, A. 2003. SIA: Secure information aggregation in sensor networks. In ACM SenSys 2003.

RIVEST, R., SHAMIR, A., AND ADLEMAN, L. 1978. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21, 2 (Feb.), 120–126.

SHAMIR, A. 1979. How to share a secret. Communications of the ACM 22, 11, 612–613.

SHANNON, C. E. 1949. Communication theory of secrecy systems. Bell System Technical Journal 28, 656–715.

SHOUP, V. 2004. Sequences of games: a tool for taming complexity in security proofs. Cryptology ePrint Archive, Report 2004/332. http://eprint.iacr.org/.

SHOUP, V. AND GENNARO, R. 2002. Securing threshold cryptosystems against chosen ciphertext attack. Journal of Cryptology 15, 2, 75–96.

VERNAM, G. S. 1926. Cipher printing telegraph systems for secret wire and radio telegraphic communications. Journal of the American Institute of Electrical Engineers 45, 105–115. See also US patent #1,310,719.

WESTHOFF, D., GIRAO, J., AND ACHARYA, M. 2006. Concealed data aggregation for reverse multicast traffic in sensor networks: Encryption, key distribution, and routing adaptation. IEEE Transactions on Mobile Computing 5, 10, 1417–1431.

ACM Transactions on Sensor Networks, Vol. V, No. N, Month 20YY.