Privacy-Preserving Public Auditing for Data Storage ... · Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing ... cloud data auditing system, which meets

Privacy-Preserving Public Auditing for Data StorageSecurity in Cloud Computing

Cong Wang, Qian Wang, and Kui RenDepartment of ECE

Illinois Institute of TechnologyEmail: {cong, qian, kren}@ece.iit.edu

Wenjing LouDepartment of ECE

Worcester Polytechnic InstituteEmail: [email protected]

Abstract—Cloud Computing is the long dreamed vision ofcomputing as a utility, where users can remotely store their datainto the cloud so as to enjoy the on-demand high quality applica-tions and services from a shared pool of configurable computingresources. By data outsourcing, users can be relieved from theburden of local data storage and maintenance. However, the factthat users no longer have physical possession of the possibly largesize of outsourced data makes the data integrity protection inCloud Computing a very challenging and potentially formidabletask, especially for users with constrained computing resourcesand capabilities. Thus, enabling public auditability for cloud datastorage security is of critical importance so that users can resortto an external audit party to check the integrity of outsourceddata when needed. To securely introduce an effective third partyauditor (TPA), the following two fundamental requirements haveto be met: 1) TPA should be able to efficiently audit the cloud datastorage without demanding the local copy of data, and introduceno additional on-line burden to the cloud user; 2) The thirdparty auditing process should bring in no new vulnerabilitiestowards user data privacy. In this paper, we utilize and uniquelycombine the public key based homomorphic authenticator withrandom masking to achieve the privacy-preserving public clouddata auditing system, which meets all above requirements. Tosupport efficient handling of multiple auditing tasks, we furtherexplore the technique of bilinear aggregate signature to extendour main result into a multi-user setting, where TPA can performmultiple auditing tasks simultaneously. Extensive security andperformance analysis shows the proposed schemes are provablysecure and highly efficient.

I. INTRODUCTION

Cloud Computing has been envisioned as the next-generation architecture of IT enterprise, due to its long listof unprecedented advantages in the IT history: on-demandself-service, ubiquitous network access, location independentresource pooling, rapid resource elasticity, usage-based pricingand transference of risk [1]. As a disruptive technology withprofound implications, Cloud Computing is transforming thevery nature of how businesses use information technology. Onefundamental aspect of this paradigm shifting is that data isbeing centralized or outsourced into the Cloud. From users’perspective, including both individuals and IT enterprises,storing data remotely into the cloud in a flexible on-demandmanner brings appealing benefits: relief of the burden forstorage management, universal data access with independentgeographical locations, and avoidance of capital expenditureon hardware, software, and personnel maintenances, etc [2].

While Cloud Computing makes these advantages moreappealing than ever, it also brings new and challenging securitythreats towards users’ outsourced data. Since cloud serviceproviders (CSP) are separate administrative entities, data out-sourcing is actually relinquishing user’s ultimate control overthe fate of their data. As a result, the correctness of the datain the cloud is being put at risk due to the following reasons.First of all, although the infrastructures under the cloud aremuch more powerful and reliable than personal computingdevices, they are still facing the broad range of both internaland external threats for data integrity. Examples of outages andsecurity breaches of noteworthy cloud services appear fromtime to time [3]–[5]. Secondly, for the benefits of their own,there do exist various motivations for cloud service providersto behave unfaithfully towards the cloud users regarding thestatus of their outsourced data. Examples include cloud serviceproviders, for monetary reasons, reclaiming storage by discard-ing data that has not been or is rarely accessed, or even hidingdata loss incidents so as to maintain a reputation [6]–[8]. Inshort, although outsourcing data into the cloud is economicallyattractive for the cost and complexity of long-term large-scaledata storage, it does not offer any guarantee on data integrityand availability. This problem, if not properly addressed, mayimpede the successful deployment of the cloud architecture.

As users no longer physically possess the storage of theirdata, traditional cryptographic primitives for the purpose ofdata security protection can not be directly adopted. Thus, howto efficiently verify the correctness of outsourced cloud datawithout the local copy of data files becomes a big challengefor data storage security in Cloud Computing. Note that simplydownloading the data for its integrity verification is not apractical solution due to the expensiveness in I/O cost andtransmitting the file across the network. Besides, it is ofteninsufficient to detect the data corruption when accessing thedata, as it might be too late for recover the data loss ordamage. Considering the large size of the outsourced dataand the user’s constrained resource capability, the ability toaudit the correctness of the data in a cloud environment canbe formidable and expensive for the cloud users [8], [9].Therefore, to fully ensure the data security and save the cloudusers’ computation resources, it is of critical importance toenable public auditability for cloud data storage so that theusers may resort to a third party auditor (TPA), who has

978-1-4244-5837-0/10/$26.00 ©2010 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE INFOCOM 2010 proceedingsThis paper was presented as part of the main Technical Program at IEEE INFOCOM 2010.

expertise and capabilities that the users do not, to audit theoutsourced data when needed. Based on the audit result, TPAcould release an audit report, which would not only help usersto evaluate the risk of their subscribed cloud data services, butalso be beneficial for the cloud service provider to improvetheir cloud based service platform [7]. In a word, enablingpublic risk auditing protocols will play an important role forthis nascent cloud economy to become fully established, whereusers will need ways to assess risk and gain trust in Cloud.

Recently, the notion of public auditability has been proposedin the context of ensuring remotely stored data integrity underdifferent systems and security models [6], [8], [10], [11].Public auditability allows an external party, in addition to theuser himself, to verify the correctness of remotely stored data.However, most of these schemes [6], [8], [10] do not supportthe privacy protection of users’ data against external auditors,i.e., they may potentially reveal user data information to theauditors, as will be discussed in Section III-C. This severedrawback greatly affects the security of these protocols inCloud Computing. From the perspective of protecting data pri-vacy, the users, who own the data and rely on TPA just for thestorage security of their data, do not want this auditing processintroducing new vulnerabilities of unauthorized informationleakage towards their data security [12]. Moreover, there arelegal regulations, such as the US Health Insurance Portabilityand Accountability Act (HIPAA) [13], further demandingthe outsourced data not to be leaked to external parties [7].Exploiting data encryption before outsourcing [11] is one wayto mitigate this privacy concern, but it is only complemen-tary to the privacy-preserving public auditing scheme to beproposed in this paper. Without a properly designed auditingprotocol, encryption itself can not prevent data from “flowingaway” towards external parties during the auditing process.Thus, it does not completely solve the problem of protectingdata privacy but just reduces it to the one of managing theencryption keys. Unauthorized data leakage still remains aproblem due to the potential exposure of encryption keys.

Therefore, how to enable a privacy-preserving third-partyauditing protocol, independent to data encryption, is the prob-lem we are going to tackle in this paper. Our work is among thefirst few ones to support privacy-preserving public auditing inCloud Computing, with a focus on data storage. Besides, withthe prevalence of Cloud Computing, a foreseeable increaseof auditing tasks from different users may be delegated toTPA. As the individual auditing of these growing tasks canbe tedious and cumbersome, a natural demand is then howto enable TPA to efficiently perform the multiple auditingtasks in a batch manner, i.e., simultaneously. To address theseproblems, our work utilizes the technique of public key basedhomomorphic authenticator [6], [8], [10], which enables TPAto perform the auditing without demanding the local copyof data and thus drastically reduces the communication andcomputation overhead as compared to the straightforwarddata auditing approaches. By integrating the homomorphicauthenticator with random masking, our protocol guaranteesthat TPA could not learn any knowledge about the data content

stored in the cloud server during the efficient auditing process.The aggregation and algebraic properties of the authenticatorfurther benefit our design for the batch auditing. Specifically,our contribution in this work can be summarized as thefollowing three aspects:

1) We motivate the public auditing system of data storagesecurity in Cloud Computing and provide a privacy-preservingauditing protocol, i.e., our scheme supports an external auditorto audit user’s outsourced data in the cloud without learningknowledge on the data content.

2) To the best of our knowledge, our scheme is the first tosupport scalable and efficient public auditing in the CloudComputing. In particular, our scheme achieves batch auditingwhere multiple delegated auditing tasks from different userscan be performed simultaneously by the TPA.

3) We prove the security and justify the performance of ourproposed schemes through concrete experiments and compar-isons with the state-of-the-art.

The rest of the paper is organized as follows. Section IIintroduces the system and threat model, our design goals,notations and preliminaries. Then we provide the detaileddescription of our scheme in Section III. Section IV givesthe security analysis and performance evaluation, followed bySection V which overviews the related work. Finally, SectionVI gives the concluding remark of the whole paper.

II. PROBLEM STATEMENT

A. The System and Threat Model

We consider a cloud data storage service involving threedifferent entities, as illustrated in Fig. 1: the cloud user(U), who has large amount of data files to be stored in thecloud; the cloud server (CS), which is managed by cloudservice provider (CSP) to provide data storage service andhas significant storage space and computation resources (wewill not differentiate CS and CSP hereafter.); the third partyauditor (TPA), who has expertise and capabilities that cloudusers do not have and is trusted to assess the cloud storageservice security on behalf of the user upon request.

Users rely on the CS for cloud data storage and mainte-nance. They may also dynamically interact with the CS toaccess and update their stored data for various applicationpurposes. The users may resort to TPA for ensuring thestorage security of their outsourced data, while hoping to keeptheir data private from TPA. We consider the existence ofa semi-trusted CS as [14] does. Namely, in most of time itbehaves properly and does not deviate from the prescribedprotocol execution. However, during providing the cloud datastorage based services, for their own benefits the CS mightneglect to keep or deliberately delete rarely accessed data fileswhich belong to ordinary cloud users. Moreover, the CS maydecide to hide the data corruptions caused by server hacks orByzantine failures to maintain reputation. We assume the TPA,who is in the business of auditing, is reliable and independent,and thus has no incentive to collude with either the CS orthe users during the auditing process. TPA should be able to


Data Flow

Cloud Service Provider

Users

CloudServers

Security

MessageFlow

Security Message Flow

SecurityMessage Flow

Third Party Auditor

Fig. 1: The architecture of cloud data storage service

efficiently audit the cloud data storage without local copy ofdata and without bringing in additional on-line burden to cloudusers. However, any possible leakage of user’s outsourceddata towards TPA through the auditing protocol should beprohibited.

Note that to achieve the audit delegation and authorize CSto respond to TPA’s audits, the user can sign a certificategranting audit rights to the TPA’s public key, and all auditsfrom the TPA are authenticated against such a certificate.These authentication handshakes are omitted in the followingpresentation.

B. Design Goals

To enable privacy-preserving public auditing for cloud datastorage under the aforementioned model, our protocol de-sign should achieve the following security and performanceguarantee: 1) Public auditability: to allow TPA to verify thecorrectness of the cloud data on demand without retrievinga copy of the whole data or introducing additional on-lineburden to the cloud users; 2) Storage correctness: to ensurethat there exists no cheating cloud server that can pass theaudit from TPA without indeed storing users’ data intact;3) Privacy-preserving: to ensure that there exists no wayfor TPA to derive users’ data content from the informationcollected during the auditing process; 4) Batch auditing: toenable TPA with secure and efficient auditing capability tocope with multiple auditing delegations from possibly largenumber of different users simultaneously; 5) Lightweight: toallow TPA to perform auditing with minimum communicationand computation overhead.

C. Notation and Preliminaries

• F – the data file to be outsourced, denoted as a sequenceof n blocks m1, . . . ,mn ∈ Zp for some large prime p.

• fkey(·) – pseudorandom function (PRF), defined as:{0, 1}∗ × key → Zp.

• πkey(·) – pseudorandom permutation (PRP), defined as:{0, 1}log2(n) × key → {0, 1}log2(n).

• MACkey(·) – message authentication code (MAC) func-tion, defined as: {0, 1}∗ × key → {0, 1}l.

• H(·), h(·) – map-to-point hash functions, defined as:{0, 1}∗ → G, where G is some group.

We now introduce some necessary cryptographic back-ground for our proposed scheme.

Bilinear Map Let G1, G2 and GT be multiplicative cyclicgroups of prime order p. Let g1 and g2 be generators of G1 andG2, respectively. A bilinear map is a map e : G1×G2 → GT

with the following properties [15]: 1) Computable: thereexists an efficiently computable algorithm for computing e;2) Bilinear: for all u ∈ G1, v ∈ G2 and a, b ∈ Zp,e(ua, vb) = e(u, v)ab; 3) Non-degenerate: e(g1, g2) �= 1; 4)for any u1, u2 ∈ G1, v ∈ G2, e(u1u2, v) = e(u1, v) · e(u2, v).

III. THE PROPOSED SCHEMES

In the introduction we motivated the public auditabilitywith achieving economies of scale for cloud computing. Thissection presents our public auditing scheme for cloud datastorage security. We start from the overview of our publicauditing system and discuss two straightforward schemes andtheir demerits. Then we present our main result for privacy-preserving public auditing to achieve the aforementioned de-sign goals. Finally, we show how to extent our main scheme tosupport batch auditing for TPA upon delegations from multi-users.

A. Definitions and Framework of Public Auditing System

We follow the similar definition of previously proposedschemes in the context of remote data integrity checking [6],[10], [11] and adapt the framework for our privacy-preservingpublic auditing system.

A public auditing scheme consists of four algorithms(KeyGen, SigGen, GenProof, VerifyProof). KeyGenis a key generation algorithm that is run by the user to setup thescheme. SigGen is used by the user to generate verificationmetadata, which may consist of MAC, signatures, or otherrelated information that will be used for auditing. GenProofis run by the cloud server to generate a proof of data storagecorrectness, while VerifyProof is run by the TPA to auditthe proof from the cloud server.

Our public auditing system can be constructed from theabove auditing scheme in two phases, Setup and Audit:

• Setup: The user initializes the public and secret pa-rameters of the system by executing KeyGen, and pre-processes the data file F by using SigGen to generatethe verification metadata. The user then stores the data fileF at the cloud server, deletes its local copy, and publishesthe verification metadata to TPA for later audit. As partof pre-processing, the user may alter the data file F byexpanding it or including additional metadata to be storedat server.

• Audit: The TPA issues an audit message or challengeto the cloud server to make sure that the cloud serverhas retained the data file F properly at the time of theaudit. The cloud server will derive a response messagefrom a function of the stored data file F by executingGenProof. Using the verification metadata, the TPAverifies the response via VerifyProof.

Note that in our design, we do not assume any additionalproperty on the data file, and thus regard error-correcting codesas orthogonal to our system. If the user wants to have more


error-resiliency, he/she can first redundantly encode the datafile and then provide us with the data file that has error-correcting codes integrated.

B. The Basic Schemes

Before giving our main result, we first start with twowarmup schemes. The first one does not ensure privacy-preserving guarantee and is not as lightweight as we wouldlike. The second one outperforms the first one, but suffersfrom other undesirable systematic demerits for public auditing:bounded usage and auditor statefulness, which may poseadditional on-line burden to users as will be elaborated shortly.The analysis of these basic schemes will lead to our mainresult, which overcomes all these drawbacks.Basic Scheme I The cloud user pre-computes MACs σi =MACsk(i||mi) of each block mi (i ∈ {1, . . . , n}), sends boththe data file F and the MACs {σi}1≤i≤n onto the cloud server,and releases the secret key sk to TPA. During the Auditphase, the TPA requests from the cloud server a number ofrandomly selected blocks and their corresponding MACs toverify the correctness of the data file. The insight behindthis approach is that auditing most of the file is much easierthan the whole of it. However, this simple solution suffersfrom the following severe drawbacks: 1) The audit from TPAdemands retrieval of users’ data, which should be prohibitivebecause it violates the privacy-preserving guarantee; 2) Itscommunication and computation complexity are both linearwith respect to the sampled data size, which may result in largecommunication overhead and time delay, especially when thebandwidth available between the TPA and the cloud server islimited.Basic Scheme II To avoid retrieving data from the cloudserver, one may improve the above solution as follows: Beforedata outsourcing, the cloud user chooses s random messageauthentication code keys {skτ}1≤τ≤s, pre-computes s MACs,{MACskτ

(F )}1≤τ≤s for the whole data file F , and publishesthese verification metadata to TPA. The TPA can each timereveal a secret key skτ to the cloud server and ask for a freshkeyed MAC for comparison, thus achieving privacy-preservingauditing. However, in this method: 1) the number of times aparticular data file can be audited is limited by the numberof secret keys that must be a fixed priori. Once all possiblesecret keys are exhausted, cloud user then has to retrieve datafrom the server in order to re-compute and re-publish newMACs to TPA. 2) The TPA has to maintain and update statebetween audits, i.e., keep a track on the possessed MAC keys.Considering the potentially large number of audit delegationsfrom multiple users, maintaining such states for TPA can bedifficult and error prone.

Note that another common drawback of the above basicschemes is that they can only support the case of static data,and none of them can deal with data dynamics, which is alsoof paramount importance for cloud storage systems. For thereason of brevity and clarity, we will focus on the static data,too, though our auditing protocol can be immediately adaptedto support data dynamics, based on our previous work [8].

C. The Privacy-Preserving Public Auditing Scheme

To effectively support public auditability without having toretrieve the data blocks themselves, we resort to the homo-morphic authenticator technique [6], [8], [10]. Homomorphicauthenticators are unforgeable verification metadata generatedfrom individual data blocks, which can be securely aggregatedin such a way to assure an auditor that a linear combinationof data blocks is correctly computed by verifying only theaggregated authenticator. However, the direct adoption ofthese techniques is not suitable for our purposes, since thelinear combination of blocks may potentially reveal user datainformation, thus violating the privacy-preserving guarantee.Specifically, if enough number of the linear combinations ofthe same blocks are collected, the TPA can simply derive theuser’s data content by solving a system of linear equations.Overview To achieve privacy-preserving public auditing, wepropose to uniquely integrate the homomorphic authenticatorwith random masking technique. In our protocol, the linearcombination of sampled blocks in the server’s response ismasked with randomness generated by a pseudo randomfunction (PRF). With random masking, the TPA no longerhas all the necessary information to build up a correct groupof linear equations and therefore cannot derive the user’sdata content, no matter how many linear combinations of thesame set of file blocks can be collected. Meanwhile, due tothe algebraic property of the homomorphic authenticator, thecorrectness validation of the block-authenticator pairs will notbe affected by the randomness generated from a PRF, whichwill be shown shortly. Note that in our design, we use publickey based homomorphic authenticator, specifically, the BLSbased signature [10], to equip the auditing protocol with publicauditability. Its flexibility in signature aggregation will furtherbenefit us for the multi-task auditing.Scheme Details Let G1, G2 and GT be multiplicative cyclicgroups of prime order p, and e : G1 × G2 → GT be abilinear map as introduced in preliminaries. Let g be thegenerator of G2. H(·) is a secure map-to-point hash function:{0, 1}∗ → G1, which maps strings uniformly to G1. Anotherhash function h(·) : G1 → Zp maps group element of G1

uniformly to Zp. The proposed scheme is as follows:Setup Phase:

1) The cloud user runs KeyGen to generate the system’spublic and secret parameters. He chooses a random x← Zp, arandom element u← G1, and computes v ← gx and w ← ux.The secret parameter is sk = (x) and the public parametersare pk = (v, w, g, u). Given data file F = (m1, . . . ,mn), theuser runs SigGen to compute signature σi for each block mi:σi ← (H(i) · umi)x ∈ G1 (i = 1, . . . , n). Denote the set ofsignatures by Φ = {σi}1≤i≤n. The user then sends {F,Φ} tothe server and deletes them from its local storage.Audit Phase:

2) During the auditing process, to generate the auditmessage “chal”, the TPA picks a random c-element subsetI = {s1, . . . , sc} of set [1, n], where sq = πkprp

(q) for1 ≤ q ≤ c and kprp is the randomly chosen permutation keyby TPA for each auditing. We assume that s1 ≤ · · · ≤ sc.


For each element i ∈ I , the TPA also chooses a randomvalue νi (of a relative small bit length compared to |p|). Themessage “chal” specifies the positions of the blocks that arerequired to be checked in this Audit phase. The TPA sendsthe chal = {(i, νi)}i∈I to the server.

3) Upon receiving challenge chal = {(i, νi)}i∈I , the serverruns GenProof to generate a response proof of data storagecorrectness. Specifically, the server chooses a random elementr ← Zp via r = fkprf

(chal), where kprf is the randomlychosen PRF key by server for each auditing, and calculatesR = (w)r = (ux)r ∈ G1. Let μ′ denote the linear combinationof sampled blocks specified in chal: μ′ =

∑i∈I νimi. To

blind μ′ with r, the server computes: μ = μ′ + rh(R) ∈ Zp.Meanwhile, the server also calculates an aggregated signatureσ =

∏i∈I σνi

i ∈ G1. It then sends {μ, σ,R} as the responseproof of storage correctness to the TPA. With the responsefrom the server, the TPA runs VerifyProof to validate theresponse by checking the verification equation

e(σ · (Rh(R)), g) ?= e(sc∏

i=s1

H(i)νi · uμ, v) (1)

The correctness of the above verification equation can beelaborated as follows:

e(σ ·Rh(R), g) = e(sc∏

i=s1

σνii · (ux)r·h(R), g)

= e(sc∏

i=s1

(H(i) · umi)x·νi · (ur·h(R))x, g)

= e(sc∏

i=s1

(H(i)νi · umiνi) · (ur·h(R)), g)x

= e(sc∏

i=s1

(H(i)νi) · uμ′+r·h(R), gx)

= e(sc∏

i=s1

(H(i)νi) · uμ, v)

It is clear that the random mask R has no effect on thevalidity of the checking result. The security of this protocolwill be proved in Section IV.Discussion As analyzed at the beginning of this section, thisapproach ensures the privacy of user data content during theauditing process. Meanwhile, the homomorphic authentica-tor helps achieve the constant communication overhead forserver’s response during the audit: the size of {σ, μ,R} is fixedand has nothing to do with the number of sampled blocks c.Note that there is no secret keying material or states for TPAto keep or maintain between audits, and the auditing protocoldoes not pose any potential on-line burden toward users. Sincethe TPA could “re-generate” the random c-element subsetI = {s1, . . . , sc} of set [1, n], where sq = πkprp

(q), for1 ≤ q ≤ c, unbounded usage is also achieved.

Previous work [6], [8] showed that if the server is missinga fraction of the data, then the number of blocks that needsto be checked in order to detect server misbehavior with high

probability is in the order of O(1). For example, if the serveris missing 1% of the data F , the TPA only needs to audit forc = 460 or 300 randomly chosen blocks of F so as to detectthis misbehavior with probability larger than 99% or 95%,respectively. Given the huge volume of data outsourced in thecloud, checking a portion of the data file is more affordableand practical for both TPA and cloud server than checkingall the data, as long as the sampling strategies provides highprobability assurance. In Section IV, we will present theexperiment result based on these sampling strategies.

D. Support for Batch Auditing

With the establishment of privacy-preserving public auditingin Cloud Computing, TPA may concurrently handle multipleauditing delegations upon different users’ requests. The in-dividual auditing of these tasks for TPA can be tedious andvery inefficient. Given K auditing delegations on K distinctdata files from K different users, it is more advantageousfor TPA to batch these multiple tasks together and audit atone time. Keeping this natural demand in mind, we proposeto explore the technique of bilinear aggregate signature [15],which supports the aggregation of multiple signatures bydistinct signers on distinct messages into a single signatureand thus provides efficient verification for the authenticityof all messages. Using this signature aggregation techniqueand bilinear property, we can now aggregate K verificationequations (for K auditing tasks) into a single one, as shownin equation 2, so that the simultaneous auditing of multipletasks can be achieved.

The details of extending our main result to this multi-usersetting is described as follows: Assume there are K users in thesystem, and each user k has a data file Fk = (mk,1, . . . ,mk,n)to be outsourced to the cloud server, where k ∈ {1, . . . , K}.For a particular user k, denote his secret key as xk ← Zp,and the corresponding public parameter as (vk, wk, gk, uk) =(gxk , uxk

k , g, uk). In the Setup phase, each user k runsSigGen and computes signature σk,i ← [H(k||i) ·umk,i

k ]xk ∈G1 for block mk,i (i ∈ {1, . . . , n}). In the Audit phase,the TPA sends the audit challenge chal = {(i, νi)}i∈I to theserver for auditing data files of all K users. Upon receivingchal, for each user k (k ∈ {1, . . . , K}), the server randomlypicks rk ∈ Zp and computes

μk =sc∑

i=s1

νimk,i + rkh(Rk) ∈ Zp and σ =K∏

k=1

(sc∏

i=s1

σνi

k,i),

where Rk = (wk)rk = (uxk

k )rk . The server then responsesthe TPA with {σ, {μk}1≤k≤K , {Rk}1≤k≤K}. Similar as thesingle user case, using the properties of the bilinear map, theTPA can check if the following equation holds:

e(σ ·K∏

k=1

Rh(Rk)k , g) ?=

K∏

k=1

e(sc∏

i=s1

[H(k||i)]νi · (uk)μk , vk) (2)


The left-hand side (LHS) of the equation expands as:

LHS = e(K∏

k=1

sc∏

i=s1

σνi

k,i(uxk

k )rkh(Rk), g)

= e(K∏

k=1

sc∏

i=s1

[H(k||i) · umk,i

k ]xkνi(urkh(Rk)k )xk , g)

=K∏

k=1

e(sc∏

i=s1

[H(k||i) · umk,i

k ]νi(urkh(Rk)k ), g)xk

=K∏

k=1

e(sc∏

i=s1

[H(k||i)]νi · (uk)νimk,i+rkh(Rk), gxk)

=K∏

k=1

e(sc∏

i=s1

[H(k||i)]νi · (uk)μk , vk),

which is the right hand side, as required.Discussion As shown in equation 2, batch auditing not onlyallows TPA to perform the multiple auditing tasks simultane-ously, but also greatly reduces both the communication cost onthe server side and the computation cost on the TPA side. Forsaving communication cost, the bilinear aggregate signatureensures that the server only needs to send one group elementσ in the response to TPA, instead of a set of {σk}1≤k≤K asrequired by individual auditing, where σk =

∏i∈I σνi

k,i denotesthe aggregation signature supposed to be returned for each userk. Meanwhile, aggregating K verification equations into onehelps reduce the number of expensive pairing operations from2K, as required in the individual auditing, to K + 1. Thus, aconsiderable amount of auditing time is expected to be saved.

Note that the verification equation 2 only holds when allthe responses are valid, and fails with high probability whenthere is even one single invalid response in the batch auditing.In many situations, a response collection may contain invalidresponses, especially {μk}1≤k≤K , caused by accidental datacorruption, or possibly malicious activity by a cloud server.The ratio of invalid responses to the valid could be quitesmall, and yet a standard batch auditor will reject the entirecollection. To further sort out these invalid responses in thebatch auditing, we can utilize a recursive divide-and-conquerapproach (binary search). Specifically, if the batch auditingfails, we can simply divide the collection of responses intotwo halves, and recurse the auditing on halves via equation 2.Note that TPA may now require the server to send back allthe {σk}1≤k≤K , as the same in individual auditing. In SectionIV-B2, we show through carefully designed experiment thatusing this recursive binary search approach, even if up to 20%of responses are invalid, batch auditing still performs fasterthan individual verification.

IV. SECURITY ANALYSIS AND PERFORMANCE

EVALUATION

A. Security Proofs

We evaluate the security of the proposed scheme by an-alyzing its fulfillment of the security guarantee described

in Section II, namely, the storage correctness and privacy-preserving. We start from the single user case, where our mainresult is originated. Then we show how to extend the securityguarantee to a multi-user setting, where batch auditing for TPAis enabled. Due to space limitation, we only list the proofsketches to gives a rough idea for achieving those guarantees.The detailed formalized proofs of Theorem 1, 2, and 3 areprovided in the full version [16]. Note that all proofs arederived on the probabilistic base, i.e., with high probabilityassurance, which we omit writing explicitly.

1) Storage Correctness Guarantee: We need to prove thatthe cloud server can not generate valid response toward TPAwithout faithfully storing the data, as captured by Theorem 1.

Theorem 1: If the cloud server passes the Audit phase,then it must indeed possess the specified data intact as it is.

Proof Sketch: The proof consists of three steps. First,we show that there exists no malicious server that can forgea valid response {σ, μ,R} to pass the verification equation 1.The correctness of this statement follows from the Theorem4.2 proposed in [10]. Note that the value R in our protocol,which enables the privacy-preserving guarantee, will not affectthe validity of the equation, due to the hardness of discrete-logand the commutativity of modular exponentiation in pairing.

Next, we show that if the response {σ, μ,R} is valid, whereμ = μ′+rh(R) and R = (v2)r, then the underlying μ′ must bevalid too. This can be derived immediately from the collision-free property of hash function h(·) and determinism of discreteexponentiation.

Finally, we show that the validity of μ′ implies the cor-rectness of {mi}i∈I where μ′ =

∑i∈I νimi. Here we utilize

the small exponent (SE) test technique of batch verificationin [17]. Because {νi} are picked up randomly by the TPAin each Audit phase, {νi} can be viewed similarly as therandom chosen exponents in the SE test [17]. Therefore, thecorrectness of individual sampled blocks is ensured. All abovesums up to the storage correctness guarantee.

2) Privacy Preserving Guarantee: We want to make surethat TPA can not derive users’ data content from the infor-mation collected during auditing process. This is equivalentto prove the Theorem 2. Note that if μ′ can be derived byTPA, then {mi}i∈I can be easily obtained by solving a groupof linear equations when enough combinations of the sameblocks are collected.

Theorem 2: From the server’s response {σ, μ,R}, no infor-mation of μ′ will be leaked to TPA.

Proof Sketch: Again, we prove the Theorem 2 in threesteps. First, we show that no information on μ′ can be learnedfrom μ. This is because μ is blinded by r as μ = μ′ + rh(R)and R = (v2)r, where r is chosen randomly by cloud serverand is unknown to TPA. Note that even with R, due to thehardness of discrete-log assumption, the value r is still hiddenagainst TPA. Thus, privacy of μ′ is guaranteed from μ.

Second, we show that no information on μ′ can be learnedfrom σ, where


TABLE I: Notation summary of cryptographic operations.Hasht

Ghash t values into the group G.

AddtG

t additions in group G.

MulttG

t multiplications in group G.

ExptG(�) t exponentiations gai , where g ∈ G, |ai| = �.

m-MultExptG(�) t m-term exponentiations

∏m

i=1gai .

PairtG,H t pairings e(gi, hi), where gi ∈ G, hi ∈ H.

m-MultPairtG,H t m-term pairings

∏m

i=1e(gi, hi).

σ =∏

i∈I

σνii =

∏

i∈I

(H(i) · umi)x·νi

= [∏

i∈I

H(i)νi · uμ′]x = [

∏

i∈I

H(i)νi ]x · [(uμ′)x].

This can be analyzed as follows: (uμ′)x is blinded by

[∏

i∈I H(i)νi ]x. However, to compute [∏

i∈I H(i)νi ]x from[∏

i∈I H(i)νi ] and gx, which is the only information TPAcan utilize, is a computational Diffie-Hellman problem, whichcan be stated as: given g, ga, gb, compute gab. This problemis hard for unknown a, b ∈ Zp. Therefore, on the basis ofcomputational Diffie-Hellman assumption, TPA can not derivethe value of (uμ′

)x, let alone μ′.Finally, all that remains is to prove from {σ, μ,R}, still no

information on μ′ can be obtained by TPA. Recall that r is arandom private value chosen by the server and μ = μ′+rh(R).Following the same technique of Schnorr signature [18],our auditing protocol between TPA and cloud server canbe regarded as a provably secure honest zero knowledgeidentification scheme [19], by viewing μ′ as a secret key andh(R) as a challenge value, which implies no information onμ′ can be leaked. This completes the proof of Theorem 2.

3) Security Guarantee for Batch Auditing: Now we showthat extending our main result to a multi-user setting willnot affect the aforementioned security insurance, as shown inTheorem 3:

Theorem 3: Our batch auditing protocol achieves the samestorage correctness and privacy preserving guarantee as in thesingle-user case.

Proof Sketch: Due to the space limitation, we only provethe storage correctness guarantee. The privacy-preservingguarantee in the multi-user setting is similar to that of The-orem 2, and thus omitted here. The proposed batch auditingprotocol is built upon the aggregate signature scheme proposedin [15]. According to the security strength of aggregate signa-ture [15], in our multi-user setting, there exists no maliciouscloud servers that can forge valid μ1, . . . , μk in the responsesto pass the verification equation 2. Actually, the equation 2functions as a kind of screening test as proposed in [17].While the screening test may not guarantee the validity ofeach individual σk, it does ensure the authenticity of μk in thebatch auditing protocol, which is adequate for the rationale inour case. Once the validity of μ1, . . . , μk is guaranteed, from

TABLE II: Performance comparison under different numberof sampled blocks c for high assurance auditing.

Our Scheme [10]

Sampled blocks c 460 300 460 300

Sever compt. time (ms) 405.57 273.34 403.69 270.46

TPA compt. time (ms) 525.89 493.25 524.02 491.38

Comm. cost (Byte) 60 40 60 40

the proof of Theorem 1, the storage correctness guarantee inthe multi-user setting is achieved.

B. Performance Analysis

We now assess the performance of the proposed privacy-preserving public auditing scheme. We will focus on theextra cost introduced by the privacy-preserving guarantee andthe efficiency of the proposed batch auditing technique. Theexperiment is conducted using C on a Linux system with anIntel Core 2 processor running at 1.86 GHz, 2048 MB ofRAM, and a 7200 RPM Western Digital 250 GB Serial ATAdrive with an 8 MB buffer. Algorithms use the Pairing-BasedCryptography (PBC) library version 0.4.18. The elliptic curveutilized in the experiment is a MNT curve, with base field sizeof 159 bits and the embedding degree 6. The security level ischosen to be 80 bit, which means |νi| = 80 and |p| = 160. Allexperimental results represent the mean of 20 trials.

1) Cost of Privacy-preserving Guarantee: We begin byestimating the cost in terms of basic cryptographic operations,as notated in Table I. Suppose there are c random blocksspecified in the chal during the Audit phase. Under thissetting, we quantify the extra cost introduced by the supportof privacy-preserving into server computation, auditor com-putation as well as communication overhead. On the serverside, the generated response includes an aggregated signatureσ =

∏i∈I σνi

i ∈ G1, a random metadata R = (w)r = (ux)r ∈G1, and a blinded linear combination of sampled blocksμ =

∑i∈I νimi + rh(R) ∈ Zp. The corresponding compu-

tation cost is c-MultExp1G(|νi|), Exp1

G(|p|), and AddcZp

+Multc+1

Zp+ Hash1

Zp, respectively. Compared to the existing

homomorphic authenticator based solution for ensuring remotedata integrity [10]1, the extra cost for protecting the userprivacy, resulted from the random mask R, is only a constant:Exp1

G(|p|) + Mult1Zp

+ Hash1Zp

+ Add1Zp

, which has nothingto do with the number of sampled blocks c. When c is set tobe 460 or 300 for high assurance of auditing, as discussed inSection III-C, the extra cost for privacy-preserving guaranteeon the server side would be negligible against the total servercomputation for response generation.

Similarly, on the auditor side, upon receiving the response{σ,R, μ}, the corresponding computation cost for responsevalidation is Hash1

Zp+ c-MultExp1

G(|νi|) + HashcG +

Mult2G+Exp2G(|p|)+Pair2

G,G, among which only Hash1Zp

+Exp1

G(|p|) + Mult1Zp

account for the additional constant

1We refer readers to [10] for detailed description of homomorphic authen-ticator based solutions.


0 50 100 150 200430

440

450

460

470

480

490

500

510

520

530

Number of auditing tasks

Aud

iting

tim

e pe

r ta

sk (

ms)

individual auditingbatch auditing (c = 460)batch auditing (c = 300)

Fig. 2: Comparison on auditing time between batch auditingand individual auditing. Per task auditing time denotes the totalauditing time divided by the number of tasks.

computation cost. For c = 460 or 300, and considering therelatively expensive pairing operations, this extra cost imposeslittle overhead on the overall cost of response validation, andthus can be ignored. For the sake of completeness, TableII gives the experiment result on performance comparisonbetween our scheme and the state-of-the-art [10]. It can beshown that the performance of our scheme is almost the sameas that of [10], even if our scheme supports privacy-preservingguarantee while [10] does not. Note that in our scheme, theserver’s response {σ,R, μ} contains an additional randomelement R, which is a group element and has the same sizeof 159 bits as σ does. This explains the extra communicationcost of our scheme opposing to [10].

2) Batch Auditing Efficiency: Discussion in Section III-Dgives an asymptotic efficiency analysis on the batch auditing,by considering only total number of expensive pairing op-erations. However, on the practical side, there are additionaloperations required for batching, such as modular exponenti-ations and multiplications. Meanwhile, the different samplingstrategies, i.e., different number of sampled blocks c, is alsoa variable factor that affects the batching efficiency. Thus,whether the benefits of removing pairings significantly out-weighs these additional operations is remained to be verified.To get a complete view of batching efficiency, we conduct atimed batch auditing test, where the number of auditing tasksis increased from 1 to approximately 200 with intervals of 8.The performance of the corresponding non-batched (individ-ual) auditing is provided as a baseline for the measurement.Following the same experimental setting as c = 460 and 300,the average per task auditing time, which is computed bydividing total auditing time by the number of tasks, is givenin Fig. 2 for both batch auditing and the individual auditing.It can be shown that compared to individual auditing, batchauditing indeed helps reduce the TPA’s computation cost, asmore than 11% and 17% of per-task auditing time is saved,when c is set to be 460 and 300, respectively.

0 5 10 15 20430

440

450

460

470

480

490

500

510

520

530

Fraction of invalid responses α

Aud

iting

tim

e pe

r ta

sk (

ms)

individual auditingbatch auditing (c = 460)batch auditing (c = 300)

Fig. 3: Comparison on auditing time between batch auditingand individual auditing, when α-fraction of 256 responses areinvalid. Per task auditing time denotes the total auditing timedivided by the number of tasks.

3) Sorting out Invalid Responses: Now we use experimentto justify the efficiency of our recursive binary search approachfor TPA to sort out the invalid responses when batch auditingfails, as discussed in Section III-D. This experiment is tightlypertained to work in [20], which evaluates the batch verifica-tion efficiency of various short signature schemes.

To evaluate the feasibility of the recursive approach, we firstgenerate a collection of 256 valid responses, which implies theTPA may concurrently handle 256 different auditing delega-tions. We then conduct the tests repeatedly while randomlycorrupting an α-fraction, ranging from 0 to 20%, by replacingthem with random values. The average auditing time per taskagainst the individual auditing approach is presented in Fig.3. The result shows that even the number of invalid responsesexceeds 15% of the total batch size, the performance of batchauditing can still be safely concluded as more preferable thanthe straightforward individual auditing. Note that the randomdistribution of invalid responses within the collection is nearlythe worst-case for batch auditing. If invalid responses aregrouped together, it is possible to achieve even better results.

V. RELATED WORK

Ateniese et al. [6] are the first to consider public auditabilityin their defined “provable data possession” (PDP) model forensuring possession of data files on untrusted storages. Theirscheme utilizes the RSA-based homomorphic authenticatorsfor auditing outsourced data and suggests randomly sampling afew blocks of the file. However, the public auditability in theirscheme demands the linear combination of sampled blocksexposed to external auditor. When used directly, their protocolis not provably privacy preserving, and thus may leak user datainformation to the auditor. Juels et al. [11] describe a “proofof retrievability” (PoR) model, where spot-checking and error-correcting codes are used to ensure both “possession” and“retrievability” of data files on remote archive service systems.However, the number of audit challenges a user can perform


is a fixed priori, and public auditability is not supported intheir main scheme. Although they describe a straightforwardMerkle-tree construction for public PoRs, this approach onlyworks with encrypted data. Shacham et al. [10] design animproved PoR scheme built from BLS signatures with fullproofs of security in the security model defined in [11].Similar to the construction in [6], they use publicly verifiablehomomorphic authenticators that are built from provably se-cure BLS signatures. Based on the elegant BLS construction,public retrievability is achieved. Again, their approach doesnot support privacy-preserving auditing for the same reasonas [6]. Shah et al. [7], [12] propose allowing a TPA to keeponline storage honest by first encrypting the data then sendinga number of pre-computed symmetric-keyed hashes over theencrypted data to the auditor. The auditor verifies both theintegrity of the data file and the server’s possession of apreviously committed decryption key. This scheme only worksfor encrypted files, and it suffers from the auditor statefulnessand bounded usage, which may potentially bring in on-lineburden to users when the keyed hashes are used up. In otherrelated work, Ateniese et al. [21] propose a partially dynamicversion of the prior PDP scheme, using only symmetric keycryptography but with a bounded number of audits. In [22],Wang et al. consider a similar support for partial dynamicdata storage in distributed scenario with additional feature ofdata error localization. In a subsequent work, Wang et al. [8]propose to combine BLS based homomorphic authenticatorwith MHT to support both public auditability and fully datadynamics. Almost simultaneously, Erway et al. [23] developeda skip lists based scheme to enable provable data possessionwith fully dynamics support. However, these two protocolsrequire the linear combination of sampled blocks just as [6],[10], and thus does not support privacy-preserving auditingon user’s outsourced data. While all above schemes providemethods for efficient auditing and provable assurance on thecorrectness of remotely stored data, none of them meet all therequirements for privacy-preserving public auditing in CloudComputing, as supported in our result. More importantly, noneof these schemes consider batch auditing, which will greatlyreduce the computation cost on the TPA when coping withlarge number of audit delegations.

VI. CONCLUSION

In this paper, we propose a privacy-preserving public audit-ing system for data storage security in Cloud Computing. Weutilize the homomorphic authenticator and random masking toguarantee that TPA would not learn any knowledge about thedata content stored on the cloud server during the efficientauditing process, which not only eliminates the burden ofcloud user from the tedious and possibly expensive auditingtask, but also alleviates the users’ fear of their outsourced dataleakage. Considering TPA may concurrently handle multipleaudit sessions from different users for their outsourced datafiles, we further extend our privacy-preserving public auditingprotocol into a multi-user setting, where TPA can perform themultiple auditing tasks in a batch manner, i.e., simultaneously.

Extensive analysis shows that the proposed schemes are prov-ably secure and highly efficient.

ACKNOWLEDGEMENT

This work was supported in part by the US National ScienceFoundation under grant CNS-0831963, CNS-0626601, CNS-0716306, and CNS-0831628.

REFERENCES

[1] P. Mell and T. Grance, “Draft nist working definition of cloud com-puting,” Referenced on June. 3rd, 2009 Online at http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, 2009.

[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Kon-winski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia,“Above the clouds: A berkeley view of cloud computing,” University ofCalifornia, Berkeley, Tech. Rep. UCB-EECS-2009-28, Feb 2009.

[3] Amazon.com, “Amazon s3 availability event: July 20, 2008,” Online athttp://status.aws.amazon.com/s3-20080720.html, July 2008.

[4] S. Wilson, “Appengine outage,” Online at http://www.cio-weblog.com/50226711/appengine outage.php, June 2008.

[5] B. Krebs, “Payment Processor Breach May Be Largest Ever,” On-line at http://voices.washingtonpost.com/securityfix/2009/01/paymentprocessor breach may b.html, Jan. 2009.

[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson,and D. Song, “Provable data possession at untrusted stores,” CryptologyePrint Archive, Report 2007/202, 2007, http://eprint.iacr.org/.

[7] M. A. Shah, R. Swaminathan, and M. Baker, “Privacy-preserving auditand extraction of digital contents,” Cryptology ePrint Archive, Report2008/186, 2008, http://eprint.iacr.org/.

[8] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling publicverifiability and data dynamics for storage security in cloud computing,”in Proc. of ESORICS’09, Saint Malo, France, Sep. 2009.

[9] Cloud Security Alliance, “Security guidance for critical areas of focusin cloud computing,” 2009, http://www.cloudsecurityalliance.org.

[10] H. Shacham and B. Waters, “Compact proofs of retrievability,” in Proc.of Asiacrypt 2008, vol. 5350, Dec 2008, pp. 90–107.

[11] A. Juels and J. Burton S. Kaliski, “Pors: Proofs of retrievability for largefiles,” in Proc. of CCS’07, Alexandria, VA, October 2007, pp. 584–597.

[12] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, “Auditing tokeep online storage services honest,” in Proc. of HotOS’07. Berkeley,CA, USA: USENIX Association, 2007, pp. 1–6.

[13] 104th United States Congress, “Health Insurance Portability and Ac-countability Act of 1996 (HIPPA),” Online at http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996, last access: July 16, 2009.

[14] S. Yu, C. Wang, K. Ren, and W. Lou, “Achieving secure, scalable,and fine-grained access control in cloud computing,” in Proc. of IEEEINFOCOM’10, San Diego, CA, USA, March 2010.

[15] D. Boneh and C. Gentry, “Aggregate and verifiably encrypted signaturesfrom bilinear maps,” in Proceedings of Eurocrypt 2003, volume 2656 ofLNCS. Springer-Verlag, 2003, pp. 416–432.

[16] C. Wang, Q. Wang, K. Ren, and W. Lou, “Privacy-preserving publicauditing for data storage security in cloud computing,” Cryptology ePrintArchive, Report 2009/579, 2009, http://eprint.iacr.org/.

[17] M. Bellare, J. Garay, and T. Rabin, “Fast batch verification for modularexponentiation and digital signatures,” in Proceedings of Eurocrypt1998, volume 1403 of LNCS. Springer-Verlag, 1998, pp. 236–250.

[18] C. Schnorr, “Efficient identification and signatures for smart cards,” inProceedings of Eurocrypt 1989, volume 435 of LNCS. Springer-Verlag,1989, pp. 239–252.

[19] D. Pointcheval and J. Stern, “Security proofs for signature schemes,” inProceedings of Eurocrypt 1996, volume 1070 of LNCS. Springer-Verlag,1996, pp. 387–398.

[20] A. L. Ferrara, M. Greeny, S. Hohenberger, and M. Pedersen, “Practicalshort signature batch verification,” in Proceedings of CT-RSA, volume5473 of LNCS. Springer-Verlag, 2009, pp. 309–324.

[21] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, “Scalable andefficient provable data possession,” in Proc. of SecureComm’08, 2008.

[22] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring Data StorageSecurity in Cloud Computing,” in Proc. of IWQoS’09, July 2009.

[23] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, “Dynamicprovable data possession,” in Proc. of CCS’09, 2009.


Documents

Privacy-Preserving Public Auditing for Data Storage ... · Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing ... cloud data auditing system, which meets