12

Click here to load reader

Privacy preserving public auditing for

Embed Size (px)

Citation preview

Page 1: Privacy preserving public auditing for

1

Privacy-Preserving Public Auditing forSecure Cloud Storage

Cong Wang, Student Member, IEEE, Sherman S.-M. Chow, Qian Wang, Student Member, IEEE,Kui Ren, Member, IEEE, and Wenjing Lou, Member, IEEE

Abstract —Using Cloud Storage, users can remotely store their data and enjoy the on-demand high quality applications andservices from a shared pool of configurable computing resources, without the burden of local data storage and maintenance.However, the fact that users no longer have physical possession of the outsourced data makes the data integrity protection inCloud Computing a formidable task, especially for users with constrained computing resources. Moreover, users should be ableto just use the cloud storage as if it is local, without worrying about the need to verify its integrity. Thus, enabling public auditabilityfor cloud storage is of critical importance so that users can resort to a third party auditor (TPA) to check the integrity of outsourceddata and be worry-free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towardsuser data privacy, and introduce no additional online burden to user. In this paper, we propose a secure cloud storage systemsupporting privacy-preserving public auditing. We further extend our result to enable the TPA to perform audits for multiple userssimultaneously and efficiently. Extensive security and performance analysis show the proposed schemes are provably secureand highly efficient.

Index Terms —Data storage, privacy-preserving, public auditability, cryptographic protocols, cloud computing.

1 INTRODUCTION

C LOUD Computing has been envisioned as thenext-generation information technology (IT) ar-

chitecture for enterprises, due to its long list of un-precedented advantages in the IT history: on-demandself-service, ubiquitous network access, location in-dependent resource pooling, rapid resource elasticity,usage-based pricing and transference of risk [1]. Asa disruptive technology with profound implications,Cloud Computing is transforming the very nature ofhow businesses use information technology. One fun-damental aspect of this paradigm shifting is that datais being centralized or outsourced to the Cloud. Fromusers’ perspective, including both individuals and ITenterprises, storing data remotely to the cloud in aflexible on-demand manner brings appealing benefits:relief of the burden for storage management, universaldata access with independent geographical locations,and avoidance of capital expenditure on hardware,software, and personnel maintenances, etc [2].

While Cloud Computing makes these advantagesmore appealing than ever, it also brings new and chal-lenging security threats towards users’ outsourceddata. Since cloud service providers (CSP) are separate

• Cong Wang, Qian Wang, and Kui Ren are with the Department ofElectrical and Computer Engineering, Illinois Institute of Technology,Chicago, IL 60616. E-mail: {cong,qian,kren}@ece.iit.edu.

• Sherman S.-M. Chow is with the Department of Computer Science,Courant Institute of Mathematical Sciences, New York University,New York, NY 10012. E-mail: [email protected].

• Wenjing Lou is with the Department of Electrical and ComputerEngineering, Worcester Polytechnic Institute, Worcester, MA 01609.E-mail: [email protected].

administrative entities, data outsourcing is actuallyrelinquishing user’s ultimate control over the fate oftheir data. As a result, the correctness of the datain the cloud is being put at risk due to the follow-ing reasons. First of all, although the infrastructuresunder the cloud are much more powerful and reli-able than personal computing devices, they are stillfacing the broad range of both internal and externalthreats for data integrity. Examples of outages andsecurity breaches of noteworthy cloud services appearfrom time to time [3]–[7]. Secondly, there do existvarious motivations for CSP to behave unfaithfullytowards the cloud users regarding the status of theiroutsourced data. For examples, CSP might reclaimstorage for monetary reasons by discarding data thathas not been or is rarely accessed, or even hide dataloss incidents so as to maintain a reputation [8]–[10].In short, although outsourcing data to the cloud iseconomically attractive for long-term large-scale datastorage, it does not immediately offer any guaranteeon data integrity and availability. This problem, ifnot properly addressed, may impede the successfuldeployment of the cloud architecture.

As users no longer physically possess the storage oftheir data, traditional cryptographic primitives for thepurpose of data security protection cannot be directlyadopted [11]. In particular, simply downloading allthe data for its integrity verification is not a practicalsolution due to the expensiveness in I/O and trans-mission cost across the network. Besides, it is ofteninsufficient to detect the data corruption only whenaccessing the data, as it does not give users correctnessassurance for those unaccessed data and might be too

Page 2: Privacy preserving public auditing for

2

late to recover the data loss or damage. Consideringthe large size of the outsourced data and the user’sconstrained resource capability, the tasks of auditingthe data correctness in a cloud environment can beformidable and expensive for the cloud users [10],[12]. Moreover, the overhead of using cloud storageshould be minimized as much as possible, such thatuser does not need to perform too many operationsto use the data (in additional to retrieving the data).For example, it is desirable that users do not need toworry about the need to verify the integrity of the databefore or after the data retrieval. Besides, there may bemore than one user accesses the same cloud storage,say in an enterprise setting. For easier management,it is desirable that the cloud server only entertainsverification request from a single designated party.

To fully ensure the data integrity and save the cloudusers’ computation resources as well as online burden,it is of critical importance to enable public auditingservice for cloud data storage, so that users mayresort to an independent third party auditor (TPA)to audit the outsourced data when needed. The TPA,who has expertise and capabilities that users do not,can periodically check the integrity of all the datastored in the cloud on behalf of the users, whichprovides a much more easier and affordable way forthe users to ensure their storage correctness in thecloud. Moreover, in addition to help users to evaluatethe risk of their subscribed cloud data services, theaudit result from TPA would also be beneficial for thecloud service providers to improve their cloud basedservice platform, and even serve for independentarbitration purposes [9]. In a word, enabling publicauditing services will play an important role for thisnascent cloud economy to become fully established,where users will need ways to assess risk and gaintrust in the cloud.

Recently, the notion of public auditability has beenproposed in the context of ensuring remotely storeddata integrity under different system and securitymodels [8], [10], [11], [13]. Public auditability allowsan external party, in addition to the user himself,to verify the correctness of remotely stored data.However, most of these schemes [8], [10], [13] do notconsider the privacy protection of users’ data againstexternal auditors. Indeed, they may potentially re-veal user data information to the auditors, as willbe discussed in Section 3.4. This severe drawbackgreatly affects the security of these protocols in CloudComputing. From the perspective of protecting dataprivacy, the users, who own the data and rely onTPA just for the storage security of their data, do notwant this auditing process introducing new vulnera-bilities of unauthorized information leakage towardstheir data security [14]. Moreover, there are legalregulations, such as the US Health Insurance Porta-bility and Accountability Act (HIPAA) [15], furtherdemanding the outsourced data not to be leaked to

external parties [9]. Exploiting data encryption beforeoutsourcing [11] is one way to mitigate this privacyconcern, but it is only complementary to the privacy-preserving public auditing scheme to be proposedin this paper. Without a properly designed auditingprotocol, encryption itself cannot prevent data from“flowing away” towards external parties during theauditing process. Thus, it does not completely solvethe problem of protecting data privacy but just re-duces it to the key management. Unauthorized dataleakage still remains a problem due to the potentialexposure of decryption keys.

Therefore, how to enable a privacy-preservingthird-party auditing protocol, independent to dataencryption, is the problem we are going to tacklein this paper. Our work is among the first fewones to support privacy-preserving public auditingin Cloud Computing, with a focus on data storage.Besides, with the prevalence of Cloud Computing, aforeseeable increase of auditing tasks from differentusers may be delegated to TPA. As the individualauditing of these growing tasks can be tedious andcumbersome, a natural demand is then how to enablethe TPA to efficiently perform multiple auditing tasksin a batch manner, i.e., simultaneously.

To address these problems, our work utilizes thetechnique of public key based homomorphic linearauthenticator (or HLA for short) [8], [10], [13], whichenables TPA to perform the auditing without demand-ing the local copy of data and thus drastically re-duces the communication and computation overheadas compared to the straightforward data auditingapproaches. By integrating the HLA with randommasking, our protocol guarantees that the TPA couldnot learn any knowledge about the data contentstored in the cloud server during the efficient auditingprocess. The aggregation and algebraic properties ofthe authenticator further benefit our design for thebatch auditing. Specifically, our contribution can besummarized as the following three aspects:

1) We motivate the public auditing system of datastorage security in Cloud Computing and pro-vide a privacy-preserving auditing protocol, i.e.,our scheme enables an external auditor to audituser’s outsourced data in the cloud withoutlearning the data content.

2) To the best of our knowledge, our scheme is thefirst to support scalable and efficient public au-diting in the Cloud Computing. Specifically, ourscheme achieves batch auditing where multipledelegated auditing tasks from different users canbe performed simultaneously by the TPA.

3) We prove the security and justify the perfor-mance of our proposed schemes through con-crete experiments and comparisons with thestate-of-the-art.

The rest of the paper is organized as follows. Section

Page 3: Privacy preserving public auditing for

3

II introduces the system and threat model, and our de-sign goals. Then we provide the detailed descriptionof our scheme in Section III. Section IV gives the secu-rity analysis and performance evaluation, followed bySection V which overviews the related work. Finally,Section VI gives the concluding remark of the wholepaper.

2 PROBLEM STATEMENT

2.1 The System and Threat Model

We consider a cloud data storage service involvingthree different entities, as illustrated in Fig. 1: the clouduser (U), who has large amount of data files to bestored in the cloud; the cloud server (CS), which ismanaged by the cloud service provider (CSP) to providedata storage service and has significant storage spaceand computation resources (we will not differentiateCS and CSP hereafter); the third party auditor (TPA),who has expertise and capabilities that cloud usersdo not have and is trusted to assess the cloud storageservice reliability on behalf of the user upon request.

Users rely on the CS for cloud data storage andmaintenance. They may also dynamically interactwith the CS to access and update their stored data forvarious application purposes. To save the computa-tion resource as well as the online burden, cloud usersmay resort to TPA for ensuring the storage integrityof their outsourced data, while hoping to keep theirdata private from TPA.

We consider the existence of a semi-trusted CSas [16] does. Namely, in most of time it behavesproperly and does not deviate from the prescribedprotocol execution. However, for their own benefitsthe CS might neglect to keep or deliberately deleterarely accessed data files which belong to ordinarycloud users. Moreover, the CS may decide to hide thedata corruptions caused by server hacks or Byzantinefailures to maintain reputation. We assume the TPA,who is in the business of auditing, is reliable andindependent, and thus has no incentive to colludewith either the CS or the users during the auditingprocess. However, it harms the user if the TPA couldlearn the outsourced data after the audit.

To authorize the CS to respond to the audit dele-gated to TPA’s, the user can sign a certificate grantingaudit rights to the TPA’s public key, and all auditsfrom the TPA are authenticated against such a certifi-cate. These authentication handshakes are omitted inthe following presentation.

2.2 Design Goals

To enable privacy-preserving public auditing forcloud data storage under the aforementioned model,our protocol design should achieve the followingsecurity and performance guarantees.

1) Public auditability: to allow TPA to verify thecorrectness of the cloud data on demand without

D a t a F l o wU s e r sD a t a A u d i t i n gD e l e g a t i o n S e c u r i t y M e s s a g e F l o wP u b l i c D a t aA u d i t i n gT h i r d P a r t y A u d i t o r C l o u d S e r v e r s

Fig. 1: The architecture of cloud data storage service

retrieving a copy of the whole data or introduc-ing additional online burden to the cloud users.

2) Storage correctness: to ensure that there existsno cheating cloud server that can pass the TPA’saudit without indeed storing users’ data intact.

3) Privacy-preserving: to ensure that the TPA can-not derive users’ data content from the informa-tion collected during the auditing process.

4) Batch auditing: to enable TPA with secure andefficient auditing capability to cope with mul-tiple auditing delegations from possibly largenumber of different users simultaneously.

5) Lightweight: to allow TPA to perform auditingwith minimum communication and computa-tion overhead.

3 THE PROPOSED SCHEMES

This section presents our public auditing schemewhich provides a complete outsourcing solution of data– not only the data itself, but also its integrity check-ing. We start from an overview of our public auditingsystem and discuss two straightforward schemes andtheir demerits. Then we present our main schemeand show how to extent our main scheme to supportbatch auditing for the TPA upon delegations frommultiple users. Finally, we discuss how to generalizeour privacy-preserving public auditing scheme and itssupport of data dynamics.

3.1 Definitions and Framework

We follow a similar definition of previously proposedschemes in the context of remote data integrity check-ing [8], [11], [13] and adapt the framework for ourprivacy-preserving public auditing system.

A public auditing scheme consists of fouralgorithms (KeyGen, SigGen, GenProof,VerifyProof). KeyGen is a key generationalgorithm that is run by the user to setup thescheme. SigGen is used by the user to generateverification metadata, which may consist of MAC,signatures, or other related information that will beused for auditing. GenProof is run by the cloudserver to generate a proof of data storage correctness,while VerifyProof is run by the TPA to audit theproof from the cloud server.

Page 4: Privacy preserving public auditing for

4

Running a public auditing system consists of twophases, Setup and Audit:

• Setup: The user initializes the public and secretparameters of the system by executing KeyGen,and pre-processes the data file F by usingSigGen to generate the verification metadata.The user then stores the data file F and theverification metadata at the cloud server, anddeletes its local copy.As part of pre-processing, the user may alterthe data file F by expanding it or includingadditional metadata to be stored at server.

• Audit: The TPA issues an audit message or chal-lenge to the cloud server to make sure that thecloud server has retained the data file F properlyat the time of the audit. The cloud server willderive a response message from a function of thestored data file F and its verification metadata byexecuting GenProof. The TPA then verifies theresponse via VerifyProof.

Our framework assumes the TPA is stateless, whichis a desirable property achieved by our proposedsolution. It is easy to extend the framework above tocapture a stateful auditing system, essentially by split-ing the verification metadata into two parts which arestored by the TPA and the cloud server respectively.

Our design does not assume any additional prop-erty on the data file. If the user wants to have moreerror-resiliency, he/she can always first redundantlyencodes the data file and then uses our system withthe data file that has error-correcting codes integrated.

3.2 Notation and Preliminaries

• F – the data file to be outsourced, denoted as asequence of n blocks m1, . . . , mn ∈ Zp for somelarge prime p.

• MAC(·)(·) – message authentication code (MAC)function, defined as: K × {0, 1}∗ → {0, 1}l whereK denotes the key space.

• H(·), h(·) – cryptographic hash functions.

We now introduce some necessary cryptographicbackground for our proposed scheme.

Bilinear Map. Let G1, G2 and GT be multiplicativecyclic groups of prime order p. Let g1 and g2 begenerators of G1 and G2, respectively. A bilinear mapis a map e : G1 × G2 → GT such that for all u ∈ G1,v ∈ G2 and a, b ∈ Zp, e(ua, vb) = e(u, v)ab. Thisbilinearity implies that for any u1, u2 ∈ G1, v ∈ G2,e(u1 · u2, v) = e(u1, v) · e(u2, v). Of course, there existsan efficiently computable algorithm for computinge and the map should be non-trivial, i.e., e is non-degenerate: e(g1, g2) 6= 1.

3.3 The Basic Schemes

Before giving our main result, we study two classes ofschemes as a warm-up. The first one is a MAC-based

solution which suffers from undesirable systematicdemerits – bounded usage and stateful verification,which may pose additional online burden to users, ina public auditing setting. This somehow also showsthat the auditing problem is still not easy to solveeven we have introduced a TPA. The second one is asystem based on homomorphic linear authenticators(HLA), which covers many recent proof of storagesystems. We will pinpoint the reason why all existingHLA-based systems are not privacy-preserving. Theanalysis of these basic schemes leads to our mainresult, which overcomes all these drawbacks. Ourmain scheme to be presented is based on a specificHLA scheme.

MAC-based Solution. There are two possible ways tomake use of MAC to authenticate the data. A trivialway is just uploading the data blocks with their MACsto the server, and sends the corresponding secret keysk to the TPA. Later, the TPA can randomly retrieveblocks with their MACs and check the correctness viask. Apart from the high (linear in the sampled datasize) communication and computation complexities,the TPA requires the knowledge of the data blocksfor verification.

To circumvent the requirement of the data inTPA verification, one may restrict the verificationto just consist of equality checking. The idea isas follows. Before data outsourcing, the cloud userchooses s random message authentication code keys{skτ}1≤τ≤s, pre-computes s (deterministic) MACs,{MACskτ

(F )}1≤τ≤s for the whole data file F , andpublishes these verification metadata (the keys andthe MACs) to TPA. The TPA can reveal a secret keyskτ to the cloud server and ask for a fresh keyedMAC for comparison in each audit. This is privacy-preserving as long as it is impossible to recover F infull given MACskτ

(F ) and skτ . However, it suffersfrom the following severe drawbacks: 1) the numberof times a particular data file can be audited is limitedby the number of secret keys that must be fixed apriori. Once all possible secret keys are exhausted, theuser then has to retrieve data in full to re-computeand re-publish new MACs to TPA; 2) The TPA alsohas to maintain and update state between audits, i.e.,keep track on the revealed MAC keys. Consideringthe potentially large number of audit delegations frommultiple users, maintaining such states for TPA canbe difficult and error prone; 3) it can only supportstatic data, and cannot efficiently deal with dynamicdata at all. However, supporting data dynamics is alsoof critical importance for cloud storage systems. Forthe reason of brevity and clarity, our main protocolwill be presented based on static data. Section 3.6 willdescribe how to adapt our protocol for dynamic data.

HLA-based Solution. To effectively support publicauditability without having to retrieve the data blocksthemselves, the HLA technique [8], [10], [13] can be

Page 5: Privacy preserving public auditing for

5

TPA Cloud Server

1. Retrieve file tag t, verify its

signature, and quit if fail;

2. Generate a random challenge{(i,νi)}i∈I

−−−−−−−−−−−−−−→challenge request chal

3. Compute µ′ =∑

i∈I νimi, and also

chal = {(i, νi)}i∈I ; σ =∏

i∈I σνi

i ;

4. Randomly pick r ← Zp, and compute

R = e(u, v)r and γ = h(R);{µ,σ,R}

←−−−−−−−−−−−−−−−−storage correctness proof

5. Compute µ = r + γµ′ mod p ;

6. Compute γ = h(R), and then

verify {µ, σ, R} via Equation 1.

Fig. 2: The privacy-preserving public auditing protocol

used. HLAs, like MACs, are also some unforgeableverification metadata that authenticate the integrityof a data block. The difference is that HLAs can beaggregated. It is possible to compute an aggregatedHLA which authenticates a linear combination of theindividual data blocks.

At a high level, an HLA-based proof of storagesystem works as follow. The user still authenticateseach element of F = (m1, · · · , mn) by a set of HLAsΦ. The cloud server stores {F, Φ}. The TPA verifies thecloud storage by sending a random set of challenge{νi}. (More precisely, F , Φ and {νi} are all vectors,so {νi} is an ordered set or {i, νi} should be sent).The cloud server then returns µ =

∑i νi ·mi and an

aggregated authenticator σ (both are computed fromF , Φ and {νi}) that is supposed to authenticate µ.

Though allowing efficient data auditing and con-suming only constant bandwidth, the direct adoptionof these HLA-based techniques is still not suitable forour purposes. This is because the linear combinationof blocks, µ =

∑i νi ·mi, may potentially reveal user

data information to TPA, and violates the privacy-preserving guarantee. Specifically, if an enough num-ber of the linear combinations of the same blocks arecollected, the TPA can simply derive the user’s datacontent by solving a system of linear equations.

3.4 Privacy-Preserving Public Auditing Scheme

Overview. To achieve privacy-preserving public au-diting, we propose to uniquely integrate the homo-morphic linear authenticator with random maskingtechnique. In our protocol, the linear combination ofsampled blocks in the server’s response is maskedwith randomness generated the server. With randommasking, the TPA no longer has all the necessaryinformation to build up a correct group of linearequations and therefore cannot derive the user’s datacontent, no matter how many linear combinations ofthe same set of file blocks can be collected. On theother hand, the correctness validation of the block-authenticator pairs can still be carried out in a new

way which will be shown shortly, even with thepresence of the randomness. Our design makes useof a public key based HLA, to equip the auditingprotocol with public auditability. Specifically, we usethe HLA proposed in [13], which is based on theshort signature scheme proposed by Boneh, Lynn andShacham (hereinafter referred as BLS signature) [17].

Scheme Details. Let G1, G2 and GT be multiplicativecyclic groups of prime order p, and e : G1 × G2 →GT be a bilinear map as introduced in preliminaries.Let g be a generator of G2. H(·) is a secure map-to-point hash function: {0, 1}∗ → G1, which maps stringsuniformly to G1. Another hash function h(·) : GT →Zp maps group element of GT uniformly to Zp. Theproposed scheme is as follows:

Setup Phase: The cloud user runs KeyGen to gener-ate the public and secret parameters. Specifically, theuser chooses a random signing key pair (spk, ssk),a random x ← Zp, a random element u ← G1,and computes v ← gx. The secret parameter issk = (x, ssk) and the public parameters are pk =(spk, v, g, u, e(u, v)).

Given a data file F = (m1, . . . , mn), the user runsSigGen to compute authenticator σi for each blockmi: σi ← (H(Wi) · u

mi)x ∈ G1. Here Wi = name||iand name is chosen by the user uniformly at randomfrom Zp as the identifier of file F . Denote the set ofauthenticators by Φ = {σi}1≤i≤n.

The last part of SigGen is for ensuring the integrityof the unique file identifier name. One simple way todo this is to compute t = name||SSigssk(name) as thefile tag for F , where SSigssk(name) is the signatureon name under the private key ssk. For simplicity,we assume the TPA knows the number of blocks n.The user then sends F along with the verificationmetadata (Φ, t) to the server and deletes them fromlocal storage.

Audit Phase: The TPA first retrieves the file tagt. With respect to the mechanism we describe inthe Setup phase, the TPA verifies the signature

Page 6: Privacy preserving public auditing for

6

SSigssk(name) via spk, and quits by emitting FALSEif the verification fails. Otherwise, the TPA recoversname.

Now it comes to the “core” part of the auditingprocess. To generate the challenge message for theaudit “chal”, the TPA picks a random c-element subsetI = {s1, . . . , sc} of set [1, n]. For each element i ∈ I ,the TPA also chooses a random value νi (of bit lengththat can be shorter than |p|, as explained in [13]).The message “chal” specifies the positions of theblocks that are required to be checked. The TPA sendschal = {(i, νi)}i∈I to the server.

Upon receiving challenge chal = {(i, νi)}i∈I , theserver runs GenProof to generate a response proofof data storage correctness. Specifically, the serverchooses a random element r ← Zp, and calculates R =e(u, v)r ∈ GT . Let µ′ denote the linear combination ofsampled blocks specified in chal: µ′ =

∑i∈I νimi. To

blind µ′ with r, the server computes: µ = r+γµ′ modp, where γ = h(R) ∈ Zp. Meanwhile, the server alsocalculates an aggregated authenticator σ =

∏i∈I σνi

i ∈G1. It then sends {µ, σ, R} as the response proof ofstorage correctness to the TPA. With the response fromthe server, the TPA runs VerifyProof to validatethe response by first computing γ = h(R) and thenchecking the verification equation

R · e(σγ , g)?= e((

sc∏

i=s1

H(Wi)νi)γ · uµ, v) (1)

The protocol is illustrated in Fig. 2. The correctnessof the above verification equation can be elaboratedas follows:

R · e(σγ , g) = e(u, v)r · e((

sc∏

i=s1

(H(Wi) · umi)x·νi)γ , g)

= e(ur, v) · e((

sc∏

i=s1

(H(Wi)νi · uνimi)γ , g)x

= e(ur, v) · e((

sc∏

i=s1

H(Wi)νi)γ · uµ′γ , v)

= e((

sc∏

i=s1

H(Wi)νi )γ · uµ′γ+r, v)

= e((

sc∏

i=s1

H(Wi)νi )γ · uµ, v)

Properties of Our Protocol. It is easy to see thatour protocol achieves public auditability. There isno secret keying material or states for the TPA tokeep or maintain between audits, and the auditingprotocol does not pose any potential online burden onusers. This approach ensures the privacy of user datacontent during the auditing process by employing arandom masking r to hide µ, a linear combination ofthe data blocks. Note that the value R in our protocol,which enables the privacy-preserving guarantee, will

not affect the validity of the equation, due to thecircular relationship between R and γ in γ = h(R)and the verification equation. Storage correctness thusfollows from that of the underlying protocol [13]. Thesecurity of this protocol will be formally proven inSection 4. Besides, the HLA helps achieve the constantcommunication overhead for server’s response duringthe audit: the size of {σ, µ, R} is independent of thenumber of sampled blocks c.

Previous work [8], [10] showed that if the server ismissing a fraction of the data, then the number ofblocks that needs to be checked in order to detectserver misbehavior with high probability is in theorder of O(1). For examples, if the server is missing1% of the data F , to detect this misbehavior withprobability larger than 95%, the TPA only needs toaudit for c = 300 (up to c = 460 for 99%) randomlychosen blocks of F . Given the huge volume of dataoutsourced in the cloud, checking a portion of thedata file is more affordable and practical for boththe TPA and the cloud server than checking all thedata, as long as the sampling strategies provides highprobability assurance. In Section 4, we will present theexperiment result based on these sampling strategies.

3.5 Support for Batch Auditing

With the establishment of privacy-preserving publicauditing, the TPA may concurrently handle multi-ple auditing upon different users’ delegation. Theindividual auditing of these tasks for the TPA canbe tedious and very inefficient. Given K auditingdelegations on K distinct data files from K differentusers, it is more advantageous for the TPA to batchthese multiple tasks together and audit at one time.Keeping this natural demand in mind, we slightlymodify the protocol in a single user case, and achievesthe aggregation of K verification equations (for Kauditing tasks) into a single one, as shown in Equation2. As a result, a secure batch auditing protocol forsimultaneous auditing of multiple tasks is obtained.The details are described as follows.

Setup Phase: Basically, the users just perform Setupindependently. Suppose there are K users in thesystem, and each user k has a data file Fk =(mk,1, . . . , mk,n) to be outsourced to the cloud server,where k ∈ {1, . . . , K}. For simplicity, we assumeeach file Fk has the same number of n blocks. Fora particular user k, denote his/her secret key as(xk, sskk), and the corresponding public parameteras (spkk, vk, g, uk, e(uk, vk)) where vk = gxk . Similarto the single user case, each user k has already ran-domly chosen a different (with overwhelming prob-ability) name namek ∈ Zp for his/her file Fk, andhas correctly generated the corresponding file tagtk = namek||SSigsskk

(namek). Then, each user k runsSigGen and computes σk,i for block mk,i: σk,i ←(H(namek||i) · u

mk,i

k )xk = (H(Wk,i) · umk,i

k )xk ∈ G1

Page 7: Privacy preserving public auditing for

7

TPA Cloud Server

1. Verify file tag tk for each

user k, and quit if fail; For each user k (1 ≤ k ≤ K):

2. Generate a random challenge{(i,νi)}i∈I

−−−−−−−−−−−−−−→challenge request chal

3. Compute µ′k, σk, Rk as single user case;

chal = {(i, νi)}i∈I ; 4. Compute R = R1 ·R2 · · ·RK ,

L = vk1||vk2|| · · · ||vkK

and γk = h(R||vk||L);{{σk,µk}1≤k≤K ,R}

←−−−−−−−−−−−−−−−−storage correctness proof

5. Compute µk = rk + γkµ′k mod p ;

6. Compute γk = h(R||vk||L)

for each user k and perform

batch auditing via Equation 2.

Fig. 3: The batch auditing protocol

(i ∈ {1, . . . , n}), where Wk,i = namek||i. Finally, eachuser k sends file Fk , set of authenticators Φk, and tagtk to the server and deletes them from local storage.

Audit Phase: TPA first retrieves and verifies file tagtk for each user k for later auditing. If the verificationfails, TPA quits by emitting FALSE; otherwise, TPArecovers namek. Then TPA sends the audit challengechal = {(i, νi)}i∈I to the server for auditing data filesof all K users.

Upon receiving chal, for each user k ∈ {1, . . . , K},the server randomly picks rk ∈ Zp and computesRk = e(uk, vk)rk . Denote R = R1 · R2 · · ·RK , andL = vk1||vk2|| · · · ||vkK , our protocol further requiresthe server to compute γk = h(R||vk||L). Then, the ran-domly masked responses can be generated as follows:

µk = γk

sc∑

i=s1

νimk,i + rk mod p and σk =

sc∏

i=s1

σνi

k,i.

The server then responses the TPA with{{σk, µk}1≤k≤K ,R}.

To verify the response, the TPA can first computeγk = h(R||vk||L) for 1 ≤ k ≤ K . Next, TPA checks ifthe following equation holds:

R · e(

K∏

k=1

σkγk , g)

?=

K∏

k=1

e((

sc∏

i=s1

H(Wk,i)νi)γk · uµk

k , vk)

(2)

The batch protocol is illustrated in Fig. 3. Here theleft-hand side (LHS) of Equation 2 expands as:

LHS = R1 ·R2 · · ·RK ·

K∏

k=1

e(σkγk , g)

=

K∏

k=1

Rk · e(σkγk , g)

=

K∏

k=1

e((

sc∏

i=s1

H(Wk,i)νi)γk · uµk

k , vk)

which is the right hand side, as required. Note thatthe last equality follows from Equation 1.

Efficiency Improvement. As shown in Equation 2,batch auditing not only allows TPA to performthe multiple auditing tasks simultaneously, but alsogreatly reduces the computation cost on the TPA side.This is because aggregating K verification equationsinto one helps reduce the number of relatively expen-sive pairing operations from 2K , as required in theindividual auditing, to K + 1. Thus, a considerableamount of auditing time is expected to be saved.

Identification of Invalid Responses. The verifica-tion equation (Equation 2) only holds when all theresponses are valid, and fails with high probabilitywhen there is even one single invalid response in thebatch auditing, as we will show in Section 4. In manysituations, a response collection may contain invalidresponses, especially {µk}1≤k≤K , caused by accidentaldata corruption, or possibly malicious activity by acloud server. The ratio of invalid responses to thevalid could be quite small, and yet a standard batchauditor will reject the entire collection. To further sortout these invalid responses in the batch auditing, wecan utilize a recursive divide-and-conquer approach(binary search), as suggested by [18]. Specifically, ifthe batch auditing fails, we can simply divide thecollection of responses into two halves, and recursethe auditing on halves via Equation 2. TPA may nowrequire the server to send back all the {Rk}1≤k≤K ,as in individual auditing. In Section 4.2.2, we showthrough carefully designed experiment that using thisrecursive binary search approach, even if up to 18%of responses are invalid, batch auditing still performsfaster than individual verification.

3.6 Support for Data Dynamics

In Cloud Computing, outsourced data might notonly be accessed but also updated frequently byusers for various application purposes [10], [19]–

Page 8: Privacy preserving public auditing for

8

[21]. Hence, supporting data dynamics for privacy-preserving public auditing is also of paramount im-portance. Now we show how to build upon the exist-ing work [10] and adapt our main scheme to supportdata dynamics, including block level operations ofmodification, deletion and insertion.

In [10], data dynamics support is achieved byreplacing the index information i with mi in thecomputation of block authenticators and using theclassic data structure – Merkle hash tree (MHT) [22]for the underlying block sequence enforcement. Asa result, the authenticator for each block is changedto σi = (H(mi) · umi)x. We can adopt this tech-nique in our design to achieve privacy-preservingpublic risk auditing with support of data dynamics.Specifically, in the Setup phase, the user has togenerate and send the tree root TRMHT to TPA asadditional metadata, where the leaf nodes of MHTare values of H(mi). In the Audit phase, besides{µ, σ, R}, the server’s response should also include{H(mi)}i∈I and their corresponding auxiliary authen-tication information aux in the MHT. Upon receivingthe response, TPA should first use TRMHT and auxto authenticate {H(mi)}i∈I computed by the server.Once {H(mi)}i∈I are authenticated, TPA can then per-form the auditing on {µ, σ, R, {H(mi)}i∈I} via Equa-tion 1, where

∏s1≤i≤sc

H(Wi)νi is now replaced by∏

s1≤i≤scH(mi)

νi . Data privacy is still preserved dueto the random mask. The details of handling dynamicoperations are similar to [10] and thus omitted.

3.7 Learning µ′ from σ

Though our scheme prevents the TPA from directlyderiving µ′ from µ, it does not rule out the possi-bility of offline guessing attack from the TPA usingvalid σ from the response. Specifically, the TPA canalways guess whether the stored data contains certain

message m̃, by checking e(σ, g)?= e((

∏sc

i=s1H(Wi)

νi) ·

uµ̃′

, v), where µ̃′ is constructed from random coeffi-cients chosen by the TPA in the challenge and theguessed message m̃. Thus, our main scheme is notsemantically secure yet. However, we must note thatµ̃′ is chosen from Zp and |p| is usually larger than160 bits in practical security settings (see Section 4.2).Therefore, the TPA has to test basically all the valuesof Zp in order to make a successful guess. Givenno background information, the success probabilityof such all-or-nothing guess launched by TPA canbe negligible. Nonetheless, for completeness, we willgive a provably zero-knowledge based public audit-ing scheme, which further eliminates the possibilitiesof above offline guessing attack. The details and cor-responding proofs can be found in Appendix A.

3.8 Generalization

As mentioned before, our protocol is based on theHLA in [13]. Recently, it has been shown in [23] that

HLA can be constructed by homomorphic identifi-cation protocols. One may apply the random mask-ing technique we used to construct the correspond-ing zero knowledge proof for different homomorphicidentification protocols. Therefore, it follows that ourprivacy-preserving public auditing system for securecloud storage can be generalized based on other com-plexity assumptions, such as factoring [23].

4 EVALUATION

4.1 Security Analysis

We evaluate the security of the proposed scheme byanalyzing its fulfillment of the security guarantee de-scribed in Section 2.2, namely, the storage correctnessand privacy-preserving property. We start from thesingle user case, where our main result is originated.Then we show the security guarantee of batch audit-ing for the TPA in multi-user setting.

4.1.1 Storage Correctness Guarantee

We need to prove that the cloud server cannot gen-erate valid response for the TPA without faithfullystoring the data, as captured by Theorem 1.

Theorem 1: If the cloud server passes the Auditphase, then it must indeed possess the specified dataintact as it is.

Proof: The proof consists of two steps. First, weshow that there exists an extractor of µ′ in the randomoracle model. Once a valid response {σ, µ′} are ob-tained, the correctness of this statement follows fromTheorem 4.2 in [13].

Now, the cloud server is treated as an adversary.The extractor controls the random oracle h(·) andanswers the hash query issued by the cloud server.For a challenge γ = h(R) returned by the extractor, thecloud server outputs {σ, µ, R} such that the followingequation holds.

R · e(σγ , g) = e((

sc∏

i=s1

H(Wi)νi)γ · uµ, v). (3)

Suppose that an extractor can rewind a cloud serverin the protocol to the point just before the challengeh(R) is given. Now the extractor sets h(R) to be γ∗ 6=γ. The cloud server outputs {σ, µ∗, R} such that thefollowing equation holds.

R · e(σγ∗

, g) = e((

sc∏

i=s1

H(Wi)νi)γ∗

· uµ∗

, v). (4)

The extractor then obtains {σ, µ′ = (µ− µ∗)/(γ − γ∗)}as a valid response of the underlying proof of storagesystem [13]. To see, recall that σi = (H(Wi) · u

mi)x,

Page 9: Privacy preserving public auditing for

9

TABLE 1: Notation of cryptographic operations.

HashtG1

hash t values into the group G1.

MulttG

t multiplications in group G.

ExptG(ℓ) t exponentiations gai , for g ∈ G, |ai| = ℓ.

m-MultExptG(ℓ) t m-term exponentiations

∏m

i=1gai .

PairtG1,G2

t pairings e(ui, gi), where ui ∈ G1, gi ∈ G2.

m-MultPairtG1,G2

t m-term pairings∏m

i=1e(ui, gi).

divide (3) by (4), we have

e(σγ−γ∗

, g) = e((

sc∏

i=s1

H(Wi)νi)γ−γ∗

· uµ−µ∗

, v)

e(σγ−γ∗

, g) = e((

sc∏

i=s1

H(Wi)νi)γ−γ∗

, gx)e(uµ−µ∗

, gx)

σγ−γ∗

= (

sc∏

i=s1

H(Wi)νi)x(γ−γ∗) · ux(µ−µ∗)

(

sc∏

i=s1

σνi

i )γ−γ∗

= (

sc∏

i=s1

H(Wi)νi)x(γ−γ∗) · ux(µ−µ∗)

ux(µ−µ∗) = (

sc∏

i=s1

(σi/H(Wi)x)νi)γ−γ∗

ux(µ−µ∗) = (

sc∏

i=s1

(uxmi)νi)γ−γ∗

µ− µ∗ = (

sc∑

i=s1

miνi) · (γ − γ∗)

(

sc∑

i=s1

miνi) = (µ− µ∗)/(γ − γ∗)

Finally, we remark that this extraction argumentand the random oracle paradigm are also used in theproof of the underlying scheme [13].

4.1.2 Privacy Preserving GuaranteeWe want to make sure that the TPA cannot deriveusers’ data content from the information collectedduring auditing process.

Theorem 2: From the server’s response {σ, µ, R},TPA cannot recover µ′.

Proof: We show the existence of a simulator thatcan produce a valid response even without the knowl-edge of µ′, in the random oracle model. Now, the TPAis treated as an adversary. Given a valid σ from thecloud server, firstly, randomly pick γ, µ from Zp, setR← e((

∏sc

i=s1H(Wi)

νi)γ ·uµ, v)/e(σγ , g). Finally, back-patch γ = h(R) since the simulator is controlling therandom oracle h(·). We remark that this backpatchingtechnique in the random oracle model is also used inthe proof of the underlying scheme [13].

4.1.3 Security Guarantee for Batch AuditingNow we show that our way of extending our result toa multi-user setting will not affect the aforementioned

TABLE 2: Performance under different number ofsampled blocks c for high assurance (≥ 95%) auditing.

Our Scheme [13]

Sampled blocks c 460 300 460 300

Sever comp. time (ms) 411.00 270.20 407.66 265.87

TPA comp. time (ms) 507.79 476.81 504.25 472.55

Comm. cost (Byte) 160 40 160 40

security insurance, as shown in Theorem 3.Theorem 3: Our batch auditing protocol achieves

the same storage correctness and privacy preservingguarantee as in the single-user case.

Proof: The privacy-preserving guarantee in themulti-user setting is very similar to that of Theorem 2,and thus omitted here. For the storage correctnessguarantee, we are going to reduce it to the single-usercase. We use the forking technique as in the proofof Theorem 1. However, the verification equationfor the batch audits involves K challenges from therandom oracle. This time we need to ensure that allthe other K − 1 challenges are determined before theforking of the concerned random oracle response. Thiscan be done using the idea in [24]. As soon as theadversary issues the very first random oracle queryfor γi = h(R||vi||L) for any i ∈ [1, K], the simulatorimmediately determines the values γj = h(R||vj ||L)for all j ∈ [1, K]. This is possible since they are allusing the same R and L. Now, all but one of the γk’sin Equation 2 are equal, so a valid response can beextracted similar to the single-user case in the proofof Theorem 1.

4.2 Performance Analysis

We now assess the performance of the proposedprivacy-preserving public auditing schemes to showthat they are indeed lightweight. We will focus onthe cost of the efficiency of the privacy-preservingprotocol and our proposed batch auditing technique.The experiment is conducted using C on a Linuxsystem with an Intel Core 2 processor running at 1.86GHz, 2048 MB of RAM, and a 7200 RPM WesternDigital 250 GB Serial ATA drive with an 8 MB buffer.Our code uses the Pairing-Based Cryptography (PBC)library version 0.4.18. The elliptic curve utilized in theexperiment is a MNT curve, with base field size of 159bits and the embedding degree 6. The security levelis chosen to be 80 bit, which means |νi| = 80 and |p|= 160. All experimental results represent the mean of20 trials.

4.2.1 Cost of Privacy-Preserving Protocol

We begin by estimating the cost in terms of basic cryp-tographic operations, as notated in Table 1. Supposethere are c random blocks specified in the challenge

Page 10: Privacy preserving public auditing for

10

0 50 100 150 200400

420

440

460

480

500

520

Number of auditing tasks

Aud

iting

tim

e pe

r ta

sk (

ms)

individual auditingbatch auditing (c=460)batch auditing (c=300)

Fig. 4: Comparison on auditing time between batchand individual auditing. Per task auditing time de-notes the total auditing time divided by the numberof tasks. For clarity reasons, we omit the straight curvefor individual auditing when c=300.

message chal during the Audit phase. Under this set-ting, we quantify the cost introduced of the privacy-preserving auditing in terms of server computation,auditor computation as well as communication over-head.

On the server side, the generated response includesan aggregated authenticator σ =

∏i∈I σνi

i ∈ G1, arandom factor R = e(u, v)r ∈ GT , and a blinded linearcombination of sampled blocks µ = γ

∑i∈I νimi + r ∈

Zp, where γ = h(R) ∈ Zp. The corresponding com-putation cost is c-MultExp1

G1(|νi|), Exp1

GT(|p|), and

Hash1Zp

+ AddcZp

+ Multc+1Zp

, respectively. Comparedto the existing HLA-based solution for ensuring re-mote data integrity [13]1, the extra cost for protectingthe user privacy, resulted from the random mask R,is only a constant: Exp1

GT(|p|) + Mult1

Zp+ Hash1

Zp+

Add1Zp

, which has nothing to do with the number ofsampled blocks c. When c is set to be 300 to 460 forhigh assurance of auditing, as discussed in Section 3.4,the extra cost for privacy-preserving guarantee on theserver side would be negligible against the total servercomputation for response generation.

Similarly, on the auditor side, upon receivingthe response {σ, R, µ}, the corresponding compu-tation cost for response validation is Hash1

Zp+ c-

MultExp1G1

(|νi|) + HashcG1

+ Mult1G1

+ Mult1GT

+Exp3

G1(|p|) + Pair2

G1,G2, among which only Hash1

Zp+

Exp2G1

(|p|) + Mult1GT

account for the additional con-stant computation cost. For c = 460 or 300, and con-sidering the relatively expensive pairing operations,this extra cost imposes little overhead on the overallcost of response validation, and thus can be ignored.For the sake of completeness, Table 2 gives the ex-periment result on performance comparison betweenour scheme and the state-of-the-art [13]. It can be

1. We refer readers to [13] for a detailed description of their HLA-based solution.

0 2 4 6 8 10 12 14 16 18410

420

430

440

450

460

470

480

490

500

510

Fraction of invalid responses α

Aud

iting

tim

e pe

r ta

sk (

ms)

individual auditingbatch auditing (c=460)batch auditing (c=300)

Fig. 5: Comparison on auditing time between batchand individual auditing, when α-fraction of 256 re-sponses are invalid. Per task auditing time denotes thetotal auditing time divided by the number of tasks.

shown that the performance of our scheme is almostthe same as that of [13], even if our scheme supportsprivacy-preserving guarantee while [13] does not. Forthe extra communication cost of our scheme opposingto [13], the server’s response {σ, R, µ} contains an ad-ditional random element R, which is a group elementof GT and has the size close to 960 bits.

4.2.2 Batch Auditing EfficiencyDiscussion in Section 3.5 gives an asymptotic effi-ciency analysis on the batch auditing, by consideringonly the total number of pairing operations. However,on the practical side, there are additional less expen-sive operations required for batching, such as modularexponentiations and multiplications. Meanwhile, thedifferent sampling strategies, i.e., different number ofsampled blocks c, is also a variable factor that affectsthe batching efficiency. Thus, whether the benefitsof removing pairings significantly outweighs theseadditional operations is remained to be verified. Toget a complete view of batching efficiency, we conducta timed batch auditing test, where the number ofauditing tasks is increased from 1 to approximately200 with intervals of 8. The performance of the cor-responding non-batched (individual) auditing is pro-vided as a baseline for the measurement. Followingthe same experimental settings c = 300 and c = 460,the average per task auditing time, which is computedby dividing total auditing time by the number oftasks, is given in Fig. 4 for both batch and individualauditing. It can be shown that compared to individualauditing, batch auditing indeed helps reducing theTPA’s computation cost, as more than 11% and 14%of per-task auditing time is saved, when c is set to be460 and 300, respectively.

4.2.3 Sorting out Invalid ResponsesNow we use experiment to justify the efficiency ofour recursive binary search approach for the TPA to

Page 11: Privacy preserving public auditing for

11

sort out the invalid responses when batch auditingfails, as discussed in Section 3.5. This experiment istightly pertained to the work in [18], which evaluatesthe batch verification efficiency of various short sig-natures.

To evaluate the feasibility of the recursive approach,we first generate a collection of 256 valid responses,which implies the TPA may concurrently handle 256different auditing delegations. We then conduct thetests repeatedly while randomly corrupting an α-fraction, ranging from 0 to 18%, by replacing themwith random values. The average auditing time pertask against the individual auditing approach is pre-sented in Fig. 5. The result shows that even thenumber of invalid responses exceeds 15% of the totalbatch size, the performance of batch auditing canstill be safely concluded as more preferable than thestraightforward individual auditing. Note that therandom distribution of invalid responses within thecollection is nearly the worst-case for batch auditing.If invalid responses are grouped together, it is possibleto achieve even better results.

5 RELATED WORK

Ateniese et al. [8] are the first to consider public au-ditability in their defined “provable data possession”(PDP) model for ensuring possession of data files onuntrusted storages. Their scheme utilizes the RSA-based homomorphic linear authenticators for auditingoutsourced data and suggests randomly sampling afew blocks of the file. However, the public auditabilityin their scheme demands the linear combination ofsampled blocks exposed to external auditor. Whenused directly, their protocol is not provably privacypreserving, and thus may leak user data informationto the auditor. Juels et al. [11] describe a “proofof retrievability” (PoR) model, where spot-checkingand error-correcting codes are used to ensure both“possession” and “retrievability” of data files on re-mote archive service systems. However, the numberof audit challenges a user can perform is fixed apriori, and public auditability is not supported in theirmain scheme. Although they describe a straightfor-ward Merkle-tree construction for public PoRs, thisapproach only works with encrypted data. Dodis etal. [25] give a study on different variants of PoRwith private auditability. Shacham et al. [13] design animproved PoR scheme built from BLS signatures [17]with full proofs of security in the security modeldefined in [11]. Similar to the construction in [8], theyuse publicly verifiable homomorphic linear authenti-cators that are built from provably secure BLS signa-tures. Based on the elegant BLS construction, a com-pact and public verifiable scheme is obtained. Again,their approach does not support privacy-preservingauditing for the same reason as [8]. Shah et al. [9], [14]propose allowing a TPA to keep online storage honest

by first encrypting the data then sending a numberof pre-computed symmetric-keyed hashes over theencrypted data to the auditor. The auditor verifiesboth the integrity of the data file and the server’spossession of a previously committed decryption key.This scheme only works for encrypted files, and itsuffers from the auditor statefulness and boundedusage, which may potentially bring in online burdento users when the keyed hashes are used up.

In other related work, Ateniese et al. [19] propose apartially dynamic version of the prior PDP scheme,using only symmetric key cryptography but witha bounded number of audits. In [20], Wang et al.consider a similar support for partial dynamic datastorage in a distributed scenario with additional fea-ture of data error localization. In a subsequent work,Wang et al. [10] propose to combine BLS-based HLAwith MHT to support both public auditability andfull data dynamics. Almost simultaneously, Erwayet al. [21] developed a skip lists based scheme toenable provable data possession with full dynamicssupport. However, the verification in these two proto-cols requires the linear combination of sampled blocksjust as [8], [13], and thus does not support privacy-preserving auditing. While all the above schemesprovide methods for efficient auditing and provableassurance on the correctness of remotely stored data,none of them meet all the requirements for privacy-preserving public auditing in cloud computing. Moreimportantly, none of these schemes consider batchauditing, which can greatly reduce the computationcost on the TPA when coping with a large number ofaudit delegations.

6 CONCLUSION

In this paper, we propose a privacy-preserving publicauditing system for data storage security in CloudComputing. We utilize the homomorphic linear au-thenticator and random masking to guarantee thatthe TPA would not learn any knowledge about thedata content stored on the cloud server during the ef-ficient auditing process, which not only eliminates theburden of cloud user from the tedious and possiblyexpensive auditing task, but also alleviates the users’fear of their outsourced data leakage. ConsideringTPA may concurrently handle multiple audit sessionsfrom different users for their outsourced data files, wefurther extend our privacy-preserving public auditingprotocol into a multi-user setting, where the TPA canperform multiple auditing tasks in a batch mannerfor better efficiency. Extensive analysis shows that ourschemes are provably secure and highly efficient.

ACKNOWLEDGMENTS

This work was supported in part by the US NationalScience Foundation under grant CNS-0831963, CNS-0626601, CNS-0716306, and CNS-0831628.

Page 12: Privacy preserving public auditing for

12

APPENDIX AZERO KNOWLEDGE PUBLIC AUDITING

Here we present a public auditing scheme with prov-ably zero knowledge leakage. The setup phase issimilar to our main scheme presented in Section 3.4.The secret parameters are sk = (x, ssk) and the publicparameters are pk = (spk, v, g, u, e(u, v), g1), whereg1 ∈ G1 is an additional public group element.

In the audit phase, upon receiving challengechal = {(i, νi)}i∈I , the server chooses three randomelements rm, rσ, ρ← Zp, and calculates R = e(g1, g)rσ ·e(u, v)rm ∈ GT and γ = h(R) ∈ Zp. Let µ′ de-note the linear combination of sampled blocks µ′ =∑

i∈I νimi, and σ denote the aggregated authenticatorσ =

∏i∈I σνi

i ∈ G1. To make the auditing scheme withzero knowledge leakage, the server has to blind bothµ′ and σ. Specifically, the server computes: µ = rm +γµ′ mod p, and Σ = σ ·gρ

1 . It then sends {ς, µ, Σ, R} asthe response proof of storage correctness to the TPA,where ς = rσ + γρ mod p. With the response fromthe server, the TPA runs VerifyProof to validatethe response by first computing γ = h(R) and thenchecking the verification equation

R · e(Σγ , g)?= e((

sc∏

i=s1

H(Wi)νi)γ · uµ, v) · e(g1, g)ς (5)

The correctness of the above verification equationcan be elaborated as follows:

R · e(Σγ , g) = e(g1, g)rσ · e(u, v)rm · e((σ · gρ1)γ , g)

= e(g1, g)rσ · e(u, v)rm · e((σγ , g) · e(gργ1 , g)

= e(u, v)rm · e((σγ , g) · e(g1, g)rσ+ργ

= e((

sc∏

i=s1

H(Wi)νi)γ · uµ, v) · e(g1, g)ς

The last equality follows from the elaboration ofEquation 1.

Theorem 4: The above auditing protocol achieveszero-knowledge information leakage to the TPA, andit also ensures the storage correctness guarantee.

Proof: Zero-knowledge is easy to see. Randomlypick γ, µ, ς from Zp and Σ from G1, set R ←e((

∏sc

i=s1H(Wi)

νi)γ ·uµ, v)·e(g1, g)ς/e(Σγ , g) and back-patch γ = h(R). For proof of storage correctness,we can extract ρ similar to the extraction of µ′ as inthe proof of Theorem 1. With ρ, σ can be recoveredfrom Σ. To conclude, a valid pair of σ and µ′ can beextracted.

REFERENCES

[1] P. Mell and T. Grance, “Draft NIST working definition ofcloud computing,” Referenced on June. 3rd, 2009 Onlineat http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, 2009.

[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz,A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica,and M. Zaharia, “Above the clouds: A berkeley view of cloudcomputing,” University of California, Berkeley, Tech. Rep.UCB-EECS-2009-28, Feb 2009.

[3] M. Arrington, “Gmail disaster: Reports of mass emaildeletions,” Online at http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/,December 2006.

[4] J. Kincaid, “MediaMax/TheLinkup Closes Its Doors,”Online at http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.

[5] Amazon.com, “Amazon s3 availability event: July 20, 2008,”Online at http://status.aws.amazon.com/s3-20080720.html,2008.

[6] S. Wilson, “Appengine outage,” Online at http://www.cio-weblog.com/50226711/appengine outage.php, June 2008.

[7] B. Krebs, “Payment Processor Breach May Be Largest Ever,”Online at http://voices.washingtonpost.com/securityfix/2009/01/payment processor breach may b.html, Jan. 2009.

[8] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner,Z. Peterson, and D. Song, “Provable data possession at un-trusted stores,” in Proc. of CCS’07, Alexandria, VA, October2007, pp. 598–609.

[9] M. A. Shah, R. Swaminathan, and M. Baker, “Privacy-preserving audit and extraction of digital contents,” Cryptol-ogy ePrint Archive, Report 2008/186, 2008.

[10] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling publicverifiability and data dynamics for storage security in cloudcomputing,” in Proc. of ESORICS’09, volume 5789 of LNCS.Springer-Verlag, Sep. 2009, pp. 355–370.

[11] A. Juels and J. Burton S. Kaliski, “Pors: Proofs of retrievabilityfor large files,” in Proc. of CCS’07, Alexandria, VA, October2007, pp. 584–597.

[12] Cloud Security Alliance, “Security guidance for criticalareas of focus in cloud computing,” 2009, http://www.cloudsecurityalliance.org.

[13] H. Shacham and B. Waters, “Compact proofs of retrievability,”in Proc. of Asiacrypt 2008, vol. 5350, Dec 2008, pp. 90–107.

[14] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan,“Auditing to keep online storage services honest,” in Proc. ofHotOS’07. Berkeley, CA, USA: USENIX Association, 2007, pp.1–6.

[15] 104th United States Congress, “Health Insurance Portabilityand Accountability Act of 1996 (HIPPA),” Online at http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996.

[16] S. Yu, C. Wang, K. Ren, and W. Lou, “Achieving secure,scalable, and fine-grained access control in cloud computing,”in Proc. of IEEE INFOCOM’10, San Diego, CA, USA, March2010.

[17] D. Boneh, B. Lynn, and H. Shacham, “Short signatures fromthe Weil pairing,” J. Cryptology, vol. 17, no. 4, pp. 297–319,2004.

[18] A. L. Ferrara, M. Greeny, S. Hohenberger, and M. Pedersen,“Practical short signature batch verification,” in Proceedings ofCT-RSA, volume 5473 of LNCS. Springer-Verlag, 2009, pp. 309–324.

[19] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik,“Scalable and efficient provable data possession,” in Proc. ofSecureComm’08, 2008, pp. 1–10.

[20] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring data storagesecurity in cloud computing,” in Proc. of IWQoS’09, July 2009,pp. 1–9.

[21] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia,“Dynamic provable data possession,” in Proc. of CCS’09, 2009,pp. 213–222.

[22] R. C. Merkle, “Protocols for public key cryptosystems,” in Proc.of IEEE Symposium on Security and Privacy, Los Alamitos, CA,USA, 1980.

[23] G. Ateniese, S. Kamara, and J. Katz, “Proofs of storage fromhomomorphic identification protocols,” in ASIACRYPT, 2009,pp. 319–333.

[24] M. Bellare and G. Neven, “Multi-signatures in the plain public-key model and a general forking lemma,” in ACM Conferenceon Computer and Communications Security, 2006, pp. 390–399.

[25] Y. Dodis, S. P. Vadhan, and D. Wichs, “Proofs of retrievabilityvia hardness amplification,” in TCC, 2009, pp. 109–127.