Chapter 4 Homomorphic Distribution Verification Protocol
4.1. Introduction
In this chapter, we propose a new homomorphic distributed verification protocol (HDVP) to address the Availability and Integrity of data stored in the cloud. Recently, Wang et al. [164, 165] described a homomorphic distributed verification protocol to ensure data storage security in cloud computing. Their scheme relies on the Vandermonde Reed-Solomon code for data Availability, and uses token pre-computation with pseudorandom data to verify data storage correctness. It guarantees data Availability and Integrity and also identifies misbehaving servers. However, their scheme is inefficient and gives Clients only a weak Integrity guarantee about the security of their data. The HDVP overcomes the drawbacks of Wang's protocol [164, 165] and achieves a strong guarantee of the Availability and Integrity of data stored in the cloud, with partial dynamic data support through private verifiability.
4.2. Preliminaries
4.2.1. Erasure Codes
a) Cauchy Reed-Solomon (CRS) Code
Erasure coding is a fundamental technique to prevent data loss in distributed storage systems composed of multiple storage devices, such as network applications [123], data domains [193], peer-to-peer storage networks [190] and cloud storage [26, 165].
Erasure codes based on Vandermonde Reed-Solomon codes have existed for decades [131] and are widely used in communication and storage systems. Vandermonde Reed-Solomon codes are very flexible, as they can be defined for any values of m (original blocks) and n (redundancy blocks). However, they have the drawback of requiring n Galois Field multiplications per coding block, and since coding blocks are typically smaller than a machine's word size, they can require 2n to 8n multiplications per machine word. Thus, Vandermonde Reed-Solomon codes are expensive [132].
In 1995, Blomer et al. [22] proposed the Cauchy Reed-Solomon (CRS) code with two improvements over the Vandermonde Reed-Solomon code. The first improvement is to use projections that convert encoding operations over the Galois Field GF(2^w) to XORs, so that encoding takes O(n log2(m+n)) XORs per coding block. The second improvement is to use a Cauchy matrix over the Galois Field GF(2^w) instead of the classical Vandermonde matrix. The resulting distribution matrix has the property that every square submatrix is invertible.
A Cauchy matrix is defined over the Galois Field GF(2^w) in the following way: let X = {x1, …, xm} and Y = {y1, …, yn} be defined such that each xi and yj is a distinct element of GF(2^w), and X ∩ Y = Ø. Then the Cauchy matrix defined by X and Y has 1/(xi + yj) in element i, j, as shown in Fig. 4.1.
    | 1/(x1+y1)   1/(x1+y2)   ...   1/(x1+yn) |
    | 1/(x2+y1)   1/(x2+y2)   ...   1/(x2+yn) |
    |    ...         ...      ...      ...    |
    | 1/(xm+y1)   1/(xm+y2)   ...   1/(xm+yn) |

Fig. 4.1 Cauchy Matrix
This approach is a systematic layout with parity blocks, achieved with a Cauchy distribution matrix A = (I | P)^T derived from the Cauchy matrix [22]:

        | 1          0          ...   0            |
        | 0          1          ...   0            |
        | ...                                      |
A =     | 0          0          ...   1            |
        | 1/p(1,0)   1/p(1,1)   ...   1/p(1,m-1)   |
        | 1/p(2,0)   1/p(2,1)   ...   1/p(2,m-1)   |
        | ...                                      |
        | 1/p(n,0)   1/p(n,1)   ...   1/p(n,m-1)   |

Fig. 4.2 Cauchy Distribution Matrix
where p(i,j) = xi + yj, with xi ∈ X and yj ∈ Y in GF(2^w). (4.1)
The Cauchy distribution matrix A given in Fig. 4.2 is composed of an (m×m) identity matrix in the first m rows and an (n×m) Cauchy matrix in the remaining n rows.
Next, we expand the Cauchy distribution matrix using a projection defined in GF(2^w) by a factor of w in each dimension to yield a w(m+n)×wm distribution matrix known as the Binary Distribution Matrix (BDM). To encode with the bit matrix, we multiply the Binary Distribution Matrix A with a wm-element vector, which is composed of w bits from each data device. This product operation is depicted in Fig. 4.3.
By multiplying F with A, the Client obtains the encoded file, which is a w(m+n)-element vector composed of w bits from each data and coding device. In this multiplication, Cauchy Reed-Solomon coding uses a series of bit-wise XOR operations to encode the file instead of performing n multiplications using Galois arithmetic as in the classical Vandermonde Reed-Solomon code [164].
Although Cauchy Reed-Solomon codes achieve faster encoding and decoding times than the Vandermonde Reed-Solomon code, the encoding and decoding times are still slow for large files.
Fig. 4.3 Encoding System with Binary Distribution Matrix [133]
[Figure: the encoded vector of data and coding blocks (D1, D2, D3, …, Dm-1, C1, C2, …, Cn-1) is obtained by multiplying the Binary Distribution Matrix (BDM) with the data vector (D1, D2, D3, …, Dm-1).]
To improve the encoding and decoding performance for large files, we use Tornado codes instead of the Cauchy Reed-Solomon code. A brief description of Tornado codes is given in the next section.
b) Tornado Codes
Erasure codes based on the Reed-Solomon code can be used to encode and decode data in distributed storage systems [131-133]. The limitation of these codes is that encoding and decoding times are slow for large block sizes, effectively limiting block size to small values in practical applications.
Tornado codes are erasure block codes based on irregular bipartite graphs. They can be constructed for arbitrary rates, allowing recovery of the original file from an appropriate fraction of the encoded file [11]. Encoding and decoding take O(n log n) time, whereas Reed-Solomon (RS) codes take O(n^2), as shown in Table 4.1.
Table 4.1: Difference between Tornado Codes and Reed-Solomon Codes

Erasure code            | Encoding Time | Decoding Time
Tornado code [11]       | O(n log n)    | O(n log n)
Reed-Solomon code [131] | O(n^2)        | O(n^2)
Tornado codes are generated by cascading a sequence of irregular random bipartite graphs; these graphs are equivalent to generator matrices. The operation of such a graph is shown in Fig. 4.4. The nodes on the leftmost level of the graph correspond to the original data. The nodes on the right side contain redundant data, which is computed by XORing the neighboring input nodes on the level to the left. The number of exclusive-or operations required for both encoding and decoding is thus dominated by the number of edges in the entire graph.
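As a small illustration (not the actual construction from [11]), the XOR relation between message bits and check bits, and the recovery of a single erased bit, can be sketched as follows; the adjacency lists here are an arbitrary toy graph:

```python
from functools import reduce
from operator import xor

def encode_checks(data, graph):
    """Each check bit is the XOR of its neighbouring message bits;
    graph[j] lists the message-bit indices feeding check bit j."""
    return [reduce(xor, (data[i] for i in nbrs), 0) for nbrs in graph]

def recover_bit(data, graph, checks, lost):
    """Recover one erased message bit from any check node it feeds."""
    for j, nbrs in enumerate(graph):
        if lost in nbrs:
            # XOR the check bit with the surviving neighbours
            return reduce(xor, (data[i] for i in nbrs if i != lost), checks[j])
    raise ValueError("bit %d feeds no check node" % lost)
```

With data = [1, 0, 1, 1] and graph = [[0, 1, 2], [1, 2, 3], [0, 3]], the first check bit is x1 ⊕ x2 ⊕ x3, matching the example in Fig. 4.4; the decoding cascade described below repeats `recover_bit` level by level.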
C1=x1+x2+x3
Fig. 4.4 Structure of Tornado Codes
The overall code C(B0, B1, …, Bm, Λ) is a cascade of bipartite graphs B0, B1, …, Bm and Λ. The graph B0 has n message bits as input and produces βn check bits. These form the input bits of B1, and β^2 n new check bits are formed. In general, the graph Bi has β^i n input bits and produces β^(i+1) n check bits. This sequence is truncated by a conventional rate 1-β erasure code Λ. The codeword consists of the n message bits and all the check bits produced at each stage of the cascade; it is thus a systematic code.
systematic code. The total number of check bits produced by this sequence is given by
11
1
1
2 nnn
m
i
mi (4.2)
The length of the codeword produced given n input bits is
11
nnn . The resulting
code is rate 1-β code for all values of m. the length of cascade is selected such that βm+1
n≈√n. we
begin by using this decoding algorithm for Λ, to recover losses that occur within its bits. This
will be successful if at most β fractions of bits have been lost in Λ. If all the losses are recovered,
we know the check bits of Bm. These check bits could be used to recover any losses in the input
bits of Bm. since the input bits of Bi are check bits of Bi-1, this recursion can be continued until all
the input bits of B0 are recovered.
4.2.2. Sobol Sequence
In 1967, I.M. Sobol introduced the construction of a quasi-random sequence of points that has low star discrepancy, called the Sobol sequence [27]. The Sobol sequence generates numbers between zero and one as binary fractions of length w bits, from a set of special binary fractions vi, i = 1, 2, …, w, called direction numbers. It uses base two for all dimensions, and numbers are generated sequentially to fill the larger "gaps" left by uncorrelated (pseudorandom) sequences. The Sobol sequence is more uniform than a pseudorandom sequence [147], as shown in Fig. 4.5.
To generate a Sobol sequence of values x1, x2, …, with 0 < xi < 1, we first need to construct direction numbers v1, v2, …. Each vi is a binary fraction that can be written as

vi = mi / 2^i, (4.3)

where mi is an odd integer such that 0 < mi < 2^i.
To obtain the mi, the construction starts by choosing a primitive polynomial over the field Z2, i.e. one may choose

P = x^d + a1 x^(d-1) + … + a_(d-1) x + 1, (4.4)

where each ai is 0 or 1 and P is an arbitrarily chosen primitive polynomial of degree d over Z2.
Then the mi are given by the d-term recurrence relation

mi = 2 a1 m_(i-1) ⊕ 2^2 a2 m_(i-2) ⊕ … ⊕ 2^(d-1) a_(d-1) m_(i-d+1) ⊕ 2^d m_(i-d) ⊕ m_(i-d), (4.5)

Fig. 4.5 Comparison of Pseudorandom Data and Sobol Sequence
where each term is expressed in base 2 and ⊕ denotes bit-by-bit exclusive-or. When using a primitive polynomial of degree d, the initial values m1, …, md can be chosen arbitrarily provided that each mi is odd and mi < 2^i, i = 1, …, d.
In order to generate the sequence x1, x2, …, xn, Sobol proposed using

xn = b1 v1 ⊕ b2 v2 ⊕ …, (4.6)

x_(n+1) = xn ⊕ vc, (4.7)

where b1 b2 … is the binary representation of n and c is the position of the rightmost zero bit in the binary representation of n.
For example [172]: to generate the Sobol QRNs, choose a primitive polynomial, say,

P = x^3 + x + 1 = 1·x^3 + 0·x^2 + 1·x + 1, (4.8)

with coefficients a1 = 0 and a2 = 1.
Using (4.5), the corresponding recurrence relation is:

mi = 4 m_(i-2) ⊕ 8 m_(i-3) ⊕ m_(i-3). (4.9)

We choose the first three mi's arbitrarily as m1 = 1, m2 = 3, and m3 = 7.
Then m4 = 4m2 ⊕ 8m1 ⊕ m1 = 12 ⊕ 8 ⊕ 1 = 1100 ⊕ 1000 ⊕ 0001 = 0101 = 5.
Similarly m5 = 4m3 ⊕ 8m2 ⊕ m2 = 28 ⊕ 24 ⊕ 3 = 11100 ⊕ 11000 ⊕ 00011 = 00111 = 7,
m6 = 4m4 ⊕ 8m3 ⊕ m3 = 20 ⊕ 56 ⊕ 7 = 43, and so on.
Using (4.3), we find v1 = 1/2^1 = 1/2 = 0.1,
v2 = 3/2^2 = 3/4 = 0.11,
v3 = 7/2^3 = 7/8 = 0.111, and so on.
Now x0 = 0, n = 0, c = 1.
Step 1: x1 = x0 ⊕ v1 = 0.0 ⊕ 0.1 = 0.1 = 1/2
n = 1, c = 2
Step 2: x2 = x1 ⊕ v2 = 0.10 ⊕ 0.11 = 0.01 = 1/4
n = 2, c = 1
Step 3: x3 = x2 ⊕ v1 = 0.01 ⊕ 0.10 = 0.11 = 3/4, and so on.
The Sobol sequence is: 0, 1/2, 1/4, 3/4, ….
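The recurrence (4.9) and the Gray-code update (4.7) can be sketched in Python as follows; this is a one-dimensional generator only, and the function names and word length w = 10 are our own choices:

```python
def rightmost_zero_bit(n):
    """1-based position of the rightmost zero bit of n (c in eq. 4.7)."""
    c = 1
    while n & 1:
        n >>= 1
        c += 1
    return c

def sobol_1d(count, w=10):
    """1-D Sobol points for P = x^3 + x + 1 with m1, m2, m3 = 1, 3, 7,
    using m_i = 4*m_{i-2} XOR 8*m_{i-3} XOR m_{i-3} (eq. 4.9)."""
    m = [1, 3, 7]
    for i in range(3, w):
        m.append((4 * m[i - 2]) ^ (8 * m[i - 3]) ^ m[i - 3])
    # direction numbers v_i = m_i / 2^i, scaled to w-bit integers
    v = [m[i] << (w - i - 1) for i in range(w)]
    x, points = 0, []
    for n in range(count):
        points.append(x / 2 ** w)
        x ^= v[rightmost_zero_bit(n) - 1]   # x_{n+1} = x_n XOR v_c
    return points
```

Calling sobol_1d(4) reproduces the worked example above: 0, 1/2, 1/4, 3/4.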
4.2.3. Universal Hash Function (UHF)
A UHF [31] is an algebraic function h: K × I^l → I that compresses a message or file element m ∈ I^l into a compact digest or "hash" based on a key k ∈ K. We denote the output of h as hk(m). A UHF has the property that given two inputs x ≠ y, with overwhelming probability over keys k it is the case that hk(x) ≠ hk(y). In other words, a UHF is collision-resistant when the message pair (x, y) is selected independently of the key k. A related notion is that of almost exclusive-or universal (AXU) hash functions, which have the property that given three input messages, the probability that the XOR of the hashes of the first two inputs matches the third input is small. Formally:
Definition 4.1: h is an ε-universal hash function family if for any x ≠ y ∈ I^l:
Pr_{k←K}[hk(x) = hk(y)] ≤ ε.
h is an ε-AXU family if for any x ≠ y ∈ I^l and for any z ∈ I:
Pr_{k←K}[hk(x) ⊕ hk(y) = z] ≤ ε.
Many common UHFs are also linear, meaning that for any message pair (m1, m2), it is the case that

hk(m1) + hk(m2) = hk(m1 + m2). (4.10)
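As a small illustration of the linearity property (4.10), consider a polynomial-evaluation hash over GF(2^8), where "+" is bit-wise XOR. This is only a sketch with an arbitrarily chosen reduction polynomial, not the specific UHF family of [31]:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

def h(key, msg):
    """h_k(m) = XOR_i m_i * k^(i+1) over GF(2^8): linear in m, because
    GF(2^w) multiplication distributes over XOR."""
    acc, kp = 0, key
    for mi in msg:
        acc ^= gf_mul(mi, kp)
        kp = gf_mul(kp, key)
    return acc
```

For any equal-length messages m1, m2 and key k, h(k, m1) ^ h(k, m2) == h(k, [a ^ b for a, b in zip(m1, m2)]) — the XOR analogue of (4.10).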
4.3. Homomorphic Distribution Verification Protocol (HDVP)
The Homomorphic Distribution Verification Protocol is used to ensure the Availability and Integrity of data stored in the cloud. It consists of three phases, as shown in Fig. 4.6:
1) Setup phase: in this phase, the Client encodes the data for Availability and pre-computes a certain number of short verification tokens before distributing the file to the cloud servers.
2) Verification Phase: in this phase, the Client verifies the Integrity of data stored in the cloud, recovers any data lost on the cloud servers, and thus guarantees file retrieval.
3) Dynamic Data Operations and Verification Phase: in this phase, the Client performs dynamic data operations on the cloud data storage without retrieving the original data.
4.3.1. Setup Phase: HDVP
The setup phase consists of three methods: a) Encoding, b) KeyGeneration, c) MetadataGeneration, as shown in Fig. 4.6.
The detailed description of these three methods is given in the following sections:
a) Encoding: HDVP
In cloud data storage, we rely on erasure codes based on the Cauchy Reed-Solomon (CRS) code [22, 132] to distribute the entire file redundantly across a set of k = m+n cloud servers to guarantee data Availability against data loss. An (m+n, m) Cauchy Reed-Solomon code generates n redundancy (parity) blocks from m data blocks in such a way that the original m data blocks can be reconstructed from any m out of the m+n data and redundancy blocks. Because each of the m+n blocks is stored on a different server, the original file can survive the failure of any n of the m+n servers without any data loss, with a small space overhead
Fig. 4.6 Architecture of Homomorphic Distribution Verification Protocol
[Figure: the Client, TPA, and CSP interact over three phases: the Setup Phase ((a) Encoding, (b) KeyGeneration, (c) MetadataGeneration), the Verification Phase ((a) Challenge, (b) Response, (c) CheckIntegrity), and the Dynamic Data Operations and Verification Phase ((a) PrepareUpdate, (b) ExecuteUpdate, (c) CheckUpdate).]
(m/n) [100]. To support efficient sequential I/O to the original file, the data file layout is systematic, i.e. the unmodified m data blocks together with the n parity blocks are distributed across m+n different servers. Algorithm 4.1 [176] gives the procedure for encoding the file using CRS.
In Algorithm 4.1, we assume the following: let F = (D1, D2, …, Dm) and Di = (d1i, d2i, …, dli)^T, where i ∈ {1, …, m}, l ≤ 2^w - 1, and w is the data word size, equal to 8 or 16. Here T (shorthand for transpose) denotes that each Di is represented as a column vector, and l denotes the data vector size in blocks. All these elements are elements of the Galois Field GF(2^w). To encode the data file, the Client multiplies F by A as follows:
C = F · A (4.11)

C = (D1, D2, …, Dm, C1, C2, …, Cn) (4.12)

Each server j ∈ {1, …, k} stores one vector C^(j) of this layout: a data vector for j ≤ m and a parity vector for j > m, where

D^(j) = (d1^(j), d2^(j), …, dl^(j)), (4.13)

with j ∈ {1, …, k} and i ∈ {1, …, l},
Algorithm 4.1: Encoding: HDVP
1. Procedure: File-Encoding
2. for i = 0 to n-1 do
3.   for k = 0 to w-1 do
4.     Ci,k = [0, 0, …, 0]
5.     for j = 0 to m-1 do
6.       for l = 0 to w-1 do
7.         Ci,k = Ci,k ⊕ fi,k,j,l · dj,l
8.       end for
9.     end for
10.   end for
11. end for
12. end procedure
and C^(j) = (c1^(j), c2^(j), …, cl^(j)), (4.14)
where j ∈ {1, …, k} and i ∈ {1, …, l}.
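The systematic encode of (4.11)-(4.14) can be sketched with element-wise Galois Field multiplication rather than the XOR bit-matrix (BDM) form; this is a minimal illustration over GF(2^4), with function names of our own choosing:

```python
# Arithmetic in GF(2^4) with primitive polynomial x^4 + x + 1 (0x13);
# a real deployment would use GF(2^8) or GF(2^16) lookup tables.
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf_inv(a):
    return next(b for b in range(1, 16) if gf_mul(a, b) == 1)

def cauchy_matrix(m, n):
    # X = {0..m-1} and Y = {m..m+n-1} are disjoint, so x + y (XOR) != 0
    return [[gf_inv(x ^ y) for x in range(m)] for y in range(m, m + n)]

def crs_encode(data_blocks, n):
    """Systematic CRS-style encode: keep the m data blocks and append
    n parity blocks, where parity_j[t] = XOR_i A[j][i] * data_i[t]."""
    m, blen = len(data_blocks), len(data_blocks[0])
    A = cauchy_matrix(m, n)
    parity = [[0] * blen for _ in range(n)]
    for j in range(n):
        for i in range(m):
            for t in range(blen):
                parity[j][t] ^= gf_mul(A[j][i], data_blocks[i][t])
    return data_blocks + parity
```

As a special case of "any m out of m+n suffice", a lost data block D2 can be recovered from D1 and the first parity block by solving the single parity equation in GF(2^4).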
After encoding the data file, the Client generates a key pair for the later processing of the system, which we discuss in the next section.
b) KeyGeneration: HDVP
After encoding the data, the Client generates the random challenge key x and the master permutation key y for the later processing of the file, as shown in Algorithm 4.2. We use a Sobol Random Function (SRF) to generate the keys x and y:

x = f_kSRF(i) (4.15)

and

y = f_kSRP(i) (4.16)

Both keys are indexed on some (usually secret) key: f : {0,1}* × key → GF(2^w).
c) MetadataGeneration: HDVP
In order to guarantee the Integrity of data stored in the cloud, this protocol relies entirely on the metadata. The main idea is as follows: before distributing the encoded data to the cloud, the Client computes a certain number of short verification tokens (metadata) on each individual data block C^(j), using the Sobol sequence [27] instead of a pseudorandom sequence [164]. Each token covers a random set of blocks. The procedure of metadata computation is given in Algorithm 4.3:
Algorithm 4.2: KeyGen: HDVP
1. Procedure: KeyGen
2. Generate random challenge key kSRF and
3. master permutation key kSRP using the Sobol sequence;
4. Derive x = f_kSRF(i) and y = f_kSRP(i)
5. end procedure
The Client performs the following procedure while generating the ith token:
1) The Client generates a set of r randomly chosen indices:

Iq = πy(q), (4.17)

where Iq ∈ [1…l], q ∈ [1…r], and πkey(·) is a Sobol Random Permutation (SRP) indexed under a key:
π : {0,1}^log2(l) × key → {0,1}^log2(l).
2) Then, the Client computes the tokens as

Vi^(j) = Σ_{q=1}^{r} x^q · C^(j)[Iq], (4.18)

where x is the challenge key and y is the master permutation key.
The metadata computation function Vi^(j) belongs to a family of Universal Hash Functions (UHF) [31], chosen to preserve the homomorphic properties, which can be perfectly integrated with the verification of erasure-coded data. Vi^(j) is an element of GF(2^w) of small size; it is the response the Client expects to receive from the CSP when the Client challenges it on the specified random blocks.
Algorithm 4.3: MetadataGen: HDVP
1. Procedure: MetadataGen
2. for each vector G^(j), j ← 1 to k do
3.   for round i ← 1 to t do
4.     Compute the set of random indices Iq = πy(q), Iq ∈ [1…l], q ∈ [1…r]
5.     Compute Vi^(j) = Σ_{q=1}^{r} x^q · C^(j)[Iq]
6.   end for
7. end for
8. store all the Vi's locally
9. end procedure
The overall process of the setup phase is given in Fig. 4.7.
Fig. 4.7 Setup Phase: HDVP
Once the required number of metadata is computed, the Client keeps the metadata locally and distributes all the k = m+n encoded data blocks C^(j) (j ∈ {1, …, k}) to the CSP. Then, the CSP stores all the data and parity blocks across the cloud servers, namely S1, S2, …, Sk.
4.3.2. Verification Phase: HDVP
The verification phase consists of three methods, as shown in Fig. 4.6: a) Challenge, b) Response, c) CheckIntegrity.
a) Challenge: HDVP
In this phase, the Client issues a "random sample" challenge to verify the Integrity of data stored in the cloud. To create a challenge, the Client re-generates the challenge key and master permutation key using the Sobol sequence and sends them to the CSP. Here, the challenge key is needed to prevent potential pre-computations by the CSP. The procedure for creating a challenge is given in Algorithm 4.4.
[Fig. 4.7: 1. The Client encodes the file; 2. the Client generates a key pair x and y; 3. the Client computes the metadata Vi^(j) = Σ_{q=1}^{r} x^q · C^(j)[Iq]; 4. the Client sends the encoded file F to the server; 5. the CSP stores the file F in the cloud; 6. the Client keeps the metadata locally and deletes the local copy of the file F.]
b) Response: HDVP
Upon receiving a request from the Client, the CSP computes the response (a signature serving as the Integrity proof) over the specified random block indices and returns the response to the Client. The procedure of the Response protocol is given in Algorithm 4.5.
In the ith response computation, cross-checked over the k servers in Algorithm 4.5, the CSP computes a response over each data block G^(j) at the specified indices:

Ri^(j) = Σ_{q=1}^{r} x^q · G^(j)[πy(q)] (4.19)
c) CheckIntegrity: HDVP
After receiving a response R^(j) from the CSP, the Client checks the data Integrity by comparing the response with the metadata previously computed by the Client, as shown in Algorithm 4.6:

Ri^(j) =? Vi^(j), j ∈ {1, …, k}. (4.20)
Algorithm 4.5: Response: HDVP
1. Procedure: Response(i)
2. for j ← 1 to k do
3.   CSP computes Ri^(j) = Σ_{q=1}^{r} x^q · G^(j)[πy(q)], 1 ≤ j ≤ k
4.   return Ri^(j) to the Client
5. end procedure

Algorithm 4.4: Challenge: HDVP
1. Procedure: Challenge
2. The Client re-generates the random challenge key kSRF
   and the master permutation key kSRP using the Sobol sequence;
3. Derive x = f_kSRF(i) and y = f_kSRP(i)
4. end procedure
If the Integrity of the data is verified, HDVP is ready for the next challenge. Otherwise, the Client retrieves the file and corrects the data loss in it. The process of the verification phase is illustrated in Fig. 4.8.
Fig. 4.8 Verification Phase: HDVP
In the next phase, we explain the dynamic data operations of this protocol, used when the Client wants to update the data in the cloud without retrieving the file.
Algorithm 4.6: CheckIntegrity: HDVP
1. Procedure: CheckIntegrity
2. for j ← 1 to k do
3.   if Ri^(j) = Vi^(j) then
4.     Accept and ready for next challenge
5.   else
6.     return data has been corrupted
7.   end if
8. end for
9. end Procedure
[Fig. 4.8: 1. The Client generates a challenge chal = (x, y); 2. the Client sends chal to the CSP; 3. the CSP computes Ri^(j) = Σ_{q=1}^{r} x^q · G^(j)[πy(q)], 1 ≤ j ≤ k; 4. the CSP sends the Integrity proof Ri^(j) to the Client; 5. the Client verifies whether Ri^(j) = Vi^(j), returning '1' if so and '0' otherwise.]
4.3.3. Dynamic Data Operations and Verification Phase: HDVP
Cloud storage is dynamic data storage, which means that Clients frequently modify their data in the cloud, such as photos, electronic documents and log files. So, it is of critical importance to support dynamic block-level operations on the data file in the cloud, such as modification, delete, append and insert, while maintaining the same data Integrity and Availability assurance.
One easy way to support all dynamic operations is for the Client to download the entire data from the CSP, update it and re-distribute it to the cloud. Clearly, this would be secure, but highly inefficient. In this section, we show how the HDVP scheme explicitly and efficiently supports dynamic data operations on cloud data storage.
This phase consists of three methods: a) PrepareUpdate, b) ExecuteUpdate, c) CheckUpdate, as given in Fig. 4.6.
a) PrepareUpdate: HDVP
After storing the data in the cloud, the Client may later need to update some data (insertion of a new block after a given block i, deletion of block i, or modification of block i). To perform an update operation, the Client prepares an update request (modification, insertion, append or delete) and sends it to the CSP. Algorithm 4.7 outlines the preparation of the update request performed by the Client, as follows:
1) If the update operation is a modification, the Client prepares the update request as follows. Modification means adding to a current value dij a change Δdij:

d'ij = dij + Δdij (4.21)

Due to the linear property of the Cauchy Reed-Solomon code, the Client can easily perform the update operation and generate updated parity blocks without involving any other unchanged blocks. So the Client can construct an update matrix ΔF as follows:

ΔF =
| Δd11 Δd12 ... Δd1m |
| Δd21 Δd22 ... Δd2m |
| ...                |
| Δdl1 Δdl2 ... Δdlm |
= (ΔD1, ΔD2, …, ΔDm) (4.22)

To keep the corresponding parity blocks consistent with this update, as well as the original file layout, the Client can multiply ΔF by the Binary Distribution Matrix (BDM) and thus generate the update information for both the data blocks and the parity blocks as follows:

ΔG = ΔF · A = (ΔD1, …, ΔDm, ΔC1, …, ΔCn) (4.23)

ΔG^(j) = (ΔD1^(j), ΔD2^(j), …, ΔDm^(j), ΔC1^(j), ΔC2^(j), …, ΔCn^(j)) (4.24)

where j ∈ {1, …, k}.
The data block update operation automatically affects some or all of the remaining verification metadata. So, in order to maintain the same Integrity assurance, the Client has to modify the affected unused metadata for each data block C^(j). In other words, the Client needs to exclude every occurrence of the old data block and replace it with the new one, as follows.
Suppose the data block G^(j)[Is] covered by a specific token Vi^(j) has been changed to G^(j)[Is] + ΔG^(j)[Is], as shown in Algorithm 4.7, where Is = πy(s). To maintain the usability of the token Vi^(j), the Client can simply update it, without retrieving any of the other r-1 blocks required in the pre-computation of Vi^(j):

Vi^(j) ← Vi^(j) + x^s · ΔG^(j)[Is] (4.25)

The crucial part of the algorithm is at line 7, where the Client simultaneously replaces the old version of Vi^(j) with the new one by adding

x^s · ΔG^(j)[Is] (4.26)
2) If the update operation is a deletion: the delete operation considered here is similar to a general delete operation, in which the Client replaces the data block with zero or some special reserved data symbol. From this point of view, the delete operation is a special case of the update operation. Therefore, the Client can use the update operation procedure to support the delete operation, i.e.

Δdij = -dij in ΔF (4.27)

In the delete operation, the affected tokens also have to be modified using the same Algorithm 4.7 as in the modification operation.
3) Sometimes the Client wants to increase the size of his file in the cloud by adding new data blocks at the end of the file, which is called a block append operation. In an append operation the Client needs to upload a large amount of data, as follows. Given the file matrix F illustrated in the file encoding, appending blocks at the end of a data file corresponds to concatenating rows at the bottom of the matrix layout for the file F. We assume that in the beginning there are only l rows in the file matrix F. In order to support the block append operation, the Client has to make a small modification to the metadata, as specified in the modification operation.
The Client has to anticipate the maximum size of his data blocks, denoted lmax. The idea of the block append operation considered here is similar to [16, 164], in which each encoded data vector uses the system parameter rmax = [r*(lmax/l)] for each pre-computed token. The pre-computation of the ith token on server j is modified as follows:

Vi^(j) = Σ_{q=1}^{rmax} x^q · G^(j)[Iq], (4.28)

where
G^(j)[Iq] = G^(j)[πy(q)], if πy(q) ≤ l
          = 0,            if πy(q) > l

This formula guarantees that, on average, there will be r indices falling into the range of the existing l blocks. Since the CSP and the Client agree on the number of existing blocks in each data block G^(j), the CSP will follow exactly the same procedure when re-computing each token upon receiving a challenge request.
After the re-computation of the metadata, the Client is ready to append the new blocks, i.e. both the data and parity blocks are generated; the total length of each block G^(j) will be increased and fall into the range [l, lmax]. Therefore, the Client will update the affected metadata by adding x^s · G^(j)[Is] to the old Vi using Algorithm 4.7, as in the update operation, whenever G^(j)[Is] ≠ 0 for Is > l, where Is = πy(s); the details are thus omitted here.
4) If the update operation is an insert operation: an insert into the data file stored in the cloud is similar to an append operation at the desired index position while maintaining the same block-level structure for the entire data file, i.e. inserting a block F[j] corresponds to shifting all blocks starting with index j+1 by one slot. An insert operation may affect all rows of the logical data file in matrix F, and a substantial number of computations is required to rearrange all subsequent blocks as well as the pre-computed tokens. Hence, it is difficult to support an efficient insertion operation.
After preparing the update request, the Client sends it to the CSP.
Algorithm 4.7: PrepareUpdate: HDVP
/* assume that data block dij is changed to dij + Δdij */
1. Procedure: PrepareUpdate
2. if (update == modification/append)
3.   for round i ← 1 to t do
4.     Derive x = f_kSRF(i) and y = f_kSRP(i)
5.     for each vector G^(j), j ← 1 to r do
6.       Vi^(j) ← Vi^(j) + x^s · ΔG^(j)[Is]
7.     end for
8.   end for
9. else if (update == delete)
10.   for i = 1 to r do
11.     if (gk_i(l) == j) then
12.       Vi^(j) ← Vi^(j) - x^s · G^(j)[Is] (delete block D)
13.   end for
14. else if (update == insert)
15.   physical insert is not supported
16. end if
17. store Vi^(j) locally
18. end Procedure
b) ExecuteUpdate: HDVP
In this method, the CSP performs the update operations in response to the Client's request and stores the updated version of the file as follows:

G^(j)' = G^(j) + ΔG^(j), j ∈ {1, …, k} (4.29)

The procedure of the execute operation is given in Algorithm 4.8.
Algorithm 4.8: ExecuteUpdate: HDVP
1. Procedure: ExecuteUpdate ← {F''}
2. if (update == modification/append)
3.   G^(j)' = G^(j) + ΔG^(j), j ∈ {1, …, k}
4.   update the file
5. else if (update == deletion)
6.   G^(j)' = G^(j) + ΔG^(j), j ∈ {1, …, k}
7.   update the file F
8.   move all blocks backward after the ith block
9. end if
10. end procedure
c) CheckUpdate: HDVP
After the update, the Client wants to verify whether the CSP has updated the data file successfully or not. The verification of dynamic data operations in this protocol starts directly from the default Integrity verification process (the same as Algorithm 4.6). The only difference is in the way we compute Vi: rather than hashing the concatenation of blocks, the verifier hashes each single block and XORs the resulting outputs. This does not change the ability of the simulator to extract the blocks queried during the challenge.
The process of dynamic data operations and verification is depicted in Fig. 4.9.
Fig. 4.9 Dynamic Data Operations and Verification Phase: HDVP
4.4. Analysis of HDVP
In this section, we analyze the security and performance of the Homomorphic Distribution Verification Protocol.
4.4.1. Security Analysis of HDVP
In the security analysis, we analyze the Integrity and Availability of data stored in the cloud and compare the results with the existing schemes [164, 165].
a) Integrity
In this section, we analyze the probability of detecting corrupted blocks and show that the HDVP scheme needs to operate only on selected (random) blocks instead of all blocks, which greatly reduces the computational overhead of the verifier (Client) and the server (CSP) while still detecting data corruption with high probability (99%).
Assume that an attacker modifies the data blocks in z rows out of the l rows of the encoded file matrix G. Let r be the number of selected rows for which the Client challenges the CSP in each Integrity verification process. Let X be a discrete random variable defined as the number of modified rows picked by the Client. First, we analyze the matching probability that at least one of the blocks picked by the Client matches one of the blocks modified by the attacker [164, 165]:
[Fig. 4.9: 1. The Client computes Vi^(j) ← Vi^(j) + x^s · ΔG^(j)[Is] and prepares the update request (s, m'i, update/append/delete); 2. the Client sends the update request to the CSP; 3. the server updates the file: G^(j)' = G^(j) + ΔG^(j), j ∈ {1, …, k}; 4. the Client deletes m'i locally; 5. verification starts directly from Algorithm 4.6.]
P_m = P{X ≥ 1} = 1 - P{X = 0} = 1 - Π_{i=0}^{r-1} (1 - min{z/(l-i), 1}) ≥ 1 - (1 - z/l)^r (4.30)

The attacker avoids detection only if none of the selected rows in the ith verification process are deleted or modified.
Next, we consider the probability of a false negative result, i.e. the specified data blocks in the r selected rows have been damaged or deleted but the verification equation still holds: Ri^(j) = Vi^(j). Consider the responses Ri^(1), …, Ri^(k) returned to the Client from the CSPs for the ith challenge; each response value Ri^(j) is computed within GF(2^w) from r blocks on server j. The number of responses R^(m+1), …, R^(k) from parity servers is n = k - m. Thus, according to Proposition 2 of [170], the false negative probability is:

P_f^r = Pr1 + Pr2, (4.31)

where Pr1 and Pr2 are the two collision terms given in Proposition 2 of [170].
Based on the above discussion, it follows that the probability of detecting data corruption across all cloud storage servers is [164]:

P_d = P_m^r · (1 - P_f^r). (4.32)
Comparison with Existing Schemes
In this section, we compare the detection probability of the HDVP scheme with that of the existing schemes [164, 165] for corrupted data blocks and show that the proposed scheme is better in terms of detection probability.
We compare the HDVP scheme with the existing schemes in two cases:
i) Strong Integrity: HDVP gives a stronger Integrity guarantee for the Client's data than the existing probabilistic schemes. If an adversary corrupts some part of the file, the HDVP method is able to detect the corruption with high probability by issuing "random sample" challenges to the CSP for the data Integrity proof. These random samples are generated by the Sobol sequence, which covers the entire file uniformly, so the corrupted part is detected. The existing schemes may not give satisfactory Integrity results, i.e. they may report that no data is lost or that the data is safe in the cloud even though the data is corrupted, because a pseudorandom sequence may not cover the entire file, or may repeatedly sample the same region while computing the Integrity proof, thus producing an Integrity proof without covering the corrupted part of the file. To see that the existing schemes may not provide satisfactory Integrity results when data blocks in a file are corrupted, assume that a file contains 100000 blocks, of which 1% or 5% are corrupted. If the verifier uses 1000 or 500 Sobol random samples over the entire file to detect the corruptions, respectively, these samples are uniformly distributed over the entire file for each and every challenge, so the corruptions are detected with 99% probability. If pseudorandom samples are used instead, the sampled blocks may fail to detect the corruption because they may not be uniformly distributed, i.e. they may cluster in the upper or lower part of the file, sometimes with more samples in the upper part and fewer in the lower part. Hence, such samples can sometimes produce an Integrity proof without covering the entire file. Figs. 4.10a and 4.10b show the probability of detecting 1% and 5% data corruption, respectively, using the Sobol sequence and a pseudorandom sequence.
[Plot: x-axis r (number of queried blocks, as a percentage of l, 0-10%); y-axis l (total number of blocks, up to 10000); curves: Sobol sequence (99%) and pseudorandom sequence (99%).]
(a) z=1%l
[Plot: x-axis r (number of queried rows, as a percentage of l, 0-5%); y-axis l (total number of rows, up to 10000); curves: Sobol sequence (99%) and pseudorandom sequence (99%).]
ii) Efficiency: Here, we analyze the efficiency of the HDVP scheme against the existing schemes in terms of the detection probability for corrupted blocks. Assume that both schemes must detect corruption with high probability. A verifier using the existing methods would have to increase the number of random samples dramatically to achieve this, which would make the whole concept of lightweight verification impractical, whereas the HDVP scheme always detects the corruption with high probability using a small number of random samples, as shown in Figs. 4.11a and 4.11b.
Fig. 4.11 plots the detection probability Px for different values of l, r and z. To illustrate, assume a file contains l=10000 blocks. In Fig. 4.11a, where z=1%l of the data is corrupted, the Sobol sequence detects the corruption with 99% probability using 4.9%l samples, while the pseudorandom sequence requires nearly 10%l samples. In Fig. 4.11b, where z=5%l of the data is corrupted, the HDVP scheme using the Sobol sequence detects the corruption with 3%l random samples, while the existing schemes [164, 165] require 6%l random indices.
Fig. 4.10: The detection probability Px against (a) 1%l and (b) 5%l data corruption using the existing scheme and HDVP.
[Fig. 4.11a plot: x-axis l (total number of blocks, 0-10000); y-axis r (number of queried blocks as a percentage of l, 0-10%); curves: proposed and existing methods at detection probabilities 0.99, 0.9 and 0.8.]
[Fig. 4.11b plot: x-axis l (total number of blocks, 0-10000); y-axis r (number of queried blocks as a percentage of l, 0-6%); curves: proposed and existing methods at detection probabilities 0.99, 0.9 and 0.8.]
Fig. 4.11: The detection probability Px against data corruption using the existing scheme and the HDVP scheme, for (a) z=1%l and (b) z=5%l. Px is shown as a function of l (the total number of blocks) and r (the number of blocks queried by the Client, as a percentage of l), for a given z (the number of blocks modified by the adversary).

Clearly, we verify the Integrity of data stored in the cloud using a random sampling approach. A pseudorandom sequence is often used for this purpose; however, pseudorandom sequences tend to show a clustering effect, more or less pronounced depending on the pseudorandom number generator (PRNG) used. Because of this effect, we may not get satisfactory Integrity results. To obviate this undesired effect, we instead use the Sobol sequence: its generators are designed and developed to produce a more uniformly distributed random sequence, and hence yield satisfactory Integrity results. Therefore, the HDVP scheme is more secure and efficient than the existing probabilistic verification schemes [164, 165], since the Sobol sequence is more uniform than a pseudorandom sequence.
Uniformity Testing
Now, we turn to Monte Carlo simulation to determine the uniformity of the random sequences. To assess the quality of the random numbers, we compute the Monte Carlo integration of a function f(x) over the s-dimensional unit cube I^s. In effect, we calculate the average of the function at a set of N randomly sampled points, so the estimate of the integral is:

V = (1/N) * SUM_{i=1}^{N} f(x'_i)    (4.33)

where V denotes the approximation to the integral and x'_1, ..., x'_N are the N s-dimensional sample points. We compute the Monte Carlo integration V for two cases: uncorrelated random numbers (a pseudorandom sequence) and the Sobol sequence. If a pseudorandom sequence is used, the points x' are independently and identically distributed, and the expected error of the integral estimate decreases as N^(-1/2); if the Sobol sequence is used, the fractional error decreases as N^(-1). In Fig. 4.12, we present the calculation of the six-dimensional integral:
I = INT_0^1 INT_0^1 INT_0^1 INT_0^1 INT_0^1 INT_0^1 PROD_{i=1}^{6} cos(i * x_i) dx_1 dx_2 dx_3 dx_4 dx_5 dx_6    (4.34)

The exact value of the integral is:

I = PROD_{i=1}^{6} sin(i) / i    (4.35)
Fig. 4.12 shows that the pseudorandom sequence gives the worst performance, whilst the Sobol sequence converges rapidly to the solution. We conclude that the Sobol sequence can evaluate integrals more efficiently than pseudorandom sequences.
[Plot: convergence of the estimated value of the integral versus the number of sample points (in thousands), for pseudorandom data and the Sobol sequence.]
Fig. 4.12 Monte Carlo simulation using random numbers
b) Availability
To ensure that the original file remains recoverable and retrievable even if an attacker corrupts a fraction of the data file, the HDVP scheme guarantees data retrievability using an erasure code, as shown in Table 4.2. From Table 4.2, we can see that as the number of parity blocks n increases, the Availability guarantee of the data also increases. Therefore, by generating n parity blocks from m data blocks, we can retrieve the original data from any m of the (m+n) blocks.
Table 4.2: For increasing (m, n) values to get 99.99% Availability guarantee.
Total no. of Blocks m blocks n blocks Availability guarantee
6 6 0 53%
10 6 4 85%
14 8 6 96.16%
18 12 6 97.19%
22 14 8 98.83%
26 16 10 99.97%
30 18 12 99.99%
The following theorem would prove the Availability guaranty of the file:
Theorem 4.1 [148]: Given a ρ fraction of the n blocks of an encoded file, it is possible to recover the entire original data with all but negligible probability.
Proof: Here, we consider economically motivated adversaries who are willing to modify or delete only a small percentage of the file; in this context, deleting or modifying a few bits does not provide any financial benefit. If any modification or deletion of small parts of the file is detected, the erasure code can be used: a ρ fraction of the encoded file blocks is sufficient to recover the original file in linear time. Therefore, the erasure code guarantees that the ρ fraction of retrieved blocks allows decoding with overwhelming probability.
4.4.2. Performance Analysis and Experimental Results of HDVP
The performance analysis focuses on the implementation of encoding, metadata generation and CSP computation, and compares the experimental results with Wang's scheme [164, 165].
a) Encoding
For file encoding, we implemented data file encoding for the data Availability guarantee. The HDVP experiments are conducted in C++ on a system with a Core 2 Duo processor running at 2.80GHz, 4GB of RAM and a 3GB SATA hard disk. We consider two parameter settings for the (m+n, n) Cauchy Reed-Solomon code over the Galois Field GF(2^w), w=8 or 16. Tables 4.3 and 4.4 below show the average encoding cost of a 1GB file using the Vandermonde Reed-Solomon code and the Cauchy Reed-Solomon code, respectively. In both tables, Set I fixes the number of parity blocks at n=2 and increases the number of data blocks m, while Set II fixes the number of data blocks at m=10 and increases the number of parity blocks n. Note that as m increases, the length l of the data blocks on every server decreases, which enables fewer calls to the Cauchy Reed-Solomon encoder.
Figs. 4.13a and 4.13b show that, compared to the existing scheme [164, 165], the HDVP scheme takes less time to encode a 1GB file over the Galois Field GF(2^w) on different servers. The Cauchy Reed-Solomon encoding of a 1GB file is faster than the Vandermonde-based Reed-Solomon encoding because it uses XOR operations instead of classical Galois field arithmetic to encode the file.
Table 4.3: Encoding cost of a 1 GB file using the Vandermonde Reed-Solomon Code

Set I          m=4        m=6       m=8       m=10
n=2            110.21s    81.87s    65.42s    49.1s
Set II         n=2        n=4       n=6       n=8
m=10           49.1s      83.2s     138.11s   189.87s

Table 4.4: Encoding cost of a 1 GB file using the Cauchy Reed-Solomon Code

Set I          m=4        m=6       m=8       m=10
n=2            80.3s      62.67s    47.32s    32.42s
Set II         n=2        n=4       n=6       n=8
m=10           32.42s     57.25s    103.21s   154.32s
[Fig. 4.13a plot: total cost in time (seconds, 0-200) for (m, n) = (10,2), (10,4), (10,6), (10,8), i.e. m fixed and n increasing; curves: Vandermonde Reed-Solomon code and Cauchy Reed-Solomon code.]
[Fig. 4.13b plot: total cost in time (seconds, 0-120) for (m, n) = (4,2), (6,2), (8,2), (10,2), i.e. m increasing and n fixed; curves: Vandermonde Reed-Solomon code and Cauchy Reed-Solomon code.]
Fig. 4.13: Encoding performance comparison between two parameter settings for encoding a 1GB file using different erasure coding techniques (Cauchy Reed-Solomon code and Vandermonde Reed-Solomon code) under different systems.

Next, we measure the encoding and decoding performance of files using an erasure code based on the Tornado code and compare the results with the Reed-Solomon code, as shown in Tables 4.5 and 4.6. From these tables, we can see that the encoding and decoding of the HDVP scheme are faster than those of the existing schemes [164, 165].
Table 4.5: Encoding cost of files of different sizes using the Reed-Solomon code and the Tornado code

File Size    Reed-Solomon code [131]    Tornado code [11]
2MB          442 seconds                0.60 seconds
4MB          1717 seconds               1.35 seconds
6MB          4213 seconds               2.0 seconds
8MB          6994 seconds               2.6 seconds
10MB         9018 seconds               3.1 seconds
Table 4.6: Decoding cost of files of different sizes using the Reed-Solomon code and the Tornado code

File Size    Reed-Solomon code [131]    Tornado code [11]
2MB          199 seconds                0.44 seconds
4MB          800 seconds                0.74 seconds
6MB          1883 seconds               1.03 seconds
8MB          3166 seconds               1.38 seconds
10MB         4824 seconds               1.73 seconds
b) Metadata Generation
In this section, we measure the processing time required to compute the metadata. The HDVP scheme decides the total amount of metadata dynamically; for example, when t is set to 7300 or 14600, the data file can be verified every day for the next 20 or 40 years, respectively. In the metadata computation, the Client first generates a random key and a permutation key using the Sobol sequence, and then computes the metadata using a UHF [31]. This computation is performed along with the file encoding; our HDVP implementation shows that the average token pre-computation cost is 0.2 seconds. The metadata generation also involves the SRF and SRPs; however, these operations are performed over short inputs, so their costs are negligible compared with the universal hash function (UHF).
Table 4.7 summarizes the storage and computation costs of the metadata for a 1GB file using the HDVP scheme and the existing scheme under different system settings.
Table 4.7: Storage and computation cost of metadata generation for 1GB data file under different
system settings using HDVP and existing scheme.
c) CSP Computation
Here, we measure the computation cost for the CSP to compute the Integrity proof for a challenge created by the Client during the challenge-response protocol. HDVP takes less time to compute a response (Integrity proof) than the existing method, because in HDVP the CSP computes a response over fewer challenged blocks. For example, assume the file contains l=10000 blocks, of which 1%l is corrupted. HDVP computes a response over c=360 random blocks to achieve 99% detection probability, whereas the existing schemes compute a response over c=460 blocks to reach the same detection probability. Therefore, the CSP computation of HDVP is faster than that of the existing scheme, as shown in Table 4.8.
Verify daily for the next 20 years

                           (m,n)=(10,2)       (m,n)=(10,4)       (m,n)=(10,6)       (m,n)=(10,8)
                           HDVP    [165]      HDVP    [165]      HDVP    [165]      HDVP    [165]
Storage overhead (KB)      167.11  199.61     183.61  228.13     217.63  256.64     260.34  313.67
Computation overhead (s)   33.26   41.40      38.40   47.31      45.32   53.22      53.09   63.05

Verify daily for the next 40 years

Storage overhead (KB)      341.22  399.22     369.08  456.25     413.65  513.28     470.85  627.34
Computation overhead (s)   64.51   82.79      73.22   94.62      81.11   106.45     94.32   130.10
Table 4.8: Number of samples required to detect 1%l data corruption out of 10000 blocks

Detection Probability    HDVP    Existing schemes [164, 165]
0.50 40 110
0.6 90 170
0.7 120 230
0.8 160 280
0.9 195 320
0.95 240 380
0.99 290 460
4.5. Summary
In this chapter, we proposed a homomorphic distribution verification protocol (HDVP) to address the Integrity and Availability of data in cloud computing. The scheme relies on erasure codes in the setup phase, instead of replication mechanisms, to guarantee the Availability of the stored data, and utilizes token pre-computation using the Sobol sequence to check its Integrity.
The homomorphic distributed verification protocol achieves guaranteed Availability and Integrity of data stored in the cloud through private verifiability, and it is useful wherever an application needs Availability and Integrity of data in the cloud. Through detailed security and performance analysis, we showed that the HDVP scheme is efficient and protects Clients' data stored in the cloud against internal and external attacks.
However, HDVP does not support efficient Dynamic Data operations, because the construction of the metadata involves the file index information: once a file block is inserted, the computation overhead becomes unacceptable, since the metadata of all the following file blocks must be recomputed with the new indexes. In addition, since it is based on symmetric key cryptography, it does not support public verifiability. It is also difficult for Clients to verify the Integrity of their data when the file size is large and the Clients have limited resources and computing power.
To overcome these drawbacks of HDVP, we propose dynamic public audit protocols, which are explained in the next chapter.