
A Low-Complexity Fault-Tolerant Document Storage System

Dishelt Francisco Torres-Paz, Jorge Antonio Perez-Espinoza and Jose Juan Garcia-Hernandez

LTI-CINVESTAV

Email: {dtorres,jperez,jjuan}@tamps.cinvestav.mx


Content

1 Introduction
    Fault Tolerant Systems
    Cloud Computing
    An alternative technique

2 Proposed System
    Strategy and Fundamental Operations
    Model Description
    Data blocks building process
    File rebuilding from data blocks process
    Communication Client-Server

3 Results
    Environment and Tests
    Tests

4 Conclusion

5 References


Fault Tolerant Systems

Fault-tolerant systems are those that continue to work properly even when faults occur. Distributed file storage systems are susceptible to faults, and these failures can lead to the complete loss of files. For this reason, such systems require strategies for recovering from failures.


Cloud Computing

The quick growth and development of the Internet

It has turned the nodes from isolated entities, without connection to the rest, into parts of the network.

Storage Centers Capacity

The capacity of storage centers has increased, so system designers build larger and more complex models; this complexity increases the possibility of failure.

Availability and Reliability

When a file owner stores a file in the cloud, the owner always expects it to be available and free of any kind of corruption. Providers of such services, in turn, should ensure that files remain available to their owners even when the infrastructure suffers a fault.


An alternative technique


Most approaches maintain replicas of the data on servers, i.e., they create entire copies of the original file and distribute them throughout the available server group.

An alternative to replicas

An alternative is to distribute the file across a set of servers such that each server holds a different part of the data, adding a minimum of redundancy. This approach greatly reduces the storage space required and increases security by reducing the number of file copies.


Objective

A parity-based low-complexity system for cloud storage

The proposed storage system consists of 7 servers and is able to recover from the failure of any one of them, using the remaining 6 servers to retrieve the original file, while complexity is kept low.


Strategy and Fundamental Operations

Strategy

Parity Strategy

The proposed system is based on a parity strategy, which consists of adding extra bits to a set of bits; if any component of the set fails, the parity bits are used to recover the lost bits.

Redundancy data

A parity block is calculated by applying an XOR operation between two groups of bits, generating a third set of bits called redundancy data.



Fundamental Operations

A fundamental operation is used to calculate the redundancy data: two data blocks, D1 and D2, are combined with XOR to produce the parity block P1,2:

D1 ⊕ D2 = P1,2

Three blocks of data are obtained; with any two of the three blocks it is possible to reconstruct the third by applying the XOR operation to the two available blocks. Using P1,2 and D2, the D1 block is recovered:

D1 = P1,2 ⊕ D2
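The XOR relation between two data blocks and their parity block can be tried out with a minimal Python sketch. The function name xor_blocks and the byte values are illustrative assumptions, not the example from the original slides, and the sketch is not the authors' Java implementation.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Illustrative values only (assumption): two 2-byte data blocks.
d1 = bytes([0b10110010, 0b01101100])
d2 = bytes([0b11010101, 0b00111010])

# Parity block: P1,2 = D1 xor D2
p12 = xor_blocks(d1, d2)

# Any two of the three blocks rebuild the third.
recovered_d1 = xor_blocks(p12, d2)
recovered_d2 = xor_blocks(p12, d1)
```

Because XOR is its own inverse, the same function serves both to build the parity block and to recover a lost block.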


Model Description

Getting parity blocks

Given a file F with a size of n bytes, a set of 5 data blocks is built:

F = D1, D2, D3, D4, D5

where the size of each block Di is |Di| = n/5.

From the 5 generated blocks, the parity blocks P are calculated from pairs of consecutive blocks, Pi,i+1 = Di ⊕ Di+1:

D1 ⊕ D2 = P1,2
D2 ⊕ D3 = P2,3
D3 ⊕ D4 = P3,4
D4 ⊕ D5 = P4,5
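The four parity equations can be sketched in a few lines of Python, shown here only as an illustration. The names xor_blocks and parity_blocks and the sample block contents are assumptions, and the blocks are taken to have equal length.

```python
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_blocks(data: List[bytes]) -> List[bytes]:
    """Compute P(i,i+1) = D(i) xor D(i+1) for each pair of consecutive blocks."""
    return [xor_blocks(data[i], data[i + 1]) for i in range(len(data) - 1)]


# Hypothetical data blocks D1..D5, two bytes each.
data = [bytes([i, i + 10]) for i in range(1, 6)]

# parity[0] is P1,2; parity[1] is P2,3; parity[2] is P3,4; parity[3] is P4,5.
parity = parity_blocks(data)
```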




Model Description

Selecting the blocks

Considering the original data blocks D1...D5 and the generated parity blocks P1,2...P4,5, there are 9 blocks in total, from which 7 are chosen, one to be stored on each of the seven servers.

To select the blocks, a combination of original data blocks and parity blocks is chosen such that together they can generate the 5 data blocks of the original file, even if one of the 7 blocks cannot be read. The following blocks were selected:

D1; D2; D5; P1,2; P2,3; P3,4; P4,5

D3 and D4 can be generated from the parity blocks and the stored data blocks:

D3 = D2 ⊕ P2,3; D4 = D5 ⊕ P4,5
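The storage cost of this selection follows directly from the block size. As a rough worked comparison, assume that n is a multiple of 5 (so each block has exactly n/5 bytes) and take one extra full copy of the file as the replication baseline. The symbols S_parity and S_replica are introduced here only for illustration.

```latex
\[
S_{\text{parity}} = 7 \cdot \frac{n}{5} = 1.4\,n,
\qquad
S_{\text{replica}} = n + n = 2\,n
\]
```

Under these assumptions, the seven stored blocks take 40% more space than the file itself, against 100% more for a single full replica.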




Model Description

Generation of blocks

With this combination of 7 blocks, it is possible to generate the 5 blocks of original data even if any one of the stored blocks cannot be read. The procedure for each case is shown below:

D1 server fails: D1 = P1,2 ⊕ D2; D2 = D2; D3 = D2 ⊕ P2,3; D4 = D5 ⊕ P4,5; D5 = D5

D2 server fails: D1 = D1; D2 = D1 ⊕ P1,2; D3 = D1 ⊕ P1,2 ⊕ P2,3; D4 = D5 ⊕ P4,5; D5 = D5

D5 server fails: D1 = D1; D2 = D2; D3 = D2 ⊕ P2,3; D4 = D3 ⊕ P3,4; D5 = D3 ⊕ P3,4 ⊕ P4,5

P1,2 server fails: D1 = D1; D2 = D2; D3 = D2 ⊕ P2,3; D4 = D5 ⊕ P4,5; D5 = D5

P2,3 server fails: D1 = D1; D2 = D2; D3 = D5 ⊕ P4,5 ⊕ P3,4; D4 = D5 ⊕ P4,5; D5 = D5

P3,4 server fails: D1 = D1; D2 = D2; D3 = D2 ⊕ P2,3; D4 = D5 ⊕ P4,5; D5 = D5
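The recovery cases can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not the authors' Java implementation: the names STORED, encode and recover, the dictionary keys such as "P12", and the sample data are all hypothetical. It rebuilds D1...D5 from whichever stored blocks are available by repeatedly applying Di ⊕ Pi,i+1 = Di+1 (or the reverse), which also covers the loss of P4,5.

```python
from typing import Dict, List, Optional

# The seven stored blocks; key names are assumptions for this sketch.
STORED = ["D1", "D2", "D5", "P12", "P23", "P34", "P45"]


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: List[bytes]) -> Dict[str, bytes]:
    """Build all 9 blocks from D1..D5 and keep only the 7 stored ones."""
    blocks = {f"D{i + 1}": d for i, d in enumerate(data)}
    for i in range(4):
        blocks[f"P{i + 1}{i + 2}"] = xor_blocks(data[i], data[i + 1])
    return {name: blocks[name] for name in STORED}


def recover(available: Dict[str, bytes]) -> List[bytes]:
    """Rebuild D1..D5 from the blocks that could be read."""
    data: List[Optional[bytes]] = [available.get(f"D{i}") for i in range(1, 6)]
    progress = True
    while progress and any(d is None for d in data):
        progress = False
        for i in range(4):
            parity = available.get(f"P{i + 1}{i + 2}")
            if parity is None:
                continue
            left, right = data[i], data[i + 1]
            if left is None and right is not None:
                data[i] = xor_blocks(parity, right)
                progress = True
            elif right is None and left is not None:
                data[i + 1] = xor_blocks(parity, left)
                progress = True
    if any(d is None for d in data):
        raise ValueError("not enough blocks to rebuild the data")
    return data


# Hypothetical data blocks D1..D5, four bytes each.
data = [bytes((7 * i + j) % 256 for j in range(4)) for i in range(1, 6)]
stored = encode(data)

# Drop each stored block in turn and rebuild the five data blocks.
results = {
    lost: recover({k: v for k, v in stored.items() if k != lost})
    for lost in STORED
}
```

Every pass over the parity blocks uses only XOR, so the cost of recovery stays linear in the block size regardless of which server is lost.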




Data blocks building process


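A file F with a size of n bytes is given, such that:

F = b1, b2, b3 ... bn

From this byte stream a byte array is built and, from it, a matrix with 5 columns and n/5 rows:

\[
M_F =
\begin{pmatrix}
b_1 & b_2 & b_3 & b_4 & b_5 \\
b_6 & b_7 & b_8 & b_9 & b_{10} \\
b_{11} & b_{12} & b_{13} & b_{14} & b_{15} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
b_{n-4} & b_{n-3} & b_{n-2} & b_{n-1} & b_n
\end{pmatrix}
\tag{1}
\]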








Each column of the matrix MF represents one of the 5 blocks Di mentioned above; each block is obtained by concatenating the entries of its column over all rows.

D1 = b1, b6, b11 ... bn−4
D2 = b2, b7, b12 ... bn−3
D3 = b3, b8, b13 ... bn−2
D4 = b4, b9, b14 ... bn−1
D5 = b5, b10, b15 ... bn

Therefore, a block Di by itself does not represent any meaningful information. This acts as a safety mechanism that prevents information from being compromised through individual blocks.
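Reading the columns of MF amounts to taking every fifth byte of the stream. A minimal Python sketch, assuming n is a multiple of 5 (how the original system handles other sizes is not stated) and using the hypothetical name split_into_blocks:

```python
from typing import List


def split_into_blocks(file_bytes: bytes, k: int = 5) -> List[bytes]:
    """Split a byte stream into k interleaved blocks (the columns of MF)."""
    if len(file_bytes) % k != 0:
        # Assumption: padding is out of scope for this sketch.
        raise ValueError("file size must be a multiple of the block count")
    # Block i holds bytes i, i + k, i + 2k, ... of the stream.
    return [file_bytes[i::k] for i in range(k)]


# Bytes b1..b15 stand in for a 15-byte file.
blocks = split_into_blocks(bytes(range(1, 16)))
```

With this input, the first block holds b1, b6, b11 and the last one holds b5, b10, b15, matching the column layout of MF.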








From the 5 generated blocks, 4 parity blocks are calculated, and only 7 of the 9 resulting blocks are stored.

D1 ⊕ D2 = P1,2
D2 ⊕ D3 = P2,3
D3 ⊕ D4 = P3,4
D4 ⊕ D5 = P4,5

Each of the stored blocks is then distributed to one of the interconnected file system servers.






File rebuilding from data blocks process

After the 5 data blocks have been rebuilt from the servers, the reverse of the block generation process is performed to reconstruct the original file: the MF matrix is generated with the blocks Di as its columns, and MF is then converted into a stream of bytes that becomes the original file F.
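The reverse step can be sketched in Python by writing each block back into every fifth position of the output stream. The name join_blocks is hypothetical, and the blocks are assumed to have equal length.

```python
from typing import List


def join_blocks(blocks: List[bytes]) -> bytes:
    """Interleave equal-length blocks back into the original byte stream."""
    k = len(blocks)
    rows = len(blocks[0])
    if any(len(block) != rows for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(rows * k)
    for i, block in enumerate(blocks):
        # Column i of MF fills positions i, i + k, i + 2k, ...
        out[i::k] = block
    return bytes(out)


# Round trip on a 15-byte stand-in file.
original = bytes(range(1, 16))
blocks = [original[i::5] for i in range(5)]
rebuilt = join_blocks(blocks)
```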


Communication Client-Server

General Application Architecture

The communication between the mobile device and the cloud service is bidirectional. In this case, the cloud consists of seven servers, a detail that is completely invisible to the end user.

Architecture

The architecture is based on a distributed system, since the partition and reconstruction of the file that the user needs to store are not performed on the device.

High energy consumption

Partitioning and reconstruction consume a lot of energy; therefore, the mobile device can have the partition or reconstruction process carried out whenever it is connected to any server.

Real process

In practice, the mobile device only uploads the file to one of the servers. Once the upload completes, that server partitions the file and distributes the chunks to the other servers.







The reconstruction process

The proposed system has a non-centralized architecture to ensure the core functionality of the application. In addition, the system tolerates the unavailability of any one of the servers and, consequently, the unavailability of any one part of the file.



Server Architecture

Servers are designed to communicate with the device, and each of them is able to communicate with all the others. Consequently, each server must be multithreaded; that is, it must be able to handle multiple requests from other servers or from mobile devices.

Therefore, the application has N devices that may be connected to M servers (in this case, 7 servers), and the servers can also have P connections or requests among themselves.
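The multithreading requirement can be illustrated with an in-process Python sketch in which a thread pool stands in for the network layer. Everything here is an assumption made for illustration: the class StorageServer, the sender names and the STORE_FRAGMENT message are hypothetical, and a real server would accept socket connections rather than direct method calls.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class StorageServer:
    """Toy server that handles requests from devices and peers concurrently."""

    def __init__(self, name: str, max_workers: int = 4) -> None:
        self.name = name
        self.handled = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _handle(self, sender: str, message: str) -> str:
        # The counter is shared between worker threads, so guard it.
        with self._lock:
            self.handled += 1
        return f"{self.name} handled '{message}' from {sender}"

    def submit(self, sender: str, message: str) -> Future:
        """Queue a request; a worker thread picks it up."""
        return self._pool.submit(self._handle, sender, message)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


server = StorageServer("server-1")
# Three mobile devices and one peer server send requests at the same time.
futures = [server.submit(f"device-{i}", "TEST_CONNECTION") for i in range(3)]
futures.append(server.submit("server-2", "STORE_FRAGMENT"))
replies = [future.result() for future in futures]
server.shutdown()
```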



Client Architecture

The client architecture is based on sending requests to any available server; if a server is not available, the client connects to another one that is. Depending on the protocol used by the device, the server may or may not send a reply. For this purpose, specific communication protocols were defined for messaging and file transfer.

Protocols

The protocols used in the application are:

Delete File: deletes a stored file.

Show Files: shows the stored files on the client.

Download File: downloads a file requested by a user.

Upload File: uploads a file provided by a user.

Test Connection: tests the connection with a server.
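The five protocols and the rule of falling back to another available server can be sketched as follows. This Python fragment is illustrative only: the names Protocol and first_available, the server names and the availability check are assumptions, and the original message formats are not specified.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Protocol(Enum):
    """The five request types listed above."""
    DELETE_FILE = "Delete File"
    SHOW_FILES = "Show Files"
    DOWNLOAD_FILE = "Download File"
    UPLOAD_FILE = "Upload File"
    TEST_CONNECTION = "Test Connection"


def first_available(
    servers: Iterable[str], is_up: Callable[[str], bool]
) -> Optional[str]:
    """Return the first server that answers the availability check, if any."""
    for address in servers:
        if is_up(address):
            return address
    return None


# Hypothetical cloud of seven servers, two of which are down.
servers = [f"server-{i}" for i in range(1, 8)]
down = {"server-1", "server-2"}
chosen = first_available(servers, lambda address: address not in down)
```

In a real client, the is_up callback would be backed by the Test Connection protocol.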



Communication Unit

The communication unit is the object used to transfer messages, files and other useful information between the client, the cloud and the internal servers.

FilePacket

The communication unit used to exchange data between the mobile device and the cloud. It contains the following attributes:

File Name

File Size

File Extension

User Identification

Data
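One possible shape for this unit is sketched below in Python. The field types, the helper from_bytes and the sample values are assumptions; the original Java class is not shown in the slides.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FilePacket:
    """Unit exchanged between the mobile device and the cloud."""
    file_name: str
    file_size: int        # size of the payload in bytes
    file_extension: str   # extension without the leading dot
    user_id: str
    data: bytes

    @classmethod
    def from_bytes(cls, file_name: str, user_id: str, data: bytes) -> "FilePacket":
        """Derive the size and extension from the name and the payload."""
        extension = os.path.splitext(file_name)[1].lstrip(".")
        return cls(file_name, len(data), extension, user_id, data)


# Hypothetical upload of a small file.
packet = FilePacket.from_bytes("report.pdf", "user-42", b"hello")
```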




Fragment

The fragment is the piece generated on the server once the partitioning and redundancy process has been applied. This communication unit is used to exchange chunks between internal servers. Its attributes are:

Fragment Identifier

File Identifier

Fragment Size

Address Server

Path

Data



File-Fragment Relation

The model contains a fragment identifier, which identifies each fragment among those belonging to the same file. It also contains a file identifier, which matches each piece with its own file. Finally, the server address attribute locates the server that holds the fragment, and the path attribute locates the fragment on that server.
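The relation between a file and its fragments can be sketched as a lookup keyed by the file identifier. In this Python illustration the field types, the function locate_fragments, the addresses and the paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """Unit exchanged between the internal servers."""
    fragment_id: str
    file_id: str
    fragment_size: int
    server_address: str
    path: str
    data: bytes


def locate_fragments(
    fragments: List[Fragment], file_id: str
) -> Dict[str, Tuple[str, str]]:
    """Map each fragment of one file to its (server address, path)."""
    return {
        f.fragment_id: (f.server_address, f.path)
        for f in fragments
        if f.file_id == file_id
    }


# Hypothetical catalogue: seven fragments of file-1, one per server.
names = ["D1", "D2", "D5", "P12", "P23", "P34", "P45"]
catalogue = [
    Fragment(name, "file-1", 3, f"10.0.0.{i}", f"/fragments/file-1/{name}", b"abc")
    for i, name in enumerate(names, start=1)
]
catalogue.append(Fragment("D1", "file-2", 3, "10.0.0.1", "/fragments/file-2/D1", b"xyz"))

locations = locate_fragments(catalogue, "file-1")
```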


Environment and Tests

Environment

The proposed scheme can be implemented on different operating systems and in different programming languages; in this case, an object-oriented language, specifically Java, was used. For the mobile device, the platform used was Android, which is based on the Java language.

Once the application had been developed and installed on the mobile device, and the server program had been installed on each server, a series of tests was carried out to validate the feasibility of the solution.


Tests

Test 1

In the first test, the servers were loaded with multiple files, ranging from 1 MB to 200 MB.

The graph shows that the time is proportional to the size of the file to be partitioned. As expected, large files take more time to partition and distribute to the other servers. Files with a size close to 50 MB completed the task in an acceptable time; however, for files larger than 100 MB, partitioning and distribution take a considerable time to complete.



Test 2

The second test measures the time needed to join the fragments and recover the requested file. Although the test measures joining time, it was carried out with different servers failing, in order to study the behavior of the reconstruction in the different cases.

The second graph shows the same behavior as in the first case: the reconstruction time is almost identical across the different faults. After analyzing this behavior, it was determined that there is no difference between the failures of different servers.


Conclusion

Fault-Tolerant

A low-complexity fault-tolerant distributed file storage system was proposed. The model is based on file partitioning and on parity blocks for recovery, using only XOR operations.

Recovery Blocks

It was shown that the combination of blocks chosen for distribution makes it possible to reconstruct the original file when a node fails.

Replication strategies

There are several P2P architectures in which each storage node or server stores different files and which generally use data replication strategies.



Degree of fault tolerance

By contrast, the proposed system provides a degree of fault tolerance without completely replicating the information.

Security

In addition to providing fault tolerance with less redundant information, this strategy adds security to documents, since a document is completely unreadable from isolated servers.


References

P. Jalote. Fault Tolerance in Distributed Systems. Prentice Hall, 1994.

V. J. Sosa-Sosa and E. M. Hernandez-Ramirez. A file storage service on a cloud computing environment for digital libraries. Information Technologies and Libraries, pp. 34–45, 2012.

Y. Jin, T. Deyu, and Z. Xianrong. Research on the distributed electronic medical records storage model. 2011 International Symposium on IT in Medicine and Education (ITME), December 2011, pp. 288–292.

J. Abawajy and M. M. Deris. Data replication approach with data consistency guarantee for data grid. IEEE Transactions on Computers, vol. preprint, 2014.
