DeDu : Building a Deduplication Storage system over Cloud computing


Transcript

DeDu: Building a Deduplication Storage system over Cloud computing

This paper appears in: Computer Supported Cooperative Work in Design (CSCWD), 2011 15th International Conference.
Date of conference: 8-10 June 2011
Authors: Zhe Sun, Jun Shen, Faculty of Informatics, University of Wollongong, Wollongong, NSW, Australia; Jianming Yong, Faculty of Business, University of Southern Queensland, Toowoomba, QLD, Australia

Speaker: Yen-Yi Chen (MA190104)
Date: 2013/05/28

Outline
- Introduction
- Two issues to be addressed
- Deduplication
- Theories and approaches
- System design
- Simulations and experiments
- Conclusions

Introduction

- System name: DeDu
- Front-end: deduplication application
- Back-end: Hadoop Distributed File System (HDFS) and HBase

Two issues to be addressed
1. How does the system identify duplication? Hash functions: MD5 and SHA-1.
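The first issue can be illustrated with a minimal sketch: duplicate data is identified purely by comparing hash fingerprints, so two chunks with the same MD5 (or SHA-1) value are treated as the same data. This is a generic illustration of hash-based identification, not code from the paper.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Compute an MD5 fingerprint for a piece of data.

    DeDu identifies duplication by hash value: equal fingerprints
    mean the data is considered a duplicate and stored only once.
    """
    return hashlib.md5(data).hexdigest()

# Identical data always yields the same fingerprint...
assert fingerprint(b"hello world") == fingerprint(b"hello world")
# ...while different data (with overwhelming probability) does not.
assert fingerprint(b"hello world") != fingerprint(b"hello-world")
```

SHA-1 works the same way via `hashlib.sha1`; it is slower than MD5 but has a lower collision probability, which is the trade-off between the two functions named on the slide.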

2. How does the system manage the data? HDFS and HBase.

Deduplication
1. Data chunks are evaluated to determine a unique signature for each.
2. Signature values are compared to identify all duplicates.
3. Duplicate data chunks are replaced with pointers to a single stored chunk, saving storage space.
Typical space savings: about 1:2 to 1:5 at file level, and up to 1:200 at block level.

Theories and approaches
A. The architecture of source data and link files
B. The architecture of the deduplication cloud storage system
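The three deduplication steps above can be sketched as a small in-memory store. This is a hedged illustration of the general block-level technique, assuming a fixed chunk size; it is not the paper's implementation, which runs over HDFS and HBase.

```python
import hashlib

class DedupStore:
    """Minimal in-memory sketch of block-level deduplication:
    each unique chunk is stored once, keyed by its MD5 signature,
    and a file is kept as a list of pointers (signatures)."""

    def __init__(self):
        self.chunks = {}  # signature -> chunk bytes (stored once)
        self.files = {}   # file name -> list of signatures (pointers)

    def put(self, name: str, data: bytes, chunk_size: int = 4) -> None:
        pointers = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            # Step 1: evaluate the chunk to determine its signature.
            sig = hashlib.md5(chunk).hexdigest()
            # Step 2: compare signatures to identify duplicates.
            if sig not in self.chunks:
                self.chunks[sig] = chunk
            # Step 3: the file records only a pointer to the stored chunk.
            pointers.append(sig)
        self.files[name] = pointers

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[sig] for sig in self.files[name])

store = DedupStore()
store.put("a.txt", b"AAAABBBBAAAA")  # chunks: AAAA, BBBB, AAAA
store.put("b.txt", b"BBBBAAAA")      # entirely duplicate chunks
assert store.get("a.txt") == b"AAAABBBBAAAA"
assert len(store.chunks) == 2        # only AAAA and BBBB are stored
```

Both files are reconstructed from just two stored chunks, which is the source of the space savings quoted above.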

Source data and link files

Deduplication cloud storage system

The DeDu storage system is built on HDFS and HBase: HDFS stores the large datasets (the source data), while HBase manages the link metadata.

System design
- Data organisation
- Storage of the files
- Access to the files
- Deletion of files
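The four design points above can be sketched together: one physical copy of each unique file lives in HDFS, while link files and reference counts track who points at it, and deletion only reclaims the source data when the last link is gone. The dictionaries and names below are hypothetical stand-ins for the HDFS/HBase tables, not the paper's actual schema.

```python
# Hypothetical sketch of DeDu's data organisation. "hdfs" stands in
# for the HDFS store of unique source data; "link_files" and
# "ref_count" stand in for the HBase-managed link metadata.
hdfs = {}        # hash -> file content (one stored copy)
link_files = {}  # user path -> hash (a link file points to source data)
ref_count = {}   # hash -> number of link files referencing the copy

def store(path: str, content_hash: str, content: bytes) -> None:
    if content_hash not in hdfs:       # new data: store the single copy
        hdfs[content_hash] = content
        ref_count[content_hash] = 0
    link_files[path] = content_hash    # storage = creating a link file
    ref_count[content_hash] += 1

def access(path: str) -> bytes:
    return hdfs[link_files[path]]      # access = follow the link

def delete(path: str) -> None:
    h = link_files.pop(path)           # deletion removes the link first
    ref_count[h] -= 1
    if ref_count[h] == 0:              # last link gone:
        del hdfs[h]                    # reclaim the source data
        del ref_count[h]

store("/u1/report.doc", "h1", b"data")
store("/u2/report.doc", "h1", b"data")  # duplicate: only a link is added
assert len(hdfs) == 1
delete("/u1/report.doc")
assert "h1" in hdfs                     # still referenced by /u2
delete("/u2/report.doc")
assert "h1" not in hdfs                 # last reference removed
```

Reference counting is what makes deletion safe here: removing one user's link never destroys data that another link file still points to.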

Data organisation


Storage of the files

Access to the files

Deletion of files

Simulations and experiments

Experimental environment (Table 1): VMware Workstation 7.10; CPU: 3.32 GHz; hard disk: 320 GB; Hadoop 0.20.2; HBase 0.20.6.

Performance evaluations

Conclusions
1. With fewer data nodes, writing efficiency is high but reading efficiency is low.
2. With more data nodes, writing efficiency is low but reading efficiency is high.
3. When a single file is large, the time to calculate hash values is higher, but the transmission cost is lower.
4. When a single file is small, the time to calculate hash values is lower, but the transmission cost is higher.

Thanks for listening.