DeDu: Building a Deduplication Storage System over Cloud Computing

This paper appears in: Computer Supported Cooperative Work in Design (CSCWD), 2011 15th International Conference on
Date of Conference: 8-10 June 2011
Author(s): Zhe Sun, Jun Shen, Fac. of Inf., Univ. of Wollongong, Wollongong, NSW, Australia; Jianming Yong, Fac. of Bus., Univ. of Southern Queensland, Toowoomba, QLD, Australia

Speaker: Yen-Yi Chen MA190104
Date: 2013/05/28




Outline

• Introduction
• Two issues to be addressed
• Deduplication
• Theories and approaches
• System design
• Simulations and Experiments
• Conclusions


Introduction

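• The rise of cloud computing and distributed system architectures
• Information explosion and massive data volumes
• Rising cost of storage equipment
• The need to speed up data transfer and reduce network bandwidth consumption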


Introduction

• System name: DeDu
• Front-end: deduplication application
• Back-end: Hadoop Distributed File System
  • HDFS
  • HBase


Two issues to be addressed

• How does the system identify duplicates?
  * Hash functions: MD5 and SHA-1

• How does the system manage the data?
  * HDFS and HBase
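As a minimal sketch of the first issue, a file's content can be fingerprinted with both hash functions named above, and two files treated as duplicates only when both digests match. The function name `fingerprint` and the 64 KiB read size are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import io


def fingerprint(stream, read_size=64 * 1024):
    """Return (md5_hex, sha1_hex) for a binary stream, read piece by piece."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    while True:
        piece = stream.read(read_size)
        if not piece:
            break
        # Feed the same bytes to both hash objects.
        md5.update(piece)
        sha1.update(piece)
    return md5.hexdigest(), sha1.hexdigest()


# Streams with identical content yield identical fingerprints.
first = fingerprint(io.BytesIO(b"same content"))
second = fingerprint(io.BytesIO(b"same content"))
other = fingerprint(io.BytesIO(b"different content"))
print(first == second)  # True
print(first == other)   # False
```

Reading in pieces keeps memory use flat for large files; the fingerprint depends only on the bytes, not on the file name or path.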


Deduplication

[Figure: data chunks A, B and C spread across three data stores, illustrating deduplication.]

1. Data chunks are evaluated to determine a unique signature for each.
2. Signature values are compared to identify all duplicates.
3. Duplicate data chunks are replaced with pointers to a single stored chunk, saving storage space.
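The three steps above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not DeDu's implementation: the class name `ChunkStore`, the fixed 4-byte chunk size, and the use of SHA-1 alone as the signature are hypothetical choices made to keep the example short.

```python
import hashlib


class ChunkStore:
    """Toy chunk-level deduplicating store (hypothetical, in memory only)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # signature -> chunk bytes, each stored once
        self.files = {}   # file name -> ordered list of signatures (pointers)

    def put(self, name, data):
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            # Step 1: compute a signature for the chunk.
            sig = hashlib.sha1(chunk).hexdigest()
            # Step 2: compare against the signatures already stored.
            if sig not in self.chunks:
                self.chunks[sig] = chunk
            # Step 3: keep only a pointer to the single stored copy.
            pointers.append(sig)
        self.files[name] = pointers

    def get(self, name):
        """Rebuild a file by following its pointers in order."""
        return b"".join(self.chunks[sig] for sig in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = ChunkStore()
store.put("f1", b"AAAABBBBCCCC")
store.put("f2", b"CCCCAAAABBBB")
print(store.stored_bytes())  # 12
print(store.get("f2"))       # b'CCCCAAAABBBB'
```

The two files hold 24 bytes in total, but only the three distinct chunks (12 bytes) are kept; each file is reduced to a list of pointers.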

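Category | File-level | Block-level
Comparison granularity | File | Block
Comparison scope | Entire specified volume | Entire specified volume
Advantages | Best capacity reduction for a single file | Can compare across files and also match duplicate portions underlying different files
Disadvantages | Ineffective on already-encoded files; two completely identical files are still stored twice | Consumes more processing resources
Deduplication ratio | 1:2 to 1:5 | 1:200 or even higher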


Theories and approaches

A. The architecture of source data and link files
B. Architecture of deduplication cloud storage system


Source data and link files


Deduplication Cloud storage system


System design

A. Data organisation
B. Storage of the files
C. Access to the files
D. Deletion of files


Data organisation


Storage of the files


Access to the files


Deletion of files


Simulations and Experiments


Performance evaluations


Conclusions

1. With fewer data nodes, writing efficiency is high but reading efficiency is low.
2. With more data nodes, writing efficiency is low but reading efficiency is high.
3. When a single file is large, the time to calculate hash values is higher, but the transmission cost is low.
4. When a single file is small, the time to calculate hash values is lower, but the transmission cost is high.


Thank you for listening