Wayne State University, Cluster and Internet Computing Laboratory

RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication

Fan Ni* ([email protected]), ATG, NetApp
Song Jiang ([email protected]), UT Arlington

* The work was done when he was a Ph.D. student at UT Arlington.



Page 2

Data is Growing Rapidly

§ Much of this data needs to be stored for preservation and processing.
§ Efficient data storage and management has become a major challenge.

From storagenewsletter.com


Page 3

The Opportunity: Data Duplication is Common

§ Sources of duplicate data:
– The same files are stored by multiple users in the cloud.
– Continuous updating of files generates multiple versions.
– Use of checkpointing and repeated data archiving.

§ Significant data duplication has been observed for both backup and primary storage workloads.


Page 4

The Deduplication Technique can Help

[Figure: on the logical side, File1 and File2 both exist; on the physical side, when duplication is detected via fingerprinting (SHA1(chunk of File1) == SHA1(chunk of File2)), only one copy is stored.]

§ Benefits
– Storage space
– I/O bandwidth
– Network traffic

§ An important feature in commercial storage systems.
– NetApp ONTAP system
– Dell-EMC Data Domain system

§ Two critical issues:
– How to deduplicate more data?
– How to deduplicate faster?
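The fingerprint-based detection on this slide can be sketched as a tiny content-addressed store. This is an illustrative Python sketch, not the paper's C prototype; the class and attribute names are hypothetical.

```python
import hashlib

class DedupStore:
    """Minimal content-addressed store: each unique chunk is kept once,
    indexed by its SHA1 fingerprint."""
    def __init__(self):
        self.chunks = {}      # fingerprint -> chunk bytes
        self.logical = 0      # bytes written by clients
        self.physical = 0     # bytes actually stored

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha1(chunk).hexdigest()
        self.logical += len(chunk)
        if fp not in self.chunks:     # duplicate detected by fingerprint match
            self.chunks[fp] = chunk
            self.physical += len(chunk)
        return fp

store = DedupStore()
store.put(b"HOWAREYOU?")
store.put(b"HOWAREYOU?")   # duplicate: only a reference, no new physical bytes
print(store.logical, store.physical)  # prints: 20 10
```

The logical/physical ratio of such a store is exactly the deduplication ratio discussed in later slides.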

Page 5

Deduplicate at Smaller Chunks … for Higher Deduplication Ratio

[Figure: deduplication pipeline, chunking and fingerprinting followed by removal of duplicate chunks.]

§ Two potentially major sources of cost in deduplication:
– Chunking
– Fingerprinting

§ Can chunking be very fast?

Page 6

Fixed-Size Chunking (FSC)

HOWAREYOU?OK?REALLY?YES?NO File A

HOWAREYOU?OK?REALLY?YES?NO File B

§ FSC: partitions files (or data streams) into equal, fixed-size chunks.
– Very fast!

§ But the deduplication ratio can be significantly compromised.
– The boundary-shift problem.

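Fixed-size chunking and its boundary-shift problem can be shown in a few lines. This is an illustrative Python sketch (the function name `fsc` is hypothetical), using the slide's own example strings:

```python
def fsc(data: bytes, size: int):
    """Fixed-size chunking: split data into equal-size chunks
    (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

a = b"HOWAREYOU?OK?REALLY?YES?NO"   # File A
b = b"H" + a                        # File B: one byte inserted at the front
print(fsc(a, 8))
print(fsc(b, 8))   # every boundary shifts: A and B share no chunk at all
```

A single inserted byte shifts every subsequent chunk boundary, so none of File A's chunks reappear in File B and nothing deduplicates.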

Page 7

Fixed-Size Chunking (FSC)

§ FSC: partitions files (or data streams) into equal, fixed-size chunks.
– Very fast!

§ But the deduplication ratio can be significantly compromised.
– The boundary-shift problem.

HOWAREYOU?OK?REALLY?YES?NO (File A)

H HOWAREYOU?OK?REALLY?YES?NO (File B: a byte 'H' inserted at the front shifts every fixed-size boundary)


Page 8

Content-Defined Chunking (CDC)

HOWAREYOU?OK?REALLY?YES?NO (File A)

H HOWAREYOU?OK?REALLY?YES?NO (File B: a byte 'H' inserted at the front)

§ CDC: determines chunk boundaries according to content (a predefined special marker).
– Variable chunk size.
– Addresses the boundary-shift problem.

§ Assume the special marker is ‘?’

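The '?'-marker CDC from this slide can be sketched directly. This toy Python version (function name `cdc_marker` is hypothetical) cuts a chunk right after every marker byte, so boundaries follow the content and survive the inserted byte:

```python
def cdc_marker(data: bytes, marker: int = ord("?")):
    """Toy CDC: end a chunk immediately after every marker byte."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte == marker:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # tail chunk without a marker
    return chunks

a = b"HOWAREYOU?OK?REALLY?YES?NO"   # File A
b = b"H" + a                        # File B: one byte inserted at the front
print(cdc_marker(a))
print(cdc_marker(b))   # only the first chunk differs; the rest deduplicate
```

Unlike FSC, only the chunk containing the insertion changes; all later chunks are byte-identical in both files.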

Page 9

The Advantage of CDC

§ Real-world datasets include two weeks' Google News, Linux kernels, and various Docker images.

§ CDC's deduplication ratio is much higher than FSC's.
§ However, CDC can be very expensive.

[Figure: deduplication ratio (0-40) of CDC vs. FSC on Google-news, Linux-tar, Cassandra, Redis, Debian-docker, Neo4j, Wordpress, and Nodejs.]

Page 10

CDC can be Too Expensive!

HOWAREYOU?OK?REALLY?YES?NO (File A)

H HOWAREYOU?OK?REALLY?YES?NO (File B: a byte 'H' inserted at the front)

Assume the special marker is ‘?’

§ The marker for identifying chunk boundaries must be evenly spaced out with a controllable distance in between.

§ In practice, the marker is determined by applying a hash function on a window of bytes.
– E.g., hash("YOU?") == pre-defined-value

§ The window rolls forward byte-by-byte and the hashing is applied continuously.
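A rolling-window chunker of the kind described above can be sketched in a Gear-like style. This is a simplified illustration, not the exact Gear/Rabin function used in the paper's prototype; the random table, mask value, and the reset of the hash at each boundary are illustrative choices.

```python
import random

random.seed(1)
GEAR = [random.getrandbits(32) for _ in range(256)]   # one random value per byte
MASK = 0x1FFF   # boundary when the low 13 bits are zero: ~8 KB expected chunks

def gear_chunks(data: bytes, mask: int = MASK):
    """Gear-style CDC: roll a hash forward one byte at a time and declare a
    chunk boundary whenever the masked low bits of the hash are all zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF      # rolling hash update
        if (h & mask) == 0:                           # content-defined boundary
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])                   # tail chunk
    return chunks

data = bytes(random.randrange(256) for _ in range(256 * 1024))
chunks = gear_chunks(data)
assert b"".join(chunks) == data   # the chunks partition the input
```

The expensive part is visible here: the hash is updated and tested for every single byte, which is why chunking dominates CPU time on the next slides.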

Page 11

CDC Chunking Becomes a Bottleneck

§ Chunking time > 60% of the CPU time.
§ I/O bandwidth is not fully utilized.
§ The bottleneck shifts from the disk to the CPU.


[Figure: breakdown of CPU time into fingerprinting and chunking for Linux-tar (dr=4.06), Redis (dr=7.17), and Neo4j (dr=19.04); chunking dominates.]

Page 12

CDC Chunking Becomes a Bottleneck

§ Chunking time > 60% of the CPU time.
§ I/O bandwidth is not fully utilized.
§ The bottleneck shifts from the disk to the CPU.

[Figure: breakdown of CPU time (fingerprinting, chunking) and of I/O time (busy, idle) for Linux-tar (dr=4.06), Redis (dr=7.17), and Neo4j (dr=19.04).]


Page 14

Efforts on Acceleration of CDC Chunking

§ Make hashing faster
– Example functions: SimpleByte, Gear, and AE.
– More likely to generate small chunks, increasing the size of metadata cached in memory.

§ Use GPU/multi-core to parallelize the chunking process
– Extra hardware cost.
– Substantial deployment effort.
– The speedup is bounded by hardware parallelism.

§ Significant software/hardware efforts, but limited performance return.


Page 15

We propose RapidCDC, which …

§ is still sequential and doesn't require additional cores/threads.

§ makes the hashing speed almost irrelevant.

§ accelerates CDC chunking, often by 10-30 times.

§ has the same deduplication ratio as regular CDC methods.

§ can be adopted in an existing CDC deduplication system by adding 100-200 LOC in a few functions.


Pages 16-23

The Path to the Breakthrough

[Figure (animation across pages 16-23): unique chunks stored on disk; when an incoming chunk's fingerprint matches that of a stored chunk, the size of the chunk that followed it in history (e.g., 15KB, 16KB, 7KB, 20KB) predicts the next boundary (P), and with consecutive fingerprint matches the prediction almost always succeeds.]

Page 24

Duplicate Locality

§ Duplicate locality: if two chunks are duplicates, their next chunks (in their respective files or data streams) are likely duplicates of each other.

§ Duplicate chunks tend to stay together.

[Figure (Debian): percentage of chunks (%) vs. number of files (10, 20, 40, 80, 90), comparing all duplicate chunks with duplicate chunks immediately following another duplicate chunk.]


Page 26

RapidCDC: Using Next Chunk in History as a Hint

[Figure: File A is chunked into A1-A4 at offsets P0-P4, recording pairs <FP1, s2>, <FP2, s3>, <FP3, s4>, <FP4, …>; when FP(B1) == FP(A1) in File B, the next boundaries of B2, B3, B4 are predicted at P1+s2, P2+s3, P3+s4.]

§ History recording: whenever a chunk is detected, its size is attached to the previous chunk's fingerprint.

§ Hint-assisted chunking: whenever a duplicate is detected, use the recorded chunk size as a hint for the next chunk boundary.

§ Regular CDC is used for chunking until a duplicate chunk (e.g., B1) is found.

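The recording/hint mechanism above can be sketched in a toy form. This Python sketch (function name `rapid_cdc` is hypothetical) reuses the slides' literal '?' marker instead of a rolling hash, and validates a hinted boundary with a marker test, roughly in the spirit of the FF+MT criterion; the real RapidCDC operates on rolling-hash boundaries and maintains per-fingerprint size lists.

```python
import hashlib

def rapid_cdc(data: bytes, fp_index: dict, marker: bytes = b"?"):
    """Hint-assisted toy CDC. fp_index maps a chunk's fingerprint to the
    size of the chunk that followed it in history; on a hit, fast-forward
    by that size and accept the boundary if the marker test passes,
    otherwise fall back to regular byte-by-byte scanning."""
    chunks, pos, prev_fp = [], 0, None
    while pos < len(data):
        hint = fp_index.get(prev_fp)               # history size hint, if any
        if hint and data[pos:pos + hint].endswith(marker):
            end = pos + hint                       # fast-forward: marker test passed
        else:
            m = data.find(marker, pos)             # regular CDC scan
            end = len(data) if m < 0 else m + 1
        chunk = data[pos:end]
        fp = hashlib.sha1(chunk).hexdigest()
        if prev_fp is not None:
            fp_index[prev_fp] = len(chunk)         # attach size to previous fingerprint
        chunks.append(chunk)
        prev_fp, pos = fp, end
    return chunks

index = {}
first = rapid_cdc(b"HOWAREYOU?OK?REALLY?YES?NO", index)   # regular pass, records hints
second = rapid_cdc(b"HOWAREYOU?OK?REALLY?YES?NO", index)  # duplicate: boundaries via hints
assert first == second
```

On the second pass only the first chunk is scanned byte-by-byte; every later boundary is reached by a single jump plus a cheap check, which is why hashing speed becomes almost irrelevant.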

Page 27

More Design Considerations …


§ A chunk may have been followed by chunks of different sizes
– Maintain a size list.

§ Validation of hinted next-chunk boundaries
– Four alternative criteria with different efficiency and confidence:
Ø FF (fast-forwarding only)
Ø FF+RWT (rolling-window test)
Ø FF+MT (marker test)
Ø FF+RWT+FPT (fingerprint test)

§ Please refer to the paper for details.

Page 28

Evaluation of RapidCDC

§ Prototype: based on a rolling-window CDC system.
– Rabin/Gear as the rolling function for rolling-window computation.
– SHA1 to calculate fingerprints.

§ Three disks with different speeds were tested.
– SATA hard disk: 138 MB/s and 150 MB/s for sequential read/write.
– SATA SSD: 520 MB/s and 550 MB/s for sequential read/write.
– NVMe SSD: 1.2 GB/s and 2.4 GB/s for sequential read/write.


Page 29

Synthetic Datasets: Insert/Delete

§ Chunking speedup correlates with the deduplication ratio.
§ Deduplication ratio is little affected (except for one very aggressive validation criterion).

[Figure: deduplication ratio (left) and chunking speedup (right) vs. number of modifications (1000-20000), for Regular, FF+RWT+FPT, FF+RWT, FF+MT, and FF.]

Page 30

Real-world Datasets: Chunking Speed

§ Chunking speedup approaches the deduplication ratio.
§ Negligible deduplication ratio reductions (if any).

[Figure: chunking speedup (up to 33X faster!) and deduplication ratio on Debian, Neo4j, Wordpress, and Nodejs, for Regular, FF+RWT+FPT, FF+RWT, FF+MT, and FF.]

Page 31

Conclusions

§ RapidCDC represents a disruptively new approach to improving CDC chunking speed.

§ It increases chunking speed by up to 33X without loss of deduplication ratio.

§ Its adoption in an existing CDC deduplication system does not require any major change to the current operation flow.

§ Its implementation in an existing CDC deduplication system requires minimal code changes (100-200 lines of C code in our prototype).

§ A prototype implementation is available at https://github.com/moking/rapidcdc
