29
Experiences with a Distributed Deduplication API Fred Douglis Andy Huber Donna Lewis Rachel Traylor

Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

Experiences with a Distributed Deduplication API Fred DouglisAndy HuberDonna LewisRachel Traylor

Page 2: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.2

Background: Traditional Backup Model

Disk-Based Backups, Data Deduplication

Archive to Tape

Daily Full Backup, Periodic Full + Daily Incremental

Networker

Data Domain PBBA

PBBA

NetBackup

Networker

OST

OST

Page 3: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.3

NFS Processing

f1

f2

f4

f3

fn

CLIENT

PBBAC3fp3

C2fp2

C1fp1

C4fp4

C5fp5

6KB 8KB 12KB 4KB4KB

C1fp1

C2fp2

C3fp3

C4fp4

Data from f2 Data from f3

C5fp5

Data from f2 Data from f3

… fnf1 f2 f3 f4 f5 ALL DATAsent to

appliance

Chunks, fingerprints andfilters data

Compresses and stores new data

f5

Page 4: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.4

NFS Processing - Subsequent Full

f1

f2

f4

fn

CLIENT

PBBA

6KB 10KB 12KB 4KB6KB

C1fp1

C6fp6

C7fp7

C4fp4

Data from f2 Data from f3

C5fp5

Data from f2 Data from f3

… fnf1 f2 f3 f4 f5 ALL DATAsent to

appliancef3

C6fp6

C7fp7

Chunks, fingerprints andfilters data

Compresses and stores only new data

f5

C3fp3

C2fp2

C1fp1

C4fp4

C5fp5

Page 5: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.5

DD BOOST Distributed Deduplication

• Offloads Chunk and Fingerprint processing to the client– Reduces resource load on the PBBA significantly

• Filtering occurs between DD BOOST and the Data Domain PBBA

• Only new chunks and fingerprints are sent for storage, saving network bandwidth– Significantly reduces bandwidth

Page 6: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.6

DD BOOST Distributed Deduplication

f1

f2

f4

fn

CLIENT

PBBA

I1=6KB I2=8KB I4=12KB I5=4KBI3=4KB

C1fp1

C2fp2

C3fp3

C4fp4

Data from f2 Data from f3

C5fp5

f3

fp5, l5fp4, l4

fp6, l6fp7, l7

fp1, l1 DD BOOST Sends fingerprints to PBBA

C3fp3

C2fp2

C1fp1

C4fp4

C5fp5

fp1 I1, fp2 I2, fp3 I3, fp4 I4, fp5 I5 PBBA Returns list of chunks not found

DD BOOST Compresses/Sends New Data

f5

Page 7: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.7

DD BOOST Distributed Deduplication

f1

f2

f4

fn

CLIENT

PBBA

I1=6KB I6=10KB I4=12KB I5=4KBI7=6KBC1fp1

C6fp6

C7fp7

C4fp4

Data from f2 Data from f3

C5fp5

f3

fp5, l5fp4, l4

fp6, l6fp7, l7

fp1, l1 DD BOOST Sends fingerprints to PBBA

C3fp3

C2fp2

C1fp1

C4fp4

C5fp5

fp6, l6fp7, l7

PBBA Returns list of chunks not found

C6fp6

C7fp7

DD BOOST Compresses/Sends New Data

f5

Page 8: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.8

DD BOOST Virtual Synthetics

• Leverages change information known to the application– Identifies regions of unchanged data

• Method of directing PBBA to copy references from previous backups to the current one, data isn’t sent– Significant bandwidth savings– Most effective with large regions

• New data processed using DD BOOST– Additional savings when interleaving

Page 9: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.9

DD BOOST: Virtual Synthetics

f1

f2

f3

f4

f5

fn

…f1 f2 f3 f4 f5 fn

Original full backup writes all data in n files to single backup file.

VS writes copy unchanged files: NO DATA sent

Normal writes send changed or new files; DD BOOST sends ONLY NEW DATA

PBBA

C3fp3

C2fp2

C1fp1

C4fp4

C5fp5

. . . Cmfpm

f1

f2

f4

fn

f3

f5

f1 f2 f3 f4 f5 . . . fn

f1 f2 f3 f4 f5 . . . fn

DD BOOST Write

DD BOOST Write f2

DD BOOST Write f3

VS Write f4 … fn

VS Write f1

CLIENT

Page 10: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.10

DD BOOST Architecture

DDBoost plugin / library

Symantec Netbackup& Backup Exec

OST plugin RMAN plugin

Oracle RMANAvamarNetworker

DDCL

Data Domain ServerSCSI Driver

FC DaemonNFS

Daemon

RPC over TCP RPC over FCClient

Page 11: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.11

EvaluationAnalyzed Customer Telemetry• Adoption Rate• Bandwidth Savings

Performance Benchmarks• DD BOOST and NFS• DD BOOST and DD BOOST Virtual Synthetics

Page 12: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.12

DD BOOST Adoption

Page 13: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.13

DD BOOST Adoption

Page 14: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.14

DD BOOST Bandwidth Savings

No Virtual Synthetics Systems using Virtual Synthetics

Page 15: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.15

Performance Evaluation MethodSetup:• Internal Load Generator – DD BOOST, NFS, Virtual Synthetics• Parallel Stream Processing (1, 8, 16, 32, 64)• Streams divided evenly across four systems• 10 Gb/sec Connections

GenerationsGen 0 – Initial Backup “Generation 0”Low-gen – First Changed Backup, Generation 1High-gen – Generation 41

Page 16: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.16

Performance Evaluation Method – Change Rate• Normal Distribution

– Concentrates updates more frequently in a smaller range of data– Validated against customer datasets– One cluster of added data, one cluster of deleted, one cluster of modified

per 1GB of data written. – 1% added, 1% deleted, 3% modified

• Uniform Distribution– Used in previous studies– Changes are distributed uniformly throughout the file– Useful for performance benchmarking release over release– Ineffective measurement for Virtual Synthetics

Page 17: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.17

Performance: DD BOOST vs NFSBackup Measurements

Nearly the same For Gen0

Page 18: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.18

Performance: DD BOOST vs NFSBackup Measurements

Page 19: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.19

Performance: DD BOOST vs NFSBackup Measurements

Continues to Increase. NFS remains flat.

Page 20: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.20

Performance: DD BOOST vs NFSRestore Measurements

DD BOOST has a limited read size, but applies read-ahead logic

Page 21: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.21

Performance: DD BOOST and Virtual SyntheticsNormal Distribution

Page 22: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.22

Performance: DD BOOST and Virtual SyntheticsNormal Distribution

Greater effective bandwidth, very little protocol overhead

Both show expected performance impact at high-gen

Page 23: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.23

Performance: DD BOOST and Virtual SyntheticsUniform Distribution

Greater overhead if synthesizing smaller more distributed regions

Page 24: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.24

Performance: DD BOOST and Virtual Synthetics

Uniform DistributionNormal Distribution

Page 25: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.25

DD BOOST : Lessons LearnedDeduplication Obstacles

• Compression and Encryption

• In-line Transformations

File Usage

• Smaller files and File Management Operations

Virtualization

• DD BOOST’s limited resource requirements are an ideal fit

Page 26: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.26

Related Work• Low-bandwidth Network File System (LBFS)

– Similar chunking and fingerprinting strategy– Targeted at reducing network traffic for file operations

• Rsync– Reduces network bandwidth when syncing a directory– Embedded deduplication with network transport– Similar approaches: DOT, czip, and Jumbo Store

• Deduplication optimizations and tradeoffs

Page 27: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.27

Conclusion and Ongoing Work• Customer Telemetry shows significant bandwidth savings

• Virtual Synthetics further reduces bandwidth, allowing customers to manage full backups while only writing changes

• Steady increase in adoption

Ongoing Work:

BoostFS – FUSE-based offering to eliminate integration effortFirst Linux version released in Fall 2016

Page 28: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller
Page 29: Experiences with a Distributed Deduplication APIstorageconference.us/2017/Presentations/DistributedDe...• Normal Distribution – Concentrates updates more frequently in a smaller

© Copyright 2017 Dell Inc.29

Additional DD BOOST Functionality• Lightweight load-balance/failover mechanism

• Fastcopy or Clone

• Per File Replication

• Compare, Return Differences

• Snapshot Storage Unit

• Control Token Based Authentication

• Set Extended Attributes

• Retrieve Data Movement Statistics