Experiences with a Distributed Deduplication API
Fred Douglis, Andy Huber, Donna Lewis, Rachel Traylor
© Copyright 2017 Dell Inc.
Background: Traditional Backup Model
• Disk-Based Backups, Data Deduplication
• Archive to Tape
• Daily Full Backup, Periodic Full + Daily Incremental
[Figure labels: Networker, NetBackup, OST, Data Domain PBBA]
NFS Processing
[Figure: the CLIENT sends ALL DATA from files f1, f2, f3, f4, f5, … fn to the appliance. The PBBA chunks, fingerprints, and filters the data, then compresses and stores new data. Labels: chunks C1 to C5 with fingerprints fp1 to fp5, holding data from f2 and f3; chunk sizes shown: 6KB, 8KB, 12KB, 4KB, 4KB.]
NFS Processing - Subsequent Full
[Figure: the CLIENT again sends ALL DATA from files f1, f2, f3, f4, f5, … fn to the appliance. The PBBA chunks, fingerprints, and filters the data, then compresses and stores only new data. Labels: incoming chunks C1, C6, C7, C4, C5; chunk sizes shown: 6KB, 10KB, 12KB, 4KB, 6KB. The PBBA already holds C1 to C5, so only C6 and C7 are stored.]
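The NFS path above can be sketched roughly as follows. This is a minimal illustration, not the appliance's implementation: the `Appliance` class, the fixed 4 KiB chunk size, and the SHA-1 fingerprint are all assumptions made for brevity (the chunk sizes in the figure vary, which this sketch does not model). It shows that the full stream crosses the network on every backup, even though only new chunks are stored.

```python
import hashlib
import zlib


def chunk(data: bytes, size: int = 4096) -> list[bytes]:
    """Split data into fixed-size chunks (a simplification for this sketch)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Appliance:
    """Hypothetical stand-in for the PBBA: it receives ALL data and filters it."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # fingerprint -> compressed chunk

    def write(self, data: bytes) -> tuple[int, int]:
        """Ingest a full backup stream; return (bytes_received, new_chunks)."""
        new_chunks = 0
        for c in chunk(data):
            fp = hashlib.sha1(c).hexdigest()  # fingerprint the chunk
            if fp not in self.store:          # filter: keep only unseen chunks
                self.store[fp] = zlib.compress(c)
                new_chunks += 1
        return len(data), new_chunks


appliance = Appliance()
# Four distinct 4 KiB chunks for the first full backup.
gen0 = b"".join(bytes([i]) * 4096 for i in range(4))
# The next full backup changes only the second chunk.
gen1 = gen0[:4096] + b"\xff" * 4096 + gen0[8192:]

first = appliance.write(gen0)   # all 16 KiB received, 4 chunks stored
second = appliance.write(gen1)  # all 16 KiB received again, 1 chunk stored
```

In this sketch the second backup still transfers every byte; the saving is in storage only.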
DD BOOST Distributed Deduplication
• Offloads chunk and fingerprint processing to the client
  – Significantly reduces resource load on the PBBA
• Filtering occurs between DD BOOST and the Data Domain PBBA
• Only new chunks and their fingerprints are sent for storage
  – Significantly reduces network bandwidth
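A rough sketch of the exchange described in these bullets follows. The `Server` class and its `filter_missing` and `put` methods are hypothetical names invented for illustration and are not the DD BOOST API; fixed-size chunking and SHA-1 fingerprints are likewise assumptions. The client chunks and fingerprints locally, asks which fingerprints are unknown, and sends only those chunks in compressed form.

```python
import hashlib
import zlib


class Server:
    """Hypothetical PBBA stand-in; not the real Data Domain interface."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # fingerprint -> compressed chunk

    def filter_missing(self, fingerprints: list[str]) -> list[str]:
        """Return the fingerprints the server has not stored yet."""
        return [fp for fp in fingerprints if fp not in self.store]

    def put(self, fp: str, compressed: bytes) -> None:
        """Store one compressed chunk under its fingerprint."""
        self.store[fp] = compressed


def client_backup(server: Server, data: bytes, size: int = 4096) -> int:
    """Chunk and fingerprint on the client; return chunk payload bytes sent."""
    chunks: dict[str, bytes] = {}
    for i in range(0, len(data), size):
        c = data[i:i + size]
        chunks[hashlib.sha1(c).hexdigest()] = c
    missing = server.filter_missing(list(chunks))  # fingerprints go first
    sent = 0
    for fp in missing:                             # then only the new chunks
        payload = zlib.compress(chunks[fp])
        server.put(fp, payload)
        sent += len(payload)
    return sent


server = Server()
backup0 = b"".join(bytes([i]) * 4096 for i in range(4))
backup1 = backup0[:4096] + b"\xff" * 4096 + backup0[8192:]

sent0 = client_backup(server, backup0)        # every chunk is new
sent1 = client_backup(server, backup1)        # one changed chunk is sent
sent_repeat = client_backup(server, backup1)  # nothing new, nothing sent
```

The fingerprint traffic itself is not counted in the returned byte totals; a fuller model would add it.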
DD BOOST Distributed Deduplication
[Figure: first backup with DD BOOST. The CLIENT (files f1, f2, f3, f4, f5, … fn) chunks the data into C1 to C5 with fingerprints fp1 to fp5 and lengths l1=6KB, l2=8KB, l3=4KB, l4=12KB, l5=4KB.
1. DD BOOST sends fingerprints and lengths (fp, l) to the PBBA.
2. The PBBA returns the list of chunks not found: fp1 l1, fp2 l2, fp3 l3, fp4 l4, fp5 l5.
3. DD BOOST compresses and sends the new data; the PBBA stores C1 to C5.]
DD BOOST Distributed Deduplication
[Figure: subsequent backup with DD BOOST. The CLIENT chunks the data into C1, C6, C7, C4, C5 with lengths l1=6KB, l6=10KB, l4=12KB, l5=4KB, l7=6KB; the PBBA already holds C1 to C5.
1. DD BOOST sends fingerprints and lengths (fp1 l1, fp6 l6, fp7 l7, fp4 l4, fp5 l5) to the PBBA.
2. The PBBA returns the list of chunks not found: fp6 l6, fp7 l7.
3. DD BOOST compresses and sends the new data: C6 and C7.]
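As a worked example, the chunk lengths labeled in the subsequent-backup figure (6KB, 10KB, 12KB, 4KB, 6KB) give a sense of the saving. The calculation below is only a sketch: it ignores compression and the small cost of sending fingerprints, and the variable names are illustrative.

```python
# Chunk lengths in KB, taken from the figure labels for the subsequent backup.
lengths_kb = {"fp1": 6, "fp6": 10, "fp7": 6, "fp4": 12, "fp5": 4}
# Fingerprints the PBBA already holds from the first backup.
already_stored = {"fp1", "fp2", "fp3", "fp4", "fp5"}

not_found = [fp for fp in lengths_kb if fp not in already_stored]
full_kb = sum(lengths_kb.values())                 # what the NFS path sends
sent_kb = sum(lengths_kb[fp] for fp in not_found)  # what DD BOOST sends
saving = 1 - sent_kb / full_kb

print(not_found, full_kb, sent_kb, f"{saving:.0%}")
# prints: ['fp6', 'fp7'] 38 16 58%
```

Under these assumptions, 16KB of chunk data crosses the network instead of 38KB.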
DD BOOST Virtual Synthetics
• Leverages change information known to the application
  – Identifies regions of unchanged data
• Directs the PBBA to copy references from previous backups into the current one; the data itself isn’t sent
  – Significant bandwidth savings
  – Most effective with large regions
• New data is processed using DD BOOST
  – Additional savings when interleaving
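The idea of copying references instead of data can be sketched as below. Everything here is hypothetical: the `Appliance` class, the `write`, `synthesize`, and `read` methods, and the tiny 4-byte chunk size are chosen only to keep the example readable. A backup is modeled as a recipe (an ordered list of fingerprints); a synthetic write extends the new recipe with references from an earlier backup, and a normal write supplies only the changed region. In practice the normal write would itself go through DD BOOST deduplication, which this sketch omits.

```python
import hashlib

CHUNK = 4  # tiny chunk size so the example stays readable


class Appliance:
    """Hypothetical PBBA stand-in holding chunks and per-backup recipes."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}        # fingerprint -> chunk data
        self.recipes: dict[str, list[str]] = {}   # backup name -> fingerprints

    def write(self, name: str, data: bytes) -> int:
        """Normal write: append data to backup `name`; return bytes received."""
        recipe = self.recipes.setdefault(name, [])
        for i in range(0, len(data), CHUNK):
            c = data[i:i + CHUNK]
            fp = hashlib.sha1(c).hexdigest()
            self.chunks.setdefault(fp, c)
            recipe.append(fp)
        return len(data)

    def synthesize(self, name: str, base: str, start: int, count: int) -> int:
        """Synthetic write: copy `count` chunk references from `base`."""
        refs = self.recipes[base][start:start + count]
        self.recipes.setdefault(name, []).extend(refs)
        return 0  # no backup data crosses the network

    def read(self, name: str) -> bytes:
        """Reassemble a backup from its recipe."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])


pbba = Appliance()
pbba.write("full0", b"AAAABBBBCCCCDDDD")          # original full backup

sent = pbba.synthesize("full1", "full0", 0, 1)    # AAAA unchanged
sent += pbba.write("full1", b"XXXX")              # BBBB replaced by XXXX
sent += pbba.synthesize("full1", "full0", 2, 2)   # CCCC, DDDD unchanged
restored = pbba.read("full1")
```

Here the new full backup reads back as a complete 16-byte image although only 4 bytes were written as data.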
DD BOOST: Virtual Synthetics
• Original full backup writes all data in n files (f1 … fn) to a single backup file.
• VS writes copy unchanged files: NO DATA sent.
• Normal writes send changed or new files; DD BOOST sends ONLY NEW DATA.
[Figure: the CLIENT issues DD BOOST writes for f2 and f3 and VS writes for f1 and f4 … fn; the PBBA stores chunks C1 … Cm with fingerprints fp1 … fpm.]
DD BOOST Architecture
[Figure labels, grouped:
Client: Symantec NetBackup & Backup Exec (OST plugin), Oracle RMAN (RMAN plugin), Avamar, Networker; DDBoost plugin / library; DDCL
Transport: RPC over TCP, RPC over FC
Data Domain Server: NFS Daemon, FC Daemon, SCSI Driver]
Evaluation
Analyzed Customer Telemetry
• Adoption Rate
• Bandwidth Savings
Performance Benchmarks
• DD BOOST and NFS
• DD BOOST and DD BOOST Virtual Synthetics
DD BOOST Adoption
DD BOOST Bandwidth Savings
[Charts: systems with no Virtual Synthetics; systems using Virtual Synthetics]
Performance Evaluation Method
Setup:
• Internal Load Generator – DD BOOST, NFS, Virtual Synthetics
• Parallel Stream Processing (1, 8, 16, 32, 64)
• Streams divided evenly across four systems
• 10 Gb/sec Connections
Generations
• Gen 0 – Initial Backup, “Generation 0”
• Low-gen – First Changed Backup, Generation 1
• High-gen – Generation 41
Performance Evaluation Method – Change Rate
• Normal Distribution
  – Concentrates updates more frequently in a smaller range of data
  – Validated against customer datasets
  – One cluster of added data, one cluster of deleted, one cluster of modified per 1GB of data written
  – 1% added, 1% deleted, 3% modified
• Uniform Distribution
  – Used in previous studies
  – Changes are distributed uniformly throughout the file
  – Useful for performance benchmarking release over release
  – Ineffective measurement for Virtual Synthetics
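The difference between the two change models can be illustrated with a small sketch. The slides do not describe how the load generator places changes, so the function name `change_offsets`, the single cluster center, and the `spread` parameter are all assumptions. The point is only that normally distributed offsets cluster in a narrow range, leaving large unchanged regions, while uniform offsets scatter across the whole file.

```python
import random
import statistics


def change_offsets(file_size: int, n_changes: int, mode: str,
                   rng: random.Random, spread: float = 0.02) -> list[int]:
    """Pick byte offsets to modify, either uniformly or clustered."""
    if mode == "uniform":
        return [rng.randrange(file_size) for _ in range(n_changes)]
    center = rng.randrange(file_size)   # one cluster center (assumption)
    sigma = spread * file_size          # cluster width (assumed parameter)
    offsets = []
    for _ in range(n_changes):
        off = int(rng.gauss(center, sigma))
        offsets.append(min(max(off, 0), file_size - 1))  # clamp into the file
    return offsets


rng = random.Random(0)                  # fixed seed for repeatability
one_gb = 1 << 30
uniform = change_offsets(one_gb, 1000, "uniform", rng)
clustered = change_offsets(one_gb, 1000, "normal", rng)

# The clustered offsets occupy a much narrower range of the file.
uniform_spread = statistics.pstdev(uniform)
clustered_spread = statistics.pstdev(clustered)
```

Narrower clusters leave longer unchanged runs, which is the case the slides describe as most favorable for Virtual Synthetics.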
Performance: DD BOOST vs NFS
Backup Measurements
Nearly the same for Gen 0
Performance: DD BOOST vs NFS
Backup Measurements
DD BOOST performance continues to increase; NFS remains flat.
Performance: DD BOOST vs NFS
Restore Measurements
DD BOOST has a limited read size, but applies read-ahead logic
Performance: DD BOOST and Virtual Synthetics
Normal Distribution
Greater effective bandwidth, very little protocol overhead
Both show expected performance impact at high-gen
Performance: DD BOOST and Virtual Synthetics
Uniform Distribution
Greater overhead when synthesizing smaller, more widely distributed regions
Performance: DD BOOST and Virtual Synthetics
[Charts: Uniform Distribution; Normal Distribution]
DD BOOST: Lessons Learned
Deduplication Obstacles
• Compression and Encryption
• In-line Transformations
File Usage
• Smaller files and File Management Operations
Virtualization
• DD BOOST’s limited resource requirements are an ideal fit
Related Work
• Low-bandwidth Network File System (LBFS)
  – Similar chunking and fingerprinting strategy
  – Targeted at reducing network traffic for file operations
• Rsync
  – Reduces network bandwidth when syncing a directory
  – Embedded deduplication with network transport
  – Similar approaches: DOT, czip, and Jumbo Store
• Deduplication optimizations and tradeoffs
Conclusion and Ongoing Work
• Customer Telemetry shows significant bandwidth savings
• Virtual Synthetics further reduces bandwidth, allowing customers to manage full backups while only writing changes
• Steady increase in adoption
Ongoing Work:
BoostFS – FUSE-based offering to eliminate integration effort
First Linux version released in Fall 2016
Additional DD BOOST Functionality
• Lightweight load-balance/failover mechanism
• Fastcopy or Clone
• Per File Replication
• Compare, Return Differences
• Snapshot Storage Unit
• Control Token Based Authentication
• Set Extended Attributes
• Retrieve Data Movement Statistics