24
Scale-out Data Deduplication Architecture Gideon Senderov Product Management & Technical Marketing NEC Corporation of America

Scale-out Data Deduplication Architecture

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scale-out Data Deduplication Architecture

Scale-out Data

Deduplication Architecture

Gideon Senderov

Product Management & Technical Marketing

NEC Corporation of America

Page 2: Scale-out Data Deduplication Architecture

Outline

• Data Growth and Retention

• Deduplication Methods

• Legacy Architecture Limitations

• Adaptive Platform for Long-term Data

• Enhanced Scale-out Attributes

– Scalability

– Performance

– Resiliency

Page 2

Page 3: Scale-out Data Deduplication Architecture

Long-term Data

Source: IDC 2010

0

10

20

30

40

50

60

70

80

90

2008 2009 2010 2011 2012 2013 2014

Content Depots & CloudUnstructured dataReplicated dataStructured data

Consumption of Enterprise

Disk Capacity by Type

CAGR

23.6%

54.8%

(EB)

24.2%

75.6%

88%

• According to SNIA, 80% of files are dormant, unchanged.

• According to Contoural, average file shares grow 40% annually.

• Non-mission critical data filling costly Tier 1 storage

Page 3

Page 4: Scale-out Data Deduplication Architecture

Increasing Enterprise Size

Data Outlasts Its Container

Page 4

Months Years Decades Centuries

DATA

MINING

FILE

SHARING

FILE

SHARING

WWW

DATA

MINING

MIGRATE

DATA

MIGRATE

DATA

MIGRATE

DATA

Page 5: Scale-out Data Deduplication Architecture

Data Migration Costs

• In 2011 there will be 1775 Exabytes of information

• Archive data outlasts a storage container, data migrate

costs $1K – $1.8K per TB

• 15% of Storage Admin’s time spent on data migration

Page 5

Source: IDC, Gartner, Contoural Consulting

Page 6: Scale-out Data Deduplication Architecture

Deduplication Granularity

• File (SIS)

– Data deduplication performed at a file or object granularity

• Sub-file (Block)

– Data deduplication performed at a sub-file granularity

• Fixed size chunk

• Variable size chunk

• Sub-file variable size chunk deduplication benefits

– Greater efficiency by deduplicating data at finer granularity

– Automatic adjustments (“slide”) across inserted data

Page 6

Page 7: Scale-out Data Deduplication Architecture

Inline vs. Post-process

• Inline

– Data deduplication performed before writing the deduplicated data

• Post-Process

– Data deduplication performed after the data to be deduplicated has been initially stored

• Inline deduplication benefits

– Eliminate need for disk buffer for un-deduplicated data

– Eliminate need for subsequent “idle time” for processing

– Immediate replication and shorter RTO for DR

Page 7

Page 8: Scale-out Data Deduplication Architecture

Generic vs. Application-aware

• Generic

– Deduplication with same algorithm against all data streams

• Application-aware

– Custom deduplication for specific data streams to account for metadata inserted by the corresponding applications

• Application-aware deduplication benefits

– Greater efficiency by eliminating negative impact of application metadata on deduplication ratio

– Cross-application deduplication for greater scope and higher deduplication efficiency

Page 8

Page 9: Scale-out Data Deduplication Architecture

Local vs. Global

• Local

– Deduplication across a single node or sub-node, requiring separate deduplication repositories for multiple nodes

• Global

– Deduplication spanning multiple nodes with a single shared deduplication repository across all nodes

• Global deduplication benefits

– Greater scalability of a single deduplication repository

– Greater efficiency with a broader scope spanning multiple nodes

Page 9

Page 10: Scale-out Data Deduplication Architecture

Legacy Scalability Limitations

• Inadequate scalability of capacity & performance

– Cannot scale performance to keep up with growth

– Multiple products with different architectures

– More siloed capacity to manage

• Limited deduplication scope

– Limited scalability proliferates duplicate data across appliances

– Lower deduplication ratio for large environments

Page 10

Page 11: Scale-out Data Deduplication Architecture

Local vs. Global Scope

• Single deduplication repository across entire solution

• Data deduplication across ALL data from ALL nodes

– Cross-node dedupe for greater efficiency

– Cross-application dedupe with application-awareness

Page 11

Local – Volume

Dedupe

Repository

Dedupe

Repository

Dedupe

Repository

Local – Node

Dedupe

Repository

Dedupe

Repository

Global

Dedupe

Repository

Page 12: Scale-out Data Deduplication Architecture

Legacy Resiliency Limitations

Page 12

5 6 7 8

2 3 41

Q: What data

can be restored

if block #1

lost?

A: NONE!

14 6 2 1

7 1 4

6

1

3

6

4 6

1

7

6

8

1 4

7

5 1

Day 1: FullDay 2:

Incremental…Day 7: Full

Day 3:

Incremental

1

Traditional RAID is not sufficient for deduped data

Page 13: Scale-out Data Deduplication Architecture

Non-disruptive

RemoveNon-disruptive

Add, Upgrade

V1 Grid + V1 + V2 + Vx = 1 System

Page 13Page 13

Adaptive Scale-out ArchitectureOne system becomes Greener, Faster, Denser

• Concurrently support multiple HW generations

• Non-disruptive resource scaling

• Zero provisioning, dynamic self-management

• Automatically balance system workload

• In-place technology refresh – No migration

Page 14: Scale-out Data Deduplication Architecture

Page 14Page 14

Independent Linear Scalability

• Customized performance and capacity per

application

• Dynamically adjust to changing data growth

Uniquely Scale

Performance and Capacity

to Meet Current and

Future Needs

Capacity

Pe

rfo

rma

nce

Page 15: Scale-out Data Deduplication Architecture

SAFE DEDUPEProtection

SCALABLE DEDUPEPerformance

GLOBAL DEDUPEScope

AttributesCriteria

Page 15

Enhanced Scale-out Attributes

Page 16: Scale-out Data Deduplication Architecture

Scope

• Multiple applications and data sources across a single scalable deduplication repository

• Broader scope leads to greater efficiencies

– Higher dedupe ratios

– Improved capacity utilization

– Easier management

– Longer retention periods

– Lower costs

• Single scalable system vs. multiple disparate silos

– Cross-application deduplication for greater efficiency

– Global deduplication for entire environment

Page 16

Page 17: Scale-out Data Deduplication Architecture

Performance

• Linear performance scalability to keep up with data growth

• Scalability

– Linear scalability vs. high degradation

– Independent performance scalability

– Beyond the physical boundaries of the system

• Inline dedupe vs. post-process

– Keeping up with the workload

• Effects of deduplication rates

– Higher performance with higher dedupe

Page 17

Page 18: Scale-out Data Deduplication Architecture

Scale-out Dedupe Architecture

• Multi-node architecture

– Inline data routing

– Processing and memory scales with capacity

• Distributed hash table

• Linear performance scalability

• Global deduplication across ALL nodes

Page 18

Page 19: Scale-out Data Deduplication Architecture

Linear Performance Scalability

Page 19

Page 20: Scale-out Data Deduplication Architecture

Resiliency

• Enhanced data resiliency, especially with fewer copies due to deduplication

• The greater the dedupe ratio, the greater the exposure

• Maximum protection against multiple failures

– Component failures

– Appliance or node failures

• Upgradeability and technology refresh

– In-place technology refresh vs. forklift upgrade

– Online versus downtime

– Configurable resiliency for different applications

– Investment protection

Page 20

Page 21: Scale-out Data Deduplication Architecture

Erasure-coded Resiliency

• “User dialable” disk/node protection

– Intermix of multiple resiliency levels

– Dynamically allocated protection

• Greater protection with less overhead

– No idle spare drives

• Faster healing while maximizing I/O performance

– Only data is reconstructed

– Reconstruction across multiple spindles/processors

Page 21

Page 22: Scale-out Data Deduplication Architecture

Long-term Evolution

• Functionality

– Application-specific requirements (Security, Resiliency)

– Regulatory compliance requirements

• Performance

– Linear scalability throughput HW/SW enhancements

– Multiple I/O profiles

• Attributes

– Flexibility

– Efficiency

– Granularity

Page 22

Page 23: Scale-out Data Deduplication Architecture

Page 23

Long-term Data Protection

Sto

rag

e S

pe

nd

ing

TimeMonths Years Decades Centuries

Clones/Snapshots

WAN-Optimized Replication

Erasure-Coded Resiliency

Inline Global Deduplication

Automatic Load Balancing

Dynamic Provisioning

Massive ScalabilityFeatures drive

savings

Page 24: Scale-out Data Deduplication Architecture