8/10/2019 Data DeDuplication for Dummies.pdf
These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited.
Data Deduplication For Dummies
Quantum 2nd Special Edition
by Mark R. Coppock and Steve Whitner
Data Deduplication For Dummies, Quantum 2nd Special Edition
Published by
Wiley Publishing, Inc.
111 River Street
Hoboken, NJ 07030-5774
www.wiley.com
Copyright 2011 by Wiley Publishing, Inc., Indianapolis, Indiana
Published by Wiley Publishing, Inc., Indianapolis, Indiana
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, the Wiley Publishing logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. Quantum and the Quantum logo are trademarks of Quantum Corporation. StorNext is a registered trademark of Quantum Corporation. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.
Figure 3-2 is from an IDC White Paper, sponsored by Quantum, Demonstrating the Business Value of Deduplication for Data Protection, November 2011.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Business Development Department in the U.S. at 317-572-3205. For details on how to create a custom For Dummies book for your business or organization, contact info@dummies.biz. For information about licensing the For Dummies brand for products or services, contact BrandedRights&Licenses@Wiley.com.
ISBN: 978-1-118-03204-6
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2
Contents
Introduction
  How This Book Is Organized
  Icons Used in This Book
Chapter 1: Data Deduplication: Why Less Is More
  Duplicate Data: Empty Calories for Storage and Backup Systems
  Data Deduplication: Putting Your Data on a Diet
  Why Data Deduplication Matters
Chapter 2: Data Deduplication in Detail
  Making the Most of the Building Blocks of Data
    Fixed-length blocks versus variable-length data segments
    Effect of change in deduplicated storage pools
  Sharing a Common Data Deduplication Pool
  Data Deduplication Architectures
Chapter 3: The Business Case for Data Deduplication
  Deduplication to the Rescue: Replication and Disaster Recovery Protection
  Reducing the Overall Cost of Storing Data
  Data Deduplication Also Works for Archiving
  Looking at the Quantum Data Deduplication Advantage
Chapter 4: Ten Frequently Asked Data Deduplication Questions (And Their Answers)
  What Does the Term "Data Deduplication" Really Mean?
  How Is Data Deduplication Applied to Replication?
  What Applications Does Data Deduplication Support?
  Is There Any Way to Tell How Much Improvement Data Deduplication Will Give Me?
  What Are the Real Benefits of Data Deduplication?
  What Is Variable-Block-Length Data Deduplication?
  If the Data Is Divided into Blocks, Is It Safe?
  When Does Data Deduplication Occur during Backup?
  Does Data Deduplication Support Tape?
  What Do Data Deduplication Solutions Cost?
Appendix: Quantum's Data Deduplication Product Line
  DXi4500
  DXi6500 Family
  DXi6700
  DXi8500
Publisher's Acknowledgments
We're proud of this book and of the people who worked on it. For details on how to create a custom For Dummies book for your business or organization, contact info@dummies.biz. For details on licensing the For Dummies brand for products or services, contact BrandedRights&Licenses@Wiley.com.
Some of the people who helped bring this book to market include the following:
Acquisitions, Editorial, and Media Development
Project Editor: Linda Morris
Editorial Managers: Jodi Jensen, Rev Mengle
Acquisitions Editor: Kyle Looper
Business Development Representative: Karen Hattan
Custom Publishing Project Specialist: Michael Sullivan
Composition Services
Project Coordinator: Kristie Rees
Layout and Graphics: Lavonne Roberts, Laura Westhuis
Proofreaders: Jessica Kramer, Lindsay Littrell
Publishing and Editorial for Technology Dummies
Richard Swadley, Vice President and Executive Group Publisher
Andy Cummings, Vice President and Publisher
Mary Bednarek, Executive Director, Acquisitions
Mary C. Corder, Editorial Director
Publishing and Editorial for Consumer Dummies
Diane Graves Steele, Vice President and Publisher, Consumer Dummies
Ensley Eikenburg, Associate Publisher, Travel
Composition Services
Debbie Stailey, Director of Composition Services
Business Development
Lisa Coleman, Director, New Market and Brand Development
Introduction
Right now, duplicate data is stealing time and money from your organization. It could be a presentation sitting in hundreds of users' network folders or a group e-mail sitting in thousands of inboxes. This redundant data makes both storage and your backup process more costly, more time-consuming, and less efficient. Data deduplication, used on Quantum's DXi-Series disk backup and replication appliances, dramatically reduces this redundant data and the costs associated with it.
Data Deduplication For Dummies, Quantum 2nd Special Edition, discusses the methods and rationale for reducing the amount of duplicate data maintained by your organization. This book is intended to provide you with the information you need to understand how data deduplication can make a meaningful impact on your organization's data management.
How This Book Is Organized
This book is arranged to guide you from the basics of data deduplication, through its details, and then to the business case for data deduplication.
Chapter 1: Data Deduplication: Why Less Is More: Provides an overview of data deduplication, including why it's needed, the basics of how it works, and why it matters to your organization.
Chapter 2: Data Deduplication in Detail: Gives a relatively technical description of how data deduplication functions, how it can be optimized, its various architectures, and what happens when it gets applied to replication.
Chapter 3: The Business Case for Data Deduplication: Provides an overview of the business costs of duplicate data, how data deduplication can be effectively applied to your current data management process, and how it can aid in backup and recovery.
Chapter 1
Data Deduplication: Why Less Is More
In This Chapter
Understanding where duplicate data comes from
Identifying duplicate data
Using data deduplication to reduce storage needs
Figuring out why data deduplication is needed
Maybe you've heard the cliché "Information is the lifeblood of an organization." But many clichés have truth behind them, and this is one such case. The organization that best manages its information is likely the most competitive.
Of course, the data that makes up an organization's information must also be well-managed and protected. As the amount and types of data an organization must manage increase exponentially, this task becomes harder and harder. Complicating matters is the simple fact that so much data is redundant.
To operate most effectively, every organization needs to reduce its duplicate data, increase the efficiency of its storage and backup systems, and reduce the overall cost of storage. Data deduplication is a powerful technology for doing just that.
Duplicate Data: Empty Calories for Storage and Backup Systems
Allowing duplicate data in your storage and backup systems is like eating whipped cream straight out of the bowl: You get
plenty of calories, but no nutrition. Take it to an extreme, and you end up overweight and undernourished. In the IT world, that means buying lots more storage than you really need.
The tricky part is that it's not really the IT team that controls how much duplicate data you have. All of your users and systems generate duplicate data, and the larger your organization and the more careful you are about backup, the bigger the impact is.
For example, say that a sales manager sends out a 10MB presentation via e-mail to 500 salespeople and each person stores the file. The presentation now takes up 5GB of your storage space. Okay, you can live with that, but look at the impact on your backup!
Because yours is a prudent organization, each user's network share is backed up nightly. So day after day, week after week, you are adding 5GB of data each day to your backup, and most of the data in those files consists of the same blocks repeated over and over and over again. Multiply this by untold numbers of other sources of duplicate data, and the impact on your storage and backup systems becomes clear. Your storage needs skyrocket, and your backup costs explode.
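The arithmetic behind that example is easy to reproduce. The short sketch below uses the 10MB file and 500 recipients from the text; the 30-night retention window is an assumption added only to show how quickly nightly backups multiply the problem.

```python
# Figures from the example above; the 30-night retention window is an assumption.
PRESENTATION_MB = 10
RECIPIENTS = 500
NIGHTS_RETAINED = 30  # hypothetical retention period

primary_mb = PRESENTATION_MB * RECIPIENTS   # copies sitting on network shares
backup_mb = primary_mb * NIGHTS_RETAINED    # nightly backups kept on disk


def to_gb(megabytes: int) -> float:
    """Convert MB to GB using decimal units (1 GB = 1000 MB), as the text does."""
    return megabytes / 1000


print(f"Primary storage: {to_gb(primary_mb):.0f} GB")
print(f"Backup storage after {NIGHTS_RETAINED} nights: {to_gb(backup_mb):.0f} GB")
```

Under these assumptions, one presentation accounts for 5GB of primary storage and 150GB of backup capacity after a month.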
Data Deduplication: Putting Your Data on a Diet
If you want to lose weight, you either reduce your calories or increase your exercise. The same is sort of true for your data, except you can't make your storage and backup systems run laps to slim down.
Instead, you need a way to identify duplicate data and then eliminate it. Data deduplication technology provides just such a solution. Systems like Quantum's DXi products that use block-based deduplication start by segmenting a dataset into variable-length blocks and then check for duplicates. When they find a block they've seen before, instead of storing it again, they store a pointer to the original. Reading the file is simple: the sequence of pointers makes sure all the blocks are accessed in the right order.
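The store-a-pointer idea can be sketched in a few lines. This is a toy illustration, not Quantum's implementation: the class name `DedupStore`, the use of SHA-256 digests as pointers, and the fixed 8-byte block size are all assumptions chosen to keep the example small (the products described here use variable-length blocks).

```python
import hashlib


class DedupStore:
    """Toy block store: each unique block is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of digests (the "pointers")

    def write(self, name, data, block_size=8):
        pointers = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers in order to rebuild the original bytes.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
payload = b"ABCDEFGH" * 4  # four identical 8-byte blocks
store.write("deck.ppt", payload)
print(len(store.files["deck.ppt"]), "pointers,", len(store.blocks), "unique block")
assert store.read("deck.ppt") == payload
```

The file is described by four pointers, but only one block is actually stored, and reading it back returns the original bytes.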
Compared to other storage reduction methods that look for repeated whole files (single-instance storage is an example), data deduplication provides much more granularity. That means that in most cases, it dramatically reduces the amount of storage space needed.
As an example, consider the sales deck that everybody saved. Imagine that everybody put their name on the title page. A single-instance system would identify all the files as unique and save all of them. A system with data deduplication, however, can tell the difference between unique and duplicate blocks inside files and between files, and it's designed to save only one copy of the redundant data segments. That means that you use much less storage.
Data deduplication isn't a stand-alone technology; it can work with single-instance storage and conventional compression. That means data deduplication can be integrated into existing storage and backup systems to decrease storage requirements without making drastic changes to an organization's infrastructure.
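The sales-deck comparison can be simulated directly. In the sketch below, every file layout detail is hypothetical: each "deck" is a 16-byte title block carrying the owner's name followed by ten shared 16-byte body blocks, and SHA-256 digests stand in for whatever fingerprinting a real product uses.

```python
import hashlib

BLOCK = 16  # hypothetical block size in bytes


def digest(data):
    return hashlib.sha256(data).hexdigest()


def make_deck(owner):
    # One title block containing the owner's name, followed by a shared body.
    title = owner.encode().ljust(BLOCK, b" ")
    body = b"".join(bytes([65 + i]) * BLOCK for i in range(10))  # 10 distinct blocks
    return title + body


decks = [make_deck(f"rep{n:03d}") for n in range(500)]

# Single-instance storage: one copy per distinct whole file.
unique_files = {digest(d) for d in decks}

# Block-level deduplication: one copy per distinct block across all files.
unique_blocks = {
    digest(d[i:i + BLOCK]) for d in decks for i in range(0, len(d), BLOCK)
}

total_blocks = sum(len(d) // BLOCK for d in decks)
print(len(unique_files), "unique files")
print(len(unique_blocks), "unique blocks out of", total_blocks)
```

Because every title page differs, whole-file matching keeps all 500 files. Block-level matching keeps only the 500 title blocks plus one copy of the 10 body blocks: 510 blocks instead of 5,500.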
A brief history of data reduction
One of the earliest approaches to data reduction was data compression, which searches for repeated strings within a single file. Different types of compression technologies exist for different types of files, but all share a common limitation: Each reduces duplicate data only within specific parts of individual files.
Next came single-instance storage, which reduces storage needs by recognizing when files are repeated. Single-instance storage is used in backup systems, for example, where a full backup is made first, and then incremental backups are made of only changed and new files. The effectiveness of single-instance storage is limited because it saves multiple copies of files that may have only minor differences.
Data deduplication is the newest technique for reducing data. Because it recognizes differences at a variable-length block basis within files and between files, data deduplication is the most efficient data reduction technique yet developed and allows for the highest savings in storage costs.
Data deduplication utilizes proven technology. Most data is already stored in non-contiguous blocks, even on a single-disk system, with pointers to where each file's blocks reside. On FAT-formatted volumes in Windows systems, for example, the File Allocation Table (FAT) maps the pointers. Each time a file is accessed, the FAT is referenced to read blocks in the right sequence. Data deduplication references identical blocks of data with multiple pointers, but it uses the same basic principles for reading multi-block files that you are using today.
Why Data Deduplication Matters
Increasing the data you can put on a given disk makes sense for an IT organization for lots of reasons. The obvious one is that it reduces direct costs. Although disk costs have dropped dramatically over the last decade, the increase in the amount of data being stored has more than eaten up the savings.
Just as important, however, is that data deduplication also reduces network bandwidth needs for transmitting data: when you store less data, you have to move less data, too. That opens up new protection and disaster recovery capabilities (replication of backup data, for example), which make management of data much easier.
Finally, there are major impacts on indirect costs: the amount of space required for storage, cooling requirements, and power use. Management time is also reduced, often dramatically. Quantum DXi customers in a recent survey averaged a 63 percent reduction in the amount of time they had to spend managing their backups.
Chapter 2: Data Deduplication in Detail
dividing a data stream into fixed-length blocks, then changing any single block means that all the downstream blocks will look different the next time the data set is transmitted. Bottom line, you won't find very many common segments.
So instead of fixed blocks, Quantum's deduplication technology divides the data stream into variable-length data segments using a system that can find the same block boundaries in different locations and contexts. This block-creation process lets the boundaries "float" within the data stream so that changes in one part of the dataset have little or no impact on the blocks in other parts of the dataset. Duplicate data segments can then be found globally at different locations inside a file, inside different files, inside files created by different applications, and inside files created at different times. Figure 2-1 shows fixed-block data deduplication.
A B C D
E F G H
Figure 2-1: Fixed-length block data in data deduplication.
The upper line shows the original blocks; the lower shows the blocks after making a single change to Block A (an insertion). The shaded sequence is identical in both lines, but all of the blocks have changed and no duplication is detected: there are eight unique blocks.
Data deduplication utilizes variable-length blocks. In Figure 2-2, Block A changes when the new data is added (it is now E), but none of the other blocks are affected. Blocks B, C, and D are all identical to the same blocks in the first line. In all, we have only five unique blocks.
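The difference between the two figures can be reproduced with a small experiment. The sketch below is an illustration under simplifying assumptions: boundaries are placed after every "." byte as a stand-in for a content-derived boundary rule (production systems typically derive boundaries from a rolling hash over the data, and the text does not describe Quantum's actual method), and the function names are hypothetical.

```python
def fixed_blocks(data, size=8):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_blocks(data, delimiter=b"."):
    """Cut after each delimiter byte, so boundaries follow the content."""
    blocks, start = [], 0
    for i in range(len(data)):
        if data[i:i + 1] == delimiter:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # trailing bytes with no closing delimiter
    return blocks


original = b"aaaaaaa.bbbbbbb.ccccccc.ddddddd."
modified = b"X" + original  # a one-byte insertion at the front of block A

for name, split in (("fixed", fixed_blocks), ("variable", variable_blocks)):
    before, after = set(split(original)), set(split(modified))
    print(f"{name}: {len(before & after)} shared, {len(before | after)} unique in total")
```

With fixed-length cutting, the one-byte insertion shifts every boundary and no block from the second pass matches the first. With content-defined cutting, only the first block changes; the other three are recognized as duplicates, leaving five unique blocks as in Figure 2-2.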
E B C D
A B C D
Figure 2-2: Variable-length block data in data deduplication.
Effect of change in deduplicated storage pools
When a dataset is processed for the first time by a data deduplication system, the number of duplicate data segments varies depending on the nature of the data (both file type and content). The gain can range from negligible to 50% or more in storage efficiency.
But when multiple similar datasets (like a sequence of backup images from the same volume) are written to a common deduplication pool, the benefit is very significant because each new write only increases the size of the total pool by the number of new data segments. In typical business data sets, it's common to see block-level differences between two backups of only 1% or 2%, although higher change rates are also frequently seen.
The number of new data segments in each new backup depends a little on the data type, but mostly on the rate of change between backups. And total storage requirement also depends to a very great extent on your retention policies: the number of backup jobs and the length of time they are held on disk. The relationship between the amount of data sent to the deduplication system and the disk capacity actually used to store it is referred to as the deduplication ratio.
Figure 2-3 shows the formula used to derive the data deduplication ratio, and Figure 2-4 shows the ratio for four different backup datasets with different change rates (compression also figures in, so the figure also shows different compression effects). These charts assume full backups, but deduplication also works when incremental backups are included. As it turns out, though, the total amount of data stored in the deduplication appliance may well be the same for either method because the storage pool only stores new blocks under either system. The deduplication ratio differs, though, because the amount of data sent to the system is much greater in a daily full model. So the storage advantage is greater for full backups even if the amount of data stored is the same.
Data deduplication ratio = (Total data before reduction) / (Total data after reduction)
Figure 2-3: Deduplication ratio formula.
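To see why daily full backups report a higher ratio than incrementals even when the pool ends up the same size, the formula can be applied to a simple model. All of the numbers below are hypothetical (1,000 GB protected, 20 GB of new blocks per day, 30 days retained), and the model ignores compression and any duplicates inside the first backup.

```python
def dedup_ratio(data_sent_gb, data_stored_gb):
    """Total data before reduction divided by total data after reduction."""
    return data_sent_gb / data_stored_gb


PROTECTED_GB = 1000  # hypothetical size of one full backup
CHANGED_GB = 20      # hypothetical new blocks per day (a 2% change rate)
DAYS = 30            # hypothetical retention window

# Either way, the pool holds the first full backup plus each later day's new blocks.
stored = PROTECTED_GB + CHANGED_GB * (DAYS - 1)

sent_daily_fulls = PROTECTED_GB * DAYS
sent_incrementals = PROTECTED_GB + CHANGED_GB * (DAYS - 1)

print(f"Stored in pool: {stored} GB")
print(f"Daily fulls:  {dedup_ratio(sent_daily_fulls, stored):.1f}:1")
print(f"Incrementals: {dedup_ratio(sent_incrementals, stored):.1f}:1")
```

In this model both methods leave 1,580 GB in the pool, but the daily-full schedule sent 30,000 GB to get there (about 19:1), while the incremental schedule sent only what was stored (1:1).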
It makes sense that data deduplication has the most powerful effect when it is used for backup data sets with low or modest change rates, but even for data sets with high rates of change, the advantage can be significant.
To help you select the right deduplication appliance, Quantum uses a sizing calculator that models the growth of backup datasets based on the amount of data to be protected, the backup methodology, type of data, overall compressibility, rates of growth and change, and the length of time the data is to be retained. The sizing calculator helps you understand where data deduplication has the most advantage and where more conventional disk or tape backup systems provide more appropriate functionality.
Chapter 3
The Business Case for Data Deduplication
In This Chapter
Looking at the business value of deduplication
Finding out why applying the technology to replication and disaster recovery is key
Identifying the cost of storing duplicate data
Looking at the Quantum data deduplication advantage
As with all IT investments, data deduplication must make business sense to merit adoption. At one level, the value is pretty easy to establish. Adding disk to your backup strategy can provide faster backup and restore performance, as well as give you RAID levels of fault tolerance. But with conventional storage technology, the amount of disk people need for backup just costs too much. Data deduplication solves that problem for many users by letting them reduce the amount of disk they need to hold their backup data by 90 percent or more, which translates into immediate savings.
Conventional disk backup has a second limitation that some users think is even more important: disaster recovery (DR) protection. Can data deduplication help there? Absolutely! The key is using the technology to power remote replication, and the outcome provides another compelling set of business advantages.
What about tape? Do you still need it? Disk-based deduplication and replication can reduce the amount of tape you use, but most IT departments combine the technologies, using tape for longer-term retention. This approach makes sense for most users. If you want to keep data for six months or three years or seven years, tape provides the right economics and portability, and the new encryption capabilities that tape drives offer now make securing the data that goes off site on tape easy.
The best solution providers will help you get the right balance, and at least one of them, Quantum, lets you manage the disk and tape systems from a single management console, and it supports all your backup systems with the same service team.
The asynchronous replication method employed by Quantum in its DXi-Series disk backup and replication solutions can give users extra bandwidth leverage. Before any blocks are replicated to a target, the source system sends a list of blocks it wants to replicate. The target checks this list of candidate blocks against the blocks it already has, and then it tells the source what it needs to send. So if the same blocks exist in two different offices, they have to be replicated to the target only one time.
Figure 3-1 shows how the deduplication process works on replication over a WAN.
Step 1: Source sends a list of elements to replicate to the target. Target returns list of blocks not already stored there.
Step 2: Only the missing data blocks are replicated and moved over the WAN.
(In the illustration, the source holds blocks A, B, C, and D; the target already holds A, B, and D, so only C crosses the WAN.)
Figure 3-1: Verifying data segments prior to transmission.
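The two-step exchange in Figure 3-1 can be sketched as follows. This is a minimal in-memory illustration, not the DXi replication protocol: the `Target` class, the `replicate` function, and the use of SHA-256 digests as block identifiers are assumptions, and the WAN is represented by ordinary function calls.

```python
import hashlib


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


class Target:
    def __init__(self):
        self.pool = {}  # fingerprint -> block

    def missing(self, fingerprints):
        """Step 1 reply: which of the offered blocks the target does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.pool]

    def receive(self, blocks):
        for block in blocks:
            self.pool[fingerprint(block)] = block


def replicate(source_blocks, target):
    """Send only the blocks the target asks for; return how many were sent."""
    by_fp = {fingerprint(b): b for b in source_blocks}
    wanted = target.missing(list(by_fp))          # step 1: offer list, get reply
    target.receive([by_fp[fp] for fp in wanted])  # step 2: ship missing blocks
    return len(wanted)


target = Target()
target.receive([b"A", b"B", b"D"])  # the target already holds A, B, and D
sent = replicate([b"A", b"B", b"C", b"D"], target)
print(sent, "block sent over the WAN")  # only C was missing
```

A second office replicating blocks the target already holds would send nothing at all, which is the bandwidth leverage the text describes.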
Because many organizations use public data exchanges to supply WAN services between distributed sites, and because data transmitted between sites can take multiple paths from source to target, deduplication appliances should offer encryption capabilities to ensure the security of data transmissions.
In the case of DXi-Series appliances, all replicated data (both metadata and actual blocks of data) can be encrypted at the source using SHA-AES 256-bit encryption and decrypted at the target appliance.
Reducing the Overall Cost of Storing Data
Storing redundant backup data brings with it a number of costs, from hard costs such as storage hardware to operational costs such as the labor to manage removable backup media and off-site storage and retrieval fees. Data deduplication offers a number of opportunities for organizations to improve the effectiveness of their backup and to reduce overall data protection costs.
These include the opportunity to reduce hardware acquisition costs, but even more important for many IT organizations is the combination of all the costs that go into backup. They include ongoing service costs, costs of removable media, the time spent managing backup at different locations, and the potential lost opportunity or liability costs if critical data becomes unavailable.
The situation is also made more complex by the fact that in the backup world, there are several kinds of technology and different situations often call for different combinations of them. If data is changing rapidly, for example, or only needs to be retained for a few days, the best option may be conventional disk backup. If it needs to be retained for longer periods (six months, a year, or more), traditional tape-based systems may make more sense. For many organizations, the need is likely to be different for different kinds of data.
Combining disk-based backup, deduplication, replication, and tape in an optimal way can provide very significant savings when users look at their total data-protection costs. A white paper published in November 2011 by industry group IDC, titled Demonstrating the Business Value of Deduplication for Data Protection and sponsored by
Quantum, studied organizations that had deployed Quantum DXi deduplication systems. The findings? Over three years, the companies saved $4.75 for every $1 invested. The systems paid for themselves in an average time of 7 months. Where were the savings? In reduced media usage, lower power and cooling, savings on license and service costs, and in increased productivity. The key was combining data deduplication and replication with traditional tape in an optimal way. (See Figure 3-2.)
[Chart: Average Annual Benefits (per 100 users), in $/year/100 users. Categories: Storage Environment Cost Savings, IT Staff Productivity Optimization, End User Productivity Enhancement. Values shown: $47,316; $22,670; $15,515; $9,131. Source: IDC White Paper.]
Figure 3-2: A recent IDC study found significant savings from combining disk-based backup, deduplication, replication, and tape.
The key to finding the best answer is looking clearly at all the alternatives and finding the best way to combine them. A supplier like Quantum that can provide and support all the different options is likely to give users a wider range of solutions than a company that offers only one kind of technology, and such suppliers have teams of people that can help IT departments look at the alternatives in an objective way.
You can get an idea of the kinds of savings that deduplication can provide for your organization by using an online ROI estimating tool developed by IDC, available at www.quantum.com.
Quantum deduplication products cover a broad range of sizes,from compact units for small businesses and remote offices, to
midrange appliances, to enterprise systems that can hold6.4 petabytes of backup data. All systems include deduplicationand replication functionality in their base price, and the largersystems include software for creating tapes directly and soft-ware that provides the option of hybrid-mode operation.
The DXi-Series works with all leading backup software, including Symantec's OpenStorage API, to provide end-to-end support that spans multiple sites and integrates with tape backup systems, making it easy for users to integrate deduplication technology into an existing backup architecture. DXi-Series appliances are part of a comprehensive set of backup solutions from Quantum, the leading global specialist in backup, recovery, and archive. Whether the solution is disk with deduplication and replication, conventional disk, tape, or a combination of technologies, Quantum offers advanced technology, proven products, centralized management, and expert professional services offerings for all your backup and archive systems.
The results that Quantum DXi customers report show the kind of direct business benefits that adding deduplication technology can have on IT departments. The same IDC report mentioned earlier in this chapter found that:
Backups on average were more than twice as fast as before (52 percent reduction in time required).
Failed backup jobs were reduced by 91 percent.
Time to restore files was reduced by 95 percent.
Overall sys admin time for backup was reduced by 61 percent.
And the productivity gains were not limited to IT personnel. The companies in the study, on average, realized a gain of nearly 30 hours per year for each end user because backups and restores were faster, and the negative impact of backup on server operations was reduced.
Overall, systems paid for themselves in an average of 7 months through a combination of increased productivity and reduced direct costs, including savings in the purchase, transport, storage, and recall of removable media.
instead of storing the block again. Because the pointer takes up less space than the block, you save space. In backup, where the same blocks show up again and again, users typically reduce disk needs by 90 percent or more.
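The pointer idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the `DedupStore` class, the 4-byte block size, and the use of a SHA-256 digest as the "pointer" are all hypothetical choices made to keep the demo readable. Real systems use blocks of several kilobytes, so a pointer is far smaller than the block it replaces.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed block size, chosen only to keep the demo readable


class DedupStore:
    """Toy block store: each unique block is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def write(self, data: bytes) -> list:
        """Store data and return its list of pointers (digests), one per block."""
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only a block that has not been seen before takes up new space.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        return pointers

    def read(self, pointers: list) -> bytes:
        """Rebuild the original data by following the pointers in order."""
        return b"".join(self.blocks[p] for p in pointers)


store = DedupStore()
monday = store.write(b"AAAABBBBCCCCDDDD")
tuesday = store.write(b"AAAABBBBCCCCEEEE")  # only the last block is new
```

The two backups contain eight blocks between them, but the store holds only five, because the three blocks they have in common are stored once and referenced twice.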
How Is Data Deduplication Applied to Replication?
Replication is the process of sending duplicate data from a source to a target. Typically, a relatively high-performance network is required to replicate large amounts of backup data. But with deduplication, the source system (the one sending data) looks for duplicate blocks in the replication stream. Blocks already transmitted to the target system don't need to be transmitted again. The system simply sends a pointer, which is much smaller than the block of data and requires much less bandwidth.
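To see why this saves bandwidth, here is a rough sketch of deduplicated replication. It makes several simplifying assumptions that are not from the text: the function name `replicate` is hypothetical, the source is allowed to look directly at the target's index (a real system would exchange that information over the network), blocks are 4 KB, and a pointer is a 32-byte SHA-256 digest.

```python
import hashlib

POINTER_SIZE = 32  # a raw SHA-256 digest is 32 bytes


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def replicate(blocks, target_store):
    """Send blocks to target_store (a dict of digest -> block).

    Returns (bytes_sent, bytes_without_dedup) so the two can be compared.
    """
    sent = 0
    raw = 0
    for block in blocks:
        raw += len(block)
        key = digest(block)
        if key in target_store:
            # The target already holds this block: send the pointer only.
            sent += POINTER_SIZE
        else:
            # New block: send the pointer plus the data itself.
            target_store[key] = block
            sent += POINTER_SIZE + len(block)
    return sent, raw


target = {}
night1 = [bytes([i]) * 4096 for i in range(10)]  # ten distinct 4 KB blocks
night2 = night1[:9] + [b"\xff" * 4096]           # one block changed overnight
first = replicate(night1, target)
second = replicate(night2, target)
```

On the first night everything is new, so slightly more than the raw 40,960 bytes cross the wire. On the second night only one block plus ten pointers is sent: 4,416 bytes instead of 40,960.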
What Applications Does Data Deduplication Support?
When used for backup, data deduplication supports all applications and all qualified backup packages. Certain file types (some rich media files, for example) don't see much advantage the first time they are sent through deduplication because the applications that wrote the files already eliminated redundancy. But if those files are backed up multiple times or backed up after small changes are made, deduplication can create very powerful capacity advantages.
Is There Any Way to Tell How Much Improvement Data Deduplication Will Give Me?
Four primary variables affect how much improvement you will realize from data deduplication:
How much your data changes (that is, how many new blocks get introduced)
How well your data compresses using conventional compression techniques
How your backup methodology is designed (that is, full versus incremental or differential)
How long you plan to retain the backup data
Quantum offers sizing calculators to estimate the effect that data deduplication will have on your business. Pre-sales systems engineers can walk you through the process and show you what kind of benefit you will see.
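To get a feel for how the four variables interact, the following toy model may help. It is not Quantum's sizing calculator; the function `estimate_dedup_ratio`, its parameters, and the formula are all assumptions made for illustration. The model assumes that every retained backup after the first contributes only its changed blocks as new data, and that conventional compression is applied to the unique blocks.

```python
def estimate_dedup_ratio(full_gb, change_rate, compression, backups_retained,
                         methodology="full"):
    """Rough ratio of data protected to disk actually used (toy model).

    full_gb          -- size of one full backup, in GB
    change_rate      -- fraction of blocks that are new in each later backup
    compression      -- conventional compression ratio (2.0 means 2:1)
    backups_retained -- how many backups are kept on disk
    methodology      -- "full" (repeated fulls) or "incremental"
    """
    later = backups_retained - 1
    if methodology == "full":
        # Every retained backup is a complete copy of the data.
        protected = full_gb * backups_retained
    elif methodology == "incremental":
        # One full, then only the changed data in each later backup.
        protected = full_gb + full_gb * change_rate * later
    else:
        raise ValueError("methodology must be 'full' or 'incremental'")
    # Unique blocks: the first backup plus the new blocks from each later one.
    unique = full_gb + full_gb * change_rate * later
    stored = unique / compression
    return protected / stored


fulls = estimate_dedup_ratio(1000, 0.02, 2.0, 30, methodology="full")
incrementals = estimate_dedup_ratio(1000, 0.02, 2.0, 30, methodology="incremental")
```

With these made-up inputs (1 TB fulls, 2 percent change, 2:1 compression, 30 backups retained), repeated fulls come out at roughly 38:1, while incrementals show only the 2:1 from compression, because an incremental stream contains little duplicate data to begin with. Longer retention and lower change rates push the ratio up.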
What Are the Real Benefits of Data Deduplication?
There are two main benefits of data deduplication. First, data deduplication technology lets you keep more backup data on disk than with any conventional disk backup system, which means that you can restore more data faster. Second, it makes it practical to use standard WANs and replication for disaster recovery (DR) protection, which means that users can provide DR protection while reducing the amount of removable media (that's tape) handling that they do.
What Is Variable-Block-Length Data Deduplication?
It's easiest to think of the alternative to variable-length, which is fixed-length. If you divided a stream of data into fixed-length segments, every time something was inserted or removed at one point, all the blocks downstream would shift and so would also change. The system of variable-length blocks that Quantum uses allows some of the segments to stretch or shrink, while leaving downstream blocks unchanged. This increases the ability of the system to find duplicate data segments, so it saves significantly more space.
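The difference can be demonstrated with a small experiment. The sketch below is an assumption-laden illustration, not Quantum's patented method: it uses a generic content-defined chunking rule (cut wherever a CRC-32 checksum of the last 8 bytes is divisible by 64), and the function names, window size, and divisor are hypothetical. One byte is inserted at the front of 4 KB of pseudo-random data, and each scheme is asked how many of the new chunks it has seen before.

```python
import random
import zlib


def fixed_chunks(data: bytes, size: int = 64):
    """Split data into fixed-length blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 8, divisor: int = 64):
    """Cut a chunk wherever the checksum of the last `window` bytes hits a target.

    Boundaries depend only on nearby content, not on absolute position.
    """
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        if zlib.crc32(data[i - window:i]) % divisor == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last boundary
    return chunks


def shared(old, new):
    """Count chunks of `new` that already exist among the chunks of `old`."""
    known = set(old)
    return sum(1 for chunk in new if chunk in known)


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + original  # a single byte inserted at the very start

fixed_reused = shared(fixed_chunks(original), fixed_chunks(edited))
variable_reused = shared(variable_chunks(original), variable_chunks(edited))
```

With fixed-length blocks, the one-byte insert shifts every block boundary, so essentially nothing matches. With content-defined boundaries, only the chunk containing the insert changes; the chunks after it line up with the same content as before and are recognized as duplicates.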
If the Data Is Divided into Blocks, Is It Safe?
The technology for using pointers to reference a sequence of data segments has been standard in the industry for decades: You use it every day, and it is safe. Whenever a large file is written to disk, it is stored in blocks on different disk sectors in an order determined by space availability. When you read a file, you are really reading pointers in the file's metadata that reference the various sectors in the right order. Block-based data deduplication applies a similar kind of technology, but it allows a single block to be referenced by multiple sets of metadata.
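As a small illustration of a single block being referenced by multiple sets of metadata, consider the sketch below. The block IDs, file names, and helper functions are hypothetical and exist only for this example; real file systems and deduplication appliances keep this information in their own on-disk structures.

```python
# Block storage: each block is held exactly once, under a numeric ID.
blocks = {
    0: b"Dear ",
    1: b"team, ",
    2: b"see the report.",
    3: b"see the budget.",
}

# Each file's metadata is just an ordered list of block IDs (pointers).
files = {
    "memo_v1.txt": [0, 1, 2],
    "memo_v2.txt": [0, 1, 3],  # shares blocks 0 and 1 with memo_v1.txt
}


def read_file(name: str) -> bytes:
    """Follow the pointers in order to rebuild the file."""
    return b"".join(blocks[block_id] for block_id in files[name])


def reference_counts() -> dict:
    """How many pointers, across all files, reference each block."""
    counts = {block_id: 0 for block_id in blocks}
    for pointers in files.values():
        for block_id in pointers:
            counts[block_id] += 1
    return counts
```

Both memos read back complete and correct even though blocks 0 and 1 exist only once; `reference_counts()` shows those two blocks with two references each and the others with one.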
When Does Data Deduplication Occur during Backup?
There are really three choices. You can send all your backup data to a backup target and perform deduplication there (usually called target-based deduplication), you can perform the deduplication on each protected host, or you can use a central media server to carry out the deduplication. All three systems are available and have advantages.
If deduplication is carried out in the backup application on the media server, you don't have to buy a special-purpose target deduplication device, but support is limited to one application, and all the overhead of the deduplication is added to the server's other duties (and deduplication systems that provide good reduction require significant processing). So users deploying server-based deduplication report slower backup, limited scalability, and requirements to upgrade their disk storage and buy more, heavier-duty servers.
If you use a target deduplication appliance, you send all the data to the device and deduplicate it there. You have to buy
What Do Data Deduplication Solutions Cost?
Costs can vary a lot, but seeing list prices in the range of 30 to 75 cents per GB of stored, deduplicated data is common. A good rule-of-thumb rate for deduplication is 20:1, meaning that you can store 20 times more data than on conventional disk. Using that figure, systems that could retain 44TB of backup data would have a list price of $12,500, or 28 cents a GB. So even at the manufacturer's suggested list price (and discounts are normally available), deduplication appliance costs are a lot lower than if you protected the same data using conventional disk. Even more important, a recent IDC study (a summary of which is available from www.quantum.com) concluded that companies saved $4.75 for every $1 invested over a three-year deployment, and that the deduplication systems paid for themselves in savings in an average of 7 months.
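The arithmetic behind the 28-cent figure can be checked in a couple of lines. This sketch assumes decimal units (1 TB = 1,000 GB) and reads the 44TB as the amount of backup data retained before deduplication; the function names are hypothetical.

```python
def cost_per_gb(list_price_usd: float, capacity_tb: float) -> float:
    """List price divided by retained capacity, using 1 TB = 1,000 GB."""
    return list_price_usd / (capacity_tb * 1000)


def disk_needed_tb(backup_data_tb: float, dedup_ratio: float) -> float:
    """Physical disk needed to retain the given backup data at a dedup ratio."""
    return backup_data_tb / dedup_ratio


per_gb = cost_per_gb(12_500, 44)  # dollars per GB of backup data retained
disk = disk_needed_tb(44, 20)     # TB of physical disk at a 20:1 ratio
```

Here `per_gb` comes to about $0.284, which rounds to the 28 cents quoted above, and at 20:1 the 44TB of backup data would occupy about 2.2TB of physical disk.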
Appendix
Quantum's Data Deduplication Product Line
In This Appendix
Reviewing the Quantum DXi-Series disk backup and remote replication appliances
Identifying the features and benefits of the DXi-Series
Quantum Corp. is the leading global storage company specializing in backup, recovery, and archive. Combining focused expertise, customer-driven innovation, and platform independence, Quantum provides a comprehensive range of disk, tape, and software solutions supported by a world-class sales and service organization. As a long-standing and trusted partner, the company works closely with a broad network of resellers, original equipment manufacturers (OEMs), and other suppliers to meet customers' evolving data protection needs.
Quantum's DXi-Series disk backup appliances leverage patented data deduplication technology to reduce the disk needed for backup by 90 percent or more, make remote replication a practical and cost-effective DR technique, and reduce network bandwidth needs by distributing data reduction between servers and appliances. Figure A-1 shows how DXi-Series replication uses existing WANs for DR protection, linking backup data across sites and reducing or eliminating media handling.
[Figure A-1 (diagram): Quantum's Replication Technology. Components shown: a DXi8500 located at the central data center, a Scalar i500 tape library, and Remote offices A, B, and C with DXi4000 and DXi6700 appliances. Users replicate data over existing WANs to provide automated DR protection and centralized media management. Quantum replication features cross-site deduplication prior to data transmission for additional bandwidth savings.]
Figure A-1: DXi-Series replication.
The DXi Series spans the widest range of backup capacity points in the industry. Some of the features and benefits of Quantum's DXi Series include:
Patented data deduplication technology that reduces disk requirements by 90 percent or more
A broad solution set of turnkey appliances for small and medium business, distributed and midrange sites, and scalable systems for the enterprise
High backup performance for each class of appliances, providing optimal protection, even when there are tight backup windows
Software (DXi Accent) that distributes deduplication between backup servers and appliances to increase backup speeds in bandwidth-constrained environments and enable remote backup
Software licenses that are included in the base price to maximize value, streamline deployment, and give users leading price-performance across the entire product line
Quantum's data deduplication also dramatically reduces the bandwidth needed to replicate backup data between sites for automated disaster recovery protection.
All models share a common software layer, including deduplication and remote replication, allowing IT departments to connect all their sites in a comprehensive data protection strategy that boosts backup performance, reduces or eliminates media handling, and centralizes disaster recovery operations. Support includes Symantec OpenStorage API (OST) for both disk and tape on DXi4000, DXi6700, and DXi8500 models.
The following sections offer more details about the individual DXi systems.
DXi4000 Series
The DXi4000 backup appliances provide an affordable, easy alternative with the industry's first capacity-on-demand deduplication. With up to twice the performance of competitors and as little as half the cost, DXi4000 deduplication appliances keep backup and restore performance high while delivering industry-leading value for fast return on investment. Designed for small to medium businesses or branch offices, DXi4000 appliances support all leading backup software, including those designed specifically for virtual servers.
DXi6700 Series
The DXi6700 Series provides deduplication without compromise, combining the broadest scalability and highest performance with leading value and unique extensibility supporting the broadest range of IT environments. The DXi6700 models provide maximum flexibility and value for maximum investment protection in evolving backup environments, providing simultaneous NAS, VTL, and OST interfaces. Finally, the DXi6700 Series has integrated support for vmPRO software, providing faster, easier protection of virtual servers and optimized deduplication rates.
DXi8500 Series
The Enterprise-class DXi8500 appliances support high-performance backup and anchor a multi-site, multi-tier data protection strategy. Replication, VTL, OST, and direct tape creation are included in the DXi8500's base price, and it offers full support for vmPRO software for faster, easier protection of virtual servers and optimized deduplication rates. The DXi8500's direct path-to-tape feature gives users a tool for integrating the creation of removable media into the disk backup process under full control of the backup application while reducing loads on backup servers. The DXi8500 provides faster backups, streamlined restores, automated DR protection, and integrated tape creation to simplify backup and reduce costs.