
  • Copyright 2007 EMC Corporation. Do not Copy - All Rights Reserved.

    Backup and Recovery Fundamentals - 1

    2007 EMC Corporation. All rights reserved.

    Backup and Recovery Fundamentals

    Welcome to Backup and Recovery Fundamentals

    The AUDIO portion of this course is supplemental to the material and is not a replacement for the student notes accompanying this course. EMC recommends downloading the Student Resource Guide from the Supporting Materials tab, and reading the notes in their entirety.

    Copyright 2007 EMC Corporation. All rights reserved. These materials may not be copied without EMC's written consent. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

    EMC2, EMC, Symmetrix, CLARiiON, Navisphere, PowerPath, SRDF, TimeFinder, VisualSAN, and where information lives are registered trademarks, and Access Logix and SnapView are trademarks of EMC Corporation.

    All other trademarks used herein are the property of their respective owners.

  • Course Objectives

    Upon completion of this course, you will be able to:

    • Describe basic backup procedures and terminology
    • Define the different backup types
    • Describe generic backup architecture

    The objectives for this course are shown here. Please take a moment to read them.

  • Module 1: Backup Overview

    Upon completion of this module, you will be able to:

    • Describe basic backup procedures and terminology
    • Define basic backup types

    The objectives for this module are shown here. Please take a moment to read them.

  • Backup Overview - What is Backup?

    • Backup is an additional copy of data that can be used for restore and recovery purposes
    • Backups are often stored on portable media such as tape

    A backup operation refers to the copying of data for the purpose of having an additional copy of an original source. Data is stored on separate media, such as tape, that is not located on the server. If the original data is damaged or lost, the data may be copied back from the backup copy.

    The backup copy is usually retained over a period of time, depending on the type of data, and the type of backup. There are three primary purposes for backup: disaster recovery, archival, and operational backup. We review them in more detail on the next slide.

    Backed-up data may be on such media as disk or tape, depending largely on the purpose of the backup. For example, backing up to disk may be more efficient than tape in operational backup environments.

  • Three Primary Purposes for Backups

    • Disaster Recovery: Restores a computer to an operational state following a disaster
    • Archival: Consists of files and records that have been selected for permanent or long-term preservation
    • Operational Backup: Restores small numbers of files after they have been accidentally deleted or corrupted

    Disaster recovery addresses the requirement to be able to restore all, or a large part of, an IT infrastructure in the event of a major disaster. Some organizations use tape-based backup media for their critical data. This media is stored off-site as part of the disaster recovery plan. Other organizations use remote replication technology to create disaster-recovery sites. These sites often replicate whole data centers, and can be brought online in a relatively short period of time. While replication technologies work very well for disaster recovery, they share one important (and occasionally undesirable) characteristic. Because they replicate data faithfully from one place to another, any infected or corrupted file is replicated just as faithfully as a good one. This makes them valuable for disaster recovery, but less suitable for operational backup.

    Archival is a common requirement used to preserve transaction records, email, and other business work products for regulatory compliance. The regulations could be internal, governmental, or perhaps derived from specific industry requirements. Data archived is reference data, not live, operational data.

    Operational backup is typically the collection of data for the eventual purpose of restoring, at some point in the future, data that has become lost or corrupted.

  • Backup/Recovery Statistics

    • Reliance on tape alone for data recovery is no longer a best practice
    • More than 80% of restore requests are made within 48 hours of the data loss
    • 60~70% of storage management effort is devoted to Backup/Restore
    • 15% of a storage administrator's time is spent on recovery operations
    • 5~20% of Backup/Restore jobs fail nightly
    • B/R costs are approximately $5,935 per TB of disk storage per year (META Group, April 1, 2004)

    This slide shows some statistics relating to backup and recovery from a study conducted by the META Group. These statistics are very important and can help drive the development of a backup solution. They emphasize the importance of a backup/recovery solution to companies and how complex the solution can be, and they can also be used to evaluate metrics such as cost-benefit.

    Today, users can choose from a wide array of backup solutions to meet their requirements. They don't need to rely exclusively on tape-based media as their only option for backup. For example, backup-to-disk offers faster, more predictable backup and recovery, higher service levels, and more manageable backup windows.

  • Considerations for the Backup/Restore Process

    • Business needs determine backup requirements:
        • What are the restore requirements (RPO and RTO)?
        • Where and when will the restores occur?
        • What are the most frequent restore requests?
        • Which data needs to be backed up?
        • How frequently should data be backed up (hourly, daily, weekly, monthly)?
        • How long will it take to back up?
        • How many copies to create?
        • How long to retain backup copies?

    This slide presents a number of important questions that need to be considered before implementing a backup/restore solution. Some examples include:

    • The Recovery Point Objective
    • The Recovery Time Objective
    • The media type to be used (disk or tape)
    • Where and when restore operations occur, especially if an alternate host will be used to receive the restored data
    • When to perform backups
    • The backup's granularity: full, incremental, or cumulative
    • How long to keep the backup; for example, some backups need to be retained for four years, others just for a month
    • Is it necessary to take multiple copies of the backup?

    The concepts behind many of these questions are discussed in more detail later in this module.

  • Backup to Tape Today

    • Over 70% of all backed-up data today goes to tape
    • Restore from tape is usually a slow process
    • Disaster recovery may require retrieving tapes stored offsite
    • Operational recoveries may require mounting of many tapes just to restore a single file

    Typically, most backed-up data goes to tape (~70%), but this number is going down due to the adoption of backup-to-disk solutions. Tape is a good storage medium when you consider such factors as portability and capacity, as well as the ability to take a set of tapes offsite at a low cost.

    The problem with tapes becomes clear when you need to restore the data. If you are restoring from a disaster situation, tapes are manageable, albeit slow, because you most likely go to your offsite location, retrieve the tapes, and start the restore. But that's a very small percentage of all the restore requests you'll have in your environment. Most restore requests are operational, not disaster recovery requests.

    Tapes are a problem for operational backup. You may find that the backup software is trying to mount two, three, sometimes even more tapes, depending on the backup policy, just to restore a single file. This underscores the need for a new backup solution model. Tapes are not the most reliable way to store all types of backups, and may not be the best way to perform restores. Today, using disks to store some types of backup data improves restore performance and reliability.

    For example, suppose you perform a full backup on Sunday and incremental backups on the remaining days of the week. This model can prove slow in the case of a full restore: you would need to first restore the full backup and then apply all of the incremental backups, which can take a long time.

  • What is Backed up?

    • Operating Environments
        • Servers
        • Desktop PCs
        • Laptop PCs
    • Applications
        • ERP (e.g., SAP, Oracle Apps, PeopleSoft)
        • CRM (e.g., Siebel)
        • Databases (Oracle, UDB, MS SQL)
        • Messaging (Microsoft Exchange, etc.)
    • Application data
        • For all of the above
    • Logs and journals
        • Application transaction logs, database journals, file system journals

    Before defining a backup solution, it is important to examine your backup environment, in order to determine the type of data you would need to back up. Different types of data can require different backup strategies. For example, some applications may need to be placed in a specific quiescent state or even to be closed before backup starts. This would guarantee consistency in case of a restore. In some cases special backup agents are used to make this process automatic. If an application must be closed before it can be backed up, that could have an impact on when and how backups are performed.

    In addition to finding out the type of data to back up, it is necessary to decide when and where to perform the backups. Each type of data has different backup requirements, for example, frequency, backup media, and retention.

  • What is Operational Restore?

    • Most restores are at the file and volume level
        • Restore frequency is usually high
    • Full system restores are rare
    • Most common restores
        • Email
        • Files
        • Application data

    A good general rule to follow when planning a backup solution is to evaluate the restore needs. For instance, if a particular user has a high requirement for email restores, it is important to plan a backup on a reliable medium that also provides highly granular restore capabilities to ensure rapid data recovery.

  • Backup Granularity and Levels

    [Figure: the three backup levels: Full, Cumulative (Differential), and Incremental]

    The granularity and backup levels depend on business needs and to some extent on technological limitations. Some backup strategies define up to ten levels of backup. IT organizations use a combination of these to fulfill their requirements. Most use some combination of Full, Cumulative, and Incremental backups.

    Full

    A full backup is exactly what the name implies: a backup of all data on the target volumes, regardless of any changes made to the data itself. Another possible scheme is a synthetic, or constructed, full backup. In a synthetic full backup, information is taken from a full backup and the subsequent incremental backups to create a new full backup. This allows a full backup to be created offline, so the network continues to function without performance degradation or disruption to network users. Synthetic full backups are used when the backup window is too small for the other options.
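The construction of a synthetic full backup can be sketched as follows. This is a minimal illustration, not any vendor's implementation: a backup is modelled as a hypothetical mapping of file name to file version, and deleted files are ignored.

```python
from typing import Dict, List

# Assumption: a backup is modelled as file name -> file version.
Backup = Dict[str, str]


def synthesize_full(last_full: Backup, incrementals: List[Backup]) -> Backup:
    """Build a new full backup from an existing full plus later incrementals.

    Incrementals are applied oldest first, so the newest copy of each file wins.
    Deleted files are not tracked in this simplified model.
    """
    synthetic = dict(last_full)  # copy, so the original full is left untouched
    for incremental in incrementals:
        synthetic.update(incremental)
    return synthetic


full = {"file1": "v1", "file2": "v1", "file3": "v1"}
incrementals = [{"file4": "v1"}, {"file3": "v2"}]
print(synthesize_full(full, incrementals))
```

Because the merge reads only existing backup images, it can run on the backup infrastructure without touching the production hosts.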

    Cumulative (Differential)

    A cumulative backup is a kind of incremental backup that contains changes since the last full backup.

    Incremental

    An incremental backup contains the changes since the last incremental backup or the last full backup, whichever was most recent.

  • Restoring a Cumulative Backup

    • Key Features
        • More files to be backed up, therefore it takes more time to back up and uses more storage space
        • Much faster restore because only the last full and the last cumulative backup must be applied

    [Figure: Monday, full backup (Files 1, 2, 3); Tuesday, cumulative (File 4); Wednesday, cumulative (Files 4, 5); Thursday, cumulative (Files 4, 5, 6); production after restore (Files 1, 2, 3, 4, 5, 6)]

    In this example, a full backup is taken on Monday. For the remaining weekdays, a cumulative backup is taken. These cumulative backups back up ALL FILES that have changed since the LAST FULL BACKUP.

    On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full backup, it is backed up that evening (Tuesday).

    On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have been added or changed since the last full backup, both files will be backed up that evening (Wednesday).

    On Thursday, File 6 is added. Again, since File 4, File 5, and File 6 are files that have been added or changed since the last full backup, all three files will be backed up that evening (Thursday).

    On Friday morning, there is a corruption of the data, so the data must be restored. The first step is to restore the full backup from Monday evening. Then, only the backup from Thursday evening is restored because it contains all the new/changed files from Tuesday, Wednesday, and Thursday.
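The contents of each cumulative backup in this example can be reproduced with a short sketch. The change log and function name below are hypothetical and simply encode the week described above.

```python
from typing import Dict, List, Set

# Assumed change log for the example week: day -> files added or changed that day.
CHANGES: Dict[str, Set[str]] = {
    "Tuesday": {"File 4"},
    "Wednesday": {"File 5"},
    "Thursday": {"File 6"},
}
DAYS: List[str] = ["Tuesday", "Wednesday", "Thursday"]


def cumulative_contents(day: str) -> Set[str]:
    """Files a cumulative backup taken on `day` would contain.

    A cumulative backup holds everything changed since the last full backup
    (Monday here), so it is the union of all changes up to and including `day`.
    """
    contents: Set[str] = set()
    for d in DAYS[: DAYS.index(day) + 1]:
        contents |= CHANGES[d]
    return contents


for day in DAYS:
    print(day, sorted(cumulative_contents(day)))
```

Each cumulative backup is a superset of the previous one, which is why only the most recent cumulative backup is needed on top of the full backup at restore time.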

  • Restoring an Incremental Backup

    • Key Features
        • Files that have changed since the last full or incremental backup are backed up
        • Fewest files to be backed up, therefore faster backup and less storage space
        • Longer restore because the last full and all subsequent incremental backups must be applied

    [Figure: Monday, full backup (Files 1, 2, 3); Tuesday, incremental (File 4); Wednesday, incremental (File 3); Thursday, incremental (File 5); production after restore (Files 1, 2, 3, 4, 5)]

    In this example, a full backup is taken on Monday. For the remaining weekdays, an incremental backup is taken. These incremental backups back up only files that are new or that have changed since the last full or incremental backup.

    On Tuesday, a new file is added, File 4. No other files have been changed. Since File 4 is a new file that has been added after the previous backup on Monday evening, it is backed up that evening (Tuesday).

    On Wednesday, there are no new files added since Tuesday, but File 3 has changed. Since File 3 has changed after the previous evening backup (Tuesday), it will be backed up that evening (Wednesday).

    On Thursday, no files have changed but a new file has been added, File 5. Since File 5 was added after the previous evening backup, it will be backed up that evening (Thursday).

    On Friday morning, there is a data corruption, so the data must be restored. The first step is to restore the full backup from Monday evening. Then, every incremental backup that was done since the last full backup must be applied, which, in this example, means the Tuesday, Wednesday, and Thursday incremental backups.
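The restore order for both schemes can be expressed as one small function. This is a sketch under stated assumptions: the `Backup` record and `restore_chain` name are hypothetical, the history is ordered oldest first, and an incremental taken after a cumulative backup is assumed to capture only changes since that cumulative backup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backup:
    day: str
    kind: str  # "full", "cumulative", or "incremental"


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Return the backups to apply, in order, to restore the latest state."""
    full_positions = [i for i, b in enumerate(history) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to start the restore from")
    last_full = full_positions[-1]
    chain = [history[last_full]]
    later = history[last_full + 1:]

    # The newest cumulative backup supersedes everything between it and the full.
    cumulative_positions = [i for i, b in enumerate(later) if b.kind == "cumulative"]
    start = 0
    if cumulative_positions:
        chain.append(later[cumulative_positions[-1]])
        start = cumulative_positions[-1] + 1

    # Every incremental after that point must be applied in sequence.
    chain.extend(b for b in later[start:] if b.kind == "incremental")
    return chain


incremental_week = [Backup("Monday", "full")] + [
    Backup(d, "incremental") for d in ("Tuesday", "Wednesday", "Thursday")
]
cumulative_week = [Backup("Monday", "full")] + [
    Backup(d, "cumulative") for d in ("Tuesday", "Wednesday", "Thursday")
]
print([b.day for b in restore_chain(incremental_week)])
print([b.day for b in restore_chain(cumulative_week)])
```

The incremental week needs four restore steps, while the cumulative week needs two, which mirrors the trade-off between backup speed and restore speed described on these slides.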

  • Backup and Restore Concepts

    • Backup Metadata
    • Backup Server
    • Backup Software
    • Backup Window
    • Catalog
    • Expiration Date
    • Full backup
    • Hot Backup

    Here are some useful terms when discussing backup technology:

    Backup Metadata: Information about the backup data, such as file names, time of backup, size, permissions, ownership, and most importantly, tracking information to allow locating the data to be restored. The tracking information is stored in the backup catalog.

    Backup Server: The central point of administration and management. It maintains the Backup Metadata.

    Backup Software: Software running on the backup server and backup clients that manages the flow of backup data from backup clients to the backup media. This software also manages the restoration of previously backed up data.

    Backup Window: The period of time that a system is available to perform a backup procedure, traditionally 6-8 hours in the evening or weekends, but could occur at any time. Due to the accelerating rate of data growth, backup windows for many applications are shrinking and, in some cases, nonexistent.

    Catalog: A metadata database maintained by the backup server.

    Expiration Date: The date that the contents of a tape cartridge can be overwritten. (see Retention Period)

    Full backup: A backup that includes all data, usually done weekly.

    Hot Backup: A backup performed while the application (e.g. Exchange, Oracle, SQL, etc.) is still running and providing services to end users. Performance on the application may be somewhat degraded during this operation.

  • Backup and Restore Concepts

    • Recovery Point Objective (RPO)
    • Recovery Time Objective (RTO)
    • Restore
    • Retention Period
    • Rotation Period

    Some more backup and recovery concepts are listed on this slide.

    Recovery Point Objective (RPO): The point in time to which application data must be recovered in order to resume business transactions.

    Recovery Time Objective (RTO): Maximum allowable time to bring the application back online.

    Restore (Operational): The movement of a file or a group of files from a previous backup back to a primary storage device. The backup copy of this data was created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the primary disk, and is usually kept for a short period of time. This backed-up data may be on disk or tape. Depending on a company's policies, some or all of this data may be moved to tape (if not already on tape) for off-site storage to be used for Disaster Recovery.

    Retention Period: The length of time that the backup software prevents the overwriting of a tape. This concept is tied to expiration date (mentioned previously).

    Rotation Period: The length of time that a particular backup set is retained on tape before it is overwritten by a new backup set.
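The relationship between retention period and expiration date amounts to simple date arithmetic. The sketch below assumes the retention period is expressed in whole days and that the expiration date is the backup date plus that period; the function names are hypothetical.

```python
from datetime import date, timedelta


def expiration_date(backup_date: date, retention_days: int) -> date:
    """Date on which the tape holding this backup may be overwritten."""
    return backup_date + timedelta(days=retention_days)


def can_overwrite(backup_date: date, retention_days: int, today: date) -> bool:
    """True once the retention period has elapsed."""
    return today >= expiration_date(backup_date, retention_days)


taken = date(2007, 1, 1)
print(expiration_date(taken, 30))
print(can_overwrite(taken, 30, date(2007, 1, 15)))
```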

  • Data Storage Methods

    • Online storage
    • Near-line storage
    • Offline storage
    • Off-site vault

    Different storage methods offer different levels of accessibility, security and cost. In most cases, a mix of all four storage methods can be the most effective storage strategy.

    Online storage: Sometimes called secondary storage, online storage is typically the most accessible type of data storage. A good example would be a large disk array. This type of storage is very convenient and speedy, but is relatively expensive and vulnerable to being deleted or overwritten, either by accident, or in the wake of a data-deleting virus payload.

    Near-line storage: Sometimes called tertiary storage, near-line storage is typically less accessible and less expensive than online storage. A good example would be an automatic tape library. Near-line storage is used for archival of rarely accessed information, since it is much slower than secondary storage.

    Offline storage: Storage that a human operator must handle before a computer can access the information stored on the medium. An example is a media library system that uses offline storage media, as opposed to near-line storage, where the handling of media is automatic.

    Off-site vault: To protect against a disaster or other site-specific problem, many people choose to send backup media to an off-site vault. The vault can be as simple as the system administrator's home office or as sophisticated as a disaster hardened, temperature controlled, high security bunker that has facilities for backup media storage.

  • Developing the Success Criteria

    • Requires understanding of:
        • Application capacity to address
        • Each application's criticality to the business
        • Recovery point objectives
            • Ties to backup frequency and retention timelines
        • Recovery time objectives
            • Ties to service level requirements
    • Choice of connectivity
        • SAN, LAN, or combination

    In understanding how to develop an effective backup architecture, you need to first look at the total amount of capacity that has to be backed up, and then look at the types of applications involved. With that in mind, you need to make choices as to what needs to be backed up, how often it's backed up, and how fast recovery needs to be if required. Finally, the connectivity needs to be determined: whether it's SAN based, LAN based, or some combination of the two.

    Talking specifically about backing up to disk, one of the biggest challenges that a user faces is trying to figure out how to size the solution. When a backup-to-disk scenario is implemented, it can change the current backup retention, as well as backup frequency, in order to gain the best value from the solution.

    The following slides cover these ideas in more detail.

  • Application Mix Example

    Application                          Uptime          RTO               RPO                             Backup Window
    Tier 1 applications                  24x7x365        Seconds           Last transaction                None
    E-mail                               24x7x365        Minutes           Full restore                    Minutes
    Tier 2 applications                  Business hours  Minutes to hours  Minimal loss                    Minutes to hours
    File servers                         Business hours  Minutes to hours  Minimal loss                    Hours
    Business records and archived data   Business hours  Hours to days     Best effort (unless regulated)  Days

    (The slide's vertical axis is labeled "Mission Criticality".)

    Here is a typical mix of applications. In order to have a successful backup implementation, it is important to understand the operational characteristics of each of the applications in the environment.

    For example, Tier 1 applications may need to be recovered within a matter of seconds, or revenues could be impacted. This is particularly true in businesses that have revenues tied to system uptime. Note that the RTO is measured in seconds, and the RPO goal is the very last transaction. Also, in this particular case, there is no window of time during which backups can occur, so leveraging online backups to a point-in-time copy makes sense.

    E-mail in this situation is similar to Tier 1, with minor differences in the recovery objectives as well as the backup times.

    In considering the other applications, their requirements are a lot less stringent. As a consequence, the backup and recovery strategies employed with a backup scenario will be architected differently than the Tier 1 and e-mail applications.

    Creating this kind of spreadsheet gives clarity to the requirements of a backup implementation.

  • Inventory and Gather Data

    • Backup content
        • How much is backed up?
        • How often is it backed up?
        • How long is it retained for?
    • Clean house!
        • Stale data, duplicate data
        • Non-corporate data (MP3s)
        • Extinct user data
    • Removing the inactive data
        • Accelerates backups
        • Accelerates restores

    This slide illustrates the use of Storage Resource Management tools that provide a general look at the content currently stored within the enterprise. Of particular interest in the effort to re-architect backup solutions are stale data, multiple copies of the same data, data that doesn't belong in a corporate backup, and data associated with employees who are no longer employed at the company. This content should be stripped from the current backup process.

    For example, many organizations do not get around to removing old users' content from file servers and continue to back up data that hasn't changed in years. The same applies to e-mails and application data.

    It is important to note the financial results of this effort. Removing this content not only shrinks the backup size (enabling faster, more reliable backups and recoveries), but also reduces the volume of content being backed up, which can bring significant savings in the number of cartridges and drives required.
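As a rough illustration of how stale data might be identified on a file server, the sketch below walks a directory tree and reports files whose last modification is older than a threshold. It is not a Storage Resource Management tool; the function name and the use of modification time as the staleness signal are assumptions made for the example.

```python
import os
import time
from typing import Iterator, Optional


def find_stale_files(root: str, max_age_days: int,
                     now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under `root` not modified within the last `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    yield path
            except OSError:
                # File vanished or is unreadable; skip it in this sketch.
                continue
```

Calling `find_stale_files(some_share, 365)` on a hypothetical share path would list candidates for exclusion from the backup set; a real clean-up would also need to consider ownership, duplicates, and regulatory retention rules.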

  • Sizing the Requirement: Backup Capacity (EMC Internal Case Study)

    [Chart: EMC IS: E-mail Restore Requests Since Backup]

    Time since backup    Cumulative percent of restore requests
    Same day             27%
    1-2 days             77%
    3-6 days             92%
    7-14 days            100%
    15-29 days           100%
    >30 days             100%

    This slide addresses an important question: How much backup data is really enough to protect my business? One company's internal IT department evaluated their backup and recovery process and the amount of backup data they were storing on tape. The analysis looked at restore requests, due to inadvertent e-mail deletions by users, over a 12-month period for their large e-mail infrastructure.

    The chart diagrams the cumulative number of restore requests over time, starting with the actual receipt of an e-mail. The diagram indicates that for e-mail data, after 14 days, nearly everyone had already requested that an e-mail be restored. Yet, the internal IT department had a policy to store e-mail backups on tape for over 60 days.

    By eliminating retention of unnecessary backups, significant cost savings were achieved.
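The same reasoning can be applied to any restore-request curve: find the shortest retention bucket that covers a target share of requests. The sketch below uses the percentages from the chart; the bucket boundaries are an assumed reading of the chart's axis labels, and the function name is hypothetical.

```python
from typing import List, Tuple

# Cumulative percent of e-mail restore requests by time since backup.
# Bucket labels are an assumed reading of the chart's horizontal axis.
RESTORE_CURVE: List[Tuple[str, int]] = [
    ("Same Day", 27),
    ("1-2 Days", 77),
    ("3-6 Days", 92),
    ("7-14 Days", 100),
    ("15-29 Days", 100),
    (">30 Days", 100),
]


def shortest_retention(curve: List[Tuple[str, int]], target_percent: int) -> str:
    """Return the first bucket whose cumulative percent reaches the target."""
    for label, cumulative in curve:
        if cumulative >= target_percent:
            return label
    raise ValueError("target is never reached by the curve")


print(shortest_retention(RESTORE_CURVE, 90))
print(shortest_retention(RESTORE_CURVE, 100))
```

With this data, every restore request falls within the 7-14 day bucket, which is the basis for questioning a 60-day tape retention policy for operational e-mail restores.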

  • Module Summary

    Key points covered in this module:

    • Backup basics
    • Backup concepts
    • Backup types
        • Full
        • Incremental
        • Cumulative
    • Backup planning

    These are the key points covered in this module. Please take a moment to review them.

  • Module 2: Backup Architecture

    Upon completion of this module, you will be able to:

    • Describe Generic Backup Architecture
        • Client, Server, Storage Node
    • Identify Backup Topologies
        • Direct Attached Backup, LAN Backup, SAN Backup
    • Discuss Backup Granularity

    The objectives for this module are shown here. Please take a moment to read them.

  • Backup Architecture: How It Works

    • Client/Server Relationship
    • Server
        • Directs Operation
        • Maintains Catalog
    • Client
        • Gathers Data for Backup
    • Storage Node

    Backup products vary, but some share common characteristics. The basic architecture of a backup software system is the client-server relationship, with a backup server and some number of backup clients or agents. The backup server directs the operations and owns the backup catalog (the information about the backup). The catalog contains the table-of-contents for the backup image. It also contains information about the backup session itself.

    The backup server depends on the backup client to gather the data to be backed up. The backup client can be local or it can reside on another system, presumably to back up the data visible to that system.

    There is another component called a storage node. Different vendors use other names for it (Tivoli: Storage Agent, Veritas: Media Server, CommVault: Media Agent), but storage node is the Storage Networking Industry Association (SNIA) term. The storage node is the entity responsible for writing the backup image to the backup device. Typically, there is a storage node packaged with the backup server, and the backup device is attached directly to the backup server's host platform. Storage nodes play an important role in backup planning because they can be used to consolidate backup servers.
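
    The division of roles described above can be illustrated with a minimal sketch. The class and method names are hypothetical and do not correspond to any vendor's product; a Python list stands in for the backup device.

```python
from dataclasses import dataclass, field


@dataclass
class BackupClient:
    """Gathers the data to be backed up from the system it runs on."""
    name: str
    files: dict  # path -> file contents (bytes)

    def gather(self):
        return dict(self.files)


@dataclass
class StorageNode:
    """Writes backup images to the backup device."""
    device: list = field(default_factory=list)  # stand-in for tape

    def write(self, image):
        self.device.append(image)
        return len(self.device) - 1  # position of the image on the device


@dataclass
class BackupServer:
    """Directs the operation and owns the catalog."""
    node: StorageNode
    catalog: list = field(default_factory=list)

    def run_backup(self, client):
        image = client.gather()
        position = self.node.write(image)
        # The catalog entry records the table of contents of the image
        # plus information about the backup session itself.
        entry = {"client": client.name, "contents": sorted(image), "position": position}
        self.catalog.append(entry)
        return entry


server = BackupServer(StorageNode())
print(server.run_backup(BackupClient("mail01", {"/var/mail/a": b"...", "/var/mail/b": b"..."})))
```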


    Backup Architecture Backup Topologies

    • There are three basic backup topologies:
        - Direct Attached Backup
        - LAN Backup
        - SAN Backup

    This slide reviews the most common backup topologies:

    • Direct Attached Backup: The backup data flows directly from the host to be backed up to the tape, without utilizing the LAN. In this model, there is no centralized management and it is difficult to grow the environment.
    • LAN Backup: In this model, the backup data flows from the host to be backed up to the tape through the LAN. Management is centralized, but LAN utilization is a problem because all data goes through the LAN.
    • SAN Backup: The backup data goes through the SAN. The LAN is used only to move metadata. This model offers good backup performance and simplified management, but at the added expense of an additional infrastructure.


    Direct-Attached Backups

    Backups are performed directly from the backup client's disk to the backup client's tape devices.

    • Advantages
        - High Speed
        - Tape devices dedicated to the host
    • Disadvantages
        - Impacts the host and application performance
        - Distance restrictions

    Advantages

    The key advantage of direct-attached backups is speed. The tape devices can operate at the speed of the channels. Direct-attached backups optimize backup and restore speed since the tape devices are close to the data source and dedicated to the host.

    Disadvantages

    Direct-attached backups impact the host and application performance since backups consume host I/O bandwidth, memory, and CPU resources. Direct-attached backups potentially have distance restrictions if short-distance connections such as SCSI are used.


    Direct-Attached Backups

    [Diagram: direct-attached backup. Labels shown: Data, Catalog, Backup Server, Metadata, Media, Backup, Storage Node, LAN.]

    This is an example of a Direct Attached Backup environment. Notice some of the features of this backup:

    • A tape drive is attached directly to the client.
    • Only metadata goes to the backup server, relieving pressure on the LAN. This could potentially be a management nightmare and the cost could be prohibitive.
    • A solution is to share the tape units.

    In this example, the client is a Storage Node, which is the entity responsible for writing the backup image to the backup device.


    LAN-Based Backups

    • The Backup Server is the central control point for all backups

    • The metadata and backup policies reside in the Backup Server

    • Storage Nodes control backup devices and are controlled by the Backup Server

    Advantages

    LAN backups enable an organization to centralize backups and pool tape resources. The centralization and pooling can enable standardization of processes, tools, and backup media. Centralization of tapes can also improve operational efficiency.

    Disadvantages

    The backup process has an impact on production systems, the client network, and the applications. It consumes CPU, I/O bandwidth, LAN bandwidth, and memory. In order to maintain finite backup points, applications might have to be halted and databases shut down.


    LAN Backup Data Flow

    [Diagram: LAN backup data flow. Labels shown: Backup Server, LAN, Metadata, Storage Node, Data, Mail Server, File Server, Database Server.]

    This is an example of how LAN-based backups work.

    Let's start with the simplest example of a traditional LAN backup. All systems are LAN-connected and all storage is direct-attached. The tape is locally attached to the backup server.

    Backup data has to make its way from the backup client (the source) to the backup device (the destination). It should do so with the least possible impact to the production network. There are a number of ways to minimize this impact. These include configuring separate networks for backup, and installing dedicated storage nodes on some application servers. Even when utilizing these types of measures, it is possible for even a high-speed network to be overwhelmed by two cached disk-array connections and two to six tape libraries operating in full streaming mode.

    Also worth considering is that backup data, streaming across the LAN, affects the network performance of all systems connected to the same network segment as the backup server. Environments that back up many logical disks to many tape libraries will be constrained by even the fastest network technologies.

    The critical performance path is the network connection between the backup client and the LAN. This path is critical since it ultimately determines how much data can be backed up or restored within time constraints.
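
    To make the network constraint concrete, the sketch below estimates how long a backup takes over a given link and whether it fits a window. The function names and the usable-fraction parameter are assumptions for illustration, and decimal units are assumed (1 GB = 1000 MB, 8 bits per byte); real throughput also depends on protocol overhead and competing traffic.

```python
def lan_backup_hours(data_gb, link_mbit_s, usable_fraction=1.0):
    """Hours needed to move data_gb across a link rated at link_mbit_s.

    usable_fraction is the share of the nominal link rate assumed to be
    available to backup traffic.
    """
    mb_per_s = link_mbit_s / 8 * usable_fraction
    return data_gb * 1000 / mb_per_s / 3600


def fits_window(data_gb, link_mbit_s, window_hours, usable_fraction=1.0):
    """True if the transfer is estimated to finish within the backup window."""
    return lan_backup_hours(data_gb, link_mbit_s, usable_fraction) <= window_hours


# Hypothetical figures: 2 TB over a 1 Gbit/s link with 60% of it usable.
print(round(lan_backup_hours(2000, 1000, usable_fraction=0.6), 2))
print(fits_window(2000, 1000, window_hours=8, usable_fraction=0.6))
```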


    SAN Based Backups

    • LAN-free backups use storage area networks (SANs) to move backup data rapidly and reliably. The SAN is usually used in conjunction with backup software that supports tape device sharing

    • Metadata is still moved over the LAN to the backup server

    Backup Metadata contains information about what has been backed up, such as file names, time of backup, size, permissions, ownership, and most importantly, tracking information for rapid location and restore. It also indicates where it has been stored, for example, which tape.

    Data, the contents of files, databases, etc., is the primary information source to be backed up.

    A SAN-enabled backup infrastructure introduces these advantages to the backup process:

    • Provides Fibre Channel performance, reliability, and distance.
    • Requires fewer processes and reduced overhead.
    • Does not use the LAN to move backup data.
    • Eliminates or reduces dedicated backup servers.
    • Improves backup and restore performance.


    SAN Backup Data Flow

    [Diagram: SAN backup data flow. Labels shown: LAN, Metadata, Storage Node, Data, Mail Server, SAN, Backup Server.]

    The SAN is valuable if you want to share a Tape Library Unit (TLU). Attach the TLU and clients to the SAN, and all clients can share a single TLU.

    During backup, the clients read the data from the SAN and write to the SAN-attached tape. The data never leaves the SAN environment. The only thing to fly over the LAN is the metadata, but that pales in comparison to the data volumes.

    The emergence of ATA as a backup medium brings us to the next step in the evolution. You can add a CLARiiON/ATA box to the SAN and have your immediate backup go to disk. Later, the backup server moves the backup data from disk to tape so that the tape can be shipped off-site for disaster recovery and long-term retention.


    Best Level of Granularity

    • Total Volume of Data
    • Volume of Changed Data
    • What Type of Data is Backed Up
        - Is Compression an Option
        - Hardware or Software compression
    • Backup Window
        - Staggering Jobs
        - Rush to daylight

    Deciding which level of backup to schedule is not as easy as it may seem. Granularity levels hinge on several considerations. First, what is the aggregate weekly data change rate? If the change rate were close to or greater than 100% (daily change about 20%), it makes little sense to entertain an incremental backup because of the overhead for deciding which files need to be backed up. In that case, a full backup could actually take less time than the incremental, even though less physical data is being backed up.

    If the rate of data change amounts to considerably less than 100% per week, then an alternate model might be more appropriate. One such model is a rotation scheme that still includes monthly full backups and daily incrementals but, instead of performing full backups every week, performs cumulative incremental backups. Given a modest data change rate, this can save both time and storage resources.

    When devising a backup strategy, it is critical to understand the nature of the data, and the nature of changes to the data. Some applications use larger files than others. An environment with such applications tends to have a larger data change rate, because even a small change to the data results in the whole file being changed. The larger the average file size, the greater the percentage of the data set that each change affects. Other applications, like software development, use many smaller files. The rate of change in these environments can be much lower. In such environments, the more mature the data set, the lower the change rate. Another factor to consider is the properties of the files in your backup set. For instance, are they natively compressible, or will the negative impact compression has on performance make it less desirable?
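
    The reasoning about change rates can be sketched as a rough calculation. The function names, the five backup days per week, and the `near_full` cutoff of 0.9 are assumptions chosen for illustration; the notes only say "close to or greater than 100%". The size estimate also assumes that each day's changes touch different data, which is a worst case.

```python
def weekly_change_rate(daily_change, backup_days_per_week=5):
    """Aggregate weekly change as a fraction of the data set (1.0 = 100%)."""
    return daily_change * backup_days_per_week


def suggest_level(daily_change, backup_days_per_week=5, near_full=0.9):
    """Suggest a backup level from the aggregate weekly change rate."""
    if weekly_change_rate(daily_change, backup_days_per_week) >= near_full:
        return "full"
    return "incremental or cumulative"


def cumulative_size_gb(total_gb, daily_change, days_since_full):
    """Worst-case size of a cumulative backup taken days_since_full days after a full."""
    return total_gb * min(1.0, daily_change * days_since_full)


print(suggest_level(0.20))                          # about 100% per week
print(suggest_level(0.02))                          # about 10% per week
print(round(cumulative_size_gb(1000, 0.02, 3), 1))  # third day after a full backup
```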


    Module Summary

    Key points covered in this module:

    • Generic Backup Architecture
        - Direct Attached Backup (LAN Free Backup)
        - LAN Backup
        - SAN Backup
    • Backup Granularity and levels

    These are the key points covered in this module. Please take a moment to review them.


    Module 3 Backup Terminology & Considerations

    Upon completion of this module, you will be able to:

    • Define RTO and RPO
    • Define Backup Data and Business Data

    The objectives for this module are shown here. Please take a moment to read them.


    RPO and RTO

    • RPOs must match as closely as possible to the users' needs
        - Finer granularity means lower cost to resume
        - Longer retention periods support requests farther into the past
        - Longer retention means higher storage costs
    • RTOs must be as close to immediate as possible
        - Shorter restore times minimize impact of data loss

    We previously discussed what RPO and RTO are; now let's look at their impact on the backup solution.

    The RPOs must match the users' needs as closely as possible, so your backup policy must take those needs into account and provide appropriate retention periods and granularity. You also have to weigh other factors, such as storage costs: longer retention means higher storage costs.

    RTOs must be defined to be as close to immediate as possible, to minimize the impact of data loss. Of course, every type of data has a different value, which must be assigned and defined by company policy. For example, restoring data from production databases can be more important than restoring file server data. In this case, the RTO for the database files will be shorter than the RTO for the file server data.
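
    As a minimal sketch of how these objectives constrain a policy, the check below compares a backup schedule against per-data-type objectives. It assumes that the worst-case data loss equals the interval between backups; the function name, data types, and hour values are hypothetical.

```python
def meets_objectives(backup_interval_h, restore_time_h, rpo_h, rto_h):
    """Check a backup schedule against an RPO and an RTO, all in hours.

    Assumes the worst-case data loss equals the interval between backups.
    """
    return {
        "rpo_met": backup_interval_h <= rpo_h,
        "rto_met": restore_time_h <= rto_h,
    }


# Hypothetical policy: the database gets tighter objectives than the file server.
policy = {
    "production database": {"rpo_h": 4, "rto_h": 2},
    "file server": {"rpo_h": 24, "rto_h": 12},
}
for data_type, objectives in policy.items():
    print(data_type, meets_objectives(backup_interval_h=24, restore_time_h=6, **objectives))
```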


    Backup Data Set Properties

    • Backup Data on Tape
        - Used to recover deleted, broken, or corrupted data on disk
        - Backup data is NOT archive data
    • Business Data on Tape
        - Data created and retained as a result of business activity
        - Transactions, records, files, objects, reports, etc.
        - Business data on tape kept for a long period of time is Archive Data

    The definition of the data type to be backed up is one of the most important factors in the development of a backup solution. If the backup administrator has the information regarding the type of data to be backed up, then it's possible to define the correct retention period for each type of data, the correct media type, etc. Another thing to analyze is the number of tapes involved in a given backup solution. After doing an inventory of the environment, it may be found that a lot of unnecessary backups are being retained for long periods. In order to minimize this, you can categorize data into two general types:

    Operational Backup Data

    • Data created and retained for the sole purpose of recovering deleted, broken, or corrupted data on disk.
    • Usually kept for a short period of time. Backup data is NOT archive data.

    Archived Business Data

    • Transactions, records, files, objects, reports, etc. that are created and retained as a result of business activity with customers, suppliers, and partners.
    • Business data on tape kept for a long period of time is archive data.
    • Data retained for long periods of time for retention and regulatory purposes.


    Data Considerations

    • Files
    • File sizes and the number of files
    • Data compression
    • Retention periods and data management

    Many organizations have dozens of heterogeneous platforms that support a complex application. Consider a data warehouse where data from many sources is fed into the warehouse. When this scenario is viewed as The Data Warehouse Application, it easily fits this model. Capacity planning, backup, restore, and recovery for these complex applications can easily involve hundreds or thousands of files scattered across dozens of heterogeneous systems. These systems may not be in a single physical location. Portions of the application may have differing backup schedules.

    Managing business continuance for such an application is a big challenge for the application owner, but consider that a storage administrator may have to manage hundreds of these complex applications.

    The key issues are:

    • How the backups for subsets of the data are synchronized
    • How these applications are restored
    • How these applications are recovered
    • File sizes and the number of files
    • Data can have a large impact on backup, restore, and recovery performance


    Compression

    • Compression rate depends on the type of data
        - Application binaries
        - Text
        - JPEG/ZIP files
    • Some types of data compress well
    • Other types of data are already compressed, such as JPEG and ZIP files

    Many tape devices have built-in hardware compression technologies. To effectively use these technologies, it is important to understand the characteristics of the data. Some data, such as application binaries, do not compress well. Data such as text can compress very well, while other data like JPEG and ZIP files are already compressed. Files that are already compressed have a tendency to get larger when they are compressed again.
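
    The effect is easy to observe with a general-purpose software compressor. The sketch below uses Python's standard zlib module as a stand-in for a tape drive's hardware compression, which is an assumption: real drives use their own algorithms. Repetitive text represents data that compresses well, and seeded pseudo-random bytes represent content that is already compressed.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means the data shrank)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive text stands in for data that compresses well.
text = b"backup and recovery fundamentals\n" * 1000

# Seeded pseudo-random bytes stand in for already-compressed content (JPEG, ZIP).
rng = random.Random(0)
dense = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(f"text:  {compression_ratio(text):.3f}")
print(f"dense: {compression_ratio(dense):.3f}")
```

    A ratio above 1.0 for the second input means the "compressed" output is larger than the input, which matches the behavior described for already-compressed files.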


    Retention Periods

    • Vaulting
    • Cloning
        - Twinning
    • Rotation

    Retention Periods are the length of time that a particular version of a dataset is available to be restored. The rate of change to the data ties in with restore requirements as well. The faster the data changes, the finer the granularity needed to support precise RPOs. By understanding the probability of restore requests over time, along with the nature and criticality of restores, it is possible to determine the optimal strategy for backup granularity and retention periods.

    The cloning and vaulting of datasets go hand in hand with retention, as this is essentially what turns an operational backup into an archival one. Cloning (also known as twinning when done during the backup process) is used to produce duplicate sets of tapes, which can then be vaulted either in a secure location onsite or shipped offsite. This shifts their use to a more disaster recovery focus.

    Offsite vaulting generally serves two purposes: first, to provide business continuance capabilities. If the primary location suffers a disaster, then the copies of the data from the offsite vault are used for recovery. Second, to maintain an archive of data requiring extended retention for legal, governmental, and other business requirements.

    One advantage of the cloning approach (compared to rotation) is that restores requiring older versions of the backup do not have to be obtained from the offsite location (unless there is a media failure), reducing the time to get the data back in operation.


    Retention Periods

    Once a company has decided to vault its data, it needs to determine how long the data must be retained. There is no magic number that can be applied to all environments. Key factors that influence the number of backup copies that a location maintains, and how long they are kept, are legal requirements, government requirements, and business requirements.

    Legal Requirements

    A corporate legal counsel may suggest that certain data be kept for specific periods in case this data is needed for legal proceedings. Examples might be engineering records that could be useful in litigation involving protection of intellectual property or protection against liability suits.

    Government Regulations

    Government regulations require that some information be available for a specific number of years. An example is corporate financial data. Some governments require or suggest that this data be kept accessible for a set period. For instance, in the United States, seven-year retention of key financial records is common.

    Business Requirements

    Some businesses may require extended retention of data to maintain the business. One example is the medical industry, such as hospitals, where patient histories are used to aid patient treatment.


    Backup and Recovery Capacity and Performance Considerations

    • Data movement
    • CPU
    • Memory
    • Paths and I/O bandwidth
    • Network bandwidth

    In considering performance of a backup/recovery solution there are several points to keep in mind:

    Data Movement

    If the length of the window and the amount of data that must be moved is known, then the required data movement rate can be estimated. Knowing the required rate in gigabytes per hour is useful, but since most devices and transport mechanisms are rated in megabytes per second, both scales should be considered.
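
    The estimate can be written out as a short conversion that reports the required rate on both scales. The function name is hypothetical, and decimal units are assumed (1 GB = 1000 MB).

```python
def required_rate(data_gb, window_hours):
    """Return the required data movement rate as (GB per hour, MB per second)."""
    gb_per_hour = data_gb / window_hours
    mb_per_second = gb_per_hour * 1000 / 3600
    return gb_per_hour, mb_per_second


# Hypothetical example: 3,600 GB to move in a 10-hour window.
gb_h, mb_s = required_rate(3600, 10)
print(f"{gb_h:.0f} GB/hour = {mb_s:.0f} MB/s")
```

    The MB/s figure is the one to compare against the rated speed of tape drives, channels, and network links.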

    CPU

    Backups can require significant CPU resources. For example, one server vendor suggests a rule of thumb of 5 MHz of processor power for every megabyte per second of data that needs to be moved. A direct-attached backup requires two data movements from the server. A LAN backup requires four data movements, two on the backup client and two on the backup server. Direct-attached backups therefore require 10 MHz per MB/s on the backup client. LAN backups require 10 MHz per MB/s on the backup client and 10 MHz per MB/s on the backup server. All of these merit consideration when deciding on a solution.
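
    The rule of thumb above translates directly into arithmetic. The sketch below applies the 5 MHz figure and the data-movement counts from the notes to a target rate; the function names and the 50 MB/s example are hypothetical.

```python
MHZ_PER_MB_S_PER_MOVEMENT = 5  # rule of thumb quoted in the notes


def cpu_mhz(rate_mb_s, movements):
    """Processor power needed on one host for the given rate and number of data movements."""
    return MHZ_PER_MB_S_PER_MOVEMENT * movements * rate_mb_s


def direct_attached_cpu(rate_mb_s):
    # Two data movements, both on the backup client.
    return {"client": cpu_mhz(rate_mb_s, 2)}


def lan_cpu(rate_mb_s):
    # Four data movements: two on the backup client, two on the backup server.
    return {"client": cpu_mhz(rate_mb_s, 2), "server": cpu_mhz(rate_mb_s, 2)}


print(direct_attached_cpu(50))
print(lan_cpu(50))
```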


    Staging

    • Writes backup to disk cache
    • Improves the performance of backups
    • Shortens the backup window
    • The staging process is driven by:
        - An automatic process
        - An event-driven process
        - An administrator-initiated process

    Staging is a process of transferring data from one storage medium to another. Staging reduces the time it takes to complete a backup by directing the initial backup to a high performance file type device. The data can then be staged to a storage medium, freeing up the disk space. For example, when staging a backup, administrators first copy the target data onto the disk cache and, later, move the backup image to tape according to the established disk staging schedule. Disk staging enables administrators to complete backups faster, shortening the backup window, and thereby affecting business applications less than a direct backup-to-tape method.

    Different backup software vendors implement different features in the staging process. Usually, the staging process is started by one of the following conditions:

    • An automatic process, such as keeping the save set for 30 days on the staging device before staging the data to the next device.
    • An event-driven process, such as when available space in the staging pool drops below a set threshold. When this happens, the oldest save sets are moved first, until available space reaches the upper threshold that has been set.
    • An administrator-initiated process, such as allowing the administrator to either reset the threshold and kick off staging or manually select save sets to stage.
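
    The event-driven condition described above can be sketched as follows. This is an illustration under stated assumptions, not any vendor's implementation: the function name, the tuple layout for save sets, and the threshold parameters are hypothetical.

```python
def stage_oldest(save_sets, capacity_gb, low_free_gb, high_free_gb):
    """Event-driven staging sketch.

    save_sets is a list of (created_day, size_gb) tuples on the staging device.
    Staging starts only when free space is below low_free_gb, and moves the
    oldest save sets first until free space reaches high_free_gb.
    Returns (remaining_on_disk, moved_to_next_device).
    """
    remaining = sorted(save_sets)  # oldest first
    used = sum(size for _, size in remaining)
    moved = []
    if capacity_gb - used >= low_free_gb:
        return remaining, moved  # threshold not crossed, nothing to do
    while remaining and capacity_gb - used < high_free_gb:
        oldest = remaining.pop(0)
        used -= oldest[1]
        moved.append(oldest)
    return remaining, moved


# Hypothetical staging pool: 100 GB capacity holding three 30 GB save sets.
remaining, moved = stage_oldest([(3, 30), (1, 30), (2, 30)],
                                capacity_gb=100, low_free_gb=20, high_free_gb=50)
print("moved:", moved)
print("remaining:", remaining)
```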


    NDMP

    • NDMP is an open network protocol that defines common functional interfaces used for these data flows
    • NDMP meets the strategic need to:
        - Centrally manage
        - Control distributed data
        - Minimize network traffic
    • NDMP separates the data path and the control path, so network data can be backed up locally, yet managed from a central location

    Network Data Management Protocol, NDMP, is a protocol pioneered by Intelliguard and Network Appliance that defines a common architecture for the way heterogeneous file servers on a network are backed up.

    The protocol allows the creation of a common agent used by the central backup application to back up different file servers running different platforms and platform versions.

    With NDMP, network congestion is minimized because the data path and control path are separated. Backup can occur locally, from file servers direct to tape drives, while management can occur from a central location.

    NDMP is an open standard protocol promoted and supported by server vendors, backup software vendors, and backup device vendors.


    Cross-Vendor Terms Chart

    Common Term | EMC NetWorker | Veritas NetBackup | Veritas BackupExec | IBM TSM | HP Data Protector (OmniBack)
    Backup Server | Server | Master Server | Media Server / BackupExec Engine | Server | Cell Manager
    Backup Client | Client | Client | Workstation / Server Agent | Client | Client System
    Storage Node | Storage Node | Media Server | N/A | Storage Agent | Media Agent
    File Index | Client File Index | Catalog | | |
    Media Database | Media Database | Volume Database | | |
    Data Set | Save Set | Backup Image | Backup Set | File Space | Backup Session
    Staging | Staging | Disk Staging | N/A | Migration | Disk Staging
    Cloning | Cloning | Duplicate or Inline Copy | N/A | Reclamation | Object Copy

    Other entries in the chart: Catalog, Internal Database, Backup Catalog, Session.

    The chart shown relates the backup terms used across several vendors. Please take a moment to review them.


    Module Summary

    Key points covered in this module:

    • Recovery Time Objective (RTO)
    • Recovery Point Objective (RPO)
    • Backup Data and Business Data
    • Compression
    • Retention Periods
    • Network Data Management Protocol (NDMP)

    These are the key points covered in this module. Please take a moment to review them.


    Course Summary

    Key points covered in this course:

    • Basic Backup procedures and terminology
    • Backup types
    • Generic Backup Architecture
    • Backup Granularity and levels

    These are the key points covered in this training. Please take a moment to review them.

    This concludes the training. In order to receive credit for this course, please proceed to the Course Completion slide to update your transcript and access the assessment.
