Data Recovery - Best Practices

  • 8/6/2019 Data Recovery - Best Practices

    A White Paper by Stephen Wynkoop

    Microsoft SQL Server MVP

    Founder, The SQL Server Worldwide Users Group

    Data Recovery Best Practices: Building a responsible backup and recovery system for your databases

    Data Recovery Best Practices White Paper

    Table of Contents

    Introduction
    Why Backup is Necessary
        Full Database Recovery and Restore
        Point-in-Time Recovery
        Specific Transaction Recovery
    Disaster Planning and Recovery
        How Much Data Can You Afford to Lose?
        About Transaction Logs and Keeping Historical Backups
        Optimize Availability
        Plan for the Future, Don't Fail to Plan
        Pointers to Keep in Mind for the Restoration Process Planning
    Disk to Disk = Best Practices
    Being Prepared for Recovery: The Backup Process
    Summary/Conclusion

    Introduction

    When people think about data recovery, they think largely about backups: the act of
    backing up the database and its associated files, and the process of restoring those
    files to the server. Without a solid plan in place, one that you have set up carefully,
    tested, and know how to execute, you can quickly get into trouble.

    Planning for data recovery is more than just making sure your database is backed up. You
    need to understand how the process works, you need to have the right tools in place, and
    you need practice in using those tools. When the time comes to restore information to
    your production systems, you won't want to be learning how things work; you'll want to
    get the job done as quickly as possible.

    There are many different components to a competent backup and recovery plan, and many
    types of recovery plans are available. Each approach suits different types of issues.
    You need to understand and plan for the differences between a full system restore and a
    point-in-time recovery. At the most precise level, you may even need to recover a
    specific transaction or data element. Understanding each of these, and how to execute on
    them, is critical to managing your data resources.

    In this white paper, we'll explain each of these items, talk about what they mean and
    how they apply. We'll also provide key planning points, and investigate how some
    different tools can help you accomplish these tasks.

    Why Backup is Necessary

    Backup provides you a recovery avenue when things go wrong. Hard drives fail, connections
    between systems fail and have to be restored, and people make mistakes. Each of these
    causes a need to recover at a different level.

    NOTE

    Backup processes and planning often revolve around the unsettling question of how much
    you can afford to lose. The answer determines how frequently you back up the transaction
    logs and databases, and you have to balance it against disk and/or tape space
    constraints. In addition, you'll need to decide how you store backups, how many days of
    backups you retain and, lastly, whether you want to maintain a subset of your backups
    off-site.

    Remember, in the worst possible scenario, if your backups are stored right next to your
    computer and there is a fire, the backups will go up in smoke too, right along with
    your computer. It's important to have at least a skeleton off-site storage plan.

    Keep in mind that responsible planning and management of your systems includes more than
    just backing up to a device and then restoring the database should systems fail. There
    are really three different types of recoveries you may be faced with, and several shades
    of gray between each of these. The major restore options are explained in the next three
    sections.

    FULL DATABASE RECOVERY AND RESTORE

    Full database backup and restore is what many people think of when they consider their
    backup strategy, and it's the most drastic recovery path. It requires that you restore
    the most recent full database backup, and then apply all transaction logs that were
    backed up after that backup was taken. At the end of the process, your database will be
    in the state it was in as of the last transaction log backup. Your data loss in this
    scenario is whatever changed after the most recent transaction log backup.
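
    As a rough illustration of the restore order described above, the following Python
    sketch picks the most recent full backup from a list of backup files and returns the log
    backups to apply after it, oldest first. The Backup record, the full_restore_chain
    function and the file names are all made up for illustration; they are not part of SQL
    Server or of any product mentioned in this paper.

```python
from datetime import datetime
from typing import NamedTuple


class Backup(NamedTuple):
    """One backup file in a hypothetical catalog; kind is "full" or "log"."""
    kind: str
    taken_at: datetime
    path: str


def full_restore_chain(backups: list[Backup]) -> list[Backup]:
    """Return the newest full backup followed by every later log backup, oldest first."""
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        raise ValueError("no full database backup available")
    latest_full = max(fulls, key=lambda b: b.taken_at)
    # Only logs taken after the chosen full backup can be applied on top of it.
    logs = sorted(
        (b for b in backups if b.kind == "log" and b.taken_at > latest_full.taken_at),
        key=lambda b: b.taken_at,
    )
    return [latest_full, *logs]
```

    Anything that changed after the last entry in the returned list is the data you would
    lose in this scenario.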

    POINT-IN-TIME RECOVERY

    Point-in-time recovery lets you recover, typically using transaction logs, to a specific
    time when you know the data was valid. You typically need it when you've discovered data
    issues after some time has passed. It usually means restoring the most recent backup
    taken before the problem, then applying transaction logs up to just before the time when
    you know the data began to have issues. This lets you restore to a known good point in
    time. You can also perform differential database backups; these allow you to back up
    just the changes made since the last full backup.
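
    One way to picture a point-in-time restore is as a generated script. The Python sketch
    below assembles T-SQL RESTORE statements that apply log backups in order and stop at a
    target time with the STOPAT option. STOPAT is standard T-SQL but is not named in this
    paper, and the function name, database name and file paths are illustrative assumptions.
    The code only builds text for you to review; it does not connect to a server.

```python
from datetime import datetime


def point_in_time_script(db, fulls, logs, target):
    """Build RESTORE statements that stop at `target`.

    `fulls` and `logs` are lists of (taken_at, path) tuples (hypothetical inputs).
    """
    earlier_fulls = [entry for entry in fulls if entry[0] <= target]
    if not earlier_fulls:
        raise ValueError("no full backup taken at or before the target time")
    full_time, full_path = max(earlier_fulls)
    script = [f"RESTORE DATABASE [{db}] FROM DISK = N'{full_path}' WITH NORECOVERY;"]
    for taken_at, path in sorted(entry for entry in logs if entry[0] > full_time):
        if taken_at >= target:
            # This log backup spans the target time: stop inside it and bring the database online.
            script.append(
                f"RESTORE LOG [{db}] FROM DISK = N'{path}' "
                f"WITH STOPAT = '{target.isoformat()}', RECOVERY;"
            )
            return script
        # Earlier logs are applied whole, leaving the database ready for the next one.
        script.append(f"RESTORE LOG [{db}] FROM DISK = N'{path}' WITH NORECOVERY;")
    raise ValueError("no transaction log backup covers the target time")
```

    The function refuses to produce a script when no log backup reaches the target time,
    which is the situation you want to discover during planning rather than during a
    restore.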

    SPECIFIC TRANSACTION RECOVERY

    Transaction-based recovery is typically done in one of two different ways. First, your
    application can manage transactions in code by starting a transaction, doing a bit of
    work, and then committing the work to the database with a commit call. If the
    transaction fails, it can be rolled back, putting the information in the tables into the
    same state it was in when the transaction started. In addition, if the server were
    forced to restart during the transaction, SQL Server would roll back the transaction,
    putting the database into a known state: the values in the database at the time the
    transaction was started.

    Second, it's possible to roll back specific transactions (either literal transactions or
    merely changes to the data in the database) using third-party tools. Lumigent's Log
    Explorer product will let you peruse data changes, along with a whole host of
    information about those changes, including who made the change and what the value was
    before the change. From this information, the tool will allow you to restore specific
    values, in essence rolling back data modifications, even without the benefit of
    transactions.
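
    The first approach, application-managed transactions, can be sketched in a few lines.
    The example below uses Python's built-in sqlite3 module rather than SQL Server, purely
    so that it runs anywhere; the table, the account names and the transfer function are
    invented for illustration. A statement that fails partway through leaves the data
    exactly as it was when the transaction began.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('checking', 100), ('savings', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one transaction; roll back on any failure."""
    try:
        # The first UPDATE implicitly opens a transaction.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
        # An overdraft violates the CHECK constraint and raises sqlite3.IntegrityError.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # undo the credit that already succeeded
        return False
```

    Calling transfer(conn, "checking", "savings", 500) returns False and leaves both
    balances untouched, because the credit to savings is rolled back along with the failed
    debit.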

    Disaster Planning and Recovery

    Disaster planning must take into account the types of recovery you want and need to
    support. You need to have a written plan, and you need to test the plan to make sure it
    addresses the different facets of any restore process. Remember, you won't control when
    the process is needed. What you can control is how the process is done, what the
    expected outcome will be, and how these processes are supported up to the time the
    recovery efforts need to begin.

    What follows are some guidelines for thinking through your plan.

    HOW MUCH DATA CAN YOU AFFORD TO LOSE?

    As mentioned above, this is perhaps the most telling question you need to be sure you
    can answer. If you can't lose a single transaction or a single change, your disaster
    planning and recovery efforts will need to include fail-over systems. This means you'll
    be looking into clustering solutions, and you'll be working with hot stand-by systems
    and real-time replication and archival solutions. These tend to require rather large
    budgets, so depending on your budget, zero data loss may not be realistic.

    That said, and assuming that you're not looking into a clustered solution, you'll need
    to know how much data you have in the actual database(s) you're backing up, and you'll
    need to know what size the transaction logs reach as the database is used.

    One of the most common approaches to backups, and one which limits the data loss window
    to a maximum of one hour, is to back up the database nightly and the transaction logs
    hourly. Typically, you'll set up SQL Server to keep a specific number of days' worth of
    backups as an archive. When you set up this type of backup structure, you'll tell SQL
    Server: "Keep 14 days of backups, back up the database each morning at 3 AM, and back up
    the transaction logs every hour at all other times."

    Keep in mind that, if you're using this approach, you need disk space (or tape, if
    you're backing up directly to tape) equal to more than 14 times the size of your
    database, since you'll be keeping 14 archival copies in the queue. In addition, you need
    to plan for enough space to hold the 13 sets of transaction log dumps taken between
    those backups. The size of transaction log dumps varies wildly and is entirely dependent
    on the volume of information processed by SQL Server.
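
    The space arithmetic can be sketched as a small estimate. The function below is a
    back-of-the-envelope calculation, not a sizing tool: the function name, the average log
    dump size and the figure of 23 log dumps per day (hourly dumps at every hour except the
    3 AM database backup) are assumptions, and backups are assumed to be uncompressed and
    roughly the size of the database.

```python
def backup_space_needed(db_size_gb, retention_days, avg_log_dump_gb, log_dumps_per_day=23):
    """Estimate the disk space (GB) needed to retain full backups and log dumps."""
    full_backups = retention_days * db_size_gb
    log_dumps = retention_days * log_dumps_per_day * avg_log_dump_gb
    return full_backups + log_dumps


# A 20 GB database kept for 14 days, with log dumps averaging 0.5 GB each.
print(backup_space_needed(20, 14, 0.5))  # 441.0
```

    Because log dump sizes vary so much, it is worth rerunning an estimate like this with
    the largest dump you have actually observed, not only the average.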

    ABOUT TRANSACTION LOGS AND KEEPING HISTORICAL BACKUPS

    Many people make the mistake of thinking that as long as they have several days of
    backups, they can restore to any point in time during those several days. It can be a
    painful lesson to learn that this may not be the case, depending on your archive
    solution.

    Consider the following backup policy:

    - Nightly backups
    - Hourly transaction log dumps
    - Database backups are kept online for five days, then archived to a secondary source
    - Transaction logs are rotated to keep the most recent 24 hours available

    At first glance, this is great. You can recover to the last database backup, then apply
    the transaction logs to recover beyond that to the current state, or any time in
    between. If your system fails, and you recognize the failure within 24 hours of the last
    database backup, you're correct in saying that you're covered.

    Keep in mind, though, that if you may need to restore further back than that last
    database backup, you will be faced with data loss.

    This situation arises because you'll restore the database from three days ago (as an
    example), which would be available online. But if you follow the history configuration
    for the transaction logs, you'll find that the transaction logs are only available for
    the last 24 hours. This means you wouldn't be able to move forward beyond that
    three-day-old backup. You'd be restoring to that point and no further.

    Keep this in mind as you architect your recovery solution. You need to consider your
    transaction log rotation schedule in addition to your backup rotation schedule. It all
    goes back to two questions: how much data can you lose, and how far back do you need to
    be able to recover that data? If the answer is that you need to be able to restore to a
    point in time during that five-day window (from our example of five days of online
    backup storage), you'll also need to store five days of hourly transaction logs.
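
    The retention gap in this example can be checked mechanically. The sketch below asks
    whether a restore to a given target time is possible with the backups still on hand. The
    function name and its parameters are hypothetical, and the check is deliberately
    conservative: it requires the base database backup to fall inside the log retention
    window so that every log dump taken since that backup is still available.

```python
from datetime import datetime, timedelta


def can_restore_to(target, now, full_backup_times, log_retention):
    """Return True if the backups on hand allow a restore to `target`."""
    usable_fulls = [t for t in full_backup_times if t <= target]
    if not usable_fulls:
        return False
    base = max(usable_fulls)
    if target == base:
        return True  # the database backup alone is enough
    # Rolling forward needs every log dump taken since the base backup,
    # so the base backup must be no older than the log retention window.
    return now - base <= log_retention


now = datetime(2003, 9, 5, 12, 0)
nightly = [datetime(2003, 9, day, 3, 0) for day in range(1, 6)]
target = datetime(2003, 9, 3, 10, 0)
print(can_restore_to(target, now, nightly, timedelta(hours=24)))  # False
print(can_restore_to(target, now, nightly, timedelta(days=5)))    # True
```

    With 24 hours of logs, the database backup from two days earlier cannot be rolled
    forward; keeping five days of logs closes the gap.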

    OPTIMIZE AVAILABILITY

    When you're building out your plan, be sure to consider the impact on your users and
    those dependent on access to the database. If you're in a situation that requires access
    at all times (financial applications are an example of this), you'll want to look not
    only at a recovery plan, but also a failover plan.

    Failover will protect you in cases where a hard drive fails, or other instances where
    the server goes offline, taking your database systems with it. Failover typically
    includes clustered server capabilities, where you have more than one server working
    against a given set of data. If one server does fail, the other server is able to pick
    up where the failing server left off and the user experience is largely unaffected by
    the downtime.

    NOTE

    In a clustered environment, if a failover situation does occur, the application working
    against the database may need to be restarted to see the recovery server. Typically
    this is merely a restart of the application, or a reconnection to the web site or other
    resource working with your SQL Server. The important point here is that your recovery
    plan in a clustered environment should include several phases:

    1. Bring the applications back online against the recovery server(s).
    2. Take the server that is down and/or experiencing trouble offline.
    3. Correct the issue with the original server.
    4. Bring the original server back into the cluster to begin supporting the cluster again.

    On the other hand, if you don't need full access to the server all the time, you can
    work out your plan so you know exactly what you need to do to recover your system, how
    to get people working again in the shortest period of time, and how to address problems
    that may arise during that process.

    PLAN FOR THE FAILURE, DON'T FAIL TO PLAN

    Executing on your plans will be key. Below you'll find different things you'll need to
    consider and work through as you design your recovery plans.

    Backup Procedure Checks

    Are they working?
    - Check your scheduled tasks history entries.
    - Check the backup directory for the related database and transaction log dump files.

    Are they archiving appropriate numbers of past copies of the backups?
    - Check the directory for past copies of the database and transaction log dump files.
      If you're expecting a rotation of files, perhaps several days' worth or more, make
      sure they're in the directory.

    Are the transaction logs backing up on time?
    - Check the job history.
    - Check the directory that is used for the backups; make sure the transaction log dumps
      are there.
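
    The directory checks above lend themselves to a small script. The following is a
    minimal sketch, assuming that database backups end in .bak and log dumps end in .trn
    (a common convention, not something this paper specifies) and that a file's
    modification time reflects when the backup finished. The function name and its
    parameters are hypothetical.

```python
import time
from pathlib import Path


def stale_backup_kinds(backup_dir, max_age_hours, now=None):
    """Return the file suffixes whose newest non-empty file is missing or too old.

    `max_age_hours` maps a suffix such as ".bak" to the oldest acceptable age in hours.
    """
    now = time.time() if now is None else now
    stale = []
    for suffix, max_age in max_age_hours.items():
        # Ignore zero-byte files: they usually mean a backup started but never finished.
        newest = max(
            (p.stat().st_mtime for p in Path(backup_dir).glob(f"*{suffix}") if p.stat().st_size > 0),
            default=None,
        )
        if newest is None or now - newest > max_age * 3600:
            stale.append(suffix)
    return stale
```

    For the nightly and hourly schedule discussed earlier, a call such as
    stale_backup_kinds("D:/backups", {".bak": 24, ".trn": 1}) would return the kinds of
    backup that are overdue. The path is only an example. A check like this confirms that
    files exist and are recent; it does not prove they can be restored.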

    TIP

    When you review the backup file sizes, if you see that your transaction log dump files
    are rather large, you may want to consider shortening the time between transaction log
    backups. Remember, in the case of a restore, you'll be restoring the database, then the
    transaction logs to get caught up. If the transaction logs are large, this can mean that
    you are running a large number of transactions, which translates into losing a large
    number of transactions (those since the last transaction log backup) if a failure occurs
    between backup processes.

    If you're using SQL LiteSpeed, try running LiteSpeed with the debug option turned on.
    This will enable you to see the various messages as the backups are performed. You'll
    need to run the backups manually to be able to see these messages. Alternatively, you
    can have the output of the backup operations directed to a log file, external to SQL
    Server. You can then review this log file for any issues that may arise. For more
    information, read about the @logfile option with LiteSpeed.

    Perhaps the most important check is whether your backup files can be restored. It sounds
    silly, but a large number of people can attest that they thought they were successfully
    backing up and were protected from disaster. When it came time to recover and restore
    their files from backup, they found that they didn't know how (didn't know the
    commands), the backup files were either missing or corrupt, or they couldn't find the
    correct hardware/software combination to get the files back onto the server for
    restoration. (This last point pertains largely to tape backup systems.)

    Once you have your backup files, you need to make absolutely certain they are valid,
    that you know how to restore them, and that the restoration process is documented.
    Remember, if you're encrypting or password-protecting your backups, the password should
    be stored somewhere safe, but somewhere the right person knows how to get to it. If
    you're away on vacation and the system must be restored, there should be a procedure
    that can be followed to complete the restoration, complete with passwords.

    Keep in mind that just because you may not be taking vacations, this doesn't mean you
    don't need a plan. When things go wrong, the last thing you want to be doing is trying
    to remember the steps you need to follow to get your systems back online. Take the time
    now to write out the steps, then practice them.

    HERE ARE SOME POINTERS TO KEEP IN MIND FOR THE RESTORATION PROCESS PLANNING:

    - Have a written plan with steps to follow for the restoration and recovery process. One
      very important thought on this topic has surfaced given the recent mass power outage
      in New York City and the surrounding areas. Consider that, if you were the DBA, the
      phones and many transportation systems were out of commission, and you quickly see
      that you can't count on getting back to the office to address issues. While this is
      extreme, it does point out that whoever happens to be in the office at the time a
      critical issue arises needs to be able to address that issue. You need to have a
      written plan.

    - Try performing your restores against a second server. Make sure you know the process
      and that you've gone through the steps of restoring the database, checking user
      permissions, and applying transaction logs.

    - If you're working in a clustered environment, run through a test with a failed node.
      Note, of course, that unless you have an extra clustered environment this can be
      tricky relative to downtime. Make sure you have a planned maintenance window and that
      you're prepared for issues that may arise. While it will take some meticulous planning
      to avoid complications, all the planning and studying to understand the failover
      technologies will pay off, not just in the dry run, but in the real thing when the
      knowledge is needed most.

    Disk to Disk = Best Practices

    You have several options when considering the actual approach to backing up your system,
    especially as it relates to how you'll store the backups, how you make them available
    for restores, and how you archive those backups. Typically, you can expect your backups
    to be needed for a restoration process within a reasonably short time. This is because
    backups are used to recover a system after a system failure, not to go back in time to
    see data. This is an important distinction because you'll want to make sure your most
    recent backups are both the most protected and the most readily available.

    As a general rule of thumb, you'll find that disk-to-disk backup is a much better
    solution than tape-based alternatives when it comes to recovery options and processes.
    Some of the benefits of this approach include:

    - Speed: with no tape transfer process to work with, you can access your database and
      transaction log backups immediately, providing a much faster path to recovery.

    - Additional recovery options: you can use products like Lumigent's Log Explorer to work
      with the transaction logs, making transaction and specific data element recovery
      possible. This may be possible with tape backup, but would force a restore to your
      server or other location.

    - More reliable data storage medium: since you're backing up to disk, you stand a better
      chance of not having the media go bad for your backups. That said, of course, make
      sure you're backing up your backup devices, just in case. Keep in mind too that the
      "Acts of God" issues still remain. If you're backing up to the disk on the same server
      that has your SQL Server, or you're backing up to another server physically located
      near your SQL Server, you can still be in danger of not being able to recover from
      fire or other catastrophic disaster. For this reason, it's good to keep archive copies
      (perhaps weekly, for example) off-site as a last-step recovery mechanism.

    By backing up to disk, and keeping those backups online and available, you are able to
    use world-class tools to quickly provide recovery options. Time is of the essence when
    you're working to bring systems or data elements back online. Backing up to tape
    requires locating the tape and restoring it to your server, both of which take time and
    introduce variables that can stand in the way of a successful recovery.

    If given a choice, it's always a better solution to back up to disk.

    The table below shows some examples and recovery approaches you can employ with this
    type of system in place, based on the scenario you're facing.

    Scenario                          Recovery approach
    --------------------------------  ------------------------------------------------------
    Recover a database                Restore the database; restore the logs, in order,
                                      from the point in time of the last backup. The
                                      resulting system will include all updates up to the
                                      time of the most recent transaction log backup.

                                      If you only want to recover to a specific point in
                                      time, determine which log file occurs closest to the
                                      point in time before your target time period. Restore
                                      the database, then restore the log files up to that
                                      point.

    Recover a specific data           Using Log Explorer, you can review the transaction
    element change                    logs, locate the change that was in error and restore
                                      the data to the value prior to the change.

    Recover a dropped table           Restore your database and log files to a new,
                                      temporary database. From this database, you can copy
                                      the lost table back to the production database.

                                      Alternate solution: use Lumigent's Log Explorer
                                      product to recover the lost table. Recovery is
                                      possible for DROPped or TRUNCATEd tables, depending
                                      on your transaction logs.

    Being Prepared for Recovery: The Backup Process

    By utilizing disk-based backup procedures, you can optimize your responsiveness and
    available uptime to support the recovery methods you'll need. By using the right tools,
    you will have a full circle of options when it comes to restoring and recovering from
    system and database issues.

    How you back up your information is just as important as having the tools and knowledge
    available to recover your data. Backing up your data with tools or technologies that can
    become faulty or cause time delays in your recovery cycles is simply not good practice.

    A very significant tool you can use to optimize your system, on both the backup and
    recovery sides of the equation, is the SQL LiteSpeed product from DBassociatesIT. The
    product offers fast, non-CPU-intensive, encrypted and compressed backups. One objection
    to backing up to disk has been the amount of disk space required to support a solid
    recovery model. With LiteSpeed's compression technologies, you won't have to use
    third-party archive and compression utilities, and you can save drastically on the disk
    space you need to store and manage your database and transaction log backups.

    LiteSpeed runs just like the native backup routines in SQL Server, and its syntax is
    nearly identical to the native backup options, apart from a few new commands. In
    addition, you can address the security issues associated with traditional backups by
    encrypting your database and transaction log backups with true encryption that protects
    the whole of your backup set.

    To be best prepared, set up a backup server as the destination for your backups, and
    install a good amount of disk space on it. Don't store the backups on the same drive as
    your databases; that would leave you with no recovery path when the disk fails.

    Summary/Conclusion

    There is much to consider as you build out your backup, restore and recovery plans. It's
    more than the ability to simply restore your database; you need to manage the recovery
    options and make sure every option you may need is available to you.

    Be sure to write out your plan. Test the plan, practice the plan, and make sure others
    who may be in contact with the servers in your absence are also aware of and familiar
    with your plans. While restoration of a single point-in-time transaction isn't something
    you need to train everyone on, you should consider training on full system restores,
    transaction log restores and how to work with the backup media you use.

    Use third-party tools as appropriate to make sure your systems are both optimized and
    providing the highest level of functionality you need. Having too many options is just
    not possible when the users are screaming, the boss is sweating and you're in the hot
    seat to get things right again with your database server.

    IF YOU'RE INTERESTED IN MORE INFORMATION ON EITHER OF THE PRODUCTS MENTIONED, YOU CAN VISIT:

    Lumigent Technologies, Log Explorer, http://www.lumigent.com

    DBassociatesIT, SQL LiteSpeed, http://www.dbassociatesit.com

    About Stephen Wynkoop

    Stephen Wynkoop is the founder of The SQL Server Worldwide Users Group (www.sswug.org),
    where he writes a daily database column and newsletter, and a Microsoft SQL Server MVP.
    Stephen is a best-selling SQL Server author and a well-known speaker at technical
    conferences. Stephen first started working with SQL Server when it was first introduced
    in 1993 and has worked with SQL Server ever since. In addition, Stephen has authored
    online and offline columns, books, and other references on Office Development
    Technologies, web site design and deployment technologies and Microsoft Access. To
    contact Stephen, email [email protected].

    Copyright 2003 Lumigent Technologies, Inc. All rights reserved. Lumigent, the Lumigent
    logo and Log Explorer are registered trademarks or trademarks of Lumigent Technologies,
    Inc. All other names and marks are property of their respective owners.

    Lumigent Technologies, Inc.
    289 Great Road
    Acton, MA 01720 USA
    Toll Free 1 866-LUMIGENT (1 866-586-4436)
    Phone 1 978-206-3700
    E-mail [email protected]
    www.lumigent.com

    Lumigent Technologies, Inc. is the leader in enterprise data auditing solutions for
    organizations that need to reduce risk associated with regulatory compliance and the
    use of corporate data assets.