Disaster Recovery Run Book Template Provided by Xtium, Inc. © 2013 [YOUR LOGO HERE] Date of Last Update: MM/DD/YYYY DISASTER RECOVERY RUN BOOK

DISASTER RECOVERY RUN BOOK - James M. Reiss, CPAjimrcpa/images/Disaster Recovery Template from... · Disaster Recovery Run Book Template Provided by Xtium, Inc. © 2013 [YOUR LOGO


Disaster Recovery Run Book Template Provided by Xtium, Inc. © 2013

[YOUR LOGO HERE]

Date of Last Update:

MM/DD/YYYY

DISASTER RECOVERY RUN BOOK

  


DR Run Book Template provided by Xtium © 2013

Foreword By Laura DuBois, Program Vice President, Storage, IDC

The Disaster Recovery Imperative

Nearly all organizations today rely on information technology and the data it manages to operate. Keeping computers and networks running, and data accessible, is imperative. Without this information technology, customers cannot be serviced, orders taken, transactions completed, patients treated, and on and on.

Disasters that create IT downtime are numerous and common, spanning the physical and logical, the man-made and natural. Organizations must be resilient to these disasters, and able to operate in a disruption of any type, whether it is a security incident, human error, device failure, or power failure.

State of Preparedness

Most organizations know the importance of disaster recovery, and firms of all sizes are investing to drive greater uptime. An IDC study on business continuity and disaster recovery (DR) showed that unplanned events of most concern were power, telecom, and data center failures (physical infrastructure) – more so than natural events such as fire or weather. Security was considered the second most critical and extreme threat to business resiliency.

Seventy-one percent of those surveyed had as many as 10 hours of unplanned downtime over a 12-month period. This underscores the importance of greater uptime and DR, which is driving firms to conduct DR tests more frequently. Approximately one in four firms are conducting DR testing quarterly or monthly, while another 45% are testing semi-annually or annually.

This is a marked increase from previous research, which IDC conducted three years ago, where firms were testing annually at best. However, 25% of firms are still not doing any DR testing.

IDC Advice

DR planning is complex and spans three key areas: technology, people, and process. From an IT perspective, planning starts with a business impact analysis (BIA) by application/workload. Natural tiers or stages of DR begin at phase 1 – infrastructure (networking, AD, DHCP, etc.) – then extend to recovery by application tiers. Each application tier should have an established recovery time objective (RTO) and recovery point objective (RPO) based on business risk.

DR testing is essential not only to adequate recovery of systems and data, but also to uncovering events or conditions met during real disaster scenarios that were not previously accounted for. Examples include change management such as the needed reconfiguration of applications or systems. Also, the recovery of systems in the right sequence is important. To ensure that DR testing, planning, and recovery are organized and effective, many organizations use a disaster recovery "run book."

A DR run book is a working document, unique to every organization, which outlines the necessary steps to recover from a disaster or service interruption. It provides an instruction set for personnel in the event of a disaster, including both infrastructure and process information. Run books, or updates to run books, are the outputs of every DR test.

However, a run book is only useful if it is up-to-date. If documented properly, it can take the confusion and uncertainty out of the recovery environment which, during an actual disaster, is often in a state of panic. Using the run book template provided here by Xtium can make the difference for an organization between two extremes: being prepared for an unexpected event and efficiently recovering, or never recovering at all.


Your Disaster Recovery Run Book

A disaster recovery run book is a working document unique to every organization that outlines the necessary steps to recover from a disaster or service interruption.

Run books should be updated as part of your organization’s change management practice. For instance, once a production change has been committed, run book restoration instructions should be reviewed for accuracy. In addition to synchronizing run books with corporate change management, the outcomes and action plans of each DR test should also be incorporated into run book update cycles.

How to Use this Template

This template outlines the critical components of your disaster recovery and business continuity practices. Disaster recovery tests should be conducted and reviewed regularly, and plans updated afterward. Use this template as a guide for documenting your disaster recovery test efforts. It includes sections to specify contact information, roles and responsibilities, disaster scenarios likely to affect your business, and recovery priorities for your business’s IT assets. Keep in mind that your run book may need more sections depending on your deployment model; this template serves as a standard, with all its sections applicable (and necessary) to any disaster recovery testing procedure. Similarly, your run book may look different if you are working with a managed service provider that handles most or all aspects of your disaster recovery tests. If you have further questions about DR tests, take a look at our disaster recovery testing guide, available for free at xtium.com, or contact us for more information.


DR Scenarios

Though not part of the run book itself, we’re providing this section to list some common events that would cause DR scenarios. These threats are general and could affect any business, so you might also want to list those which would threaten your business specifically.

Research firm Forrester outlined some of the most common causes of disaster scenarios from a 2011/2012 study. The findings illuminate the fact that your business should not just be prepared for the news-making types of disaster threats (hurricanes or tornados, for example). Instead, consider all these potential causes for disaster:

Source: http://it.toolbox.com/blogs/managed-hosting-news/whats-your-2012-it-disaster-recovery-plan-49333


It is wise to also list disaster scenarios that are unique to, or are more likely to affect, your business. For each possibility, include details on the scenario, methods for data restoration on the part of the provider and your company, and procedures by which DR events will be initiated.

For example:

Scenario #1: List your first disaster scenario or business continuity threat here. Examples might include significant loss of hardware, a power outage of significant length, an infrastructure outage, disk corruption, or loss of most or all systems due to unavoidable natural disaster. Identify and address those disaster scenarios that are most relevant and likely to affect your business.

For each scenario, include:

- Overview of the associated scenario and systems most likely to be affected by the threat
- Time frame of potential outages, based on the likely elements of the specific scenario
- Systems that may be brought up locally via on-premise failover equipment or premise-based cloud enablement technology
- Procedures for initiation of system failover to external data centers
- Priority schedule for system restoration
- Procedures for contacting your hosting provider (if applicable) to initiate critical support

Continue listing disaster scenarios with all important details. Do not feel limited to only a few disaster recovery scenarios; list all those that could realistically impact your business along with the associated recovery procedures. The table below may be an effective tool for listing your potential DR scenarios:

Event | Plan of Action | Owner
Power failure | Enact affected system run book plans | Application business owner
Data center failure | Enact total failover plan | Disaster Recovery Coordinator (DRC)
Pending weather event (winter storm, hurricane, etc.) | Review all DR plans, notify DRC, put key employees on standby | Disaster Recovery Coordinator (DRC); Business Owner


Distribution List

This section is also critical to the development of your run book. You must keep a clearly defined distribution list for the run book, ensuring that all key stakeholders have access to the document. Use the chart below to indicate the stakeholders to whom this run book will be distributed.

Role | Name | Email | Phone
Owner | | |
Approver | | |
Auditor | | |
Contributor (Technical) | | |
Contributor (DBA) | | |
Contributor (Network) | | |
Contributor (Vendor) | | |

Location

Specify the location(s) where this document may be found in electronic and/or hard copy. You may wish to include it on your company’s shared drive or portal.

If located on a shared drive or company portal, consider providing a link here so the most recent version is readily accessible.

If this run book is also stored as a hard copy in one or multiple locations, list those locations here (along with who has access to those locations). We do recommend making your run book available outside of shared networks, as the document must be readily accessible at time of disaster in the event that primary systems like email are not accessible to employees. In other words, ensure your run book is accessible under any circumstances!


Table of Contents

Document Control
Contact Information
Data Center Access Control List
Communication Structure of Plan
Declaration Guidelines
Alert Response Procedures
Issue Management and Escalation
Changes to SOP During Recovery
Infrastructure Overview
Data Center
Network Layout Topology
Access to Facilities
Order of Restoration
System Configuration
Backup Configuration
Monitors
Roles and Responsibilities
Data Restoration Processes


Document Control

Document creation and edit records should be maintained by your company’s disaster recovery coordinator (DRC) or business continuity manager (BCM). If your organization does not have a DRC, consider creating that role to manage all future disaster recovery activities.

 

Document Name | Disaster Recovery Run Book for [Your Company’s Name Here]
Version |
Date created |
Date last modified |
Last modified by |

 

 

 

Document Change History

Version | Date | Description | Approval
V1.0 | 11/20/2010 | Initial version | Business Owner / DRC
V1.1 | 12/30/2010 | End of year DR test action plan updates to run book | Test Manager / DRC

 

 

 

Keep the most up-to-date information on your disaster recovery plan in this section, including the most recent dates your plan was accessed, used and modified. Keep a running log, with as many lines as necessary, on document changes and document reviews, as well.


Contact Information

This section will list your service provider’s contacts (if applicable) along with those from your IT department. This is the team that will conduct ongoing disaster recovery operations and respond in the case of a true emergency. Specific roles listed below are examples of those that might comprise your team.

All of these roles need to be in communication when in a disaster recovery mode of operation. For pending events, this same distribution list should be used to provide advance notice of potential incidents. Customer support teams should also not be overlooked, as they are the first line of communication to your customer base. Forgetting this step will create extra work for your primary recovery team, who will have to take time to explain what is going on.

Your company’s contacts

Name | Title | Phone | Email
Name | Disaster Recovery Coordinator | Primary phone / Secondary phone | Email
Name | Chief Information Officer | Primary phone / Secondary phone | Email
Name | Network Systems Administrator | Primary phone / Secondary phone | Email
Name | Database Systems Administrator | Primary phone / Secondary phone | Email
Name | Chief Security Officer | Primary phone / Secondary phone | Email
Name | Chief Technology Officer | Primary phone / Secondary phone | Email
Name | Business Owner | Primary phone / Secondary phone | Email
Name | Application Development Lead (as applicable) | Primary phone / Secondary phone | Email
Name | Data Center Manager | Primary phone / Secondary phone | Email
Name | Customer Support Manager | Primary phone / Secondary phone | Email
Name | Call Center Manager | Primary phone | Email


Service Provider Contacts (if applicable)

Name | Role | Phone | Email
Name | Disaster Recovery Coordinator | Primary phone / Secondary phone | Email
Name | Customer Service | Primary phone / Secondary phone | Email
Name | Emergency Support | Primary phone / Secondary phone | Email
Name | Sr. System Engineer | Primary phone / Secondary phone | Email
Name | Director – Service Delivery | Primary phone / Secondary phone | Email

If you are working with a service provider, this position might be alternately filled with an account or test manager.


Data Center Access Control List

Maintain an up-to-date access control list (ACL) specifying who, in both your company and your IT service provider (if applicable), has access to your data center and resources therein.

Also specify which individuals can introduce guests to the data center. This will be useful for determining, in the event of an emergency scenario, who may be designated a point person for facilitating access to critical infrastructure. During a recovery event your primary operations team is going to be busy recovering systems, so be sure you know whom to contact and how to gain access to your data center.

Examples are provided in the table below. Remove, replace and add individuals to this list as appropriate for your organization and infrastructure.

Name | Role | Contact Info | Access level
Name | Chief Technology Officer | Phone, Email | General access; can authorize guest access
Name | Director of Service Delivery | Phone, Email | General access; can authorize guest access
Name | Service Delivery Engineer | Phone, Email | Server room access, cage/cabinet, NOC access; cannot authorize guest access
Name | Systems Engineer | Phone, Email | Server room access; cannot authorize guest access
Name | Network Engineer | Phone, Email | Server room access; cannot authorize guest access
Name | Chief Security Officer | Phone, Email | General access; can authorize guest access
Name | Chief Information Officer | Phone, Email | General access; can authorize guest access
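If you also keep the access control list in a machine-readable form, questions such as "who can authorize a guest?" can be answered quickly during a recovery event. The following is a minimal Python sketch under stated assumptions: the `AccessEntry` type, its field names, and the area labels are hypothetical and not part of this template; only the roles and access levels are taken from the example table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEntry:
    """One row of the access control list; field names are illustrative."""
    role: str
    areas: frozenset
    can_authorize_guests: bool


# Roles and access levels copied from the example table. Names and contact
# details are omitted because the template leaves them as placeholders.
ACL = [
    AccessEntry("Chief Technology Officer", frozenset({"general"}), True),
    AccessEntry("Director of Service Delivery", frozenset({"general"}), True),
    AccessEntry(
        "Service Delivery Engineer",
        frozenset({"server room", "cage/cabinet", "NOC"}),
        False,
    ),
    AccessEntry("Systems Engineer", frozenset({"server room"}), False),
    AccessEntry("Network Engineer", frozenset({"server room"}), False),
    AccessEntry("Chief Security Officer", frozenset({"general"}), True),
    AccessEntry("Chief Information Officer", frozenset({"general"}), True),
]


def guest_sponsors(acl):
    """Return the roles that may authorize guest access."""
    return [entry.role for entry in acl if entry.can_authorize_guests]


def roles_with_access(acl, area):
    """Return the roles explicitly listed for an area.

    Whether "general" access implies a specific area is organization
    specific, so this sketch does not assume it.
    """
    return [entry.role for entry in acl if area in entry.areas]
```

Replace the example roles with your own entries and keep this data in step with the table above whenever access rights change.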

 


Communication Structure of Plan

Disaster event call tree

During any disaster event there should be a defined call tree specifying the exact roles and procedures for each member of your IT organization to communicate with key stakeholders (both inside and outside of your company). When defining the call structure, limit your tree and branches to a 1:10 ratio of caller to call recipient.

As a first step, for example, your Disaster Recovery Coordinator might call both the company CEO and head of operations, both of whom would then inform the appropriate contacts in their teams along with key customers, service providers, and other stakeholders responsible for correcting the service outage and restoring data and operations.

An example call tree might appear as follows:

 

 

 


And, for the situation written above, your general progression of calls might be as follows:

Disaster Recovery Coordinator
  Head of Operations
    Director of Service Delivery
    Sr. Systems Engineer
    Network Engineer
    Systems Administrator
  CEO
    Director of Business Development
    Sales contact
    PR Representative
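A call progression like this one can also be kept as data, which makes it easy to print the order of calls and to check the 1:10 caller-to-recipient guideline. The Python sketch below is illustrative only: the exact nesting of roles in `CALL_TREE` is an assumption made for the example (the original diagram is not reproduced here), and the function names are hypothetical.

```python
from collections import deque

# Hypothetical call tree: each caller maps to the people they phone.
# The nesting is an assumption for illustration, not the template's diagram.
CALL_TREE = {
    "Disaster Recovery Coordinator": ["Head of Operations", "CEO"],
    "Head of Operations": ["Director of Service Delivery"],
    "Director of Service Delivery": [
        "Sr. Systems Engineer",
        "Network Engineer",
        "Systems Administrator",
    ],
    "CEO": ["Director of Business Development", "PR Representative"],
    "Director of Business Development": ["Sales contact"],
}

MAX_RECIPIENTS_PER_CALLER = 10  # the 1:10 ratio recommended in this section


def call_order(tree, root):
    """Return (caller, recipient) pairs in breadth-first order from root."""
    calls = []
    queue = deque([root])
    seen = {root}
    while queue:
        caller = queue.popleft()
        for recipient in tree.get(caller, []):
            if recipient in seen:
                continue  # never schedule the same person twice
            seen.add(recipient)
            calls.append((caller, recipient))
            queue.append(recipient)
    return calls


def overloaded_callers(tree, limit=MAX_RECIPIENTS_PER_CALLER):
    """Return callers whose list of recipients exceeds the limit."""
    return [caller for caller, recipients in tree.items() if len(recipients) > limit]
```

Calling `call_order(CALL_TREE, "Disaster Recovery Coordinator")` lists the top-level calls first, so the people who must fan the message out further are reached before anyone else.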


Declaration Guidelines

As you create your run book, you must consider guidelines for declaring a disaster scenario. Guidelines that we recommend are specified in the chart below:

Situation | Action | Owner
No workaround exists within a time frame that avoids affecting customer SLAs | Declare application level failover and enact failover to secondary site |
Restoration procedures cannot be completed in your production environment | Declare application level failover and enact failover to secondary site |
A production environment no longer exists or is unable to be accessed | Declare a data center failure and enact a total failover plan from primary to secondary data center |
Service provider issues cannot be resolved | Notify service provider and have them enact DR plans |
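The guideline chart can be read as a small decision procedure. The Python sketch below is one possible encoding, not a prescribed method: the `IncidentState` flags are hypothetical names, and the rule that a data center failure takes the place of an application level failover is an assumption made for the example.

```python
from dataclasses import dataclass

APP_FAILOVER = "Declare application level failover and enact failover to secondary site"
DC_FAILOVER = (
    "Declare a data center failure and enact a total failover plan "
    "from primary to secondary data center"
)
NOTIFY_PROVIDER = "Notify service provider and have them enact DR plans"


@dataclass
class IncidentState:
    """Hypothetical flags a DRC might record while assessing an incident."""
    workaround_within_sla: bool
    restorable_in_production: bool
    production_accessible: bool
    provider_issue_unresolved: bool = False


def declaration_actions(state):
    """Return the guideline actions that apply to the given state."""
    actions = []
    if not state.production_accessible:
        # Assumption: a total failover replaces application level failover.
        actions.append(DC_FAILOVER)
    elif not state.workaround_within_sla or not state.restorable_in_production:
        actions.append(APP_FAILOVER)
    if state.provider_issue_unresolved:
        actions.append(NOTIFY_PROVIDER)
    return actions
```

An empty result means none of the declaration criteria are met yet, which leaves room for troubleshooting before a disaster is declared.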

The use of technology can be incorporated into the declaration steps of a DR plan. Be sure not to declare on the first instance of an event unless it is completely understood that secondary instances of the event will result in increased damage to your customer or your business systems. The table below details some standard practices to use in order to mitigate premature declarations. SLAs should be built in a manner that allows for some troubleshooting and system restoration prior to the need to declare a disaster.

Also use this section to outline standard monitoring procedures along with associated thresholds. List all system monitors, what they do, their associated thresholds, associated alerts when those thresholds are met or exceeded, the individual(s) who receive the alerts, and the remediation steps for each monitor.

List event monitoring standards by defining thresholds for event types, durations, corrective actions to be taken once the threshold is met, and event criticality level. Use the following chart (or a derivative thereof for your monitoring standards) to specify event monitoring standards.


The first few rows have been filled in with examples:

 

Event Type | Duration of Event | Corrective Action | Event criticality
Performance Monitoring Status = Warning Alert Level | > 2 minutes | Isolate problem device / recycle device | Critical Level
Memory Usage > 80% | > 5 minutes | Isolate physical device / virtual machine; configure memory pool increase; clear memory cache; clear memory buffer | Critical Level
CPU Usage > 90% | > 3 minutes | Increase compute allocation (virtual); add additional compute resources into application pool | Critical Level
Memory | > 15 minutes | Check memory queue; clear memory cache of affected system; increase memory allocation (virtual) |
Storage | | |
Network | | |
Ping Check | | |
IP Check | | |

These event types (memory, storage, network, ping check and IP check) are categories of events for which you should list specific examples in this chart.
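To show how a threshold and a duration combine into an alert, here is a minimal Python sketch that encodes the memory and CPU example rows. The `MonitorRule` type, the function names, and the way duration is approximated (number of consecutive over-threshold samples times the sampling interval) are assumptions for illustration; your monitoring tool will have its own rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorRule:
    metric: str
    threshold_pct: float
    min_duration_s: int  # the event must last longer than this
    criticality: str
    corrective_actions: tuple


# Values taken from the example rows of the chart above.
RULES = (
    MonitorRule(
        "memory", 80.0, 5 * 60, "Critical",
        (
            "Isolate physical device / virtual machine",
            "Configure memory pool increase",
            "Clear memory cache",
            "Clear memory buffer",
        ),
    ),
    MonitorRule(
        "cpu", 90.0, 3 * 60, "Critical",
        (
            "Increase compute allocation (virtual)",
            "Add additional compute resources into application pool",
        ),
    ),
)


def sustained_seconds(samples, threshold_pct, interval_s):
    """Approximate how long the most recent samples have exceeded the threshold."""
    run = 0
    for value in reversed(samples):  # newest sample is last in the list
        if value <= threshold_pct:
            break
        run += 1
    return run * interval_s


def breached_rules(metric, samples, interval_s, rules=RULES):
    """Return the rules for a metric whose threshold and duration are both exceeded."""
    return [
        rule
        for rule in rules
        if rule.metric == metric
        and sustained_seconds(samples, rule.threshold_pct, interval_s) > rule.min_duration_s
    ]
```

With one sample per minute, four consecutive CPU readings above 90% count as 240 seconds and trigger the CPU rule, while three readings (180 seconds) do not, because the chart requires more than 3 minutes.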


Alert Response Procedures

List out your step-by-step procedures for responding to service issue alerts in this section. As an example, Xtium’s ticket submission and response procedures follow this general outline:

 

Service interruption identified > Service Delivery Manager contacted

1. Ticket is opened with support team (either in-house or third party provider’s ticket creation system).
2. Contact key stakeholders to ensure they are aware of the alert and determine if any current activity or recent changes may be responsible for the service interruption.
3. Verify that alert is legitimate and not an isolated single user issue or monitoring time out.
4. Notify end users of ticket creation.
5. Contact the appropriate member(s) of your operations or engineering teams to notify them of the alert and assign investigation and data restoration procedures.


Issue Management and Escalation

This section should list detailed procedures for issue management and escalation, when necessary, in the case of an unmet service objective.

Escalation procedures will vary by levels of operation and severity of the associated activities. At Xtium, for example, we categorize standard operating procedure interruptions in five levels (5 being the lowest severity, 1 the highest). Of course, these can and will differ among organizations. The following serves only as an example:

1 – Fatal – Functionality has ceased completely with no known workaround for all users. Impact is highest.

2 – Critical – Functionality is critically impaired but still operational for some users. Impact is high.

3 – Serious – Functionality is impaired but workarounds still exist for all users. Impact is moderate.

4 – Minor – Some functionality is impaired but there is a reasonable workaround for some users. Impact is low.

5 – Request – This is an enhancement-related service request that does not at all impact current operations or functionality.

Depending on the severity of the service interruption, your escalation procedures will vary by parties involved, response chain, response time and target resolution.
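If your ticketing or alerting scripts need to reason about these levels, an ordered enumeration keeps the "lower number is more severe" convention explicit. The Python sketch below mirrors the five example levels; the names `Severity`, `IMPACT`, `is_true_emergency`, and `most_severe` are hypothetical, and treating levels 1 and 2 as true emergencies follows this template's description of critical and fatal interruptions rather than any universal rule.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Example levels from this section; 1 is the most severe."""
    FATAL = 1
    CRITICAL = 2
    SERIOUS = 3
    MINOR = 4
    REQUEST = 5


# Impact wording taken from the level descriptions above.
IMPACT = {
    Severity.FATAL: "highest",
    Severity.CRITICAL: "high",
    Severity.SERIOUS: "moderate",
    Severity.MINOR: "low",
    Severity.REQUEST: "none",
}


def is_true_emergency(severity):
    """Fatal and critical interruptions are treated as true emergencies."""
    return severity <= Severity.CRITICAL


def most_severe(severities):
    """Return the most severe level among open issues."""
    return min(severities)
```

Because the levels are integers, sorting open tickets by `Severity` puts the most urgent work first without extra mapping code.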


Changes to SOP During Recovery

Recovery events require that data and business process restoration take priority. At times, other non-critical standard operating procedures (SOPs) must be suspended.

During a recovery event, recovery operations should take precedence over inbound queries or tickets. Monitors and alerts should also be reviewed for suspension until recovery is complete. This is a best practice procedure to avoid flooding your network operations center (NOC) and support teams with bogus or bad alarms.

Change management policies should also be altered to expedite recovery procedures. For example, adding a new server or firewall rule in a standard environment might take one day once all necessary reviews and permissions are met. But during recovery operations, a standard firewall change should be expedited to support recovery operations.

Ticketing of work during recovery operations should be reviewed to ensure the necessity of any requested tasks. Non-critical tickets should be deferred and addressed once recovery procedures are complete.

Remember, the number one rule in recovery is: Recover! Get things back up and running whether in a workaround, failover or full restore state.

That in mind, use this section to identify which standard operating procedures will be suspended in the event of a true emergency scenario (one that would fall under your critical or fatal service interruption classifications). List out specifications for change management, monitors and alerts, and problem and issue resolution during recovery procedures. Certain non-critical standard operating procedures may be suspended, such as in the following situation:

A user submits a call/ticket to your service desk stating they cannot access the company website. This ticket would be responded to with a message that your organization is currently in a recovery operations cycle and your service ticket will be addressed as soon as technicians have completed the restoration work.


System Level Procedures

Your run book content, up to this point, has addressed organizational points of concern. At this stage in your run book you should have fully documented procedures in your company for issue management and escalation, criteria for evaluating and declaring an emergency scenario, and procedures for ensuring all key stakeholders and responsible parties are in communication and are ready and able to take the necessary steps to begin disaster recovery procedures.

From this point forward, the run book will shift focus to system level procedures to address infrastructure and network level configurations, restoration steps, and system level responsibilities while in disaster recovery mode.


Infrastructure Overview

 

Provide a detailed overview of your IT environment in this section, including the location(s) of all data center(s), nature of use of those facilities (e.g. colocation, tape storage, cloud hosting), security features of your infrastructure and the hosting facilities, and procedures for access to those facilities.

Data center

Specify the location of all facilities in which your company’s data is stored. Include an address and directions to each location.

Example – Simple data center diagram:

Source: http://www.storageguardian.com/media/network_diagram.gif

A data center diagram needs to be detailed enough to give your backup recovery team members the information they need to perform their responsibilities if called upon.


Example – Detailed data center diagram:

Source: http://www.routereflector.com/en/2013/05/data-center-topology-with-cisco-nexus-hp-virtual-connect-and-vmware/


Network Layout Topology

Source: http://sanketshukla.blogspot.com/2009/11/dhs-network-topology-diagram.html

Access to Facilities

Data centers and colocation facilities typically maintain strict entry protocol. Certain members of your organization will typically hold the appropriate credentials to enter the facility. Detail members of your team (and/or your IT service provider’s team) who have access to all data facilities along with any requirements for access.


Order of Restoration

This section will include instructions for recovery personnel to follow that lay out which infrastructure components to restore and in which order. It should take into account application dependencies, authentication, middleware, database and third party elements and list restoration items by system or application type.

Ensure that this order of restoration is understood before engaging in restore work. An example is provided below. The rest of the table should be filled out in the exact order that restoration procedures are to be completed.

Order of Restoration Table:

Server Name | Server Role | Order of Restoration | OS / Patch level | Application loaded
Ws12_VF1 | Web Server Valley Forge 1 | Restore prior to db12_VF1 startup | ESX4.1 | Apache
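When the list of systems grows, a restoration order that respects dependencies can be derived rather than maintained by hand. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) and is only an illustration: the `RESTORE_AFTER` mapping and the `restoration_order` function are hypothetical, and the single dependency shown is the one from the example row (Ws12_VF1 restored prior to db12_VF1 startup).

```python
from graphlib import CycleError, TopologicalSorter

# Maps each server to the set of servers that must be restored before it.
# Only the dependency from the example row is shown; add your own systems.
RESTORE_AFTER = {
    "Ws12_VF1": set(),
    "db12_VF1": {"Ws12_VF1"},
}


def restoration_order(dependencies):
    """Return server names in an order that satisfies every dependency.

    Raises graphlib.CycleError if the dependencies contradict each other.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

A `CycleError` here is useful in its own right: it signals that two entries in the table each claim to come before the other, which is worth resolving before a real recovery event.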


System Configuration

This section should include system- and application-specific topology diagrams and an inventory of elements that comprise your overall system. Include networking, web app middleware, database and storage elements, along with third party systems that connect to and share data with this system.

Network table:

Device type | Name | Primary IP | OS level | Gateway | Subnet Mask
Firewall | | | | |
Load balancer | | | | |
Switch | | | | |
Router | | | | |

Server table:

Server Name/Priority | OS Patch | IP Address | Sub | Gateway | DNS | Alternate DNS | Secondary IPs | Production MAC Address

You should lay out each of your systems separately and include a table for your network, server layout and storage layout.
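Inventory tables like these are easy to mistype, so a quick consistency check can be worthwhile before the run book is relied upon. The sketch below assumes the network table has been transcribed into a list of dictionaries and checks that each gateway falls inside the subnet implied by the primary IP and subnet mask. The device names and addresses are hypothetical (documentation-range addresses), not values from this template.

```python
import ipaddress


def gateway_in_subnet(primary_ip, subnet_mask, gateway):
    """Return True if the gateway lies inside the device's own subnet."""
    network = ipaddress.ip_network(f"{primary_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(gateway) in network


# Hypothetical rows transcribed from a filled-in network table.
devices = [
    {"name": "fw-example", "primary_ip": "192.0.2.10",
     "subnet_mask": "255.255.255.0", "gateway": "192.0.2.1"},
    {"name": "sw-example", "primary_ip": "192.0.2.20",
     "subnet_mask": "255.255.255.0", "gateway": "198.51.100.1"},
]

# Collect the names of rows whose gateway does not match their subnet.
suspect = [
    d["name"] for d in devices
    if not gateway_in_subnet(d["primary_ip"], d["subnet_mask"], d["gateway"])
]
print("Rows to double-check:", suspect)
```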


Storage table:

Name | LUN | Address | RAID configuration | Host name


Backup Configuration

Use this section to list instructions specifying the servers, directories and files from (and to) which backup procedures will be run. This should be the location of your last known good copy of production data.

Server | Software Version | Backup Cycle | Backup Source | Backup Target
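Because this table identifies the last known good copy of production data, it can help to confirm that each listed backup is actually recent relative to its cycle. The following is a rough sketch, assuming the table rows are held as records. The `last_backup` field, the server names and the paths are hypothetical additions for illustration and are not part of the template.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupEntry:
    server: str
    software_version: str
    backup_cycle: timedelta
    backup_source: str
    backup_target: str
    last_backup: datetime  # hypothetical field, not a column in the template


def is_stale(entry, now):
    """Return True if the newest backup is older than one backup cycle."""
    return now - entry.last_backup > entry.backup_cycle


now = datetime(2024, 1, 8)
entries = [
    BackupEntry("payroll-example", "1.0", timedelta(days=1),
                "/data/payroll", "/backups/payroll", datetime(2024, 1, 7)),
    BackupEntry("files-example", "1.0", timedelta(days=1),
                "/data/files", "/backups/files", datetime(2024, 1, 1)),
]

stale = [e.server for e in entries if is_stale(e, now)]
print("Backups older than their cycle:", stale)
```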


Monitors

List your monitors by server, and be sure that they are put in place and activated as part of your restore activities. Restoring from a disaster should result in a mirror of your production environment (even if scaled down). Monitors and alerts are a critical element of your production system.

Server name | Monitor | Cycle | Alert
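One way to confirm that the restored environment mirrors production is to compare the monitors listed in the run book against those actually active after the restore. This is a minimal sketch under the assumption that both lists are available as plain dictionaries keyed by server name. The server and monitor names are hypothetical.

```python
def missing_monitors(expected, active):
    """Return, per server, the run book monitors not found among active ones."""
    gaps = {}
    for server, monitors in expected.items():
        absent = sorted(set(monitors) - set(active.get(server, ())))
        if absent:
            gaps[server] = absent
    return gaps


# Hypothetical data: what the run book lists vs. what is active after restore.
expected = {
    "web-example": ["http-check", "disk-space"],
    "db-example": ["db-ping"],
}
active = {
    "web-example": ["http-check"],
}

gaps = missing_monitors(expected, active)
print(gaps)
```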


Roles and Responsibilities

Service Delivery Responsibility Assignment Matrix

Table Key

Code | Description
R | Responsible Party: Those who do the work to achieve the task
A | Accountable Party: The party ultimately answerable for the correct and thorough completion of the deliverable or task, and the one who delegates the work to the responsible party
C | Consulted Party: Those whose opinions are sought, typically subject matter experts, and with whom there is two-way communication
I | Informed Party: Those who are kept up to date on progress, often only on completion of the task or deliverable, and with whom there is just one-way communication

Positions that will fill these roles and responsibilities will often include your DR coordinator, network engineer, database engineer, systems engineer, application owner, data center service coordinator, and your service provider. Identify the responsibilities of each of these roles in a disaster event, then map them onto a matrix of all activities associated with recovery procedures, as in the example table provided below.

This matrix describes the participation by various roles to complete DR tasks or deliverables. It clarifies roles and responsibilities for IT stakeholders in your organization as well as any service providers involved with your business’ disaster recovery program. Fill in the matrix below, specifying the roles for your company, your service provider (if applicable) and any other third parties that will be involved in your disaster recovery tests.

Activity | Responsible Parties: R | A | C | I
Maintain situational management of recovery events | DRC | DRC | DRC | All
React to server outage alerts | | | |
React to file system alerts | | | |
React to host outage alerts | | | |
React to network outage alerts | | | |
Document technical landscape | | | |
Configure network for system access | | | |
Configure VPN and acceleration between your business and service provider network (if applicable) | | | |
Maintain DNS or host file | | | |
Monitor service provider network availability (if applicable) | | | |
Diagnose service provider network errors (if applicable) | | | |
Create named users at OS level | | | |
Create domain users | | | |
Manage OS privileges | | | |
Create virtual machines | | | |
Convert physical servers to virtual servers | | | |
Install base operating system | | | |
Configure operating system | | | |
Configure OS disks | | | |
Diagnose OS errors | | | |
Start/Stop the virtual machine | | | |
Windows OS licensing (or your operating system) | | | |
Security hardening of the OS | | | |
Daily server level backup | | | |
Patch management for Windows servers (or your operating system) | | | |
Provide a project manager | | | |
Provide a key technical contact for OS, network, and SAN | | | |
Coordinate deployment schedule | | | |
Support, management and update of Protection Software | | | |
Install, support, management and update of Terminal Server | | | |
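A filled-in matrix can be checked for gaps before it is needed in an emergency. The sketch below assumes the matrix is held as a dictionary of activities and applies one common RACI convention: each activity should have at least one Responsible party and exactly one Accountable party. That convention is an assumption here, not something this template states, so adjust it to your own rules.

```python
def raci_problems(matrix):
    """Return a list of human-readable gaps found in a RACI matrix."""
    problems = []
    for activity, roles in matrix.items():
        if not roles.get("R"):
            problems.append(f"{activity}: no Responsible party")
        # Assumed convention: exactly one Accountable party per activity.
        if len(roles.get("A", [])) != 1:
            problems.append(f"{activity}: expected exactly one Accountable party")
    return problems


# First row is the template's example; the second is still blank.
matrix = {
    "Maintain situational management of recovery events":
        {"R": ["DRC"], "A": ["DRC"], "C": ["DRC"], "I": ["All"]},
    "React to server outage alerts":
        {"R": [], "A": [], "C": [], "I": []},
}

problems = raci_problems(matrix)
for problem in problems:
    print(problem)
```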


Data Restoration Processes

Use this section to outline the steps necessary to respond to outage alerts and, subsequently, restore data from backup records. Include your order of backup operations in this section, including data dependencies (based on organization of your data backups) and troubleshooting steps.

These processes will be followed whenever a data recovery is necessary, including scenarios in which systems are still running but data must be restored from backup, restoring data after a disaster event, or restoring from a backup volume.

In this section you should identify the order of operations for a data restore, the location of your backup, and step-by-step procedures to re-establish your data volumes into your production environment.

 

Restoration Procedures

Though your order of operations should stay relatively consistent, list steps taken for each and every backup system. For example:

Payroll system backup:

System “XYZ” – Payroll

1. Start Db server – vm2345-qa1
2. Start Application server – vm354-r1
3. Start web server – vm6_ws4
4. Terminal server to Ws1_Vf1_Payroll
5. Login to backup archive – url: backup.archive.payroll
6. Create temp target folder for backup file
7. Login: user1
8. Password: 1resu
9. Navigate to most recent backup file
10. Select file
11. Select restore target Ws1_Vf1_PayrollProd1
12. Initiate restore
13. Select overwrite options
14. Confirm dialog box warning “Are you sure?”
15. Complete restore backup file
16. Login to Ws1_Vf1_PayrollProd1
17. Start Payroll App local\temp\dirs\payrollprod1.exe
18. Navigate via Explorer to temp backup folder
19. Select file
20. Open payrollprod application console
21. Select data source > temp\backup\payrollWs1bckup
22. Import
23. Validate through report test 1 run

This is only an example of what the procedures for one system restoration may look like. For each of your actual systems, similarly list step-by-step instructions for a full system restore from backup.
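Because each step in a procedure like the payroll example depends on the one before it, it can be useful to track the procedure as an ordered checklist that halts at the first failed step. The following is a minimal sketch, not a real automation of the payroll restore: the step descriptions are taken from the example, while the actions are hypothetical placeholders that simply report success or failure.

```python
def run_steps(steps):
    """Run (description, action) pairs in order; stop at the first failure.

    Returns the list of completed descriptions and the description of the
    step that failed, or None if every step succeeded.
    """
    completed = []
    for description, action in steps:
        if not action():
            return completed, description
        completed.append(description)
    return completed, None


# Hypothetical placeholder actions; a real run book step would be performed
# or confirmed by recovery personnel.
steps = [
    ("Start Db server", lambda: True),
    ("Start Application server", lambda: True),
    ("Start web server", lambda: False),  # simulated failure
    ("Initiate restore", lambda: True),
]

completed, failed_at = run_steps(steps)
print("Completed:", completed)
print("Stopped at:", failed_at)
```

Stopping at the first failure keeps later steps, such as the restore itself, from running against servers that never came up.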

 

   

Use the rest of this section to similarly list restoration procedures for each of your backup systems.



Have questions about this run book or disaster recovery for your business?

Contact Us!

[email protected]

800-707-9116