Principles of Incident Response and Disaster Recovery Chapter 6 Contingency Strategies for Business Resumption Planning


Principles of Incident Response and Disaster Recovery

Chapter 6: Contingency Strategies for Business Resumption Planning


Objectives

• Know and understand the relationships between the overall use of contingency planning and the subordinate elements of incident response, business resumption, disaster recovery, and business continuity planning

• Become familiar with the techniques used for data and application backup and recovery

• Know the strategies employed for resumption of critical business processes at alternate and recovered sites


Introduction

• Contingency planning addresses everything done by an organization to prepare for the unexpected

• IR process focuses on detecting, evaluating, and reacting to an incident

• Later phases focus on keeping the business functioning even if the physical plant is destroyed or unavailable

• Business resumption (BR) plan: takes over when the IR process cannot contain and resolve an incident


Introduction (continued)

• Business resumption (BR) plan major elements:

– Disaster recovery (DR) plan: lists and describes the efforts to resume normal operations at the primary places of business

– Business continuity (BC) plan: contains steps for implementing critical business functions using alternative mechanisms until normal operations can be resumed at the primary site or elsewhere

• Primary site: location(s) at which the organization executes its functions

• BC plan operates concurrently with the DR plan when damage is major or long-term


Introduction (continued)

• Each component of CP (IRP, DRP, and BCP) comes into play at specific times in the life of an event

• 5 key procedural mechanisms for restoring critical information and facilitating continuation of operations:

– Delayed protection

– Real-time protection

– Server recovery

– Application recovery

– Site recovery

Page 8: Principles of Incident Response and Disaster Recovery Chapter 6 Contingency Strategies for Business Resumption Planning

Principles of Incident Response and Disaster Recovery 8

Data and Application Resumption

• Backup methods must be used according to an established policy:

– How often to back up

– How long to retain the backups

– What must be backed up

• Data files and critical system files should be backed up daily, with one copy on-site and one copy off-site

• Nonessential files should be backed up weekly

• Full backups: keep at least one copy in a secure location off-site


Disk-to-Disk-to-Tape: Delayed Protection

• Decreasing costs of storage media, especially hard drives and removable drives, make it practical to avoid the time-consuming nature of tape backup

• Storage area networks provide online backups

• Because the online and backup versions can both fail or be attacked, tape backup is still required periodically

• Disk-to-disk initial copies are efficient and can run simultaneously with other processes

• Secondary disk-to-tape copies do not affect production processing


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Types of backups:

– Full backup

– Differential backup

– Incremental backup

• Full backup:

– Includes entire system, including applications, OS components, and data

– Pro: provides a comprehensive snapshot

– Con: requires large media; time consuming


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Differential backup:

– Includes all files that have changed or been added since the last full backup

– Pro: faster and uses less storage space than a full backup; only one differential backup, plus the last full backup, is needed to restore

– Con: gets larger each day and takes longer; one corrupt file loses everything

• Incremental backup:

– Includes only files that were modified that day

– Pro: requires less space and time than the differential

– Con: multiple incremental backups are required to restore from the last full backup
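The restore trade-off between differential and incremental backups can be sketched in a few lines. This is a minimal illustration, not a backup tool: the `Backup` record and the `restore_chain` function are hypothetical names, and the sketch assumes at least one full backup exists in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup was taken
    kind: str  # "full", "differential", or "incremental"


def restore_chain(backups: list[Backup]) -> list[Backup]:
    """Return the backups needed for a restore, oldest first."""
    ordered = sorted(backups, key=lambda b: b.day)
    # Every restore starts from the most recent full backup
    # (max() raises ValueError if there is no full backup at all).
    last_full = max(i for i, b in enumerate(ordered) if b.kind == "full")
    chain = [ordered[last_full]]
    for backup in ordered[last_full + 1:]:
        if backup.kind == "differential":
            # A differential holds every change since the full backup,
            # so it replaces anything applied after the full backup.
            chain = [ordered[last_full], backup]
        elif backup.kind == "incremental":
            # Each incremental holds one day's changes, so all are needed.
            chain.append(backup)
    return chain
```

With a full backup on day 0 and differentials on days 1 through 4, the chain is the full backup plus the day 4 differential. With incrementals on the same days, the chain contains all five backups.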


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Fastest backup method: incremental backups

• Fastest recovery time: differential backups

• All on-site and off-site storage must be secured and must have a controlled environment (temperature and humidity)

• Media should be clearly labeled and write-protected

• Tape media types:

– Digital audio tape (DAT)

– Quarter-inch cartridge (QIC)

– 8 mm tape

– Digital linear tape (DLT)


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Typical backup scheduling:

– Daily: on-site incremental or differential backup

– Weekly: off-site full backup

• Tape media should be retired and replaced periodically

• Popular media rotation strategies for scheduling backups:

– Six-tape rotation

– Grandfather-Father-Son

– Towers of Hanoi


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Six-tape rotation:

– Uses a rotation of six sets of media

– Five media sets per week are used with one extra labeled Friday2

– Friday full backup is taken off-site

– Friday1 and Friday2 are rotated off-site every week

– Provides roughly 2 weeks of recovery capability

– Variation: keep a copy of each off-site Friday tape on-site for faster recovery
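As a rough sketch of the six-tape rotation, the function below picks a media label for a calendar date. The function name, the Monday through Thursday labels, the choice of which week gets Friday1, and the absence of weekend backups are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional

WEEKDAY_LABELS = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def six_tape_label(day: date) -> Optional[str]:
    """Return the media set to use on a given date, or None on weekends."""
    weekday = day.weekday()  # Monday == 0
    if weekday < 4:
        # Monday to Thursday reuse the same four sets every week.
        return WEEKDAY_LABELS[weekday]
    if weekday == 4:
        # Alternate the two Friday sets on a continuous week count so the
        # rotation does not break at year boundaries.
        week_index = (day.toordinal() - 1) // 7
        return "Friday1" if week_index % 2 == 0 else "Friday2"
    return None  # assumption: no backups run on Saturday or Sunday
```

Because one Friday set is always off-site while the other is in use, the most recent full backup held off-site is never more than a week old.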


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Grandfather-Father-Son (GFS):

– Uses five media sets per week

– Allows recovery for previous 3 weeks

– First week uses first set, second week uses second set, third week uses third set

– Following week starts with first set

– Every 2nd or 3rd month, a group of media sets is taken out of the cycle for permanent storage and replaced with a new set
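The weekly cycling described for GFS can be expressed as a small lookup. This is a sketch of the three-week cycle only: the function name and label format are hypothetical, and the periodic retirement of a group to permanent storage is not modeled.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def gfs_media(week: int, weekday: int) -> str:
    """Return the media label for a backup.

    week is counted from 0; weekday 0 is Monday and 4 is Friday.
    """
    # Three groups of five media are used in turn, one group per week.
    group = week % 3 + 1
    return f"Set {group} {DAYS[weekday]}"
```

For example, `gfs_media(0, 0)` gives `"Set 1 Monday"`, and the fourth week (`week=3`) returns to Set 1.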


Disk-to-Disk-to-Tape: Delayed Protection (continued)

• Towers of Hanoi:

– More complex approach

– Based on statistical principles to optimize media wear

– 16-step strategy assumes that 5 media sets are used per week on a daily basis

– First media set is used more often and must be monitored for wear
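The slide does not list the 16 steps, so the sketch below uses the commonly described form of the Towers of Hanoi rotation. In that form, set A is used every second session, B every fourth, C every eighth, and D and E every sixteenth. The set letters and the function name are assumptions.

```python
def hanoi_media_set(step: int, sets: str = "ABCDE") -> str:
    """Return the media set for a backup session, counting sessions from 1."""
    if step < 1:
        raise ValueError("step must be 1 or greater")
    # The position of the lowest set bit in the step number selects the set;
    # this yields the pattern A B A C A B A D and so on.
    level = (step & -step).bit_length() - 1
    # The last set absorbs every deeper level, so five sets give a 16-step cycle.
    return sets[min(level, len(sets) - 1)]


schedule = "".join(hanoi_media_set(step) for step in range(1, 17))
```

Here `schedule` is `"ABACABADABACABAE"`: set A appears in half of all sessions, which is why the first media set wears fastest.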


Redundancy-Based Backup and Recovery Using RAID

• Redundant array of independent disks (RAID): uses online disk drives for redundancy

• RAID spreads out data across multiple units, and offers recovery from hard drive failure

• 9 established RAID configurations: RAID Levels 0 through 7 and RAID Level 10

• RAID Level 0 (disk striping without parity):

– Not redundant

– Spreads data across several drives in segments called stripes

– Failure of one drive may make all data inaccessible
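A short sketch shows how striping without parity places data. It assumes the simplest round-robin layout with equal-sized segments; `raid0_locate` is a hypothetical helper, not part of any RAID product.

```python
def raid0_locate(segment: int, drives: int) -> tuple[int, int]:
    """Map a logical segment number to (drive index, row on that drive)."""
    # Consecutive segments go to consecutive drives, wrapping around.
    return segment % drives, segment // drives


layout = [raid0_locate(segment, drives=3) for segment in range(6)]
```

With three drives, `layout` is `[(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]`. Every drive holds part of any large file, so losing one drive leaves gaps throughout the data and nothing from which to rebuild them.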


Redundancy-Based Backup and Recovery Using RAID (continued)

• RAID Level 1 (disk mirroring):

– Uses twin drives in a system

– All data written to one drive is written to the other simultaneously

– Is expensive and is an inefficient use of disk space

– Vulnerable to a disk controller failure

– Disk duplexing: mirroring with dual disk controllers

• RAID Level 2:

– Specialized form of disk striping with parity that is not widely used

– Uses the Hamming code for parity

– No commercial implementations of this level


Redundancy-Based Backup and Recovery Using RAID (continued)

• RAID Levels 3 and 4:

– RAID 3 uses byte-level striping while RAID 4 uses block-level striping

– Parity information is stored on a separate drive and provides error recovery

• RAID Level 5:

– Balances safety and redundancy against costs

– Stripes data across multiple drives

– Parity is interleaved with data segments on all drives

– Hot-swappable: drives can be replaced without shutting down the system
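Parity-based recovery in RAID 3, 4, and 5 rests on XOR: the parity segment is the XOR of the data segments in a stripe, so any one missing segment can be recomputed from the rest. The sketch below shows this for a single stripe; the sample data and the helper name are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data segments plus the parity computed from them.
data = [b"INCI", b"DENT", b"PLAN"]
parity = xor_blocks(data)

# Simulate losing the drive holding data[1], then rebuild that segment
# from the surviving data segments and the parity segment.
rebuilt = xor_blocks([data[0], data[2], parity])
```

After this runs, `rebuilt` equals `b"DENT"`. RAID 5 differs from RAID 3 and 4 only in where the parity segments are stored, not in how they are computed.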


Redundancy-Based Backup and Recovery Using RAID (continued)

• RAID Level 6:

– Extends RAID 5 with a second, independent set of parity information, so the array can survive two drive failures (sometimes loosely described as a combination of RAID 1 and RAID 5)

– Performs two different parity computations or the same computation on overlapping subsets of data

• RAID Level 7:

– Proprietary variation on RAID 5 in which the array works as a single virtual drive

– May be implemented via software running on RAID 5 hardware

• RAID Level 10:

– Combination of RAID 1 and RAID 0
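The cost side of these levels can be compared by computing usable capacity. The figures below are the standard ones for arrays of identical drives and are not taken from the slides; the function name and the minimum drive counts are assumptions of this sketch.

```python
def usable_capacity(level: int, drives: int, drive_size: float) -> float:
    """Return usable capacity in the same unit as drive_size."""
    if level == 0 and drives >= 2:
        return drives * drive_size          # striping only, no redundancy
    if level == 1 and drives >= 2:
        return drive_size                   # every drive holds the same data
    if level == 5 and drives >= 3:
        return (drives - 1) * drive_size    # one drive's worth of parity
    if level == 6 and drives >= 4:
        return (drives - 2) * drive_size    # two drives' worth of parity
    if level == 10 and drives >= 4 and drives % 2 == 0:
        return drives // 2 * drive_size     # striped set of mirrored pairs
    raise ValueError("unsupported level or drive count for this sketch")
```

With four 2 TB drives, RAID 0 yields 8 TB, RAID 5 yields 6 TB, and RAID 6 and RAID 10 each yield 4 TB.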


Database Backups

• Databases require special considerations when planning backup and recovery procedures

– Are special utilities required to perform database backups?

– Can the database be backed up without interrupting its use?

– Are there additional journal files or database system files that are required in order to use backup tapes or disk images?


Application Backups

• Some applications use file systems and databases in unusual ways

• Members of the application development and support teams should be involved in the planning process


Backup and Recovery Plans

• Every backup and recovery arrangement should be accompanied by complete recovery plans

• Plans need to be developed, tested, and rehearsed periodically

• Plans should include information about:

– How and when backups are created and verified

– Who is responsible for backup creation and verification

– Storage and retention of backup media

– Review cycle of the plan

– Rehearsal of the plan
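One simple way to verify a file-level backup is to compare cryptographic digests of the source and the copy. The slides do not prescribe a verification method, so this is only one possible approach; it uses Python's standard `hashlib` module, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Reading in chunks keeps memory use flat for large backup files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source: Path, backup: Path) -> bool:
    """True when the backup copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(backup)
```

A check like this confirms only that the copy is intact. A rehearsed restore is still needed to confirm that the backup is usable.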


Real-Time Protection, Server Recovery, and Application Recovery

• Entire servers can be mirrored to provide real-time protection and recovery in a strategy of hot, warm, and cold servers

– Hot server: the server in production

– Warm server: backup server that is running and may handle overflow work from hot server

– Cold server: offline, test server

• If the hot server goes down, the warm server is promoted to hot and the cold server to warm while the failed server is being repaired

• Bare metal recovery: technologies designed to replace operating systems and services when they fail
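The hot, warm, and cold server promotion can be sketched as a role reassignment, reading "promoted" as each standby moving up one role. The server names and the `repair` role are hypothetical.

```python
def promote_after_failure(roles: dict[str, str]) -> dict[str, str]:
    """Return the new role assignment after the hot server fails.

    roles maps "hot", "warm", and "cold" to server names.
    """
    return {
        "hot": roles["warm"],     # the running backup takes over production
        "warm": roles["cold"],    # the offline server is brought up as backup
        "repair": roles["hot"],   # the failed server is taken out for repair
    }


after = promote_after_failure({"hot": "srv-a", "warm": "srv-b", "cold": "srv-c"})
```

Here `after` is `{"hot": "srv-b", "warm": "srv-c", "repair": "srv-a"}`. Until the repaired server returns, there is no cold server in reserve.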


Real-Time Protection, Server Recovery, and Application Recovery (continued)

• Application recovery (or clustering plus replication):

– Applications are installed on multiple servers

– If one fails, the secondary systems take over the role

• Electronic vaulting:

– Bulk transfer of data in batches to an off-site facility

– Receiving server archives the data

– Can be more expensive than tape backup and slower than data mirroring

– Data must be encrypted for transfer over public infrastructure


Real-Time Protection, Server Recovery, and Application Recovery (continued)

• Remote journaling (RJ):

– Transfer of live transactions to an off-site facility

– Only transactions are transferred in near real-time to a remote location

– Facilitates the recovery of key transactions in near real-time
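Recovery with remote journaling amounts to replaying the journaled transactions on top of the last restored backup. The sketch below assumes a toy data model in which each transaction is an account name and a balance change; real journals are specific to the database product.

```python
def replay(snapshot: dict[str, int], journal: list[tuple[str, int]]) -> dict[str, int]:
    """Apply journaled (account, delta) transactions to a restored snapshot."""
    state = dict(snapshot)  # work on a copy so the restored backup is untouched
    for account, delta in journal:
        state[account] = state.get(account, 0) + delta
    return state


restored = {"acct-1": 100}                     # state held in the last backup
journal = [("acct-1", -30), ("acct-2", 50)]    # transactions journaled since then
current = replay(restored, journal)
```

Here `current` is `{"acct-1": 70, "acct-2": 50}`. Only transactions that had not yet reached the remote site when the primary failed are lost.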


Real-Time Protection, Server Recovery, and Application Recovery (continued)

• Database shadowing (or databank shadowing):

– Storage of duplicate online transaction data and duplication of databases at a remote site on a redundant server

– Both databases are updated, but only the primary responds to the user

– Combines electronic vaulting with remote journaling

– Used when immediate data recovery is a priority

– Also used for data warehousing, data mining, batch reporting, complex SQL queries, local access at the shadow site, and load balancing


Real-Time Protection, Server Recovery, and Application Recovery (continued)

• Network-attached storage (NAS):

– Usually a single device or server attached to a network to provide online storage

– Not well suited for real-time applications due to latency

• Storage area networks (SANs):

– Online storage devices connected by Fibre Channel direct connections between the servers and the additional storage


Site Resumption Strategies

• If the primary business site is not available, alternative processing capability may be needed

• CPMT can choose from several strategies for business resumption planning

• Exclusive control options:

– Hot sites

– Warm sites

– Cold sites

• Shared-use options:

– Timeshare

– Service bureaus

– Mutual agreements


Exclusive Site Resumption Strategies


• Hot site:

– Fully configured computer facility

– Duplicates computing resources, peripherals, phone systems, applications, and workstations

– Can be 24/7 if desired

– Can be a mirrored site that is identical to the primary site


Exclusive Site Resumption Strategies (continued)

• Warm site:

– Provides some of the same services and options as a hot site

– May include computing equipment and peripherals but not workstations

– Has access to data backups or off-site storage

– Lower cost than a hot site, but takes more time to be fully functional


Exclusive Site Resumption Strategies (continued)

• Cold site:

– Provides only rudimentary services and facilities

– No computer hardware or software is provided

– Communications services must be installed when the site is occupied

– Often no quick recovery or data duplication functions on site

– Primary advantage is cost


Exclusive Site Resumption Strategies (continued)

• Other options:

– Rolling mobile site configured in the payload area of a tractor-trailer

– Rental storage area with duplicate or second-generation equipment

– Mobile temporary offices


Shared Site Resumption Strategies

• Timeshare:

– Leased site shared with other organizations

– Possibility that more than one organization might need the facility simultaneously

• Service bureaus:

– Service agency that provides physical facilities in the event of a disaster

– May provide off-site data storage


Shared Site Resumption Strategies (continued)

• Mutual agreement:

– Contract between two organizations to provide mutual assistance in the event of a disaster

– Each organization is obligated to provide facilities, resources, and services to the other

– Good for divisions of the same parent company, between business partners, or when both parties have similar capabilities and capacities

– A memorandum of agreement (MOA) should be drawn up with specific details


Service Agreements

• Service agreement:

– A contractual document guaranteeing certain minimum levels of service provided by a vendor

• Service agreement should specify:

– The parties in the agreement

– Services to be provided by the vendor

– Fees and payments for those services

– Statements of indemnification

– Nondisclosure agreements and intellectual property assurances

– Noncompetitive agreements


Summary

• Contingency planning includes everything done to prepare for the unexpected and recover from it

• BR plan includes the DR plan for resuming operations at the primary site and the BC plan for moving to an alternate site if needed

• 5 procedural mechanisms for restoration of critical data: delayed protection, real-time protection, server recovery, application recovery, and site recovery

• Backup plan is essential

• Retention period for backups must be specified


Summary (continued)

• 3 types of backups: full, differential, and incremental

• RAID systems provide online disk drives for redundancy

• Databases require special considerations for backup and recovery planning

• Mirroring and duplication of server data storage provide real-time protection

• Electronic vaulting, remote journaling, and database shadowing store data at remote locations


Summary (continued)

• Business resumption strategies include hot sites, warm sites, cold sites, timeshare, service bureaus, and mutual agreements

• Service agreements guarantee certain minimum levels of service by the vendor