CSE 4482 – Session 9

1. Understand system availability and business continuity, and recognize differences between the two.

2. Comprehend incident response systems and their role in achieving the system availability objective.

3. Explain disaster recovery planning objectives and its design, implementation, and testing requirements.

4. Comprehend the link between business continuity and disaster recovery.

5. Understand the role of backup and recovery in disaster recovery plans.

[Figure: concept map relating system availability (a technology-focused concern, interrupted by incidents or breaches) and business continuity (a business process-focused concern, impacted by disasters). Other nodes: incident detection and protection, response, recovery, and backup and recovery of system resources and data.]

Power outage at Northwest Airlines

A thunderstorm and lightning at the data center location caused the problem.

Systems, down initially, operated in a degraded manner the next morning.

It took a very long time to check passengers in for flights. NWA triggered manual processes.

Lines became longer, and so did the delays in departure.

Arrivals were late, and late departures from gates at the destination airport made arriving flights wait before they could get to a gate.

NWA announced an embargo, limiting itself to what it could handle under the circumstances.

System Availability and Business Continuity

System availability assures you that business will continue to operate.

Business continuity is necessary for systems to add value on an ongoing basis.

The issues of business continuity and systems availability are related and even overlap to a degree.

Incident Response

Incident: A level of interruption in system availability that appears to be temporary.

An incident can be triggered by an accidental action by an authorized user, or it may result from a threat.

Incidents may be detected by:

End users, who may describe the symptom but not the cause.

Those monitoring systems and processes, who may detect anomalies that point to an incident that has occurred.

Attack: A series of steps taken by an attacker to achieve an unauthorized result.

Event: An action directed at a target that is intended to result in a change of state, or status, of the target. An event consists of an action and a target.

Nature of Response to an Incident

Assess the business significance of the incident’s impact.

Identify critical business processes that might have been compromised.

Determine the root causes of the incident. This might present a challenge, for every incident could be of a different variety.

The team may need to consult experts from outside the team.

Training in forensics could help the team collect and evaluate evidence systematically.

Standard procedures must be followed for restoring the affected systems and processes, instead of ad hoc, one-off attempts to restore what is compromised or lost.

Preventive Measures

Prevention is better, and could be more cost-effective, than a cure.

Preventive measures require an anticipation or prediction of what might happen in terms of incidents and consequent compromises.

Lessons learned from the organization’s and from others’ experiences can help design and implement effective preventive measures.

Incident Response Team

A multi-skilled group, since the incident may be of any variety and may impact almost any information asset.

May include representation from human resources, legal, information systems, networks and communications, physical security, information security, and public relations.

A top management team member may be designated as a direct contact for counseling and support.

CERT

CERT originally stood for Computer Emergency Response Team.

Also called the CERT Coordination Center (CERT/CC), it is the Internet’s official emergency team.

Provides alerts and offers incident handling and avoidance guidelines.

Is located at Carnegie Mellon University. www.cert.org

Disaster Recovery

Disaster: An event that causes a significant and perhaps prolonged disruption in system availability.

Disasters can be man-made or natural. Man-made disasters can be malicious or unintentional.

Disaster recovery is a systematic effort to recover from the impact of a disaster.

The best way to understand recovery is by focusing on post-disaster phases.

Post-disaster phases:

Immediate response

Near-term resumption

Recovery toward normalization

Restoration to pre-disaster state

Post-disaster phases: objectives and an example

Example event: A logic bomb destroyed the operating system and customer data.

Phase: Immediate response
Objective: Address the emergency situation only.
Example: Call customers whose orders are yet to be filled. Determine the current state of the system and data. Call in backup tapes and equipment to a warm site. Begin manual processing of critical orders.

Phase: Near-term resumption
Objective: Resume operations at any level possible.
Example: Install equipment, load operating system and applications, restore data, and test outputs. Switch to automated processing.

Phase: Recovery toward normalization
Objective: Expand operations and extend capabilities and functionalities.
Example: Expand the order processing cycle. Increase the functionality (e.g., report generation).

Phase: Restoration to pre-disaster state
Objective: Return as close to the original (pre-disaster) state as possible.
Example: Load operating system, data, and applications at the original site. Pre-test. Resume processing in a parallel run with the warm site. Cut over to the original site. Fold operations at the warm site and return the equipment.

Timeliness of Action and Value of Recovery

Timeliness of action

The timeline of actions planned should reflect the value of the action at the time.

Certain steps can wait while others must be taken without delay, to minimize losses.

Value of recovery

Timeliness of action reflects the value of the recovery target.

Considering this, recovery tasks should be systematically assigned to each post-disaster phase.
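One way to picture the assignment of recovery tasks to post-disaster phases is a small lookup structure. The following Python sketch is illustrative only: the names `Phase`, `tasks`, and `plan_by_phase` are assumptions made for this example, and the task descriptions are abridged from the logic bomb example.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The four post-disaster phases, in the order they occur."""
    IMMEDIATE_RESPONSE = 1
    NEAR_TERM_RESUMPTION = 2
    RECOVERY_TOWARD_NORMALIZATION = 3
    RESTORATION_TO_PRE_DISASTER_STATE = 4


# Recovery tasks in arbitrary order, each tagged with its phase.
tasks = [
    ("Cut over to the original site", Phase.RESTORATION_TO_PRE_DISASTER_STATE),
    ("Begin manual processing of critical orders", Phase.IMMEDIATE_RESPONSE),
    ("Restore data and test outputs", Phase.NEAR_TERM_RESUMPTION),
    ("Increase functionality (e.g., report generation)",
     Phase.RECOVERY_TOWARD_NORMALIZATION),
    ("Call customers whose orders are yet to be filled",
     Phase.IMMEDIATE_RESPONSE),
]


def plan_by_phase(tagged_tasks):
    """Group task descriptions under their phase, keeping phase order."""
    plan = {phase: [] for phase in Phase}
    for description, phase in tagged_tasks:
        plan[phase].append(description)
    return plan


if __name__ == "__main__":
    for phase, descriptions in plan_by_phase(tasks).items():
        print(phase.name)
        for description in descriptions:
            print("  -", description)
```

Grouping by phase keeps the tasks that cannot wait (immediate response) visibly separate from those that can.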

Disaster Recovery Planning (DRP)

DRP: The definition of business processes, their infrastructure supports and tolerances to interruptions, and formulation of strategies for reducing the likelihood of interruption or its consequences.

Component steps of DRP:

Define the process.

Identify what supports the process and its tolerance to interruptions.

Determine and implement strategies that would reduce the likelihood and consequences of interruptions.


Assessing potential losses: Disaster Impact Analysis

What disasters is the firm likely to face?

What is the probability of each type of disaster?

What is the impact of the disaster on the firm?
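The three questions above (which disasters, how probable, what impact) can be combined into a probability-weighted loss for each disaster type. The Python sketch below is a minimal illustration, not a prescribed method from the course. The class and function names are assumptions, and the probabilities and costs are hypothetical figures chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterScenario:
    name: str
    annual_probability: float  # assumed estimate, between 0 and 1
    impact_cost: float         # assumed loss if the disaster occurs


def expected_annual_loss(scenario: DisasterScenario) -> float:
    """Probability-weighted loss for one disaster type."""
    return scenario.annual_probability * scenario.impact_cost


def rank_by_expected_loss(scenarios):
    """Sort scenarios from highest to lowest expected annual loss."""
    return sorted(scenarios, key=expected_annual_loss, reverse=True)


# Hypothetical figures for illustration only.
scenarios = [
    DisasterScenario("Flood", 0.01, 1_000_000),
    DisasterScenario("Data center power outage", 0.10, 500_000),
    DisasterScenario("Logic bomb", 0.02, 2_000_000),
]

if __name__ == "__main__":
    for s in rank_by_expected_loss(scenarios):
        print(f"{s.name}: {expected_annual_loss(s):,.0f}")
```

A ranking like this is only as good as the estimates behind it. A low-probability disaster with a very large impact may still deserve attention even when its expected loss ranks low.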

Disaster Recovery Planning (DRP)

Value-based recovery planning

Definition of criticality and criteria to determine criticality

Identification of critical business processes and their supports

Identification of the role of information systems resources in the critical process

Determination of process owners and process customers

Determination of the amount of time the business can survive without the process post-disaster

Identification of interdependencies between the process and the rest of the business processes and systems

To find critical processes, consider attributes such as importance, key users, tolerance to outage, waiting time between cycles, and possibility of data recovery.
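As a rough illustration of how such attributes might drive a recovery order, the Python sketch below ranks processes by tolerance to outage, breaking ties by importance. The scoring rule, the 1-to-5 importance scale, the names, and the sample processes are all assumptions made for this example; a real plan would weigh the other attributes listed above as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessProcess:
    name: str
    owner: str
    importance: int                # assumed scale: 1 (low) to 5 (high)
    tolerable_outage_hours: float  # time the business can survive without it


def recovery_order(processes):
    """Least outage tolerance first; higher importance breaks ties."""
    return sorted(
        processes,
        key=lambda p: (p.tolerable_outage_hours, -p.importance),
    )


# Hypothetical processes for illustration only.
processes = [
    BusinessProcess("Report generation", "Finance", 2, 168),
    BusinessProcess("Order entry", "Sales", 5, 4),
    BusinessProcess("Payroll", "Human resources", 3, 72),
    BusinessProcess("Customer support line", "Service", 4, 4),
]

if __name__ == "__main__":
    for rank, p in enumerate(recovery_order(processes), start=1):
        print(f"{rank}. {p.name} (owner: {p.owner}, "
              f"tolerates {p.tolerable_outage_hours:g} h)")
```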


Disaster recovery strategies

How do we recover a system given its priority? Address the question by system components.

Data (e.g., designate off-site storage)

Processing (e.g., back up and store off-site current copies of the software)

Network and communication (e.g., back up and store off-site a copy of the current network configuration)

Dependencies with other systems (e.g., identify how these processes will be interfaced post-disaster)

[Figure: DRP overview linking the potential for disasters, assessing potential losses, finding criticality, the value-based recovery plan, recovery strategies, recovery locations, recovery teams, and disaster readiness. Edge labels: requires, results in, is based on, to form, to select, are tested for.]

DRP: Recovery Locations

Recovery location: A site (or sites) where processes and systems will be recovered post-disaster.

Hot sites: Near-perfect replicas of the operations.

Cold sites: Just the infrastructure (computer operations room, platform for installing hardware, power and communication lines, cabling, etc.).

Warm sites: More than just a cold site, but not quite as ready as a hot site. For example, a warm site may include commonly used computers and operating systems.

Reciprocal agreements: Sharing of similar resources by those in the same or similar computing environments.

Colocations: Recovery is planned using availability of computing resources at the firm’s many locations.

DRP: Teams

The purpose of forming teams is to ensure that recovery tasks are accomplished in an orderly and responsible manner.

The number and nature of teams could vary across organizations.

However, each team should include the knowledge and skills necessary to perform its assigned tasks.

Recovery teams can be organized by recovery phases.

Flexibility in assignments is necessary, for an actual disaster may need adjustments to the team. Non-availability of some team members when disaster strikes is also likely.

DRP: Disaster Readiness

Meaning of readiness: Having the assurance that if and when a disaster strikes, the firm has a high likelihood of recovering from the disaster.

Testing of the plan is crucial to get this assurance.

Disaster readiness practices include:

Walkthroughs: Having a plan preparer walk others through the plan to show how it leads from point A to point B.

Rehearsals: An “as-if” exercise to simulate a disaster’s impact and have the people responsible re-create recovery of “lost” processes and systems.

Compliance (live) testing: An actual test of recovery with a simulated disaster.

Business Continuity Planning (BCP)

BCP: The totality of plans made to recover the business operations following a disaster.

Recovery of all operations is involved, not just information assets.

Methods and strategies adopted for BCP are comparable to, and often overlap with, those used in DRP.

Business Continuity Planning (BCP)

Business impact analysis is an exercise in risk assessment.

Identify vulnerabilities of the firm.

Assess the business impact:

Focus on a particular disaster and determine processes that might be affected, and/or

Analyze all business processes to assess probable business impact in the event that a disaster strikes.

Initiate a planning process to develop methods and strategies to mitigate risk.

Business recovery

Approaches and methods for business recovery are similar to those discussed in disaster recovery planning.

Assurance Considerations

Any assurance that BCP/DRP will be effective requires an examination of such plans from three angles:

Method: Review the method followed in the development of the plan. A sound planning process makes possible a plan that is complete and reliable.

Content: The data in the plan should have been collected from the “right” participants, and the instruments and methods used to collect the data must be valid. The plan should be current.

Testing: Critical components of the plan should be tested, results should be documented, and corrective action, where necessary, should follow.
