Disaster Recovery (DR)
GEORGE F. CLAFFEY JR
CHIEF INFORMATION OFFICER
CHARTER OAK STATE COLLEGE & CT DISTANCE LEARNING CONSORTIUM
Every dollar spent on disaster mitigation can save seven dollars in economic losses from
a disaster. The multimillion dollar price tag on the levee improvements in New Orleans
that were left undone has been dwarfed by the multibillion dollar cost of rebuilding flooded
neighborhoods in the wake of the hurricane and the ensuing storm surge.
Worldwatch Institute, 2006
Jenzabar Inc., An EDUCAUSE Platinum Partner, and Charter Oak State College - Disaster Recovery for Small Schools
Delivered 3/20/2007 at Nercomp 2007
Copyright 2007, George F. Claffey Jr.
Basic Terms
Disaster Recovery (DR): A plan to recover business critical data in the event of a disaster
Business Continuity (BC or BCP): A management process to ensure the continuity of businesses
Continuity of Operations Plan (COOP): A plan to ensure operations continuity after a disastrous event has already occurred.
Recovery Time Objective (RTO): Acceptable disruption or amount of downtime between the disaster event and the recovery of operations.
What Elements Comprise DR/BCP
Security (both Physical and Technological)
Equipment (Servers, Storage, and Network)
Information (Customer Data, Log Files, Archives)
Business Processes or Business Rules
Communications
Old Disaster Recovery Plan
Only mission critical app was SIS (uptime undefined)
Reciprocal Site (Sister College)
Data Tapes at home / alt non-secured location
Just in time delivery of replacement gear (drives, NICs, servers)
Disaster Recovery planning was only done by IT
It has been difficult for Higher Ed to measure downtime in relation to monetary loss
This has handicapped organizations' ability to create a DR/BCP Budget
Recent Changes to our Understanding of Disaster Recovery are a result of…
Andrew
Katrina
9/11
Regulatory Compliance
Potential Pandemic
Old Disaster Recovery Model: Why Didn’t It Work
Only mission critical app was SIS
  ERPs got bigger and web-enabled, and Single Sign-on made them the core of Campus Activities (registration, grades, etc.)
Reciprocal Site (Sister College)
  A great idea, but who do you call at 2AM when something on your site is broken? This became more like 9-5 M-F friendly help: no SLA, and no deal if the disaster is regional (Katrina)
Data Tapes at home / alt non-secured location
  Tapes stolen, lost, or at a terminated employee’s house; FERPA and GLB restrictions
Just in time delivery of replacement gear (drives, NICs, servers)
  Katrina shut down shipping; the UPS strike was similar
Disaster Recovery planning was only done by IT
  Katrina made everyone aware that an alternate web presence, etc. wasn’t just a nicety but a requirement in the event of catastrophic failure
It has been difficult for Higher Ed to measure downtime in relation to monetary loss
This has handicapped organizations' ability to create a DR/BCP Budget
New Disaster Recovery / New Definition of What’s Mission Critical
Protection from internal threats
Protection from natural disasters
Protection with the ability to persist (locally/remotely)
What’s mission critical: ERP/SIS, Learning Mgmt System, E-Mail, Domain, Web Presence, Phones / 911
Begin by Taking a Good Look Around
Take an honest look…My current Disaster Recovery plan doesn’t work!
Common Problems In Existing/Static DR Plans
The plan does not encompass all of a campus’s current technological services and/or it becomes outdated
  Servers or Storage have changed
  Versions of Software have changed or patches have been applied
  Personnel have changed
  Networks or network addresses have changed
  Security or the network devices have changed
  Storage has changed again
Measure the Institution’s Level of IT Maturity
Understand Where You Are
Most schools and colleges have small or limited IT staffs. The result is that Disaster Recovery often sits on the back burner.
Goal “A Living DR/BCP Document”
The Disaster Recovery Plan is dynamic: it needs to be changed as networks, applications, and business processes change. At minimum, the plan must be updated once a year.
The DR/BCP Plan is a very public document
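The once-a-year minimum can be checked mechanically. The following is a minimal sketch, assuming each section of the plan carries a last-reviewed date; the section names, the dates, and the 365-day reading of "once a year" are illustrative assumptions, not part of the original plan.

```python
from datetime import date, timedelta

# Assumption: "once a year" is read as 365 days; adjust to local policy.
MAX_PLAN_AGE = timedelta(days=365)


def plan_is_stale(last_reviewed: date, today: date) -> bool:
    """Return True when the last review is more than a year old."""
    return today - last_reviewed > MAX_PLAN_AGE


def sections_needing_review(sections: dict[str, date], today: date) -> list[str]:
    """List stale plan sections, oldest review first."""
    stale = [name for name, reviewed in sections.items()
             if plan_is_stale(reviewed, today)]
    return sorted(stale, key=lambda name: sections[name])


# Hypothetical sections and review dates, for illustration only.
example_sections = {
    "Network addressing": date(2005, 11, 1),
    "Server inventory": date(2006, 9, 15),
    "Staff contact list": date(2006, 1, 10),
}

if __name__ == "__main__":
    for name in sections_needing_review(example_sections, today=date(2007, 3, 20)):
        print("Needs review:", name)
```

A check like this could run alongside change control, so that a section is flagged as soon as the system it describes changes or its review date lapses.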
We Need a Strong Foundation
Ingredients for a Strong DR Foundation
•Executive Support
•Cabinet / Stakeholder Support
•Information Technology Staff
•Faculty Support
•Facilities Support
•Budget
Engage People…Tell a Story
We found that a story helped our end-users and executives get into the proper frame of mind.
A pipe burst and the basement (data center) took two feet of water. The water damage voided server warranties, and multiple pieces of critical data center equipment were damaged, including the UPS. We have contacted our vendor and new equipment is on the way, but we only have three people in IT to begin the restoration. Activities must be done in “serial order.”
What should we prioritize, and what can we live without for 4 business days, 10 business days, 30 business days? Who will notify our students that we are having problems, and how will they notify them if e-mail and the SIS are down?
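The "serial order" constraint in the story can be made concrete with a small scheduling sketch. It assumes systems are restored one at a time, starting with the one the campus can least afford to lose (an ordering chosen here for illustration, not prescribed by the slides). The system names, restore estimates, and tolerances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    restore_days: float    # estimated business days of work to restore
    tolerable_days: float  # how long users say they can live without it


def serial_schedule(systems: list[System]) -> list[tuple[str, float, bool]]:
    """Restore one system at a time, least tolerable outage first.

    Returns (name, business day restored, restored within tolerance?).
    """
    ordered = sorted(systems, key=lambda s: s.tolerable_days)
    elapsed = 0.0
    plan = []
    for system in ordered:
        elapsed += system.restore_days
        plan.append((system.name, elapsed, elapsed <= system.tolerable_days))
    return plan


# Hypothetical estimates, for illustration only.
example = [
    System("Website", restore_days=2, tolerable_days=4),
    System("E-mail", restore_days=3, tolerable_days=4),
    System("SIS", restore_days=5, tolerable_days=10),
    System("File shares", restore_days=4, tolerable_days=30),
]

if __name__ == "__main__":
    for name, day, ok in serial_schedule(example):
        status = "within tolerance" if ok else "MISSES tolerance"
        print(f"{name}: restored on day {day:g} ({status})")
```

With these made-up numbers, e-mail comes back on day 5 against a 4-day tolerance. That kind of gap is what the story is meant to surface for executives and end-users.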
Recovery Time Objective (RTO) is the principal factor in DR Planning and Budgets
How long can we be without
  Student Information System
  E-Mail
  Website
  Anti-Virus Protection
  Internet Connectivity
Months / Weeks / Days / Hours / Seconds
Recovery Time Objective (RTO) is the principal factor in DR Planning and Budgets
Start by defining all your systems and key components
  Servers (model, type, specs, HD configurations)
  Personnel (skills, location, emergency contact info)
  Applications (versions, patch, custom tweaks)
  Network Information (domain, trusts, IP Schema, Firewall config)
Seek business and user input as to what is important
Seek executive input
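One way to capture the inventory described above is as a set of structured records. This is a sketch only: the class names, field names, and example values are assumptions chosen to mirror the categories on the slide, not a schema from the original plan.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    model: str
    server_type: str
    specs: str
    hd_configuration: str


@dataclass
class Person:
    name: str
    skills: list[str]
    location: str
    emergency_contact: str


@dataclass
class Application:
    name: str
    version: str
    patch_level: str
    custom_tweaks: list[str] = field(default_factory=list)


@dataclass
class SystemRecord:
    name: str
    servers: list[Server] = field(default_factory=list)
    personnel: list[Person] = field(default_factory=list)
    applications: list[Application] = field(default_factory=list)
    network_information: str = ""  # domain, trusts, IP schema, firewall config


def inventory_gaps(record: SystemRecord) -> list[str]:
    """Name the inventory categories that are still empty for a system."""
    gaps = []
    if not record.servers:
        gaps.append("servers")
    if not record.personnel:
        gaps.append("personnel")
    if not record.applications:
        gaps.append("applications")
    if not record.network_information.strip():
        gaps.append("network information")
    return gaps


# Hypothetical record, for illustration only.
sis = SystemRecord(
    name="SIS",
    servers=[Server("example-model", "database", "4 CPU / 8 GB", "RAID 5")],
    applications=[Application("Student Information System", "1.0", "patch 3")],
)

if __name__ == "__main__":
    print(sis.name, "is missing:", ", ".join(inventory_gaps(sis)))
```

Keeping the records in a structured form makes it easy to see which systems still lack an owner or network details before business and executive input is gathered.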
What Impacts RTO
Number of Systems
Amount of Storage Required
Number of Restoration Devices (HD, Tape, etc.)
Personnel Available (skills required/shared)
Equipment availability (cold site, warm site, hot site, 1-800-IBM)
More Storage, More Servers = Greater Time to Restore
[Figure: Recovery Time (minutes to multiple days) versus Disaster-Recovery Mechanism. Labels: Replication Based Solution, Replay logs, Tape vaulting, Transport tapes, Recover data.]
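The relationship between storage, restoration devices, staff, and restore time can be roughed out with simple arithmetic. The sketch below assumes that restore streams run in parallel, that each stream needs one device and one person, and that every device restores at the same rate. The throughput figure is a placeholder, not a measured value.

```python
def estimated_restore_hours(storage_gb: float,
                            devices: int,
                            staff: int,
                            gb_per_hour_per_device: float) -> float:
    """Rough restore-time estimate.

    Assumption: parallel streams are limited by whichever is scarcer,
    restoration devices or people available to run them.
    """
    if storage_gb < 0:
        raise ValueError("storage_gb must not be negative")
    if devices < 1 or staff < 1 or gb_per_hour_per_device <= 0:
        raise ValueError("devices, staff, and throughput must be positive")
    streams = min(devices, staff)
    return storage_gb / (streams * gb_per_hour_per_device)


if __name__ == "__main__":
    # Placeholder numbers: 2 tape drives, 3 staff, 100 GB/hour per drive.
    for storage in (2000, 4000):
        hours = estimated_restore_hours(storage, devices=2, staff=3,
                                        gb_per_hour_per_device=100)
        print(f"{storage} GB -> about {hours:g} hours of data transfer")
```

Doubling the storage doubles the estimate unless devices and staff grow with it, which is the point of the slide. The estimate covers data transfer only; tape transport, log replay, and rebuilding servers add to it.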
Case Example
WebCT Campus Edition (serving 20K students)
WebCT Vista (serving 20K students)
N-Tier Systems Cannot Be Reconstructed Quickly
Load Balancers
WebLogic Domains
Database Servers
Storage Area Networks
Firewalls (VPN Tunnels)
IP and VIP addressing
What determines the Big Ticket Items
Infrastructure Required (Fiber Lines to campus buildings, routers, UPS, Electrical, HVAC, Internet/DMark)
Complexity of the Systems (N+1 WebCT Vista Architecture)
The RTO and application can have
Put DR and BCP Planning on the Permanent Radar
[Figure: project radar showing Disaster Recovery alongside Active Directory Project, WLAN AP Deployment, and Security Audit]
The DR Plan needs to be a LIVING document
Integrate DR with existing change control procedures
Connect with Project Management Offices or Key application stakeholders
Document current and changes in business processes
Document changes in network infrastructure and security infrastructure
Create the plan online (DR Software or something simpler, such as MS SharePoint)
How do you eat an elephant?
One Bite at a Time
DR Must Plan for Catastrophic Failure but have its roots in Small Recoveries
Most likely you will perform
  Entire Mailbox Restoration
  Jenzabar/SIS Database Restoration
  Domain Controller or FSMO role change
How will you handle a larger challenge?
  E-Mail Server Restoration
  Domain Restoration
Short Term Preparedness
Make sure you are backing up the right systems and the right data (no cost)
Leverage Virtual Server and Imaging Technology to “image” computers and equipment (2-5K)
Look into Bare Metal Restore (BMR) type backup solutions for your existing products (Symantec Backup Exec for Windows, 1-10K)
Outsource tape/disk vaulting for storage needs (basic contract 1 yr = $3K)
Prepare and test a plan to shut down your server room/data center and restart it
Move a domain controller to an MDF Closet in another building, Secondary AV server, tertiary DNS server, DHCP (disabled)
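The first item above, backing up the right systems, lends itself to a quick no-cost check: compare the list of mission-critical systems against what the backup jobs cover. This is a sketch under stated assumptions. The system names are hypothetical, and in practice both lists would come from the DR inventory and the backup software's job report.

```python
def backup_gaps(critical: set[str], backed_up: set[str]) -> tuple[list[str], list[str]]:
    """Compare what should be backed up with what is.

    Returns (critical systems with no backup, backups of systems
    not on the critical list), each sorted by name.
    """
    missing = sorted(critical - backed_up)
    unlisted = sorted(backed_up - critical)
    return missing, unlisted


# Hypothetical names, for illustration only.
critical_systems = {"SIS database", "E-mail", "Web server", "Domain controller"}
backup_jobs = {"SIS database", "E-mail", "Old file server"}

if __name__ == "__main__":
    missing, unlisted = backup_gaps(critical_systems, backup_jobs)
    print("Critical but not backed up:", ", ".join(missing))
    print("Backed up but not listed as critical:", ", ".join(unlisted))
```

The second list is worth reviewing too: it may reveal a system that should be on the critical list, or a backup job that is consuming tape and time for no reason.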
Aggregate and Copy Institutional Recovery Information
Equipment Warranties and policies
License information: Media, License Keys
Vendor Support Contracts
Staff lists and Alt Contact Information
Utility Vendors / Acct # / Emergency Contacts
Insurance Information, Coverage amounts, riders
What is my coverage and how do I engage services at 2:00AM on Saturday?
Short Term Preparedness (Contd)
Begin Sharing Information
  These Slides
  Your Current Plan (Good or Bad)
Take a critical look at your current plan
  Can you perform minor updates or do you need wholesale replacement?
Determine What Support You Need
  Executive
  IT or Staffing
  Academic Support
Test the Plan and Test it again
Perform plan and routine validations
Recall Tapes
Recall Staff
Perform Table Top Exercises w/Executive and Cabinet Staff
Perform actual physical exercises (shutdowns/restarts, alternate center)
Provide cross-training to IT staff
Identify Non-IT tasks that impact DR
Purchasing Processes (including Approvals)
Hotels
Food
Transportation
Movers
Tradesmen (Electrical/Mechanical/HVAC)
Security (Guards/Police)
How Can We Purchase Supplies/Equipment Without An Electronic System
Can we?
Can we cut P.O.s?
Can we use Corporate Credit Cards or Raise Limits?
Are there existing lines of credit we can use?
Create Procedures
How Do We Notify Our Constituents of a Disaster
How do we Recall the Emergency Response Team
Notify Students
Notify Faculty/Staff
Notify Parents
All done without use of Electronic Information Systems
Never Assume Anything
Never assume your staff will be at 100% capacity
Never assume your emergency systems will work
Never assume you are protected because you built redundant systems
Never assume normal communication mediums will be available
Long Term Preparedness
Start building Disaster Recovery Awareness among staff (start w/IT)
Become knowledgeable about other Emergency plans on campus (Police, Facilities, Medical, etc.)
Build Disaster Recovery into project budgets and availability questions into decision making criteria
Identify onsite and off-site locations for possible DR Recovery
Connect w/National Higher Ed initiatives (Educause, Nercomp, Sloan, Hi Ed pandemic planning)
Seek advice from DR Consultants (SunGard, IBM) or extend relationships with existing vendors (Jenzabar).
Long Term Preparedness (Contd)
Make the Institutional and Policy Changes Now
Command and Control Assignments
Purchasing Guidelines
Notification Systems
Authority for declaring a disaster