
Change Management Procedures

Document No: ITS-NOC 300
Title: Change Management Procedures
Month Year: May 2011
Doc. Type: MS Word
File name: ITS-NOC 300: Change Management Procedures V4.1_.doc

[Figure: The Change Process, Parts 1 and 2 (flowchart). Recoverable labels: Start; Pre-RFC Checklist; Classify; Urgency High?; Get Customer Approval; Alert Change Manager; Schedule with Customers; Send to CAB; CAB OK?; Perform Change; Test change; User OK?; Analyze/resubmit; Change Successful?; Back Out Successful?; Complete Documentation updates; Schedule Training (User, Staff); Stop.]


Table of Contents

Summary
Goals of Change Management
Change Classifications
    1) Urgency
    ChangeLog
    2) Impact
    3) Risk
    Example of Change Types
Forward Calendar of Changes
Maintenance Window
    By Service
    Guidelines for Selecting Upgrade Windows
Change Approval Matrix
    Approvers Table (low and medium risk changes only)
Activities and Deliverables
    Approval Procedures
The Request for Change (RFC)
    How to
Lead Time for Approval
RFC Submission Fields
Customer Groups Communication/Service Bulletins
Forward Schedule of Changes (FSC)
Emergency Change Process
Closing the RFC
Change Assessment (Post-Implementation Review)
Glossary
Appendix 1 CAB Contact List
Appendix 2 Pre-RFC Initiation Process
Appendix 3 Critical Incident Reviews (aka “Post-Mortem”) Guidelines
Appendix 4 Examples of changes
Appendix 5 Examples: Communications Strategy


Version 4.0

4.0 November 28, 2009 Review of initial documentation


Summary

Change Management applies to all changes to Applications, Databases, Networks, Infrastructure and Documentation.

All changes are recorded and classified by their service impact, risk, and urgency.

Who can initiate changes:

Each change is sponsored by a Manager who is responsible for the planning and implementation of the change, and any recovery.

Certain changes are reviewed and approved by a Change Advisory Board (CAB). Those changes include all High Risk changes and changes that meet the criteria illustrated in the Change Approval Matrix. The CAB meets weekly with the sponsoring Manager(s) and approves changes for the following week, and beyond.

Highly Urgent (Emergency) changes require separate approval.

Unsuccessful changes that are successfully backed out within the change window are followed up with an Analysis of Failed Change by the sponsoring managers.

Unsuccessful changes that are not successfully backed out within the change window are followed up with a Critical Incident Review (CIR), also known as a Post Mortem, by the Director, IT Operations or designate.

Frequency of CIR Review

Reviews are conducted by the Operations Committee at the bi-weekly Tuesday morning meetings at 09:00 in Rm443. The meeting is chaired by the Director of Operations or his/her designate.


Goals of Change Management

The goals of the Change Management Procedures are to:

Minimize disruption caused by changes to the production environment through effective risk management.

Assign ownership and responsibility for changes to the relevant manager.

Optimize the change process through active management of reliable data on changes.

Manage all changes to the production environment including hardware, software, environmental equipment, networks, and procedures.

Keep users informed of changes to the production environment and manage their expectations. (This is also an opportunity to promote and strengthen client communications.)

Keep UBC IT staff aware of changes to the production environment and increase the number of UBC IT staff reviewing changes.

Develop a more pro-active approach to changes through encouraging better planning of changes, back outs and contingencies.

Measure success and success trends over time.

Key Success Factors

A CM process used by staff who see it as an enabler

Control over urgent changes

Correct scoping and staffing of CM

Knowledge of changes required for good decisions

A CM process aligned with the project lifecycle

Visible management commitment and support to enforce process.


Change Classifications

Changes are classified according to:

1. Urgency
2. Impact
3. Risk
4. Nature of Application

These changes are classified by the change initiator, and checked by the CAB.

1) Urgency

Describes how quickly the change must be implemented.

High (Emergency Changes). Two types of change fall under this category:

- Break/fix: required immediately. A system or service has failed and is unavailable, and the fix cannot wait for the normal change approval process and/or the next scheduled maintenance period.
- Urgent: immediate action is required to avert a system or service outage.

High Urgency changes are considered emergencies and are subject to a separate Emergency Change process. Approval is granted as noted in the Emergency Change Process section.

In Break/fix situations where the Sponsor deems a change is required, an Emergency Change can be made without prior approval, but the Sponsor should submit an RFC as soon as possible after the change.

Medium A normal change which will be processed by the CAB and installed in the next maintenance window.

Standard aka Change Log A standard change is one that is effectively pre-approved and can readily reference pre-defined workflows – these can be entered as a Change Log.

A standard change is a change to the infrastructure that has the following characteristics:

- Recurrent, well known and proven.
- Pre-defined and relatively risk free.
- Is the accepted solution to a specific requirement or set of requirements.
- Approval is given in advance.
- Does not require CAB approval but is subject to Post Implementation Review (PIR) by the CAB.


Low A change that can be deferred and/or grouped with other changes in a subsequent release.

ChangeLog

A “change log” is defined as a standard change and does not require CAB approval.

These standard changes are well understood, recurring and low risk, typically made to redundant systems, and are the accepted response to a specific requirement or set of circumstances. Approval for them has been delegated to the group making the change.

Examples:

- Installation of SSL certificates / patches
- Scheduled workload to new TWS server
- Cycling of LDAP instances
- Ongoing alumni production db maintenance
- Symposium call pilot integration


2) Impact

The change impact is determined by the number of users who could be affected by a reduction in their normal service level. This takes into account the worst case where a change does not succeed.

High The change affects the whole campus.

Medium The change affects roughly a department or building.

Standard Change There is no anticipated user impact, for example, a change to a redundant system. A standard change is one that is effectively pre-approved and can be entered as a Change Log. See the examples under ChangeLog.

Low The change affects a single user or a small number of users.

3) Risk

Describes the risk* that the change will not be made successfully.

High There is a higher than normal risk that the change will have to be backed out, or that a system failure will occur after the change is installed.

Medium A change where success is the expected outcome, but the change is not routine, and could require a back-out.

Low A routine, proceduralized change, with little chance of failure.

*Risks are managed through back-out and contingency planning, and all changes with high risk require approval of the CAB. The Change Owner of a high risk change is expected to demonstrate back-out & contingency plans consistent with the level of risk.


Example of Change Types

Admin FMIS, HR, Alumni, Student systems

Elearning WebCT

Enterprise Middleware CWL, LDAP, DNS servers

Enterprise Applications Email, Mercury, uPortal, UBC main web page

Forward Calendar of Changes

Check the forward calendar of changes for conflicts with your proposed change. This calendar is found in Microsoft Outlook.

See example below.


Maintenance Window

By Service

The maintenance window is determined from the following table. Note that the maintenance windows in the table are meant to cover backout if required, i.e. in a 6:00 am to 8:00 am window, the change would normally be done from 6:00 am to 7:00 am. If the change requires more than the allotted time, then the change should begin earlier, rather than impinge on the contingency time.

Please note that at this time – November 2009 – this process is being revised and a draft document is being proposed.

Service                        | Upgrade Time                                                   | Contingency Time
WebCT (Effective July 1, 2005) | Saturday with an expanded time frame between 08:00 and 12:00   | Saturday 11:00 to 12:00
HRMS                           | Weekend 06:00 to 09:00                                         | Weekend 07:30 to 09:00
PeopleSoft                     | Saturday, 06:00 to 09:00                                       | Saturday, 06:00 to 09:00
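The rule that a longer change starts earlier, rather than eating into contingency time, can be sketched in Python. This is only an illustration under stated assumptions: the function name `plan_change_start` and its parameters are hypothetical and are not part of the procedure or of any UBC IT tool.

```python
from datetime import datetime, timedelta


def plan_change_start(contingency_start: datetime, change_duration: timedelta) -> datetime:
    """Return when work should begin so that it ends before contingency time.

    The contingency period is reserved for back-out, so a change that needs
    more time starts earlier instead of running into that period.
    """
    if change_duration <= timedelta(0):
        raise ValueError("change_duration must be positive")
    return contingency_start - change_duration


# Example: a 06:00 to 08:00 window whose contingency period starts at 07:00.
contingency = datetime(2011, 5, 7, 7, 0)
one_hour_start = plan_change_start(contingency, timedelta(hours=1))
ninety_minute_start = plan_change_start(contingency, timedelta(minutes=90))
```

With these inputs the one-hour change starts at the normal window opening, while the ninety-minute change starts half an hour before it.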


Guidelines for Selecting Upgrade Windows

If no maintenance window has been established for a service, use the following table to select a potential window. (The default time is 06:00 to 08:00 Monday to Friday when not stated.)

Change Type             | Impact        | Upgrade Time                                            | Contingency Time
Admin applications      | All           | Outside business hours and with agreement of customer   | Outside business hours and with agreement of customer
Cable system            | All           | Set by customer                                         | Set by customer
Elearning               | All           | 06:00 to 08:00 Saturday                                 | 07:00 to 08:00 Saturday
Enterprise middleware   | Medium or Low | 06:00 to 07:00 weekdays                                 | 07:00 to 08:00 weekdays
Enterprise middleware   | High          | 06:00 to 07:00 weekends or holidays                     | 07:00 to 08:00 weekends or holidays
Enterprise applications | Medium or Low | 06:00 to 07:00 weekdays                                 | 07:00 to 08:00 weekdays
Enterprise applications | High          | 06:00 to 07:00 weekends or holidays                     | 07:00 to 08:00 weekends or holidays
Network                 | Medium or Low | 06:00 to 07:00 weekdays                                 | 07:00 to 08:00 weekdays
Network                 | High          | 06:00 to 07:00 weekends or holidays                     | 07:00 to 08:00 weekends or holidays
Security (firewalls)    | Medium or Low | 06:00 to 07:00 weekdays                                 | 07:00 to 08:00 weekdays
Security (firewalls)    | High          | 06:00 to 07:00 weekends or holidays                     | 07:00 to 08:00 weekends or holidays
Voice                   | All           | 05:00 to 07:00                                          | 07:00 to 08:00
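As a rough illustration, the guideline table can be expressed as a lookup keyed by change type and impact. The sketch below is an assumption-laden example, not an existing tool: the names `WINDOWS` and `select_window` are hypothetical, and the window descriptions are kept as plain text exactly as the table gives them.

```python
from typing import Dict, Tuple

ALL = "All"
Window = Tuple[str, str]  # (upgrade time, contingency time)

WEEKDAY: Window = ("06:00 to 07:00 weekdays", "07:00 to 08:00 weekdays")
WEEKEND: Window = ("06:00 to 07:00 weekends or holidays",
                   "07:00 to 08:00 weekends or holidays")
BY_AGREEMENT: Window = ("Outside business hours and with agreement of customer",
                        "Outside business hours and with agreement of customer")

# Transcribed from the guideline table; Medium and Low share the weekday rows.
TIERED = {"High": WEEKEND, "Medium": WEEKDAY, "Low": WEEKDAY}

WINDOWS: Dict[str, Dict[str, Window]] = {
    "Admin applications": {ALL: BY_AGREEMENT},
    "Cable system": {ALL: ("Set by customer", "Set by customer")},
    "Elearning": {ALL: ("06:00 to 08:00 Saturday", "07:00 to 08:00 Saturday")},
    "Enterprise middleware": TIERED,
    "Enterprise applications": TIERED,
    "Network": TIERED,
    "Security (firewalls)": TIERED,
    "Voice": {ALL: ("05:00 to 07:00", "07:00 to 08:00")},
}


def select_window(change_type: str, impact: str) -> Window:
    """Look up the suggested (upgrade, contingency) window for a change."""
    by_impact = WINDOWS[change_type]  # KeyError for an unlisted change type
    if ALL in by_impact:
        return by_impact[ALL]
    return by_impact[impact]  # KeyError for an unlisted impact level


network_high = select_window("Network", "High")
voice_low = select_window("Voice", "Low")
```

An unlisted change type raises `KeyError` rather than silently falling back to a default, which keeps the decision with the service manager or customer.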


Change Approval Matrix

Approvers Table (low and medium risk changes only)

IMPACT

Service Type                 | None (No Impact) | Low (Single Users) | Medium (Department) | High (Campus)
Admin applications           | Group            | Group              | Group               | CAB
Cable                        | Group            | Group              | Group               | CAB
eLearning                    | Manager          | CAB                | CAB                 | CAB
Enterprise Operating Systems | CAB              | CAB                | CAB                 | CAB
Enterprise middleware        | CAB              | CAB                | CAB                 | CAB
Enterprise applications      | CAB              | CAB                | CAB                 | CAB
Network                      | Manager          | Manager            | Manager             | CAB
Security (firewalls)         | CAB              | CAB                | CAB                 | CAB
Voice                        | Manager          | Manager            | Manager             | CAB
Infrastructure               | CAB
Firewall Rule Changes        | CAB

Note: all High Risk Changes require CAB approval.
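The matrix and the High Risk note can be read together as a small decision function. The Python sketch below is illustrative only: `APPROVERS` and `required_approver` are hypothetical names, and treating the single CAB entry for Infrastructure and Firewall Rule Changes as applying to every impact level is an assumption, since the table gives only one value for those rows.

```python
IMPACT_LEVELS = ("None", "Low", "Medium", "High")

# One approver per impact level, in IMPACT_LEVELS order.
APPROVERS = {
    "Admin applications": ("Group", "Group", "Group", "CAB"),
    "Cable": ("Group", "Group", "Group", "CAB"),
    "eLearning": ("Manager", "CAB", "CAB", "CAB"),
    "Enterprise Operating Systems": ("CAB", "CAB", "CAB", "CAB"),
    "Enterprise middleware": ("CAB", "CAB", "CAB", "CAB"),
    "Enterprise applications": ("CAB", "CAB", "CAB", "CAB"),
    "Network": ("Manager", "Manager", "Manager", "CAB"),
    "Security (firewalls)": ("CAB", "CAB", "CAB", "CAB"),
    "Voice": ("Manager", "Manager", "Manager", "CAB"),
    # Assumption: the single "CAB" entry covers all impact levels.
    "Infrastructure": ("CAB", "CAB", "CAB", "CAB"),
    "Firewall Rule Changes": ("CAB", "CAB", "CAB", "CAB"),
}


def required_approver(service_type: str, impact: str, risk: str) -> str:
    """Return "Group", "Manager" or "CAB" for a proposed change."""
    if risk == "High":
        return "CAB"  # all High Risk changes require CAB approval
    row = APPROVERS[service_type]  # KeyError for an unlisted service type
    return row[IMPACT_LEVELS.index(impact)]  # ValueError for a bad impact


network_medium = required_approver("Network", "Medium", "Low")
network_high_risk = required_approver("Network", "Low", "High")
```

Checking risk first mirrors the note under the table: the matrix itself only applies to low and medium risk changes.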


Activities and Deliverables

Approval Procedures

Who can approve:

Determine from the Approvers table whether the change can be approved by your group, manager or the CAB.

Group or manager level?

If group or manager level approval is indicated, then follow approval procedures for your own group.

If CAB approval is required

Complete the RFC web form at http://www.UBC IT.ubc.ca/change.html

This webform will send a message to NOC staff, who will open an RFC in Magic. The newly created Trouble Ticket number will be sent to you for further reference (for example, to relay the success of the change).


The Request for Change (RFC)

How to

Get approval by CAB

Enter the RFC on the webform at: http://www.UBC IT.ubc.ca/change.html

This webform will send a message to NOC staff, who will open an RFC in the Magic trouble ticketing system.

The newly created Trouble Ticket number will be sent to you for further reference (for example, to relay the status of the change or other explanations).

CAB meetings schedule

The CAB meets Tuesday afternoon at 1:30 pm in Rm443.

Deadlines

RFCs to be considered by the CAB need to be entered by 23:59 on Monday evening.

Changes to be implemented on the subsequent weekend must be sent to the CAB by Monday (up to 23:59).

Decisions will be sent out by Tuesday afternoon following the CAB meeting.

Change Owner responsibility

All changes must be presented to the CAB by the change owner or designate, in the event that there is a need for clarification regarding the proposed change.

RFC change “logs”

You may want to “log” a change without requiring it to be approved by the CAB.

a. Ensure that the criteria for the log have been met.
b. Then “Send RFC to Log” instead of sending “RFC to CAB” when completing the webform.

How to comment on/change RFC

Do a “reply to all” from the original message that the NOC staff sent to you notifying you of the RFC reference # (which is in fact a Magic trouble ticket).

Closing the change


All ITS staff have access to Magic and have IDs. Find the RFC number in Magic and add an action item indicating the outcome of the change: Successful, Cancelled, Backed Out, Failed.

If you do not have a Magic ID (for example, you are a new employee or a consultant), send a message to [email protected] containing the above status information of the RFC and they will enter the completion status on your behalf.

Lead Time for Approval

Lead time The lead time required to make a change depends on the SLA in place for the service(s) interrupted.

In the absence of an SLA, the service manager or customer can set the lead time for approval. Without an SLA or Service Manager or customer input, the CAB requires the following minimum notice: An RFC submitted on Monday can be scheduled for the subsequent Saturday morning or after.

A 72-hour window is required for customer notification once a change has been approved.

A change that cannot wait for the next CAB meeting can be marked Urgent, and may be approved through the emergency change process.
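The fallback lead-time rule (Monday 23:59 deadline, Tuesday 1:30 pm CAB meeting, 72-hour customer notice, subsequent Saturday morning) can be worked through in Python. This is a sketch under assumptions: the helper names are hypothetical, approval is assumed to be granted at the CAB meeting itself, and 06:00 is assumed as the Saturday start because it is the document's default window opening. An SLA or customer agreement would override all of it.

```python
from datetime import datetime, time, timedelta

MONDAY, TUESDAY, SATURDAY = 0, 1, 5  # datetime.weekday() numbering
CUSTOMER_NOTICE = timedelta(hours=72)


def next_weekday_at(after: datetime, weekday: int, at: time) -> datetime:
    """First moment on `weekday` at clock time `at` that is not before `after`."""
    days_ahead = (weekday - after.weekday()) % 7
    candidate = datetime.combine(after.date() + timedelta(days=days_ahead), at)
    if candidate < after:
        candidate += timedelta(days=7)
    return candidate


def earliest_change_start(submitted: datetime) -> datetime:
    """Earliest Saturday start for an RFC, absent an SLA or customer input."""
    cutoff = next_weekday_at(submitted, MONDAY, time(23, 59))
    cab_meeting = next_weekday_at(cutoff, TUESDAY, time(13, 30))
    notice_complete = cab_meeting + CUSTOMER_NOTICE
    # Assumption: changes start at 06:00, the default window opening.
    return next_weekday_at(notice_complete, SATURDAY, time(6, 0))


monday_rfc = earliest_change_start(datetime(2011, 5, 2, 10, 0))   # a Monday
tuesday_rfc = earliest_change_start(datetime(2011, 5, 3, 9, 0))   # a Tuesday
```

An RFC entered on Monday makes that week's meeting and can run the following Saturday; one entered on Tuesday morning has missed the deadline and slips a full week.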


RFC Submission Fields

Subject A short description of the change which will appear on the email subject line.

Contact Name The name of the RFC submitter.

Change Owner The name of the UBC IT manager responsible for the change.

Contact email Email of the RFC submitter.

CC Email addresses to which the change will be sent (in addition to the CAB).

Sponsor Name The name of the service manager or group manager who accepts responsibility for the change and its outcome. The change sponsor is normally a member of CAB, represents the change to CAB during its discussion, and is the point person if the change fails. Note that the sponsor must not be away at the time of the change or as required, have a designate in place.

Routing This specifies whether the RFC will be circulated to CAB (the usual case), or just logged in Magic.

Logging straight to Magic is appropriate when the RFC has been approved at group manager level.

Proposed date, start time, end time The date and time for the change will normally fall into one of the maintenance windows listed under Maintenance Windows.

Reason for Change Overall description of the change. (Is there a need to use specific terminology that should appear on website?)

Urgency High, medium, or low.

Impact High, medium, low, or none.

Risk High, medium, or low.

Effects The services that could be affected by the change. Give a clear description of the effect, and potential effect, on the customers.


Communications The person responsible for communications.

Communications and Web Site Updates Who is responsible for communicating this change to key stakeholders? This may be the same person as the person requesting the change, or the service manager, or other person. Please note that the UBC IT main website is the default.

Web Site Content Suggested wording for any potential service bulletin, including a concise headline. All websites should be consistent in message content and layout where possible.

Components From the UBC IT perspective, what platform components will have been modified.

QA approval Name of the person(s) responsible for assuring the quality of the change. This could be the person who tested the change, or the installer.

Implementers Who will make the change. (Will change require training or documentation updates?)

Backout personnel List of people to contact if the change must be backed out.

QA Checklist A series of tick boxes that provide an overall view of the level of quality assurance.
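For readers who want to see the submission fields as a data structure, the sketch below models a subset of them and checks the classification values. It is an assumption, not the actual webform or the Magic schema: the class name `RFC`, its attribute names, the `problems` method, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

URGENCY = {"High", "Medium", "Low"}
IMPACT = {"High", "Medium", "Low", "None"}
RISK = {"High", "Medium", "Low"}
ROUTING = {"CAB", "Log"}  # circulate to CAB, or just log in Magic


@dataclass
class RFC:
    subject: str
    contact_name: str
    contact_email: str
    change_owner: str
    sponsor_name: str
    routing: str
    start: datetime
    end: datetime
    reason: str
    urgency: str
    impact: str
    risk: str
    effects: str
    cc: List[str] = field(default_factory=list)
    backout_personnel: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        for label, value, allowed in (
            ("Routing", self.routing, ROUTING),
            ("Urgency", self.urgency, URGENCY),
            ("Impact", self.impact, IMPACT),
            ("Risk", self.risk, RISK),
        ):
            if value not in allowed:
                found.append(f"{label} must be one of {sorted(allowed)}")
        if self.end <= self.start:
            found.append("End time must be after start time")
        return found


example = RFC(
    subject="Cycle LDAP instances", contact_name="A. Submitter",
    contact_email="submitter@example.org", change_owner="B. Manager",
    sponsor_name="B. Manager", routing="Log",
    start=datetime(2011, 5, 7, 6, 0), end=datetime(2011, 5, 7, 7, 0),
    reason="Routine maintenance", urgency="Low", impact="None", risk="Low",
    effects="No anticipated user impact",
)
```

Returning a list of problems instead of raising on the first one lets a submitter fix everything in a single pass.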


Customer Groups Communication/Service Bulletins

Customer groups are contacted in sufficient time to amend or reschedule the proposed change.

The benefit of good communication with the customer is that it not only allows a better understanding by the customer of the effort that is required to sustain a service but shows that any downtime has been carefully considered and reviewed to minimize impact on the customer’s business cycle.

The normal means of contact consists of email lists and web postings, although personal contact is also used as appropriate.

Documentation on who to contact and how, together with sample scripts for service outages, is located in section 4 (Guidelines for describing the problem), or can be found in the document Service Outage Notifications, located here:

Saturn\users\UBC IT|public|Systeminfo|Service Outage Notification Procedures.doc

(The contact is done by the person designated in Communication and Web Site Updates on the RFC.)

If a customer objects to the change time after the notification goes out, the customer is referred to the change sponsor for resolution.


Forward Schedule of Changes (FSC)

The NOC operators will add new approved changes to the forward calendar of changes (FSC).

The FSC is available in Microsoft Outlook. See example below.


Emergency Change Process

An Emergency Change is a change whose Urgency is high, and where a deliberate decision is made to shortcut the normal change process. Examples:

- Service outages (major)
- HW/Application failure
- Security threat

The steps for getting an Emergency Change approved are:

Requestor:
- Obtain approval of the customer and Service Manager
- Alert the Change Manager to the forthcoming change
- Issue the RFC with Urgency = High

The following conditions must be met for an Emergency Change to be approved:
- An Emergency Change requires 3 approvers from the UBC IT Management (Directors and Managers).
- One of the approvers should be a Director, preferably the assigned Code 3 Director.
- The Change Sponsor cannot be an approver.
- A Director can veto the approvers.

The normal change control process notifies CAB of the RFC through email. CAB members can communicate through reply email or telephone Operations to participate in the approval process.

In the event that Operations needs to escalate and telephone Management to obtain approvers, then they will start with the assigned Code 3 team.

Ideally the approvers are ones who are familiar with the subject area, as they can best represent the impacts, urgency, and risks of the Change.

With completion of the above, the normal change control process is adapted as appropriate.

*Note that changes that are not genuinely urgent will not be approved through the emergency process, and will have to wait for the next meeting of the CAB.
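The approval conditions for an Emergency Change can be expressed as a simple check. The Python below is a hedged sketch: `Approver` and `emergency_change_approved` are hypothetical names, and it treats the Director as mandatory even though the text says only that one approver "should" be a Director. It also rejects the request outright if the sponsor appears among the approvers, which is one possible reading of the rule.

```python
from dataclasses import dataclass
from typing import Iterable

MANAGEMENT_ROLES = {"Director", "Manager"}
REQUIRED_APPROVERS = 3


@dataclass(frozen=True)
class Approver:
    name: str
    role: str  # "Director" or "Manager"


def emergency_change_approved(
    approvers: Iterable[Approver],
    sponsor_name: str,
    director_veto: bool = False,
) -> bool:
    """Apply the stated approval conditions to a set of approvers."""
    if director_veto:
        return False  # a Director can veto the approvers
    # Count each management approver once, keyed by name.
    unique = {a.name: a for a in approvers if a.role in MANAGEMENT_ROLES}
    if sponsor_name in unique:
        return False  # the Change Sponsor cannot be an approver
    if len(unique) < REQUIRED_APPROVERS:
        return False
    # Assumption: treat "should be a Director" as a hard requirement.
    return any(a.role == "Director" for a in unique.values())


panel = [Approver("A", "Director"), Approver("B", "Manager"), Approver("C", "Manager")]
approved = emergency_change_approved(panel, sponsor_name="D")
```

The sketch does not model the preference for the assigned Code 3 Director, which affects who is asked first rather than whether the conditions are met.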

Closing the RFC

Following the change, its implementation is reviewed and closed. The outcome is recorded as a Magic action based on one of the following:

Successful Change accomplished and system back up on time.

Cancelled/Denied Change called off before the change start time.


Backed out Change was removed from service during the maintenance window.

Failed The change interrupted normal service.

All UBC IT staff have a Magic ID, so the action can be entered directly online. If for some reason you do not have a Magic ID, a message can be sent to the NOC staff at [email protected], quoting the RFC number and the outcome, and the staff member will update the record.


Change Assessment (Post-Implementation Review)

For all completed changes, the following steps are taken:

The NOC staff produce KPIs which summarize the changes performed in each period by outcome.

Changes from the previous week are examined at the weekly CAB meeting, for example:

- change objective achieved
- feedback (positive/negative) from users and customers
- were there unexpected side effects from implementing change
- effective resource planning
- implementation was executed as planned
- change was on time
- backout was successful (if/when applied)
- “lessons learned” incorporated into “pre-RFC” checklist
- report and followup with problems (e.g.: vendor, customer related, CAB)
- information is absorbed into improvement process.

The statistics are reviewed by the change committee.
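A per-period summary by outcome, of the kind the KPIs describe, could be produced along the following lines. This is only a sketch: the document does not say how the NOC computes its KPIs, and `summarize_outcomes` and the sample week are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# The four closing outcomes listed under "Closing the RFC".
OUTCOMES = ("Successful", "Cancelled/Denied", "Backed out", "Failed")


def summarize_outcomes(outcomes: Iterable[str]) -> Dict[str, int]:
    """Count changes per outcome, reporting zero for outcomes that did not occur."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome(s): {sorted(unknown)}")
    return {name: counts[name] for name in OUTCOMES}


week = ["Successful", "Successful", "Backed out", "Failed", "Successful"]
summary = summarize_outcomes(week)
```

Reporting every outcome, including those with a count of zero, keeps the figures comparable from one period to the next.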

For any change which was backed out or failed:
- a problem report is created by the operators and sent to the change sponsor, or the service manager if different from the change sponsor.

For any change which failed:
- a critical incident review is conducted at the next Operations Committee meeting following the change, and the results of the review are published along with the Operations Committee minutes.
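The follow-up rules for backed-out and failed changes amount to a small mapping from outcome to required actions. The sketch below is illustrative; `follow_up_actions` is a hypothetical name and the action strings paraphrase the text above.

```python
from typing import List

PROBLEM_REPORT = "Problem report sent to the change sponsor or service manager"
CRITICAL_INCIDENT_REVIEW = "Critical incident review at the next Operations Committee meeting"


def follow_up_actions(outcome: str) -> List[str]:
    """Return the follow-up actions required for a closed change."""
    actions = []
    if outcome in ("Backed out", "Failed"):
        actions.append(PROBLEM_REPORT)
    if outcome == "Failed":
        actions.append(CRITICAL_INCIDENT_REVIEW)
    return actions


failed_actions = follow_up_actions("Failed")
backed_out_actions = follow_up_actions("Backed out")
```

A failed change therefore triggers both actions, while a successful or cancelled change triggers none.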


Glossary

Analysis of Failed Change An analysis of a failed change is held after a change fails and is successfully backed-out. The analysis is managed by the Change Owner Manager or designate and results reported to the Director, IT Operations, and the Operations Committee. It confirms the sequence of events, establishes the cause(s) of the failure(s) and makes recommendations to improve future performance.

CAB The Change Advisory Board. The CAB consists of all members of the Operations Committee, all service managers, and others as required to represent the various customer groups. Changes are the responsibility of the Change Advisory Board.

Change Manager The Change Committee convener. The Change Manager performs the following tasks:

- Convenes the CAB
- Circulates new changes to CAB
- Issues approvals for changes

There is always a duty Change Manager. If you do not know who it is, you can find out by calling the Operations Centre.

Change Owner The service manager or group manager who accepts responsibility for the change and its outcome. The Sponsor performs the following tasks:

- Represents the change to CAB during its discussion
- Takes responsibility for recovery if the change fails
- Deals with customer objections to the change timing

Note that the Change Owner must not be away at the time of the change. If absence is unavoidable, please identify a designate prior to the meeting.

Critical Incident Review (aka Post Mortem) A Critical Incident Review (aka “post mortem”) is held after both a change and the back-out plan fail. The Critical Incident Review is managed by the Director IT Operations or designate and is reported to the Operations Committee and UBC IT's Senior Management. It confirms the sequence of events, establishes the cause(s) of the failure(s) and makes recommendations to improve future performance. A Critical Incident Review may also be called on other non-change failures in the production environment at the discretion of the Director IT Operations.


Emergency Change A change whose Urgency is High, which cannot wait for the next CAB meeting. An emergency change is a case where a deliberate decision is made to shortcut the change process.

Enterprise Middleware Examples of middleware at UBC IT include web servers (Apache, Tomcat, ColdFusion), LDAP services,

Enterprise Applications Examples of applications which are extended to the UBC community include: UBC Mail (Exchange, SunOne), CWL, MyUBC, UBC Directory.

FSC Forward Schedule of Changes. This shows when approved changes are scheduled for installation. The FSC is available using Microsoft Outlook (open the ChangeLog calendar).

Normal Changes Normal changes are changes to the live environment that can be scheduled in advance and installed in a predictable manner.

Problems Problems are unscheduled failures of the live environment. Problems are handled through a process of problem management and escalation. Changes to the live environment made as part of a problem fix are recorded in the trouble ticket system, not the change management log.

Problem management is not covered by this document.

Standard Changes A standard change is a well understood, recurring, low risk change that is the accepted response to a specific requirement or set of circumstances, the approval for which has been delegated to the group making the change. These can be entered as a change log where the outcome is expected to be successful with no impact to the user community.


Appendix 1 CAB Contact List

Emergency CAB Members

Listed alphabetically

Aksentsev, Felix (IBA)
Belsito, Mark (Apps PM)
Burns, Jennifer (Passive)
Cooper, Lynda (Parental Leave)
Craven, William (Passive)
Cumming, Lois (Systems)
Fong, Kent (Access)
Frazer, Dave (Passive)
Haeusser, Jens (Passive)
Hay, Marilyn (NMC)
Huang, Amy (Passive)
IT - Systems Operations (Passive)
Johnson, Patrick
Kita, Stan
Lay, Sean
Lee, Jeanne
Lim, Michael
Loewen, Doug
Macdonald, Bob
McKelvie, Evelyn
Miladinovic, Jovan
Ng, Susan
Operations (ITServices)
Quinville, Doug
Rosco, Steve
Sayer, Margaret
Shaw, Wes (Passive)
Smith, Brock
Thompson, Don (Passive)
Thorson, Michael
Twining, Neil
IT Resource: Klinck Rm.443
Bourdon, Eric
Razi, Sam


Appendix 2 Pre-RFC Initiation Process

(Draft: please comment with improvements or additions to the list.)

Note 1: This is not meant to be a comprehensive list, but a starting point in determining what needs to be considered. Your input is greatly appreciated.

1) Reason for change:
- Response to a business need
- New service rollout
- End of service/systems lifecycle
- Security, audit, legislation
- Customer driven
- Current service: identified problem/hardware failure
- Technical: software upgrades

2) Perform benefits/risk analysis for each change prior to submission

What is the risk of failure, what might fail, and what would be the estimated time to restore and recover?

a) Customer Impact on Business
- What services would be affected by an unplanned outage?
- What is the effect of not implementing the change?
- Is the timing right for the cycle of business?
- What are the customer's critical dates (see osmium calendar)?
- What is the impact to an identified SLA or other agreement, if one exists?

Customer Notifications
- Notification of outage: does it meet customer notification timelines? Are alternate avenues (websites, systems status line) provided for customers? Are customers notified of resolution?

b) Environmental Impact
- What is the impact on other current processes/projects? What is the priority attached to this?
- What other services run on the same infrastructure?
- What other changes are occurring that day (see Outlook calendar)?
- Evaluate the impact on the following:
    o System: hardware, OS release, memory, disk space, CPU type, remote access, password issues
    o Network: bandwidth, routing, SNMP passwords, interference, firewall, name resolution, security
    o Application: detailed list of applications and dependencies, release and patch level, critical applications
    o Processes: description of the existing business processes and dependencies, critical processes
    o Organization: information on all units, alarm plan, list of phone numbers, logistic issues, etc.; health and well-being of staff who may be working extra long hours. There may also be physical security issues such as after hours access.
    o Security Office: are existing security measures met? Is the security office familiar with the change?

c) Resources requirements - Identify internal and external resources/ skillsets required for successful

change - Availability of an experienced architect resident to ensure the

infrastructure and changes are planned and vetted before production

- Verify that all equipment, software, hardware, and updates are available

- Verify that backup tapes are available in the event of a back-out or restore.

- Research the requirements and other supporting resources needed to achieve a successful change (required patches and stability of upgrade, licenses, security issues identified).

- Identify who needs to be aware and/or on standby when changes are planned. A rep from each group in the department change management team is to be ‘on duty’.

- Systems and services should be operationalized to the extent that the NOC can turn services on and off.

- Communicate fully with the NOC, which provides updates via status messages/website updates: problem process loop, open/close, ongoing.

d) Testing

This should address issues of performance, security, maintainability, supportability, reliability, and functionality:

- Develop a detailed plan of action to reduce the risk to an acceptable level (comprehensive testing plan, signoff on tests carried out, implementation details). This should explain the steps that must be taken to restore access in the event that the change has a negative impact. Provide an online link or file location.

- Develop a plan of action to lessen the effects on the customer if the change should cause an outage. What is the workaround? Is there automatic failover in place?

- Always develop a pre-change checklist (a template should be developed):
o When did we last do a shutdown? A stop/start should be tested without changes.
o Checkpoints need to be identified as problems arise: stop or go? E.g. no shutdown script: stop?
o Go-live checklist
o Walk through all dependencies first.
o Ensure everything is tested with the current version of the operating system.
o Test cards prior to installing into production (done in this case).
o Test the drivers supplied under the correct release of the O/S; test every configuration parameter in advance.
o Additional or spare hardware requirements?
o Create an outage ‘timetable’ for the planned outage window, e.g. Sunday 5-7 am:
a. Flag go/no go steps
b. Identify when to back out if necessary
c. Create for every change

- A duplicate environment is required (a parallel one to try changes out). This is needed for all applications; many development environments are not adequate.

- Communicate to senior management the importance of appropriate equipment: the functionality is needed, not necessarily the exact same boxes (save $$$).

- If vendor resources are required, check availability:
a. Is the support line adequately ‘manned’ during off-hours (weekends)?
b. Pre-arrange contacts to ‘stick handle’ the situation with internal resources 24x7?
c. Notify vendor(s) of any impending changes/outages to ensure we are on their radar if any issues arise.
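The outage ‘timetable’ called for in the pre-change checklist above can be sketched as a small data structure. This is a minimal illustration only, not part of the procedure: the step names, durations and the 30-minute back-out allowance are hypothetical, and `Step` and `build_timetable` are names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    """One row of the outage timetable."""
    name: str
    minutes: int             # planned duration of the step
    go_no_go: bool = False   # True if a go/no go decision follows this step


def build_timetable(window_start, window_end, steps, backout_minutes):
    """Return (rows, backout_deadline) for a planned outage window.

    rows is a list of (start, end, step) tuples. backout_deadline is the
    latest time a back-out can begin and still finish inside the window.
    """
    rows = []
    cursor = window_start
    for step in steps:
        end = cursor + timedelta(minutes=step.minutes)
        rows.append((cursor, end, step))
        cursor = end
    if cursor > window_end:
        raise ValueError("planned steps do not fit in the outage window")
    backout_deadline = window_end - timedelta(minutes=backout_minutes)
    return rows, backout_deadline


# Hypothetical example: a Sunday 5-7 am window (1 May 2011 was a Sunday).
start = datetime(2011, 5, 1, 5, 0)
end = datetime(2011, 5, 1, 7, 0)
steps = [
    Step("Stop services", 10),
    Step("Apply change", 40, go_no_go=True),
    Step("Test change", 30, go_no_go=True),
    Step("Restart services and notify NOC", 10),
]
rows, backout_deadline = build_timetable(start, end, steps, backout_minutes=30)
for step_start, step_end, step in rows:
    flag = "  <-- go/no go" if step.go_no_go else ""
    print(f"{step_start:%H:%M}-{step_end:%H:%M}  {step.name}{flag}")
print(f"Latest time to begin back-out: {backout_deadline:%H:%M}")
```

Keeping the back-out deadline alongside the steps makes it easy to see whether each go/no go checkpoint falls early enough for a back-out to complete within the window.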

e) Escalation process:
- Clarify Technical and Mgmt stakeholders.
- Contact lists available: cell phone / wireless phone / pager; all can be automated.
- Refer to policies as to who makes the go/no go decision and how (management?).

f) After a successful change:

Review the following:
o Was the planning correctly carried out?
o Report on completion and update the RFC record or log as to status. Reply to the email that the NOC sent you with the RFC reference number.
o Has the impact been correctly estimated?
o Has the usage of resources been correctly estimated? Was it necessary to deploy the resources of the standby team during the rollout or change?
o Has the intended effect of the change been achieved?
o Are the users satisfied with the result?
o How many incidents were raised on the change? Are there unexpected side effects?
o Have detected anomalies been communicated to the CAB for future changes?
o Are documentation / training (customer, staff) updates required after the change? Identify resources: who will do what.

Finally:

Obtain approval to proceed from your immediate or appropriate responsible manager for requesting the change.

Submit a complete, concise, and descriptive Change Request no later than Monday (by 23:59, just before midnight).
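As an illustration of the Monday 23:59 cutoff, the sketch below computes the next cutoff for a given submission time. It assumes the cutoff recurs every week, which the text above does not state explicitly; `next_rfc_deadline` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta


def next_rfc_deadline(now):
    """Return the next Monday 23:59 cutoff at or after `now`.

    Assumption: the cutoff recurs weekly.
    """
    days_ahead = (0 - now.weekday()) % 7   # Monday is weekday 0
    deadline = datetime.combine(now.date() + timedelta(days=days_ahead), time(23, 59))
    if deadline < now:                     # already past this Monday's cutoff
        deadline += timedelta(days=7)
    return deadline


# Hypothetical submission times for illustration.
print(next_rfc_deadline(datetime(2011, 5, 2, 10, 0)))   # submitted on a Monday morning
print(next_rfc_deadline(datetime(2011, 5, 3, 0, 5)))    # submitted just after midnight on Tuesday
```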

Appendix 3 Critical Incident Reviews (aka “Post-Mortem”) Guidelines

When to hold

A “CIR” (Critical Incident Review) is held after both a change and the back-out plan fail.

This is managed by the Director IT Operations or designate and is reported to the Operations Committee and UBC IT Senior Management.

It confirms the sequence of events, establishes the cause(s) of the failure(s) and makes recommendations to improve future performance.

A “CIR” (Critical Incident Review) may also be called on other non-change failures in the production environment at the discretion of the Director IT Operations.

Example:

“CIR” (Critical Incident Review)
Description of Incident
Date

Attendees: Absent:

Preparation: Always bring documentation/ticket info to CIR meeting

Agenda:
- Ground rules for critical incident review.
- What took place? Chronology of events
- Why did it take place?
- What to do to prevent this in the future (processes, environment, etc.)

1. Action Items
Action required | Area of responsibility | Date due | Status | Review

- Other Lessons Learned
o Things that worked well
o Further recommendations.
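The action-item columns in the CIR example above (action required, area of responsibility, date due, status, review) could be tracked with a simple record like the one below. This is a sketch under assumptions: the status values "Open" and "Closed", the sample entries, and the names `ActionItem` and `overdue_items` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """One row of the CIR action-item table."""
    action_required: str
    area_of_responsibility: str
    date_due: date
    status: str = "Open"   # assumed values: "Open" or "Closed"
    review: str = ""       # notes from the follow-up review


def overdue_items(items, today):
    """Return action items that are not closed and whose due date has passed."""
    return [i for i in items if i.status != "Closed" and i.date_due < today]


# Hypothetical entries for illustration only.
items = [
    ActionItem("Add shutdown script to pre-change checklist", "Systems", date(2011, 5, 9)),
    ActionItem("Update vendor contact list", "NOC", date(2011, 5, 16), status="Closed"),
    ActionItem("Schedule staff training", "Operations", date(2011, 6, 1)),
]
for item in overdue_items(items, today=date(2011, 5, 20)):
    print(f"OVERDUE: {item.action_required} ({item.area_of_responsibility}, due {item.date_due})")
```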

PROBLEM RESOLUTION PROCESS

Appendix 4 Examples of changes

Applications
o new application rollout
o new systems releases/conversions/functional enhancements/fixes
o maintenance

Databases
o changing the name of a db table, view, or column
o modifying a stored procedure, trigger, or user-defined function
o changing or adding relationships using referential integrity features
o changing or adding db partitioning
o moving a table from one db, dbspace, or table space to another
o changing the uniqueness specification of an index
o clustering the table data by a different index
o changing the order of an index (ascending or descending)

Networks
o installs, upgrades, disconnects, router configs

Telecommunications
o facilities: communication rooms

Infrastructure
o Hardware (moves, adds, changes, hw relocation, emergency replacements, OS, installations)
o SW releases, enhancements

Documentation

Periodic maintenance, User requests, Hardware and/or software upgrades, Acquisition of new hardware and/or software, Changes or modifications to the infrastructure, Environmental changes, Operations schedule changes, Changes in hours of availability, and Unforeseen events.

Facilities
o Major electrical upgrades, installs
o Air conditioner upgrades
o Fire suppression systems
o UPS systems
o Security panels

Desktops
o HW configurations
o OS
o Applications, releases, patches

Appendix 5 Examples: Communications Strategy

Purpose

This document serves as a GUIDE FOR COMMUNICATIONS REQUIRED for the Change Management process.

Include all stakeholders in the information flow to manage expectations and invite participation prior to, during and after rolling out new features or service.

Issues:

o What will be communicated

o What media mixes are most effective

o Frequency

o Urgent messages

o Audiences

Example guidelines for describing the problem

Concise Description

Describe concisely and specifically what the planned outage affects in terms of what the customer uses the service for.

Indicate what this outage affects or what error messages a user might see if they try to access the service while it is out.

Examples
“This service or system will be unavailable…. Faculty and staff will be unable to log into..…”
“The main UBC web server will be unavailable…. You will be unable to access any web pages beginning with, …..may return ‘no such domain name’ messages.”
“E-mail services for faculty and staff will be unavailable ….”

Specify the duration of the outage.

Example
“…unavailable on Saturday, August 7, between 5AM and 7AM.”

Indicate if there is an alternative available to users while a service is unavailable.

Examples
“The __________web server will be unavailable… In the meantime you can access your email by using…..”
“The _________server at www._______________will be unavailable … Please use the following ……”

Identify if there is a web page that has useful information for the users.

Example
“For updates and a complete list of services affected by the outage, please visit http://www.itservices.ubc.ca/support/ .”

Include suggestions where the user can go for additional information

Examples
“If you have questions please contact the Help Desk at 822-2008.”
“Please call 822-4115 for recorded updates.”
“Please refer to the "Systems Alerts" section found at http://www.itservices.ubc.ca/support/ for further updates.”
“Please contact the Network Operations Centre at 822-5438 (option 4) when network problems are experienced.”
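The guidelines above (what is affected, for how long, any alternative, where to find updates, who to contact) can be assembled into a notice with a small helper. This is only a sketch of one way to do it: `compose_outage_notice` and its parameters are hypothetical, and the impact sentence in the usage example is invented for illustration. The other wording is taken from the examples above.

```python
def compose_outage_notice(service, window, impact, alternative="", info_url="", contact=""):
    """Assemble an outage notice roughly in the order the guidelines list:
    what is affected and when, the effect on users, any alternative,
    where to find updates, and who to contact."""
    sentences = [f"{service} will be unavailable {window}.", impact]
    if alternative:
        sentences.append(alternative)
    if info_url:
        sentences.append(
            "For updates and a complete list of services affected by the outage, "
            f"please visit {info_url} ."
        )
    if contact:
        sentences.append(contact)
    # Skip empty parts so optional fields leave no stray spaces.
    return " ".join(s.strip() for s in sentences if s.strip())


notice = compose_outage_notice(
    service="E-mail services for faculty and staff",
    window="on Saturday, August 7, between 5AM and 7AM",
    impact="You will be unable to send or receive e-mail during this time.",  # hypothetical wording
    info_url="http://www.itservices.ubc.ca/support/",
    contact="If you have questions please contact the Help Desk at 822-2008.",
)
print(notice)
```

Generating every site's bulletin from the same fields helps keep dates, times and contact numbers consistent across the web pages that carry the notice.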

Web page updates, examples

Example 1: Oracle3 outage notices for websites.

“System Alert” section at http://www.itservices.ubc.ca/support/bulletins

Oracle3 Service Outage - A service outage is scheduled on August 27, 200x from 6:00am to 7:30am for maintenance on the oracle3 server. It will affect the following services and applications.

o Asbestos Tracking System
o Campus-Wide Login (CWL)
o Faculty and Staff Pension
o Magic TSD Call Tracking System
o Paradigm Call Tracking System
o myPress Copyright Management
o myUBC
o ORSIL
o Tracc-II
o UBC Faculty and Administrative Directory
o UTAW (Utility Tool for Administrators of WebCT)

We apologize for any inconvenience. Please contact the Network Operations Centre at 822-5438 (option 4) if problems are experienced.

Users of Windows 2000 and XP - If you encounter access problems following the Oracle3 Service Outage, please try the following:

From the Start menu, select "Run", then type in 'cmd' (without the quotes) and press the "Enter" key. A command prompt window will appear. Enter the following commands.

ipconfig /displaydns
Then press the "Enter" key to view the local DNS cache.

ipconfig /flushdns
Then press the "Enter" key to flush the local DNS cache.

Or you can simply reboot your PC to pick up the new DNS for Oracle3. Please contact the ITServices Help Desk at 822-2008 if you continue to experience difficulties.

CWL Site

Service Bulletin

A service outage is scheduled on August 27, 200x from 6:00am to 7:30am for maintenance on the oracle3 server. As a result, myUBC, CWL, WebCT administration tools and other online services may not be available during this time. For a full list of services affected, visit www.itservices.ubc.ca/support/bulletins

NETINFO/ INTERCHANGE

Service Bulletin

A service outage is scheduled on August 27, 200x from 6:00am to 7:30am for maintenance on the oracle3 server. As a result, myUBC, CWL, WebCT administration tools and other online services may not be available during this time. For a full list of services affected, visit www.itservices.ubc.ca/support/bulletins

my.ubc.ca

Service Bulletin

A service outage is scheduled on August 27, 200x from 6:00am to 7:30am for maintenance on the oracle3 server. As a result, myUBC, CWL, WebCT administration tools and other online services may not be available during this time. For a full list of services affected, visit www.itservices.ubc.ca/support/bulletins

Example 2: Enrolment Services Database outage notices for websites.

Student Centre Service web site

Service Bulletin

The Enrolment Services Database will be unavailable Saturday, October 19, 200x (and Sunday Oct 20 if required). The outage is expected to start at 3:00AM Saturday morning and last all day. The purpose of this outage is to perform an Oracle Database upgrade. We apologize for any inconvenience. For a list of all services affected by the outage, please visit www.itservices.ubc.ca/support.

“System Alert” section at http://www.itservices.ubc.ca/support/bulletins

Service Bulletin

The Enrolment Services Database will be unavailable from 3:00 AM, Saturday, October 19, 200x and Sunday, October 20, if required. The purpose of this outage is to perform an Oracle Database upgrade.

All applications/services that require access to the Enrolment Services Database will be affected. The following is a list of known services that will be impacted:

Enrolment Services
- Admissions
- Awards
- Course Scheduling/Catalog (Ad Astra)
- Elections
- Degree Navigator (DAG)
- Faculty Service Centre (FSC)
- Student Authentication Service
- Student Information Service Centre (SISC)
- Student Information System (Old Green Screens/Unikix)
- Student Service Centre (SSC)

Others
- TracII/Netinfo access requiring student number validation
- myUBC authentication using student number
- CWL registration using student number
- Housing authentication using student number
- WebCT administration

We apologize for any inconvenience.

my.ubc.ca

Service Bulletin

Due to upgrades to the Enrolment Services server, students will not be able to log in using student numbers on Saturday, October 19 and Sunday, October 20. We apologize for any inconvenience. For a list of all services affected by the outage, please visit www.itservices.ubc.ca/support/bulletins

CWL Site

Service Bulletin

Student CWL registration will not be available on Saturday, October 19 and Sunday, October 20, if required, due to upgrades to the Enrolment Services database server. For more information, visit www.itservices.ubc.ca/support/bulletins. We apologize for any inconvenience.

NETINFO

Service Bulletin

Netinfo registration and password resets will not be available on Saturday, October 19 and Sunday, October 20, if required, due to upgrades to the Enrolment Services database server. For more information, visit www.itservices.ubc.ca/support/bulletins. We apologize for any inconvenience.

www.webct.ubc.ca

Service Bulletin

Enrolment Services interruption: Saturday, October 19/02 - all day

Due to the downtime of the Enrolment Services database on Saturday, October 19/02, the following WebCT services will be affected:

Users will not be able to log in to WebCT using student number and PIN.
* Please use your interchange/netinfo account to avoid interruption

For more information, visit www.itservices.ubc.ca/support/bulletins. We apologize for the inconvenience.

Sincerely,
ITServices WebCT Admin Team

Example 3: Phase 2 of the admin cluster firewall installation.

“System Alert” section at http://www.itservices.ubc.ca/support/bulletins

Service Bulletin

Notification of ITServices Outage

The following is advance notification of a major outage within ITServices network and the services it provides. This outage has been scheduled in the early morning to minimize the impact on students, faculty and staff services:

From: 06:00am November 13, 200x
To: 08:00am November 13, 200x
Reason: Scheduled network configuration maintenance

Impact: The following services will be unavailable during the above period:

Enrolment Services
- Admissions
- Awards
- Course Scheduling/Catalog (Ad Astra)
- Degree Navigator (DAG)
- Elections
- Faculty Service Centre (FSC)
- Student Authentication Service
- Student Information Service Centre (SISC)
- Student Information System (Old Green Screens/Unikix)
- Student Service Centre (SSC)

Others
- TracII/Netinfo access requiring student number validation
- myUBC authentication using student number
- CWL authentication using student number
- Housing authentication using student number
- WebCT Administration

We apologize for this disruption in service and thank you for your patience.

Please refer to the "Systems Alerts" section found at http://www.itservices.ubc.ca/support/bulletins for further updates.

Please contact the Network Operations Centre at 822-5438 (option 4) when network problems are experienced.