L&I - Systems Management Plan Problem/Incident Management

Version 1.14

Prepared for

Commonwealth of Pennsylvania
Department of Labor and Industry

December 2010

Department of Labor and Industry – Office of Information Technology
Systems Management Plan – Problem/Incident Management
T:\All (Common area for all OIT Staff)\Enterprise Systems Management Documents (ITIL)

Revision History

Release/Version Information

Revision | Date | Author / Editor | Summary of Changes
1.0 | 02/01/07 | Mike Smith | Creation of document
1.2 | 05/15/08 | Mary Hill-Hartman, John Tamosaitis | Updates per L&I comments; Update document
1.3 | 07/22/08 | John Tamosaitis | Updates per L&I comments
1.4 | 8/13/08 | John Tamosaitis | Updates per L&I comments
1.5 | 8/25 | John Tamosaitis | Updates per L&I comments
1.6 | 10/31 | John Tamosaitis | Updates per L&I comments
1.7 | 12/22/08 | John Tamosaitis | Updates per L&I comments
1.8 | 3/2/09 | John Tamosaitis | Updates per L&I comments
1.9 | 4/27/09 | John Tamosaitis | Updates per L&I comments
1.10 | 5/15/09 | John Tamosaitis | Updates per L&I comments
1.11 | 5/27/09 | John Tamosaitis | Updates per L&I comments
1.12 | 10/16/09 | John Tamosaitis | Updates per L&I comments
1.13 | 10/30 | John Tamosaitis | Updates per L&I comments
1.14 | 12/9/09 | John Tamosaitis | Updates per L&I comments
1.15 | 11/15/10 | John Tamosaitis | Update for Prod outages – use of incident reports

Reviewed By:

Name | Team/Role | Reviewer Comments | Date Reviewed
Myrna Barnes | Chief, Customer Relations Division | Myrna Barnes | 12/7/2009
Anita Steinmeier | Chief, Enterprise Software and Information Division | Anita Steinmeier | 12/7/2009
Karen Fausnacht | Chief, Project Mgmt Division | Karen Fausnacht | 11/16/2009
Steve Yurich | Chief, Security Division | Steve Yurich | 11/10/2009
Jacki Hagmayer | Chief, Engineering and Research Division | Jacki Hagmayer | 12/7/2009
Ed Bowlen | Chief, Standards Development & Compliance Division | Ed Bowlen | 12/8/2009
Joe Sheridan | Chief, Data Mgmt & Database Operations Division | Joe Sheridan | 12/8/2009
Bryan Reed | Chief, Compensation & Insurance Division | Bryan Reed | 11/13/2009
Mary Lynn Kowalski | Chief, Unemployment Compensation Division | Mary Lynn Kowalski | 12/8/2009
John Shontz | Chief, Vocational Rehabilitation – Safety & Labor Mgmt Relations Division | John Shontz | 12/8/2009
Phil Day | Chief, Workforce Development Division | Phil Day | 12/8/2009
John Auchey | Chief, Server Farm Operations Division | John Auchey | 12/8/2009
David Vogelsong | Chief, Infrastructure Division | David Vogelsong | 11/10/2009
Bill Glatz | Chief, Network Support Services Division | Bill Glatz | 11/12/2009
Marty Thomas | Chief, Mainframe Operations Division | Marty Thomas | 11/12/2009

Approved By:

Name | Team/Role | Sign-off Date
Michele Sinko | Director, BES | 12/18/2009
John Malinoski | Director, BIO | 12/18/2009
Neil Ross | Director, BEA | 12/18/2009
David Andrews | Director, BBAD | 12/18/2009

Table of Contents

1.0 Preface
2.0 Owner/Responsible
3.0 IT Process Integration
4.0 Problem/Incident Management
    4.1 Problem/Incident Management Introduction
    4.2 Problem/Incident Management Purpose
    4.3 Problem/Incident Management Definitions
        4.3.1 Problem/Incident Management Definitions
    4.4 Problem/Incident Management Objectives
    4.5 Problem/Incident Management Inter-relationships
    4.6 Problem/Incident Management Guiding Principles
    4.7 Problem Management Roles and Responsibilities
    4.8 Problem/Incident Management Process
        4.8.1 Problem Management Activities
        4.8.2 Process Components
        4.8.3 Description of Process Components
    4.9 Problem/Incident Management Procedures
        4.9.1 Problem Identified
        4.9.2 Record, Assess, Classify Problem/Incident
        4.9.3 Diagnose and Escalate Problem/Incident
        4.9.4 Resolution/Bypass and Verification of Problem/Incident
        4.9.5 Survey and Follow-up Problem/Incident
    4.10 Problem Management Guidelines
        4.10.1 Remedy Problem/Incident Priority Levels Matrix
        4.10.2 Help Desk Priority/Event Management Matrix for Servers
        4.10.3 Tivoli Response Framework Matrix
        4.10.4 Help Desk Brain Knowledgebase Entries
    4.11 Problem Management Metrics
    4.12 Problem/Incident Management Tool Capabilities
5.0 Appendix A – Sample Problems
    5.1 L&I Enterprise Problem/Incident Examples
6.0 Appendix B – Acronyms
    6.1 L&I Acronyms

1.0 Preface

A cross-IBM team signed a 6 1/2-year Service Oriented Architecture (SOA)-based application development contract with the Commonwealth of Pennsylvania for a new unemployment compensation modernization system (UCMS), intended to provide a platform for growth and innovation that will serve the Commonwealth for the foreseeable future. As part of the Agreement, IBM was required to prepare a Systems Management Plan for the UCMS project. This document represents the evolution of the Enterprise Systems Management (ESM) Plan work product. This document, L&I - Systems Management Plan - Problem/Incident Management, can be found at T:\All (Common area for all OIT Staff)\Enterprise Systems Management Documents (ITIL), along with the other Systems Management Plan documents based on the Information Technology Infrastructure Library (ITIL).

See Appendix B for a complete list of Acronyms used in this document.

2.0 Owner/Responsible

As of July 21, 2008, the Office of Information Technology (OIT), Bureau of Enterprise Services Customer Relations Division (BES-CRD) is the owner of this document. The owner is expected to update the Systems Management Plan on a quarterly basis.

3.0 IT Process Integration

Multiple integration points exist between the IT Operational Management processes. The following figure gives a high-level overview of those integration points.

3.0 Figure 1: ESM Process Integration (Refer to section 4.5 for description of this process)

[Figure: ESM Process Integration. Elements shown: Change Management Process; Help Desk/Problem Process; Backup/Recovery Process; Asset/Configuration Management Process; Service Level Management Process; Event Management System; Performance/Capacity Management Process; Security Process; Release/Software Distribution Process; Availability Management Process]

4.0 Problem/Incident Management

4.1 Problem/Incident Management Introduction

Problem/Incident Management is a formal, structured process that identifies and addresses service anomalies and restores application or system functions as quickly as possible, in order to mitigate the impact on the Department of Labor and Industry (L&I) business and bring services back up to the levels outlined in the Service Level Agreements (SLAs). L&I’s Problem/Incident Management Plan includes a Problem Process Owner, an Operations Manager Team Lead, a Help Desk Manager, a Help Desk Coordinator, LINKS Help Desk Agents, and Level 2/Level 3 Subject Matter Experts (SMEs) for diagnosing and resolving Problem tickets. The entire process will be managed by the Help Desk Manager. The process will record the Problem and its root cause, record the results of the resolution of the Problem, and provide information required by other processes, such as Change.

4.2 Problem/Incident Management Purpose

The L&I Problem/Incident Management Plan covers all problems and incidents that occur in all of the L&I custom application software, commercial-off-the-shelf software, and infrastructure/network support services hardware and software components that impact (or may impact) the L&I business and technology environments. Examples are listed in Appendix A – this list is a working document and will be modified over time.

The L&I Problem/Incident Management Plan will also serve as the “starting” point for problems that need to be forwarded to L&I’s overall Problem Management process.

4.3 Problem/Incident Management Definitions

The following diagram illustrates the relationships between Event Management, Problem/Incident Management, and Configuration Management.

[Figure: Information Technology Infrastructure Library (ITIL) Configuration Management Database (contains relationships; federated databases), currently Remedy and spreadsheets at DLI; Event Management, with events tracked in TEC at L&I; Problem/Incident Management (records, tracks, documents problems), with problems tracked in Remedy at L&I]

4.3 Figure 2: Problem, Event and Configuration Management

4.3.1 Problem/Incident Management Definitions

An incident is any event that is not part of the standard operation of a service and causes, or may cause, an interruption to or reduction in the quality of that service.

A problem is an unknown, underlying cause of one or more incidents. A single problem may generate several incidents.

For the purpose of this document, the following are examples of problems:
o Events that are detected by L&I Tivoli Infrastructure and escalated to the Tivoli Enterprise Console (TEC) and Remedy.
o Incidents that are reported by end users through the LINKS Help Desk.
o Incidents that are identified by OIT staff and reported through the LINKS Help Desk.
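
For illustration only, the relationship between the two definitions above (a single problem may generate several incidents) can be modeled as a one-to-many link. The following Python sketch is not taken from the L&I Remedy configuration; the class names, field names, and sample ticket identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """An event outside standard operation that interrupts or degrades a service."""
    incident_id: str   # hypothetical identifier format
    description: str
    source: str        # e.g. "Tivoli/TEC", "end user", "OIT staff"


@dataclass
class Problem:
    """The unknown, underlying cause of one or more incidents."""
    problem_id: str    # hypothetical identifier format
    summary: str
    incidents: List[Incident] = field(default_factory=list)

    def link(self, incident: Incident) -> None:
        # A single problem may generate several incidents.
        self.incidents.append(incident)


problem = Problem("PRB-0001", "Root cause not yet known")
problem.link(Incident("INC-0001", "Application unavailable", "end user"))
problem.link(Incident("INC-0002", "Server event raised", "Tivoli/TEC"))
print(len(problem.incidents))
```

Keeping the incidents attached to their problem record is one way to support the trend analysis and root cause work described later in this plan.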

A large-scale problem or outage is defined as one or more applications or services becoming inoperable and causing a major impact on the availability or function of systems. Examples of such systems include, but are not limited to:

o Wide Area Network (WAN) links or Metropolitan Area Network (MAN) links that affect a large number of users
o Enterprise applications
o Public facing applications
o Enterprise servers that service a large number of users
o Enterprise shared applications
o Mainframe applications
o Voice services affecting a large number of users or multiple sites
o Desktop services that affect a large number of users or sites
o Facility issues that affect a large number of users or multiple sites
o Business Applications

4.4 Problem/Incident Management Objectives

The objective of the L&I Problem/Incident Management Plan is to provide a set of unambiguous and repeatable processes and procedures for:

• Providing a model for recording and resolving Problems and Incidents that may occur within the L&I environment. (Please see Appendix A – Sample Problems)
• Providing initial support and classification of received Incidents and Problems
• Ensuring that Problems and Incidents are assigned to the proper support team with an assigned priority
• Ensuring that all Problems and Incidents are resolved within established time frames (according to priority) and/or escalated to the next level of support
• Effectively tracking and managing Problems and Incidents once they occur
• Providing information to other processes, such as Change and Service Level Management
• Leveraging knowledge bases to increase problem resolution effectiveness
• Reviewing and validating closed problems to ensure customer satisfaction
• Performing trend analysis and proactive problem prevention
• Leveraging Help Desk tools to increase problem resolution effectiveness

4.5 Problem/Incident Management Inter-relationships

The following are specific examples of how Problem/Incident Management interacts with other IT Operational processes. (Please note that not all processes listed are depicted in 3.0 Figure 1.)

• Change Management
    o Fixes for problems will generate changes to all environments and will require change requests to install the tested and approved fixes.
    o The implementation of a change may trigger problems in all environments that need to be logged and managed by the Problem Management Process.
    o Description and schedule of changes planned for systems is needed for problem analysis.
    o Help Desk training requirements associated with technical changes.

• Event Management
    o Future Management Plan
    o When an event is classified as a potential or current problem, a Problem Ticket should be opened and handled within the Problem Management process.
• Configuration Management
    o Future Management Plan
    o The Problem Management process will obtain configuration information via the Configuration Management Process, when required.
    o The Problem Management process uses Configuration Management information during monitoring and troubleshooting problems.
• Asset Management
    o Future Management Plan
    o The Problem/Incident Management Process requires updated Asset Management information for use during monitoring and troubleshooting problems.
• Service Level Management
    o Future Management Plan
    o Problem/Incident Management provides data to Service Level Management for use in preparing measurement reports.
    o Problem/Incident Management must also detect and identify problem trends that impact the attainment of service targets as a result of repetitive problems.
• Performance Management
    o Future Management Plan
    o System and performance problems are reported through Problem Management for analysis and resolution by the responsible technical support staff.
• Backup and Recovery Management
    o Future Management Plan
    o Problem Management is linked to Backup and Recovery Management to ensure that all component problems have been identified and properly recorded.
    o Documented and/or automated recovery procedures are essential for fast problem resolution or service restoration.

4.6 Problem/Incident Management Guiding Principles

Guiding principles are fundamental rules or guidelines that establish design and implementation constraints and align with management’s vision of L&I service delivery:

The LINKS Help Desk provides a single point of contact (SPOC) for L&I employees and business partners needing technology support during agreed upon coverage hours. All problems raised are entered into Remedy.

The long-term objective of Problem/Incident Management is to have all problems/incidents called in to the LINKS Help Desk as a SPOC.

Events are generated, forwarded to the Tivoli Enterprise Console (TEC) and entered into Remedy depending on a pre-assigned priority and risk level. Priority and risk levels are described in T:\All (Common area for all OIT Staff)\Tivoli.

Soft skills such as customer service orientation, communication and analytic ability are a priority at the LINKS Help Desk.

The Help Desk is proactive rather than reactive wherever possible.

Service level targets for the LINKS Help Desk are defined, measured and reported on a regular basis.

Interfaces to other organizations are through a defined set of escalation processes and support agreements and enabled by a Help Desk management system.

Support acceptance criteria for applications and systems include timely review and acceptance rights for the Help Desk at the pre-implementation stages and all required changes are made in accordance with the Change Management process requirements.

The Help Desk is automated wherever possible.

There are defined second and third level support groups or SMEs, depending upon level of expertise, and associated procedures for routing all problems or service requests that cannot be addressed by the LINKS Help Desk (LINKS Help Desk = Level 1; Support Groups = Level 2 or Level 3).

Help Desk, Level 2 and Level 3 Support Groups have access to all appropriate resource tools and information databases to assist in servicing the customer request or addressing problems.

Any problems that cause an outage are entered into Remedy and an Incident Report is developed.

Failure to conform to the Problem Management process will result in appropriate management action.
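
The tiered support model named in the principles above (LINKS Help Desk = Level 1; Support Groups = Level 2 or Level 3) can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions, not the Remedy escalation logic: the `SupportLevel` enumeration and the `escalate` function are hypothetical names, and escalation beyond Level 3 (for example, to a vendor) is not modeled.

```python
from enum import IntEnum
from typing import Optional


class SupportLevel(IntEnum):
    LEVEL_1 = 1  # LINKS Help Desk
    LEVEL_2 = 2  # support group / SME
    LEVEL_3 = 3  # support group / SME


def escalate(current: SupportLevel) -> Optional[SupportLevel]:
    """Return the next support level, or None when Level 3 is already reached."""
    if current is SupportLevel.LEVEL_3:
        return None
    return SupportLevel(current + 1)


# A ticket the LINKS Help Desk cannot address is routed to Level 2.
print(escalate(SupportLevel.LEVEL_1).name)
```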

4.7 Problem Management Roles and Responsibilities

Role Responsibility Members

Role: L&I Customer/Employee
Responsibility:
• Initiates the need for a Help Desk Ticket and opens a call with the LINKS Help Desk Analyst (all calls come to the LINKS Help Desk)
Members: L&I Customer/Employee

Role: L&I/OIT Employee
Responsibility:
• Reports problems in response to L&I employee concern by entering them into Remedy or by reporting them to LINKS to enter into Remedy
• Provides responsive, timely support to all support requests escalated from the LINKS Help Desk Agents or OIT self-reported Help Desk tickets
• Resolves the problem, documents the solution in the database or ensures follow-through if the call is passed to another Level 2 or Level 3 SME
• Maintains service level agreements on response turnaround
Members: L&I/OIT/BBAD Staff, L&I/OIT/BEA Staff, L&I/OIT/BES Staff, L&I/OIT/BIO Staff

Role: Problem Process Owner
Responsibility:
• Acts as the overall “evangelist” for process work
• Prioritizes investment, as the responsible individual for the cost and investment overall in process work
• Resolves or escalates cross-process issues
• Approves new process definitions and approves or rejects process deviation requests
• Assigns or designates ownership and roles and responsibilities for each Operational process
• Evaluates process performance against standards and control criteria
Members: L&I/OIT/BES/CRD Chief

Role: LINKS Help Desk Agents
Responsibility:
• Provides telephone assistance to customers and maintains accurate records
• Makes the first attempt to resolve the service issue reported by the end user
• Acts as end-user advocate to ensure that service issues are resolved in a timely fashion
• Ensures that the ticket contains an accurate and properly detailed description of the problem
• Ensures that the priority classification is correct
• Recognizes patterns of symptoms, applies search tools to identify previously developed solutions, and helps end users implement the solution
• Assumes responsibility for problem tickets until resolved
• Escalates problems to the Level 2 support group if unable to satisfactorily resolve them
Members: LINKS Help Desk Agents

Role: Lead Help Desk Agent
Responsibility:
• Provides telephone assistance to customers and maintains accurate records
• Makes the first attempt to resolve the service issue reported by the end user
• Acts as end-user advocate to ensure that service issues are resolved in a timely fashion
• Ensures tickets contain an accurate and properly detailed description of the problem
• Ensures ticket priority classification is correct
• Recognizes patterns of symptoms, applies search tools to identify previously developed solutions, and helps end users implement the solution
• Assumes responsibility for problem tickets until resolved
• Escalates problems to the Level 2 support group if unable to satisfactorily resolve them
• Verifies customer satisfaction of problem resolutions (Level 1 and Level 2) by performing customer follow-ups
• Develops department-specific reports in Remedy and for the Automatic Call Distribution System (ACD)
Members: LINKS Help Desk Lead Agent

Role: Help Desk Coordinator
Responsibility:
• Communicates problem status and unresolved problems to customers and Help Desk management
• Verifies customer satisfaction of problem resolutions (Level 1 and Level 2). Remedy sends a survey to customers when a ticket is resolved
• Maintains and improves communication and escalation lists
• Develops department-specific reports and procedures
• Participates in the problem review process
• Ensures assigned priority level for tickets follows the agreed-upon guidelines and that problems are resolved or escalated within service level targets
Members: L&I Help Desk Coordinator

Role: Help Desk Manager
Responsibility:
• Ensures that a well-defined, consistently executed and effective PM/IM process is established and maintained
• As owner of the IM process, ensures that the process and capabilities are adequate, and are improved when necessary
• Reviews and understands the Problem Management process and tools
• Evaluates the effectiveness of the PM/IM process and supporting mechanisms such as reports, communication formats/messages, and escalation procedures
• Makes recommendations to the Problem Process Owner on ways to improve the process
Members: L&I Help Desk Manager

Role: Level 2, Level 3 Subject Matter Experts (SMEs)
Responsibility:
• Provides responsive, timely support to all support requests escalated from the LINKS Help Desk Agents
• Resolves the problem, documents the solution in the database and ensures follow-through if the call is passed to another SME
• Maintains service level agreements on response turnaround
• Works as a team to resolve outstanding support problems and/or requests to even workload, establish priorities, and meet deadlines
• Escalates and works with appropriate vendor support to resolve issues where appropriate
Members: L&I/OIT/BBAD Staff, L&I/OIT/BEA Staff, L&I/OIT/BES Staff, L&I/OIT/BIO Staff

Role: Level 2, Level 3 SME Managers
Responsibility:
• Leads and manages Level 2 and Level 3 SMEs throughout the problem resolution process
• Provides communication and notification to users, OIT Bureau Directors and CIO as necessary
Members: L&I/OIT/BBAD Management, L&I/OIT/BEA Management, L&I/OIT/BES Management, L&I/OIT/BIO Management

Role: Problem/Incident Coordinator
Responsibility:
• Assembles and manages the Level 2 and Level 3 SME teams and sub-teams
• Coordinates with other Commonwealth agencies, OIT managers, business process managers and agency executives
• Establishes team leads as needed
• Leads and manages sub-teams to ensure close coordination and communications with each of the sub-teams
• Takes ownership of business critical IT problems and delivers effective workaround implementation, accurate root cause analysis and problem resolution
• Ensures complete and accurate documentation is completed at all stages of the Problem Management process
• Details responsibilities and specific tasks for emergency response activities and business resumption operations based upon pre-defined timeframes
Members: L&I OIT BEA Bureau Director

Role: Operations Manager Team Lead
Responsibility:
• Attends required meetings and effectively communicates the status of problems with high visibility to senior management, when required
• Conducts Incident Report meetings for analyzing outages
Members: BIO Technical Operations Lead

4.7 Figure 3: Problem/Incident Management Roles and Responsibilities

4.8 Problem/Incident Management Process

The Problem/Incident Control Process is a structured, step-by-step approach to controlling and managing Problem activity. This process will focus on restoring interrupted service as soon as possible.

4.8.1 Problem Management Activities

Five documented activities make up the Problem Control Process:

1. Problem Identified
    a. Receive notification of Problem/Incident through LINKS Help Desk
    b. Tivoli generates an event
2. Record, Assess, and Classify Problem/Incident
    a. Log call details in Remedy
    b. Assess & classify Problem/Incident or Event (priority) & communicate
    c. Assign priority level
    d. Identify and execute incident bypass or resolution, if possible
3. Diagnose and Escalate Problem/Incident
    a. Diagnose or escalate problem (Level 2, Level 3, vendor or Problem/Incident Coordinator)
4. Resolution/Bypass and Verify
    a. Recover from problem, if necessary, apply bypass or temporary fix
    b. Resolve problem (correction at root cause)
    c. Update customer and verify resolution
5. Survey and Follow-up Problem/Incident
    a. Survey end user
    b. Conduct Problem/Incident review meeting, produce an Incident Report, and analyze reports

The following figure, 4.8 Figure 4, illustrates the proposed L&I Problem/Incident Management process flow.

4.8 Figure 4: Process Flow

4.8.2 Process Components

The Problem/Incident Management process focuses on restoring interrupted service as soon as possible. Process Components are the Inputs, Tools/Techniques, and Outputs required for effective and comprehensive Problem/Incident control management.

The following figure maps each Process Component to the appropriate Process Flow Activity.

Activity Problem Identified

Record, Assess and Classify

Diagnose and Escalate

Resolution, Bypass and Verify

Survey and Follow-up

Input

Problem as reported by user

Event identified in Tivoli and forwarded to TEC and Remedy

Problem/Incident Escalation Schedule

Communications

Problem/Incident Priority Levels

Contacts

Remedy Problem Ticket

Remedy Problem Ticket

Remedy Problem Ticket

Tools and Techniques

Telephone system

Email

Tivoli Monitors

Remedy

Brain Knowledgebase

Procedures

Level 2 Support team tools (various)

Level 3 Support team tools (various)

Remedy

Level 2 Support team tools (various)

Level 3 Support team tools (various)

Remedy

Output

Remedy Problem Ticket

Remedy Problem Ticket

Communication to user

Updated Remedy Problem Ticket

Remedy Work Log

Conference Call

Updated Remedy Problem Ticket

Remedy Work Log and Solution

Incident Report(s)

Completed Problem Ticket

Communicate result

Survey

Incident Log

4.8 Figure 6: Process Flow Activity

4.8.3 Description of Process Components

Process Component | Description

Purpose | Restore interrupted service as soon as possible

Owner | LINKS Help Desk / OIT Level 2 / Level 3 / Problem/Incident Coordinator

Input |
o Problem identified
o Problem recorded in Remedy

Output |
o Service restored
o End user notified
o Recorded in Remedy
o Updated Brain Entry, if required

Measurement |
o Quantity of tickets presently open
o Quantity of incidents (tickets) by time (monthly, quarterly)
o Quantity of tickets resolved by each support group
o Average time tickets were assigned to each group
o Average time to resolve incident
o Percentage of incidents resolved by LINKS Help Desk
o Percentage of incidents escalated to support groups
o Customer Surveys

4.8 Figure 7: Process Components
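Several of the measurements listed in 4.8 Figure 7 can be derived directly from ticket records. The following is a minimal sketch only: the ticket fields (`opened`, `resolved`, `resolved_by`) and the sample data are hypothetical and are not taken from the Remedy schema or from any L&I report.

```python
from datetime import datetime, timedelta

# Hypothetical sample tickets; field names are assumptions, not Remedy fields.
tickets = [
    {"opened": datetime(2010, 12, 1, 9, 0), "resolved": datetime(2010, 12, 1, 10, 0),
     "resolved_by": "LINKS Help Desk"},
    {"opened": datetime(2010, 12, 1, 9, 0), "resolved": datetime(2010, 12, 1, 12, 0),
     "resolved_by": "Level 2"},
    {"opened": datetime(2010, 12, 2, 8, 0), "resolved": None, "resolved_by": None},
    {"opened": datetime(2010, 12, 2, 8, 0), "resolved": datetime(2010, 12, 2, 10, 0),
     "resolved_by": "LINKS Help Desk"},
]

def open_count(tickets):
    """Quantity of tickets presently open (no resolution time recorded)."""
    return sum(1 for t in tickets if t["resolved"] is None)

def percent_resolved_by(tickets, group):
    """Percentage of resolved tickets that were resolved by the given group."""
    resolved = [t for t in tickets if t["resolved"] is not None]
    if not resolved:
        return 0.0
    matching = sum(1 for t in resolved if t["resolved_by"] == group)
    return 100.0 * matching / len(resolved)

def average_time_to_resolve(tickets):
    """Average elapsed time from open to resolution, or None if nothing is resolved."""
    durations = [t["resolved"] - t["opened"] for t in tickets if t["resolved"] is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

print(open_count(tickets))
print(round(percent_resolved_by(tickets, "LINKS Help Desk"), 1))
print(average_time_to_resolve(tickets))
```

Open tickets are excluded from the averages and percentages here; whether the production reports treat them the same way is not stated in this plan.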

4.9 Problem/Incident Management Procedures

A procedure integrates a Process Flow Activity with one or more Process Components to create a series of step-by-step instructions that facilitate effective Problem/Incident Management. Below are the Problem/Incident Management Procedures.

4.9.1 Problem Identified

4.9.1.1 Receive call or notice of the Problem/Incident

Problems are identified in one of two ways:

o An incident occurs and is reported to the LINKS Help Desk via telephone call.
o Tivoli identifies a problem and TEC generates an event.

4.9.2 Record, Assess, Classify Problem/Incident

4.9.2.1 Log call details in Remedy

The problem is recorded automatically in Remedy by TEC or manually by the LINKS Help Desk or an L&I OIT/Employee. Multiple tickets for associated problems/incidents will be related to one parent Remedy ticket as necessary.

The problem is analyzed, properly classified and assigned a priority level. Common or previously identified problems will be resolved at this level when possible.

4.9.3 Diagnose and Escalate Problem/Incident

4.9.3.1 Diagnose the Problem/Incident

Problems not immediately resolved or those that appear part of a larger problem will then be escalated to Level 2 SMEs.

The Level 2 SMEs will perform problem diagnosis activities using the appropriate technical tools.

In the event of a major problem/incident or an outage, the Level 2 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.

The Level 2 SME Manager will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise-wide notification is necessary.

4.9.3.2 Escalate the Problem/Incident

If the Level 2 SME is unable to identify and resolve the problem, Level 2 will escalate the problem to Level 3.

In the event of a major problem/incident or an outage, the Level 3 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.

The Level 3 SME Manager will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise-wide notification is necessary.

Level 3 works to resolve the problem and/or involves the Vendor as required. If the Level 3 SME is unable to identify or resolve the problem or if the problem appears to be part of an outage or larger problem, Level 3 will escalate the problem to the Problem/Incident Coordinator.

The Problem/Incident Coordinator will form ad hoc teams to resolve the problem or deliver an effective workaround, and will ensure complete and accurate documentation is produced at all stages of the problem resolution process.

If the problem/incident or outage affects the Production environment and/or a Production System, the Problem/Incident Coordinator will initiate a conference call within the first 30 minutes with all appropriate staff involved (see Problem/Incident Communication Plan, Section 5, for the teleconference phone number).

In the event of a major problem/incident or an outage, Level 3 SME Manager will notify the appropriate people using the L&I – Problem/Incident Communication Plan.

The Problem/Incident Coordinator will work with the BES-CRD Chief or the Help Desk Manager to determine if enterprise wide notification is necessary. The Problem/Incident Coordinator will work with Level 2, Level 3 SMEs and vendors as necessary on diagnosing the problem/incident.
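The escalation order described in 4.9.3 (LINKS Help Desk, then Level 2, then Level 3, then the Problem/Incident Coordinator) can be sketched as a simple ordered path. This is an illustration only: the function names and the `can_resolve` callback are hypothetical, and the sketch does not model vendor involvement, notifications, or the conference call.

```python
# Escalation order as described in sections 4.9.2 and 4.9.3.
ESCALATION_PATH = [
    "LINKS Help Desk",
    "Level 2",
    "Level 3",
    "Problem/Incident Coordinator",
]

def next_level(current):
    """Return the next support level, or None when already at the end of the path."""
    index = ESCALATION_PATH.index(current)
    if index + 1 < len(ESCALATION_PATH):
        return ESCALATION_PATH[index + 1]
    return None

def handle(can_resolve):
    """Walk the path until a level can resolve the problem.

    can_resolve is a hypothetical callable taking a level name and returning
    True when that level resolves the problem. Returns the levels visited.
    """
    visited = []
    for level in ESCALATION_PATH:
        visited.append(level)
        if can_resolve(level):
            break
    return visited

# Example: a problem that only Level 3 can resolve.
print(handle(lambda level: level == "Level 3"))
```

If no level reports a resolution, the walk ends at the Problem/Incident Coordinator, who per this plan forms ad hoc teams to resolve the problem or deliver a workaround.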

4.9.4 Resolution/Bypass and Verification of Problem/Incident

4.9.4.1 Recover from problem/incident: Bypass or temporary fix

Once the Problem/Incident has been correctly identified, a bypass or temporary fix can be implemented if a permanent fix is not available in the required time frame as defined in Section 4.10 Figure 8.

Before any temporary fix or bypass can be implemented, it will need to be tested, and scheduled through the change control process.

The Remedy ticket should be updated and remain open until a permanent resolution can be developed and implemented.

4.9.4.2 Resolve problem (correction at root cause)

Work will continue on the problem to ensure the root cause is identified and the problem resolved.

The Level 2 or Level 3 SMEs will document the resolution in Remedy for quicker diagnosis of similar Problem tickets in the future. If applicable, the resolution will be incorporated into the BRAIN Knowledgebase for future calls. Multiple tickets for associated problems/incidents will be related to one parent Remedy ticket as necessary.

The resolution will be tested, scheduled through the change control process, and implemented.

4.9.4.3 Update Customer and Verify Resolution

Once the Problem/Incident is resolved, the resolving Help Desk agent or Level 2 or Level 3 SME will verify with the L&I Customer/Employee that it has been resolved. The resolving agent or technician will then proceed to resolve the ticket in Remedy.

4.9.5 Survey and Follow-up Problem/Incident

4.9.5.1 Survey End User

When a Problem/Incident is successfully resolved in Remedy, a Remedy survey will be electronically sent to the L&I Customer/Employee as a follow-up to the Problem/Incident resolution.

The L&I Customer/Employee has the option of filling out the survey and commenting. The results are returned to the Help Desk Manager, Help Desk Coordinator and LINKS Lead Help Desk agent to be reviewed for possible follow-up. The appropriate Level 2 or Level 3 SME Supervisor may be contacted when the survey warrants further follow-up action.

4.9.5.2 Conduct Problem/Incident Review meetings and analyze reports

Produce OIT Incident Report and review report with management.

Monthly reports are generated through Remedy and the LINKS ACD System.

o Top Ten Issues – Category, Type and Item
o Links - HD Services Rpt
o Call and Ticket statistics
o Survey Report
o Close Ratio
o Tickets per Agent

The L&I Help Desk Manager will conduct monthly review meetings to review all Problem tickets, and review reports (generated from Remedy).

4.10 Problem Management Guidelines

4.10.1 Remedy Problem/Incident Priority Levels Matrix

The problem/incident priority levels are set depending on their source:

o An incident occurs for the end user and is reported to the LINKS Help Desk via telephone call. These problems/incidents are assigned a priority level by the LINKS Help Desk agent following the table in 4.10 Figure 8, Remedy Problem/Incident Priority Levels Matrix.

o OIT staff is alerted to a Problem/Incident. These problems/incidents are assigned a priority level by the Level 2 or Level 3 Subject Matter Experts (SMEs) following the table in 4.10 Figure 8, Remedy Problem/Incident Priority Levels Matrix.

o Tivoli identifies a problem and TEC generates an event. Based on the server’s risk assessment and the severity of the event in TEC, a ticket may or may not be opened in Remedy. The source of an event from Tivoli may determine its initial severity. If the source does not set the severity, it will be determined by the default settings for the event class in the Tivoli Enterprise Console (TEC). For tickets opened in Remedy, Remedy sets the trouble ticket priority based on the Help Desk Priority/Event Management Matrix for Servers, 4.10 Figure 9.

Priority | Scope of Impact | Impact | Resolution Time

Urgent | Location or System Down | Critical (Entire office or location; impacts a large number of users) | 0-2 Hours

High | Component Down or Degraded | Severe (Impacts a number of users) | 2-4 Hours

Medium | Component Down or Degraded | Minimal (Impact to a single user) | 4+ Hours - Day

Low | None, component is functional | None (Impact viewed as an inconvenience to a single user) | Day(s)

4.10 Figure 8: Remedy Problem/Incident Priority Levels Matrix

4.10.2 Help Desk Priority/Event Management Matrix for Servers

The Help Desk Priority/Event Management Matrix defines how events generated through TEC are mapped to the Remedy Priority Level. In most cases, this is done by checking the “risk level” assigned to the server in the Remedy Asset Record.

For example, a TEC Severity Level of ‘Critical’ and a Risk Level of ‘Medium’ will produce a Remedy Help Desk priority level of ‘High’. In cases where a Help Desk ticket needs to be entered manually, only the Risk Level assignment of the server will be used to set the Help Desk Priority Level.

Help Desk Priority Matrix (set Help Desk Priority as shown below)

TEC Severity | Risk Level High | Risk Level Medium | Risk Level Low | Risk Level Blank

Warning | High | Medium | Low | Medium

Critical | Urgent | High | Medium | High

Fatal | Urgent | Urgent | High | Urgent

None (Manually Created HD Ticket) | Urgent | High | Medium | N/A

4.10 Figure 9: Help Desk Priority/Event Management Matrix for Servers
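The mapping in 4.10 Figure 9 amounts to a two-key lookup on TEC severity and server risk level. The sketch below encodes the matrix values as a nested dictionary; the names `PRIORITY_MATRIX` and `help_desk_priority` are hypothetical, and this is not the actual Remedy or TEC configuration.

```python
# Values transcribed from 4.10 Figure 9; structure and names are illustrative.
PRIORITY_MATRIX = {
    # TEC severity: {server risk level: Help Desk priority}
    "Warning":  {"High": "High",   "Medium": "Medium", "Low": "Low",    "Blank": "Medium"},
    "Critical": {"High": "Urgent", "Medium": "High",   "Low": "Medium", "Blank": "High"},
    "Fatal":    {"High": "Urgent", "Medium": "Urgent", "Low": "High",   "Blank": "Urgent"},
    # Manually created ticket: only the server risk level is used.
    "None":     {"High": "Urgent", "Medium": "High",   "Low": "Medium", "Blank": None},
}

def help_desk_priority(tec_severity, risk_level):
    """Return the Help Desk priority, or None where the matrix shows N/A.

    tec_severity is None for a manually created ticket; risk_level is None
    or an empty string when the server's risk level is blank.
    """
    severity_key = tec_severity if tec_severity is not None else "None"
    risk_key = risk_level if risk_level else "Blank"
    return PRIORITY_MATRIX[severity_key][risk_key]

# The example from section 4.10.2: Critical severity on a Medium-risk server.
print(help_desk_priority("Critical", "Medium"))  # High
```

Keeping the matrix as data rather than as nested conditionals means a change to the figure only requires editing the table, which is one reasonable design choice among several.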

4.10.3 Tivoli Response Framework Matrix

Problems/incidents that are generated through TEC will be escalated based on the server’s risk assessment and the severity of the event, as defined in the Tivoli Response Framework Matrix below.

For example, if a server risk level is set to “High” and the TEC event is determined to be either critical or fatal, the following actions will be taken:

1. An Alarm Point Call will be generated 24/7 AND
2. A Remedy Ticket will be created using the Help Desk Priority Matrix AND
3. A Text Page will be generated AND
4. An Email will be generated AND
5. The event will be displayed on the TEC console

In a second example, if a server risk level is set to “High” and the TEC event is determined to be either warning or minor, the following actions will be taken:

1. A Remedy Ticket will be created using the Help Desk Priority Matrix AND
2. An Email will be generated AND
3. The event will be displayed on the TEC console

Server Risk Level / TEC Event Severity | High Priority | Medium Priority | Low Priority

Critical/Fatal | Alarm Point Call/Operator Call (24/7), Remedy Ticket, Text Page, E-Mail, TEC Console | Alarm Point Call/Operator Call (Work Hours), Remedy Ticket, Text Page, E-Mail, TEC Console | Alarm Point Call/Operator Call (Work Hours), Remedy Ticket, Text Page, E-Mail, TEC Console

Warning/Minor | Remedy Ticket, E-Mail, TEC Console | TEC Console | TEC Console

Harmless/Unknown | TEC Console | TEC Console | TEC Console

4.10 Figure 10: Tivoli Response Framework Matrix
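The response framework in 4.10 Figure 10 can likewise be read as a lookup from event severity and server risk level to a list of actions. The sketch below assumes the figure's High/Medium/Low columns correspond to the server risk level; the names `RESPONSE_MATRIX`, `SEVERITY_GROUP`, and `response_actions` are hypothetical and do not reflect actual TEC rules.

```python
# Action lists transcribed from 4.10 Figure 10; names are illustrative.
FULL_24X7 = ["Alarm Point Call/Operator Call (24/7)", "Remedy Ticket",
             "Text Page", "E-Mail", "TEC Console"]
FULL_WORK_HOURS = ["Alarm Point Call/Operator Call (Work Hours)", "Remedy Ticket",
                   "Text Page", "E-Mail", "TEC Console"]

RESPONSE_MATRIX = {
    "Critical/Fatal":   {"High": FULL_24X7,
                         "Medium": FULL_WORK_HOURS,
                         "Low": FULL_WORK_HOURS},
    "Warning/Minor":    {"High": ["Remedy Ticket", "E-Mail", "TEC Console"],
                         "Medium": ["TEC Console"],
                         "Low": ["TEC Console"]},
    "Harmless/Unknown": {"High": ["TEC Console"],
                         "Medium": ["TEC Console"],
                         "Low": ["TEC Console"]},
}

# Map each individual TEC severity onto its row in the matrix.
SEVERITY_GROUP = {
    "critical": "Critical/Fatal", "fatal": "Critical/Fatal",
    "warning": "Warning/Minor", "minor": "Warning/Minor",
    "harmless": "Harmless/Unknown", "unknown": "Harmless/Unknown",
}

def response_actions(tec_severity, risk_level):
    """Return the list of actions for a TEC severity and server risk level."""
    group = SEVERITY_GROUP[tec_severity.lower()]
    return RESPONSE_MATRIX[group][risk_level.capitalize()]

# The second example from section 4.10.3: a warning on a High-risk server.
print(response_actions("Warning", "HIGH"))
```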

4.10.4 Help Desk Brain Knowledgebase Entries

When the need for a new Brain Knowledgebase entry is identified, the following outlines the necessary steps.

o Notify the Help Desk Manager and Help Desk Coordinator
o The Help Desk Manager and Coordinator will email the Brain Knowledgebase Entry Template to the requestor
o Create the new Entry using the template as a guide
o Email the new Entry to the Help Desk Manager and Coordinator
o The new Entry will be reviewed by the Help Desk Manager, Coordinator and the LINKS Help Desk Lead Agent
o If no corrections or additions are necessary, the new Entry will be scheduled, then added to the Brain Knowledgebase
o If corrections or additions were made to the new Entry, it will be sent back to the requestor for review and approval
o The new Entry is then emailed back to the Help Desk Manager and Coordinator
o It is reviewed once again by the Help Desk Manager, Coordinator and LINKS Lead Help Desk Agent
o The LINKS Lead Agent schedules and adds the new Entry to the Brain Knowledgebase

4.11 Problem Management Metrics

Following are current reports of Problem/Incident Management metrics, which can measure the effectiveness of the process:

Key Performance Indicators

o Number of Tickets generated per day, week, and month
o Number of Tickets resolved monthly at Level 1
o Monthly Top 10 Category of tickets created using CTIs
o Number of Tickets processed per agent monthly
o Number of Tickets escalated to Level 2 monthly
o Monthly Satisfaction Surveys
o Monthly LINKS Help Desk Services Report
o Number of Tickets generated by Tivoli daily

Remedy Problem Tickets by Category

o Applications
o Hardware
o Network
o Remote Access
o Restore
o Security
o Server
o Voice
o Web Services Event

Problem Priority

o 1 - Urgent
o 2 - High
o 3 - Medium
o 4 - Low

4.12 Problem/Incident Management Tool Capabilities

The following are capabilities that have been evaluated and implemented:

o Request/problem status communication
o Ability to assign priority to problems
o Interface with Tivoli Event Management Tool to create Problem tickets from Events
o Ability to provide current status of all tickets
o Ability to forward ticket based on escalation status matrix
o Ability to provide status and analysis reports
o Logging of Problem data in a database
o Access to Asset Management information
o Automated notification when problems are transferred from queue to queue
o Simple and quick entry and update of problem tickets
o High availability for the Remedy application and data
o Solicit and retrieve customer satisfaction information via Satisfaction Survey emailed to user after resolution of Problem/Incident
o Ability to design custom reports to extract desired data

5.0 Appendix A– Sample Problems

5.1 L&I Enterprise Problem/Incident Examples

The following are Problem/Incident examples based on the current CTIs (Category, Type, Item) within Remedy:

o Security/Password/CWOPA - CWOPA security password resets and/or account unlocks
o Voice/Voice/Dial Tone - Network problems experienced by a site concerning phone issues
o Network/Connection/Router - Network problems experienced by a site related to connectivity issues and/or performance problems
o Hardware/Network Printer/IBM-Lexmark - Local and/or network printer issues
o Applications/CWDS-BWDP/Staff Access - Application problems concerning CWDS
o Applications/UCMS-BBAD/Staff Access - Application problems concerning UCMS
o Applications/DeskTop/Adobe - Application problems encountered locally on user’s PC
o Applications/Operating Systems/Windows 2000 - Application problems encountered concerning operating system errors or performance issues
o Applications/Email/Outlook-CWOPA - Application problems encountered concerning email operation
o Hardware/PC/Hard Disk Drive - Computer hardware problems encountered by users related to hard disk drive errors
o Hardware/PC/Network Card - Computer hardware problems encountered by users specifically related to network connectivity
o Server/Hardware/Hard Disk Drive - Server hardware problems experienced by users generated by faulty hard disk drives
o Hardware/MainFrame/CPU - Mainframe system problems
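Each example above follows the Category/Type/Item (CTI) pattern, with the three parts separated by slashes. As a minimal sketch, a CTI string of that shape could be split into its parts as follows; the `CTI` type and `parse_cti` function are hypothetical helpers for illustration and are not part of Remedy.

```python
from typing import NamedTuple

class CTI(NamedTuple):
    """A Category, Type, Item triple as written in the examples above."""
    category: str
    type_: str
    item: str

def parse_cti(text):
    """Split 'Category/Type/Item' into a CTI, rejecting malformed input."""
    parts = [part.strip() for part in text.split("/")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Category/Type/Item, got: {text!r}")
    return CTI(*parts)

cti = parse_cti("Hardware/Network Printer/IBM-Lexmark")
print(cti.category, "|", cti.type_, "|", cti.item)
```

This sketch assumes no part of a CTI itself contains a slash, which holds for the examples listed in this appendix.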

6.0 Appendix B – Acronyms

6.1 L&I Acronyms

Acronym | Definition
ACD System | Automatic Call Distribution System
BBAD | Bureau of Business Application Development
BBAD/CI | Bureau of Business Application Development/Compensation and Insurance Division
BBAD/WFD | Bureau of Business Application Development/Workforce Development Division
BBAD/UC | Bureau of Business Application Development/Unemployment Compensation Division
BBAD/OVR | Bureau of Business Application Development/Occupational and Vocational Rehabilitation Division
BBAD/SLMR | Bureau of Business Application Development/Safety and Labor-Management Relations Division
BEA | Bureau of Enterprise Architecture
BEA/DMDB | Bureau of Enterprise Architecture/Data Management and Database Management Division
BEA/ERD | Bureau of Enterprise Architecture/Engineering and Research Division
BEA/SDCD | Bureau of Enterprise Architecture/Standards Development and Compliance Division
BES | Bureau of Enterprise Services
BES/CoE | Bureau of Enterprise Services/Business Center of Excellence Division
BES/CRD | Bureau of Enterprise Services/Customer Relations Division
BES/PMD | Bureau of Enterprise Services/Project Management Division
BES/SD | Bureau of Enterprise Services/Security Division
BIO | Bureau Infrastructure and Operations
BIO/ID | Bureau Infrastructure and Operations/Infrastructure Division
BIO/NSS | Bureau Infrastructure and Operations/Network Support Services Division
BIO/SFO | Bureau Infrastructure and Operations/Server Farm Operations Division
BIO/MFO | Bureau Infrastructure and Operations/Mainframe Operations Division
BWDP | Bureau of Workforce Development Partnership
CIO | Chief Information Officer
CTI | Category, Type, Item
CWDS | Commonwealth Workforce Development System
ESM | Enterprise System Management
IS/IT | Information Systems/Information Technology
IT | Information Technology
ITIL | Information Technology Infrastructure Library
MAN | Metropolitan Area Network
OIT | Office of Information Technology
PIC | Problem/Incident Coordinator
PM/IM | Problem Management/Incident Management
SLA | Service Level Agreement
SME | Subject Matter Experts
SOA | Service Oriented Architecture
SPOC | Single Point of Contact
TEC | Tivoli Enterprise Console
UCMS | Unemployment Compensation Modernization System
WAN | Wide Area Network
