ITIL and Grid services at GridKa
CHEP 2009, 21 - 27 March, Prague
Tobias König, Dr. Holger Marten, Steinbuch Centre for Computing (SCC)
KIT – The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe (TH), www.kit.edu


Page 1: Steinbuch Centre  for  Computing (SCC)

ITIL and Grid services at GridKa
CHEP 2009, 21 - 27 March, Prague
Tobias König, Dr. Holger Marten

Steinbuch Centre for Computing (SCC)

KIT – The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe (TH)
www.kit.edu

Page 2: Steinbuch Centre  for  Computing (SCC)

Introduction

Karlsruhe Institute of Technology (KIT): The cooperation of the Forschungszentrum Karlsruhe GmbH (FZK) and the University of Karlsruhe (TH)

Steinbuch Centre for Computing (SCC): The computing centre of the KIT

GridKa: The German Tier-1 centre hosted by the SCC

The main aim of GridKa is to offer sustainable Grid services.

The availability and reliability of IT services directly affect:
  • The users’ satisfaction
  • The reputation of the computing centre SCC
  • The economic aspects, which should not be forgotten

It is therefore important to implement processes and tools that increase the availability and reliability of the IT services. The Information Technology Infrastructure Library (ITIL) is a process-oriented framework for the management of IT processes.

Page 3: Steinbuch Centre  for  Computing (SCC)

General information

The SCC puts emphasis on the ITIL v2 Service Support processes:
  • Configuration Management
  • Incident Management and Service Desk
  • Problem Management
  • Change Management
  • Release Management

Only these ITIL processes are currently implemented at the SCC. They are the most relevant and most important ITIL processes for the SCC.

The ITIL Service Delivery processes, such as
  • Service Level Management
  • Availability Management
  • Capacity Management
  • Continuity Management
  • Financial Management

are currently neither standardized according to ITIL nor implemented at the SCC and are therefore not part of this talk.

Page 4: Steinbuch Centre  for  Computing (SCC)

Structure of this talk

[Diagram: ITIL Processes at GridKa. Incident Management, Problem Management, Change Management, and Configuration Management, each considered in terms of responsibility (internal roles and groups), internal view, and external view.]

Page 5: Steinbuch Centre  for  Computing (SCC)

External/Internal Configuration Management

Page 6: Steinbuch Centre  for  Computing (SCC)

External/Internal Configuration Management

The Configuration Management Database (CMDB) is the basis of all ITIL processes.

The following items are stored within the CMDB:
  • All the Configuration Items (CIs): HW, SW, racks
  • The CIs are related to the service components
  • The IT services themselves are compositions of the service components
  • Etc.

[Diagram labels, Configuration Management: CIs; Service components; Services]
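The relationships named on this slide (CIs are related to service components, and IT services are compositions of service components) could be sketched as a small data model. This is only an illustration under stated assumptions: the class names, fields, helper function, and example entries are hypothetical and do not reflect the actual GridKa CMDB schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfigurationItem:
    """A CI such as a piece of hardware, software, or a rack."""
    name: str
    kind: str  # e.g. "HW", "SW", "Rack"


@dataclass
class ServiceComponent:
    """A service component, related to the CIs it is built from."""
    name: str
    cis: list[ConfigurationItem] = field(default_factory=list)


@dataclass
class Service:
    """An IT service, composed of service components."""
    name: str
    components: list[ServiceComponent] = field(default_factory=list)

    def all_cis(self) -> set[ConfigurationItem]:
        """Collect every CI that the service depends on."""
        return {ci for comp in self.components for ci in comp.cis}


def services_affected_by(ci, services):
    """Return the sorted names of the services that depend on the given CI."""
    return sorted(s.name for s in services if ci in s.all_cis())


# Hypothetical example entries (not taken from the real CMDB).
rack = ConfigurationItem("rack-01", "Rack")
disk_server = ConfigurationItem("disk-server-07", "HW")
batch_sw = ConfigurationItem("batch-scheduler", "SW")

storage = ServiceComponent("storage-pool", [rack, disk_server])
compute = ServiceComponent("compute-farm", [rack, batch_sw])

services = [
    Service("Data storage", [storage]),
    Service("Job execution", [compute, storage]),
]
```

With such a model, a call like `services_affected_by(batch_sw, services)` answers which services are touched when a single CI fails, which is the kind of question the other ITIL processes would put to a CMDB.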

Page 7: Steinbuch Centre  for  Computing (SCC)

Incident Management

Page 8: Steinbuch Centre  for  Computing (SCC)

Incident Management

Prime service hours: Tickets are assigned directly from the 1st level support to the 2nd level support, the technical experts. The experts start immediately to solve the incident.

On-call service hours: An On-Call Engineer (OCE) starts immediately to solve the incident.

Documentation of solution

[Diagram labels, Incident Management: Help Desk (1st level support); Experts (2nd level support); 5*12 Experts / 7*24 On-Call Engineers]
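The split between prime service hours and on-call service hours could be expressed as a simple routing rule. The following is a minimal sketch, not the SCC's actual implementation: the talk only states "5*12" for the experts, so the concrete window (Monday to Friday, 07:00 to 19:00) and all function names are assumptions made for illustration.

```python
from datetime import datetime

# Assumed prime service hours: Monday to Friday, 07:00 to 19:00 (5*12).
# The actual hours are not given in the talk.
PRIME_START_HOUR = 7
PRIME_END_HOUR = 19


def is_prime_service_hours(when: datetime) -> bool:
    """True on weekdays within the assumed 12-hour window."""
    is_weekday = when.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and PRIME_START_HOUR <= when.hour < PRIME_END_HOUR


def assign_incident(when: datetime) -> str:
    """Pick who starts working on an incident reported at `when`."""
    if is_prime_service_hours(when):
        # 1st level support forwards the ticket to the technical experts.
        return "2nd level support (experts)"
    # Outside prime service hours the On-Call Engineer takes over.
    return "On-Call Engineer (OCE)"
```

Keeping the window in two constants makes the assumption easy to replace once the real service hours are known.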

Page 9: Steinbuch Centre  for  Computing (SCC)

Problem Management

Page 10: Steinbuch Centre  for  Computing (SCC)

Problem Management

Problem Management is the detection of the underlying causes of an incident and their subsequent resolution and prevention.

Prime service hours: Meeting of the task force (OCEs and experts).

On-call service hours: The OCE can call another OCE at any time during the night, or other experts the next morning.

[Diagram labels, Problem Management: Task Force (problematic tickets, weekly reports, availability, 7*24 operations hand over); Experts (2nd level support); 5*12 Experts / 7*24 On-Call Engineers]

Page 11: Steinbuch Centre  for  Computing (SCC)

External/Internal Monitoring: Tools to support the Incident and Problem Management

Page 12: Steinbuch Centre  for  Computing (SCC)

External/Internal Monitoring: Tools to support the Incident and Problem Management

External: An incident or problem can be recognized from outside by the user or by the external monitoring system. These alarms automatically create a ticket.

Internal: An incident or problem can be recognized by GridKa’s Nagios system.

Planned workflow between both systems

[Diagram labels: External monitoring / SAM tests; Internal monitoring / Nagios, Ganglia; External ticket system / GGUS; Internal ticket system / Remedy; 7*24 knowledgebase / Wiki and Remedy]
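How alarms from the two monitoring paths might turn into tickets can be illustrated with a small in-memory dispatcher. This is a sketch under assumptions: the slide only states that external alarms create a ticket automatically, so the mapping of internal alarms to an internal ticket queue, and all class and field names, are hypothetical. No real GGUS, Remedy, SAM, or Nagios interface is called.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Alarm:
    source: str   # "user", "external-monitoring", or "internal-monitoring"
    service: str
    message: str


@dataclass
class Ticket:
    ticket_id: int
    system: str   # "external" or "internal" ticket system
    summary: str


class TicketDispatcher:
    """Turns alarms into tickets kept in an in-memory list (no real APIs)."""

    # Assumed mapping from alarm source to ticket system.
    ROUTES = {
        "user": "external",
        "external-monitoring": "external",
        "internal-monitoring": "internal",
    }

    def __init__(self):
        self._ids = count(1)  # running ticket numbers, starting at 1
        self.tickets = []

    def handle(self, alarm: Alarm) -> Ticket:
        """Create and store a ticket for the given alarm."""
        system = self.ROUTES[alarm.source]
        summary = f"{alarm.service}: {alarm.message}"
        ticket = Ticket(next(self._ids), system, summary)
        self.tickets.append(ticket)
        return ticket
```

A workflow between the two ticket systems, which the slide lists as planned, would sit on top of such a dispatcher, for example by copying a ticket from one queue to the other.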

Page 13: Steinbuch Centre  for  Computing (SCC)

Change Management

Page 14: Steinbuch Centre  for  Computing (SCC)

Change Management

For changes to the IT infrastructure, a Request for Change (RfC) is created in the CMDB.

There are 3 kinds of RfCs:
  • Emergency RfC
  • Standard RfC
  • Planned RfC

GridKa’s downtimes are announced externally via EGEE Broadcasts and via the Change Calendar at http://www.gridka.de/monitoring

[Diagram labels, Change Management: CMDB (RfCs); Emergency RfC; Standard RfC; Planned RfC; Change calendar for maintenances]
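The three kinds of RfC and the change calendar could be modelled roughly as follows. This is a minimal sketch, assuming that each RfC carries a scheduled date and a flag for whether it causes a downtime; both fields, the class names, and the `change_calendar` helper are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RfCKind(Enum):
    """The three kinds of RfC named in the talk."""
    EMERGENCY = "Emergency RfC"
    STANDARD = "Standard RfC"
    PLANNED = "Planned RfC"


@dataclass
class RequestForChange:
    title: str
    kind: RfCKind
    scheduled_for: date            # hypothetical field
    causes_downtime: bool = False  # hypothetical field


def change_calendar(rfcs):
    """List the RfCs that cause a downtime, earliest first."""
    downtimes = [r for r in rfcs if r.causes_downtime]
    return sorted(downtimes, key=lambda r: r.scheduled_for)


# Hypothetical example entries.
example_rfcs = [
    RequestForChange("Firmware update", RfCKind.PLANNED, date(2009, 4, 20), True),
    RequestForChange("Replace failed disk", RfCKind.EMERGENCY, date(2009, 4, 2), True),
    RequestForChange("Add user account", RfCKind.STANDARD, date(2009, 4, 1)),
]
```

Calling `change_calendar(example_rfcs)` keeps only the two entries flagged as downtimes and orders them by date, which is the view an externally announced calendar would need.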

Page 15: Steinbuch Centre  for  Computing (SCC)

Special management roles

Page 16: Steinbuch Centre  for  Computing (SCC)

Special management roles

The Configuration Manager
  • Responsible for ensuring that all CIs are stored in the CMDB and that the DBs are up to date

The Change Manager
  • Responsible for the change process and the formal correctness of the RfCs
  • Organizes the Change Advisory Board (CAB)

Special role: Contact person to the IT Service Management department

[Diagram labels. Change Management: Change Manager; CAB (Change Advisory Board). Configuration Management: Configuration Manager. GridKa contact person to Service Management: collecting data for CMDB; supporting experts in filing RfCs]

Page 17: Steinbuch Centre  for  Computing (SCC)

Thank you for your attention!

Page 18: Steinbuch Centre  for  Computing (SCC)

Discussion

[Diagram: ITIL Processes at GridKa (Incident Management, Problem Management, Change Management, Configuration Management), complete overview]

Responsibility (internal roles and groups):
  • Help Desk: 1st level support
  • Experts: 2nd level support; 5*12 Experts / 7*24 On-Call Engineers
  • Task Force: problematic tickets, weekly reports, availability, 7*24 operations hand over
  • Change Manager
  • CAB (Change Advisory Board)
  • Configuration Manager
  • GridKa contact person to Service Management: collecting data for CMDB, organizing Task Force and 7*24 operations, supporting experts in filing RfCs

Internal view and external view:
  • External monitoring / SAM tests
  • Internal monitoring / Nagios, Ganglia
  • External ticket system / GGUS
  • Internal ticket system / Remedy
  • Workflow between ticket systems
  • 7*24 knowledgebase / Wiki and Remedy
  • CMDB (CIs: services, service components, HW, SW, RfCs)
  • Services; service components
  • SLA
  • Emergency RfC; Standard RfC; Planned RfC
  • Change calendar for maintenances