INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org SA1 Operations Manual P. Strange RAL, CCLRC UK


INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

SA1 Operations Manual

P. Strange

RAL, CCLRC

UK


ARM-7 Krakow – 15/16 May 2006


What is it called?

What should this document be called?

The documents in circulation do not use a common name for it:

• SA1 Operations Procedure Manual
• Operations Manual
• Operations Procedure
• etc.


What are things inside SA1 called?

Within the document (and in other documents) the names are inconsistent:

• CIC-on-duty, Operator-on-duty, grid operator-on-duty, ...
• CIC-portal, operations portal, on-duty portal, ...
• COD dashboard, operations dashboard, on-duty dashboard, ...


Escalation procedure

Step  Deadline  Escalation procedure / COD Action Label
1     3         1st mail to site admin and ROC
2     3         2nd mail to ROC and site admin
3     <5        Final mail to ROC, followed up by a phone call notifying the ROC that this will go forward to the next weekly operations meeting for discussion
4     -         Discuss at the next weekly operations meeting
5     -         Ask ROC to suspend site
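The escalation steps in the table can be sketched as a small lookup. This is an illustrative sketch only: the names `EscalationStep`, `ESCALATION_STEPS` and `next_step` are hypothetical and do not come from the manual, and the deadline values are copied from the table as text because the slide does not state their unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationStep:
    step: int
    deadline: Optional[str]  # copied from the table; unit not stated on the slide
    action: str


# Rows copied from the escalation table. All names here are hypothetical.
ESCALATION_STEPS = [
    EscalationStep(1, "3", "1st mail to site admin and ROC"),
    EscalationStep(2, "3", "2nd mail to ROC and site admin"),
    EscalationStep(3, "<5", "Final mail to ROC, followed up by a phone call"),
    EscalationStep(4, None, "Discuss at the next weekly operations meeting"),
    EscalationStep(5, None, "Ask ROC to suspend site"),
]


def next_step(current: int) -> Optional[EscalationStep]:
    """Return the step that follows `current`, or None after the last step."""
    for entry in ESCALATION_STEPS:
        if entry.step == current + 1:
            return entry
    return None
```

Calling `next_step(0)` gives the first mail; `next_step(5)` gives `None`, since suspension is the final step.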


Suspension of sites

• In the normal course of operations, a site's status is “production”. For unscheduled problems that are addressed by neither the site nor the ROC, the COD applies the “escalation step procedure” described in section 7.6 and in section 8. The final step is suspension, in which the site is taken out of the grid resources. For a site to be suspended, its ROC would have to leave several mails and phone calls unanswered over a period of more than two weeks and fail to join the weekly operations meeting when asked to. As soon as the site's status is modified, the ROC receives a further notification mail.

• Before this happens, the ROC should contact the senior people in the federation, the site and the COD. Afterwards, the ROC has to re-certify the site before its status is set to “production” again.

• It is well understood that a site may be suspended directly in emergency cases, e.g. security incidents. The escalation procedure is then bypassed entirely by either the ROC or the COD.

• This procedure needs to be clarified into simple points.
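One possible way to state the suspension rules above as simple points is as two status transitions. This is a sketch under assumptions, not the manual's wording: only the “production” status is named on the slide, so the `SUSPENDED` value and the function and parameter names below are hypothetical.

```python
from enum import Enum


class SiteStatus(Enum):
    PRODUCTION = "production"  # status named on the slide
    SUSPENDED = "suspended"    # assumed label for a suspended site


def suspend(status, escalation_exhausted, emergency=False):
    """Suspend a production site once escalation is exhausted.

    An emergency (e.g. a security incident) bypasses the escalation procedure.
    """
    if status is SiteStatus.PRODUCTION and (escalation_exhausted or emergency):
        return SiteStatus.SUSPENDED
    return status


def restore(status, recertified_by_roc):
    """Return a suspended site to production only after ROC re-certification."""
    if status is SiteStatus.SUSPENDED and recertified_by_roc:
        return SiteStatus.PRODUCTION
    return status
```

In this sketch a site leaves “production” only through `suspend`, and returns only through `restore` once the ROC has re-certified it.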