Page 1: GGUS summary (2 weeks)

GGUS summary (2 weeks)

VO       User   Team   Alarm   Total
ALICE       1      0       1       2
ATLAS      14    116       6     136
CMS         4      1       1       6
LHCb        1     20       1      22
Totals     20    137       9     166
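
The row and column totals in the table above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the names `ROWS`, `TOTALS` and `check_table` are illustrative assumptions and are not part of GGUS or its reporting tools.

```python
# Ticket counts copied from the summary table: (User, Team, Alarm, Total).
ROWS = {
    "ALICE": (1, 0, 1, 2),
    "ATLAS": (14, 116, 6, 136),
    "CMS": (4, 1, 1, 6),
    "LHCb": (1, 20, 1, 22),
}
TOTALS = (20, 137, 9, 166)


def check_table(rows, totals):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    # Each VO row: User + Team + Alarm should equal the row's Total.
    for vo, (user, team, alarm, total) in rows.items():
        if user + team + alarm != total:
            problems.append(f"{vo}: {user + team + alarm} != {total}")
    # Each column summed over all VOs should equal the Totals row.
    column_sums = tuple(sum(column) for column in zip(*rows.values()))
    if column_sums != totals:
        problems.append(f"Totals: {column_sums} != {totals}")
    return problems


print(check_table(ROWS, TOTALS))
```

An empty list means every row and the Totals row are internally consistent.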

Page 2: GGUS summary (2 weeks)

04/21/23 WLCG MB Report WLCG Service Report 2

Support-related events since last MB

• We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting can better reveal weak areas in support.

• There were 9 ALARM tickets since the last MB (2 weeks), 5 of which were real, all submitted by ATLAS. Details follow…

Page 3: GGUS summary (2 weeks)

ATLAS ALARM->CERN-CNAF TRANSFERS

• https://gus.fzk.de/ws/ticket_info.php?ticket=62761

What time UTC What happened

2010/10/05 9:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_Italy.

2010/10/05 10:23 Site acknowledges ticket and finds a StoRM backend problem.

2010/10/05 12:03 Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.

2010/10/11 Submitter ‘verifies’ ticket GGUS:62745. Not sure how ‘symptomatic’ the solution was…
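
The acknowledgement and restoration delays implied by the timeline above follow from simple timestamp arithmetic. The following is a minimal sketch in Python, assuming all timestamps are UTC as the column header states; the names `EVENTS`, `parse` and `minutes_since_open` are illustrative and not part of GGUS.

```python
from datetime import datetime

# Timestamps copied from the GGUS:62761 timeline (UTC).
EVENTS = {
    "opened": "2010/10/05 9:13",
    "acknowledged": "2010/10/05 10:23",
    "restored": "2010/10/05 12:03",
}


def parse(timestamp):
    """Parse a 'YYYY/MM/DD H:MM' timestamp as used in the slides."""
    return datetime.strptime(timestamp, "%Y/%m/%d %H:%M")


def minutes_since_open(events):
    """Return the delay in whole minutes from ticket opening to each later event."""
    opened = parse(events["opened"])
    return {
        name: int((parse(ts) - opened).total_seconds() // 60)
        for name, ts in events.items()
        if name != "opened"
    }


print(minutes_since_open(EVENTS))
```

For this ticket the sketch gives 70 minutes to acknowledgement and 170 minutes to service restoration. The same function can be applied to the other timelines by swapping in their timestamps.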

Page 4: GGUS summary (2 weeks)

ATLAS ALARM->TRANSFERS TO .FR CLOUD

• https://gus.fzk.de/ws/ticket_info.php?ticket=62871

What time UTC What happened

2010/10/08 5:56 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to NGI_France.

2010/10/08 6:31 Site acknowledges ticket and finds a network problem preventing all DB server access.

2010/10/08 7:29 Service restored.

2010/10/08 10:41 Site puts ticket to status ‘solved’.

Page 5: GGUS summary (2 weeks)

ATLAS ALARM->CERN SLOW LSF

• https://gus.fzk.de/ws/ticket_info.php?ticket=62467

What time UTC What happened

2010/09/27 15:34 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/09/27 16:01 Operator acknowledges ticket and contacts the expert.

2010/09/27 16:37 Expert’s first diagnosis: too many queries.

2010/09/27 20:10 Service manager kills a home-made robot, run by another experiment, that was launching a very large number of bjob queries, and puts the ticket to status ‘solved’.

Page 6: GGUS summary (2 weeks)

ATLAS ALARM->CERN SLOW AFS

• https://gus.fzk.de/ws/ticket_info.php?ticket=62662

What time UTC What happened

2010/10/01 7:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/10/01 7:33 Operator acknowledges ticket and contacts the expert.

2010/10/01 9:37 IT Service manager re-classifies in CERN Remedy PRMS.

2010/10/11 15:33 Still ‘in progress’. Reminder sent during this drill.

Page 7: GGUS summary (2 weeks)

ATLAS ALARM->CERN CASTOR

• https://gus.fzk.de/ws/ticket_info.php?ticket=62688

What time UTC What happened

2010/10/01 16:24 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/10/01 16:41 Operator acknowledges ticket and contacts the expert.

2010/10/01 16:42 Expert starts investigation.

2010/10/01 17:23 Solved. PutDONE in SRM not propagated to CASTOR. Done by hand.