GGUS summary (2 weeks)
VO      User  Team  Alarm  Total
ALICE      1     0      1      2
ATLAS     14   116      6    136
CMS        4     1      1      6
LHCb       1    20      1     22
Totals    20   137      9    166
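The totals in the table can be cross-checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the table, while the variable names (`tickets`, `row_totals`, `column_totals`) are illustrative and not part of any GGUS tooling.

```python
# Ticket counts per VO, copied from the summary table: (User, Team, Alarm)
tickets = {
    "ALICE": (1, 0, 1),
    "ATLAS": (14, 116, 6),
    "CMS": (4, 1, 1),
    "LHCb": (1, 20, 1),
}

# Row totals: all tickets submitted by each VO
row_totals = {vo: sum(counts) for vo, counts in tickets.items()}

# Column totals: all tickets of each type (User, Team, Alarm)
column_totals = tuple(sum(column) for column in zip(*tickets.values()))

# Grand total over every VO and ticket type
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Both the per-VO totals and the per-type totals agree with the last column and last row of the table.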
04/21/23 WLCG MB Report WLCG Service Report 2
Support-related events since last MB
• We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting can better expose weak areas in support.
• There were 9 ALARM tickets since the last MB (2 weeks), 5 of which were real, all submitted by ATLAS. Details follow…
ATLAS ALARM -> CERN-CNAF TRANSFERS
• https://gus.fzk.de/ws/ticket_info.php?ticket=62761
What time UTC What happened
2010/10/05 9:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_Italy.
2010/10/05 10:23 Site acknowledges ticket and finds a StoRM backend problem.
2010/10/05 12:03 Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.
2010/10/11 Submitter ‘verifies’ ticket GGUS:62745. Not sure how ‘symptomatic’ the solution was…
ATLAS ALARM -> TRANSFERS TO .FR CLOUD
• https://gus.fzk.de/ws/ticket_info.php?ticket=62871
What time UTC What happened
2010/10/08 5:56 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to NGI_France.
2010/10/08 6:31 Site acknowledges ticket and finds a network problem preventing all DB server access.
2010/10/08 7:29 Service restored.
2010/10/08 10:41 Site puts ticket to status ‘solved’.
ATLAS ALARM -> CERN SLOW LSF
• https://gus.fzk.de/ws/ticket_info.php?ticket=62467
What time UTC What happened
2010/09/27 15:34 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/09/27 16:01 Operator acknowledges ticket and contacts the expert.
2010/09/27 16:37 Expert’s first diagnosis: too many queries.
2010/09/27 20:10 Service manager kills a home-made robot, run by another experiment, that was launching a very large number of bjob queries, and puts the ticket to status ‘solved’.
ATLAS ALARM -> CERN SLOW AFS
• https://gus.fzk.de/ws/ticket_info.php?ticket=62662
What time UTC What happened
2010/10/01 7:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/10/01 7:33 Operator acknowledges ticket and contacts the expert.
2010/10/01 9:37 IT Service manager re-classifies in CERN Remedy PRMS.
2010/10/11 15:33 Still ‘in progress’. Reminder sent during this drill.
ATLAS ALARM -> CERN CASTOR
• https://gus.fzk.de/ws/ticket_info.php?ticket=62688
What time UTC What happened
2010/10/01 16:24 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.
2010/10/01 16:41 Operator acknowledges ticket and contacts the expert.
2010/10/01 16:42 Expert starts investigation.
2010/10/01 17:23 Solved. PutDONE in SRM not propagated to CASTOR. Done by hand.
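To compare how quickly the five ALARM tickets were acknowledged, the delay between the ‘ticket opened’ and ‘acknowledged’ entries can be computed as in the sketch below. The timestamps are copied from the timelines above; the helper `minutes_between` and the other names are hypothetical and only serve this illustration.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"

# (ticket, opened, acknowledged) in UTC, copied from the timelines above
timelines = [
    ("GGUS:62761", "2010/10/05 9:13", "2010/10/05 10:23"),
    ("GGUS:62871", "2010/10/08 5:56", "2010/10/08 6:31"),
    ("GGUS:62467", "2010/09/27 15:34", "2010/09/27 16:01"),
    ("GGUS:62662", "2010/10/01 7:13", "2010/10/01 7:33"),
    ("GGUS:62688", "2010/10/01 16:24", "2010/10/01 16:41"),
]


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two UTC timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# Minutes from ticket opening to first acknowledgement, per ticket
ack_minutes = {
    ticket: minutes_between(opened, acked)
    for ticket, opened, acked in timelines
}

for ticket, minutes in ack_minutes.items():
    print(f"{ticket}: acknowledged after {minutes} min")
```

With these inputs the acknowledgement delays range from 17 minutes (GGUS:62688) to 70 minutes (GGUS:62761).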