Report from WG2
Andrea Sciabà
Experiment Support (ES), CERN IT Department, CH-1211 Geneva 23, Switzerland
www.cern.ch/it

WG2 areas

• Support tools
  – Ticketing tools
  – Accounting tools
  – Request trackers
  – Administration tools
• Underlying services
  – Messaging services
  – Information services
• WLCG operations and procedures

Support tools

• Overview
  – Tools mostly developed by other projects (OSG, EGEE, EGI…)
  – WLCG heavily influenced their development
  – Rather mature by now

Technology and tools

• GGUS
• Savannah
• TRAC
• JIRA
• GOCDB
• OIM
• EGI operations portal

Ticketing tools and request trackers (1/2)

• GGUS
  – Used by all 4 experiments for incident reporting
• Savannah
  – Used by ATLAS, CMS and LHCb: for internal investigation before bridging incidents to GGUS (CMS) or to other trackers (ATLAS), and for development and/or release management (LHCb)
• TRAC and JIRA
  – Used by some experiments (such as CMS) as development trackers, but the supporters make them available ‘as is’, so required improvements (e.g. on performance) are done on a best-effort basis

Ticketing tools and request trackers (2/2)

• Areas of improvement
  – GGUS
    • Some external interfaces periodically break
    • Ensure continuous availability
  – Savannah
    • Improve integration with other systems
  – TRAC / JIRA
    • Experiments would like them to be officially supported
• Areas of potential efficiency gains
  – GGUS: better reporting to avoid information repetition in multiple meetings
• Largest use of operational effort
• Missing areas
  – Savannah future uncertain

Accounting tools (1/2)

• Overview of technology and tools
  – APEL, Gratia, SGAS, DGAS
    • APEL receives CPU accounting data from its clients and the other accounting systems
    • Provides a single database of WLCG accounting data (~1 G jobs since 2004)
  – EGI Accounting Portal
    • Provides summaries by site/month/VO/user/FQAN; data can be plotted and downloaded
    • Authorisation to see data on users depends on role
  – SAM/Nagios is used to check that sites publish data and whether it is published centrally
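
To make the idea of a summary by site/month/VO concrete, here is a minimal sketch that aggregates per-job CPU records along those keys. The record fields, site names and numbers are hypothetical; they are not the APEL schema or the portal's implementation.

```python
from collections import defaultdict

# Hypothetical per-job records; field names are illustrative, not the APEL schema.
jobs = [
    {"site": "SITE-A", "month": "2010-01", "vo": "atlas", "cpu_hours": 4.0},
    {"site": "SITE-A", "month": "2010-01", "vo": "atlas", "cpu_hours": 2.5},
    {"site": "SITE-A", "month": "2010-01", "vo": "cms", "cpu_hours": 1.0},
    {"site": "SITE-B", "month": "2010-02", "vo": "lhcb", "cpu_hours": 3.0},
]


def summarise(records, keys=("site", "month", "vo")):
    """Sum job counts and CPU hours for each distinct combination of keys."""
    summary = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for rec in records:
        group = tuple(rec[k] for k in keys)
        summary[group]["jobs"] += 1
        summary[group]["cpu_hours"] += rec["cpu_hours"]
    return dict(summary)


summary = summarise(jobs)
for group, totals in sorted(summary.items()):
    print(group, totals)
```

Passing a different `keys` tuple (for example `("vo",)`) gives a coarser summary from the same records.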

Accounting tools (2/2)

• Areas of improvement
  – Benchmarking: published data not reliable
  – SAM tests for accounting data publication do not check the total of all batch systems, hence missing info may pass unnoticed
  – Storage accounting: development of a portal under way in EMI; non-EMI SEs will have to provide data in the correct format
  – Evolve the Accounting Portal API into a full RESTful interface
• Areas of potential efficiency gains
  – Improved reliability from the redevelopment of the messaging infrastructure; messaging used also by Gratia, etc.
• Largest use of operational effort
  – Not reported
• Missing areas
  – Not reported
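
The gap in the SAM publication test can be pictured as a per-batch-system completeness check. The sketch below is only an assumption about what such a check might look like; the function, the batch system identifiers and the record counts are all hypothetical and do not describe the actual SAM probe.

```python
def missing_publishers(declared, record_counts):
    """Return the declared batch systems for which no accounting records arrived."""
    return sorted(s for s in declared if record_counts.get(s, 0) == 0)


# Hypothetical site with three batch systems; only one of them published data.
declared = ["ce01/pbs", "ce02/lsf", "ce03/sge"]
record_counts = {"ce01/pbs": 1200, "ce03/sge": 0}

missing = missing_publishers(declared, record_counts)
print(missing)
```

A test that only checks whether the site published anything at all would pass here, while the per-system check reports two silent batch systems.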

Administration tools (1/2)

• Overview
  – GOCDB and EGI Operations portal provide several critical functionalities
    • Information repository for all EGI sites and VOs
    • Downtime publication
    • Broadcasts
  – GOCDB has a programmatic interface used to get info about registered sites, services and downtimes
  – OIM provides very similar functionality for OSG

Administration tools (2/2)

• Areas of improvement
  – More up-to-date info in GOCDB
  – Supported VOs
• Areas of potential efficiency gains
  – Seamless integration of GOCDB and OIM
  – Smarter and more reliable downtime notifications
  – Easier definition of new service types
• Largest use of operational effort
  – None identified
• Missing areas
  – A way to publish experiment news to a portal (similar to the CERN IT Status Board)

Underlying services

• Overview
  – Messaging system and the information system
    • Both developed by WLCG
  – Will have to include batch systems as well

Technology and tools

• Active-MQ MSG system
• BDII
• GLUE
• LDAP

Messaging system (1/2)

• Overview
  – Operated by EGI: two brokers at CERN, one at AUTH and one at SRCE
  – Two more broker services at CERN for testing and validation, one for ATLAS/DDM, one for IT-ES (each consisting of 2 prod and 1 test broker)
  – Used by several applications
    • APEL
    • SAM
    • Ganga/DIANE monitoring
    • LFC catalogue synchronisation (EMI prototype)
    • ATLAS/DDM tracer service (prototype)
    • FTS monitoring
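
To give a feel for what an application sends to a broker, here is a minimal sketch that builds a STOMP `SEND` frame by hand. STOMP is one of the protocols ActiveMQ supports, but the slides do not say which protocol these applications use, so treat the choice as an assumption; the topic name and message body are hypothetical, and header escaping is left out for brevity.

```python
import json


def stomp_send_frame(destination, body, headers=None):
    """Build a STOMP SEND frame: command, headers, blank line, body, NUL byte."""
    payload = body.encode("utf-8")
    lines = ["SEND", f"destination:{destination}"]
    for key, value in (headers or {}).items():
        lines.append(f"{key}:{value}")
    # content-length lets the receiver read bodies that may contain NUL bytes.
    lines.append(f"content-length:{len(payload)}")
    head = "\n".join(lines) + "\n\n"
    return head.encode("utf-8") + payload + b"\x00"


# Hypothetical topic and message, for illustration only.
message = json.dumps({"site": "SITE-A", "status": "OK"})
frame = stomp_send_frame(
    "/topic/example.monitoring",
    message,
    headers={"content-type": "application/json"},
)
print(frame)
```

A real client would normally rely on an existing STOMP library and would also handle the connection handshake and failover between brokers.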

Messaging system (2/2)

• Areas of improvement
  – Security
  – Scalability
• Areas of potential efficiency gains
  – Improve availability and reliability: now the service must be stopped during some interventions
• Largest use of operational effort
  – None identified
• Missing areas
  – None identified

Information services (1/2)

• Overview
  – Covers several use cases
    • Service discovery
    • Installed software
    • Storage capacity and accounting
    • Batch system queue status
    • Configuration
    • Installed capacity
  – Fully distributed, hierarchical set of BDIIs, based on OpenLDAP
  – Implements GLUE schema
  – Information providers generate the service information
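
The service discovery use case can be sketched as a filter over LDIF text, the format in which an LDAP query against a BDII returns its entries. The sample entries and host names below are made up, and the attribute names follow the GLUE 1.3 LDAP rendering on the assumption that this is the schema version in use. The parser is deliberately simplistic: it does not handle folded lines or base64-encoded values.

```python
# Made-up LDIF entries in the style of a BDII query result.
SAMPLE_LDIF = """\
dn: GlueServiceUniqueID=srm.site-a.example.org,Mds-Vo-name=SITE-A,o=grid
GlueServiceUniqueID: srm.site-a.example.org
GlueServiceType: SRM
GlueServiceEndpoint: httpg://srm.site-a.example.org:8443/srm/managerv2

dn: GlueServiceUniqueID=lfc.site-b.example.org,Mds-Vo-name=SITE-B,o=grid
GlueServiceUniqueID: lfc.site-b.example.org
GlueServiceType: lcg-file-catalog
GlueServiceEndpoint: lfc.site-b.example.org
"""


def parse_ldif(text):
    """Parse simple LDIF into a list of {attribute: [values]} dictionaries."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, _, value = line.partition(": ")
            entry.setdefault(key, []).append(value)
        entries.append(entry)
    return entries


def find_services(entries, service_type):
    """Return the endpoints of all entries publishing the given service type."""
    return [
        entry["GlueServiceEndpoint"][0]
        for entry in entries
        if service_type in entry.get("GlueServiceType", [])
        and entry.get("GlueServiceEndpoint")
    ]


entries = parse_ldif(SAMPLE_LDIF)
srm_endpoints = find_services(entries, "SRM")
print(srm_endpoints)
```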

Information services (2/2)

• Areas of improvement
  – Stability: service info is prone to disappear, which is a problem because use cases have shifted towards needing more stability
  – Information validity: the output of information providers is very fragile, and configuration is very error prone
  – Better policies for resource publication
  – Lower latency for dynamic information
• Areas of potential efficiency gains
  – Better validation tools
  – Accurate storage information would make storage accounting a lot easier
  – Provide more powerful and user-friendly client tools
• Largest use of operational effort
  – Configuration and validation of information
  – Debugging IS problems for users and sites
• Missing areas
  – Continuous certification and auditing of the BDII information by WLCG
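
One possible shape for a validation tool is a set of per-entry sanity checks run over the published information. The sketch below is an assumption about what such checks could look like, not an existing WLCG tool. The attribute names are taken from the GLUE 1.3 schema as an assumed schema version, and the entries are invented.

```python
REQUIRED = ("GlueCEUniqueID", "GlueCEStateStatus", "GlueCEStateWaitingJobs")
COUNTERS = ("GlueCEStateWaitingJobs",)


def validate_entry(entry, required=REQUIRED, counters=COUNTERS):
    """Return a list of human-readable problems found in one published entry."""
    problems = []
    for attr in required:
        if not entry.get(attr):
            problems.append(f"missing {attr}")
    for attr in counters:
        value = entry.get(attr)
        if value and not value.isdigit():
            problems.append(f"{attr} is not a non-negative integer: {value!r}")
    return problems


# Invented entries: one well-formed, one with a missing and a malformed value.
good = {
    "GlueCEUniqueID": "ce.site-a.example.org:8443/cream-pbs-atlas",
    "GlueCEStateStatus": "Production",
    "GlueCEStateWaitingJobs": "12",
}
bad = {
    "GlueCEUniqueID": "ce.site-b.example.org:8443/cream-pbs-cms",
    "GlueCEStateWaitingJobs": "-1",
}

print(validate_entry(good))
print(validate_entry(bad))
```

Running checks like these on every refresh, and keeping the results over time, is one way the continuous auditing listed under missing areas could start.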

WLCG operations (1/4)

• Overview
  – Goals are:
    • Efficient communication
    • Quick resolution of issues according to agreed targets
    • Coordination and decision
    • Well defined procedures
  – Describes roles, bodies, communication channels and procedures
  – Lots of experience accumulated
  – Quality is good but still manpower intensive
  – No visible decrease of incidents

WLCG operations (2/4)

• Technology and tools (so to speak…)
  – Daily meeting
  – Tier-1 service coordination meeting
  – GDB
• Roles and bodies
  – Security, information, data management officers
  – Site administrators
  – Site security officers
  – Experiment contact persons

WLCG operations (3/4)

• Procedures and policies
  – Scheduling downtimes
    • Well defined rules to declare them
  – Problem handling
    • Little in terms of formal procedures; issues and incidents are handled and discussed in the daily meeting and the T1SCM
    • SIRs for major incidents are an essential tool
    • GDB also useful to discuss issues at a general level

WLCG operations (4/4)

• Areas of improvement
  – Sometimes the strength of the link between an experiment and a site is not enough
    • The very need of site contacts can be seen as an issue…
  – Improve communication of the experiment requirements to the sites (e.g. via VO cards)
• Areas of potential efficiency gains
  – To have a real WLCG operations team: now experiments do most of the computing operations
  – A better communication channel for the Tier-2’s (now only the GDB)
• Largest use of operational effort
• Missing areas