
Ian Bird, CERN IT, LCG Deployment Area Manager, EGEE Operations Manager: LCG Status Report, LHCC Open Session, CERN, 28th June 2006


Page 1

Ian Bird
CERN IT
LCG Deployment Area Manager
EGEE Operations Manager

LCG Status Report

LHCC Open Session

CERN, 28th June 2006

Page 2

Outline

Project Status: organisation for Phase II
Applications Area
CERN Tier 0: Castor-2, Tier 0 infrastructure, LHC networking
Grid Infrastructure: status; Service Challenges – results and plans; regional centres; middleware status
Physics Support & Analysis
Summary

Page 3

The Worldwide LHC Computing Grid

Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools.

Phase I, 2002-2005: development and planning
Phase II, 2006-2008: deployment and commissioning of the initial services

Page 4

WLCG Collaboration

The Collaboration:
~130 computing centres
12 large centres (Tier-0, Tier-1)
40-50 federations of smaller "Tier-2" centres
29 countries

Memorandum of Understanding: agreed in October 2005, now being signed.

Purpose:
Focuses on the needs of the 4 LHC experiments.
Commits resources: each October for the coming year, with a 5-year forward look.
Agrees on standards and procedures.

Page 5

Physics Support

Grid Deployment Board – chair: Kors Bos (NIKHEF)
With a vote: one person from a major site in each country; one person from each experiment.
Without a vote: experiment computing coordinators; site service management representatives; Project Leader; Area Managers.

Management Board – chair: Project Leader
Experiment computing coordinators; one person from the Tier-0 and each Tier-1 site; GDB chair; Project Leader; Area Managers; EGEE Technical Director.

Architects Forum – chair: Pere Mato (CERN)
Experiment software architects; Applications Area Manager; Applications Area project managers.

Collaboration Board – chair: Neil Geddes (RAL)
Sets the main technical directions.
One person from the Tier-0 and each Tier-1 and Tier-2 (or Tier-2 federation); experiment spokespersons.

Overview Board – chair: Jos Engelen (CERN CSO)
Committee of the Collaboration Board; oversees the project and resolves conflicts.
One person from the Tier-0 and the Tier-1s; experiment spokespersons.

Page 6

More information on the collaboration

Boards and Committees: all boards except the OB have open access to agendas, minutes and documents.

Planning data: MoU documents and resource data; Technical Design Reports; Phase 2 plans; status and progress reports; Phase 2 resources and costs at CERN.

http://www.cern.ch/lcg

Page 7

LCG Applications Area

Page 8

Merge of SEAL and ROOT projects

A single team has been working together successfully for more than one year.
~50% of SEAL functionality has been migrated to ROOT; it is in use by the experiments (and will be in production for this year's data challenges). What is left is easily maintainable (no new development).
Migration of the second 50% is now being planned: information has been collected from the experiments and a detailed plan is in preparation. In general there is no urgency from the experiments; they will need to be persuaded to migrate when the software is ready.

Page 9

AA Project status (1)

Software Process Infrastructure project (SPI):
Stable running and improvement of services: Savannah, HyperNews, software installations and software distributions.
Direct support for experiments to provide complete software configurations.
Support for new platforms: SLC4, Mac OS X.

Core Libraries and Services project (ROOT):
Many developments for the integration of Reflex and CINT; plan to release the new system this fall.
Consolidation of the new Math libraries; new packages for multivariate analysis and fast Fourier transforms.
Many performance improvements in many areas (e.g. I/O and Trees); a small PyROOT example of tree I/O follows below.
Many new developments in PROOF: asynchronous queries, connect/disconnect mode, package manager, monitoring, etc.
Improvements and new functionality in the GUI and Graphics packages.
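For readers less familiar with ROOT's "Trees", the sketch below shows the basic write/read cycle of a TTree through PyROOT. It is a generic illustration using standard ROOT calls; the file, tree and branch names are invented for the example and are not taken from the project's code.

    import ROOT
    from array import array

    # Write a small tree with one double-precision branch (names are illustrative).
    out = ROOT.TFile("toy.root", "RECREATE")
    tree = ROOT.TTree("events", "toy event tree")
    x = array("d", [0.0])
    tree.Branch("x", x, "x/D")
    for _ in range(1000):
        x[0] = ROOT.gRandom.Gaus(0.0, 1.0)
        tree.Fill()
    tree.Write()
    out.Close()

    # Read it back and loop over the entries.
    inp = ROOT.TFile("toy.root")
    events = inp.Get("events")
    total = sum(entry.x for entry in events)
    print("entries:", events.GetEntries(), "mean x:", total / events.GetEntries())
    inp.Close()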

Page 10

AA Project Status (2)

Persistency Framework project (POOL & COOL):
CORAL, a reliable generic RDBMS interface for Oracle, MySQL, SQLite and FroNTier, used also in the LCG 3D project. It provides database lookup, failover, connection pooling, authentication and monitoring. COOL and POOL can access all back-ends via CORAL, and CORAL is also used as a separate package by ATLAS and CMS online. (A schematic illustration of the lookup/failover idea follows below.)
Improved COOL versioning functionality (user tags and hierarchical tags).
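CORAL itself is a C++ package; the following is only a schematic Python sketch of the lookup-and-failover behaviour described above (try an ordered list of replicas, fall back on failure). The logical name, replica files and helper function are invented for the illustration and are not the CORAL API.

    # Illustrative only: mimics replica lookup and failover; not the CORAL C++ API.
    import sqlite3

    # A "logical service" maps to an ordered list of physical replicas
    # (local SQLite files here, as stand-ins for Oracle/MySQL/Frontier back-ends).
    REPLICAS = {
        "/lcg/conditions": ["primary_copy.db", "fallback_copy.db"],
    }

    def connect(logical_name):
        """Try each replica in turn and return the first working connection."""
        last_error = None
        for replica in REPLICAS[logical_name]:
            try:
                conn = sqlite3.connect(replica)
                conn.execute("SELECT 1")   # cheap liveness probe
                return conn
            except sqlite3.Error as err:
                last_error = err           # remember the error, try the next replica
        raise RuntimeError(f"no replica reachable for {logical_name}: {last_error}")

    if __name__ == "__main__":
        db = connect("/lcg/conditions")
        print("connected:", db)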

Simulation project:
Improved tools for geometry model interchange (GDML).
Extended framework for interfacing test-beam simulations with Geant4 and FLUKA; physics analysis expected soon.
Considerable effort on the study of hadronic shower shapes to resolve discrepancies with test-beam data; improved regression suite to investigate and compare hadronic shower shapes.
New C++ Monte Carlo generators (Pythia8, ThePEG/Herwig++) added to the generator library (GENSER).
New precise elastic process for protons and neutrons in Geant4.
New efficient method to detect overlaps in geometries at construction time, and support for parallel geometries.
New Python interface module exposing key Geant4 classes.

Page 11

CERN Tier 0 and LHC Networking status

Page 12

CERN CASTOR storage system
A CASTOR2 review took place at CERN on June 6th-9th. Members: John Harvey (CERN, chair), Miguel Branco (ATLAS), Don Petravick (FNAL), Shaun de Witt (RAL).

Details and the final report: http://indico.cern.ch/conferenceDisplay.py?confId=2916

Page 13

Castor2 Highlights in 2006

ATLAS Tier 0 test in January at nominal rates (320 MB/s, no Tier 1 export).
Various large-scale data challenges:
SC4 data export from a Castor2 disk pool at ~1.6 GB/s.
Castor2 disk pool stress tests at 4.3 GB/s, cf. the expected load of 4.5 GB/s aggregate for all 4 experiments during pp running.
Successful integration of 2 new tape storage systems from IBM and STK, with tested peak rates of 1.6 GB/s to tape.
Successful transition of all 4 experiments from Castor1 to Castor2.
Today ~1 PB of disk space in Castor2 disk pools, with ~2.5 million files on disk.
A Castor2 disk pool for CMS served analysis data successfully for ~1000 simultaneous clients at 1 GB/s aggregate performance.
A second ATLAS Tier 0 test has just started at nominal rates.

Page 14

Tier 0 ramp-up: 12,000 spinning disks, 2 PB of disk space.

Disk space [TB] and servers:

                    May 2006          Sep 2006          Feb 2007
                    TB     servers    TB     servers    TB      servers
  ALICE              78     20         231   ~60         500     -
  ATLAS             123     25         176   ~45         370     -
  CMS               138     27         176   ~45         370     -
  LHCb              121     26         188   ~45         370     -
  total LHC         460     98         771   ~180       1610    ~480
  SC4               187     40          -      -           -     -
  ITDC              169     42         169    42         170    ~40
  public              -      -          -      -        ~200    ~100
  total             816    180         940   220       ~2000    ~600

Batch system (boxes / kSI2K):
  Today:  2300 boxes, 4300 kSI2K
  2007:   -1000 +1200 = 2500 boxes; +5700 = 10,000 kSI2K
  2008:   25,000 kSI2K

Page 15

Computer Centre Electrical Infrastructure …

The new substation is operational.
Two power cuts were caused by the new equipment in January (6th and 24th); the reasons were understood rapidly and fixed.
Critical services were maintained as designed during the problems on May 16th; full services were back within 3 hours of power being restored.
The 1st new UPS module now being installed will be commissioned by mid-July; this brings no capacity increase, it only replaces the current UPS.
Additional UPS capacity comes only at end-2006, on an extremely tight schedule: it requires removal/relocation of existing equipment from July 15th to August 15th, and a two-month period for the 2nd phase of foundation reinforcement.

Page 16

… and Cooling infrastructure

Work is (much) delayed with respect to the initial plan: weather delays more than expected, and many safety concerns.
Three major cooling problems since end-March (plus other minor problems). The focus has been on maintaining critical lab services (network, admin services, email, …); physics services were shut down to reduce the heat load.
Production chillers are being commissioned now: 1st unit in production June 19th, 2nd on June 23rd, 3rd unit in production by June 30th; final configuration in place by mid-July.
Future work: installation of sensors (2-3 per equipment row); completion of ducts on the right-hand (barn) side [4th chiller; yet to be funded].

Page 17

The new European Network Backbone

LCG working group with Tier-1s and national/regional research network organisations.
New GÉANT 2 research network backbone, with strong correlation with the major European LHC centres.
Swiss PoP at CERN.

Page 18

Page 19

Grid Infrastructure

Page 20

LCG Service Hierarchy

Tier-0 – the accelerator centre: data acquisition and initial processing; long-term data curation; distribution of data to the Tier-1 centres.

Tier-1 – "online" to the data acquisition process, high availability; managed mass storage (grid-enabled data service); data-heavy analysis; national and regional support:
Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands Tier-1 (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY).

Tier-2 – ~120 centres (40-50 federations) in ~29 countries: simulation; end-user analysis, batch and interactive.

Page 21

LCG depends on 2 major science grid infrastructures …

The LCG service runs on, and relies on, the grid infrastructure provided by:
EGEE – Enabling Grids for E-sciencE
OSG – the US Open Science Grid

Page 22

EGEE: >180 sites, 40 countries; >24,000 processors; ~5 PB storage.

EGEE Grid Sites, Q1 2006 [chart of sites and CPU]: steady growth over the lifetime of the project.

Page 23

A global, federated e-Infrastructure

EGEE infrastructure: ~200 sites in 39 countries; ~20,000 CPUs; >5 PB storage; >35,000 concurrent jobs per day; >60 Virtual Organisations.

Related infrastructure projects: EUIndiaGrid, EUMedGrid, SEE-GRID, EELA, BalticGrid, EUChinaGrid, OSG, NAREGI.

Page 24

Use of the infrastructure

[Chart: K-Jobs/Day on the EGEE Grid, all VOs, June 2005 to May 2006. VOs shown: alice, atlas, cms, lhcb, geant4, biomed, compchem, egeode, egrid, esr, fusion, magic, ops, planck, dteam and other VOs.]

More than 35K jobs/day on the EGEE Grid; the LHC VOs account for ~30K jobs/day.

Sustained and regular workloads of >35K jobs/day, spread across the full infrastructure, doubling/tripling in the last 6 months with no effect on operations.

Several applications now depend on EGEE as their primary computing resource.

Page 25

EGEE Operations Process

Grid operator on duty:
6 teams working in weekly rotation: CERN, IN2P3, INFN, UK/I, Russia, Taipei.
Crucial in improving site stability and management.
Expanding to all ROCs in EGEE-II.

Operations coordination:
Weekly operations meetings; regular ROC managers meetings; a series of EGEE Operations Workshops (Nov 04, May 05, Sep 05, June 06).

Geographically distributed responsibility for operations:
There is no "central" operation.
Tools are developed/hosted at different sites: GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon).

Procedures described in the Operations Manual: introducing new sites; site downtime scheduling; suspending a site; escalation procedures; etc.

Highlights: distributed operation; evolving and maturing procedures; procedures being introduced into, and shared with, the related infrastructure projects.

Page 26

Site testing

[Charts of daily site test pass rates (%) over a month, including the average over all sites; the per-chart averages shown are 77%, 89%, 85%, 83%, 89%, 68%, 58%, 77%, 87% and 68%, against a target of 88%.]

Measuring response times and availability: the Site Availability Monitor (SAM).
Based upon the Site Functional Test suite: services are monitored by running regular tests.
Basic services: SRM, LFC, FTS, CE, RB, top-level BDII, site BDII, MyProxy, VOMS, R-GMA, ….
VO environment: tests supplied by the experiments.
Results are stored in a database, with displays and alarms for sites, grid operations and experiments, and high-level metrics for management.
Integrated with the EGEE operations portal – the main tool for daily operations.
The mechanism and tests are shared with OSG.
(A sketch of how per-test results can be rolled up into an availability number follows below.)

Page 27

Sustainability: Beyond EGEE-II

Need to prepare for a permanent Grid infrastructure:
Maintain Europe's leading position in global science Grids.
Ensure reliable and adaptive support for all sciences.
Independent of short project funding cycles.
Modelled on the success of GÉANT: an infrastructure managed in collaboration with national grid initiatives.

Page 28

Structure

A federated model bringing together National Grid Initiatives (NGIs) to build a European organisation; the EGEE federations would evolve into NGIs.

Each NGI is a national body that:
is recognised at the national level;
mobilises national funding and resources;
contributes and adheres to international standards and policies;
operates the national e-Infrastructure;
is application independent, and open to new user communities and resource providers.

Page 29

OSG & WLCG

The OSG infrastructure is a core piece of the WLCG.

OSG delivers accountable resources and cycles for LHC experiment production and analysis.

OSG federates with other infrastructures.

Experiments see a seamless global computing facility.

Page 30

Ramp-up of OSG use over the last 6 months.
[Chart annotations: OSG 0.4.0 deployment; OSG 0.4.1 deployment.]

Page 31

Data Transfer by VOs

e.g. CMS

Page 32

Operations

Grid Operations Center.

Facility, Service and VO Support Centers.

Manual or automated flow of tickets within OSG and bridged to other Grids.

Ownership of problems at end-points and by GOC.

Guided by the Operations Model, Standard Procedures and Support Center Agreements: http://osg.ivdgl.org/twiki/bin/view/Operations/WebHome

Page 33

WLCG Interoperability

Cross-grid job submission:
Most advanced with OSG – cross job submission has been put in place for WLCG, and used in production by US-CMS for several months.
The EGEE Generic Information Provider is installed on OSG sites (now in VDT), allowing all sites to be seen in the information system.
Monitoring (GStat and SFT) can run on OSG sites.
EGEE clients are installed on OSG-LCG sites; inversely, EGEE sites can run OSG jobs.
All use SRM SEs; file catalogues are an application choice – LFC is widely used.

Support and operations:
Workflows and processes are being put in place and tested.
The operations workshop last week tried to finalise some of the open issues.

Page 34

LCG Service planning

[Timeline figure, 2006-2008: cosmics, first physics, full physics run.]

Pilot services – stable service from 1 June 2006.
LHC service in operation – 1 October 2006; over the following six months, ramp up to full operational capacity and performance.
LHC service commissioned – 1 April 2007.

Page 35

Service Challenges

Purpose:
Understand what it takes to operate a real grid service – run for weeks/months at a time (not just limited to experiment data challenges).
Trigger and verify Tier-1 and large Tier-2 planning and deployment, tested with realistic usage patterns.
Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance.

Four progressive steps from October 2004 through September 2006:
End 2004 – SC1 – data transfer to a subset of Tier-1s.
Spring 2005 – SC2 – include mass storage, all Tier-1s, some Tier-2s.
2nd half 2005 – SC3 – Tier-1s, >20 Tier-2s – first set of baseline services.
Jun-Sep 2006 – SC4 – pilot service.
Autumn 2006 – LHC service in continuous operation, ready for data taking in 2007.

Page 36

SC4 – the Pilot LHC Service from June 2006

A stable service on which the experiments can make a full demonstration of the experiment offline chain:
DAQ -> Tier-0 -> Tier-1: data recording, calibration, reconstruction.
Offline analysis – Tier-1 <-> Tier-2 data exchange: simulation, batch and end-user analysis.

And on which sites can test their operational readiness:
Service metrics -> MoU service levels.
Grid services.
Mass storage services, including magnetic tape.

Extension to most Tier-2 sites.
An evolution of SC3 rather than lots of new functionality.

In parallel:
Development and deployment of distributed database services (3D project).
Testing and deployment of new mass storage services (SRM 2.2).

Page 37

Sustained Data Distribution Rates: CERN -> Tier-1s

  Centre                     ALICE  ATLAS  CMS  LHCb   Rate into T1 (MB/s, pp run)
  ASGC, Taipei                 -      X     X     -      100
  CNAF, Italy                  X      X     X     X      200
  PIC, Spain                   -      X     X     X      100
  IN2P3, Lyon                  X      X     X     X      200
  GridKA, Germany              X      X     X     X      200
  RAL, UK                      -      X     X     X      150
  BNL, USA                     -      X     -     -      200
  FNAL, USA                    -      -     X     -      200
  TRIUMF, Canada               -      X     -     -       50
  NIKHEF/SARA, NL              X      X     -     X      150
  Nordic Data Grid Facility    X      X     -     -       50
  Totals                                               1,600

The design target is twice these rates, to enable catch-up after problems. (A quick arithmetic check follows below.)
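A quick check of the table's arithmetic, using the per-centre rates listed above: the nominal rates sum to 1,600 MB/s, and doubling them for catch-up gives a design target of about 3.2 GB/s out of CERN.

    # Nominal pp-run rates into each Tier-1 (MB/s), as listed in the table above.
    rates = {
        "ASGC": 100, "CNAF": 200, "PIC": 100, "IN2P3": 200, "GridKA": 200,
        "RAL": 150, "BNL": 200, "FNAL": 200, "TRIUMF": 50,
        "NIKHEF/SARA": 150, "NDGF": 50,
    }

    total = sum(rates.values())
    print(f"nominal total out of CERN: {total} MB/s")            # 1600 MB/s
    print(f"design target (x2 for catch-up): {2 * total} MB/s")  # 3200 MB/s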

Page 38

SC4 T0-T1: Results

Target: sustained disk-to-disk transfers at 1.6 GB/s out of CERN, at full nominal rates, for ~10 days.
Result: just managed this rate on Easter Sunday (1 day out of the 10).
[Plot annotations: Easter weekend; target 10-day period.]

Page 39

ATLAS SC4 tests

From last week: initial ATLAS SC4 work; rates to the ATLAS Tier-1 sites are close to the target rates.
[Plot legend: ATLAS transfers; background transfers.]

Page 40

Service readiness

An internal LCG review of services was held on 8-9 June.
Mandate: assess the service readiness and preparations of the Tier-1 sites.
Scope: all aspects of LCG except the applications area.
First day: review of each Tier-1 – status, planning, issues.
Second day: middleware plans and priorities (EGEE and OSG); interoperability; experiment views of the status of the middleware; status of the storage interface (SRM).
It is difficult to assess the overall status of the sites – each Tier-1 is unique in its management, environment and issues – but all are now taking the timescale seriously.
The final report from the review is expected in July.

Page 41

Middleware: Baseline services

In June 2005 the set of baseline services was agreed:
The basic set of middleware required from the grid infrastructures.
Agreed by all experiments, with minor variations of priority.
The baseline services group, and later workshops, documented the missing features.
LCG priorities for development were agreed at the Mumbai workshop in February, and are now reflected in the EGEE and OSG middleware development plans.

gLite 3.0 (released in May for SC4) contains all of the baseline services:
SRM v2.2 for storage interfaces has a longer timescale (November).
Reliability, performance and management issues still need to be addressed.
gLite 3.0 is an evolution of the previous LCG-2.7 and gLite 1.x middleware, deployed in production without disturbing the production environment.
It forms the basis for evolution of the services to add missing features and improve performance and reliability.
Several services (FTS, LFC, VOMS, BDII) are used everywhere (not just at EGEE sites).

Page 42

Physics Support and Analysis

Page 43

Supporting the experiments in grid activities

The original activity on the Grid has focused on large productions: an essential activity, and one still requiring effort (middleware and experiment software are evolving).
There is now a genuine need for user analysis: a big step forward compared to production. Preparation is still going on, tools are maturing, and all components are being finalised. There are concrete signs of analysis activity.

Per experiment:
ALICE: support for production and analysis; integration and support.
ATLAS: distributed analysis coordination and analysis (Ganga); experiment dashboard; integration and support; job reliability.
CMS: experiment dashboard; integration and support; job reliability.
LHCb: support for analysis (Ganga); integration and support.

Page 44

Analysis efforts (CMS)

6k analysis jobs/day: negligible less than a year ago, and a factor of two increase since late 2005. These include the jobs used to finalise the Physics TDR.

Page 45

Analysis efforts (cont)

ALICE: 3 user tutorials have been held since January 2006, with more than 50 attendees; typically 15-20 active users.
ATLAS and LHCb: use a common tool (Ganga) to bring users onto the grid; several demos and tutorials; CHEP06 presentation (U. Egede).
[Plot: number of users over the last two months (2 services connecting users to the grid).]
(A sketch of Ganga-style job submission follows below.)
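For context, Ganga's interface is built around a configurable job object that users manipulate in an interactive shell. The snippet below sketches that style; the class and attribute names are quoted from memory of Ganga's documented interface and backends, so treat them as illustrative rather than an exact reproduction of the 2006 release.

    # Illustrative Ganga-style session (run inside the ganga shell, where GPI
    # classes such as Job, Executable and LCG are pre-loaded; names may differ
    # between Ganga releases).
    j = Job()
    j.name = "hello-grid"
    j.application = Executable(exe="/bin/echo", args=["hello from the grid"])
    j.backend = LCG()          # submit through the LCG/EGEE grid backend
    j.submit()

    jobs                        # list the user's jobs and their states
    print(j.status)             # e.g. 'submitted', 'running', 'completed'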

Page 46

Experiment dashboard

Originally proposed by CMS; now in production.
ATLAS dashboard: a similar concept, re-using the experience and software; a preview is available (ATLAS production).
The dashboard aggregates monitoring information from all sources; it allows the history of activity to be followed, information to be correlated (e.g. data sets and sites), and problems to be tracked down. (A minimal sketch of this kind of aggregation follows below.)
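A minimal sketch of the kind of aggregation the dashboard performs: collect job records from several sources and correlate them by site and dataset. The record fields and values here are invented sample data, not the dashboard's actual schema.

    from collections import Counter

    # Job monitoring records gathered from different sources (invented sample data).
    records = [
        {"site": "CERN",   "dataset": "/dsA", "status": "done"},
        {"site": "CNAF",   "dataset": "/dsA", "status": "failed"},
        {"site": "CNAF",   "dataset": "/dsB", "status": "done"},
        {"site": "GridKA", "dataset": "/dsA", "status": "done"},
    ]

    # Correlate activity by (site, dataset) and by outcome, as the dashboard's
    # history and correlation views do at much larger scale.
    by_site_dataset = Counter((r["site"], r["dataset"]) for r in records)
    failures_by_site = Counter(r["site"] for r in records if r["status"] == "failed")

    print(by_site_dataset)    # which datasets are being processed where
    print(failures_by_site)   # where to start tracking down problems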

Page 47

Service reliability

Bring together the monitoring of experiment-specific services and applications with that of the middleware components, to study and improve the LCG service:
Middleware weaknesses.
Infrastructure mis-configuration and instabilities.
Feedback into LCG/EGEE deployment and middleware development.

Example (20th June), top "good" sites by "grid" efficiency: MIT 99.6%, DESY 100%, Bari 100%, Pisa 100%, FNAL 100%, ULB-VUB 96.8%, KBFI 100%, CNAF 99.6%, ITEP 100%. (A sketch of computing such an efficiency from job records follows below.)
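A sketch of how a per-site "grid" efficiency could be derived from job records, assuming it means the fraction of jobs that did not fail for grid (as opposed to application) reasons; the exact definition behind the numbers above is not given on the slide, and the job records here are invented.

    # Invented job records: each job ends 'ok', 'grid-failure' or 'app-failure'.
    jobs = [
        ("DESY", "ok"), ("DESY", "ok"),
        ("MIT", "ok"), ("MIT", "grid-failure"),
        ("CNAF", "ok"), ("CNAF", "app-failure"),
    ]

    def grid_efficiency(job_list):
        """Per site: jobs without a grid failure / all jobs (application failures count as OK here)."""
        stats = {}
        for site, outcome in job_list:
            total, good = stats.get(site, (0, 0))
            stats[site] = (total + 1, good + (outcome != "grid-failure"))
        return {site: 100.0 * good / total for site, (total, good) in stats.items()}

    print(grid_efficiency(jobs))   # e.g. {'DESY': 100.0, 'MIT': 50.0, 'CNAF': 100.0}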

Page 48

Summary

International science grid infrastructures are really operational:
Relied upon for daily production use at large scale.
More than 200 sites in EGEE and OSG.
Real grid operations in place for over a year.

LCG depends upon 2 major science grid infrastructures, EGEE and OSG: ~130 computer centres in 49 countries, with excellent global networking.

We now have a good understanding of: the experiment computing models and requirements; agreement on the baseline grid services; experience of the problems and issues.

But:
Reliability must be improved.
The full computing models will be tested this year.
A big ramp-up is needed in capacity, number of jobs, and the number of Tier-2 sites participating.
Will there be a scaling problem? This must be tested in the next 12 months.
Data will arrive next year.
No new developments: make what we have work absolutely reliably, and be scalable and performant.