Dan Tovey, University of Sheffield GridPP: Experiment Status & User Feedback Dan Tovey University Of Sheffield

GridPP: Experiment Status & User Feedback

Dan Tovey, University of Sheffield

Introduction

This talk will be in two parts:

1. The good news: details of Grid use by the experiments

2. The less good news: feedback from the experiments regarding their experiences

ATLAS Grid Use

• Almost all resources for ATLAS are Grid-based

– Three Grid flavours to work with: LCG-2, NorduGrid, Grid3

– Considerable issues of interoperability/federation

• Next large exercise is Rome Physics Workshop in June

– Generation

  • Mixture of Grid and non-Grid, but much non-Grid for convenience

– Simulation/Digitisation/Reconstruction

  • All on Grid

– Analysis

  • Some Grid-based analysis already; distributed analysis being rolled out in Spring

  • Rome will use a mixture of Grid/non-Grid analysis

ATLAS Issues

• Interoperability

– Currently, production system has to layer job scheduling over system for each deployment

– Absolute need for a unified file catalogue system

  • Currently layer additional catalogue over others

• Information system/policy

– Inaccurate advertisement of sites

– SE saturation

  • Internal: production system needs better clean-up and more robust back-up

  • SE should advertise if it is really for storage! SCR$MONTH class required?

  • Lines of reporting need to be improved/clarified

• LCG issues

– LCG-Castor failures

– RLS corruption

• Resource issues

– Still trying to ensure required resources for 2007/2008

General CMS Outlook

• Tool development

– Going very well: leading contributions from the UK in the most important areas

– Integration between tools is starting

– Moving away from LCG-style data management, for now

– Our modular approach can re-integrate LCG tools later on if needed

• Collaboration status / plans

– Computing Model now blessed and publicly available

– Computing TDR well under way

– UK making a strong contribution

– Use of Tier-1 / Tier-2 resources in the UK will start to grow rapidly as Grid-enabled DST analysis begins

LHCb - 2004

Production Desktop: provides control and decomposes complex workflows into steps and modules.

DC04 Phase 1: 186M events produced, 424 CPU years.

[Figure: events produced over time during DC04 Phase 1, with periods labelled "DIRAC alone", "LCG in action", "LCG paused" and "LCG restarted"; rates marked at 1.8M events/day and 3-5M events/day.]

• May 2004: DIRAC 89%, LCG 11%

• Aug 2004: DIRAC 27%, LCG 73%

• ~50% run on LCG resources

• LCG efficiency = 61%

• UK second largest producer (25%) after CERN

– LCG (UK): Tier 1 7.7%, Tier 2 London 3.9%, Tier 2 South 2.3%, Tier 2 North 1.4%

– DIRAC (UK): Imperial 2.0%, Liverpool 3.1%, Oxford 0.1%, ScotGrid 5.1%
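As a rough cross-check of the figures quoted on this slide, the sketch below sums the UK site percentages and derives an average CPU time per event. It assumes the LCG(UK) and DIRAC(UK) percentages are additive shares of the same DC04 Phase 1 total, and that a "CPU year" is 365.25 days of one CPU; neither assumption is stated on the slide, and the variable names are illustrative only.

```python
# Percentages quoted on the slide (assumed to be shares of total DC04 Phase 1 production).
lcg_uk = {"Tier 1": 7.7, "Tier 2 London": 3.9, "Tier 2 South": 2.3, "Tier 2 North": 1.4}
dirac_uk = {"Imperial": 2.0, "Liverpool": 3.1, "Oxford": 0.1, "ScotGrid": 5.1}

uk_lcg_share = sum(lcg_uk.values())      # UK production via LCG, in percent
uk_dirac_share = sum(dirac_uk.values())  # UK production via DIRAC, in percent
uk_total_share = uk_lcg_share + uk_dirac_share

# Totals quoted on the slide.
events = 186e6
cpu_years = 424
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: one CPU year = 365.25 days

cpu_seconds_per_event = cpu_years * SECONDS_PER_YEAR / events

print(f"UK via LCG:   {uk_lcg_share:.1f}%")
print(f"UK via DIRAC: {uk_dirac_share:.1f}%")
print(f"UK total:     {uk_total_share:.1f}%")
print(f"Average CPU time per event: {cpu_seconds_per_event:.0f} s")
```

Under these assumptions the site figures add up to about a quarter of the total, in line with the quoted UK share of 25%, and the average cost comes out at a little over a minute of CPU time per event.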

LHCb - 2005

DC04 Phase 2 – Stripping (ongoing)

• Using Production Desktop developed in UK (G. Kuznetsov).

• Data reduction on 65 TB distributed over Tier 1 sites.

• Using DIRAC with input data and LCG for data access.

• SRM was on critical path: available at CERN, PIC, CNAF. Production version unavailable at RAL; UK not participating.

DC04 Phase 3 – End User Analysis

• Use GANGA Grid interface (UK project).

• Improvements Jan–March 05 (A. Soroko at CERN).

• User training started (e.g. Cambridge event funded by GridPP in December 04).

• Distributed analysis from March 2005 with datasets replicated to RAL.

DC05 – Real Time Data Challenge

• Mimic LHC data taking; test HLT, reconstruction and streaming software from pre-simulated data.

• Production phase (150M events) April-June 05.

[Figure: Production Desktop]

Infrastructure

• QCDgrid is primarily a data-grid and is aimed at providing a storage and processing infrastructure for QCDOC (5 Tflop-sustained QCD simulation facility)

• QCDOC is now installed and being shaken down in Edinburgh along with ‘Tier 1’ 50 Tbyte store.

• ‘Tier 2’ storage nodes have been installed in Edinburgh, Liverpool, Swansea and Southampton (4 x 12.5 Tbyte).

• Additional storage/access nodes are operating at RAL and Columbia

• Processing clusters at Edinburgh (QCDOC FE), Liverpool, Southampton,….

Usage and Uptake

• Run by Grid administrator + local sysadmins + users

• Main node @ Edinburgh + secondary @ Liverpool.

• All UKQCD primary data already stored and all secondary data produced by grid-retrieval and (currently non-grid) processing.

• Secondary data is also stored back on QCDgrid (metadata markup not yet automated).

• All QCDOC data to be archived on QCDgrid with NO tape copies.

• Job submission software allows submission to any grid-enabled system (only requires Globus)

• No. of actual users (~8) is quite low at the moment because production data from QCDOC has not (quite) started to flow.

ZEUS MC Grid

• In 2005, 60% of ZEUS MC is being produced via Grid.

• RAL, UCL and Scotgrid-Glasgow accept ZEUS VO.

• 27% of ZEUS Grid MC comes from UK.

[Figure: ZEUS Grid MC Production, 01/10/04 - 23/1/2005, by country: Germany, Italy, UK, Canada. Source: http://www-zeus.desy.de/~stadie/prodstatus.html]

ZEUS MC Grid

• Grid integrated with previous MC production.

• ZEUS MC production in 2004: 354 million events.

• Now, with Grid, on target for 458 million events in 2005.

• Monte Carlo data from Grid is being used in ZEUS physics analysis.
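The production figures above imply a year-on-year growth that can be worked out directly. The sketch below does so, and also estimates how many of the 2005 events would come from the Grid, assuming the 60% Grid fraction quoted for 2005 holds across the whole year's target. That extrapolation is an assumption made here for illustration, not a figure from the talk.

```python
# Event totals quoted on the slide.
events_2004 = 354e6   # ZEUS MC events produced in 2004
target_2005 = 458e6   # ZEUS MC events targeted for 2005

# Relative growth from 2004 to the 2005 target.
growth = (target_2005 - events_2004) / events_2004

# Assumption: the quoted 60% Grid share applies to the full 2005 target.
grid_fraction_2005 = 0.60
grid_events_2005 = grid_fraction_2005 * target_2005

print(f"Growth 2004 -> 2005 target: {growth:.1%}")
print(f"Estimated Grid-produced events in 2005: {grid_events_2005 / 1e6:.1f}M")
```

On these numbers the 2005 target is just under 30% above 2004 production, and under the stated assumption the Grid would account for roughly 275 million events.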

[Figure: ZEUS UK Grid MC Production by site: Glasgow, RAL(PP), RAL(LCG), UCL(CCC), UCL]

User Feedback

• Degree of engagement between GridPP and experiments questioned by OsC.

• Questionnaire distributed to all experiments asking for views.

• Put simply (and bluntly): results suggest strong barriers to successful take-up of Grid in general and LCG in particular by most experiments.

• Dissatisfaction especially with

– stability,

– support,

– site configuration,

– data management and movement

• More work needed by LCG and GridPP to address these issues; encouraging discussion yesterday of some issues.

Usage

• Statistics collected for grid use:

– Overall

– GridPP supported overall

– GridPP supported in UK

• Some reason for optimism

– Some expts using Grid significantly

• Still large spikes at ~0…

[Figure: three plots of experiment take-up, each showing No. Experiments against percentage use (0 to 100) for the tasks Generation, Simulation, Reconstruction and Analysis:

– "Experiment Take-up (Grid)", x-axis "Percentage Grid Use by Task"

– "Experiment Take-up (LCG/SAMGrid)", x-axis "Percentage LCG/SAMGrid Use by Task"

– "Experiment Take-up (GridPP - UK)", x-axis "Percentage LCG/SAMGrid Use by Task"]

General User Feedback

• Perception that Grid techniques are being forced upon experiments through e.g. switch to Grid-only access to Tier-1.

• Problem of conflict between UK Grid strategy and the priorities of wider international collaborations

– This could potentially harm UK physics return.

• Concern that some experiments are having to integrate complex existing software infrastructure with the Grid with little or no available effort or ear-marked financial support.

– It is clear that Portal project is going to be key.

• Shift in emphasis needed towards more pro-active approach aimed at helping experiments to achieve their ‘real-world’ data processing goals GridPP2