
UK Tony Doyle - University of Glasgow title.open ( ); revolution {execute}; LHC Computing Challenge Methodology? Hierarchical Information in a Global Grid


Page 1: LHC Computing Challenge

Tony Doyle - University of Glasgow

UK title.open ( ); revolution {execute};

Methodology? Hierarchical Information in a Global Grid Supernet

Aspiration? HIGGS

DataGRID-UK

Aspiration? ALL Data Intensive Computation

Teamwork

Page 2: Outline

• Starting Point
• The LHC Computing Challenge
• Data Hierarchy
• DataGRID
• Analysis Architectures
• GRID Data Management
• Industrial Partnership
• Regional Centres
• Today’s World
• Tomorrow’s World
• Summary

Page 3: Starting Point

Page 4: Starting Point

“Current technology would not be able to scale data to such an extent, which is where the teams at Glasgow and Edinburgh Universities come in. The funding awarded will enable the scientists to prototype a Scottish Computing Centre which could develop the computing technology and infrastructure needed to cope with the high levels of data produced in Geneva, allowing the data to be processed, transported, stored and mined. Once scaled down, the data will be distributed for analysis by thousands of scientists around the world. The project will involve participation from Glasgow University's Physics & Astronomy and Computing Science departments, Edinburgh University's Physics & Astronomy department and the Edinburgh Parallel Computing Centre, and is funded by the Scottish Higher Education Funding Council's (SHEFC) Joint Research Equipment Initiative. It is hoped that the computing technology developed during the project will have wider applications in the future, with possible uses in astronomy, computing science and genomics observation, as well as providing generic technology and software for the next generation Internet.”

Page 5: The LHC Computing Challenge

Detector for ALICE experiment

Detector for LHCb experiment

Page 6: A Physics Event

• Gated electronics response from a proton-proton collision
• Raw data: hit addresses, digitally converted charges and times
• Marked by a unique code: proton bunch crossing number, RF bucket; event number
• Collected, Processed, Analyzed, Archived...
• Variety of data objects become associated
• Event “migrates” through analysis chain: may be reprocessed; selected for various analyses; replicated to various locations.
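
The unique code and the event's migration through the analysis chain can be pictured as a small data structure. The sketch below is illustrative only: the class names `EventId` and `EventRecord`, the field names, the stage labels and the example values are all assumptions, not taken from any experiment's software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class EventId:
    """Hypothetical unique code for one event (field names are assumptions)."""
    bunch_crossing: int  # proton bunch crossing number
    rf_bucket: int       # RF bucket
    event_number: int    # event number


@dataclass
class EventRecord:
    """Hypothetical record of where an event has been in the analysis chain."""
    event_id: EventId
    history: List[str] = field(default_factory=list)    # processing stages, in order
    locations: List[str] = field(default_factory=list)  # sites holding a replica

    def advance(self, stage: str) -> None:
        # Append rather than overwrite: an event may be reprocessed many times.
        self.history.append(stage)

    def replicate_to(self, site: str) -> None:
        # Record each site once, preserving the order of first replication.
        if site not in self.locations:
            self.locations.append(site)


record = EventRecord(EventId(bunch_crossing=1201, rf_bucket=7, event_number=42))
for stage in ("collected", "processed", "analyzed", "archived", "processed"):
    record.advance(stage)
record.replicate_to("Glasgow")
record.replicate_to("Glasgow")  # a duplicate request leaves a single entry

# Frozen dataclasses are hashable, so the unique code can key a lookup table.
index = {record.event_id: record}
```

Making the identifier immutable and hashable is one possible design choice; it lets the same code serve as a key wherever the event's associated data objects are stored.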

Page 7: LHC Computing Model

Hierarchical, distributed tiers

GRID ties distributed resources together

[Diagram labels: Tier-0, Tier-1, Tier-2; CERN, RAL, ScotGRID, Universities; Dedicated or QoS Network Links]

Page 8: Data Structure

Coordination required at collaboration and group levels

[Flow diagram labels, real data: Trigger System; Data Acquisition; Level 3 trigger; Run Conditions; Calibration Data; Raw Data; Trigger Tags; Reconstruction; Event Summary Data (ESD); Event Tags]

[Flow diagram labels, simulation: Physics Models; Monte Carlo Truth Data; Detector Simulation; MC Raw Data; Reconstruction; MC Event Summary Data; MC Event Tags]

Page 9: Physics Analysis

Tier 0,1: Collaboration wide

Tier 2: Analysis Groups

Tier 3, 4: Physicists

[Flow diagram labels: Raw Data; Calibration Data; ESD: Data or Monte Carlo; Event Tags; Event Selection; Analysis Object Data (AOD); Analysis, Skims; Physics Objects; Physics Analysis. Side label: INCREASING DATA FLOW]

Page 10: ATLAS Parameters

Running conditions at startup:

• Raw event size ~2 MB (recently revised upwards...)
• 2.7x10^9 event sample: 5.4 PB/year, before data processing
• “Reconstructed” events, Monte Carlo data: ~9 PB/year (2 PB disk)
• CPU: ~2M SpecInt95

CERN alone can handle only 1/3 of these resources

                              2005    2006    2007
Average Luminosity (10^33)     0.1       1      10
Trigger Rate (Hz)              100     270     400
Physics Rate (Hz)              100     155     240
Running (Equiv. Days)           14     100     100
Physics Events (10^9)          0.1     2.7     2.4
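
The raw-data figure follows directly from the event count and event size quoted on this slide. A minimal check, assuming decimal (SI) prefixes so that 1 PB = 10^15 bytes; the constant and function names are chosen here for illustration:

```python
# Figures quoted on the slide; decimal (SI) prefixes are an assumption.
EVENTS_PER_YEAR = 2_700_000_000  # 2.7 x 10^9 events
RAW_EVENT_BYTES = 2_000_000      # ~2 MB per raw event
BYTES_PER_PB = 10**15


def yearly_volume_pb(events: int, bytes_per_event: int) -> float:
    """Return the yearly data volume in petabytes."""
    return events * bytes_per_event / BYTES_PER_PB


raw_pb = yearly_volume_pb(EVENTS_PER_YEAR, RAW_EVENT_BYTES)
print(f"raw data: {raw_pb:.1f} PB/year")  # prints "raw data: 5.4 PB/year"
```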

Page 11: Data Hierarchy

“RAW, ESD, AOD, TAG”

• RAW: Recorded by DAQ. Triggered events. Detector digitisation. ~2 MB/event
• ESD: Reconstructed information. Pseudo-physical information: clusters, track candidates (electrons, muons), etc. ~100 kB/event
• AOD: Selected information. Physical information: transverse momentum, association of particles, jets, (best) id of particles, physical info for relevant “objects”. ~10 kB/event
• TAG: Analysis information. Relevant information for fast event selection. ~1 kB/event
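
To see how strongly each level of the hierarchy shrinks the data, the per-event sizes above can be combined with the 2.7x10^9 event sample quoted under ATLAS Parameters. This is a rough sketch under stated assumptions: decimal kB/MB/TB, and every event present at every level, which the slides do not claim. Under these assumptions the TAG level for the full sample comes to about 2.7 TB.

```python
# Per-event sizes from the slide, in bytes (decimal kB/MB assumed).
EVENT_BYTES = {"RAW": 2_000_000, "ESD": 100_000, "AOD": 10_000, "TAG": 1_000}
EVENTS = 2_700_000_000  # sample size quoted under "ATLAS Parameters"
BYTES_PER_TB = 10**12


def sample_size_tb(level: str) -> float:
    """Size in terabytes of the whole sample stored at one data level."""
    return EVENT_BYTES[level] * EVENTS / BYTES_PER_TB


def reduction_vs_raw(level: str) -> int:
    """How many times smaller one event is at this level than in RAW form."""
    return EVENT_BYTES["RAW"] // EVENT_BYTES[level]


for level in EVENT_BYTES:
    print(f"{level}: {sample_size_tb(level):.1f} TB, 1/{reduction_vs_raw(level)} of RAW")
```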

Page 12: Testbed DataBase

Object Model: Atlas Simulated Raw Events

[Class diagram labels: bPEvent; bPEventObjVector; bPEventObj; bPSiDetector; bPSiDigit; bPMDT_Detector; bPMDT_Digit; bPCaloRegion; bPCaloDigit; bPTruthVertex; bPTruthTrack]

[Database layout: System DB; Raw Data DB1; Raw Data DB2; ...]

Event Container:
• PEvent #1: PEventObjVector, PEventObjVector, ...
• PEvent #2: PEventObjVector, PEventObjVector, ...

Raw Data Container:
• PSiDetector: PSiDigit ...
• PTRT_Detector: PTRTDigit ...
• PMDT_Detector: PMDT_Digit ...
• PCaloRegion: PCaloDigit ...
• PTruthVertex, PTruthTrack ...

Page 13: LHC Computing Challenge

• One bunch crossing per 25 ns
• 100 triggers per second
• Each event is ~1 Mbyte

[Diagram: tiered computing hierarchy]

• Online System; Offline Farm ~20 TIPS
• Tier 0: CERN Computer Centre >20 TIPS
• Tier 1: RAL Regional Centre, US Regional Centre, French Regional Centre, Italian Regional Centre
• Tier 2: Tier2 Centres ~1 TIPS each; ScotGRID++ ~1 TIPS
• Tier 3: Institutes ~0.25 TIPS, with physics data cache
• Tier 4: Workstations
• Link speeds shown: ~PBytes/sec; ~100 MBytes/sec; ~Gbits/sec or Air Freight; ~Gbits/sec; 100 - 1000 Mbits/sec

Physicists work on analysis “channels”

Each institute has ~10 physicists working on one or more channels

Data for these channels should be cached by the institute server

1 TIPS = 25,000 SpecInt95

PC (1999) = ~15 SpecInt95
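
Two of the figures on this slide can be cross-checked with simple arithmetic: the ~100 MBytes/sec stream implied by the trigger rate and event size, and the number of 1999-era PCs that would match a given TIPS rating. The function names below are illustrative, and treating a PC as exactly 15 SpecInt95 is an assumption (the slide gives it as approximate).

```python
import math

TRIGGERS_PER_SEC = 100         # from the slide
EVENT_MBYTES = 1               # ~1 Mbyte per event, from the slide
SPECINT95_PER_TIPS = 25_000    # 1 TIPS = 25,000 SpecInt95
SPECINT95_PER_PC_1999 = 15     # PC (1999) = ~15 SpecInt95, taken as exact here


def stream_rate_mbytes_per_sec(trigger_hz: int, event_mbytes: int) -> int:
    """Data rate leaving the trigger, in MBytes/sec."""
    return trigger_hz * event_mbytes


def pcs_for(tips: float) -> int:
    """Number of 1999-era PCs needed to reach the given TIPS rating."""
    return math.ceil(tips * SPECINT95_PER_TIPS / SPECINT95_PER_PC_1999)


rate = stream_rate_mbytes_per_sec(TRIGGERS_PER_SEC, EVENT_MBYTES)
tier2_pcs = pcs_for(1)         # a ~1 TIPS Tier 2 centre
institute_pcs = pcs_for(0.25)  # a ~0.25 TIPS institute
print(rate, tier2_pcs, institute_pcs)  # prints "100 1667 417"
```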

Page 14: Database Access Benchmark

Many applications require database functionality

e.g. MySQL database daemon

Currently favoured HEP DataBase application, e.g. BaBar, ZEUS software

Basic 'crash-me' and associated tests

Access times for basic insert, modify, delete, update database operations, e.g. (on 256 Mbyte, 800 MHz Red Hat 6.2 Linux box):

• 350k data insert operations: 149 seconds
• 10k query operations: 97 seconds
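
A timing harness of the kind described above can be sketched in a few lines. Note the substitution: this example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL daemon, so its timings are not comparable with the slide's. The table name `hits`, its schema and the operation counts are hypothetical.

```python
import sqlite3
import time


def time_operations(n_inserts: int, n_queries: int):
    """Time bulk inserts and keyed queries against an in-memory SQLite table.

    SQLite is a stand-in (an assumption); the slide's figures came from MySQL.
    Returns (insert_seconds, query_seconds, rows_in_table).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, charge REAL)")

    start = time.perf_counter()
    with conn:  # commit all inserts as a single transaction
        conn.executemany(
            "INSERT INTO hits (id, charge) VALUES (?, ?)",
            ((i, i * 0.5) for i in range(n_inserts)),
        )
    insert_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n_queries):
        # Look up one row by primary key per query.
        conn.execute(
            "SELECT charge FROM hits WHERE id = ?", (i % n_inserts,)
        ).fetchone()
    query_seconds = time.perf_counter() - start

    rows = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
    conn.close()
    return insert_seconds, query_seconds, rows


def rate(operations: int, seconds: float) -> float:
    """Operations per second."""
    return operations / seconds


# Rates implied by the figures on the slide.
slide_insert_rate = rate(350_000, 149)  # roughly 2,350 inserts per second
slide_query_rate = rate(10_000, 97)     # roughly 103 queries per second

insert_s, query_s, rows = time_operations(n_inserts=10_000, n_queries=1_000)
```

Reporting operations per second rather than total seconds makes runs with different operation counts easier to compare.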

Page 15: CPU Intensive Applications

Numerically intensive simulations: minimal input and output data

ATLAS Monte Carlo (gg → H → bb): 228 sec/3.5 Mb event on 800 MHz Linux box

Standalone physics applications:

1. Simulation of neutron/photon/electron interactions for 3D detector design
2. NLO QCD physics simulation

Compiler Tests:

Compiler         Speed (MFlops)
Fortran (g77)    27
C (gcc)          43
Java (jdk)       41

Page 16: Network Monitoring Prototype

Tools: Java Analysis Studio over TCP/IP

[Screenshot labels: Instantaneous CPU Usage; Scalable Architecture; Individual Node Info.]

Page 17: Analysis Architecture

[Architecture diagram labels: Application Manager; Algorithm (x3); Converter (x3); Event Data Service, Persistency Service, Data Files, Transient Event Store; Detec. Data Service, Persistency Service, Data Files, Transient Detector Store; Histogram Service, Persistency Service, Data Files, Transient Histogram Store; Message Service; JobOptions Service; Particle Prop. Service; Other Services]

The Gaudi Framework - developed by LHCb - adopted by ATLAS (Athena)
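
The separation shown in the diagram (algorithms see only a transient store, while converters translate between persistent files and transient objects) can be illustrated with a toy model. This is not the Gaudi API, which is a C++ framework; every class, method and path name below is a hypothetical stand-in chosen to mirror the diagram labels.

```python
from typing import Any, Callable, Dict, List


class TransientStore:
    """Toy in-memory store that loads objects on demand through converters."""

    def __init__(self, persistent: Dict[str, Any],
                 converters: Dict[str, Callable[[Any], Any]]) -> None:
        self._persistent = persistent  # stands in for the data files
        self._converters = converters  # path -> persistent-to-transient function
        self._cache: Dict[str, Any] = {}

    def retrieve(self, path: str) -> Any:
        # Convert from the persistent form only on first access.
        if path not in self._cache:
            raw = self._persistent[path]
            convert = self._converters.get(path, lambda obj: obj)
            self._cache[path] = convert(raw)
        return self._cache[path]

    def register(self, path: str, obj: Any) -> None:
        # Algorithms publish results here for later algorithms to read.
        self._cache[path] = obj


class Algorithm:
    """Base class: an algorithm only ever talks to the transient store."""

    def execute(self, store: TransientStore) -> None:
        raise NotImplementedError


class CountHits(Algorithm):
    def execute(self, store: TransientStore) -> None:
        hits = store.retrieve("/Event/Raw/Hits")
        store.register("/Event/Rec/NHits", len(hits))


class ApplicationManager:
    """Runs the configured algorithms, in order, on each event."""

    def __init__(self, algorithms: List[Algorithm],
                 converters: Dict[str, Callable[[Any], Any]]) -> None:
        self.algorithms = algorithms
        self.converters = converters

    def process_event(self, persistent_event: Dict[str, Any]) -> TransientStore:
        store = TransientStore(persistent_event, self.converters)
        for algorithm in self.algorithms:
            algorithm.execute(store)
        return store


# The "file format" here is a comma-separated string, purely for illustration.
converters = {"/Event/Raw/Hits": lambda text: [int(x) for x in text.split(",")]}
manager = ApplicationManager([CountHits()], converters)
store = manager.process_event({"/Event/Raw/Hits": "3,1,4,1,5"})
n_hits = store.retrieve("/Event/Rec/NHits")
```

Because `CountHits` never touches the persistent representation, the storage format could be swapped by changing only the converter, which is the point of the layering in the diagram.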

Page 18: GRID Services

Grid Services:
• Resource Discovery
• Scheduling
• Security
• Monitoring
• Data Access
• Policy

Athena/Gaudi Services:
• Application manager
• “Job Options” service
• Event persistency service
• Detector persistency
• Histogram service
• User interfaces
• Visualization

Database:
• Event model
• Object federations

Extensible interfaces and protocols being specified and developed:

Tools: 1. UML 2. Java

Protocols: 1. XML 2. MySQL 3. LDAP

[Brace label: DataGRID Toolkit]

Page 19: Virtual Data Scenario

Example analysis scenario:

• Physicist issues a query from Athena for a Monte Carlo dataset. Issues: How expressive is this query? What is the nature of the query (declarative)? Creating new queries and language
• Algorithms are already available in local shared libraries
• An Athena service consults an ATLAS Virtual Data Catalog

Consider possibilities:

• TAG file exists on local machine (e.g. Glasgow): analyze it
• ESD file exists in a remote store (e.g. Edinburgh): access relevant event files, then analyze that
• RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyze !!!

GRID Data Management
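
The three possibilities above amount to a decision procedure over the catalog's answer. The sketch below is one hypothetical way to express it; the function name `plan_analysis`, the dictionary-based catalog and the step strings are assumptions, and the branch where RAW data still exists somewhere is an extension not shown on the slide.

```python
from typing import Dict, List


def plan_analysis(catalog: Dict[str, str], local_site: str) -> List[str]:
    """Return the steps needed to answer a query.

    `catalog` maps a data level ("TAG", "ESD", "RAW") to the site holding it;
    a missing key means the file no longer exists anywhere.
    """
    # Possibility 1: the TAG file is already on the local machine.
    if catalog.get("TAG") == local_site:
        return ["analyze TAG"]

    # Possibility 2: the ESD file exists, possibly in a remote store.
    if "ESD" in catalog:
        site = catalog["ESD"]
        fetch = [] if site == local_site else [f"fetch ESD from {site}"]
        return fetch + ["analyze ESD"]

    # Not on the slide (assumption): RAW survives, so generation is skipped.
    if "RAW" in catalog:
        return [f"fetch RAW from {catalog['RAW']}", "re-reconstruct", "re-analyze"]

    # Possibility 3: even the RAW file is gone, so start from scratch.
    return ["regenerate", "re-reconstruct", "re-analyze"]


local_plan = plan_analysis({"TAG": "Glasgow"}, local_site="Glasgow")
remote_plan = plan_analysis({"ESD": "Edinburgh"}, local_site="Glasgow")
rebuild_plan = plan_analysis({}, local_site="Glasgow")
```

For simplicity a TAG file held only at a remote site is ignored here; a fuller version would weigh fetching it against using the ESD.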

Page 20: Globus

Page 21: Globus

DataGRID ToolKit

Page 22: GRID Data Management

Goal: develop middle-ware infrastructure to manage petabyte-scale data

Identify Key Areas Within Software Structure

Service levels reasonably well defined

[Architecture diagram labels: High Level Services; Medium Level Services; Core Services; Secure Region; Replica Manager; Data Mover; Data Accessor; Storage Manager; Data Locator; Meta Data Manager; Query Optimisation & Access Pattern Manag.; Castor; HPSS; Local Filesystem]

Page 23: Identifying Key Areas

5 areas for development:

• Data Accessor - hides specific storage system requirements. Mass Storage Management group.
• Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
• Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
• Security - ensuring consistent levels of security for data and meta data.
• Query optimisation - “cost” minimisation based on response time and throughput. Monitoring Services group.

Identifiable UK Contributions: RAL (marked twice on the slide)
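
As an illustration of “cost” minimisation based on response time and throughput, the sketch below picks the replica with the lowest estimated transfer time. The cost model (response time plus size divided by throughput), the `Replica` class and the example sites and figures are all assumptions made for this example, not the DataGrid design.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    site: str
    response_time_s: float      # delay before the first byte arrives
    throughput_mb_per_s: float  # sustained transfer rate


def transfer_cost_s(replica: Replica, size_mb: float) -> float:
    """Estimated seconds to obtain `size_mb` from this replica."""
    return replica.response_time_s + size_mb / replica.throughput_mb_per_s


def cheapest_replica(replicas: Iterable[Replica], size_mb: float) -> Replica:
    """Pick the replica minimising the estimated cost (raises ValueError if none)."""
    return min(replicas, key=lambda r: transfer_cost_s(r, size_mb))


# Hypothetical replicas: a slow-to-start but fast tape store, and a
# quick-to-respond disk behind a slower wide-area link.
replicas = [
    Replica("tape-store", response_time_s=60.0, throughput_mb_per_s=50.0),
    Replica("remote-disk", response_time_s=0.5, throughput_mb_per_s=5.0),
]

small_choice = cheapest_replica(replicas, size_mb=10)      # response time dominates
large_choice = cheapest_replica(replicas, size_mb=10_000)  # throughput dominates
```

With these made-up figures the best source changes with request size, which is why monitoring information such as access patterns matters to the optimiser.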

Page 24: AstroGrid

WP1 PROJECT MANAGEMENT

WP2 REQUIREMENTS ANALYSIS : existing functionality and future requirements; community consultation

WP3 SYSTEM ARCHITECTURES: benchmark and implement

WP4 GRID-ENABLE CURRENT PACKAGES : implement and test performance

WP5 DATABASE SYSTEMS : requirements analysis and implementation; scalable federation tools.

WP6 DATA MINING ALGORITHMS : requirements analysis, development and implementation

WP7 BROWSER APPLICATIONS : requirements analysis and software development

WP8 VISUALISATION : concepts and requirements analysis, software development.

WP9 INFORMATION DISCOVERY : concepts and requirements analysis, software development

WP10 FEDERATION OF KEY CURRENT DATASETS : e.g. SuperCOSMOS, INT-WFS, 2MASS, FIRST, 2dF

WP11 FEDERATION OF NEXT GENERATION OPTICAL-IR DATASETS : esp. Sloan, WFCAM

WP12 FEDERATION of HIGH ENERGY ASTROPHYSICS DATASETS : esp. Chandra, XMM

WP13 FEDERATION of SPACE PLASMA and SOLAR DATASETS : esp. SOHO, Cluster, IMAGE

WP14 COLLABORATIVE DEVELOPMENT OF VISTA, VST, and TERAPIX PIPELINES

WP15 COLLABORATION PROGRAMME WITH INTERNATIONAL PARTNERS

WP16 COLLABORATION PROGRAMME WITH OTHER DISCIPLINES

Emphasis on High Level GUIs etc

WP 1 Grid Workload Management A.Martin-QMW (0.5)

WP 2 Grid Data Management A.Doyle-Glasgow (1.5)

WP 3 Grid Monitoring services R.Middleton-RAL (1.8)

WP 4 Fabric Management A.Sansum-RAL (0.5)

WP 5 Mass Storage Management J.Gordon-RAL (1.5)

WP 6 Integration Testbed D.Newbold-Bristol (3.0)

WP 7 Network Services P.Clarke-PPNCG/UCL (2.0)

WP 8 HEP Applications N/A (?) (4.0)

WP 9 EO Science Applications ( c/o R.Middleton-RAL ) (0.0)

WP 10 Biology Applications ( c/o P.Jeffreys-RAL ) (0.1)

WP 11 Dissemination P.Jeffreys-RAL (0.1)

WP 12 Project Management R.Middleton-RAL (0.5)

Replication, Fragmentation

Emphasis on Low Level Services etc

Page 25: Testbed = Learning by Example

+ Cloning

SRIF Expansion

= expansion of open source ideas

“GRID Culture”

Page 26:

• mission: to accelerate the exploitation of simulation by industry, commerce and academia
• 45 staff, £2.5M turnover - externally funded
• solve business problems - not sell technology

Partnership Important

Page 27: Industrial Partnership

[Diagram labels: ping service; ping monitor; WAN; LAN]

Adoption of OPEN Industry Standards + OO Methods

Industry Research Council Inspiration: Data-Intensive Computation

Page 28: Regional Centres

SRIF Infrastructure

Grid Data Management

Security, Monitoring

Networking

Local Perspective: Consolidate Research Computing

Optimisation of Number of Nodes? 4-5?

Relative size dependent on funding dynamics

Global Perspective: V. Basic Grid Skeleton

Regional Expertise Model?

Page 29: Today’s World

Istituto Trentino Di Cultura

Helsinki Institute of Physics

Science Research Council

SARA

Page 30: Tomorrow’s World

[Diagram labels: Istituto Trentino Di Cultura; Helsinki Institute of Physics; Science Research Council; SARA; nodes CO, CR2 to CR6, AC7 to AC21]

Page 31: Summary

• General Engagement (£=OK)
• Mutual Interest (ScotGRID Example)
• Emphasis on DataGrid Core Development (e.g. Grid Data Management)
• “CERN” lead + Unique UK Identity
• Extension of Open Source Idea “Grid Culture” = Academia + Industry
• Multidisciplinary Approach = University + Regional Basis
• Use of Existing Structures (e.g. EPCC, RAL)
• Hardware Infrastructure via SRIF + Industrial Sponsorship
• Now LHC

[Figure labels: Grid Data Management; Security, Monitoring; Networking; Detector for ALICE experiment; Detector for LHCb experiment; ScotGRID]