
Page 1

Advanced Grid Technologies in ATLAS Data Management

Alexandre Vaniachine
Argonne National Laboratory

Invited talk at NEC'2003, XIX International Symposium on Nuclear Electronics & Computing
Varna, Bulgaria, 15-20 September 2003

Page 2

ATLAS Software Overview
• ATLAS computing challenge
• Core software domains
• Data management architecture
• Grid technologies deployed
• DC1 production experience

Page 3

26th March 2003 LHCC ATLAS status report

ATLAS commissioning

Phase A: System at ROD level. Systems for LVL1, DCS and DAQ. Check cable connections. Infrastructure. Some system tests.
Phase B: Calibration runs on local systems.
Phase C: Systems/Trigger/DAQ combined.
Phase D: Global commissioning. Cosmic ray runs. Initial off-line software. Initial physics runs.
Timeline: 8/03, 12/04, 03/06, 10/06

The discussions and the planning for the commissioning phases of the experiment have started in the Collaboration at many levels

ATLAS Computing Challenge
• Our event size: 1-1.5 MB
• After on-line selection, events will be written to permanent storage at a rate of 100-200 Hz
• Raw data: 1 PB/year
• With reconstructed and simulated data the total is ~10 PB/year

ATLAS depends on computing as much as it depends on the trigger or the hadron calorimeter

These data start coming at the full rate at the end of 2006
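As a rough back-of-envelope check of these numbers (my own sketch, not from the talk; it assumes a nominal 10^7 live seconds of data-taking per year):

```python
# Back-of-envelope check of the raw-data volume quoted above, using the lower
# end of the quoted ranges. Assumption (not from the slides): ~1e7 live s/year.
event_size_mb = 1.0          # 1-1.5 MB per event after on-line selection
rate_hz = 100.0              # 100-200 Hz to permanent storage
seconds_per_year = 1.0e7     # assumed LHC live time per year

raw_pb_per_year = event_size_mb * rate_hz * seconds_per_year / 1.0e9  # MB -> PB
print(f"raw data: ~{raw_pb_per_year:.1f} PB/year")   # ~1 PB/year, as on the slide
```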

Page 4

+ The problem of the larger and more distributed collaboration: >2000 collaborators, 151 institutions, 34 countries

+ The decision that CERN will supply only a fraction of the computing, with the rest supplied by collaborators

The RESULT of the unprecedented data sizes and the distributed nature of physicists and computing is the need for multiple advances in computing tools

Planetary Computing Model

Computing infrastructure, which was centralized in the past, will now be distributed

(For experiments the trend is the reverse)

Page 5

Software Framework: Athena

Athena features:
• Common code base with Gaudi framework (LHCb)
• Separation of data and algorithms
• Memory management
• Transient/persistent data split

The backbone of ATLAS Computing Model data flow
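To illustrate the separation of data and algorithms, here is a minimal conceptual sketch in Python (the real Athena algorithms are C++; the class and method names below are illustrative assumptions, not the Athena API):

```python
# Conceptual sketch only: algorithms never own event data; they read from and
# write to a transient store, and a separate persistency service makes it durable.
class TransientStore:
    """Holds the event data objects for the event currently being processed."""
    def __init__(self):
        self._objects = {}
    def record(self, key, obj):
        self._objects[key] = obj
    def retrieve(self, key):
        return self._objects[key]

class GeneratorAlg:
    def execute(self, store):
        store.record("McEvent", {"particles": ["p", "p"]})   # toy payload

class ReconstructionAlg:
    def execute(self, store):
        mc = store.retrieve("McEvent")
        store.record("ReconEvent", {"tracks": len(mc["particles"])})

# The framework drives the event loop; algorithms stay independent of each other.
store = TransientStore()
for alg in (GeneratorAlg(), ReconstructionAlg()):
    alg.execute(store)
print(store.retrieve("ReconEvent"))
```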

Page 6

[Diagram: the three core domains: Software Framework, Grid Computing, Data Management]

My presentation will focus on advances in computing technologies integrating Grid Computing and Data Management, two core software domains providing the foundation for the ATLAS Software Framework

Separation of transient and persistent data in ATLAS software architecture determines three core computing domains

Core Computing Domains:
• Scalable solutions for data persistency
• Software framework for data processing algorithms
• Grid computing for data processing and analysis

Page 7

Interfacing Athena to the Grid

[Diagram: the Athena/GAUDI application exchanges virtual data and algorithms with GRID services and receives back histograms and monitoring results; the interface layer handles job configuration, monitoring and scheduling, and resource estimation and booking]

GANGA: Gaudi/Athena aNd Grid Alliance
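As an illustration of what such an interface layer could expose to a physicist, here is a minimal Python sketch in the spirit of GANGA (purely conceptual; the class names, job-options file name and methods are illustrative assumptions, not the actual GANGA interface of the time):

```python
# Illustrative sketch only: a job object bundles the application configuration
# with a Grid back-end, so the same Athena job can run locally or on the Grid.
class AthenaApp:
    def __init__(self, job_options):
        self.job_options = job_options       # Athena job options to run

class GridBackend:
    def submit(self, app):
        print(f"submitting {app.job_options} to a Grid computing element")

class Job:
    def __init__(self, application, backend):
        self.application = application
        self.backend = backend
    def submit(self):
        self.backend.submit(self.application)

Job(AthenaApp("recon_jobOptions.txt"), GridBackend()).submit()
```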

Page 8

ATLAS Database Architecture
Described in the ATLAS Database Architecture document

[Diagram: database data flow between Site 1, Site 2 and Site 3 via combinations of the basic operations: Transport & Install; Extract & Transform; Just Extract; Transport, Transform & Install]

Ready for Grid Integration

Independent of persistency technology

Page 9

Technology Independence

Ensuring that the ‘application’ software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the “transient/persistent” split)

Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of “converter”, but of nothing else

The ‘ease’ of the baseline change demonstrates the benefits of decoupling transient/persistent representations

Integrated operation of the framework & data management domains demonstrated the capability of:
• reading the same data from different frameworks
• switching between persistency technologies: Objectivity/DB & ROOT I/O persistency in ATLAS DC0, an ATLAS-specific temporary solution (AthenaROOT) in DC1
An important milestone towards DC2 has been achieved recently:
• the LHC-wide hybrid ROOT-based persistency technology POOL for DC2 delivered in the latest ATLAS software release 7.0.0 (AthenaPOOL)
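To make the converter idea concrete, here is a minimal conceptual sketch in Python (the real converters are C++ classes inside Athena; the names and file formats below are illustrative stand-ins, not the actual persistency technologies):

```python
# Conceptual sketch: the transient object knows nothing about storage; each
# persistency technology supplies its own converter, so swapping technologies
# means registering a different converter and changing nothing else.
import json, pickle

class TrackCollection:                     # transient representation
    def __init__(self, tracks):
        self.tracks = tracks

class PickleConverter:                     # stand-in for one persistency backend
    def write(self, obj, path):
        with open(path, "wb") as f:
            pickle.dump(obj.tracks, f)

class JsonConverter:                       # stand-in for an alternative backend
    def write(self, obj, path):
        with open(path, "w") as f:
            json.dump(obj.tracks, f)

def persist(obj, path, converter):         # the framework calls the converter;
    converter.write(obj, path)             # algorithms never see this choice

tracks = TrackCollection([1.2, 3.4])
persist(tracks, "tracks.pkl", PickleConverter())    # one backend
persist(tracks, "tracks.json", JsonConverter())     # another backend swapped in
```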

Page 10

LHC Common Persistence Infrastructure (POOL)

During the past year a new effort emerged – the LHC-wide Computing Grid Project (LCG)

The LCG's Requirements Technical Assessment Group (RTAG) on persistence recommended a common infrastructure:
• an object streaming layer based upon ROOT
• a relational database layer for file management and higher-level services
Based on RTAG recommendations a common development project was launched: POOL
ATLAS is committed to this effort and adopted POOL technology
To be clear: the common project infrastructure that POOL will provide is our baseline event store technology

Page 11

ATLAS Data Challenges
In a recent world-wide collaborative effort - Data Challenge 1 (DC1) - spanning over 56 prototype tier centers in 21 countries on four continents, ATLAS produced more than 60 TB of data for physics studies

DC1 provided a testbed for integration and testing of advanced Grid computing components in a production environment

Page 12

DC1 Production on the Grid

A significant fraction of DC1 data produced: NorduGrid, US ATLAS Grid Testbed
DC1 jobs successfully tested: EDG, Grid3 (US ATLAS, US CMS, LIGO, SDSS sites)

[Diagram: US ATLAS Grid Testbed sites: BNL, Boston U, Michigan, UTA, OU, Indiana, LBL, UNM, HU, Argonne, SMU (outreach site), Chicago; roles shown include Tier1 and prototype Tier2 centers, testbed sites, Condor-G submit & VDC hosts, Chimera execution sites, Chimera storage host & MAGDA cache, RLS servers, MAGDA server, AtlasChimera Pacman caches, and ATLAS releases from NorduGrid and CERN Pacman caches]

Page 13

Innovative Technologies
Several novel Grid technologies were used in ATLAS data production and data management for the first time. My presentation will describe new Grid technologies introduced in the HEP production environment:
• Chimera Virtual Data System automating data derivation
• Virtual Data Cookbook services managing templated production recipes
• efficient Grid certificate authorization technologies for virtual data access control
• virtual database services delivery for reconstruction on Grid clusters behind closed firewalls

Page 14

Centralized Management
For efficiency of the large production tasks distributed worldwide, it is essential to establish shared production management tools

Metadata Catalog: LFN -> attribute/value pairs
Replica Catalog: LFN -> PFNs[ ]
Virtual Data Catalog: derived LFNs[ ], required LFNs[ ], transformation

The ATLAS Metadata Catalogue AMI and the Replica Catalogue MAGDA exemplify such Grid tools deployed in DC1

To complete the data management architecture for distributed production ATLAS prototyped Virtual Data services
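A minimal sketch of the three catalog record types named above (Python dataclasses; the field names and example values are illustrative assumptions, not the actual AMI, MAGDA or VDC schemas):

```python
# Illustrative record layouts for the three catalogs shown above.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MetadataEntry:              # metadata catalog (AMI-like): LFN -> attributes
    lfn: str
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class ReplicaEntry:               # replica catalog (MAGDA-like): LFN -> physical copies
    lfn: str
    pfns: List[str] = field(default_factory=list)

@dataclass
class Derivation:                 # virtual data catalog: how derived LFNs are produced
    transformation: str           # name of the recipe/transformation
    required_lfns: List[str]      # inputs
    derived_lfns: List[str]       # outputs

# Example: one simulated partition registered in all three catalogs (made-up names).
meta = MetadataEntry("dc1.simul.0001.zebra", {"dataset": "2001", "events": "500"})
replica = ReplicaEntry(meta.lfn, ["gsiftp://storage.example.org/dc1/0001.zebra"])
deriv = Derivation("atlsim", required_lfns=["dc1.evgen.0001.root"],
                   derived_lfns=[meta.lfn])
```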

Page 15

MAGDA Architecture

Replica Catalogue MAGDA: MAnager for Grid-based DAta

Page 16

AMI Architecture

Metadata Catalogue AMI: ATLAS Metadata Interface

Page 17

Introducing Virtual Data
The prevailing views in HEP computing have been data-centric: we need to produce the data (ASAP), with the production recipes being just some tools used in the process by the “production gurus”. The value of the production recipes has not been fully appreciated.
Preparation of recipes for data production requires significant effort and encapsulates considerable expert knowledge
Because the production recipes have to be fully validated, their development is an iterative, time-consuming process similar to fundamental knowledge discovery
The GriPhyN project (www.griphyn.org) introduced a different perspective: recipes are as valuable as the data
If you have the recipes you may not even need the data: you can reproduce the data ‘on-demand’

Page 18

VDC Architecture

Page 19

Virtual Data in DC1 Production
To deliver a scalable data management solution ATLAS implemented innovative Computing Science concepts in practice: the first use of Virtual Data technologies in DC1 production
Two concepts are implemented in ATLAS Virtual Data System operation:
Production workflow became computerized
• Acyclic data dependencies tracking using GriPhyN and iVDGL software
• Providing Data Provenance Services
• first use of the Chimera Virtual Data system in production
Production recipes became templatized
• Templated recipes repository: Cookbook (a sketch follows below)
• Providing Data Providence* Services
• about a half of the more than two hundred DC1 datasets were serviced

* prov·i·dence n. 1. Care or preparation in advance; foresight. (The American Heritage Dictionary of the English Language)
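A minimal sketch of what a templated production recipe in the spirit of the Cookbook could look like (Python string templating; the parameter names and the command line are illustrative assumptions, not the actual DC1 recipes):

```python
# Illustrative only: a validated recipe is stored once as a template, and
# concrete production jobs are instantiated by filling in dataset parameters.
from string import Template

recipe = Template(
    "run_reconstruction --dataset $dataset "
    "--first-partition $first --num-events $nevents"
)

# Instantiate the recipe for one partition (values are made up for the example).
job_command = recipe.substitute(dataset="002001", first=1, nevents=500)
print(job_command)
```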

Page 20

Acyclic Portion of DC1 Workflow
Chimera Virtual Data system eliminates ‘manual’ tracking of the data dependencies between independent production steps & enables multi-step compound data transformations on-demand

[Diagram: acyclic portion of the DC1 workflow. Transformations: Athena Generators, atlsim, atlsim pileup, Athena conversion, Athena recon, Atlfast recon, Athena Atlfast, Athena QA. Data products: HepMC.root, geometry.zebra, digis.zebra, digis.root, geometry.root, recon.root, Atlfast.root, filtering.ntuple, QA.ntuple]

Feedback loop introduced in ATLAS by physics validation is omitted
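To illustrate the bookkeeping that Chimera automates, here is a minimal Python sketch over a simplified linear fragment of the workflow above (my own sketch, not Chimera's virtual data language; file and transformation names follow the diagram but the dependencies are simplified):

```python
# Each derived file is produced by a transformation from required inputs; given
# the dependency graph, the execution order falls out of a topological sort,
# which is exactly the kind of tracking the Virtual Data system removes from the operator.
from graphlib import TopologicalSorter   # Python 3.9+

# derived file -> (transformation, required input files); simplified DC1 fragment
derivations = {
    "HepMC.root":  ("Athena Generators", []),
    "digis.zebra": ("atlsim",            ["HepMC.root"]),
    "digis.root":  ("Athena conversion", ["digis.zebra"]),
    "recon.root":  ("Athena recon",      ["digis.root"]),
    "QA.ntuple":   ("Athena QA",         ["recon.root"]),
}

graph = {out: set(reqs) for out, (_, reqs) in derivations.items()}
for output in TopologicalSorter(graph).static_order():
    if output in derivations:            # inputs without a recipe are pre-existing data
        print(f"{derivations[output][0]:20s} -> {output}")
```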

Page 21

Chimera in DC1 Reconstruction
• Installed ATLAS releases 6.0.2+ (Pacman cache) on select US ATLAS testbed sites
• 2x520 partitions of DataSet 2001 (lumi10) have been reconstructed at the JAZZ cluster (Argonne), LBNL, IU and BU, BNL (test)
• 2x520 Chimera derivations, ~200,000 events reconstructed
• Submit hosts: LBNL; others: Argonne, UC, IU
• RLS servers at the University of Chicago and BNL
• Storage host and Magda cache at BNL
• Group-level Magda registration of output
• Output transferred to BNL and CERN/Castor

Page 22

Uncharted OGSA Area

Interest in the X509 authorization capabilities of MySQL was prompted by a Doug Olson announcement to the PPDG mailing list

Numerous e-mail exchanges and discussions with interested PPDG participants on grid-enabling MySQL

Grid service example by Kate Keahey, SC02 OGSA Tutorial

Grid Service Example: Database Service
A DBaccess Grid service will support at least two portTypes:
• GridService
• Database_PortType
Each has service data:
• GridService: basic introspection information, lifetime, …
• DB info: database type, query languages supported, current load, …
[Diagram: the service exposes the GridService and DB portTypes, with service data elements ‘Name, lifetime, etc.’ and ‘DB info’]

Database services on the grid are an uncharted OGSA area
At CHEP’03 MySQL emerged as the most popular database
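A rough sketch of the two-portType idea in plain Python (purely conceptual: OGSA services of the time were defined with WSDL portTypes, and the class and field names below are illustrative assumptions):

```python
# Conceptual sketch: one service instance exposes generic Grid-service
# introspection data alongside database-specific service data.
class GridServicePort:
    def service_data(self):
        return {"name": "DBaccess", "lifetime_s": 3600}   # basic introspection info

class DatabasePort:
    def service_data(self):
        return {"db_type": "MySQL", "query_languages": ["SQL"], "current_load": 0.2}
    def query(self, sql):
        raise NotImplementedError("would forward the query to the back-end database")

class DBAccessService:
    """A 'DBaccess' Grid service supporting the two portTypes described above."""
    def __init__(self):
        self.grid_port = GridServicePort()
        self.db_port = DatabasePort()

svc = DBAccessService()
print(svc.grid_port.service_data(), svc.db_port.service_data())
```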

Page 23

Database Access on the Grid

Different security models
A separate server does the grid authorization:
• Spitfire (EDG WP2): SOAP/XML text-only data transport
• DAI (IBM UK): Spitfire technologies + XML binary extensions
• Perl DBI database proxy (ALICE): SQL data transport
• Oracle 10g (separate authorization layer)
Authorization is integrated in the database server:
• on a higher level: GSS API (work by Richard Casella, BNL)
• on a lower level: certificate verification (my current work)

Page 24

Grid-enabling MySQL
• Tested MySQL X509 certificate authorization technology
  - validated with DOE, CERN and NorduGrid certificates
  - potential problem with host certificates issued at CERN
• Developed solutions for MySQL security problems
  - adopted in MySQL 4.0.13
• Increased MySQL AB awareness of the grid computing needs
• Set up grid-enabled server prototype for ATLAS
  - used in ATLAS Data Challenge 1 production for Chimera-based reconstruction
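For illustration, roughly what X509-based MySQL authorization looks like from the client and server sides (a hedged sketch: the SSL connection options and the GRANT ... REQUIRE syntax are standard MySQL features, but the host name, paths, account name and certificate subjects are made-up examples, not the DC1 configuration):

```python
# Client side: connect to a grid-enabled MySQL server, presenting a grid
# certificate/key pair over SSL (mysql-connector-python style options).
import mysql.connector

conn = mysql.connector.connect(
    host="vdc.example.org",            # hypothetical grid-enabled MySQL server
    user="chimera",
    database="vdc",
    ssl_ca="/etc/grid-security/certificates/ca-bundle.pem",
    ssl_cert="/tmp/x509up_u1000",      # user grid proxy/certificate (example path)
    ssl_key="/tmp/x509up_u1000",
)

# Server side (SQL, shown as a comment): restrict the account to connections
# presenting a certificate with a given subject signed by a given CA, e.g.
#   GRANT SELECT, INSERT ON vdc.* TO 'chimera'@'%'
#     REQUIRE SUBJECT '/DC=org/DC=doegrids/OU=People/CN=Example User'
#         AND ISSUER  '/DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=Example CA';
conn.close()
```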

Page 25

Production Experience

Collected production experience with the grid security model:
• need to expand backward compatibility of grid proxy tools
• need to add the server purpose to grid host certificates
• need to initiate the grid proxy upon login (similar to AFS token)
• need for shared grid certificates
  - similar to privileged accounts traditionally shared in HENP computing for production, librarian, data management and database administration tasks

More information was presented at:
• PPDG (All-hands meeting)
• Grid3 (production experience reported)

Page 26

Coherent Approach

[Diagram: Main Server and Replica Servers connected by ‘Extract & Transport’ and ‘Transport & Install’ steps]

Extract-Transport-Install MySQL simplified the delivery of the extract-transport-install components of ATLAS database architecture to provide database services needed for the DC1 reconstruction on sites with Grid Compute Elements behind closed firewalls (e.g., NorduGrid)
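A minimal sketch of what one extract-transport-install step could look like in practice (Python wrapping the standard mysqldump/mysql command-line tools; the host names, database name and paths are illustrative assumptions, not the DC1 setup):

```python
# Illustrative only: dump the needed database from the main server, ship the
# dump file to the site, and load it into a local replica MySQL server inside
# the firewall, so reconstruction jobs query the local copy.
import subprocess

def extract(main_host, database, dump_file):
    with open(dump_file, "w") as out:
        subprocess.run(["mysqldump", "-h", main_host, database], stdout=out, check=True)

def transport(dump_file, site_host):
    subprocess.run(["scp", dump_file, f"{site_host}:{dump_file}"], check=True)

def install(site_host, database, dump_file):
    subprocess.run(["ssh", site_host, f"mysql {database} < {dump_file}"], check=True)

# Example run for one remote site (all names are placeholders):
# extract("db.main.example", "dc1_geometry", "/tmp/dc1_geometry.sql")
# transport("/tmp/dc1_geometry.sql", "ce.site.example")
# install("ce.site.example", "dc1_geometry", "/tmp/dc1_geometry.sql")
```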

Page 27

Roadmap to Success

ATLAS computing is steadily progressing towards a highly functional software suite, plus a world-wide computing model

During the past year, Data Challenges have provided both an impetus and a testbed for bringing coherence to developments in all core software domains

Several advanced Grid Computing technologies were successfully tested and deployed in ATLAS Data Challenge 1 production environment