

LHCb Distributed Computing and the Grid

V. Vagnoni (INFN Bologna)
BEAUTY 2002, 18 June 2002

D. Galli, U. Marconi, V. Vagnoni (INFN Bologna); N. Brook (Bristol); K. Harrison (Cambridge); E. Van Herwijnen, J. Closier, P. Mato (CERN); A. Khan (Edinburgh); A. Tsaregorodtsev (Marseille); H. Bulten, S. Klous (NIKHEF); F. Harris, I. McArthur, A. Soroko (Oxford); G. N. Patrick, G. Kuznetsov (RAL)


Overview of presentation

• Current organisation of LHCb distributed computing

• The Bologna Beowulf cluster and its performance in a distributed environment

• Current use of Globus and EDG middleware

• Planning for the data challenges and the use of the Grid

• Current LHCb Grid/applications R&D

• Conclusions


History of distributed MC production

• The distributed MC production system has been running for 3+ years and has processed many millions of events for LHCb design studies.

• Main production sites:
– CERN, Bologna, Liverpool, Lyon, NIKHEF & RAL

• Globus is already used for job submission to RAL and Lyon

• The system has been interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK Opening.

• For the 2002 Data Challenges, adding new institutes:
– Bristol, Cambridge, Oxford, ScotGrid

• In 2003, add:
– Barcelona, Moscow, Germany, Switzerland & Poland


Logical flow

[Diagram: MC production workflow comprising remote job submission via Web, execution on the farm, data quality checking, transfer of data to the mass store, updating of the bookkeeping database, and analysis]


Monitoring and Control of MC jobs

• LHCb has adopted PVSS II as a prototype control and monitoring system for MC production.

– PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.

– Adopted as the control framework for the LHC Joint Controls Project (JCOP).

– Available for Linux and Windows platforms.


Example of LHCb computing facility: the Bologna Beowulf cluster

• Set up at INFN-CNAF
– ~100 CPUs hosted in dual-processor machines (866 MHz to 1.2 GHz PIII), 512 MB RAM
– 2 Network Attached Storage systems
• 1 TB in RAID 5, with 14 IDE disks + hot spare
• 1 TB in RAID 5, with 7 SCSI disks + hot spare

• Linux disk-less processing nodes with the OS centralized on a file server (root file-system mounted over NFS)

• Use of private network IP addresses and an Ethernet VLAN
– High level of network isolation
– Access to external services (AFS, mccontrol, bookkeeping DB, Java servlets of various kinds, …) provided by means of a NAT mechanism on a gateway node
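
For illustration, NAT of this kind can be configured on a Linux 2.4 gateway with iptables, as in the minimal sketch below; the interface names and private subnet are assumptions, not the actual Bologna settings.

# Sketch: enable forwarding and masquerade the private farm VLAN behind
# the gateway's public address (eth0 = public side, private subnet assumed)
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE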


Farm Configuration

[Diagram: farm network layout, with a public and a private VLAN joined through a Fast Ethernet switch (with uplink)]
• Gateway: Red Hat 7.2 (kernel 2.4.18), DNS, NAT (IP masquerading)
• Control node: disk-less, CERN Red Hat 6.1 (kernel 2.2.18), PBS master, MC control server, farm monitoring
• Processing nodes 1…n: disk-less, CERN Red Hat 6.1 (kernel 2.2.18), PBS slaves
• Master server: Red Hat 7.2, serving the OS file-systems and various services (home directories, PXE remote boot, DHCP, NIS)
• 2 NAS units: 1 TB RAID 5 each
• Mirrored disks (RAID 1)
• Ethernet-controlled power distributor for remote power control of the nodes
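
As a rough sketch of how the disk-less nodes can be served under these assumptions (hostnames and paths are illustrative, not the actual CNAF configuration), the master server exports one root tree per node over NFS and the PXE boot entry tells the kernel to mount it:

# On the master server: export a root file-system for a processing node
echo "/exports/nodes/node01  node01(rw,no_root_squash,sync)" >> /etc/exports
exportfs -ra

# PXE boot entry (pxelinux.cfg/default): the node's kernel mounts its root over NFS
cat >> /tftpboot/pxelinux.cfg/default <<'EOF'
label diskless
  kernel vmlinuz-2.2.18
  append root=/dev/nfs nfsroot=masterserver:/exports/nodes/node01 ip=dhcp
EOF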


[Photos: fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor, rack of 1U dual-processor motherboards]


Farm performance

• The farm is capable of simulating and reconstructing about 700 LHCb events/day per CPU, i.e. (700 LHCb events/day) * (100 CPUs) = 70,000 LHCb events/day

• Data transfer over the WAN to the CASTOR tape library at CERN is realised using bbftp
– very good throughput (up to 70 Mbit/s of the currently available 100 Mbit/s)
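
As an indication of how such a transfer is driven, a single-file bbftp put might look like the line below; the account, file paths and server name are illustrative assumptions rather than the production command.

# Push one output file to the CASTOR name space at CERN (all names assumed)
bbftp -u lhcbprod -e "put /data/mc/job1600061.dst /castor/cern.ch/lhcb/mc/job1600061.dst" bbftp.cern.ch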


Current Use of Grid Middleware in the development system

• Authentication
– grid-proxy-init
• Job submission to DataGrid
– dg-job-submit
• Monitoring and control
– dg-job-status
– dg-job-cancel
– dg-job-get-output
• Data publication and replication
– globus-url-copy, GDMP
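
Put together, a session on an EDG user-interface machine follows the pattern sketched below; the JDL file and output directory are those of the example on the next slide, and the job identifier returned by the broker is shown as a placeholder.

grid-proxy-init                                        # authenticate: create a Grid proxy from the user certificate
dg-job-submit bbincl1600061.jdl -o /home/evh/logsub/   # submit the job described by the JDL
dg-job-status <job-id>                                 # monitor the submitted job
dg-job-get-output <job-id>                             # retrieve the output sandbox once the job has finished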


Example 1: Job Submission

dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/

bbincl1600061.jdl:

#

Executable = "script_prod";

Arguments = "1600061,v235r4dst,v233r2";

StdOutput = "file1600061.output";

StdError = "file1600061.err";

InputSandbox = {"/home/evhtbed/scripts/x509up_u149","/home/evhtbed/sicb/mcsend","/home/evhtbed/sicb/fsize","/home/evhtbed/sicb/cdispose.class","/home/evhtbed/v235r4dst.tar.gz","/home/evhtbed/sicb/sicb/bbincl1600061.sh","/home/evhtbed/script_prod","/home/evhtbed/sicb/sicb1600061.dat","/home/evhtbed/sicb/sicb1600062.dat","/home/evhtbed/sicb/sicb1600063.dat","/home/evhtbed/v233r2.tar.gz"};

OutputSandbox = {"job1600061.txt","D1600063","file1600061.output","file1600061.err","job1600062.txt","job1600063.txt"};


Example 2: Data Publishing & Replication

[Diagram: a job running on a Compute Element of the CERN testbed writes its data to local disk, copies it to a Storage Element with globus-url-copy, registers the local file and publishes it to the Replica Catalogue; a Storage Element at NIKHEF (Amsterdam) then retrieves a replica with replica-get and stores it in its MSS, from where the data are accessible to the rest of the Grid]
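
Expressed as commands, the flow might look like the sketch below; globus-url-copy with gsiftp:// URLs is standard Globus usage, while the GDMP command names, options and paths are assumptions patterned on the labels in the diagram.

# Copy the job output from the Compute Element's local disk to the CERN Storage Element
globus-url-copy file:///data/job1600061.dst gsiftp://se.cern.ch/flatfiles/lhcb/job1600061.dst

# Register the new file locally and publish it to the Replica Catalogue (GDMP; names assumed)
gdmp_register_local_file -d /flatfiles/lhcb
gdmp_publish_catalogue

# On the NIKHEF Storage Element, pull a replica of the newly published data (name assumed)
gdmp_replicate_get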


LHCb Data Challenge 1 (July-September 2002)

• Physics Data Challenge (PDC) for detector, physics and trigger evaluations
– based on the existing MC production system, with a small amount of Grid technology to start with
– generate ~3×10^7 events (signal + specific background + generic b and c + minimum bias)

• Computing Data Challenge (CDC) for checking developing software
– will make more extensive use of Grid middleware

• Components will be incorporated into the PDC once proven in the CDC


GANGA: Gaudi ANd Grid Alliance

Joint Atlas (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint Atlas/LHCb research posts at Cambridge and Oxford

[Diagram: the GANGA GUI sits between the GAUDI program (job options, algorithms, histograms, monitoring, results) and the collective & resource Grid services]

• An application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.

• A GUI-based application that should help throughout the complete job life-time:
– job preparation and configuration
– resource booking
– job submission
– job monitoring and control


Required functionality

• Before the Gaudi/Athena program starts:
– Security (obtaining certificates and credentials)
– Job configuration (algorithm configuration, input data selection, ...)
– Resource booking and policy checking (CPU, storage, network)
– Installation of required software components
– Job preparation and submission

• While the Gaudi/Athena program is running:
– Job monitoring (generic and specific)
– Job control (suspend, abort, ...)

• After the program has finished:
– Data management (registration)


Conclusions

• LHCb already has distributed MC production using Grid facilities for job submission

• We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model

• Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)

• R&D projects are in place
– for interfacing users (production + analysis) and the Gaudi/Athena software framework to Grid services
– for putting the production system into an integrated Grid environment with monitoring and control

• All work is being conducted in close participation with the EDG and LCG projects
– Ongoing evaluations of EDG middleware with physics jobs
– Participation in LCG working groups, e.g. report on 'Common use cases for a HEP Common Application layer', http://cern.ch/fca/HEPCAL.doc