

Nuclear Instruments and Methods in Physics Research A 502 (2003) 437–440

AliEn—ALICE environment on the GRID

P. Saiz a,*, L. Aphecetche b, P. Bunčić a,c, R. Piskač d, J.-E. Revsbech e, V. Šego d

a CERN, European Organization for Nuclear Research, 1211 Genève 23, Switzerland

b Laboratoire de Physique Subatomique et des Technologies Associées, Ecole des Mines de Nantes, Université de Nantes, 4 Rue Alfred Kastler, 44307 Nantes, Cedex 3, France

c Institut für Kernphysik, August-Euler-Strasse 6, 60486 Frankfurt am Main, Germany

d University of Zagreb, Department of Mathematics, Bijenička cesta 30, 10000 Zagreb, Croatia

e University of Copenhagen, Niels Bohr Institute, Blegdamsvej 17, 2100 København, Denmark

For the ALICE Collaboration

Abstract

AliEn (http://alien.cern.ch) (ALICE Environment) is a Grid framework built on top of the latest Internet standards for information exchange and authentication (SOAP, PKI) and common Open Source components. AliEn provides a virtual file catalogue that allows transparent access to distributed datasets, and a number of collaborating Web services that implement authentication, job execution, file transport, performance monitoring and event logging. This paper presents the architecture and components of the system.

© 2003 Published by Elsevier Science B.V.

PACS: 07.05.Tp

Keywords: GRID; Distributed computing; File catalogue; Resource broker

1. Introduction

The ALICE experiment has developed AliEn as an implementation of the distributed computing infrastructure needed to simulate, reconstruct and analyze data from the experiment. Thanks to AliEn, the sites that belong to the ALICE Virtual Organisation can be seen and used as a single entity: any available node executes jobs, and access to logical files and datasets is transparent to the user. In developing AliEn, common standards and solutions in the form of Open Source components were used. Only 1% (around 25k physical lines of code in Perl) is native AliEn code, while 99% of the code has been imported in the form of Open Source packages and Perl modules. This allowed a fast development cycle, and the preliminary version of the system was in production six months after the project was started. Currently ALICE is using the system for distributed production of Monte Carlo data at over 30 sites on four continents. During the last twelve months more than 16,000 jobs have been successfully run under AliEn control worldwide, totaling 25 CPU years and producing 20 TB of data.

*Corresponding author.

E-mail address: [email protected] (P. Saiz).

0168-9002/03/$ - see front matter © 2003 Published by Elsevier Science B.V.

doi:10.1016/S0168-9002(03)00462-5


1.1. ALICE computing model

When the experiment starts running, it will collect data at a rate of 2 PB per year, producing more than 10^9 files per year, which will require a massive processing effort for reconstruction [1]. In addition, during the preparation and running phases, a large-scale simulation must be carried out involving all available resources worldwide. While the ALICE simulation task has some specific requirements (today it takes up to 24 h to simulate the complete detector response for one event, and the resulting output file is up to 2 GB), the overall requirements are compatible with the use cases of other LHC experiments as described in the HEPCAL [2] document. With AliEn we aim to provide a solution to these use cases in the context of AliRoot, the ALICE simulation and reconstruction framework [1]. AliRoot uses the ROOT [3] framework directly for performance and simplicity reasons. ROOT provides data persistency at the file level and a wide range of utility libraries [1]. The role of AliEn is to create a completely distributed computing model suitable for large-scale production.

2. AliEn components

AliEn is designed to be as modular and extensible as possible. The individual modules and components (Fig. 1) are grouped together in functional package groups, completely self-contained and with no external dependencies. The installation of AliEn does not require root privileges. Once installed and started, the components are configured by the Configuration Manager on the basis of a detailed configuration stored in an LDAP directory, and the services start communicating with each other by exchanging XML messages using the SOAP [4] protocol.

The file catalogue provides a mapping of one or more Physical File Names (PFN) to the Logical File Name (LFN). Users see only LFNs, and the system translates them into the closest and most appropriate PFN depending on the client location and capabilities. The catalogue offers an interface similar to a UNIX file system, with directories and files, privileges for owner, group and the world, and the most common UNIX commands. It also provides functions to register and retrieve files or replicate them to a different location. In addition, the file catalogue can store arbitrary metadata that further describes the content of the files. The file catalogue is implemented using a relational database via several interface layers (AliEn DB interface, generic Perl DB interface and RDBMS-specific DB driver). Each branch of the file catalogue directory tree can, in principle, be supported by a different RDBMS engine running on a different host.

The user interacts with the system through a User Interface layer. The interaction can take place through a C/C++/Perl API, a command line interface, a graphical user interface, or a generic Web portal, which provides a convenient way to submit, inspect and manipulate a large number of jobs running concurrently at many sites.

Fig. 1. Schematic view of AliEn components.

The Package Manager manages packages contributed by each Virtual Organisation. This mechanism allows easy extension of the system to include experiment-specific software. The packages know about the requirements they may have and about their versions and dependencies on other packages. The Package Manager installs them automatically when required and prepares the job execution environment.

Authentication in the system is implemented using the Simple Authentication and Security Layer (SASL). To maintain compatibility with other GLOBUS/Grid projects, we have implemented the GSSAPI mechanism as a Perl module based on GLOBUS/GSI. This allows the use of various authentication mechanisms (token, password, RSA key, X509 certificates) as well as the use of GLOBUS/GSI proxy certificates. Once authenticated, the user is given a database token which allows him to connect and identify himself to the database engine using the Database Proxy service.

The Storage Elements (SE) are responsible for managing physical files and for providing an interface to mass storage. They allocate space for files, manage file caches and take care of removing expired files. The File Transfer Daemon (FTD) provides orderly and secure transfer of files between different SEs to satisfy user or service requests.

The Computing Element (CE) service is an interface to the local batch system (LSF, PBS, BQS, DQS, Globus, Condor). When a CE has available resources to execute a job, it sends a message to the Resource Broker (RB) service advertising its own capabilities and state (name of the CE, software available, close SE, number of free job slots, etc.). If the RB matches these capabilities with the requirements of any of the jobs waiting in the task queue, the job is reserved for execution on that particular CE and receives a unique id and a subdirectory in which all its files are accounted for. The AliEn RB uses a pull architecture (as opposed to the push mechanism traditionally implemented in other Grid systems based on the GLOBUS Toolkit [5]). In this implementation the RB does not need to know the status of all resources in the system. The job description, in the form of Job Description Language (JDL), is simply stored in a database table, waiting for CEs to connect and advertise their capabilities, and only then is the matching procedure performed. At the expense of slightly worse latency, this results in a "stateless" Grid, which is considerably more fault tolerant and simpler to implement.
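The pull model described above can be illustrated with a minimal Python sketch. It is not AliEn code (AliEn itself is written in Perl and stores JDL in a database table); the class names `Job`, `CEAdvert` and `TaskQueue`, their fields, and the matching rule (required software must be available, and any input data must sit on an SE close to the CE) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    """Hypothetical stand-in for a JDL entry waiting in the task queue."""
    job_id: int
    required_software: Set[str]
    input_se: Optional[str] = None  # SE holding the input data, if any


@dataclass
class CEAdvert:
    """Hypothetical message a CE sends when it has free resources."""
    name: str
    software: Set[str]
    close_se: Set[str]
    free_slots: int


@dataclass
class TaskQueue:
    """Hypothetical stand-in for the database table of waiting jobs."""
    waiting: List[Job] = field(default_factory=list)
    reserved: dict = field(default_factory=dict)  # job_id -> CE name

    def submit(self, job: Job) -> None:
        # Submission only stores the description; no resource is contacted.
        self.waiting.append(job)

    def match(self, advert: CEAdvert) -> List[Job]:
        """Run matching only when a CE connects and advertises itself."""
        assigned: List[Job] = []
        for job in list(self.waiting):
            if len(assigned) >= advert.free_slots:
                break
            software_ok = job.required_software <= advert.software
            data_ok = job.input_se is None or job.input_se in advert.close_se
            if software_ok and data_ok:
                self.waiting.remove(job)
                self.reserved[job.job_id] = advert.name
                assigned.append(job)
        return assigned
```

In this sketch the queue never tracks the state of any CE: a job simply waits until some CE whose advertisement satisfies it connects, which mirrors the "stateless" property claimed for the pull architecture.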

3. Future developments

AliEn has been designed to offer a stable interface for ALICE researchers over the lifetime of the experiment. As progress is made in the definition of Grid standards and interoperability, AliEn will be interfaced to emerging products from both Europe and the US. Since AliEn was conceived and developed from the beginning using the same basic technology as outlined in the OGSA initiative [5], it will be simple to make AliEn services comply with new OGSA [7] specifications as soon as they become public.

In its current state, AliEn, by virtue of its file catalogue, knows the physical location of datasets and their replicas and uses this information to schedule the optimal place for job execution. This is sufficient for typical simulation, reconstruction and event mixing, but not enough to solve the general analysis use case. The analysis, provided that it is carried out within the ROOT framework, can be handled by decomposing a complex request into several jobs, sending an intelligent PROOF [3] agent (in the form of a special AliEn job) to remote sites where datasets are local, processing the local datasets and sending only the output back to the user.

In a separate development, AliEn will be used to provide Grid components for MammoGRID [6], a 3-year project funded by the EU in the domain of health informatics.
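The decomposition of an analysis request into per-site jobs, as outlined earlier in this section, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `split_by_site`, the replica map, and the rule of sending each file to its first listed replica site are hypothetical and are not taken from AliEn or PROOF.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_site(lfns: List[str],
                  replica_sites: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group logical file names into one sub-job per site holding the data.

    replica_sites maps each LFN to the sites that hold a replica. As a
    simplification, each file is assigned to the first site listed for it.
    """
    per_site: Dict[str, List[str]] = defaultdict(list)
    for lfn in lfns:
        sites = replica_sites.get(lfn)
        if not sites:
            raise KeyError(f"no replica known for {lfn}")
        per_site[sites[0]].append(lfn)
    return dict(per_site)
```

Each entry of the returned dictionary corresponds to one job that would run where its datasets are local, so that only the (typically much smaller) output has to travel back to the user.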

4. Conclusions

The ALICE experiment has developed AliEn as an implementation of the distributed computing infrastructure needed to simulate, reconstruct and analyse data from the experiment. AliEn has been deployed at more than 30 sites and has been routinely used in production over the past 12 months. The user interface is compatible with the EU DataGrid at the level of authentication and job description language. In the future, AliEn will be interfaced to the mainstream Grid infrastructure in HEP, and it will continue to serve as the interface between the ALICE Offline framework and external Grid infrastructure.

References

[1] R. Brun, Computing at ALICE, Nucl. Instr. and Meth. A, these proceedings.

[2] http://lcg.web.cern.ch/LCG/SC2/RTAG4/finalreport.doc.

[3] R. Brun, F. Rademakers, Nucl. Instr. and Meth. A 389 (1997) 81; http://root.cern.ch/.

[4] http://www.soaplite.com.

[5] http://www.globus.org.

[6] http://www.vitamib.org/mammogrid.

[7] http://www.globus.org/ogsa.
