24
annual report 02 CERN openlab for DataGrid applications

CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

  • Upload
    haquynh

  • View
    221

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report

02

CERNopenlab for DataGrid applications

Page 2: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

Table of Contents

A word from the Head of CERN openlab 1

THE CONTEXT

CERN and the LHC: probing the Universe 2LHC and Grid computing: a grand challenge 3The LHC Computing Grid (LCG) 4The European Data Grid Project (EDG) 5Enabling Grids for E-science in Europe (EGEE) 6Other EU-funded Grid projects at CERN 7

THE STATUS

CERN openlab: off to a fine start 8Partner profiles 10Organisation and human resources 12The CERN openlab facilities 13

CERN opencluster: pushing the limits 14Porting and benchmarking activities 15CERN openlab data challenges 16The 10Gbps challenge 16The 1GBps storage-to-tape challenge 17

CERN openlab dissemination: spreading the news 18Publications and Conference Contributions 18Press Releases and Non-technical articles 18Thematic workshops and Website 18Public events and VIP presentations 19Student activities and outreach activities 19

THE FUTURE

The technological horizon 20New project opportunities 20

Second Annual Report of the CERN openlab for DataGrid ApplicationsEditor: François GreyDesign and layout: Fabienne MarcastelPhotography: CERN photo serviceProduction: ETT Division CERN

ISBN 92-9083-212-6

© Copyright 2003, CERN

Page 3: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

1

A word from the Head of CERN openlab

The CERN openlabfor DataGrid appli-

cations is nowwell on the wayto fulfilling theambitions of itsfounders. Thegoal was tocreate a newtype of part-nership be-

tween CERN andindustry, which

could tackle thechallenges of Grid

computing that CERN isfacing in the coming

decade. The results describedin this second annual report testify to a

solid activity that is already producing record-breakingachievements, and that is far from exhausting itspotential for growth.

Since the first annual meeting in March 2002, much hashappened. At that time, the original partners and CERNwere still reeling from the aftermath of the telecomscrash and severe budgetary constraints. However, newlife was injected in the partnership when HP joined inSeptember 2002. Close collaboration between HP andthe existing partners, Intel and Enterasys Networks ledto the CERN opencluster project. This was a significantbreakthrough, as it created a spirit of collaborationbetween partners which was precisely what CERNopenlab was meant to achieve. At about the same time,CERN also boosted its efforts substantially, creating adedicated and talented team to move the CERN open-lab agenda forward.

In December 2002, Manuel Delfino, head of the CERNopenlab and CERN IT Division Leader, stepped down toresume his academic duties in Spain, where he is nowplaying a leading role in national Grid activities. I tookover his position officially from January 1st 2003, hav-ing been Deputy Division Leader prior to that. I wouldlike to take this opportunity to express my sincerestappreciation to Manuel Delfino, on behalf of all part-ners, for having had the vision to help foster CERNopenlab, and the determination to see it through achallenging first year.

What does the future hold for CERN openlab? WithIBM joining in April 2003, the partnership will extendthe CERN opencluster into the area of advanced stor-age – which is crucial for the CERN’s DataGrid ambi-tions. And beyond that, the increasing emphasis onGrid-like solutions throughout the IT industry meansthat there are many opportunities for extending thescope of the partnership – I look forward to the excit-ing times ahead for CERN openlab!

Finally, I would like to thank all the industrial partnersfor their generous sponsorship and unwavering sup-port of CERN openlab. Your commitment is the key toour common success.

Wolfgang von RüdenHead of the CERN openlab for DataGrid applications

Page 4: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

2

THE CONTEXT

CERN and the LHC: probing the Universe

CERN is the European Organization for NuclearResearch, the world's largest particle physics centre.Founded in 1954, the laboratory was one of Europe'sfirst joint ventures and includes today 20 MemberStates. Here, some 6000 physicists from universitiesand laboratories around the world come toexplore what matter is made of and what forceshold it together.

CERN exists primarily to provide the physicistswith the necessary tools. These are accelera-tors, which accelerate particles to almost thespeed of light and detectors to make theparticles visible.

Building the biggest ring

The world's largest and most powerful parti-cle accelerator, the Large Hadron Collider(LHC), is now being constructed at CERN, in a27km circular tunnel. It will ultimately collidebeams of protons at an energy of 14 TeV. Beamsof lead nuclei will be also accelerated, smashingtogether with a collision energy of 1150 TeV. Therewill be four detectors on the LHC ring, each a massiveinstrument several stories high, and each optimised forprobing different aspects of the high-energy collisions.

The accelerator is planned to start operation in 2007. Itwill be used to answer some of the most fundamentalquestions of science.

– What is the origin of the different masses offundamental particles?

– What is “dark matter” made of? – Can the electroweak and strong forces be

unified? – What is the origin of the asymmetry between

matter and antimatter in the universe today?

The LHC aims to provide answers to these questions,and in the process change our view of the Universe.

Page 5: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

3

LHC and Grid computing: a grand challenge

Grid computing is the computer buzzword of thedecade. Not since the World Wide Web was developedat CERN, over ten years ago, has a new networkingtechnology held so much promise for both science andsociety. Once again, CERN is set to play a leading role inmaking the technology a reality.

The philosophy of the Grid is to provide vast amountsof computer power at the click of a mouse, by linkinggeographically distributed computers and developing“middleware" to run the computers as though theywere a monolithic resource. Whereas the Web givesaccess to distributed information, the Grid does thesame for distributed processing power and storagecapacity.

There are many varieties of Grid technology. In thecommercial arena, Grids that harness the combinedpower of many workstations within a single organisa-tion are already common. But CERN’s objective is alto-gether more ambitious: storing petabytes of data fromthe LHC in a distributed fashion, and making it easilyaccessible to thousands of scientists around the world.This requires much more than just spare PC capacity – anetwork of major computer centres around the worldmust provide their resources in a seamless way.

Petabyte performance

A single particle collision, or “event”, can generate amegabyte of data in the detectors, in the form of tracesthrough the detector of the paths of particles createdby the collision. In a typical experiment at the LHC,there will be some 40 million collisions per second.After on-line filtering, this huge data flow will bereduced to some 100 events of interest per second. Thismeans a recording rate of up to one gigabyte/sec.Thus, over a year – and taking account the need forbackup data – the total data stored from all four detec-tors on the LHC is expected to reach over 10 Petabytes.Since the lifetime of the experiment is expected to beat least 10 years, and researchers will need to analysethe totality of the data with increasingly sophisticatedprograms, the data challenge will grow with time.

Among all measured tracks, the presence of “specialshapes” is the sign for the occurrence of interestinginteractions. The figure below shows a simulation ofraw data from the LHC. Buried in the raw data aretracks that reveal an elusive particle known as theHiggs boson. Events such as this are expected to beextremely rare, happening in only 1 in 1013 collision, sothe analysis algorithms must be very sophisticated. Inparallel with data analysis, simulation of events plays akey role: starting from theory and detector characteris-tics, physicists will compute what the detector shouldregister. Only by building up a similar quantity of simu-lated data can theory and experiment be compared ona statistical basis.

Simulation of a collision producingthe Higgs boson, a particle predictedtheoretically but that has so far elud-ed experiment.

Page 6: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

4

The LHC Computing Grid (LCG)

The goal of the LCG project is to deploy a worldwidegrid solution to meet the unprecedented computingneeds of this virtual organisation.

The year 2002 was a turning point for the constructionof a computing Grid for the LHC. The LHC ComputingGrid (LCG) project was officially launched, with a mis-sion to integrate thousands of computers at dozens ofparticipating centres worldwide into a global comput-ing resource. This technological tour-de-force will relyon software being developed in the European DataGrid(EDG) project, led by CERN, which is the largest Griddevelopment project ever funded by the EU. It will ben-efit from hardware developments initiated in the CERNopenlab.

Gearing up

In 2002, LCG began rapidly gearing up for this chal-lenge, with over 50 computer scientists and engineersfrom partner research centres around the world joiningover the year. The focus in 2002 was on defining thestringent data storage and processing requirements ofthe experiments. For example, a key technical require-ment is to ensure data "persistency", which is how theGrid maintains data availability at all times, even as theGrid environment evolves. A first version of the LCG,called LCG-1, is scheduled to start running on a restrict-ed number of sites in July 2003.

Deployment of LCG-1.Red squares: Main computer centresinvolved in the launch of LCG-1 inJuly 2003. The first production Gridwill involve Fermilab, Brookhaven,RAL, IN2P3, CERN, INFN, FZ Karlsruhe,Moscow State Uni., Academica Sinicaand Tokyo Uni. Blue squares indicateother centres expected to join LCG-1,during the course of the first 18months of operation.

Page 7: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

5

The European Data Grid Project (EDG)

The European DataGrid is a three year project fundedby the European Union that aims to enable access togeographically distributed computing power and stor-age facilities belonging to different institutions. Thiswill provide the necessary resources to process hugeamounts of data coming from scientific experiments inthree different disciplines: High Energy Physics, Biologyand Earth Observation.

Testbed success

EDG builds on a software toolkit for Grid technologyknown as Globus, developed in the US, as well as othersoftware packages, and uses these to build a function-ing Grid testbed. The project brings together over 100computer engineers, who have generated some 250 Klines of code already. In 2002, EDG middleware man-aged to connect computing resources at some 20 majorcentres. In collaboration with the LHC experiments CMSand ATLAS, a number of demanding computationalchallenges were successfully processed. This provedthat many components of the EDG software are readyfor use in the LCG project.

Page 8: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

6

Enabling Grids for E-Science in Europe (EGEE)

The success of EDG has generated strong support for afollow-up effort to build a permanent European Gridinfrastructure that can serve a broad spectrum of appli-cations reliably and continuously. Providing such a serv-ice will require a much larger effort than setting up thecurrent testbed. So CERN has established a pan-European consortium, EGEE (Enabling Grids for E-sci-ence in Europe) to build a production Grid infrastruc-ture, providing round the clock Grid service to scientiststhroughout Europe. A proposal for such a project hasbeen submitted to the EU 6th Framework Programmein 2003.

The big picture

As illustrated below, the potential benefits of such aninfrastructure for Europe will extend well beyond theLHC, to nearly all areas of science and commerce wherelarge amounts of data must be managed in a distrib-uted way. However, the benefits of Grid technology arenot just regional but global. Reflecting this, CERN hasbeen playing an active role in the Global Grid Forum,which helps to set worldwide standards in this rapidlyevolving field.

Schema of the evolution of theEuropean Grid infrastructure fromtwo pilot applications in high energyphysics and biomedical Grids, to aninfrastructure serving multiple scien-tific and technological communities,with enormous computer resources.The applications and resource fig-ures are purely illustrative. The EGEEproject covers Year 1 and 2 of aplanned four year programme.

Page 9: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

7

Other EU-funded Grid projects at CERN

The year 2002 also saw the launch of several other EU-funded Grid projects in which CERN plays a significantrole. CERN is leading DataTAG, which aims to providehigh-speed connections between Grids in Europe andthe US.

In 2003, DataTAG set a significant land record, transfer-ring a Terabyte of data from the Stanford LinearAccelerator on the West Coast of the United States toCERN in just over an hour, a factor of 2.5x faster thanprevious land records.

Project portfolio

CERN is also collaborating in CrossGrid, a project thataims to extend EDG functionality to advanced applica-tions such as real-time simulations. The GRACE projectdevelops a higher-layer of software for Grids, includingconcepts such as semantic Grids, which provide contex-tual meaning to information stored on the Grid. CERNis also a partner in MammoGrid, a project dedicated tobuilding a Grid for hospitals to share and analyse mam-mograms in an effort to improve breast cancer treat-ment. Finally, GridStart aims to coordinate the effortsof the major Grid initiatives in Europe, including EDG,and disseminate information about the benefits of Gridtechnology to industry and society.

Page 10: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

8

THE STATUS

CERN openlab: off to a fine start

The CERN openlab was conceived to allowCERN and its industrial partners to peerinto the technological crystal ball andtest technologies that may well becommercially competitive when theLHC is up and running. The indus-trial partners view this as a greatopportunity to develop and testnew technologies, which are stillfar from the commodity comput-ing market, under the rigorousand demanding conditions thatCERN's advanced computing envi-ronment provides. CERN gets the ben-efit of advanced knowledge of emerg-ing technologies, which could significantlyaffect the roadmap for the LCG project.

A theme of the CERN openlab, as the name suggests, isto pursue open standards and open source solutions.This is of vital importance to LCG, where interoperabil-ity between different hardware and software platformsis a necessity for the Grid to work efficiently. And openstandards are increasingly viewed as a necessity bymany industry leaders, in order to be able to deliverWeb services, computing-on-demand, and other Grid-related solutions to customers. Thus, for the industrialpartners, discovering how their different technologiesperform together, and adapting open source softwareto these technologies, are added benefits of the part-nership.

Cluster creation

The first project launched by CERN openlab is calledCERN opencluster, and aims to build a cluster based ontechnologies that are at the cutting edge of what isavailable on the market today. In particular, the CERNopencluster will be linked to the EDG testbed, to seehow these new technologies perform in a Grid envi-ronment. The results are also be closely monitored bythe LCG project, to determine how the potential impactof the technologies involved.

Schematic illustration of the relatio-ship between CERN openlab andother major Grid projects led byCERN.

Marc Collignon and Jean-MichelJouanigot with the EnterasysNetworks equipment.

Page 11: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

9

One node in the CERN opencluster.

The CERN openlab provides the LHC with a new andvital source of industrial sponsorship for long-termtechnology development and integration. Membershipof the CERN openlab is set at 2.4 MCHF over a threeyear period. The equipment for the CERN opencluster,as well as the contribution of top engineers to developit, is provided by the industrial partners as part of CERNopenlab membership requirements. The concept hasproved very popular, with other major computer andsoftware manufacturers eager to join.

Beyond using CERN as a testbed for cutting edge Gridsoftware and hardware, and building an industry con-sortium for Grid-related technologies of common inter-est, a further goal of CERN openlab is to provide atraining ground for a new generation of engineers tolearn about Grid technology. Already, HP is co-fundingtwo CERN fellows (post-doctoral level researchers) atrend that will hopefully be adopted by other partners.

Page 12: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

10

In September 2002, HP joined Intel and EnterasysNetworks in the CERN openlab for DataGrid applica-tions. Together with CERN, these companies launchedthe ambitious project called CERN opencluster, whichcombines 64-bit processor technology and 10Gb/s net-work cards from Intel, compute servers in a cluster fromHP, and a 10 gigabit switching environment fromEnterasys Networks. In April 2003, IBM joined theCERN openlab, and brought 28TB of high-endstorage to the project, as well as its StorageTank data management technology.

CERN Rationale

The choice of the industrial part-ners of CERN openlab reflectsCERN’s objective to work withindustrial leaders in all aspectsof the Grid computing chal-lenge. 64-bit technology ison the technological hori-zon, so testing Intel®Itanium® 2 processors wasa clear must. Intel and HPare already actively collab-orating on Itanium 2processor-based clustersolutions, so CERN is bene-fiting from the fruits of thepre-existing alliance. CERN isalso part of the GelatoFederation – an open sourceactivity for 64-bit technologypromoted by HP. EnterasysNetworks has traditionally com-peted at the high-end of theswitching and routing technology,bringing to the partnership the cut-ting edge in these technologies. IBM’scommitment to computing-on-demandmade it a natural partner for a Grid project,and Storage Tank is viewed as a data manage-ment system that can rise to the challenge of han-dling the petabytes of data CERN will produce.

Partner profiles

Page 13: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

11

Enterasys Networks

is a leading worldwide provider of broadline intelligentdata networking infrastructures for enterprise-classcustomers. Enterasys’ networking hardware and soft-ware offerings deliver the innovative security, availabil-ity and mobility solutions required by Global 2000organizations coupled with the industry’s strongestservice and support. For more information onEnterasys and its products, visit www.enterasys.com.

“The aggregate data throughput for LHC will exceedone terabit per second. Enterasys is confident that its10-Gigabit Ethernet Technology will enable CERN tounlock the full potential of its DataGrid.” John Roese,Chief Technology Officer for Enterasys Networks.

HP

is a leading global provider of products, technologies,solutions and services to consumers and businesses. Thecompany's offerings span IT infrastructure, personalcomputing and access devices, global services and imag-ing and printing. HP completed its merger transactioninvolving Compaq Computer Corp. on May 3, 2002.More information about HP is available atwww.hp.com.

“Through this collaboration with the CERN DataGrid,HP’s researchers and engineers will be put to the test totruly push the envelope in developing advanced Gridcomputing technologies.” Jim Duley, Director forTechnology Programs, HP University Relations.

IBM

is the world's largest information technology company,with 80 years of leadership in helping businesses inno-vate. Drawing on resources from across IBM and keyBusiness Partners, IBM offers a wide range of services,solutions and technologies that enable customers, largeand small, to take full advantage of the new era of e-business. Additional information about IBM is availableat www.ibm.com/us/

“This is the perfect environment for us to enhance ourStorage Tank technology to meet the demandingrequirements of large scale Grid computing systems.”Jai Menon, IBM Fellow and co-director of IBM’s StorageSystems Institute

Intel

the world's largest chip maker, is also a leading manu-facturer of computer, networking and communicationsproducts. Additional information about Intel is avail-able at www.intel.com/pressroom.

“CERN’s DataGrid project is an ideal application forIntel’s most powerful processor yet, the Intel®Itanium® 2 processor. The awesome computer powerrequired will find a formidable engine in the Itanium.”Steve Chase, Director, Business and CommunicationSolution Group of Intel.

Page 14: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlabManagement Unit

CERN openlabTechnical Team

CERN Liaison withIndustry Partner

CERN openlab Board of Sponsors

Industry PartnerTechnical Liaison

with CERN

Industry PartnerOrganizational

Liaison with CERN

CERN Industry Partners

Liaison with GridProjects

EDGEDG

LCGLCG

CERN openlab for DataGrid applications

12

CERN openlab now represents a significant investmentin manpower for the IT Division, with seven technicalstaff working on the CERN opencluster project at CERN,and significant technical contributions from technicalliaisons in the partner companies.

In September 2002, an openlab Management Unit wascreated, and as of April 2003, two HP-funded CERN fel-lows (postdoctoral level researchers) joined this team.In addition, there is active participation from CERNliaisons to the main Grid projects, LCG and EDG, as wellas CERN liaisons to the individual partner companies.

CERN openlab Board of Sponsors

Luciano Maiani [head of Board] CERN Director General

Wolfgang von Rüden Head of the CERN openlab

John Roese Enterasys NetworksJim Duley HPTom Hawk IBMTom Gibbs Intel

CERN openlab Management Unit

Wolfgang von Rüden Head of the CERN openlab

François Fluckiger Associate HeadFrançois Grey Development and

Communication Sverre Jarp Chief Technologist

Organisation and human resources

CERN openlab Technical Team

Sverre Jarp Chief TechnologistAndreas Hirstius OpenclusterAndreas Unterkircher OpenclusterJean-Michel Jouanigot Networking Marc Collignon Networking Rainer Többicke Storage Jean-Damien Durand Storage

Industry Partner Technical Liaison with CERNMarkus Nispel Enterasys NetworksJohn Manley HPBrian Carpenter & Jai Menon IBMHerbert Cornelius Intel

Industry Partner Organizational Liaison with CERN

Christoph Kälin Enterasys NetworksMichel Benard HPPasquale Di Cesare IBMPierre Mirjolet Intel

CERN Technical Liaison with Industry Partners

Sverre Jarp HP and IntelJean-Michel Jouanigot Enterasys NetworksRainer Többicke IBM

CERN Grid Project Liaison

David Foster LCGFabrizio Gagliardi EDG

Page 15: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

13

The CERN opencluster area is a central area of theComputer Centre that has been reserved for the CERNopencluster equipment, which benefits from optimumvisibility from the visitors gallery.

The openlab openspace is a new VIP meeting roomwhich has been built with access directly on theComputer Centre. The aim is that CERN and industrialpartners should be able to use it for key technical andpromotional meetings. It provides professional, high-tech surroundings for meetings, with informationabout CERN openlab and the industrial partners in evi-dence.

Offices for the CERN openlab team have been estab-lished in building 513, just along the corridor from theopenlab openspace. The technical team and most ofthe openlab management unit is installed therealready.

The CERN openlab facilities

Meeting in the openlab openspace.

Page 16: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

14

CERN opencluster:

pushing the limits

The major focus of the technical work in CERN openlabover the past year has been to get the CERN openclus-ter up and running, an objective which has been suc-cessfully achieved.

The opencluster consists of thirty-two Linux-basedHewlett Packard rx2600 rack-mounted serversequipped each with two 1 GHz Intel® Itanium® 2processors. In addition, seven development systems,plus two which have been assigned to the experimentAlice, were installed in 2002. Three desktop worksta-tions have been delivered, and are used by the open-cluster technical team for their development work.

The latest equipment

A number of 10-GbE Network Interface Cards (NICs)have been delivered by Intel and installed on the HPrx2600 computers. These allow high-speed transfertests to be carried out, which are described in moredetail below.

Enterasys Networks has delivered a switch equipped tooperate at 10 Gbps and with additional port capacityfor 1 Gbps, provisionally on loan. This comes in additionto two switches delivered in 2002. In summary, thereare 3 switches, 20 10-GbE ports and 76 1-GbE ports.

After IBM joined the CERN openlab in April 2003, IBMand CERN established procedures to support the team-work between the two partners, and a detailed sched-ule for the next 6 months work. IBM is sponsoring 28TBof high-end storage, which was delivered and installedduring the second quarter of 2003.

The opencluster nodes were installed using the fullyautomatic kick-start procedure which is also in use forCERN’s production-oriented PCs. In addition to theLinux operating system, the nodes were also equippedwith openAFS, the LSF batch system, and three sets ofcompilers (Intel, GNU, and the Open ResearchCompiler)

Thanks to the preinstalled management card in eachnode, automation was further developed to allowremote system restart and remote power control. Thisdevelopment confirmed the notion that - for a modesthardware investment - large clusters can be controlledremotely. This result is highly relevant to the LHCComputing Grid, which will need to deploy suchautomation at a large scale.

Interface to CERN opencluster forremote management. Nodes that areunavailable are colour-coded.

Andreas Hirstius with the CERNopencluster.

Page 17: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

15

Benchmarking activities

Major physics packages (CLHEP, GEANT4, ROOT) havebeen successfully ported and tested on the CERN open-cluster. This work was done together with the authorsand maintainers of the various packages in collabora-tion with the newly formed software group in the EPDivision of CERN. The CERN data management pack-age, CASTOR, was also ported and certified.Benchmarking of the physics packages has begun andfirst results are promising.

For example, PROOF (Parallel ROOT Facility) is a versionof the popular CERN-developed software ROOT, anobject-oriented data analysis framework. It is beingdeveloped for interactive analysis of very large ROOTdata files on a cluster of computers. The scalability ofPROOF has been tested using the CERN opencluster,and proves to be linear up to this cluster size. The ROOTteam at CERN is very interested to pursue tests on evenlarger cluster sizes (see figure below).

Getting better all the time

A compiler review was performed together with Intel,detailing both short term and medium term develop-ment opportunities on the Intel® Itanium® 2 processor.This close collaboration between the CERN openlabtechnical team and Intel staff should lead to significantadvancements of compiler performance – illustratingnicely the sort of technical benefits that industrial part-ners in the CERN openlab enjoy.

Results for scalability of PROOF, theParallel Root Facility. Each node ofthe cluster has one copy of the dataset (4 files, total of 277 MB), 32nodes. In total, this amounts to 8.8Gbyte in 128 files, equivalent to 9million events. On one node, thetotal processing time was 325 s, on32 nodes in parallel it was 12 s.

Rainer Többicke with the IBM stor-age system.

Page 18: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

16

CERN openlab data challenges

The CERN openlab technical team has defined the con-cept of CERN openlab data challenges – tangible objec-tives that may be jointly targeted by some or all of thesponsors.

A major openlab challenge is the 10 Gigabit challenge:a common effort by Enterasys Networks, HP, Intel andCERN to achieve data transfer rates within the open-cluster of 10Gbps. This involves multiple technologicalissues such as network switch tuning, Linux Kernel andTCP/IP parameter tuning.

A dedicated meeting took place at CERN in March 2003with technical representatives from all three industrialsponsors, to chart out a path for this challenge. A joint10Gb Operational Group was set up as a result of themeeting at CERN. Minuted operational meetings of thegroup are organized on a bi-weekly basis via phone-conferencing. Initial results are described below.

First results of the 10Gbps challenge

One of the major challenges of the opencluster projectis to take maximum advantage of the partners’ 10 Gbpstechnology. In April 2003, a first series of tests was con-ducted between two Linux-based Hewlett Packardrx2600 computers equipped each with two 1 GHz

Intel® Itanium® 2 processors. The two computers weredirectly connected ("back-to-back" connection)through 10 GbE Network Interface Cards (NICs) fromIntel (see recent news). The transfer reached a data rateof 755 MBps, that is, approximately 3/4 of the maxi-mum attainable bit rate of the interfaces.

The transfer took place over a 10 km fibre. It used verybig frames (16 KB) in a single stream, as well as the reg-ular suite of Linux Kernel protocols (TCP/IP). Whenusing “jumbo frames” (9 KB), the same test reached 693MBps.

The best results were obtained when aggregating the1 Gbit bi-directional traffic involving the 10 nodes ineach group. The peak traffic between the switches wasthen measured to be 8.2 Gbps.

The next stages of the test campaign will include con-tinued tuning of the parameters, and evaluating thenext version of processors.

Schema of the setup was used fortesting the Enterasys switches. Thicklines indicate 10 Gbps connections,thin ones are 1 Gbps.

Page 19: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

17

Illustration of the setup used for the1GBps storage to tape challenge.Simulated data from CPUs is storedtemporarily on disk servers, thentransferred to the 45 tape serversthrough the Enterasys switched net-work. Once the LHC is running, a sim-ilar configuration will be used, withdata coming from the experiments,and being copied to the Grid as wellas stored on tape (bubbles indicat-ed).

The GBps storage-to-tape challenge

In May, CERN announced the successful completion of amajor data challenge aimed at pushing the limits ofdata storage to tape. While this was not a CERN open-lab data challenge per se, this data challenge involvedin a critical way several components of the CERN open-cluster.

Using 45 newly installed StorageTek 9940B tape drives,capable of writing to tape at 30 MBps, Bernd Panzerand his team at the IT Division of CERN were able toachieve storage-to-tape rates of 1.1 GBps for periods ofseveral hours, with peaks of 1.2 GBps – roughly equiva-lent to storing a whole movie on DVD every four sec-onds. The average sustained over a three day periodwas of 920 MBps. Previous best results by otherresearch labs were typically less than 850 MBps.

LHC compatible

The significance of this result, and the purpose of thedata challenge, was to show that the CERN’s IT Divisionis on track to cope with the enormous data ratesexpected from experiments on the LHC. These experi-ments will produce data at rates in excess of 100 MBps,and one experiment alone, called Alice, is expected toproduce data at rates of 1.25 GBps.

In order to simulate the LHC data acquisition proce-dure, Dr. Panzer’s team generated an equivalent streamof artificial data, using 40 compute servers. This datawas stored temporarily to 60 disk servers, which includ-ed the CERN opencluster servers, before being trans-ferred to the StorageTek tape servers as illustrated inthe figure, right. A data compression factor of 1.3 wasdeliberately chosen during this data challenge, as this ischaracteristic of the compression that can be achievedwith real experimental data.

A key contributing factor to the success of the datachallenge was a high performance switched networkfrom Enterasys Networks with 10 GBps Ethernet capa-bility, which routed the data from PC to disk and fromdisk to tape. As indicated in the figure, once the LHC isrunning, data will come directly from the experiments,and be stored on disks on the DataGrid, as well asarchived onto tape at CERN.

Page 20: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

18

CERN openlab dissemination: spreading the news

While many of the benefits for industrial partners ofCERN openlab are technical, there is also a strongemphasis in CERN openlab’s mission on the public rela-tions opportunities that this novel partnership pro-vides. Therefore, the CERN openlab Management Unithas developed over the last year a spectrum of activitiesthat promote CERN openlab in different contexts, andprovide added benefits for the partners.

The openlab Technical Team is ensuring disseminationof CERN openlab results through traditional technicalchannels such as conference presentations, technicalpublications and thematic workshops. In addition, thecutting-edge nature of the CERN openlab data chal-lenges lends itself well to Press releases and other non-technical publications, as well as to presentations to vis-iting VIPs at CERN.

Publications

Two publications in the Proceedings of Computing forHigh Energy and Nuclear Physics (CHEP ‘03, La Jolla,California) (to be published, see web informationbelow)

Conference contributions

Two talks at CHEP ’03, La Jolla, California featuredhighlights from CERN opencluster. These presentationscan be viewed at: http://chep03.ucsd.edu/files/139.ppt http://chep03.ucsd.edu/files/262.ppt

Press releases and non-technical articles

Joint Press Release with HP (25/9/2002)Joint Press Release with IBM (2/4/2003)Joint Press release with StorageTek (28/5/2003)CERN bulletin 39/2002 p5CERN bulletin 12/2003 p6CERN courier (to be published Oct. 2003)CERN annual report 2002

Thematic workshops

As part of the technical program, Thematic Workshopsare proposed to sponsors and open to CERN staff. Thestandard format of Thematic Workshops is as follows:on day one, CERN experts present specific CERNrequirements and the status of their technical develop-ment; on day two, CERN meets sponsors on a one-to-one basis.

The first Thematic Workshop took place on 17-18March 2003, dedicated to Data and StorageManagement. The second Workshop is scheduled the 8and 9 July 2003, on Fabric Management. FutureThematic Workshops on Total Cost of Ownership andon Energy Consumption in Data Centers are beingplanned.

Website

The website structure has been expanded and is nowformed of a top level public website (openlab style,stable content) and a new public working site (IT divi-sion style, frequently updated)

Page 21: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

19

Public events

CERN openlab has hosted two First Tuesday events atCERN – details listed below – in collaboration with theorganization “First Tuesday Suisse Romande”. Thesepublic events attracted some 250 persons each, andinvolved speakers from CERN and industrial partners, aswell as other guest speakers. The public is primarilyfrom the regional business and investor communities.The events were also webcast live and are archived atwww.rezonance.ch. A First Tuesday on Grid technologyis scheduled for Q3 2003.

- 1st FT @ CERN: Grid Technologies: scientific and indus-trial prospects (27 September 2002).

- 2nd FT @ CERN: New trends in industrial partnershipand innovation management at European research lab-oratories (19 March 2003).

VIP Presentations

- European Commissioner for Enterprise andInformation Society, Erkki Liikanen (February 21st2003)

- Swiss State Secretary for Economic Affairs, David Syz(March 3rd 2003)

- European Parliament Technology Assessment delega-tion (March 18 2003)

- Norwegian State Secretary Bjorn Haugstad, Ministryof Education and Research (30 April 2003)

- National Technological University (USA),Management of Technology Programme (May 14th2003)

- Finnish Parliamentary Committee (May 27 2003)

Student activities

Based on a pilot programme run last year in collabora-tion with the Helsinki Institute of Physics, a CERN open-lab student programme will run in summer 2003,involving 11 students from seven European countries.The idea is to have the students work in internationalteams, on projects related to the applications of Gridtechnology. In 2003, one student team will work onissues related to the CERN opencluster, another on Gridsolutions for data acquisition from the Athena experi-ment, and finally, one team will work on the Grid caféwebsite, described below. The objective is to get thestudents to continue their work in their home institute,and to attract talented students back to CERN to doM.Sc. and Ph.D. projects.

CERN openlab is co-sponsor of an application by CERNfor an EU grant from the Marie-Curie programme. Ifsuccessful, this application will lead to funding for 1-2Ph.D. students associated with CERN openlab.

Outreach activities

CERN openlab is actively supporting the establishmentof a Grid Café for the CERN Microcosm exhibition. Theidea is to create a web café with a focus on Grid tech-nologies, including a dedicated website that will givelinks to key Grid technology websites, and access to ademonstration testbed developed by EDG. This activityis also planned to be part of CERN’s presence on con-ference stands such as one currently being planned forTelecom’03 in Geneva in October, and in the frameworkof the RSIS (Role of Science in the Information Society)conference to take place at CERN the 8th and 9thDecember 2003 as a side event to the WSIS (WorldSummit on the Information Society).

CERN openlab is also actively involved in the planningof the CERN Globe of Innovation, a stunning architec-tural structure that the Swiss government has given toCERN, and that will be used as meeting and networkingcentre around the theme of science, technology andinnovation.

Page 22: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERN openlab for DataGrid applications

20

THE FUTURE

The technological horizon

The plans for the next two years are to double thenumber of nodes on the CERN opencluster each year,reaching a cluster size of 128. The high-speed switchingenvironment will be correspondingly upgraded, toensure that the 10Gbps challenge can be attained withthis size of cluster.

In parallel with this hardware expansion, softwareactivities will focus in particular on the gridification ofthe opencluster, and on establishing a meaningful setof benchmarks to evaluate the performance of such anadvanced cluster on a computing Grid.

On the storage side, IBM is contributing Storage Tank®,a new technology that is designed to provide scalable,high-performance and highly available management ofvery large amounts of data using a single file name-space regardless of where or on what operating systemthe data reside. (Recently, IBM announced that thecommercial version will be named IBM TotalStorage™SAN File System.) IBM and CERN will work together toextend Storage Tank's capabilities in combination withthe CERN opencluster, so it can manage the LHC dataand provide access to it from any location worldwide.

New project opportunities

Discussions are ongoing in CERN openlab to evaluateother possible areas of technological collaboration,with current or future partners. These could be in termsof added functionality and performance of the existingopencluster, using complementary technologies.

Equally promising are collaborative efforts in the areaof open standards and open source software, as well asspecific Grid-related issues – such as Grid security andmobility – that have implications for long-term com-mercial applications of Grids. Detailed technical pro-posals for developing projects in such areas will be pre-pared in the coming year.

CERN openlab has potential as a vehicle for industrialparticipation in the EGEE project and similar EuropeanGrid projects. CERN openlab will endeavour to keepindustrial partners informed about the progress ofthese projects, and investigate ways to enhance collab-oration with them.

Page 23: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

annual report 2002

21

FOR MORE INFORMATION

Executive Contact

Wolfgang von Rüden,Head of the CERN openlab for DataGrid applications,CERN Information Technology Division,[email protected]: +41 22 767 6436

Public Relations

François Grey,CERN openlab developer,CERN Information Technology Division,[email protected]: +41 22 767 1483

Technical Contact

François Fluckiger,Associate Head of the CERN openlab for DataGridapplications,CERN Information Technology Division,[email protected]: +41 22 767 4984

http://www.cern.ch/openlab

This is the second annual report of the CERN openlab for DataGrid applications. It waspresented to the Board of Sponsors at the Annual Sponsors meeting, June 13th 2003.

Participants in the Annual Sponsors Meeting, from left to right: Sverre Jarp (CERN openlab),François Fluckiger (CERN openlab), Pasquale Di Cesare (IBM), Wolfgang von Rüden (CERN open-lab), Tom Hawk (IBM), John Manley (HP), Tom Woodget (Intel), Luciano Maiani (CERN), MichelBenard (HP), Brian Carpenter (IBM), Pierre Mirjolet (Intel), Lou Witkin (HP), Jai Menon (IBM), JimDuley (HP), Les Robertson (CERN LCG), Gérard Vivier (Enterasys Networks), François Grey (CERNopenlab), Markus Nispel (Enterasys Networks), Christoph Kälin (Enterasys Networks).

Page 24: CERN · A word from the Head of CERN openlab The CERN openlab for DataGrid appli-cations is now well on the way ... network of major computer centres around the world

CERNopenlab for DataGrid applications

sponsored by