
S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June 2007 1

ATLAS cluster in Geneva

Szymon Gadomski, University of Geneva and INP Cracow

• the existing system
  − hardware, configuration, use
  − failure of a file server

• the system for the LHC startup
  − draft architecture of the cluster
  − comments


The cluster at Uni Dufour

First batch (2005)
• 12 Sun Fire v20z (wn01 to 12)
  − 2 AMD Opteron 2.4 GHz, 4 GB RAM
• 1 Sun Fire v20z (grid00) with a 7.9 TB disk array
• SLC3

Second batch (2006-2007)
• 21 Sun Fire X2200 (atlas-ui01 to 03 and wn16 to 33)
  − 2 Dual Core AMD Opteron 2.6 GHz, 4 GB RAM
• 1 Sun Fire X4500 with 48 disks of 500 GB each
  − 17.6 TB effective storage (1 RAID 6, 1 RAID 50, 2 RAID 5)
• SLC4

Description at https://twiki.cern.ch/twiki/bin/view/Atlas/GenevaATLASClusterDescription


Diagram of the existing system

• 26 TB and 108 cores in workers and login PCs
• atlas-ui01..03, grid00 and grid02 have a double identity because of the CERN network
• set up by Frederik Orellana


The cluster at Uni Dufour (2)

(Photos by Frederik, Jan 07: the "old" worker nodes and the new ones.)


(Photo of the back side of the racks: disk arrays, CERN network connection, worker nodes.)


Current use (1)

• GRID batch facility in NorduGrid
  − in fact two independent systems:
    • "old" (grid00 + wn01..12), SLC3, older NG
    • "new" (grid02 + wn16..33), SLC4, newer NG
  − runs most of the time for the ATLAS Monte Carlo production, controlled centrally by the "production system" (role of a Tier 2)
  − higher priority for Swiss ATLAS members
    • mapped to a different user ID, using a different batch queue
  − ATLAS releases installed locally and regularly updated
    • some level of maintenance is involved


NorduGrid (ATLAS sites)

• n.b. you can submit to all ATLAS NG resources
• prerequisites
  − a grid certificate
  − read the NG user guide
• easier to deal with than the ATLAS software


Current use (2)

• Interactive work like on lxplus.cern.ch
  − login with the same account ("afs account")
  − see all the software installed on /afs/cern.ch, including the nightly releases
  − use the ATLAS software repository in CVS like on lxplus
    • "cmt co …" works
  − same environment to work with Athena, just follow the "ATLAS workbook"
• Offers the advantages of lxplus without the limitations
  − You should be able to do the same things the same way.
  − Your disk space can easily be ~TB and not ~100 MB; roughly speaking we have a factor of 10^4.
  − Around 18'000 people have accounts on lxplus…


Problems and failure of grid02

The PC
• Sun Fire X4500 "Thumper"
• 48 x 500 GB of disk space, 17 TB effective storage
• 2 Dual Core AMD Opteron processors, 16 GB of RAM
• one Gb network connection to CERN, one Gb to the University network including all worker nodes (not enough)
• SLC4 installed by Frederik last January
• working as a file server, batch server (Torque) and NorduGrid server (manager + ftp + info)

The problems
• batch system stopping often when many jobs are running (18 OK, 36 not OK, 64 definitely not OK)
• high "system load" caused by data copying (scp)
• can take minutes to respond to a simple command

The crash
• More jobs were allowed again last Thursday, to understand the scaling problems. More jobs were sent.
• A few minutes later the system stopped responding. It could not be shut down.
• After a power cycle it is corrupted.

It will be restored under SLC4. What to do in the long run is a good question. Any advice?


Plan of the system for Autumn 2007

• new hardware before end of September
• one idea of the architecture: 60 TB, 176 cores
• some open questions
• comments are welcome


Comments about future evolution

• Interactive work is vital.
  − Everyone needs to log in somewhere. We are not supporting ATLAS software or Grid tools on laptops and desktops.
  − The more we can do interactively, the better for our efficiency.
  − A larger fraction of the cluster will be available for login. Maybe even all of it.
• Plan to remain a Grid site.
  − Bern and Geneva have been playing the role of a Tier 2 in ATLAS. We plan to continue that.
• Data transfers are too unreliable in ATLAS.
  − Need to find ways to make them work much better.
  − Data placement from FZK directly to Geneva would be welcome. No way to do that (LCG→NorduGrid) at the moment.
• Choice of gLite vs NorduGrid is (always) open. It depends on
  − feasibility of installation and maintenance by one person, not full time
  − reliability (file transfers, job submissions)
  − integration with the rest of ATLAS computing
• Colleagues in Geneva working on neutrino experiments would like to use the cluster. We are planning to make it available to them and use priorities.
• Be careful with extrapolations from present experience. The real data volume will be 200x larger than a large Monte Carlo production.


Higher level GRID tools

• You can always use the (Nordu) GRID directly:
  − make your own .xrsl file (see next page)
  − submit your jobs and get back the results (ngsub, ngstat, ngcat, ngls)
  − write your own scripts to manage this when you have 100 or 1000 similar jobs to submit, each one with different input data (see the sketch after this list)
  − many people work like that now
  − if you like to work like that, fewer problems for me

• I think we will use higher level tools which:
  − operate on data sets and not on files
  − know about ATLAS catalogues of data sets
  − automate the process: make many similar jobs and look after them (split the input data, merge the output)
  − work with many Grids

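To make the "write your own scripts" point above concrete, here is a minimal sketch in Python of how one could loop over input files, fill an xRSL template and submit each job with ngsub. The template file name, its placeholder fields, the cluster name and the input file pattern are all assumptions made for this example (a real template would resemble the .xrsl shown on the next page), and the ngsub options -c (cluster) and -f (xRSL file) are those of the classic NorduGrid client and may differ in other versions.

  #!/usr/bin/env python
  # Sketch only: generate one xRSL file per input data file and submit it with ngsub.
  # "job_template.xrsl", the %(input)s / %(jobname)s placeholders, the cluster name
  # and the input file pattern are hypothetical; adapt them to the real setup.
  import glob
  import subprocess

  template = open("job_template.xrsl").read()   # assumed to contain %(input)s and %(jobname)s
  cluster = "grid02.unige.ch"                   # assumed front-end host name

  for path in sorted(glob.glob("input_data/*.unw")):
      jobname = path.split("/")[-1].rsplit(".", 1)[0]
      xrsl_file = jobname + ".xrsl"
      with open(xrsl_file, "w") as f:
          f.write(template % {"input": path, "jobname": jobname})
      # ngsub prints a job ID on success; ngstat, ngcat and ngls can be used later
      if subprocess.call(["ngsub", "-c", cluster, "-f", xrsl_file]) != 0:
          print("submission failed for %s" % xrsl_file)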


An example of the job description in .xrsl

&("executable" = "SUSYWG_prewrapper.sh" )("inputFiles" = ("SUSYWG_wrapper.sh" "SUSYWG_wrapper.sh" ) ("SUSYWG_prewrapper.sh" "SUSYWG_prewrapper.sh" ) ("SUSYWG_csc_evgen_trf.sh" "SUSYWG_csc_evgen_trf.sh" ) ("SUSYWG_csc_atlfast_trf.sh" "SUSYWG_csc_atlfast_trf.sh" ) ("SUSYWG_SetCorrectMaxEvents.py" "SUSYWG_SetCorrectMaxEvents.py" ) ("z1jckkw_nunu_PT40_01.001.unw" "gsiftp://lheppc50.unibe.ch/se1/borgeg/IN_FILES/zNj_nnPT40_in_noEF/z1jckkw_nunu_PT40_01.001.unw" ) ("z1jckkw_nunu_PT40_01.001_unw.par" "gsiftp://lheppc50.unibe.ch/se1/borgeg/IN_FILES/zNj_nnPT40_in_noEF/z1jckkw_nunu_PT40_01.001_unw.par.ok" ) ("SUSYWG_DC3.005538.AlpgenJimmyToplnlnNp3.py" "SUSYWG_DC3.005538.AlpgenJimmyToplnlnNp3.py" ) ("SUSYWG_csc_atlfast_log.py" "SUSYWG_csc_atlfast_log.py" ) )("outputFiles" = ("z1jckkw_nunu_PT40_01.001.Atlfast12064.NTUP.root" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_01.001.Atlfast12064.NTUP.root" ) ("z1jckkw_nunu_PT40_01.001.Atlfast12064.AOD.pool.root" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_01.001.Atlfast12064.AOD.pool.root" ) ("z1jckkw_nunu_PT40_01.001.Atlfast12064.log" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_01.001.Atlfast12064.log" ) ("z1jckkw_nunu_PT40_01.001.Atlfast12064.logdir.tar.gz" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_01.001.Atlfast12064.logdir.tar.gz" ) )("memory" = "1000" )("disk" = "5000" )("runTimeEnvironment" = "APPS/HEP/ATLAS-12.0.6.4_COTRANSFER" )("stdout" = "stdout" )("stderr" = "stderr" )("gmlog" = "gmlog" )("CPUTime" = "2000" )("jobName" = "noEF_z1jckkw_nunu_PT40_01.001.Atlfast12064.0" ) ) )

May 07 02:01:18 RELATION: jobName = SEQUENCE( LITERAL( noEF_z1jckkw_nunu_PT40_01.001.Atlfast12064.0 ) )


Higher level GRID tools (2)

• GridPilot
  − a Graphical User Interface (GUI) in Java
  − developed by Frederik Orellana and Cyril Topfel
  − only works with NorduGrid so far
  − used so far for
    • beam test analysis and simulation
    • as a tool to get data to Bern and to Geneva
• Ganga
  − a command line tool (in Python) or a GUI
  − an "official" development, well supported
  − several people, including Till, have used it with success in the command line mode (see the sketch after this list)
  − can work with NorduGrid, LCG and a plain batch system
• pathena
  − the easiest to use, you run pathena like athena
  − developed by Tadashi Maeno, BNL
  − needs a server that does it all and a pilot process on every node
  − for the moment the only server is at BNL, but Tadashi will move here…

• The plan is to try Ganga and GridPilot on some real cases and see how we like them.
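For orientation, a minimal sketch of what a job looks like in Ganga's Python interface, as mentioned in the Ganga bullet above. It is meant to be typed inside a "ganga" session, where Job, Executable and Local are predefined. The executable and its arguments are placeholders, and the Local() backend is used only because it needs no Grid setup; for real work one would switch to the NorduGrid or LCG backend plugin of the ATLAS Ganga distribution (not shown here).

  # Minimal Ganga sketch (run inside a "ganga" session, where Job, Executable
  # and Local are provided by Ganga itself). The executable and its arguments
  # are hypothetical placeholders.
  j = Job()
  j.name = "hello-test"
  j.application = Executable(exe="/bin/echo", args=["hello from the Geneva cluster"])
  j.backend = Local()   # replace with the NorduGrid/LCG backend plugin for Grid submission
  j.submit()

  # afterwards:
  #   jobs        # lists all jobs and their status
  #   j.status    # e.g. 'submitted', 'running', 'completed'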


Summary

• The ATLAS cluster in Geneva is a large Tier 3
  − now 108 cores and 26 TB
  − in a few months 178 cores and 70 TB

• In NorduGrid for a couple of years now; runs ATLAS simulation like a Tier 2. Plan to continue that.

• Recently more interactive use by the Geneva group; we plan to continue this and develop it further.

• Not so good experience with Sun Fire X4500 running Scientific Linux. Need to decide if one should change to Solaris.

• Choice of middleware can be discussed.
• Room for more exchange of experience between the Swiss WLCG experts: hardware, cluster architecture, OS, cluster management tools, middleware, data transfer tools.


Backup slides


NorduGrid and LCG

• When we started with the Grid in 2004, the LCG was not an option
  − Not applicable to small clusters like Bern ATLAS and Geneva ATLAS. You would need 4 machines to run the services.
  − You need to reinstall the system and have SLC4. Not possible on the Ubelix cluster in Bern.
  − Support was given to large sites only.
• The situation now
  − NG in Bern and Geneva, NG + LCG in Manno
  − Ubelix (our biggest resource) will never move to SLC4.
  − Our clusters are on SLC4 and have become larger, but a sacrifice of 4 machines would still be significant.
  − The change would be disruptive and might not pay off in the long term. NG still has a better reputation as far as stability is concerned. We see it as a low(er) maintenance solution.
  − Higher level Grid tools promise to hide the differences from the users.

• Do not change now, but reevaluate the arguments regularly.


(Slide from Dario Barberis, "ATLAS Computing Plans for 2007", WLCG Workshop, 22 January 2007)

Computing Model: central operations

Tier-0:
• Copy RAW data to Castor tape for archival
• Copy RAW data to Tier-1s for storage and subsequent reprocessing
• Run first-pass calibration/alignment (within 24 hrs)
• Run first-pass reconstruction (within 48 hrs)
• Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s

Tier-1s:
• Store and take care of a fraction of RAW data (forever)
• Run "slow" calibration/alignment procedures
• Rerun reconstruction with better calib/align and/or algorithms
• Distribute reconstruction output (AODs, TAGs, part of ESDs) to Tier-2s
• Keep current versions of ESDs and AODs on disk for analysis

Tier-2s:
• Run simulation (and calibration/alignment when appropriate)
• Keep current versions of AODs on disk for analysis


General comments

• The tier 3 is not counted for the ATLAS “central operations”. We can use it as we prefer.

• We have done quite a lot of Tier 2 duty already. We are not obliged to. It makes sense to keep the system working, but our own work has a higher priority.

• How are we going to work when the data comes? Be careful when you extrapolate from your current experience.
  − A large simulation effort in ATLAS (the Streaming Test) produces "10 runs of 30 minute duration at 200 Hz event rate (3.6 million events)". That is ~10^-3 of the data sample produced by ATLAS in one year (at 200 Hz, a nominal year of ~10^7 s of data taking gives of order 10^9 events). Need to prepare for a much larger scale.

• FDR tests of the Computing this Summer. Need to connect to them, exercise the flow of data from Manno, FZK and CERN. Also exercise sending our jobs to the data and collecting the output.


To Do list, for discussion

• see with Olivier if GridPilot can work for him
• one way of working with Ganga on our cluster
  − from lxplus only grid00-cern is visible and the ngsub command has a problem with the double identity
  − on atlas-ui01 there is a library missing
• fix the Swiss ATLAS computing documentation pages
  − many pages obsolete, a clearer structure is needed
  − need updated information about Grid tools and ATLAS software
• exercise data transfers
  − from CERN (T0), Karlsruhe (T1) and Manno (T2)
  − in a few different ways and with different tools

• if the thermal modeling for the SCT goes ahead, help them to use the cluster


To Do list, for discussion (2)

• Medium term
  − consolidate system administration
    • see if we can make routing more efficient
    • make all software run on atlas-ui01, then make all the UIs identical
    • update the system on the new workers
    • when ATLAS is ready, change the old workers to SLC4
    • set up some monitoring of the cluster
  − prepare the next round of hardware
    • possibly evaluate prototypes from Sun
  − get Manno involved in the "FDR" (tests of the ATLAS computing model)
  − decide about Ganga vs GridPilot, provide instructions on how to use one
• Longer term
  − try pathena when available at CERN
  − try PROOF
  − set up a web server
  − try the short-lived credential service of SWITCH

Help would be welcome!


Some open questions

• Local accounts vs afs accounts.
• Direct use of the batch system

– with standard command-line tools or as back-end to Ganga

– would require some work (queues in the batch system, user accounts on every worker) but can be done (see the sketch after this list)

• Use of the system by the Neutrino group.
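As a concrete illustration of the "direct use of the batch system" item above, a minimal sketch of a direct Torque submission, driven from Python. The queue name, the walltime and the script body are hypothetical placeholders, not the actual configuration of grid02; only qsub itself and the standard PBS directives are assumed.

  # Sketch only: write a PBS script and submit it directly to Torque with qsub.
  # The queue name "atlas-geneva", the walltime and the script body are
  # hypothetical placeholders.
  import subprocess

  script = """#!/bin/sh
  #PBS -N athena-test
  #PBS -q atlas-geneva
  #PBS -l walltime=02:00:00
  cd $PBS_O_WORKDIR
  echo "running on $(hostname)"
  # ... set up the ATLAS release and run athena here ...
  """

  with open("job.pbs", "w") as f:
      f.write(script)

  # qsub prints the job identifier; qstat can be used to follow the job
  subprocess.call(["qsub", "job.pbs"])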