PHOBOS Computing in the pre-Grid era


PHOBOS Computing in the pre-Grid era

Burt Holzman

Head of Computing

PHOBOS Experiment

Outline

• “Collision Physics”

• Computing with PHOBOS: now

• Computing with PHOBOS: soon

Acknowledgments

PHOBOS Computing represents years of hard work by many people, including but not limited to:

Marten Ballintijn, Mark Baker, Patrick Decowski, Nigel George, George Heintzelman, Judith Katzy, Andrzej Olszewski, Gunther Roland, Peter Steinberg, Krzysztof Wozniak, Jinlong Zhang and the gang at the RCF

Collision Physics

We (HEP/Nuclear/RHIC) collide billions (and billions...) of:

• electrons vs. positrons

• antiprotons vs. protons

• nuclei vs. nuclei

• ... and a whole lot more

Collision Physics

Collision physics is ideal for parallel computing: each collision (“event”) is independent!

Animation courtesy of UrQMD group
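Because each event is independent, the workload is embarrassingly parallel. Below is a minimal sketch (not PHOBOS code; processEvent is a made-up stand-in for reconstruction) of how a dataset splits into non-communicating batch jobs:

// Minimal sketch (not PHOBOS code): since every event is independent, a
// dataset can be split into ranges handled by separate, non-communicating
// batch jobs.  processEvent() is a made-up stand-in for reconstruction.
#include <cstdio>
#include <cstdlib>

static void processEvent(long event) {
    // ... reconstruction or analysis of one collision would go here ...
    (void)event;
}

int main(int argc, char** argv) {
    // Each batch job is handed its own [first, last) slice of the run,
    // e.g. on the command line; no job needs to talk to any other job.
    long first = (argc > 1) ? std::atol(argv[1]) : 0;
    long last  = (argc > 2) ? std::atol(argv[2]) : 1000;
    for (long e = first; e < last; ++e)
        processEvent(e);
    std::printf("processed events [%ld, %ld)\n", first, last);
    return 0;
}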

Collision Physics

... and we collect millions of events per day

RHIC Needs Power

[Event displays: DELPHI (e+ e-) vs. STAR (Au+Au)]

PHOBOS Detector

[Detector schematic, labeled components:]

• Cerenkov Trigger Counters
• Time of Flight Counters
• Octagon Multiplicity Detector + Vertex Detector (Silicon)
• Spectrometer Detectors (Silicon)
• Beryllium Beam Pipe
• Paddle Trigger Counters
• Magnet (top part removed)
• Ring Multiplicity Detectors (Silicon)

PHOBOS Computing Needs

• Operations (Counting House)

• Simulations (“Monte Carlo”)
– Must understand detector response and backgrounds

• Production
– Reconstruct event vertex
– Reconstruct tracks

• Analysis

Operations/Counting House

• Data Acquisition
– PR01-PR04: quad Sparc + SCSI RAID
– PR05: Big River TK200 data recorder

• Slow Controls
– Windows NT running LabVIEW

• Online Monitoring

• Datamover

Online Monitoring

[Diagram: the DAQ sends raw events to TPhOnDistributor, which distributes them over TPhSocketDataFile01 connections to multiple TPhOnDistClient processes]

DistClients do distributed processing of events (HitProcessing, …)

“Datamover” design

• Run reconstruction ASAP on data

• This implies:
– Run calibrations (“pedestals”) in the Counting House
– Run reconstruction from HPSS disk cache, not HPSS tape

PR00 datamover

• Users ran calibrations by hand

• Users sank data by hand

• OK for a really slow DAQ

PR01-PR04 datamover

[Diagram: DAQ → DAQ DISK (SCSI) → vmesparc (SCSI); daemons: ped_daemon, daq_daemon, deletor_daemon; pedestals and online processes; transfers via NFS and FTP to HPSS]

online tells pedestals when to run
pedestals tells vmesparc when to sink
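A hedged sketch of the ordering above only: a finished run triggers the pedestal pass, which in turn triggers the sink to HPSS. Paths, the .done flag convention, and the helper names are invented; this is not ped_daemon or daq_daemon.

// Hedged sketch of the handshake above ("online" finishes a run, then
// pedestals run, then the raw file is sunk).  All names are illustrative.
#include <chrono>
#include <filesystem>
#include <iostream>
#include <thread>

namespace fs = std::filesystem;

static bool runPedestals(const fs::path& raw) {             // stand-in for the pedestal job
    std::cout << "pedestals: " << raw << '\n';
    return true;
}

static void queueForSink(const fs::path& raw) {             // stand-in for the FTP/HPSS sink
    std::cout << "sink to HPSS cache: " << raw << '\n';
}

int main() {
    const fs::path daqDisk = "/daq/data";                   // hypothetical DAQ disk mount
    while (true) {
        if (fs::exists(daqDisk)) {
            for (const auto& entry : fs::directory_iterator(daqDisk)) {
                if (entry.path().extension() != ".done") continue;  // "online" marks finished runs
                fs::path raw = entry.path();
                raw.replace_extension(".dat");               // matching raw data file (assumed naming)
                if (runPedestals(raw))                       // online tells pedestals when to run
                    queueForSink(raw);                       // pedestals then triggers the sink
                fs::remove(entry.path());
            }
        }
        std::this_thread::sleep_for(std::chrono::seconds(30));
    }
}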

PR05 datamover

[Diagram: DAQ → DAQ DISK; daemons: ped_daemon, daq_daemon; pedestals and online processes; transfers via NFS and FTP to HPSS; ramdisk]

online tells pedestals when to run
pedestals tells vmesparc when to sink

Vertex Reconstruction

[Figure: hit counts vs. z (cm) in the two Si vertex layers, along the +z direction]

Vertex resolution: σx ~ 450 μm, σy ~ σz ~ 200 μm

For this event: vertex @ Z = -0.054 cm

Combinatorics can be very expensive
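To see why: an illustrative sketch (toy geometry, not the PhAT vertex code) of a simple two-layer pairing scheme consistent with the figure. Every inner-layer hit is paired with every outer-layer hit, so the work grows as the product of the hit counts.

// Illustrative sketch only: pair every hit in an inner Si layer with every
// hit in an outer layer, project each pair back to the beam axis, and
// histogram the resulting z.  Cost is O(N_inner * N_outer).
#include <cstdio>
#include <vector>

struct Hit { double r, z; };                     // radius and z of a silicon hit (cm)

static double zAtBeamline(const Hit& a, const Hit& b) {
    double slope = (b.z - a.z) / (b.r - a.r);    // straight line through the two hits
    return a.z - slope * a.r;                    // evaluated at r = 0
}

int main() {
    std::vector<Hit> inner = { {5.0, -0.10}, {5.0, 3.2} };   // toy hits
    std::vector<Hit> outer = { {7.4, -0.05}, {7.4, 4.9} };

    std::vector<int> zHist(200, 0);              // 200 bins of 0.2 cm over [-20, 20) cm
    for (const Hit& a : inner)
        for (const Hit& b : outer) {             // N_inner x N_outer combinations
            double z = zAtBeamline(a, b);
            if (z >= -20.0 && z < 20.0)
                ++zHist[static_cast<int>((z + 20.0) / 0.2)];
        }

    int best = 0;                                // the true vertex shows up as a peak,
    for (int i = 1; i < 200; ++i)                // random pairings as flat background
        if (zHist[i] > zHist[best]) best = i;
    std::printf("vertex candidate near z = %.1f cm\n", -20.0 + 0.2 * (best + 0.5));
    return 0;
}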

Particle Tracking

1. Road-following algorithm finds straight tracks in field-free region
2. Curved tracks in B-field found by clusters in (1/p, …) space
3. Match pieces by …, consistency in dE/dx and fit in yz-plane
4. Covariance Matrix Track Fit for momentum reconstruction and ghost rejection

[Figure: spectrometer hit map (scale ×10 cm) showing regions 1 and 2, the Byz field region, and the beam direction]
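A hedged sketch of step 1 (road-following for straight tracks in the field-free region); geometry, road width, and hit structure are made up, and this is not the PhAT tracker.

// Generic road-following pass: seed with hit pairs in the first two planes,
// keep a seed only if every downstream plane has a hit inside the road.
#include <cmath>
#include <vector>

struct Hit { double x, y; };                        // hit position in one Si plane
using Plane = std::vector<Hit>;

struct Track { double slope, intercept; };          // straight line y = slope*x + intercept

static std::vector<Track> findStraightTracks(const std::vector<Plane>& planes,
                                             double roadHalfWidth = 0.1 /* cm, illustrative */) {
    std::vector<Track> tracks;
    if (planes.size() < 3) return tracks;
    for (const Hit& h0 : planes[0])                 // seed with a hit pair in the first two planes
        for (const Hit& h1 : planes[1]) {
            double slope     = (h1.y - h0.y) / (h1.x - h0.x);
            double intercept = h0.y - slope * h0.x;
            bool ok = true;                         // require a hit inside the road in every
            for (std::size_t p = 2; p < planes.size() && ok; ++p) {   // downstream plane
                ok = false;
                for (const Hit& h : planes[p])
                    if (std::fabs(h.y - (slope * h.x + intercept)) < roadHalfWidth) { ok = true; break; }
            }
            if (ok) tracks.push_back({slope, intercept});
        }
    return tracks;
}

int main() {
    std::vector<Plane> planes = { { {10, 1.0} }, { {20, 2.0} }, { {30, 3.05} }, { {40, 3.95} } };
    return findStraightTracks(planes).empty() ? 1 : 0;   // toy hits lie on one road
}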

Size of the PHOBOS Farm

pharm:
• 6 dual 733 MHz
• 6 dual 933 MHz
• 11 dual 1733 MHz

RHIC Computing Facility:
• 30 dual 450 MHz
• 44 dual 800 MHz
• 63 dual 1000 MHz
• 26 dual 1400 MHz
• 98 dual 2400 MHz
• 80 dual 3060 MHz

All nodes have 2-4 big disks attached

LSF Batch (for analysis & simulations)

QUEUE_NAME     PRIO STATUS       MAX JL/U JL/P JL/H NJOBS PEND  RUN SUSP
phslow_hi       40  Open:Active   -   10   -    -      0    0    0    0
phcas_hi        40  Open:Active   -   10   -    -      0    0    0    0
phcasfast_hi    40  Open:Active   -   10   -    -      0    0    0    0
phcrs_hi        40  Open:Active   -   10   -    -      0    0    0    0
phslow_med      30  Open:Active   -   45   -    -      0    0    0    0
phcas_med       30  Open:Active   -   70   -    -      0    0    0    0
phcasfast_med   30  Open:Active   -   45   -    -      0    0    0    0
phcrs_med       30  Open:Active   -   70   -    -      0    0    0    0
phslow_lo       10  Open:Active   -    -   -    -      0    0    0    0
phcas_lo        10  Open:Active   -    -   -    -      0    0    0    0
phcasfast_lo    10  Open:Active   -    -   -    -      0    0    0    0
phcrs_lo        10  Open:Active   -    -   -    -     90    0   90    0
phslow_mc        5  Open:Active   -    -   -    -      0    0    0    0
phcas_mc         5  Open:Active   -    -   -    -    248   30  213    5
phcasfast_mc     5  Open:Active   -    -   -    -    105   51   54    0
phcrs_mc         5  Open:Active   -    -   -    -    386  189  108   89

CRS Batch (for Production)

Monitoring: Ganglia

http://ganglia.sourceforge.net

ROOT

• PHOBOS Software (“PhAT”) built on the ROOT (http://root.cern.ch) framework
– Automated I/O of objects
– Very efficient data formats (trees)
– C++ interpreter (!)
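A minimal ROOT sketch of the points above: objects are streamed to file automatically, and a TTree holds the per-event data. The file, tree, and branch names are made up for illustration.

// Toy example: write a tree of fake events to a ROOT file.
#include "TFile.h"
#include "TTree.h"
#include "TRandom.h"

void treeExample() {
    TFile f("example.root", "RECREATE");            // hypothetical output file
    TTree t("events", "toy event tree");
    Float_t vz    = 0;                              // e.g. a vertex z position
    Int_t   nhits = 0;
    t.Branch("vz", &vz, "vz/F");                    // ROOT handles the on-disk layout
    t.Branch("nhits", &nhits, "nhits/I");
    for (Int_t i = 0; i < 1000; ++i) {
        vz    = gRandom->Gaus(0, 10);
        nhits = gRandom->Poisson(100);
        t.Fill();
    }
    t.Write();                                      // object I/O: the tree streams itself
    f.Close();
}

Thanks to the C++ interpreter, a macro like this runs directly from the ROOT prompt (root -l treeExample.C).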

rootd

• ROOT files accessible via rootd server-side daemon over 100 Mb/s network

• Can be inefficient: data is remote to analysis job (but OK if CPU is bottleneck)
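Access through rootd looks the same to the analysis code as local access; the node name, path, tree, and branch below are placeholders. TFile::Open resolves the root:// URL to a network file served by the rootd daemon.

// Sketch of reading a tree through rootd instead of local disk.
#include "TFile.h"
#include "TTree.h"

void readRemote() {
    TFile* f = TFile::Open("root://rcrs4001//data/run1234.root");  // hypothetical server and path
    if (!f || f->IsZombie()) return;                // daemon or network problem
    TTree* t = nullptr;
    f->GetObject("events", t);                      // assumed tree name
    if (t) t->Draw("vz");                           // every byte read crosses the 100 Mb/s link
    f->Close();
}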

Distributed Disk

[Diagram: farm nodes rcrs4001, rcrs4002, rcrs4003, rcrs4004 serve their local disks via rootd; an ORACLE DB holds the file catalog]

Distributed Disk: CatWeb

• Web-based front-end

• Oracle database back-end

• Allows users to stage files to distributed disk via dedicated LSF queue

Distributed Disk: CatWeb

[CatWeb screenshot: 88 TB of distributed disk]

PHOBOS: Soon

• Condor to replace LSF batch

• Pros:
– Free (LSF costs big bucks)
– Grid-ready
– PROOF-ready

• Cons:
– No “queue” concept (hackish solutions may exist)

PHOBOS: Soon

• PROOF (Parallel ROOT Facility)
– Brings the process to the data!

– In use now on pharm, soon on RCF

– Hooks in with Condor (specifically: Condor On Demand, preempting all other activity)
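For reference, a hedged sketch of what an "ana.C" selector might look like; the branch and histogram names are illustrative, not the real PHOBOS analysis. The same selector runs locally via tree.Process("ana.C") or in parallel on PROOF slaves via chain.Process("ana.C").

// Toy selector: fill a vertex-z histogram; PROOF merges the output list.
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class ana : public TSelector {
public:
    TTree*  fChain = nullptr;
    TH1F*   fVz    = nullptr;
    Float_t vz     = 0;

    void   Init(TTree* tree) override {             // a (new) tree/chain element is attached
        fChain = tree;
        fChain->SetBranchAddress("vz", &vz);        // assumed branch name
    }
    void   SlaveBegin(TTree*) override {            // once per PROOF slave (or once locally)
        fVz = new TH1F("vz", "vertex z;z (cm);events", 100, -50, 50);
        fOutput->Add(fVz);                          // output list is merged back to the client
    }
    Bool_t Process(Long64_t entry) override {       // called for every event
        fChain->GetEntry(entry);
        fVz->Fill(vz);
        return kTRUE;
    }
    void   Terminate() override {                   // back on the client, after the merge
        if (TH1F* h = (TH1F*)fOutput->FindObject("vz")) h->Draw();
    }
    ClassDef(ana, 0);
};

In practice the selector would typically be compiled by ACLiC (Process("ana.C+")) so the PROOF slaves get a proper dictionary for the class.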

PROOF in action

[Diagram: a root session on a local PC sends ana.C to a remote PROOF cluster (one proof master server plus proof slave servers on node1-node4, each holding local *.root files) and receives stdout/objects back; files are opened via TFile/TNetFile]

$ root
root [0] tree.Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] chain.Process("ana.C")

#proof.conf
slave node1
slave node2
slave node3
slave node4