PHOBOS Computing in the pre-Grid era
Burt Holzman
Head of Computing
PHOBOS Experiment
Outline
• “Collision Physics”
• Computing with PHOBOS: now
• Computing with PHOBOS: soon
Acknowledgments
PHOBOS Computing represents years of hard work by many people, including but not limited to
Marten Ballintijn, Mark Baker, Patrick Decowski, Nigel George, George Heintzelman, Judith Katzy, Andrzej Olszewski, Gunther Roland, Peter Steinberg, Krzysztof Wozniak, Jinlong Zhang and the gang at the RCF
Collision Physics
We (HEP/Nuclear/RHIC) collide billions (and billions...) of
• electrons vs. positrons
• antiprotons vs. protons
• nuclei vs. nuclei
• ... and a whole lot more
Collision Physics
Collision physics is ideal for parallel computing: each collision (“event”) is independent!
Animation courtesy of UrQMD group
Collision Physics
... and we collect millions of events per day
RHIC Needs Power
(Figures: an event from DELPHI (e+ e-) next to an event from STAR (Au Au).)
PHOBOS Detector
(Figure: detector layout with labeled components)
• Cerenkov Trigger Counters
• Time of Flight Counters
• Octagon Multiplicity Detector + Vertex Detector (Silicon)
• Spectrometer Detectors (Silicon)
• Beryllium Beam Pipe
• Paddle Trigger Counters
• Magnet (top part removed)
• Ring Multiplicity Detectors (Silicon)
PHOBOS Computing Needs
• Operations (Counting House)
• Simulations (“Monte Carlo”)
– Must understand detector response and backgrounds
• Production
– Reconstruct event vertex
– Reconstruct tracks
• Analysis
Operations/Counting House
• Data Acquisition
– PR01-PR04: quad Sparc + SCSI RAID
– PR05: Big River TK200 data recorder
• Slow Controls
– Windows NT running LabVIEW
• Online Monitoring
• Datamover
Online Monitoring
(Diagram: the DAQ feeds raw events to a TPhOnDistributor, which passes events over TPhSocketDataFile01 connections to several TPhOnDistClient processes.)
DistClients do distributed processing of events (HitProcessing, …)
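The TPhOnDistributor/TPhOnDistClient classes are PHOBOS-specific, but the fan-out pattern in the diagram can be sketched with ROOT's generic socket classes; the port number, round-robin policy, and event handling below are illustrative assumptions, not the actual PhAT interfaces.

// Sketch of an event distributor using plain ROOT sockets: accept a fixed
// number of monitoring clients, then hand each raw event to one of them in
// turn so the clients share the processing load.
#include "TServerSocket.h"
#include "TSocket.h"
#include "TMessage.h"
#include "TObjArray.h"

void distribute_events(Int_t nClients = 3) {
   TServerSocket server(9090, kTRUE);            // assumed port
   TObjArray clients;
   for (Int_t i = 0; i < nClients; ++i)
      clients.Add(server.Accept());              // wait for the DistClients to connect

   Long64_t nEvents = 0;
   while (true) {
      // ... obtain the next raw event object from the DAQ stream (not shown) ...
      TMessage msg(kMESS_OBJECT);
      // msg.WriteObject(rawEvent);               // serialize the event
      TSocket *dest = (TSocket*) clients.At(nEvents++ % nClients);
      dest->Send(msg);                            // each client gets a share of the events
   }
}

// Sketch of a client: connect, receive events, run the hit processing.
void monitor_client() {
   TSocket sock("counting-house-host", 9090);    // placeholder hostname
   TMessage *msg = nullptr;
   while (sock.Recv(msg) > 0) {
      // TObject *evt = msg->ReadObject(msg->GetClass());
      // ... HitProcessing, histogram filling, etc. ...
      delete msg;
      msg = nullptr;
   }
}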
“Datamover” design
• Run reconstruction ASAP on data
• This implies:
– Run calibrations (“pedestals”) in Counting House
– Run reconstruction from HPSS disk cache, not HPSS tape
PR00 datamover
• Users ran calibrations by hand
• Users sank data by hand
• OK for a really slow DAQ
PR01-PR04 datamover
(Diagram: DAQ → DAQ disk over SCSI → vmesparc; pedestals machine mounts the disk via NFS; ped_daemon, daq_daemon, and deletor_daemon; FTP into HPSS.)
online tells pedestals when to run; pedestals tells vmesparc when to sink
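The daemon hand-off above can be illustrated with a minimal polling loop; the spool directory, file suffix, and the run_pedestals/sink_to_hpss commands below are hypothetical stand-ins for the real ped_daemon/daq_daemon logic, which the slide does not show.

// Minimal sketch of a datamover-style polling daemon: watch the DAQ disk
// for finished runs, run the pedestal calibration, then push the file to
// the HPSS disk cache. Paths and commands are illustrative assumptions.
#include <dirent.h>
#include <unistd.h>
#include <cstdlib>
#include <string>
#include <set>

int main() {
   const std::string spool = "/daqdisk/spool";        // hypothetical DAQ disk area
   std::set<std::string> done;                        // runs already handled

   while (true) {
      if (DIR *dir = opendir(spool.c_str())) {
         while (dirent *e = readdir(dir)) {
            std::string f = e->d_name;
            if (f.size() < 4 || f.compare(f.size() - 4, 4, ".daq") != 0) continue;
            if (!done.insert(f).second) continue;      // skip files already moved

            std::string path = spool + "/" + f;
            // "online tells pedestals when to run":
            std::system(("run_pedestals " + path).c_str());   // hypothetical command
            // "pedestals tells vmesparc when to sink":
            std::system(("sink_to_hpss " + path).c_str());    // hypothetical FTP into the HPSS cache
         }
         closedir(dir);
      }
      sleep(30);                                       // poll periodically
   }
}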
PR05 datamover
(Diagram: DAQ → DAQ disk; pedestals via NFS; ped_daemon and daq_daemon; FTP into HPSS, with a ramdisk in the chain.)
online tells pedestals when to run; pedestals tells vmesparc when to sink
Vertex Reconstruction
(Figure: counts vs. z (cm) for hit combinations from the two Si layers along +z.)
Vertex resolution: σx ~ 450 μm, σy ~ σz ~ 200 μm
For this event: vertex at z = -0.054 cm
Combinatorics can be very expensive
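The counts-vs.-z plot suggests a combinatorial approach: every pair of hits from the two Si layers is extrapolated to the beam axis and the z-intercepts are histogrammed, with the true vertex showing up as a peak over the combinatorial background. The sketch below illustrates that generic technique with assumed hit and binning choices; it is not the PhAT vertex finder.

// Sketch of a combinatorial z-vertex finder: pair hits from two silicon
// layers, extrapolate each pair's straight line to the beam axis, histogram
// the z-intercepts, and take the peak. Types and geometry are assumptions.
#include <vector>
#include <algorithm>

struct Hit { double r, z; };   // radial distance from beam axis, position along beam

// z where the line through (r1,z1) and (r2,z2) crosses r = 0 (the beam axis)
double zIntercept(const Hit &a, const Hit &b) {
   return a.z - a.r * (b.z - a.z) / (b.r - a.r);
}

double findVertexZ(const std::vector<Hit> &layer1,
                   const std::vector<Hit> &layer2) {
   const int    nbins = 400;
   const double zmin = -20.0, zmax = 20.0;   // cm, assumed window around the nominal vertex
   std::vector<int> counts(nbins, 0);

   // N1 x N2 combinations: this is the combinatorics that gets expensive
   for (const Hit &a : layer1)
      for (const Hit &b : layer2) {
         double z = zIntercept(a, b);
         if (z <= zmin || z >= zmax) continue;
         counts[int((z - zmin) / (zmax - zmin) * nbins)]++;
      }

   // true pairs pile up at the vertex; random pairs form a flat background
   int peak = std::max_element(counts.begin(), counts.end()) - counts.begin();
   return zmin + (peak + 0.5) * (zmax - zmin) / nbins;
}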
Particle Tracking
1. Road-following algorithm finds straight tracks in field-free region
2. Curved tracks in B-field found by clusters in (1/p, angle) space
3. Match pieces by angle, consistency in dE/dx and fit in yz-plane
4. Covariance Matrix Track Fit for momentum reconstruction and ghost rejection
(Figure: spectrometer tracking view; axis in units of 10 cm; regions 1 and 2, the B field in the yz-plane, and the beam direction are marked.)
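Step 1 can be sketched generically as shown below: seed a straight line from hits in the first two spectrometer planes and collect hits that fall inside a fixed road in the remaining planes. The plane/hit layout, road width, and hit-count cut are assumptions for illustration, not PhAT's tracker.

// Sketch of road following for straight tracks in the field-free region:
// seed with a hit pair from planes 0 and 1, extend the line through the
// remaining planes, and keep the nearest hit inside a fixed road.
#include <vector>
#include <cmath>

struct Hit   { double x, y; };                 // position in a detector plane
struct Plane { double z; std::vector<Hit> hits; };
struct Track { std::vector<const Hit*> hits; };

std::vector<Track> roadFollow(const std::vector<Plane> &planes,
                              double road = 0.2 /* cm, assumed */) {
   std::vector<Track> tracks;                   // assumes planes.size() >= 2
   for (const Hit &h0 : planes[0].hits)
      for (const Hit &h1 : planes[1].hits) {
         // straight-line seed through the first two planes
         double dz = planes[1].z - planes[0].z;
         double sx = (h1.x - h0.x) / dz, sy = (h1.y - h0.y) / dz;

         Track t;
         t.hits = { &h0, &h1 };
         for (size_t p = 2; p < planes.size(); ++p) {
            double xp = h0.x + sx * (planes[p].z - planes[0].z);   // predicted position
            double yp = h0.y + sy * (planes[p].z - planes[0].z);
            const Hit *best = nullptr;
            double bestd = road;
            for (const Hit &h : planes[p].hits) {                  // nearest hit inside the road
               double d = std::hypot(h.x - xp, h.y - yp);
               if (d < bestd) { bestd = d; best = &h; }
            }
            if (best) t.hits.push_back(best);
         }
         if (t.hits.size() >= 4) tracks.push_back(t);              // require enough hits on the road
      }
   return tracks;
}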
Size of the PHOBOS Farm
pharm
• 6 dual 733 MHz
• 6 dual 933 MHz
• 11 dual 1733 MHz
RHIC Computing Facility
• 30 dual 450 MHz
• 44 dual 800 MHz
• 63 dual 1000 MHz
• 26 dual 1400 MHz
• 98 dual 2400 MHz
• 80 dual 3060 MHz
All nodes have 2-4 big disks attached
LSF Batch
QUEUE_NAME      PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
phslow_hi        40  Open:Active  -   10   -    -     0    0    0    0
phcas_hi         40  Open:Active  -   10   -    -     0    0    0    0
phcasfast_hi     40  Open:Active  -   10   -    -     0    0    0    0
phcrs_hi         40  Open:Active  -   10   -    -     0    0    0    0
phslow_med       30  Open:Active  -   45   -    -     0    0    0    0
phcas_med        30  Open:Active  -   70   -    -     0    0    0    0
phcasfast_med    30  Open:Active  -   45   -    -     0    0    0    0
phcrs_med        30  Open:Active  -   70   -    -     0    0    0    0
phslow_lo        10  Open:Active  -   -    -    -     0    0    0    0
phcas_lo         10  Open:Active  -   -    -    -     0    0    0    0
phcasfast_lo     10  Open:Active  -   -    -    -     0    0    0    0
phcrs_lo         10  Open:Active  -   -    -    -    90    0   90    0
phslow_mc         5  Open:Active  -   -    -    -     0    0    0    0
phcas_mc          5  Open:Active  -   -    -    -   248   30  213    5
phcasfast_mc      5  Open:Active  -   -    -    -   105   51   54    0
phcrs_mc          5  Open:Active  -   -    -    -   386  189  108   89
(for analysis & simulations)
CRS Batch (for Production)
Monitoring: Ganglia
http://ganglia.sourceforge.net
ROOT
• PHOBOS Software (“PhAT”) built on ROOT (http://root.cern.ch) framework
– Automated I/O of objects
– Very efficient data formats (trees)
– C++ interpreter (!)
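Automatic object I/O and the tree format are standard ROOT features; a minimal write-and-read example (file, tree, and branch names are made up for illustration, not PhAT classes) looks like this:

// Minimal ROOT tree example: write events with an int and a float branch,
// then read them back entry by entry.
#include "TFile.h"
#include "TTree.h"

void tree_demo() {
   // --- write ---
   TFile fout("demo.root", "RECREATE");
   TTree tree("T", "demo tree");
   Int_t   nhits = 0;
   Float_t vz    = 0;
   tree.Branch("nhits", &nhits, "nhits/I");
   tree.Branch("vz",    &vz,    "vz/F");
   for (Int_t i = 0; i < 1000; ++i) {
      nhits = 100 + i % 50;
      vz    = -10.f + 0.02f * i;
      tree.Fill();
   }
   tree.Write();
   fout.Close();

   // --- read ---
   TFile fin("demo.root");
   TTree *t = (TTree*) fin.Get("T");
   t->SetBranchAddress("nhits", &nhits);
   t->SetBranchAddress("vz",    &vz);
   for (Long64_t i = 0; i < t->GetEntries(); ++i)
      t->GetEntry(i);          // data comes back branch by branch
   fin.Close();
}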
rootd
• ROOT files accessible via rootd server-side daemon over 100 Mb/s network
• Can be inefficient: data is remote to analysis job (but OK if CPU is bottleneck)
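With rootd running on the file server, a remote file is opened from an analysis session through a root:// URL; TFile::Open then transparently returns a TNetFile. The host below is taken from the farm node names later in the talk, and the path and tree name are placeholders.

// Opening a file served by rootd: the analysis code is unchanged apart
// from the root:// URL in the file name.
#include "TFile.h"
#include "TTree.h"

void open_remote() {
   TFile *f = TFile::Open("root://rcrs4001//data/phobos/run123.root");
   if (!f || f->IsZombie()) return;        // data travels over the 100 Mb/s network
   TTree *t = (TTree*) f->Get("T");        // assumed tree name
   // ... t->Draw(...), t->Process(...), etc. ...
   f->Close();
}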
Distributed Disk
(Diagram: farm nodes rcrs4001-rcrs4004 each serving their local disks via rootd, with an Oracle DB keeping the file catalog.)
Distributed Disk: CatWeb
• Web-based front-end
• Oracle database back-end
• Allows users to stage files to distributed disk via dedicated LSF queue
Distributed Disk: CatWeb
(Figure: 88 TB.)
PHOBOS: Soon
• Condor to replace LSF batch
• Pros:
– Free (LSF costs big bucks)
– Grid-ready
– PROOF-ready
• Cons:
– No “queue” concept (hackish solutions may exist)
PHOBOS: Soon
• PROOF (Parallel ROOT Facility)
– Brings the process to the data!
– In use now on pharm, soon on RCF
– Hooks in with Condor (specifically: Condor On Demand, preempting all other activity)
PROOF in action
(Diagram: a local PC runs an interactive root session and connects, via TFile/TNetFile for the data and a PROOF connection for the work, to a remote PROOF cluster; a proof master server coordinates proof slave servers on node1-node4, each processing the *.root files on its own disk, with ana.C output (stdout/objects) returning to the client.)

$ root
root [0] tree.Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] chain.Process("ana.C")

# proof.conf
slave node1
slave node2
slave node3
slave node4
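The ana.C handed to tree.Process() / chain.Process() is typically a TSelector; a bare-bones skeleton (branch name and histogram are made-up placeholders, not PHOBOS analysis code) might look like the following. fOutput is the standard TSelector output list that PROOF merges back from the slaves to the client.

// Skeleton of an ana.C-style TSelector, the kind of macro handed to
// TTree::Process or, via PROOF, to TChain::Process.
// (A real PROOF selector also needs a ClassDef macro / dictionary.)
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class ana : public TSelector {
public:
   TTree  *fChain = nullptr;
   Float_t fVz    = 0;
   TH1F   *fHVz   = nullptr;

   void Init(TTree *tree) override {
      fChain = tree;
      fChain->SetBranchAddress("vz", &fVz);      // assumed branch
   }
   void SlaveBegin(TTree *) override {            // runs on each PROOF slave
      fHVz = new TH1F("hvz", "vertex z;z (cm)", 100, -20, 20);
      fOutput->Add(fHVz);                         // merged back to the client
   }
   Bool_t Process(Long64_t entry) override {      // called once per event
      fChain->GetEntry(entry);
      fHVz->Fill(fVz);
      return kTRUE;
   }
   void Terminate() override {                    // runs back on the client
      // ((TH1F*) fOutput->FindObject("hvz"))->Draw();
   }
};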