Building a distributed software environment for
CDF within the ESLEA framework
V. Bartsch, M. Lancaster
University College London
CDF experiment
located at Fermilab, close to Chicago
proton/anti-proton collisions at the Tevatron at a centre-of-mass energy of 1.96 TeV
CDF
multipurpose detector with discovery potential for the Higgs, studies of b physics and measurement of standard model parameters
integrated luminosity of about 1 fb⁻¹ per year
Principle of data analysis
raw data (40 MB/s, 2 TB/day)
→ reconstruction: assign particle momentum, tracks etc.
→ reco data
→ user selection
→ user data
→ user analysis
MC (Monte Carlo): simulation of the events
analysis performed by
~800 physicists in ~60 institutes
CDF – data handling requirements
The experiment has ~ 800 physicists of which ~ 50 are in the UK.
The experiment produces large amounts of data, which are stored in the US:
• ~1000 TB per year
• ~2000 TB stored to date, expected to rise to 10,000 TB by 2008
UK physicists need to be able to:
• copy datasets (~0.5-10 TB) quickly to the UK
• create MC data within the UK for use by other UK physicists and other CDF physicists worldwide
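To give a rough sense of what "quickly" means for the dataset sizes quoted above, the sketch below estimates copy times at a sustained link rate. The link rates and the function name `transfer_hours` are illustrative assumptions, not figures or tools from the talk.

```python
def transfer_hours(dataset_tb: float, link_gbit_per_s: float) -> float:
    """Hours needed to copy dataset_tb terabytes at a sustained link rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits/s) and
    ignores protocol overhead, so real transfers will be slower.
    """
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600.0


if __name__ == "__main__":
    # Dataset sizes come from the slide; the link rates are hypothetical.
    for size_tb in (0.5, 10.0):
        for rate in (0.1, 1.0):
            hours = transfer_hours(size_tb, rate)
            print(f"{size_tb:4.1f} TB at {rate} Gbit/s: {hours:6.1f} h")
```

Under these assumptions a 10 TB dataset needs roughly a day at a sustained 1 Gbit/s, which is why dedicated high-bandwidth links matter for this use case.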
data handling numbers
• CDF has acquired
– 590 TB raw data
– 660 TB reconstructed data
– 280 TB MC
– 1530 TB total
• currently produces 1 PB/year, expected to rise to 10 PB by 2008
• Fermilab alone is serving about 18 TB/day
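The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below sums the acquired volumes and converts the daily serving rate into an average throughput; the helper names are hypothetical and decimal units (1 TB = 1e12 bytes) are assumed.

```python
# Volumes acquired so far, in TB, as quoted on the slide.
ACQUIRED_TB = {"raw": 590, "reconstructed": 660, "mc": 280}

SECONDS_PER_DAY = 86_400


def total_tb(volumes: dict) -> int:
    """Sum the per-category data volumes."""
    return sum(volumes.values())


def average_mb_per_s(tb_per_day: float) -> float:
    """Average throughput in MB/s for a given daily volume in TB."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print("total acquired:", total_tb(ACQUIRED_TB), "TB")
    print(f"18 TB/day is about {average_mb_per_s(18):.0f} MB/s on average")
```

The categories add up to the quoted 1530 TB, and serving 18 TB/day corresponds to a little over 200 MB/s sustained around the clock.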
Plot: bytes read (TB)
CDF batch computing
• 2 types of activities
– organized processing: raw data reconstruction, data reduction for different physics groups, MC production
– user analysis: need to be able to copy datasets (0.5-10 TB)
• both use large amounts of CPU → use the same tools for all
CDF Grid Philosophy
CDF adopted Grid concepts quite late, during running, when it already had mature software
look & feel of the old data handling system maintained; reliability is the main issue
use the existing infrastructure as a portal and change the software underneath
CDF Analysis Farm (CAF)
Submit and forget until receiving a mail
Does all the job handling and negotiation with the data handling system without the user knowing
• a CDF batch job contains a tarball with all the needed scripts, binaries and shared libraries, and sends a tarball to the output location
• users need to authenticate with their Kerberos ticket
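The self-contained job tarball described above can be illustrated with Python's standard `tarfile` module. This is only a sketch of the packaging idea: the function name, directory layout and file names are hypothetical, and it is not the actual CAF submission tool.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def build_job_tarball(job_dir: Path, tarball_path: Path) -> List[str]:
    """Pack every file under job_dir into a gzipped tarball.

    tarball_path must lie outside job_dir, otherwise the archive would
    try to include itself. Returns the member names for inspection.
    """
    with tarfile.open(tarball_path, "w:gz") as tar:
        for path in sorted(job_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the job directory.
                tar.add(path, arcname=path.relative_to(job_dir).as_posix())
    with tarfile.open(tarball_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job = Path(tmp) / "job"
        (job / "lib").mkdir(parents=True)
        (job / "run.sh").write_text("#!/bin/sh\necho analysis\n")
        (job / "lib" / "libexample.so").write_bytes(b"\0")
        print(build_job_tarball(job, Path(tmp) / "job.tgz"))
```

Shipping scripts, binaries and shared libraries together means the worker node needs nothing pre-installed beyond what the tarball unpacks.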
CAF - evolution over time
CDF used several batch systems and distribution mechanisms
• FBSNG
• Condor
• Condor with Globus
• gLite WMS
CAF could be distributed and run on non-dedicated resources; gLite WMS helps to run on EGEE sites
Grid based
Used as production systems
Condor-based GRID CAF
Pull Model
Diagram labels: Collector, user priorities, Negotiator, user jobs, Schedd, Globus, Grid nodes, Starter, User Job, Glide-ins
Negotiator assigns nodes to jobs
Globus assigns nodes to VOs
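The pull model can be sketched as follows: a glide-in first lands on a grid node and only then asks a central, prioritised queue for work. All class and function names here are hypothetical stand-ins, and the priority convention (lower value served first) is an assumption; this is not Condor code.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # lower value = served first (assumed)
    name: str = field(compare=False)


class CentralQueue:
    """Stand-in for schedd plus negotiator: one global, prioritised queue."""

    def __init__(self, jobs):
        self._heap = list(jobs)
        heapq.heapify(self._heap)

    def pull(self):
        """Hand out the highest-priority waiting job, or None if empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)


def run_glidein(node_name, node_healthy, queue):
    """A glide-in starts on a node and only then requests a user job."""
    if not node_healthy:
        return None                    # the pilot dies; no user job was sent
    job = queue.pull()
    return None if job is None else (node_name, job.name)


if __name__ == "__main__":
    queue = CentralQueue([Job(2, "ana-b"), Job(1, "reco-a"), Job(3, "mc-c")])
    for node, healthy in [("n1", True), ("n2", False), ("n3", True)]:
        print(node, run_glidein(node, healthy, queue))
    print("jobs still waiting:", len(queue))
```

Because the job is chosen only after a working slot exists, priorities stay global and a broken node costs a pilot rather than a user job.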
gLite WMS-based GRID CAF
Push Model
Diagram labels: user jobs, Schedd, Globus, Resource Broker, Grid nodes, User Job
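In the push model a broker binds each job to a site before the job starts running. The sketch below uses a simple round-robin assignment as a stand-in for the Resource Broker; real matchmaking is more involved, and the function name, site names and outcome labels are hypothetical.

```python
def push_jobs(jobs, sites):
    """Bind each job to a site up front and report what happens to it.

    jobs  : list of job names
    sites : list of (site_name, healthy) pairs
    Round-robin is an illustrative assumption, not the broker's algorithm.
    """
    outcomes = {}
    for index, job in enumerate(jobs):
        site, healthy = sites[index % len(sites)]
        # The job is already committed to this site when it starts,
        # so a broken node takes the user job down with it.
        outcomes[job] = (site, "done" if healthy else "failed")
    return outcomes


if __name__ == "__main__":
    result = push_jobs(
        ["reco-a", "ana-b", "mc-c"],
        [("site1", True), ("site2", False)],
    )
    for job, (site, status) in result.items():
        print(f"{job} -> {site}: {status}")
```

Since the decision is made before a slot is known to be healthy, the failure lands on the user job itself, and each site rather than the experiment decides how jobs are ordered.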
Pros & Cons
Condor-based Grid CAF
Pros:
• Globally managed user and job priorities within CDF
• Broken nodes kill Condor daemons, not user jobs
• Resource selection done after a batch slot is secured
Cons:
• Uses a single service proxy for all jobs to enter Grid sites
• Requires outgoing connectivity
gLite WMS-based Grid CAF
Pros:
• LCG-backed tools
• No need for external connectivity
• Grid sites can manage users
Cons:
• No global fair share for CDF
gLite WMS-based GRID CAF
• at FNAL, CAF worker nodes used to have the CDF software distribution NFS-mounted, but this is not an option in the Grid world
• all production jobs are now self-contained
• trying Parrot to distribute CDF software over HTTP in analysis jobs
Some numbers
Chart: avg. usable VMs (virtual machines) and number of jobs on the CAF
Categories: FNAL, remote dedicated resources, Condor-based Grid CAFs, LCGCaf, FermiGrid
Values shown: 2610, 1335, 858, 800, 261
Data handling system SAM
• SAM manages file storage
– Data files are stored in tape systems at FNAL and elsewhere (most use ENSTORE at FNAL)
– Files are cached around the world for fast access
• SAM manages file delivery
– Users at FNAL and remote sites retrieve files transparently out of file storage. SAM handles caching for efficiency
• SAM manages file cataloging
– SAM DB holds meta-data for each file, transparent to the user
• SAM manages analysis bookkeeping
– SAM remembers what files you ran over, what files you processed, what applications you ran, when you ran them and where
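The four roles listed above (storage, delivery with caching, cataloging, bookkeeping) can be illustrated with a toy model. Everything in this sketch is hypothetical: the class `ToyStation`, its fields and the example file name are illustrative and do not reflect SAM's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyStation:
    """Toy stand-in for a station: catalog, local cache and bookkeeping."""

    catalog: Dict[str, dict]                       # file name -> metadata
    cache: Set[str] = field(default_factory=set)   # files already on local disk
    tape_reads: int = 0                            # fetches from tape storage
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def deliver(self, user: str, application: str, file_name: str) -> dict:
        """Deliver a file, fetching from tape only on a cache miss."""
        if file_name not in self.catalog:
            raise KeyError(f"{file_name} is not declared in the catalog")
        if file_name not in self.cache:
            self.tape_reads += 1                   # simulated tape access
            self.cache.add(file_name)
        # Bookkeeping: remember who ran which application over which file.
        self.history.append((user, application, file_name))
        return self.catalog[file_name]


if __name__ == "__main__":
    station = ToyStation(catalog={"example_file_001": {"events": 1000}})
    station.deliver("alice", "ntuple-maker", "example_file_001")
    station.deliver("bob", "fitter", "example_file_001")
    print("tape reads:", station.tape_reads)       # second request hits the cache
    print("history:", station.history)
```

The user only names a file; where it comes from (tape or cache) and the record of who consumed it are handled behind the call.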
world wide distribution of SAM stations
selected SAM stations
FNAL
CDF:
10k/20k files declared/day
15k files consumed/day
8 TB of files consumed/day
main consumption of data still central
remote use on the rise
test deployment
Plot: total CDF files to user (300 TB)
summary & outlook
• UCL-HEP cluster deployed
• UCL-CCC cluster still to come
• need a better integration of SAM and the CAF
• user feedback needs to be collated