Nurcan Ozturk
University of Texas at Arlington
SCHOOL ON HEP@TR-GRID
April 30 – May 2, 2008
Turkish Atomic Energy Authority (TAEA), Ankara, Turkey
Distributed Analysis With pathena
May 2, 2008, Nurcan Ozturk
Outline
Part I – Information on pathena
• Introduction
• How pathena works
• What type of jobs pathena can run
• pathena usage
• What happens when submitting jobs
• pathena options
• Monitoring pathena jobs
• Bookkeeping & retry
• User support
Part II – pathena Tutorial
Based on the “Distributed Analysis on Panda” Twiki page: https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda
Part I – Information on pathena
Introduction
PanDA = Production ANd Distributed Analysis system
pathena: client tool for PanDA
• Submits user-defined jobs
• A consistent user interface for Athena users
• Works on the Athena runtime environment
• Runs at the sites in OSG and LCG
Requirements to run pathena jobs:
• Athena: any release version, Kit or AFS
• GRID User Interface (UI): LCG UI, VDT, NG UI
• Join ATLAS VO: all ATLAS VO members
How pathena Works - Datasets
[Figure: a job (production or analysis) reads files from an input dataset (official or user dataset), runs a transformation on them, and writes files to an output dataset (user dataset).]
• No essential difference between production jobs and analysis jobs.
• Analysis jobs run on the same infrastructure as production jobs. The infrastructure is always maintained by the production operations team.
• User datasets can be accessed via DDM (using DQ2 end-user tools).
• Analysis jobs and production jobs run on separate computing resources, so analysis jobs don’t have to compete with production jobs.
• Analysis queues:
    Short queue (wall-time limit = 90 min)
    Long queue (wall-time limit = 8 hours)
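As an illustration of the two wall-time limits quoted above, the following minimal Python sketch picks a queue class from an estimated run time. The function name and the choice to treat each limit as inclusive are assumptions made for this example; the slides only give the limits themselves.

```python
# Wall-time limits quoted on the slide, in minutes.
SHORT_QUEUE_LIMIT_MIN = 90
LONG_QUEUE_LIMIT_MIN = 8 * 60


def choose_analysis_queue(estimated_minutes):
    """Return 'short' or 'long' for an estimated wall time (hypothetical helper)."""
    if estimated_minutes <= 0:
        raise ValueError("estimated wall time must be positive")
    # Assumption: a job that hits the limit exactly still fits the queue.
    if estimated_minutes <= SHORT_QUEUE_LIMIT_MIN:
        return "short"
    if estimated_minutes <= LONG_QUEUE_LIMIT_MIN:
        return "long"
    raise ValueError("job exceeds the long-queue limit; split it into smaller jobs")


for minutes in (45, 240):
    print(minutes, "min ->", choose_analysis_queue(minutes))
```

A job estimated to run longer than 8 hours fits neither queue, so the sketch raises an error rather than guessing; in practice such a job would be split over fewer files or events per job.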
What Type of Jobs pathena Can Run
pathena can run all kinds of Athena jobs:
• All production steps
    Event generation
    Simulation, Pileup
    Digitization
    Reconstruction
    Merge
    Analysis
• Arbitrary package configuration
    Add new packages
    Modify cmt/requirements in any package
• Customize source code and/or jobOption
• Multiple-input streams, for instance signal + minimum-bias
• TAG/AANT-based analysis
• Protection against corrupted/missing files
pathena Usage
When running athena:
$ athena MyJobOptions.py
All you need to do is:
$ pathena MyJobOptions.py --inDS inputDataset --outDS outputDataset
Nothing special is needed to submit your Athena job to the GRID using pathena:
• athena -> pathena
• Add --inDS and --outDS
The user doesn’t have to modify the jobOption file when submitting jobs.
Jobs go to the data, so you need to know where your data is. Analysis jobs don’t trigger data transfer across GRIDs.
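The athena-to-pathena change can be sketched as a small Python helper that assembles the command line. It only builds and prints the command; nothing is submitted. The helper name is hypothetical, and only the `--inDS` and `--outDS` options shown on this slide are used.

```python
import shlex


def build_pathena_command(job_options, in_ds, out_ds, extra_args=()):
    """Return the argument list for a pathena submission (nothing is executed)."""
    return ["pathena", job_options, "--inDS", in_ds, "--outDS", out_ds, *extra_args]


cmd = build_pathena_command("MyJobOptions.py", "inputDataset", "outputDataset")
# shlex.join (Python 3.8+) quotes arguments only where the shell needs it.
print(shlex.join(cmd))
# prints: pathena MyJobOptions.py --inDS inputDataset --outDS outputDataset
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later handed to `subprocess.run`.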
pathena Options
pathena [--inDS InputDataset] [--outDS OutputDataset]
        [--minDS MinimumBiasDataset] [--cavDS CavernDataset]
        [--split N] [--site SiteName] [--nfiles N]
        [--nFilesPerJob N] [--nEventsPerJob N] [--nSkipFiles N]
        [--official] [--extFile files] [--libDS LibraryDataset]
        [--long] [--blong] [--nobuild] [--memory MemorySize]
        [--tmpDir tmpDirName] [--shipInput] [--fileList files]
        [--addPoolFC files] [--skipScan]
        [--removeFileList filename] [--inputFileList filename]
        [--inputType types] [-p bootstrap] [-c command]
        <jobOption1.py> [<jobOption2.py> [...]]
See what each of these options does on the twiki page: https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#options
What Happens When Submitting Jobs
pathena performs these steps on submission:
• archive user's work directory
• send the archive to Panda
• extract job configuration from jobOs
• define jobs automatically
• submit jobs
Two kinds of jobs are then run:
• build job: builds the athena environment at the remote site. It produces a library dataset.
• run job: runs athena and produces the output files.
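To make the "define jobs automatically" step more concrete, here is a minimal Python sketch of one way a submission could be divided into a single build job followed by run jobs over groups of input files. This is an illustration only and not PanDA's actual implementation; the function name, the dictionary layout, and the file names are all hypothetical.

```python
def define_jobs(input_files, n_files_per_job):
    """Return one build job followed by run jobs over chunks of input files."""
    if n_files_per_job < 1:
        raise ValueError("n_files_per_job must be at least 1")
    # The build job comes first: it prepares the environment the run jobs use.
    jobs = [{"type": "build", "inputs": []}]
    for start in range(0, len(input_files), n_files_per_job):
        chunk = input_files[start:start + n_files_per_job]
        jobs.append({"type": "run", "inputs": chunk})
    return jobs


# Hypothetical file names, used only to show the chunking.
files = ["AOD._%05d.pool.root" % i for i in range(1, 6)]
for job in define_jobs(files, n_files_per_job=2):
    print(job["type"], len(job["inputs"]))
```

With five files and two files per job the sketch yields one build job and three run jobs, the last of which holds the single leftover file.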
Monitoring pathena Jobs
http://pandamon.usatlas.bnl.gov:25880/server/pandamon/query
Bookkeeping & Retry
pathena has utilities to see the status and details of submitted jobs and to retry failed ones, for instance:
$ pathena_util
>>> show()
======================================
JobID : 8239
time : 2008-04-29 03:29:07
inDS : fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
outDS : user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara
libDS : user.NurcanOzturk.lxplus205_0.lib._008239
build : 10676339
run : 10676340
jobO : -c "Mode=['FullReco'];DetailLevel=['FullStandardAOD'];Branches= ['StacoTauRec']" MyJobOptions.py
site : ANALY_SWT2_CPB
>>> status(8239)
>>> retry(8239)
>>> kill(8239)
>>> help()
Press Ctrl-D to exit
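If a record printed by show() needs to be processed further, for example to collect the site or the output dataset of several submissions, it can be parsed into a dictionary. The sketch below assumes the "key : value" layout shown on this slide; the function name is hypothetical and the real pathena_util output may differ in detail.

```python
SAMPLE_RECORD = """\
======================================
JobID : 8239
time : 2008-04-29 03:29:07
inDS : fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
outDS : user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara
libDS : user.NurcanOzturk.lxplus205_0.lib._008239
build : 10676339
run : 10676340
site : ANALY_SWT2_CPB
"""


def parse_job_record(text):
    """Parse 'key : value' lines into a dict, skipping separator lines."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 03:29:07 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


record = parse_job_record(SAMPLE_RECORD)
print(record["JobID"], record["site"])
```

All values are kept as strings; convert the JobID or PandaIDs with `int()` where a number is needed.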
User Support
PanDA Savannah page – report problems/errors:
https://savannah.cern.ch/projects/panda/
PanDA/pathena hypernews forum: discussions, sharing experience, helping each other:
https://hypernews.cern.ch/HyperNews/Atlas/get/pandaPathena.html
See the production shift elog (electronic logbook) for system-wide or site-level problems, maintenance and outages. It is linked from the main PanDA monitor page:
http://atlas003.uta.edu:8080/ADCoS/?mode=summary
Part II - Pathena Tutorial
Available at: http://www.usatlas.bnl.gov/twiki/bin/view/AtlasSoftware/PathenaOnFDRData
Goal
Learn how to submit an analysis job:
• Set up athena
• Check out the PandaTools package (for pathena)
• Use the HighPtView package as an analysis package
• Find the data (we will run on FDR data)
• Find out which analysis queue will be used
• Submit a pathena job
• Monitor the job’s status in the PanDA monitor
• Get the output of the job and make plots
Setup Athena and Work Area
Instructions are given for running on lxplus machines at CERN.
• Create a directory (called AtlasInAnkara) and get the requirements file from the next page.
• Make a sub-directory for release 13.0.40 (called 13.0.40) under AtlasInAnkara.
• Set up CMT:
    $ source /afs/cern.ch/sw/contrib/CMT/v1r20p20070208/mgr/setup.sh
    $ cmt config
• Set up athena for release 13.0.40 (32 is the compiler version, gcc323):
    $ source setup.sh -tag=13.0.40,32,groupArea
• Check out the Tools/Scripts package to set up your work area (an easy way of checking out and compiling multiple packages):
    $ cd 13.0.40
    $ cmt co -r Scripts-00-01-14 Tools/Scripts
• Set up the work area and create the run area:
    $ ./Tools/Scripts/share/setupWorkArea.py
    $ cd WorkArea/cmt
    $ cmt bro cmt config
    $ cmt bro gmake
    $ source setup.sh
Example File - requirements
#############################################################
set CMTSITE CERN
set SITEROOT /afs/cern.ch
macro ATLAS_DIST_AREA ${SITEROOT}/atlas/software/dist
macro ATLAS_GROUP_AREA "/afs/cern.ch/atlas/groups/PAT/Tutorial/EventViewGroupArea/EVTags-13.0.40.323"
apply_tag simpleTest
apply_tag oneTest
macro ATLAS_TEST_AREA "" \
      13.0.40 "${HOME}/public/AtlasInAnkara/13.0.40"
use AtlasLogin AtlasLogin-* $(ATLAS_DIST_AREA)
#############################################################
Check Out Necessary Packages
• Check out PandaTools for pathena (cd to the 13.0.40 directory first):
    $ cmt co PhysicsAnalysis/DistributedAnalysis/PandaTools
• Check out the HighPtView package:
    $ cmt co -r HighPtView-00-01-10 PhysicsAnalysis/HighPtPhys/HighPtView
• Check out EventViewConfiguration:
    $ cmt co -r EventViewConfiguration-00-01-13 PhysicsAnalysis/EventViewBuilder/EventViewConfiguration
Compile and Make a jobOption File
Run this every time new package(s) are checked out:
    $ ./Tools/Scripts/share/setupWorkArea.py
It prints:
    ################################################################################
    WorkAreaMgr : INFO Creating a WorkArea CMT package under: [/afs/cern.ch/user/n/nozturk/public/AtlasInAnkara/13.0.40]
    WorkAreaMgr : INFO Scanning [/afs/cern.ch/user/n/nozturk/public/AtlasInAnkara/13.0.40]
    WorkAreaMgr : INFO Found 4 packages in WorkArea
    WorkAreaMgr : INFO => 0 package(s) in suppression list
    WorkAreaMgr : INFO Generation of WorkArea/cmt/requirements done [OK]
    WorkAreaMgr : INFO ################################################################################
Compile the PandaTools package from WorkArea:
    $ cd WorkArea/cmt
    $ cmt bro cmt config
    $ cmt bro gmake
    $ source setup.sh
Go to the run area and get the jobOption file from the HighPtView package:
    $ cd ../run
    $ get_files HighPtViewNtuple_topOptions.py
Make a jobOption file for the details of the job, called MyJobOptions.py. See the next page for an example file.
Example File – MyJobOptions.py
import os
print os.environ["CMTPATH"]
InserterConfiguration={} # Always need this line
InserterConfiguration["Electron"]={} # Need such for every item you will modify
InserterConfiguration["Electron"]["FullReco"]=[{"Name":"ElMedium"}]
#DoTrigger=True
TriggerView=True
include("HighPtView/HighPtViewNtuple_topOptions.py")
include("AthenaPoolCnvSvc/ReadAthenaPool_jobOptions.py")
ServiceMgr.PoolSvc.SortReplicas=True
from DBReplicaSvc.DBReplicaSvcConf import DBReplicaSvc
ServiceMgr+=DBReplicaSvc()
ServiceMgr.DBReplicaSvc.UseCOOLSQLite=False
# fix for stream and DPDs by Attila
InserterConfiguration.update({ "CommonParameters": { "DoPreselection":False, "CheckOverlap":False } })
Setup Grid and DQ2, Find FDR Datasets
Set up Grid:
    $ source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
Set up DQ2:
    $ source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN
Look at the FDR datasets available at Tier2’s from the Panda monitor:
    http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?mode=listFDR
Pick one dataset:
    fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
One can also list the replicas for a given dataset:
    $ source /afs/usatlas.bnl.gov/Grid/Don-Quijote/DQ2_0_3_client/dq2.sh
    $ dq2-list-dataset-replicas fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
    INCOMPLETE: DESY-ZN
    COMPLETE: BNLXRDHDD1,SARA-MATRIX_DATADISK,RAL-LCG2_DATADISK,IN2P3-CC_DATADISK,RALPP,SLACXRD,LIP-LISBON,TAIWAN-LCG2_DATADISK,NDGF-T1_DATADISK,IFICDISK,WISC,TOKYO-LCG2_DATADISK,MWT2_IU,LIV,ICL,PIC_DATADISK,BU_DDM,TIER0TAPE,INFN-T1_DATADISK,DESY-HH,JINR,CYF,IJST2,TRIUMF-LCG2_DATADISK,FZK-LCG2_DATADISK,TORON,PNPI,AGLT2_SRM,BNL-OSG2_DATADISK,SWT2_CPB,LNF,TW-FTT,OU,MWT2_UC
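Since jobs go to the data, it is useful to know which sites hold a complete copy of the dataset. The following Python sketch parses replica listings of the form shown above into two site lists. It assumes the "COMPLETE:" and "INCOMPLETE:" line layout from this slide; the function name is hypothetical and the sample text is a shortened version of the listing.

```python
# Shortened sample in the layout printed by dq2-list-dataset-replicas on the slide.
SAMPLE_OUTPUT = """\
INCOMPLETE: DESY-ZN
COMPLETE: BNLXRDHDD1,SLACXRD,WISC,BU_DDM,AGLT2_SRM,SWT2_CPB,OU,MWT2_UC
"""


def parse_replicas(output):
    """Return {'COMPLETE': [...], 'INCOMPLETE': [...]} from the replica listing."""
    replicas = {"COMPLETE": [], "INCOMPLETE": []}
    for line in output.splitlines():
        label, sep, sites = line.strip().partition(":")
        if sep and label in replicas:
            # Site names are comma-separated; drop empty fragments.
            replicas[label].extend(s.strip() for s in sites.split(",") if s.strip())
    return replicas


replicas = parse_replicas(SAMPLE_OUTPUT)
print("complete copies at", len(replicas["COMPLETE"]), "sites")
```

Only sites in the COMPLETE list are safe targets when the job needs every file of the dataset.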
Name Association Between DDM and Analysis Queue Names
DDM Name      Analysis Queue Name
SWT2_CPB      ANALY_SWT2_CPB
OU            ANALY_OU_OCHEP_SWT2
AGLT2_SRM     ANALY_AGLT2
MWT2_UC       ANALY_MWT2
SLACXRD       ANALY_SLAC
BU_DDM        ANALY_NET2
WISC          ANALY_GLOW-ATLAS
Analysis queues available in the US. For more queues see the next page.
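The name association can be kept as a lookup table so that a list of DDM sites holding the data is translated directly into candidate values for --site. The sketch below copies the seven US entries from the table; the function name is hypothetical, and sites missing from the table are simply skipped.

```python
# DDM site name -> analysis queue name, copied from the table above.
DDM_TO_ANALYSIS_QUEUE = {
    "SWT2_CPB": "ANALY_SWT2_CPB",
    "OU": "ANALY_OU_OCHEP_SWT2",
    "AGLT2_SRM": "ANALY_AGLT2",
    "MWT2_UC": "ANALY_MWT2",
    "SLACXRD": "ANALY_SLAC",
    "BU_DDM": "ANALY_NET2",
    "WISC": "ANALY_GLOW-ATLAS",
}


def queues_for_sites(ddm_sites):
    """Return analysis queues for the DDM sites found in the table, in input order."""
    return [DDM_TO_ANALYSIS_QUEUE[s] for s in ddm_sites if s in DDM_TO_ANALYSIS_QUEUE]


print(queues_for_sites(["DESY-ZN", "SWT2_CPB", "OU"]))
# prints: ['ANALY_SWT2_CPB', 'ANALY_OU_OCHEP_SWT2']
```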
Analysis Queues from Panda Monitor
Run pathena (1)
Run pathena with one line command:
$ pathena -c "Mode=['FullReco'];DetailLevel=['FullStandardAOD']; Branches= ['StacoTauRec']" MyJobOptions.py --inDS fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1 --outDS user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara --nfiles 1 --site ANALY_SWT2_CPB
HighPtView options (passed with -c): Mode=['FullReco'];DetailLevel=['FullStandardAOD'];Branches=['StacoTauRec']
pathena options:
• Specify the input dataset with --inDS
• Specify the output dataset with --outDS
• Specify the number of files to run on with --nfiles (here --nfiles 1)
• Specify the analysis queue name with --site siteName
More pathena options are available at: https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#synopsis
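The string given to -c is a sequence of Python assignments separated by semicolons, which is easy to mistype by hand. A small Python sketch, with a hypothetical helper name, can generate it from a dictionary of settings:

```python
def build_c_option(settings):
    """Join name=value assignments with ';' for use as the argument of -c."""
    # repr() renders each value as a Python literal, e.g. ['FullReco'].
    return ";".join(f"{name}={value!r}" for name, value in settings.items())


c_option = build_c_option({
    "Mode": ["FullReco"],
    "DetailLevel": ["FullStandardAOD"],
    "Branches": ["StacoTauRec"],
})
print(c_option)
# prints: Mode=['FullReco'];DetailLevel=['FullStandardAOD'];Branches=['StacoTauRec']
```

Because the result contains single quotes, it should be wrapped in double quotes on the shell command line, as in the pathena command shown on this slide.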
Run pathena (2)
The following will be printed on the screen:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Nurcan Ozturk 155817
Enter GRID pass phrase for this identity:
Creating proxy ........................................... Done
Your proxy is valid until: Tue Apr 29 15:24:55 2008
extracting run configuration
ConfigExtractor > No Input
ConfigExtractor > Output=AANT EVAANtupleDump0Stream AANT0
archive sources
archive InstallArea
post sources/jobO
query files in dataset:fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
submit
===================
 JobID : 8239
 Status : 0
 > build
 PandaID=10676339
 > run
 PandaID=10676340
• build: builds the athena environment at the remote site. It produces a library dataset.
• run: runs athena and produces the output files.
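The JobID and the two PandaIDs in this printout are what you later look up in the PanDA monitor or pass to pathena_util. A minimal Python sketch for pulling them out of captured output follows; it assumes the layout shown on this slide with a single run job, and the function name is hypothetical.

```python
import re

# Tail of the submission printout, as shown on the slide.
SAMPLE_SUBMISSION = """\
submit
===================
 JobID : 8239
 Status : 0
 > build
 PandaID=10676339
 > run
 PandaID=10676340
"""


def parse_submission(output):
    """Extract the JobID and the build/run PandaIDs as integers (None if absent)."""
    job_id = re.search(r"JobID\s*:\s*(\d+)", output)
    # Assumption: one build and one run entry; a later duplicate would overwrite.
    panda_ids = dict(re.findall(r">\s*(build|run)\s+PandaID=(\d+)", output))
    return {
        "JobID": int(job_id.group(1)) if job_id else None,
        "build": int(panda_ids["build"]) if "build" in panda_ids else None,
        "run": int(panda_ids["run"]) if "run" in panda_ids else None,
    }


print(parse_submission(SAMPLE_SUBMISSION))
```

A submission that defines several run jobs would need a list per job type instead of a single value.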
Monitor Job’s Status in PanDA Monitor (1)
Monitor Job’s Status in PanDA Monitor (2)
Go to the “List users” link at the top right corner of the PanDA monitor: http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?ui=users&sort=latest
Monitor Job’s Status in PanDA Monitor (3)
Monitor Job’s Status in PanDA Monitor (4)
Examine Log Files In Case Of Problems
Retrieve Results and Make Plots
Use dq2 client tools to retrieve the output dataset:
    $ dq2_get -rv user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara
This copies the output files:
    user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara.AANT0._00001.root
    user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara._10676340.log.tgz
Open the file in root and make some plots:
    $ root user.NurcanOzturk.HighPtView.StreamEgamma.AtlasInAnkara.AANT0._00001.root
root [1] FullRec0->GetListOfLeaves()->Print();
root [2] FullRec0->Draw("El_N", "El_N>0");
root [3] FullRec0->Draw("El_p_T", "El_N>0");
root [4] FullRec0->Draw("Jet_C4_N", "Jet_C4_N>0");
root [5] FullRec0->Draw("Jet_C4_p_T", "Jet_C4_N>0");
Some Plots – Number of Electrons and Transverse Momentum of Electrons
HighPtView DPD’s Made From FDR-1 Data
Alden and Amir at UTA made DPD’s using the HighPtView package on all FDR data for the SWT2 physics analysis groups.
If you are interested in looking at them, list them with dq2_ls and retrieve them with dq2_get:
    $ dq2_ls user.AldenStradling.fdr08*HPTV_NOR (overlap removal off)
    $ dq2_ls user.AldenStradling.fdr08*HPTV_OR (overlap removal on)
Future Developments with pathena
Automatic redirection of analysis jobs within a cloud. That is, there will be no need to specify a site: pathena will choose the best site based on data availability and available CPUs.
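The idea of choosing a site from data availability and available CPUs can be illustrated with a toy Python sketch. This is not PanDA's brokerage algorithm, which the slides do not describe; the selection rule, the field names, and the CPU counts below are all invented for illustration.

```python
def choose_site(candidates):
    """Pick the queue with a complete replica and the most free CPUs, or None."""
    with_data = [c for c in candidates if c["has_complete_replica"]]
    if not with_data:
        return None
    # Assumed rule: among sites that hold the data, prefer the most free CPUs.
    return max(with_data, key=lambda c: c["free_cpus"])["queue"]


# Hypothetical snapshot; the CPU numbers are made up.
candidates = [
    {"queue": "ANALY_SWT2_CPB", "has_complete_replica": True, "free_cpus": 40},
    {"queue": "ANALY_AGLT2", "has_complete_replica": True, "free_cpus": 120},
    {"queue": "ANALY_NET2", "has_complete_replica": False, "free_cpus": 300},
]
print(choose_site(candidates))
# prints: ANALY_AGLT2
```

The site with the most free CPUs overall is skipped here because it lacks the data, which reflects the rule that jobs go to the data rather than the other way round.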
References
• FDR datasets available at Tier2’s: http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?mode=listFDR
• pathena wiki page “Distributed Analysis on Panda”: https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda
• How to submit the same pathena job on multiple datasets: https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#example_6_re_submit_the_same_ana
• HighPtView wiki page: https://twiki.cern.ch/twiki/bin/view/Atlas/HighPtView
• Wiki pages by Akira Shibata on FDR Analysis:
    https://twiki.cern.ch/twiki/bin/view/Atlas/TopFDR
    https://twiki.cern.ch/twiki/bin/view/Atlas/TopFdrPanda