The LEGO Train Framework
Andrei Gheata
Costin Grigoras
Jan Fiete Grosse-Oetringhaus
The LEGO Framework - Jan Fiete Grosse-Oetringhaus 2
Idea
• Manage trains using MonALISA
– Users register wagons
– Train operators compose trains
• Automatic testing per wagon
• Train file generation
• Submission managed by ML (existing LPM infrastructure)
• Merging managed by LPM
• Aim: let operators run analysis trains easily (~weekly), with output available within 1-2 days
Configuration & Testing
• Train Configuration
– New class AliAnalysisTaskCfg
  • Contains description of wagons (add task macro, libraries, dependencies)
  • See talk by Andrei on Monday
• Testing
– Uses alientest04 machine
– Downloads AliEn packages (ROOT, AliRoot)
– Copies a part of the input data set to the local machine
– Runs tests per wagon
– Uses syswatch to extract mem/cpu information
– Also tests an empty "base line" task
[Diagram: a train of wagons: Base line, Phys Sel, Centr Sel, User A, User B, User C]
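Since each wagon description lists its dependencies, a train has to run every wagon after the wagons it depends on. The following is a minimal Python sketch of that ordering. The `Wagon` class, its fields, the macro file names, and the dependency chain used in the example are all hypothetical illustrations and do not mirror the actual AliAnalysisTaskCfg interface.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Wagon:
    """Hypothetical stand-in for one wagon description."""
    name: str
    macro: str                                   # add-task macro (illustrative name)
    libraries: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)


def order_wagons(wagons):
    """Return wagon names so that every wagon follows its dependencies."""
    known = {w.name for w in wagons}
    graph = {}
    for w in wagons:
        missing = [d for d in w.dependencies if d not in known]
        if missing:
            raise ValueError(f"{w.name} depends on unknown wagons: {missing}")
        graph[w.name] = set(w.dependencies)
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Assumed dependency chain, for illustration only.
train = [
    Wagon("User A", "AddTaskUserA.C", dependencies=["Centr Sel"]),
    Wagon("Centr Sel", "AddTaskCentrSel.C", dependencies=["Phys Sel"]),
    Wagon("Phys Sel", "AddTaskPhysSel.C"),
]
order = order_wagons(train)
```

Rejecting unknown or circular dependencies at composition time would let such problems surface before any test job is started.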
Workflow
[Diagram: workflow between User, Train operator, MonALISA, Test machine, LPM and AliEn, exchanging config, test results and train files]
1. adds wagons
2. composes train
3. generates test files + executes test
4. recomposes after test
5. generates train jdl + scripts
6. runs train
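The six numbered steps can be read as a small state machine. The sketch below is one possible encoding in Python, not the way MonALISA implements it. The `Step` names and the `ALLOWED` transition table are hypothetical, and the rule that a recomposed train is tested again before train files are generated is an assumption.

```python
from enum import Enum, auto


class Step(Enum):
    ADD_WAGONS = auto()            # 1. adds wagons
    COMPOSE = auto()               # 2. composes train
    TEST = auto()                  # 3. generates test files + executes test
    RECOMPOSE = auto()             # 4. recomposes after test
    GENERATE_TRAIN_FILES = auto()  # 5. generates train jdl + scripts
    RUN = auto()                   # 6. runs train


# Assumed transition table: a recomposed train is tested again.
ALLOWED = {
    None: {Step.ADD_WAGONS},
    Step.ADD_WAGONS: {Step.ADD_WAGONS, Step.COMPOSE},
    Step.COMPOSE: {Step.TEST},
    Step.TEST: {Step.RECOMPOSE, Step.GENERATE_TRAIN_FILES},
    Step.RECOMPOSE: {Step.TEST},
    Step.GENERATE_TRAIN_FILES: {Step.RUN},
    Step.RUN: set(),
}


def is_valid(sequence):
    """Check that a sequence of steps respects the assumed ordering."""
    previous = None
    for step in sequence:
        if step not in ALLOWED[previous]:
            return False
        previous = step
    return True
```

For example, `is_valid([Step.ADD_WAGONS, Step.COMPOSE, Step.RUN])` is `False` under this table, because the train would run without having been tested.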
Screenshot
[Screenshot of the web interface with areas labelled: Handler configuration, Wagon configuration, Data configuration, Testing and running status]
[Screenshot slides: Handler, Wagon, Dataset, Run, Syswatch]
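One plausible use of the empty "base line" task is to subtract its resource usage from each wagon's measurement, leaving the cost attributable to the wagon itself. The Python sketch below assumes this interpretation. The function name, the metric keys, and the numbers are illustrative and are not real syswatch output.

```python
def wagon_overhead(measurements, baseline="Base line"):
    """Subtract the empty baseline task's usage from each wagon's usage."""
    base = measurements[baseline]
    return {
        name: {metric: value - base[metric] for metric, value in usage.items()}
        for name, usage in measurements.items()
        if name != baseline
    }


# Illustrative numbers only, not real test results.
measurements = {
    "Base line": {"mem_mb": 400.0, "cpu_s": 10.0},
    "User A": {"mem_mb": 650.0, "cpu_s": 42.0},
}
overhead = wagon_overhead(measurements)
```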
Operator Workflow
Select dataset
Select wagon
Start testing
Inspect output
Operator Workflow (2)
[Screenshot annotations:]
– status of analysis
– status of merging
– intermediate merging steps
– Submit final merge job (to be automated)
– final merging status
– check output
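Merging in intermediate steps before a final merge job can be pictured as repeatedly combining groups of files until one remains. The Python sketch below counts the files left after each stage. The fixed `fan_in` (files combined per merge job) is a hypothetical parameter, and the slides do not state how the real merging stages are sized.

```python
def merge_stages(n_files, fan_in):
    """Return the number of files remaining after each merging stage."""
    if n_files < 1 or fan_in < 2:
        raise ValueError("need at least one file and a fan-in of at least 2")
    stages = []
    remaining = n_files
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division
        stages.append(remaining)
    return stages


# Illustrative input: 1000 output files, 20 files per merge job.
plan = merge_stages(1000, 20)
```

With these illustrative inputs the plan has two intermediate stages (50 files, then 3) followed by a final merge into a single file.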
Demo…
• Enough theory, let's do some clicking…
http://alimonitor.cern.ch/trains
Some More Details
• Train runs with an analysis tag
– All code + "AddTask" macro has to be in the tag (no par file!)
• Output per run stored in the input data directory (like AOD, QA trains). E.g.: /alice/data/2010/LHC10h/000137366/ESDs/pass2/PWG4/CorrelationTrain/7_20111117_1350
• All merged runs found in /alice/cern.ch/user/a/alitrain/PWG4/CorrelationTrain/7_20111117_1350/merge
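The two example paths suggest a simple layout: the per-run output goes under the input data directory, and the merged output under the train account's home directory. The Python sketch below reproduces only that pattern. The function and parameter names are hypothetical, and the train identifier is treated as an opaque string because the slides do not spell out its format.

```python
import posixpath


def run_output_dir(input_dir, pwg, train_name, train_id):
    """Per-run output directory, placed under the input data directory."""
    return posixpath.join(input_dir, pwg, train_name, train_id)


def merged_output_dir(pwg, train_name, train_id,
                      home="/alice/cern.ch/user/a/alitrain"):
    """Directory holding the merged output of all runs."""
    return posixpath.join(home, pwg, train_name, train_id, "merge")


per_run = run_output_dir(
    "/alice/data/2010/LHC10h/000137366/ESDs/pass2",
    "PWG4", "CorrelationTrain", "7_20111117_1350",
)
merged = merged_output_dir("PWG4", "CorrelationTrain", "7_20111117_1350")
```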
Operations
• After 10-12h most jobs are done (~90-98%)
– Few running, few waiting
– This situation can persist for days, which blocks merging of the output
– Solutions
  • Kill jobs that have waited longer than X (being tested at the LPM level; would be better as a JDL tag)
  • Remove the CE requirement after a certain time (thanks to Latchezar for this idea); to be implemented
• Merge jobs show the same tail of a few jobs that wait a long time
– Ideas: same as above, or run them on any CE (problem with splitting, Pablo is investigating)
• Output available after ~2 days
– 25% (real time) spent in running
– 75% in merging
– I believe this can still be improved!
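The "kill jobs that have waited longer than X" idea amounts to a filter over the job list. A minimal Python sketch follows, under stated assumptions: the `Job` record, its status strings, and the 24-hour threshold are hypothetical, and the sketch does not use the LPM or AliEn interfaces.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record, not an AliEn or LPM structure."""
    job_id: int
    status: str             # "WAITING", "RUNNING", "DONE" or "ERROR"
    hours_in_status: float  # time spent in the current status


def jobs_to_kill(jobs, max_wait_hours):
    """Return the ids of jobs that have waited longer than the threshold."""
    return [
        job.job_id
        for job in jobs
        if job.status == "WAITING" and job.hours_in_status > max_wait_hours
    ]


# Illustrative job list and threshold.
jobs = [
    Job(1, "DONE", 0.5),
    Job(2, "WAITING", 30.0),
    Job(3, "WAITING", 2.0),
    Job(4, "RUNNING", 40.0),
]
stale = jobs_to_kill(jobs, max_wait_hours=24.0)
```

Only waiting jobs are selected here. Long-running jobs are left alone, since the slide proposes the cut for jobs that wait.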
Operations (visually…)
[Plots: number of analysis jobs and of merging jobs by status (Waiting, Running, Done, Error) versus hours since submission. Annotations: "here we kill the remaining ones"; "80% done in 4 hours"]
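A figure such as "80% done in 4 hours" is a point on the cumulative completion curve. The Python sketch below shows how such a fraction could be computed from per-job finish times. The function name and the sample data are invented for illustration and are not taken from the plots.

```python
def fraction_done(finish_hours, total_jobs, at_hour):
    """Fraction of all submitted jobs finished by `at_hour` after submission.

    `finish_hours` lists only the jobs that have finished; jobs still
    waiting or running count towards `total_jobs` but not towards the
    numerator.
    """
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    done = sum(1 for hour in finish_hours if hour <= at_hour)
    return done / total_jobs


# Invented sample: 10 jobs submitted, 8 finished within 4 hours.
finish_hours = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0]
after_four_hours = fraction_done(finish_hours, total_jobs=10, at_hour=4.0)
```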
Current Trains
• Four active beta testers
– Jets (Christian KB)
– D2H (Zaida)
– Correlations in pp (Eva)
– Correlations in PbPb (JF)
• We received a lot of feedback and improved the system
TODO
• Graphs for CPU/Wall/Mem consumption of user tasks as a function of AliRoot tag
• Some improvements in the web interface
• Automatic launching of final job
Documentation
• Mailing list (for operators)
– alice-analysis-train-operators@cern.ch
• TWiki (Users + operators)
– https://twiki.cern.ch/twiki/bin/viewauth/ALICE/AnalysisTrains