Pre-GDB on Batch Systems (Bologna), 11th March 2014
Torque/Maui: PIC and NIKHEF experience
C. Acosta-Silva, J. Flix, A. Pérez-Calero (PIC), J. Templon (NIKHEF)
Slide 2
Outline
- System overview
- Successful experience (NIKHEF and PIC)
- Torque/Maui current situation
- Torque overview
- Maui overview
- Outlook
Slide 3
System overview
- TORQUE is a community and commercial effort based on the OpenPBS project. It improves scalability, enables fault tolerance and adds many other features.
  http://www.adaptivecomputing.com/products/open-source/torque/
- Maui Cluster Scheduler is a job scheduler capable of supporting multiple scheduling policies. It is free and open-source software.
  http://www.adaptivecomputing.com/products/open-source/maui/
Slide 4
System overview
The TORQUE/Maui system has the usual batch system capabilities:
- Queue definition (routing queues)
- Accounting
- Reservation/QOS/Partition
- FairShare
- Backfilling
- Handling of SMP and MPI jobs
Multicore allocation and job backfilling ensure that Torque/Maui is capable of supporting multicore jobs.
Slide 5
Successful experience
- NIKHEF and PIC are multi-VO sites with local and Grid users
- Successful experience during the first LHC run with the Torque/Maui system
- Currently, both are running Torque-2.5.13 + Maui-3.3.4
- NIKHEF:
  - 30% non-HEP, 55% WLCG, rest non-WLCG HEP or local jobs
  - Highly non-uniform workload
  - 3800 job slots
  - 97.5% utilization (last 12 months)
  - 2000 waiting jobs (average)
Slide 6
Successful experience
[Plot: NIKHEF running jobs (last year)]
[Plot: NIKHEF queued jobs (last year)]
Slide 7
Successful experience
- PIC:
  - 3% non-HEP, 83% Tier-1 WLCG, 12% ATLAS Tier-2, rest local jobs (ATLAS Tier-3, T2K, MAGIC, ...)
  - 3500 job slots
  - Approx. 95% utilization (last 12 months)
  - 2500 waiting jobs (average)
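As a quick check of the workload figures quoted for the two sites, the following sketch derives the unspecified "rest" share and the average number of busy slots implied by the quoted utilization. The helper names are illustrative only and are not part of any site tooling; the inputs are the numbers given on the slides (the PIC utilization is itself approximate).

```python
def remaining_share(*shares_percent):
    """Return the percentage left once the listed shares are subtracted from 100."""
    return 100 - sum(shares_percent)


def busy_slots(total_slots, utilization_percent):
    """Average number of occupied job slots implied by a utilization percentage."""
    return round(total_slots * utilization_percent / 100)


# NIKHEF: 30% non-HEP, 55% WLCG, rest non-WLCG HEP or local jobs
nikhef_rest = remaining_share(30, 55)
nikhef_busy = busy_slots(3800, 97.5)

# PIC: 3% non-HEP, 83% Tier-1 WLCG, 12% ATLAS Tier-2, rest local jobs
pic_rest = remaining_share(3, 83, 12)
pic_busy = busy_slots(3500, 95)

print(f"NIKHEF: rest = {nikhef_rest}%, about {nikhef_busy} busy slots on average")
print(f"PIC:    rest = {pic_rest}%, about {pic_busy} busy slots on average")
```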
Slide 8
Successful experience
[Plot: PIC running jobs (last year)]
Slide 9
Successful experience
[Plot: PIC queued jobs (last year)]
Slide 10
Torque overview
Torque has a very active community:
- Mailing list: [email protected]
- Free support from Adaptive Computing
- New releases roughly once a year or more often, and frequent new patches
- 2.5.13 is the last release of the 2.5.X branch
Slide 11
Torque overview
Slide 12
Torque overview
- Torque is well integrated with the EMI middleware
- Widely used in WLCG Grid sites (~75% of the sites in the BDII publish -pbs-)
- Not complex to install, configure and manage:
  - via the qmgr tool
  - plain-text accounting
- Torque scalability issues
  - Reported for the 2.5.X branch
  - Not detected at our scale
  - The 4.2.X branch presents significant enhancements to scalability for large environments, responsiveness, reliability, ...
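To illustrate why plain-text accounting is easy to work with, here is a minimal parsing sketch. It assumes the semicolon-separated layout `date time;record type;job id;key=value ...` used by PBS-derived accounting logs; the sample record, including the user, host and all values, is invented for illustration, and the function names are hypothetical.

```python
def parse_accounting_record(line):
    """Split one accounting line into timestamp, record type, job id and attributes."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    attributes = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            attributes[key] = value
    return {
        "timestamp": timestamp,
        "type": record_type,
        "job_id": job_id,
        "attributes": attributes,
    }


def walltime_seconds(hhmmss):
    """Convert an HH:MM:SS walltime string into seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Hypothetical end-of-job ("E") record; every value here is made up.
sample = (
    "03/11/2014 10:15:42;E;12345.batch.example.org;"
    "user=alice group=atlas queue=long start=1394528000 end=1394532942 "
    "Exit_status=0 resources_used.walltime=01:22:22"
)

record = parse_accounting_record(sample)
wall = walltime_seconds(record["attributes"]["resources_used.walltime"])
print(record["job_id"], record["attributes"]["queue"], wall)
```

This simple split on whitespace would mis-handle attribute values that contain spaces, so a production parser would need more care.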
Slide 13
Maui overview
- Support:
  - Maui is no longer supported by Adaptive Computing
- Documentation:
  - Poor documentation makes the initial installation complex
  - Things do not always work the way the documentation suggests
- Scalability issues:
  - At ~8000 queued jobs, Maui hangs
  - The MAXIJOBS parameter can be adjusted to limit the number of jobs considered for scheduling
  - This solves the issue (currently in production at NIKHEF)
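The idea behind capping the number of jobs considered for scheduling can be sketched as follows. This is a conceptual illustration only, not Maui's actual algorithm or configuration syntax: the function name, the job representation and the limit of 500 are all assumptions made for the example.

```python
def jobs_to_consider(queued_jobs, max_idle_jobs):
    """Return the highest-priority queued jobs, at most max_idle_jobs of them."""
    ranked = sorted(queued_jobs, key=lambda job: job["priority"], reverse=True)
    return ranked[:max_idle_jobs]


# Hypothetical queue of 8000 idle jobs with synthetic priorities.
queue = [{"id": i, "priority": (i * 37) % 1000} for i in range(8000)]

# Only a bounded window of the queue is evaluated in each scheduling pass,
# so the cost of a pass no longer grows with the full queue length.
window = jobs_to_consider(queue, max_idle_jobs=500)
print(len(queue), "queued,", len(window), "considered in this scheduling pass")
```

The trade-off is that jobs outside the window are invisible to the scheduler in that pass, so a limit that is too low could leave slots empty when the top-ranked jobs cannot run.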
Slide 14
Maui overview
- Moab is the non-free scheduler supported by Adaptive Computing and based on Maui
  - Aims to increase scalability
  - It has continued commercial support
  - Configuration files are very similar to the ones in Maui:
    http://docs.adaptivecomputing.com/mwm/help.htm#a.kmauimigrate.html
- Feedback from sites running Torque/Moab would be a good complement to this review
Slide 15
Outlook
- Torque/Maui scalability issues
  - Only relevant for larger sites; Torque/Maui remains a feasible option for small and medium-size sites
  - Might be well solved by the 4.2.X branch and by tuning Maui options
  - Actually, multicore jobs reduce the number of jobs to be handled by the system
    - For sites that are predominantly WLCG (e.g. PIC at 95%), switching to a pure multicore load would further reduce scheduling issues at the site level
    - For sites that are much less WLCG dominated (e.g. NIKHEF at 55%), a switch to a pure multicore load might actually increase scheduling issues at the site level, as this move would remove much of the entropy that allows reaching 97% utilization
- Another concern is the support for the systems, with Maui being the weakest link in the Torque/Maui combination
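Both sides of the multicore argument can be made concrete with a small sketch. The 8-core job width and the per-node free-core counts below are hypothetical values chosen for illustration; only the 3500-slot figure comes from the slides.

```python
def jobs_needed(total_slots, cores_per_job):
    """Number of jobs that fill a farm when every job occupies cores_per_job slots."""
    return total_slots // cores_per_job


def stranded_cores(free_cores_per_node, cores_per_job):
    """Cores left idle when only jobs of one fixed width are available to fill gaps."""
    return sum(free % cores_per_job for free in free_cores_per_node)


# Fewer, wider jobs: the scheduler tracks far fewer running jobs.
single_core_jobs = jobs_needed(3500, 1)
eight_core_jobs = jobs_needed(3500, 8)  # 8 cores per job is an assumption

# Hypothetical free cores on four nodes at some instant.
free = [3, 5, 8, 2]
idle_with_pure_multicore = stranded_cores(free, 8)  # only 8-core jobs waiting
idle_with_mixed_load = stranded_cores(free, 1)      # single-core jobs can fill gaps

print(single_core_jobs, eight_core_jobs)
print(idle_with_pure_multicore, idle_with_mixed_load)
```

A varied workload supplies narrow jobs that can occupy the leftover cores, which is the "entropy" that a pure multicore load would remove.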
Slide 16
Outlook
Some future options:
- Change from Maui to Moab (but it is not free!)
- Set up a kind of OpenMaui project within WLCG sites as a community effort to provide support and improvements to Maui
- Integrate with another scheduler. Which one?
- Complete change to another system (SLURM, HTCondor, ...)
- Do nothing until a real problem arrives
  - Currently just a worry; no real problem detected so far at PIC/NIKHEF
  - Improvements from migrating to another system are unclear
Slide 17
Outlook
Questions:
- If it is decided that WLCG sites should move away from Torque/Maui, would it be feasible before LHC Run 2?
  - Migration to a new batch system requires time and effort, and thus manpower and expertise, in order to reach adequate performance for a Grid site
  - Not clear if it is needed before Run 2
- What happens with sites shared with non-WLCG VOs?
  - Impact on other users (NIKHEF 45%)
  - For PIC, several disciplines rely on local job submissions. A change of batch system affects many users, and requires re-education, changes, and tests of their submission tools to adapt to an eventual new system