Pre-GDB on Batch Systems (Bologna), 11th March 2014. Torque/Maui: PIC and NIKHEF experience. C. Acosta-Silva, J. Flix, A. Pérez-Calero (PIC); J. Templon (NIKHEF)



  • Slide 1
  • Torque/Maui: PIC and NIKHEF experience. C. Acosta-Silva, J. Flix, A. Pérez-Calero (PIC); J. Templon (NIKHEF)
  • Slide 2
  • Outline:
    – System overview
    – Successful experience (NIKHEF and PIC)
    – Torque/Maui current situation
    – Torque overview
    – Maui overview
    – Outlook
  • Slide 3
  • System overview:
    – TORQUE is a community and commercial effort based on the OpenPBS project. It improves scalability, adds fault tolerance, and provides many other features: http://www.adaptivecomputing.com/products/open-source/torque/
    – The Maui Cluster Scheduler is a job scheduler capable of supporting multiple scheduling policies. It is free and open-source software: http://www.adaptivecomputing.com/products/open-source/maui/
  • Slide 4
  • System overview:
    – The Torque/Maui system has the usual batch system capabilities: queue definition (including routing queues), accounting, reservations/QOS/partitions, fair share, backfilling, and handling of SMP and MPI jobs.
    – Multicore allocation and job backfilling ensure that Torque/Maui is capable of supporting multicore jobs.
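As a concrete illustration of the queue-definition capability listed above, queues in Torque are configured through the qmgr tool. The sketch below defines a routing queue feeding an execution queue; the queue names and walltime limit are hypothetical, not taken from the PIC or NIKHEF configurations.

```shell
# Hypothetical qmgr commands (run against a live pbs_server):
# a routing queue "grid" that forwards jobs to an execution queue "long".
# Queue names and the walltime limit are illustrative only.
qmgr -c "create queue grid queue_type=route"
qmgr -c "set queue grid route_destinations = long"
qmgr -c "set queue grid enabled = true"
qmgr -c "set queue grid started = true"
qmgr -c "create queue long queue_type=execution"
qmgr -c "set queue long resources_max.walltime = 72:00:00"
qmgr -c "set queue long enabled = true"
qmgr -c "set queue long started = true"
```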
  • Slide 5
  • Successful experience:
    – NIKHEF and PIC are multi-VO sites with local and Grid users.
    – Successful experience during the first LHC run with the Torque/Maui system.
    – Currently, both are running Torque 2.5.13 + Maui 3.3.4.
    – NIKHEF: 30% non-HEP, 55% WLCG, rest non-WLCG HEP or local jobs. Highly non-uniform workload.
    – 3800 job slots; 97.5% utilization (last 12 months); 2000 waiting jobs (average).
  • Slide 6
  • Successful experience: NIKHEF running jobs and queued jobs over the last year (charts).
  • Slide 7
  • Successful experience:
    – PIC: 3% non-HEP, 83% Tier-1 WLCG, 12% ATLAS Tier-2, rest local jobs (ATLAS Tier-3, T2K, MAGIC, ...).
    – 3500 job slots; approx. 95% utilization (last 12 months); 2500 waiting jobs (average).
  • Slide 8
  • Successful experience: PIC running jobs over the last year (chart).
  • Slide 9
  • Successful experience: PIC queued jobs over the last year (chart).
  • Slide 10
  • Torque overview:
    – Torque has a very active community. Mailing list: [email protected]
    – Free support from Adaptive Computing.
    – New releases roughly each year, with frequent patches in between.
    – 2.5.13 is the last release of the 2.5.X branch.
  • Slide 11
  • Torque overview (chart).
  • Slide 12
  • Torque overview:
    – Torque is well integrated with the EMI middleware.
    – Widely used in WLCG Grid sites (~75% of sites publish pbs in the BDII).
    – Not complex to install, configure, and manage: configuration via the qmgr tool, plain-text accounting.
    – Torque scalability issues: reported for the 2.5.X branch, but not detected at our scale.
    – The 4.2.X branch presents significant enhancements in scalability for large environments, responsiveness, and reliability.
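The plain-text accounting mentioned above is one of the reasons Torque is easy to manage: each record is a single line of the form `<date>;<record type>;<job id>;key=value ...`. A minimal parsing sketch is shown below; the sample record, host name, and user are illustrative, not real PIC/NIKHEF data.

```python
# Minimal sketch of parsing one Torque plain-text accounting record.
# The sample line is illustrative; real logs live in the server's
# accounting directory, one dated file per day.

def parse_accounting_record(line):
    """Split an accounting line into timestamp, record type, job id and fields."""
    timestamp, rec_type, job_id, payload = line.split(";", 3)
    # The payload is whitespace-separated key=value pairs.
    fields = dict(part.split("=", 1) for part in payload.split() if "=" in part)
    return {"timestamp": timestamp, "type": rec_type,
            "job_id": job_id, "fields": fields}

sample = ("03/11/2014 10:15:02;E;1234.ce01.example.org;"
          "user=atlas001 group=atlas queue=long Exit_status=0 "
          "resources_used.walltime=02:13:45")
record = parse_accounting_record(sample)
print(record["type"], record["fields"]["queue"], record["fields"]["Exit_status"])
```

Record type `E` marks a job-end record, which carries the resource usage needed for site accounting.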
  • Slide 13
  • Maui overview:
    – Support: Maui is no longer supported by Adaptive Computing.
    – Documentation: poor documentation makes the initial installation complex, and things do not always work as the documentation suggests.
    – Scalability issues: at ~8000 queued jobs, Maui hangs. The MAXIJOBS parameter can be adjusted to limit the number of jobs considered for scheduling; this solves the issue (currently in production at NIKHEF).
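The workaround above would appear in maui.cfg roughly as sketched below. The value is an illustrative example, not the NIKHEF production setting, and the exact parameter name and placement should be verified against the Maui documentation.

```
# Illustrative maui.cfg fragment (values are examples only):
# cap the number of idle jobs Maui considers per scheduling pass,
# keeping the scheduler below the ~8000-queued-job hang threshold.
MAXIJOBS        4000
```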
  • Slide 14
  • Maui overview:
    – Moab is the non-free scheduler supported by Adaptive Computing, based on Maui.
    – It aims to increase scalability and comes with continued commercial support.
    – Its configuration files are very similar to Maui's: http://docs.adaptivecomputing.com/mwm/help.htm#a.kmauimigrate.html
    – Feedback from sites running Torque/Moab would be a good complement to this review.
  • Slide 15
  • Outlook:
    – Torque/Maui scalability issues are only relevant for larger sites; the system remains a feasible option for small and medium-size sites. The issues may well be solved by the 4.2.X branch and by tuning Maui options.
    – In fact, multicore jobs reduce the number of jobs the system has to handle. For sites that are predominantly WLCG (e.g. PIC at 95%), switching to a pure multicore load would further reduce scheduling issues at the site level. For sites that are much less WLCG-dominated (e.g. NIKHEF at 55%), a switch to a pure multicore load might actually increase scheduling issues, as it would remove much of the entropy that allows reaching 97% utilization.
    – Another concern is support for the systems, with Maui being the weakest link in the Torque/Maui combination.
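For reference, a multicore job is requested in Torque with the nodes/ppn resource syntax. The 8-core request and script name below are illustrative, not a PIC or NIKHEF submission template.

```shell
# Illustrative submission of an 8-core multicore job on a single node;
# job_script.sh is a placeholder name.
qsub -l nodes=1:ppn=8 job_script.sh
```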
  • Slide 16
  • Outlook. Some future options:
    – Change from Maui to Moab (but it is not free!).
    – Set up a kind of OpenMaui project among WLCG sites, as a community effort to provide support and improvements to Maui.
    – Integrate Torque with another scheduler. Which one?
    – Change completely to another system (SLURM, HTCondor, ...).
    – Do nothing until a real problem arrives: currently this is just a worry; no real problem has been detected so far at PIC/NIKHEF, and the improvements from migrating to another system are unclear.
  • Slide 17
  • Outlook. Questions:
    – If WLCG sites decided to move away from Torque/Maui, would it be feasible before LHC Run 2? Migration to a new batch system requires time and effort, and thus manpower and expertise, to reach an adequate performance for a Grid site. It is not clear whether this is needed before Run 2.
    – What happens with sites shared with non-WLCG VOs? There is an impact on other users (45% at NIKHEF). At PIC, several disciplines rely on local job submissions; a change of batch system affects many users and requires re-education, changes, and tests of their submission tools to adapt to an eventual new system.