Converting Your (Simple) Job Scripts from PBS to SLURM on discover

NASA Center for Climate Simulation

High Performance Science

July 31, 2014

Introduction

• Portable Batch System (PBS)

– Developed at Ames for NASA

– Commercial version: PBS Pro (Altair Engineering)

• Simple Linux Utility for Resource Management (SLURM)

– Developed at LLNL

– Open-source (supported by SchedMD)

• discover switched from PBS to SLURM in October 2013.

NCCS Brown Bag 7/31/2014 NASA Center for Climate Simulation

What’s the difference?

• Concepts and commands have new names.

• Overall script design remains essentially the same.

• A PBS “queue” is equivalent to a SLURM “partition”.

Why did we switch?

• Quality of Service (QoS)

– Eliminates need for dedicated queues

• Great reduction in cost

• But PBS is still used at NAS….

– … so we use a PBS emulation layer.

PBS emulation with SLURM

• SchedMD provided wrapper scripts (in Perl).

• We modified the wrappers for discover.

• Most changes were folded back into baseline.

• Wrapped tools: qsub, qalter, qdel, qhold, qrerun, qrls, qstat, xsub

• Wrappers handle command-line options only.

• #PBS script directives are translated to #SBATCH and processed by sbatch.
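
To give a feel for what directive translation involves, here is a minimal shell sketch. It is not the wrapper code itself (that is written in Perl); the function name pbs_to_sbatch is hypothetical, and only three simple options are handled.

```shell
#!/bin/bash
# Hypothetical helper: rewrite a few simple #PBS directives as #SBATCH ones.
# Only -N, -A and -q are covered; every other line passes through unchanged.
pbs_to_sbatch() {
    sed -E \
        -e 's/^#PBS[[:space:]]+-N[[:space:]]+/#SBATCH --job-name=/' \
        -e 's/^#PBS[[:space:]]+-A[[:space:]]+/#SBATCH --account=/' \
        -e 's/^#PBS[[:space:]]+-q[[:space:]]+/#SBATCH --partition=/'
}

# Example: translate a two-line header read from standard input.
printf '%s\n' '#PBS -N revenge' '#PBS -q dread' | pbs_to_sbatch
```

Resource requests such as -l select=... need real parsing rather than a one-line substitution, which is one reason not every PBS feature can be emulated this way.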

Emulation “gotchas”

• Not all PBS features can be emulated.

• SLURM exports user environment by default.

• SLURM runs in the current directory.

• SLURM combines stdout and stderr.

Batch job submission

• For simple cases, just replace qsub with sbatch.

$ qsub myjob.sh

becomes

$ sbatch myjob.sh
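
One thing that may need attention in scripts that capture the job ID: sbatch normally confirms a submission with a line of the form "Submitted batch job <id>" rather than printing a bare identifier. The sketch below extracts the ID from that line; the helper name job_id_from is hypothetical, and a canned line stands in for sbatch so the example runs without a scheduler.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's confirmation line,
# which normally reads "Submitted batch job <id>".
job_id_from() {
    awk '/^Submitted batch job/ { print $4 }'
}

# With a real scheduler this would be:  sbatch myjob.sh | job_id_from
# Here a canned line stands in for sbatch so the sketch runs anywhere.
echo 'Submitted batch job 1234567' | job_id_from
```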

Naming your job

• Naming the job makes it easier to find.

#PBS -N job_name

becomes

#SBATCH -J job_name

or

#SBATCH --job-name=job_name

Specifying the account

• Make sure the proper account is charged.

#PBS -A account_name

becomes

#SBATCH -A account_name

or

#SBATCH --account=account_name

Specifying the partition

• Specify a partition only if you have to ….

#PBS -q destination

becomes

#SBATCH -p destination

or

#SBATCH --partition=destination

Specifying the number of nodes

• Specify how many nodes you need.

#PBS -l select=num

becomes

#SBATCH -N num

or

#SBATCH --nodes=num

• A range can also be specified as nmin-nmax.

Specifying processes per node

• By default, one process runs per core on each node, but you may want fewer.

#PBS -l mpiprocs=num

becomes

#SBATCH --ntasks-per-node=num
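
The total number of MPI processes is the node count times the per-node task count. A small arithmetic sketch, assuming a request for 12 nodes with 8 tasks each on 16-core nodes (the variable names are illustrative only):

```shell
#!/bin/bash
# Illustrative values: 12 nodes, 8 tasks per node, 16 cores per node.
nodes=12
tasks_per_node=8
cores_per_node=16

# Total MPI tasks the job will start.
total_tasks=$(( nodes * tasks_per_node ))
# Cores that sit idle because fewer tasks than cores run on each node.
idle_cores=$(( nodes * (cores_per_node - tasks_per_node) ))

echo "MPI tasks started: ${total_tasks}"
echo "Cores left idle:   ${idle_cores}"
```

Running fewer tasks than cores is common when each process needs more memory or spawns its own threads.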

Specifying processor type

• Choices are:

– Sandy Bridge (sand – 16 cores/node)

– Westmere (west – 12 cores/node)

#PBS -l proc=proc_type

becomes

#SBATCH -C proc_type

or

#SBATCH --constraint=proc_type

stdout and stderr streams

• Specify where the output streams are written.

#PBS -o opath -e epath

becomes

#SBATCH -o opath -e epath

or

#SBATCH --output=opath --error=epath

• Streams are joined in SLURM by default (./slurm-NNNNNNN.out), which required -j oe or -j eo in PBS.
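
To keep the streams apart and still get one pair of files per job, the sbatch filename pattern %j (replaced by the job ID) can be used in both paths. The sketch below assumes a job called revenge; outside a SLURM job the SLURM_JOB_ID variable is unset, so a placeholder is printed instead.

```shell
#!/bin/bash
#SBATCH --output=revenge.%j.out
#SBATCH --error=revenge.%j.err

# Inside a job, SLURM_JOB_ID holds the ID that %j expands to.
# Outside a job it is unset, so fall back to a placeholder.
job_id="${SLURM_JOB_ID:-NNNNNNN}"
stdout_file="revenge.${job_id}.out"
stderr_file="revenge.${job_id}.err"

echo "stdout -> ${stdout_file}"
echo "stderr -> ${stderr_file}"
```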

Mail notification

• Use to get a message when your job is done, or when something bad happens…

#PBS -M user_list

becomes

#SBATCH --mail-type=type

#SBATCH --mail-user=user

• Type can be BEGIN, END, FAIL, ALL (any state change).

• Default user is the submitter.

Your working directory

• PBS jobs ran in a spool directory.

• SLURM jobs run in the current directory.

• Can be changed with cd command, or:

#SBATCH -D path

or

#SBATCH --workdir=path
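
Scripts that must run under either scheduler can pick the submission directory from whichever variable is set. This sketch relies on SLURM_SUBMIT_DIR (set by SLURM) and PBS_O_WORKDIR (set by PBS), and falls back to the current directory when run by hand; the variable name submit_dir is illustrative.

```shell
#!/bin/bash
# Use the submission directory under either scheduler, or the current
# directory if neither variable is set (for example, when run by hand).
submit_dir="${SLURM_SUBMIT_DIR:-${PBS_O_WORKDIR:-$PWD}}"

cd "$submit_dir" || exit 1
echo "Running in: $(pwd)"
```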

Exporting environment variables

• PBS exported nothing by default.

• SLURM exports everything by default.

• Change with one or more of:

#SBATCH --export=names

#SBATCH --export=ALL

#SBATCH --export=NONE

#SBATCH --export-file=path
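
The effect of passing only named variables can be tried without a scheduler. In this sketch env -i stands in for SLURM and starts a child shell with an empty environment plus one chosen variable; HUMPERDINCK and both values are made up for illustration. A real job would also see the SLURM_* variables that the scheduler sets.

```shell
#!/bin/bash
# Two variables in the submitting shell; only one should reach the "job".
export BUTTERCUP=princess
export HUMPERDINCK=prince

# env -i clears the environment; BUTTERCUP is then passed explicitly.
# This only imitates selective export, it does not call SLURM.
seen="$(env -i BUTTERCUP="$BUTTERCUP" /bin/sh -c \
    'echo "BUTTERCUP=${BUTTERCUP:-unset} HUMPERDINCK=${HUMPERDINCK:-unset}"')"

echo "$seen"
```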

Threads and MPI

• Set up and run threads as you always have.

• mpirun/mpiexec/mpiexec.hydra are not part of PBS, so no changes needed.

• The SLURM tool srun provides additional features that are SLURM-specific.

– Provides features similar to those of MPI tools.

– Differences in job step control and signal propagation.
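
For threaded codes, one common pattern (an assumption here, not something this deck prescribes) is to request cores per task with --cpus-per-task and derive the OpenMP thread count from SLURM_CPUS_PER_TASK. SLURM sets that variable only when the option is given, so the sketch defaults to one thread.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4

# SLURM_CPUS_PER_TASK is set only when --cpus-per-task is requested;
# default to a single thread otherwise (for example, outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "OpenMP threads per task: ${OMP_NUM_THREADS}"
```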

A simple example

• User inigo has an old PBS script:

– The job name is revenge.

– Runs in the default PBS queue.

– Runs the application sword.x.

– Uses 8 Westmere nodes.

The simple script (PBS)

#!/bin/bash
#PBS -N revenge
#PBS -l select=8:proc=west

mpirun sword.x

The simple script (SLURM)

#!/bin/bash
#SBATCH --job-name=revenge
#SBATCH --nodes=8
#SBATCH --constraint=west

mpirun sword.x

# Could also use:
# srun sword.x

The simple results…

• Program runs in the current directory, not the spool directory.

• User environment is exported.

• Standard output and standard error are written together to ./slurm-NNNNNNN.out.

A not-so-simple example

• User westley has an old PBS script:

– The job name is pirate.

– Charges the account roberts.

– Runs in the PBS queue dread.

– Uses 12 Sandy Bridge nodes.

– Uses 8 cores per node.

– Exports only the variable BUTTERCUP.

– Runs the application sword.x.

The not-so-simple (NSS) script (PBS)

#!/bin/bash
#PBS -N pirate
#PBS -A roberts
#PBS -q dread
#PBS -l select=12:proc=sand:mpiprocs=8
#PBS -v BUTTERCUP

mpirun sword.x

The NSS Script (SLURM)

#!/bin/bash
#SBATCH --job-name=pirate
#SBATCH --account=roberts
#SBATCH --partition=dread
#SBATCH --nodes=12
#SBATCH --constraint=sand
#SBATCH --ntasks-per-node=8
#SBATCH --export=NONE,BUTTERCUP

mpirun sword.x

# or
# srun sword.x

The NSS results…

• Program runs in current directory, not the spool directory.

• Only the variable BUTTERCUP is exported from the user environment.

• Standard output and standard error are written together to ./slurm-NNNNNNN.out.

• NOTE: If you have an environment variable named NONE, and use --export=NONE, nothing is exported. But if you have NONE, and use --export=NONE,OTHER, NONE and OTHER are exported with everything else! So don’t do that….

Much more to come…

• Using mpirun/mpiexec/mpiexec.hydra vs. using srun.

– Differing behavior for signal propagation and job control commands.

• Job dependencies with strigger.

• Copying files with sbcast.

• Attaching to running jobs with sattach.

Questions?
