Running Jobs on Jacquard: An overview of interactive and batch computing, with comparisons to Seaborg. David Turner, NUG Meeting, 3 Oct 2005


Running Jobs on Jacquard

An overview of interactive and batch computing, with comparisons to Seaborg

David Turner
NUG Meeting
3 Oct 2005


Topics

• Interactive
  – Serial
  – Parallel
  – Limits
• Batch
  – Serial
  – Parallel
  – Queues and Policies
• Charging
• Comparison with Seaborg


Execution Environment

• Four login nodes
  – Serial jobs only
  – CPU limit: 60 minutes
  – Memory limit: 64 MB
• 320 compute nodes
  – “Interactive” parallel jobs
  – Batch serial and parallel jobs
  – Scheduled by PBSPro
• Queue limits and policies established to meet system objectives
  – User input is critical!


Interactive Jobs

• Serial jobs run on login nodes
  – cd, ls, pathf90, etc.
  – ./a.out
• Parallel jobs run on compute nodes
  – Controlled by PBSPro

mpirun -np 16 ./a.out

qsub -I -q interactive -l nodes=8:ppn=2
% cd $PBS_O_WORKDIR

% mpirun -np 16 ./a.out

qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00


PBSPro

• Marketed by Altair Engineering
  – Based on open source Portable Batch System developed for NASA
  – Also installed on DaVinci
• Batch scripts contain directives:
  #PBS -o myjob.out
• Directives may also appear as command-line options:
  qsub -o myjob.out …


Simple Batch Script

#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N myjob
#PBS -o myjob.out
#PBS -e myjob.err
#PBS -A mp999
#PBS -q debug
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out
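In the script above, the mpirun process count (16) equals nodes × ppn (8 × 2). One way to keep the two in step is to generate the script from a single set of parameters. The sketch below assumes a hypothetical helper named make_job, which is not a PBS command, and reuses only directives shown on this slide.

```shell
#!/bin/sh
# Hypothetical generator (make_job is not part of PBSPro): prints a batch
# script whose mpirun process count always equals nodes * ppn.
make_job() {
    nodes=$1
    ppn=$2
    walltime=$3
    np=$(( nodes * ppn ))
    # \$PBS_O_WORKDIR is escaped so it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#PBS -l nodes=${nodes}:ppn=${ppn},walltime=${walltime}
#PBS -N myjob
#PBS -q debug
#PBS -V

cd \$PBS_O_WORKDIR
mpirun -np ${np} ./a.out
EOF
}

# Reproduce the resource request of the script above.
make_job 8 2 00:30:00
```

On the system, the output could be redirected to a file and submitted, for example: make_job 8 2 00:30:00 > myjob, followed by qsub myjob.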


Useful PBS Options (1)

-A repo
    Charge this job to repository repo
    Default: Your default repository

-N jobname
    Provide name for job; up to 15 printable, non-whitespace characters
    Default: Name of batch script

-q qname
    Submit job to batch queue qname
    Default: batch


Useful PBS Options (2)

-S shell
    Specify shell as the scripting language
    Default: Your login shell

-V
    Export current environment variables into the batch job environment
    Default: Do not export


Useful PBS Options (3)

-o outfile
    Write STDOUT to outfile
    Default: <jobname>.o<jobid>

-e errfile
    Write STDERR to errfile
    Default: <jobname>.e<jobid>

-j [eo|oe]
    Join STDOUT and STDERR on STDERR (eo) or STDOUT (oe)
    Default: Do not join


Useful PBS Options (4)

-m [a|b|e|n]
    E-mail notification
    a = send mail when job aborted by system
    b = send mail when job begins
    e = send mail when job ends
    n = do not send mail
    Options a, b, and e may be combined
    Default: a
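As a small illustration of how the output and mail options from these slides combine, the directive fragment below merges STDERR into STDOUT and requests mail on abort, begin, and end. The file name myjob.out is reused from the earlier example. This is a sketch of one possible combination, not a recommended configuration.

```shell
# Directive fragment only; place these lines at the top of a batch script.
# Merged STDOUT and STDERR both go to myjob.out.
#PBS -o myjob.out
#PBS -j oe
# Mail when the job is aborted (a), begins (b), and ends (e).
#PBS -m abe
```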


Batch Queues

Submit        Execute       Nodes        Walltime
interactive   interactive   1 – 16       30 mins
debug         debug         1 – 32       30 mins
batch         batch16       1 – 16       48 hours
              batch32       17 – 32      24 hours
              batch64       33 – 64      12 hours
              batch128      65 – 128     6 hours
              batch256      129 – 256    6 hours
low           low           1 – 64       6 hours
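The table suggests that a job submitted to the batch queue lands in an execution queue chosen by its node count. The sketch below mirrors that mapping so a user can check which walltime limit applies before submitting. The function name exec_queue is hypothetical, and the routing itself is done by PBSPro, not by this script.

```shell
#!/bin/sh
# Hypothetical lookup mirroring the queue table: given a node count for a job
# submitted to "batch", print the execution queue and its walltime limit in
# hours. Prints "none" when no batch execution queue covers the request.
exec_queue() {
    nodes=$1
    if   [ "$nodes" -lt 1 ];   then echo "none"
    elif [ "$nodes" -le 16 ];  then echo "batch16 48"
    elif [ "$nodes" -le 32 ];  then echo "batch32 24"
    elif [ "$nodes" -le 64 ];  then echo "batch64 12"
    elif [ "$nodes" -le 128 ]; then echo "batch128 6"
    elif [ "$nodes" -le 256 ]; then echo "batch256 6"
    else echo "none"
    fi
}

exec_queue 8     # falls in the 1 – 16 node range
exec_queue 40    # falls in the 33 – 64 node range
```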


Batch Queue Policies

• Each user may have:
  – One running interactive job
  – One running debug job
  – Four jobs running over entire system

• Only one batch128 job is allowed to run at a time.

• The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.


Submitting Batch Jobs

% qsub myjob

93935.jacin03

%

• Record jobid for tracking!
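qsub prints the job id in the form number.server, as in the example above. A minimal sketch for keeping the id and extracting its numeric part follows. The helper name job_number is hypothetical, and the qsub call appears only in a comment because it requires the batch system.

```shell
#!/bin/sh
# Hypothetical helper: strip the server suffix from a PBS job id such as
# "93935.jacin03", leaving the numeric part.
job_number() {
    echo "${1%%.*}"
}

# On the system, the id could be captured at submit time (not run here):
#   jobid=$(qsub myjob)
job_number 93935.jacin03
```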


Deleting Batch Jobs

% qdel 93935.jacin03

%


Monitoring Batch Jobs (1)

• PBS command qstat

% qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
...

• Use -u option for single-user output

% qstat -u einstein
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
%
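The state column (S) in this listing is easy to summarize with standard tools. The sketch below counts jobs per state, assuming the six-column layout shown on this slide, with two header lines and the state as the second-to-last field. The function names are hypothetical, and the sample data is copied from the listing so the example runs without PBS.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per state from qstat-style output on stdin.
# Assumes two header lines and the state in the second-to-last column.
count_states() {
    awk 'NR > 2 && NF >= 6 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample rows copied from the slide, standing in for real qstat output.
sample_qstat() {
    cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
EOF
}

sample_qstat | count_states
```

On the system, the same filter could be fed directly: qstat | count_states.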


Monitoring Batch Jobs (2)

• NERSC command qs

% qs
JOBID  ST  USER      NAME        NDS  REQ       USED      SUBMIT
93939  R   gauss     STDIN         1  00:30:00  00:10:43  Oct 2 16:47:00
93891  R   einstein  runlu4x8     16  01:00:00  00:38:48  Oct 2 15:23:36
93918  R   inewton   r4_16         8  01:00:00  00:10:37  Oct 2 15:36:35
...
93785  Q   inewton   r4_64        32  01:00:00  -         Oct 2 08:42:36
93828  Q   rasputin  nodemove     64  00:05:00  -         Oct 2 12:00:11
93897  Q   einstein  runlu8x8     32  01:00:00  -         Oct 2 15:24:27
...
93893  H   legendre  EV80fl02_2    4  03:00:00  -         Oct 2 15:24:23
93894  H   legendre  EV80fl02_3    4  03:00:00  -         Oct 2 15:24:24
93917  H   legendre  EV80fl98_5    4  03:00:00  -         Oct 2 15:26:06
...

• Also provides -u option


Monitoring Batch Jobs (3)

• NERSC website has current queue look:
  http://www.nersc.gov/nusers/status/jacquard/qstat
• Also has completed jobs list:
  http://www.nersc.gov/nusers/status/jacquard/pbs_summary
• Numerous filtering options available
  – Owner
  – Account
  – Queue
  – Jobid


Charging

• Machine charge factor (cf) = 4
  – Based on benchmarks and user applications
  – Currently under review
• Serial interactive
  – Charge = cf × cputime
  – Always charged to default repository
• All parallel
  – Charge = cf × 2 × nodes × walltime
  – Charged to default repo unless -A specified
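A worked example of the two formulas may help when estimating allocation usage. The sketch below assumes times are given in whole hours; the slide does not state the accounting unit, so that is an assumption. It uses the charge factor of 4 quoted above, and the function names are hypothetical.

```shell
#!/bin/sh
# Charge factor from the slide; noted there as currently under review.
CF=4

# Serial interactive: charge = cf * cputime (cputime assumed in whole hours).
serial_charge() {
    echo $(( CF * $1 ))
}

# Parallel: charge = cf * 2 * nodes * walltime (walltime assumed in whole hours).
parallel_charge() {
    echo $(( CF * 2 * $1 * $2 ))
}

# An 8-node job that runs for 3 hours: 4 * 2 * 8 * 3 = 192.
parallel_charge 8 3
```

Shell arithmetic is integer-only, so fractional hours would need a tool such as awk or bc.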


Things To Look Out For (1)

• Do not set group write permission for your home directory; it will prevent PBS from running your jobs.

• Library modules must be loaded at runtime as well as linktime.

• Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.


Things To Look Out For (2)

• Do not run more than one MPI program in a single batch script.
• If your login shell is bash, you may see:
  accept: Resource temporarily unavailable
  done.
  In this case, specify a different shell using the -S directive, such as:
  #PBS -S /usr/bin/ksh


Things To Look Out For (3)

• Batch jobs always start in $HOME. To get to the directory where the job was submitted:
  cd $PBS_O_WORKDIR
  For jobs that work with large files:
  cd $SCRATCH/some_subdirectory
• PBS buffers output and error files until the job completes. To view files (in home directory) while running:
  -k oe
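Put together, these two points might look like the following batch script fragment. It is a sketch that reuses the resource request and executable name from the earlier simple batch script, and some_subdirectory is the placeholder from this slide, not a real path.

```shell
#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N myjob
# Keep STDOUT and STDERR files in the home directory so they can be
# viewed while the job is still running.
#PBS -k oe

# The job starts in $HOME; move to scratch space for large files.
cd $SCRATCH/some_subdirectory
mpirun -np 16 ./a.out
```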


Things To Look Out For (4)

• The following is just a warning and can be ignored:
  Warning: no access to tty (Bad file descriptor).
  Thus no job control in this shell.


LoadLeveler vs. PBS

LL                    PBS                 LL                PBS
#@ node               #PBS -l nodes       #@ notification   #PBS -m
#@ tasks_per_node     #PBS -l ppn         #@ shell          #PBS -S
#@ wall_clock_limit   #PBS -l walltime    #@ output         #PBS -o
#@ class              #PBS -q             #@ error          #PBS -e
#@ job_name           #PBS -N             #@ environment    #PBS -V
#@ account_no         #PBS -A
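For users converting Seaborg scripts, the keyword correspondence in this table can be written as a small lookup. The sketch below maps a LoadLeveler keyword to the matching PBS option. The function name ll_to_pbs is hypothetical, and only the keyword is translated: the values that follow each keyword generally differ between the two systems and still need to be rewritten by hand.

```shell
#!/bin/sh
# Hypothetical lookup built from the table: print the PBS option that
# corresponds to a LoadLeveler keyword. Values are not translated.
ll_to_pbs() {
    case "$1" in
        node)             echo "-l nodes" ;;
        tasks_per_node)   echo "-l ppn" ;;
        wall_clock_limit) echo "-l walltime" ;;
        class)            echo "-q" ;;
        job_name)         echo "-N" ;;
        account_no)       echo "-A" ;;
        notification)     echo "-m" ;;
        shell)            echo "-S" ;;
        output)           echo "-o" ;;
        error)            echo "-e" ;;
        environment)      echo "-V" ;;
        *)                echo "unknown"; return 1 ;;
    esac
}

ll_to_pbs wall_clock_limit
```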


Resources

• NERSC Website
  http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
  http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
• NERSC Consulting
  1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
  (510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
  [email protected]
  http://help.nersc.gov/