New WestGrid Facilities: Bugaboo, Checkers, Orcinus, Silo, Snowpatch

A Technical Overview

Martin Siegert, SFU


New Systems: Hardware

Compute Facilities

▬ cores

▬ memory

▬ interconnect

Storage Facilities

▬ size

New Systems: Software, Setup

Compute Facilities

▬ common usage: compilers, software libraries, MPI

▬ running jobs: rules, quota, etc.

▬ which facility to use for what?

Storage Facilities

▬ where to store data

▬ quota

Hardware (Bugaboo, Checkers, Orcinus)

Processors: Intel Xeon E54XX processors

binary compatible: programs will run on all three systems (libraries!)

► Orcinus: 3.00 GHz, 3072 cores

► Bugaboo: 2.66 GHz, 1280 cores

► Checkers: 2.50 GHz, 1280 cores

Interconnect: Infiniband

► latency: 1.6 µs

► bandwidth: 2 GB/s

All nodes have 16 GB of memory

General Setup (Bugaboo, Checkers, Orcinus, Snowpatch)

Goal:

► it is unnecessary to set certain environment variables: PATH, LD_LIBRARY_PATH, etc.

► it is unnecessary to use specific system commands, e.g., xlf95, mpf90, etc.

► it is unnecessary to specify system directories

► everything should work without modifications

► it may be harmful to specify too much (e.g., full paths, system directories, etc.)

Software (Bugaboo, Checkers, Orcinus, Snowpatch)

General Setup

► do not specify full pathnames, e.g.,

mpiexec siesta

should work.

► some applications (mostly licensed software) require loading a module, e.g.,

module load amberpmemd

► pathnames may change, disappear

Compilation (Bugaboo, Checkers, Orcinus, Snowpatch)

available compilers:

► GNU: gcc, g++, gfortran

► Intel: icc, icpc, ifort

► PGI (Orcinus only): pgcc, pgCC, pgf90

generic compilers: cc, c++, f77, f90

these use the default compiler, the “best” compiler for most applications; --version shows the real compiler

optimization: -O3 will work regardless of the compiler in use

all compilations are 64 bit by default.
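To see which real compiler sits behind each generic name, the --version check can be looped over all four names. This is a minimal sketch: the helper name real_compiler is made up for illustration, and the output depends on the system and on the modules that are loaded.

```shell
# Hypothetical helper: print the first line of "<compiler> --version",
# or a note if the command does not exist on this machine.
real_compiler() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version 2>&1 | head -n 1
    else
        echo "$1: not found"
    fi
}

# Ask each generic compiler name which real compiler it maps to.
for c in cc c++ f77 f90; do
    real_compiler "$c"
done
```

The same check works for the MPI wrappers, e.g. real_compiler mpif90.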

Compilation: Libraries (Bugaboo, Checkers, Orcinus, Snowpatch)

do not set LD_LIBRARY_PATH

► very common source of mistakes

► need to remember at run time what you used at compile time

do not use -L/global/xyz/lib64 flags
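One way to catch hard-coded library directories before they cause trouble is a small check over the link flags. The following is only a sketch: flags_ok is a hypothetical helper, not a WestGrid tool, and it looks only for -L flags that carry an absolute path.

```shell
# Hypothetical helper: succeed only if no argument is a -L flag
# with an absolute path such as -L/global/xyz/lib64.
flags_ok() {
    for flag in "$@"; do
        case "$flag" in
            -L/*) return 1 ;;
        esac
    done
    return 0
}

# Plain -l<name> flags pass the check.
if flags_ok -O3 -lfftw3 -llapack -lblas; then
    echo "flags look fine"
fi

# A hard-coded system directory is flagged.
if ! flags_ok -O3 -L/global/xyz/lib64 -lfftw3; then
    echo "hard-coded library directory found"
fi
```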

Compilation: Libraries (Bugaboo, Checkers, Orcinus, Snowpatch)

simply specify -l<name_of_library>, e.g., -lfftw3 or -lgsl

use generic names:

-llapack instead of -lmkl_lapack

-lblas instead of -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

☛ recommended: better performance!

you can use the command ldd myprog to see the libraries that got linked into myprog.

Compilation: Summary (Bugaboo, Checkers, Orcinus, Snowpatch)

Keep it simple:

cc -O3 -o myprog myprog.c -lfftw3 -llapack -lblas

f90 -O3 -o myprog myprog.f90 -lfftw3 -llapack -lblas

► uses default compiler

► will find the libraries

► will include the path to the libraries in the executable (no need for LD_LIBRARY_PATH)

Compilation: MPI (Bugaboo, Checkers, Orcinus, Snowpatch)

generic compilers:

mpicc, mpicxx, mpif77, mpif90

use the default compilers

specific compilers:

mpigcc, mpig++ exist on bugaboo, checkers and snowpatch

better:

module load gcc

mpif90 ... will use gfortran

module load intel

mpicc ... will use icc

Compilation: MPI (Bugaboo, Checkers, Orcinus, Snowpatch)

Everything else the same as for non-MPI compilation:

mpicxx -O3 -o myprog myprog.C -lfftw_mpi -lfftw

mpif90 -O3 -o myprog myprog.f90 -lblas

Running Jobs (Bugaboo, Checkers, Orcinus, Snowpatch)

Two ways of specifying the no. of processors in a job submission script:

#PBS -l procs=42

requests 42 cores anywhere on the facility

recommended: leads to by far the shortest waiting times, since there are no restrictions on the distribution of processes over nodes.

#PBS -l nodes=14:ppn=3

requests 14 (different!) nodes with exactly three cores on each of the nodes.
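The two forms ask for the same number of cores whenever nodes times ppn equals procs. A quick arithmetic check in the shell, using the numbers from this slide (the variable names are only illustrative):

```shell
# Numbers from the slide: 14 nodes with 3 cores each.
nodes=14
ppn=3

# Total core count implied by the nodes/ppn form.
procs=$((nodes * ppn))

echo "#PBS -l nodes=${nodes}:ppn=${ppn}   (${procs} cores, fixed layout)"
echo "#PBS -l procs=${procs}   (${procs} cores, any layout)"
```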

Running MPI Jobs (Bugaboo, Checkers, Orcinus, Snowpatch)

Use mpiexec to run an MPI program,

not mpirun!

in job submission scripts simply

mpiexec myprog args

will work.

interactively (on headnode for test purposes)

mpiexec -n 2 myprog args

is required.
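Putting the pieces together, a complete submission script might look like the sketch below. The walltime request, the file name job.pbs, and the cd to $PBS_O_WORKDIR are assumptions that do not come from the slides; myprog and args are placeholders. The snippet only writes the script to a file, which would then be submitted with qsub job.pbs.

```shell
# Write a minimal submission script to job.pbs (file name is an assumption).
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -l procs=42
#PBS -l walltime=01:00:00
# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
# Per the slide, no -n option is needed inside a job script.
mpiexec myprog args
EOF

echo "wrote job.pbs"
```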

Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)

Orcinus is the fastest and largest system (3.00 GHz, 3072 cores), but its available disk space is small and its I/O is slow.

► jobs with little I/O requirements

► special software: Matlab, Lumerical

Checkers has medium size disk space and intermediate I/O performance

► special software: Gaussian

Bugaboo has large and high-performance disks: parallel Lustre filesystem

► jobs with large and/or high-performance storage needs

Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)

Snowpatch has only GigE interconnect, is somewhat older and slower.

► jobs without requirement for high-performance interconnect

Check which software is available where:

http://www.westgrid.ca/support/software

Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)

Job limits:

► maximum walltime:

◆ Bugaboo: 4 months

◆ Checkers: 21 days

◆ Orcinus: 10 days

► maximum processor-seconds (cumulative):

◆ Bugaboo: 1024 days

◆ Checkers: 1280 days

◆ Orcinus: 15000 days
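As a rough illustration of how the cumulative limit constrains a job, assume that the figures are processor-days and that the limit applies to cores multiplied by walltime. Both are assumptions; the slide does not spell this out. Under them, the longest run for a given core count follows by division:

```shell
# Assumed unit: processor-days (Bugaboo figure from the slide).
limit_days=1024

# Example job size; 64 cores is an arbitrary illustration.
procs=64

# Longest walltime that keeps procs * walltime within the limit.
max_walltime_days=$((limit_days / procs))

echo "A ${procs}-core job could run for at most ${max_walltime_days} days"
```

The per-facility maximum walltime still applies on top of this estimate.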

Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)

Still don't know?

► use the system that is the least busy – shortest waiting time.

Problems? (Bugaboo, Checkers, Orcinus, Snowpatch)

Your program crashed? Oh no.

Please email us: [email protected]

But please give us some details:

At least:

► which facility you were working on

► the job id

► the submission script

► the error message that indicates the job failure

Modules (Bugaboo, Checkers, Orcinus)

Use the module command to change the default environment:

change the default compilers, e.g., from Intel compilers to GNU compilers:

module load gcc

change the version of a compiler:

module load intel/11.0.074

(check: mpif90 --version reports ifort (IFORT) 11.0 20081105)

Modules (cont) (Bugaboo, Checkers, Orcinus)

make certain software packages available that are not in the default PATH

module load gaussian

change the MPI distribution:

module load mpi/intel

warning: not recommended. This will not work with system-compiled software; in particular, mpiexec cannot figure out which MPI distribution was used at compile time.

Modules (cont) (Bugaboo, Checkers, Orcinus)

show what is available

module avail

(a short list does not indicate that the software is not installed; it may just mean that the software is available by default)

show what is currently loaded

module list

getting back to the default setup

module unload gcc

mpif90 --version

ifort (IFORT) 11.1 20090511

Storage Facilities (Silo, Bugaboo)

Silo ... a very fine system, highly recommended

► located at USask

► about 450 TB disk space

► /home file system:

◆ backed up

◆ default quota: 500 GB

► /data file system:

◆ no backups

◆ default quota: 1 TB

Storage Facilities (cont) (Silo, Bugaboo)

Bugaboo ... dedicated to jobs with high I/O needs

► located at SFU

► 980 TB disk space – half of that allocated to ATLAS (LHC)

► /home file system:

◆ backed up

◆ default quota: 300 GB

► /global/scratch file system:

◆ no backups

◆ default quota: 1 TB

Storage Facilities (cont) (Silo, Bugaboo)

Gridstore ... do not use!

► /vault, /data file systems: all files will be moved off

➡ to Silo by default

➡ to Bugaboo on request

► /home file system:

➡ dedicated exclusively to jobs running on Snowpatch, Robson, Tantalus

Blackhole ... will disappear.

Questions?