New WestGrid Facilities:
Bugaboo, Checkers, Orcinus,
Silo, Snowpatch
A Technical Overview
Martin Siegert, SFU
New Systems: Hardware
Compute Facilities
▬ cores
▬ memory
▬ interconnect
Storage Facilities
▬ size
New Systems: Software, Setup
Compute Facilities
▬ common usage: compilers, software libraries, MPI
▬ running jobs: rules, quota, etc.
▬ which facility to use for what?
Storage Facilities
▬ where to store data
▬ quota
Hardware (Bugaboo, Checkers, Orcinus)
Processors: Intel Xeon E54XX processors
binary compatible: programs will run on all three systems (libraries!)
► Orcinus: 3.00 GHz, 3072 cores
► Bugaboo: 2.66 GHz, 1280 cores
► Checkers: 2.50 GHz, 1280 cores
Interconnect: Infiniband
► latency: 1.6 µs
► bandwidth: 2 GB/s
All nodes have 16 GB of memory
General Setup (Bugaboo, Checkers, Orcinus, Snowpatch)
Goal:
► it is unnecessary to set certain environment variables: PATH, LD_LIBRARY_PATH, etc.
► it is unnecessary to use specific system commands, e.g., xlf95, mpf90, etc.
► it is unnecessary to specify system directories
► everything should work without modifications
► it may be harmful to specify too much (e.g., full paths, system directories, etc.)
Software (Bugaboo, Checkers, Orcinus, Snowpatch)
General Setup
► do not specify full pathnames, e.g.,
mpiexec siesta
should work.
► some applications (mostly licensed software) require loading a module, e.g.,
module load amberpmemd
► pathnames may change or disappear
Compilation (Bugaboo, Checkers, Orcinus, Snowpatch)
available compilers:
► GNU: gcc, g++, gfortran
► Intel: icc, icpc, ifort
► PGI (Orcinus only): pgcc, pgCC, pgf90
generic compilers: cc, c++, f77, f90
these use the default compiler, i.e., the “best” compiler for most applications; --version shows the real compiler
optimization: -O3 will work regardless of the compiler in use
all compilations are 64-bit by default.
Compilation: Libraries (Bugaboo, Checkers, Orcinus, Snowpatch)
do not set LD_LIBRARY_PATH
► very common source of mistakes
► need to remember at run time what you used at compile time
do not use -L/global/xyz/lib64 flags
Compilation: Libraries (Bugaboo, Checkers, Orcinus, Snowpatch)
simply specify -l<name_of_library>, e.g., -lfftw3 or -lgsl
use generic names:
-llapack instead of -lmkl_lapack
-lblas instead of -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
☛ recommended: better performance!
you can use the command ldd myprog to see the libraries that got linked into myprog.
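As a quick check after linking, the ldd output can be filtered for libraries that could not be resolved. The following is a minimal sketch: the helper name missing_libs is hypothetical (it is not a WestGrid command), and it assumes the usual ldd output format in which an unresolved library is reported as "libname => not found".

```shell
# missing_libs: hypothetical helper, not part of the WestGrid setup.
# Reads ldd output on stdin and prints the name of every library
# that the dynamic linker could not resolve ("=> not found").
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on a real executable (only runs if ./myprog exists):
if [ -x ./myprog ]; then
    ldd ./myprog | missing_libs
fi
```

If nothing is printed, every library listed by ldd was found at its recorded path.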
Compilation: Summary (Bugaboo, Checkers, Orcinus, Snowpatch)
Keep it simple:
cc -O3 -o myprog myprog.c -lfftw3 -llapack -lblas
f90 -O3 -o myprog myprog.f90 -lfftw3 -llapack -lblas
► uses default compiler
► will find the libraries
► will include the path to the libraries in the executable (no need for LD_LIBRARY_PATH)
Compilation: MPI (Bugaboo, Checkers, Orcinus, Snowpatch)
generic compilers:
mpicc, mpicxx, mpif77, mpif90
use the default compilers
specific compilers:
mpigcc, mpig++ exist on Bugaboo, Checkers and Snowpatch
better:
module load gcc
mpif90 ... will use gfortran
module load intel
mpicc ... will use icc
Compilation: MPI (Bugaboo, Checkers, Orcinus, Snowpatch)
Everything else the same as for non-MPI compilation:
mpicxx -O3 -o myprog myprog.C -lfftw_mpi -lfftw
mpif90 -O3 -o myprog myprog.f90 -lblas
Running Jobs (Bugaboo, Checkers, Orcinus, Snowpatch)
Two ways of specifying the no. of processors in a job submission script:
#PBS -l procs=42
requests 42 cores anywhere on the facility
recommended: leads to by far the shortest waiting times, since there are no restrictions on the distribution of processes over nodes.
#PBS -l nodes=14:ppn=3
requests 14 (different!) nodes with exactly three cores on each of the nodes.
Running MPI Jobs (Bugaboo, Checkers, Orcinus, Snowpatch)
Use mpiexec to run an MPI program,
not mpirun!
in job submission scripts simply
mpiexec myprog args
will work.
interactively (on headnode for test purposes)
mpiexec -n 2 myprog args
is required.
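Putting the pieces together, a submission script could look like the sketch below. It is written to a file with a here-document so that the example is self-contained. The file name myjob.pbs, the walltime value and the program name myprog are placeholders; the walltime directive, the cd to $PBS_O_WORKDIR and the qsub command are common Torque/PBS practice and are assumptions here, not something taken from the slides.

```shell
# Write a minimal job script (file name, walltime and program name are
# placeholders chosen for illustration).
cat > myjob.pbs <<'EOF'
#!/bin/bash
#PBS -l procs=42
#PBS -l walltime=01:00:00

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Inside a job script no -n option is needed.
mpiexec myprog args
EOF

# Submit it on the cluster with (assumed standard PBS command):
#   qsub myjob.pbs
```

To request 14 nodes with three cores each instead, the procs line would be replaced by "#PBS -l nodes=14:ppn=3".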
Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)
Orcinus is the fastest and largest system (3.00 GHz, 3072 cores), but the available disk space is small and I/O is slow.
► jobs with little I/O requirements
► special software: Matlab, Lumerical
Checkers has medium-sized disk space and intermediate I/O performance
► special software: Gaussian
Bugaboo has large and high-performance disks: parallel Lustre filesystem
► jobs with large and/or high-performance storage needs
Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)
Snowpatch has only a GigE interconnect and is somewhat older and slower.
► jobs without requirement for high-performance interconnect
Check which software is available where:
http://www.westgrid.ca/support/software
Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)
Job limits:
► maximum walltime:
◆ Bugaboo: 4 months
◆ Checkers: 21 days
◆ Orcinus: 10 days
► maximum processor-seconds (cumulative):
◆ Bugaboo: 1024 days
◆ Checkers: 1280 days
◆ Orcinus: 15000 days
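The two limits interact: a job with many cores may hit the cumulative limit before it reaches the maximum walltime. The sketch below works through that arithmetic. It assumes that the cumulative limit is the product of cores and walltime, expressed in processor-days; the slides do not spell this out, so treat it as an interpretation. The helper name allowed_days is hypothetical.

```shell
# allowed_days: hypothetical helper for illustration only.
#   $1 = number of cores requested
#   $2 = cumulative limit in processor-days (from the list above)
#   $3 = maximum walltime in days (from the list above)
# Prints the longest walltime in whole days that satisfies both limits,
# ASSUMING the cumulative limit means cores x walltime.
allowed_days() {
    by_cumulative=$(( $2 / $1 ))
    if [ "$by_cumulative" -lt "$3" ]; then
        echo "$by_cumulative"
    else
        echo "$3"
    fi
}

allowed_days 64 1280 21      # Checkers, 64 cores   -> prints 20
allowed_days 3000 15000 10   # Orcinus, 3000 cores  -> prints 5
```

Under this assumption, a 64-core job on Checkers is capped by the cumulative limit (20 days) rather than by the 21-day walltime limit.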
Where to Run? (Bugaboo, Checkers, Orcinus, Snowpatch)
Still don't know?
► use the system that is the least busy – shortest waiting time.
Problems? (Bugaboo, Checkers, Orcinus, Snowpatch)
Your program crashed? Oh no.
Please email us: [email protected]
But please give us some details:
At least:
► which facility you were working on
► the job id
► the submission script
► the error message that indicates the job failure
Modules (Bugaboo, Checkers, Orcinus)
Use the module command to change the default environment:
change the default compilers, e.g., from Intel compilers to GNU compilers:
module load gcc
change the version of a compiler:
module load intel/11.0.074
(check: mpif90 --version reports ifort (IFORT) 11.0 20081105)
Modules (cont) (Bugaboo, Checkers, Orcinus)
make certain software packages available that are not in the default PATH
module load gaussian
change the MPI distribution:
module load mpi/intel
warning: this will not work with system compiled software – not recommended. In particular, mpiexec cannot figure out which MPI distribution was used at compile time.
Modules (cont) (Bugaboo, Checkers, Orcinus)
show what is available
module avail
(a short list does not indicate that the software is not installed, it may just mean that the software is available by default)
show what is currently loaded
module list
getting back to the default setup
module unload gcc
mpif90 --version
ifort (IFORT) 11.1 20090511
Storage Facilities (Silo, Bugaboo)
Silo ... a very fine system, highly recommended
► located at USask
► about 450 TB disk space
► /home file system:
◆ backed up
◆ default quota: 500 GB
► /data file system:
◆ no backups
◆ default quota: 1 TB
Storage Facilities (cont) (Silo, Bugaboo)
Bugaboo ... dedicated to jobs with high I/O needs
► located at SFU
► 980 TB disk space – half of that allocated to ATLAS (LHC)
► /home file system:
◆ backed up
◆ default quota: 300 GB
► /global/scratch file system:
◆ no backups
◆ default quota: 1 TB
Storage Facilities (cont) (Silo, Bugaboo)
Gridstore ... do not use!
► /vault, /data file systems: all files will be moved off
➡ to Silo by default
➡ to Bugaboo on request
► /home file system:
➡ dedicated exclusively to jobs running on Snowpatch, Robson, Tantalus
Blackhole ... will disappear.
Questions?