hpc user guide Documentation
Release master

HPC admin team

Dec 19, 2021



Contents

1 Octopus



CHAPTER 1

Octopus

1.1 Overview

1.1.1 Hardware Resources

Octopus is a mixed architecture Intel/AMD Beowulf cluster with the following specifications:

• 776 cores

– 376 AMD EPYC 7551p vCPUs

– 96 Intel Xeon E5-2695 v4 vCPUs

– 256 Intel Xeon E5-2665 physical cores

– 48 Intel Xeon E5-2643 v2 vCPUs

• 2.6 TB main memory

• 4 x Nvidia V100 PCI-E GPUs

• 8 x Nvidia GK110GL Tesla K20m GPUs

• 10 Gbit/s CISCO interconnect used for storage and computing

• 40 Gbit/s InfiniBand interconnect (QLogic 12200 InfiniBand QDR switch)

• 100 TB shared storage and scratch space

1.1.2 Operating system

All the nodes of Octopus run Linux (CentOS 7).

The following types of jobs can be run on the cluster:

• batch jobs (no user interaction)

• GPU jobs (e.g. scientific computing using GPGPUs or deep learning)


• memory intensive jobs (up to 256 GB RAM on a single machine available as an SMP host)

• IO intensive jobs using the scratch partition (e.g. several TB of processing per job)

• Interactive Jupyter jobs running on the compute hosts

• Fully interactive desktop environment running on a compute node

1.1.3 Scheduler

The scheduler used on Octopus is the open source SLURM. For more information on using the scheduler, please consult the SLURM cheatsheet.

1.1.4 Partitions

The following partitions are available for users:

• normal: 12 hosts with 16 vCPUs each with 64GB RAM

• gpu: 3 hosts with one V100 card on each node limited to 8 cores and 128 GB RAM max.

• large: 4 hosts with 64 cores each and 256 GB RAM

• arza: 16 hosts with 16 cores each and 64 GB RAM, connected with InfiniBand

• medium: 5 hosts with 12 cores each and 24 GB RAM

For more information on the partitions, including details on resources and time limits, please consult the hosts and partitions section.

1.1.5 Storage

All the hosts mount the /home directory and the /apps directory. The quota of the home directory is set to 25 GB. The /home directory is backed up regularly. For larger storage space the /scratch partition can be used, which has a quota of 1 TB per user. The maximum number of files that can be owned by a user is 1,000,000.
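To see how far you are from these limits, standard Linux tools can be used; the commands below are a minimal sketch (the /scratch/<user> path is an assumption, adjust it to the actual location of your scratch directory):

du -sh ~                    # total size of your home directory (25 GB quota)
find ~ -type f | wc -l      # rough count of the files you own in ~/
du -sh /scratch/<user>      # assumed scratch location, adjust as needed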

1.2 Getting connected

1.2.1 Connecting to a terminal

When on the AUB network (also valid when connected through the VPN service https://servicedesk.aub.edu.lb/TDClient/Requests/ServiceDet?ID=29740 ), any of the following methods can be used to log in to the head node of the cluster.

ssh [email protected]    # preferred
ssh [email protected]    # optional (not recommended)
ssh [email protected]    # optional (not recommended)
ssh [email protected]    # last resort (if all of the above do not work)

TIP: Passwordless login can be set up to avoid typing the password every time, and is safer than saving the password in the ssh client or re-typing it.


Warning: SECURITY: make sure to change your account password after the administrators have created your account. To change the account password after logging in, use the command passwd

Note: direct ssh access to the compute nodes is disabled and not allowed.

1.2.2 Tools for connecting

Any of the following can be used to connect to Octopus:

• native ssh on linux or mac (recommended)

• msys2 (recommended on windows) [execute pacman -S openssh rsync]

• mobaxterm (most user friendly) [install the portable version]

• winscp: https://winscp.net/eng/index.php

• putty: https://putty.org/

1.2.3 Generating a ssh private-public key pair

SSH keys can be used to authenticate yourself to log in to the cluster. This is the recommended method and is more secure than typing in the password or saving the password in the ssh client (e.g. putty). The generated key pair will allow you to log in to the cluster from your local machine.

my machine        ---------->        HPC cluster
(linux/win/mac)                      (linux)

on linux and mac

To generate the key files:

- public key : ``~/.ssh/id_rsa.pub``
- private key: ``~/.ssh/id_rsa``

execute the following commands in a terminal on your machine:

# create the ssh directory and set the correct permission flag
my machine> mkdir -p ~/.ssh
my machine> chmod 700 ~/.ssh

# first generate an ssh key
my machine> ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa

Warning: this will overwrite any keys that already exist. You can specify a different identity name using the flag -f my_output_keyfile


Note: this same process can also be done on Windows from the command line, assuming that you already have openssh installed (e.g. using msys2).

<screencast>

on windows using mobaxterm

Mobaxterm can be used to generate a ssh private-public key pair. <screencast>

1.2.4 Login to the HPC cluster using a ssh public key

At this point, it is assumed that you already have a ssh identity (public-private key pair). If not, see the section above.

on linux/mac

to push your public key to the cluster, the command ssh-copy-id can be used.

$ ssh-copy-id -i id_rsa [email protected]

To test if the key has been added correctly:

$ ssh -i ~/.ssh/id_rsa [email protected]

<screencast>

on windows using mobaxterm

The second part of the following screencast covers using mobaxterm and a ssh identity to log in without a password.

1.2.5 Connecting to a graphical user interface

VNC sessions are useful only if you want a desktop-like environment that runs on the HPC cluster but is displayed on your computer, with which you can interact (e.g. with a mouse). Such desktop environments are useful, for example, for lightweight visualizations of data that are rendered on the HPC cluster or for testing and prototyping. In this section the procedure for creating a VNC session on the head node is described.

Note: VNC sessions on the head node should be restricted to tasks that are not compute, memory or input/output intensive. For demanding interactive work with a desktop environment, use the job script for running a VNC server on a compute node, which has significantly more resources than the head node and, on the GPU nodes, significantly more rendering power.

VNC sessions are not needed for command line work or for running batch jobs.

VNC clients

VNC is a simple way to join a remote desktop session on the cluster. There are several flavours and clients of VNC. We recommend the following:


• realVNC: https://www.realvnc.com/en/connect/download/viewer/linux/ (easy)

• TigerVNC: https://wiki.archlinux.org/index.php/TigerVNC (easy-advanced)

TigerVNC can be easily installed on most linux operating systems. RealVNC is more user friendly and is available for most common operating systems.

1.2.6 Creating SSH tunnels

SSH tunnels are handy for redirecting traffic from one host/port to another. Here are some links on how to create tunnels on various platforms, since we will be using them in what follows (a minimal example of the pattern is shown after the list):

• native linux tunnel https://www.revsys.com/writings/quicktips/ssh-tunnel.html

• tunnels with putty

– https://infosecaddicts.com/perform-local-ssh-tunneling/

– https://www.youtube.com/watch?v=7YNd1tFJfwc

• tunnels with powershell https://www.youtube.com/watch?v=gh03CpaUxbQ

• tunnels with mobaxterm

– https://blog.mobatek.net/post/ssh-tunnels-and-port-forwarding/

– http://emp.byui.edu/ercanbracks/cs213/SSH%20tunneling%20with%20Mobaxterm.htm

• contact it.helpdesk and mention HPC getting connected
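As a minimal example of the pattern used later in this guide, the following command (run on your local machine) forwards a local port to the same port on the head node; the port number 5901 is a placeholder, the actual ports to use are given in the VNC and Jupyter sections:

ssh -L localhost:5901:localhost:5901 <user>@octopus.aub.edu.lb -N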

1.3 Transferring data

There are several ways to transfer data from and to Octopus. The following is a subset and an incomplete list of methods and tools:

• scp

• rsync

• winscp

• sftp

1.3.1 scp

transfer/copy files from the local machine to ~/ on octopus:

scp -rp my_local_file <user>@octopus.aub.edu.lb:~/

To transfer files from octopus to the local machine, rsync is more suitable, since the command would have to be executed on octopus and that requires a connection from octopus to the local machine, which is usually not possible unless an ssh tunnel is created. rsync supports this out of the box.

More information on using scp can be found in the official manual.


1.3.2 rsync

transfer/copy files from the local machine to ~/ on octopus:

rsync -PrlHvtpog my_local_file <user>@octopus.aub.edu.lb:~/

To transfer files from octopus to the local machine:

rsync -PrlHvtpog <user>@octopus.aub.edu.lb:~/my_file .

More information on using rsync can be found in the official manual.

1.3.3 winscp

Winscp is a graphical user interface that can be used to transfer files back and forth between the local machine and the HPC cluster. It is a free tool that can be downloaded from here. The following tutorial is a good reference on how to use it.

1.3.4 sftp

sftp is a command line based tool that provides the same functionality as winscp. To establish a secure ftp connection to octopus, the following command can be used:

sftp <user>@octopus.aub.edu.lb

Once the connection is established, sftp commands such as get and put can be used in the sftp prompt to send / receive data (files, folders, etc.).
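For illustration, a short sftp session could look like the following (the file and folder names are placeholders):

sftp <user>@octopus.aub.edu.lb
sftp> put my_local_file          # upload a file to the current remote directory
sftp> get my_remote_file         # download a file to the current local directory
sftp> put -r my_local_folder     # upload a folder recursively
sftp> bye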

More information on using sftp can be found in the official manual.

1.4 SLURM cheatsheet help

This page is dedicated to commonly used SLURM commands, with short tips and howto quickies. You can find more details at:

• https://slurm.schedmd.com/pdfs/summary.pdf

• https://www.chpc.utah.edu/presentations/SlurmCheatsheet.pdf

1.4.1 Submitting a job

In order to submit a job, a script compliant with the scheduler directives should be passed to sbatch

$ sbatch my_job_script.sh

To submit an interactive job for testing and/or debugging/development, the srun command can be used

# single core interactive bash terminal on a compute node (e.g. for development)
$ srun --pty /bin/bash

# allocate a cpu only job (specify resource details)
$ srun --partition=normal --nodes=1 --ntasks-per-node=4 --cpus-per-task=1 \
       --mem=8000 --account=my_project --time=0-01:00:00 --pty /bin/bash

# allocate a gpu job
$ srun --partition=gpu --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 \
       --mem=8000 --gres=gpu --account=my_project --time=0-01:00:00 --pty /bin/bash

1.4.2 List of running jobs

The list of jobs specific to the current user (i.e you) that are queued or running

$ squeue

The list of all jobs running or queued on the cluster

$ squeue -a

To show the estimated starting time of a pending job

$ squeue --start -j <job_id>

1.4.3 Remove a job from the queue

Use squeue to query the running jobs and get the JOBID. Once the job id (an integer in the first column of the output of squeue) of the job to be killed is known, execute:

$ scancel job_to_be_killed_id
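For example, assuming squeue reports a job with id 12345 (the output below is illustrative and abridged):

$ squeue
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
  12345    normal   my_job   <user>  R   1:23      1 onode03
$ scancel 12345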

1.4.4 List of hosts and queues/partitions on the cluster

$ sinfo

To see the details of the compute nodes with their respective specs

$ sinfo_all

NODELIST  STATE  AVAIL  CPUS  S:C:T  CPU_LOAD  FREE_MEM  ACTIVE_FEATURES  REASON
onode01   idle   up     16    2:8:1  0.01      62536     intel            none
onode02   idle   up     16    2:8:1  0.01      63275     intel            none
onode03   idle   up     16    2:8:1  0.01      63317     intel            none
onode04   idle   up     16    2:8:1  0.08      63295     intel            none
onode05   idle   up     16    2:8:1  0.06      18614     amd              none
onode06   idle   up     16    2:8:1  0.03      25758     amd              none
onode07   idle   up     16    2:8:1  0.01      59303     amd              none
onode08   idle   up     16    2:8:1  0.01      21531     amd              none
onode09   idle   up     16    2:8:1  0.01      18060     amd              none
onode10   idle   up     8     1:8:1  0.07      14140     amd              none
onode11   idle   up     8     1:8:1  0.01      32087     amd              none
onode12   idle   up     8     1:8:1  0.15      31365     amd              none
onode13   idle   up     64    8:8:1  0.01      63232     amd              none
onode14   idle   up     64    8:8:1  0.01      56430     amd              none
onode15   idle   up     64    8:8:1  0.01      63092     amd              none
onode16   idle   up     64    8:8:1  0.01      62363     amd              none

To see the details of the available partitions with their respective specs

$ sinfo_partitions

PARTITION  TIMELIMIT   NODELIST      MAX_CPUS_PER_NODE  NODES  JOB_SIZE    CPUS  MEMORY  GRES            NODES(A/I/O/T)
normal     1-00:00:00  onode[01-09]  UNLIMITED          9      1-infinite  16    60000+  (null)          0/9/0/9
large      1-00:00:00  onode[13-16]  UNLIMITED          4      1-infinite  64    256000  (null)          1/3/0/4
gpu        6:00:00     onode10       UNLIMITED          1      1-infinite  8     15000   gpu:v100d16q:1  1/0/0/1
gpu        6:00:00     onode[11-12]  UNLIMITED          2      1-infinite  8     32000   gpu:v100d32q:1  1/1/0/2
msfea-ai   3-00:00:00  onode12       UNLIMITED          1      1-infinite  8     32000   gpu:v100d32q:1  1/0/0/1
msfea-ai   3-00:00:00  onode10       UNLIMITED          1      1-infinite  8     15000   gpu:v100d16q:1  1/0/0/1
cmps-ai    3-00:00:00  onode11       UNLIMITED          1      1-infinite  8     32000   gpu:v100d32q:1  0/1/0/1
physics    1-00:00:00  onode[13-16]  UNLIMITED          4      1-infinite  64    256000  (null)          1/3/0/4

1.5 Example job script

The following script can be used as a template to execute some bash commands for a serial or parallel program.

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=my_job_name
#SBATCH --account=foo_project

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:v100d32q:1
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00
#SBATCH --mail-type=ALL
#SBATCH [email protected]

#
# add your command here, e.g
#
echo "hello world"

Flags Description

• #SBATCH --job-name=my_job_name: Set the name of the job. This will appear e.g. when the command squeue is executed to query the queued or running jobs.

• #SBATCH --account=abc123: Specify the ID of the project. This number should correspond to the project ID of the service request. Jobs without this flag will be rejected.

• #SBATCH --partition=normal: The name of the partition, a.k.a queue, to which the job will be submitted.

• #SBATCH --nodes=2: The number of nodes that will be reserved for the job.

• #SBATCH --ntasks-per-node=8: The number of cores (e.g. MPI tasks) that will be reserved per node.

• #SBATCH --cpus-per-task=2: The number of cores per task to be reserved for the job (e.g. the number of openmp threads per MPI task). The total number of cores reserved for the job is the product of the values of the flags --nodes, --ntasks-per-node and --cpus-per-task.

• #SBATCH --mem=32000: The amount of memory per node in MB that will be reserved for the job. Jobs that do not specify this flag will be rejected.

• #SBATCH --time=1-00:00:00: The time limit of the job. When the limit is reached, the job is killed by the scheduler. Jobs that do not specify this flag will be rejected.

• #SBATCH --mail-type=ALL: Receive email notifications for all stages of a job, e.g. when the job starts and terminates.

• #SBATCH [email protected]: The email address to which the job notification emails are sent.

1.5.1 Batch job submission and monitoring procedure

• submit the job script using SLURM

$ sbatch my_job_script.sh

This will submit the job to the queueing system. The job could run immediately or be set to pending mode until the requested resources are available for it to run.

• check the status of the job

$ squeue -a

• After the job is dispatched for execution (starts running), monitor the output by checking the .o file.
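For example, assuming the default SLURM output file naming and an illustrative job id:

# follow the output of job 12345 as it is being written
$ tail -f slurm-12345.out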

For more information on using SLURM, please consult the man pages:

$ man sbatch


1.5.2 Jobs time limits and checkpoints

In order to have fair usage of the resources and the partitions (queues), different partitions have different time limits. The maximum time limit for jobs is 3 days. Partitions also have different priorities that are necessary for fair usage; for example, short jobs have higher priorities than long jobs. When a job reaches the time limit that is specified in the job script or the time limit of the partition, it is automatically killed and removed from the queue. It is the responsibility of the user to set the job parameters based on the requirements of the job and the available resources.

In all the examples below it is the responsibility of the user to manage writing the checkpoint file and loading it.

1.5.3 Resubmit a job automatically using job arrays

In the following example, a job array (#SBATCH --array=1-30%1) is used to indicate that the job should be run as a chain of 30 jobs back to back. Using this flow a job can be run for arbitrarily long periods; in this case, and for the sake of demonstration, this job runs for 30 days using individual jobs that run for 1 day each. When the first job finishes, a checkpoint file foo.chkp is written to the disk and the execution of the next job starts, where foo.chkp is read, the program state is restored and the execution resumes.

#!/bin/bash

#SBATCH --job-name=my_job_name
#SBATCH --account=abc123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00
#SBATCH --array=1-30%1

## load some modules
module load python

# start executing the program
MY_CHECKPOINT_FILE=foo.chkp
if [ ! -f "${MY_CHECKPOINT_FILE}" ]; then
    # checkpoint file is not found, execute this command
    python train_model_from_scratch.py
else
    # checkpoint file is found, read it and continue training
    python train_model_from_scratch.py --use-checkpoint=${MY_CHECKPOINT_FILE}
fi

Each job in the job array will have its own .out file suffixed with the job array index, e.g. my_slurm_30.out.

1.5.4 resubmit a job automatically using job dependencies

The main difference between using job dependencies and job arrays is that, with dependencies, the job will be resubmitted an unlimited number of times until the user decides to cancel the automatic re-submission.

Warning: It is important to include a wait time of a few minutes (e.g. 5 min) so that the scheduler will not be overloaded by the recursive resubmission of jobs in case something goes wrong.


In the template job script below, when the job is submitted, a sbatch command submits the dependency from within the job. The simulation/program resume procedure is the same as that of using job arrays, i.e. if a checkpoint exists, run the program from the checkpoint, otherwise run the program and create the checkpoint.

#!/bin/bash

#SBATCH --job-name=my_job_name
#SBATCH --account=abc123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00

## submit the dependency that will start after the current job finishes
sbatch --dependency=afterok:${SLURM_JOBID} job.sh
sleep 300

# start executing the program
MY_CHECKPOINT_FILE=foo.chkp
if [ ! -f "${MY_CHECKPOINT_FILE}" ]; then
    # checkpoint file is not found, execute this command
    python train_model_from_scratch.py
else
    # checkpoint file is found, read it and continue training
    python train_model_from_scratch.py --use-checkpoint=${MY_CHECKPOINT_FILE}
fi

1.6 Applications

Several programs with different versions are available on HPC systems. Having all the versions at the disposal of the user simultaneously leads to library conflicts and clashes. In any production environment only the needed packages should be included in the environment of the user. In order to use python3, the appropriate module should be loaded.

$ module load python/3

Modules are “configurations” that set, change or remove environment variables from the current environment.

1.6.1 Useful module commands

• module avail: display the available packages that can be loaded

• module list: lists the loaded packages

• module load foo: to load the package foo

• module rm foo: to unload the package foo

• module purge: to unload all the loaded packages
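For example, a typical sequence when preparing an environment could be (the loaded package is one that appears in the tables below; use module avail to see what is actually installed):

$ module avail            # list the available packages
$ module load python/3    # load the python 3 environment
$ module list             # confirm what is currently loaded
$ module purge            # return to a clean environment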

For detailed information on the usage of module check the man pages


$ man module

1.6.2 Installed applications, tools and scientific libraries

Below is a list of applications currently installed and maintained on the HPC environment. These applications are available to all users and can be used in any combination as long as they are compatible with each other and do not cause library clashes.

Scientific applications

Application    Version
abaqus         2020
ansys          electromagnetics/19.0
               fluent/17.2
               fluent/18.2
               fluent/19.0
               fluent/20.1
darknet        0ff2343
petrel         2019.2
enzo           enzo
               2.5-gcc-9.1.0-openmpi-4.0.1
ffmpeg         4.2.2
gaussian       09
git            2.23.0
gnuplot        5.2.8
gromacs        2020-gpu
               2020-mpi-gpu
               2020-mpi
kaldi          4c41168-gcc-8.3.0-cuda-10.1
               4c41168-gcc-8.3.0
lammps         7Aug2019-gcc-9.1.0
mathematica    11.3
matlab         2018b
               2019b
mrbayes        3.2.7-gcc-9.1.0
               3.2.7-mpi
               3.2.7-openmpi-4.0.1
               3.2.7
mumax          3.10beta-b21e9e6
               3.10beta
nload          0.7.4
octave         5.2.0
opensees       3.2.0
pari           2.11.2-mt
               2.11.2
semargl        221910c
singularity    3.1.0
               3.5.2
sox            12.18.1
               14.4.2
vim            8.1

Scientific Libraries

Name         Version
amd          amd-rng/2.1
             blis/2.1
             fftw/3.5.8-amd-mpi
             lapack/2.1
             libm/3.5.0
             scalapack/2.1
             securerng/2.1
blas         3.8.0
cblas        3.8.0
cupy         py36-cuda-8.0
             py36-cuda-10.1
             py37-cuda-8.0
             py37-cuda-9.1
             py37-cuda-10.1
eigen        3.3.7
fftw         2.1.5
             3.3.8
flac         1.3.3
gdal         2.4.1
gperftools   2.7-gcc-4.8.5
grackle      3.2-gcc-9.1.0-haswell
gsl          2.6
hdf/5        1.8.15-mpi
             1.8.15-openmpi-4.0.1
             1.8.15-serial-gcc-9.1.0
             1.8.15-serial
hypre        openmpi/2.13.10-gcc-9.1.0
lame         3.100
lapack       3.8.0
libcint      3.0.18
libjpeg      turbo-2.0.4
libpng       1.6.36
libunwind    1.3.1-gcc-4.8.5
             1.3.1-gcc-9.1.0
numactl      2.0.13
openblas     0.3.5
             0.3.8-gcc-9.1.0-dynamic-arch
             0.3.8-gcc-9.1.0-haswell
opencv       3.4.10
openfst      1.6.7-gcc-8.3.0
rpy2         2.9.5-py3-gcc-8.3.0
             2.9.5
ucx          1.6.0
xlrd         1.2.0

Compilers and interpreters

Name          Version
amd/aocc      2.0.0
              2.1.0
cmake         3.10.2
              3.13.4
              3.15.4
Cuda          8.0
              9.0
              9.1
              10
              10.1
gcc           5.4.0
              6.4.0
              7.2.0
              8.3.0
              9.1.0
              10.1.0
go            1.11
              1.13.4
intel         2019u5
java          java8
              jdk/1.8.0_161
              jdk/1.8.0
llvm          5
              8
              9
mpi/mpich     intel-2019u5
              3.3
              3.3.2
mpi/mvapich   2.3
mpi/openmpi   1.6.2
              3.1.3
              4.0.1-slurm-18.08.6
              4.0.1
perl          5.28.0
              5.30.1
pgi           19.10/pgi
python        2
              2.7.15
              3
              3.7.3
              3.7.7
              3.8.2
              base/miniconda3
              pytorch
              pytorch-0.4.1
              qiskit
              tensorflow-1.14.0
              tensorflow-2.1
              theano
              theano-1.0.4
scala         2.12.7
swig          4.0.1

Miscellaneous Applications

Name        Version
R           3.6.1
autoconf    2.69
automake    1.16
curl        7.58.0
hwloc       2.0.3
libtool     2.4.6
pmix        2.2.2
prun        1.3

1.7 Interactive jobs - Desktop environment on a compute node

Interactive jobs give the user a desktop-like environment on a compute node. Such jobs are useful for tasks where user interaction / input is needed. For example, although matlab or Ansys Fluent jobs can be run as batch jobs through the command line or scripts, sometimes interacting with their GUIs is necessary.

1.7.1 Recommended workflow

1) Create the VNC configuration. This step is done when the account is created and hence can be skipped. Execute the procedure described below if your VNC configuration does not exist or is corrupted.

2) submit the job script (e.g. sbatch job.sh)

3) after the job starts running, get the VNC port number from the job output file, e.g. slurm-166866.out contains a line such as VNC_HEAD_PORT = 5201.

4) create the tunnel.

5) connect using a vnc viewer to the vnc session running on the compute node, using e.g. localhost:5201

1.7.2 An interactive job on a compute node

When the interactive job script is submitted, a vnc session is created on the compute node. The session is terminated when the job exits or when the job is killed.

To connect to the vnc session using a vnc viewer (client), a tunnel to the VNC_HEAD_PORT that is specified in the job script below should be created.



1.7.3 Create/edit folder and files

• first check if the folder .vnc exists and has the following two files: xstartup and config by executing:

ls ~/.vnc

the output should show

config xstartup

If these files don’t exist, create them by copying the settings from a pre-defined directory on the shared filesystem /home/shared/sample_scripts/slurm_vnc_job

rm -fvr ~/.vnc
cp -fvr /home/shared/sample_scripts/slurm_vnc_job/.vnc ~/
chown -Rc $USER ~/.vnc
cp /home/shared/sample_scripts/slurm_vnc_job/job.sh ~/

set the vnc password by executing the command (set a strong password that is at least 12 characters long)

vncpasswd
# optionally set a view only password


1.7.4 submit the job

The following job script can be used as a template; the resource options can be changed to meet the demands of a particular simulation. This job script is also included in the ~/.vnc folder. After submitting the job, the VNC_HEAD_PORT is written to the slurm-JOBID.out file.

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=my_job_name
#SBATCH --account=7672200

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=4000
#SBATCH --time=0-01:00:00

### DO NOT EDIT BEYOND HERE UNLESS YOU KNOW WHAT YOU ARE DOING
source ~/.bashrc

VNC_HEAD_PORT=$(random_unused_port)
echo "VNC_HEAD_PORT = ${VNC_HEAD_PORT}"

JOB_INFO_FPATH=~/.vnc/slurm_${SLURM_JOB_ID}.vnc.out
rm -f ${JOB_INFO_FPATH}

VNC_SESSION_ID=$(vncserver 2>&1 | grep "desktop is" | tr ":" "\n" | tail -n 1)
echo ${VNC_SESSION_ID} >> ${JOB_INFO_FPATH}

ssh -R localhost:${VNC_HEAD_PORT}:localhost:$((5900 + ${VNC_SESSION_ID})) ohead1 -N &
SSH_TUNNEL_PID=$!
echo ${SSH_TUNNEL_PID} >> ${JOB_INFO_FPATH}

sleep infinity

A copy of this file can be obtained from /home/shared/sample_scripts/slurm_vnc_job/job.sh. Alternatively, create the file in your ~/ directory. The script can be submitted the usual way using sbatch.

1.7.5 Create a ssh tunnel

On a local terminal, use the VNC_HEAD_PORT written to the slurm-JOBID.out file to create the tunnel. The tunnel can also be created with other applications such as mobaxterm, using its graphical user interface.

ssh -L localhost:<VNC_HEAD_PORT>:localhost:<VNC_HEAD_PORT> <user>@octopus.aub.edu.lb -N

1.7.6 Connect using a vnc viewer (client) to the ssh tunnel on localhost

If you’re using RealVNC type in localhost:<VNC_HEAD_PORT>

or on MobaXterm, session -> VNC:


• Remote hostname or IP address: localhost

• port: <VNC_HEAD_PORT>

1.8 Scientific Computing and Environments

1.8.1 Parallel Computing

1.8.2 GPU computing

1.8.3 Python

The python environment can be activated using the command:

module load python/3

Most of the python environments are set up based on anaconda in the /apps/sw/miniconda/ directory.

Users who wish to extend these environments or create custom python environments can:

• use the flag --user when using pip. For example, to install ipython a user can issue the command:

pip install --user ipython

This will install ipython to ~/.local/lib/python3.7/site-packages. Users can check whether the imported package is the one installed in their home directory by importing it and printing it, e.g.

import IPython
print(IPython)
>>> <module 'IPython' from '/home/john/.local/lib/python3.7/site-packages/IPython/__init__.py'>

• a similar approach can be done for anaconda environments.

– new conda environment in a custom location

conda create --prefix /home/john/test-env python=3.8

• virtualenvs are by default created in the home directory ~/.virtualenvs. It might also be useful to use the package virtualenvwrapper (a minimal sketch is given after this list).

• use pipenv, which is a new and powerful way of creating and managing python environments. The following is an excellent guide on getting started with pipenv: https://robots.thoughtbot.com/how-to-manage-your-python-projects-with-pipenv

• install anaconda locally in their home directories

• compile and install python from source. This is non-trivial and requires good knowledge of what the user is doing, but gives full control over the build process and customization of python. For optimal performance, this is the recommended approach.
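As a minimal sketch of the virtualenv route mentioned in the list above, using the standard-library venv module (the environment name and the installed package are placeholders, and it assumes the python/3 module is loaded):

module load python/3

# create and activate a virtual environment in the default location
python3 -m venv ~/.virtualenvs/my-env
source ~/.virtualenvs/my-env/bin/activate

# packages now install into the environment instead of ~/.local
pip install ipython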

Jupyter notebooks

A jupyter lab server is run on a compute node, to which a user can connect using a browser on the local machine (i.e. laptop/desktop/terminal).


• submit the jupyter server script using sbatch (see below)

• get the port number from jupyter-${MY_NEW_JOB_ID}.log after the job starts running

• create the tunnel to Octopus

• get the URL with the authentication token from jupyter-${MY_NEW_JOB_ID}.log and use that link (with the token) in your browser

Jupyter notebook job on a compute node

The following job script can be used as a template to submit a job.

#!/bin/bash

#SBATCH --job-name=jupyter-server
#SBATCH --partition=normal
#SBATCH --account=my_account

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00

source ~/.bashrc

module purge
module load python/3

JUPYTER_PORT=$(random_unused_port)

jupyter-lab --no-browser --port=${JUPYTER_PORT} > jupyter-${SLURM_JOB_ID}.log 2>&1 &
ssh -R localhost:${JUPYTER_PORT}:localhost:${JUPYTER_PORT} ohead1 -N

Connect to the jupyter server from a client

After the job is submitted it is possible to connect to the jupyter server (that is running on the compute node) using ssh tunnels from your local client machine’s web browser. To create the tunnel, execute (on your local terminal)

$ ssh -L localhost:38888:localhost:38888 octopus.aub.edu.lb -N

After creating the tunnel, you can access the server from your browser by typing in the url (with the token) found in the jupyter log file (see previous section).


Running production jobs with Jupyter notebooks

Using Jupyter notebooks through the browser as described above requires a continuous and stable connection to the HPC cluster (to keep the ssh tunnel alive). When connected from inside the campus network, such issues are minimal. However, the connection might experience instability and could get disconnected, especially when there are no user interactions with the notebook, e.g. when running a production job while the user is away from the terminal.


After developing a Jupyter notebook (through the browser), production jobs can be run in batch mode by executing the notebook. Such execution does not require interactions with the notebook through the browser. The following template job script can be used to execute the input notebook; the executed notebook is saved into a separate one that can be retrieved from the cluster and examined elsewhere, i.e. the notebook with the results is saved and no resources or GPU are needed to view the results.

Note: no ssh tunnel is required for executing the notebook

#!/bin/bash

#SBATCH --job-name=jupyter-server
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00
#SBATCH --account=foo_project

## load modules here
module load python/3

## execute the notebook
jupyter nbconvert --to notebook \
    --ExecutePreprocessor.enabled=True \
    --ExecutePreprocessor.timeout=9999999 \
    --execute my_production_notebook.ipynb --output my_results.ipynb

1.8.4 Machine Learning - Deep Learning - Artificial Intelligence jobs

Deep learning frameworks

Currently the following machine learning libraries are installed:

• tensorflow

• keras

• pytorch

• sklearn

Hardware optimized for deep learning

The following hosts are available for running deep learning jobs

GPUs   host(s)        GPU / host        GPU ram (GB)   GPU resource flag
4      onode10        1 x Nvidia V100   16             v100d16q:1
       onode11        1 x Nvidia V100   32             v100d32q:1
       onode12        1 x Nvidia V100   32             v100d32q:1
       onode17        1 x Nvidia V100   32             v100d32q:1
8      anode[01-08]   1 x Nvidia K20x   4.5            k20:1


Allocating GPU resources

In order to use a GPU for a deep learning job (or other jobs that require GPU usage), the following flag must be specified in the job script:

#SBATCH --gres=gpu

Not all the GPUs have the same amount of memory. Using --gres=gpu will allocate any available GPU. Advanced selection of the GPU type can be specified by passing extra flags to --gres. The detailed flags for the different GPU types are listed in the column GPU resource flag in the table above. For example, to allocate a Nvidia V100 GPU with 16GB GPU ram use the flag:

#SBATCH --gres=gpu:v100d16q:1

Using tensorflow, Keras or pytorch

The default environment for:

• tensorflow, keras and sklearn: python/tensorflow

• pytorch: python/pytorch

For any of these environments, the cuda module must also be loaded.

A typical batch job script looks like:

#!/bin/bash

#SBATCH --job-name=keras-classify
#SBATCH --partition=gpu

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00

## set the environment modules
module purge
module load cuda
module load python/tensorflow

## execute the python job
python3 keras_classification.py

To connect to a jupyter notebook with the deep learning environment, copy the jupyter notebook server job script from the python jupyter server guide and load the cuda module, as shown above, in addition to the needed machine learning framework module.
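A sketch of the module section for such a jupyter job script, assuming the tensorflow environment (swap in python/pytorch if needed):

## environment modules for a jupyter job with the deep learning stack
module purge
module load cuda
module load python/tensorflow    # or: module load python/pytorch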

Deep learning jobs tips and best practices

It is recommended to:

• develop and prototype using interactive jobs such as jupyter notebooks, VNC sessions or interactive batch jobs, and run the production models using batch jobs.

• use checkpoints in order to have a higher turnover of GPU jobs since the resources are scarce.


Tensorflow has built-in checkpointing features for training models. Details on possible workflows for jobs with checkpoints can be found in the slurm jobs guide.

Troubleshooting

check the nvidia driver

To make sure that the job has been dispatched to a node that has a GPU, the following command can be included in the job script before the command that executes a notebook or runs the training, for example:

# BUNCH OF SBATCH COMMANDS (JOB HEADER)

## set the environment modules
module purge
module load cuda
module load python/tensorflow

nvidia-smi

The expected output should be similar to the following, where the Nvidia driver version is mentioned in addition to the CUDA toolkit version, some other specs of the GPU(s) and the list of GPU processes at the end (in this case none)

[john@onode12 ~]$ nvidia-smi
Sun Dec  8 00:41:27 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.30       Driver Version: 430.30       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100D-32Q      On   | 00000000:02:02.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |  31657MiB / 32638MiB |     13%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This snippet can be included in the job script

check the deep learning framework backend

For tensorflow, when the following snippet is executed:

import tensorflow as tf

with tf.Session() as sess:
    devices = sess.list_devices()

the GPU(s) should be displayed in the output (search for ``StreamExecutor device (0): GRID V100D-16Q, Compute Capability 7.0``)

2019-12-08 01:01:44.211101: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-12-08 01:01:44.246405: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.247114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GRID V100D-16Q major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:02:02.0
2019-12-08 01:01:44.254377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-12-08 01:01:44.288733: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-12-08 01:01:44.310036: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-12-08 01:01:44.345122: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-12-08 01:01:44.378862: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-12-08 01:01:44.395244: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-12-08 01:01:44.448277: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-12-08 01:01:44.448677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.449664: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.450245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-12-08 01:01:44.451105: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-12-08 01:01:44.461730: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996250000 Hz
2019-12-08 01:01:44.462592: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5650b0feed20 executing computations on platform Host. Devices:
2019-12-08 01:01:44.462644: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-12-08 01:01:44.463168: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.463942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GRID V100D-16Q major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:02:02.0
2019-12-08 01:01:44.464020: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-12-08 01:01:44.464037: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-12-08 01:01:44.464052: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-12-08 01:01:44.464067: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-12-08 01:01:44.464080: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-12-08 01:01:44.464094: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-12-08 01:01:44.464109: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-12-08 01:01:44.464181: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.464867: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.465426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-12-08 01:01:44.465481: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-12-08 01:01:44.729323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-08 01:01:44.729383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-12-08 01:01:44.729399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-12-08 01:01:44.729779: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.730551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.731236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-08 01:01:44.731866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14226 MB memory) -> physical GPU (device: 0, name: GRID V100D-16Q, pci bus id: 0000:02:02.0, compute capability: 7.0)
2019-12-08 01:01:44.734308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5650b1acf9a0 executing computations on platform CUDA. Devices:
2019-12-08 01:01:44.734353: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GRID V100D-16Q, Compute Capability 7.0

This snippet can be included at the top of the notebook or python script.

Similar checks can be done for pytorch.

1.8.5 The Matlab environment

Overview

Matlab can be used on the cluster in several configurations.

• run jobs directly on the compute nodes of the cluster (recommended)

– out of the box parallelism up to 64 cores (a full max size node)

– full parallelism on the cluster (guide not available yet)

• run jobs on a client and use the cluster as a backend (requires setup).

– supports windows clients

– supports linux and mac clients

Matlab as a client on Windows

This configuration allows the user to use MATLAB on a local machine, e.g. a laptop or a terminal on the AUB network, and run the heavy computation sections of a Matlab program/script on the HPC cluster. After the execution on the HPC cluster is complete, the results are transparently retrieved by MATLAB and shown in the matlab workspace on the client. For this use case, the user does not have to log in to (or interact with) the HPC cluster.

Note: this section of the guide has been tested with Matlab 2019b; make sure you have the same version on the client machine.

Note: Multiple such parallel configurations can co-exist and can be selected at runtime.

Setting up a Matlab 2019b client

Pre-requisites:

• Matlab 2019b installed on the client.

• slurm.zip folder to be extracted in the integration folder

• Octopus Matlab 2019b client settings

• A working directory (folder) on your “C” or “D” drive.

• Have your Matlab code modified to exploit parallelism.

• Once slurm.zip is downloaded, extract it to Documents\MATLAB (shown in the screenshot below) or to the corresponding directory of your non-default Matlab installation directory:

• Open Matlab R2019b on the client machine (e.g your laptop)

– Select Set Path (under HOME -> ENVIRONMENT)

– Click on Add Folder

– Browse to Documents\MATLAB\slurm\nonshared

– Click save

• To import the octopus.mlsettings profile:

– click on Parallel

– click on Manage Cluster Profiles

– Choose Import, then browse to the octopus.mlsettings file (downloaded in step 3 of the Pre-requisites section above)

– Once the octopus.mlsettings profile gets loaded, select it, click on Edit, and modify the RemoteJobStorageLocation by using a path on your HPC account (make sure to change the <user> to your username).

* You can choose which queue to work on through modifying AdditionalSubmitArgs:

* You can modify the number of cores to be used on the HPC cluster (e.g. 4, 6, 8, 10, 12) through NumWorkers

• When finished, press done and make sure to set the HPC profile as Default.

• Press validate to validate the parallel configuration.


Client batch job example

Below is a sample Matlab program for submitting independent jobs on the cluster. In this script four functions are executed on the cluster and the results are collected back one job at a time, back to back, in blocking mode (this can be improved on but that is beyond the scope of this guide).

clc; clear;

% run a function locally
output_local = my_linalg_function(80, 300);

% run 4 jobs on the cluster, wait for the remote jobs to finish
% and fetch the results.
cluster = parcluster('Octopus');

% run the jobs (asynchronously)
for i=1:4
    jobs(i) = batch(cluster, @my_linalg_function, 1, {80, 600});
end

% wait for the jobs to finish
for i=1:4
    status = wait(jobs(i));
    outputs(i) = fetchOutputs(jobs(i));
end

% define a function that does some linear algebra
function results = my_linalg_function(n_iters, mat_sz)
    results = zeros(n_iters, 1);
    for i = 1:n_iters
        results(i) = max(abs(eig(rand(mat_sz))));
    end
end

Note: Fetching outputs will fail if more than one instance of Matlab is connecting to the cluster for that user. So two Matlab instances on the same client, or two Matlab instances on two different clients (one on each client), will cause the synchronization of job results with SLURM to fail. To correct this, you must change the JobStorageLocation in the cluster profile (the local folder to which jobs are synched).

Note: For communicating jobs using shared memory or MPI, the jobs should be submitted on the cluster directly; it is not possible to submit such jobs through the client in the configuration described above.

Matlab on the compute nodes of the cluster

This configuration allows the user to run MATLAB scripts on the HPC cluster directly through the scheduler. Once the jobs are complete, the user can choose to transfer the results to a local machine and analyze them, or analyze everything on the cluster as well and e.g. retrieve a final product that could be a plot or some data files. This setup does not require the user to have matlab installed on their local machine.


Serial jobs

No setup is required to run a serial job on the cluster.

The following job script (matlab_serial.sh) can be used to submit a serial job running the matlab script my_serial_script.m.

#!/bin/bash

#SBATCH --job-name=matlab-serial
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16000
#SBATCH --time=0-01:00:00

module load matlab/2018b

matlab -nodisplay -r "run('my_serial_script.m'); exit" > matlab_${SLURM_JOBID}.out

where the content of my_serial_script.m could be:

tic
values = zeros(200);
for i = 1:size(values, 2)
    values(i) = sum(abs(eig(rand(800))));
end
toc

disp(sum(sum(values)));

The following should be present in the output

Elapsed time is 113.542701 seconds.
checksum = 9.492791e+05

Note: the Elapsed time could vary slightly since the execution time depends on the load of the compute node (if it is not the only running process), and the checksum could vary slightly since it is based on random numbers.

Single node (shared memory - SMP) parallel jobs

No setup is required to run a shared memory job on the cluster. Whenever parallelism is required, Matlab will spawn the needed workers on the local compute node.

The following job script (matlab_smp.sh) can be used to submit a shared memory parallel job running the matlab script my_smp_script.m.

Note: the only differences with a serial job are:

• the names of the script.

• --nodes=1 must be specified, otherwise the resources would be allocated on other nodes and would not be accessible by matlab.

• specify the parallel profile in the .m script e.g parpool('local', 64)


• for is replaced with parfor in the .m matlab script.

#!/bin/bash

#SBATCH --job-name=matlab-smp
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=16000
#SBATCH --time=0-01:00:00

module load matlab/2018b

matlab -nodisplay -r "run('my_smp_script.m'); exit" > matlab_${SLURM_JOBID}.out

for example, the content of my_smp_script.m could be:

parpool('local', 64)
tic
values = zeros(200);
parfor i = 1:size(values, 2)
    values(i) = min(eig(rand(800)));
end
toc

The following should be present in the output

Elapsed time is 10.660034 seconds.
checksum = 9.492312e+05

Note: the Elapsed time could vary slightly since the execution time depends on the load of the compute node (if it is not the only running process), and the checksum could vary slightly since it is based on random numbers.

1.8.6 Pari - Computer Algebra

Introduction

“PARI/GP is a specialized computer algebra system, primarily aimed at number theorists, but can be used by anybodywhose primary need is speed.”

For more assistance on using PARI please refer to the official online user guide.

Running Pari (single threaded)

Sample job script

#!/bin/bash

#SBATCH --job-name=pari-serial

#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16000
#SBATCH --time=0-01:00:00

module load pari

gp < my_pari_commands.gp

where the content of my_pari_commands.gp could be

x = 1
y = 2
z = x + y
print(z)

1.8.7 Enzo - multi-physics hydrodynamic astrophysical calculations

The enzo environment can be activated by executing:

$ module load enzo

The pre-built enzo executable is compatible with run/Hydro/Hydro-3D like problems or run/Cosmology/SphericalInfall

Assuming that the problem being solved is a run/Hydro/Hydro-3D like problem, the following script submits and runs the job on the cluster using 32 cores in total. The tasks are spread across 4 nodes with 8 cores on each node:

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=enzo
#SBATCH --account=9639717

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --mem=32000
#SBATCH --time=0-01:00:00

module purge
module load enzo
srun --mpi=pmix enzo.exe SphericalInfall.enzo

After creating the job file job.sh, submit the job to the scheduler

$ sbatch job.sh

To compile enzo from the source code see BuildEnzoFromSource. To compile enzo for a custom problem see BuildEnzoFromSourceCustomProblem.


Building Enzo from source

external references

http://enzo-project.org/BootCamp.html
https://grackle.readthedocs.io/en/latest/
https://grackle.readthedocs.io/en/latest/Installation.html#downloading

dependencies

• mercurial

• serial version of hdf5 1.8.15-patch1

• openmpi

• gcc-9.1.0

• libtool (for grackle)

• Intel compiler (optional)

All these dependencies/prerequisites can be loaded through

$ module load enzo/prerequisites

• download (or clone with mercurial) the enzo source code and extract it

$ wget https://bitbucket.org/enzo/enzo-dev/get/enzo-2.5.tar.gz
$ tar -xzvf enzo-2.5.tar.gz
$ cd enzo-enzo-dev-2984068d220f

• configure it

$ ./configure
$ make machine-octopus
$ make show-config

This command will modify the file ``Make.config.machine``

• after preparing the build make files, execute

$ make -j8

to compile and produce the enzo executable

• The installation has been tested with the problem CollapseTestNonCosmological that is located in the directory ENZO_SOURCE_TOPDIR/run/Hydro/Hydro-3D/CollapseTestNonCosmological/.

Build enzo for a custom problem

To build enzo for the sample simulation, e.g. run/CosmologySimulation/AMRCosmology, after loading the enzo environment:

$ module load enzo


Note: this loaded version of enzo might not be compatible with your custom problem, so a simulation run might fail. But the enzo executable in the loaded enzo environment is needed to produce the Enzo_Build file needed to compile enzo for your custom problem. Alternatively, you can produce an Enzo_Build file yourself if you know what you are doing.

• execute

$ enzo AMRCosmology.enzo

in the directory run/CosmologySimulation/AMRCosmology

• this will generate a Enzo_Build file

• copy Enzo_Build to the enzo source code dir

$ cp run/CosmologySimulation/AMRCosmology/Enzo_Build src/enzo/Make.settings.AMRCosmology

• configure the make files with the new settings

$ make machine-octopus
$ make show-config
$ make load-config-AMRCosmology
$ make show-config

This will change the content of Make.config.override

• build enzo

$ make -j8

• copy the produced enzo.exe to the problem directory

• for this AMRCosmology problem, inits.exe and ring.exe must also be built. From the top level source directory

# to build inits.exe
$ cd src/inits
$ make -j8

# to build ring.exe
$ cd src/ring
$ make -j8

• also copy input/cool_rates.in into the simulation directory AMRCosmology and follow the instructions in notes.txt to start the simulation
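For example (a sketch, run from the top level source directory):

$ cp input/cool_rates.in run/CosmologySimulation/AMRCosmology/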

Note: make sure to use ./enzo.exe (that is, the executable built for your problem) instead of just enzo.exe (that is, the executable in the default enzo environment) in the script job.sh.
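As a sketch, the relevant line in job.sh for this problem would use the parameter file from the AMRCosmology directory:

srun --mpi=pmix ./enzo.exe AMRCosmology.enzo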

Build enzo with the Intel compiler

To produce the enzo Makefile with the needed Intel compiler flags:

• create a copy of the octopus makefile


cp Make.mach.octopus Make.mach.octopus-intel-2019u3

• make the following changes to Make.mach.octopus-intel-2019u3

LOCAL_GRACKLE_INSTALL = /apps/sw/grackle-3.2-intel-2019u3
MACH_CC_MPI    = mpiicc   # C compiler when using MPI
MACH_CXX_MPI   = mpiicpc  # C++ compiler when using MPI
MACH_FC_MPI    = mpiifort # Fortran 77 compiler when using MPI
MACH_F90_MPI   = mpiifort # Fortran 90 compiler when using MPI
MACH_LD_MPI    = mpiicpc  # Linker when using MPI
MACH_CC_NOMPI  = icc      # C compiler when not using MPI
MACH_CXX_NOMPI = icpc     # C++ compiler when not using MPI
MACH_FC_NOMPI  = ifort    # Fortran 77 compiler when not using MPI
MACH_F90_NOMPI = ifort    # Fortran 90 compiler when not using MPI
MACH_LD_NOMPI  = icpc     # Linker when not using MPI
MACH_OPT_AGGRESSIVE = -O3 -xHost

#-----------------------------------------------------------------------
# Precision-related flags
#-----------------------------------------------------------------------

MACH_FFLAGS_INTEGER_32 = -i4
MACH_FFLAGS_INTEGER_64 = -i8
MACH_FFLAGS_REAL_32 = -r4
MACH_FFLAGS_REAL_64 = -r8

LOCAL_LIBS_MACH = -limf -lifcore # Machine-dependent libraries

• to compile with the new Intel makefile, execute:

$ module load intel/2019u3
$ module load grackle/3.2-intel
$ make machine-octopus-intel-2019u3
$ make opt-high

# (optional - use if needed)
# or, for aggressive optimization; before publishing results obtained with
# aggressive optimization, check the scientific results against those
# produced with opt-high or even opt-debug
$ make opt-aggressive

# compile enzo
$ make -j

Compile Grackle and build it

• download the Grackle source code and extract it (or clone it from GitHub)

create a copy of the makefile Make.mach.linux-gnu

$ cp Make.mach.linux-gnu Make.mach.octopus

in the Octopus makefile, specify the path to HDF5 and the install prefix:


MACH_FILE = Make.mach.octopus
LOCAL_HDF5_INSTALL = /apps/sw/hdf/hdf5-1.8.15-patch1-serial-gcc-7.2
MACH_INSTALL_PREFIX = /apps/sw/grackle/grackle-3.2

To build and install grackle, execute:

$ cd grackle/src/clib
$ make machine-octopus
$ make
$ make install

Compile grackle with the Intel compiler

Copy the makefile and create one specific to the Intel compiler:

$ cp Make.mach.octopus Make.mach.octopus-intel-2019u3

Then set the following variables in Make.mach.octopus-intel-2019u3:

MACH_CC_NOMPI  = icc   # C compiler
MACH_CXX_NOMPI = icpc  # C++ compiler
MACH_FC_NOMPI  = ifort # Fortran 77
MACH_F90_NOMPI = ifort # Fortran 90
MACH_LD_NOMPI  = icc   # Linker
MACH_INSTALL_PREFIX = /apps/sw/grackle/grackle-3.2-intel-2019u3

To build and install it, execute:

$ module unload gcc
$ module load intel/2019u3
$ cd grackle/src/clib
$ make machine-octopus-intel-2019u3
$ make
$ make install

Using yt to postprocess Enzo snapshots

yt can be used either in interactive mode by submitting an interactive job, through Jupyter notebooks, or by running a Python script via a regular batch job. A simple visualization can be produced by executing the following (after a job is allocated):

import yt

ds = yt.load("/home/john/my_enzo_simulation/DD0000/DD0000")
print("Redshift =", ds.current_redshift)

# create a slice plot of the gas density and save it to a .png file
p = yt.SlicePlot(ds, "z", ("gas", "density"))
p.save()

A .png file will be saved to disk in this case. In an interactive job the images can be viewed live.
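For the batch-job route, a sketch of a job script that runs such a yt script non-interactively is shown below; the python module name, the account, and the script name are assumptions and should be adapted:

#!/bin/bash

#SBATCH --job-name=yt_postprocess
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00
#SBATCH --account=foo_project

## load the environment that provides yt (module name is an assumption;
## check `module avail` for the actual name)
module purge
module load python

python my_yt_script.py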

1.8.8 The Ansys Fluent environment

Overview

In order to use Ansys Fluent, a VNC session is needed. Execution can also be done in batch mode, but that is beyond the scope of this guide.


There are three versions of Fluent installed on Octopus:

• 17.2

• 18.2

• 19.0

To load the latest version, the following command can be used in a terminal:

$ module load ansys/fluent/19.0
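To check which Ansys/Fluent versions are currently available as modules before loading one, the standard module query can be used:

$ module avail ansys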

Job script template

The following job script template can be used (for a VNC session):

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=my_fluent_job
#SBATCH --account=6544724

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --mem=64000
#SBATCH --time=0-06:00:00

### DO NOT EDIT BEYOND HERE UNLESS YOU KNOW WHAT YOU ARE DOING
source ~/.bashrc

VNC_HEAD_PORT=$(random_unused_port)
echo "VNC_HEAD_PORT = ${VNC_HEAD_PORT}"

JOB_INFO_FPATH=~/.vnc/slurm_${SLURM_JOB_ID}.vnc.out
rm -f ${JOB_INFO_FPATH}

VNC_SESSION_ID=$(vncserver 2>&1 | grep "desktop is" | tr ":" "\n" | tail -n 1)
echo ${VNC_SESSION_ID} >> ${JOB_INFO_FPATH}

ssh -R localhost:${VNC_HEAD_PORT}:localhost:$((5900 + ${VNC_SESSION_ID})) ohead1 -N &
SSH_TUNNEL_PID=$!
echo ${SSH_TUNNEL_PID} >> ${JOB_INFO_FPATH}

slurm_hosts_to_fluent_hosts

sleep infinity

Running Ansys Fluent

Fluent in shared memory mode

In this configuration, Fluent can be run either in serial mode (one core) or in shared memory (SMP) mode using up to the maximum number of cores and all of the available memory of a single compute node.


To run Fluent in local mode using one or multiple cores on the same machine, execute:

$ module load ansys/fluent/19.0
$ fluent

in a terminal; the Fluent launcher should then open in the desktop session.

Note: use #SBATCH --nodes=1 in the job script.

Fluent in distributed mode

For simulations that do not fit in a single node, Fluent can automatically allocate resources on multiple nodes. In this case the following steps must be followed:

• open the Fluent launcher following the same procedure used for shared memory mode

• select the number of cores (step 1 in the figure below)

• click on the Parallel Settings tab (step 2 in the figure below)

• set the File Containing Machine Names (step 3 in the figure below). Each job will have a different file name that is prefixed with the SLURM job ID.

Note: For example, to use four nodes set #SBATCH --nodes=4 in the job script. To run a simulation on 128 cores you can use either of the following:

• #SBATCH --nodes=4 and #SBATCH --ntasks-per-node=32; this will allow you to use up to 1024 GB of RAM.

• #SBATCH --nodes=2 and #SBATCH --ntasks-per-node=64; this will grant you access to 512 GB of RAM.
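As a sketch, the resource request for the first option could look like the following in the job script; the choice of the large partition is an assumption based on the 64-core / 256 GB nodes described earlier and should be adapted to the partition you actually use:

## sketch: 128 cores spread over 4 nodes (32 MPI tasks per node)
#SBATCH --partition=large
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1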


1.8.9 Molecular dynamics

LAMMPS

#!/bin/bash

#SBATCH --job-name=my_lammps_job
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00
#SBATCH --account=foo_project

## load modules here

## run the program
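A minimal sketch of how the two placeholders might be filled, assuming a LAMMPS module installed under the name lammps and an input file in.lj (both are assumptions; check module avail and use your own input file):

module purge
module load lammps        # assumed module name; verify with `module avail lammps`
srun lmp -in in.lj        # `lmp` binary name and `in.lj` input file are placeholders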

GROMACS

#!/bin/bash

#SBATCH --job-name=my_gromacs_job
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00
#SBATCH --account=foo_project

## load modules here

## run the program
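A minimal sketch for filling the placeholders, assuming a GROMACS module named gromacs and prepared run files with the prefix md (the module name and file prefix are assumptions):

module purge
module load gromacs       # assumed module name; verify with `module avail gromacs`
gmx mdrun -deffnm md      # single-task run; `md` is a placeholder for your .tpr prefix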

HOOMD

#!/bin/bash

#SBATCH --job-name=my_hoomd_job
#SBATCH --partition=normal

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-01:00:00
#SBATCH --account=foo_project

## load modules here

## run the program
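HOOMD-blue is driven from Python, so a sketch of how the placeholders might be filled is shown below; the module name and the script name are assumptions:

module purge
module load hoomd             # assumed module name; verify with `module avail hoomd`
python my_hoomd_script.py     # placeholder for your HOOMD-blue Python script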

1.8.10 Abaqus

Abaqus is an application that is used for solving structural simulation of multi-physics problems.

There are two main modes of running Abaqus on Octopus:

• batch mode (using the command line or scripts)

• graphical user interface mode (using desktop sessions)

To use any of these two the abaqus module should be loaded:

module load abaqus

Graphical user interface mode

To launch CAE (Complete Abaqus Environment) after connecting to a desktop environment, the following commands can be executed in a terminal:

module load abaqus
abaqus cae


Template Abaqus job (batch mode)

The following job script can be used as a template to run abaqus batch jobs on one compute node.

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=abaqus
#SBATCH --account=ab123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64000
#SBATCH --time=1-00:00:00

source ~/.bashrc
module load abaqus/2020
abaqus job=my_abaqus_sim_name input=my_sim.inp cpus=`nproc` mp_mode=threads interactive

Multi-node parallel Abaqus jobs

The following job script can be used to run a parallel Abaqus job using multiple compute nodes in batch mode. The script can be downloaded by clicking here.

If a graphical user interface job is used, then the following script should be executed in the same folder where the command abaqus cae -mesa is executed:

module load abaqus
chmod +x slurm_abaqus_mpi_env_gen.sh
./slurm_abaqus_mpi_env_gen.sh
abaqus cae -mesa

After the job is executed, MPI must be selected in the Abaqus job in the GUI.


script for running a multi-node parallel job in batch mode

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=abaqus
#SBATCH --account=ab123

## specify the required resources
#SBATCH --partition=large
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=64000
#SBATCH --time=1-00:00:00

source ~/.bashrc
module load abaqus/2020

##############################################################
# DO NOT MODIFY BEYOND THIS UNLESS YOU KNOW WHAT YOU ARE DOING
##############################################################

# dump the hosts to a text file
SLURM_HOSTS_FILE=slurm-hosts-${SLURM_JOBID}.out

#
# generate the mp_host_list environment variable
#
srun hostname > ${SLURM_HOSTS_FILE}

mp_host_list="["
for HOST in `sort ${SLURM_HOSTS_FILE} | uniq`; do
    echo ${HOST}
    mp_host_list="${mp_host_list}""['${HOST}',`grep ${HOST} ${SLURM_HOSTS_FILE} | wc -l`]"
done

mp_host_list=`echo ${mp_host_list} | sed 's/\]\[/\]\,\[/g'`"]"

echo $mp_host_list

#
# write the abaqus environment file
#
ABAQUS_ENV_FILE="abaqus_v6.env"
cat > ${ABAQUS_ENV_FILE} << EOF
import os
os.environ['ABA_BATCH_OVERRIDE'] = '1'
verbose=3
mp_host_list=${mp_host_list}
if 'SLURM_PROCID' in os.environ:
    del os.environ['SLURM_PROCID']
EOF

abaqus job=my_input_file.inp cpus=${SLURM_NPROCS} -verbose 3 standard_parallel=all mp_mode=mpi interactive


1.8.11 CST

To run CST in interactive (GUI) mode, a VNC job should be allocated. Once the job is running and the user has connected to the VNC session through a VNC client, the following can be executed to start the CST GUI:

module load cst/2019
cst_design_environment_gui

The license server is:

server: feacomlicense.win2k.aub.edu.lb
port: 31000

Once the application starts, it should look similar to the following:
