Cluster System Handbook
Leibniz Universität IT Services
Scientific Computing Group
April 20, 2020
Contents
1 I am a new user - Quick Start 5
2 About the cluster system 6
2.1 Getting Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 What the cluster system may be used for . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Computing power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Forschungscluster-Housing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Contact & Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Connecting to the cluster system & file transfer 8
3.1 Connecting from Linux or Mac OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 File transfer using Linux or Mac OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Connecting from Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.4 X2Go client configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5 X2Go Broker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.6 Cluster Web Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.6.1 How to connect to the web portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.7 Tips for working with graphical 3d-applications . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.8 File transfer under Windows using FileZilla . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.9 Connecting from outside the university’s network . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 File systems 19
4.1 Quota and grace time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Bigwork’s file system Lustre and stripe count . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 $TMPDIR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Exercise: Using file systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.5 Exercise: setting stripe count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.6 Exercise: altering stripe count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Modules & application software 24
5.1 Working with modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Exercise: Working with modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6 Working with the cluster system 28
6.1 Login nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2 Interactive jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.3 Batch jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.3.1 Converting jobscripts written under Windows . . . . . . . . . . . . . . . . . . . . . . . . 30
6.4 PBS options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.5 PBS environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.6 PBS commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.6.1 pbsnodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.7 Queues & partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.7.1 GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.7.2 Forschungscluster-Housing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.8 Maximum resource requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.9 Exercise: interactive job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.10 Exercise: batch job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7 SLURM usage guide 35
7.1 The SLURM Workload Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.2 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.3 Interactive jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.4 Submitting a batch script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.4.1 An example of a serial job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.4.2 Example of an OpenMP job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.4.3 Example of an MPI job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.4.4 Job arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.5 SLURM environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.6 GPU jobs on the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.7 Job status and control commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.7.1 Query commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.7.2 Job control commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.7.3 Job accounting commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8 Application software 46
8.1 Build software from source code on the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.2 EasyBuild . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.2.1 EasyBuild framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.2.2 How to build your software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.2.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.3 Singularity containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.3.1 Singularity overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.3.2 Singularity containers on the cluster system . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.3.3 Install Singularity on your local computer . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.3.4 Create a Singularity container using recipe file . . . . . . . . . . . . . . . . . . . . . . . 50
8.3.5 Create a Singularity container using Docker or Singularity Hub . . . . . . . . . . . . . . 52
8.3.6 Upload the container image to your BIGWORK directory at the cluster system . . . . . . 52
8.3.7 Running a container image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8.3.8 Singularity & parallel MPI applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8.3.9 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.4 Hadoop/Spark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.4.1 Hadoop - setup and running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.4.2 Spark - setup and running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8.4.3 How to access the web management pages provided by a Hadoop Cluster . . . . . . . . 56
8.4.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.5 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.5.1 Versions and Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.5.2 Running MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.5.3 Parallel Processing in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8.5.4 Using the Parallel Computing Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.5.5 Build MEX File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.5.6 Toolboxes and Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.5.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.6 NFFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.7 ANSYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
8.7.1 ANSYS Workbench . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
8.7.2 ANSYS Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
8.8 COMSOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.8.1 Prerequisite for use on the cluster system . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.8.2 Using COMSOL on the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9 Transferring files into the archive 67
9.1 Quota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.2 Transferring data into the archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.3 Login with lftp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.4 Copying files into the archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.5 Fetching files from the archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.6 Some useful commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.7 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
10 Citing the cluster system 70
11 When your work is done 71
Preface
This handbook is meant to facilitate your work with the cluster system of Leibniz Universität Hannover. Please take some time to read through this document and do not hesitate to contact the cluster team, [email protected], if you have any questions. If you think you have no time at all, please make sure you are at least subscribed to the Cluster-News mailing list, in order to receive announcements concerning the cluster system.
Yours sincerely,
Cluster-Team
I am a new user - Quick Start
Please note: If you have absolutely no time at all, at least read this one page. Please make sure the following points are met.
• Did you receive an email confirming your addition to the Cluster-News mailing list? Important announcements, for example maintenance periods, are made on this list.
• Did you change your password? You can change your password using the command passwd.
• Consider the cluster system as a tool to facilitate your research. Mastering any tool takes time. Consider attending an introductory talk and reading this Cluster Handbook.
About the cluster system
In order to meet the University’s demand for computing resources with a lot of CPUs and memory, LUIS is operating a cluster system as part of the service Scientific Computing. All scientists of Leibniz University can use the cluster system for their research free of charge.
Figure 2.1: Sketch of the cluster system with its individual components (user PC at institute, login nodes, batch system, compute clusters Tane, Taurus, Lena, Haku, Helena, SMP and FCH, file systems Home, Bigwork and Project, connected via Gigabit Ethernet and InfiniBand)
Resources of the cluster system are largely DFG major instrumentation. Therefore the rules1 for DFG major instrumentation apply when using the cluster system. Project leaders of your EDV-Project bear responsibility to comply with the DFG rules.
2.1 Getting Access
An EDV-Project is the framework under which you will conduct your work on the cluster system. Probably there is already an EDV-Project leader at your institute. In this case this person can create a cluster user account for you using BIAS2. In case you need to apply for an EDV-Project yourself, please use form ORG.BEN 42.
Please note: Create only one user account per person.
1 www.dfg.de
2 User accounting on BIAS is not part of the service Scientific Computing and thus not part of the cluster system
2.2 What the cluster system may be used for
Parts of the cluster system are DFG major instrumentation, thus the rules for DFG major instrumentation apply when using the cluster system. Furthermore, software licenses are valid for research and teaching only. Accordingly, the cluster system must only be used for research and teaching activities.
2.3 Computing power
Currently the cluster system consists of the following compute resources, see table 6.7.
2.4 Forschungscluster-Housing
Additionally, institutes can integrate their own hardware for use within the cluster system in the service called Forschungscluster-Housing (FCH). This hardware is accessible only to the respective institute during the day, i.e. between eight o’clock in the morning and eight o’clock in the evening. During night-time all cluster users have access to the Forschungscluster-Housing resources.
If you get directed to a machine that does not fit the naming scheme of our main clusters during that time period, it is most likely an FCH participant. For information about having your own hardware in Forschungscluster-Housing, please get in touch.
2.5 Contact & Help
For all cluster related questions, please contact [email protected].
Connecting to the cluster system & file transfer
The following addresses should be used to connect to the cluster system:
login.cluster.uni-hannover.de in order to submit jobs.
transfer.cluster.uni-hannover.de whenever you need to transfer data.
Please note: Execution time is limited to 30 minutes on all login nodes.
Thus transfers will be aborted on all login nodes if the execution time limit is reached. On the transfer node execution time is unlimited, but on the other hand it is not possible to submit jobs there.
3.1 Connecting from Linux or Mac OS
In order to connect to the cluster system from Linux or Mac OS an ssh client is required, which comes with most distributions by default. The following command will establish a connection to the cluster system. Replace username with your cluster user name.
If you need to use graphical programmes on the cluster system, the option -X, which enables X11 forwarding, is required.
ssh -X [email protected]
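If you connect often, an entry in your local ~/.ssh/config file can shorten this command. The following is a generic OpenSSH sketch, not a site-specific recommendation; the host alias cluster and the user name username are placeholders.

```
# ~/.ssh/config -- "cluster" and "username" are placeholders
Host cluster
    HostName login.cluster.uni-hannover.de
    User username
    # same effect as the -X command line option
    ForwardX11 yes
```

Afterwards, ssh cluster is equivalent to the full command above.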
Alternatively you can connect to the cluster system graphically using X2Go. This client is part of most Linux distributions’ package repositories. Please configure your X2Go client as described in chapter 3.4. Furthermore you can use the X2Go broker mode with this client, see section 3.5.
3.2 File transfer using Linux or Mac OS
There is a special node dedicated to data transfer with the cluster system. Whenever you transfer data to or from the cluster system, use this transfer node:
transfer.cluster.uni-hannover.de
Files can be transferred to and from the cluster system in a number of ways. For single files scp can be used; in general, rsync is recommended for file transfers. Additionally you can use FileZilla if you would like a graphical tool. Information on how to configure FileZilla for use with the cluster system can be found in section 3.8.
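As an illustration, copying a directory of results to and from $BIGWORK via the transfer node might look like this. This is a sketch: the user name username and the directory names are placeholders.

```shell
# copy a local directory to your bigwork directory on the cluster
# -a preserves permissions and timestamps, -v is verbose, -z compresses
rsync -avz results/ \
    username@transfer.cluster.uni-hannover.de:/bigwork/username/results/

# fetch the directory back from the cluster
rsync -avz \
    username@transfer.cluster.uni-hannover.de:/bigwork/username/results/ \
    results/
```

Because rsync only transfers changed files, an interrupted transfer can simply be restarted with the same command.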
Please note: Use the dedicated transfer node for file transfers, because processes are aborted after 30 minutes on the login machines.
3.3 Connecting from Windows
In order to connect to the cluster system from Windows, additional software is needed. Using graphical programs on login nodes or compute nodes is possible with the help of an X window client. The following are instructions on how to install and configure the X2Go client under Windows 7 for use with the cluster system. This guide is based on the X2Go client version 4.0.2.1+hotfix1, which can be obtained from the following URL. After downloading you may install the program to a folder of your choice. Furthermore we recommend using the X2Go broker, see section 3.5.
3.4 X2Go client configuration
Please note: We recommend using X2Go in the broker mode for graphical connections, see section 3.5.
After starting the X2Go client, either using a desktop shortcut or using the start menu, a configuration dialogue is displayed. In this dialogue you should specify a session name and make the following four entries.
1. Host: login.cluster.uni-hannover.de
2. Login: Your user name
3. SSH-Port: 22
4. Session type: XFCE
The completed configuration dialogue is depicted in figure 3.1. Entries in the red boxes have to be set accordingly. Afterwards leave the configuration assistant by clicking the OK button.
Figure 3.1: X2Go configuration dialogue, entries in red boxes have to be set.
On the right side of the main window the newly created session name is displayed, see figure 3.2. You can start this session by clicking on the session name (in the upper right corner in figure 3.2) or by entering the session name in the dialogue box named session.
Figure 3.2: Start a new session by clicking or entering the session name.
The first time a connection is established, the login nodes’ host-key is unknown. A notification will pop up and you need to accept the host-key (see figure 3.3) by pressing yes.
Figure 3.3: Host key verification dialogue
Please note: Current host key hashes are listed on our website1.
After a connection was successfully established, an XFCE desktop is displayed as depicted in figure 3.4.
Figure 3.4: XFCE desktop with applications menu in the lower left corner
The Applications Menu in the bottom left corner can be used to start a console window and then load modules or submit jobs into the queue. You can open editors, e.g. to write or edit batch scripts. In particular, interactive jobs which open graphical program windows can be run. To end your session either go to the Applications Menu or press the little green icon in the bottom right corner of your desktop.
Please note: Suspended X2Go sessions will be terminated after four weeks without prior notice.
3.5 X2Go Broker
Please note: Windows users must install the X2Go-Client version 4.1.2.0
If you would like to reconnect to a graphical session, use the X2Go broker. For example you could start a session at the university and reconnect to it at home. In order to do this, you have to establish a connection through the X2Go broker. After installing an X2Go client, proceed as described below.
• X2Go broker on Linux
Use the following command to establish a connection with the X2Go broker (the following command should be on one line; replace <username> with your cluster username).
1 https://www.luis.uni-hannover.de/scientific_computing.html
x2goclient --broker-url=ssh://<username>@x2gobroker.cluster.uni-hannover.de/usr/bin/x2gobroker --broker-autologin
• X2Go broker on Windows
Either edit the existing shortcut to X2Go or create a new one (choose “Eigenschaften”, i.e. properties, in figure 3.5).
Figure 3.5: Edit X2Go shortcut
Extend the command given as “Ziel” (target), see figure 3.6, with the following parameters (the following command should be on one line; replace <username> with your cluster username).
"[...]x2goclient.exe" --broker-url=ssh://<username>@x2gobroker.cluster.uni-hannover.de/usr/bin/x2gobroker
After providing your password, a session is listed in the X2Go window, see figure 3.7. Choose this session. You will get a desktop on the cluster system. You can reconnect to this session later and continue working graphically.
Please note: Suspended X2Go sessions will be terminated after four weeks without prior notice.
Figure 3.6: X2Go broker command
Figure 3.7: X2Go broker session
3.6 Cluster Web Portal
The web interface, powered by the software package Open OnDemand, allows you to access the LUIS cluster resources using a web browser without installing additional software on your personal workstation. Currently, the portal works with newer versions of Chrome (22+), Firefox (32+) and Safari (13+). Internet Explorer is not fully supported. Compatibility with mobile devices may be provided by the project in the future.
From within the Open OnDemand environment, you can:
• Create, submit, cancel and monitor batch jobs
• Open a terminal connection to the cluster login servers
• Browse, edit, download and upload files in/to your HOME directory
• Run noVNC Remote Desktop sessions connecting to compute nodes for interactive applications using a GUI
• Run other preconfigured interactive applications like Jupyter Notebook, MATLAB or COMSOL
The Open OnDemand website contains additional information about the current and future directions of theproject.
3.6.1 How to connect to the web portal
Please note: To access the portal, make sure you are connected to the University network, e.g. via the LUIS VPN Service.
Log on to the cluster web portal using your normal cluster login credentials by opening a new page in your web browser pointing to https://weblogin.cluster.uni-hannover.de (see figure 3.8).
Figure 3.8: Cluster web portal login page
Once you have been connected to the portal, you will be presented with the main dashboard page, see figure 3.9. There you will find several menus giving access to the different applications for File Browsing, Job
Management and Interactive Computing. Tutorial videos in the section “Getting started with OnDemand” of the dashboard page explain further details of using the web portal.
Figure 3.9: Cluster web portal dashboard page
3.7 Tips for working with graphical 3d-applications
When working with a 3d-application that interactively displays 3-dimensional representations of objects, you may experience awkwardly slow rendering. If your application uses OpenGL – which should be the case for many 3d-software packages running on a Linux system – you may try the following to speed things up:
• To work from the command line of a Linux workstation:
Get the VirtualGL package installed on your workstation. This provides the vglconnect command, which you use in the following way to connect to the cluster:
vglconnect -bindir /opt/VirtualGL/bin \
    -s <your_account_name_here>@login.cluster.uni-hannover.de
Now set up and check the connection as described below.
• Running a 3d-application from within an X2Go-/X2Go-Broker session:
Start an X2Go session as described in section 3.5. Set up and check the connection as described in thefollowing paragraph.
Test the connection and check whether the animations feel “smooth”:
vglrun -fps 60 glxgears
vglrun -fps 60 /opt/VirtualGL/bin/glxspheres64 -p 10000 -n 20
An example of how to use VirtualGL with a “real” application:
module load ANSYS
vglrun -fps 60 cfx5
Compare this performance to just running glxgears or glxspheres64 without the vglrun command. You may or may not experience a difference, depending on your application. If things do not work out yet and you want to try other settings, we recommend a look into chapter 15 of the VirtualGL documentation, which contains recipes for various applications.2
Hint: in case a software package does not seem to work with vglrun, you may try to first load the following modules:
module load GCC/7.3.0-2.30 OpenMPI/3.1.1 Mesa/.18.1.1
2 https://www.virtualgl.org
3.8 File transfer under Windows using FileZilla
The FileZilla client may be used to exchange files between a machine running Windows and the cluster system. In the following section we provide instructions on how to install and configure FileZilla client version 3.14.1_win64 on Windows 7. FileZilla can be obtained from the following URL. After downloading, you can install the FileZilla client to a directory of your choice.
After installing, open the Site Manager and create a new server which you can then connect to. The following options have to be set (cf. the red boxes in figure 3.10).
• Host: transfer.cluster.uni-hannover.de
• Protocol: SFTP - SSH File Transfer Protocol
• Logon Type: Ask for password
• User: Your user name
Figure 3.10: Site Manager: General
Furthermore, it is possible to open the remote connection directly to $BIGWORK. Without further configuration, the remote directory will be set to $HOME. In order to configure this option, go to the Advanced tab and set Default remote directory accordingly, see figure 3.11.
Figure 3.11: Site Manager: Advanced
The first time a connection to the transfer node is made, you will need to confirm the authenticity of the node’s host-key, cf. figure 3.12.
After a connection is successfully established – cf. figure 3.13 – you can exchange data with the cluster system.
Figure 3.12: Host-key verification on first connection attempt
Figure 3.13: Connection to transfer-node, established with FileZilla
3.9 Connecting from outside the university’s network
The cluster system is not reachable from outside the university’s network. In order to connect from outside, you have to establish a connection with the university’s network first, e.g. by using a VPN. The colleagues from the network team provide a VPN service.
LUIS VPN Service
After establishing a connection to the university’s network you can connect to the cluster system as usual.
Please note: Connection speed from outside networks will mostly be slower than from your office. This can cause applications which provide a graphical user interface to respond slowly and make working with them hard. If GUI windows respond slowly, this is due to the nature of the connection and not a technical problem with the cluster system.
File systems
There are two storage systems which make two file systems globally available, i.e. available on every node. These are depicted in figure 4.1.
$HOME Your home directory. Comparatively little space is available, but with a daily backup. Thus only the most important files should be saved here. $HOME is connected through Gigabit Ethernet and is thus rather slow compared to $BIGWORK.
$BIGWORK Your bigwork directory. Comparatively much space is available, but without backup. It is meant as a work directory. All computations should write to this directory. $BIGWORK is connected through InfiniBand technology and is thus much faster than $HOME. It is referred to as the work or scratch file system and should be regarded as such. After finishing work on an idea, clear the board, i.e. delete unneeded files.
$PROJECT Your project directory. Each project (from the BIAS point of view) is provided with a reasonably large amount of project storage, independent of $BIGWORK but on the same high-performance technology (Lustre, InfiniBand), for long-term retention of your data. All members of a project have read & write access to the project storage area, which is available at /project/<your-cluster-groupname> (or you can use the variable $PROJECT to access your group’s project storage). If you wish to save your personal files on project storage, it is recommended that you create the directory $PROJECT/$USER with proper access rights, mkdir -m 0700 $PROJECT/$USER, and store your files there. Each group’s initial quota for project storage is 10 TB. The project storage is visible on login and transfer machines only, but not from cluster work nodes. This means that the project storage cannot be used for input & output of your jobs running on the cluster work nodes (use $BIGWORK instead). You can use either the cluster login nodes or the transfer machine (recommended) to move your files from $BIGWORK to $PROJECT or vice versa. In general, $PROJECT storage is not intended for heavy computation, but for long-term retention of cluster data that does not need the fast access of $BIGWORK. In terms of IOPS it is therefore slower than $BIGWORK storage. Project storage is not backed up.
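As a sketch of this workflow, backing up data from $BIGWORK to your personal project directory on the transfer machine could look as follows. The directory name my-results is a placeholder; $BIGWORK, $PROJECT and $USER are assumed to be set by the system on the transfer node.

```shell
# on transfer.cluster.uni-hannover.de -- $PROJECT is not visible on work nodes
# create your personal directory, accessible only to your account
mkdir -m 0700 -p "$PROJECT/$USER"
# -a preserves permissions and timestamps
rsync -a "$BIGWORK/my-results/" "$PROJECT/$USER/my-results/"
```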
Store your most important data, which you cannot risk losing and could not otherwise recreate, in $HOME. A simulation’s intermediate results should be written to $BIGWORK, because these can be recreated by running the simulation again. Here you have much more storage space available, which is also connected faster using InfiniBand. When using $HOME to save files during computations, you will very likely slow down your computation. Thus make sure all default directories are set to $BIGWORK. These include temporary directories and those automatically set by applications.
Under no circumstances should a link to $BIGWORK be created in your home directory. In this scenario data written to $BIGWORK would be passed through $HOME and from there to $BIGWORK, and vice versa for reading data. The environment variable $BIGWORK should be used instead.
Please note: Backing up your data regularly from $BIGWORK to $PROJECT storage or to your institute’s server is essential, since $BIGWORK is designed as a scratch file system.
4.1 Quota and grace time
On both storage systems only a fraction of the whole disk space is available to you, which is your quota. There is a soft quota and a hard quota. The hard quota is an upper bound which cannot be exceeded. The soft quota on the other hand can be exceeded. Exceeding your soft quota starts the grace time. During this grace time you are allowed to exceed your soft quota up to your hard quota. After this period you will not be able to store any more data, unless you reduce disk space usage below the soft quota. If your disk space consumption falls below the soft quota, your grace time counter is reset.
            $HOME              $BIGWORK           $PROJECT
Path        /home/username     /bigwork/username  /project/groupname
Soft quota  10 GB              100 GB             10 TB
Hard quota  12 GB              1 TB               12 TB
Available   all cluster nodes  all cluster nodes  login & transfer nodes
Backup      daily              none               none
Network     Gigabit Ethernet   InfiniBand         InfiniBand
Figure 4.1: Cluster file systems with specifications
By using the quota mechanism we are trying to limit individual disk space consumption and keep the system performance as high as possible. Please delete files which are no longer needed. Low disk space consumption is especially helpful on $BIGWORK in order to optimise system performance. You can query your disk space usage and quota with the following commands, see also the exercise in chapter 4.4.
fquota Display $HOME disk usage and quota.
lquota Display $BIGWORK and $PROJECT disk usage and quota.
Please note: If no free space is left on $HOME you will not be able to log in graphically any more. Connecting using ssh (without -X) will still be possible.
4.2 Bigwork’s file system Lustre and stripe count
Please note: All statements made in this section also apply to $PROJECT storage
On the technical level $BIGWORK is comprised of multiple components which make up the storage system depicted in figure 4.2. Generally it is possible to use $BIGWORK with default values. However, it can be useful to alter parameters, especially stripe count. Setting stripe count manually can result in higher individual transfer rates and increase overall system performance. This not only benefits you but all users of the cluster system.
Data on $BIGWORK are saved on OSTs, object storage targets. Each OST consists of a number of hard disks. By default, data are saved to only one OST when writing to $BIGWORK, regardless of data size. This corresponds to a stripe count of one, since stripe count indicates how many OSTs will be used to store data. Striping data over multiple OSTs can speed up data access, because the transfer speeds of multiple OSTs, i.e. multiple hard disk clusters, are combined.
Please note: When working with files larger than 100 MB, please set stripe count manually according to section 4.5.
Stripe count is set as an integer value representing the number of OSTs to use, with -1 indicating all available OSTs. It is advised to create a directory below $BIGWORK and set a stripe count of -1 for it. This directory can then be used to store all files larger than 100 MB. For files smaller than 100 MB a stripe count of one, which is set by default, is sufficient.
Please note: In order to alter the stripe count of existing files, these need to be copied, see section 4.6. Simply moving files with mv is not sufficient in this case.
Figure 4.2: $BIGWORK’s technical components with object storage targets (OST), object storage servers (OSS), metadata server (MDS) and InfiniBand (IB) switch.
4.3 $TMPDIR
Within jobs $TMPDIR points to local storage available directly on each node. Whenever local storage is needed, $TMPDIR should be used.
Please note: As soon as a job finishes, all data stored under $TMPDIR will be deleted automatically.
Do not simply assume $TMPDIR to be faster than $BIGWORK, but test it. $TMPDIR can be used by applications that need a temporary directory.
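A common pattern is to stage input data to $TMPDIR at the start of a job, compute on the local disk, and copy the results back before the job ends. A minimal sketch; my_app, input.dat and result.out are placeholders for your own program and files:

```shell
# Stage input data onto the node-local disk
cp "$BIGWORK/input.dat" "$TMPDIR/"
cd "$TMPDIR"
# Run the computation on local storage (my_app is a placeholder)
./my_app input.dat > result.out
# Copy results back before the job ends ($TMPDIR is wiped afterwards)
cp result.out "$BIGWORK/"
```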
4.4 Exercise: Using file systems
# where are you? lost? print working directory!
pwd

# change directory to your bigwork/project/home directory
cd $BIGWORK
cd $PROJECT
cd $HOME

# display your home, bigwork & project quota
checkquota

# make personal directory in your group's project storage
# set permissions (-m) so only your account can access
# the files in it (0700)
mkdir -m 0700 $PROJECT/$USER

# copy the directory mydir from bigwork to project
cp -r $BIGWORK/mydir $PROJECT/$USER
4.5 Exercise: setting stripe count
# get overall bigwork usage, note different fill levels
lfs df -h
# get current stripe settings for your bigwork
lfs getstripe $BIGWORK
# change directory to your bigwork
cd $BIGWORK
# create a directory for large files (anything over 100 MB)
mkdir LargeFiles
# get current stripe settings for that directory
lfs getstripe LargeFiles
# set stripe count to -1 (all available OSTs)
lfs setstripe --count -1 LargeFiles
# check current stripe settings for LargeFiles directory
lfs getstripe LargeFiles
# create a directory for small files
mkdir SmallFiles
# check stripe information for SmallFiles directory
lfs getstripe SmallFiles
Use the newly created LargeFiles directory to store large files.
4.6 Exercise: altering stripe count
Sometimes you might not know beforehand how large the files created by your simulations will turn out to be. In this case you can set the stripe count after a file has been created in two ways. Let us create a 100 MB file first.
# enter the directory for small files
cd SmallFiles
# create a 100 MB file
dd if=/dev/zero of=100mb.file bs=10M count=10
# check filesize by listing directory contents
ls -lh
# check stripe information on 100mb.file
lfs getstripe 100mb.file
# move the file into the large files directory
mv 100mb.file ../LargeFiles/
# check if stripe information of 100mb.file changed
lfs getstripe ../LargeFiles/100mb.file
# remove the file
rm ../LargeFiles/100mb.file
In order to change stripe count, the file has to be copied (cp). Simply moving (mv) the file will not affect stripe count.
First method:
# from within the small files directory
cd $BIGWORK/SmallFiles
# create a 100 MB file
dd if=/dev/zero of=100mb.file bs=10M count=10
# copy file into the LargeFiles directory
cp 100mb.file ../LargeFiles/
# check stripe in the new location
lfs getstripe ../LargeFiles/100mb.file
Second method:
# create empty file with appropriate stripe count
lfs setstripe --count -1 empty.file
# check stripe information of empty file
lfs getstripe empty.file
# copy file "in place"
cp 100mb.file empty.file
# check that empty.file now has a size of 100 MB
ls -lh
# remove the original 100mb.file and work with empty.file
rm 100mb.file
Modules & application software
The number of packages, i.e. software, which is installed with the operating system of the cluster nodes is deliberately kept small. Additional packages and applications are provided by a module system which enables you to easily customise your working environment on the cluster system. This module system is called Lmod [1]. Furthermore, we can provide different versions of software which you can use on demand. When a module is loaded, software-specific settings are applied, e.g. changing the environment variables PATH, LD_LIBRARY_PATH and MANPATH.
We have adopted a systematic software naming and versioning convention in conjunction with the software installation system EasyBuild [2].
Software installation on the cluster utilizes a hierarchical software module naming scheme. This means that the command module avail does not display all installed software modules right away. Instead, only the modules that are immediately available to load are displayed. Upon loading some modules, more modules may become available. Specifically, loading a compiler module or MPI implementation module will make available all the software built with those applications. This way, we hope, the prerequisites for certain software become apparent.
At the top level of the module hierarchy there are modules for compilers, toolchains and software applications that come as a binary and thus do not depend on compilers. Toolchain modules organize compilers, MPI implementations and numerical libraries. Currently the following toolchain modules are available:
• Compiler only toolchains
– GCC: GCC + updated binutils
– iccifort: Intel compilers + GCC
• Compiler + MPI toolchains
– gompi: GCC + OpenMPI
– iimpi: Intel compilers + Intel MPI
– iompi: Intel compilers + OpenMPI
• Compiler + MPI + numerical libraries toolchains
– foss: gompi + OpenBLAS + FFTW + ScaLAPACK
– intel: iimpi + Intel MKL
– iomkl: iompi + Intel MKL
5.1 Working with modules
This section explains how to use software modules.
List all available modules
module spider
The same but more compact output
module -t spider
Search for specific modules that have "string" in their name
[1] https://lmod.readthedocs.io/en/latest/010_user.html
[2] https://easybuild.readthedocs.io/en/latest/
module spider string
Detailed information about particular version of the module (including instructions on how to load the module)
module spider name/version
List modules immediately available to load
module avail
Some software modules are hidden from the avail and spider commands. These are mostly the modules for system library packages which other, directly used user applications depend on. To list hidden modules you should provide the --show_hidden option to the avail and spider commands:
module --show_hidden avail
module --show_hidden spider
A hidden module has a dot (.) in front of its version (e.g. zlib/.1.2.8).
List currently loaded modules
module list
Load a specific version of a module
module load name/version
If only name is specified, the command will load the default version marked with a (D) in the module avail listing (usually the latest version). Loading a module may automatically load other modules it depends on.
It is not possible to load two versions of the same module at the same time.
To switch between different modules
module swap old new
To unload the specified module from the current environment
module unload name
To clean your environment of all loaded modules
module purge
Show what environment variables the module will set
module show name/version
Save the current list of modules to "name" collection for later use
module save name
Restore modules from collection "name"
module restore name
List of saved collections
module savelist
To get the complete list of options provided by Lmod through the module command, type the following
module help
5.2 Exercise: Working with modules
As an example of working with the Lmod modules, here we show how to load the gnuplot module.
List loaded modules
module list
No modules loaded
Find available gnuplot versions
module -t spider gnuplot
gnuplot/4.6.0
gnuplot/5.0.3
Determine how to load the selected gnuplot/5.0.3 module
module spider gnuplot/5.0.3
--------------------------------------------------------------------------------
  gnuplot: gnuplot/5.0.3
--------------------------------------------------------------------------------
    Description:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/

    This module can only be loaded through the following modules:

      GCC/4.9.3-2.25  OpenMPI/1.10.2

    Help:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/
Load required modules
module load GCC/4.9.3-2.25 OpenMPI/1.10.2
Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
Module for hwloc, version .1.11.2 loaded
Module for OpenMPI, version 1.10.2 loaded
And finally load the selected gnuplot module
module load gnuplot/5.0.3
Module for OpenBLAS, version 0.2.15-LAPACK-3.6.0 loaded
Module for FFTW, version 3.3.4 loaded
Module for ScaLAPACK, version 2.0.2-OpenBLAS-0.2.15-LAPACK-3.6.0 loaded
Module for bzip2, version .1.0.6 loaded
Module for zlib, version .1.2.8 loaded
..........................
To simplify loading the gnuplot module, the current list of loaded modules can be saved in a collection named "mygnuplot" (the name "mygnuplot" is of course arbitrary) and then restored when needed, as follows.
Save loaded modules to "mygnuplot"
module save mygnuplot
Saved current collection of modules to: mygnuplot
If "mygnuplot" is not specified, the name "default" will be used.
Remove all loaded modules (or open a new shell)
module purge
Module for gnuplot, version 5.0.3 unloaded
Module for Qt, version 4.8.7 unloaded
Module for libXt, version .1.1.5 unloaded
........................
List currently loaded modules. The selection is now empty.
module list
No modules loaded
List saved collections
module savelist
Named collection list:
1) mygnuplot
Load gnuplot module again
module restore mygnuplot
Restoring modules to user's mygnuplot
Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
..........................
Working with the cluster system
The reason you decided to use the cluster system is probably its computing resources. However, compute nodes are not directly accessible to users.
Please note: If you have a job running, you can log on to the nodes taking part in that job directly.
With around 250 people using the cluster system for their research every year, there has to be an instance organising and allocating resources among users. This instance is called the batch system. Currently it is implemented by a software package called PBS/Torque (portable batch system). During the course of the year 2020, the whole cluster system will be migrated to a new scheduling software called SLURM. That software is described in the next chapter. This (current) chapter describes the software used up to now (PBS/Torque), and we recommend starting here for general usage.
We currently (as of March 2020) use TORQUE as resource manager and Maui as scheduler, which upon request allocate resources to users.
Most work with the cluster system is done as jobs. Jobs are the framework with which the computing resources of the cluster system can be used, and they are started by the qsub command. Generally qsub has the following form.
qsub <options> <name of jobscript>
The manual page for qsub can be accessed like this.
man qsub
You can quit reading the manual page by pressing the ’q’ key.
6.1 Login nodes
After logging in to the cluster system you are located on a login node. These machines are not meant to be used for large computations, i.e. simulation runs. In order to keep these nodes accessible, their load has to be minimised. Therefore processes will be killed automatically after 30 minutes of elapsed cpu-time. Please use interactive jobs for tasks like pre- or post-processing and even some larger compilations in order to avoid the frustrating experience of a sudden shutdown of your application.
6.2 Interactive jobs
The simplest way of using the cluster system’s compute power is by starting an interactive job. This can be done by issuing the qsub command with the -I option on any login node.
zzzzsaal@login02:~$ qsub -I
ACHTUNG / WARNING:
'mem' parameter not present; the default value will be used (1800mb)
'walltime' parameter not present; the default value will be used (24:00:00)
'nodes' parameter not present; the default value will be used (nodes=1:ppn=1)
qsub: waiting for job 1001152.batch.css.lan to start
qsub: job 1001152.batch.css.lan ready

zzzzsaal@lena-n080:~$
In this example a user by the name of zzzzsaal issues the qsub command from the node login02. Following this, the batch system warns about missing parameters and starts a job with the ID 1001152.batch.css.lan. Using the short JobID 1001152 is more common. Afterwards user zzzzsaal is located on machine lena-n080, which can be seen from the command prompt, which now shows @lena-n080. This is node number 80 of the Lena cluster. From now on this node’s computing power can be utilised.
This simplest form of an interactive job uses default values for all resource specifications. In practice resource specifications should always be adapted to fit one’s needs. This can be done by supplying the qsub command with options. A listing of possible options can be found in section 6.4. The following example illustrates how user zzzzsaal requests specific resources starting from login node login02 inside an interactive job. For this interactive job the user requests one cpu-core on one machine and 2 GB of memory for an hour. Additionally the -X option is used, which switches on X window forwarding, so applications with graphical user interfaces can be used.
zzzzsaal@login02:~$ qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB
qsub: waiting for job 1001154.batch.css.lan to start
qsub: job 1001154.batch.css.lan ready

zzzzsaal@lena-n079:~$
After the job with JobID 1001154 has started, the machine named lena-n079 is ready to be used. An extended example of how to utilise interactive jobs is given in section 6.9.
6.3 Batch jobs
Interactive jobs should be used in preparation for batch jobs. Within an interactive job all commands which are later going to make up a batch script can be entered, thus testing their functionality. Only if everything works should the commands be put into a batch script line by line. This line-by-line transcript of an interactive session can be used as a batch script. In case you were given a batch script by other people, take some time to enter all the commands in an interactive job. This way you familiarise yourself with what the individual commands do.
In order to request the same resources as with the interactive job from section 6.2 within a batch job, the following can be written to a file.
#!/bin/bash -login
# resource specification
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB

# commands to execute
date
Generally jobscripts can be divided into two parts: resource specifications and commands to be executed. Lines beginning with # are comments, with two exceptions. The first line here specifies the shell which is used to interpret the script, in this case bash. Also, lines beginning with #PBS are recognised by PBS, the portable batch system, as resource specifications. In this case the resources requested are one cpu-core on one machine and 2 GB of memory for an hour. The section with commands to be executed contains only a single command. The date command returns the current date and time.
This file is saved as batch-job-example.sh and can afterwards be submitted to the batch system by issuing the qsub command.
zzzzsaal@login02:~$ qsub batch-job-example.sh
1001187.batch.css.lan
After submitting the jobscript a JobID is returned by the batch system, in this case 1001187. After the job has finished, two files can be found in the directory which the jobscript was submitted from.
zzzzsaal@login02:~$ ls -lh
total 12K
-rw-r--r-- 1 zzzzsaal zzzz 137 19. Apr 12:54 batch-job-example.sh
-rw------- 1 zzzzsaal zzzz 0 19. Apr 12:59 batch-job-example.sh.e1001187
-rw------- 1 zzzzsaal zzzz 30 19. Apr 12:59 batch-job-example.sh.o1001187
The first file has the extension .e1001187 and holds all error messages which occurred during job execution. In this case the file is empty. The second file has the extension .o1001187 and contains all messages which would have been displayed on the terminal and have been redirected here. This can be verified by displaying the file’s contents.
zzzzsaal@login02:~$ cat batch-job-example.sh.o1001187
Tue Apr 19 12:59:18 CEST 2016
The file contains the output of the date command.
Please note: Jobscripts written under Windows need converting, see section 6.3.1.
6.3.1 Converting jobscripts written under Windows
Creating a jobscript under Windows and copying it onto the cluster system may produce the following error message when submitting that jobscript with qsub.
zzzzsaal@login02:~$ qsub WindowsDatei.txt
qsub: script is written in DOS/Windows text format
Check this file with the file command.
zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text, with CRLF line terminators
Convert the file to Unix format.
zzzzsaal@login02:~$ dos2unix WindowsDatei.txt
dos2unix: converting file WindowsDatei.txt to UNIX format ...
Check the file again with the file command to see if conversion was successful.
zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text
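Should dos2unix not be available, the same conversion can be done with standard tools. A sketch; the output file name is just an example:

```shell
# Remove carriage returns (CRLF -> LF), writing the converted copy to a new file
tr -d '\r' < WindowsDatei.txt > UnixDatei.txt
```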
6.4 PBS options
Following is a list of selected PBS options, which allow job control. These options are valid for interactive as well as batch jobs.
-N name
declares a name for the job

-j oe
join standard output and error streams

-l nodes=n:ppn=p
request n nodes and p cpu cores per node

-l walltime=time
requested wall clock time in format hh:mm:ss

-l mem=value
requests RAM according to value; possible suffixes are kb, mb and gb

-M email address
list of users to whom mail about the job is sent
-m abe
send mail on (one or multiple selections): a - job abort, b - job beginning, e - job end

-V
all environment variables are exported to the job

-q queue
destination queue of the job, see section 6.7

-W x=PARTITION:name
partition to be used, see section 6.7

-I
job is to be run interactively
More options can be found on the man page, which can be opened with the following command.
man qsub
6.5 PBS environment variables
“When a batch job is started, a number of variables are introduced into the job’s environment that can be used by the batch script in making decisions, creating output files, and so forth. These variables are listed in the following table” [1]:
PBS_O_WORKDIR
Job’s submission directory

PBS_NODEFILE
File containing a line-delimited list of nodes allocated to the job

PBS_QUEUE
Job queue

PBS_JOBNAME
User-specified job name

PBS_JOBID
Unique JobID
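Inside a jobscript these variables can be used, for instance, to change into the submission directory and to derive the number of allocated cores. A minimal sketch:

```shell
# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR
# $PBS_NODEFILE contains one line per allocated core; count the lines
NPROCS=$(wc -l < $PBS_NODEFILE)
echo "Job $PBS_JOBID is running on $NPROCS cores"
```

The core count is handy when passing a process count to an MPI launcher, for example.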
6.6 PBS commands
qsub script
Submit PBS job

showq
Show status of PBS jobs

qdel jobid
Delete job jobid
All of the above commands have detailed manual pages, which can be viewed with the following command:
man <command>
In order to exit the manual page, press q.
[1] Source: http://docs.adaptivecomputing.com/torque/4-2-10
6.6.1 pbsnodes
On the login nodes the pbsnodes command can be used to obtain information about resources. For example, the amount of RAM of one of the “Terabyte machines” in the helena queue can be queried.
pbsnodes smp-n031
At first the output will seem a little bit confusing. It shows, among others, the following parameter.
physmem=1058644176kb
This output can be converted into GB: 1024 kB equal 1 MB, 1024 MB equal 1 GB, and so on. This way you know the maximum amount of RAM you can request on one machine in the helena queue is 1009 GB.
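The conversion can also be done directly in the shell using integer arithmetic:

```shell
# Convert physmem from kB to GB: integer division by 1024, twice
physmem_kb=1058644176
echo $(( physmem_kb / 1024 / 1024 ))   # prints 1009
```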
6.7 Queues & partitions
Cluster | Nodes | Processors | Cores/Node | Memory/Node (GB) | Partition | Queue
Lena | 80 | 2x Intel Haswell Xeon E5-2630 v3, 8 cores, 2.40 GHz, 20 MB cache | 16 | 64 | lena | all
Haku | 20 | 2x Intel Haswell Xeon E5-2620 v4, 8 cores, 2.10 GHz, 20 MB cache | 16 | 64 | haku | all
Tane | 96 | 2x Intel Westmere-EP Xeon X5670, 6 cores, 2.93 GHz, 12 MB cache | 12 | 48 | tane | all
SMP | 9 | 4x Intel Westmere-EX Xeon E7-4830, 8 cores, 2.13 GHz, 24 MB cache | 32 | 256 | smp | all
SMP | 9 | 4x Intel Beckton Xeon E7540, 6 cores, 2.00 GHz, 18 MB cache | 24 | 256 | puresmp | all
SMP | 3 | 4x Intel Westmere-EX Xeon E7-4830, 8 cores, 2.13 GHz, 24 MB cache | 32 | 1024 | — | helena
Dumbo | 18 | 4x Intel Sandybridge Xeon E5-4650 v2, 10 cores, 2.40 GHz, 26 MB cache | 40 | 1024 | dumbo | all
GPU | 1 | 2x Intel Skylake Xeon Silver 4116, 12 cores, 2.10 GHz, 16 MB cache | 24 | 96 | — | gpu
FCH | 75 | | | | [2] | all
Please note: The length of support contracts for individual clusters varies. Should you need an identical hardware platform over the next years, please choose Lena.
There are multiple queues available:
all
This is the default queue and does not have to be requested explicitly. PBS will route a job to matching nodes.

helena
A queue for jobs with large RAM requirements of up to 1 Terabyte.

gpu
Use this queue in order to utilize GPU resources.
[2] Each institute has its own partition.
test
A queue for testing. One node with 12 processors and 48 GB of RAM is available. Maximum walltime is 6 hours.
In addition to queues there are multiple partitions. Using these partitions you can direct your job to specific machines, for example in the queue all; see the table above for partition names.
6.7.1 GPU
In order to use machines equipped with a GPU, you need to use the queue “gpu”. Machines with GPUs can only be used by one person, i.e. one job, exclusively, to avoid interference. This may currently cause significant wait times if many of you are trying to use the one machine we have. On the other hand, we have seen significant idle periods on that machine. However, we are working on getting more machines with GPUs into the cluster system.
6.7.2 Forschungscluster-Housing
Forschungscluster-Housing (FCH) machines are divided into one partition per institute. The partition name mostly corresponds to the institute’s abbreviation.
6.8 Maximum resource requests
Some maximum values exist which cannot be exceeded. Maximum walltime per job is limited, as is the maximum number of simultaneously running jobs. Furthermore, the number of CPUs is limited. All these limits apply per user name.
Walltime Maximum walltime is 200 hours per job
Jobs The maximum number of running jobs per user is 64
CPUs The overall maximum number of CPUs (ppn) all running jobs can use is 768 per user
6.9 Exercise: interactive job
# start an interactive job, what happens?
qsub -I
# exit this interactive job
exit
# specify all resource parameters, so no defaults get used
qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB
# load module for octave
module load octave/3.8.1
# start octave
octave
# inside octave the following commands create a plot
octave:1> x = 0:10;
octave:2> y = x.^2;
octave:3> h = figure(1);
octave:4> plot(x,y);
octave:5> print('-f1','-dpng','myplot')
octave:6> exit
# display newly created image
display myplot.png
Interactive jobs are useful for debugging - always test interactively first.
6.10 Exercise: batch job
Create a file named MyBatchPlot.m
x = 0:10;
y = x.^2;
h = figure(1);
plot(x,y);
print('-f1','-dpng','MyBatchPlot');
Create a file named MyFirstBatchJob.sh
#!/bin/bash -login
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB
# load octave module
module load octave/3.8.1
# start octave
octave MyBatchPlot.m
Submit the job script
qsub MyFirstBatchJob.sh
Check files MyFirstBatchJob.sh.o* and MyFirstBatchJob.sh.e*
SLURM usage guide
The scientific computing team at the Leibniz Universität IT Services (LUIS) is currently preparing to switch all cluster computing systems from the software package that has been used for the past 15 years – Torque/Maui – to a more modern system, the SLURM computing resource manager. The transition will take place by gradually moving parts of the existing system to the new scheduler. In a first step, SLURM will manage only the GPGPU components of the cluster. However, the complete cluster – including Forschungscluster-Housing nodes – has to be migrated to SLURM by the end of this year (2020). Over the course of the next months, we will integrate information about the usage and concepts of the new system into the usual introductory presentations. News – as usual – will be announced via the cluster news mailing list.
7.1 The SLURM Workload Manager
SLURM (Simple Linux Utility for Resource Management) is a free open-source batch scheduler and resource manager that allows users to run their jobs on the LUIS compute cluster. It is a modern, extensible batch system that is installed around the world on many clusters of various sizes. This chapter describes the basic tasks necessary for submitting, running and monitoring jobs under the SLURM Workload Manager on the LUIS cluster. More detailed information about SLURM is provided by the official SLURM website.
The following commands are useful to interact with SLURM:
• sbatch - submit a batch script
• salloc - allocate compute resources
• srun - allocate compute resources and launch job-steps
• squeue - check the status of running and/or pending jobs
• scancel - delete jobs from the queue
• sinfo - view information about cluster nodes and partitions
• scontrol - show detailed information on active and/or recently completed jobs, nodes and partitions
• sacct - provide the accounting information on running and completed jobs
Below some usage examples for these commands are provided. For more information on each command, refer to the corresponding manual pages, e.g., man squeue, or – of course – to the SLURM manual’s website.
7.2 Partitions
In SLURM, compute nodes are grouped in partitions. Each partition can be regarded as an independent queue, even though a job may be submitted to multiple partitions, and a compute node may belong to several partitions simultaneously. A job is an allocation of resources within a single partition for executing tasks on the cluster for a specified period of time. The concept of “job steps” is used to execute several tasks simultaneously or sequentially within a job using the srun command.
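As an illustration of job steps, a batch script can launch several srun invocations within one allocation; each becomes a numbered job step. A minimal sketch, where preprocess and solve are placeholders for your own programs:

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --time=00:10:00
# Each srun call below is executed as one job step within the same allocation
srun ./preprocess   # job step 0
srun ./solve        # job step 1
```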
Table 7.1 lists the currently defined partitions and their parameter constraints. The limits shown cannot be overruled by users.
To control the job workload on the cluster and keep SLURM responsive, we enforce the following restrictions (see table 7.2) regarding the number of jobs:
Partition name | Max job runtime | Max nodes per job | Max jobs per user | Max CPUs per user | Default memory per CPU | Shared node usage
gpu | 24 hours | 1 | 32 | no limit | 1600 MB | yes

Table 7.1: SLURM partition limits
SLURM limits | Max number of running jobs | Max number of submitted jobs
Cluster wide | 10000 | 20000
Per user | 64 | 500

Table 7.2: SLURM limits
In case you need custom limits for a certain time, please submit a request containing a short explanation to [email protected]. Based on available resources and in keeping with maintaining a fair balance between all users, we may be able to accommodate special needs for a limited time.
To list the job limits relevant for you, use the sacctmgr command. For example:
sacctmgr -s show user
sacctmgr -s show user format=user,account,maxjobs,maxsubmit,qos
Up-to-date information on all available nodes may be obtained using the following commands:
sinfo -Nl
scontrol show nodes
Information on available partitions and their configuration:
sinfo -s
scontrol show partitions
7.3 Interactive jobs
Batch submission is the most common and most efficient way to use the computing cluster. Interactive jobs are also possible; they may be useful for things like:
• working with an interactive terminal or GUI applications like R, iPython, ANSYS, MATLAB, etc.
• software development, debugging, or compiling
You can start an interactive session on a compute node either with salloc or srun. The following example submits an interactive job using srun that requests two tasks (this corresponds to two CPU cores) and 4 GB memory per core for an hour:
[user@login02 ~]$ srun --time=1:00:00 --ntasks=2 --mem-per-cpu=4G --x11 --pty $SHELL -i
srun: job 222 queued and waiting for resources
srun: job 222 has been allocated resources

[user@euklid-n001 ~]$
Once the job starts, you will get an interactive shell on the first compute node (euklid-n001 in the example above) that has been assigned to the job. The option --x11 sets up X11 forwarding on this first node, enabling the use of graphical applications. The interactive session is terminated by exiting the shell.
An interactive session with GPU resources has to be started using the command salloc. The following example allocates two GPUs per node for 2 hours:
[user@login02 ~]$ salloc --time=2:00:00 --gres=gpu:2
salloc: Granted job allocation 228
salloc: Waiting for resource configuration
salloc: Nodes euklid-n002 are ready for job
Once an allocation has been made, the salloc command will start a shell on the login node where the submission was done. To start your application on the assigned compute nodes (euklid-n002 in this example), you either execute the srun command in the login shell:
[user@login02 ~]$ module load my_module
[user@login02 ~]$ srun ./my_program
... or connect to the allocated compute nodes by ssh:
[user@login02 ~]$ echo $SLURM_NODELIST # assigned compute node(s)
euklid-n002

[user@login02 ~]$ ssh euklid-n002

[user@euklid-n002 ~]$ module load my_module
[user@euklid-n002 ~]$ ./my_program
To terminate the session, type exit on the login shell:
[user@login02 ~]$ exit
exit
salloc: Relinquishing job allocation 228
salloc: Job allocation 228 has been revoked.
7.4 Submitting a batch script
An appropriate SLURM job submission file for your job is a shell script with a set of directives at the beginning of the file. These directives are issued by starting a line with the string #SBATCH. A suitable batch script is then submitted to the batch system using the sbatch command.
7.4.1 An example of a serial job
The following is an example of a simple serial job script (save the lines to the file test_serial.sh).
Note: change the #SBATCH directives to your use case where applicable.
#!/bin/bash
#SBATCH --job-name=test_serial
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:20:00
#SBATCH --constraint=[skylake|haswell]
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output test_serial-job_%j.out
#SBATCH --error test_serial-job_%j.err

# Change to my work dir
cd $SLURM_SUBMIT_DIR

# Load modules
module load my_module

# Start my serial app
srun ./my_serial_app
To submit the batch job, use
sbatch test_serial.sh
Note: as soon as compute nodes are allocated to your job, you can establish an ssh connection from the login machines to these nodes.
Note: if your job uses more resources than defined with the #SBATCH directives, the job will automatically be killed by the SLURM server.
Note: we recommend that you submit sbatch jobs with the #SBATCH --export=NONE option to establish a clean environment, otherwise SLURM will propagate your current environment variables to the job.
Table 7.3 shows frequently used sbatch options that can either be specified in your job script with the #SBATCH directive or on the command line. Command line options override options in the script. The commands srun and salloc accept the same set of options.
Option                            Default              Description
--nodes=<N> or -N <N>             1                    Number of compute nodes
--ntasks=<N> or -n <N>            1                    Number of tasks to run
--cpus-per-task=<N> or -c <N>     1                    Number of CPU cores per task
--ntasks-per-node=<N>             1                    Number of tasks per node
--ntasks-per-core=<N>             1                    Number of tasks per CPU core
--mem-per-cpu=<mem>               partition dependent  Memory per CPU core in MB
--mem=<mem>                       partition dependent  Memory per node in MB
--gres=gpu:<type>:<N>             -                    Request nodes with GPUs
--time=<time> or -t <time>        partition dependent  Walltime limit for the job
--partition=<name> or -p <name>   none                 Partition to run the job
--constraint=<list> or -C <list>  none                 Node features to request
--job-name=<name> or -J <name>    job script's name    Name of the job
--output=<path> or -o <path>      slurm-%j.out         Standard output file
--error=<path> or -e <path>       slurm-%j.err         Standard error file
--mail-user=<mail>                your account mail    User's email address
--mail-type=<mode>                -                    Event types for notifications
--exclusive                       nodes are shared     Exclusive access to node

Table 7.3: sbatch/srun/salloc options. Both long and short options are listed.
To obtain a complete list of parameters, refer to the sbatch man page: man sbatch
Note: if you submit a job with --mem=0, it gets access to the complete memory of each allocated node.
By default, the stdout and stderr file descriptors of batch jobs are directed to slurm-%j.out and slurm-%j.err files, where %j is set to the SLURM batch job ID number of your job. Both files will be found in the directory in which you launched the job. You can use the options --output and --error to specify a different name or location. The output files are created as soon as your job starts, and the output is redirected as the job runs so that you can monitor your job's progress. However, due to SLURM performing file buffering, the output of
your job will not appear in the output files immediately. To override this behaviour (this is not recommended in general, especially when the job output is large), you may use -u or --unbuffered either as an #SBATCH directive or directly on the sbatch command line.
If the option --error is not specified, both stdout and stderr will be directed to the file specified by --output.
7.4.2 Example of an OpenMP job
For OpenMP jobs, you will need to set --cpus-per-task to a value larger than one and explicitly define the OMP_NUM_THREADS variable. The example script launches eight threads, each with 2 GiB of memory and a maximum run time of 30 minutes.
#!/bin/bash
#SBATCH --job-name=test_openmp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00
#SBATCH --constraint=[skylake|haswell]
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output test_openmp-job_%j.out
#SBATCH --error test_openmp-job_%j.err
# Change to my work dir
cd $SLURM_SUBMIT_DIR
# Bind your OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export KMP_AFFINITY=verbose,granularity=core,compact,1
export KMP_STACKSIZE=64m
# Load modules
module load my_module
# Start my application
srun ./my_openmp_app
The srun command in the script above sets up a parallel runtime environment to launch an application on multiple CPU cores, but on one node. For MPI jobs, you may want to use multiple CPU cores on multiple nodes. To achieve this, have a look at the following example of an MPI job:
Note: srun should be used in place of the "traditional" MPI launchers like mpirun or mpiexec.
7.4.3 Example of an MPI job
This example requests 10 compute nodes on the lena cluster with 16 cores each and 320 GiB of memory in total for a maximum duration of 2 hours.
#!/bin/bash
#SBATCH --job-name=test_mpi
#SBATCH --partition=lena
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=2G
#SBATCH --time=02:00:00
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output test_mpi-job_%j.out
#SBATCH --error test_mpi-job_%j.err
# Change to my work dir
cd $SLURM_SUBMIT_DIR
# Load modules
module load foss/2018b
# Start my MPI application
srun --cpu_bind=cores --distribution=block:cyclic ./my_mpi_app
As mentioned above, you should use the srun command instead of mpirun or mpiexec in order to launch your parallel application.
Within the same MPI job, you can use srun to start several parallel applications, each utilizing only a subset of the allocated resources. However, the preferred way is to use a job array (see section Job arrays). The following example script will run 3 MPI applications simultaneously, each using 64 tasks (4 nodes with 16 cores each), thus totalling 192 tasks:
#!/bin/bash
#SBATCH --job-name=test_mpi
#SBATCH --partition=lena
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:02:00
#SBATCH --constraint=[skylake|haswell]
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output test_mpi-job_%j.out
#SBATCH --error test_mpi-job_%j.err
# Change to my work dir
cd $SLURM_SUBMIT_DIR
# Load modules
module load foss/2018b
# Start my MPI applications
srun --cpu_bind=cores --distribution=block:cyclic -N 4 --ntasks-per-node=16 ./my_mpi_app_1 &
srun --cpu_bind=cores --distribution=block:cyclic -N 4 --ntasks-per-node=16 ./my_mpi_app_1 &
srun --cpu_bind=cores --distribution=block:cyclic -N 4 --ntasks-per-node=16 ./my_mpi_app_2 &
wait
Note the wait command in the script; it makes the script wait for all commands previously started in the background (with &) to finish before the job can complete. Please take care that the times needed to complete the individual subjobs do not differ too much, so as not to waste valuable CPU time.
7.4.4 Job arrays
Job arrays can be used to submit a number of jobs with the same resource requirements. However, some of these requirements can be changed after the job has been submitted. To create a job array, you need to specify the directive #SBATCH --array in your job script or use the option --array or -a on the sbatch command line. For example, the following script will create 12 jobs with array indices from 1 to 10, 15 and 18:
#!/bin/bash
#SBATCH --job-name=test_job_array
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:20:00
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --array=1-10,15,18
#SBATCH --output test_array-job_%A_%a.out
#SBATCH --error test_array-job_%A_%a.err
# Change to my work dir
cd $SLURM_SUBMIT_DIR
# Load modules
module load my_module
# Start my app
srun ./my_app $SLURM_ARRAY_TASK_ID
Within a job script like in the example above, the job array indices can be accessed via the variable $SLURM_ARRAY_TASK_ID, whereas the variable $SLURM_ARRAY_JOB_ID refers to the job array's master job ID. If you need to limit the maximum number of simultaneously running jobs in a job array (e.g. due to heavy I/O on the BIGWORK file system), use a % separator. For example, the directive #SBATCH --array 1-50%5 will create 50 jobs, with only 5 jobs active at any given time.
Note: the maximum number of jobs in a job array is limited to 100.
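A common pattern in array job scripts is to map each array index to one line of a file list. The following is a sketch, not from the handbook; the file input_list.txt and the application name are assumptions for illustration:

```shell
# SLURM sets SLURM_ARRAY_TASK_ID inside the job; the fallback value
# here only lets the snippet run outside a job for demonstration
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-2}

# hypothetical list with one input file name per line
printf 'run_a.dat\nrun_b.dat\nrun_c.dat\n' > input_list.txt

# sed -n "Np" prints exactly line N, i.e. the entry for this array element
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" input_list.txt)
echo "array task ${SLURM_ARRAY_TASK_ID} processes ${INPUT}"

# in a real job script you would now start the application, e.g.:
# srun ./my_app "$INPUT"
```

This keeps the job script itself unchanged when the set of inputs grows; only the list file needs to be extended.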
7.5 SLURM environment variables
SLURM sets many variables in the environment of the running job on the allocated compute nodes. Table 7.4 shows commonly used environment variables that might be useful in your job scripts. For a complete list, see the "OUTPUT ENVIRONMENT VARIABLES" section in the sbatch man page.
$SLURM_JOB_ID            Job ID
$SLURM_JOB_NUM_NODES     Number of nodes assigned to the job
$SLURM_JOB_NODELIST      List of nodes assigned to the job
$SLURM_NTASKS            Number of tasks in the job
$SLURM_NTASKS_PER_CORE   Number of tasks per allocated CPU core
$SLURM_NTASKS_PER_NODE   Number of tasks per assigned node
$SLURM_CPUS_PER_TASK     Number of CPUs per task
$SLURM_CPUS_ON_NODE      Number of CPUs per assigned node
$SLURM_SUBMIT_DIR        Directory the job was submitted from
$SLURM_ARRAY_JOB_ID      Job ID of the array
$SLURM_ARRAY_TASK_ID     Job array index value
$SLURM_ARRAY_TASK_COUNT  Number of jobs in a job array
$SLURM_GPUS              Number of GPUs requested

Table 7.4: SLURM environment variables
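For illustration, several of these variables can be combined into a single log line at the top of a job script. This is a sketch; the fallback values are assumptions that only let the snippet run outside a job:

```shell
# inside a job, SLURM provides these variables; the fallbacks are
# illustrative defaults for running the snippet outside a job
SLURM_JOB_ID=${SLURM_JOB_ID:-4242}
SLURM_JOB_NUM_NODES=${SLURM_JOB_NUM_NODES:-2}
SLURM_NTASKS=${SLURM_NTASKS:-32}

# one summary line, e.g. for the beginning of the job's output file
SUMMARY="job ${SLURM_JOB_ID}: ${SLURM_NTASKS} tasks on ${SLURM_JOB_NUM_NODES} node(s)"
echo "$SUMMARY"
```

Such a line makes it easier to match an output file to its job when using sacct later.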
7.6 GPU jobs on the cluster
The LUIS cluster has a number of nodes that are equipped with NVIDIA Tesla GPU Cards.
Currently, 4 Dell nodes containing 2 NVIDIA Tesla V100 each are available for general use in the partition gpu.
Use the following command to display the current status of all nodes in the gpu partition and the computing resources they provide, including the type and number of installed GPUs:
$ sinfo -p gpu -NO nodelist:15,memory:8,disk:10,cpusstate:15,gres:20,gresused:20
NODELIST        MEMORY   TMP_DISK  CPUS(A/I/O/T)  GRES                 GRES_USED
euklid-n001     128000   291840    32/8/0/40      gpu:v100:2(S:0-1)    gpu:v100:2(IDX:0-1)
euklid-n002     128000   291840    16/24/0/40     gpu:v100:2(S:0-1)    gpu:v100:1(IDX:0)
euklid-n003     128000   291840    0/40/0/40      gpu:v100:2(S:0-1)    gpu:v100:0(IDX:N/A)
euklid-n004     128000   291840    0/40/0/40      gpu:v100:2(S:0-1)    gpu:v100:0(IDX:N/A)
To request a GPU resource, you need to add the directive #SBATCH --gres=gpu:<type>:n to your job script or specify it on the command line. Here, "n" is the number of GPUs you want to request; the type of the requested GPU (<type>) can be omitted. The following job script requests 2 Tesla V100 GPUs, 8 CPUs in the gpu partition and 30 minutes of wall time:
#!/bin/bash
#SBATCH --job-name=test_gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:v100:2
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output test_gpu-job_%j.out
#SBATCH --error test_gpu-job_%j.err
# Change to my work dir
cd $SLURM_SUBMIT_DIR
# Load modules
module load fosscuda/2018b
# Run GPU application
srun ./my_gpu_app
When submitting a job to the gpu partition, you must specify the number of GPUs. Otherwise, your job will be rejected at submission time.
Note: on the Tesla V100 nodes, you may currently only request up to 20 CPU cores for each requested GPU.
7.7 Job status and control commands
This section provides an overview of commonly used SLURM commands that allow you to monitor and manage the status of your batch jobs.
7.7.1 Query commands
The status of your jobs in the queue can be queried using
$ squeue
or – if you have array jobs and want to display one job array element per line –
$ squeue -r
Note that the symbol $ in the above commands, and in all other commands below, represents the shell prompt. The $ is NOT part of the command; do NOT type it yourself.
The squeue output should look more or less like the following:
$ squeue
JOBID PARTITION NAME USER     ST  TIME NODES NODELIST(REASON)
  412       gpu test username PD  0:00     1 (Resources)
  420       gpu test username PD  0:00     1 (Priority)
  422       gpu test username R  17:45     1 euklid-n001
  431       gpu test username R  11:45     1 euklid-n004
  433       gpu test username R  12:45     1 euklid-n003
  434       gpu test username R   1:08     1 euklid-n002
  436       gpu test username R  16:45     1 euklid-n002
ST shows the status of your job. JOBID is the number the system uses to keep track of your job. NODELIST shows the nodes allocated to the job, NODES the number of nodes requested and – for jobs in the pending state (PD) – a REASON. TIME shows the time used by the job. Typical job states are PENDING (PD), RUNNING (R), COMPLETING (CG), CANCELLED (CA), FAILED (F) and SUSPENDED (S). For a complete list, see the "JOB STATE CODES" section in the squeue man page.
You can change the default output format and display other job specifications using the option --format or -o. For example, if you want to additionally view the number of CPUs and the walltime requested:
$ squeue --format="%.7i %.9P %.5D %.5C %.2t %.19S %.8M %.10l %R"
JOBID PARTITION NODES CPUS TRES_PER_NODE ST MIN_MEMORY  TIME TIME_LIMIT NODELIST(REASON)
  489       gpu     1   32         gpu:2 PD         2G  0:00      20:00 (Resources)
  488       gpu     1    8         gpu:1 PD         2G  0:00      20:00 (Priority)
  484       gpu     1   40         gpu:2 R          1G 16:45      20:00 euklid-n001
  487       gpu     1   32         gpu:2 R          2G 11:09      20:00 euklid-n004
  486       gpu     1   32         gpu:2 R          2G 12:01      20:00 euklid-n003
  485       gpu     1   16         gpu:2 R          1G 16:06      20:00 euklid-n002
Note that you can make the squeue output format permanent by assigning the format string to the environment variable SQUEUE_FORMAT in your $HOME/.bashrc file:
$ echo 'export SQUEUE_FORMAT="%.7i %.9P %.5D %.5C %.13b %.2t %.19S %.8M %.10l %R"' >> ~/.bashrc
The option %.13b in the variable assignment for SQUEUE_FORMAT above displays the column TRES_PER_NODE in the squeue output, which provides the number of GPUs requested by each job.
The following command displays all job steps (processes started using srun):
$ squeue -s
To display estimated start times and compute nodes to be allocated for your pending jobs, type
$ squeue --start
JOBID PARTITION NAME USER     ST          START_TIME NODES SCHEDNODES  NODELIST(REASON)
  489       gpu test username PD 2020-03-20T11:50:09     1 euklid-n001 (Resources)
  488       gpu test username PD 2020-03-20T11:50:48     1 euklid-n002 (Priority)
A job may be waiting for execution in the pending state for a number of reasons. If there are multiple reasonsfor the job to remain pending, only one is displayed.
• Priority - the job has not yet gained a high enough priority in the queue
• Resources - the job has sufficient priority in the queue, but is waiting for resources to become available
• JobHeldUser - job held by user
• Dependency - job is waiting for another job to complete
• PartitionDown - the queue is currently closed for new jobs
For the complete list, refer to the section "JOB REASON CODES" in the squeue man page.
If you want to view more detailed information about each job, use
$ scontrol -d show job
If you are interested in the detailed status of one specific job, use
$ scontrol -d show job <job-id>
Replace <job-id> by the ID of your job.
Note that the command scontrol show job will display the status of jobs for up to 5 minutes after their completion. For batch jobs that finished more than 5 minutes ago, you need to use the sacct command to retrieve their status information from the SLURM database (see section Job accounting commands).
The sstat command provides real-time status information (e.g. CPU time, Virtual Memory (VM) usage, Resident Set Size (RSS), Disk I/O, etc.) for running jobs:
# show all status fields
sstat -j <job-id>
# show selected status fields
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <job-id>
Note: the above commands only display your own jobs in the SLURM job queue.
7.7.2 Job control commands
The following command cancels a job with ID number <job-id>:
$ scancel <job-id>
Remove all of your jobs from the queue at once using
$ scancel -u $USER
If you want to cancel only array ID <array_id> of job array <job_id>:
$ scancel <job_id>_<array_id>
If only the job array ID is specified in the above command, all job array elements will be cancelled.
The commands above first send a SIGTERM signal, then wait 30 seconds, and if processes from the job continue to run, issue a SIGKILL signal.
The -s option allows you to send any signal to a running job, which means you can communicate directly with the job from the command line, provided that the job has been prepared for this:
$ scancel -s <signal> <job-id>
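For the job to react to such a signal, the job script itself must install a handler, for example with the shell's trap builtin. This is a sketch (the checkpoint function and log file name are hypothetical); in a real job the signal would be delivered via scancel -s USR1 <job-id>:

```shell
#!/bin/bash

# hypothetical handler: write a checkpoint marker when SIGUSR1 arrives
checkpoint() {
    echo "USR1 received - writing checkpoint" > checkpoint.log
}

# register the handler for SIGUSR1
trap checkpoint USR1

# simulate the signal arriving; in a real job, scancel delivers it
kill -s USR1 $$

# the handler has run by the time the next command executes
cat checkpoint.log
```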
A job in the pending state can be held (prevented from being scheduled) using
$ scontrol hold <job-id>
To release a previously held job, type
$ scontrol release <job-id>
After submitting a batch job and while the job is still in the pending state, many of its specifications can be changed. Typical fields that can be modified include job size (amount of memory, number of nodes, cores, tasks and GPUs), partition, dependencies and wall clock limit. Here are a few examples:
# modify time limit
scontrol update JobId=279 TimeLimit=12:0:0
# change number of tasks
scontrol update jobid=279 NumTasks=80
# change node number
scontrol update JobId=279 NumNodes=2
# change the number of GPUs per node
scontrol update JobId=279 Gres=gpu:2
# change memory per allocated CPU
scontrol update Jobid=279 MinMemoryCPU=4G
# change the number of simultaneously running jobs of array job 280
scontrol update ArrayTaskThrottle=8 JobId=280
For a complete list of job specifications that can be modified, see section "SPECIFICATIONS FOR UPDATE COMMAND, JOBS" in the scontrol man page.
7.7.3 Job accounting commands
The command sacct displays the accounting data for active and completed jobs, which is stored in the SLURM database. Here are a few usage examples:
# list IDs of all your jobs since January 2019
sacct -S 2019-01-01 -o jobid
# show brief accounting data of the job with <job-id>
sacct -j <job-id>
# display all job accounting fields
sacct -j <job-id> -o ALL
The complete list of job accounting fields can be found in the section "Job Accounting Fields" in the sacct man page. You can also use the command sacct --helpformat.
Application software
Please note: we try to update this section of the cluster documentation regularly to ensure it matches the state of software on the cluster system. Issue the command module spider on the cluster system for a comprehensive list of available software. This section cannot replace the documentation that comes with the application you are using; please read that as well.
A wide variety of application software is available in the cluster system. These applications are located on a storage system and are available through an NFS export via the Module System - Lmod. If you need a different version of an already installed application, or one that is currently not installed, please get in touch. The main prerequisite for use within the cluster system is availability for Linux. Furthermore, if the application needs a license, we need to look at additional questions.
Some selected Windows applications can also be executed on the cluster system with the help of wine or Singularity containers. For information on Singularity see section 8.3, or contact us for more information.
Please note: the following sections may be available in English only. If you desperately need a translation, please get in touch.
8.1 Build software from source code on the cluster
Sub-clusters of the cluster system have different CPU architectures. The command lcpuarchs issued on the login nodes lists all available CPU types.
login01 ~ $ lcpuarchs -v
CPU arch names   Cluster partitions
--------------   ------------------
haswell          fi,haku,iqo,isd,iwes,lena,pci,smp
nehalem          nwf,smp,tane,taurus,tfd
sandybridge      bmwz,iazd,isu,itp
CPU of this machine: haswell
For more verbose output type: lcpuarchs -vv
Therefore, if an executable built with target-specific compiler options runs on a machine with an unsuitable CPU, an "Illegal instruction" error will be triggered. For example, if you compile your program on a haswell node (e.g. the lena sub-cluster) with the gcc compiler option -march=native, the program will not run on nehalem nodes (e.g. the tane sub-cluster).
This section explains how to build software on the cluster system so as to avoid the mentioned problem.
In your HOME (or BIGWORK) directory, create build/install directories for every available CPU architecture, and also the directory source for storing software installation sources.
In the example below we want to compile version 3.1 of an example software package, my-soft.
login03:~$ mkdir -p ~/sw/source
login03:~$ mkdir -p ~/sw/nehalem/my-soft/3.1/{install,build}
login03:~$ mkdir -p ~/sw/haswell/my-soft/3.1/{install,build}
login03:~$ mkdir -p ~/sw/sandybridge/my-soft/3.1/{install,build}
Copy the software installation archive to the source directory.
login03:~$ mv my-soft-3.1.tar.gz ~/sw/source
Build my-soft for every available CPU architecture by submitting an interactive job to a node with the proper CPU type. For example, to compile my-soft for nehalem nodes:
login03:~$ qsub -I -l nodes=1:nehalem:ppn=4,walltime=6:00:00,mem=16gb
Then unpack and build the software.
taurus-n034:~$ tar -zxvf ~/sw/source/my-soft-3.1.tar.gz -C ~/sw/$ARCH/my-soft/3.1/build
taurus-n034:~$ ./configure --prefix=~/sw/$ARCH/my-soft/3.1/install && make && make install
Finally, use the environment variable $ARCH in your job scripts to access the right installation path of the my-soft executable for the current work node.
login03:~$ cat job-my-soft.sh
#PBS -N my-soft
#PBS -l nodes=4:ppn=8,walltime=12:0:0,mem=128gb
# change to work dir
cd $PBS_O_WORKDIR
# run my_soft
~/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe file.input
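The effect of the $ARCH indirection can be sketched locally. The directory layout mirrors the one created above; the value of ARCH is an illustrative assumption, since on the cluster each work node sets it to its own CPU type:

```shell
# simulate the per-architecture layout from this section
for arch in nehalem haswell sandybridge; do
    mkdir -p "sw/$arch/my-soft/3.1/install/bin"
done

# on a compute node, $ARCH reflects that node's CPU architecture;
# the fallback here is only for running the sketch outside a job
ARCH=${ARCH:-haswell}

# the same job script line then resolves to the matching build
APP="sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe"
echo "selected: $APP"
```

Because the path is computed at run time, one and the same job script works on nodes of any architecture.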
8.2 EasyBuild
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
8.2.1 EasyBuild framework
The EasyBuild framework is available on the cluster through the module EasyBuild-custom. This module defines the location of the EasyBuild configuration files, recipes and installation directories. You can load the EasyBuild module using the command:
module load EasyBuild-custom
EasyBuild software and modules will be installed by default under the following directories:
$BIGWORK/my.soft/software/$ARCH
$BIGWORK/my.soft/modules/$ARCH
Here the variable ARCH will be either haswell, broadwell or sandybridge. The command lcpuarchs executed on the cluster login nodes lists all currently available values of ARCH. You can override the default software and module installation directory, and the location of your EasyBuild configuration files (MY_EASYBUILD_REPOSITORY), by exporting the environment variables listed below before loading the EasyBuild modulefile:
export EASYBUILD_INSTALLPATH=/your/preferred/installation/dir
export MY_EASYBUILD_REPOSITORY=/your/easybuild/repository/dir
module load EasyBuild-custom
8.2.2 How to build your software
After you load the EasyBuild environment as explained in the section above, the command eb is available to build your code using EasyBuild. If you want to build the code using a given configuration <filename>.eb and resolve dependencies, use the flag -r as in the example below:
eb <filename>.eb -r
The build command just needs the configuration file name with the extension .eb, not the full path, provided that the configuration file is in your search path: the command eb --show-config will print the variable robot-paths that holds the search path. More options are available; please have a look at the short help message by typing eb -h. For instance, you can check whether an EasyBuild configuration file already exists for a given program name, using the search flag -S:
eb -S <program_name>
You will be able to load the modules created by EasyBuild in the directory defined by the EASYBUILD_INSTALLPATH variable using the following commands:
module use $EASYBUILD_INSTALLPATH/modules/$ARCH/all
module load <modulename>/version
The command module use will prepend the selected directory to your MODULEPATH environment variable, so the command module avail will also show the modules of your software. Note that by default the variable EASYBUILD_INSTALLPATH is set to a directory within your $BIGWORK. However, by default $BIGWORK is not readable by other users. Therefore, if you want to make your software available to another cluster user with username user_name, you have to make your software installation path readable for that user as follows:
setfacl -m u:user_name:x $BIGWORK
setfacl -R -m u:user_name:rx $BIGWORK/my.soft
8.2.3 Further Reading
• EasyBuild documentation
• Easyconfigs repository
8.3 Singularity containers
Please note: These instructions were written for Singularity 2.4.2 and 2.6.1.
“Singularity enables users to have full control of their environment. Singularity containers can be used topackage entire scientific workflows, software and libraries, and even data.”1
8.3.1 Singularity overview
Singularity enables users to execute containers on a High-Performance Computing (HPC) cluster as if they were native programs or scripts on the host computer. For example, if the cluster system is running Scientific Linux, but your application runs in Ubuntu, you can create an Ubuntu container image, install your application into that image, copy the image to the cluster system and run your application using Singularity in its native Ubuntu environment.
One of the main benefits of Singularity is that containers are executed as a non-privileged user on the cluster system and can have access to network file systems like HOME, BIGWORK and PROJECT.
Additionally, Singularity properly integrates with the Message Passing Interface (MPI), and utilizes communication fabrics such as InfiniBand and Intel Omni-Path Architecture.
8.3.2 Singularity containers on the cluster system
If you want to create a new container and set up a new environment for your jobs, we recommend that youstart by reading the Singularity documentation. The basic steps to get started are detailed below.
8.3.3 Install Singularity on your local computer
Before you can create your image, you need to install Singularity on a system where you have administrative rights. Superuser rights are also necessary to bootstrap and modify your own container image. To install Singularity on your personal machine, do the following:
VERSION=2.4.2
wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
tar xvf singularity-$VERSION.tar.gz
cd singularity-$VERSION/
./configure --prefix=/usr/local
make
sudo make install
To verify Singularity is installed, run
singularity
For detailed instructions on Singularity installation, refer to the online documentation.
8.3.4 Create a Singularity container using recipe file
For a reproducible container, the recommended practice is to create containers using a Singularity recipe file. Note that you cannot bootstrap or modify your containers on the cluster system; you can only run them. The recipe below builds a CentOS 7 container.
Create the following recipe file called centos7.def.
1 Singularity homepage
BootStrap: yum
OSVersion: 7
MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
Include: yum wget
%setup
echo "This section runs on the host outside the container during bootstrap"
%post
echo "This section runs inside the container during bootstrap"
# install packages in the container
yum -y groupinstall "Development Tools"
yum -y install vim python epel-release
yum -y install python-pip
# install tensorflow
pip install --upgrade pip
pip install --upgrade tensorflow
# enable access to BIGWORK and PROJECT storage on the cluster system
mkdir -p /bigwork /project
%runscript
echo "This is what happens when you run the container"
echo "Arguments received: $*"
exec /usr/bin/python "$@"
%test
echo "This test will be run at the very end of the bootstrapping process"
python --version
This recipe file uses the YUM bootstrap module to bootstrap the core operating system, CentOS 7, within the container. For other bootstrap modules (e.g. docker) and details on Singularity recipe files, refer to the online documentation.
Next, build a container. This has to be executed on a machine where you have root privileges.
sudo singularity build centos7.img centos7.def
Shell into the newly built container.
singularity shell centos7.img
By default, Singularity containers are built as read-only squashfs image files. If you need to modify your container, e.g. to install additional software, you need to convert the squashfs container to a writeable sandbox one first.
sudo singularity build --sandbox centos7-sandbox centos7.img
This creates a sandbox directory called centos7-sandbox which you can then shell into and make changes.
sudo singularity shell --writable centos7-sandbox
However, to keep the container reproducible, it is strongly recommended to perform all required changes via the recipe file.
To see other singularity command-line options, issue the following command.
singularity help
8.3.5 Create a Singularity container using Docker or Singularity Hub
Another easy way to obtain and use a container is to pull it directly from the Docker Hub or Singularity Hub image repositories. For further details, refer to the Singularity online documentation.
8.3.6 Upload the container image to your BIGWORK directory at the cluster system
Linux users can transfer the container with the following command. Feel free to use other methods of your choice.
scp centos7.img [email protected]:/bigwork/username
8.3.7 Running a container image
Please note: In order for you to be able to run your container, it must be located in your BIGWORK directory.
Log on to the cluster system.
Load Singularity module and run your container image.
username@login01:~$ cd $BIGWORK
username@login01:~$ module load GCC/4.9.3-2.25 Singularity/2.4.2
username@login01:~$ singularity run centos7.img --version
The singularity sub-command run will carry out all instructions in the %runscript section of the container recipe file. Use the singularity sub-command exec to run any command from inside the container. For example, to get the content of the file /etc/os-release inside the container, issue the command:
username@login01:~$ singularity exec centos7.img cat /etc/os-release
Please note: you can access (read & write) your HOME, BIGWORK and PROJECT (only login nodes) storage from inside your container. In addition, the /scratch (only on work nodes) and /tmp directories of the host machine are automatically mounted in the container.
8.3.8 Singularity & parallel MPI applications
In order to containerize your parallel MPI application and run it properly on the cluster system, you have to provide an MPI library stack inside your container. In addition, the userspace driver for Mellanox InfiniBand HCAs should be installed in the container to utilize the cluster InfiniBand fabric as an MPI transport layer.
The following example Singularity recipe file ubuntu-openmpi.def retrieves an Ubuntu container from Docker Hub and installs the required MPI and InfiniBand packages.
BootStrap: docker
From: ubuntu:xenial
%post
# install openmpi & infiniband
apt-get update
apt-get -y install openmpi-bin openmpi-common libibverbs1 libmlx4-1
# enable access to BIGWORK and PROJECT storage on the cluster system
mkdir -p /bigwork /project
# enable access to /scratch dir, required by MPI jobs
mkdir -p /scratch
Once you have built an image file ubuntu-openmpi.img and transferred it over to the cluster system, as explained in the previous section, your MPI application can be run as follows (assuming you have already reserved a number of cluster work nodes).
module load foss/2016a
module load GCC/4.9.3-2.25 Singularity/2.4.2
mpirun singularity exec ubuntu-openmpi.img /path/to/your/parallel-mpi-app
The lines above can be entered on the command line of an interactive session as well as inserted into a batch job script.
8.3.9 Further Reading
• Singularity home page
• Singularity Hub
• Docker Hub
8.4 Hadoop/Spark
Apache Hadoop is a framework that allows for the distributed processing of large data sets. The Hadoop-cluster module helps to launch a Hadoop or Spark cluster within the cluster system and to manage it via the cluster batch job scheduler.
8.4.1 Hadoop - setup and running
To run your Hadoop applications on the cluster system, perform the following steps. First, allocate a number of cluster work machines, either interactively or in a batch job script. Then the Hadoop cluster has to be started on the allocated nodes. Next, when the Hadoop cluster is up and running, you can submit your Hadoop applications to it. Once your applications have finished, the Hadoop cluster has to be shut down (job termination automatically stops the running Hadoop cluster).
The following example runs the simple word-count MapReduce Java application on the Hadoop cluster. The script requests 6 nodes, allocating 6 x 40 = 240 CPUs in total to the Hadoop cluster for 30 minutes. A Hadoop cluster with persistent HDFS storage, located in your $BIGWORK/hadoop-cluster/hdfs directory, can be started with the command hadoop-cluster start -p. After completion of the word-count Hadoop application, the command hadoop-cluster stop shuts the cluster down.
We recommend running your Hadoop jobs on the dumbo cluster partition. The dumbo nodes provide large local disk storage (about 21 TB on each node).
#!/bin/bash
#PBS -N Hadoop.cluster
#PBS -l nodes=6:ppn=40,mem=3000gb,walltime=0:30:00
#PBS -W x=PARTITION:dumbo

# Change to work dir
cd $PBS_O_WORKDIR

# Load the cluster management module
module load Hadoop-cluster/1.0

# Start Hadoop cluster
# Cluster storage is located on local disks of reserved nodes.
# The storage is not persistent (removed after Hadoop termination)
hadoop-cluster start

# Report filesystem info & stats
hdfs dfsadmin -report

# Start the word count app
hadoop fs -mkdir -p /data/wordcount/input
hadoop fs -put -f $HADOOP_HOME/README.txt $HADOOP_HOME/NOTICE.txt /data/wordcount/input
hadoop fs -rm -R -f /data/wordcount/output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
    /data/wordcount/input /data/wordcount/output

hadoop fs -ls -R -h /data/wordcount/output
rm -rf output
hadoop fs -get /data/wordcount/output

# Stop hadoop cluster
hadoop-cluster stop
The command hadoop-cluster status shows the status and configuration of the running Hadoop cluster. Refer to the example Hadoop job script at $HC_HOME/examples/test-hadoop-job.sh for other possible HDFS storage options. Note that to access the variable $HC_HOME you must have the Hadoop-cluster module loaded. Note also that you cannot load the Hadoop-cluster module on cluster login nodes.
8.4.2 Spark - setup and running
Apache Spark is a large-scale data processing engine that performs in-memory computing. Spark has many advantages over the MapReduce framework as far as real-time processing of large data sets is concerned. It claims to process data up to 100x faster than Hadoop MapReduce in memory, and 10x faster on disk. Spark offers bindings in Java, Scala, Python and R for building parallel applications.
Because of its high memory and I/O bandwidth requirements, we recommend running your Spark jobs on the dumbo cluster partition.
The batch script below, which asks for 4 nodes and 40 CPUs per node, executes the example Java application SparkPi, which estimates the constant π. The command hadoop-cluster start --spark starts a Spark cluster on Hadoop's resource manager YARN, which in turn runs on the allocated cluster nodes. Spark jobs are submitted to the running Spark cluster with the spark-submit command.
Fine tuning of Spark’s configuration can be done by setting parameters in the variable $SPARK_OPTS.
#!/bin/bash
#PBS -N Spark.cluster
#PBS -l nodes=4:ppn=40,mem=2000gb,walltime=2:00:00
#PBS -W x=PARTITION:dumbo

# Load modules
module load Hadoop-cluster/1.0

# Start hadoop cluster with spark support
hadoop-cluster start --spark

# Submit Spark job
#
# spark.executor.instances - total number of executors
# spark.executor.cores - number of cores per executor
# spark.executor.memory - amount of memory per executor
SPARK_OPTS="--conf spark.driver.memory=4g
--conf spark.driver.cores=1
--conf spark.executor.instances=17
--conf spark.executor.cores=5
--conf spark.executor.memory=14g"

spark-submit ${SPARK_OPTS} --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.1.jar 100

# Stop spark cluster
hadoop-cluster stop
The command hadoop-cluster status shows the status and configuration of the running Spark cluster.
Alternatively, you can run Spark in an interactive mode as follows.

Submit an interactive batch job requesting 4 nodes and 40 CPUs per node on the dumbo cluster partition:
login03:~$ qsub -I -W x=PARTITION:dumbo -l nodes=4:ppn=40,mem=2000gb
Once your interactive shell is ready, load the Hadoop-cluster module, next start the Spark cluster and then run the Python Spark shell application:
dumbo-n011:~$ module load Hadoop-cluster/1.0
dumbo-n011:~$ hadoop-cluster start --spark
dumbo-n011:~$ pyspark --master yarn --deploy-mode client
The command hadoop-cluster stop shuts the Spark cluster down.
8.4.3 How to access the web management pages provided by a Hadoop Cluster
On start-up and also with the command hadoop-cluster status, Hadoop shows where to access the web management pages of your virtual cluster. It will look like this:
==========================================================================
=
= Web interfaces to Hadoop cluster are available at:
=
= HDFS (NameNode)          http://dumbo-n0XX.css.lan:50070
=
= YARN (Resource Manager)  http://dumbo-n0XX.css.lan:8088
=
= NOTE: your web browser must have proxy settings to access the servers
=       Please consult the cluster handbook, section "Hadoop/Spark"
=
==========================================================================
When you put this into your browser without preparation, you will most likely get an error, since “css.lan” is a purely local “domain”, which does not exist in the world outside the LUIS cluster.
In order to access pages in this scope, you will need to set up both a browser proxy that recognizes special addresses pointing to “css.lan” and an SSH tunnel the proxy can refer to.
This is how you do it on a Linux machine running Firefox:
1. Start Firefox and point it to the address https://localhost:8080. You should get an error message saying “Unable to connect”, since your local computer most probably is not set up to run a local web server at port 8080. Continue with step 2.
2. Go to the command line and create an ssh tunnel to a login node of the cluster. Replace <username> with your own username:
ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de
3. Go to the page you opened in step one and refresh it (click Reload or use Ctrl+R or F5, typically). The error message should change into something saying “Secure Connection Failed - could not verify authenticity of the received data”. This actually shows that the proxy is running. Continue with step 4.
4. Open a new tab in Firefox and enter the URL about:addons. In the search field “Find more extensions”, type “FoxyProxy Standard” and install the add-on.
5. On the top right of your Firefox, you should see a new icon for FoxyProxy. Click on it and choose “Options”. Then go to “Add”.
• Under “Proxy Type”, choose SOCKS5
• Set “Send DNS through SOCKS5 proxy” to on
• For the Title, we suggest “css.lan”
• IP address, DNS name and server name point to “localhost”
• In the “port” field, enter 8080 (this is the port you used above for your ssh proxy tunnel)
• Username and password remain empty
Click “Save” and choose the Patterns button on the new filter. Add a new white pattern like this: set “Name” and “Pattern” respectively to css.lan and *.css.lan*. Keep the rest of the options at their defaults and click Save.
6. Once again, click on the FoxyProxy icon and make sure that “Use Enabled Proxies By Patterns and Priority” is enabled.
7. Congratulations! You should now be ready to directly view the URLs Hadoop tells you from your own workstation.
8. If you want to facilitate/automate starting the ssh tunnel, you could use the following command line somewhere in your .xsession or .profile. Remember to replace <username> with your actual username:
ps -C ssh -o args | grep ConnectTimeout >& /dev/null || \
    ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de
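The `cmd1 || cmd2` idiom above runs the tunnel command only when the `ps`/`grep` check on the left fails, i.e. when no tunnel is running yet. A minimal sketch of the same "start only if not already running" pattern, using harmless commands instead of `ssh`:

```shell
# Marker file stands in for the running tunnel process (hypothetical example)
marker=$(mktemp)
# If the grep on the left succeeds, the right-hand side is skipped
grep -q "tunnel" "$marker" 2>/dev/null || echo "tunnel started" >> "$marker"
grep -q "tunnel" "$marker" 2>/dev/null || echo "tunnel started" >> "$marker"
# The second invocation was a no-op: the marker already contained the line
wc -l < "$marker"   # prints 1
```

This is why the line is safe to put in .profile: logging in repeatedly will not spawn duplicate tunnels.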
8.4.4 Further Reading
• Hadoop home page
• Spark home page
8.5 MATLAB
MATLAB is a technical computing environment for high-performance numeric computation and visualization. MATLAB integrates numerical analysis, matrix computation, signal processing and graphics. MATLAB toolboxes are collections of algorithms that enhance MATLAB's functionality in domains such as signal and image processing, data analysis and statistics, mathematical modeling, etc.
8.5.1 Versions and Availability
You can use module spider matlab to get a list of available versions. Feel free to contact the cluster team if you need other versions. MATLAB is proprietary software and can be used on the cluster for non-commercial, academic purposes only. As a free alternative to MATLAB you might consider GNU Octave, which is mostly compatible with MATLAB and provides most of its features.
8.5.2 Running MATLAB
As MATLAB by default utilizes all available CPU cores, excessive use of MATLAB on the login nodes would prevent other logged-in users from using resources. The recommended way to work with MATLAB is to submit a job (interactive or batch) and start MATLAB from within the dedicated compute node assigned to you by the batch system.
To submit an interactive job, you can use the following command (adjust the numbers to your needs):
login03:~$ qsub -I -X -l nodes=1:ppn=12,walltime=02:00:00,mem=36gb
The -X flag enables X11 forwarding on the compute node if you need to use the graphical MATLAB user interface.
Once you are assigned a compute node, you can run MATLAB interactively by loading the MATLAB module. To load the default version of the MATLAB module, use module load MATLAB. To select a particular MATLAB version, use module load MATLAB/version. For example, to load MATLAB version 2017a, use module load MATLAB/2017a.
To start MATLAB interactively with the graphical user interface (GUI), after having loaded the MATLAB module, type the command:
smp-n010:~$ matlab
This requires the -X flag for the qsub command as well as X11 forwarding enabled for your SSH client, or using X2Go (recommended) for logging in to the cluster system. The following command will start an interactive, non-GUI version of MATLAB.
smp-n010:~$ matlab -nodisplay -nosplash
Type matlab -h for a complete list of command line options.
In order to run MATLAB non-interactively via the batch system, you will need a batch submission script. Below is an example batch script (matlab-job-serial.sh) for a serial run that executes the MATLAB script (hello.m) in a single thread.
#!/bin/bash
#PBS -N matlab_serial
#PBS -M [email protected]
#PBS -m bae
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=4gb
# Compute node the job ran on
echo "Job ran on:" $HOSTNAME
# Load modules
module load MATLAB/2017a

# Change to work dir:
cd $PBS_O_WORKDIR

# Log file name
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log

# The program to run
matlab -nodesktop -nosplash < hello.m > $LOGFILE 2>&1
Example MATLAB script, hello.m:
% Example MATLAB script for Hello World
disp 'Hello World'
exit
% end of example file
To run hello.m via the batch system, submit the matlab-job-serial.sh file with the following command:
login03:~$ qsub matlab-job-serial.sh
2232617.batch.css.lan
Output from the running of the MATLAB script will be saved in the $LOGFILE file, which in this case expands to 2232617.log.
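The log file name derivation used in the script can be reproduced in any shell. A minimal sketch, using the hypothetical job ID shown in the qsub output above:

```shell
# Hypothetical job ID as reported by qsub
jobid="2232617.batch.css.lan"
# cut -d"." -f1 keeps only the numeric part before the first dot
logfile="$(echo "$jobid" | cut -d"." -f1).log"
echo "$logfile"   # prints 2232617.log
```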
8.5.3 Parallel Processing in MATLAB
Please note: MATLAB parallel computing across multiple compute nodes is not supported by the cluster systemat the moment.
MATLAB supports both implicit multithreading and explicit parallelism provided by the Parallel Computing Toolbox (PCT), which requires specific commands in your MATLAB code in order to create threads.
Implicit multithreading allows some functions in MATLAB, particularly linear algebra and numerical routines such as fft, eig, svd, etc., to distribute the workload between the cores of the node that your job is running on and thus run faster than on a single core. By default, all of the current versions of MATLAB available on the cluster have multithreading enabled. A single MATLAB session will run as many threads as there are cores on the compute node reserved for your job by the batch system. For example, if you request nodes=1:ppn=4, your MATLAB session will spawn four threads.
#!/bin/bash
#PBS -N matlab_multithread
#PBS -M [email protected]
#PBS -m bae
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:20:00
#PBS -l mem=16gb
# Compute node the job ran on
echo "Job ran on:" $HOSTNAME

# Load modules
module load MATLAB/2017a

# Change to work dir:
cd $PBS_O_WORKDIR
# Log file name
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log

# The program to run
matlab -nodesktop -nosplash < hello.m > $LOGFILE 2>&1
However, if you want to disable multithreading, you may either request nodes=1:ppn=1 (see the example job script above) or add the option -singleCompThread when running MATLAB.
8.5.4 Using the Parallel Computing Toolbox
The MATLAB Parallel Computing Toolbox (PCT) (parallel for-loops, special array types, etc.) lets you make explicit use of multicore processors, GPUs and clusters by executing applications on workers (MATLAB computational engines) that run locally. At the moment, the cluster system does not support parallel processing across multiple nodes (MATLAB Distributed Computing Server). As such, parallel MATLAB jobs are limited to a single compute node with the “local” pool through use of the MATLAB PCT.
Specific care must be taken when running multiple MATLAB PCT jobs. When you submit multiple jobs that all use MATLAB PCT for parallelization, all of the jobs will attempt to use the same default location for storing information about the MATLAB pools in use, thereby creating a race condition where one job modifies the files that were put in place by another. The solution is to have each of your jobs that will use the PCT set a unique location for storing job information. An example batch script matlab-job-pct.sh is shown below.
#!/bin/bash
#PBS -N matlab_multithread_pct
#PBS -M [email protected]
#PBS -m bae
#PBS -j oe
#PBS -l nodes=1:ppn=12
#PBS -l walltime=00:40:00
#PBS -l mem=40gb
# Compute node the job ran on
echo "Job ran on:" $HOSTNAME

# Load modules
module load MATLAB/2017a

# Change to work dir:
cd $PBS_O_WORKDIR

# Log file name
LOGFILE=$(echo $PBS_JOBID | cut -d"." -f1).log

# The program to run
matlab -nodesktop -nosplash < pi_parallel.m > $LOGFILE 2>&1
And the corresponding MATLAB script pi_parallel.m, which in addition starts the correct number of parallel MATLAB workers depending on the requested cores.
% create a local cluster object
pc = parcluster('local')

% explicitly set the JobStorageLocation to
% the temp directory that is unique to each cluster job
pc.JobStorageLocation = getenv('TMPDIR')

% start the matlabpool with maximum available workers
parpool(pc, str2num(getenv('PBS_NP')))
R = 1
darts = 1e7
count = 0
tic
parfor i = 1:darts
    x = R*rand(1)
    y = R*rand(1)
    if x^2 + y^2 <= R^2
        count = count + 1
    end
end

myPI = 4*count/darts
T = toc
fprintf('The computed value of pi is %8.7f\n',myPI)
fprintf('The parallel Monte-Carlo method is executed in %8.2f seconds\n', T)
delete(gcp)
exit
For further details on MATLAB PCT refer to the online documentation.
8.5.5 Build MEX File
See e.g. the online documentation for details on how to build MEX files. This section is a straight transcript of that website, adapted to fit the cluster system's module system. First, get hold of the example file timestwo.c, which comes with MATLAB. This can be done from within MATLAB.
copyfile(fullfile(matlabroot,'extern','examples','refbook','timestwo.c'),'.','f')
Each MATLAB version needs a specific version of GCC in order to build MEX files. This is the crucial part. See the MATLAB documentation for details.
$ module load MATLAB/2018a
$ module load GCC/6.3.0-2.27
$ ls
timestwo.c
$ mex timestwo.c
Building with 'gcc'.
MEX completed successfully.
$ ls
timestwo.c timestwo.mexa64
Notice the file timestwo.mexa64 was created. You can now use it like a function.
$ matlab -nodesktop -nosplash
>> timestwo(6)

ans =

    12
8.5.6 Toolboxes and Features
On the cluster system you can use the university's MATLAB campus licence. Please see the campus licence information for details and a list of available toolboxes and features.
8.5.7 Further Reading
• MATLAB online documentation
• MATLAB Parallel Computing Toolbox
8.6 NFFT
Please note: These instructions were written for NFFT 3.4.1.
NFFT is available as a module on the cluster system. However, there may be situations where you need to compile your own version, for example if you need the MATLAB interface as a MEX file. When compiling from source, take into account that the cluster system's CPU architecture is heterogeneous, so you should compile a version for every architecture you will be using in order to avoid problems; see section 8.1.
Execute the following commands to load prerequisites.
$ module load foss/2016a FFTW/3.3.4 gompi/2016a
Download the NFFT sources and change to the directory containing the nfft-3.4.1 sources. Execute the following commands in order to build NFFT with the MATLAB 2018a interface. Substitute zzzzsaal with your own user name.
$ ./configure --prefix=/bigwork/zzzzsaal/nfft/nfft-3.4.1-compiled \
    --with-matlab=/sw-eb/apps/software/haswell/Core/MATLAB/2018a/
$ make
$ make install
You end up with libnfft.mexa64 located in the lib directory. Use it as detailed here. In order to test your build,start the MATLAB version you built for.
$ module load MATLAB/2018a
$ matlab -nodisplay -nosplash
Inside MATLAB issue the following commands. Change zzzzsaal to your user name.
>> addpath(genpath('/bigwork/zzzzsaal/nfft/nfft-3.4.1-compiled/lib'))
>> cd /bigwork/zzzzsaal/nfft/nfft-3.4.1-compiled/share/nfft/matlab/nfft/
>> simple_test
8.7 ANSYS
8.7.1 ANSYS Workbench
ANSYS Workbench can be started with the following command:
runwb2
8.7.2 ANSYS Tips
You might come across an error like this when running ANSYS on enos nodes.
ansysdis201: Rank 0:8: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
Enos nodes of the cluster system do not have an InfiniBand interconnect but use Omni-Path instead. If you would like to run ANSYS on enos nodes, choose the correct partition key (pkey) by adding the following line to your job script before calling the ANSYS application.
[[ $HOSTNAME =~ ^enos-.* ]] && export MPI_IB_PKEY=0x8001
However, sometimes ANSYS, or the underlying MPI implementation used, does not seem to honour the exported variable, causing the error to persist. Unfortunately the ANSYS documentation on their MPI implementations is scarce. In this case, please contact ANSYS support or exclude enos nodes from your job in order to circumvent the error. Since there is no option to exclude enos or any other partition directly, write a PARTITION line in your resource specification listing all partitions you would like to use except enos.
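The guard line above only exports the variable when the job actually runs on an enos node. A minimal sketch of the same pattern, substituting a mock hostname for $HOSTNAME (the node name is hypothetical):

```shell
# Simulate the check with a mock hostname instead of $HOSTNAME
host="enos-n003"
# =~ matches the value against a regular expression (bash)
if [[ $host =~ ^enos-.* ]]; then
    echo "would export MPI_IB_PKEY=0x8001"
else
    echo "not an enos node"
fi
```

On non-enos nodes the export is skipped, so the same job script works on every partition.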
8.8 COMSOL
COMSOL Multiphysics is a finite element analysis, solver and simulation software (FEA software) package for various physics and engineering applications, especially for coupled phenomena and multiphysics. In addition to conventional physics-based user interfaces, COMSOL Multiphysics also allows entering coupled systems of partial differential equations (PDEs).
8.8.1 Prerequisite for use on the cluster system
In order to use COMSOL on the cluster system, you need a valid license, which usually means that your home institute permits you to use theirs. Institutes buy licenses primarily for local use on their own workstations, and you cannot use them by default on the cluster. That means if you want to use your institute's licenses on the cluster system, you will need to contact us beforehand, because the corresponding user names need to be added to the unix group “comsol” on the cluster system before they are allowed to use COMSOL here.
A license token can either be used only locally, or additionally on the cluster system. As long as it is in use, regardless of where that may be, it cannot be used elsewhere. Please be aware that licenses available on the cluster system to the “comsol” unix group are available to everyone in that group, which includes multiple institutes. That is, if you decide to make your license available for use on the cluster, it will not be exclusive to you or your institute. Usually, though, this does not pose a problem, since normally not all license tokens are used at the exact same times.
Please contact our colleagues from the license management department in order to purchase licenses. 2
8.8.2 Using COMSOL on the cluster
COMSOL can run either in graphical/interactive (GUI) mode or in batch mode.
To start the graphical user interface in an interactive batch job, you could use:
qsub -I -X -l nodes=1:ppn=12,walltime=02:00:00,mem=36gb
After the system tells you your batch job is ready, load the corresponding module:
module load COMSOL/5.4
The following command lists all available versions:
module avail comsol
Then start the COMSOL GUI by typing
comsol -np $PBS_NP -3drend sw
Please avoid starting the program on login nodes so you do not steal resources from other users; your jobs would be killed there after using 30 minutes of CPU time anyway. Either use an interactive job or use an “Interactive App” in the web portal. Both methods are described in detail earlier in this handbook.
In order to use COMSOL in batch mode, you will need a batch job script.
Attention: COMSOL is a rather memory-intensive application. If you do not explicitly request memory, you will only get 1800 MB for the job in total, as this is the default value.
Here is a sample script to run a COMSOL job on one node:
#!/bin/bash -login
#PBS -N COMSOL
#PBS -M [email protected]
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=8
2: https://www.luis.uni-hannover.de/software.html
#PBS -l mem=32GB
#PBS -j oe
# load module
module load COMSOL/5.4

# go to work dir
cd $PBS_O_WORKDIR

# start comsol
comsol batch -np $PBS_NP -inputfile infile.mph -outputfile outfile.mph -tmpdir $TMPDIR
Another sample script starting an MPI job on two nodes:
#!/bin/bash -login
#PBS -N COMSOL
#PBS -M [email protected]
#PBS -l walltime=1:00:00
#PBS -l nodes=2:ppn=8
#PBS -l mem=64GB
#PBS -j oe
# load module
module load COMSOL/5.4
# go to work dir
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE | uniq > hostfile
# start comsol (all options must be on a single line)
comsol -nn $PBS_NUM_NODES -np $PBS_NUM_PPN batch -f hostfile -mpirsh ssh \
    -inputfile input_cluster.mph -outputfile output_cluster.mph -tmpdir $TMPDIR
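To see what the hostfile step does, the sketch below mimics it with a mock node file. On a real job, $PBS_NODEFILE lists each reserved node once per core, with all entries for a node adjacent, so uniq collapses them to one line per node; the node names here are hypothetical:

```shell
# Mock $PBS_NODEFILE: one line per reserved core, nodes listed consecutively
printf 'node-a\nnode-a\nnode-b\nnode-b\n' > nodefile.mock
# Collapse adjacent duplicates so each node appears exactly once
uniq nodefile.mock > hostfile.mock
cat hostfile.mock   # prints node-a and node-b, one per line
```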
In case you want to run parallel COMSOL jobs, we strongly recommend doing a scaling study first.
We do not provide a central COMSOL server, but of course you may start one yourself from within the job.
WARNING: COMSOL can produce BIG recovery files that by default will be placed in your home directory. This in turn may be a reason to unexpectedly exceed your home quota (the maximum amount of space you may occupy in $HOME). You will not be able to log in graphically if you exceed your home quota, so avoid this by either moving the recovery directory to $BIGWORK using the option -recoverydir $BIGWORK/comsol, or by disabling recovery files altogether using the option -autosave off.
Transferring files into the archive
Please note: The archive is operated as part of the service Archivierung and thus is not part of the cluster system.
The archive can be used to store results and simulation data permanently. Each account has to be registered for archive use before using it. This can be done on the BIAS website after logging in with your user name and password. After clicking on the link entitled Ihren Benutzernamen für die Nutzung des Archivsystems zulassen, it takes roughly an hour before the archive can be used.
9.1 Quota
Archival storage in the archive system of Leibniz Universität Hannover is controlled by a quota mechanism. There is a quota on the number of files as well as on storage space. Please see the website of the archive service for further details at http://www.luis.uni-hannover.de/archivierung.html.
9.2 Transferring data into the archive
In order to transfer data into the archive of Leibniz Universität Hannover, it is recommended to use the cluster's dedicated transfer node; see sections 3.2 and 3.8.
9.3 Login with lftp
The archive can be reached at archiv.luis.uni-hannover.de using the lftp command.
username@clustertransfer:~$ lftp <username>@archiv.luis.uni-hannover.de
After entering your cluster user name's password, the lftp prompt appears.
lftp <username>@archiv.luis.uni-hannover.de:~>
Now you can use the ls command to list your directory contents in the archive. At the same time, this tests the established connection to the archive.
lftp <username>@archiv.luis.uni-hannover.de:~> ls
At your first login to the archive system, your directory is empty and the ls command will not return any listing. You can terminate the connection with exit.
lftp <username>@archiv.luis.uni-hannover.de:~> exit
<username>@clustertransfer:~$
Aliases for exit are quit and bye.
9.4 Copying files into the archive
On the cluster system’s transfer node change to the directory where the data to be copied are located.
clustertransfer:~$ cd $BIGWORK/my_data_dir
clustertransfer:/bigwork/username/my_data_dir$
After logging in using lftp the put command is used.
clustertransfer:/bigwork/username/my_data_dir$ lftp <username>@archiv.luis.uni-hannover.de
lftp <username>@archiv.luis.uni-hannover.de:~> put myfile.tar.gz
The file myfile.tar.gz is located inside the directory we previously changed to in this example. After using put to transfer the file, it is also available in the archive. The TAB key works for completing file and directory names in lftp as well.
Saving many small files in the archive is undesirable, because at least one copy of the data is kept on magnetic tape, and tape works best with a constant stream of data, which a few large files provide. It is therefore recommended to use tar or zip to combine small files into one bigger file. This can also optimize your quota usage.
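For example, a directory of small result files can be bundled into a single compressed archive before uploading it with put. A minimal sketch; the file and directory names are hypothetical:

```shell
# Collect many small result files in one directory (hypothetical names)
mkdir -p my_results
echo "result 1" > my_results/run1.dat
echo "result 2" > my_results/run2.dat

# Combine them into a single compressed tar archive
tar czf my_results.tar.gz my_results/

# List the archive contents to verify before uploading
tar tzf my_results.tar.gz
```

The resulting my_results.tar.gz is then a single large file suitable for put.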
In order to transfer multiple (large) files at once, the mput command can be used. This is short for multiple put. The mput command understands the wildcard * as it is used in bash.
lftp <username>@archiv.luis.uni-hannover.de:~> mput mydata*.tar.gz
9.5 Fetching files from the archive
In order to fetch files from the archive, the get command can be used.
lftp <username>@archiv.luis.uni-hannover.de:~> get myfile.tar.gz
This command places the file in the local directory from which the lftp command was issued. To fetch more than one file, the mget command can be used (multiple get). It may take some time before the transfer starts; this time is needed by the storage robot to find the respective magnetic tape and wind it to the position where the file is located.
9.6 Some useful commands
Listing the contents of the current local directory can be achieved with the command !ls; an exclamation mark executes the command on the machine lftp was started on. Similarly, the current local directory can be printed with lpwd at the lftp prompt.
It is possible to create directories in the archive using the mkdir command.
lftp <username>@archiv.luis.uni-hannover.de:~> mkdir myDir
Changing directories works in the usual way using cd.
lftp <username>@archiv.luis.uni-hannover.de:~> cd myDir
And back up one directory.
lftp <username>@archiv.luis.uni-hannover.de:~> cd ..
A local directory can be changed using the lcd command, short for local cd.
lftp <username>@archiv.luis.uni-hannover.de:~> lcd /bigwork/<username>/datadir
9.7 Further reading
• man page lftp
clustertransfer:~$ man lftp
Navigate using the arrow keys and exit with 'q'.
• Service Archivierung: http://www.luis.uni-hannover.de/archivierung.html
Citing the cluster system
We welcome citing the cluster system in your publications. If you would like some help phrasing your acknowledgement, feel free to use the examples below. We appreciate it.
Example 1
Die Ergebnisse/Teile der Ergebnisse, die in dieser Arbeit vorgestellt sind, wurden mithilfe desClustersystems an der Leibniz Universität Hannover berechnet.
The results presented here were (partially) carried out on the cluster system at the Leibniz Universityof Hannover, Germany.
Example 2
Diese Arbeit wurde vom Team des Clustersystems der Leibniz Universität Hannover unterstützt.
We acknowledge the support of the cluster system team at the Leibniz University of Hannover,Germany in the production of this work.
Example 3
Diese Arbeit wurde unterstützt vom Compute-Cluster, welches von der Leibniz Universität Hannover, vom Niedersächsischen Ministerium für Wissenschaft und Kultur (MWK) und der Deutschen Forschungsgemeinschaft (DFG) getragen wird.
This work was supported by the compute cluster, which is funded by the Leibniz Universität Hannover, the Lower Saxony Ministry of Science and Culture (MWK) and the German Research Foundation (DFG).
When your work is done
When you are done working with the cluster system, it would be nice if you did a few things before leaving.
1. Clean up your directories
• Clean up $HOME
• Clean up $BIGWORK
• Clean up $PROJECT
2. Have your project leader delete your user account