Introduction to HPC Workshop
October 9 2014
Introduction
Rob Lane
HPC Support
Research Computing Services
CUIT
HPC Basics
First HPC Workshop
Yeti
• 2 head nodes
• 101 execute nodes
• 200 TB storage
• 101 execute nodes:
– 38 x 64 GB
– 8 x 128 GB
– 35 x 256 GB
– 16 x 64 GB + Infiniband
– 4 x 64 GB + nVidia K20 GPU
• CPU: Intel E5-2650L
– 1.8 GHz
– 8 cores
– 2 per execute node
• Expansion round
– 66 new systems
– Faster CPUs
– More Infiniband
– More GPUs (nVidia K40)
– ETA January 2015
• Hardware: HP S6500 chassis with HP SL230 servers
Job Scheduler
• Manages the cluster
• Decides when a job will run
• Decides where a job will run
• We use Torque/Moab
Job Queues
• Jobs are submitted to a queue
• Jobs sorted in priority order
• Not a FIFO
Access
Mac Instructions
1. Run Terminal
Access
Windows Instructions
1. Search for "putty" on the Columbia home page
2. Select the first result
3. Follow the link to the PuTTY download page
4. Download putty.exe
5. Run putty.exe
Work Directory
$ cd /vega/free/users/your UNI
• Replace “your UNI” with your UNI
$ cd /vega/free/users/hpc2108
Copy Workshop Files
• Files are in /tmp/workshop
$ cp /tmp/workshop/* .
Editing
No single obvious choice of editor:
• vi – simple but difficult at first
• emacs – powerful but complex
• nano – simple but not really standard
nano
$ nano hellosubmit
“^” means “hold down control”
^a : go to beginning of line
^e : go to end of line
^k : delete line
^o : save file
^x : exit
hellosubmit
#!/bin/sh
# Directives
#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M [email protected]
#PBS -m abe
#PBS -V
# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI
# Print "Hello World"
echo "Hello World"
# Sleep for 10 seconds
sleep 10
# Print date and time
date
hellosubmit

The same directives with email notifications turned off ("-m n" sends no mail):

#!/bin/sh

# Directives

#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M [email protected]
#PBS -m n
#PBS -V
hellosubmit

$ qsub hellosubmit
298151.elk.cc.columbia.edu
$
qstat

$ qsub hellosubmit
298151.elk.cc.columbia.edu
$ qstat 298151
Job ID      Name         User       Time Use S Queue
----------- ------------ ---------- -------- - ------
298151.elk  HelloWorld   hpc2108    0        Q batch1
Once the job completes, it disappears from qstat:

$ qstat 298151
qstat: Unknown Job Id Error 298151.elk.cc.columbia.edu
hellosubmit
$ ls -l
total 4
-rw------- 1 hpc2108 yetifree 398 Oct  8 22:13 hellosubmit
-rw------- 1 hpc2108 yetifree   0 Oct  8 22:44 HelloWorld.e298151
-rw------- 1 hpc2108 yetifree  41 Oct  8 22:44 HelloWorld.o298151
hellosubmit
$ cat HelloWorld.o298151
Hello World
Thu Oct  9 12:44:05 EDT 2014
Any Questions?
Interactive
• Most jobs run as "batch"
• Can also run interactive jobs
• Get a shell on an execute node
• Useful for development, testing, troubleshooting
Interactive
$ cat interactive
qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb
Interactive
$ qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb
qsub: waiting for job 298158.elk.cc.columbia.edu to start
Interactive
qsub: job 298158.elk.cc.columbia.edu ready
(ASCII-art banner)

+--------------------------------+
|                                |
| You are in an interactive job. |
|                                |
| Your walltime is 00:05:00      |
|                                |
+--------------------------------+
Interactive
$ hostname
charleston.cc.columbia.edu
Interactive
$ exit
logout

qsub: job 298158.elk.cc.columbia.edu completed
$
GUI
• Can run GUIs in interactive jobs
• Need X Server on your local system
• See user documentation for more information
User Documentation
• hpc.cc.columbia.edu
• Go to “HPC Support”
• Click on Yeti user documentation
Job Queues
• Scheduler puts all jobs into a queue
• Queue selected automatically
• Queues have different settings
Queue        Time Limit   Memory Limit   Max. User Run
-----------  -----------  -------------  -------------
Batch 1      12 hours     4 GB           512
Batch 2      12 hours     16 GB          128
Batch 3      5 days       16 GB          64
Batch 4      3 days       None           8
Interactive  4 hours      None           4
Job Queues
qstat -q
$ qstat -q
server: elk.cc.columbia.edu
Queue            Memory CPU Time Walltime Node  Run  Que Lm  State
---------------- ------ -------- -------- ----  ---  --- --  -----
batch1             4gb      --   12:00:00  --    42   15 --   E R
batch2            16gb      --   12:00:00  --   129   73 --   E R
batch3            16gb      --   120:00:0  --   148  261 --   E R
batch4              --      --   72:00:00  --    11   12 --   E R
interactive         --      --   04:00:00  --     0    1 --   E R
interlong           --      --   48:00:00  --     0    0 --   E R
route               --      --      --     --     0    0 --   E R
                                               ----  ----
                                                330   362
yetifree
• Maximum processors limited
– Currently 4 maximum
• Storage quota
– 16 GB
• No email support
yetifree
$ quota -s
Disk quotas for user hpc2108 (uid 242275):
     Filesystem   blocks   quota   limit   grace   files   quota   limit   grace
hpc-cuit-storage-2.cc.columbia.edu:/free/
                    122M  16384M  16384M              8   4295m   4295m
from: root <[email protected]>
to: [email protected]
date: Wed, Oct 8, 2014 at 11:41 PM
subject: PBS JOB 298161.elk.cc.columbia.edu

PBS Job Id: 298161.elk.cc.columbia.edu
Job Name: HelloWorld
Exec host: dublin.cc.columbia.edu/4
Execution terminated
Exit_status=0
resources_used.cput=00:00:02
resources_used.mem=8288kb
resources_used.vmem=304780kb
resources_used.walltime=00:02:02
Error_Path: localhost:/vega/free/users/hpc2108/HelloWorld.e298161
Output_Path: localhost:/vega/free/users/hpc2108/HelloWorld.o298161
Intern
• Research Computing Services (RCS) is looking for an intern
• Paid position
• ~10 hours a week
• Will be on LionShare next week
MPI
• Message Passing Interface
• Allows applications to run across multiple computers
MPI
• Edit MPI submit file
• Load MPI environment module
• Compile sample program
MPI
#!/bin/sh
# Directives
#PBS -N MpiHello
#PBS -W group_list=yetifree
#PBS -l nodes=3:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M [email protected]
#PBS -m abe
#PBS -V
# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI
# Load mpi module.
module load openmpi
# Run mpi program.
mpirun mpihello
MPI
$ module load openmpi
$ which mpicc
/usr/local/openmpi/bin/mpicc
$ mpicc -o mpihello mpihello.c
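The slides compile mpihello.c but never show its source. A minimal MPI hello-world in C might look like the following; this is a sketch of a typical sample program, not the workshop's actual file:

```c
/* mpihello.c -- hypothetical reconstruction; the workshop's actual
 * sample source is not shown in these slides. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI */
    return 0;
}
```

With the submit file's `nodes=3:ppn=1`, mpirun would start three processes, so the job's output file should contain three "Hello from rank ..." lines, in no guaranteed order.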
MPI
$ qsub mpisubmit
298501.elk.cc.columbia.edu
Questions?
Any questions?