
Page 1

Before We Start

• Sign in
• hpcXX account slips
• Windows Users: Download PuTTY
  – Google PuTTY
  – First result
  – Save putty.exe to Desktop

Page 2

Research Computing at Virginia Tech
Advanced Research Computing

Page 3

Compute Resources

Page 4

Blue Ridge Computing Cluster

• Resources for running jobs
  – 408 dual-socket nodes with 16 cores/node
  – Intel Sandy Bridge-EP Xeon (8 cores) in each socket
  – 4 GB/core, 64 GB/node
  – Total: 6,528 cores, 27.3 TB memory
• Special nodes:
  – 130 nodes with 2 Intel Xeon Phi accelerators
  – 18 nodes with 128 GB
  – 4 nodes with 2 NVIDIA K40 GPUs
• Quad-data-rate (QDR) InfiniBand

Page 5

HokieSpeed – CPU/GPU Cluster

• 206 nodes, each with:
  – Two 6-core 2.40 GHz Intel Xeon E5645 CPUs and 24 GB of RAM
  – Two NVIDIA M2050 Fermi GPUs (448 cores/socket)
• Total: 2,472 CPU cores, 412 GPUs, 5 TB of RAM
• Top500 #221, Green500 #43 (November 2012)
• 14-foot by 4-foot 3D visualization wall
• Recommended uses:
  – Large-scale GPU computing
  – Visualization

Page 6

HokieOne – SGI UV SMP System

• 492 Intel Xeon X7542 (2.66 GHz) cores
  – Sockets: six-core Intel Xeon X7542 (Westmere)
  – 41 dual-socket blades for computation
  – One blade for system + login
• 2.6 TB of shared memory (NUMA)
  – 64 GB/blade, blades connected with NUMAlink
• Resources scheduled by blade (6 cores, 32 GB)
• Recommended uses:
  – Memory-heavy applications (up to 1 TB)
  – Shared-memory (e.g., OpenMP) applications

Page 7

Ithaca – IBM iDataPlex

• 84 dual-socket quad-core Intel Nehalem 2.26 GHz nodes (672 cores in all)
  – 66 nodes available for general use
• Memory (2 TB total):
  – 56 nodes have 24 GB (3 GB/core)
  – 10 nodes have 48 GB (6 GB/core)
• Quad-data-rate (QDR) InfiniBand
• Operating system: CentOS 6
• Recommended uses:
  – Parallel MATLAB (28 nodes / 224 cores)
  – Beginning users

Page 8

Storage Resources

Page 9

Name | Intent | File System | Environment Variable | Per-User Maximum | Data Lifespan | Available On
Home | Long-term storage of files | NFS | $HOME | 100 GB | Unlimited | Login and compute nodes
Work | Fast I/O, temporary storage | Lustre (BlueRidge); GPFS (other clusters) | $WORK | 14 TB, 3 million files | 120 days | Login and compute nodes
Archive | Long-term storage for infrequently accessed files | CXFS | $ARCHIVE | - | Unlimited | Login nodes
Local Scratch | - | Local disk (hard drives) | $TMPDIR | Size of node hard drive | Length of job | Compute nodes
Memory (tmpfs) | Very fast I/O | Memory (RAM) | $TMPFS | Size of node memory | Length of job | Compute nodes
Old Home | Access to legacy files | NFS | - | Read-only | TBD | Login nodes
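For example, a job script might stage data through the fast node-local scratch space and keep results on the Work file system. This is only a sketch; input.dat, output.dat, and mysim are hypothetical names:

# Inside a job script: use node-local scratch for fast I/O, keep results on $WORK
cd $TMPDIR                  # node-local disk, removed when the job ends
cp $WORK/input.dat .        # stage input from the Work file system
$WORK/mysim input.dat > output.dat
cp output.dat $WORK/        # copy results back before the job finishes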

Page 10

GETTING STARTED ON ARC’S SYSTEMS

Page 11

Getting Started Steps

1. Apply for an account

2. Log in to the system via SSH

3. System examples
a. Compile

b. Test (interactive job)

c. Submit to scheduler

4. Compile and submit your own programs

Page 12

ARC Accounts

1. Review ARC’s system specifications and choose the right system(s) for you

a. Specialty software

2. Apply for an account online

3. When your account is ready, you will receive confirmation from ARC’s system administrators within a few days

Page 13

Log In

• Log in via SSH
  – Mac/Linux have a built-in client
  – Windows users need to download a client (e.g., PuTTY)

System     | Login Address (xxx.arc.vt.edu)
BlueRidge  | blueridge1 or blueridge2
HokieSpeed | hokiespeed1 or hokiespeed2
HokieOne   | hokieone
Ithaca     | ithaca1 or ithaca2
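For example, connecting to BlueRidge from a Mac or Linux terminal might look like this (hpcXX stands in for your own username):

ssh hpcXX@blueridge1.arc.vt.edu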

Page 14

BLUERIDGE ALLOCATION SYSTEM

Page 15

Blue Ridge Allocation System

Goals of the allocation system:

• Ensure that the needs of computationally intensive research projects are met

• Document hardware and software requirements for individual research groups

• Facilitate tracking of research

http://www.arc.vt.edu/userinfo/allocations.php

Page 16

Allocation Eligibility

To qualify for an allocation, you must meet at least one of the following:

• Be a Ph.D. level researcher (post-docs qualify)

• Be an employee of Virginia Tech and the PI for a research computing project

• Be an employee of Virginia Tech and the co-PI for a research project led by a non-VT PI

Page 17

Allocation Application Process

1. Create a research project in the ARC database

2. Add grants and publications associated with project

3. Create an allocation request using the web-based interface

4. Allocation review may take several days

5. Users may be added to run jobs against your allocation once it has been approved

Page 18

Allocation Tiers

Research allocations fall into three tiers:

• Less than 200,000 system units (SUs)
  – 200-word abstract

• 200,000 to 1 million SUs
  – 1-2 page justification

• More than 1 million SUs
  – 3-5 page justification

Page 19

Allocation Management

• Users can be added to the project:
  https://portal.arc.vt.edu/am/research/my_research.php

• glsaccount: Allocation name and membership

• gbalance -h -a <name>: Allocation size and amount remaining

• gstatement -h -a <name>: Usage (by job)
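For example, assuming an allocation named "myproject" (a hypothetical name), checking its balance and usage might look like:

glsaccount                    # list your allocations and memberships
gbalance -h -a myproject      # allocation size and amount remaining
gstatement -h -a myproject    # per-job usage statement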

Page 20

USER ENVIRONMENT

Page 21

Environment

• Consistent user environment across systems
  – Uses the modules environment
  – Hierarchical module tree for system tools and applications

Page 22

Modules

• Modules are used to set the PATH and other environment variables
• Modules provide the environment for building and running applications
  – Multiple compiler vendors (Intel vs. GCC) and versions
  – Multiple software stacks: MPI implementations and versions
  – Multiple applications and their versions
• An application is built with a certain compiler and a certain software stack (MPI, CUDA)
  – Modules exist for each software stack, compiler, and application
• The user loads the modules associated with an application, compiler, or software stack
  – Modules can be loaded in job scripts

Page 23

Modules

• Modules are used to set up your PATH and other environment variables

% module {lists options}

% module list {lists loaded modules}

% module avail {lists available modules}

% module load <module> {add a module}

% module unload <module> {remove a module}

% module swap <mod1> <mod2> {swap two modules}

% module help <mod1> {module-specific help}

Page 24

Module commands

module                       list options
module list                  list loaded modules
module avail                 list available modules
module load <module>         add a module
module unload <module>       remove a module
module swap <mod1> <mod2>    swap two modules
module help <module>         module-specific help
module show <module>         show what the module sets in the environment
module reset                 reset to the default modules
module purge                 unload all modules
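A short interactive session might combine these commands like this (the cuda module name is illustrative; use module avail to see what actually exists on your system):

module list              # what is loaded right now
module avail             # what can be loaded with the current compiler/MPI
module swap intel gcc    # switch compiler families
module load cuda         # add another package
module show cuda         # inspect what the module sets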

Page 25

Modules

• Available modules depend on:
  – The compiler (e.g., Intel, GCC) and
  – The MPI stack selected

• Defaults:
  – BlueRidge: Intel + mvapich2
  – HokieOne: Intel + MPT
  – HokieSpeed, Ithaca: Intel + OpenMPI

Page 26

Hierarchical Module Structure

Page 27

Modules

• The default modules are provided for minimum functionality.

• Module dependencies on the choice of compiler and MPI stack are handled automatically.

module purge

module load <compiler>

module load <mpi stack>

module load <high-level software, e.g., PETSc>
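Following that pattern, a concrete sequence on BlueRidge might be (PETSc is the example named above; the exact petsc module name is an assumption, so check module avail):

module purge            # start from a clean environment
module load intel       # compiler
module load mvapich2    # MPI stack (BlueRidge default)
module load petsc       # high-level software built against this compiler/MPI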

Page 28

JOB SUBMISSION & MONITORING

Page 29

Job Submission

• Submission via a shell script
  – Job description: nodes, processes, run time
  – Modules & dependencies
  – Execution statements

• Submit the job script via the qsub command:

qsub <job_script>

Page 30

Batch Submission Process

Queue: The job script waits for resources.
Master: The compute node that executes the job script and launches all MPI processes.

Page 31

Job Monitoring

• Determine job status and, if it is pending, when it will run

Command                Meaning
checkjob -v JOBID      Get the status and resources of a job
qstat -f JOBID         Get the status of a job
showstart JOBID        Get the expected job start time
qdel JOBID             Delete a job
mdiag -n               Show the status of cluster nodes
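For example, for job 25770 (the job number used in the hands-on example later in this deck):

checkjob -v 25770    # detailed status and resource information
qstat -f 25770       # scheduler status for the job
showstart 25770      # estimated start time if the job is still queued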

Page 32

Job Execution

• Order of job execution depends on a variety of parameters:
  – Submission time
  – Queue priority
  – Backfill opportunities
  – Fairshare priority
  – Advance reservations
  – Number of actively scheduled jobs per user

Page 33

Examples: ARC Website

• See the Examples section of each system page for sample submission scripts and step-by-step examples:

• http://www.arc.vt.edu/resources/hpc/blueridge.php

• http://www.arc.vt.edu/resources/hpc/hokiespeed.php

• http://www.arc.vt.edu/resources/hpc/hokieone.php

• http://www.arc.vt.edu/resources/hpc/ithaca.php

Page 34

Getting Started

• Find your training account (hpcXX)

• Log into BlueRidge
  – Mac: ssh hpcXX@blueridge1.arc.vt.edu
  – Windows: Use PuTTY
    • http://www.chiark.greenend.org.uk/~sgtatham/putty/
    • Host Name: ithaca2.arc.vt.edu

Page 35

Example: Running MPI_Quad

• Source file: http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c

• Copy the file to Ithaca
  – wget command (see the example below)
  – Could also use scp or sftp

• Build the code
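Fetching the source directly on the cluster might look like this (the URL is the one listed above; the scp line is an illustrative alternative run from your own machine):

wget http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c
# or copy it up from your workstation:
# scp mpi_quad.c hpcXX@ithaca2.arc.vt.edu: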

Page 36

Compile the Code

• The Intel compiler is already loaded; verify with:

module list

• Compile command (the executable is mpiqd):

mpicc -o mpiqd mpi_quad.c

• To use GCC instead, swap the compiler module:

module swap intel gcc
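If you do switch compilers, rebuild the executable afterwards; a minimal sketch:

module swap intel gcc        # GCC replaces Intel; dependent modules are re-resolved
mpicc -o mpiqd mpi_quad.c    # recompile with the new compiler wrapper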

Page 37

Prepare Submission Script

1. Copy sample script: cp /home/TRAINING/ARC_Intro/it.qsub .

2. Edit the sample script:

a. Walltime

b. Resource request (nodes/ppn)

c. Module commands (add Intel & mvapich2)

d. Command to run your job

3. Save it (e.g., mpiqd.qsub)

Page 38

Submission Script (Typical)

#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=ithaca
#PBS -A AllocationName    # Only needed on BlueRidge

module load intel mvapich2

cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd

exit;

Page 39

Submission Script (Today)

#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=training
#PBS -A training
#PBS -l advres=NLI_ARC_Intro.8

module load intel mvapich2

cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd

exit;

Page 40

Submit the job

1. Copy the files to $WORK:

cp mpiqd $WORK
cp mpiqd.qsub $WORK

2. Navigate to $WORK:

cd $WORK

3. Submit the job:

qsub mpiqd.qsub

4. The scheduler returns the job number, e.g. 25770.master.cluster

Page 41

Wait for job to complete

1. Check job status:

qstat -f 25770    (or qstat -u hpcXX)
checkjob -v 25770

2. When complete:

a. Job output: mpiqd.qsub.o25770
b. Errors: mpiqd.qsub.e25770

3. Copy results back to $HOME:

cp mpiqd.qsub.o25770 $HOME

Page 42

Resources

• ARC Website: http://www.arc.vt.edu

• ARC Compute Resources & Documentation: http://www.arc.vt.edu/resources/hpc/

• New Users Guide: http://www.arc.vt.edu/userinfo/newusers.php

• Frequently Asked Questions: http://www.arc.vt.edu/userinfo/faq.php

• Unix Introduction: http://www.arc.vt.edu/resources/software/unix/