
Page 1:

ARCHER
Advanced Research Computing High End Resource

Nick Brown

nick.brown@ed.ac.uk

Page 2:

http://www.archer.ac.uk

Website Location

Page 3:

Machine overview
ARCHER (a Cray XC30) is a Massively Parallel Processor (MPP) supercomputer built from many thousands of individual nodes.

There are two basic types of node in any Cray XC30:
• Compute nodes (4920)
  • These only do user computation and are always referred to as "compute nodes"
  • 24 cores per node, therefore approx. 120,000 cores in total
• Service/Login nodes (72/8)
  • Login nodes – allow users to log in and perform interactive tasks
  • Other misc. service functions
• Serial/Post-Processing nodes (2)

About ARCHER

Page 4:

Interacting with the system
Users do not log in directly to the compute system. Instead they run commands on an esLogin server, which relays commands and information via a service node referred to as a "gateway node".
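
For example, interactive access is via SSH to one of the esLogin nodes. A minimal sketch, assuming the standard ARCHER login address and an existing account (replace username with your own):

ssh username@login.archer.ac.uk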

[Diagram: the Cray XC30 cabinets contain compute nodes, a serial node, a gateway node and LNET nodes on the Cray Aries interconnect; the esLogin node sits on the external network, and Infiniband and Ethernet links connect the system to the Lustre OSS servers of the Cray Sonexion filesystem.]

User guide

Page 5:

Job submission example

my_job.pbs:

#!/bin/bash --login

#PBS -l select=2
#PBS -N test-job
#PBS -A budget
#PBS -l walltime=0:20:0

# Make sure any symbolic links are resolved to absolute path
export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

aprun -n 48 -N 24 ./hello_world

Submit the script to the PBS queue from an esLogin node:

nbrown23@eslogin008:~> qsub my_job.pbs
50818.sdb

Check its state with qstat (Q = queued, R = running):

nbrown23@eslogin008:~> qstat -u $USER
50818.sdb nbrown23 standard test-job    --  2  48  --  00:20 Q --
nbrown23@eslogin008:~> qstat -u $USER
50818.sdb nbrown23 standard test-job 29053  2  48  --  00:20 R 00:00

The scheduler dispatches the job to the compute nodes; when it completes, standard output and error are returned in Test-job.o50818 and Test-job.e50818.

Quick start guide
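
If a queued or running job needs to be removed, it can be deleted with the standard PBS qdel command, using the job ID that qsub returned (shown here for the example job above):

nbrown23@eslogin008:~> qdel 50818.sdb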

Page 6:

ARCHER Layout
Compute node architecture and topology

Page 7:

Cray XC30 node
The XC30 compute node features:
• 2 x Intel® Xeon® sockets/dies
  • 12-core Ivy Bridge
  • 64 GB in normal nodes, 128 GB in 376 "high memory" nodes
• 1 x Aries NIC
  • Connects to the shared Aries router and wider network

[Diagram: Cray XC30 compute node – two NUMA nodes (NUMA node 0 and NUMA node 1), each a 12-core Intel® Xeon® die with 32 GB of DDR3 memory, linked by QPI; the Aries NIC attaches over PCIe 3.0 and connects through the Aries router to the Aries network.]
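
A short sketch of how this layout maps onto aprun placement options for the two-node job shown earlier (my_app and the thread count are illustrative, not from the slides):

# Pure MPI: fill all 24 cores of each node, as in the example script
aprun -n 48 -N 24 ./my_app

# Hybrid MPI/OpenMP: one rank per 12-core NUMA node (-S 1), 12 threads per rank (-d 12)
export OMP_NUM_THREADS=12
aprun -n 4 -N 2 -S 1 -d 12 ./my_app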

Page 8:

XC30 Compute Blade

Page 9:

Cray XC30 Rank-1 Network
• Chassis with 16 compute blades
• 128 sockets
• Inter-Aries communication over backplane
• Per-packet adaptive routing

Page 10:

Cray XC30 Rank-2 Copper Network
• 4 nodes connect to a single Aries
• 16 Aries connected by backplane
• 6 backplanes connected with copper cables in a 2-cabinet group (768 sockets)
• Active optical cables interconnect groups

Page 11:

Copper & Optical Cabling

[Diagram: copper connections within each 2-cabinet group; optical connections between groups.]

Page 12:

ARCHER Filesystems
Brief Overview

Page 13:

Nodes and filesystems

[Diagram: the login/PP nodes mount the RDF, /home and /work; the compute nodes mount only /work.]

Page 14:

ARCHER Filesystems
• /home (/home/n02/n02/<username>)
  • Small (200 TB) filesystem for critical data (e.g. source code)
  • Standard performance (NFS)
  • Fully backed up
• /work (/work/n02/n02/<username>)
  • Large (>4 PB) filesystem for use during computations; the only filesystem visible to the compute nodes (see the staging example below)
  • High-performance, parallel (Lustre) filesystem
  • No backup
• RDF (/nerc/n02/n02/<username>)
  • Research Data Facility
  • Very large (26 PB) filesystem for persistent data storage (e.g. results)
  • High-performance, parallel (GPFS) filesystem
  • Backed up via snapshots
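
Because only /work is visible to the compute nodes, input data and executables should be staged there before submitting a job. A minimal sketch, reusing the n02 project code from the paths above (my_project and its contents are hypothetical):

# Copy input data and the job script from /home to /work, then submit from there
cp -r $HOME/my_project /work/n02/n02/$USER/
cd /work/n02/n02/$USER/my_project
qsub my_job.pbs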

User guide

Page 15:

Research Data Facility
• Mounted on machines such as:
  • ARCHER (service and PP nodes)
  • DiRAC BlueGene/Q (frontend nodes)
  • Data Transfer Nodes (DTN)
  • JASMIN
• Data Analytic Cluster (DAC)
  • Run compute, memory, or IO intensive analyses on data hosted on the service
  • Nodes are specifically tailored for data intensive work, with direct connections to the disks
  • Separate from ARCHER but very similar architecture

RDF guide

Page 16:

ARCHER Software
Brief Overview

Page 17:

Cray's Supported Programming Environment

• Programming languages: Fortran, C, C++, Python
• Compilers: Cray Compiling Environment (CCE), GNU compilers, 3rd-party compilers (Intel Composer)
• Programming models:
  • Distributed memory (Cray MPT): MPI, SHMEM
  • PGAS & global view: UPC (CCE), CAF (CCE), Chapel
  • Shared memory: OpenMP 3.0, OpenACC
• I/O libraries: NetCDF, HDF5
• Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
• Tools:
  • Environment setup: Modules
  • Debuggers: Allinea (DDT), lgdb
  • Debugging support tools: Abnormal Termination Processing, STAT
  • Performance analysis: CrayPat, Cray Apprentice2
  • Scoping analysis: Reveal

(On the original slide each item is colour-coded as Cray developed, licensed ISV software, 3rd-party packaging, or Cray added value to a 3rd-party product.)
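
A brief sketch of how this environment is driven from the command line. The PrgEnv modules and the ftn/cc/CC compiler wrappers are standard on Cray systems, though exact module versions on ARCHER may differ, and hello_world.f90 is illustrative:

# Swap from the default Cray compilers to the GNU programming environment
module swap PrgEnv-cray PrgEnv-gnu

# Load one of the I/O libraries listed above
module load cray-netcdf

# The ftn/cc/CC wrappers invoke whichever compiler the loaded PrgEnv selects,
# and automatically pick up the loaded Cray libraries
ftn -o hello_world hello_world.f90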

Page 18:

Module environment
• Software is available via the module environment
  • Allows you to load different packages and different versions of packages
  • Deals with potential library conflicts
• This is based around the module command
  • List currently loaded modules: module list
  • List all available modules: module avail
  • Load a module: module load x
  • Unload a module: module unload x
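
For example, picking a specific version of a package might look like this (the package name and version number are illustrative; check module avail for what is actually installed):

module avail cray-netcdf         # list the versions available
module load cray-netcdf/4.3.2    # load a specific version
module list                      # confirm what is loaded
module unload cray-netcdf        # remove it again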

Best practice guide

Page 19:

ARCHER SAFE
Service Administration

https://www.archer.ac.uk/safe

Page 20:

SAFE
• SAFE is an online ARCHER management system which all users have an account on
  • Request machine accounts
  • Reset passwords
  • View resource usage
• Primary way in which PIs manage their ARCHER projects
  • Management of project users
  • Track users' project usage
  • Email the users of a project

SAFE user guide

Page 21:

Project resources
• Machine usage is charged in kAUs
  • This is the time your jobs spend running on each compute node: 0.36 kAUs per node hour (see the worked example below)
  • There is no usage charge for time spent working on the login nodes, post-processing nodes or RDF DAC
  • You can track usage via SAFE or the budgets command (calculated daily)
• Disk quotas
  • There is no specific charge made for disk usage, but all projects have quotas
  • If you need more disk space then contact your PI, or contact us if you manage the project
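
As a worked example: the two-node, 20-minute test job shown earlier would use at most 2 nodes × (20/60) hour × 0.36 kAUs per node hour = 0.24 kAUs of the project budget, and less if it finishes before its walltime limit.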

User guide

Page 22:

To conclude…
• You will be using ARCHER during this course
  • If you have any questions then let us know
• The documentation on the ARCHER website is a good reference tool
  • Especially the quick start guide
• In normal use, if you have any questions or cannot find something then contact the helpdesk
  • support@archer.ac.uk