Introduction to Scientific Computing

Shubin Liu, Ph.D.
Research Computing Center
University of North Carolina at Chapel Hill
Course Goals
An introduction to high-performance computing and UNC Research Computing Center
Available Research Computing hardware facilities
Available software packages
Serial/parallel programming tools and libraries
How to efficiently make use of Research Computing facilities on campus
Agenda
Introduction to High-Performance Computing
Hardware Available
• Servers, storage, file systems, etc.
How to Access
Programming Tools Available
• Compilers & Debugger tools
• Utility Libraries
• Parallel Computing
Scientific Packages Available
Job Management
Hands-on Exercises – 2nd hour
The PPT format of this presentation is available here:
http://its2.unc.edu/divisions/rc/training/scientific/
/afs/isis/depts/its/public_html/divisions/rc/training/scientific/short_courses/
Pre-requisites
An account on Emerald cluster
UNIX Basics
Getting started: http://help.unc.edu/?id=5288
Intermediate: http://help.unc.edu/?id=5333
vi Editor: http://help.unc.edu/?id=152
Customizing: http://help.unc.edu/?id=208
Shells: http://help.unc.edu/?id=5290
ne Editor: http://help.unc.edu/?id=187
Security: http://help.unc.edu/?id=217
Data Management: http://help.unc.edu/?id=189
Scripting: http://help.unc.edu/?id=213
HPC Application: http://help.unc.edu/?id=4176
About Us
ITS – Information Technology Services
• http://its.unc.edu
• http://help.unc.edu
• Physical locations: 401 West Franklin St. and 211 Manning Drive
• 10 Divisions/Departments:
  Information Security
  IT Infrastructure and Operations
  Research Computing Center
  Teaching and Learning
  User Support and Engagement
  Office of the CIO
  Communication Technologies
  Enterprise Resource Planning
  Enterprise Applications
  Finance and Administration
Research Computing Center
Where and who are we and what do we do?
• ITS Manning: 211 Manning Drive
• Website
http://its.unc.edu/Research
• Groups
Infrastructure -- Hardware
User Support -- Software
Engagement -- Collaboration
About Myself
Ph.D. in Chemistry from UNC-CH
Currently Senior Computational Scientist @ Research Computing Center, UNC-CH
Responsibilities:
• Support Computational Chemistry/Physics/Material Science software
• Support Programming (FORTRAN/C/C++) tools, code porting, parallel computing, etc.
• Offer short training courses for campus users
• Conduct research and engagement projects in Computational Chemistry
  Development of DFT theory and concept tools
  Applications in biological and material science systems
What is Scientific Computing?
Short Version
• To use high-performance computing (HPC) facilities to solve real scientific problems.
Long Version, from Wikipedia.com
• Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.
What is Scientific Computing?

[Figure: three views of scientific computing. From the scientific discipline viewpoint, it sits at the intersection of the engineering sciences, natural sciences, computer science, and applied mathematics. From the operational viewpoint, it spans the theory/model layer, the algorithm layer, and the hardware/software/application layer. From the computing perspective, it overlaps with parallel computing and high-performance computing.]
What is HPC?
Computing resources which provide more than an order of magnitude more computing power than current top-end workstations or desktops – generic, widely accepted.
HPC ingredients:
• large capability computers (fast CPUs)
• massive memory
• enormous (fast & large) data storage
• highest capacity communication networks (Myrinet, 10 GigE, InfiniBand, etc.)
• specifically parallelized codes (MPI, OpenMP)
• visualization
Why HPC?
What are the three-dimensional structures of all of the proteins encoded by an organism's genome and how does structure influence function, both spatially and temporally?
What patterns of emergent behavior occur in models of very large societies?
How do massive stars explode and produce the heaviest elements in the periodic table?
What sort of abrupt transitions can occur in Earth's climate and ecosystem structure? How do these occur and under what circumstances?
If we could design catalysts atom-by-atom, could we transform industrial synthesis?
What strategies might be developed to optimize management of complex infrastructure systems?
What kind of language processing can occur in large assemblages of neurons?
Can we enable integrated planning and response to natural and man-made disasters that prevent or minimize the loss of life and property?
http://www.nsf.gov/pubs/2005/nsf05625/nsf05625.htm
Measure of Performance

MegaFLOPS (10^6), GigaFLOPS (10^9), TeraFLOPS (10^12), PetaFLOPS (10^15), ExaFLOPS (10^18), ZettaFLOPS (10^21), YottaFLOPS (10^24)
http://en.wikipedia.org/wiki/FLOPS

Single-CPU LINPACK vs. peak performance, units in MFLOPS (10^6):

Machine/CPU Type | LINPACK Performance | Peak Performance
Intel Pentium 4 (2.53 GHz) | 2355 | 5060
NEC SX-6/1 (1 proc., 2.0 ns) | 7575 | 8000
Compaq ES45 (1000 MHz) | 1542 | 2000
Intel Pentium III (933 MHz) | 507 | 933
Intel Pentium II Xeon (450 MHz) | 295 | 450
HP rx5670 Itanium2 (1 GHz) | 3528 | 4000
IBM eServer pSeries 690 (1300 MHz) | 2894 | 5200
Cray SV1ex-1-32 (500 MHz) | 1554 | 2000
AMD Athlon MP1800+ (1530 MHz) | 1705 | 3060
SGI Origin 2000 (300 MHz) | 533 | 600
Sun UltraSPARC (167 MHz) | 237 | 333

Reference: http://performance.netlib.org/performance/html/linpack.data.col0.html
How to Quantify Performance? TOP500
A list of the 500 most powerful computer systems over the world
Established in June 1993
Compiled twice a year (June & November)
Using the LINPACK benchmark code (solving the dense linear system of equations Ax = b)
Organized by world-wide HPC experts, computational scientists, manufacturers, and the Internet community
Homepage: http://www.top500.org
TOP500: November 2009

Top 5 systems; units in GFLOPS (1 GFLOPS = 1000 MFLOPS)

Rank | Site | Manufacturer | Computer | Country | Year | Cores | Rmax | Rpeak
1 | Oak Ridge National Laboratory | Cray Inc. | Cray XT5-HE Opteron Six Core 2.6 GHz | United States | 2009 | 224162 | 1759000 | 2331000
2 | DOE/NNSA/LANL | IBM | BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband | United States | 2009 | 122400 | 1042000 | 1375780
3 | National Institute for Computational Sciences/University of Tennessee | Cray Inc. | Cray XT5-HE Opteron Six Core 2.6 GHz | United States | 2009 | 98928 | 831700 | 1028850
4 | Forschungszentrum Juelich (FZJ) | IBM | Blue Gene/P Solution | Germany | 2009 | 294912 | 825500 | 1002700
5 | National SuperComputer Center in Tianjin/NUDT | NUDT | NUDT TH-1 Cluster, Xeon E5540/E5450, ATI Radeon HD 4870 2, Infiniband | China | 2009 | 71680 | 563100 | 1206190
TOP500 History of UNC-CH Entry

List | Systems | Highest Ranking | Sum Rmax (GFlops) | Sum Rpeak (GFlops) | Site Efficiency (%)
11/2009 | 1 | 239 | 28770.00 | 38821.10 | 74.11
06/2009 | 1 | 175 | 28770.00 | 38821.10 | 74.11
11/2008 | 1 | 88 | 28770.00 | 38821.10 | 74.11
06/2008 | 1 | 67 | 28770.00 | 38821.10 | 74.11
11/2007 | 1 | 36 | 28770.00 | 38821.10 | 74.11
06/2007 | 1 | 25 | 28770.00 | 38821.10 | 74.11
11/2006 | 1 | 104 | 6252.00 | 7488.00 | 83.49
06/2006 | 1 | 74 | 6252.00 | 7488.00 | 83.49
11/2003 | 1 | 393 | 439.30 | 1209.60 | 36.32
06/1999 | 1 | 499 | 24.77 | 28.80 | 86.01
Shared/Distributed-Memory Architecture

Shared memory - single address space. All processors have access to a pool of shared memory (examples: Chastity/zephyr, happy/yatta, cedar/cypress, sunny). Methods of memory access: bus and crossbar.

Distributed memory - each processor has its own local memory. Message passing must be used to exchange data between processors (examples: Emerald and Topsail clusters).

[Figure: shared memory - multiple CPUs connected to a single MEMORY over a BUS; distributed memory - CPUs, each with its own local memory (M), connected through a NETWORK.]
What is a Beowulf Cluster?
A Beowulf system is a collection of personal computers constructed from commodity-off-the-shelf hardware components interconnected with a system-area-network and configured to operate as a single unit, parallel computing platform (e.g., MPI), using an open-source network operating system such as LINUX.
Main components:
• PCs running the LINUX OS
• Inter-node connection with Ethernet, Gigabit, Myrinet, InfiniBand, etc.
• MPI (message passing interface)
LINUX Beowulf Clusters
What is Parallel Computing?
Concurrent use of multiple processors to process data
• Running the same program on many processors.
• Running many programs on each processor.
Advantages of Parallelization
Cheaper, in terms of Price/Performance Ratio
Faster than equivalently expensive uniprocessor machines
Handle bigger problems
More scalable: the performance of a particular program may be improved by execution on a large machine
More reliable: in theory, if a processor fails we can simply use another
Catch: Amdahl's Law

Speedup = 1/(s + p/n), where s is the serial (non-parallelizable) fraction of the work, p = 1 - s is the parallelizable fraction, and n is the number of processors.
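For example, a code that is 90% parallelizable (s = 0.1, p = 0.9) run on n = 16 processors achieves a speedup of only 1/(0.1 + 0.9/16) = 6.4, well short of the ideal 16; no matter how large n grows, the speedup is capped at 1/s = 10.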
Parallel Programming Tools
Shared-memory architecture
• OpenMP
Distributed-memory architecture
• MPI, PVM, etc.
OpenMP
An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
What does OpenMP stand for?
• Open specifications for Multi Processing via collaborative work between interested parties from the hardware and software industry, government and academia.
Comprised of three primary API components:
• Compiler Directives
• Runtime Library Routines
• Environment Variables
Portable:
• The API is specified for C/C++ and Fortran
• Multiple platforms have been implemented including most Unix platforms and Windows NT
Standardized:
• Jointly defined and endorsed by a group of major computer hardware and software vendors
• Expected to become an ANSI standard later???
Many compilers can automatically parallelize a code with OpenMP!
OpenMP Example (FORTRAN)
PROGRAM HELLO
INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
+ OMP_GET_THREAD_NUM
C Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(TID)
C Obtain and print thread id
TID = OMP_GET_THREAD_NUM()
PRINT *, 'Hello World from thread = ', TID
C Only master thread does this
IF (TID .EQ. 0) THEN
NTHREADS = OMP_GET_NUM_THREADS()
PRINT *, 'Number of threads = ', NTHREADS
END IF
C All threads join master thread and disband
!$OMP END PARALLEL
END
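For comparison, a minimal C version of the same hello-world example is sketched below (assuming an OpenMP-capable C compiler; for instance, the flag is -openmp for the Intel compiler and -mp for PGI):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int nthreads, tid;
/* Fork a team of threads, giving each its own copy of tid */
#pragma omp parallel private(tid)
    {
        /* Obtain and print the thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        /* Only the master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}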
The Message Passing Model
Parallelization scheme for distributed memory.
Parallel programs consist of cooperating processes, each with its own memory.
Processes send data to one another as messages
Message can be passed around among compute processes
Messages may have tags that may be used to sort messages.
Messages may be received in any order.
MPI: Message Passing Interface
Message-passing model
Standard (specification)
• Many implementations (almost every vendor has one)
• MPICH and LAM/MPI from the public domain are the most widely used
• GLOBUS MPI for grid computing
Two phases:
• MPI 1: traditional message passing
• MPI 2: remote memory, parallel I/O, and dynamic processes
Online resources:
• http://www-unix.mcs.anl.gov/mpi/index.htm
• http://www-unix.mcs.anl.gov/mpi/mpich/
• http://www.lam-mpi.org/
• http://www.mpi-forum.org
• http://www-unix.mcs.anl.gov/mpi/tutorial/learning.html
A Simple MPI Code
#include "mpi.h" #include <stdio.h>
int main( argc, argv ) int argc; char **argv;
{ MPI_Init( &argc, &argv ); printf( "Hello world\n" ); MPI_Finalize(); return 0; }
include ‘mpif.h’integer myid, ierr, numprocs
call MPI_INIT( ierr)call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)call MPI_COMM_SIZE (MPI_COMM_WORLD, numprocs,ierr)
write(*,*) ‘Hello from ‘, myidwrite(*,*) ‘Numprocs is’, numprocscall MPI_FINALIZE(ierr)
end
C Version FORTRAN Version
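As a sketch of how these hello-world codes might be built and run with the MPICH tools described later in this course (file names are illustrative; the exact launch command depends on the MPI installation and the batch system):

mpicc  -O -o hello_c hello.c      # C version
mpif77 -O -o hello_f hello.f      # FORTRAN version
mpirun -np 4 ./hello_c            # run on 4 processes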
Other Parallelization Models
VIA: Virtual Interface Architecture -- Standards-based Cluster Communications
PVM: a portable message-passing programming system, designed to link separate host machines to form a "virtual machine", which is a single, manageable computing resource. It is largely an academic effort and there has been little development since the 1990s.
BSP: Bulk Synchronous Parallel model, a generalization of the widely researched PRAM (Parallel Random Access Machine) model
Linda: a concurrent programming model from Yale, with the primary concept of "tuple space"
HPF: High Performance Fortran, a standard parallel programming language for shared and distributed-memory systems (PGI's pghpf is one implementation)
RC Servers @ UNC-CH
SGI Altix 3700 – SMP, 128 CPUs, cedar/cypress
Emerald LINUX Cluster – distributed memory, ~500 CPUs, emerald
• yatta/p575 IBM AIX nodes
Dell LINUX cluster – distributed memory, 4160 CPUs, topsail
IBM P690/P575 SMP
-IBM pSeries 690/P575 Model 6C4, Power4+ Turbo, 32 1.7 GHz processors
- access to 4TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /netscr
-OS: IBM AIX 5.3 Maintenance Level 04
- login node: emerald.isis.unc.edu
- compute node:
-yatta.isis.unc.edu 32 CPUs
-P575-n00.isis.unc.edu 16 CPUs
-P575-n01.isis.unc.edu 16 CPUs
-P575-n02.isis.unc.edu 16 CPUs
-P575-n03.isis.unc.edu 16 CPUs
SGI Altix 3700 SMP
Servers for Scientific Applications such as Gaussian, Amber, and custom code
Login node: cedar.isis.unc.edu
Compute node: cypress.isis.unc.edu
Cypress: SGI Altix 3700bx2 - 128 Intel Itanium2 Processors (1600MHz), each with 16k L1 cache for data, 16k L1 cache for instructions, 256k L2 cache, 6MB L3 cache, 4GB of Shared Memory (512GB total memory)
Two 70 GB SCSI System Disks as /scr
SGI Altix 3700 SMP
Cedar: SGI Altix 350 - 8 Intel Itanium2 Processors (1500MHz), each with 16k L1 cache for data, 16k L1 cache for instructions, 256k L2 cache, 4MB L3 cache, 1GB of Shared Memory (8GB total memory), two 70 GB SATA System Disks.
RHEL 3 with Propack 3, Service Pack 3
No AFS (HOME & pkg space) access
Scratch Disk:
/netscr, /nas, /scr
Emerald Cluster
General purpose Linux Cluster for Scientific and Statistical Applications
Machine Name: emerald.isis.unc.edu
2 Login Nodes: IBM BladeCenter, one Xeon 2.4GHz, 2.5GB RAM and one Xeon 2.8GHz, 2.5GB RAM
18 Compute Nodes: Dual AMD Athlon 1600+ 1.4GHz MP Processor, Tyan Thunder MP Motherboard, 2GB DDR RAM on each node
6 Compute Nodes: Dual AMD Athlon 1800+ 1.6GHz MP Processor, Tyan Thunder MP Motherboard, 2GB DDR RAM on each node
25 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 2.4GHz, 2.5GB RAM on each node
96 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 2.8GHz, 2.5GB RAM on each node
15 Compute Nodes: IBM BladeCenter, Dual Intel Xeon 3.2GHz, 4.0GB RAM on each node
Access to 10 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
Login: emerald.isis.unc.edu
OS: RedHat Enterprise Linux 3.0
TOP500: 395th place in the June 2003 release
Dell LINUX Cluster, Topsail

520 dual nodes (4160 CPUs), Xeon (EM64T) 3.6 GHz, 2 MB L2 cache, 2 GB memory per CPU
InfiniBand inter-node connection
Not AFS mounted, not open to general public
Access based on peer-reviewed proposal
HPL: 6.252 teraflops; 74th in the June 2006 TOP500 list, 104th in the November 2006 list, and 25th in the June 2007 list (28.77 teraflops after upgrade)
Topsail

Login node: topsail.unc.edu, 8 CPUs @ 2.3 GHz Intel EM64T with 2x4M L2 cache (Model E5345/Clovertown), 12 GB memory
Compute nodes : 4,160 CPUs @ 2.3 GHz Intel EM64T with 2x4M L2 cache (Model E5345/Clovertown), 12 GB memory
Shared Disk : (/ifs1) 39 TB IBRIX Parallel File System
Interconnect: Infiniband 4x SDR
Resource management is handled by LSF v.7.2, through which all computational jobs are submitted for processing
File Systems
AFS (Andrew File System): AFS is a distributed network file system that enables files from any AFS machine across the campus to be accessed as easily as files stored locally.
• As ISIS HOME for all users with an ONYEN – the Only Name You’ll Ever Need
• Limited quota: 250 MB for most users [type "fs lq" to view]
• Current production version openafs-1.3.8.6
• Files backed up daily [ ~/OldFiles ]
• Directory/File tree: /afs/isis/home/o/n/onyen. For example: /afs/isis/home/m/a/mason, where "mason" is the ONYEN of the user
• Accessible from emerald, happy/yatta
• But not from cedar/cypress, topsail
• Not suitable for research computing tasks!
• Recommended to compile, run I/O intensive jobs on /scr or /netscr
• More info: http://help.unc.edu/?id=215#d0e24
Basic AFS Commands
To add or remove packages
• ipm add pkg_name, ipm remove pkg_name
To find out space quota/usage
• fs lq
To see and renew AFS tokens (read/write-able), which expire in 25 hours
• tokens, klog
Over 300 packages installed in AFS pkg space
• /afs/isis/pkg/
More info available at
• http://its.unc.edu/dci/dci_components/afs/
Data Storage
Local Scratch: /scr – local to a machine
• Cedar/cypress: 2x500 GB SCSI system disks
• Topsail: /ifs1/scr, 39 TB IBRIX parallel file system
• Happy/yatta: 2x500 GB disk drives
• For running jobs, temporary data storage, not backed up

Network Attached Storage (NAS) – for temporary storage
• /nas/uncch, /netscr
• >20 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
• For running jobs, temporary data storage, not backed up
• Shared by all login and compute nodes (cedar/cypress, happy/yatta, emerald)

Mass Storage (MS) – for permanent storage
• Mounted for long-term data storage on all scientific computing servers' login nodes as ~/ms ($HOME/ms)
• Never run jobs in ~/ms (compute nodes do not have ~/ms access)
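A typical workflow, sketched with illustrative names (your_onyen and myjob are placeholders), is to run in scratch space and then archive to mass storage from a login node:

cd /netscr/your_onyen/myjob        # run the job here
tar cf myjob_results.tar .         # bundle the results
cp myjob_results.tar ~/ms/         # copy to mass storage (login node only)
rm -rf /netscr/your_onyen/myjob    # clean up the scratch space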
Subscription of Services
Have an ONYEN ID
• The Only Name You’ll Ever Need
Eligibility: Faculty, staff, postdoc, and graduate students
Go to http://onyen.unc.edu
Access to Servers
To Emerald
• ssh emerald.isis.unc.edu
To cedar
• ssh cedar.isis.unc.edu
To Topsail
• ssh topsail.unc.edu
Programming Tools
Compilers
• FORTRAN 77/90/95
• C/C++
Utility Libraries
• BLAS, LAPACK, FFTW, SCALAPACK
• IMSL, NAG, etc.
• NetCDF, GSL, PETSc
Parallel Computing
• OpenMP
• PVM
• MPI (MPICH, LAM/MPI, OpenMPI, MPICH2)
Compilers: SMP Machines
Cedar/Cypress – SGI Altix 3700, 128 CPUs
• 64-bit Intel Compiler versions 9.1 and 10.1, /opt/intel
FORTRAN 77/90/95: ifort/ifc/efc
C/C++: icc/ecc
• 64-bit GNU compilers
FORTRAN 77 f77/g77
C and C++ gcc/cc and g++/c++
Yatta/P575 – IBM P690/P575, 32/64CPUs
• XL FORTRAN 77/90 8.1.0.3 xlf, xlf90
• C and C++ AIX 6.0.0.4 xlc, xlC
Compilers: LINUX Cluster
Absoft ProFortran Compilers
• Package Name: profortran
• Current Version: 7.0
• FORTRAN 77 (f77): Absoft FORTRAN 77 compiler version 5.0
• FORTRAN 90/95 (f90/f95): Absoft FORTRAN 90/95 compiler version 3.0

GNU Compilers
• Package Name: gcc
• Current Version: 4.1.2
• FORTRAN 77 (g77/f77): 3.4.3, 4.1.2
• C (gcc): 3.4.3, 4.1.2
• C++ (g++/c++): 3.4.3, 4.1.2

Intel Compilers
• Package Name: intel_fortran, intel_CC
• Current Version: 10.1
• FORTRAN 77/90 (ifc): Intel LINUX compiler version 8.1, 9.0, 10.1
• C/C++ (icc): Intel LINUX compiler version 8.1, 9.0, 10.1

Portland Group Compilers
• Package Name: pgi
• Current Version: 7.1.6
• FORTRAN 77 (pgf77): The Portland Group, Inc. pgf77 v6.0, 7.0.4, 7.1.3
• FORTRAN 90 (pgf90): The Portland Group, Inc. pgf90 v6.0, 7.0.4, 7.1.3
• High Performance FORTRAN (pghpf): The Portland Group, Inc. pghpf v6.0, 7.0.4, 7.1.3
• C (pgcc): The Portland Group, Inc. pgcc v6.0, 7.0.4, 7.1.3
• C++ (pgCC): The Portland Group, Inc. pgCC v6.0, 7.0.4, 7.1.3
LINUX Compiler Benchmark
Benchmark | Absoft ProFortran 90 | Intel FORTRAN 90 | Portland Group FORTRAN 90 | GNU FORTRAN 77
Molecular Dynamics (CPU time) | 4.19 (4) | 2.83 (2) | 2.80 (1) | 2.89 (3)
Kepler (CPU time) | 0.49 (1) | 0.93 (2) | 1.10 (3) | 1.24 (4)
Linpack (CPU time) | 98.6 (4) | 95.6 (1) | 96.7 (2) | 97.6 (3)
Linpack (MFLOPS) | 182.6 (4) | 183.8 (1) | 183.2 (3) | 183.3 (2)
LFK (CPU time) | 89.5 (4) | 70.0 (3) | 68.7 (2) | 68.0 (1)
LFK (MFLOPS) | 309.7 (3) | 403.0 (2) | 468.9 (1) | 250.9 (4)
Total Rank | 20 | 11 | 12 | 17

For reference only; rank is given in parentheses. Performance is code and compilation flag dependent. For each benchmark, three identical runs were performed and the best CPU timing of the three is listed. Optimization flags: Absoft -O, Portland Group -O4 -fast, Intel -O3, GNU -O.
Profilers & Debuggers
SMP machines
• Happy/yatta: dbx, prof, gprof
• Cedar/cypress: gprof
LINUX Cluster
• PGI: pgdebug, pgprof, gprof
• Absoft: fx, xfx, gprof
• Intel: idb, gprof
• GNU: gdb, gprof
Utility Libraries
Mathematic Libraries
• IMSL, NAG, etc.
Scientific Computing
• Linear Algebra
BLAS, ATLAS
EISPACK
LAPACK
SCALAPACK
• Fast Fourier Transform, FFTW
• BLAS/LAPACK, ScaLAPACK
• The GNU Scientific Library, GSL
• Utility Libraries, netCDF, PETSc, etc.
Utility Libraries
SMP Machines
• Yatta/P575: ESSL (Engineering and Scientific Subroutine Library), -lessl
BLAS
LAPACK
EISPACK
Fourier Transforms, Convolutions and Correlations, and Related Computations
Sorting and Searching
Interpolation
Numerical Quadrature
Random Number Generation
Utilities
Utility Libraries

SMP Machines
• Cedar/Cypress: MKL (Intel Math Kernel Library) 8.0,
  -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide
  o BLAS
  o LAPACK
  o Sparse Solvers
  o FFT
  o VML (Vector Math Library)
  o Random-Number Generators
Utility Libraries for Emerald Cluster
Mathematic Libraries
• IMSL
The IMSL Libraries are a comprehensive set of mathematical and statistical functions
From Visual Numerics, http://www.vni.com
Functions include
- Optimization  - FFTs  - Interpolation  - Differential equations  - Correlation  - Regression  - Time series analysis  - and many more
Available in FORTRAN and C
Package name: imsl
Required compiler: Portland Group compiler, pgi
Installed on AFS ISIS package space, /afs/isis/pkg/imsl
Current default version 4.0, latest version 5.0
To subscribe IMSL, type “ipm add pgi imsl”
To compiler a C code, code.c, using IMSL:
pgcc -O $CFLAGS code.c -o code.x $LINK_CNL_STATIC
Utility Libraries for Emerald Cluster

Mathematic Libraries
• NAG
  NAG produces and distributes numerical, symbolic, statistical, visualisation and simulation software for the solution of problems in a wide range of applications in such areas as science, engineering, financial analysis and research.
  From the Numerical Algorithms Group, http://www.nag.co.uk
  Functions include:
  - Optimization  - FFTs  - Interpolation  - Differential equations  - Correlation  - Regression  - Time series analysis  - Multivariate factor analysis  - Linear algebra  - Random number generator
  Available in FORTRAN and C
  Package name: nag
  Available platforms: SGI IRIX, SUN Solaris, IBM AIX, LINUX
  Installed in AFS ISIS package space, /afs/isis/pkg/nag
  Current default version 6.0
  To subscribe to NAG, type "ipm add nag"
Utility Libraries for Emerald Cluster
Scientific Libraries
• Linear Algebra
BLAS, LAPACK, LAPACK90, LAPACK++, ATLAS, SPARSE-BLAS, SCALAPACK, EISPACK, FFTPACK, LANCZOS, HOMPACK, etc.
Source code downloadable from the website: http://www.netlib.org/liblist.html
Compiler dependent
BLAS and LAPACK available for all 4 compilers in AFS ISIS package space: gcc, profortran, intel, and pgi
SCALAPACK available for pgi and intel compilers
Assistance available if other versions are needed
Utility Libraries for Emerald Cluster

Scientific Libraries
• Other Libraries: not fully implemented yet, so please be cautious and patient when using them
FFTW http://www.fftw.org/
GSL http://www.gnu.org/software/gsl/
NetCDF http://www.unidata.ucar.edu/software/netcdf/
NCO http://nco.sourceforge.net/
HDF http://hdf.ncsa.uiuc.edu/hdf4.html
OCTAVE http://www.octave.org/
PETSc http://www-unix.mcs.anl.gov/petsc/petsc-as/
……
• If you think more libraries are of broad interest, please recommend them to us
Parallel Computing
SMP Machines:
• OpenMP
  Compilation: use the "-qsmp=omp" flag on happy; use the "-openmp" flag on cedar
  Environment variable setup: setenv OMP_NUM_THREADS n
• MPI
  Compilation: use the "-lmpi" flag on cedar, or use MPI-capable compilers, e.g., mpxlf, mpxlf90, mpcc, mpCC
• Hybrid (OpenMP and MPI): do both! (See the example commands below.)
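As a minimal sketch of these steps on cedar with the Intel compiler (file names are illustrative; the MPI launch command depends on the local installation):

# OpenMP: compile with -openmp, set the thread count, then run
ifort -openmp -O -o hello_omp.x hello_omp.f
setenv OMP_NUM_THREADS 4
./hello_omp.x

# MPI: link against the system MPI library with -lmpi, then launch
ifort -O -o hello_mpi.x hello_mpi.f -lmpi
mpirun -np 4 ./hello_mpi.x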
Parallel Computing With Emerald Cluster
Setup

MPI implementations and the AFS packages to be "ipm add"-ed: MPICH/MPICH2 (package mpich) and LAM/MPI (package mpi-lam)

Compiler support under both MPICH/MPICH2 and LAM/MPI:
• GNU Compilers: F77, C, C++
• Absoft ProFortran Compilers: F77, F90, C, C++
• Portland Group Compilers: F77, F90, C, C++
• Intel Compilers: F77, F90, C, C++
Parallel Computing With Emerald Cluster

Setup

Compilers invoked by the MPI wrappers (package mpich or mpi-lam):

Vendor | Package Name | FORTRAN 77 | FORTRAN 90 | C | C++
GNU | gcc | g77 | | gcc | g++
Absoft ProFortran | profortran | f77 | f95 | |
Portland Group | pgi | pgf77 | pgf90 | pgcc | pgCC
Intel | intel_fortran, intel_CC | ifc | ifc | icc | icc

Commands for parallel MPI compilation: mpif77, mpif90, mpicc, mpiCC
Parallel Computing With Emerald Cluster

Setup
• AFS packages to be "ipm add"-ed
• Notice the order: the compiler is always added first
• Add ONLY ONE compiler into your environment

Compiler | MPICH/MPICH2 | MPI-LAM
GNU | ipm add gcc mpich | ipm add gcc mpi-lam
Absoft ProFortran | ipm add profortran mpich | ipm add profortran mpi-lam
Portland Group | ipm add pgi mpich | ipm add pgi mpi-lam
Intel | ipm add intel_fortran intel_CC mpich | ipm add intel_fortran intel_CC mpi-lam
Parallel Computing With Emerald Cluster

Compilation
• To compile an MPI Fortran 77 code, code.f, and form an executable, exec:
  % mpif77 -O -o exec code.f
• For a Fortran 90/95 code, code.f90:
  % mpif90 -O -o exec code.f90
• For a C code, code.c:
  % mpicc -O -o exec code.c
• For a C++ code, code.cc:
  % mpiCC -O -o exec code.cc
Scientific Packages
Available in AFS package space
To subscribe to a package, type "ipm add pkg_name", where "pkg_name" is the name of the package. For example, "ipm add gaussian"
To remove it, type "ipm remove pkg_name"
All packages are installed under the /afs/isis/pkg/ directory. For example, /afs/isis/pkg/gaussian
Categories of scientific packages include:
• Quantum Chemistry
• Molecular Dynamics
• Material Science
• Visualization
• NMR Spectroscopy
• X-Ray Crystallography
• Bioinformatics
• Others
Scientific Package: Quantum Chemistry

Software | Package Name | Current Version | Parallel
ABINIT | abinit | 4.3.3 | Yes (MPI)
MOLFDIR | molfdir | 2001 | No
CPMD | cpmd | 3.9 | Yes (MPI)
ADF | adf | 2002.02 | Yes (PVM)
Cerius2 | cerius2 | 4.10 | Yes (MPI)
GAMESS | gamess | 2003.9.6 | Yes (MPI)
Gaussian | gaussian | 03E01 | Yes (OpenMP)
MacroModel | macromodel | 7.1 | No
Molpro | molpro | 2006.6 | Yes (MPI)
NWChem | nwchem | 5.1 | Yes (MPI)
MaterialStudio | materisalstudio | 4.2 | Yes (MPI)
ACES2 | aces2 | 4.1.2 | No
Scientific Package: Molecular Dynamics

Software | Package Name | Current Version | Parallel
Amber | amber | 9.1 | MPI
NAMD/VMD | namd, vmd | 2.5 | MPI
Gromacs | gromcs | 3.2.1 | MPI
InsightII | insightII | 2000.3 | --
MacroModel | macromodel | 7.1 | --
PMEMD | pmemd | 3.0.0 | MPI
Quanta | quanta | 2005 | MPI
Sybyl | sybyl | 7.1 | --
TINKER | tinker | 4.2 | --
CHARMM | charmm | 3.0B1 | MPI
O | o | 9.0.7 | --
Molecular & Scientific Visualization

Software | Package Name | Platforms | Current Version
AVS | avs | LINUX | 5.6
AVS Express | avs-express | LINUX | 6.2
Cerius2 | cerius2 | IRIX/LINUX | 4.9
DINO | dino | IRIX | 0.8.4
ECCE | ecce | LINUX | 2.1
GaussView | gaussian | LINUX/AIX | 4.0
GRASP | grasp | IRIX | 1.3.6
InsightII | insightII | LINUX | 2000.3
MOIL-VIEW | moil-view | IRIX | 9.1
MOLDEN | molden | LINUX | 4.0
MOLKEL | molkel | IRIX | 4.3
MOLMOL | molmol | LINUX | 2K.1
MOLSCRIPT | molscript | IRIX | 2.1.2
MOLSTAR | molstar | IRIX/LINUX | 1.0
Molecular & Scientific Visualization

Software | Package Name | Platforms | Current Version
MOVIEMOL | moviemol | LINUX | 1.3.1
NBOView | nbo | LINUX | 5.0
QUANTA | quanta | IRIX/LINUX | 2005
RASMOL | rasmol | IRIX/LINUX/AIX | 2.7.3
RASTER3D | raster3d | IRIX/LINUX | 2.7c
SPARTAN | spartan | IRIX | 5.1.3
SPOCK | spock | IRIX | 1.7.0p1
SYBYL | sybyl | LINUX | 7.1
VMD | vmd | LINUX | 1.8.2
XtalView | xtalview | IRIX | 4.0
GIMP | gimp | IRIX/LINUX/AIX | 1.0.2
XMGR | xmgr | LINUX | 4.1.2
GRACE | grace | LINUX | 5.1.2
IMAGEMAGICK | imagemagick | IRIX/LINUX/AIX | 6.2.1.3
XV | xv | IRIX/LINUX/AIX | 3.1.0a
NMR & X-Ray Crystallography

Software | Package Name | Platforms | Current Version
CNSsolve | cnssolve | IRIX/LINUX | 1.1
AQUA | aqua | IRIX/LINUX | 3.2
BLENDER | blender | IRIX | 2.28a
BNP | bnp | IRIX/LINUX | 0.99
CAMBRIDGE | cambridge | IRIX | 5.26
CCP4 | ccp4 | IRIX/LINUX | 4.2.2
CNX | cns | IRIX/LINUX | 2002
FELIX | felix | IRIX/LINUX | 2004
GAMMA | gamma | IRIX | 4.1.0
MOGUL | mogul | IRIX/LINUX | 1.0
Phoelix | phoelix | IRIX | 1.2
TURBO | turbo | IRIX | 5.5
XPLOR-NIH | Xplor_nih | IRIX/LINUX | 2.11.2
XtalView | xtalview | IRIX | 4.0
Scientific Package: Bioinformatics

Software | Package Name | Platforms | Current Version
BIOPERL | bioperl | IRIX | 1.4.0
BLAST | blast | IRIX/LINUX | 2.2.6
CLUSTALX | clustalx | IRIX | 8.1
EMBOSS | emboss | IRIX | 2.8.0
GCG | gcg | LINUX | 11.0
Insightful Miner | iminer | IRIX | 3.0
Modeller | modeller | IRIX/LINUX | 7.0
PISE | pise | LINUX | 5.0a
SEAVIEW | seaview | IRIX/LINUX | 1.0
AUTODOCK | autodock | IRIX | 3.05
DOCK | dock | IRIX/LINUX | 5.1.1
FTDOCK | ftdock | IRIX | 1.0
HEX | hex | IRIX | 2.4
Why do We Need Job Management Systems?
“Whose job you run in addition to when and where it is run, may be as important as how many jobs you run!”
Effectively optimizes the utilization of resources
Effectively optimizes the sharing of resources
Often referred to as Resource Management Software, Queuing Systems, or Job Management System, etc.
Job Management Tools
PBS - Portable Batch System
• Open source product developed at NASA Ames Research Center
DQS - Distributed Queuing System
• Open source product developed by SCRI at Florida State University
LSF - Load Sharing Facility
• Commercial product from Platform Computing, already deployed at UNC-CH ITS computing servers
Codine/Sun Grid Engine
• Commercial version of DQS from Gridware, Inc., now owned by SUN
Condor
• A restricted-source "cycle stealing" product from the University of Wisconsin
Others Too Numerous To Mention
Operations of LSF

[Figure: LSF job flow. A job submitted with "bsub app" on the submission host (LIM, Batch API) is forwarded, together with load information from other hosts, to the master host (MLIM, MBD), which dispatches it through a queue to an execution host (SBD, child SBD, RES) where the user job runs.]
LIM - Load Information Manager; MLIM - Master LIM; MBD - Master Batch Daemon; SBD - Slave Batch Daemon; RES - Remote Execution Server
Common LSF Commands
lsid
• A good choice of LSF command to start with is the lsid command
lshosts/bhosts
• shows all of the nodes that the LSF system is aware of
bsub
• submits a job interactively or in batch using LSF batch scheduling and queue layer of the LSF suite
bjobs
• displays information about a recently run job. You can use the -l option to view a more detailed accounting
bqueues
• displays information about the batch queues. Again, the –l option will display a more thorough description
bkill <job ID# >
• kill the job with job ID number of #
bhist -l <job ID# >
• displays historical information about jobs. The "-a" flag displays information about both finished and unfinished jobs
bpeek -f <job ID#>
• displays the stdout and stderr output of an unfinished job with a job ID of #.
bhpart
• displays information about host partitions
bstop
• suspends an unfinished job
bswitch
• switches unfinished jobs from one queue to another
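A typical interactive sequence with these commands, sketched with a placeholder job ID of 12345, looks like:

bsub -q week my_batch_job    # submit; LSF replies with the assigned job ID
bjobs -l 12345               # detailed status of the job
bpeek -f 12345               # watch its stdout/stderr while it runs
bkill 12345                  # kill it if something goes wrong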
More about LSF
Type “jle” -- checks job efficiency
Type “bqueues” for all queues on one cluster/machine (-m); Type “bqueues -l queue_name” for more info about the queue named “queue_name”
Type “busers” for user job slot limits
Specific for Emerald:
• cpufree -- to check how many free/idle CPUs are available
• pending -- to check how many jobs are still pending
• bfree – to check how many free slots are available (see "bfree -h")
LSF Queues on Emerald Clusters
Queues Description
int Interactive jobs
now Preemptive debugging queue, 10 min wall-clock limit, 2 CPUs
week Default queue, one week wall-clock limit, up to 32 CPUs/user
month Long-running serial-job queue, one month wall-clock limit, up to 4 jobs per user
staff ITS Research Computing staff queue
manager For use by LSF administrators
How to Submit Jobs via LSF on Emerald Clusters
Jobs to the interactive queue:
  bsub -q int -m cedar -Ip my_interactive_job
Serial jobs:
  bsub -q week -m cypress my_batch_job
Parallel OpenMP jobs:
  setenv OMP_NUM_THREADS 4
  bsub -q week -n 4 -m cypress my_parallel_job
Parallel MPI jobs:
  bsub -q week -n 4 -m cypress mpirun -np 4 my_parallel_job
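The same options can also be collected in a script with #BSUB directives and submitted with "bsub < scriptname"; a minimal sketch (queue, CPU count, and file names are placeholders):

#!/bin/csh
#BSUB -q week              # queue to submit to
#BSUB -n 4                 # number of CPUs requested
#BSUB -o out.%J            # output file; %J expands to the job ID
setenv OMP_NUM_THREADS 4
./my_parallel_job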
Peculiars of Emerald Cluster
Parallel Job Submission

CPU type and the corresponding -R resource string:
• Xeon 2.4 GHz: Xeon24, blade, ...
• Xeon 2.8 GHz: Xeon28, blade, ...
• Xeon 3.2 GHz: Xeon32, blade, ...
• 16-way IBM P575: p5aix, ...

MPI flavor (bsub -a) and its wrapper:
• lammpi: lammpirun_wrapper
• mpichp4: mpichp4_wrapper

Notice that the -R and -a flags are mutually exclusive in one command line.
Run Jobs on Emerald LINUX Cluster
Interactive jobs:
  bsub -q int -R xeon28 -Ip my_interactive_job
Syntax for submitting a serial job:
  bsub -q queuename -R resources executable
• For example:
  bsub -q week -R blade my_executable
To run an MPICH parallel job on AMD Athlon machines with, say, 4 CPUs:
  bsub -q idle -n 4 -a mpichp4 mpirun.lsf my_par_job
To run LAM/MPI parallel jobs on IBM BladeCenter machines with, say, 4 CPUs:
  bsub -q week -n 4 -a lammpi mpirun.lsf my_par_job
Final Friendly Reminders
Never run jobs on login nodes
• For file management, coding, compilation, etc., purposes only
Never run jobs outside LSF
• Fair sharing
Never run jobs in your AFS ISIS home or ~/ms; run them on /scr, /netscr, or /nas instead
• Slow I/O response, limited disk space
Move your data to mass storage after jobs are finished and remove all temporary files on scratch disks
• Scratch disk not backed up, efficient use of limited resources
• Old files will automatically be deleted without notification
Online Resources
Get started with Research Computing:
  http://www.unc.edu/atn/hpc/getting_started/index.shtml?id=4196
Programming Tools:
  http://www.unc.edu/atn/hpc/programming_tools/index.shtml
Scientific Packages:
  http://www.unc.edu/atn/hpc/applications/index.shtml?id=4237
Job Management:
  http://www.unc.edu/atn/hpc/job_management/index.shtml?id=4484
Benchmarks:
  http://www.unc.edu/atn/hpc/performance/index.shtml?id=4228
High Performance Computing:
http://www.beowulf.org
http://www.top500.org
http://www.linuxhpc.org
http://www.supercluster.org/
Short Courses
Introduction to Scientific Computing
Introduction to Emerald
Introduction to Topsail
LINUX: Introduction
LINUX: Intermediate
MPI for Parallel Computing
OpenMP for Parallel Computing
MATLAB: Introduction
STATA: Introduction
Gaussian and GaussView
Introduction to Computational Chemistry
Shell Scripting
Python: An Introduction
Introduction to Perl

http://learnit.unc.edu, click "Current Schedule of ITS Workshops"
Please direct comments/questions about research computing to
E-mail: [email protected]
Please direct comments/questions pertaining to this presentation to
E-mail: [email protected]
Hands-on Exercises
If you haven't done so yet:
• Subscribe to the Research Computing services
• Access via SecureCRT or X-Win32 to emerald, topsail, etc.
• Create a working directory for yourself on /netscr or /scr
• Get to know basic AFS and UNIX commands
• Get to know the Emerald Beowulf cluster:
  Compile OpenMP codes on Emerald
  Compile serial and parallel (MPI) codes on Emerald
  Get familiar with basic LSF commands
  Get to know the packages available in AFS space
  Submit jobs via LSF using serial or parallel (OpenMP/MPI) queues

The WORD .doc format of these hands-on exercises is available here:
http://its2.unc.edu/divisions/rc/training/scientific/
/afs/isis/depts/its/public_html/divisions/rc/training/scientific/short_courses/labDirections_SciComp.doc