High Performance Computing: Technologies and Opportunities Dr. Charles J Antonelli LSAIT ARS May, 2013


Page 1: High Performance Computing: Technologies and Opportunities

High Performance Computing: Technologies and Opportunities

Dr. Charles J Antonelli
LSAIT ARS
May, 2013

Page 2: High Performance Computing: Technologies and Opportunities

ES13 2

ES13 Mechanics
Welcome! Please sign in
If registered, check the box next to your name
If walk-in, please write your name, email, standing, unit, and department
Please drop from sessions for which you registered but do not plan to attend – this makes room for folks on the wait list
Please attend sessions that interest you, even if you are on the wait list

5/13

Page 3: High Performance Computing: Technologies and Opportunities

ES13 3

Goals
High-level introduction to high-performance computing
Overview of high-performance computing resources, including XSEDE and Flux
Demonstrations of high-performance computing on GPUs and Flux


Page 4: High Performance Computing: Technologies and Opportunities

ES13 4

Introductions
Name and department
Area of research
What are you hoping to learn today?


Page 5: High Performance Computing: Technologies and Opportunities

ES13 5

Roadmap
High Performance Computing Overview
CPUs and GPUs
XSEDE
Flux
Architecture & Mechanics
Batch Operations & Scheduling


Page 6: High Performance Computing: Technologies and Opportunities

6

High Performance Computing


Page 8: High Performance Computing: Technologies and Opportunities

ES13 8

High Performance Computing

http://arc.research.umich.edu/
Image courtesy of Frank Vazquez, Surma Talapatra, and Eitan Geva.

Page 9: High Performance Computing: Technologies and Opportunities

ES13 9

Node
[Diagram: a node's processor (P) running a process, with RAM and local disk]

Page 10: High Performance Computing: Technologies and Opportunities

ES13 10

High Performance Computing

“Computing at scale”
Computing cluster
Collection of powerful computers (nodes), interconnected by a high-performance network, connected to large amounts of high-speed permanent storage

Parallel code
Application whose components run concurrently on the cluster’s nodes


Page 11: High Performance Computing: Technologies and Opportunities

ES13 11

Coarse-grained parallelism


Page 12: High Performance Computing: Technologies and Opportunities

ES13 12

Programming Models (1)

Coarse-grained parallelismThe parallel application consists of several processes running on different nodes and communicating with each other over the network

Used when the data are too large to fit on a single node, and simple synchronization is adequate
“Message-passing”
Implemented using software libraries

MPI (Message Passing Interface)
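MPI itself is a C/Fortran library, but the message-passing model it implements can be sketched with nothing more than Python's standard library. A hedged illustration (the function names are invented; the Pipe send/recv calls merely play the roles of MPI_Send/MPI_Recv):

```python
# Sketch of the coarse-grained message-passing model using only Python's
# standard library. Each worker process is a stand-in for an MPI rank on
# its own node: it receives its chunk of data, computes, and replies.
from multiprocessing import Process, Pipe

def worker(conn):
    # Each "node" blocks until its chunk arrives over the "network".
    chunk = conn.recv()                    # like MPI_Recv
    conn.send(sum(x * x for x in chunk))   # like MPI_Send
    conn.close()

def parallel_sum_of_squares(data, nworkers=2):
    size = (len(data) + nworkers - 1) // nworkers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    pipes, procs = [], []
    for chunk in chunks:
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send(chunk)                 # distribute work to the workers
        pipes.append(parent)
        procs.append(p)
    results = [conn.recv() for conn in pipes]  # collect partial results
    for p in procs:
        p.join()
    return sum(results)

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(8))))  # 140
```

The key property, as on the slide, is that workers share no memory: all coordination happens through explicit sends and receives.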


Page 13: High Performance Computing: Technologies and Opportunities

ES13 13

Fine-grained parallelism
[Diagram: a single node's cores sharing RAM and local disk]

Page 14: High Performance Computing: Technologies and Opportunities

ES13 14

Programming Models (2)

Fine-grained parallelismThe parallel application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives

Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable
“Shared-memory parallelism” or “multi-threaded parallelism”
Implemented using compilers and software libraries

OpenMP (Open Multi-Processing)
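OpenMP is a compiler extension for C, C++, and Fortran; as a language-agnostic sketch of the shared-memory model it implements, here is the same idea with Python's standard threading module (names invented for illustration; note that CPython's GIL limits real speedup for pure-Python code, so this shows the programming model rather than performance):

```python
# Sketch of fine-grained shared-memory parallelism: several threads in one
# process update a shared total, serialized by a lock (the analogue of an
# OpenMP critical section).
import threading

def threaded_sum(data, nthreads=4):
    total = [0]                      # shared memory, visible to all threads
    lock = threading.Lock()          # synchronization primitive

    def worker(chunk):
        partial = sum(chunk)         # private, per-thread work
        with lock:                   # "critical section": one thread at a time
            total[0] += partial

    size = (len(data) + nthreads - 1) // nthreads
    threads = [threading.Thread(target=worker, args=(data[i:i + size],))
               for i in range(0, len(data), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

if __name__ == "__main__":
    print(threaded_sum(list(range(1000))))  # 499500
```

Without the lock, concurrent updates to the shared total could interleave and lose increments, which is exactly the hazard OpenMP's synchronization primitives exist to prevent.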

Page 15: High Performance Computing: Technologies and Opportunities

ES13 15

Advantages of HPC
More scalable than your laptop
Cheaper than a mainframe
Buy or rent only what you need
COTS hardware, software, expertise


Page 16: High Performance Computing: Technologies and Opportunities

ES13 16

Why HPC
More scalable than your laptop
Cheaper than the mainframe
Buy or rent only what you need
COTS hardware, software, expertise


Page 17: High Performance Computing: Technologies and Opportunities

ES13 17

Good parallel
Embarrassingly parallel
Folding@home, RSA Challenges, password cracking, …
http://en.wikipedia.org/wiki/List_of_distributed_computing_projects

Regular structures
Equal size, stride, processing
Pipelines


Page 18: High Performance Computing: Technologies and Opportunities

ES13 18

Less good parallel
Serial algorithms
Those that don’t parallelize easily

Irregular data & communications structures
E.g., surface/subsurface water hydrology modeling

Tightly-coupled algorithms

Unbalanced algorithms
Master/worker algorithms, where the worker load is uneven


Page 19: High Performance Computing: Technologies and Opportunities

ES13 19

Amdahl’s Law
If you enhance a fraction f of a computation by a speedup S, the overall speedup is:

Speedup = 1 / ((1 − f) + f / S)

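Amdahl's law is easy to check numerically. A small sketch (function name is illustrative), showing that even a huge speedup applied to 90% of the work caps the overall gain at 10x:

```python
# Amdahl's law: speeding up a fraction f of a computation by a factor S
# gives an overall speedup of 1 / ((1 - f) + f / S).
def amdahl_speedup(f, S):
    return 1.0 / ((1.0 - f) + f / S)

if __name__ == "__main__":
    print(round(amdahl_speedup(0.90, 10), 2))   # 5.26
    print(round(amdahl_speedup(0.90, 1e9), 2))  # 10.0 -- the serial 10% dominates
```

The second line is the sobering case: no matter how many cores you throw at the parallel fraction, the serial remainder bounds the speedup at 1 / (1 − f).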

Page 20: High Performance Computing: Technologies and Opportunities

ES13 20

Amdahl’s Law


Page 21: High Performance Computing: Technologies and Opportunities

ES13 21

CPUs and GPUs


Page 22: High Performance Computing: Technologies and Opportunities

ES13 22

CPU
Central processing unit
Executes instructions stored in memory serially
A CPU may contain a handful of cores
Focus is on executing instructions as quickly as possible
Aggressive caching (L1, L2)
Pipelined architecture
Optimized execution strategies


Page 23: High Performance Computing: Technologies and Opportunities

ES13 23

GPU
Graphics processing unit
Parallel throughput architecture
Focus is on executing many GPU cores slowly, rather than a single CPU core very quickly
Simpler processors
Hundreds of cores in a single GPU
“Single-Instruction, Multiple-Data”
Ideal for embarrassingly parallel graphics problems
e.g., 3D projection, where each pixel is rendered independently


Page 24: High Performance Computing: Technologies and Opportunities

ES13 24

High Performance Computing

http://www.pgroup.com/lit/articles/insider/v2n1a5.htm

Page 25: High Performance Computing: Technologies and Opportunities

ES13 25

GPGPU
General-purpose computing on graphics processing units
Use of the GPU for computation in applications traditionally handled by CPUs
An application is a good fit for the GPU when:
Embarrassingly parallel
Computationally intensive
Minimal dependencies between data elements

Not so good when:
Extensive data transfer from CPU to GPU memory is required
Data are accessed irregularly


Page 26: High Performance Computing: Technologies and Opportunities

ES13 26

Programming models

CUDA
Nvidia proprietary
Architectural and programming framework
C/C++ and extensions
Compilers and software libraries
Generations of GPUs: Fermi, Tesla, Kepler

OpenCL
Open standard competitor to CUDA


Page 27: High Performance Computing: Technologies and Opportunities

ES13 27

GPU-enabled applications

Application writers provide GPGPU support
Amber
GAMESS
MATLAB
Mathematica
…
See list at http://www.nvidia.com/docs/IO/123576/nv-applications-catalog-lowres.pdf


Page 28: High Performance Computing: Technologies and Opportunities

ES13 28

Demonstration
Task: Compare CPU / GPU performance in MATLAB

Demonstrated on the Statistics Department & LSA CUDA and Visualization Workstation


Page 29: High Performance Computing: Technologies and Opportunities

ES13 29

Recommended Session

Introduction to the CUDA GPU and Visualization Workstation Available to LSA
Presenter: Seth Meyer

Thursday, 5/9, 1:00 pm – 3:00 pm
429 West Hall
1085 South University, Central Campus


Page 30: High Performance Computing: Technologies and Opportunities

ES13 30

Further Study
Virtual School of Computational Science and Engineering (VSCSE)
Data Intensive Summer School (July 8-10, 2013)
Proven Algorithmic Techniques for Many-Core Processors (July 29 – August 2, 2013)
https://www.xsede.org/virtual-school-summer-courses
http://www.vscse.org/


Page 31: High Performance Computing: Technologies and Opportunities

ES13 31

XSEDE


Page 32: High Performance Computing: Technologies and Opportunities

ES13 32

XSEDE
Extreme Science and Engineering Discovery Environment
Follow-on to TeraGrid
“XSEDE is a single virtual system that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services — things like supercomputers, collections of data and new tools — to improve our planet.”


https://www.xsede.org/

Page 33: High Performance Computing: Technologies and Opportunities

ES13 33

XSEDE
National-scale collection of resources:
13 High Performance Computing (loosely- and tightly-coupled parallelism, GPGPU)
2 High Throughput Computing (embarrassingly parallel)
2 Visualization
10 Storage
Gateways

https://www.xsede.org/resources/overview


Page 34: High Performance Computing: Technologies and Opportunities

ES13 34

XSEDE
In 2012:
Between 250 and 300 million SUs consumed in the XSEDE virtual system per month
A Service Unit (SU) = 1 core-hour, normalized
About 2 million SUs consumed by U-M researchers per month


Page 35: High Performance Computing: Technologies and Opportunities

ES13 35

XSEDE
Allocations required for use

Startup
Short application, rolling review cycle, ~200,000 SU limits

Education
For academic or training courses

Research
Proposal, reviewed quarterly, millions of SUs awarded

https://www.xsede.org/active-xsede-allocations


Page 36: High Performance Computing: Technologies and Opportunities

ES13 36

XSEDE
Lots of resources available at https://www.xsede.org/
User Portal
Getting Started guide
User Guides
Publications
User groups
Education & Training
Campus Champions


Page 37: High Performance Computing: Technologies and Opportunities

ES13 37

XSEDE U-M Campus Champion
Brock Palen
CAEN [email protected]

Serves as advocate & local XSEDE support, e.g.:
Help size requests and select resources
Help test resources
Training
Application support
Move XSEDE support problems forward


Page 38: High Performance Computing: Technologies and Opportunities

ES13 38

Recommended Session

Increasing Your Computing Power with XSEDE
Presenter: August Evrard

Friday, 5/10, 10:00 am – 11:00 am
Gallery Lab, 100 Hatcher Graduate Library
913 South University, Central Campus


Page 39: High Performance Computing: Technologies and Opportunities

ES13 39

Flux Architecture


Page 40: High Performance Computing: Technologies and Opportunities

ES13 40

Flux
Flux is a university-wide, interdisciplinary, shared computational discovery / high-performance computing service.
Provided by Advanced Research Computing at U-M (ARC)
Operated by CAEN HPC
Hardware procurement, software licensing, and billing support by U-M ITS
Used across campus

Collaborative since 2010:
Advanced Research Computing at U-M (ARC)
College of Engineering’s IT Group (CAEN)
Information and Technology Services
Medical School
College of Literature, Science, and the Arts
School of Information


http://arc.research.umich.edu/resources-services/flux/

Page 41: High Performance Computing: Technologies and Opportunities

ES13 41

The Flux cluster
[Diagram: login nodes, compute nodes, storage, and a data transfer node]


Page 42: High Performance Computing: Technologies and Opportunities

ES13 42

A Flux node
[Diagram: 12 Intel cores, 48 GB RAM, local disk; Ethernet and InfiniBand interconnects]

Page 43: High Performance Computing: Technologies and Opportunities

ES13 43

A Flux BigMem node
[Diagram: 40 Intel cores, 1 TB RAM, local disk; Ethernet and InfiniBand interconnects]

Page 44: High Performance Computing: Technologies and Opportunities

ES13 44

Flux hardware
Standard: 8,016 Intel cores; 632 Flux nodes; 48/64 GB RAM/node; 4 GB RAM/core (average)
BigMem: 200 Intel BigMem cores; 5 Flux BigMem nodes; 1 TB RAM/BigMem node; 25 GB RAM/BigMem core
4X InfiniBand network (interconnects all nodes)
40 Gbps, <2 us latency
Latency an order of magnitude less than Ethernet

Lustre Filesystem
Scalable, high-performance, open
Supports MPI-IO for MPI jobs
Mounted on all login and compute nodes


Page 45: High Performance Computing: Technologies and Opportunities

ES13 45

Flux software
Licensed & open source software:
Abacus, Java, Mason, Mathematica, Matlab, R, STATA SE, …
http://cac.engin.umich.edu/resources/software/index.html

Software development (C, C++, Fortran)
Intel, PGI, GNU compilers


Page 46: High Performance Computing: Technologies and Opportunities

ES13 46

Flux data
Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes
640 TB of short-term storage for batch jobs
Large, fast, short-term

NFS filesystems mounted on /home and /home2 on all nodes
80 GB of storage per user for development & testing
Small, slow, short-term


Page 47: High Performance Computing: Technologies and Opportunities

ES13 47

Globus Online
Features
High-speed data transfer, much faster than SCP or SFTP
Reliable & persistent
Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints
Gateways through which data flow
Exist for XSEDE, OSG, …
UMich: umich#flux, umich#nyx
Add your own server endpoint: contact flux-support
Add your own client endpoint!

More information: http://cac.engin.umich.edu/resources/loginnodes/globus.html

Page 48: High Performance Computing: Technologies and Opportunities

ES13 48

Flux Mechanics


Page 49: High Performance Computing: Technologies and Opportunities

ES13 49

Using Flux
Three basic requirements to use Flux:
1. A Flux account
2. A Flux allocation
3. An MToken (or a Software Token)


Page 50: High Performance Computing: Technologies and Opportunities

ES13 50

Using Flux
1. A Flux account
Allows login to the Flux login nodes
Develop, compile, and test code
Available to members of the U-M community, free
Get an account by visiting http://arc.research.umich.edu/resources-services/flux/managing-a-flux-project/


Page 51: High Performance Computing: Technologies and Opportunities

ES13 51

Using Flux
2. A Flux allocation
Allows you to run jobs on the compute nodes
Current rates:
$18 per core-month for Standard Flux
$24.35 per core-month for BigMem Flux
$8 cost-sharing per core-month for LSA, Engineering, and Medical School
Details at http://arc.research.umich.edu/resources-services/flux/flux-costing/

To inquire about Flux allocations please email [email protected]


Page 52: High Performance Computing: Technologies and Opportunities

ES13 52

Using Flux
3. An MToken (or a Software Token)
Required for access to the login nodes
Improves cluster security by requiring a second means of proving your identity
You can use either an MToken or an application for your mobile device (called a Software Token) for this
Information on obtaining and using these tokens at http://cac.engin.umich.edu/resources/loginnodes/twofactor.html


Page 53: High Performance Computing: Technologies and Opportunities

ES13 53

Logging in to Flux
ssh flux-login.engin.umich.edu
MToken (or Software Token) required
You will be randomly connected to a Flux login node
Currently flux-login1 or flux-login2
Firewalls restrict access to flux-login.
To connect successfully, either:
Physically connect your ssh client platform to the U-M campus wired network, or
Use VPN software on your client platform, or
Use ssh to login to an ITS login node, and ssh to flux-login from there


Page 54: High Performance Computing: Technologies and Opportunities

ES13 54

Demonstration
Task: Use the R multicore package
The multicore package allows you to use multiple cores on the same node when writing R scripts
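The demo itself is in R (e.g. multicore's mclapply applies a function across cores of one node). As a hedged cross-language sketch of the same parallel-map idea, using Python's standard library:

```python
# Parallel map over the cores of a single node, analogous in spirit to
# R's multicore::mclapply. Worker count and function are illustrative.
from multiprocessing import Pool

def slow_square(x):
    # stand-in for a per-item computation expensive enough to parallelize
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:          # like mclapply(..., mc.cores = 4)
        results = pool.map(slow_square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

This is shared-nothing within one node: each worker process gets a copy of its inputs, which is why the model suits independent per-element work.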


Page 55: High Performance Computing: Technologies and Opportunities

ES13 55

Demonstration
Task: compile and execute simple programs on the Flux login node

Copy sample code to your login directory:
cd
cp ~brockp/cac-intro-code.tar.gz .
tar -xvzf cac-intro-code.tar.gz
cd ./cac-intro-code

Examine, compile & execute helloworld.f90:
ifort -O3 -ipo -no-prec-div -xHost -o f90hello helloworld.f90
./f90hello

Examine, compile & execute helloworld.c:
icc -O3 -ipo -no-prec-div -xHost -o chello helloworld.c
./chello

Examine, compile & execute MPI parallel code:
mpicc -O3 -ipo -no-prec-div -xHost -o c_ex01 c_ex01.c
mpirun -np 2 ./c_ex01


Page 56: High Performance Computing: Technologies and Opportunities

ES13 56

Flux Batch Operations


Page 57: High Performance Computing: Technologies and Opportunities

ES13 57

Portable Batch System

All production runs are run on the compute nodes using the Portable Batch System (PBS)
PBS manages all aspects of cluster job execution except job scheduling
Flux uses the Torque implementation of PBS
Flux uses the Moab scheduler for job scheduling
Torque and Moab work together to control access to the compute nodes
PBS puts jobs into queues
Flux has a single queue, named flux


Page 58: High Performance Computing: Technologies and Opportunities

ES13 58

Cluster workflow
You create a batch script and submit it to PBS
PBS schedules your job, and it enters the flux queue
When its turn arrives, your job will execute the batch script
Your script has access to any applications or data stored on the Flux cluster
When your job completes, anything it sent to standard output and error is saved and returned to you
You can check on the status of your job at any time, or delete it if it’s not doing what you want
A short time after your job completes, it disappears
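A batch script is just a shell script whose leading comment lines are PBS directives. A minimal sketch (the job name and allocation account are illustrative placeholders; the qos/queue values and the c_ex01 program come from elsewhere in this deck):

```shell
#!/bin/bash
#PBS -N example_mpi              # job name (illustrative)
#PBS -A example_flux             # your Flux allocation; yours will differ
#PBS -l qos=flux
#PBS -q flux                     # Flux's single queue, named flux
#PBS -l procs=8,walltime=1:00:00
#PBS -V                          # export your current environment to the job

# Commands below run on the first allocated compute node.
cd "$PBS_O_WORKDIR"              # start in the directory where you ran qsub
mpirun -np 8 ./c_ex01            # the MPI program from the compile demo
```

Submit it with `qsub script.pbs`; standard output and error come back as files in the submission directory when the job finishes.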


Page 59: High Performance Computing: Technologies and Opportunities

ES13 59

Demonstration
Task: Run an MPI job on 8 cores

Sample code uses MPI_Scatter/Gather to send chunks of a data buffer to all worker cores for processing
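The scatter/compute/gather pattern the sample code uses (MPI_Scatter and MPI_Gather in C) can be sketched with Python's standard library; the names and the per-chunk computation below are invented for illustration:

```python
# Scatter chunks of a buffer to workers, compute on each chunk in
# parallel, then gather the partial results back in order -- the same
# shape as an MPI_Scatter / compute / MPI_Gather program.
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # each "rank" works independently on its own slice of the buffer
    return [x + 1 for x in chunk]

def scatter_gather(buffer, nworkers=8):
    size = (len(buffer) + nworkers - 1) // nworkers
    chunks = [buffer[i:i + size] for i in range(0, len(buffer), size)]  # scatter
    with ProcessPoolExecutor(max_workers=nworkers) as ex:
        parts = list(ex.map(process_chunk, chunks))                     # compute
    return [x for part in parts for x in part]                          # gather

if __name__ == "__main__":
    print(scatter_gather(list(range(16)))[:5])  # [1, 2, 3, 4, 5]
```

As with real MPI, the root holds the full buffer and each worker only ever sees its own chunk.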


Page 60: High Performance Computing: Technologies and Opportunities

ES13 60

The Batch Scheduler
If there is competition for resources, two things help determine when you run:

How long you have waited for the resourceHow much of the resource you have used so far

Smaller jobs fit in the gaps (“backfill”)
[Chart: jobs packed over cores vs. time, with small jobs backfilled into the gaps]

Page 61: High Performance Computing: Technologies and Opportunities

ES13 61

Flux Resources
http://www.youtube.com/user/UMCoECAC
UMCoECAC’s YouTube channel
http://orci.research.umich.edu/resources-services/flux/
U-M Office of Research Cyberinfrastructure Flux summary page
http://cac.engin.umich.edu/
Getting an account, basic overview (use menu on left to drill down)
http://cac.engin.umich.edu/started
How to get started at the CAC, plus cluster news, RSS feed and outages
http://www.engin.umich.edu/caen/hpc
XSEDE information, Flux in grant applications, startup & retention offers
http://cac.engin.umich.edu/ Resources | Systems | Flux | PBS
Detailed PBS information for Flux use

For assistance: [email protected]
Read by a team of people
Cannot help with programming questions, but can help with operational Flux and basic usage questions


Page 62: High Performance Computing: Technologies and Opportunities

ES13 62

Wrap-up


Page 63: High Performance Computing: Technologies and Opportunities

ES13 63

Further Study
CSCAR/ARC Python Workshop (week of June 12, 2013)
Sign up for news and events on the Advanced Research Computing web page at http://arc.research.umich.edu/news-events/


Page 64: High Performance Computing: Technologies and Opportunities

ES13 64

Any Questions?
Charles J. Antonelli
LSAIT Advocacy and Research Support
[email protected]
http://www.umich.edu/~cja
734 763 0607


Page 65: High Performance Computing: Technologies and Opportunities

ES13 65

References
1. http://cac.engin.umich.edu/resources/software/R.html
2. http://cac.engin.umich.edu/resources/software/matlab.html
3. CAC supported Flux software, http://cac.engin.umich.edu/resources/software/index.html (accessed August 2011).
4. J. L. Gustafson, “Reevaluating Amdahl’s Law,” chapter for the book Supercomputers and Artificial Intelligence, edited by Kai Hwang, 1988. http://www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html (accessed November 2011).
5. Mark D. Hill and Michael R. Marty, “Amdahl’s Law in the Multicore Era,” IEEE Computer, vol. 41, no. 7, pp. 33-38, July 2008. http://research.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf (accessed November 2011).
6. InfiniBand, http://en.wikipedia.org/wiki/InfiniBand (accessed August 2011).
7. Intel C and C++ Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/index.htm (accessed August 2011).
8. Intel Fortran Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm (accessed August 2011).
9. Lustre file system, http://wiki.lustre.org/index.php/Main_Page (accessed August 2011).
10. Torque User’s Manual, http://www.clusterresources.com/torquedocs21/usersmanual.shtml (accessed August 2011).
11. Jurg van Vliet & Flavia Paganelli, Programming Amazon EC2, O’Reilly Media, 2011. ISBN 978-1-449-39368-7.


Page 66: High Performance Computing: Technologies and Opportunities

ES13 66

Extra
Task: Run an interactive job

Enter this command (all on one line):
qsub -I -V -l procs=2 -l walltime=15:00 -A FluxTraining_flux -l qos=flux -q flux
When your job starts, you’ll get an interactive shell
Copy and paste the batch commands from the “run” file, one at a time, into this shell
Experiment with other commands
After fifteen minutes, your interactive shell will be killed


Page 67: High Performance Computing: Technologies and Opportunities

ES13 67

Extra
Other above-campus services:
Amazon EC2
Microsoft Azure
IBM SmartCloud
…
