Introduction to HPC and the National CyberInfrastructure

Ritu Arora

Email: [email protected]

October 27, 2014


Objectives

• Provide a basic introduction to HPC

• Introduce the audience to XSEDE (the national CyberInfrastructure)

• Introduce the audience to TACC resources

• Provide basic information on using TACC resources

• The time is too short for in-depth coverage

• However, the knowledge imparted will be sufficient to get you started with your data management activities on national open-science resources


[Image slide; source: www.nature.com]


What is HPC?

• High Performance Computing (HPC) is the use of parallel processing techniques to run larger computations with shorter turnaround times than a laptop or desktop allows

• HPC systems are also known as supercomputers; current systems perform in the petaFLOPS range

• Buying your own high-end HPC system, and paying for its operation and maintenance, can be expensive

• Good news: through XSEDE you can get access to HPC and high-end storage systems at no direct cost to you – more on XSEDE in a later slide

1 petaFLOPS (PF) = 1 quadrillion (10^15) math operations per second
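To put the petaFLOPS figure in perspective, the following minimal sketch estimates how long a fixed amount of work would take at two different sustained rates. The workload size (10^18 operations) and the laptop rate (100 gigaFLOPS) are assumed values chosen only for illustration, and real applications rarely sustain a machine's peak rate.

```python
PETA = 10**15  # 1 petaFLOPS = 10^15 math operations per second
GIGA = 10**9

def seconds_to_finish(total_operations, flops):
    """Idealized run time: total work divided by a sustained rate."""
    return total_operations / flops

work = 10**18                    # assumed workload, for illustration only
laptop_rate = 100 * GIGA         # assumed laptop rate, not a measured value
supercomputer_rate = 10 * PETA   # a 10 petaFLOPS system

laptop_days = seconds_to_finish(work, laptop_rate) / 86400
supercomputer_seconds = seconds_to_finish(work, supercomputer_rate)

print(f"laptop: about {laptop_days:.0f} days")
print(f"10 PFLOPS system: {supercomputer_seconds:.0f} seconds")
```

Under these assumptions the same work drops from months to minutes, which is the "shorter turnaround time" the definition refers to.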


Serial Versus Parallel

• One job, one processor: a serial solution

• One job, several processors: a parallel solution

• Parallel Programming usually works by breaking problems into pieces, and working on those pieces at the same time
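The idea of breaking a problem into pieces can be sketched in a few lines of Python. This is an illustrative sketch, not how jobs are run on any particular system: the function names and the choice of four workers are assumptions, and a thread pool is used only because it keeps the example short.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(numbers):
    """The 'job': add up the squares of a sequence of integers."""
    return sum(n * n for n in numbers)

def split_into_pieces(numbers, pieces):
    """Break the problem into roughly equal contiguous chunks."""
    size = max(1, (len(numbers) + pieces - 1) // pieces)
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]

def serial_solution(numbers):
    # One job, one worker.
    return sum_of_squares(numbers)

def parallel_solution(numbers, workers=4):
    # One job, several workers: each worker handles one piece,
    # and the partial results are combined at the end.
    chunks = split_into_pieces(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)

data = list(range(1, 1001))
print(serial_solution(data) == parallel_solution(data))
```

Both versions produce the same answer; only the way the work is divided differs. In standard Python, threads do not speed up CPU-bound work like this, so a real parallel run would typically use separate processes or a message-passing library.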


High Throughput Computing

• High Throughput Computing (HTC) consists of running many jobs that are typically similar and not highly parallel (no communication is needed between the different instances of the program that are running simultaneously)

• A common example is running a parameter sweep where the same program is run with varying inputs, resulting in hundreds or thousands of executions of the program

• We will be doing HTC in this afternoon’s session
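A parameter sweep can be sketched as follows. The `simulate` function and its two parameters are hypothetical stand-ins for whatever program is being swept; the point is only that every run depends on its own inputs and nothing else.

```python
from itertools import product

def simulate(temperature, pressure):
    """Hypothetical stand-in for the program being swept."""
    return temperature * pressure

def parameter_sweep(temperatures, pressures):
    """Run the same program once per input combination.

    No run needs data from any other run, so on an HTC system each
    combination could be submitted as a separate, independent job.
    """
    results = {}
    for t, p in product(temperatures, pressures):
        results[(t, p)] = simulate(t, p)
    return results

sweep = parameter_sweep([280, 300, 320], [1, 2])
print(len(sweep), "independent runs")
```

Three temperatures and two pressures already give six runs; realistic sweeps grow to hundreds or thousands of executions in the same way.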


XSEDE: Extreme Science and Engineering Discovery Environment


What can you get from XSEDE?

• XSEDE is composed of multiple partner institutions known as Service Providers (SPs), each of which contributes allocatable services

• Resources include HPC machines, High Throughput Computing (HTC) machines, visualization, data storage, testbeds, & services

– https://www.xsede.org/web/guest/resources/overview

• Extended Collaborative Support Service (ECSS), through which researchers can request to be paired with expert staff members for an extended period (from weeks up to a year)

– ECSS staff provide expertise in many areas of advanced cyberinfrastructure (CI) and can work with a research team to advance the team's work through that knowledge

• Training and Education Programs


TACC: Texas Advanced Computing Center


TACC Resources

• HPC, HTC, Data Analysis Platforms

– Stampede: 6400+ nodes, 10 PFLOPs

– Wrangler: Data Intensive Science (01/2015)

– Lonestar: 1800+ nodes, 302 TFLOPs

• Scientific Visualization and Analysis Resources

– Maverick: Vis, Analysis, 132 K40 GPUs

– Vis Lab: 12.4 Megapixel Collaborative Touch display

• Data Services

– Ranch: Tape Archive, 100 PB

– Rodeo: Cloud Services, User VMs

– Corral: Data storage and sharing, 6 PB Storage


Stampede HPC System – You will use it today!

• Stampede is one of the world’s most powerful supercomputers

• About 6400 compute nodes and additional specialized nodes

• Each node has multiple cores – 522,080 processing cores in total

• 270 TB of total memory

– Each typical compute node on Stampede has 32 GB of memory (RAM)

– There are 16 specialized nodes, called large-memory nodes, each with 1 TB of memory (RAM)

– Note 1: A typical desktop computer has 4, 8, or perhaps 16 GB of RAM

– Note 2: More memory means that researchers can work on much larger problem sizes than they could on a desktop computer

• 14 PB of high-performance storage

• 75 miles of network cable
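The 270 TB total can be roughly cross-checked from the per-node figures in this deck. The sketch below is only a back-of-the-envelope reconciliation, not the official derivation: it assumes exactly 6400 typical nodes, one 8 GB Xeon Phi card per typical node (the coprocessor figure comes from the compute-node slide later in the deck), and 1 TB = 1000 GB.

```python
TYPICAL_NODES = 6400        # "about 6400" in the slides; treated as exact here
HOST_GB_PER_NODE = 32       # host memory per typical compute node
PHI_GB_PER_NODE = 8         # assumes one Xeon Phi card per typical node
LARGE_MEMORY_NODES = 16
LARGE_MEMORY_GB = 1000      # 1 TB per large-memory node, with 1 TB = 1000 GB

def estimated_total_memory_tb():
    """Sum host, coprocessor, and large-memory-node memory, in TB."""
    total_gb = (TYPICAL_NODES * (HOST_GB_PER_NODE + PHI_GB_PER_NODE)
                + LARGE_MEMORY_NODES * LARGE_MEMORY_GB)
    return total_gb / 1000

print(f"estimated total memory: about {estimated_total_memory_tb():.0f} TB")
```

Under these assumptions the estimate lands close to the quoted 270 TB, which suggests the total counts coprocessor memory as well as host memory.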


A node on Stampede

1. Infiniband HCA card

2. Two 8-core Intel Xeon processors

3. Memory

4. Storage/File-System

5. Space for future expandability using coprocessors or accelerators

6. Intel Xeon Phi Coprocessor


Mike Packard, TACC Senior Systems Administrator, slides one of Stampede’s 6,400 nodes into its cabinet during installation


Oversimplified Diagram – Accessing Stampede

[Diagram: users connect over SSH to one of Stampede's four login nodes (login1 to login4). Work passes from the login nodes to the compute nodes through the job scheduler (marked "1" in the figure). The compute nodes include typical compute nodes (e.g., C201-231, …) and specialized compute nodes (e.g., large memory nodes, visualization nodes). The nodes and the file systems ($HOME, $WORK, $SCRATCH) are linked by the interconnect.]


Login Nodes on Stampede

Login node components:

• 4 login nodes: stampede.tacc.utexas.edu

• Processors: Intel E5-2680, 2.7GHz

• Sockets per Node/Cores per Socket: 2/8

• Motherboard: Dell R720, Intel QPI C600 Chipset

• Memory Per Node: 32GB

• $HOME (default directory upon login): Lustre, 5 GB quota

Use login nodes for installing software, compiling your programs, editing files, transferring files, submitting or monitoring batch jobs, starting an interactive session, and running additional light-weight processes.


Compute Nodes on Stampede

• The majority of the 6400 nodes are configured with two Xeon E5-2680 processors and one Intel Xeon Phi SE10P Coprocessor

• These compute nodes are configured with 32GB of "host" memory with an additional 8GB of memory on the Xeon Phi Coprocessor card

• A smaller number of compute nodes are configured with two Xeon Phi Coprocessors

Use the compute nodes for running any batch or interactive jobs that would take more than a few seconds to complete.


Stampede File-Systems

• User-owned storage on the Stampede system is available in three directories that are identified by the $HOME, $WORK and $SCRATCH environment variables

• These directories are separate file systems, and accessible from any node in the system

• $HOME

– 5 GB quota, maximum 150K files allowed

– Backed up

– No purge policy

– Store your source code and build your software here

• $WORK

– 400 GB quota, maximum 3M files allowed

– Not backed up

– No purge policy

– Store large files here

• $SCRATCH

– No quota restriction

– Not backed up

– Files with access times of greater than 10 days can be purged

– Store large files here

Parallel file system – named Lustre – makes hundreds of spinning disks act like a single disk
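Because the three directories are exposed as environment variables, a script can look them up rather than hard-coding paths. The following is a minimal sketch; the helper name `directory_for` and the purpose labels are hypothetical, and the mapping of purposes to directories simply follows the guidance in these slides.

```python
import os

# Hypothetical purpose labels mapped to the variables named in the slides.
PURPOSE_TO_VARIABLE = {
    "source_code": "HOME",     # backed up, small quota
    "large_files": "WORK",     # larger quota, not purged
    "job_output": "SCRATCH",   # no quota, but subject to purging
}

def directory_for(purpose, environ=os.environ):
    """Return the directory suggested for a purpose, or None if unset.

    The variable may be unset when this runs on a machine other
    than Stampede, in which case None is returned.
    """
    return environ.get(PURPOSE_TO_VARIABLE[purpose])

for purpose in PURPOSE_TO_VARIABLE:
    print(purpose, "->", directory_for(purpose))
```

Passing `environ` explicitly makes the helper easy to try on a laptop, where `WORK` and `SCRATCH` are usually not defined.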


Stampede’s Archival Storage System

• Stampede's tape-based archival storage system is Ranch

• 60 PB capacity, not backed up, not replicated

• Ranch (ranch.tacc.utexas.edu) is accessible from Stampede via the $ARCHIVER and $ARCHIVE variables

– Store permanent files here for archival storage

– This file system is NOT mounted (directly accessible) on any node

– Use it only for long-term file storage

– You need to stage the data back to Stampede before using it – your jobs cannot directly access the data on Ranch
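Since Ranch is not mounted on Stampede, files have to be copied to it and staged back explicitly. The sketch below only builds the copy commands and does not run them. It assumes that $ARCHIVER holds the archive host name and $ARCHIVE holds the user's directory on that host, and that `scp` is an acceptable transfer tool; the slides name the two variables but do not spell out these details, and the helper names are hypothetical.

```python
import os

def archive_command(local_file, environ=os.environ):
    """Build (but do not run) an scp command that copies a file to the archive."""
    archiver = environ["ARCHIVER"]   # assumed: host name of the archive system
    archive = environ["ARCHIVE"]     # assumed: user's directory on that host
    return ["scp", local_file, f"{archiver}:{archive}/"]

def stage_back_command(archived_name, destination_dir, environ=os.environ):
    """Build (but do not run) an scp command that stages a file back."""
    archiver = environ["ARCHIVER"]
    archive = environ["ARCHIVE"]
    return ["scp", f"{archiver}:{archive}/{archived_name}", destination_dir]

# Example values for illustration; on Stampede these come from the environment.
example_env = {"ARCHIVER": "ranch.tacc.utexas.edu", "ARCHIVE": "/path/to/archive"}
print(" ".join(archive_command("results.tar", example_env)))
print(" ".join(stage_back_command("results.tar", "/path/to/scratch", example_env)))
```

Bundling many small files into a single tar file before archiving is a common choice for tape systems, which is why the example uses `results.tar`.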


How to Get Started?


How do you get started?

• To use TACC resources (or additional XSEDE resources) and to request an expert’s help:

1. Create an XSEDE portal account and a TACC portal account

– Note: to activate your TACC account, you will need to log in to the TACC portal once after you receive an email from the TACC user services group

2. PIs should then submit a request for a start-up allocation (computing hours) and ECSS staff support through the XSEDE portal

3. Once the allocation request is approved, the project PI can add group members (who have active portal accounts) to the allocation by logging in to the XSEDE portal

4. Once resources and experts are assigned to your project through XSEDE, you can use your TACC portal account credentials to log in directly to TACC resources for your research and development work – you may also use the XSEDE Single Sign-On Login Hub


With the training accounts that you will receive at this workshop, Steps 1 to 3 mentioned above can be temporarily ignored. In the next session you will use the training accounts to connect to TACC resources.


After you are connected

• Once you are connected to a TACC (or another XSEDE) resource remotely, you will need to understand the user environment on that resource

– Linux OS (to be discussed in the next session)

– Usage policies: resources are shared with other users, so you need to understand the resource usage policies (covered on a later slide)

– Bring your data

– Install software in your account if what you need is not already available on Stampede

– Do your processing or post-processing

– Move the results to secondary or tertiary storage media at TACC or at your institution


Some activities to avoid on shared resources like Stampede

• Avoid running time-consuming jobs (programs or scripts) on the login nodes

– All such jobs should be run on the compute nodes

– Compute nodes can be accessed via a batch job or interactively (more on this in the afternoon session)

• Avoid running large jobs from $HOME

– Run such jobs from $SCRATCH

• Avoid running more than 2 (or 3) rsync processes simultaneously for data transfer

• Avoid parking your data for months on $SCRATCH without accessing it periodically
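To see which files on $SCRATCH have gone unused and should be moved elsewhere, access times can be inspected with a short script. This is a minimal sketch: the function name is hypothetical, and the 10-day figure in the usage note comes from the purge policy on the file-systems slide.

```python
import os
import time

def files_not_accessed_since(root, days, now=None):
    """List files under root whose last access is more than `days` days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)
```

A typical call would be `files_not_accessed_since(os.environ["SCRATCH"], 10)`. The files it reports are candidates for copying to $WORK or to archival storage.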


Thanks for listening!

Any questions or comments?
