
Parallel Computing Basics, Semantics

Landau’s 1st Rule of Education

Rubin H Landau

Sally Haerer, Producer-Director

Based on A Survey of Computational Physics by Landau, Páez, & Bordeianu

with Support from the National Science Foundation

Course: Computational Physics II


Parallel Problems

Basic and Assigned

Impressive parallel (‖) computing hardware advances

Beyond ‖ I/O, memory, internal CPU

‖: multiple processors, single problem

Software stuck in 1960s

Message passing = dominant, yet too elementary

Need sophisticated compilers (OK cores)

Understanding hybrid programming models

Problem: parallelize a simple program’s parameter space (a sketch follows below)

Why do it? Faster, bigger, finer resolutions, different problems
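A minimal serial C sketch of one way to set up the assigned problem (the names run_simulation and NPARAMS and the strided split are illustrative assumptions, not from the slides): each of numnodes workers takes every numnodes-th point of the parameter grid, so all runs are independent.

    #include <stdio.h>
    #include <math.h>

    #define NPARAMS 100                      /* size of the parameter grid */

    /* hypothetical stand-in for the physics run at one parameter value */
    static double run_simulation(double param) { return sin(param); }

    /* worker myid of numnodes handles every numnodes-th parameter */
    static void sweep(int myid, int numnodes) {
        for (int n = myid; n < NPARAMS; n += numnodes) {
            double param = 0.1 * n;          /* illustrative grid spacing */
            printf("worker %d: f(%g) = %g\n", myid, param,
                   run_simulation(param));
        }
    }

    int main(void) {
        sweep(0, 4);                         /* run worker 0 of 4, serially */
        return 0;
    }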


‖ Computation Example, Matrix Multiplication

Need Communication, Synchronization, Math

[B] = [A][B] (1)

Bi,j = ∑_{k=1}^{N} Ai,k Bk,j (2)

Each LHS Bi,j computable in ‖

Each LHS row, column of [B] in ‖

RHS Bk,j = old, pre-multiplication values ⇒ must communicate

[B] = [A][B] = data dependency, order matters

[C] = [A][B] = data parallel (see the sketch below)
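A serial C sketch (my illustration, not from the slides) of why the two cases differ: every C[i][j] in [C] = [A][B] reads only old data, so the N² sums are independent, while overwriting [B] in place would destroy the old Bk,j values that other sums still need.

    #define N 100
    double A[N][N], B[N][N], C[N][N];

    void matmul(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {    /* each (i,j) independent => ‖ */
                C[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];   /* reads only old B */
            }
        /* for [B] = [A][B]: copy C back into B only after all sums finish,
           otherwise later sums would read already-overwritten Bk,j */
    }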


Parallel Computer Categories

Nodes, Communications, Instructions & Data

[Figure: cluster schematic — gigabyte Internet, I/O node, fast Ethernet, compute nodes, FPGA, JTAG]

CPU-CPU, mem-mem networks

Internal (2) & external

Node = processor location

Node: 1-N CPUs

Single-instruction, single-data (SISD)

Single-instruction, multiple-data (SIMD)

Multiple-instruction, multiple-data (MIMD)

MIMD: message passing

MIMD: no-shared-memory cluster

MIMD: difficult to program, but cheap ($) ⇒ dominant


Relation to Multitasking

Location(s) in Memory

[Figure: four programs A, B, C, D at separate locations in memory]

Much ‖ on PC, Unix

Multitasking ∼ ‖

Independent programs simultaneously in RAM

Round-robin processing

SISD: 1 job at a time

MIMD: multiple jobs at the same time


Parallel Categories

Granularity


Grain = measure of computational work

= computation / communication ratio

Coarse grain: separate programs & computers

e.g. MC on 6 Linux PCs

Medium grain: several simultaneous processors

Bus = communication channel

Parallel subroutines on different CPUs

Fine grain: custom compiler

e.g. ‖ for loops (see the sketch below)
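As one concrete reading of the fine-grain bullet, a sketch using OpenMP in C (OpenMP is my assumption; the slide names no tool): a single directive lets the compiler/runtime split the loop iterations across cores. Compile with, e.g., gcc -fopenmp.

    /* fine-grain parallelism: the runtime parallelizes one for loop */
    void scale(double *x, int n, double a) {
        #pragma omp parallel for             /* iterations are independent */
        for (int i = 0; i < n; i++)
            x[i] = a * x[i];
    }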


Distributed Memory ‖ via Commodity PCs

Clusters, Multicomputers, Beowulf, David

[Figure: “Values of Parallel Processing” — mainframe, vector computer, mini, workstation, PC, Beowulf]

Dominant coarse-to-medium grain = stand-alone PCs, high-speed switch, messages & network

Requirement: data chunks to keep each processor independently busy

Send data to nodes, collect, exchange, ...


Parallel Performance: Amdahl’s law

Simple Accounting of Time

[Figure: Amdahl’s law — speedup vs. percent parallel, curves for p = 2, 16, ∞]

Clogged ketchup bottle in cafeteria line

Slowest step determines reaction rate

In ‖: serial steps, communication = the ketchup

Need ∼90% parallel

Need ∼100% for massively ‖

Need new problems


Amdahl’s Law Derivation

p = no. of CPUs, T1 = 1-CPU time, Tp = p-CPU time (1)

Sp = maximum parallel speedup = T1/Tp → p (2)

Not achieved: some serial parts; data & memory conflicts; communication, synchronization of the processors

f = ‖ fraction of program ⇒

Ts = (1 − f) T1 (serial time) (3)

Tp = f T1/p (parallel time) (4)

Speedup: Sp = T1/(Ts + Tp) = 1/(1 − f + f/p) (Amdahl’s law) (5)
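As a numerical check on (5) (numbers chosen for illustration): a code that is 90% parallel (f = 0.9) on p = 16 processors gives Sp = 1/(0.1 + 0.9/16) = 1/0.15625 ≈ 6.4, and even p → ∞ caps the speedup at 1/(1 − f) = 10 — which is why the previous slide demands ∼90–100% parallel content.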


Amdahl’s Law + Communication Overhead

Include Communication Time; Simple & Profound

Latency = Tc = time to move data

Sp ≃ T1/(T1/p + Tc) < p (1)

For communication time not to matter

T1/p ≫ Tc ⇒ p ≪ T1/Tc (2)

As the number of processors p ↑, T1/p → Tc

Then, more processors ⇒ slower

Faster CPU irrelevant
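With illustrative numbers (mine, not the slide’s): if T1 = 100 s and Tc = 1 s, then (2) requires p ≪ 100; at p = 100 the compute share T1/p has fallen to Tc, so Sp ≈ 100/(1 + 1) = 50, and adding further processors mostly adds communication time.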


How to Actually Parallelize

[Figure: main task program — main routine calls serial subroutine a, then parallel subs 1, 2, 3, then a summation task]

User creates tasks

Tasks assigned to processor threads

Main: master, controller

Subtasks: parallel subroutines, slaves

Avoid storage conflicts

↓ communication, synchronization

Don’t sacrifice science for speed


Practical Aspects of Message Passing; Don’t Do It

More Processors = More Challenge

Only the most numerically intensive codes are worth ‖

Legacy codes often Fortran90

Rewrite (N months) vs modify serial (∼70%)?

Steep learning curve, failures, hard debugging

Preconditions: run often, for days, little change

Need higher resolution, more bodies

Problem affects parallelism: data use, problem structure

Perfectly (embarrassingly) parallel: (MC) repeats

Fully synchronous: data ‖ (MD), tightly coupled

Loosely synchronous: (groundwater diffusion)

Pipeline parallel: (data → images → animations)


High-Level View of Message Passing

4 Simple Communication Commands

[Figure: message-passing timeline — master creates two slaves; master, slave 1, and slave 2 alternate compute, send, and receive steps over time]

Simple basics

C, Fortran + 4 communication commands

send: named message

receive: any sender

myid: processor ID

numnodes: number of nodes (a minimal sketch follows below)
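A minimal MPI sketch in C of the four operations just listed (the master/slave division mirrors the figure; the payload and tag values are illustrative):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int myid, numnodes;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* myid: processor ID   */
        MPI_Comm_size(MPI_COMM_WORLD, &numnodes); /* numnodes: # of nodes */

        if (myid == 0) {                          /* master */
            double work = 3.14;                   /* illustrative payload */
            for (int node = 1; node < numnodes; node++)
                MPI_Send(&work, 1, MPI_DOUBLE, node, 0,  /* tagged send   */
                         MPI_COMM_WORLD);
        } else {                                  /* slaves */
            double work;
            MPI_Recv(&work, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0, /* any sender */
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("node %d of %d got %g\n", myid, numnodes, work);
        }
        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 4 a.out.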


‖ MP: What Can Go Wrong?

Hardware Communication = Problematic


Task cooperation, division

Correct data division

Many low-level details

Distributed error messages

Messages in wrong order

Race conditions: order-dependent results

Deadlock: wait forever (see the sketch below)
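A sketch (mine, assuming exactly two MPI processes) of the deadlock bullet: both ranks block in MPI_Recv before either reaches its MPI_Send, so each waits forever; swapping the call order on one rank, or using non-blocking calls, breaks the cycle.

    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int myid, in, out = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        int partner = 1 - myid;               /* assumes exactly 2 ranks */

        MPI_Recv(&in, 1, MPI_INT, partner, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);          /* both ranks block here...  */
        MPI_Send(&out, 1, MPI_INT, partner, 0,
                 MPI_COMM_WORLD);             /* ...so this is never reached */

        MPI_Finalize();
        return 0;
    }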


Conclude: IBM Blue Gene = ‖ by Committee

Performance/watt

On- and off-chip memory

2-core CPU: 1 core computes, 1 communicates

65,536 (2^16) nodes

Peak = 360 teraflops (tera = 10^12)

Medium-speed CPU: 5.6 Gflops (runs cool)

512 chips/card, 16 cards/board

Control: MPI
