Introduction to Parallel Processing Dr. Guy Tel-Zur Version 26-10-2014

Source: tel-zur.net/teaching/bgu/pp/lecture01_part1.pdf



Talk Outline

• Motivation
• Basic terms
• S/W. Methods of Parallelization
• Examples
• Profiling, Benchmarking and Performance Tuning
• H/W
• Supercomputers
• HTC and Condor
• Grid Computing and Cloud Computing
• Future Trends

A Definition from the Oxford Dictionary of Science:

A technique that allows more than one process – stream of activity – to be running at any given moment in a computer system, hence processes can be executed in parallel. This means that two or more processors are active among a group of processes at any instant.

Can We Multitask?

http://news.stanford.edu/news/2009/august24/multitask-research-study-082409.html

• Motivation
• Basic terms
• Parallelization methods
• Examples
• Profiling, Benchmarking and Performance Tuning
• Common H/W
• Supercomputers
• HTC and Condor
• The Grid
• Future trends

Source: https://download.nap.edu/catalog.php?record_id=13427

Parallel Processing

• Parallel Computing
• Embarrassingly Parallel
• Parallel File System
• Parallel Visualization
• Supercomputing
• Many Cores
• Multi Cores
• Clusters
• Accelerators
• SMP
• Farming

Image courtesy of Ioan Raicu, University of Chicago. Source: iSGTW Opinion, “Many Task Computing: Bridging the performance-throughput gap”, January 28, 2009.

The need for Parallel Processing

• Get the solution faster and/or solve a bigger problem
• Other considerations… (for and against)
  – Power -> MultiCores
• Serial processor limits

DEMO:
N = input('Enter dimension: ')
A = rand(N);
B = rand(N);
tic
C = A*B;
toc

Memory limit

>> n_size
Enter dimension: 10
Elapsed time is 0.101245 seconds.
>> n_size
Enter dimension: 100
Elapsed time is 0.119061 seconds.
>> n_size
Enter dimension: 1000
Elapsed time is 0.146440 seconds.

>> n_size
Enter dimension: 10000
Error using rand
Out of memory. Type HELP MEMORY for your options.

Error in n_size (line 2)
A=rand(N);
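The out-of-memory failure at N = 10000 follows from simple arithmetic. The sketch below is in Python rather than the MATLAB of the demo, and it assumes dense double-precision matrices (8 bytes per element) with A, B and C all held in memory and no temporaries. The helper name `matmul_memory_bytes` is hypothetical.

```python
def matmul_memory_bytes(n, bytes_per_element=8, matrices=3):
    """Memory needed to hold A, B and C = A*B as dense n-by-n arrays.

    Assumes 8-byte doubles and no temporaries; a real run needs somewhat more.
    """
    return matrices * n * n * bytes_per_element


for n in (10, 100, 1000, 10000):
    gib = matmul_memory_bytes(n) / 2**30
    print(f"N = {n:>5}: about {gib:.6f} GiB")
```

Under these assumptions each matrix at N = 10000 takes 800 MB on its own, and the three together need about 2.4 GB. In the session above the error is already raised while allocating the first matrix.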


Demo… (Qt Octave)

Why Parallel Processing

• The universe is inherently parallel, so parallel models fit it best.

Computational biology, remote sensing, weather forecasting

The Demand for Computational Speed

There is a continual demand for greater computational speed than computer systems can currently deliver. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a “reasonable” time period.


Exercise

• In a galaxy there are 10^11 stars

• Estimate the computing time for 100 iterations assuming O(N^2) interactions on a 1GFLOPS computer

Solution

• For 10^11 stars there are 10^22 interactions
• ×100 iterations: 10^24 operations
• Therefore the computing time:

$$t = \frac{10^{24}}{10^{9}} = 10^{15}\ \text{sec} \approx 31{,}709{,}791\ \text{years}$$

• Conclusion: Improve the algorithm! Do approximations… hopefully n·log(n)
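The estimate can be checked with a few lines of Python. This is a rough sketch under stated assumptions: one floating-point operation per interaction, a 365-day year, and, for the approximate method, an operation count of exactly N·log2(N) per iteration with no constant factor. None of these constants come from the lecture beyond the 1 GFLOPS machine and the 100 iterations.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def compute_time_seconds(ops_per_iteration, iterations=100, flops=1e9):
    """Run time assuming one floating-point operation per interaction."""
    return ops_per_iteration * iterations / flops


n = 1e11  # stars in the galaxy

direct = compute_time_seconds(n ** 2)            # O(N^2) pairwise interactions
approx = compute_time_seconds(n * math.log2(n))  # O(N log N) approximation

print(f"O(N^2):     {direct:.1e} s = {direct / SECONDS_PER_YEAR:,.0f} years")
print(f"O(N log N): {approx:.1e} s = {approx / 86400:.1f} days")
```

Under these assumptions the direct method needs tens of millions of years, while an N·log(N) method finishes in a few days on the same machine, which is why the conclusion is to improve the algorithm first.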


Performance units

FLOPs = Floating Point Operations per second

• ZettaFLOPs (10^21): 2032?
• ExaFLOPs (10^18): 2020±2?
• PetaFLOPs (10^15): 2008
• TeraFLOPs (10^12): 1997
• GigaFLOPs (10^9)
• MegaFLOPs (10^6)

Lego Turing machine: 0.001 FLOPs

http://rubens.ens-lyon.fr/

Large Memory Requirements

Use parallel computing for executing larger problems which require more memory than exists on a single computer.

2004 Japan’s Earth Simulator (35TFLOPS)

2011 Japan’s K Computer (8.2PF)

An Aurora simulation


Source: SciDAC Review, Number 16, 2010


Molecular Dynamics

Source: SciDAC Review, Number 16, 2010

Other considerations

• Development cost
  – Difficult to program and debug
  – TCO, ROI…

A news item to reinforce the motivation, for anyone not yet convinced of the importance of the field...

24/9/2010


Basic terms

• Buzzwords
• Flynn’s taxonomy
• Speedup and Efficiency
• Amdahl’s Law
• Load Imbalance

Buzzwords

Farming - Embarrassingly parallel

Parallel Computing - simultaneous use of multiple processors

Symmetric Multiprocessing (SMP) - a single address space.

Cluster Computing - a combination of commodity units.

Supercomputing - Use of the fastest, biggest machines to solve large problems.

Michael Flynn

See the paper “Some Computer Organizations and Their Effectiveness”, available in IEEE Xplore.


Flynn’s taxonomy

• single-instruction single-data streams (SISD)

• single-instruction multiple-data streams (SIMD)

• multiple-instruction single-data streams (MISD)

• multiple-instruction multiple-data streams (MIMD), which includes SPMD (single program, multiple data)

Source: http://en.wikipedia.org/wiki/Flynn%27s_taxonomy

“Time” Terms

Serial time, $t_s$ = time of the best serial (1 processor) algorithm.

Parallel time, $t_P$ = time of the parallel algorithm + architecture to solve the problem using p processors.

Note: $t_P \le t_s$ but $t_{P=1} \ge t_s$; many times we assume $t_1 \approx t_s$.

Very important basic concepts!

• Speedup: $t_s / t_P$, $0 \le \text{speedup} \le p$
• Work (cost): $p \cdot t_P$, $t_s \le W(p) \le \infty$ (number of numerical operations)
• Efficiency: $t_s / (p \cdot t_P)$, $0 \le \text{efficiency} \le 1$ (also written $w_1 / w_p$)
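The three definitions (speedup = t_s / t_P, work = p · t_P, efficiency = t_s / (p · t_P)) translate directly into code. The following is a minimal Python sketch; the function names and the sample timings (100 s serially, 30 s on 4 processors) are hypothetical and serve only as an illustration.

```python
def speedup(t_s, t_p):
    """Speedup = serial time / parallel time."""
    return t_s / t_p


def work(p, t_p):
    """Work (cost) = number of processors * parallel time."""
    return p * t_p


def efficiency(t_s, t_p, p):
    """Efficiency = serial time / work, which equals speedup / p."""
    return t_s / work(p, t_p)


# Hypothetical measurements: 100 s serially, 30 s on 4 processors.
t_s, t_p, p = 100.0, 30.0, 4
print(f"speedup    = {speedup(t_s, t_p):.2f}")
print(f"work       = {work(p, t_p):.0f} processor-seconds")
print(f"efficiency = {efficiency(t_s, t_p, p):.2f}")
```

With these sample numbers the speedup stays below p and the efficiency below 1, as the bounds require.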


Maximal Possible Speedup

Amdahl’s Law (1967)

$t_s$ = serial time (time on 1 processor), $t_p$ = parallel time, $f$ = serial fraction of the code, $n$ = number of processors.

$$t_p = f\,t_s + \frac{(1-f)\,t_s}{n} = \frac{\bigl(1+(n-1)f\bigr)\,t_s}{n}$$

$$S(n) = \frac{t_s}{t_p} = \frac{n}{1+(n-1)f}$$

Maximal Possible Efficiency

Efficiency $= t_s / (p \cdot t_P)$; $0 \le \text{efficiency} \le 1$

Amdahl’s Law

With only 5% of the computation being serial, the maximum speedup is 20:

$$\lim_{n \to \infty} S(n) = \frac{1}{f}$$
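To see how quickly the speedup saturates, Amdahl’s formula S(n) = n / (1 + (n − 1)·f), with serial fraction f and n processors, can be tabulated. A small Python sketch follows; the function name and the chosen processor counts are illustrative assumptions.

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: S(n) = n / (1 + (n - 1) * f), with serial fraction f."""
    return n / (1 + (n - 1) * f)


f = 0.05  # 5% of the computation is serial
for n in (1, 10, 100, 1000, 10**6):
    print(f"n = {n:>7}: S = {amdahl_speedup(f, n):6.2f}")
print(f"limit 1/f = {1 / f:.0f}")
```

The values approach 1/f = 20 from below and never reach it, however many processors are added.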

An Example of Amdahl’s Law

Amdahl’s Law bounds the speedup due to any improvement.
– Example: What will the speedup be if 20% of the execution time is spent in interprocessor communications, which we can improve by 10X?

S = T/T’ = 1 / [0.2/10 + 0.8] = 1/0.82 ≈ 1.22

=> Invest resources where time is spent. The slowest portion will dominate.

Amdahl’s Law and Murphy’s Law: “If any system component can damage performance, it will.”
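The example uses the general form of the law: if a fraction of the run time is made faster by some factor, the overall speedup is 1 / (fraction/factor + (1 − fraction)). A minimal Python sketch, with a hypothetical function name, evaluates it for the numbers above. Evaluating 1 / [0.2/10 + 0.8] gives about 1.22; the value 1.25 is the upper bound, reached only if the communication time is eliminated entirely.

```python
def improved_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time becomes `factor` times faster."""
    return 1.0 / (fraction / factor + (1.0 - fraction))


s = improved_speedup(0.2, 10)                # the 20% communication part, 10X faster
s_max = improved_speedup(0.2, float("inf"))  # communication made free

print(f"10X faster communication: S = {s:.3f}")
print(f"upper bound:              S = {s_max:.3f}")
```

The untouched 80% of the run time dominates the result, which is the point of the advice to invest resources where the time is spent.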

http://mprc.pku.edu.cn/courses/architecture/autumn2005/reevaluating-Amdahls-law.pdf

Communications of the ACM, May 1988, Volume 31, Number 5, pp. 532-533.

Gustafson’s Law

• $f$ is the fraction of the code that cannot be parallelized
• $t_p = f \cdot t_p + (1-f) \cdot t_p$
• $t_s = f \cdot t_p + (1-f) \cdot p \cdot t_p$
• $S = t_s / t_p = f + (1-f) \cdot p$, this is the Scaled Speedup
• $S = f + p - f \cdot p = p + (1-p) \cdot f = f + p \cdot (1-f)$
• The Scaled Speedup is linear with p!
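The linear growth of the scaled speedup S = f + (1 − f)·p is easy to see numerically. The Python sketch below prints it next to the fixed-size (Amdahl) speedup for the same f. The function names and the value f = 0.05 are illustrative assumptions, and note that f here is the serial fraction of the run time measured on the parallel system.

```python
def gustafson_speedup(f, p):
    """Scaled speedup S = f + (1 - f) * p, with serial fraction f on p processors."""
    return f + (1 - f) * p


def amdahl_speedup(f, p):
    """Fixed-size speedup p / (1 + (p - 1) * f), shown only for comparison."""
    return p / (1 + (p - 1) * f)


f = 0.05
for p in (1, 10, 100, 1024):
    print(f"p = {p:>4}: scaled = {gustafson_speedup(f, p):8.2f}, "
          f"fixed-size = {amdahl_speedup(f, p):6.2f}")
```

The scaled column keeps growing with p while the fixed-size column flattens out, because the two laws answer different questions: a bigger problem in the same time versus the same problem in less time.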

http://www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html

Amdahl, G.M. Validity of the single-processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings vol. 30 (Atlantic City, N.J., Apr. 18-20). AFIPS Press, Reston, Va., 1967, pp. 483-485.

The computation time is held constant (instead of the problem size).

With an increasing number of CPUs, solve a bigger problem and get better results in the same time.

http://www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html

Benner, R.E., Gustafson, J.L., and Montry, G.R., "Development and analysis of scientific application programs on a 1024-processor hypercube," SAND 88-0317, Sandia National Laboratories, Feb. 1988.


• Amdahl’s – fixed problem size (different run time)

• Gustafson’s – fixed run time (different problem size)

Computation/Communication Ratio

$$\frac{\text{Computation time}}{\text{Communication time}} = \frac{t_{comp}}{t_{comm}}$$


Overhead


Load Imbalance

• Static / Dynamic
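To make the static/dynamic distinction concrete, here is a minimal Python sketch that is not taken from the lecture. It assumes a list of independent tasks with known run times and compares two hypothetical schedulers: a static one that hands each worker a fixed contiguous block in advance, and a dynamic one that gives the next task to whichever worker becomes free first. The task times are made up.

```python
import heapq


def static_makespan(task_times, p):
    """Block partitioning: worker i gets a contiguous chunk fixed in advance."""
    chunk = -(-len(task_times) // p)  # ceiling division
    loads = [sum(task_times[i * chunk:(i + 1) * chunk]) for i in range(p)]
    return max(loads)


def dynamic_makespan(task_times, p):
    """Work queue: each task goes to whichever worker becomes free first."""
    finish = [0.0] * p  # min-heap of the times at which workers become free
    heapq.heapify(finish)
    for t in task_times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)


tasks = [8, 7, 6, 5, 4, 3, 2, 1]  # made-up task run times
p = 2
print("ideal  :", sum(tasks) / p)
print("static :", static_makespan(tasks, p))
print("dynamic:", dynamic_makespan(tasks, p))
```

When task times are uneven, the static split leaves one worker idle while the other finishes its block. The dynamic queue evens this out, at the price of scheduling overhead that this sketch ignores.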


Dynamic Partitioning – Domain Decomposition by Quad or Oct Trees
