Thoai Nam 

4 PP Speedup



Page 2: 4 PP Speedup


Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM (Faculty of Information Technology – Ho Chi Minh City University of Technology)

Speedup & Efficiency

Amdahl’s Law

Gustafson’s Law

Sun & Ni’s Law

Page 3: 4 PP Speedup


Speedup:

S = Time(the most efficient sequential algorithm) / Time(parallel algorithm)

Efficiency:

E = S / N, where N is the number of processors
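As a minimal sketch of these two definitions (the timings below are hypothetical, chosen only for illustration):

```python
def speedup(t_seq, t_par):
    """S = Time(most efficient sequential algorithm) / Time(parallel algorithm)."""
    return t_seq / t_par

def efficiency(t_seq, t_par, n):
    """E = S / N, where N is the number of processors."""
    return speedup(t_seq, t_par) / n

# Hypothetical example: a 100 s sequential run reduced to 20 s on 8 processors.
print(speedup(100.0, 20.0))        # 5.0
print(efficiency(100.0, 20.0, 8))  # 0.625
```

An efficiency below 1.0, as here, means the processors are not perfectly utilized.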

Page 4: 4 PP Speedup


Amdahl’s Law – Fixed Problem Size (1)

The main objective is to produce the results as soon as possible
– e.g. video compression, computer graphics, VLSI routing, etc.

Implications
– The upper bound on speedup is 1/α, where α is the sequential fraction
– Make the sequential bottleneck as small as possible
– Optimize the common case

Modified Amdahl’s law for fixed problem size, including the overhead

Page 5: 4 PP Speedup


Amdahl’s Law – Fixed Problem Size (2)

[Figure: T(1) = Ts + Tp on one processor, versus T(N) on N processors P0…P9: the sequential part Ts runs on a single processor while the parallel part Tp is divided among all N.]

Ts = αT(1) ⇒ Tp = (1-α)T(1)
T(N) = αT(1) + (1-α)T(1)/N

Page 6: 4 PP Speedup


Amdahl’s Law – Fixed Problem Size (3)

Speedup = Time(1) / Time(N)

Speedup = T(1) / (αT(1) + (1-α)T(1)/N) = 1 / (α + (1-α)/N) → 1/α as N → ∞
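This bound can be sketched numerically (the serial fraction α = 0.1 used below is an illustrative assumption):

```python
def amdahl_speedup(alpha, n):
    """Speedup = 1 / (alpha + (1 - alpha) / n), for serial fraction alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

# With alpha = 0.1 the speedup can never exceed 1/alpha = 10,
# no matter how many processors are added.
print(amdahl_speedup(0.1, 10))     # about 5.26
print(amdahl_speedup(0.1, 10000))  # about 9.99, approaching the bound 10
```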

Page 7: 4 PP Speedup


Speedup = T(1) / (αT(1) + (1-α)T(1)/N + T_overhead)
        = 1 / (α + (1-α)/N + T_overhead/T(1))
        → 1 / (α + T_overhead/T(1)) as N → ∞

The overhead includes parallelism and interaction overheads.
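A sketch of the overhead-adjusted bound; the 5% overhead fraction below is an illustrative assumption (in practice the overhead usually grows with N):

```python
def amdahl_speedup_overhead(alpha, n, overhead_frac):
    """Speedup = 1 / (alpha + (1 - alpha)/n + T_overhead / T(1)).

    overhead_frac is the total parallelism and interaction overhead
    expressed as a fraction of the sequential time T(1); here it is
    assumed constant for simplicity.
    """
    return 1.0 / (alpha + (1.0 - alpha) / n + overhead_frac)

# A 5% overhead lowers the asymptotic bound from 1/0.1 = 10 to 1/0.15 ≈ 6.7.
print(amdahl_speedup_overhead(0.1, 10000, 0.05))
```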

Page 8: 4 PP Speedup


Gustafson’s Law – Fixed Time (1)

User wants more accurate results within a time limit

– Execution time is fixed as the system scales
– e.g. FEM for structural analysis, FDM for fluid dynamics

Properties of a work metric

– Easy to measure

– Architecture independent

– Easy to model with an analytical expression

– No additional experiment to measure the work

– The measure of work should scale linearly with the sequential time complexity of the algorithm

The time-constrained model seems to be the most generally viable!

Page 9: 4 PP Speedup


Gustafson’s Law – Fixed Time (2)

[Figure: fixed-time scaling — the sequential work Ws stays on one processor (P0) while the parallel work is scaled out across processors P0…P9; W(N) is the total work done on N processors.]

α = Ws / W(N)
W(N) = αW(N) + (1-α)W(N)
⇒ W(1) = αW(N) + (1-α)W(N)·N

Page 10: 4 PP Speedup


Gustafson’s Law – Fixed Time without overhead

Time = Work · k
W(N) = W

Speedup = T(1) / T(N) = W(1)·k / (W(N)·k) = (αW + (1-α)·N·W) / W = α + (1-α)N
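A sketch of the scaled (fixed-time) speedup; α = 0.1 is again an illustrative assumption:

```python
def gustafson_speedup(alpha, n):
    """Scaled speedup = alpha + (1 - alpha) * n, for serial fraction alpha."""
    return alpha + (1.0 - alpha) * n

# Unlike Amdahl's fixed-size bound, the scaled speedup grows without
# limit as processors (and the problem size) grow.
print(gustafson_speedup(0.1, 100))  # 90.1
```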

Page 11: 4 PP Speedup


Gustafson’s Law – Fixed Time with overhead

W(N) = W + W0

Speedup = T(1) / T(N) = W(1)·k / (W(N)·k)
        = (αW + (1-α)·N·W) / (W + W0)
        = (α + (1-α)N) / (1 + W0/W)
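The overhead-adjusted form can be sketched the same way; the 10% overhead ratio W0/W below is an illustrative assumption:

```python
def gustafson_speedup_overhead(alpha, n, w0_over_w):
    """Scaled speedup = (alpha + (1 - alpha)*n) / (1 + W0/W)."""
    return (alpha + (1.0 - alpha) * n) / (1.0 + w0_over_w)

# A 10% work overhead (W0/W = 0.1) scales the whole speedup curve down.
print(gustafson_speedup_overhead(0.1, 100, 0.1))  # 90.1 / 1.1, about 81.9
```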

Page 12: 4 PP Speedup


Sun and Ni’s Law – Fixed Memory (1)

Scale to the largest possible solution limited by the memory space; equivalently, fix memory usage per processor.

Speedup:
– Time(1)/Time(N) for the scaled-up problem is not appropriate.
– For a simple profile, G(N) is the increase of parallel workload as the memory capacity increases N times.

Page 13: 4 PP Speedup


Sun and Ni’s Law – Fixed Memory (2)

Speedup_MC = (α + (1-α)G(N)) / (α + (1-α)G(N)/N)

W = αW + (1-α)W

Let M be the memory capacity of a single node.
With N nodes:
– the increased memory is NM
– the scaled work: W* = αW + (1-α)G(N)W
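A sketch of the memory-bounded speedup; the growth function G(n) = n^1.5 below is an illustrative assumption, standing in for an application whose computation grows faster than its memory:

```python
def memory_bounded_speedup(alpha, n, G):
    """Sun & Ni: S_MC = (alpha + (1-alpha)*G(n)) / (alpha + (1-alpha)*G(n)/n).

    G is the workload-growth function: G(n) is the factor by which the
    parallel workload grows when memory capacity increases n times.
    """
    g = G(n)
    return (alpha + (1.0 - alpha) * g) / (alpha + (1.0 - alpha) * g / n)

# Illustrative assumption: workload grows as G(n) = n**1.5.
print(memory_bounded_speedup(0.1, 100, lambda n: n ** 1.5))
```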

Page 14: 4 PP Speedup


Sun and Ni’s Law – Fixed Memory (3)

Definition:
A function g is a homomorphism if, for any real number c and variable x, g(cx) = g(c)·g(x).

Theorem:
If W = g(M) for some homomorphism function g, then, with all data being shared by all available processors, the simplified memory-bounded speedup is

S*_N = (W_1 + g(N)·W_N) / (W_1 + g(N)·W_N/N) = (α + (1-α)G(N)) / (α + (1-α)G(N)/N)

Page 15: 4 PP Speedup


Sun and Ni’s Law – Fixed Memory (4)

Proof:
Let the memory requirement of W_N be M: W_N = g(M). M is the memory requirement when 1 node is available.
With N nodes available, the memory capacity will increase to NM.
Using all of the available memory, the scaled parallel portion is:

W*_N = g(NM) = g(N)·g(M) = g(N)·W_N

S*_N = (W_1 + W*_N) / (W_1 + W*_N/N) = (W_1 + g(N)·W_N) / (W_1 + g(N)·W_N/N)

Page 16: 4 PP Speedup


– When the problem size is independent of the system, the problem size is fixed: G(N) = 1 ⇒ Amdahl’s Law.
– When memory is increased N times, the workload also increases N times: G(N) = N ⇒ Gustafson’s Law.
– For most scientific and engineering applications, the computation requirement increases faster than the memory requirement: G(N) > N.

S*_N = (W_1 + G(N)·W_N) / (W_1 + G(N)·W_N/N)
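The two limiting cases above can be checked numerically; α = 0.1 and N = 100 are illustrative values:

```python
def s_mc(alpha, n, g_of_n):
    """Memory-bounded speedup with workload growth factor g_of_n = G(N)."""
    return (alpha + (1 - alpha) * g_of_n) / (alpha + (1 - alpha) * g_of_n / n)

alpha, n = 0.1, 100  # illustrative values

# G(N) = 1: fixed problem size, recovers Amdahl's law
assert abs(s_mc(alpha, n, 1) - 1 / (alpha + (1 - alpha) / n)) < 1e-12

# G(N) = N: workload scales with memory, recovers Gustafson's law
assert abs(s_mc(alpha, n, n) - (alpha + (1 - alpha) * n)) < 1e-12

# G(N) > N: computation grows faster than memory, giving a still higher speedup
assert s_mc(alpha, n, n ** 1.5) > s_mc(alpha, n, n)
print("all three cases verified")
```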

Page 17: 4 PP Speedup


[Figure: speedup versus number of processors (0–10), comparing linear speedup S(Linear) with typical measured speedup S(Normal).]

Page 18: 4 PP Speedup


Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down! This can be due to a poor choice of algorithm or to poor coding.

The best possible speedup is linear, i.e. proportional to the number of processors: T(N) = T(1)/N, where N is the number of processors and T(1) is the time for a serial run.

A code that continues to speed up reasonably close to linearly as the number of processors increases is said to be scalable. Many codes scale up to some number of processors, but adding more processors then brings no improvement. Very few, if any, codes are indefinitely scalable.

Page 19: 4 PP Speedup


Software overhead
Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation (e.g. there may be additional index calculations necessitated by the manner in which data are "split up" among processors); i.e. there are generally more lines of code to be executed in the parallel program than in the sequential program.

Load balancing

Communication overhead