Speedup for Multi-Level Parallel Computing

Preview:

Citation preview

Speedup for Multi-Level Parallel Computing

School of Computer Engineering

Nanyang Technological University

21st May 2012

Shanjiang Tang, Bu-Sung Lee, Bingsheng He

OutLine

•  Background & Motivation •  Multi-Level Parallel Speedup •  Evaluation •  Conclusion

Multi-Level Computing Architecture and Paradigm

Multi-Level Computing Architecture and Paradigm

•  MPI+OpenMP •  MPI+CUDA •  MPI+OpenMP+CUDA …..

Multi-Level Parallel Computing Model

L Lm L3 L2 L1

LLLLLLLL

Notes: Sequential Part Parallel Part

PE2,

2

PE1,

1 PE2,

1

PE3,

1

PE3,

2

PE3,

3

PE3,

4

PE3,

5

PE3,

6

PE3,

7

PE3,

8

Parallel Speedup

•  Definition •  Classification

Ø Absolute Speedup

Ø Relative Speedup

SequentialExecutionTimeSpeedupParallelExecutionTime

=

BestSequentialALGExecutionTimeSpeedupParallelALGExecutionTime

=

ParallelALGSequentialExecutionTimeSpeedupParallelALGExecutionTime

=

Relative Speedup Model

•  Fixed-size Speedup

Ø Amdahl’s Law

where is parallel fraction workload of the program, is the

number of processors.

•  Fixed-time Speedup

Ø Gustafson’s Law

pmeparallelTiTimesequentialSpeedup

αα +−

==1

1

ppmeparallelTiTimesequentialSpeedup αα

αααα

+−=+−

+−== 111

α p

Motivation Example—NAS Benchmark (MPI+OpenMP)

Motivation Example—NAS Benchmark (MPI+OpenMP)

Amdahl’s Law is UNSUITABLE for Multi-Level Parallel Computing

OutLine

•  Background & Motivation •  Multi-Level Parallel Speedup •  Evaluation •  Conclusion

E-Amdahl’s Law

•  Awareness of Different Grained-Level Parallelism

L Lm L3 L2 L1

LLLLLLLL

Notes:

Sequential Part

Parallel Part

PE2,

2

PE1,

1 PE2,

1

PE3,

1

PE3,

2

PE3,

3

PE3,

4

PE3,

5

PE3,

6

PE3,

7

PE3,

8

⎪⎪⎪⎪

⎪⎪⎪⎪

<≤

++−

=+−

=

)1(

)1()()()(1

1

)(

)()()(1

1

)(

mi

ispipifif

mi

mpmfmf

isp

E-Amdahl’s Law

•  Two-Level Parallelism Speedup Model (MPI+OpenMP)

where is the parallel fraction of coarse-grained (MPI-level) parallelism. is the parallel fraction of fine-grained (OpenMP-level) parallelism. is the number of processes spawned. is the number of threads spawned per process.

pt

pt

tpsp)1(

1

1),,,(β

βαα

βα+−

+−

=

αβ

E-Gustafson’s Law

•  Awareness of Different Grained-Level Parallelism

L Lm L3 L2 L1

LLLLLLLL

Notes:

Sequential Part

Parallel Part

PE2,

2

PE1,

1 PE2,

1

PE3,

1

PE3,

2

PE3,

3

PE3,

4

PE3,

5

PE3,

6

PE3,

7

PE3,

8

⎩⎨⎧

<≤++−

=+−=

)1()1()()()(1)()()()(1

)(miispipififmimpmfmf

isp

OutLine

•  Background & Motivation •  Multi-Level Parallel Speedup •  Evaluation •  Conclusion

Experiment Setup

•  Platform and Configuration

Ø A linux cluster consisting of eight computing nodes each with two quad-core chips Ø Configuration: One thread per CPU core

•  Benchmarks

NAS Parallel Benchmark (NPB) Multi-Zone (MZ) Version: Ø BT-MZ (Unbalanced Workload Partitioning) Ø SP-MZ (balanced Workload Partitioning) Ø  LU-MZ (balanced Workload Partitioning)

Performance Prediction

Prediction Result Comparison

OutLine

•  Background & Motivation •  Multi-Level Parallel Speedup •  Evaluation •  Conclusion

Conclusion

•  Traditional speedup models are unsuitable for multi-level parallelism

–  Unable to be awareness of different granularities of parallelism for multi-level parallel computing.

•  Multi-level Parallelism Model

–  A guidance model for multi-level optimization. –  A prediction model for multi-level parallelism.

Argument Estimation

Speedup Under E-Amdahl’s Law

Speedup Under E-Gustafson’s Law

Recommended