Upload
allyson-watkins
View
213
Download
0
Embed Size (px)
DESCRIPTION
Multi-Level Computing Architecture and Paradigm
Citation preview
Speedup for Multi-Level Parallel Computing
School of Computer Engineering
Nanyang Technological University
21st May 2012
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
OutLine
• Background & Motivation• Multi-Level Parallel Speedup• Evaluation• Conclusion
Multi-Level Computing Architectureand Paradigm
Multi-Level Computing Architectureand Paradigm• MPI+OpenMP • MPI+CUDA• MPI+OpenMP+CUDA …..
Multi-Level Parallel Computing Model LmL3L2L1
Notes: Sequential Part Parallel Part
PE2,2
PE1,1 PE
2,1
PE3,1
PE3,2
PE3,3
PE3,4
PE3,5
PE3,6
PE3,7
PE3,8
Parallel Speedup
• Definition
• Classification
Absolute Speedup
Relative Speedup
SequentialExecutionTimeSpeedupParallelExecutionTime
=
BestSequentialALGExecutionTimeSpeedupParallelALGExecutionTime
=
ParallelALGSequentialExecutionTimeSpeedupParallelALGExecutionTime
=
Relative Speedup Model
• Fixed-size Speedup
Amdahl’s Law
• Fixed-time Speedup
Gustafson’s Law
1
1
sequentialTimeSpeedupparallelTime
paa
= =- +
1 11
sequentialTime pSpeedup ppparallelTimep
a a a aaa
- += = = - +- +
Motivation Example—NAS Benchmark (MPI+OpenMP)
Motivation Example—NAS Benchmark (MPI+OpenMP)
Amdahl’s Law is UNSUITABLE for Multi-Level Parallel Computing
OutLine
• Background & Motivation• Multi-Level Parallel Speedup• Evaluation• Conclusion
E-Amdahl’s Law
• Awareness of Different Grained-Level Parallelism
1 ( )( )1 ( )( )
( )1 (1 ).( )1 ( )( ) ( 1)
i mf mf mp m
sp ii m
f if ip i sp i
ìïï =ïïï - +ïïï=íïï £ <ïïï - +ïï +ïî
LmL3L2L1
Notes:
Sequential Part
Parallel Part
PE2,2
PE1,1 PE
2,1
PE3,1
PE3,2
PE3,3
PE3,4
PE3,5
PE3,6
PE3,7
PE3,8
E-Amdahl’s Law
• Two-Level Parallelism Speedup Model (MPI+OpenMP)
where is the parallel fraction of coarse-grained (MPI-level) parallelism. is the parallel fraction of fine-grained (OpenMP-level) parallelism. is the number of processes spawned. is the number of threads spawned per process.
1( , , , )(1 )
1
sp p t
tp
a b ba ba
=- +
- +
abpt
E-Gustafson’s Law
• Awareness of Different Grained-Level Parallelism
1 ( ) ( ) ( ) ( )( )
1 ( ) ( ) ( ) ( 1) (1 ).f m f m p m i m
sp if i f i p i sp i i m
ì - + =ïï=íï - + + £ <ïî
LmL3L2L1
Notes:
Sequential Part
Parallel Part
PE2,2
PE1,1 PE
2,1
PE3,1
PE3,2
PE3,3
PE3,4
PE3,5
PE3,6
PE3,7
PE3,8
OutLine
• Background & Motivation• Multi-Level Parallel Speedup• Evaluation• Conclusion
Experiment Setup
• Platform and Configuration
A linux cluster consisting of eight computing nodes each with two quad-core chips Configuration: One thread per CPU core
• Benchmarks
NAS Parallel Benchmark (NPB) Multi-Zone (MZ) Version: BT-MZ (Unbalanced Workload Partitioning) SP-MZ (balanced Workload Partitioning) LU-MZ (balanced Workload Partitioning)
Performance Prediction
Prediction Result Comparison
OutLine
• Background & Motivation• Multi-Level Parallel Speedup• Evaluation• Conclusion
Conclusion
• Traditional speedup models are unsuitable for multi-level parallelism
– Unable to be awareness of different granularities of parallelism for multi-level parallel computing.
• Multi-level Parallelism Model
– A guidance model for multi-level optimization.– A prediction model for multi-level parallelism.
Argument Estimation
Speedup Under E-Amdahl’s Law
Speedup Under E-Gustafson’s Law