Analyzing Parallel Performance
Intel Software College
Introduction to Parallel Programming – Part 6
Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Objectives
At the end of this module, you should be able to
Define speedup and efficiency
Use Amdahl’s Law to predict maximum speedup
Use the Karp-Flatt metric to analyze parallel program performance and to predict speedup with additional processors
Speedup
Speedup is the ratio between sequential execution time and parallel execution time
For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3
Speedup curves look like this
[Figure: speedup versus number of processors, with the line y = x for reference]
Efficiency
Efficiency: a measure of processor utilization, computed as speedup divided by the number of processors
Example: a program achieves a speedup of 3 on 4 CPUs, so its efficiency is 3 / 4 = 75%
Efficiency curves look like this
[Figure: efficiency versus number of processors, with the line y = 1.0 for reference]
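The definitions of speedup and efficiency can be checked with a few lines of Python. This is a minimal sketch: the function names `speedup` and `efficiency` are illustrative, not from the slides, and the numbers are the ones used in the examples (6 seconds sequential, 2 seconds parallel, and a speedup of 3 on 4 CPUs).

```python
def speedup(sequential_time, parallel_time):
    """Ratio of sequential execution time to parallel execution time."""
    return sequential_time / parallel_time


def efficiency(speedup_value, processors):
    """Speedup divided by the number of processors."""
    return speedup_value / processors


s = speedup(6.0, 2.0)   # 6 s sequential, 2 s parallel
e = efficiency(s, 4)    # the same speedup achieved on 4 CPUs
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```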
Idea Behind Amdahl’s Law
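[Figure: execution time versus number of processors, for 1 to 5 processors. Each bar has two parts: f, the portion of computation that will be performed sequentially, and the portion of computation that will be executed in parallel, which shrinks from 1-f to (1-f)/2, (1-f)/3, (1-f)/4, and (1-f)/5]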
Derivation of Amdahl’s Law
Speedup is the ratio of execution time on 1 processor to execution time on p processors
Execution time on 1 processor is f + (1-f), where f is the fraction of the computation performed sequentially
Execution time on p processors is at least f + (1-f)/p
$$\text{Speedup} \le \frac{f + (1-f)}{f + (1-f)/p} = \frac{1}{f + (1-f)/p}$$
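The bound derived above, 1 / (f + (1-f)/p), can be evaluated directly. The sketch below is illustrative only: the function name `amdahl_speedup` and the sample value f = 0.1 are assumptions chosen for the example, not values from the slides.

```python
def amdahl_speedup(f, p):
    """Upper bound on speedup when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


# Hypothetical program in which 10% of the computation is sequential.
for p in (1, 2, 4, 8, 16):
    print(f"p = {p:2d}: speedup <= {amdahl_speedup(0.1, p):.2f}")
```

As p grows, the (1-f)/p term shrinks toward zero, so the bound approaches 1/f no matter how many processors are added.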
Amdahl’s Law Is Too Optimistic
Amdahl’s Law ignores parallel processing overhead
Examples of this overhead include time spent creating and terminating threads
Parallel processing overhead is usually an increasing function of the number of processors
Graph with Parallel Overhead Added
[Figure: execution time versus number of processors; parallel overhead increases with the number of processors]
Other Optimistic Assumptions
Amdahl’s Law assumes that the computation divides evenly among the processors
In reality, the amount of work does not divide evenly among the processors
Processor waiting time is another form of overhead
[Figure: per-processor timelines between task start and task completion, showing working time and waiting time]
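A small sketch of how waiting time arises from uneven work. The per-processor working times below are hypothetical, and the helper name `imbalance` is an assumption made for this illustration. The parallel region finishes only when the slowest processor finishes, so every other processor waits for the difference.

```python
def imbalance(work_times):
    """Return (finish time, per-processor waiting times, perfectly balanced time)."""
    finish = max(work_times)                     # region ends with the slowest processor
    waiting = [finish - w for w in work_times]   # idle time on each processor
    balanced = sum(work_times) / len(work_times) # time if work divided evenly
    return finish, waiting, balanced


# Hypothetical working times in seconds on 4 processors.
finish, waiting, balanced = imbalance([4.0, 3.0, 5.0, 2.0])
print(f"finish = {finish} s, balanced = {balanced} s")
print(f"waiting per processor = {waiting}, total = {sum(waiting)} s")
```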
Graph with Workload Imbalance Added
[Figure: execution time versus number of processors, showing time lost due to workload imbalance]
More General Speedup Formula
ψ(n,p): Speedup for problem of size n on p CPUs
σ(n): Time spent in sequential portion of code for problem of size n
φ(n): Time spent in parallelizable portion of code for problem of size n
κ(n,p): Parallel overhead
$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
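The general formula bounds speedup by (sequential time + parallelizable time) divided by (sequential time + parallelizable time / p + parallel overhead). A minimal sketch of that bound follows; the function name `speedup_bound`, its parameter names, and the timings are hypothetical values chosen for illustration.

```python
def speedup_bound(sigma, phi, kappa, p):
    """Bound on speedup: (sigma + phi) / (sigma + phi / p + kappa).

    sigma: time in the sequential portion
    phi:   time in the parallelizable portion
    kappa: parallel overhead on p processors
    """
    return (sigma + phi) / (sigma + phi / p + kappa)


# Hypothetical timings in seconds: 1 s sequential, 9 s parallelizable, 4 CPUs.
print(f"no overhead:    {speedup_bound(1.0, 9.0, 0.0, 4):.2f}")
print(f"0.5 s overhead: {speedup_bound(1.0, 9.0, 0.5, 4):.2f}")
```

With the overhead set to zero the expression reduces to Amdahl's Law with f = sigma / (sigma + phi).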
Amdahl’s Law: Maximum Speedup
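$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
The parallel overhead term κ(n,p) is set to 0
The term φ(n)/p assumes parallel work divides perfectly among available CPUs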
The Amdahl Effect
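$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
As n → ∞, the terms involving the parallelizable time φ(n) dominate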
Speedup is an increasing function of problem size
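The Amdahl effect can be illustrated numerically. The cost functions below are entirely hypothetical, picked so that the parallelizable time grows faster with n than the sequential time or the overhead; they are not taken from the slides or from any measured program.

```python
import math


def sigma(n):
    """Hypothetical sequential time: grows linearly with n."""
    return 1000 + n


def phi(n):
    """Hypothetical parallelizable time: grows quadratically with n."""
    return n ** 2 / 100


def kappa(n, p):
    """Hypothetical parallel overhead: grows with log p and slowly with n."""
    return 100 * math.log2(p) + n / 10


def speedup_bound(n, p):
    """General speedup bound evaluated with the cost functions above."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: speedup on 8 CPUs <= {speedup_bound(n, 8):.2f}")
```

Under these assumed cost functions the bound moves closer to 8 as n grows, which is the behaviour the slide describes.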
Illustration of the Amdahl Effect
[Figure: speedup versus number of processors for n = 1,000, n = 10,000, and n = 100,000, compared with linear speedup]
Using Amdahl’s Law
Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha, which we can execute in parallel
What would be maximum speedup on 2 processors?
$$\text{Speedup} \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$
New execution time ≥ 5 sec / 1.67 = 3 seconds
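The same calculation, starting from the profile data, can be sketched in Python. The helper name `min_parallel_time` is hypothetical; the inputs are the figures from the example (5 seconds total, 80% of the time in the parallelizable function, 2 processors).

```python
def min_parallel_time(sequential_time, parallel_fraction, p):
    """Return (lower bound on parallel time, maximum speedup) from Amdahl's Law."""
    f = 1.0 - parallel_fraction                  # sequential fraction
    max_speedup = 1.0 / (f + (1.0 - f) / p)
    return sequential_time / max_speedup, max_speedup


t_min, s_max = min_parallel_time(5.0, 0.80, 2)
print(f"max speedup = {s_max:.2f}, execution time >= {t_min:.1f} s")
```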
The Karp-Flatt Metric
Suppose we benchmark a parallel program and get these speedup figures
Why is efficiency dropping?
How much speedup could we expect on 8 processors?
Processors  Speedup  Efficiency
2           1.5      75%
3           1.8      60%
4           2.0      50%
Deriving the Karp-Flatt Metric
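$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
The denominator represents parallel execution time
One processor does sequential code; others idle
All processors incur overhead time
“Wasted time” = (p-1)σ(n) + pκ(n,p)
Experimentally determined serial fraction e = “wasted time” divided by (p-1) times the sequential execution time σ(n) + φ(n)
$$e = \frac{(p-1)\,\sigma(n) + p\,\kappa(n,p)}{(p-1)\,\bigl(\sigma(n) + \varphi(n)\bigr)}$$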
Karp-Flatt Metric
The experimentally determined serial fraction e is a function of speedup ψ and the number of processors p
$$e = \frac{1/\psi - 1/p}{1 - 1/p}$$
We can use e to determine whether efficiency decreases are due to
Sequential component of computation
Increases in overhead
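The step from "wasted time" to a formula in ψ and p is short. A sketch of the algebra follows. The names T_1 and T_p are introduced here for brevity and are not in the slides: T_1 is the sequential execution time and T_p the parallel execution time, with σ(n) the sequential time, φ(n) the parallelizable time, κ(n,p) the parallel overhead, and ψ = T_1 / T_p the measured speedup.

$$
\begin{aligned}
T_1 &= \sigma(n) + \varphi(n), \qquad T_p = \sigma(n) + \varphi(n)/p + \kappa(n,p) \\
\text{wasted time} &= (p-1)\,\sigma(n) + p\,\kappa(n,p) = p\,T_p - T_1 \\
e &= \frac{p\,T_p - T_1}{(p-1)\,T_1} = \frac{p/\psi - 1}{p - 1} = \frac{1/\psi - 1/p}{1 - 1/p}
\end{aligned}
$$

The second line follows by expanding p T_p and cancelling φ(n); the last step divides numerator and denominator by p.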
How to Interpret “e”
If “e” is constant as the number of processors increases, then speedup is constrained by the sequential component of the computation
If “e” is increasing as the number of processors increases, then speedup is constrained by parallel overhead, such as
Thread creation/termination time
Contention for shared data structures
Cache-related inefficiencies
Often a combination of the two factors
Going Back to Our Example
Processors  Speedup  Efficiency  e
2           1.5      75%         0.33
3           1.8      60%         0.33
4           2.0      50%         0.33
In this case, speedup is constrained by the relatively large amount of time spent in sequential code
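The e column can be reproduced from the speedup column using e = (1/speedup - 1/p) / (1 - 1/p). A minimal sketch, in which the function name `karp_flatt` is illustrative and the benchmark figures are the ones in the table:

```python
def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e for a measured speedup on p CPUs."""
    if p < 2:
        raise ValueError("p must be at least 2")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


benchmarks = [(2, 1.5), (3, 1.8), (4, 2.0)]   # (processors, speedup)
for p, s in benchmarks:
    print(f"p = {p}: efficiency = {s / p:.0%}, e = {karp_flatt(s, p):.2f}")
```

Every row gives e = 0.33, so e is constant while efficiency falls.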
Example: Rectangle Rule Program
Benchmark data from an OpenMP program computing π using the rectangle rule
We can predict speedup on 6 processors
Extrapolate e to be 0.11
Speedup would be 3.87
Processors  Speedup  Efficiency  e
2           1.87     93%         0.070
3           2.60     87%         0.078
4           3.16     79%         0.089
Speedup Prediction Formula
$$e = \frac{1/\psi - 1/p}{1 - 1/p} \quad\Longrightarrow\quad \psi = \frac{p}{e\,(p-1) + 1}$$
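Solving the Karp-Flatt formula for speedup gives p / (e(p-1) + 1), which turns an extrapolated e into a predicted speedup. The sketch below applies it to the rectangle rule example (e extrapolated to 0.11 on 6 processors); the function name `predicted_speedup` is an assumption made for illustration.

```python
def predicted_speedup(e, p):
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1.0)


# Extrapolated value from the rectangle rule example: e = 0.11 on 6 processors.
print(f"predicted speedup = {predicted_speedup(0.11, 6):.2f}")   # prints 3.87
```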
Case Study
We benchmark a sequential program and find it spends 85% of its time in functions we believe we can make parallel
We make these functions multithreaded and execute the program on a dual-core system
The parallel program achieves a speedup of 1.67 on 2 processors
If we can get access to a quad-core system, what kind of speedup should we expect?
Prediction Based on Amdahl’s Law
$$\psi \le \frac{1}{0.15 + (1 - 0.15)/4} \approx 2.76$$
Prediction Based on Karp-Flatt Metric
When p = 2, e = 0.20
We know 0.15 of e is sequential component
Rest of e (0.05) is parallel overhead
If parallel overhead increases linearly with number of processors, then it will be 0.10 when p = 3 and 0.15 when p = 4
We predict when p = 4, e = 0.30
Hence when p = 4, we predict speedup of 2.11
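Both predictions for the case study can be reproduced in a few lines. This is a sketch under stated assumptions: the function names are illustrative, e at p = 2 is rounded to two decimals before use, and "increases linearly" is modeled as overhead proportional to (p - 1), which is one reading of the slide rather than the only possible one.

```python
def amdahl_speedup(f, p):
    """Amdahl's Law bound for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)


def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def predicted_speedup(e, p):
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1.0)


f = 0.15                                   # 85% of the time is parallelizable
e2 = round(karp_flatt(1.67, 2), 2)         # measured on the dual-core system
overhead2 = e2 - f                         # part of e not explained by f
overhead4 = overhead2 * (4 - 1) / (2 - 1)  # assumed growth proportional to (p - 1)
e4 = f + overhead4

print(f"Amdahl prediction for p = 4:     {amdahl_speedup(f, 4):.2f}")
print(f"e at p = 2: {e2:.2f}, predicted e at p = 4: {e4:.2f}")
print(f"Karp-Flatt prediction for p = 4: {predicted_speedup(e4, 4):.2f}")
```

The Karp-Flatt estimate is lower than the Amdahl estimate because it carries the overhead observed at p = 2 forward instead of assuming it is zero.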
Superlinear Speedup
According to our general speedup formula, the maximum speedup a program can achieve on p processors is p
Superlinear speedup is the situation where speedup is greater than the number of processors used
It means the computational rate of the processors is faster when the parallel program is executing
Superlinear speedup usually occurs because the cache hit rate of the parallel program is higher
References
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).