4. Assessing and Understanding Performance
Computer Architecture 4-2
4. Performance
4.1 Introduction
4.2 CPU Performance and Its Factors
4.3 Evaluating Performance
4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors
4.5 Fallacies and Pitfalls
4.6 Concluding Remarks
4.7 Historical Perspective and Further Reading
4.8 Exercises
How to measure, report, and summarize performance
Defining Performance: An Analogy
Airplane | Passenger capacity | Cruising range (miles) | Cruising speed (m.p.h.) | Passenger throughput (passengers x m.p.h.)
Boeing 777 | 375 | 4630 | 610 | 228,750
Boeing 747 | 470 | 4150 | 610 | 286,700
BAC/Sud Concorde | 132 | 4000 | 1350 | 178,200
Douglas DC-8-50 | 146 | 8720 | 544 | 79,424
Figure 4.1
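As a small illustration of the analogy, the sketch below recomputes the throughput column from capacity and speed, then picks the "fastest" plane under two different metrics. It assumes, as the numbers in the table suggest, that passenger throughput is passenger capacity times cruising speed; the dictionary and function names are hypothetical.

```python
# Figures from Figure 4.1: (passenger capacity, cruising speed in m.p.h.)
AIRPLANES = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
}


def passenger_throughput(capacity: int, speed_mph: int) -> int:
    """Passenger throughput = passenger capacity x cruising speed (passengers x m.p.h.)."""
    return capacity * speed_mph


throughput = {name: passenger_throughput(c, s) for name, (c, s) in AIRPLANES.items()}

# Which plane is "fastest" depends on the metric being used.
fastest_by_speed = max(AIRPLANES, key=lambda name: AIRPLANES[name][1])
fastest_by_throughput = max(throughput, key=lambda name: throughput[name])

print("Highest cruising speed:", fastest_by_speed)
print("Highest passenger throughput:", fastest_by_throughput)
```

The Concorde wins on speed (the time for one passenger to arrive), while the 747 wins on throughput (passengers moved per hour), which mirrors the response time vs. throughput distinction.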
4.1 Introduction
Performance of a Computer
Response time (= execution time): the time between the start and completion of a task
Throughput: the total amount of work done in a given time
Performance and execution time:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
X is n times faster than Y:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
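A minimal sketch of these two definitions in Python; the function names and the 10 s / 15 s run times are hypothetical values chosen only for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance_X / Performance_Y equals Execution time_Y / Execution time_X.
    """
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical run times: X takes 10 s and Y takes 15 s for the same task.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```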
Measuring Performance
Definitions of time
Wall-clock time = Response time = Elapsed time: the total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, etc.
CPU execution time = CPU time: the time the CPU spends computing for this task
CPU time = User CPU time + System CPU time
UNIX time command output: 90.7u 12.9s 2:39 65% (user CPU time 90.7 s, system CPU time 12.9 s, elapsed time 2 min 39 s; CPU time is 65% of elapsed time)
Definitions of performance
System performance: based on elapsed time
CPU performance: based on user CPU time
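The fields of that `time` output can be decoded with a short sketch. It assumes exactly the four-field layout shown (user CPU seconds with a `u` suffix, system CPU seconds with an `s` suffix, elapsed time as minutes:seconds, and a CPU percentage); real shells and `time` implementations format their output differently, so `parse_time_output` is a hypothetical helper, not a general parser.

```python
def parse_time_output(line: str) -> dict:
    """Parse a line laid out like '90.7u 12.9s 2:39 65%'.

    Assumption: four whitespace-separated fields, with elapsed time given
    as minutes:seconds (no hours field).
    """
    user_field, system_field, elapsed_field, percent_field = line.split()
    user_s = float(user_field.rstrip("u"))
    system_s = float(system_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed_s = int(minutes) * 60 + float(seconds)
    cpu_s = user_s + system_s  # CPU time = user CPU time + system CPU time
    return {
        "user_s": user_s,
        "system_s": system_s,
        "elapsed_s": elapsed_s,
        "cpu_s": cpu_s,
        "cpu_percent": 100 * cpu_s / elapsed_s,  # recomputed from the fields
        "reported_percent": float(percent_field.rstrip("%")),
    }


t = parse_time_output("90.7u 12.9s 2:39 65%")
print(f"CPU time {t['cpu_s']:.1f} s of {t['elapsed_s']:.0f} s elapsed "
      f"({t['cpu_percent']:.1f}%)")
```

Recomputing the percentage as (90.7 + 12.9) / 159 agrees with the reported 65% after rounding.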
CPU execution time
= CPU clock cycles x clock cycle time
= CPU clock cycles / clock rate
Example: Improving Performance (same instruction set)
Computer A: 4 GHz, 10 seconds
Computer B: ? GHz, 6 seconds
B requires 1.2 times as many clock cycles as A. What clock rate must B have?
4.2 CPU Performance and Its Factors
[Answer]
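$$
\begin{aligned}
\text{CPU time}_A &= \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A} \\
10\ \text{s} &= \frac{\text{CPU clock cycles}_A}{4 \times 10^9\ \text{cycles/s}} \\
\text{CPU clock cycles}_A &= 10\ \text{s} \times 4 \times 10^9\ \text{cycles/s} = 40 \times 10^9\ \text{cycles} \\
\text{CPU time}_B &= \frac{\text{CPU clock cycles}_B}{\text{Clock rate}_B} = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} \\
6\ \text{s} &= \frac{1.2 \times 40 \times 10^9\ \text{cycles}}{\text{Clock rate}_B} \\
\text{Clock rate}_B &= \frac{1.2 \times 40 \times 10^9\ \text{cycles}}{6\ \text{s}} = 8 \times 10^9\ \text{cycles/s} = 8\ \text{GHz}
\end{aligned}
$$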
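The same arithmetic as a small sketch; `required_clock_rate_hz` is a hypothetical helper whose parameters mirror the quantities in the example.

```python
def required_clock_rate_hz(time_a_s: float, clock_rate_a_hz: float,
                           cycle_ratio: float, target_time_b_s: float) -> float:
    """Clock rate B needs in order to finish in target_time_b_s.

    cycle_ratio = CPU clock cycles_B / CPU clock cycles_A (same instruction set).
    """
    cycles_a = time_a_s * clock_rate_a_hz  # CPU clock cycles_A = CPU time_A x clock rate_A
    cycles_b = cycle_ratio * cycles_a      # B needs cycle_ratio times as many cycles
    return cycles_b / target_time_b_s      # clock rate_B = CPU clock cycles_B / CPU time_B


rate_b = required_clock_rate_hz(10.0, 4e9, 1.2, 6.0)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```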
Hardware/Software Interface
CPU clock cycles = IC x CPI
IC (instruction count): depends on the compiler and the instruction set architecture
CPI (clock cycles per instruction): depends on the implementation
Performance equation
Execution Time = IC x CPI x clock cycle time
= (IC x CPI) / clock rate
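A sketch of the two forms of the performance equation, checking that they agree. The instruction count, CPI, and clock rate below are made-up values, not measurements.

```python
def cpu_time_with_cycle_time(ic: float, cpi: float, clock_cycle_time_s: float) -> float:
    """Execution time = IC x CPI x clock cycle time."""
    return ic * cpi * clock_cycle_time_s


def cpu_time_with_clock_rate(ic: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = (IC x CPI) / clock rate."""
    return ic * cpi / clock_rate_hz


# Hypothetical program: 10^9 instructions, average CPI of 1.5, 2 GHz clock.
ic, cpi, clock_rate_hz = 1e9, 1.5, 2e9
t1 = cpu_time_with_cycle_time(ic, cpi, 1.0 / clock_rate_hz)  # cycle time = 1 / clock rate
t2 = cpu_time_with_clock_rate(ic, cpi, clock_rate_hz)
print(f"{t1:.3f} s vs {t2:.3f} s")
```

Because the clock cycle time is the reciprocal of the clock rate, both functions describe the same quantity; which one to use depends on whether a processor is specified by its cycle time (e.g. 250 ps) or its clock rate (e.g. 4 GHz).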
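Example: Using the Performance Equation
Same instruction set architecture, same program:
Computer A: clock cycle time = 250 ps, CPI = 2.0
Computer B: clock cycle time = 500 ps, CPI = 1.2
Which is faster, and by how much?
[Answer]
Let I = instruction count for the program.
$$
\begin{aligned}
\text{CPU time}_A &= I \times \text{CPI}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps} \\
\text{CPU time}_B &= I \times \text{CPI}_B \times \text{Clock cycle time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps} \\
\frac{\text{CPU performance}_A}{\text{CPU performance}_B} &= \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2
\end{aligned}
$$
Thus, A is 1.2 times faster than B for this program.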
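The comparison can be replayed numerically. Because I is unknown, this sketch tries a few arbitrary instruction counts (hypothetical values) to show that I cancels out and the ratio stays 1.2.

```python
def cpu_time_ps(instruction_count: int, cpi: float, clock_cycle_time_ps: float) -> float:
    """CPU time = IC x CPI x clock cycle time (result in picoseconds)."""
    return instruction_count * cpi * clock_cycle_time_ps


# Both machines run the same program, so the same I is used for each.
for instructions in (1, 1_000, 1_000_000):
    time_a = cpu_time_ps(instructions, 2.0, 250)  # 500 x I ps
    time_b = cpu_time_ps(instructions, 1.2, 500)  # 600 x I ps
    ratio = time_b / time_a  # Performance_A / Performance_B
    print(f"I = {instructions}: A is {ratio:.2f} times faster than B")
```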
The Big Picture
$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Components of performance | Units of measure
CPU execution time for a program | Seconds for the program
Instruction count (IC) | Instructions executed for the program
Clock cycles per instruction (CPI) | Average clock cycles per instruction
Clock cycle time | Seconds per clock cycle
Example: Comparing Code Segments
Which will be faster? What is the CPI for each sequence?
Instruction class | CPI for the class
A | 1
B | 2
C | 3

Code sequence | Instruction count for class A | class B | class C
1 | 2 | 1 | 2
2 | 4 | 1 | 1
[Answer]
Instruction count_1 = 2 + 1 + 2 = 5 and instruction count_2 = 4 + 1 + 1 = 6
Thus sequence (1) executes fewer instructions.
CPU clock cycles_1 = 2x1 + 1x2 + 2x3 = 10 and CPU clock cycles_2 = 4x1 + 1x2 + 1x3 = 9
Thus sequence (2) is faster, even though it executes more instructions.
CPI_1 = CPU clock cycles_1 / instruction count_1 = 10 / 5 = 2.0
CPI_2 = CPU clock cycles_2 / instruction count_2 = 9 / 6 = 1.5
Sequence (2) has the lower CPI.
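A sketch that reproduces these numbers from the two tables; the dictionary layout and the `summarize` helper are illustrative choices, not part of the original example.

```python
# CPI for each instruction class, and per-class instruction counts per sequence.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts: dict) -> tuple:
    """Return (instruction count, CPU clock cycles, CPI) for one code sequence."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


results = {seq: summarize(counts) for seq, counts in SEQUENCES.items()}
for seq, (ic, cycles, cpi) in results.items():
    print(f"Sequence {seq}: IC = {ic}, cycles = {cycles}, CPI = {cpi:.1f}")
```

Fewer instructions does not imply fewer cycles: the cycle count weights each instruction by its class CPI, which is why sequence 2 wins.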
Benchmarking: the process of comparing the performance of two or more systems by measurement
Benchmark: programs specifically chosen to measure performance; a workload that the user hopes will predict the performance of the actual workload
Compiler tricks: optimizations in either the architecture or the compiler
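One way to take such measurements on a running system is sketched below in Python. It times a toy workload with `time.perf_counter()` (elapsed, wall-clock time) and `time.process_time()` (user plus system CPU time of the current process), matching the distinction between elapsed time and CPU time made in the definitions of time. The workload and the `measure` helper are hypothetical; a real benchmark would repeat runs and use representative programs.

```python
import time


def measure(workload, *args):
    """Run workload(*args) once; return (result, elapsed seconds, CPU seconds).

    time.perf_counter() approximates wall-clock (elapsed) time.
    time.process_time() is user + system CPU time of this process and
    does not advance while the process sleeps.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = workload(*args)
    cpu_s = time.process_time() - cpu_start
    elapsed_s = time.perf_counter() - wall_start
    return result, elapsed_s, cpu_s


def toy_workload(n: int) -> int:
    """Hypothetical stand-in for a benchmark: compute, then wait (like I/O)."""
    total = sum(i * i for i in range(n))
    time.sleep(0.05)  # waiting adds elapsed time but almost no CPU time
    return total


result, elapsed_s, cpu_s = measure(toy_workload, 100_000)
print(f"elapsed {elapsed_s:.3f} s, CPU {cpu_s:.3f} s")
```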
4.3 Evaluating Performance
Compiler Tricks by IBM
Difficulties with summarizing performance
A is 10 times faster than B for program 1. B is 10 times faster than A for program 2.
Total execution time: a consistent summary measure
Arithmetic mean (AM):
$$\text{AM} = \frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$
Weighted arithmetic mean:
$$\sum_{i=1}^{n} w_i \times \text{Time}_i, \quad \text{where } \sum_{i=1}^{n} w_i = 1.0$$
Execution time | Computer A | Computer B
Program 1 (seconds) | 1 | 10
Program 2 (seconds) | 1000 | 100
Total time (seconds) | 1001 | 110
Figure 4.4
Comparing and Summarizing Performance
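The sketch below applies both means to the Figure 4.4 data. The arithmetic mean is total time divided by n, so it ranks the machines exactly as total time does (B is 1001 / 110 = 9.1 times faster). The weights 0.999 and 0.001 are hypothetical, picked only to show that a weighted mean can reverse the ranking when program 1 dominates the workload.

```python
# Execution times in seconds from Figure 4.4: [program 1, program 2].
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}


def arithmetic_mean(times):
    """AM = (1/n) * sum of Time_i."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum of w_i * Time_i, where the weights must sum to 1.0."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(w * t for w, t in zip(weights, times))


am = {name: arithmetic_mean(ts) for name, ts in TIMES.items()}
print("AM:", am)
print("B is", round(am["A"] / am["B"], 1), "times faster than A by AM")

# Hypothetical weights: a workload that runs program 1 far more often.
weights = [0.999, 0.001]
wam = {name: weighted_arithmetic_mean(ts, weights) for name, ts in TIMES.items()}
print("Weighted AM:", wam)
```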
4.6 Concluding Remarks
Three design criteria:
1. High-performance design: supercomputers and high-end servers
2. Low-cost design: embedded systems
3. Cost/performance design: desktop computers
Use the execution time of real programs as the performance metric.
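$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$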