EECE476: Computer Architecture Lecture 11: Understanding and Assessing Performance Chapter 4.1, 4.2 The University of British Columbia EECE 476 © 2005 Guy Lemieux


EECE476: Computer Architecture

Lecture 11: Understanding and Assessing Performance

Chapter 4.1, 4.2

The University of British Columbia EECE 476 © 2005 Guy Lemieux

2

Questions

• Why do we want high performance?

• How do you measure performance?

• Why measure it?

3

Measurement and Evaluation

Architecture is an iterative process -- searching the space of possible designs -- at all levels of computer systems

[Figure: iterative design cycle. Labels: Design, Analysis, Creativity, Cost/Performance Analysis, Good Ideas, Mediocre Ideas, Bad Ideas]

We need a way to measure performance so we can find the good ideas!!!

4

Performance Trends

[Figure: log of performance versus year (1970 to 1995) for microprocessors, minicomputers, mainframes, and supercomputers]

5

Performance! How did we obtain this?

[Figure: performance versus year]

6

How to Obtain Performance? Through Transistors?

• Source: Intel

7

How to Obtain Performance? Through Clock Speed?

[Figure: clock speed of Intel CPUs (cycles per second, log scale from 100.0E+3 to 10.0E+9) versus year, Jun-71 to Jun-04]

8

How to Obtain Performance? Through Power?

9

Performance

• How to obtain performance?
  – We can’t really answer this until we understand how to measure performance!

• How to measure performance?
  – This is a fundamental question!
  – Buying a Car:
    • Top Speed? Fuel Economy? Range? Turning radius?
  – Buying a Computer:
    • Clock Speed? Power? Battery Life? Boot-up time?

10

Airplanes! Which has Greater Performance?

Airplane          Passenger Capacity (ppl)   Range (km)   Speed (km/h)   Throughput (ppl*km/h)
Boeing 777        375                        7,450        980            367,500
Boeing 747        470                        6,680        980            460,600
Concorde          132                        6,440        2,170          286,440
Douglas DC-8-50   146                        14,030       875            127,750
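The Throughput column is simply passenger capacity multiplied by speed. The sketch below recomputes it from the table rows and shows that the "best" airplane depends on which metric you pick. The helper names (`throughput`, `best_by`) are illustrative and not part of the lecture.

```python
# Rows copied from the airplane table: (name, passengers, range_km, speed_kmh).
AIRPLANES = [
    ("Boeing 777", 375, 7450, 980),
    ("Boeing 747", 470, 6680, 980),
    ("Concorde", 132, 6440, 2170),
    ("Douglas DC-8-50", 146, 14030, 875),
]


def throughput(passengers, speed_kmh):
    """Passenger throughput in ppl*km/h (capacity times speed)."""
    return passengers * speed_kmh


def best_by(key):
    """Return the name of the airplane that maximizes key(row)."""
    return max(AIRPLANES, key=key)[0]


# Each metric crowns a different winner.
fastest = best_by(lambda row: row[3])
longest_range = best_by(lambda row: row[2])
highest_throughput = best_by(lambda row: throughput(row[1], row[3]))

if __name__ == "__main__":
    print("Fastest:", fastest)
    print("Longest range:", longest_range)
    print("Highest throughput:", highest_throughput)
```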

11

Performance: Two Fundamental Concepts

1. Throughput (aka bandwidth)
  – Total amount of work done in a given time
    • Boeing 747
    • Laundromat with many washers & dryers
    • Important for computer data centres

2. Response time (aka latency)
  – Time from start to end of a given task
    • Concorde
    • One fast, modern laundry machine at home
    • Important for personal computers

Which is more important for this course?
  – Mostly response time!
  – Better response time usually implies higher throughput (but not vice versa)

12

Defining Performance (Response Time)

Given a computer architecture X, define:
    Performance_X = 1 / ExecutionTime_X

Suppose X is “faster” than Y:
    Performance_X > Performance_Y

Implies:
    1 / ExecutionTime_X > 1 / ExecutionTime_Y

or
    ExecutionTime_Y > ExecutionTime_X

13

Relative Performance

X is n times faster than Y means:

    n = Performance_X / Performance_Y
      = ExecutionTime_Y / ExecutionTime_X

Example: how much faster is A than B?

• Machine A: 10 seconds.
• Machine B: 15 seconds.
• 15/10 = 1.5

Hence, A is 1.5 times faster than B.

Try to be clear: IMPROVE performance, don’t increase it!!!
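As a minimal sketch of the relative-performance definition, the function below divides the slower machine's execution time by the faster one's. The function name `relative_performance` is an illustrative choice; the 10 and 15 second figures are the Machine A and Machine B times from the example.

```python
def relative_performance(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y.

    n = Performance_X / Performance_Y = ExecutionTime_Y / ExecutionTime_X
    """
    return exec_time_y / exec_time_x


# Machine A takes 10 seconds, Machine B takes 15 seconds.
n = relative_performance(10.0, 15.0)

if __name__ == "__main__":
    print(f"A is {n} times faster than B")
```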

14

Measuring Execution Time

Three possible ways of measuring response time:

1. Wall-clock Time
  • Start to finish, includes everything (e.g., other programs, I/O)
  • Very non-deterministic!

2. CPU Time (System + User)
  • User Time = your program (directly)
  • System Time = in OS on behalf of your program (excludes I/O)
  • System Time is difficult to ascertain; other programs may affect it
  • Can vary greatly, depending on quality of OS!
  • Non-deterministic!

3. CPU Time (User only)
  • User's program, excluding I/O, excluding OS
  • Fairly deterministic

Which is better? Either 2 or 3 …
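The three measurements can be observed side by side. This is a rough sketch using Python's standard library, on the assumption that `time.perf_counter()` stands in for wall-clock time and that the `user` and `system` fields of `os.times()` correspond to the user and system CPU time described above. The `measure` helper and the workload are hypothetical.

```python
import os
import time


def measure(fn):
    """Run fn() once and return (wall, user, system) times in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,            # 1. wall-clock time
        cpu_after.user - cpu_before.user,    # 3. user CPU time only
        cpu_after.system - cpu_before.system,  # system part of 2.
    )


def workload():
    """Hypothetical CPU-bound task used only for illustration."""
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    wall, user, system = measure(workload)
    print(f"wall-clock: {wall:.4f} s")
    print(f"CPU (system + user): {user + system:.4f} s")
    print(f"CPU (user only): {user:.4f} s")
```

Running it several times typically shows the wall-clock figure moving around more than the user-time figure, which is the non-determinism noted above.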

15

Poor Choices for Performance Metrics

• Why is each of these metrics bad?

  – Number of instructions
    • Static instruction count
    • Dynamic instruction count
    • How much work is done in each instruction?

  – Number of instructions per second
    • MIPS: millions of instructions per second
    • MIPS: meaningless indicator of processor speed (!)

  – Number of clock cycles
  – Clock speed (clock rate)

• Taken together, we may have something here….

16

Performance Equation (1)

Simplified version:
    CPUTime = #ClockCycles * CycleTime
            = #ClockCycles / ClockRate

#ClockCycles
  • Encapsulates two things:
    – Number of instructions in a program
    – Complexity of each instruction

CycleTime = 1 / ClockRate
  • Clock Rate is the clock speed (in MHz or GHz) of the CPU
  • Cycle Time is the clock period (in ns) of the CPU
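A small worked example of the simplified equation, written as a sketch. The 2.0 GHz clock rate and the cycle count of 10 billion are made-up inputs chosen for easy arithmetic, and the function names are illustrative.

```python
def cycle_time_ns(clock_rate_hz):
    """Clock period in nanoseconds: CycleTime = 1 / ClockRate."""
    return 1e9 / clock_rate_hz


def cpu_time_s(clock_cycles, clock_rate_hz):
    """CPUTime = #ClockCycles / ClockRate, in seconds."""
    return clock_cycles / clock_rate_hz


# Hypothetical inputs: a 2.0 GHz clock and a program needing 10 billion cycles.
CLOCK_RATE_HZ = 2.0e9
CLOCK_CYCLES = 10e9

if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ns(CLOCK_RATE_HZ)} ns")
    print(f"CPU time:   {cpu_time_s(CLOCK_CYCLES, CLOCK_RATE_HZ)} s")
```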

17

Processor Speed

Which is faster?

A) 3.6 GHz Pentium 4

B) 2.0 GHz Pentium M

18

CycleTime == clock period

• 3.6 GHz Pentium 4 processor is fast!
  – 0.2778ns cycle time
  – SPECint_base2000 benchmark: 1510 (15.1 times faster than UltraSPARC)
  – http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040621-03127.html

• 2.0 GHz Pentium M processor is faster!
  – 0.5ns cycle time
  – SPECint_base2000 benchmark: 1528 (15.28 times faster than UltraSPARC)
  – http://www.spec.org/cpu2000/results/res2004q2/cpu2000-20040614-03081.html

Huh????

• Clock speed alone is not a good indicator of processor speed
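To put numbers on the comparison, the sketch below contrasts the ratio of clock rates with the ratio of SPECint_base2000 scores quoted above. Treating the score ratio as a relative-performance figure is an assumption made for illustration, in the same spirit as the "times faster than" wording on the slide.

```python
# Figures quoted on the slide for the two processors.
PENTIUM_4 = {"clock_ghz": 3.6, "specint_base2000": 1510}
PENTIUM_M = {"clock_ghz": 2.0, "specint_base2000": 1528}

# Pentium 4 relative to Pentium M, by clock rate alone.
clock_ratio = PENTIUM_4["clock_ghz"] / PENTIUM_M["clock_ghz"]

# Pentium M relative to Pentium 4, by benchmark score (assumed to track performance).
score_ratio = PENTIUM_M["specint_base2000"] / PENTIUM_4["specint_base2000"]

if __name__ == "__main__":
    print(f"Pentium 4 clock is {clock_ratio:.2f}x the Pentium M clock")
    print(f"Pentium M score is {score_ratio:.3f}x the Pentium 4 score")
```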

19

3.6 GHz Pentium 4

20

2.0 GHz Pentium M

21

#ClockCycles (part 1)

Could assume each instruction takes one cycle:

[Figure: timeline (time axis) showing the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions, each taking one cycle]

This assumption is incorrect: different instructions take different amounts of time on different machines.

Why? Hint: these are machine instructions, not lines of C code.

22

#ClockCycles (part 2)

Reality: each instruction can take a different number of cycles!

1. Multiplication is slower than addition

2. Floating point operations are slower than integer operations

3. Accessing memory is slower than accessing registers

Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)


23

#ClockCycles (part 3)

• MIPS or InstrCount alone is meaningless
• #ClockCycles alone is meaningless
• CycleTime alone is meaningless

… need to tie all three together….

• InstrCount (instructions per program)
• CPI (cycles per instruction)
• CycleTime (time per cycle)
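One way to see why MIPS alone misleads is to multiply the three quantities together (instructions, cycles per instruction, time per cycle) and compare the result with the MIPS rating. In the sketch below the 1 GHz clock, the instruction counts, and the CPI values are all made-up numbers for two hypothetical compilations of the same program.

```python
def cpu_time_s(instr_count, cpi, clock_rate_hz):
    """Execution time as instructions x cycles/instruction x seconds/cycle."""
    return instr_count * cpi / clock_rate_hz


def mips(instr_count, exec_time_s):
    """Millions of instructions executed per second."""
    return instr_count / (exec_time_s * 1e6)


CLOCK_RATE_HZ = 1e9  # hypothetical 1 GHz machine

# Version A: many simple instructions. Version B: fewer, slower instructions.
time_a = cpu_time_s(4e9, 1.0, CLOCK_RATE_HZ)
time_b = cpu_time_s(2e9, 1.5, CLOCK_RATE_HZ)

mips_a = mips(4e9, time_a)
mips_b = mips(2e9, time_b)

if __name__ == "__main__":
    print(f"A: {time_a} s at {mips_a:.0f} MIPS")
    print(f"B: {time_b} s at {mips_b:.0f} MIPS")
```

With these numbers, version A has the higher MIPS rating yet takes longer to run, so ranking by MIPS would pick the slower version.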

24

Performance Equation (2)

• Put the pieces together…
    CPUTime = InstrCount * CPI * CycleTime

• Dimensional analysis
  – Check the units…
    time/prog = (instr/prog) * (cycle/instr) * (time/cycle)
  – The instr and cycle units cancel, leaving time/prog
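The equation can be tried out on two imaginary machines. This is only a sketch: the instruction count and both CPI values are invented for illustration and are not measurements of any real processor. Only the 3.6 GHz and 2.0 GHz clock rates echo figures used earlier in the lecture.

```python
def cpu_time_s(instr_count, cpi, cycle_time_s):
    """CPUTime = InstrCount * CPI * CycleTime, in seconds."""
    return instr_count * cpi * cycle_time_s


INSTR_COUNT = 1_000_000_000  # hypothetical program: one billion instructions

# Hypothetical machines: fast clock with high CPI vs. slow clock with low CPI.
fast_clock = cpu_time_s(INSTR_COUNT, cpi=2.0, cycle_time_s=1 / 3.6e9)
slow_clock = cpu_time_s(INSTR_COUNT, cpi=1.0, cycle_time_s=1 / 2.0e9)

if __name__ == "__main__":
    print(f"3.6 GHz, CPI 2.0: {fast_clock:.4f} s")
    print(f"2.0 GHz, CPI 1.0: {slow_clock:.4f} s")
```

Under these assumed CPI values the machine with the slower clock finishes first, because it needs fewer cycles per instruction.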

25

Performance Equation (3)

Full version:

CPUTime = [ Σ_i (InstrCount_i * CPI_i) ] * CycleTime

• InstrCount_i: count of instructions of type i

• CPI_i: cycles per instruction of type i
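A minimal sketch of the full version, summing over instruction types before scaling by the cycle time. The instruction mix, the per-type CPI values, and the type names (`alu`, `load_store`, `branch`) are hypothetical and not taken from the lecture.

```python
def total_cycles(instr_counts, cpis):
    """Sum of InstrCount_i * CPI_i over all instruction types i."""
    return sum(count * cpis[kind] for kind, count in instr_counts.items())


def cpu_time_s(instr_counts, cpis, cycle_time_s):
    """CPUTime = [sum_i (InstrCount_i * CPI_i)] * CycleTime, in seconds."""
    return total_cycles(instr_counts, cpis) * cycle_time_s


def average_cpi(instr_counts, cpis):
    """Overall CPI: total cycles divided by total instruction count."""
    return total_cycles(instr_counts, cpis) / sum(instr_counts.values())


# Hypothetical instruction mix and per-type CPI values.
COUNTS = {"alu": 500_000_000, "load_store": 300_000_000, "branch": 200_000_000}
CPIS = {"alu": 1, "load_store": 2, "branch": 3}

if __name__ == "__main__":
    # A 0.5 ns cycle time corresponds to a 2.0 GHz clock.
    print(f"CPU time:    {cpu_time_s(COUNTS, CPIS, 0.5e-9):.3f} s")
    print(f"average CPI: {average_cpi(COUNTS, CPIS):.2f}")
```

The average CPI computed here is the single CPI figure that the simpler InstrCount * CPI * CycleTime form would use for the same program.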

26

Quickie Quiz

• Give the 2 most important concepts of performance measurement

• Give 3 ways of measuring performance

• Explain what is wrong with the following performance metrics
  – Instructions per second
  – Clock speed
  – Cycles per instruction
  – Number of transistors
  – Power

• What performance metric is used in this course?

• What is the performance equation? What does it mean? Why is it used?