Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier (UM)

Lecture 1: Performance

EEN 312: Processors: Hardware, Software, and Interfacing

Department of Electrical and Computer Engineering

Spring 2013, Dr. Rozier (UM)

PERFORMANCE TRENDS

Growth in Processor Performance since 1978.

Logarithmic Scale!

Moore’s Law

• Gordon Moore

– One of the founders of Intel

– Famously predicted in 1965 that the transistor capacity of integrated circuits would keep doubling at a regular interval, usually quoted as every 18-24 months.

– Not really a law, but has largely held true.

– Generally translates into increased performance, and decreased cost.

Moore’s Law

Exponential Growth

How do we get to Performance?

• Do more transistors really mean more performance?

• Is it a one-to-one correlation?

• How might transistors NOT correlate to increased performance?

MEASURING PERFORMANCE

A simple example

• Say we have two computers. You know one is rated at 1GHz and another is rated at 800MHz.

• Which computer has a higher performance?

A simple example?

• What do GHz and MHz even mean?

• What else could differ about the machines?

• What else could differ about the context of performance?

The situation is a complex one!

First, Some Measure Theory

• What is a measure? Formally?

– A way of assigning a number to each subset of some set, which can be read (intuitively) as the size of that subset.

– Measures require measurable spaces, and measurable sets.

– Not all sets are measurable!

Measurable Sets/Spaces

• One reason a space or set may be unmeasurable is if it is ill-defined.

Which Plane has a Higher Performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph.]

Defining Performance

• We can define performance in several ways.

• Response time

– How long does it take to accomplish a task?

– We send input to a black box, and measure how long it takes to get the output back.

Defining Performance

• We can define performance in several ways.

• Throughput

– How much work gets done during a certain amount of time?

– Watch a system, count the number of jobs finished during a certain amount of time.
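The two definitions can be sketched as two small measurement helpers. This is only an illustration: `measure_response_time`, `measure_throughput`, and `sample_job` are hypothetical names, and the workload is an arbitrary stand-in.

```python
import time


def measure_response_time(job):
    """Return the elapsed seconds for a single call of job()."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def measure_throughput(job, window_seconds):
    """Run job() repeatedly for about window_seconds; return jobs finished per second."""
    finished = 0
    start = time.perf_counter()
    while time.perf_counter() - start < window_seconds:
        job()
        finished += 1
    elapsed = time.perf_counter() - start
    return finished / elapsed


def sample_job():
    # Hypothetical stand-in workload: any black box would do here.
    sum(range(10_000))


response = measure_response_time(sample_job)   # seconds per job
throughput = measure_throughput(sample_job, 0.05)  # jobs per second
print(f"response time: {response:.6f} s, throughput: {throughput:.1f} jobs/s")
```

Response time looks at one job from start to finish; throughput watches the system for a window and counts completions, so a system can score well on one and poorly on the other.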

Throughput Example

• What is the fastest way you can think to deliver a large amount of data?

• Never underestimate the throughput of a Mack Truck loaded with hard drives!

What’s the Response time of our Truck?

Response time as Execution Time

• Start a program, wait for it to return results.

Comparing Performance

• Given the performance or execution time of a computer (A) and a different computer (B) running the same program, we can compare performance.

Comparing Performance

• Relative performance
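A minimal sketch of relative performance, assuming the usual definition in which performance is the reciprocal of execution time, so that computer A is n times faster than B when B's time divided by A's time equals n. The function names and the sample times are hypothetical.

```python
def performance(execution_time_s):
    """Performance defined as the reciprocal of execution time (assumed definition)."""
    return 1.0 / execution_time_s


def relative_performance(time_a_s, time_b_s):
    """How many times faster A is than B: perf_A / perf_B = time_B / time_A."""
    return time_b_s / time_a_s


# Hypothetical measurements: the same program takes 10 s on A and 15 s on B.
n = relative_performance(10.0, 15.0)
print(f"A is {n:.2f} times faster than B")
```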

Why is Relative Performance Important?

So How Do We Measure Performance

• First let’s define performance:

– Execution time

• What is our measurable space?

• What is our measurable set?

Measuring Execution Time

• CPU execution time

• Wall clock time

• How might these differ?
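One way the two can differ is when a program spends its time waiting rather than computing. The sketch below uses Python's `time.perf_counter` as the wall clock and `time.process_time` as the CPU clock; the `timed` and `mostly_waiting` helpers are hypothetical names for illustration.

```python
import time


def timed(job):
    """Return (wall_seconds, cpu_seconds) for one call of job()."""
    wall_start = time.perf_counter()   # wall clock: includes time spent waiting
    cpu_start = time.process_time()    # CPU time of this process: excludes sleep
    job()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_waiting():
    # Stand-in for I/O or another process holding the CPU.
    time.sleep(0.2)


wall, cpu = timed(mostly_waiting)
print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

Here the wall clock advances during the sleep while the process accumulates almost no CPU time.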

Measuring Execution Time

• Clock cycles

• Instruction count

Clock Cycles

• Clock period – duration of a clock cycle

• Clock frequency – number of cycles per second

[Figure: clock timing diagram showing the clock (cycles), the clock period, data transfer and computation, and update state.]
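Clock period and clock frequency are reciprocals of each other. A small sketch, with hypothetical helper names and a 250 ps period chosen only as a sample value:

```python
def clock_frequency_hz(period_s):
    """Cycles per second for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(frequency_hz):
    """Duration of one cycle in seconds for a given clock frequency."""
    return 1.0 / frequency_hz


frequency = clock_frequency_hz(250e-12)  # a 250 ps period is roughly 4 GHz
period = clock_period_s(4e9)             # a 4 GHz clock is roughly 250 ps
print(f"{frequency / 1e9:.2f} GHz, {period * 1e12:.0f} ps")
```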

CPU Time

• We can improve performance by

– Reducing the number of clock cycles

– Increasing clock rate

– Often there is a trade-off

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
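The CPU time relation can be written as a one-line function. This is a sketch with hypothetical names; the sample numbers (20 × 10^9 cycles at 2 GHz) are illustrative values, not measurements.

```python
def cpu_time_s(clock_cycles, clock_rate_hz):
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def cpu_time_from_period_s(clock_cycles, clock_cycle_time_s):
    """Equivalent form: CPU time = clock cycles * clock cycle time."""
    return clock_cycles * clock_cycle_time_s


baseline = cpu_time_s(20e9, 2e9)            # 10 s
fewer_cycles = cpu_time_s(0.5 * 20e9, 2e9)  # halving the cycles halves the time
faster_clock = cpu_time_s(20e9, 2 * 2e9)    # doubling the rate halves the time
print(baseline, fewer_cycles, faster_clock)
```

The trade-off arises because, in practice, a change that raises the clock rate often also changes the cycle count, so the two levers are not independent.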

CPU Example

• Computer A: 2 GHz clock, 10s CPU time

• Computer B

– Aim for 6s CPU time. A faster clock is possible, but it makes B need 1.2× as many clock cycles as A.

Break Into Groups

Find the necessary clock rate for Computer B

CPU Example

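• Computer A: 2 GHz clock, 10s CPU time

• Computer B

– Aim for 6s CPU time. A faster clock is possible, but it makes B need 1.2× as many clock cycles as A.

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$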
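The same calculation as a short script, using only the numbers given on the slide. The variable names are hypothetical.

```python
GHZ = 1e9

# Computer A, as given: 2 GHz clock, 10 s CPU time.
clock_rate_a = 2 * GHZ
cpu_time_a = 10.0
clock_cycles_a = cpu_time_a * clock_rate_a  # 20 x 10^9 cycles

# Computer B: target 6 s, but needs 1.2x the clock cycles of A.
cpu_time_b = 6.0
clock_cycles_b = 1.2 * clock_cycles_a

# Rearranged CPU time equation: clock rate = clock cycles / CPU time.
clock_rate_b = clock_cycles_b / cpu_time_b
print(f"Computer B needs about {clock_rate_b / GHZ:.1f} GHz")
```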

Instruction Count and CPI

• Instruction count

– How many instructions the program has

– Depends on the ISA and compiler

• CPI

– Cycles per instruction

– Determined by hardware

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

CPI Example

• Computer A: Cycle Time = 250ps, CPI = 2.0

• Computer B: Cycle Time = 500ps, CPI = 1.2

• Same ISA

• Which is faster? By how much?

Break Into Groups

CPI Example

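• Computer A: Cycle Time = 250ps, CPI = 2.0

• Computer B: Cycle Time = 500ps, CPI = 1.2

• Same ISA

• Which is faster? By how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

A is faster…

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

…by this much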
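A short check of this comparison in code. Because both machines run the same ISA, the instruction count is the same on each and cancels in the ratio; the value used below is an arbitrary placeholder, and the function name is hypothetical.

```python
PS = 1e-12  # one picosecond in seconds


def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Arbitrary placeholder: same ISA, so both machines execute the same count.
instruction_count = 1_000_000

time_a = cpu_time_s(instruction_count, 2.0, 250 * PS)
time_b = cpu_time_s(instruction_count, 1.2, 500 * PS)

ratio = time_b / time_a
print(f"A is faster by a factor of about {ratio:.2f}")
```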

CPI Detail

• Sometimes different instructions take differing amounts of time.

• Often we will want to weight by instruction proportion in a program.

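$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

Here $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of instruction class $i$.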

CPI Example

• Have instruction classes A, B, and C. Two ways to compile our code:

Class             A  B  C
CPI for class     1  2  3
IC in sequence 1  2  1  2
IC in sequence 2  4  1  1

Give the average CPI for each program

CPI Example

Sequence 1: IC = 5

Clock Cycles = 2×1 + 1×2 + 2×3 = 10

Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6

Clock Cycles = 4×1 + 1×2 + 1×3 = 9

Avg. CPI = 9/6 = 1.5
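The weighted-CPI calculation can be sketched as follows. The per-class CPI values and instruction counts are the ones used in the arithmetic above; the dictionary layout and function names are assumptions made for illustration.

```python
def clock_cycles(cpi_by_class, count_by_class):
    """Sum of CPI_i * instruction count_i over all instruction classes."""
    return sum(cpi_by_class[cls] * count for cls, count in count_by_class.items())


def average_cpi(cpi_by_class, count_by_class):
    """Total clock cycles divided by total instruction count."""
    total_instructions = sum(count_by_class.values())
    return clock_cycles(cpi_by_class, count_by_class) / total_instructions


cpi_by_class = {"A": 1, "B": 2, "C": 3}
sequence_1 = {"A": 2, "B": 1, "C": 2}  # IC = 5
sequence_2 = {"A": 4, "B": 1, "C": 1}  # IC = 6

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, clock_cycles(cpi_by_class, seq), average_cpi(cpi_by_class, seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, which is why instruction count alone is not a reliable performance measure.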

Performance Summary

• Performance depends on

– Algorithm: affects IC, possibly CPI

– Programming language: affects IC, CPI

– Compiler: affects IC, CPI

– Instruction set architecture: affects IC, CPI, Tc

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

So Why Don’t We Have 1THz Computers?

The Power Wall

• In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Power: ×30, Voltage: 5V → 1V, Frequency: ×1000

The Power Wall

• Suppose a new CPU has

– 85% of capacitive load of old CPU

– 15% voltage and 15% frequency reduction

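$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^2 \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$$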
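The ratio can be checked numerically. This sketch assumes the dynamic power relation for CMOS stated earlier (capacitive load × voltage² × frequency); the function names are hypothetical.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power: capacitive load * voltage^2 * frequency."""
    return capacitive_load * voltage ** 2 * frequency


def power_ratio(load_scale, voltage_scale, frequency_scale):
    """P_new / P_old when each quantity is scaled by the given factor."""
    return load_scale * voltage_scale ** 2 * frequency_scale


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old is about {ratio:.2f}")
```

Voltage enters squared, so the 15% voltage reduction contributes two of the four factors of 0.85.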

• The power wall

– We can’t reduce voltage further

– We can’t remove more heat

• How else can we improve performance?

Multiprocessors

• Multicore microprocessors

– More than one processor per chip

• Requires explicitly parallel programming

– Compare with instruction level parallelism: hardware executes multiple instructions at once, hidden from the programmer

– Hard to do: programming for performance, load balancing, optimizing communication and synchronization

Amdahl’s Law

• Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

• Example: multiply accounts for 80s/100s

– How much improvement in multiply performance to get 5× overall?

Break into Groups!

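Amdahl’s Law

• Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

• Example: multiply accounts for 80s/100s

– How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

– Can’t be done! No finite n satisfies this, because the 20s that multiply does not touch already uses up the whole 20s target.

• Corollary: make the common case fast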
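A small sketch of the same reasoning in code. `improved_time` follows the equation above; `required_factor` is a hypothetical helper that solves it for the improvement factor and returns `None` when no factor can reach the target. The 4× case is an added illustration, not part of the slide.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to hit target_time, or None if unreachable."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # the unaffected part alone already uses up the target
    return t_affected / remaining


# Slide example: multiply is 80 s of 100 s, and we want 5x overall (20 s).
print(required_factor(80, 20, 100 / 5))  # unreachable

# Added illustration: a 4x overall speedup (25 s) is reachable.
print(required_factor(80, 20, 100 / 4))
```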

PROBLEM SETS

Consider the following processors, P1, P2, and P3, executing the same instruction set with clock rates and CPI as indicated.

1. Which processor has the highest performance in terms of instructions per second?

2. If the processors each execute a program in 10s, find the number of cycles and the number of instructions.

3. We are trying to reduce the execution time by 30% but this leads to an increase in CPI of 20%. What clock rate should we have to get this reduction?

Processor  Clock Rate  CPI
P1         3 GHz       1.5
P2         2.5 GHz     1.0
P3         4 GHz       2.2

Consider a computer running code with four main routines, A, B, C, and D.

1. How much is the total time reduced if the time for Routine A is reduced by 20%?

2. How much is the time for Routine B reduced if the total time is reduced by 20%?

3. Can the total time be reduced by 20% by only reducing the time for Routine D?

Routine A  Routine B  Routine C  Routine D  Total Time
40s        90s        60s        20s        210s

Consider a computer running code with four main routines, A, B, C, and D.

1. How much is the total time reduced if the time for Routine A is reduced by 20%?

2. How much is the time for Routine B reduced if the total time is reduced by 20%?

3. Can the total time be reduced by 20% by only reducing the time for Routine D?

              Routine A  Routine B  Routine C  Routine D  Total Time
Exec Time     40s        90s        60s        20s        210s
Instructions  50x10^6    110x10^6   80x10^6    16x10^6    -
Avg CPI       1          1          4          2          -

Consider a computer running code with four main routines, A, B, C, and D.

1. How much must we improve the CPI of Routine A if we want the program to run twice as fast?

2. How much must we improve the CPI of Routine C if we want the program to run twice as fast?

3. How much is the execution time improved if the CPI of routines A and B is reduced by 40%, and the CPI of routines C and D is reduced by 30%?

              Routine A  Routine B  Routine C  Routine D  Total Time
Exec Time     40s        90s        60s        20s        210s
Instructions  50x10^6    110x10^6   80x10^6    16x10^6    -
Avg CPI       1          1          4          2          -

WRAP UP

For next time

• Read Chapter 2, Sections 2.1 – 2.3

• Finish Lab 0 by next lab session.