06/27/22 Erkay Savas 1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University


Performance

Computer Architecture – CS401

Erkay Savas

Sabanci University


Performance

• What is performance?
• How to measure performance?
• Performance metrics
• Performance evaluation
• Why does some hardware perform better than other hardware for different programs?
• What factors in hardware are related to overall system performance?
• How does the machine's instruction set affect performance?


Airplane Analogy

• Which of these airplanes has the best performance?

Airplane          Passenger capacity   Range (miles)   Speed (m.p.h.)   Passenger throughput (passengers × m.p.h.)
Boeing 777        375                  4630            610              228,750
Boeing 747        470                  4150            610              286,700
Concorde          132                  4000            1350             178,200
Douglas DC-8-50   146                  8720            544              79,424
Airbus A3xx       656                  8400            600              393,600
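The throughput column can be recomputed from the capacity and speed figures on the slide. The sketch below is only an illustration; the names `planes` and `throughput` are chosen here and do not come from the slides. It shows that "best performance" depends on the metric: the fastest airplane is not the one with the highest passenger throughput.

```python
# Figures as listed on the slide: (passenger capacity, cruising speed in m.p.h.)
planes = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
    "Airbus A3xx": (656, 600),
}


def throughput(capacity, speed_mph):
    """Passenger throughput in passengers x m.p.h."""
    return capacity * speed_mph


results = {name: throughput(*spec) for name, spec in planes.items()}

# Two different notions of "best": raw speed vs. passenger throughput.
fastest = max(planes, key=lambda name: planes[name][1])
highest_throughput = max(results, key=results.get)

for name, value in results.items():
    print(f"{name:16s} {value:>8d}")
print("fastest:", fastest)
print("highest throughput:", highest_throughput)
```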


Computer Performance

• Response time (latency)
  – How long does it take for my job to run?
  – How long does it take to execute a program?
  – How long must I wait for a database query?
• Throughput
  – How many jobs can the machine run at once?
  – What is the average execution rate?
  – How much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine, what do we increase?


Which Time to Measure?

• Elapsed time (wall clock time, response time)
  – Counts everything (disk and memory access, I/O, operating system overhead, work on other processes)
  – Useful, but not always good for comparison purposes
• CPU (execution) time
  – The time the CPU spends computing for the user task
  – Does not include time spent waiting for I/O or running other programs
  – User CPU time: CPU time spent within the program
  – System CPU time: CPU time spent in the operating system performing tasks on behalf of the program


CPU Time

• The Unix time command reflects this breakdown by reporting, for example:
  90.7u 12.9s 2:39 65%

Interpretation:
• User CPU time is 90.7 s
• System CPU time is 12.9 s
• Elapsed time is 2 min 39 s = 159 s (more than 90.7 + 12.9 = 103.6 s)
• CPU time is 65% of the total elapsed time
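The 65% figure can be checked from the three reported times. This is a small sketch using only the numbers in the sample output above; the variable names are illustrative.

```python
user_cpu = 90.7          # seconds, from "90.7u"
system_cpu = 12.9        # seconds, from "12.9s"
elapsed = 2 * 60 + 39    # "2:39" means 2 minutes 39 seconds

cpu_time = user_cpu + system_cpu   # total CPU time charged to the program
cpu_share = cpu_time / elapsed     # fraction of elapsed time spent on the CPU

print(f"CPU time = {cpu_time:.1f} s, elapsed = {elapsed} s, share = {cpu_share:.0%}")
```

The rest of the elapsed time went to waiting for I/O or to other programs.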


A Definition of Performance

• For some program running on machine X:
  Performance_X = 1 / Execution_time_X
• Machine X is said to be “n times faster” than machine Y if
  Performance_X / Performance_Y = n
  Execution_time_Y / Execution_time_X = n
• Example: Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?
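The example follows directly from the definition: n is the ratio of the two execution times. A minimal sketch, with the hypothetical helper name `times_faster`:

```python
def times_faster(time_x, time_y):
    """Return n such that machine X is n times faster than machine Y.

    n = Performance_X / Performance_Y = Execution_time_Y / Execution_time_X
    """
    return time_y / time_x


time_a = 10.0   # seconds on machine A
time_b = 15.0   # seconds on machine B

n = times_faster(time_a, time_b)
print(f"A is {n} times faster than B")
```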


Metrics of Performance

• “Time to execute a program” is the ultimate metric in determining performance.
• However, it is convenient to inspect other metrics as well when we examine the details of a machine.
• Computers use a clock that runs at a constant rate and determines when an event takes place in hardware.
• These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods).
• Clock rate (frequency) is the inverse of clock period.


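Clock Cycles

• Clock “ticks” indicate when to start activities
• Instead of reporting execution time in seconds, we often use cycles:

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

[Figure: clock signal plotted against time. Events often start on the rising edge of the clock.]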


Clock Cycle

• Cycle time (CT) = time between ticks = seconds per cycle
• Cycle count (CC): the number of clock cycles to execute a program
• Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
• A 200 MHz clock has a 1/(200 × 10^6) s = ? nanosecond cycle time
• A 4 GHz clock has a 1/(4 × 10^9) s = ? nanosecond cycle time
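The two questions above can be worked out by inverting the clock rate and converting seconds to nanoseconds. A short sketch; the function name `cycle_time_ns` is illustrative only:

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds: CT = 1 / clock rate, and 1 s = 1e9 ns."""
    return 1e9 / clock_rate_hz


ct_200mhz = cycle_time_ns(200e6)   # 200 MHz clock
ct_4ghz = cycle_time_ns(4e9)       # 4 GHz clock

print(f"200 MHz -> {ct_200mhz:.2f} ns per cycle")
print(f"4 GHz   -> {ct_4ghz:.2f} ns per cycle")
```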


CPI

• CPI: Clocks Per Instruction
  – Number of cycles spent on an instruction on average.
  – CC = IC × CPI, where IC is the instruction count
  – Hard to compute.
  – It is useful when comparing the performance of two machines with the same ISA. (Why?)
• Example: two machines with the same ISA. For a certain program we have
  – Machine A: CPI = 2.0
  – Machine B: CPI = 1.2
  – Which machine is faster?
  – What if machine A uses a 250 ps and machine B a 500 ps cycle time?
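Because both machines share an ISA, they execute the same number of instructions for the program, so it is enough to compare the average time per instruction (CPI × cycle time). The sketch below assumes, for the first question, that both machines have the same cycle time; the slide does not state this. The helper name is illustrative.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


cpi_a, cpi_b = 2.0, 1.2

# Case 1 (assumption: identical cycle time): only CPI matters.
b_over_a_same_clock = cpi_a / cpi_b          # how many times faster B is

# Case 2: A has a 250 ps cycle time, B has a 500 ps cycle time.
t_a = time_per_instruction_ps(cpi_a, 250.0)
t_b = time_per_instruction_ps(cpi_b, 500.0)
a_over_b = t_b / t_a                         # how many times faster A is

print(f"same clock: B is {b_over_a_same_clock:.2f}x faster")
print(f"A: {t_a:.0f} ps/instr, B: {t_b:.0f} ps/instr, A is {a_over_b:.2f}x faster")
```

With the given cycle times the ranking flips, which is why CPI alone is not a performance measure.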


Improving Performance

So, to improve performance:
1. Increase the clock frequency (i.e., decrease the clock period)
2. Reduce the number of clock cycles per program (IC × CPI)

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$


Instruction Cycle ?

• No!
• The number of cycles per instruction depends on the implementation of the instructions in hardware
• The number differs for each processor (even with the same ISA)


The Reason

• Operations take different numbers of cycles
  – Multiplication takes longer than addition
  – Floating point operations take longer than integer operations
  – The access time to a register is much shorter than the access time to main memory


Simple Formulae for CPU Time

• CPU execution time = CPU clock cycles for a program × Clock cycle time (CC × CT)
• CPU execution time = CPU clock cycles for a program / Clock rate
• We can write
  CPU clock cycles for a program = IC × CPI
• Then
  CPU execution time = (IC × CPI) / Clock rate
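The formulas above translate into two equivalent one-line functions. In this sketch the function names and the sample values (10^9 instructions, CPI of 2.0, 500 MHz clock) are hypothetical and chosen only to show that both forms agree.

```python
def cpu_time_from_rate(ic, cpi, clock_rate_hz):
    """CPU time = (IC x CPI) / clock rate."""
    return ic * cpi / clock_rate_hz


def cpu_time_from_cycle_time(ic, cpi, cycle_time_s):
    """CPU time = CC x CT, with CC = IC x CPI."""
    return ic * cpi * cycle_time_s


# Hypothetical values, not taken from the slides.
ic = 1_000_000_000        # instruction count
cpi = 2.0                 # average cycles per instruction
clock_rate = 500e6        # 500 MHz, i.e. a 2 ns cycle time

t1 = cpu_time_from_rate(ic, cpi, clock_rate)
t2 = cpu_time_from_cycle_time(ic, cpi, 1.0 / clock_rate)
print(f"{t1:.3f} s via clock rate, {t2:.3f} s via cycle time")
```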


Example

• Computer A has an 800 MHz clock
  – It runs our favorite program in 15 s
• Our goal
  – Design computer B with the same ISA
  – It will run the same program in 8 s
• We will use a new technology
  – It can increase the clock rate;
  – however, it will also increase CPI by a factor of 1.25.
• What clock rate should we aim for?
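One way to work this example is to compute A's cycle count, scale it for B, and divide by the target time. The sketch assumes that "increase CPI by 1.25" means B's CPI is 1.25 times A's, and that the instruction count is unchanged because the ISA is the same.

```python
clock_rate_a = 800e6      # Hz
time_a = 15.0             # seconds on computer A
time_b_target = 8.0       # desired seconds on computer B
cpi_factor = 1.25         # assumption: CPI_B = 1.25 x CPI_A

# CC_A = time_A x clock rate_A
cycles_a = time_a * clock_rate_a

# Same IC, CPI scaled by 1.25, so CC_B = 1.25 x CC_A
cycles_b = cpi_factor * cycles_a

# clock rate_B = CC_B / time_B
clock_rate_b = cycles_b / time_b_target

print(f"Computer B needs about {clock_rate_b / 1e9:.3f} GHz")
```

Under these assumptions the clock must be about 2.34 times faster than A's to gain a 1.875 times speedup, because part of the gain is lost to the higher CPI.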


Performance

• Performance is determined by execution time (CPU time)
• We also have other indicators
  – # of cycles to execute the program
  – # of instructions in the program (IC)
  – # of cycles per second
  – average # of cycles per instruction (CPI)
  – average # of instructions per second
• Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.


Number of Instructions Example

• A compiler designer has the following two alternatives for generating a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles):
  1. 2 × 10^6 of A, 10^6 of B, and 2 × 10^6 of C (IC = 5 × 10^6)
  2. 4 × 10^6 of A, 10^6 of B, and 10^6 of C (IC = 6 × 10^6)
  – Which code sequence is faster?
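Since both sequences run on the same machine, the one with fewer total clock cycles is faster. The sketch below tallies cycles for each alternative; the names `CYCLES`, `total_cycles`, and `instruction_count` are illustrative.

```python
# Cycles per instruction class, as given on the slide.
CYCLES = {"A": 1, "B": 2, "C": 3}

seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}


def total_cycles(mix):
    """Sum of (instruction count x cycles) over all instruction classes."""
    return sum(count * CYCLES[cls] for cls, count in mix.items())


def instruction_count(mix):
    return sum(mix.values())


cc1, cc2 = total_cycles(seq1), total_cycles(seq2)
cpi1 = cc1 / instruction_count(seq1)
cpi2 = cc2 / instruction_count(seq2)

print(f"sequence 1: {cc1} cycles, CPI = {cpi1}")
print(f"sequence 2: {cc2} cycles, CPI = {cpi2}")
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone does not decide which code is faster.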


MIPS

• Millions of Instructions Per Second (MIPS)
  MIPS = IC / (Execution_time × 10^6)
  MIPS = IC / (# of clocks × cycle time × 10^6)
  MIPS = (IC × clock rate) / (IC × CPI × 10^6)
  MIPS = clock rate / (CPI × 10^6)
• A faster machine has a higher MIPS
  Execution_time = IC / (MIPS × 10^6)


A MIPS Example

• A computer with a 500 MHz clock
  – Three different classes of instructions:
  – A (1 cycle), B (2 cycles), C (3 cycles)
• Two compilers are used to produce code for a large piece of software.
  – Compiler 1: 5 billion A, 1 billion B, and 1 billion C instructions.
  – Compiler 2: 10 billion A, 1 billion B, and 1 billion C instructions.
• Which sequence will be faster according to execution time?
• Which sequence will be faster according to MIPS?
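Both questions can be answered by computing the cycle count, execution time, and MIPS rating for each compiler's output. This is a sketch with illustrative names (`analyze`, `BILLION`); the figures are those on the slide.

```python
CLOCK_RATE = 500e6                      # 500 MHz
CYCLES = {"A": 1, "B": 2, "C": 3}       # cycles per instruction class
BILLION = 10**9

compiler1 = {"A": 5 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION}
compiler2 = {"A": 10 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION}


def analyze(mix):
    """Return (execution time in seconds, MIPS rating) for an instruction mix."""
    ic = sum(mix.values())
    cycles = sum(count * CYCLES[cls] for cls, count in mix.items())
    exec_time = cycles / CLOCK_RATE          # time = CC / clock rate
    mips = ic / (exec_time * 1e6)            # MIPS = IC / (time x 10^6)
    return exec_time, mips


time1, mips1 = analyze(compiler1)
time2, mips2 = analyze(compiler2)

print(f"compiler 1: {time1:.0f} s, {mips1:.0f} MIPS")
print(f"compiler 2: {time2:.0f} s, {mips2:.0f} MIPS")
```

Compiler 1's code finishes sooner, yet compiler 2's code earns the higher MIPS rating.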


Problems of MIPS

• MIPS specifies the instruction execution rate
• MIPS does not take into account the capabilities of the instructions
  – Thus, it is impossible to compare computers with different ISAs using MIPS.
• MIPS is not constant, even on a single machine; it depends on the application.
• As we saw in the previous example, MIPS can vary inversely with performance.


CPI example

• CPI
  – Machine A: CPI = 10/7 = 1.43
  – Machine B: CPI = 15/12 = 1.25
• CPU time
  – CPU time = (IC × CPI) / clock rate
  – Let us assume both machines use a 200 MHz clock
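The slide gives the CPI ratios but not the instruction counts. The sketch below assumes the ratios are in billions (10 billion cycles for 7 billion instructions on A, 15 billion cycles for 12 billion instructions on B), matching the instruction mixes of the MIPS example earlier in the deck; that reading is an assumption, not something the slide states.

```python
CLOCK_RATE = 200e6          # 200 MHz, as assumed on the slide
BILLION = 10**9

# Assumed counts (see lead-in): cycles and instructions in billions.
cycles_a, ic_a = 10 * BILLION, 7 * BILLION
cycles_b, ic_b = 15 * BILLION, 12 * BILLION

cpi_a = cycles_a / ic_a
cpi_b = cycles_b / ic_b

# CPU time = (IC x CPI) / clock rate
cpu_time_a = ic_a * cpi_a / CLOCK_RATE
cpu_time_b = ic_b * cpi_b / CLOCK_RATE

print(f"A: CPI = {cpi_a:.2f}, CPU time = {cpu_time_a:.0f} s")
print(f"B: CPI = {cpi_b:.2f}, CPU time = {cpu_time_b:.0f} s")
```

Under these assumptions B has the lower CPI but the longer CPU time, because it executes more instructions.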


Overview

• A given program will require
  1. Some number of instructions
  2. Some number of clock cycles
  3. Some number of seconds
• Vocabulary
  – Cycle time: (micro or nano) seconds per cycle
  – Clock rate (frequency): cycles per second
  – CPI: clock cycles per instruction
  – MIPS: millions of instructions per second
  – MFLOPS: millions of floating point operations per second


Performance

• Performance is ultimately determined by execution time

• Is any of the following metrics, by itself, a good measure of performance? Why?
  – # of cycles to execute a program
  – # of instructions in a program
  – # of cycles per second
  – Average # of cycles per instruction
  – Average # of instructions per second


Question

• Assuming two machines have the same ISA, which of the following quantities are identical?
  – Clock rate
  – CPI
  – Execution time
  – # of instructions
  – MIPS


Program Performance

HW or SW component     Affects what? How?
Algorithm              IC, possibly CPI
Programming language   IC, CPI
Compiler               IC, CPI
ISA                    IC, clock rate, CPI


Benchmarks

• Programs specifically chosen to measure performance
  – must reflect the typical workload of the user
• Benchmark types
  – Real applications
  – Small benchmarks
  – Benchmark suites
  – Synthetic benchmarks


Real Applications

• Workload: the set of programs a typical user runs day in and day out.
• Using these real applications as a metric is a direct way of comparing the execution time of the workload on two machines.
• Using real applications as a metric has certain drawbacks:
  – They are usually big
  – They take time to port to different machines
  – They take considerable time to execute
  – It is hard to observe the outcome of a particular improvement technique


Comparing & Summarizing Performance

• A is 100 times faster than B for program 1
• B is 10 times faster than A for program 2
• For total performance, the arithmetic mean is used:

             Computer A   Computer B
Program 1    1 s          100 s
Program 2    1000 s       100 s
Total time   1001 s       200 s

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Time}_i$$
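The arithmetic mean for each computer follows from the two execution times in the table. A minimal sketch, with the illustrative helper name `arithmetic_mean`:

```python
def arithmetic_mean(times):
    """AM = (1/n) * sum of the execution times."""
    return sum(times) / len(times)


times_a = [1.0, 1000.0]    # Computer A: program 1, program 2 (seconds)
times_b = [100.0, 100.0]   # Computer B: program 1, program 2 (seconds)

am_a = arithmetic_mean(times_a)
am_b = arithmetic_mean(times_b)

print(f"A: total = {sum(times_a):.0f} s, AM = {am_a} s")
print(f"B: total = {sum(times_b):.0f} s, AM = {am_b} s")
```

The mean ranks the machines the same way total time does: B is faster when each program runs once.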


Arithmetic Mean

• If the programs in the workload do not all run equally often, we have to use the weighted arithmetic mean:

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} w_i \times \mathrm{Time}_i$$

                      Weight   Computer A   Computer B
Program 1 (seconds)   10       1            100
Program 2 (seconds)   1        1000         100
Weighted AM           -        ?            ?

• Suppose that program 1 runs 10 times as often as program 2. Which machine is faster?
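The two "?" entries can be filled in by applying the weighted mean to each column. The sketch below uses the formula as it appears on the slide (dividing by n); dividing by the sum of the weights instead is a common alternative, shown here only for comparison, and it gives the same ranking.

```python
def weighted_am(weights, times):
    """Weighted AM as on the slide: (1/n) * sum(w_i * Time_i)."""
    n = len(times)
    return sum(w * t for w, t in zip(weights, times)) / n


def normalized_weighted_am(weights, times):
    """Alternative convention (not from the slide): divide by the sum of weights."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


weights = [10, 1]          # program 1 runs 10 times as often as program 2
times_a = [1.0, 1000.0]    # Computer A (seconds)
times_b = [100.0, 100.0]   # Computer B (seconds)

wam_a = weighted_am(weights, times_a)
wam_b = weighted_am(weights, times_b)

print(f"weighted AM: A = {wam_a} s, B = {wam_b} s")
print(f"normalized:  A = {normalized_weighted_am(weights, times_a):.1f} s, "
      f"B = {normalized_weighted_am(weights, times_b):.1f} s")
```

With this weighting A comes out ahead, the opposite of the unweighted comparison.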