Chapter 2: Performance and Cost

©1998 Morgan Kaufmann Publishers


Which mobile phone is the better buy?

At about the same price, how do you compare them?

Panasonic GD88, Samsung V208, Sharp GX-i98

What is the performance of a mobile phone?

• What are the bases for comparison?
• Which values can be measured and rated?
• How are they measured?
• How do you present an objective evaluation report?

Performance

• Purchasing perspective: given a collection of machines, which has the
  - best performance?
  - least cost?
  - best performance/cost?
• Design perspective: faced with design options, which has the
  - best performance improvement?
  - least cost?
  - best performance/cost?
• Both require
  - a basis for comparison
  - a metric for evaluation
• Goal: understand the cost and performance implications of architectural choices

Tasks of a Computer Architect

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Which airplane has better performance?

Two Notions of Performance

Which has higher performance?
  - Time to deliver 1 passenger? Time to deliver 400 passengers?
  - Time to do the task: execution time, response time, latency
  - Tasks per unit time: throughput, bandwidth

Response time and throughput are often in opposition.

Which Is Better?

• Time of Concorde vs. Boeing 747
  - Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
• Throughput of Concorde vs. Boeing 747
  - Boeing is 286,700 pmph / 178,200 pmph = 1.6 times better (pmph = passenger-miles per hour)
• Boeing is 1.6 times (60%) faster in terms of throughput
• Concorde is 2.2 times (120%) faster in terms of flying time (response time)

We will focus on execution time for a single job.
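The two comparisons above are simple ratios and can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper name `ratio` and the variable names are illustrative, and the figures are the ones quoted on the slide.

```python
def ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b

# Figures quoted on the slide (pmph = passenger-miles per hour).
concorde_speed_mph = 1350
boeing_speed_mph = 610
concorde_throughput_pmph = 178_200
boeing_throughput_pmph = 286_700

# Response-time view: the Concorde delivers one passenger sooner.
speed_advantage = ratio(concorde_speed_mph, boeing_speed_mph)

# Throughput view: the Boeing 747 moves more passenger-miles per hour.
throughput_advantage = ratio(boeing_throughput_pmph, concorde_throughput_pmph)

print(f"Concorde speed advantage:    {speed_advantage:.1f}x")
print(f"Boeing throughput advantage: {throughput_advantage:.1f}x")
```

Which machine "wins" depends entirely on which of the two ratios matters for the job at hand.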

Performance Definition

• Performance according to time => faster is better
• If interested in comparing two things, "X is n times faster than Y" means
    n = Execution Time of Y / Execution Time of X = Performance of X / Performance of Y

What is Time?

• Straightforward definition of time: total time to complete a task, including disk and memory accesses, I/O activities, OS overhead, ...
  - May include the execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in the OS) and user CPU time (in the user program)
• CPU performance: user CPU time of a single program
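The difference between elapsed time and CPU time can be observed directly. The sketch below is one possible illustration using Python's standard `time` module; the function names `measure` and `busy_then_idle` are made up for this example, and the workload is arbitrary.

```python
import time

def measure(task):
    """Run task() once and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu_seconds = time.process_time() - cpu_start
    elapsed_seconds = time.perf_counter() - wall_start
    return elapsed_seconds, cpu_seconds

def busy_then_idle():
    sum(i * i for i in range(200_000))  # CPU-bound work
    time.sleep(0.05)                    # waiting; consumes almost no CPU time

elapsed, cpu = measure(busy_then_idle)
print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

The elapsed figure includes the sleep, while the CPU figure does not. `time.process_time()` reports user and system time together; `os.times()` reports them separately.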

Outline

• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

How can program execution time be expressed as a formula?

• Hint: the basic components of a program
• Instruction count
• Instruction execution time (average)

What is the instruction count of a program?

• How many C statements?
    for (i = 0; i < 100; i++) a[i] = b[i] * c[i];
• How many assembly-language instructions?

Instruction Execution Time

• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or, informally, clocks or cycles)
  2. The length of a clock period is the clock cycle time (e.g., 2 nanoseconds, or 2 ns); the clock rate (e.g., 500 megahertz, or 500 MHz) is the inverse of the clock period

Instruction execution time is measured in cycles.
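As a worked check of the example values above, let $T_{\text{clock}}$ denote the clock cycle time and $f_{\text{clock}}$ the clock rate (symbols introduced here only for illustration):

```latex
T_{\text{clock}} = \frac{1}{f_{\text{clock}}}
\qquad\Rightarrow\qquad
\frac{1}{500\ \text{MHz}}
  = \frac{1}{500 \times 10^{6}\ \text{s}^{-1}}
  = 2 \times 10^{-9}\ \text{s}
  = 2\ \text{ns}
```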

Program Execution Time

• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Clock Cycles for program
    = Instructions for program ("Instruction Count")
      x Average Clock Cycles per Instruction ("CPI")

CPI is one way to compare two machines with the same instruction set, since the Instruction Count is the same.

Performance Calculation (1/2)

• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program
    = Instruction Count x CPI x Clock Cycle Time
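The formula above translates directly into code. This is a minimal sketch: the function name `cpu_time_seconds` and the example program (instruction count and CPI) are hypothetical, chosen only to show the arithmetic.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_cycle_time_s: float) -> float:
    """CPU execution time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycles = instruction_count * cpi
    return clock_cycles * clock_cycle_time_s

# Hypothetical program: 1 billion instructions with an average CPI of 2.0,
# run on a 500 MHz machine (2 ns clock cycle time).
example = cpu_time_seconds(instruction_count=1e9, cpi=2.0,
                           clock_cycle_time_s=2e-9)
print(f"{example:.2f} s")
```

Halving any one of the three factors halves the CPU time, which is why no single factor is enough to compare machines.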

How to Calculate the 3 Components

• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in the loop of a small program
  - Use a simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into the binary code, run it, and record information
  - Hardware counter in a special register (Pentium II)
• CPI

Calculating CPI Another Way

• First, calculate the CPI for each individual instruction type (add, sub, and, etc.)
• Next, calculate the frequency of each individual instruction type in the workload
• Finally, multiply these two for each instruction type and add them up to get the final CPI
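The three steps above amount to a frequency-weighted sum. The sketch below assumes a hypothetical instruction mix (the classes, frequencies, and cycle counts are illustrative values, not data from the slides), and the function name `average_cpi` is made up for this example.

```python
def average_cpi(mix):
    """mix maps an instruction class to a (frequency, cycles) pair."""
    total_frequency = sum(freq for freq, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Weight each class's cycle count by how often it occurs, then add up.
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical instruction mix, for illustration only.
hypothetical_mix = {
    "alu":    (0.50, 1),
    "load":   (0.20, 5),
    "store":  (0.10, 3),
    "branch": (0.20, 2),
}
cpi = average_cpi(hypothetical_mix)
print(f"average CPI = {cpi:.2f}")
```

Changing the cycle count of one class and recomputing shows how much an enhancement to that class can move the overall CPI.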

Example (RISC processor)

  - What if branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?

Must know the limit of architectural enhancement.

Summary: CPU Time Formula

• CPU execution time = Instruction Count x CPI x Clock Cycle Time

Another Formula Related to Performance

Amdahl's Law

• Speedup due to enhancement E
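In its standard form, Amdahl's Law gives the overall speedup as 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that benefits from the enhancement and S is the speedup of that fraction. The sketch below uses that standard form; the function name, parameter names, and the 40% figure are assumptions made for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when part of the original execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / speedup_enhanced)

# Hypothetical case: the enhancement applies to 40% of the execution time.
for s in (2, 10, 1000):
    print(f"enhanced part {s}x faster -> overall {amdahl_speedup(0.4, s):.2f}x")
```

However large S becomes, the overall speedup stays below 1 / (1 - F), which is the limit of the enhancement.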

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?

• What's wrong with this program as a workload?

    integer A[][], B[][], C[][];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];

• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload". For example:
  - An engineer uses a compiler and a spreadsheet
  - An author uses a word processor, a drawing program, and compression software

Choosing Benchmark Programs

Benchmarks

Example Standardized Workload Benchmarks

• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
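An average "relative to a base machine" can be sketched as follows, assuming the per-benchmark ratios are combined with a geometric mean, which is how SPEC ratios are commonly summarized. The function names and all execution times here are hypothetical; they are not real SPEC results.

```python
import math

def spec_ratios(reference_times, measured_times):
    """Per-benchmark ratio: reference time / measured time (higher is better)."""
    return {name: reference_times[name] / measured_times[name]
            for name in reference_times}

def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical execution times in seconds; not real SPEC results.
reference = {"go": 100.0, "gcc": 200.0, "compress": 400.0}
measured = {"go": 50.0, "gcc": 50.0, "compress": 50.0}

ratios = spec_ratios(reference, measured)
summary = geometric_mean(ratios.values())
print(ratios, f"geometric mean = {summary:.2f}")
```

A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is chosen as the base.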

SPECint95base Performance (Oct 1997)

SPECfp95base Performance (Oct 1997)

SPEC2000 (CINT)

Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Opt.
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark

• PCs: Ziff Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results

Note: 2 Compaq machines using K6-2 vs. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance

• Latency vs. throughput
• CPU time: time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins and heat dissipation
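The three factors named under die cost combine in the usual way: the wafer cost is spread over the dies that actually work. The sketch below uses the dies-per-wafer counts quoted above; the wafer cost and both yield values are hypothetical, and the function name `cost_per_die` is made up for this example.

```python
def cost_per_die(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Cost of one good die: wafer cost / (dies per wafer x die yield)."""
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)

# Dies per 8-inch wafer are the counts quoted above. The wafer cost and the
# yields are hypothetical values chosen only for illustration.
WAFER_COST = 1000.0
pentium = cost_per_die(WAFER_COST, dies_per_wafer=196, die_yield=0.8)
pentium_pro = cost_per_die(WAFER_COST, dies_per_wafer=78, die_yield=0.5)
print(f"Pentium: ${pentium:.2f}  Pentium Pro: ${pentium_pro:.2f}")
```

A larger die is penalized twice: fewer dies fit on the wafer, and each one is more likely to contain a defect, so the yield drops as well.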

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost

• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of the die area
• Economics ($$$) is the ultimate driver for performance

Page 2: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

21998 Morgan Kaufmann Publishers

買那一支手機比較好

31998 Morgan Kaufmann Publishers

差不多的價錢你怎麼比

Panasonic GD88 Samsung V208 Sharp GX-i98

41998 Morgan Kaufmann Publishers

何謂手機的效能

bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull

51998 Morgan Kaufmann Publishers

Performance

bull Purchasing perspective

Given a collection of machines which has the best performance least cost best performancecost

bull Design perspective

Faced with design options which has the

best performance improvement least cost

best performancecost

bull Both require

basis for comparison

metric for evaluation

bull Goal understand cost and

performance implications

of architectural choices

61998 Morgan Kaufmann Publishers

Tasks of a Computer Architect

bull s

71998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

81998 Morgan Kaufmann Publishers

那一架飛機的效能比較好

bull s

91998 Morgan Kaufmann Publishers

Two Notions of Performance

bull a

Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition

101998 Morgan Kaufmann Publishers

Which Is Better

bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res

ponse time)

We will focus on execution time for a single job

111998 Morgan Kaufmann Publishers

Performance Definition

bull Performance according to time

=gt faster is better

bull If interested in comparing two things

ldquoX is n times faster than Yrdquo means

121998 Morgan Kaufmann Publishers

What is Time

bull Straightforward definition of time Total time to complete a task including disk amp memory

accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o

n your program (since multiple processes running at same time)

ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim

e (in user program)

bull CPU performance user CPU time of a single CPU performance user CPU time of a single program

131998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula (Sec 23)

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

141998 Morgan Kaufmann Publishers

如何以公式表達程式執行時間

bull Hint basic components of a program

bull 指令數

bull 指令執行時間(平均)

151998 Morgan Kaufmann Publishers

何謂程式的指令數

bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]

bull 有幾條組合語言指令

161998 Morgan Kaufmann Publishers

Instruction Execution Time

bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a

constant rate and determines when events take place in the hardware

1 These discrete time intervals called clock cycles (or informally clocks or cycles)

2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period

指令執行時間以 cycle為單位

171998 Morgan Kaufmann Publishers

Program Execution Time

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Clock Cycles for program

= Instructions for program (ldquoInstruction Countrdquo)

x Average Clock Cycles per Instruction (ldquoCPIrdquo)

CPI one way to compare two machines with same instruction set since Instruction Count is same

181998 Morgan Kaufmann Publishers

Performance Calculation (12)

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Substituting for clock cycles

bull CPU execution time for program

= Instruction Count x CPI x Clock Cycle Time

191998 Morgan Kaufmann Publishers

How to Calculate the 3 Components

bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)

bull Instruction Count

Count instructions in loop of small program

Use simulator or emulator to count instructions

Debugger or tracing program

Execution-based monitoring insert instrumentation code into

binary code run and record information

Hardware counter in special register (Pentium II)

bull CPI

201998 Morgan Kaufmann Publishers

Calculating CPI Another Way

bull First calculate CPI for each individual instruction (add sub and etc)

bull Next calculate frequency of each individual instruction in the workload

bull Finally multiply these two for each instruction and add them up to get final CPI

211998 Morgan Kaufmann Publishers

Example (RISC processor)

1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once

Must know the limit of architectural enhancement

221998 Morgan Kaufmann Publishers

Summary CPU Time Formula

bull 1048698

231998 Morgan Kaufmann Publishers

有關效能的另一個公式

bull 1

241998 Morgan Kaufmann Publishers

Amdahls Law

bull Speedup due to enhancement E

251998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performance (Sec 24-26)

Benchmark programs

Summarizing performance

Reporting performance

bull Cost

1048698 Cost and price

Cost of chips

261998 Morgan Kaufmann Publishers

What Programs for Comparison

bull Whatrsquos wrong with this program as a workload

integer A[ ][ ] B[ ][ ] C[ ][ ]

for (J=0 Ilt100 I++)

for (J=0 Jlt100 J++)

for (K=0 Klt100 K++)

C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]

bull What measured Not measured What it good for

bull Ideally run typical programs with typical input before purchase or before even build machine

Called a ldquoworkloadrdquo For example

Engineer uses compiler spreadsheet

Author uses word processor drawing program compression software

271998 Morgan Kaufmann Publishers

Choosing Benchmark Programs

bull a

281998 Morgan Kaufmann Publishers

Benchmarks

291998 Morgan Kaufmann Publishers

Example Standardized WorkloadBenchmarks

bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl

oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp

wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base

machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3

years

301998 Morgan Kaufmann Publishers

SPECint95base Performance (Oct 1997)

bull 1

311998 Morgan Kaufmann Publishers

SPECfp95base Performance (Oct 1997)

321998 Morgan Kaufmann Publishers

SPEC2000 (CINT)

Benchmark Language Category

164gzip C Compression

175vpr C FPGA PlacementRoute

176gcc C C Compiler

181mcf C Combinatorial Opt

186crafty C Chess

197parser C Word Processing

252eon C++ Computer Visualization

253perlbmk C PERL

254gap C Group Theory Interpreter

255vortex C OO Database

256bzip2 C Compression

300twolf C PlaceRoute Simulator

331998 Morgan Kaufmann Publishers

SPEC2000 (CFP)

341998 Morgan Kaufmann Publishers

Example PC Workload Benchmark

bull PCs Ziff Davis WinStone 99 Benchmark

A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications

Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores

Winstones tests dont mimic what these programs do they run actual application code

351998 Morgan Kaufmann Publishers

Winstone 99 (W99) Results

bull 1

Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster

361998 Morgan Kaufmann Publishers

Summarizing Performance

371998 Morgan Kaufmann Publishers

Early Lessons from SPEC

bull 1

381998 Morgan Kaufmann Publishers

Summarizing Performance

391998 Morgan Kaufmann Publishers

Reporting Performance

401998 Morgan Kaufmann Publishers

Summary Performance

bull Latency v Throughput

CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)

bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations

bull Performance evaluation needs to consider

Benchmark programs

Summarizing performance

Reporting performance results

411998 Morgan Kaufmann Publishers

Outline

bull Performance

1048698 Definition

CPU performance formula

Measuring and evaluating performancebull Cost

Cost and price

Cost of chips

421998 Morgan Kaufmann Publishers

Chip Cost Manufacturing Process

431998 Morgan Kaufmann Publishers

Cost of a Chip Includes

bull Die cost affected by wafer cost number of dies per wafer and die yield

goes roughly with the cube of the die area

An 8rdquo wafer can contain 196 Pentium dies but only 78   Pentium Pro (Fig 116 and 117)

bull Testing costbull Packaging cost depends on pins heat dissipation

441998 Morgan Kaufmann Publishers

Real World Examples

451998 Morgan Kaufmann Publishers

System Cost 1995 Workstation

461998 Morgan Kaufmann Publishers

Cost versus Price

471998 Morgan Kaufmann Publishers

Summary Cost

bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa

nce

Page 3: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

31998 Morgan Kaufmann Publishers

差不多的價錢你怎麼比

Panasonic GD88 Samsung V208 Sharp GX-i98

41998 Morgan Kaufmann Publishers

何謂手機的效能

bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull

51998 Morgan Kaufmann Publishers

Performance

bull Purchasing perspective

Given a collection of machines which has the best performance least cost best performancecost

bull Design perspective

Faced with design options which has the

best performance improvement least cost

best performancecost

bull Both require

basis for comparison

metric for evaluation

bull Goal understand cost and

performance implications

of architectural choices

61998 Morgan Kaufmann Publishers

Tasks of a Computer Architect

bull s

71998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

81998 Morgan Kaufmann Publishers

那一架飛機的效能比較好

bull s

91998 Morgan Kaufmann Publishers

Two Notions of Performance

bull a

Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition

101998 Morgan Kaufmann Publishers

Which Is Better

bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res

ponse time)

We will focus on execution time for a single job

111998 Morgan Kaufmann Publishers

Performance Definition

bull Performance according to time

=gt faster is better

bull If interested in comparing two things

ldquoX is n times faster than Yrdquo means

121998 Morgan Kaufmann Publishers

What is Time

bull Straightforward definition of time Total time to complete a task including disk amp memory

accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o

n your program (since multiple processes running at same time)

ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim

e (in user program)

bull CPU performance user CPU time of a single CPU performance user CPU time of a single program

131998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula (Sec 23)

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

141998 Morgan Kaufmann Publishers

如何以公式表達程式執行時間

bull Hint basic components of a program

bull 指令數

bull 指令執行時間(平均)

151998 Morgan Kaufmann Publishers

何謂程式的指令數

bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]

bull 有幾條組合語言指令

161998 Morgan Kaufmann Publishers

Instruction Execution Time

bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a

constant rate and determines when events take place in the hardware

1 These discrete time intervals called clock cycles (or informally clocks or cycles)

2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period

指令執行時間以 cycle為單位

171998 Morgan Kaufmann Publishers

Program Execution Time

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Clock Cycles for program

= Instructions for program (ldquoInstruction Countrdquo)

x Average Clock Cycles per Instruction (ldquoCPIrdquo)

CPI one way to compare two machines with same instruction set since Instruction Count is same

181998 Morgan Kaufmann Publishers

Performance Calculation (12)

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Substituting for clock cycles

bull CPU execution time for program

= Instruction Count x CPI x Clock Cycle Time

191998 Morgan Kaufmann Publishers

How to Calculate the 3 Components

bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)

bull Instruction Count

Count instructions in loop of small program

Use simulator or emulator to count instructions

Debugger or tracing program

Execution-based monitoring insert instrumentation code into

binary code run and record information

Hardware counter in special register (Pentium II)

bull CPI

201998 Morgan Kaufmann Publishers

Calculating CPI Another Way

bull First calculate CPI for each individual instruction (add sub and etc)

bull Next calculate frequency of each individual instruction in the workload

bull Finally multiply these two for each instruction and add them up to get final CPI

211998 Morgan Kaufmann Publishers

Example (RISC processor)

1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once

Must know the limit of architectural enhancement

221998 Morgan Kaufmann Publishers

Summary CPU Time Formula

bull 1048698

231998 Morgan Kaufmann Publishers

有關效能的另一個公式

bull 1

241998 Morgan Kaufmann Publishers

Amdahls Law

bull Speedup due to enhancement E

251998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performance (Sec 24-26)

Benchmark programs

Summarizing performance

Reporting performance

bull Cost

1048698 Cost and price

Cost of chips

261998 Morgan Kaufmann Publishers

What Programs for Comparison

bull Whatrsquos wrong with this program as a workload

integer A[ ][ ] B[ ][ ] C[ ][ ]

for (J=0 Ilt100 I++)

for (J=0 Jlt100 J++)

for (K=0 Klt100 K++)

C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]

bull What measured Not measured What it good for

bull Ideally run typical programs with typical input before purchase or before even build machine

Called a ldquoworkloadrdquo For example

Engineer uses compiler spreadsheet

Author uses word processor drawing program compression software

271998 Morgan Kaufmann Publishers

Choosing Benchmark Programs

bull a

281998 Morgan Kaufmann Publishers

Benchmarks

291998 Morgan Kaufmann Publishers

Example Standardized WorkloadBenchmarks

bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl

oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp

wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base

machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3

years

301998 Morgan Kaufmann Publishers

SPECint95base Performance (Oct 1997)

bull 1

311998 Morgan Kaufmann Publishers

SPECfp95base Performance (Oct 1997)

321998 Morgan Kaufmann Publishers

SPEC2000 (CINT)

Benchmark Language Category

164gzip C Compression

175vpr C FPGA PlacementRoute

176gcc C C Compiler

181mcf C Combinatorial Opt

186crafty C Chess

197parser C Word Processing

252eon C++ Computer Visualization

253perlbmk C PERL

254gap C Group Theory Interpreter

255vortex C OO Database

256bzip2 C Compression

300twolf C PlaceRoute Simulator

331998 Morgan Kaufmann Publishers

SPEC2000 (CFP)

341998 Morgan Kaufmann Publishers

Example PC Workload Benchmark

bull PCs Ziff Davis WinStone 99 Benchmark

A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications

Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores

Winstones tests dont mimic what these programs do they run actual application code

351998 Morgan Kaufmann Publishers

Winstone 99 (W99) Results

bull 1

Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster

361998 Morgan Kaufmann Publishers

Summarizing Performance

371998 Morgan Kaufmann Publishers

Early Lessons from SPEC

bull 1

381998 Morgan Kaufmann Publishers

Summarizing Performance

391998 Morgan Kaufmann Publishers

Reporting Performance

401998 Morgan Kaufmann Publishers

Summary Performance

bull Latency v Throughput

CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)

bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations

bull Performance evaluation needs to consider

Benchmark programs

Summarizing performance

Reporting performance results

411998 Morgan Kaufmann Publishers

Outline

bull Performance

1048698 Definition

CPU performance formula

Measuring and evaluating performancebull Cost

Cost and price

Cost of chips

421998 Morgan Kaufmann Publishers

Chip Cost Manufacturing Process

431998 Morgan Kaufmann Publishers

Cost of a Chip Includes

bull Die cost affected by wafer cost number of dies per wafer and die yield

goes roughly with the cube of the die area

An 8rdquo wafer can contain 196 Pentium dies but only 78   Pentium Pro (Fig 116 and 117)

bull Testing costbull Packaging cost depends on pins heat dissipation

441998 Morgan Kaufmann Publishers

Real World Examples

451998 Morgan Kaufmann Publishers

System Cost 1995 Workstation

461998 Morgan Kaufmann Publishers

Cost versus Price

471998 Morgan Kaufmann Publishers

Summary Cost

bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa

nce

Page 4: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

41998 Morgan Kaufmann Publishers

何謂手機的效能

bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull

51998 Morgan Kaufmann Publishers

Performance

bull Purchasing perspective

Given a collection of machines which has the best performance least cost best performancecost

bull Design perspective

Faced with design options which has the

best performance improvement least cost

best performancecost

bull Both require

basis for comparison

metric for evaluation

bull Goal understand cost and

performance implications

of architectural choices

61998 Morgan Kaufmann Publishers

Tasks of a Computer Architect

bull s

71998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

81998 Morgan Kaufmann Publishers

那一架飛機的效能比較好

bull s

91998 Morgan Kaufmann Publishers

Two Notions of Performance

bull a

Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition

Which Is Better?

• Time of Concorde vs. Boeing 747
  - Concorde is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours
• Throughput of Concorde vs. Boeing 747
  - Boeing is 286,700 pmph / 178,200 pmph = 1.6 times better
• Boeing is 1.6 times (60%) faster in terms of throughput
• Concorde is 2.2 times (120%) faster in terms of flying time (response time)

We will focus on execution time for a single job.
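The two comparisons on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the speeds, flight times, and passenger-mph figures are the ones quoted on the slide, while the helper name `ratio` and the variable names are made up for the example.

```python
def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b

# Figures quoted on the slide
concorde_mph, boeing_mph = 1350, 610
concorde_hours, boeing_hours = 3.0, 6.5
concorde_pmph, boeing_pmph = 178_200, 286_700  # passenger-miles per hour

speed_ratio = ratio(concorde_mph, boeing_mph)          # response-time view (speed)
time_ratio = ratio(boeing_hours, concorde_hours)       # response-time view (flight time)
throughput_ratio = ratio(boeing_pmph, concorde_pmph)   # throughput view

print(f"Concorde speed advantage:        {speed_ratio:.1f}x")       # 2.2x
print(f"Concorde flight-time advantage:  {time_ratio:.1f}x")        # 2.2x
print(f"Boeing 747 throughput advantage: {throughput_ratio:.1f}x")  # 1.6x
```

Which machine "wins" depends on whether the metric is time per task or tasks per unit time.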

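Performance Definition

• Performance is judged by time => faster is better
• If we are interested in comparing two things, "X is n times faster than Y" means that the execution time of Y is n times the execution time of X:

  Execution time(Y) / Execution time(X) = Performance(X) / Performance(Y) = n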

What Is Time?

• Straightforward definition of time: total time to complete a task, including disk and memory accesses, I/O activities, OS overhead, ...
  - May include execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in OS) and user CPU time (in user program)
• CPU performance: user CPU time of a single program
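The difference between elapsed time and CPU time can be observed directly. The sketch below is one possible way to do it in Python, assuming the standard-library calls `time.perf_counter()` for elapsed time and `os.times()` for user and system CPU time; the function `busy_work` and the loop size are arbitrary choices for illustration.

```python
import os
import time

def busy_work(n):
    """A small CPU-bound loop so there is something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()
cpu_start = os.times()

busy_work(200_000)
time.sleep(0.05)  # waiting adds to elapsed time but uses almost no CPU time

cpu_end = os.times()
wall_end = time.perf_counter()

elapsed = wall_end - wall_start                  # total time to complete the task
user_cpu = cpu_end.user - cpu_start.user         # time spent in the user program
system_cpu = cpu_end.system - cpu_start.system   # time spent in the OS on its behalf

print(f"elapsed: {elapsed:.3f} s, user CPU: {user_cpu:.3f} s, system CPU: {system_cpu:.3f} s")
```

On a typical run the elapsed time exceeds the sum of user and system CPU time, because the sleep is counted only in the elapsed figure.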

Outline

• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

How Can Program Execution Time Be Expressed as a Formula?

• Hint: basic components of a program
• Number of instructions
• Instruction execution time (average)

What Is the Instruction Count of a Program?

• How many C statements?

    for (i = 0; i < 100; i++) a[i] = b[i] * c[i];

• How many assembly-language instructions?

Instruction Execution Time

• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or, informally, clocks or cycles)
  2. Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz or 500 MHz), which is the inverse of the clock period

Instruction execution time is measured in cycles.
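Since clock rate is the inverse of clock cycle time, converting between the two is a one-line calculation. The sketch below uses the 2 ns and 500 MHz figures from the slide; the function names are hypothetical.

```python
def rate_from_period(cycle_time_seconds):
    """Clock rate in hertz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_seconds

def period_from_rate(clock_rate_hertz):
    """Clock cycle time in seconds for a given clock rate in hertz."""
    return 1.0 / clock_rate_hertz

rate = rate_from_period(2e-9)       # 2 ns clock cycle time
period = period_from_rate(500e6)    # 500 MHz clock rate

print(f"2 ns period -> {rate / 1e6:.0f} MHz")
print(f"500 MHz     -> {period * 1e9:.0f} ns")
```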

Program Execution Time

• CPU execution time for program
    = Clock Cycles for program × Clock Cycle Time
• Clock Cycles for program
    = Instructions for program ("Instruction Count") × Average Clock Cycles per Instruction ("CPI")

CPI: one way to compare two machines with the same instruction set, since the Instruction Count is the same.

Performance Calculation (1/2)

• CPU execution time for program
    = Clock Cycles for program × Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program
    = Instruction Count × CPI × Clock Cycle Time
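As a quick numerical check of the formula, here is a minimal Python sketch. The instruction count, CPI, and cycle time in the example are invented values chosen only to make the arithmetic easy to follow; the function name `cpu_time` is likewise an assumption.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU execution time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycles = instruction_count * cpi   # clock cycles for the program
    return clock_cycles * clock_cycle_time   # seconds

# Hypothetical program: 1 million instructions, average CPI of 2.0, 2 ns clock cycle
t = cpu_time(1_000_000, 2.0, 2e-9)
print(f"CPU time: {t * 1e3:.1f} ms")  # 4.0 ms
```

Halving any one of the three factors halves the CPU time, which is why no single factor is enough to rank two machines.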

How to Calculate the 3 Components

• Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in loop of small program
  - Use simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into binary code, run, and record information
  - Hardware counter in special register (Pentium II)
• CPI

Calculating CPI Another Way

• First, calculate CPI for each individual instruction (add, sub, and, etc.)
• Next, calculate the frequency of each individual instruction in the workload
• Finally, multiply these two for each instruction and add them up to get the final CPI
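The three steps above amount to a frequency-weighted sum. The sketch below shows one way to write it in Python; the instruction classes, frequencies, and per-class cycle counts are hypothetical values for illustration and are not taken from the slides.

```python
def average_cpi(mix):
    """Weighted CPI: sum over instruction classes of frequency x cycles.

    `mix` maps an instruction class name to a (frequency, cycles) pair.
    """
    total_frequency = sum(freq for freq, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical instruction mix: class -> (frequency in workload, cycles per instruction)
mix = {
    "ALU":    (0.5, 1),
    "Load":   (0.2, 5),
    "Store":  (0.1, 3),
    "Branch": (0.2, 2),
}

cpi = average_cpi(mix)
print(f"Average CPI: {cpi:.2f}")  # 2.20
```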

Example (RISC processor)

  - What if Branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?

Must know the limit of architectural enhancement.
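The table for this example did not survive in this transcript, so the sketch below answers the two "what if" questions for an assumed instruction mix. All frequencies and cycle counts are hypothetical. The speedup figures also assume that instruction count and clock cycle time stay the same, so the speedup equals the ratio of old CPI to new CPI.

```python
def average_cpi(mix):
    """Weighted CPI for a mapping of class -> (frequency, cycles)."""
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical base machine (not the figures from the original slide)
base = {
    "ALU":    (0.5, 1.0),
    "Load":   (0.2, 5.0),
    "Store":  (0.1, 3.0),
    "Branch": (0.2, 2.0),
}

# What if branches take half as many cycles?
faster_branch = dict(base, Branch=(0.2, 1.0))
# What if two ALU instructions execute at once (effective 0.5 cycles each)?
dual_alu = dict(base, ALU=(0.5, 0.5))

base_cpi = average_cpi(base)
branch_cpi = average_cpi(faster_branch)
alu_cpi = average_cpi(dual_alu)

print(f"base CPI {base_cpi:.2f}")
print(f"faster branch: CPI {branch_cpi:.2f}, speedup {base_cpi / branch_cpi:.2f}x")
print(f"dual ALU:      CPI {alu_cpi:.2f}, speedup {base_cpi / alu_cpi:.2f}x")
```

With this mix, doubling branch speed or doubling ALU throughput each gains only about 10 to 13 percent overall, because the enhanced class accounts for a limited share of the total cycles.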

Summary: CPU Time Formula

Another Formula Concerning Performance

Amdahl's Law

• Speedup due to enhancement E
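The formula on this slide is not recoverable from the transcript. The sketch below uses the commonly stated form of Amdahl's Law, Speedup = 1 / ((1 - F) + F / S), where F is the fraction of execution time affected by the enhancement and S is the speedup of that fraction. The function and parameter names are assumptions made for the example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original execution time that benefits (0..1)
    speedup_enhanced:  how many times faster that share becomes (> 0)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Speeding up half of the program by 2x gives about 1.33x overall
overall = amdahl_speedup(0.5, 2.0)
# Even an enormous speedup of that half cannot exceed 1 / (1 - 0.5) = 2x
limit = amdahl_speedup(0.5, 1e12)

print(f"overall speedup: {overall:.2f}x, upper bound: {limit:.2f}x")
```

The second call illustrates the limit of an architectural enhancement: the unenhanced part of the program bounds the achievable speedup.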

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
      Benchmark programs
      Summarizing performance
      Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?

• What's wrong with this program as a workload?

    integer A[ ][ ], B[ ][ ], C[ ][ ];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];

• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload"; for example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software

Choosing Benchmark Programs

Benchmarks

Example Standardized Workload Benchmarks

• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select workload
  - Compiler and machine designers target benchmarks, so try to change every 3 years

SPECint95base Performance (Oct. 1997)

SPECfp95base Performance (Oct. 1997)

SPEC2000 (CINT)

Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Opt.
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark

• PCs: Ziff Davis WinStone 99 Benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results

Note: 2 Compaq machines using K6-2 v. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance

• Latency v. Throughput
• CPU Time: time spent executing a single program; depends solely on design of processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
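The three factors named for die cost are usually combined as die cost = wafer cost / (dies per wafer × die yield). That relation is the standard textbook form and is not spelled out on this slide. In the sketch below, the dies-per-wafer counts (196 and 78) come from the slide, while the wafer cost and the yields are invented purely for illustration.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    if dies_per_wafer <= 0 or not 0.0 < die_yield <= 1.0:
        raise ValueError("need a positive die count and a yield in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)

WAFER_COST = 1500.0  # hypothetical cost of one 8-inch wafer, in dollars

# Hypothetical yields: the larger die is assumed to yield worse
small_die = die_cost(WAFER_COST, 196, 0.60)  # 196 dies per wafer (Pentium, per the slide)
large_die = die_cost(WAFER_COST, 78, 0.30)   # 78 dies per wafer (Pentium Pro, per the slide)

print(f"small die: ${small_die:.2f}, large die: ${large_die:.2f}")
print(f"cost ratio: {large_die / small_die:.1f}x")
```

A larger die is penalized twice, through fewer dies per wafer and through lower yield, which is why die cost grows much faster than die area.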

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost

• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of die area
• Economics ($$$) is the ultimate driver for performance



Page 7: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

71998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

81998 Morgan Kaufmann Publishers

那一架飛機的效能比較好

bull s

91998 Morgan Kaufmann Publishers

Two Notions of Performance

bull a

Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition

101998 Morgan Kaufmann Publishers

Which Is Better

bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res

ponse time)

We will focus on execution time for a single job

111998 Morgan Kaufmann Publishers

Performance Definition

bull Performance according to time

=gt faster is better

bull If interested in comparing two things

ldquoX is n times faster than Yrdquo means

121998 Morgan Kaufmann Publishers

What is Time

bull Straightforward definition of time Total time to complete a task including disk amp memory

accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o

n your program (since multiple processes running at same time)

ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim

e (in user program)

bull CPU performance user CPU time of a single CPU performance user CPU time of a single program

131998 Morgan Kaufmann Publishers

Outline

bull Performance

Definition

CPU performance formula (Sec 23)

Measuring and evaluating performancebull Cost

1048698 Cost and price

1048698 Cost of chips

141998 Morgan Kaufmann Publishers

如何以公式表達程式執行時間

bull Hint basic components of a program

bull 指令數

bull 指令執行時間(平均)

151998 Morgan Kaufmann Publishers

何謂程式的指令數

bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]

bull 有幾條組合語言指令

161998 Morgan Kaufmann Publishers

Instruction Execution Time

bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a

constant rate and determines when events take place in the hardware

1 These discrete time intervals called clock cycles (or informally clocks or cycles)

2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period

指令執行時間以 cycle為單位

171998 Morgan Kaufmann Publishers

Program Execution Time

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Clock Cycles for program

= Instructions for program (ldquoInstruction Countrdquo)

x Average Clock Cycles per Instruction (ldquoCPIrdquo)

CPI one way to compare two machines with same instruction set since Instruction Count is same

181998 Morgan Kaufmann Publishers

Performance Calculation (12)

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Substituting for clock cycles

bull CPU execution time for program

= Instruction Count x CPI x Clock Cycle Time

191998 Morgan Kaufmann Publishers

How to Calculate the 3 Components

bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)

bull Instruction Count

Count instructions in loop of small program

Use simulator or emulator to count instructions

Debugger or tracing program

Execution-based monitoring insert instrumentation code into

binary code run and record information

Hardware counter in special register (Pentium II)

bull CPI

201998 Morgan Kaufmann Publishers

Calculating CPI Another Way

bull First calculate CPI for each individual instruction (add sub and etc)

bull Next calculate frequency of each individual instruction in the workload

bull Finally multiply these two for each instruction and add them up to get final CPI

Example (RISC processor)

  - What if branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?

Must know the limit of an architectural enhancement.
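The slide's instruction-mix table did not survive extraction, so the mix below is a purely hypothetical stand-in, chosen only to show how the two "what if" questions would be answered. It also assumes that executing two ALU instructions at once halves the effective ALU CPI, and that instruction count and clock cycle time stay unchanged.

```latex
% Hypothetical instruction mix (not the slide's data)
\[
\begin{array}{lccc}
\text{Class}  & F_i & \text{CPI}_i & F_i \times \text{CPI}_i \\ \hline
\text{ALU}    & 0.5 & 1 & 0.5 \\
\text{Load}   & 0.2 & 5 & 1.0 \\
\text{Store}  & 0.1 & 3 & 0.3 \\
\text{Branch} & 0.2 & 2 & 0.4 \\ \hline
              &     & \text{CPI} = & 2.2
\end{array}
\]
% Branches twice as fast: branch CPI drops from 2 to 1
\[
\text{CPI}' = 0.5 + 1.0 + 0.3 + 0.2 \times 1 = 2.0,
\qquad
\text{Speedup} = \frac{2.2}{2.0} = 1.10
\]
% Two ALU instructions at once: effective ALU CPI drops from 1 to 0.5
\[
\text{CPI}'' = 0.5 \times 0.5 + 1.0 + 0.3 + 0.4 = 1.95,
\qquad
\text{Speedup} = \frac{2.2}{1.95} \approx 1.13
\]
```

Under these assumed numbers, neither enhancement comes close to a 2× gain, because each touches only part of the total cycle count. That is the sense in which the limit of an enhancement must be known.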

Summary: CPU Time Formula

• CPU time = Instruction Count × CPI × Clock cycle time

Another Formula for Performance

Amdahl's Law

• Speedup due to enhancement E
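The slide's formula was lost in extraction. The block below is a sketch of the standard statement of Amdahl's Law; the symbols $F$ (fraction of execution time affected by E) and $S$ (speedup of that fraction) and the numeric values are assumptions introduced here.

```latex
\[
\text{Speedup}(E)
= \frac{\text{Execution time without } E}{\text{Execution time with } E}
= \frac{\text{Performance with } E}{\text{Performance without } E}
\]
% If E speeds up a fraction F of the original execution time by a factor S:
\[
\text{Execution time}_{\text{new}}
= \text{Execution time}_{\text{old}} \times \left[ (1 - F) + \frac{F}{S} \right],
\qquad
\text{Speedup}(E) = \frac{1}{(1 - F) + \dfrac{F}{S}}
\]
% Assumed example: F = 0.4, S = 2
\[
\text{Speedup} = \frac{1}{0.6 + 0.2} = 1.25,
\qquad
\lim_{S \to \infty} \text{Speedup} = \frac{1}{1 - F} = \frac{1}{0.6} \approx 1.67
\]
```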

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance

• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?

• What's wrong with this program as a workload?

    int A[100][100], B[100][100], C[100][100];
    int I, J, K;

    for (I = 0; I < 100; I++)
        for (J = 0; J < 100; J++)
            for (K = 0; K < 100; K++)
                C[I][J] = C[I][J] + A[I][K] * B[K][J];

• What is measured? What is not measured? What is it good for?

• Ideally, run typical programs with typical input before purchase, or even before the machine is built
  - Called a "workload"; for example:
    - An engineer uses a compiler and a spreadsheet
    - An author uses a word processor, a drawing program, and compression software

Choosing Benchmark Programs

Benchmarks

Example Standardized Workload Benchmarks

• Workstations: Standard Performance Evaluation Corporation (SPEC)

• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate averages for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks are distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
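The "average relative to a base machine" can be sketched as follows. SPEC composites of this era are geometric means of per-benchmark ratios; the symbols and the two sample ratios below are assumptions for illustration, not data from the slides.

```latex
% Per-benchmark ratio against the base (reference) machine
\[
\text{SPECratio}_i = \frac{\text{Execution time on base machine}_i}{\text{Execution time on measured machine}_i}
\]
% Composite over n benchmarks (n = 8 for CINT95, n = 10 for CFP95)
\[
\text{Composite} = \left( \prod_{i=1}^{n} \text{SPECratio}_i \right)^{1/n}
\]
% Assumed ratios 4 and 9: geometric mean versus arithmetic mean
\[
\sqrt{4 \times 9} = 6
\qquad \text{versus} \qquad
\frac{4 + 9}{2} = 6.5
\]
```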

SPECint95base Performance (Oct. 1997)

SPECfp95base Performance (Oct. 1997)

SPEC2000 (CINT)

Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Opt.
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark

• PCs: Ziff-Davis Winstone 99 Benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results

Note: 2 Compaq machines using K6-2 vs. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.
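Reading the note's figures as 1.125 and 1.25, a rough calculation shows how much the lower clock rate must be offset elsewhere. It assumes, as a simplification, that the Winstone rating is proportional to CPU performance and that both machines execute the same instruction count. Winstone is a system-level benchmark, so the "effective CPI" here lumps together memory and I/O effects as well.

```latex
% Performance is proportional to Clock rate / (Instruction Count x CPI)
\[
\frac{\text{Perf}_{\text{K6-3}}}{\text{Perf}_{\text{K6-2}}}
= \frac{\text{Rate}_{\text{K6-3}}}{\text{Rate}_{\text{K6-2}}}
\times \frac{\text{CPI}_{\text{K6-2}}}{\text{CPI}_{\text{K6-3}}}
\]
\[
1.25 = \frac{1}{1.125} \times \frac{\text{CPI}_{\text{K6-2}}}{\text{CPI}_{\text{K6-3}}}
\quad \Longrightarrow \quad
\frac{\text{CPI}_{\text{K6-2}}}{\text{CPI}_{\text{K6-3}}} = 1.25 \times 1.125 \approx 1.41
\]
```

Under these assumptions the K6-3 needs roughly 1.4 times fewer effective cycles per instruction, which illustrates why clock rate alone is a poor predictor of delivered performance.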

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance

• Latency vs. throughput

• CPU time: time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)

• Performance doesn't depend on any single factor: you need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimates

• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance

• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8-inch wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)

• Testing cost

• Packaging cost: depends on pins and heat dissipation
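The die-cost bullet can be made concrete with the usual cost-per-die relation and the cube rule of thumb quoted above. The doubling example is an illustration; the only data taken from the slide are the 196 and 78 dies per wafer.

```latex
\[
\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Die yield}}
\]
% Rule of thumb from the slide: die cost grows roughly with the cube of die area
\[
\text{Cost per die} \propto (\text{Die area})^{3}
\quad \Longrightarrow \quad
\text{doubling the area raises the cost by roughly } 2^{3} = 8 \text{ times}
\]
% Dies per 8-inch wafer, from the slide
\[
\frac{196\ \text{Pentium dies}}{78\ \text{Pentium Pro dies}} \approx 2.5
\]
```

The larger die loses twice: fewer dies fit on each wafer, and yield typically falls as area grows, which is why cost rises faster than linearly with area.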

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost

• Integrated circuits are driving the computer industry

• Die cost goes up with the cube of the die area

• Economics ($$$) is the ultimate driver for performance

Which Airplane Has Better Performance?

Two Notions of Performance

Which has higher performance?
  - Time to deliver 1 passenger? Time to deliver 400 passengers?
  - Time to do the task: execution time, response time, latency
  - Tasks per unit time: throughput, bandwidth

Response time and throughput are often in opposition.

Which Is Better?

• Time of Concorde vs. Boeing 747
  - Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)

• Throughput of Concorde vs. Boeing 747
  - Boeing is 286,700 pmph / 178,200 pmph = 1.6 times better (pmph = passenger-miles per hour)

• Boeing is 1.6 times (60%) faster in terms of throughput

• Concorde is 2.2 times (120%) faster in terms of flying time (response time)

We will focus on execution time for a single job.
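The ratios on this slide can be checked directly from its own figures, reading them with their decimal points restored (2.2, 1.6, 6.5). The passenger capacities in the last line are not stated on the slide; they are simply what the pmph figures imply when divided by the speeds.

```latex
% Response time (speed and flight time)
\[
\frac{1350\ \text{mph}}{610\ \text{mph}} \approx 2.2,
\qquad
\frac{6.5\ \text{hours}}{3\ \text{hours}} \approx 2.2
\quad \Rightarrow \quad \text{Concorde is about } 120\% \text{ faster}
\]
% Throughput in passenger-miles per hour (pmph)
\[
\frac{286{,}700\ \text{pmph}}{178{,}200\ \text{pmph}} \approx 1.6
\quad \Rightarrow \quad \text{Boeing 747 has about } 60\% \text{ more throughput}
\]
% Implied capacities: throughput = passengers x speed
\[
\frac{286{,}700}{610} = 470\ \text{passengers (747)},
\qquad
\frac{178{,}200}{1350} = 132\ \text{passengers (Concorde)}
\]
```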

Performance Definition

• Performance is measured according to time
  => faster is better

• If we are interested in comparing two things,
  "X is n times faster than Y" means
      n = Execution time of Y / Execution time of X = Performance of X / Performance of Y
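A small worked case of the "n times faster" comparison, using the usual convention that performance is the reciprocal of execution time. The two execution times are assumed values for illustration.

```latex
\[
\text{Performance}_X = \frac{1}{\text{Execution time}_X},
\qquad
n = \frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X}
\]
% Assumed: X runs the program in 10 s, Y runs it in 15 s
\[
n = \frac{15\ \text{s}}{10\ \text{s}} = 1.5
\quad \Rightarrow \quad X \text{ is 1.5 times faster than } Y
\]
```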

What is Time?

• Straightforward definition of time:
  - Total time to complete a task, including disk and memory accesses, I/O activities, OS overhead, ...
  - May include the execution time of other programs in a multiprogramming environment
  - Too many factors are involved

• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in the OS) and user CPU time (in the user program)

• CPU performance: user CPU time of a single program
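To see how the pieces relate, consider a breakdown in the style of what the Unix `time` command reports. The numbers are illustrative values, not taken from the slides.

```latex
% Illustrative values: user CPU time 90.7 s, system CPU time 12.9 s, elapsed time 2 min 39 s = 159 s
\[
\text{CPU time} = \text{User CPU time} + \text{System CPU time}
= 90.7\ \text{s} + 12.9\ \text{s} = 103.6\ \text{s}
\]
\[
\frac{\text{CPU time}}{\text{Elapsed time}} = \frac{103.6\ \text{s}}{159\ \text{s}} \approx 65\%
\]
```

In this example, roughly a third of the elapsed time went to waiting on I/O or to running other programs, which is why elapsed time alone is a noisy measure of processor performance.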

Outline

• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance

• Cost
  - Cost and price
  - Cost of chips

How Can Program Execution Time Be Expressed as a Formula?

• Hint: basic components of a program

• Number of instructions

• Instruction execution time (average)

What Is the Instruction Count of a Program?

• How many C statements? for (i = 0; i < 100; i++) a[i] = b[i] * c[i];

• How many assembly-language instructions?
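One way to frame the second question is to separate the static count (instructions in the compiled code) from the dynamic count (instructions actually executed). The per-iteration and setup counts below are assumptions for a generic load/store machine; real values depend on the instruction set and the compiler.

```latex
% Assumed: k = 7 instructions per iteration
% (2 loads, 1 multiply, 1 store, 1 index increment, 1 compare, 1 branch)
% and s = 2 setup instructions before the loop
\[
\text{Dynamic instruction count} = s + 100 \times k = 2 + 100 \times 7 = 702
\]
\[
\text{Static instruction count} = s + k = 9
\]
```

The Instruction Count in the CPU time formula is the dynamic count, so a single C statement in a loop can account for hundreds of executed instructions.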




Page 11: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

111998 Morgan Kaufmann Publishers

Performance Definition

• Performance according to time
  => faster is better

• If interested in comparing two things:
  "X is n times faster than Y" means
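The relation this bullet leads into is conventionally written as follows, taking performance as the reciprocal of execution time (the symbol names are chosen here for illustration):

```latex
\[
n \;=\; \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)}
  \;=\; \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)},
\qquad
\mathrm{Performance}(X) = \frac{1}{\mathrm{ExTime}(X)}
\]
```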

What is Time?

• Straightforward definition of time: total time to complete a task, including disk & memory accesses, I/O activities, OS overhead, ...
  - May include execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in OS) and user CPU time (in user program)
• CPU performance: user CPU time of a single program

Outline

• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

How Can Program Execution Time Be Expressed as a Formula?

• Hint: basic components of a program
• Instruction count
• Instruction execution time (average)

What Is a Program's Instruction Count?

• How many C statements? for (i = 0; i < 100; i++) a[i] = b[i] * c[i];
• How many assembly-language instructions?

Instruction Execution Time

• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or informally clocks or cycles)
  2. Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz or 500 MHz), which is the inverse of the clock period

Instruction execution time is measured in cycles
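The inverse relation between clock rate and clock cycle time can be checked with a small sketch. The function names are illustrative, and the values are the 500 MHz / 2 ns pair quoted on the slide.

```python
def cycle_time_ns(clock_rate_mhz):
    """Clock cycle time in nanoseconds for a clock rate given in MHz."""
    # 1 / (rate_mhz * 1e6) seconds, expressed in ns, is 1000 / rate_mhz.
    return 1_000.0 / clock_rate_mhz

def clock_rate_mhz(cycle_ns):
    """Clock rate in MHz for a clock cycle time given in nanoseconds."""
    return 1_000.0 / cycle_ns

print(cycle_time_ns(500))  # 500 MHz -> 2.0 ns
print(clock_rate_mhz(2))   # 2 ns -> 500.0 MHz
```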

Program Execution Time

• CPU execution time for program
  = Clock Cycles for program x Clock Cycle Time

• Clock Cycles for program
  = Instructions for program ("Instruction Count")
  x Average Clock Cycles per Instruction ("CPI")

• CPI: one way to compare two machines with the same instruction set, since Instruction Count is the same

Performance Calculation (1/2)

• CPU execution time for program
  = Clock Cycles for program x Clock Cycle Time

• Substituting for clock cycles:

• CPU execution time for program
  = Instruction Count x CPI x Clock Cycle Time
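As a minimal sketch, the three-factor formula translates directly into code. The program parameters below are hypothetical values picked to keep the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_time_s):
    """CPU execution time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical program: 1 billion instructions, CPI of 2.0,
# 2 ns clock cycle (a 500 MHz clock).
t = cpu_time_seconds(1_000_000_000, 2.0, 2e-9)
print(f"{t:.1f} s")
```

Halving any one factor halves the CPU time, which is why no single factor, clock rate included, is a valid performance measure on its own.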

How to Calculate the 3 Components

• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)

• Instruction Count:
  - Count instructions in loop of small program
  - Use simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into binary code, run, and record information
  - Hardware counter in special register (Pentium II)

• CPI
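The CPI bullet is left without detail in this transcript. One common way to obtain CPI, assumed here, is to rearrange the CPU time formula: CPI = (CPU time x Clock Rate) / Instruction Count. The measured values in the sketch are hypothetical.

```python
def cpi_from_measurements(cpu_time_s, instruction_count, clock_rate_hz):
    """Average CPI = total clock cycles / instruction count."""
    clock_cycles = cpu_time_s * clock_rate_hz
    return clock_cycles / instruction_count

# Hypothetical measurement: 3 s of CPU time on a 500 MHz clock
# for a run that executed 1 billion instructions.
print(cpi_from_measurements(cpu_time_s=3.0,
                            instruction_count=1_000_000_000,
                            clock_rate_hz=500e6))
```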

Calculating CPI Another Way

• First, calculate the CPI for each individual instruction (add, sub, and, etc.)

• Next, calculate the frequency of each individual instruction in the workload

• Finally, multiply these two for each instruction and add them up to get the final CPI
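The three steps above amount to a frequency-weighted sum. In this sketch the instruction classes, frequencies, and cycle counts are hypothetical and are not the table from the slides.

```python
def average_cpi(mix):
    """mix maps instruction class -> (frequency, cycles); frequencies sum to 1."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Weight each class's cycle count by how often it occurs, then add up.
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical instruction mix (assumption for illustration only).
mix = {
    "ALU":    (0.50, 1),
    "Load":   (0.20, 5),
    "Store":  (0.10, 3),
    "Branch": (0.20, 2),
}
print(round(average_cpi(mix), 2))
```

The per-class products also show where the cycles go: in this made-up mix, loads account for 1.0 of the 2.2 cycles per instruction although they are only 20% of the instructions.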

Example (RISC processor)

• What if branch instructions were twice as fast?
• What if two ALU instructions could be executed at once?

Must know the limit of architectural enhancement

Summary: CPU Time Formula

Another Formula About Performance

Amdahl's Law

• Speedup due to enhancement E
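Amdahl's Law is commonly stated as Speedup = 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that the enhancement affects and S is the speedup of that fraction. A minimal sketch follows; the parameter names and the 40% / 2x case are assumptions for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Hypothetical case: 40% of execution time is made 2x faster.
print(round(amdahl_speedup(0.4, 2.0), 3))
# Even an enormous speedup of that 40% cannot exceed 1 / (1 - 0.4).
print(round(amdahl_speedup(0.4, 1e9), 3))
```

The second call illustrates the limit of an enhancement: the unaffected 60% of the time bounds the overall speedup at about 1.67.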

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?

• What's wrong with this program as a workload?

  integer A[ ][ ], B[ ][ ], C[ ][ ];
  for (I=0; I<100; I++)
    for (J=0; J<100; J++)
      for (K=0; K<100; K++)
        C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ] * B[ K ][ J ];

• What is measured? What is not measured? What is it good for?

• Ideally, run typical programs with typical input before purchase, or before even building the machine
  - Called a "workload". For example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software

Choosing Benchmark Programs

Benchmarks

Example: Standardized Workload Benchmarks

• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target benchmarks, so try to change them every 3 years
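The "average relative to a base machine" can be sketched as follows, assuming each benchmark's ratio is base time divided by measured time and that the ratios are combined with a geometric mean, as SPEC summary ratings are. All run times below are hypothetical, and only three of the integer benchmark names are used.

```python
import math

def spec_ratios(base_times, machine_times):
    """Per-benchmark ratio: base machine time / measured time (higher is faster)."""
    return {name: base_times[name] / machine_times[name] for name in base_times}

def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds (assumptions, not SPEC data).
base = {"go": 4600.0, "gcc": 1700.0, "compress": 1800.0}
measured = {"go": 460.0, "gcc": 340.0, "compress": 450.0}

ratios = spec_ratios(base, measured)
print(ratios)
print(round(geometric_mean(ratios.values()), 2))
```

A geometric mean of ratios has the property that the ranking of two machines does not depend on which machine is chosen as the base.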

SPECint95base Performance (Oct. 1997)

SPECfp95base Performance (Oct. 1997)

SPEC2000 (CINT)

Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Optimization
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example: PC Workload Benchmark

• PCs: Ziff Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results

Note: 2 Compaq machines using K6-2 vs. K6-3: the K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster

Page 15: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

151998 Morgan Kaufmann Publishers

何謂程式的指令數

bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]

bull 有幾條組合語言指令

161998 Morgan Kaufmann Publishers

Instruction Execution Time

bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a

constant rate and determines when events take place in the hardware

1 These discrete time intervals called clock cycles (or informally clocks or cycles)

2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period

指令執行時間以 cycle為單位

171998 Morgan Kaufmann Publishers

Program Execution Time

bull CPU execution time for program

= Clock Cycles for program x Clock Cycle Time

bull Clock Cycles for program

= Instructions for program (ldquoInstruction Countrdquo)

x Average Clock Cycles per Instruction (ldquoCPIrdquo)

CPI one way to compare two machines with same instruction set since Instruction Count is same

Performance Calculation (1/2)
• CPU execution time for program = Clock Cycles for program × Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program = Instruction Count × CPI × Clock Cycle Time
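As a minimal sketch of the substitution above, the product can be evaluated directly. The function name and the sample numbers (one million instructions, CPI of 2.0, 2 ns cycle time) are assumptions chosen for illustration, not values from the slides.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU execution time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycles = instruction_count * cpi  # Clock Cycles for program
    return clock_cycles * cycle_time_s


# Hypothetical program: 1,000,000 instructions, average CPI 2.0, 2 ns clock cycle.
example = cpu_time_seconds(instruction_count=1_000_000, cpi=2.0, cycle_time_s=2e-9)
print(f"CPU time: {example * 1e3:.3f} ms")
```

Halving any one of the three factors halves the result, which is why no single factor is enough to compare two machines.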

How to Calculate the 3 Components
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in loop of small program
  - Use simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into binary code, run, and record information
  - Hardware counter in special register (Pentium II)
• CPI

Calculating CPI Another Way
• First, calculate the CPI for each individual instruction (add, sub, etc.)
• Next, calculate the frequency of each individual instruction in the workload
• Finally, multiply these two for each instruction and add them up to get the final CPI

Example (RISC processor)
  - What if Branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?
Must know the limit of architectural enhancement
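The weighted-sum CPI calculation and the two "what if" questions can be sketched as follows. The instruction mix below is hypothetical, since the slide's table was not preserved, and modelling "two ALU instructions at once" as a halved ALU CPI is a simplifying assumption (perfect pairing).

```python
def weighted_cpi(mix: dict) -> float:
    """mix maps an instruction class to (frequency, cycles per instruction)."""
    return sum(freq * cpi for freq, cpi in mix.values())


# Hypothetical instruction mix; the frequencies sum to 1.0.
base_mix = {
    "alu": (0.5, 1.0),
    "load": (0.2, 5.0),
    "store": (0.1, 3.0),
    "branch": (0.2, 2.0),
}
base_cpi = weighted_cpi(base_mix)

# What if branches were twice as fast? Halve the branch CPI.
faster_branch = dict(base_mix, branch=(0.2, 1.0))
# What if two ALU instructions could execute at once? Assume ALU CPI is halved.
paired_alu = dict(base_mix, alu=(0.5, 0.5))

for name, mix in [("branch 2x faster", faster_branch), ("two ALU at once", paired_alu)]:
    new_cpi = weighted_cpi(mix)
    # With instruction count and clock rate unchanged, speedup = old CPI / new CPI.
    print(f"{name}: CPI {new_cpi:.2f}, speedup {base_cpi / new_cpi:.3f}")
```

Under these assumed numbers, both enhancements give only a modest overall speedup because each touches a limited share of the total cycles.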

Summary CPU Time Formula
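The formula from the preceding slides can be restated in one line. The unit-cancellation form on the second line is the usual textbook way of writing it and is an assumption about how the slide presented it.

```latex
\begin{align*}
\text{CPU time} &= \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} \\
                &= \frac{\text{Instructions}}{\text{Program}} \times
                   \frac{\text{Clock Cycles}}{\text{Instruction}} \times
                   \frac{\text{Seconds}}{\text{Clock Cycle}}
\end{align*}
```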

Another Formula Related to Performance

Amdahl's Law
• Speedup due to enhancement E
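A minimal sketch of Amdahl's Law in its standard form, Speedup(E) = 1 / ((1 - F) + F / S). The symbols F (fraction of execution time that benefits from E) and S (speedup of that fraction) are introduced here for illustration; the slide's own notation was not preserved.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is sped up 10x.
overall = amdahl_speedup(0.4, 10.0)
# Upper bound as S grows without limit: 1 / (1 - F).
limit = 1.0 / (1.0 - 0.4)
print(f"overall speedup {overall:.4f}, limit {limit:.4f}")
```

The limit shows why the unenhanced fraction bounds the benefit of any single architectural enhancement.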

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?
• What's wrong with this program as a workload?
    integer A[ ][ ], B[ ][ ], C[ ][ ];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];
• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload". For example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software
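To make the question about the matrix-multiply kernel concrete, here is a rough Python sketch of the same triple loop wrapped in a timer. The input values and the use of `time.perf_counter` are assumptions for illustration; the point is that such a kernel exercises only integer arithmetic and regular array accesses, not the mix of behavior found in a real workload.

```python
import time


def matmul(a, b):
    """Triple-loop matrix multiply, mirroring the kernel on the slide."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


N = 100  # matrix dimension used on the slide
a = [[i + j for j in range(N)] for i in range(N)]  # arbitrary integer inputs
b = [[i - j for j in range(N)] for i in range(N)]

start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start
print(f"{N}x{N} integer matrix multiply took {elapsed:.4f} s")
```

The measured time says little about I/O, branch-heavy code, or large working sets, which is why typical programs with typical inputs make a better basis for comparison.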

Choosing Benchmark Programs

Benchmarks

Example Standardized Workload Benchmarks
• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years

SPECint95base Performance (Oct 1997)

SPECfp95base Performance (Oct 1997)

SPEC2000 (CINT)

Benchmark      Language   Category
164.gzip       C          Compression
175.vpr        C          FPGA Placement/Route
176.gcc        C          C Compiler
181.mcf        C          Combinatorial Opt.
186.crafty     C          Chess
197.parser     C          Word Processing
252.eon        C++        Computer Visualization
253.perlbmk    C          PERL
254.gap        C          Group Theory Interpreter
255.vortex     C          OO Database
256.bzip2      C          Compression
300.twolf      C          Place/Route Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark
• PCs: Ziff Davis Winstone 99 Benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results
Note: 2 Compaq machines using K6-2 vs. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance
• Latency vs. Throughput
  - CPU Time: time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimates
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
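How the three die-cost factors combine can be sketched with the usual textbook relation, die cost = wafer cost / (dies per wafer × die yield). This formula is assumed here because the slide lists the factors without stating it. The dies-per-wafer counts (196 and 78) come from the slide, while the wafer cost and yields are hypothetical.

```python
def die_cost(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Cost per good die: wafer cost spread over the dies that actually work."""
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


WAFER_COST = 1000.0  # hypothetical cost of one 8-inch wafer

# Dies per wafer from the slide; the yields are illustrative guesses only.
pentium = die_cost(WAFER_COST, dies_per_wafer=196, die_yield=0.8)
pentium_pro = die_cost(WAFER_COST, dies_per_wafer=78, die_yield=0.5)

print(f"Pentium die: {pentium:.2f}, Pentium Pro die: {pentium_pro:.2f}")
print(f"cost ratio: {pentium_pro / pentium:.2f}x")
```

A larger die loses twice: fewer dies fit on the wafer, and a smaller share of them is expected to be defect-free.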

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost
• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of the die area
• Economics ($$$) is the ultimate driver for performance

Page 20: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

201998 Morgan Kaufmann Publishers

Calculating CPI Another Way

bull First calculate CPI for each individual instruction (add sub and etc)

bull Next calculate frequency of each individual instruction in the workload

bull Finally multiply these two for each instruction and add them up to get final CPI

Example (RISC processor)

• What if branch instructions were twice as fast?
• What if two ALU instructions could be executed at once?

Must know the limit of architectural enhancement.
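A rough way to explore these what-if questions is to recompute the weighted CPI under each change. The sketch below assumes a hypothetical instruction mix (not the slide's own data), assumes that instruction count and clock rate stay fixed, and models dual ALU issue crudely as halving the effective cycles per ALU instruction.

```python
def cpi(mix):
    """Weighted CPI: sum of frequency * cycles over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


# Hypothetical baseline mix: class -> (frequency, cycles). Illustrative only.
base = {
    "alu":    (0.5, 1.0),
    "load":   (0.2, 2.0),
    "store":  (0.1, 2.0),
    "branch": (0.2, 2.0),
}

# What if branches were twice as fast (their cycle count is halved)?
faster_branch = dict(base, branch=(0.2, 1.0))

# What if two ALU instructions could execute at once?
# Assumption: perfect pairing, so each ALU instruction costs half a cycle.
dual_alu = dict(base, alu=(0.5, 0.5))

# Upper bound for any branch enhancement: branches cost nothing at all.
free_branch = dict(base, branch=(0.2, 0.0))

for name, mix in [("faster branch", faster_branch),
                  ("dual ALU issue", dual_alu),
                  ("branch limit", free_branch)]:
    print(f"{name}: CPI {cpi(mix):.2f}, speedup {cpi(base) / cpi(mix):.2f}x")
```

The last case illustrates the limit of an enhancement: even if branches took no time at all, the speedup with this mix would stay below 1.4x, because the other instruction classes still account for most of the cycles.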

Summary: CPU Time Formula
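For reference, the CPU time equation in its standard textbook form, written in terms of instruction count, clocks per instruction (CPI), and clock rate:

```latex
\[
\text{CPU time}
  = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
  = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}
\]
```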

Another Formula Related to Performance

Amdahl's Law

• Speedup due to enhancement E
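Amdahl's Law in its usual form says that the overall speedup is 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that the enhancement E affects and S is the speedup of that fraction. A minimal Python sketch follows; the function name and the example numbers are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced.

    fraction_enhanced: share of the original execution time affected (0..1).
    speedup_enhanced:  how many times faster that share becomes (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is sped up by a factor of 10.
print(f"{amdahl_speedup(0.4, 10):.4f}")

# Even an enormous speedup of that 40% cannot beat 1 / (1 - 0.4).
print(f"{amdahl_speedup(0.4, 1e12):.4f}")
```

The second call shows the ceiling: the unenhanced 60% of the time bounds the overall speedup at about 1.67, however fast the enhanced part becomes.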

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?

• What's wrong with this program as a workload?

    integer A[][], B[][], C[][];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];

• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before the machine is built
  - Called a "workload". For example:
    - An engineer uses a compiler and a spreadsheet
    - An author uses a word processor, a drawing program, and compression software
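To make the question concrete, the loop nest can be timed directly. The Python sketch below is one illustrative way to do that; the function name and the fill values are assumptions. It exercises only integer multiply-add, loop overhead, and regular array indexing on a small data set (in Python, interpreter overhead as well), and says nothing about floating point, I/O, the operating system, or irregular control flow.

```python
import time


def matmul_kernel(n=100):
    """Run the slide's triple loop on n x n integer matrices."""
    a = [[1] * n for _ in range(n)]
    b = [[2] * n for _ in range(n)]
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] = c[i][j] + a[i][k] * b[k][j]
    return c


start = time.perf_counter()
matmul_kernel(100)
elapsed = time.perf_counter() - start
print(f"100x100 integer matrix multiply took {elapsed:.3f} s")
```

A single number from a kernel like this is easy to obtain, but it predicts little about how a compiler, a spreadsheet, or a word processor will run on the same machine.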

Choosing Benchmark Programs

Benchmarks

Example: Standardized Workload Benchmarks

• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks are distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
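SPEC summarizes each suite with the geometric mean of per-benchmark ratios, where each ratio is the reference machine's time divided by the measured time. The sketch below shows that calculation; the helper names and the timing values are hypothetical and are not real SPEC data.

```python
import math


def spec_ratio(reference_time, measured_time):
    """Ratio > 1 means the measured machine beats the reference machine."""
    return reference_time / measured_time


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logs."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference seconds, measured seconds) per benchmark.
times = {
    "go":   (100.0, 10.0),
    "gcc":  (100.0, 25.0),
    "perl": (100.0, 40.0),
}

ratios = [spec_ratio(ref, measured) for ref, measured in times.values()]
print(f"ratios = {ratios}")
print(f"geometric mean = {geometric_mean(ratios):.3f}")
```

A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is chosen as the reference, which is not true of an arithmetic mean of ratios.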

SPECint95base Performance (Oct 1997)

SPECfp95base Performance (Oct 1997)

SPEC2000 (CINT)

Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Optimization
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory, Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example: PC Workload Benchmark

• PCs: Ziff-Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results

Note: two Compaq machines, one using the K6-2 and one the K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance

• Latency vs. throughput
• CPU time: the time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know instruction count, clocks per instruction, and clock rate to get valid estimates
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
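The point that no single factor decides performance can be shown with a small calculation of CPU time as instruction count times CPI divided by clock rate. The two machines below are hypothetical; their numbers are assumptions chosen so that the machine with the higher clock rate loses.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * clocks per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machine A: higher clock rate, but more cycles per instruction.
time_a = cpu_time_seconds(instruction_count=1.0e9, cpi=2.0, clock_rate_hz=500e6)

# Hypothetical machine B: slower clock and more instructions, but lower CPI.
time_b = cpu_time_seconds(instruction_count=1.2e9, cpi=1.25, clock_rate_hz=400e6)

print(f"machine A: {time_a:.2f} s")
print(f"machine B: {time_b:.2f} s")
print("faster machine:", "A" if time_a < time_b else "B")
```

Judged by clock rate alone, A looks faster; judged by instruction count alone, A also looks better. Only the product of all three factors shows that B finishes first in this example.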

Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8-inch wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
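The dependence of die cost on the three factors listed above is commonly written as die cost = wafer cost / (dies per wafer × die yield). The sketch below applies that relation to the two die counts quoted for an 8-inch wafer; the wafer cost and the yield figures are hypothetical assumptions, not data from the slides.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    if dies_per_wafer <= 0 or not 0.0 < die_yield <= 1.0:
        raise ValueError("need a positive die count and a yield in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


WAFER_COST = 1000.0  # hypothetical cost of one processed wafer, in dollars

# Die counts are from the slide; the yields are assumed for illustration.
pentium_cost = die_cost(WAFER_COST, dies_per_wafer=196, die_yield=0.50)
pentium_pro_cost = die_cost(WAFER_COST, dies_per_wafer=78, die_yield=0.25)

print(f"Pentium die:     ${pentium_cost:.2f}")
print(f"Pentium Pro die: ${pentium_pro_cost:.2f}")
print(f"cost ratio:      {pentium_pro_cost / pentium_cost:.1f}x")
```

A larger die is penalized twice: fewer dies fit on the wafer, and each one is more likely to contain a defect. That is why die cost rises much faster than linearly with die area.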

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost

• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of die area
• Economics ($$$) is the ultimate driver for performance

Page 26: 1 Ó1998 Morgan Kaufmann Publishers Chapter 2 Performance and Cost

261998 Morgan Kaufmann Publishers

What Programs for Comparison

bull Whatrsquos wrong with this program as a workload

integer A[ ][ ] B[ ][ ] C[ ][ ]

for (J=0 Ilt100 I++)

for (J=0 Jlt100 J++)

for (K=0 Klt100 K++)

C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]

bull What measured Not measured What it good for

bull Ideally run typical programs with typical input before purchase or before even build machine

Called a ldquoworkloadrdquo For example

Engineer uses compiler spreadsheet

Author uses word processor drawing program compression software

271998 Morgan Kaufmann Publishers

Choosing Benchmark Programs

bull a

281998 Morgan Kaufmann Publishers

Benchmarks

291998 Morgan Kaufmann Publishers

Example Standardized WorkloadBenchmarks

bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl

oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp

wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base

machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3

years

301998 Morgan Kaufmann Publishers

SPECint95base Performance (Oct 1997)

bull 1

311998 Morgan Kaufmann Publishers

SPECfp95base Performance (Oct 1997)

321998 Morgan Kaufmann Publishers

SPEC2000 (CINT)

Benchmark Language Category

164gzip C Compression

175vpr C FPGA PlacementRoute

176gcc C C Compiler

181mcf C Combinatorial Opt

186crafty C Chess

197parser C Word Processing

252eon C++ Computer Visualization

253perlbmk C PERL

254gap C Group Theory Interpreter

255vortex C OO Database

256bzip2 C Compression

300twolf C PlaceRoute Simulator

331998 Morgan Kaufmann Publishers

SPEC2000 (CFP)

341998 Morgan Kaufmann Publishers

Example PC Workload Benchmark

bull PCs Ziff Davis WinStone 99 Benchmark

A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications

Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores

Winstones tests dont mimic what these programs do they run actual application code

351998 Morgan Kaufmann Publishers

Winstone 99 (W99) Results

bull 1

Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster

361998 Morgan Kaufmann Publishers

Summarizing Performance

371998 Morgan Kaufmann Publishers

Early Lessons from SPEC

bull 1

381998 Morgan Kaufmann Publishers

Summarizing Performance

391998 Morgan Kaufmann Publishers

Reporting Performance

Summary: Performance

• Latency v. Throughput

CPU time: time spent executing a single program; depends solely on design of processor (datapath, pipelining effectiveness, caches, etc.)

• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations

• Performance evaluation needs to consider:

Benchmark programs

Summarizing performance

Reporting performance results
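The point that no single factor determines performance can be checked with the CPU time relation, CPU time = Instruction Count × Clocks Per Instruction / Clock Rate. The sketch below is illustrative only: the two machines and all of their numbers are hypothetical, chosen so that the machine with the slower clock still finishes first.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count * CPI / clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machines running the same program (same instruction count).
INSTRUCTIONS = 1_000_000_000

# Machine A: faster clock (500 MHz) but more clocks per instruction.
time_a = cpu_time(INSTRUCTIONS, cpi=2.0, clock_rate_hz=500e6)

# Machine B: slower clock (400 MHz) but fewer clocks per instruction.
time_b = cpu_time(INSTRUCTIONS, cpi=1.5, clock_rate_hz=400e6)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
print("faster machine:", "A" if time_a < time_b else "B")
```

Comparing clock rates alone would pick machine A; including CPI reverses the conclusion.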

Outline

• Performance

Definition

CPU performance formula

Measuring and evaluating performance

• Cost

Cost and price

Cost of chips

Chip Cost Manufacturing Process

Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield

goes roughly with the cube of the die area

An 8-inch wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)

• Testing cost

• Packaging cost: depends on pins, heat dissipation
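The way wafer cost, dies per wafer, and die yield combine can be sketched with the usual relation, die cost = wafer cost / (dies per wafer × die yield). The die counts 196 and 78 come from the slide; the wafer cost and the yield are hypothetical placeholders, and using the same yield for both chips is a simplifying assumption that flatters the larger die. The helper `relative_cost_cube_rule` is only a restatement of the "cube of the die area" rule of thumb, not a derived result.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost per good die = wafer cost / (dies per wafer * die yield)."""
    if dies_per_wafer <= 0:
        raise ValueError("dies_per_wafer must be positive")
    if not 0 < die_yield <= 1:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


def relative_cost_cube_rule(area_ratio):
    """Rule of thumb from the slide: die cost scales roughly with area cubed."""
    return area_ratio ** 3


WAFER_COST = 1000.0   # hypothetical cost of one 8-inch wafer
ASSUMED_YIELD = 0.5   # hypothetical; real yield drops as die area grows

pentium = die_cost(WAFER_COST, 196, ASSUMED_YIELD)
pentium_pro = die_cost(WAFER_COST, 78, ASSUMED_YIELD)

print(f"Pentium die:     {pentium:.2f}")
print(f"Pentium Pro die: {pentium_pro:.2f}")
print(f"Doubling die area, cube rule: {relative_cost_cube_rule(2):.0f}x cost")
```

Even with equal yield, the larger die costs about 2.5 times as much simply because fewer dies fit on the wafer; a lower yield for the larger die widens the gap further.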

Real World Examples

System Cost 1995 Workstation

Cost versus Price

Summary: Cost

• Integrated circuits driving computer industry

• Die cost goes up with the cube of die area

• Economics ($$$) is the ultimate driver for performance

Page 38: Summarizing Performance


Page 39: Reporting Performance


Page 40: Summary: Performance

• Latency vs. throughput
  - CPU time: the time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)

• Performance doesn't depend on any single factor: you need to know instruction count, clocks per instruction (CPI), and clock rate to get valid estimates

• Performance evaluation needs to consider
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
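A minimal sketch of the relation behind the second bullet, assuming the standard formula CPU time = instruction count × CPI / clock rate. The function name and the two machine configurations are hypothetical, chosen only to show that a higher clock rate alone does not guarantee a shorter run time.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x clocks per instruction / clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machines running the same program (same instruction count).
machine_a = cpu_time_seconds(1_000_000_000, cpi=2.0, clock_rate_hz=500e6)
machine_b = cpu_time_seconds(1_000_000_000, cpi=1.2, clock_rate_hz=400e6)

# Machine A has the faster clock, yet machine B has the lower CPU time
# because its CPI is lower.
print(f"A: {machine_a:.2f} s, B: {machine_b:.2f} s")
```

Comparing the two results shows why all three factors have to be known before drawing a conclusion.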


Page 41: Outline

• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance

• Cost
  - Cost and price
  - Cost of chips


Page 42: Chip Cost: Manufacturing Process


Page 43: Cost of a Chip Includes

• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)

• Testing cost

• Packaging cost: depends on pins and heat dissipation
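The die-cost bullet can be illustrated with a rough sketch. The slide does not give a formula, so the model below is an assumption: dies per wafer is approximated as wafer area divided by die area (ignoring edge loss), and yield follows a commonly used simplified form, 1 / (1 + defects per area × die area / 2)². All function names and numbers are hypothetical.

```python
def dies_per_wafer(wafer_area_cm2, die_area_cm2):
    """Approximate die count, ignoring partial dies lost at the wafer edge."""
    return wafer_area_cm2 / die_area_cm2


def die_yield(defects_per_cm2, die_area_cm2):
    """Assumed simplified yield model: 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost, wafer_area_cm2, die_area_cm2, defects_per_cm2):
    """Wafer cost spread over the dies that actually work."""
    good_dies = dies_per_wafer(wafer_area_cm2, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2
    )
    return wafer_cost / good_dies


# Hypothetical inputs: doubling the die area more than doubles the die cost.
small = cost_per_die(1000.0, 300.0, die_area_cm2=1.0, defects_per_cm2=1.0)
large = cost_per_die(1000.0, 300.0, die_area_cm2=2.0, defects_per_cm2=1.0)
print(f"cost ratio for 2x die area: {large / small:.2f}")
```

Under this model, cost per die is proportional to die area × (1 + defects × area / 2)², so when the defect term dominates, the cost grows close to the cube of the die area. That is consistent with the "roughly cubic" rule of thumb above.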


Page 44: Real World Examples


Page 45: System Cost: 1995 Workstation


Page 46: Cost versus Price


Page 47: Summary: Cost

• Integrated circuits are driving the computer industry

• Die cost goes up with the cube of the die area

• Economics ($$$) is the ultimate driver for performance