Chapter 2: Performance and Cost
1998 Morgan Kaufmann Publishers

Which mobile phone is the better buy?

At about the same price, how do you compare them?
Panasonic GD88, Samsung V208, Sharp GX-i98

What is the performance of a mobile phone?
• What are the bases for comparison?
• Which values can be measured and rated?
• How are they measured?
• How can an objective evaluation report be presented?

Performance
• Purchasing perspective: given a collection of machines, which has the
  - best performance?
  - least cost?
  - best performance/cost?
• Design perspective: faced with design options, which has the
  - best performance improvement?
  - least cost?
  - best performance/cost?
• Both require
  - a basis for comparison
  - a metric for evaluation
• Goal: understand the cost and performance implications of architectural choices

Tasks of a Computer Architect

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Which airplane has the better performance?

Two Notions of Performance
Which has higher performance?
  - Time to deliver 1 passenger? To deliver 400 passengers?
  - Time to do the task: execution time, response time, latency
  - Tasks per unit time: throughput, bandwidth
Response time and throughput are often in opposition

Which Is Better?
• Time of Concorde vs. Boeing 747?
  - Concorde is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours
• Throughput of Concorde vs. Boeing 747?
  - Boeing is 286,700 pmph / 178,200 pmph = 1.6 times better
• Boeing is 1.6 times (60%) faster in terms of throughput
• Concorde is 2.2 times (120%) faster in terms of flying time (response time)
We will focus on execution time for a single job
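The two ratios on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the figures are the ones quoted on the slide (pmph is read here as passenger-miles per hour).

```python
# Figures quoted on the slide (mph = miles per hour,
# pmph = passenger-miles per hour).
concorde_mph, boeing_mph = 1350, 610
concorde_pmph, boeing_pmph = 178_200, 286_700

# Response time: a higher speed means a shorter flight.
speed_ratio = concorde_mph / boeing_mph
# Throughput: passenger-miles delivered per hour.
throughput_ratio = boeing_pmph / concorde_pmph

print(f"Concorde is {speed_ratio:.1f}x faster in flying time")
print(f"Boeing 747 is {throughput_ratio:.1f}x better in throughput")
```

Neither plane wins on both metrics, which is why the metric has to be fixed before machines are compared.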

Performance Definition
• Performance according to time => faster is better
• If interested in comparing two things, "X is n times faster than Y" means Performance(X) / Performance(Y) = n
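When performance is judged by time, it is usually taken as the reciprocal of execution time. The form below is the standard textbook definition, stated here as an assumption about what the slide intends:

```latex
\text{Performance}_X = \frac{1}{\text{Execution time}_X},
\qquad
n = \frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X}
```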

What is Time?
• Straightforward definition of time: total time to complete a task, including disk & memory accesses, I/O activities, OS overhead, ...
  - May include execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in OS) and user CPU time (in user program)
• CPU performance: user CPU time of a single program
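The difference between elapsed time and CPU time can be observed directly. The sketch below is only an illustration using Python's standard library; `busy_work` is a hypothetical stand-in for "your program", and the sleep stands in for time spent waiting rather than computing.

```python
import os
import time

def busy_work(n: int) -> int:
    """CPU-bound loop used only to consume some processor time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # elapsed (wall-clock) time
cpu_start = os.times()             # user/system CPU time of this process

busy_work(200_000)
time.sleep(0.05)                   # waiting adds elapsed time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_end = os.times()
user_cpu = cpu_end.user - cpu_start.user        # time spent in the user program
system_cpu = cpu_end.system - cpu_start.system  # time spent in the OS on its behalf

print(f"elapsed: {wall_elapsed:.3f} s, "
      f"user CPU: {user_cpu:.3f} s, system CPU: {system_cpu:.3f} s")
```

The elapsed time includes the sleep, while the user CPU time reflects only the loop, which is the quantity the slide calls CPU performance.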

Outline
• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

How can program execution time be expressed as a formula?
• Hint: basic components of a program
• Number of instructions
• Instruction execution time (average)

What is the instruction count of a program?
• How many C statements?
    for (i = 0; i < 100; i++) a[i] = b[i] * c[i];
• How many assembly-language instructions?

Instruction Execution Time
• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or, informally, clocks or cycles)
  2. Length of a clock period: the clock cycle time (e.g., 2 nanoseconds or 2 ns), or equivalently the clock rate (e.g., 500 megahertz or 500 MHz), which is the inverse of the clock period
Instruction execution time is measured in cycles
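The inverse relation between clock cycle time and clock rate can be sketched as follows. The function names are illustrative; the 2 ns and 500 MHz values are the slide's own example.

```python
import math

def clock_rate_hz(period_s: float) -> float:
    """Clock rate is the inverse of the clock cycle time."""
    return 1.0 / period_s

def clock_period_s(rate_hz: float) -> float:
    """Clock cycle time is the inverse of the clock rate."""
    return 1.0 / rate_hz

rate = clock_rate_hz(2e-9)       # 2 ns clock cycle time
period = clock_period_s(500e6)   # 500 MHz clock rate
print(f"{rate / 1e6:.0f} MHz, {period * 1e9:.0f} ns")
```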

Program Execution Time
• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Clock Cycles for program
    = Instructions for program ("Instruction Count")
      x Average Clock Cycles per Instruction ("CPI")
CPI: one way to compare two machines with the same instruction set, since Instruction Count is the same

Performance Calculation (1/2)
• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program
    = Instruction Count x CPI x Clock Cycle Time
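The formula translates directly into code. This is a minimal sketch: the function name and the sample program (1 billion instructions, CPI of 2.0) are hypothetical, and the 2 ns cycle time reuses the earlier 500 MHz example.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycles = instruction_count * cpi
    return clock_cycles * cycle_time_s

# Hypothetical program: 1 billion instructions, average CPI of 2.0,
# on a 500 MHz machine (2 ns clock cycle time).
t = cpu_time_seconds(1_000_000_000, 2.0, 2e-9)
print(f"CPU time: {t:.2f} s")
```

Halving any one of the three factors halves the CPU time, provided the other two stay fixed.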

How to Calculate the 3 Components?
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in the loop of a small program
  - Use a simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into the binary code, run, and record information
  - Hardware counter in a special register (Pentium II)
• CPI
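Because CPU time = Instruction Count x CPI x Clock Cycle Time, CPI can be recovered once the other quantities are known. The sketch below rearranges that formula; the function name and the measurement values are hypothetical.

```python
def measured_cpi(cpu_time_s: float, instruction_count: int, clock_rate_hz: float) -> float:
    """Rearranged CPU time formula: CPI = clock cycles / instruction count."""
    clock_cycles = cpu_time_s * clock_rate_hz
    return clock_cycles / instruction_count

# Hypothetical measurement: 3 s of user CPU time, 1 billion instructions
# counted, 500 MHz clock.
cpi = measured_cpi(3.0, 1_000_000_000, 500e6)
print(f"CPI = {cpi:.2f}")
```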

Calculating CPI Another Way
• First calculate the CPI for each individual instruction (add, sub, and, etc.)
• Next calculate the frequency of each individual instruction in the workload
• Finally multiply these two for each instruction and add them up to get the final CPI
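The three steps amount to a frequency-weighted sum. In the sketch below the instruction classes, frequencies, and cycle counts are hypothetical values chosen for illustration, not data from the slides.

```python
# Hypothetical instruction mix:
# class -> (frequency in workload, cycles per instruction).
mix = {
    "ALU":    (0.50, 1),
    "Load":   (0.20, 5),
    "Store":  (0.10, 3),
    "Branch": (0.20, 2),
}

def average_cpi(instruction_mix):
    """Sum of frequency x CPI over all instruction classes."""
    total_freq = sum(freq for freq, _ in instruction_mix.values())
    assert abs(total_freq - 1.0) < 1e-9, "frequencies must sum to 1"
    return sum(freq * cycles for freq, cycles in instruction_mix.values())

cpi = average_cpi(mix)
print(f"average CPI = {cpi:.2f}")
```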

Example (RISC processor)
  - What if Branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?
Must know the limit of architectural enhancement
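A what-if question of this kind can be explored by recomputing the average CPI. The sketch below uses a hypothetical instruction mix (not the slide's table) and asks what happens when branches take half as many cycles.

```python
def average_cpi(mix):
    """Frequency-weighted average of cycles per instruction."""
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical mix: class -> (frequency, cycles per instruction).
base = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}
# Enhancement: branches take half as many cycles.
fast_branch = dict(base, Branch=(0.20, 1))

# Instruction count and clock cycle time are unchanged,
# so speedup = CPI_old / CPI_new.
speedup = average_cpi(base) / average_cpi(fast_branch)
print(f"speedup = {speedup:.2f}")
```

Under these assumed numbers, doubling branch speed improves the whole program by only about 10%, because branches account for a limited share of the total cycles.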

Summary: CPU Time Formula
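Written out with units, the product Instruction Count x CPI x Clock Cycle Time takes the form below. This is the usual way of presenting the formula, given here as a sketch consistent with the earlier slides:

```latex
\text{CPU time}
  = \frac{\text{Seconds}}{\text{Program}}
  = \frac{\text{Instructions}}{\text{Program}}
    \times \frac{\text{Clock cycles}}{\text{Instruction}}
    \times \frac{\text{Seconds}}{\text{Clock cycle}}
```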

Another formula related to performance

Amdahl's Law
• Speedup due to enhancement E
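Amdahl's Law is commonly stated as speedup = 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that benefits from enhancement E and S is the speedup of that fraction. The sketch below assumes this standard form; the function and parameter names are illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is sped up.

    speedup = 1 / ((1 - F) + F / S)
    F: fraction of the original execution time that benefits from E
    S: speedup of that fraction
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Hypothetical case: half of the run time is made 2x faster.
overall = amdahl_speedup(0.5, 2.0)
print(f"overall speedup = {overall:.3f}")
```

Even an unbounded speedup of the enhanced half could not push the overall speedup past 2 in this case, since the untouched half still takes its original time.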

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
      Benchmark programs
      Summarizing performance
      Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?
• What's wrong with this program as a workload?
    integer A[ ][ ], B[ ][ ], C[ ][ ];
    for (I=0; I<100; I++)
      for (J=0; J<100; J++)
        for (K=0; K<100; K++)
          C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ] * B[ K ][ J ];
• What is measured? Not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload"; for example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software
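To make the kernel question concrete, the loop nest can be timed on its own. The sketch below is a Python rendering of the triple loop with hypothetical input matrices and a smaller size; it reports only the CPU time of the kernel.

```python
import time

def matmul_kernel(n: int):
    """Triple-loop matrix multiply, C = C + A*B, on n x n integer matrices."""
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] = c[i][j] + a[i][k] * b[k][j]
    return a, c

start = time.process_time()   # CPU time of this process (user + system)
a, c = matmul_kernel(30)
cpu_seconds = time.process_time() - start
print(f"kernel CPU time: {cpu_seconds:.4f} s")
```

A run like this exercises integer arithmetic and array indexing only; it says nothing about I/O, operating system activity, or the mix of operations in a real application, which is one way to read the slide's question.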

Choosing Benchmark Programs

Benchmarks

Example: Standardized Workload Benchmarks
• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years

SPECint95base Performance (Oct. 1997)

SPECfp95base Performance (Oct. 1997)

SPEC2000 (CINT)
Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Opt.
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example: PC Workload Benchmark
• PCs: Ziff Davis Winstone 99 Benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results
Note: 2 Compaq machines using K6-2 vs. K6-3: the K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance
• Latency vs. throughput
• CPU time: time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Figs. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
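The three factors named for die cost are commonly combined as die cost = wafer cost / (dies per wafer x die yield). The sketch below assumes that relation; the wafer cost and the yields are hypothetical, while the dies-per-wafer counts are the ones quoted on the slide.

```python
def cost_per_good_die(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Die cost = wafer cost / (dies per wafer x die yield)."""
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)

WAFER_COST = 1000.0   # hypothetical cost of one 8-inch wafer, in dollars
# Dies per wafer are the slide's figures; the yields are hypothetical.
pentium = cost_per_good_die(WAFER_COST, 196, die_yield=0.8)
pentium_pro = cost_per_good_die(WAFER_COST, 78, die_yield=0.5)
print(f"Pentium: ${pentium:.2f}, Pentium Pro: ${pentium_pro:.2f}")
```

A larger die loses twice: fewer dies fit on the wafer, and a larger area is more likely to contain a defect, which is why die cost rises much faster than die area.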

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost
• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of die area
• Economics ($$$) is the ultimate driver for performance
21998 Morgan Kaufmann Publishers
買那一支手機比較好
31998 Morgan Kaufmann Publishers
差不多的價錢你怎麼比
Panasonic GD88 Samsung V208 Sharp GX-i98
41998 Morgan Kaufmann Publishers
何謂手機的效能
bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull
51998 Morgan Kaufmann Publishers
Performance
bull Purchasing perspective
Given a collection of machines which has the best performance least cost best performancecost
bull Design perspective
Faced with design options which has the
best performance improvement least cost
best performancecost
bull Both require
basis for comparison
metric for evaluation
bull Goal understand cost and
performance implications
of architectural choices
61998 Morgan Kaufmann Publishers
Tasks of a Computer Architect
bull s
71998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
81998 Morgan Kaufmann Publishers
那一架飛機的效能比較好
bull s
91998 Morgan Kaufmann Publishers
Two Notions of Performance
bull a
Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition
101998 Morgan Kaufmann Publishers
Which Is Better
bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res
ponse time)
We will focus on execution time for a single job
111998 Morgan Kaufmann Publishers
Performance Definition
bull Performance according to time
=gt faster is better
bull If interested in comparing two things
ldquoX is n times faster than Yrdquo means
121998 Morgan Kaufmann Publishers
What is Time
bull Straightforward definition of time Total time to complete a task including disk amp memory
accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o
n your program (since multiple processes running at same time)
ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim
e (in user program)
bull CPU performance user CPU time of a single CPU performance user CPU time of a single program
131998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula (Sec 23)
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
141998 Morgan Kaufmann Publishers
如何以公式表達程式執行時間
bull Hint basic components of a program
bull 指令數
bull 指令執行時間(平均)
151998 Morgan Kaufmann Publishers
何謂程式的指令數
bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]
bull 有幾條組合語言指令
161998 Morgan Kaufmann Publishers
Instruction Execution Time
bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a
constant rate and determines when events take place in the hardware
1 These discrete time intervals called clock cycles (or informally clocks or cycles)
2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period
指令執行時間以 cycle為單位
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
31998 Morgan Kaufmann Publishers
差不多的價錢你怎麼比
Panasonic GD88 Samsung V208 Sharp GX-i98
41998 Morgan Kaufmann Publishers
何謂手機的效能
bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull
51998 Morgan Kaufmann Publishers
Performance
bull Purchasing perspective
Given a collection of machines which has the best performance least cost best performancecost
bull Design perspective
Faced with design options which has the
best performance improvement least cost
best performancecost
bull Both require
basis for comparison
metric for evaluation
bull Goal understand cost and
performance implications
of architectural choices
61998 Morgan Kaufmann Publishers
Tasks of a Computer Architect
bull s
71998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
81998 Morgan Kaufmann Publishers
那一架飛機的效能比較好
bull s
91998 Morgan Kaufmann Publishers
Two Notions of Performance
bull a
Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition
101998 Morgan Kaufmann Publishers
Which Is Better
bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res
ponse time)
We will focus on execution time for a single job
111998 Morgan Kaufmann Publishers
Performance Definition
bull Performance according to time
=gt faster is better
bull If interested in comparing two things
ldquoX is n times faster than Yrdquo means
121998 Morgan Kaufmann Publishers
What is Time
bull Straightforward definition of time Total time to complete a task including disk amp memory
accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o
n your program (since multiple processes running at same time)
ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim
e (in user program)
bull CPU performance user CPU time of a single CPU performance user CPU time of a single program
131998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula (Sec 23)
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
141998 Morgan Kaufmann Publishers
如何以公式表達程式執行時間
bull Hint basic components of a program
bull 指令數
bull 指令執行時間(平均)
151998 Morgan Kaufmann Publishers
何謂程式的指令數
bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]
bull 有幾條組合語言指令
161998 Morgan Kaufmann Publishers
Instruction Execution Time
bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a
constant rate and determines when events take place in the hardware
1 These discrete time intervals called clock cycles (or informally clocks or cycles)
2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period
指令執行時間以 cycle為單位
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
41998 Morgan Kaufmann Publishers
何謂手機的效能
bull 比較的基準有那些bull 有那些值可以量測評比的bull 如何量bull 如何提出客觀的評比報告bull
51998 Morgan Kaufmann Publishers
Performance
bull Purchasing perspective
Given a collection of machines which has the best performance least cost best performancecost
bull Design perspective
Faced with design options which has the
best performance improvement least cost
best performancecost
bull Both require
basis for comparison
metric for evaluation
bull Goal understand cost and
performance implications
of architectural choices
61998 Morgan Kaufmann Publishers
Tasks of a Computer Architect
bull s
71998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
81998 Morgan Kaufmann Publishers
那一架飛機的效能比較好
bull s
91998 Morgan Kaufmann Publishers
Two Notions of Performance
bull a
Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition
101998 Morgan Kaufmann Publishers
Which Is Better
bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res
ponse time)
We will focus on execution time for a single job
111998 Morgan Kaufmann Publishers
Performance Definition
bull Performance according to time
=gt faster is better
bull If interested in comparing two things
ldquoX is n times faster than Yrdquo means
121998 Morgan Kaufmann Publishers
What is Time
bull Straightforward definition of time Total time to complete a task including disk amp memory
accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o
n your program (since multiple processes running at same time)
ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim
e (in user program)
bull CPU performance user CPU time of a single CPU performance user CPU time of a single program

Outline
• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

How Can Program Execution Time Be Expressed as a Formula?
• Hint: basic components of a program
• Instruction count
• Instruction execution time (average)

What Is the Instruction Count of a Program?
• How many C statements? for (i = 0; i < 100; i++) a[i] = b[i] * c[i];
• How many assembly-language instructions?
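
The loop is a single C statement, but the processor executes many more instructions than that. As a rough sketch, assume (hypothetically, since the real number depends on the compiler and instruction set) that each iteration compiles to 6 machine instructions (two loads, one multiply, one store, one index increment, one compare-and-branch) plus 2 setup instructions before the loop:

```python
def dynamic_instruction_count(iterations, per_iteration, setup=0):
    """Instructions actually executed: setup once, loop body every iteration."""
    return setup + iterations * per_iteration

# Hypothetical figures: 2 setup instructions, 6 instructions per iteration.
count = dynamic_instruction_count(iterations=100, per_iteration=6, setup=2)
print(count)  # 602
```

The instruction count that enters the CPU time formula is this dynamic count, not the number of statements in the source code.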

Instruction Execution Time
• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or informally clocks or cycles)
  2. The length of a clock period is the clock cycle time (e.g., 2 nanoseconds or 2 ns); the clock rate (e.g., 500 megahertz or 500 MHz) is the inverse of the clock period
Instruction execution time is measured in cycles.
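
Because clock rate and clock cycle time are inverses, the slide's two example values describe the same clock. A small Python sketch (the function names are illustrative):

```python
def clock_rate_hz(cycle_time_seconds):
    """Clock rate is the inverse of the clock cycle time."""
    return 1.0 / cycle_time_seconds

def cycle_time_seconds(rate_hz):
    """Clock cycle time is the inverse of the clock rate."""
    return 1.0 / rate_hz

rate = clock_rate_hz(2e-9)           # 2 ns cycle time
period = cycle_time_seconds(500e6)   # 500 MHz clock rate
print(f"{rate / 1e6:.0f} MHz, {period * 1e9:.0f} ns")  # 500 MHz, 2 ns
```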

Program Execution Time
• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Clock Cycles for program
    = Instructions for program ("Instruction Count")
      x Average Clock Cycles per Instruction ("CPI")
CPI: one way to compare two machines with the same instruction set, since the Instruction Count is the same.

Performance Calculation (1/2)
• CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program
    = Instruction Count x CPI x Clock Cycle Time
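
The substituted formula translates directly into code. A minimal Python sketch; the program parameters below are hypothetical numbers chosen for easy arithmetic, not measurements from the source:

```python
def cpu_time_seconds(instruction_count, cpi, cycle_time_seconds):
    """CPU time = Instruction Count x CPI x Clock Cycle Time."""
    clock_cycles = instruction_count * cpi
    return clock_cycles * cycle_time_seconds

# Hypothetical program: 1e9 instructions, CPI of 2.0, 2 ns cycle time (500 MHz).
t = cpu_time_seconds(1_000_000_000, 2.0, 2e-9)
print(f"{t:.1f} s")  # 4.0 s
```

Halving any one of the three factors halves the CPU time, provided the other two stay unchanged.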

How to Calculate the 3 Components?
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in the loop of a small program
  - Use a simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into the binary code, run, and record information
  - Hardware counter in a special register (Pentium II)
• CPI: rearrange the CPU time formula,
    CPI = CPU execution time / (Instruction Count x Clock Cycle Time)

Calculating CPI Another Way
• First calculate the CPI for each individual instruction type (add, sub, etc.)
• Next calculate the frequency of each individual instruction type in the workload
• Finally multiply these two for each instruction type and add them up to get the final CPI

Example (RISC processor)
  - What if branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?
Must know the limit of architectural enhancement.
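
The weighted-sum CPI calculation and the two what-if questions above can be sketched in Python. The slide's instruction-mix table did not survive extraction, so the mix below is a hypothetical one chosen for illustration, and modeling "two ALU instructions at once" as halving the ALU cycle count is a simplifying assumption:

```python
def average_cpi(mix):
    """mix maps instruction class -> (frequency, cycles); frequencies sum to 1."""
    return sum(freq * cycles for freq, cycles in mix.values())

# Hypothetical instruction mix (not the slide's original table).
base_mix = {
    "ALU":    (0.5, 1.0),
    "Load":   (0.2, 5.0),
    "Store":  (0.1, 3.0),
    "Branch": (0.2, 2.0),
}
base_cpi = average_cpi(base_mix)

# What-if scenarios: only the cycle count of one class changes.
faster_branch = dict(base_mix, Branch=(0.2, 1.0))  # branches twice as fast
dual_alu = dict(base_mix, ALU=(0.5, 0.5))          # two ALU ops per cycle

for name, mix in [("branch twice as fast", faster_branch),
                  ("two ALU ops at once", dual_alu)]:
    cpi = average_cpi(mix)
    # Instruction count and cycle time are unchanged, so speedup = CPI ratio.
    print(f"{name}: CPI {cpi:.2f}, speedup {base_cpi / cpi:.2f}x")
```

With this mix neither enhancement gains much, because loads dominate the cycle count; this is the "limit of architectural enhancement" the slide refers to.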

Summary: CPU Time Formula
• CPU time = Instruction Count x CPI x Clock Cycle Time

Another Formula Related to Performance

Amdahl's Law
• Speedup due to enhancement E:
    Speedup(E) = Execution time without E / Execution time with E
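
In its usual form, Amdahl's Law says that if a fraction F of the execution time benefits from the enhancement and that part runs S times faster, then Speedup(E) = 1 / ((1 - F) + F / S). A minimal Python sketch; the parameter names are illustrative and the sample values are hypothetical:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the time runs `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Hypothetical case: half of the execution time is sped up by a factor of 2.
print(f"{amdahl_speedup(0.5, 2.0):.2f}x")
# Even an enormous speedup of that half cannot reach 2x overall.
print(f"{amdahl_speedup(0.5, 1e9):.2f}x")
```

The unenhanced fraction (1 - F) bounds the achievable speedup at 1 / (1 - F), no matter how large S becomes.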

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?
• What's wrong with this program as a workload?
    int A[100][100], B[100][100], C[100][100];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];
• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload". For example:
    - An engineer uses a compiler and a spreadsheet
    - An author uses a word processor, a drawing program, and compression software

Choosing Benchmark Programs

Benchmarks

Example Standardized Workload Benchmarks
• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so try to change them every 3 years
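
The "average relative to a base machine" can be sketched as follows. SPEC summarizes a suite with the geometric mean of per-benchmark ratios (reference time divided by measured time); the times below are hypothetical values for two imaginary benchmarks, not SPEC95 results:

```python
import math

def spec_ratio(reference_seconds, measured_seconds):
    """Ratio > 1 means the measured machine is faster than the base machine."""
    return reference_seconds / measured_seconds

def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference time, measured time) pairs in seconds.
times = [(100.0, 50.0), (200.0, 25.0)]
ratios = [spec_ratio(ref, run) for ref, run in times]   # 2.0 and 8.0
rating = geometric_mean(ratios)
print(f"{rating:.2f}")  # 4.00
```

A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is chosen as the base.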

SPECint95base Performance (Oct 1997)

SPECfp95base Performance (Oct 1997)

SPEC2000 (CINT)
Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Optimization
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark
• PCs: Ziff-Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results
Note: two Compaq machines using K6-2 vs. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance
• Latency vs. throughput
• CPU time: time spent executing a single program; for a given compiled program it depends on the design of the processor (datapath, pipelining effectiveness, caches, etc.), not on I/O or on other running programs
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies, but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
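
The three factors behind die cost combine in the standard relation: die cost = wafer cost / (dies per wafer x die yield). A minimal Python sketch using the dies-per-wafer counts from the slide; the wafer cost and the yields are hypothetical placeholders, not figures from the source:

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)

WAFER_COST = 1000.0  # hypothetical cost of one 8-inch wafer, in dollars

# Dies per wafer come from the slide; the yields are assumed for illustration.
pentium = die_cost(WAFER_COST, dies_per_wafer=196, die_yield=0.6)
pentium_pro = die_cost(WAFER_COST, dies_per_wafer=78, die_yield=0.4)

print(f"Pentium die: ${pentium:.2f}, Pentium Pro die: ${pentium_pro:.2f}")
```

A larger die is penalized twice: fewer dies fit on the wafer, and a smaller share of them work, which is why die cost grows much faster than die area.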

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost
• Integrated circuits are driving the computer industry
• Die cost goes up roughly with the cube of the die area
• Economics ($$$) is the ultimate driver for performance
71998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
81998 Morgan Kaufmann Publishers
那一架飛機的效能比較好
bull s
91998 Morgan Kaufmann Publishers
Two Notions of Performance
bull a
Which has higher performance1048698 Time to delivery 1 passenger deliver 400 passengers1048698 Time to do the task execution time response time latency1048698 Tasks per unit time throughput bandwidthResponse time and throughput often are in opposition
101998 Morgan Kaufmann Publishers
Which Is Better
bull Time of Concorde vs Boeing 747 1048698 Concord is 1350 mph 610 mph = 22 times faster = 65 hours 3 hoursbull Throughput of Concorde vs Boeing 747 1048698 Boeing is 286700 pmph 178200 pmph = 16 times betterbull Boeing is 16 times (60) faster in terms of throughputbull Concord is 22 times (120) faster in terms of flying time (res
ponse time)
We will focus on execution time for a single job
111998 Morgan Kaufmann Publishers
Performance Definition
bull Performance according to time
=gt faster is better
bull If interested in comparing two things
ldquoX is n times faster than Yrdquo means
121998 Morgan Kaufmann Publishers
What is Time
bull Straightforward definition of time Total time to complete a task including disk amp memory
accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o
n your program (since multiple processes running at same time)
ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim
e (in user program)
bull CPU performance user CPU time of a single CPU performance user CPU time of a single program
131998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula (Sec 23)
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
141998 Morgan Kaufmann Publishers
如何以公式表達程式執行時間
bull Hint basic components of a program
bull 指令數
bull 指令執行時間(平均)
151998 Morgan Kaufmann Publishers
何謂程式的指令數
bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]
bull 有幾條組合語言指令
161998 Morgan Kaufmann Publishers
Instruction Execution Time
bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a
constant rate and determines when events take place in the hardware
1 These discrete time intervals called clock cycles (or informally clocks or cycles)
2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period
指令執行時間以 cycle為單位
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
SPEC2000 (CINT)
Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Optimization
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator
SPEC2000 (CFP)
Example PC Workload Benchmark
• PCs: Ziff Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code
Winstone 99 (W99) Results
Note: 2 Compaq machines using K6-2 vs. K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster
Summarizing Performance
Early Lessons from SPEC
Summarizing Performance
Reporting Performance
Summary: Performance
• Latency vs. throughput
• CPU time: time spent executing a single program; depends solely on design of processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips
Chip Cost: Manufacturing Process
Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8-inch wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Fig. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
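The dependence of die cost on wafer cost, dies per wafer, and die yield can be written as: cost per die = wafer cost / (dies per wafer x die yield). A minimal C sketch of that relation follows; the function name is illustrative, and any wafer cost or yield plugged into it is a hypothetical input, since the slide gives neither.

```c
#include <assert.h>

/* Cost of one good die: the wafer cost is spread over the dies that
 * pass test. die_yield is the fraction of good dies, in (0, 1]. */
double cost_per_die(double wafer_cost, int dies_per_wafer, double die_yield)
{
    assert(wafer_cost >= 0.0);
    assert(dies_per_wafer > 0);
    assert(die_yield > 0.0 && die_yield <= 1.0);
    return wafer_cost / (dies_per_wafer * die_yield);
}
```

Using the slide's die counts with the same hypothetical wafer cost and yield for both chips, cost_per_die(w, 78, y) / cost_per_die(w, 196, y) is 196 / 78, about 2.5. In practice a larger die also has a lower yield, which is why die cost grows faster than die area.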
Real World Examples
System Cost: 1995 Workstation
Cost versus Price
Summary: Cost
• Integrated circuits driving computer industry
• Die cost goes up with the cube of die area
• Economics ($$$) is the ultimate driver for performance
What Is Time?
• Straightforward definition of time: total time to complete a task, including disk & memory accesses, I/O activities, OS overhead, ...
  - May include execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in OS) and user CPU time (in user program)
• CPU performance: user CPU time of a single program
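The split between user CPU time and system CPU time can be observed directly on a POSIX system. The sketch below assumes the POSIX getrusage() call is available; the struct and function names are illustrative and not part of the slides.

```c
#include <assert.h>
#include <sys/resource.h>  /* getrusage, struct rusage (POSIX) */
#include <sys/time.h>      /* struct timeval */

/* CPU time consumed so far by the calling process, in seconds. */
typedef struct {
    double user_s;    /* time spent executing the user program */
    double system_s;  /* time spent in the OS on behalf of the program */
} cpu_times;

static double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

cpu_times cpu_times_now(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;  /* keep the variable used when assertions are disabled */
    cpu_times t = { timeval_to_seconds(ru.ru_utime),
                    timeval_to_seconds(ru.ru_stime) };
    return t;
}
```

Taking the difference of user_s before and after a region of code gives the user CPU time of that region, which excludes time spent waiting for I/O or running other processes.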
Outline
• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips
How Can Program Execution Time Be Expressed as a Formula?
• Hint: basic components of a program
• Instruction count
• Instruction execution time (average)
What Is a Program's Instruction Count?
• How many C statements? for (i = 0; i < 100; i++) a[i] = b[i] * c[i];
• How many assembly-language instructions?
Instruction Execution Time
• Time unit from a user's perspective: time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or informally clocks or cycles)
  2. Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns); clock rate (e.g., 500 megahertz or 500 MHz) is the inverse of the clock period
• Instruction execution time is measured in cycles
Program Execution Time
• CPU execution time for program
  = Clock Cycles for program x Clock Cycle Time
• Clock Cycles for program
  = Instructions for program ("Instruction Count")
    x Average Clock Cycles per Instruction ("CPI")
• CPI is one way to compare two machines with the same instruction set, since Instruction Count is the same
Performance Calculation (1/2)
• CPU execution time for program
  = Clock Cycles for program x Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for program
  = Instruction Count x CPI x Clock Cycle Time
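The formula above translates directly into code. The sketch below is a minimal illustration; the function names are chosen here, and the instruction count and CPI in the usage note are hypothetical values, not measurements from the slides.

```c
#include <assert.h>

/* Clock cycle time in seconds for a clock rate in hertz
 * (each is the inverse of the other). */
double clock_cycle_time_s(double clock_rate_hz)
{
    assert(clock_rate_hz > 0.0);
    return 1.0 / clock_rate_hz;
}

/* CPU execution time = Instruction Count x CPI x Clock Cycle Time. */
double cpu_time_s(double instruction_count, double cpi, double cycle_time_s)
{
    assert(instruction_count >= 0.0);
    assert(cpi > 0.0);
    assert(cycle_time_s > 0.0);
    return instruction_count * cpi * cycle_time_s;
}
```

For a 500 MHz clock, clock_cycle_time_s(500e6) is 2 ns. A hypothetical program of 1e9 instructions with a CPI of 2.0 then takes cpu_time_s(1e9, 2.0, 2e-9), about 4 seconds. Halving any one of the three factors halves the CPU time, so none of them alone determines performance.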
How to Calculate the 3 Components?
• Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in loop of small program
  - Use simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into binary code, run, and record information
  - Hardware counter in special register (Pentium II)
• CPI
Calculating CPI Another Way
• First calculate CPI for each individual instruction (add, sub, etc.)
• Next calculate frequency of each individual instruction in the workload
• Finally multiply these two for each instruction and add them up to get final CPI
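The three steps above amount to a frequency-weighted sum. The following C sketch shows the calculation; the struct layout and function name are illustrative, and the instruction mix in the usage note is a made-up example rather than data from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction class in a workload. */
typedef struct {
    const char *name;   /* e.g. "ALU", "load" */
    double frequency;   /* fraction of executed instructions, in [0, 1] */
    double cycles;      /* clock cycles per instruction of this class */
} instr_class;

/* Average CPI = sum over classes of (frequency x cycles).
 * The frequencies are expected to add up to 1. */
double average_cpi(const instr_class *mix, size_t n)
{
    double cpi = 0.0;
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(mix[i].frequency >= 0.0 && mix[i].cycles > 0.0);
        cpi += mix[i].frequency * mix[i].cycles;
        total += mix[i].frequency;
    }
    assert(total > 0.999 && total < 1.001);
    return cpi;
}
```

With a hypothetical mix of 50% ALU at 1 cycle, 20% loads at 2 cycles, 10% stores at 2 cycles, and 20% branches at 3 cycles, the average CPI is 0.5 + 0.4 + 0.2 + 0.6 = 1.7. Changing one entry and recomputing shows how much a single enhancement moves the overall CPI.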
111998 Morgan Kaufmann Publishers
Performance Definition
bull Performance according to time
=gt faster is better
bull If interested in comparing two things
ldquoX is n times faster than Yrdquo means
121998 Morgan Kaufmann Publishers
What is Time
bull Straightforward definition of time Total time to complete a task including disk amp memory
accesses IO activities OS overhead hellip May include execution time of other programs in a multiprogramming environment Too many factors involvedbull Alternative the time that the processor (CPU) is working only o
n your program (since multiple processes running at same time)
ldquoCPU execution timerdquo or ldquoCPU time rdquo Often divided into system CPU time (in OS) and user CPU tim
e (in user program)
bull CPU performance user CPU time of a single CPU performance user CPU time of a single program
131998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula (Sec 23)
Measuring and evaluating performancebull Cost
1048698 Cost and price
1048698 Cost of chips
141998 Morgan Kaufmann Publishers
如何以公式表達程式執行時間
bull Hint basic components of a program
bull 指令數
bull 指令執行時間(平均)
151998 Morgan Kaufmann Publishers
何謂程式的指令數
bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]
bull 有幾條組合語言指令
161998 Morgan Kaufmann Publishers
Instruction Execution Time
bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a
constant rate and determines when events take place in the hardware
1 These discrete time intervals called clock cycles (or informally clocks or cycles)
2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period
指令執行時間以 cycle為單位
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
  - What if branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?
Must know the limit of an architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
231998 Morgan Kaufmann Publishers
Another Formula Related to Performance
241998 Morgan Kaufmann Publishers
Amdahl's Law
• Speedup due to enhancement E
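Amdahl's Law is usually stated as Speedup = 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that benefits from enhancement E and S is the speedup of that fraction. The symbols F and S and the function name `amdahl_speedup` are assumptions made for this sketch; the example numbers are hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved.

    fraction_enhanced: share of the original time that is affected (0..1)
    speedup_enhanced:  how much faster that share becomes (> 0)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is spent in code made 10x faster.
print(f"speedup: {amdahl_speedup(0.4, 10):.4f}")

# Even an enormous improvement to that 40% is bounded by 1 / (1 - 0.4).
print(f"near the limit: {amdahl_speedup(0.4, 1e9):.4f}")
```

The second call shows the limit of an enhancement: as S grows, the overall speedup approaches 1 / (1 - F) and never exceeds it.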
251998 Morgan Kaufmann Publishers
Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison?
• What's wrong with this program as a workload?
    int A[100][100], B[100][100], C[100][100];
    for (int I = 0; I < 100; I++)
      for (int J = 0; J < 100; J++)
        for (int K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];
• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical inputs before purchasing, or even before building, the machine
  - This is called a "workload." For example:
  - An engineer uses a compiler and a spreadsheet
  - An author uses a word processor, a drawing program, and compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example: Standardized Workload Benchmarks
• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate averages for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks are distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
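A score "relative to a base machine" can be sketched as follows. The slide says only that a separate average is reported for each suite; this sketch assumes the geometric mean of per-benchmark ratios, which is how SPEC summarizes its CPU suites. The function names and all timing values are hypothetical.

```python
import math


def spec_ratios(reference_times, measured_times):
    """Per-benchmark ratio: base-machine time divided by measured time."""
    return [ref / t for ref, t in zip(reference_times, measured_times)]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for three benchmarks.
reference = [100.0, 200.0, 400.0]  # base machine
measured = [50.0, 50.0, 50.0]      # machine under test

ratios = spec_ratios(reference, measured)
print("ratios:", ratios)
print(f"summary score: {geometric_mean(ratios):.2f}")
```

Here the ratios are 2, 4, and 8, and their geometric mean is 4. A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is chosen as the base.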
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Optimization
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example: PC Workload Benchmark
• PCs: Ziff-Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
Note: two Compaq machines, one using the K6-2 and one the K6-3. The K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster.
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary: Performance
• Latency vs. throughput
• CPU time: the time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: you need to know the instruction count, clocks per instruction, and clock rate to get valid estimates
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Fig 116 and 117)
• Testing cost
• Packaging cost: depends on pins and heat dissipation
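The three factors named for die cost combine in the usual textbook relation: cost per good die = wafer cost / (dies per wafer × die yield). The sketch below uses the dies-per-wafer figures quoted above; the wafer cost and the yield are hypothetical placeholders, and `die_cost` is an illustrative name.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    good_dies = dies_per_wafer * die_yield
    return wafer_cost / good_dies


WAFER_COST = 1000.0  # hypothetical cost of one 8-inch wafer, in dollars
YIELD = 0.5          # hypothetical yield, assumed equal for both chips

for name, dies in [("Pentium", 196), ("Pentium Pro", 78)]:
    print(f"{name}: ${die_cost(WAFER_COST, dies, YIELD):.2f} per good die")
```

Holding the yield equal for both chips understates the gap. In practice a larger die also tends to have a lower yield, which is one reason die cost grows much faster than die area.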
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary: Cost
• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of the die area
• Economics ($$$) is the ultimate driver for performance
121998 Morgan Kaufmann Publishers
What Is Time?
• Straightforward definition of time:
  - Total time to complete a task, including disk and memory accesses, I/O activities, OS overhead, …
  - May include the execution time of other programs in a multiprogramming environment
  - Too many factors involved
• Alternative: the time that the processor (CPU) is working only on your program (since multiple processes run at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in the OS) and user CPU time (in the user program)
• CPU performance: user CPU time of a single program
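The difference between elapsed time and CPU time can be observed directly. This minimal Python sketch uses the standard library's `os.times()` (user and system CPU time of the current process) and `time.perf_counter()` (elapsed time); the helper names `measure` and `busy_loop` are illustrative.

```python
import os
import time


def measure(func):
    """Run func once; return (elapsed, user_cpu, system_cpu) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,              # total elapsed (response) time
        cpu_end.user - cpu_start.user,      # CPU time spent in user code
        cpu_end.system - cpu_start.system,  # CPU time spent in the OS
    )


def busy_loop():
    """A small compute-bound task used only as something to time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


elapsed, user_cpu, system_cpu = measure(busy_loop)
print(f"elapsed {elapsed:.4f} s, user {user_cpu:.4f} s, system {system_cpu:.4f} s")
```

On a busy machine the elapsed time can be noticeably larger than user plus system CPU time, because it also includes time spent waiting or running other programs. The values vary from run to run, and the resolution of `os.times()` depends on the platform.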
131998 Morgan Kaufmann Publishers
Outline
• Performance
  - Definition
  - CPU performance formula (Sec. 2.3)
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips
141998 Morgan Kaufmann Publishers
How Can Program Execution Time Be Expressed as a Formula?
• Hint: the basic components of a program
• Instruction count
• Instruction execution time (average)
151998 Morgan Kaufmann Publishers
何謂程式的指令數
bull 有幾條 C指令 for(i=0 ilt100 i++) a[i] = b[i] c[i]
bull 有幾條組合語言指令
161998 Morgan Kaufmann Publishers
Instruction Execution Time
bull Time unit from a userrsquos perspective time = secondsbull CPU Time computers are constructed using a clock that runs at a
constant rate and determines when events take place in the hardware
1 These discrete time intervals called clock cycles (or informally clocks or cycles)
2Length of clock period clock cycle time (eg 2 nanoseconds or 2 ns) and clock rate (eg 500 megahertz or 500 MHz)which is the inverse of the clock period
指令執行時間以 cycle為單位
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
Summarizing Performance
Early Lessons from SPEC
Summarizing Performance
Reporting Performance
Summary: Performance
• Latency vs. throughput
• CPU time: time spent executing a single program; depends solely on the design of the processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: you need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimates
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips
Chip Cost: Manufacturing Process
Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8-inch wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Fig 116 and 117)
• Testing cost
• Packaging cost: depends on pins and heat dissipation
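The dependence of die cost on wafer cost, dies per wafer, and die yield can be sketched with the commonly used relation: die cost = wafer cost / (dies per wafer × die yield). The Python sketch below is illustrative only. The die counts (196 and 78) come from the slide; the wafer cost and yield values are hypothetical placeholders, not figures from the source.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: the wafer cost spread over the dies that work."""
    if dies_per_wafer <= 0:
        raise ValueError("dies_per_wafer must be positive")
    if not 0 < die_yield <= 1:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


# Hypothetical wafer cost and yields, chosen only to show the trend.
WAFER_COST = 1500.0          # assumed dollars per 8-inch wafer
pentium = die_cost(WAFER_COST, 196, 0.60)      # smaller die, assumed 60% yield
pentium_pro = die_cost(WAFER_COST, 78, 0.40)   # larger die, assumed 40% yield

print(f"Pentium die cost:     ${pentium:.2f}")
print(f"Pentium Pro die cost: ${pentium_pro:.2f}")
```

A larger die is penalized twice: fewer dies fit on the wafer, and a smaller fraction of them work, which is why die cost grows much faster than die area.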
Real World Examples
System Cost: 1995 Workstation
Cost versus Price
Summary: Cost
• Integrated circuits are driving the computer industry
• Die cost goes up with the cube of die area
• Economics ($$$) is the ultimate driver for performance
Instruction Execution Time
• Time unit: from a user's perspective, time = seconds
• CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  1. These discrete time intervals are called clock cycles (or, informally, clocks or cycles)
  2. The length of a clock period is the clock cycle time (e.g., 2 nanoseconds, or 2 ns); the clock rate (e.g., 500 megahertz, or 500 MHz) is the inverse of the clock period
• Instruction execution time is measured in clock cycles
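The inverse relationship between clock period and clock rate can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the 2 ns / 500 MHz pair is the example given on the slide.

```python
def rate_from_period(period_s):
    """Clock rate in hertz for a clock period given in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_from_rate(rate_hz):
    """Clock period in seconds for a clock rate given in hertz."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_hz


period = 2e-9                      # 2 ns clock cycle time
rate = rate_from_period(period)    # corresponding clock rate in Hz
print(f"{period * 1e9:.0f} ns period corresponds to {rate / 1e6:.0f} MHz")
```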
Program Execution Time
• CPU execution time for a program
  = Clock Cycles for the program × Clock Cycle Time
• Clock Cycles for a program
  = Instructions for the program ("Instruction Count")
  × Average Clock Cycles per Instruction ("CPI")
• CPI is one way to compare two machines with the same instruction set, since the Instruction Count is the same
Performance Calculation (1/2)
• CPU execution time for a program
  = Clock Cycles for the program × Clock Cycle Time
• Substituting for clock cycles:
• CPU execution time for a program
  = Instruction Count × CPI × Clock Cycle Time
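The CPU time formula can be exercised directly. The sketch below compares two hypothetical machines that share an instruction set (so the instruction count is the same) and differ only in CPI; the instruction count, CPI values, and cycle time are assumptions made up for illustration.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU execution time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program and machines, not taken from the slides.
INSTRUCTIONS = 1_000_000_000   # same program, same instruction set
CYCLE_TIME = 2e-9              # 2 ns, i.e. a 500 MHz clock

time_a = cpu_time(INSTRUCTIONS, 2.0, CYCLE_TIME)   # machine A, CPI = 2.0
time_b = cpu_time(INSTRUCTIONS, 1.5, CYCLE_TIME)   # machine B, CPI = 1.5

print(f"Machine A: {time_a:.2f} s")
print(f"Machine B: {time_b:.2f} s")
print(f"B is {time_a / time_b:.2f} times faster than A")
```

Because instruction count and cycle time are equal here, the ratio of execution times reduces to the ratio of the CPIs.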
How to Calculate the 3 Components
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  - Count instructions in the loop of a small program
  - Use a simulator or emulator to count instructions
  - Debugger or tracing program
  - Execution-based monitoring: insert instrumentation code into the binary code, run it, and record information
  - Hardware counter in a special register (Pentium II)
• CPI
Calculating CPI Another Way
• First, calculate the CPI for each individual instruction (add, sub, and, etc.)
• Next, calculate the frequency of each individual instruction in the workload
• Finally, multiply these two for each instruction and add the products up to get the final CPI
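The three steps above amount to a frequency-weighted sum of per-class cycle counts. A minimal Python sketch follows; the instruction classes, frequencies, and cycle counts in `mix` are hypothetical values chosen for illustration, not data from the slides.

```python
def average_cpi(mix):
    """Weighted CPI: sum over instruction classes of frequency x cycles.

    `mix` maps a class name to a (frequency, cycles) pair; the
    frequencies are fractions of the dynamic instruction count.
    """
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


# Hypothetical instruction mix for a simple RISC-style machine.
mix = {
    "ALU":    (0.50, 1),
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 2),
}

cpi = average_cpi(mix)
print(f"Average CPI: {cpi:.2f}")

# Share of total cycles spent in each class, useful for deciding
# which class is worth optimizing.
for name, (freq, cycles) in mix.items():
    print(f"{name:<7}{freq * cycles / cpi:.0%} of cycles")
```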
Example (RISC processor)
  - What if branch instructions were twice as fast?
  - What if two ALU instructions could be executed at once?
Must know the limit of architectural enhancement
Summary: CPU Time Formula
Another Formula Related to Performance
Amdahl's Law
• Speedup due to enhancement E
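Amdahl's Law is usually written as Speedup = 1 / ((1 - F) + F / S), where F is the fraction of the original execution time that the enhancement E affects and S is the speedup of that fraction. The Python sketch below uses that standard form; the function name and the sample values of F and S are assumptions for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 20% of execution time is made twice as fast.
overall = amdahl_speedup(0.20, 2.0)
print(f"Overall speedup: {overall:.3f}")

# Upper bound: even an infinitely fast enhancement cannot beat 1 / (1 - F).
limit = 1.0 / (1.0 - 0.20)
print(f"Limit as S grows without bound: {limit:.3f}")
```

The bound 1 / (1 - F) is the limit of architectural enhancement: the untouched part of the execution time caps the achievable gain.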
Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec 24-26)
    - Benchmark programs
    - Summarizing performance
    - Reporting performance
• Cost
  - Cost and price
  - Cost of chips
What Programs for Comparison?
• What's wrong with this program as a workload?
    integer A[ ][ ], B[ ][ ], C[ ][ ];
    for (I = 0; I < 100; I++)
      for (J = 0; J < 100; J++)
        for (K = 0; K < 100; K++)
          C[I][J] = C[I][J] + A[I][K] * B[K][J];
• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchasing, or even before building, a machine
  - This is called a "workload". For example:
  - Engineer uses compiler, spreadsheet
  - Author uses word processor, drawing program, compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
171998 Morgan Kaufmann Publishers
Program Execution Time
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Clock Cycles for program
= Instructions for program (ldquoInstruction Countrdquo)
x Average Clock Cycles per Instruction (ldquoCPIrdquo)
CPI one way to compare two machines with same instruction set since Instruction Count is same
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
181998 Morgan Kaufmann Publishers
Performance Calculation (12)
bull CPU execution time for program
= Clock Cycles for program x Clock Cycle Time
bull Substituting for clock cycles
bull CPU execution time for program
= Instruction Count x CPI x Clock Cycle Time
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
191998 Morgan Kaufmann Publishers
How to Calculate the 3 Components
bull Clock Cycle Time in specification of computer (Clock Rate in advertisements)
bull Instruction Count
Count instructions in loop of small program
Use simulator or emulator to count instructions
Debugger or tracing program
Execution-based monitoring insert instrumentation code into
binary code run and record information
Hardware counter in special register (Pentium II)
bull CPI
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
201998 Morgan Kaufmann Publishers
Calculating CPI Another Way
bull First calculate CPI for each individual instruction (add sub and etc)
bull Next calculate frequency of each individual instruction in the workload
bull Finally multiply these two for each instruction and add them up to get final CPI
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E
251998 Morgan Kaufmann Publishers
Outline
bull Performance
Definition
CPU performance formula
Measuring and evaluating performance (Sec 24-26)
Benchmark programs
Summarizing performance
Reporting performance
bull Cost
1048698 Cost and price
Cost of chips
261998 Morgan Kaufmann Publishers
What Programs for Comparison
bull Whatrsquos wrong with this program as a workload
integer A[ ][ ] B[ ][ ] C[ ][ ]
for (J=0 Ilt100 I++)
for (J=0 Jlt100 J++)
for (K=0 Klt100 K++)
C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ]B[ K ][ J ]
bull What measured Not measured What it good for
bull Ideally run typical programs with typical input before purchase or before even build machine
Called a ldquoworkloadrdquo For example
Engineer uses compiler spreadsheet
Author uses word processor drawing program compression software
271998 Morgan Kaufmann Publishers
Choosing Benchmark Programs
bull a
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator
331998 Morgan Kaufmann Publishers
SPEC2000 (CFP)
341998 Morgan Kaufmann Publishers
Example PC Workload Benchmark
bull PCs Ziff Davis WinStone 99 Benchmark
A system-level application-based benchmark that measures a PCs overall performance when running todays top-selling windows-based 32-bit applications
Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
Winstones tests dont mimic what these programs do they run actual application code
351998 Morgan Kaufmann Publishers
Winstone 99 (W99) Results
bull 1
Note 2 Compaq Machines using K6-2 v 6-3K6-2 Clock Rate is 1125 times faster butK6-3 Winstone 99 rating is 125 times faster
361998 Morgan Kaufmann Publishers
Summarizing Performance
371998 Morgan Kaufmann Publishers
Early Lessons from SPEC
bull 1
381998 Morgan Kaufmann Publishers
Summarizing Performance
391998 Morgan Kaufmann Publishers
Reporting Performance
401998 Morgan Kaufmann Publishers
Summary Performance
bull Latency v Throughput
CPU Time time spent executing a single program depends solely on design of processor (datapath pipelining effectiveness caches etc)
bull Performance doesnrsquot depend on any single factor need to know Instruction Count Clocks Per Instruction and Clock Rate to get valid estimations
bull Performance evaluation needs to consider
Benchmark programs
Summarizing performance
Reporting performance results
411998 Morgan Kaufmann Publishers
Outline
bull Performance
1048698 Definition
CPU performance formula
Measuring and evaluating performancebull Cost
Cost and price
Cost of chips
421998 Morgan Kaufmann Publishers
Chip Cost Manufacturing Process
431998 Morgan Kaufmann Publishers
Cost of a Chip Includes
bull Die cost affected by wafer cost number of dies per wafer and die yield
goes roughly with the cube of the die area
An 8rdquo wafer can contain 196 Pentium dies but only 78 Pentium Pro (Fig 116 and 117)
bull Testing costbull Packaging cost depends on pins heat dissipation
441998 Morgan Kaufmann Publishers
Real World Examples
451998 Morgan Kaufmann Publishers
System Cost 1995 Workstation
461998 Morgan Kaufmann Publishers
Cost versus Price
471998 Morgan Kaufmann Publishers
Summary Cost
bull Integrated circuits driving computer industrybull Die costs goes up with the cube of die areabull Economics ($$$) is the ultimate driver for performa
nce
211998 Morgan Kaufmann Publishers
Example (RISC processor)
1048698 What if Branch instructions twice as fast1048698 What if two ALU instr could be executed at once
Must know the limit of architectural enhancement
221998 Morgan Kaufmann Publishers
Summary CPU Time Formula
bull 1048698
231998 Morgan Kaufmann Publishers
有關效能的另一個公式
bull 1
241998 Morgan Kaufmann Publishers
Amdahls Law
bull Speedup due to enhancement E

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance (Sec. 2.4-2.6)
      Benchmark programs
      Summarizing performance
      Reporting performance
• Cost
  - Cost and price
  - Cost of chips

What Programs for Comparison?
• What's wrong with this program as a workload?
    integer A[ ][ ], B[ ][ ], C[ ][ ];
    for (I=0; I<100; I++)
      for (J=0; J<100; J++)
        for (K=0; K<100; K++)
          C[ I ][ J ] = C[ I ][ J ] + A[ I ][ K ] * B[ K ][ J ];
• What is measured? What is not measured? What is it good for?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  - Called a "workload". For example:
      Engineer uses compiler, spreadsheet
      Author uses word processor, drawing program, compression software

Choosing Benchmark Programs

Benchmarks

Example: Standardized Workload Benchmarks
• Workstations: Standard Performance Evaluation Corporation (SPEC)
• SPEC95: 18 application benchmarks (with inputs) reflecting a technical workload (Fig. 2.6)
  - Eight integer: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  - Ten floating-point intensive: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
  - Separate average for integer (CINT95) and FP (CFP95), relative to a base machine
  - Benchmarks distributed in source code
  - Company representatives select the workload
  - Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
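
The "separate average relative to a base machine" can be sketched as follows. Each benchmark's ratio is the base machine's time divided by the measured time, and SPEC's published ratings combine the ratios with a geometric mean. The program names and all times below are hypothetical; they are not SPEC95 reference data.

```python
import math


def spec_ratio(base_time, measured_time):
    """Ratio relative to the base machine; larger means faster."""
    return base_time / measured_time


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (base machine seconds, measured seconds) per program
times = {"prog_a": (1000.0, 100.0), "prog_b": (500.0, 100.0), "prog_c": (200.0, 100.0)}

ratios = [spec_ratio(base, measured) for base, measured in times.values()]
print("ratios:", ratios)
print(f"geometric mean: {geometric_mean(ratios):.3f}")
```

A useful property of the geometric mean of ratios is that the ranking of two machines does not depend on which machine is chosen as the base.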

SPECint95base Performance (Oct. 1997)

SPECfp95base Performance (Oct. 1997)

SPEC2000 (CINT)
Benchmark     Language  Category
164.gzip      C         Compression
175.vpr       C         FPGA Placement/Route
176.gcc       C         C Compiler
181.mcf       C         Combinatorial Opt.
186.crafty    C         Chess
197.parser    C         Word Processing
252.eon       C++       Computer Visualization
253.perlbmk   C         PERL
254.gap       C         Group Theory Interpreter
255.vortex    C         OO Database
256.bzip2     C         Compression
300.twolf     C         Place/Route Simulator

SPEC2000 (CFP)

Example: PC Workload Benchmark
• PCs: Ziff Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results
Note: 2 Compaq machines using K6-2 vs. K6-3: the K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance
281998 Morgan Kaufmann Publishers
Benchmarks
291998 Morgan Kaufmann Publishers
Example Standardized WorkloadBenchmarks
bull Workstations Standard Performance Evaluation Corporation (SPEC)bull SPEC9518 application benchmarks (with inputs) reflecting a technical workl
oad (Fig 26) Eight integer go m88ksim gcc compress li ijpeg perl vortex Ten floating-point intensive tomcatv swim su2cor hydro2d mgrid applu turb3d apsi fppp
wave5 Separate average for integer (CINT95) and FP (CFP95) relative to a base
machine Benchmarks distributed in source code Company representatives select workload Compiler machine designers target benchmarks so try to change every 3
years
301998 Morgan Kaufmann Publishers
SPECint95base Performance (Oct 1997)
bull 1
311998 Morgan Kaufmann Publishers
SPECfp95base Performance (Oct 1997)
321998 Morgan Kaufmann Publishers
SPEC2000 (CINT)
Benchmark Language Category
164gzip C Compression
175vpr C FPGA PlacementRoute
176gcc C C Compiler
181mcf C Combinatorial Opt
186crafty C Chess
197parser C Word Processing
252eon C++ Computer Visualization
253perlbmk C PERL
254gap C Group Theory Interpreter
255vortex C OO Database
256bzip2 C Compression
300twolf C PlaceRoute Simulator

SPEC2000 (CFP)

Example PC Workload Benchmark
• PCs: Ziff-Davis Winstone 99 benchmark
  - A system-level, application-based benchmark that measures a PC's overall performance when running today's top-selling Windows-based 32-bit applications
  - Works through a series of scripted activities and uses the time a PC takes to complete those activities to produce its performance scores
  - Winstone's tests don't mimic what these programs do; they run actual application code

Winstone 99 (W99) Results
Note: 2 Compaq machines using K6-2 vs. K6-3: the K6-2 clock rate is 1.125 times faster, but the K6-3 Winstone 99 rating is 1.25 times faster
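
The two ratios in the note can be combined into a rough estimate of how many fewer clock cycles the K6-3 needs for the same work. This is only a sketch: it assumes the Winstone 99 rating is proportional to performance (1 / run time) and that run time equals cycles divided by clock rate. The variable names are illustrative.

```python
# Ratios taken from the note above.
clock_ratio_k62_over_k63 = 1.125   # K6-2 clock rate / K6-3 clock rate
perf_ratio_k63_over_k62 = 1.25     # K6-3 rating / K6-2 rating

# Assumption: performance = clock_rate / cycles_for_workload, so
#   perf_k63 / perf_k62 = (f_k63 / f_k62) * (cycles_k62 / cycles_k63)
# Solving for the cycle ratio:
cycles_ratio_k62_over_k63 = perf_ratio_k63_over_k62 * clock_ratio_k62_over_k63

print(cycles_ratio_k62_over_k63)
```

Under these assumptions the K6-2 spends roughly 1.4 times as many cycles on the workload, which is why the slower-clocked machine still scores higher.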

Summarizing Performance

Early Lessons from SPEC

Summarizing Performance

Reporting Performance

Summary: Performance
• Latency vs. throughput
• CPU time: time spent executing a single program; depends solely on design of processor (datapath, pipelining effectiveness, caches, etc.)
• Performance doesn't depend on any single factor: need to know Instruction Count, Clocks Per Instruction, and Clock Rate to get valid estimations
• Performance evaluation needs to consider:
  - Benchmark programs
  - Summarizing performance
  - Reporting performance results
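
The three factors named above combine in the standard CPU time relation: CPU time = Instruction Count × Clocks Per Instruction / Clock Rate. The sketch below applies it to two hypothetical machines (the numbers are invented for illustration) to show that the higher clock rate does not necessarily win.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions * cycles per instruction / (cycles per second)."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical machines running the same program (same instruction count).
time_a = cpu_time(2_000_000_000, cpi=1.5, clock_rate_hz=500e6)  # faster clock, higher CPI
time_b = cpu_time(2_000_000_000, cpi=1.0, clock_rate_hz=400e6)  # slower clock, lower CPI

# Ratio > 1 means machine B finishes sooner than machine A.
speedup_b_over_a = time_a / time_b
print(time_a, time_b, speedup_b_over_a)
```

Here machine B has the slower clock but finishes first, because its lower CPI more than offsets the clock rate difference.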

Outline
• Performance
  - Definition
  - CPU performance formula
  - Measuring and evaluating performance
• Cost
  - Cost and price
  - Cost of chips

Chip Cost: Manufacturing Process

Cost of a Chip Includes
• Die cost: affected by wafer cost, number of dies per wafer, and die yield
  - Goes roughly with the cube of the die area
  - An 8" wafer can contain 196 Pentium dies but only 78 Pentium Pro dies (Figs. 1.16 and 1.17)
• Testing cost
• Packaging cost: depends on pins, heat dissipation
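
The dependence of die cost on the three factors listed above can be sketched with the usual relation: cost per die = wafer cost / (dies per wafer × die yield). The die counts (196 and 78) come from the slide; the wafer cost and the yields are hypothetical values chosen only to make the example run.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die; die_yield is the fraction of dies that work (0 < yield <= 1)."""
    if not 0 < die_yield <= 1:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)

WAFER_COST = 1000.0  # hypothetical wafer cost in dollars

# Dies per 8-inch wafer are from the slide; the yields are assumed.
pentium = die_cost(WAFER_COST, dies_per_wafer=196, die_yield=0.8)
pentium_pro = die_cost(WAFER_COST, dies_per_wafer=78, die_yield=0.5)

print(round(pentium, 2), round(pentium_pro, 2))
```

A larger die loses twice: fewer dies fit on the wafer, and each one is more likely to contain a defect, so the cost per good die rises much faster than the area.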

Real World Examples

System Cost: 1995 Workstation

Cost versus Price

Summary: Cost
• Integrated circuits driving computer industry
• Die costs go up with the cube of die area
• Economics ($$$) is the ultimate driver for performance