jam-zubair
MAC/VU-Advanced Computer Architecture
Lecture 2 - Performance
CS 704 Advanced Computer Architecture
Lecture 2: Quantitative Principles
Detailed discussion of computer performance, the key to quantitative design and analysis
Today’s Topics
Recap of Lecture 1
Growth in processor performance
Price-performance design
CPU performance metrics
CPU benchmark suites
Summary
Recap of Lecture 1: Computer Systems
Architecture refers to those attributes of a computer visible to a programmer or compiler writer, e.g., instruction set, addressing techniques, I/O mechanisms, etc.
Organization refers to how the features of a computer are implemented, e.g., control signals are generated using the principles of a finite state machine (FSM) or microprogramming.
Recap of Lecture 1: Computer Development
• Academically, modern computer developments have their infancy in 1944-49
• Commercially, the first machine was built by Eckert-Mauchly Computer Corporation in 1949
• Technological developments, from vacuum tubes to VLSI circuits, dynamic memory and network technology, gave birth to four different generations of computers
• Microprocessors were introduced in 1971, with PCs following
Recap of Lecture 1
Design Perspectives:
Processor: ISA, ILP and cache
Memory hierarchy: multilevel cache and virtual memory
Input/output and storage
Multiprocessors and networks
Recap of Lecture 1: Computer Design Cycle
• Computer design and development has been under the influence of
- technology,
- performance and
- cost;
the decisive factors for rapid changes in computer development have been performance enhancements, price reduction and functional improvements.
Growth in Processor Performance
• Supercomputers and mainframes, costing millions of dollars and occupying excessively large space, were the prevailing form of computing in the 1960s; they were replaced with relatively low-cost and smaller-sized minicomputers in the 1970s
• In the 1980s, very low-cost microprocessor-based desktop computing machines were introduced in the form of the personal computer (PC) and the workstation
Growth in Processor Performance
• The growth in processor performance since the mid-1980s has been substantially higher than in earlier years
• Prior to the mid-1980s, microprocessor performance growth averaged about 35% per year
• By 2001, the growth had risen to a factor of about 1.58 per year, i.e., about 58% per year
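To get a feel for how much the two annual growth rates differ once compounded, here is a minimal Python sketch. The function name `relative_performance` and the ten-year horizon are assumptions chosen for illustration; only the factors 1.35 and 1.58 come from the slide.

```python
def relative_performance(annual_factor: float, years: int) -> float:
    """Performance after `years` of compound growth, relative to year 0."""
    return annual_factor ** years


# Illustrative horizon of 10 years (an assumption, not from the lecture).
early_rate = relative_performance(1.35, 10)  # about 35% per year
later_rate = relative_performance(1.58, 10)  # a factor of about 1.58 per year

print(f"10 years at 1.35x/year: {early_rate:.1f}x")
print(f"10 years at 1.58x/year: {later_rate:.1f}x")
```

Over a decade the higher rate compounds to several times the gain of the lower one, which is why the change in slope after the mid-1980s matters so much.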
Growth in Processor Performance
[Figure: performance relative to MIPS (0 to 1600) versus year (1984 to 2000); labeled data points: MIPS R2000, IBM Power1, HP 9000, DEC Alpha, Intel P-III]
Price-Performance Design
Technology improvements are used to lower cost and increase performance.
The relationship between cost and price is a complex one.
The cost is the total amount spent to produce a product.
The price is the amount for which a finished good is sold.
Price-Performance Design
The cost passes through different stages before it becomes the price.
A small change in cost may have a big impact on price.
Price vs. Cost
• Manufacturing costs: total amount spent to produce a component
- Component cost: cost at which the components are available to the designer; it ranges from 40% to 50% of the list price of the product
- Recurring costs: labor, purchasing, scrap, warranty; 4% to 16% of list price
- Gross margin (non-recurring costs): R&D, marketing, sales, equipment, rental, maintenance, financing cost, pre-tax profits, taxes
Price vs. Cost
• List price:
• Amount for which the finished good is sold;
• it includes an average discount of 15% to 35% of the list price, given as volume discounts and/or retailer markup
Price vs. Cost: Price-Performance Design (cont'd)
[Figure: stacked bar chart of the list-price breakdown (0% to 100%) for Mini, W/S and PC, split into component costs, direct costs, gross margin and average discount]
Cost-effective IC Design: Price-Performance Design
Yield: percentage of manufactured components surviving testing
Volume: higher manufacturing volume decreases the list price and improves purchasing efficiency
Feature size: the minimum size of a transistor or wire in either the x or y direction
Cost-effective IC Design: Price-Performance Design
Reduction in feature size from 10 microns in 1971 to 0.18 microns in 2001 has resulted in:
- Quadratic rise in transistor count
- Linear increase in performance
- 4-bit to 64-bit microprocessors
- Desktops replacing time-sharing machines
Cost of Integrated Circuits
Manufacturing Stages:
Integrated circuit manufacturing passes through many stages:
- Wafer growth and testing
- Chopping the wafer into dies
- Packaging the dies into chips
- Testing a chip
Cost of Integrated Circuits
Die: the square area of the wafer containing the integrated circuit
Note that when fitting dies on the wafer, a small wafer area around the periphery goes to waste
Cost of a die: determined from the cost of a wafer, the number of dies that fit on a wafer, and the percentage of dies that work, i.e., the die yield.
Dies of Integrated Circuits
Cost of Integrated Circuits
• The cost of an integrated circuit is the ratio of the total cost (the sum of the die cost, the die testing cost, the packaging cost and the final chip testing cost) to the final test yield.
Calculating Integrated Circuits Costs
$$\text{Cost of IC} = \frac{\text{die cost} + \text{die testing cost} + \text{packaging cost} + \text{final testing cost}}{\text{final test yield}}$$
Cost of Integrated Circuits
• The cost of a die is the ratio of the cost of the wafer to the product of the dies per wafer and the die yield
Calculating Integrated Circuits Costs
$$\text{Cost of die} = \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}$$
Cost of Integrated Circuits
• The number of dies per wafer is determined by dividing the wafer area (minus the waste wafer area near the round periphery) by the die area
Calculating Integrated Circuits Costs
$$\text{Dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$$
Example: Calculating Number of Dies
For a die 0.7 cm on a side, find the number of dies per wafer for a wafer of 30 cm diameter
Answer: [wafer area / die area] - wafer waste area
$$= \frac{\pi (30/2)^2}{0.49} - \frac{\pi \times 30}{\sqrt{2 \times 0.49}} \approx 1347 \text{ dies}$$
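The worked example can be checked with a short Python sketch of the dies-per-wafer formula. The function name `dies_per_wafer` and the choice to round down to whole dies are assumptions made for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Whole dies per wafer: wafer area / die area, minus the edge loss term."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    # Dies lost along the round periphery of the wafer.
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    # Assumption: only complete dies count, so round down.
    return math.floor(wafer_area / die_area_cm2 - edge_loss)


# Slide example: die 0.7 cm on a side (0.49 cm^2), wafer of 30 cm diameter.
print(dies_per_wafer(30, 0.49))
```

For the slide's values the raw result is a little over 1347, which rounds down to the 1347 dies quoted in the example.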
Calculating Die Yield
• Die yield is the fraction or percentage of good dies on a wafer
• Wafer yield accounts for wafers that are completely bad and so need not be tested
• Die yield depends on the defect density through a parameter α, which depends on the number of masking levels
• A good estimate for CMOS is α = 4.0
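Calculating Integrated Circuits Costs
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha}$$
Example:
The yield of a die, 0.7 cm on a side (die area 0.49 cm²), with a defect density of 0.6/cm², taking the wafer yield as 100% and α = 4.0:
$$= \left(1 + \frac{0.6 \times 0.49}{4.0}\right)^{-4} \approx 0.75$$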
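The yield and cost formulas chain together: die yield feeds die cost, which feeds the cost of the packaged IC. The Python sketch below strings them together under stated assumptions. All function names are illustrative, and the wafer, testing and packaging costs in the usage lines are hypothetical placeholders, not figures from the lecture.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Die yield = wafer yield x (1 + defect density x die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, dies_per_wafer: int, yield_fraction: float) -> float:
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


def ic_cost(die: float, die_test: float, packaging: float,
            final_test: float, final_test_yield: float) -> float:
    """Cost of IC = sum of the four cost terms / final test yield."""
    return (die + die_test + packaging + final_test) / final_test_yield


# Slide example: 0.49 cm^2 die, 0.6 defects per cm^2, alpha = 4.0.
y = die_yield(0.6, 0.49)
print(f"die yield = {y:.2f}")

# Hypothetical costs, for illustration only.
d = die_cost(wafer_cost=5000.0, dies_per_wafer=1347, yield_fraction=y)
print(f"IC cost = {ic_cost(d, 1.0, 2.0, 0.5, final_test_yield=0.95):.2f}")
```

Because the die area appears inside a negative power, the yield falls quickly as dies get larger, so larger dies cost disproportionately more.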
Price-Performance Design
• Time to run the task:
• Execution time, response time, latency
• Throughput or bandwidth:
• Tasks per day, hour, week, sec, ns …
Price-Performance Design
• Example:
• To carry 2400 passengers from Lahore to Islamabad:
• The train completes the task in 4.0 hrs, while the airplane completes the same task in 6.0 hrs;
• i.e., the airplane completes only 66.67% of the task in the same time, so the throughput, and hence the performance, of the train is 50% more than that of the airplane
Price-Performance Design: Example

Metric                    Train                           Plane
Time, Lah to Isb          4.0 hours                       45 min.
Passengers / trip         2400                            300
Cost / person             300 Rs.                         3000 Rs.
Execution time / person   6.0 sec                         9.0 sec
Cost-performance          300 x 6 = 1,800 Rs-sec/person   3000 x 9 = 27,000 Rs-sec/person
Time to complete job      4.0 hours                       45 x 8 min. = 6.0 hours

The plane is about 5.3 times faster per trip (45 min. vs. 4.0 hours) but takes 50% more time to complete the job, i.e., it has lesser throughput; thus the performance of the train is 50% better than that of the plane.
The time per person and cost per person of the train are less than those of the plane; thus the cost-performance ratio of train to plane is 1:15.
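The table entries follow from a few lines of arithmetic, sketched here in Python. The function name `job_stats` is an assumption, and, as on the slide, trips are treated as running back to back with no return or turnaround time.

```python
import math


def job_stats(passengers: int, seats_per_trip: int,
              trip_minutes: float, fare_rs: float):
    """Return (hours to finish the job, seconds per person, Rs-sec per person)."""
    trips = math.ceil(passengers / seats_per_trip)
    job_hours = trips * trip_minutes / 60          # latency of the whole job
    seconds_per_person = trip_minutes * 60 / seats_per_trip
    cost_performance = fare_rs * seconds_per_person
    return job_hours, seconds_per_person, cost_performance


train = job_stats(2400, 2400, 240, 300)   # one 4-hour trip
plane = job_stats(2400, 300, 45, 3000)    # eight 45-minute trips

print("train:", train)
print("plane:", plane)
print("job-time ratio (plane / train):", plane[0] / train[0])
print("cost-performance ratio (plane / train):", plane[2] / train[2])
```

The per-trip latency favors the plane, while the job-level throughput and the cost-performance product both favor the train, which is the distinction the example is meant to draw.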
Metrics of Performance
[Figure: levels of a computer system, each with its own performance metrics]
System levels, top to bottom:
- Application
- Programming Language
- Compiler
- Instruction Set Architecture
- Datapath, Control
- Function Units
- Transistors, Wires, I/O Pins
Metrics shown alongside the levels:
- Answers per month, operations per second
- MIPS: millions of instructions per second
- MFLOPS: millions of FP operations per second
- Megabytes per second
- Cycles per second (clock rate)
Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                Inst Count   CPI   Clock Rate
Program             √
Compiler            √
Inst. Set           √         √
Organization                  √        √
Technology                             √
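As a quick numerical illustration of the CPU time equation, the Python sketch below multiplies the three factors. The function name `cpu_time` and the sample workload (one billion instructions, CPI of 1.5, 500 MHz clock) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical workload, for illustration only.
baseline = cpu_time(1e9, 1.5, 500e6)
# Halving CPI (an organization change) halves CPU time at the same clock rate.
improved = cpu_time(1e9, 0.75, 500e6)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

Each row of the table above corresponds to changing one or more arguments of this function, which is why no single factor can be judged in isolation.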
Cycles Per Instruction
• Cycles per Instruction (CPI):
$$\text{CPI} = \frac{\text{CPU clock cycles for program}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}$$
• Instruction frequency: for an instruction mix, the relative frequency of occurrence of the different types of instructions is given as
$$\text{FIC}_i = \frac{\text{IC}_i}{\text{Total instruction count}}$$
• Average cycles per instruction:
$$\text{CPI} = \frac{1}{\text{Instruction count}} \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i = \sum_{i=1}^{n} \text{FIC}_i \times \text{CPI}_i$$
Example: Calculating average CPI
Base Machine (Reg / Reg)

Op        Freq   Cycles   Freq x Cycles   (% Time)
ALU       50%    1        0.5             (33%)
Load      20%    2        0.4             (27%)
Store     10%    2        0.2             (13%)
Branch    20%    2        0.4             (27%)
Average CPI               1.5
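The table can be reproduced with a small Python sketch of the weighted-sum form of the CPI formula. The names `mix`, `average_cpi` and `time_shares` are illustrative assumptions; the frequencies and cycle counts are the ones in the table.

```python
# Instruction mix from the table: op -> (relative frequency, cycles).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Average CPI = sum over instruction types of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction type."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


print(f"average CPI = {average_cpi(mix):.2f}")
for op, share in time_shares(mix).items():
    print(f"{op}: {share:.0%} of time")
```

Note that ALU instructions are half of the mix but only a third of the time, because every other instruction type takes twice as many cycles.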
Cycles Per Instruction
Arithmetic mean time:
$$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
Weighted arithmetic mean time:
$$\sum_{i=1}^{n} w_i \times \text{Time}_i$$
Geometric mean time:
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
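The three summary measures can be sketched in a few lines of Python. The function names and the sample timings are assumptions for illustration; the weights are assumed to sum to 1, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(times):
    """(1/n) x sum of the execution times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum of w_i x Time_i; the weights are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(ratios):
    """n-th root of the product of the execution time ratios."""
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical execution times (seconds) for two programs.
times = [10.0, 20.0]
print(arithmetic_mean(times))
print(weighted_arithmetic_mean(times, [0.75, 0.25]))
# Hypothetical execution time ratios relative to a reference machine.
print(geometric_mean([2.0, 8.0]))
```

The geometric mean is applied to ratios rather than raw times, so its result does not depend on which machine is chosen as the reference when comparing two other machines.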
Summary: Price-Performance Design
Computer cost: the total cost of manufacturing a computer is distributed among different parts of the system, such as the cost of the cabinet, processor board and I/O devices.
Performance: time is the key measurement of performance.
Comparing the performance of two designs: the ratio
η = Execution time Y / Execution time X
determines how many times faster machine X is than machine Y; since performance is the inverse of execution time,
η = Performance X / Performance Y
Instruction Execution Rate - MIPS
MIPS specifies performance inversely to execution time. For a given program:
$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^{6}}$$
MIPS cannot be calculated from the instruction mix.
Relative MIPS for a machine 'M' is defined with respect to some reference machine as:
$$\text{RMIPS} = \frac{\text{Performance}_M}{\text{Performance}_{\text{reference}}} \times \text{MIPS}_{\text{reference}} = \frac{\text{Time}_{\text{reference}}}{\text{Time}_M} \times \text{MIPS}_{\text{reference}}$$
MFLOPS is defined for floating-point-intensive programs as millions of floating-point operations per second.
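A minimal Python sketch of the two MIPS definitions follows. The function names and the sample numbers are hypothetical and serve only to show the units involved.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Native MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def relative_mips(time_reference_s: float, time_m_s: float,
                  mips_reference: float) -> float:
    """Relative MIPS = (Time_reference / Time_M) x MIPS_reference."""
    return (time_reference_s / time_m_s) * mips_reference


# Hypothetical program: 50 million instructions executed in 2 seconds.
print(mips(50e6, 2.0))
# Hypothetical comparison: machine M is 5x faster than a 1-MIPS reference.
print(relative_mips(time_reference_s=10.0, time_m_s=2.0, mips_reference=1.0))
```

Relative MIPS needs only execution times on the two machines, so it tracks execution time directly, whereas native MIPS also depends on how many instructions each machine needs for the same program.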
CPU Benchmark Suites
Performance comparison: the execution time of the same workload running on two machines, without running the actual programs
Benchmarks: the programs specifically chosen to measure performance
Five levels of programs, in decreasing order of accuracy:
- Real applications
- Modified applications
- Kernels
- Toy benchmarks
- Synthetic benchmarks
SPEC: System Performance Evaluation Cooperative
First Round 1989: 10 programs yielding a single number, SPECmarks
Second Round 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
Third Round 1995:
- New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
- "Benchmarks useful for 3 years"
- Single flag setting for all programs: SPECint_base95, SPECfp_base95
Summary: Designing and performance comparison
• Designing to last through trends

          Capacity         Speed
Logic     2x in 3 years    2x in 3 years
DRAM      4x in 3 years    2x in 10 years
Disk      4x in 3 years    2x in 10 years

• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task: execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …: throughput, bandwidth
• "X is n times faster than Y" means
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
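The "n times faster" relation is a one-line ratio, sketched here in Python. The function name `times_faster` and the sample execution times are hypothetical.

```python
def times_faster(ex_time_x: float, ex_time_y: float) -> float:
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


# Hypothetical timings: X takes 10 s and Y takes 15 s on the same task.
n = times_faster(ex_time_x=10.0, ex_time_y=15.0)
print(f"X is {n} times faster than Y")

# The same n follows from performance, defined as 1 / execution time.
perf_x, perf_y = 1 / 10.0, 1 / 15.0
print(perf_x / perf_y)
```

Putting the slower machine's time in the numerator keeps n above 1 whenever X is the faster machine.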
Summary (cont'd)
CPI Law:
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Execution time is the REAL measure of computer performance!
Good products are created when we have:
- Good benchmarks, good ways to summarize performance
Die cost goes roughly with (die area)^4
Summary (cont'd)
"For better or worse, benchmarks shape a field"
Good products are created when we have:
- Good benchmarks
- Good ways to summarize performance
Given that sales is in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
If the benchmarks/summary are inadequate, then one must choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
Execution time is the measure of computer performance!