Better answers Relaxing Constraints: Thoughts on the Evolution of Computer Architecture Joel Emer Alpha Development Group Compaq Computer Corporation

Moore’s Law Alpha-style

[Figure: SPECint95 (log scale, 1 to 100) versus date of introduction for the EV4-200, EV45-275, EV5-300, EV56-400, EV56-500, EV56-600, EV6-575, and EV67-730. The value 3.73 is labeled on the plot.]

Iron Law of Performance

$$\text{Performance} = \frac{\text{Frequency}}{\text{Instructions} \times \text{CPI}}$$

• Frequency – largely circuit design/technology
• CPI – largely organization
• Instructions – largely architecture/compiler
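A minimal sketch of how the three factors trade off, taking the Iron Law in its usual form (execution time = instructions × CPI / frequency, performance = 1 / execution time). The function name and all numeric values are hypothetical and chosen only for illustration; they are not measurements of any Alpha processor.

```python
def performance(frequency_hz, instructions, cpi):
    """Programs completed per second: frequency / (instructions * CPI)."""
    return frequency_hz / (instructions * cpi)


# Hypothetical baseline: 500 MHz clock, 1e9 dynamic instructions, CPI of 2.0.
base = performance(500e6, 1e9, 2.0)

# Circuit/technology lever: raise the clock by 20%.
faster_clock = performance(600e6, 1e9, 2.0)

# Organization lever: cut CPI by 20%.
lower_cpi = performance(500e6, 1e9, 1.6)

print(f"faster clock speedup: {faster_clock / base:.2f}")
print(f"lower CPI speedup:    {lower_cpi / base:.2f}")
```

Because the factors multiply, a 20% reduction in CPI is worth slightly more than a 20% increase in frequency.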

Outline

• Review of technology factors
• Retrospective on the quantitative method
• Augmenting the quantitative method
• Recommendation

Power Dissipation Trends

[Figure, panel "Power Dissipation": power (W, 0 to 120) and voltage (V, 0 to 3.5) for the 21064, 21164, 21264, and 21364.]

[Figure, panel "Supply Current": current (A, 0 to 80) and voltage (V, 0 to 3.5) for the 21064, 21164, 21264, and 21364.]

•Power consumption is increasing

•Supply current is increasing faster!

Coping With Power Growth

Technology techniques
• Better cooling technology needed
• Accelerate Vdd scaling
• SOI
• Clock distribution

Architectural possibilities
• Use less power-hungry structures
• Reduce useless speculation

Clock Distribution Trends

21264 Power (Peak)

Unit                        Share of peak power
Global Clock Networks       32%
Instruction Issue Units     18%
Caches                      15%
Floating Execution Units    10%
Integer Execution Units     10%
Memory Management Unit      8%
I/O                         5%
Miscellaneous Logic         2%

• Frequencies will continue to scale
• Clock edge rates are not scaling

Coping With Clock Distribution

Technology solution
• Low swing differential clocks
• Adiabatic clocking

Architectural possibilities
• Multiple clock zones
• Asynchronous design

Communication Delay

[Figure: microprocessor chip, not drawn to scale, annotated with the communication delay for each processor generation.]

21064: ~1 cycle
21164: ~1.5 cycles
21264: ~3 cycles
21464: ~6 cycles

Coping With Communication Delay

Technology solutions
• Low K dielectrics
• Thinner (Cu) interconnect

Architectural possibilities
• Deeper pipelining
• Replication/clustering of structures
• More autonomous computation

SIA Roadmap

                                           1997     1999     2002     2005     2008     2012*
Technology Node (nm)                       250      180      130      100      70       50
Memory (bit/chip)                          256M     1G       4G       16G      64G      256G
Transistors/chip (MPU)                     11M      21M      76M      200M     520M     1.4G
Chip Frequency (MHz)                       750      1250     2100     3500     6000     10,000
Wiring Levels (max)                        6        6 to 7   7        7 to 8   8 to 9   9
Power Supply Voltage, Vdd (V)              1.8-2.5  1.5-1.8  1.2-1.5  0.9-1.2  0.6-0.9  0.5-0.6
Power - High Performance (W), w/Heat sink  70       90       130      160      170      175
Power - Hand-held (W)                      1.2      1.4      2        2.4      2.8      3.2

*The 2012 column is taken directly from the SIA 1997 National Technology Roadmap.
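The roadmap's power and voltage rows together imply a supply current. The sketch below derives it under the simplifying assumption that all high-performance power is drawn from a single Vdd supply, so that current = power / voltage. The dictionary layout and function name are illustrative; the numbers are the roadmap's own.

```python
# year: (high-performance power in W, Vdd low bound in V, Vdd high bound in V)
ROADMAP = {
    1997: (70, 1.8, 2.5),
    1999: (90, 1.5, 1.8),
    2002: (130, 1.2, 1.5),
    2005: (160, 0.9, 1.2),
    2008: (170, 0.6, 0.9),
    2012: (175, 0.5, 0.6),
}


def supply_current_range(power_w, vdd_low, vdd_high):
    """Return (min, max) current in amperes, assuming I = P / V."""
    return power_w / vdd_high, power_w / vdd_low


for year, (power_w, vdd_low, vdd_high) in ROADMAP.items():
    low, high = supply_current_range(power_w, vdd_low, vdd_high)
    print(f"{year}: {power_w:>3} W -> {low:6.1f} to {high:6.1f} A")
```

Under this assumption, power grows 2.5x between 1997 and 2012 while the implied current grows roughly 9x to 10x, because the voltage falls as the power rises.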

Outline

• Review of technology factors
• Retrospective on the quantitative method
• Augmenting the quantitative method
• Recommendation

Disclaimer

The names used and events depicted in this talk are meant to be real. The events are, however, not an exhaustive enumeration of significant milestones.

The misrepresentations of fact and omission of contributors are unintentional and solely the responsibility of the presenter. Finally, the interpretations are just that and are mine as well.

Early quantitative method - 1981

uPC Histogram Chart – 1981-5

TABLE 8
Average VAX Instruction Timing (Cycles per Instruction)

             Compute  Read   R-Stall  Write  W-Stall  IB-Stall  Total
Decode       1.000                                    0.613     1.613
Spec1        0.895    0.306  0.364                              1.565
Spec2-6      1.052    0.148  0.116    0.161  0.192    0.102     1.771
B-Disp       0.221                                    0.005     0.226
Simple       0.870    0.029  0.017    0.033  0.027              0.977
Field        0.482    0.049  0.058    0.007  0.002              0.600
Float        0.292    0.000  0.000    0.008  0.001              0.302
Call/Ret     0.937    0.133  0.074    0.130  0.184              1.458
System       0.434    0.015  0.031    0.014  0.028              0.522
Character    0.318    0.039  0.099    0.046  0.004              0.506
Decimal      0.026    0.002  0.000    0.001  0.002              0.031
Int/Except   0.055    0.002  0.005    0.004  0.006              0.071
Mem Mngmt    0.555    0.061  0.200    0.004  0.003              0.824
Abort        0.127                                              0.127
TOTAL        7.267    0.783  0.964    0.409  0.450    0.720     10.593
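As a small worked example of reading Table 8, the sketch below breaks the overall CPI down by cycle category using only the TOTAL row. The variable names are illustrative.

```python
# TOTAL row of Table 8: cycles per average VAX instruction, by category.
TOTALS = {
    "Compute": 7.267,
    "Read": 0.783,
    "R-Stall": 0.964,
    "Write": 0.409,
    "W-Stall": 0.450,
    "IB-Stall": 0.720,
}

cpi = sum(TOTALS.values())
shares = {name: cycles / cpi for name, cycles in TOTALS.items()}
stall_share = sum(
    cycles for name, cycles in TOTALS.items() if name.endswith("Stall")
) / cpi

print(f"CPI = {cpi:.3f}")
for name, share in shares.items():
    print(f"{name:>8}: {share:6.1%}")
print(f"all stalls: {stall_share:.1%}")
```

The category totals sum to the table's overall 10.593 cycles per instruction; roughly 69% of those cycles are compute and roughly 20% are read, write, and instruction-buffer stalls.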

Paper counts

                 ISCA 1   ISCA 24
No model         22       1
Analytic Model   5        ½
Simulation       1        21½
Measurement      0        7

Scientific Method

• Make hypothesis about behavior
• Design experiment
• Run experiment and quantify
• Interpret results
• New hypothesis

Scientific Method

• Make hypothesis about behavior
• Pick baseline design and workload
• Run experiment and quantify
• Interpret results
• New hypothesis

Scientific Method

• Make hypothesis about behavior
• Pick baseline design and workload
• Run simulation model or measure hardware
• Interpret results
• New hypothesis

Scientific Method

• Make hypothesis about behavior
• Pick baseline design and workload
• Run simulation model or measure hardware
• Interpret results
• Propose new design

Making and Testing Hypothesis

Cache experiment (Schlansker)
• 64K word cache
• 32-way set associative cache/LRU replacement
• 200x200 matrix subblock of an N x N matrix
• Read twice

Sizes
• N=2727: 0 misses
• N=2729: 24160 misses
• N=2731: 36382 misses
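An experiment of this shape can be sketched as a small simulation. The slide does not state the line size, the matrix layout, the set-index function, or which read the miss counts refer to, so the code assumes one-word lines, row-major layout, modulo set indexing, and counts misses on the second read only. The function name is hypothetical, and under these assumptions the output need not reproduce the slide's miss counts.

```python
from collections import OrderedDict


def second_pass_misses(n, block=200, cache_words=64 * 1024, ways=32):
    """Misses on the second read of a block x block subblock of an n x n matrix."""
    num_sets = cache_words // ways              # assumes one-word lines
    sets = [OrderedDict() for _ in range(num_sets)]

    def read_subblock():
        misses = 0
        for row in range(block):
            for col in range(block):
                addr = row * n + col            # assumes row-major layout
                lines = sets[addr % num_sets]   # assumes modulo set indexing
                if addr in lines:
                    lines.move_to_end(addr)     # mark as most recently used
                else:
                    misses += 1
                    if len(lines) == ways:
                        lines.popitem(last=False)  # evict least recently used
                    lines[addr] = True
        return misses

    read_subblock()           # first read warms the cache
    return read_subblock()    # second read exposes conflict misses


if __name__ == "__main__":
    for n in (2727, 2729, 2731):
        print(n, second_pass_misses(n))
```

The subblock holds 40,000 words, which fits in a 64K word cache, so any misses on the second read come from how the rows map onto sets rather than from capacity.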

Propose new design

Skewed associative (Seznec)

[Figure: cache organizations compared: direct mapped, 4-way associative, and 4-way skewed.]
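The idea behind a skewed-associative cache is that each way uses a different index function, so addresses that collide in one way are spread across sets in the others. The sketch below illustrates that property only; the skewing function is a hypothetical rotate-and-XOR chosen for brevity and is not the function from Seznec's design.

```python
def set_index(addr, way, index_bits=4):
    """Set index of `addr` in `way` (0 to index_bits - 1); way 0 is conventional."""
    mask = (1 << index_bits) - 1
    low = addr & mask                       # conventional index bits
    high = (addr >> index_bits) & mask      # next group of address bits
    if way == 0:
        return low
    # Hypothetical skew: rotate the high bits left by `way`, then XOR them in.
    rotated = ((high << way) | (high >> (index_bits - way))) & mask
    return low ^ rotated


# Four addresses that all share the same conventional index (0).
addrs = [0, 16, 32, 48]
print("way 0:", [set_index(a, 0) for a in addrs])
print("way 1:", [set_index(a, 1) for a in addrs])
```

In way 0 all four addresses compete for a single set, while in way 1 each lands in a different set, which is what lets a skewed cache absorb access patterns that thrash a conventional one.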

Quantitative Approach Problems

• Too much abstraction
  – Intra-chip latencies
  – Memory subsystem
• Poor workloads
• Too incremental…

Quantitative -> Incremental

[Figure: bar chart with a vertical axis from 0 to 4 and categories a through l.]

Outline

• Review of technology factors
• Retrospective on the quantitative method
• Augmenting the quantitative method
• Recommendation

Relaxing Constraints

• Select a constraint to relax
• Generate design
• Employ quantitative method
• Evaluate results

Important Steps…

Before
• Carefully pick a constraint to relax

After
• Find contributions without constraint
• Preserving results after reinstating the constraint

Extrapolate From Current Trends

Personal Workstation – Xerox PARC – late 70’s

VAX 11/780       Dorado
5 MHz            15 MHz
512 Kilobytes    8 Megabytes
40+ Users        1 User

Results
• Accelerate innovation

Throw Out Standards

Distributed file system - 1985

Use a Simpler Starting Point

RISC out-of-order (Johnson, Torng)

[Figure: out-of-order pipeline. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache.]

CISC-based O-O-O

• K6 (Johnson)
• Pentium Pro (Colwell, Papworth…)

[Figure: PC and Icache feed a "Convert CISC to RISC" stage, which feeds a RISC O-O-O core.]

Abandon conventions

VLIW (Fisher)
• Relieve hardware of all dependency responsibility
• Give that responsibility to the compiler

Expected consequences
• Much simpler implementation
• Faster cycle time

Sometimes not what you expect

Compiler scheduling for hardware is a great idea
• For 21064 - narrow in-order
• For 21164 - wider in-order
• For 21264 - wider out-of-order

Issue Logic Critical Loop

[Figure: Instruction Slot and Instruction Issue stages (S2, S3) with an Issue Conflict Checker, feeding the floating point multiply pipeline, the floating point add pipeline, integer pipeline 0, and integer pipeline 1. One element is marked "X".]

Make a Radical Departure

Multiscalar research (Sohi, Smith…)

New Mechanism Required

Dependence prediction (Moshovos)

[Figure: stores and loads in program order versus execution order, with a "Trap!" annotation.]
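One very simple form of memory dependence prediction can be sketched as follows: a load is allowed to issue ahead of older stores until it causes a trap, after which it is made to wait. This is a deliberately reduced illustration and not the mechanism from Moshovos's work; the class, method names, and string results are all hypothetical.

```python
class StoreWaitPredictor:
    """Remembers loads that once issued too early and trapped."""

    def __init__(self):
        self._wait_pcs = set()

    def can_speculate(self, load_pc):
        # A load with no recorded violation may issue ahead of older stores.
        return load_pc not in self._wait_pcs

    def record_trap(self, load_pc):
        # After a violation, this load waits for older stores in future.
        self._wait_pcs.add(load_pc)


def run_load(predictor, load_pc, conflicts_with_older_store):
    """Return what happens to one dynamic instance of the load."""
    if not predictor.can_speculate(load_pc):
        return "waited"
    if conflicts_with_older_store:
        predictor.record_trap(load_pc)
        return "trap"
    return "speculated"


predictor = StoreWaitPredictor()
first = run_load(predictor, 0x40, conflicts_with_older_store=True)
second = run_load(predictor, 0x40, conflicts_with_older_store=True)
other = run_load(predictor, 0x44, conflicts_with_older_store=False)
print(first, second, other)
```

The first conflicting instance traps, the next instance of the same load waits instead, and an unrelated load continues to speculate freely.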

What Was Really Important

Full hardware management (Sohi)
• Sequencing
• Register dependencies
• Memory dependencies

Refinement (Mowry and Olukotun)
• Compiler managed – registers, sequencing
• Hardware managed memory dependence only

Ignoring Implementation Realities

SMT - in-order (Tullsen, Eggers, Levy)

[Figure: in-order pipeline. Stages: Fetch, Issue, Reg Read, Execute, Dcache/Store Buffer, Reg Write. Structures: PC, Icache, Regs, Dcache.]

Solution Already Available

SMT out-of-order

[Figure: out-of-order pipeline. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache.]

Outline

• Review of technology factors
• Retrospective on the quantitative method
• Augmenting the quantitative method
• Recommendation

Pay Attention to Reality

Look at technology trends
• Power
• Latency

Use more realistic models
• More organizational details
• Better workloads

Ignore Reality

Look for revolutionary contributions
• Decide on a constraint to relax
• Apply the scientific method
• Revolutionary contributions may arise because
  – Constraint will be relaxed in time
  – Constraint wasn’t fundamental
  – New avenues of exploration will be opened

Acknowledgments

• Bill Bowhill
• Paul Gronowski
• Bill Herrick
• Toni Juan
• Geoff Lowney
• Ellen Piccioli
• Andre Seznec