
Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank


(New) Competency Area 6: Introduction to Pipelining

Basic Pipelining Concepts

P&H 3rd ed., Chapter 6; H&P 3rd ed., §A.1

Pipelining - The Basic Concept

• In early CPUs, deep combinational logic networks were used in between state updates.
  – Signal delays may vary widely across different paths.
  – New input cannot be provided to the network until the slowest paths have finished.
  – Slow clock speed, slow overall processing rates.
• In pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages).
  – Delays through the network are made uniform.
  – A new input can be provided to each slice as soon as its quick, shallow network has finished.
  – Multiple inputs are processed simultaneously across stages.
  – Clock cycle is only as long as the slowest pipeline stage.
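A minimal sketch of the timing argument above, assuming made-up stage delays and an optional, hypothetical `register_overhead` parameter (neither comes from the slides). It compares the cycle time of one deep network with that of a pipelined version whose clock is set by the slowest stage.

```python
def unpipelined_cycle(stage_delays):
    """Cycle time when the whole network is one combinational block."""
    return sum(stage_delays)


def pipelined_cycle(stage_delays, register_overhead=0.0):
    """Cycle time is set by the slowest stage (plus any latch overhead)."""
    return max(stage_delays) + register_overhead


def total_time(n_inputs, stage_delays, register_overhead=0.0):
    """Time to push n_inputs through the pipeline, including fill time."""
    cycle = pipelined_cycle(stage_delays, register_overhead)
    return (len(stage_delays) + n_inputs - 1) * cycle


delays = [2.0, 2.0, 3.0, 2.0, 1.0]   # hypothetical stage delays, in ns
print(unpipelined_cycle(delays))      # 10.0
print(pipelined_cycle(delays))        # 3.0
print(total_time(100, delays))        # 312.0
```

With these assumed numbers, 100 inputs take 1000 ns unpipelined but 312 ns pipelined; the gain is below 5x because the stages are not perfectly balanced.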

Generic Pipelining Illustration

• Let a generic gate symbol represent any of a variety of logic gates
• Initial, non-pipelined design for some random block of complex logic:

[Figure: block of logic gates between an input latch and an output latch]

Pipelining Illustration cont.

• Aggressively pipelined version of same logic:
  – Insert extra “pipeline registers” periodically
    • Here, after every 1-2 logic layers
  – This design can process 5x as much data at once!

[Figure: the same logic block between two latches, with pipeline registers inserted]

Another View of Pipelining

• Space-time diagrams:
  – Here, each colored area shows which parts of the logic network are occupied with data computed from a given input item, at which times.

[Figure: two space-time diagrams, "Non-Pipelined" and "Pipelined (depth 6)"; axes: depth in logic network vs. time; colored regions labeled Data 1, Data 2]

Simple Multicycle RISC Datapath

[Figure: datapath divided into stages IF, ID, EX, MEM, WB; labels: Program Counter, Next PC, Inst. Reg., Load fr. Mem., Data]

Basic RISC Execution Pipeline

• Basic idea of instruction-execution pipelining:
  – Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5).
  – So during 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously!

[Figure: pipeline diagram; axes: time vs. stage]
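The overlap described above can be sketched as a small space-time table. This is an illustrative sketch only: the function names are made up here, and it assumes an ideal pipeline with no stalls, where instruction i enters the first stage in cycle i.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """One row per instruction, one column per clock cycle."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            k = cycle - i  # index of the stage instruction i occupies now
            row.append(stages[k] if 0 <= k < len(stages) else "")
        rows.append(row)
    return rows


def busy_stages(rows, cycle):
    """Number of instructions in flight during the given cycle."""
    return sum(1 for row in rows if row[cycle])


rows = pipeline_diagram(5)
for i, row in enumerate(rows):
    print(f"i{i}: " + " ".join(f"{s:>3}" for s in row))
print(busy_stages(rows, 4))  # 5
```

Reading a column gives "same time, different places"; reading a row gives one instruction's steps over time.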

Different Visualizations

[Figure: two pipeline diagrams related by a skew; labels: "Same Time, Different Places", "Same Place, Different Times", "Same Time, Different Data Item / Instruction", "Same instruction, different steps"]

More Graphical Detail

Adding Pipeline Registers

Description of Pipe Stages

Dependences

(from H&P 3rd ed. §3.1)

Dependences

• A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes.
• Three major dependence types:
  – Data dependence
  – Name dependence
  – Control dependence
• I’ll sometimes use the word dependency for a particular instance of one instruction depending on another.
  – The instructions can’t be effectively (as opposed to just syntactically) fully parallelized, or reordered.

Data Dependence

• Recursive definition:
  – Instruction B is data dependent on instruction A iff:
    • B uses a data result produced by instruction A, or
    • There is another instruction C such that B is data dependent on C, and C is data dependent on A.
• When a data dependence is present, there is a potential RAW hazard.

Loop: LD    F0,0(R1)
      ADDD  F4,F0,F2
      SD    0(R1),F4
      SUBI  R1,R1,#8
      BNEZ  R1,Loop

[Figure: arrows marking direct data dependencies in a simple example code fragment, with instructions labeled A, B, C]
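The recursive definition can be sketched directly on the loop above. This is a simplified illustration under stated assumptions: the tuple encoding and function names are invented here, only register operands are tracked (memory is ignored), and only a single pass through the loop body is considered, so loop-carried dependences are not found.

```python
# (opcode, destination register or None, source registers)
LOOP = [
    ("LD",   "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD",   None, ["R1", "F4"]),
    ("SUBI", "R1", ["R1"]),
    ("BNEZ", None, ["R1"]),
]


def direct_dependences(code):
    """Pairs (b, a): instruction b reads the value last written by a."""
    last_writer = {}
    deps = set()
    for b, (_, dest, srcs) in enumerate(code):
        for reg in srcs:
            if reg in last_writer:
                deps.add((b, last_writer[reg]))
        if dest is not None:
            last_writer[dest] = b
    return deps


def all_dependences(code):
    """Transitive closure: b depends on a if b depends on c and c on a."""
    deps = direct_dependences(code)
    changed = True
    while changed:
        changed = False
        for (b, c) in list(deps):
            for (c2, a) in list(deps):
                if c == c2 and (b, a) not in deps:
                    deps.add((b, a))
                    changed = True
    return deps


print(sorted(direct_dependences(LOOP)))  # [(1, 0), (2, 1), (4, 3)]
print(sorted(all_dependences(LOOP)))     # [(1, 0), (2, 0), (2, 1), (4, 3)]
```

The extra pair (2, 0) is the indirect case: SD depends on LD through ADDD.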

Name Dependence

• Occurs when two instructions access the same data storage location, but are not data dependent.
  – Also, at least one of the accesses must be a write.
• Two sub-types (for inst. B after inst. A):
  – Antidependence: A reads, then B writes.
    • Potential for a WAR hazard.
  – Output dependence: A writes, then B writes.
    • Potential for a WAW hazard.
• Note: Name dependencies can be avoided by changing instructions to use different locations
  – (Rather than reusing 1 location for 2 purposes.)
  – This fix is called renaming.

[Figure: two timelines, each showing A followed by B in time]
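A minimal sketch of the two sub-types and of renaming, assuming each instruction is described only by the sets of locations it reads and writes. The dict layout, the function names, and the spare register F6 are all hypothetical.

```python
def name_dependences(a, b):
    """Classify name dependences of later instruction b on earlier a.

    Each instruction is a dict with 'reads' and 'writes' sets of
    location names.
    """
    kinds = set()
    if a["reads"] & b["writes"]:
        kinds.add("anti")    # potential WAR hazard
    if a["writes"] & b["writes"]:
        kinds.add("output")  # potential WAW hazard
    return kinds


def rename_dest(instr, old, new):
    """Return a copy of instr that writes `new` instead of `old`."""
    return {
        "reads": set(instr["reads"]),
        "writes": {new if r == old else r for r in instr["writes"]},
    }


a = {"reads": {"F0", "F2"}, "writes": {"F4"}}  # e.g. ADDD F4,F0,F2
b = {"reads": {"R1"}, "writes": {"F0"}}        # e.g. LD   F0,0(R1)

print(name_dependences(a, b))                           # {'anti'}
print(name_dependences(a, rename_dest(b, "F0", "F6")))  # set()
```

In real code, any later instruction that reads the renamed result would have to be updated to read the new location as well; the sketch leaves that step out.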

Control Dependence

• Occurs when the execution of an instruction (as in, will it be executed, or not?) depends on the outcome of some earlier, conditional branch instruction.
• We generally can’t easily change which branches an instruction depends on w/o ruining the program’s functional behavior.
• However, there are exceptions.

Hazards, Stalls, & Forwarding

H&P 3rd ed. §A.2-3

Hazards

• Hazards are circumstances which may lead to stalls in the pipeline if not addressed.
  – Stalls are delays, and may be called “bubbles”
• There are three major types of hazards:
  – Structural hazards:
    • Not enough HW resources to keep all instrs. moving.
  – Data hazards:
    • Data results of earlier instrs. not yet avail. when needed.
  – Control hazards:
    • Control decisions resulting from earlier instrs. (branches) not yet made; don’t know which new instrs. to execute.

Structural Hazard Example

Suppose you had a combined instruction+data memory w. only 1 read port

Hazards Produce “Bubbles”

[Figure: pipeline diagram; axes: time vs. progress through pipe; annotations: "Unskew", "Bubble rises"]

Textual View

A pipeline stalled for a structural hazard – a load with only one memory port

Example Data Hazards

Forwarding for Data Hazards

Another Forwarding Example

Three Types of Data Hazards

• Let i be an earlier instruction, j a later one.
• RAW (read after write)
  – j is supposed to Read a value After i Writes it,
    • But instead j tries to read the value before i has written it.
• WAW (write after write)
  – j should Write to a given place After i Writes there,
    • But they end up writing in the wrong order.
  – Only occurs if >1 pipeline stage can write.
• WAR (write after read)
  – j should Write a new value After i Reads the old,
    • But instead j writes the new value before i has read the old one.
  – Only occurs if writes can happen before reads in pipeline.
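The three cases can be sketched as timing comparisons. This is an illustration under assumptions not stated in the slides: each instruction is described by its issue cycle plus the stage offsets at which it reads and writes, both instructions are assumed to touch the same register, and a read in the same cycle as a write is treated as safe (as with a register file written in the first half of a cycle and read in the second).

```python
def hazards(i, j):
    """Hazards between earlier instruction i and later instruction j.

    Each is a dict: 'issue' cycle, plus 'read' and 'write' stage
    offsets (None if the instruction does not read / write).
    """
    def at(instr, key):
        offset = instr.get(key)
        return None if offset is None else instr["issue"] + offset

    ri, wi = at(i, "read"), at(i, "write")
    rj, wj = at(j, "read"), at(j, "write")
    found = set()
    if wi is not None and rj is not None and rj < wi:
        found.add("RAW")  # j reads before i has written
    if wi is not None and wj is not None and wj < wi:
        found.add("WAW")  # writes land in the wrong order
    if ri is not None and wj is not None and wj < ri:
        found.add("WAR")  # j overwrites before i has read
    return found


# Simple 5-stage pipe: read in ID (offset 1), write in WB (offset 4).
i = {"issue": 0, "read": 1, "write": 4}
j = {"issue": 1, "read": 1, "write": 4}
print(hazards(i, j))  # {'RAW'}

# Hypothetical long-latency op that writes at offset 7, followed by a
# short op that writes at offset 4: more than one stage can write.
slow = {"issue": 0, "read": 1, "write": 7}
fast = {"issue": 1, "read": None, "write": 4}
print(hazards(slow, fast))  # {'WAW'}
```

In the simple in-order case only RAW shows up, which matches the two "only occurs if" conditions on the slide.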

An Unavoidable Stall

Stalling in midst of instruction

Data Hazard Prevention

• A clever compiler can often reschedule instructions to avoid a stall.
  – A simple example:
    • Original code:
        lw  r2, 0(r4)
        add r1, r2, r3    # Note: Stall happens here!
        lw  r5, 4(r4)
    • Transformed code:
        lw  r2, 0(r4)
        lw  r5, 4(r4)
        add r1, r2, r3    # No stall needed!
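The effect of the rescheduling can be checked with a small stall counter. This is a sketch under assumptions: a pipeline with forwarding in which only a load immediately followed by a user of its result costs one stall cycle; the tuple encoding and function name are invented for illustration.

```python
def load_use_stalls(code):
    """Count 1-cycle stalls caused by using a loaded value right away.

    Each instruction is (opcode, destination register, source registers).
    """
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("lw",  "r2", ["r4"]),        # lw  r2, 0(r4)
    ("add", "r1", ["r2", "r3"]),  # add r1, r2, r3
    ("lw",  "r5", ["r4"]),        # lw  r5, 4(r4)
]
# Move the independent second load in between the first load and its use.
transformed = [original[0], original[2], original[1]]

print(load_use_stalls(original))     # 1
print(load_use_stalls(transformed))  # 0
```

The move is legal here because the second load neither reads nor writes any register the add touches.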

Simple RISC Pipeline Stall Statistics

[Figure: bar chart of the percentage of loads that cause a stall, by benchmark]

Note that ~1 in 5 loads causes a stall in many programs!

Data Hazard Detection

Hazard Detection Logic

• Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load.

[Figure: pipeline stages IF, ID, EX, ME, WB separated by pipeline registers IF/ID, ID/EX, EX/ME, ME/WB; the IF/ID register is highlighted]
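The check described above can be sketched as a comparison between two pipeline registers. The field names (`mem_read`, `rs`, `rt`) are assumptions made for this illustration, not taken from the slide: the instruction ahead is taken to be a load when `mem_read` is set, with its destination in `rt`.

```python
def must_stall(if_id, id_ex):
    """Return True if the instruction in IF/ID must wait one cycle.

    if_id: {'rs': ..., 'rt': ...} source register numbers of the newly
           fetched instruction (None for an unused field).
    id_ex: {'mem_read': bool, 'rt': ...} for the instruction just ahead;
           for a load, 'rt' is its destination register.
    """
    if not id_ex["mem_read"]:
        return False
    dest = id_ex["rt"]
    return dest is not None and dest in (if_id["rs"], if_id["rt"])


load = {"mem_read": True, "rt": 2}              # lw  r2, 0(r4)
print(must_stall({"rs": 2, "rt": 3}, load))     # add r1, r2, r3 -> True
print(must_stall({"rs": 4, "rt": None}, load))  # lw  r5, 4(r4)  -> False
```

A real implementation would also need to ignore a hard-wired zero register, which this sketch does not model.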

Forwarding Situations in DLX

Implementing Forwarding in HW

Control Hazards, Branch Prediction, Delayed Branches

H&P 3rd ed., §§A.2-3 & §4.2

Control Hazards

• Suppose the new PC value was not computed until the MEM stage (like orig. RISC design).
• Then we must stall 3 clocks after every branch!
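A rough sketch of what a 3-clock stall per branch costs, assuming an ideal CPI of 1 otherwise and a hypothetical branch frequency of 15% (a placeholder chosen for illustration, not a figure from these notes).

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls the pipe."""
    return base_cpi + branch_fraction * stall_cycles


def pipeline_speedup(depth, cpi):
    """Speedup over an unpipelined design, assuming perfectly balanced
    stages and no other sources of stalls."""
    return depth / cpi


cpi = cpi_with_branch_stalls(branch_fraction=0.15, stall_cycles=3)
print(f"CPI = {cpi:.2f}")                           # CPI = 1.45
print(f"speedup = {pipeline_speedup(5, cpi):.2f}")  # speedup = 3.45
```

Under these assumptions the 5-stage pipeline delivers about 3.45x rather than the ideal 5x, which is why resolving branches earlier is worthwhile.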

Early Branch Resolution

New Pipeline Logic

Control Instruction Statistics

• ~10% of dynamic insts. are fwd. cond. branches
• only ~3% are backwards cond. branches
• similar percentage are unconditional branches

Stats on Taken Branches

~67% of cond. branches are taken

Predict-Not-Taken

Delayed Branches

Machine code sequence:
  Branch instruction
  Delay slot instruction(s)
  Post-branch instructions

[Figure annotation: "Branch is taken (if taken) at this point"]

Filling the Branch-Delay Slot

Static Branch Prediction

• Earlier we discussed predict-taken, predict-not-taken static prediction strategies
  – Applied uniformly across all branches in program
• Static analysis in compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not
  – One way: Backwards taken, forwards not taken.
• If we can do better, it can help with static code scheduling to reduce data hazard stalls…
  – Also may assist later dynamic prediction
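The uniform strategies and the "backwards taken, forwards not taken" rule can be sketched as tiny predictor functions. The branch trace below is entirely made up to show the mechanics (one loop branch and one forward branch); its misprediction rates say nothing about real benchmarks.

```python
def always_taken(branch_pc, target_pc):
    return True


def never_taken(branch_pc, target_pc):
    return False


def btfnt(branch_pc, target_pc):
    """Backwards taken, forwards not taken."""
    return target_pc < branch_pc


def mispredict_rate(trace, predictor):
    """trace: list of (branch_pc, target_pc, actually_taken)."""
    wrong = sum(1 for pc, target, taken in trace
                if predictor(pc, target) != taken)
    return wrong / len(trace)


# Hypothetical trace: a loop-closing branch at 0x40 (target 0x20) taken
# 9 times and then falling through, plus a forward branch at 0x30
# (target 0x38) that is never taken in 10 executions.
trace = ([(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
         + [(0x30, 0x38, False)] * 10)

print(mispredict_rate(trace, always_taken))  # 0.55
print(mispredict_rate(trace, never_taken))   # 0.45
print(mispredict_rate(trace, btfnt))         # 0.05
```

The only point of the sketch is that a per-branch rule can separate branches that a uniform rule must treat alike; how well a given rule does depends on the program.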

Prediction Helps Static Scheduling

        LD     R1,0(R2)
        DSUBU  R1,R1,R3
        BEQZ   R1,else
        OR     R4,R5,R6
        DADDU  R10,R4,R3
        J      after
else:   DADDU  R7,R8,R9
        …
after:

[Figure annotations: potential load delay to fill; if-then-else control flow (if case, else case); code movements to consider; some data dependences; "Which way will this branch go?"]

Some Static Prediction Schemes

• Always predict taken
  – 34% mispredict rate on SPEC (range 9%-54%)
• Backwards predict taken, forwards not taken
  – In SPEC, more than ½ of forwards are taken!
    • This does worse than “always predict taken” strategy
  – Usu. not better than 30-40% misprediction rate
• Better than either: Use profile information!
  – Collect statistics on earlier program runs.
  – Works well because individual branches tend to be strongly biased (taken or not) given average data
    • Bias tends to remain stable across multiple runs
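Profile-based prediction can be sketched in two steps: record each branch's majority direction on a training run, then reuse that as a fixed prediction on a later run. Both traces below are fabricated for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_profile(training_trace):
    """Map each branch PC to its majority direction in the training run."""
    counts = defaultdict(Counter)
    for pc, taken in training_trace:
        counts[pc][taken] += 1
    return {pc: c[True] >= c[False] for pc, c in counts.items()}


def mispredict_rate(trace, profile, default=True):
    """Branches missing from the profile fall back to `default`."""
    wrong = sum(1 for pc, taken in trace
                if profile.get(pc, default) != taken)
    return wrong / len(trace)


# Hypothetical runs: branch 0x40 is mostly taken, branch 0x30 mostly not.
train = ([(0x40, True)] * 9 + [(0x40, False)]
         + [(0x30, False)] * 8 + [(0x30, True)] * 2)
later = ([(0x40, True)] * 8 + [(0x40, False)] * 2
         + [(0x30, False)] * 9 + [(0x30, True)])

profile = build_profile(train)
print(mispredict_rate(later, profile))           # 0.15
print(mispredict_rate(later, {}, default=True))  # 0.55 (always taken)
```

The profile helps here only because each branch's bias is the same in both runs, which is the stability property noted above.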

Profile-Based Predictor Statistics

[Figure: chart of profile-based predictor statistics; one group of benchmarks labeled Floating-Point]

Predict-Taken vs. Profile-Based

[Figure: chart comparing predict-taken and profile-based prediction by instructions executed in between mispredictions (log scale); one group of benchmarks labeled Floating-point]