COSC 6385 Computer Architecture - Pipelining (II)
Edgar Gabriel
Fall 2006


Pipeline Hazards
• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: the hardware cannot support this combination of instructions
  – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  – Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Slide based on a lecture by David Culler, University of California, Berkeley, http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

Three Generic Data Hazards
• Read After Write (RAW): Instr_J tries to read an operand before Instr_I writes it
  – Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
• Write After Read (WAR): Instr_J writes an operand before Instr_I reads it
  – Called an “anti-dependence”
• Write After Write (WAW): Instr_J writes an operand before Instr_I writes it
  – Called an “output dependence” by compiler writers

Slide based on a lecture by David Culler, University of California, Berkeley, http://www.eecs.berkeley.edu/~culler/courses/cs252-s05
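The three hazard classes can be told apart by comparing the destination and source registers of two instructions. The following is a minimal sketch, assuming each instruction is described by a hypothetical `(destination, sources)` pair and that the first instruction precedes the second in program order. It only reports the underlying dependence; whether a hazard actually occurs depends on the pipeline timing.

```python
def classify_hazards(first, second):
    """Return the potential data hazards of `second` with respect to the
    earlier instruction `first`. Each instruction is a (dest, sources) pair."""
    dest_i, srcs_i = first
    dest_j, srcs_j = second
    hazards = []
    if dest_i in srcs_j:      # second reads what first writes
        hazards.append("RAW")
    if dest_j in srcs_i:      # second writes what first reads
        hazards.append("WAR")
    if dest_i == dest_j:      # both write the same register
        hazards.append("WAW")
    return hazards


# Illustrative instructions (register names chosen for this example only)
div = ("F0", ("F2", "F4"))     # DIV.D F0, F2, F4
add = ("F10", ("F0", "F8"))    # ADD.D F10, F0, F8
sub = ("F8", ("F8", "F14"))    # SUB.D F8, F8, F14

print(classify_hazards(div, add))  # ['RAW']
print(classify_hazards(add, sub))  # ['WAR']
```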

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
  – Execute successor instructions in sequence
#3: Predict Branch Taken
  – Haven’t calculated branch target address yet, so still incurs a 1 cycle branch penalty
#4: Delayed Branch
  – Define branch to take place AFTER a following instruction
  – 1 slot delay allows proper decision and branch target address in a 5 stage pipeline

Slide based on a lecture by David Culler, University of California, Berkeley, http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

Performance evaluation of pipelines (I)

$$\mathit{Speedup} = \frac{\mathit{Execution\_time}_{org}}{\mathit{Execution\_time}_{enh}} = \frac{IC_{org} \times \mathit{ClockCycle}_{org} \times CPI_{org}}{IC_{enh} \times \mathit{ClockCycle}_{enh} \times CPI_{enh}}$$

If $IC_{org} = IC_{enh}$:

$$\mathit{Speedup} = \frac{\mathit{ClockCycle}_{org} \times CPI_{org}}{\mathit{ClockCycle}_{enh} \times CPI_{enh}}$$

If $IC_{org} = IC_{enh}$ and $\mathit{ClockCycle}_{org} = \mathit{ClockCycle}_{enh}$:

$$\mathit{Speedup} = \frac{CPI_{org}}{CPI_{enh}}$$
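As a quick check of the general formula, the sketch below evaluates the speedup from instruction count, CPI and clock cycle time. The function name and the numbers in the usage example are illustrative assumptions, not values from the lecture.

```python
def speedup(ic_org, cpi_org, cycle_org, ic_enh, cpi_enh, cycle_enh):
    """Speedup = execution time (original) / execution time (enhanced),
    with execution time = IC * ClockCycle * CPI."""
    time_org = ic_org * cycle_org * cpi_org
    time_enh = ic_enh * cycle_enh * cpi_enh
    return time_org / time_enh


# Hypothetical numbers: same instruction count and clock cycle,
# so the result reduces to CPI_org / CPI_enh = 2.0 / 1.25.
print(speedup(1_000_000, 2.0, 1.0, 1_000_000, 1.25, 1.0))
```

With equal instruction counts and clock cycles the first two factors cancel, which is the special case given in the last formula above.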

Performance evaluation of pipelines (II)

If looking at individual instructions:

$$\mathit{Speedup}_{overall} = \frac{\mathit{Execution\_time}_{org}}{\mathit{Execution\_time}_{enh}} = \frac{\mathit{ClockCycle}_{org} \times \sum_{i=1}^{n} IC_i^{org} \times CPI_i^{org}}{\mathit{ClockCycle}_{enh} \times \sum_{i=1}^{n} IC_i^{enh} \times CPI_i^{enh}}$$

If $IC_{total}$ does not change, you can also use the average instruction execution time (AvIETime):

$$\mathit{Speedup}_{overall} = \frac{\mathit{Execution\_time}_{org}}{\mathit{Execution\_time}_{enh}} = \frac{\mathit{ClockCycle}_{org} \times \sum_{i=1}^{n} f_i^{org} \times CPI_i^{org}}{\mathit{ClockCycle}_{enh} \times \sum_{i=1}^{n} f_i^{enh} \times CPI_i^{enh}}$$

with

$$f_i = \frac{IC_i}{IC_{total}}$$

Performance evaluation of pipelines (III)

• Comparing pipelined and non-pipelined execution (ideal case):

$$\mathit{Execution\_time}_{pipelined} = \frac{\mathit{Execution\_time}_{non\_pipelined}}{\mathit{num\_pipeline\_stages}}$$

$$\mathit{Speedup} = \frac{\mathit{Execution\_time}_{non\_pipelined}}{\mathit{Execution\_time}_{pipelined}} = \mathit{num\_pipeline\_stages}$$

• Also

$$\mathit{Speedup} = \frac{\mathit{AvIETime}_{non\_pipelined}}{\mathit{AvIETime}_{pipelined}} = \frac{CPI_{non\_pipelined}}{CPI_{pipelined}} \times \frac{\mathit{ClockCycle}_{non\_pipelined}}{\mathit{ClockCycle}_{pipelined}}$$

Ideal $CPI_{pipelined} = 1$
Realistic $CPI_{pipelined}$ = Ideal $CPI_{pipelined}$ + pipeline stall cycles per instruction

Performance evaluation of pipelines (IV)

$$\mathit{Speedup} = \frac{\mathit{AvIETime}_{non\_pipelined}}{\mathit{AvIETime}_{pipelined}}$$

Thus:

$$\mathit{Speedup} = \frac{CPI_{non\_pipelined}}{1 + \mathit{PipelineStallCyclesPerInstr}} \times \frac{\mathit{ClockCycle}_{non\_pipelined}}{\mathit{ClockCycle}_{pipelined}}$$

If ClockCycle does not change:

$$\mathit{Speedup} = \frac{CPI_{non\_pipelined}}{1 + \mathit{PipelineStallCyclesPerInstr}}$$

If all instructions take the same number of cycles (= number of pipeline stages):

$$\mathit{Speedup} = \frac{\mathit{num\_pipeline\_stages}}{1 + \mathit{PipelineStallCyclesPerInstr}}$$

Example pg A-10
• (A) Given a non-pipelined processor:
  – 1 ns clock cycle time
  – 4 cycles for ALU operations
  – 4 cycles for branches
  – 5 cycles for memory operations
• (B) Given also a pipelined processor:
  – 1.2 ns clock cycle time
• Both (A) and (B) have
  – 40% ALU operations
  – 20% branches
  – 40% memory operations
• What is the speedup of (B) over (A) due to pipelining?

Example pg A-10 (II)

For machine (A):

$$\mathit{AvIETime}_{(A)} = \mathit{ClockCycle}_{A} \times \sum_{i=1}^{n} f_i \times CPI_i = 1\,\text{ns} \times (0.4 \times 4 + 0.2 \times 4 + 0.4 \times 5) = 4.4\,\text{ns}$$

For machine (B), assuming ideal CPI (= 1):

$$\mathit{AvIETime}_{(B)} = \mathit{ClockCycle}_{B} \times \sum_{i=1}^{n} f_i \times CPI_i = 1.2\,\text{ns} \times (0.4 \times 1 + 0.2 \times 1 + 0.4 \times 1) = 1.2\,\text{ns}$$

Thus

$$\mathit{Speedup} = \frac{\mathit{AvIETime}_{(A)}}{\mathit{AvIETime}_{(B)}} = \frac{4.4\,\text{ns}}{1.2\,\text{ns}} \approx 3.7$$
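The arithmetic of this example can be reproduced in a few lines. This is a small sketch with a hypothetical helper `av_instr_time`; the instruction mix is the one used in the computation above (weights 0.4, 0.2 and 0.4 for ALU operations, branches and memory operations).

```python
def av_instr_time(clock_cycle_ns, mix):
    """Average instruction execution time: ClockCycle * sum(f_i * CPI_i).
    `mix` is a list of (fraction, cpi) pairs."""
    return clock_cycle_ns * sum(f * cpi for f, cpi in mix)


# (fraction, CPI) for ALU operations, branches, memory operations
mix_a = [(0.4, 4), (0.2, 4), (0.4, 5)]   # non-pipelined machine (A)
mix_b = [(0.4, 1), (0.2, 1), (0.4, 1)]   # pipelined machine (B), ideal CPI

time_a = av_instr_time(1.0, mix_a)
time_b = av_instr_time(1.2, mix_b)
print(round(time_a, 2), round(time_b, 2), round(time_a / time_b, 2))  # 4.4 1.2 3.67
```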

Example: Dual-port vs. Single-port
• Machine A: Dual ported memory (“Harvard Architecture”)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

SpeedUpA = Pipeline Depth/(1 + 0) x (clockunpipe/clockpipe)
         = Pipeline Depth

SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clockunpipe/(clockunpipe/1.05))
         = (Pipeline Depth/1.4) x 1.05
         = 0.75 x Pipeline Depth

SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33

• Machine A is 1.33 times faster

Slide based on a lecture by David Culler, University of California, Berkeley, http://www.eecs.berkeley.edu/~culler/courses/cs252-s05
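The comparison can be checked numerically. The sketch below assumes a hypothetical helper `pipeline_speedup` and an arbitrary pipeline depth of 5; the depth cancels in the ratio, so any positive value gives the same result.

```python
def pipeline_speedup(depth, stall_cycles_per_instr, clock_ratio=1.0):
    """Speedup = depth / (1 + stalls per instruction) * (clock_unpipe / clock_pipe)."""
    return depth / (1.0 + stall_cycles_per_instr) * clock_ratio


depth = 5  # assumed value, cancels out in the ratio below

speedup_a = pipeline_speedup(depth, 0.0)                       # dual-ported: no structural stalls
speedup_b = pipeline_speedup(depth, 0.4 * 1, clock_ratio=1.05) # 40% loads, 1 stall cycle each

print(round(speedup_b / depth, 2))       # 0.75
print(round(speedup_a / speedup_b, 2))   # 1.33
```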

Exceptions
• Instruction execution order is interrupted
• E.g.
  – I/O device request
  – Invoking an OS service from an application
  – Tracing execution
  – Breakpoint
  – Integer or FP arithmetic anomaly (e.g. overflow)
  – Page fault
  – Misaligned memory access
  – Memory protection violation
  – Hardware malfunction

Classification of Exceptions
• Problems with pipelining:
  – Different stages of the pipeline can raise exceptions, leading to a different order of exceptions compared to the unpipelined case
• Classes of exceptions
  1. Synchronous vs. asynchronous
  2. User requested vs. coerced
  3. User maskable vs. user non-maskable
  4. Within vs. between instructions
  5. Resume vs. terminate

Exceptions
• Most problematic: exceptions raised within instructions, where the instruction must be resumed
  – Another program must be invoked to save the state of the program
• Pipelines capable of handling exceptions are called restartable

Pipeline stage | Possible exceptions
IF             | Page fault on instruction fetch; misaligned memory access; memory protection violation
ID             | Undefined or illegal opcode
EX             | Arithmetic exception
MEM            | Page fault on data fetch; misaligned memory access; memory protection violation
WB             | None

Exceptions
• An exception cannot be handled at the moment it occurs, so:
  – A status vector associated with the instruction records the exception
  – The status vector is carried along with the instruction
  – Writing of data values is disabled if the status vector is set
  – In WB the status vector is checked and the exception is handled

=> The exception of instruction i is handled before the exception of instruction i+1

=> Since no data values are written back, the register file is not changed -> the instruction can be repeated
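The effect of deferring exception handling to WB can be illustrated with a toy model. The sketch below is not a pipeline implementation; it assumes an ideal 5-stage pipeline without stalls, so that instruction i occupies stage s during cycle i + s, and uses a hypothetical `faults` mapping from instruction index to the stage that raises the exception.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def exception_orders(faults):
    """Return (order in which exceptions occur, order in which they are
    handled when the status vector is only checked in WB)."""
    def raise_cycle(i):
        # cycle in which instruction i reaches the faulting stage
        return i + STAGES.index(faults[i])

    def wb_cycle(i):
        # cycle in which instruction i reaches WB
        return i + STAGES.index("WB")

    raised = sorted(faults, key=raise_cycle)
    handled = sorted(faults, key=wb_cycle)
    return raised, handled


# Instruction 0 faults in MEM (cycle 3), instruction 1 already in IF (cycle 1)
raised, handled = exception_orders({0: "MEM", 1: "IF"})
print(raised)   # [1, 0]
print(handled)  # [0, 1]
```

The exception of instruction 1 occurs first, but because both are only acted upon in WB, they are handled in program order.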

Multi-cycle instructions
• Floating point instructions can take many cycles to complete
• Often implemented by multiple executions of the EX stage
  – Not all instructions will take the same number of cycles to finish!
• Latency:
  – Number of intervening cycles between an instruction that produces a result and an instruction that uses the result
  – Usually: depth of the EX stage - 1
• Initiation interval:
  – Number of cycles that must elapse between issuing two operations of a given type
• Multi-cycle instructions/pipelines increase the probability of WAW and RAW hazards

Example for a multi-cycle pipeline

[Figure: IF and ID feed four parallel execution paths, which merge into MEM and WB]
• EX: Integer unit
• M1 M2 M3 M4 M5 M6 M7: FP/Integer multiply unit
• A1 A2 A3 A4: FP/Integer add unit
• DIV: FP/Integer division (non pipelined)

Functional unit | Latency | Initiation interval
Integer ALU     | 0       | 1
Data memory     | 1       | 1
FP add          | 3       | 1
FP multiply     | 6       | 1
FP divide       | 24      | 25
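The table can be turned into stall estimates directly from the two definitions. The sketch below uses hypothetical helper names and assumes full forwarding and no other hazards; it is meant as an illustration of how latency and initiation interval are read, not as a model of a specific processor.

```python
# functional unit: (latency, initiation interval), values from the table above
UNITS = {
    "Integer ALU": (0, 1),
    "Data memory": (1, 1),
    "FP add": (3, 1),
    "FP multiply": (6, 1),
    "FP divide": (24, 25),
}


def raw_stalls(producer_unit, independent_between=0):
    """Stall cycles seen by a consumer that follows the producer after
    `independent_between` unrelated instructions."""
    latency, _ = UNITS[producer_unit]
    return max(0, latency - independent_between)


def structural_stalls(unit, cycles_since_last_issue):
    """Stall cycles before another operation of the same type may issue."""
    _, interval = UNITS[unit]
    return max(0, interval - cycles_since_last_issue)


print(raw_stalls("FP multiply"))           # 6
print(raw_stalls("FP add", 2))             # 1
print(structural_stalls("FP divide", 1))   # 24
```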

Instruction level parallelism
• Exploit parallelism between independent instructions
  – Limited by data dependencies
  – Limited by branches
• Example:

for (i=0; i<n; i++ ) {
    c[i] = a[i] + b[i];
}

  – Each iteration of the loop is independent
  – Exploitation of that fact is not trivial because of register reuse!

Instruction level parallelism
• Data dependencies:
  – True dependencies: instruction i produces a result required by instruction i+k, k>0 (RAW)
    • sharing a register or a memory location
  – Name dependencies: usage of the same register or memory location without data flow
    • Antidependence: instruction i+k writes a register/memory location read by instruction i (WAR)
      – No problem if not reordering instructions
    • Output dependence: instruction i and instruction i+k write the same register/memory location (WAW)
      – No problem if not reordering instructions
  – Control dependencies: determine the ordering of an instruction i with respect to a branch

Dynamic scheduling
• Up to now
  – Instructions are issued in program order
  – If an instruction is stalled in the pipeline, no later instruction can proceed

DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14

• In order to allow out-of-order execution, the ID stage is split into two parts:
  – Instruction issue: decode instruction and check for structural hazards
  – Read operands: read operands if no data hazard

Dynamic scheduling
• Out-of-order execution introduces the possibility of WAR and WAW hazards

Program order          Out-of-order execution
DIV.D F0, F2, F4       DIV.D F0, F2, F4
ADD.D F10, F0, F8      SUB.D F8, F8, F14
SUB.D F8, F8, F14      ADD.D F10, F0, F8

• Out-of-order execution only improves performance if
  – Multiple instructions can be executed at once
  – Multiple functional units are available
• All instructions pass through the issue stage in order
• Instructions can be bypassed in the read-operand stage
• Algorithms allowing instructions to execute out-of-order
  – Scoreboarding
  – Tomasulo’s approach
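Why the reordering on this slide is a WAR hazard can be seen by simply executing both orders on a register file. The sketch below is a toy interpreter with made-up initial register values; it only shows that ADD.D reads a different F8 once SUB.D has been moved ahead of it.

```python
import operator

OPS = {"DIV.D": operator.truediv, "ADD.D": operator.add, "SUB.D": operator.sub}


def execute(program, registers):
    """Execute (op, dest, src1, src2) tuples in the given order."""
    regs = dict(registers)
    for op, dest, src1, src2 in program:
        regs[dest] = OPS[op](regs[src1], regs[src2])
    return regs


# Initial values are arbitrary and chosen for illustration only
initial = {"F2": 8.0, "F4": 2.0, "F8": 3.0, "F14": 1.0}

in_order = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F8", "F8", "F14"),
]
reordered = [in_order[0], in_order[2], in_order[1]]  # SUB.D overtakes ADD.D

print(execute(in_order, initial)["F10"])    # 7.0
print(execute(reordered, initial)["F10"])   # 6.0
```

A dynamic scheduler therefore has to let SUB.D execute early but hold back its write to F8 until ADD.D has read its operands.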

Scoreboarding
• First implemented in the CDC 6600
• Assumption for the following slides:
  – 2 multipliers
  – 1 adder
  – 1 divider
  – 1 integer unit
• Each instruction goes through the scoreboard
  – Scoreboard determines when an instruction can execute
  – Scoreboard monitors usage of execution units
  – Scoreboard monitors when a result can be written to the destination register

Scoreboarding (II)
4 steps of Scoreboarding (replaces ID, EX and WB)
1. Issue: if the functional unit is free and no other active instruction has the same destination register
2. Read operands: Scoreboard monitors the availability of operands. An operand is available if no earlier, active instruction is going to write it.
3. Execution
4. Write result: once execution is done, Scoreboard checks for WAR hazards and stalls the instruction if necessary.

Scoreboarding (III)
Scoreboard data structures:
• Instruction status: which of the four steps the instruction is in
• Functional unit status: status of a functional unit
  – Busy: indicates whether the unit is busy or not
  – Op: operation to be performed
  – Fi: destination register number
  – Fj, Fk: source register numbers
  – Qj, Qk: functional units producing source registers Fj, Fk
  – Rj, Rk: flags indicating whether Fj, Fk are ready. Set to No after operands are read.
• Register result status: which functional unit will write which register

Scoreboarding example

L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Assumption:
• ADD and SUB take 2 clock cycles
• MULT takes 10 clock cycles
• DIV takes 40 clock cycles

Following slides are based on a lecture by Jelena Mirkovic, University of Delaware, http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class10.pdf

Time=1: Issue first load

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F6  |    | R2 |         |         |     | Yes
Mult1, Mult2, Add, Divide: (empty)

Register result status
F6: Integer

Time=2: First load reads operands; second load cannot issue (structural hazard)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F6  |    | R2 |         |         |     | No
Mult1, Mult2, Add, Divide: (empty)

Register result status
F6: Integer

Time=3: First load completes execution; second load cannot issue (structural hazard)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F6  |    | R2 |         |         |     | No
Mult1, Mult2, Add, Divide: (empty)

Register result status
F6: Integer

Time=4: First load writes result; second load cannot issue (structural hazard)

Functional unit status
Integer, Mult1, Mult2, Add, Divide: (empty)

Register result status
(empty)

Time=5: Second load is issued

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F2  |    | R3 |         |         |     | Yes
Mult1, Mult2, Add, Divide: (empty)

Register result status
F2: Integer

Time=6: Second load reads operands; Mult is issued

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F2  |    | R3 |         |         |     | No
Mult1   | Yes  | Mult | F0  | F2 | F4 | Integer |         | No  | Yes
Mult2, Add, Divide: (empty)

Register result status
F0: Mult1, F2: Integer

Time=7: Second load completes execution; Mult is stalled waiting for F2; Sub is issued

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Integer | Yes  | Load | F2  |    | R3 |         |         |     | No
Mult1   | Yes  | Mult | F0  | F2 | F4 | Integer |         | No  | Yes
Add     | Yes  | Sub  | F8  | F6 | F2 |         | Integer | Yes | No
Mult2, Divide: (empty)

Register result status
F0: Mult1, F2: Integer, F8: Add

Time=8: Second load writes result; Mult and Sub stalled (F2); Div is issued

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | Yes | Yes
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F8: Add, F10: Div

Time=9: Mult and Sub read operands; Div stalled waiting for F0; Add not issued (structural hazard)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F8: Add, F10: Div

Time=10: Mult executing (1 out of 10 cycles); Sub executing (1 out of 2 cycles); Div stalled (F0)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F8: Add, F10: Div

Time=11: Mult executing (2/10); Sub completes execution; Div stalled (F0)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F8: Add, F10: Div

Time=12: Mult executing (3/10); Sub writes result; Div stalled (F0)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2, Add: (empty)

Register result status
F0: Mult1, F10: Div

Time=13: Mult executing (4/10); Div stalled (F0); Add issued

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=14: Mult executing (5/10); Div stalled (F0); Add reads operands

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=15: Mult executing (6/10); Div stalled (F0); Add executes (1 of 2 cycles)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=16: Mult executing (7/10); Div stalled (F0); Add completes execution

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=17: Mult executing (8/10); Div stalled (F0); Add stalled (WAR hazard on F6)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=19: Mult completes execution; Div stalled (F0); Add stalled (WAR hazard on F6)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | No  | No
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes
Integer, Mult2: (empty)

Register result status
F0: Mult1, F6: Add, F10: Div

Time=20: Mult writes result; Div stalled (F0); Add stalled (WAR hazard on F6)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | Yes | Yes
Integer, Mult1, Mult2: (empty)

Register result status
F6: Add, F10: Div

Time=21: Div reads operands; Add stalled (WAR hazard on F6)

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Add     | Yes  | Add  | F6  | F8 | F2 |         |         | No  | No
Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | No  | No
Integer, Mult1, Mult2: (empty)

Register result status
F6: Add, F10: Div

Time=22: Div executes (1/40); Add writes result

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | No  | No
Integer, Mult1, Mult2, Add: (empty)

Register result status
F10: Div

Time=61: Div completes execution

Functional unit status
Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | No  | No
Integer, Mult1, Mult2, Add: (empty)

Register result status
F10: Div

Time=62: Div writes result

Functional unit status
Integer, Mult1, Mult2, Add, Divide: (empty)

Register result status
(empty)
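The walk-through above can be reproduced with a small simulation. The following is a simplified sketch, not the CDC 6600 logic: it assumes the functional units listed earlier (1 integer unit, 2 multipliers, 1 adder, 1 divider), a load latency of 1 cycle, and latencies of 10, 2 and 40 cycles for Mult, Add/Sub and Div as used in the time-step captions. Every decision in cycle t is based on the state at the start of that cycle. The names `Instr` and `simulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed configuration: number of units per type and execution latency
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}
LATENCY = {"Integer": 1, "Mult": 10, "Add": 2, "Divide": 40}


@dataclass
class Instr:
    text: str
    unit: str                    # type of functional unit required
    dest: str                    # destination register (Fi)
    srcs: Tuple[str, ...]        # source registers (Fj, Fk)
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None   # execution complete
    write: Optional[int] = None


def simulate(program, max_cycles=1000):
    """Record the cycle of each scoreboard step for every instruction."""
    for t in range(1, max_cycles + 1):
        if all(ins.write is not None for ins in program):
            break

        def active(j):
            # issued, and its result is not yet visible at the start of cycle t
            return j.issue is not None and (j.write is None or j.write >= t)

        for n, ins in enumerate(program):
            earlier = program[:n]
            if ins.issue is None:
                # Issue: in order, free unit (structural), no WAW hazard
                in_order = all(j.issue is not None and j.issue < t for j in earlier)
                busy = sum(1 for j in earlier if active(j) and j.unit == ins.unit)
                waw = any(active(j) and j.dest == ins.dest for j in earlier)
                if in_order and busy < UNITS[ins.unit] and not waw:
                    ins.issue = t
            elif ins.read is None:
                # Read operands: wait until no earlier active writer (RAW)
                raw = any(active(j) and j.dest in ins.srcs for j in earlier)
                if ins.issue < t and not raw:
                    ins.read = t
                    ins.done = t + LATENCY[ins.unit]
            elif ins.write is None:
                # Write result: wait for earlier readers of the register (WAR)
                war = any((j.read is None or j.read >= t) and ins.dest in j.srcs
                          for j in earlier)
                if ins.done < t and not war:
                    ins.write = t
    return program


program = simulate([
    Instr("L.D   F6, 34(R2)", "Integer", "F6", ("R2",)),
    Instr("L.D   F2, 45(R3)", "Integer", "F2", ("R3",)),
    Instr("MUL.D F0, F2, F4", "Mult", "F0", ("F2", "F4")),
    Instr("SUB.D F8, F6, F2", "Add", "F8", ("F6", "F2")),
    Instr("DIV.D F10, F0, F6", "Divide", "F10", ("F0", "F6")),
    Instr("ADD.D F6, F8, F2", "Add", "F6", ("F8", "F2")),
])
for ins in program:
    print(f"{ins.text:<18} {ins.issue:>3} {ins.read:>3} {ins.done:>3} {ins.write:>3}")
```

The four printed columns are the cycles for issue, read operands, execution complete and write result. Under the stated assumptions they match the captions above, for example ADD.D completing execution in cycle 16 but writing only in cycle 22, after DIV.D has read F6.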

Scoreboarding (IV)
• Performance of scoreboarding depends on
  – The amount of parallelism available among instructions
  – Number of scoreboard entries
  – Number and type of functional units
  – Presence of antidependences and output dependences