
COSC 6385 Computer Architecture - Pipelining (II)gabriel/courses/cosc6385_s18/CA_09_Pipelining_2.pdf · 1 COSC 6385 Computer Architecture - Pipelining (II) Edgar Gabriel Spring 2018



COSC 6385 Computer Architecture - Pipelining (II)

Edgar Gabriel

Spring 2018

Performance evaluation of pipelines (I)

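General Speedup Formula:

$$\text{Speedup} = \frac{\text{Time}_{\text{org}}}{\text{Time}_{\text{enh}}} = \frac{IC_{\text{org}} \cdot CPI_{\text{org}} \cdot \text{ClockCycle}_{\text{org}}}{IC_{\text{enh}} \cdot CPI_{\text{enh}} \cdot \text{ClockCycle}_{\text{enh}}}$$

For a fixed application, let's assume that $IC_{\text{org}} = IC_{\text{enh}}$:

$$\text{Speedup} = \frac{CPI_{\text{org}} \cdot \text{ClockCycle}_{\text{org}}}{CPI_{\text{enh}} \cdot \text{ClockCycle}_{\text{enh}}}$$

If we additionally assume that the CPU has the same frequency, i.e. $\text{ClockCycle}_{\text{org}} = \text{ClockCycle}_{\text{enh}}$:

$$\text{Speedup} = \frac{CPI_{\text{org}}}{CPI_{\text{enh}}}$$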
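The general speedup formula can be checked numerically with a few lines of C. This is a minimal sketch; the function names `exec_time` and `speedup` are hypothetical and only mirror the symbols IC, CPI and ClockCycle used above.

```c
#include <assert.h>

/* CPU time = IC * CPI * ClockCycle (ClockCycle in any fixed unit, e.g. ns). */
double exec_time(double ic, double cpi, double clock_cycle)
{
    assert(ic > 0.0 && cpi > 0.0 && clock_cycle > 0.0);
    return ic * cpi * clock_cycle;
}

/* General speedup formula: Speedup = Time_org / Time_enh. */
double speedup(double ic_org, double cpi_org, double cc_org,
               double ic_enh, double cpi_enh, double cc_enh)
{
    return exec_time(ic_org, cpi_org, cc_org) /
           exec_time(ic_enh, cpi_enh, cc_enh);
}
```

With identical IC and clock cycle, e.g. `speedup(1e6, 4.0, 1.0, 1e6, 2.0, 1.0)` (made-up numbers), the result reduces to CPI_org / CPI_enh = 2, matching the last formula.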


Performance evaluation of pipelines (II)

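If looking at individual classes of instructions:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Time}_{\text{org}}}{\text{Time}_{\text{enh}}} = \frac{\text{ClockCycle}_{\text{org}} \cdot \sum_{i=1}^{n} IC_i \cdot CPI_{i,\text{org}}}{\text{ClockCycle}_{\text{enh}} \cdot \sum_{i=1}^{n} IC_i \cdot CPI_{i,\text{enh}}}$$

with

$$f_i = \frac{IC_i}{IC_{\text{total}}}$$

Assuming $IC_{\text{total}}$ is identical in both architectures:

$$\text{Speedup}_{\text{overall}} = \frac{\text{ClockCycle}_{\text{org}} \cdot \sum_{i=1}^{n} f_i \cdot CPI_{i,\text{org}}}{\text{ClockCycle}_{\text{enh}} \cdot \sum_{i=1}^{n} f_i \cdot CPI_{i,\text{enh}}}$$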
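The per-class formula amounts to a weighted CPI, $\sum_i f_i \cdot CPI_i$, on each machine. A minimal sketch, assuming both machines run the same instruction mix $f_i$; `weighted_cpi` and `speedup_overall` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted CPI: sum over the n instruction classes of f_i * CPI_i,
   where f_i = IC_i / IC_total (so the f_i must add up to 1). */
double weighted_cpi(const double *f, const double *cpi, size_t n)
{
    double sum_f = 0.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum_f += f[i];
        sum   += f[i] * cpi[i];
    }
    assert(sum_f > 0.999 && sum_f < 1.001);  /* tolerate rounding only */
    return sum;
}

/* Speedup_overall for identical IC_total (and identical mix f_i) on both machines. */
double speedup_overall(const double *f, size_t n,
                       const double *cpi_org, double clock_cycle_org,
                       const double *cpi_enh, double clock_cycle_enh)
{
    return (clock_cycle_org * weighted_cpi(f, cpi_org, n)) /
           (clock_cycle_enh * weighted_cpi(f, cpi_enh, n));
}
```

For a hypothetical two-class mix `{0.5, 0.5}` with CPIs `{2, 4}` on the original machine and `{1, 1}` on the enhanced one (same clock), the weighted CPIs are 3 and 1, giving an overall speedup of 3.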

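Comparing pipelined and non-pipelined execution

• An ideal pipeline produces one result per clock cycle

→ Ideal $CPI_{\text{pipelined}} = 1$

• In the ideal case (perfectly balanced stages, no pipelining overhead), the time per instruction is divided by the number of pipeline stages:

$$\text{Time}_{\text{pipelined}} = \frac{\text{Time}_{\text{non-pipelined}}}{\text{number of pipeline stages}} \quad\Rightarrow\quad \text{Speedup} = \frac{\text{Time}_{\text{non-pipelined}}}{\text{Time}_{\text{pipelined}}} = \text{number of pipeline stages}$$

• Using the average instruction execution time (AvIETime):

$$\text{Speedup} = \frac{\text{AvIETime}_{\text{non-pipelined}}}{\text{AvIETime}_{\text{pipelined}}} = \frac{CPI_{\text{non-pipelined}} \cdot \text{ClockCycle}_{\text{non-pipelined}}}{CPI_{\text{pipelined}} \cdot \text{ClockCycle}_{\text{pipelined}}}$$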


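Comparing pipelined and non-pipelined execution (II)

$$\text{Speedup} = \frac{\text{AvIETime}_{\text{non-pipelined}}}{\text{AvIETime}_{\text{pipelined}}}$$

Realistic $CPI_{\text{pipelined}}$ = Ideal $CPI_{\text{pipelined}}$ + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction

Thus:

$$\text{Speedup} = \frac{CPI_{\text{non-pipelined}}}{1 + \text{PipelineStallCyclesPerInstr}} \cdot \frac{\text{ClockCycle}_{\text{non-pipelined}}}{\text{ClockCycle}_{\text{pipelined}}}$$

If the clock cycle is the same for both ($\text{ClockCycle}_{\text{non-pipelined}} = \text{ClockCycle}_{\text{pipelined}}$):

$$\text{Speedup} = \frac{CPI_{\text{non-pipelined}}}{1 + \text{PipelineStallCyclesPerInstr}}$$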
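The effect of stalls on the pipeline speedup can be sketched as follows. This is a minimal sketch of the formula above; the function names are hypothetical.

```c
#include <assert.h>

/* Realistic CPI_pipelined = ideal CPI_pipelined (1) + stall cycles per instruction. */
double realistic_cpi_pipelined(double stall_cycles_per_instr)
{
    assert(stall_cycles_per_instr >= 0.0);
    return 1.0 + stall_cycles_per_instr;
}

/* Speedup = CPI_non-pipelined / (1 + stalls) * ClockCycle_non-pipelined / ClockCycle_pipelined */
double pipeline_speedup(double cpi_non_pipelined, double stall_cycles_per_instr,
                        double clock_cycle_non_pipelined, double clock_cycle_pipelined)
{
    assert(clock_cycle_pipelined > 0.0);
    return cpi_non_pipelined / realistic_cpi_pipelined(stall_cycles_per_instr)
           * (clock_cycle_non_pipelined / clock_cycle_pipelined);
}
```

As a hypothetical example, if every instruction of a non-pipelined machine takes 5 cycles (CPI_non-pipelined = 5) and both machines use the same clock, the pipelined version reaches a speedup of 5 without stalls but only 5 / 1.25 = 4 with 0.25 stall cycles per instruction.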

Example I

• (A) Given a non-pipelined processor:

– 1 ns clock cycle time

– 4 cycles for ALU operations

– 4 cycles for branches

– 5 cycles for memory operations

• (B) Given also a pipelined processor

– 1.2 ns clock cycle time

• Both (A) and (B) have

– 40% ALU operations

– 40% branches

– 20% memory operations

• What is the speedup of (B) over (A) due to pipelining?


Example I

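For machine (A):

$$\text{AvIETime}_A = \text{ClockCycle}_A \cdot \sum_{i=1}^{n} f_i \cdot CPI_i = 1\,\text{ns} \cdot (0.4 \cdot 4 + 0.4 \cdot 4 + 0.2 \cdot 5) = 4.2\,\text{ns}$$

For machine (B), assuming ideal CPI (= 1):

$$\text{AvIETime}_B = \text{ClockCycle}_B \cdot \sum_{i=1}^{n} f_i \cdot CPI_i = 1.2\,\text{ns} \cdot (0.4 \cdot 1 + 0.4 \cdot 1 + 0.2 \cdot 1) = 1.2\,\text{ns}$$

Thus:

$$\text{Speedup} = \frac{\text{AvIETime}_A}{\text{AvIETime}_B} = \frac{4.2\,\text{ns}}{1.2\,\text{ns}} = 3.5$$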
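The arithmetic of Example I can be reproduced with the AvIETime formula. A minimal sketch, assuming the instruction mix and cycle counts exactly as stated in the example (40% ALU operations at 4 cycles, 40% branches at 4 cycles, 20% memory operations at 5 cycles); `av_ie_time` and `example1_speedup` are hypothetical helper names. With this mix the weighted CPI of (A) is 0.4·4 + 0.4·4 + 0.2·5 = 4.2 cycles.

```c
#include <assert.h>

enum { ALU, BRANCH, MEMORY, N_CLASSES };

/* Instruction mix and per-class CPIs as given in Example I. */
static const double mix[N_CLASSES]   = { 0.4, 0.4, 0.2 };
static const double cpi_A[N_CLASSES] = { 4.0, 4.0, 5.0 };  /* (A) non-pipelined         */
static const double cpi_B[N_CLASSES] = { 1.0, 1.0, 1.0 };  /* (B) pipelined, ideal CPI  */

/* AvIETime = ClockCycle * sum_i f_i * CPI_i */
double av_ie_time(double clock_cycle_ns, const double *f, const double *cpi)
{
    assert(clock_cycle_ns > 0.0);
    double cycles = 0.0;
    for (int i = 0; i < N_CLASSES; i++)
        cycles += f[i] * cpi[i];
    return clock_cycle_ns * cycles;
}

double example1_speedup(void)
{
    double t_a = av_ie_time(1.0, mix, cpi_A);   /* 1 ns clock cycle   */
    double t_b = av_ie_time(1.2, mix, cpi_B);   /* 1.2 ns clock cycle */
    return t_a / t_b;
}
```

`av_ie_time(1.0, mix, cpi_A)` evaluates to about 4.2 ns, `av_ie_time(1.2, mix, cpi_B)` to about 1.2 ns, and `example1_speedup()` to about 3.5.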

Exceptions

• The normal order of instruction execution is interrupted

• Examples:

– I/O device request

– Invoking an OS service from an application

– Tracing execution

– Breakpoint

– Integer or FP arithmetic anomaly (e.g. overflow)

– Page fault

– Misaligned memory access

– Memory protection violation

– Hardware malfunction


Classification of Exceptions

• Problems with pipelining:

– Different stages of the pipeline can raise exceptions, so exceptions may occur in a different order than in the unpipelined case

• Classes of exceptions

1. Synchronous vs. asynchronous

2. User requested vs. coerced

3. User maskable vs. user non-maskable

4. Within vs. between instructions

5. Resume vs. terminate

Exceptions

• Most problematic: exceptions raised within an instruction, where the instruction must be resumed

– Another program must be invoked to save the state of the program

• Pipelines capable of handling such exceptions are called restartable

Pipeline stage   Possible exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory protection violation
WB               None


Exceptions

• An exception cannot be handled at the moment it is raised, since an earlier instruction may still raise its own exception in a later pipeline stage. Therefore:

– A status vector associated with each instruction records its exceptions

– The status vector is carried along with the instruction through the pipeline

– Writing of data values is disabled once the status vector is set

– In WB, the status vector is checked and the exception is handled

=> The exception of instruction i is handled before the exception of instruction i+1

=> Since no data values are written back, the register file is not changed, so the instruction can be repeated
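The status-vector mechanism can be illustrated with a small cycle-by-cycle model of an ideal five-stage pipeline (no stalls, instruction k in stage s during cycle k + s). This is a sketch under those assumptions; `struct instr`, `faulting_stage` and `run_pipeline` are hypothetical names. It shows that even when a younger instruction raises its exception first in time, the exception handled first is that of the older instruction, because the status vector is only examined in WB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { IF, ID, EX, MEM, WB, N_STAGES };

/* Hypothetical instruction record used only for this illustration. */
struct instr {
    int  faulting_stage;    /* stage in which it raises an exception, -1 = none */
    bool status[N_STAGES];  /* status vector carried along with the instruction */
    bool wrote_back;        /* true once it has updated the register file       */
};

/* Ideal 5-stage pipeline: instruction k occupies stage s during cycle k + s.
   A raised exception only sets a bit in the status vector; the vector is
   checked in WB, where a set bit suppresses the write-back and the exception
   is handled. *first_raised receives the instruction whose exception occurred
   first in time. Returns the instruction whose exception is handled first,
   or -1 if no instruction faults. */
int run_pipeline(struct instr *prog, int n, int *first_raised)
{
    assert(prog != NULL && first_raised != NULL);
    *first_raised = -1;
    for (int cycle = 0; cycle < n + N_STAGES - 1; cycle++) {
        for (int k = 0; k < n; k++) {          /* oldest instruction first */
            int s = cycle - k;
            if (s < IF || s > WB)
                continue;                      /* not in the pipeline this cycle */
            if (prog[k].faulting_stage == s) {
                prog[k].status[s] = true;      /* record only, do not handle yet */
                if (*first_raised < 0)
                    *first_raised = k;
            }
            if (s == WB) {
                bool faulted = false;
                for (int t = IF; t < N_STAGES; t++)
                    faulted = faulted || prog[k].status[t];
                if (faulted)
                    return k;  /* handle; instruction k and younger ones are flushed */
                prog[k].wrote_back = true;
            }
        }
    }
    return -1;
}
```

For example, if instruction 0 faults in MEM (cycle 3) and instruction 1 faults in IF (cycle 1), the IF fault occurs first in time, yet `run_pipeline` returns 0 and instruction 0 never writes back, so it can be repeated after the handler runs.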

Multi-cycle instructions

• Not all instructions take the same number of cycles to finish!

– Floating-point instructions can take many cycles to complete

• Latency:

– Number of intervening cycles between an instruction that produces a result and an instruction that uses the result

– Usually: depth of the EX stage minus 1

• Initiation interval:

– Number of cycles that must elapse between issuing two operations of a given type

• Multi-cycle instructions/pipelines increase the likelihood of RAW and WAW hazards


Example for a multi-cycle pipeline

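Figure: pipeline with several execution units. After IF and ID, an instruction enters one of the following units and then continues to MEM and WB:

– EX: integer unit

– M1 M2 M3 M4 M5 M6 M7: FP/Integer multiply unit

– A1 A2 A3 A4: FP/Integer add unit

– DIV: FP/Integer division (not pipelined)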

Functional unit    Latency    Initiation interval
Integer ALU        0          1
Data memory        1          1
FP add             3          1
FP multiply        6          1
FP divide          24         25
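With the latencies and initiation intervals from the table, the earliest issue cycle of an operation can be estimated. This is a simplified sketch that assumes nothing but these two parameters limits issue (no structural hazards at MEM/WB, no other stalls); `struct fu_timing` and the two functions are hypothetical names.

```c
#include <assert.h>

/* Latency and initiation interval per functional unit, from the table above. */
struct fu_timing {
    const char *name;
    int latency;              /* intervening cycles before a dependent op can use the result */
    int initiation_interval;  /* cycles between issuing two ops of this type */
};

static const struct fu_timing units[] = {
    { "Integer ALU",  0,  1 },
    { "Data memory",  1,  1 },
    { "FP add",       3,  1 },
    { "FP multiply",  6,  1 },
    { "FP divide",   24, 25 },
};

/* Earliest issue cycle of the k-th (0-based) of a stream of independent
   operations on the same unit, the first one issuing in cycle 0. */
int issue_cycle_independent(const struct fu_timing *u, int k)
{
    assert(k >= 0);
    return k * u->initiation_interval;
}

/* Earliest issue cycle of an operation that uses the result of an operation
   issued in cycle producer_issue: `latency` cycles must intervene. */
int issue_cycle_dependent(const struct fu_timing *u, int producer_issue)
{
    assert(producer_issue >= 0);
    return producer_issue + u->latency + 1;
}
```

For example, three independent FP divides issue in cycles 0, 25 and 50, while an instruction that uses the result of an FP multiply issued in cycle 0 can issue in cycle 7 (6 intervening cycles). For the integer ALU (latency 0), a dependent instruction can issue in the very next cycle.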

Instruction level parallelism

• Exploit parallelism between independent instructions

– Limited by data dependencies

– Limited by branches

• Example:

– Each iteration of the loop is independent

– Exploiting that fact is not trivial because of register reuse!

for (i=0; i<n; i++ ) {

c[i] = a[i] + b[i];

}
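One common way to expose the independence of the loop iterations is loop unrolling with separate temporaries, so that consecutive iterations do not reuse the same names. A minimal sketch in C, assuming `a`, `b` and `c` do not overlap; the unroll factor of 4 and the function names are arbitrary choices for illustration, and whether the temporaries end up in distinct registers is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: the iterations are independent of each other. */
void vec_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Unrolled by 4: the four iterations use separate temporaries instead of
   reusing the same ones, so their loads and adds carry no name dependences
   on each other and can be overlapped. Assumes c does not overlap a or b. */
void vec_add_unrolled4(double *c, const double *a, const double *b, size_t n)
{
    assert(n == 0 || (c != NULL && a != NULL && b != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t0 = a[i]     + b[i];
        double t1 = a[i + 1] + b[i + 1];
        double t2 = a[i + 2] + b[i + 2];
        double t3 = a[i + 3] + b[i + 3];
        c[i] = t0;  c[i + 1] = t1;  c[i + 2] = t2;  c[i + 3] = t3;
    }
    for (; i < n; i++)          /* remaining 0-3 iterations */
        c[i] = a[i] + b[i];
}
```

Both functions produce the same `c`; the unrolled form only changes how much independent work is visible at once.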


Instruction level parallelism

• Data dependencies:

– True dependencies: instruction i produces a result required by instruction i+k, k>0 (RAW)

• sharing a register or a memory location

– Name dependencies: usage of the same register or memory location without data flow

• Antidependence: instruction i+k writes a register/memory location read by instruction i (WAR)

– No problem as long as instructions are not reordered

• Output dependence: instruction i and instruction i+k write the same register/memory location (WAW)

– No problem as long as instructions are not reordered

– Control dependencies: determine the ordering of an instruction i with respect to a branch
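For register operands, the three kinds of data dependence can be checked mechanically by comparing destination and source registers. The sketch below is hypothetical (the `struct insn` layout and function names are not from the slides); it looks only at a pair of instructions, ignores memory locations, and does not account for intervening writes or control dependences.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical three-operand register instruction: dst <- src1 op src2. */
struct insn { const char *op, *dst, *src1, *src2; };

static bool reads_reg(const struct insn *x, const char *reg)
{
    assert(x != NULL && reg != NULL);
    return strcmp(x->src1, reg) == 0 || strcmp(x->src2, reg) == 0;
}

/* `earlier` is instruction i, `later` is instruction i+k (k > 0). */
bool true_dependence(const struct insn *earlier, const struct insn *later)   /* RAW */
{
    return reads_reg(later, earlier->dst);
}

bool antidependence(const struct insn *earlier, const struct insn *later)    /* WAR */
{
    return reads_reg(earlier, later->dst);
}

bool output_dependence(const struct insn *earlier, const struct insn *later) /* WAW */
{
    return strcmp(earlier->dst, later->dst) == 0;
}
```

Applied to the sequence DIV.D F0, F2, F4 / ADD.D F10, F0, F8 / SUB.D F8, F8, F14, this reports a true dependence from DIV.D to ADD.D (F0) and an antidependence from ADD.D to SUB.D (F8), but no dependence between DIV.D and SUB.D.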

Dynamic scheduling

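• Up to now:

– Instructions are issued in program order

– If an instruction is stalled in the pipeline, no later instruction can proceed

DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F8, F8, F14

– SUB.D does not depend on DIV.D or ADD.D, but it cannot proceed while ADD.D is stalled waiting for F0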

• In order to allow out-of-order execution, the ID stage is split into two parts:

– Instruction issue: decode instruction and check for structural hazards

– Read operands: Read operands if no data hazard


Dynamic scheduling

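• Out-of-order execution introduces the possibility of WAR and WAW hazards

Program order:
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F8, F8, F14

Possible out-of-order execution:
DIV.D F0, F2, F4
SUB.D F8, F8, F14
ADD.D F10, F0, F8

– If SUB.D writes F8 before the stalled ADD.D has read it, ADD.D reads the wrong value (WAR hazard on F8)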

• Out-of-order execution only improves performance if

– Multiple instructions can be executed at once

– Multiple functional units are available

• All instructions pass through the issue stage in order

• Stalled instructions can be bypassed by later instructions in the read-operands stage

• Algorithms allowing instructions to execute out of order:

– Scoreboarding

– Tomasulo’s approach

Scoreboarding

• First implemented in the CDC 6600

• Assumptions for the following slides:

– 2 multipliers

– 1 adder

– 1 divider

– 1 integer unit

• Each instruction goes through the scoreboard

– The scoreboard determines when an instruction can execute

– The scoreboard monitors the usage of the execution units

– The scoreboard monitors when a result can be written to the destination register


Scoreboarding (II)

4 steps of Scoreboarding (replaces ID, EX and WB)

1. Issue: if the functional unit is free and no other active instruction has the same destination register (avoids WAW hazards)

2. Read operands: the scoreboard monitors the availability of the source operands; an instruction reads its operands once they are available (resolves RAW hazards)

3. Execution

4. Write result: when execution is done, the scoreboard checks for WAR hazards and stalls the instruction if necessary

Scoreboarding (III)

Scoreboard data structures:

• Instruction status: which of the four steps the instruction is in

• Functional unit status: status of a functional unit.

– Busy: indicates whether unit is busy or not

– Op: operation to be performed

– Fi: Destination register number

– Fj, Fk: Source register numbers

– Qj, Qk: Functional units producing source registers Fj, Fk

– Rj, Rk: Flags indicating whether Fj, Fk are ready and not yet read; set to No after the operands are read

• Register result status: which functional unit will write which register
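The scoreboard data structures and the checks of steps 1, 2 and 4 can be sketched in C as follows. The field names follow the slide and the unit set (Integer, Mult1, Mult2, Add, Divide) follows the assumptions of the example; everything else (registers numbered as plain indices, the `op` encoding, the function names) is hypothetical. Execution (step 3) only takes time, so it is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_REGS = 32, NO_FU = -1 };
enum { INTEGER, MULT1, MULT2, ADD, DIVIDE, N_FUS };

/* Functional unit status, one entry per unit (field names as on the slide). */
struct fu_status {
    bool busy;
    int  op;        /* operation to perform (encoding left abstract) */
    int  fi;        /* destination register                          */
    int  fj, fk;    /* source registers                              */
    int  qj, qk;    /* units producing fj, fk, or NO_FU              */
    bool rj, rk;    /* fj, fk ready and not yet read                 */
};

struct scoreboard {
    struct fu_status fu[N_FUS];
    int result[N_REGS];  /* register result status: unit that will write reg, or NO_FU */
};

void scoreboard_init(struct scoreboard *sb)
{
    for (int u = 0; u < N_FUS; u++)
        sb->fu[u] = (struct fu_status){ .busy = false, .qj = NO_FU, .qk = NO_FU };
    for (int r = 0; r < N_REGS; r++)
        sb->result[r] = NO_FU;
}

/* 1. Issue: unit free and no active instruction with the same destination (WAW). */
bool can_issue(const struct scoreboard *sb, int unit, int dst)
{
    return !sb->fu[unit].busy && sb->result[dst] == NO_FU;
}

void issue(struct scoreboard *sb, int unit, int op, int dst, int src1, int src2)
{
    assert(can_issue(sb, unit, dst));
    struct fu_status *f = &sb->fu[unit];
    f->busy = true;  f->op = op;  f->fi = dst;  f->fj = src1;  f->fk = src2;
    f->qj = sb->result[src1];  f->qk = sb->result[src2];
    f->rj = (f->qj == NO_FU);  f->rk = (f->qk == NO_FU);
    sb->result[dst] = unit;
}

/* 2. Read operands: wait until both sources are ready (RAW). */
bool can_read_operands(const struct scoreboard *sb, int unit)
{
    return sb->fu[unit].busy && sb->fu[unit].rj && sb->fu[unit].rk;
}

void read_operands(struct scoreboard *sb, int unit)
{
    assert(can_read_operands(sb, unit));
    sb->fu[unit].rj = sb->fu[unit].rk = false;   /* set to No after operands are read */
}

/* 4. Write result: stall while another unit still has fi as a ready,
      unread source operand (WAR). */
bool can_write_result(const struct scoreboard *sb, int unit)
{
    int dst = sb->fu[unit].fi;
    for (int u = 0; u < N_FUS; u++) {
        const struct fu_status *f = &sb->fu[u];
        if (u == unit || !f->busy)
            continue;
        if ((f->fj == dst && f->rj) || (f->fk == dst && f->rk))
            return false;
    }
    return true;
}

void write_result(struct scoreboard *sb, int unit)
{
    assert(can_write_result(sb, unit));
    for (int u = 0; u < N_FUS; u++) {        /* wake up units waiting for this result */
        struct fu_status *f = &sb->fu[u];
        if (!f->busy) continue;
        if (f->qj == unit) { f->qj = NO_FU; f->rj = true; }
        if (f->qk == unit) { f->qk = NO_FU; f->rk = true; }
    }
    sb->result[sb->fu[unit].fi] = NO_FU;
    sb->fu[unit].busy = false;
}
```

Issuing MUL.D F0, F2, F4, then DIV.D F10, F0, F6 and ADD.D F6, F8, F2 recreates the situation of the example below: `can_read_operands` is false for Divide until Mult1 has written F0, and `can_write_result` is false for Add while Divide still holds F6 as a ready, unread operand (WAR).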


Scoreboarding example

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

The following slides are based on a lecture by Jelena Mirkovic, University of Delaware

http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class10.pdf

Assumptions:

ADD and SUB take 2 clock cycles

MULT takes 10 clock cycles

DIV takes 40 clock cycles

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F6 R2 Yes

Mult1

Mult2

Add

Divide

Register result status

FU: F6 = Integer (no pending writes to the other registers)

Time=1 Issue first load


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F6 R2 No

Mult1

Mult2

Add

Divide

Register result status

FU: F6 = Integer (no pending writes to the other registers)

Time=2 First load reads operands; second load cannot issue (structural hazard)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F6 R2 No

Mult1

Mult2

Add

Divide

Register result status

FU: F6 = Integer (no pending writes to the other registers)

Time=3 First load completes execution; second load cannot issue (structural hazard)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add

Divide

Register result status

FU: none (no pending register writes)

Time=4 First load writes result; second load cannot issue (structural hazard)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F2 R3 Yes

Mult1

Mult2

Add

Divide

Register result status

FU: F2 = Integer (no pending writes to the other registers)

Time=5 Second load is issued


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F2 R3 No

Mult1 Yes Mult F0 F2 F4 Integer No Yes

Mult2

Add

Divide

Register result status

FU: F0 = Mult1, F2 = Integer (no pending writes to the other registers)

Time=6 Second load reads operands; Mult is issued

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer Yes Load F2 R3 No

Mult1 Yes Mult F0 F2 F4 Integer No Yes

Mult2

Add Yes Sub F8 F6 F2 Integer Yes No

Divide

Register result status

FU: F0 = Mult1, F2 = Integer, F8 = Add (no pending writes to the other registers)

Time=7 Second load completes exec; Mult is stalled waiting for F2; Sub is issued


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 Yes Yes

Mult2

Add Yes Sub F8 F6 F2 Yes Yes

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F8 = Add, F10 = Divide (no pending writes to the other registers)

Time=8 Second load writes result; Mult and Sub stalled (F2); Div is issued

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Sub F8 F6 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F8 = Add, F10 = Divide (no pending writes to the other registers)

Time=9 Mult and Sub read operands; Div stalled waiting for F0; Add cannot issue (structural hazard: the adder is busy with Sub)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Sub F8 F6 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F8 = Add, F10 = Divide (no pending writes to the other registers)

Time=10 Mult executing (1 of 10 cycles); Sub executing (1 of 2 cycles); Div stalled (F0)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Sub F8 F6 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F8 = Add, F10 = Divide (no pending writes to the other registers)

Time=11 Mult executing (2/10); Sub completes execution; Div stalled (F0)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F10 = Divide (no pending writes to the other registers)

Time=12 Mult executing (3/10); Sub writes result; Div stalled (F0)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 Yes Yes

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=13 Mult executing (4/10); Div stalled (F0); Add issued


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=14 Mult executing (5/10); Div stalled (F0); Add reads operands

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=15 Mult executing (6/10); Div stalled (F0); Add executes (1 of 2 cycles)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=16 Mult executing (7/10 cycles); Div stalled (F0); Add completes exec

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=17 Mult executing (8/10); Div stalled (F0); Add stalled (WAR hazard on F6)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1 Yes Mult F0 F2 F4 No No

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status

FU: F0 = Mult1, F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=19 Mult completes exec; Div stalled (F0); Add stalled (WAR hazard on F6)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 Yes Yes

Register result status

FU: F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=20 Mult writes result; Div stalled (F0); Add stalled (WAR hazard on F6)


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add Yes Add F6 F8 F2 No No

Divide Yes Div F10 F0 F6 No No

Register result status

FU: F6 = Add, F10 = Divide (no pending writes to the other registers)

Time=21 Div reads operands; Add stalled (WAR hazard on F6)

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add

Divide Yes Div F10 F0 F6 No No

Register result status

FU: F10 = Divide (no pending writes to the other registers)

Time=22 Div executes (1/40); Add writes result


Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add

Divide Yes Div F10 F0 F6 No No

Register result status

FU: F10 = Divide (no pending writes to the other registers)

Time=61 Div completes execution

Instruction status

Instruction Issue Read operands Execution complete Write result

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional unit status

Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Integer

Mult1

Mult2

Add

Divide

Register result status

FU: none (no pending register writes)

Time=62 Div writes result
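The cycle numbers of this walkthrough can be recomputed from the scoreboard rules. A minimal sketch, assuming: at most one issue per clock cycle, in program order; a unit released in its write-result cycle can accept a new instruction in the next cycle; an operand written in cycle t can be read in cycle t+1; and L.D takes one execution cycle (as in the walkthrough, where the first load reads its operands at Time=2 and completes at Time=3). Because issue is in order, every condition only depends on earlier instructions, so the schedule can be computed in program order instead of simulating cycle by cycle. All names are hypothetical; this is not a model of the CDC 6600 hardware.

```c
#include <assert.h>
#include <stdio.h>

enum unit_type { INT_U, MUL_U, ADD_U, DIV_U, N_TYPES };
static const int n_units[N_TYPES] = { 1, 2, 1, 1 };  /* integer, multipliers, adder, divider */

/* One instruction: registers are plain indices (F0 = 0, F2 = 2, ...);
   -1 marks a source that no instruction in the sequence writes (R2, R3). */
struct ins {
    const char *text;
    enum unit_type type;
    int dst, src1, src2;
    int exec_cycles;
    int issue, read, complete, write;   /* computed clock cycles */
};

static int max_int(int a, int b) { return a > b ? a : b; }

void schedule(struct ins *p, int n)
{
    int unit_free[N_TYPES][2] = { { 0 } };  /* first cycle a unit can accept an issue */
    for (int k = 0; k < n; k++) {
        enum unit_type ty = p[k].type;
        /* 1. Issue: in order, at most one per cycle, free unit, no WAW. */
        int u = 0;
        for (int j = 1; j < n_units[ty]; j++)
            if (unit_free[ty][j] < unit_free[ty][u]) u = j;
        int t = max_int(k == 0 ? 1 : p[k - 1].issue + 1, unit_free[ty][u]);
        for (int j = 0; j < k; j++)
            if (p[j].dst == p[k].dst) t = max_int(t, p[j].write + 1);
        p[k].issue = t;
        /* 2. Read operands: after issue, once earlier producers of the sources have written (RAW). */
        t = p[k].issue + 1;
        for (int j = 0; j < k; j++)
            if (p[j].dst == p[k].src1 || p[j].dst == p[k].src2)
                t = max_int(t, p[j].write + 1);
        p[k].read = t;
        /* 3. Execution starts in the next cycle and takes exec_cycles cycles. */
        p[k].complete = p[k].read + p[k].exec_cycles;
        /* 4. Write result: after completion, once earlier readers of dst have read it (WAR). */
        t = p[k].complete + 1;
        for (int j = 0; j < k; j++)
            if (p[j].src1 == p[k].dst || p[j].src2 == p[k].dst)
                t = max_int(t, p[j].read + 1);
        p[k].write = t;
        unit_free[ty][u] = t + 1;          /* unit released in its write cycle */
    }
}

struct ins example[] = {
    { "L.D   F6, 34(R2)",  INT_U,  6, -1, -1,  1 },
    { "L.D   F2, 45(R3)",  INT_U,  2, -1, -1,  1 },
    { "MUL.D F0, F2, F4",  MUL_U,  0,  2,  4, 10 },
    { "SUB.D F8, F6, F2",  ADD_U,  8,  6,  2,  2 },
    { "DIV.D F10, F0, F6", DIV_U, 10,  0,  6, 40 },
    { "ADD.D F6, F8, F2",  ADD_U,  6,  8,  2,  2 },
};

void print_schedule(const struct ins *p, int n)
{
    assert(n >= 0);
    printf("%-18s %6s %5s %9s %6s\n", "Instruction", "Issue", "Read", "Complete", "Write");
    for (int k = 0; k < n; k++)
        printf("%-18s %6d %5d %9d %6d\n", p[k].text, p[k].issue,
               p[k].read, p[k].complete, p[k].write);
}
```

Calling `schedule(example, 6)` and then `print_schedule(example, 6)` from a `main` function yields issue/read/complete/write cycles of 1/2/3/4 and 5/6/7/8 for the two loads, 6/9/19/20 for MUL.D, 7/9/11/12 for SUB.D, 8/21/61/62 for DIV.D and 13/14/16/22 for ADD.D, i.e. the same cycles as in the Time=1 to Time=62 snapshots above.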


Scoreboarding (IV)

• Performance of scoreboarding depends on

– The amount of parallelism available among instructions

– Number of scoreboard entries

– Number and type of functional units

– Presence of antidependences and output dependences