CS104 (Hilton): Datapaths [Slides adapted from A. Roth’s] 1 CS104 Computer Organization and Design Datapaths


CS104 Computer Organization and Design

Datapaths


Admin

• Homework
  • Homework 4 out tonight
  • Due Monday, March 26th
  • Download/check your submissions

• Reading:
  • Chapter 4
  • (Maybe review 1.4)

• Midterm 2
  • March 28


What did we do last week?

• Who can remind us what we did last week?
  • Ski
  • Go to the beach
  • Sleep in
  • Read a book
  • …

• Ok, but seriously?

When last I saw you all..

• Last time I was here (Feb 27/29)
  • Learned basics of logic design
    • Gates (And, Or, Nor, …)
  • Put gates together to make
    • Muxes
    • Adders
    • Latches
    • Flip-flops
    • …

While I was at HPCA..

• Prof. Lebeck started teaching you all about datapaths
  • Putting logic together to execute instructions
  • Started on single-cycle datapath

• We’ll review/continue with single cycle
• Then jump into more things!

Datapath for MIPS ISA

• Consider only the following instructions
    add $1,$2,$3
    addi $1,$3,2
    lw $1,4($3)
    sw $1,4($3)
    beq $1,$2,PC_relative_target
    j absolute_target

• Why only these?
  • Most other instructions are the same from the datapath viewpoint
  • The ones that aren’t are left for you to figure out

Start With Fetch

• PC and instruction memory
• A +4 incrementer computes default next instruction PC

[Figure: PC feeding Insn Mem, with a +4 incrementer looping back to the PC]

First Instruction: add

•  Add register file and ALU

[Figure: fetch logic plus Register File (ports s1, s2, d) and ALU]

Second Instruction: addi

• Destination register can now be either Rd or Rt
• Add sign extension unit and mux into second ALU input

[Figure: previous datapath plus sign extension unit (SX) and a mux on the second ALU input]

Third Instruction: lw

• Add data memory, address is ALU output
• Add register write data mux to select memory output or ALU output

[Figure: previous datapath plus Data Mem (address port a, data port d) and a register write data mux]

Fourth Instruction: sw

•  Add path from second input register to data memory data input

[Figure: previous datapath plus a path from the second register read port to the Data Mem data input]

Fifth Instruction: beq

• Add left shift unit and adder to compute PC-relative branch target
• Add PC input mux to select PC+4 or branch target

[Figure: previous datapath plus a << 2 shifter, a branch target adder, the ALU zero output (z), and a PC input mux]

Sixth Instruction: j

• Add shifter to compute left shift of 26-bit immediate
• Add additional PC input mux for jump target

[Figure: previous datapath plus a second << 2 shifter and a second PC input mux for the jump target]
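The next-PC selection built up on the beq and j slides can be sketched in software. This is a minimal Python model under stated assumptions, not the slides' hardware: the function names and the `taken` flag are hypothetical, and the jump target takes its upper four bits from PC+4, which is the usual MIPS convention but is not spelled out on the slides.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate (the SX unit)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, imm16):
    """PC-relative target: (PC + 4) + (sign-extended immediate << 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32

def jump_target(pc, imm26):
    """26-bit immediate << 2; upper 4 bits taken from PC + 4 (assumed convention)."""
    return ((pc + 4) & 0xF0000000) | ((imm26 & 0x3FFFFFF) << 2)

def next_pc(pc, br, jp, taken, imm16=0, imm26=0):
    """Model the two PC input muxes: JP picks the jump target,
    BR together with a taken branch picks the branch target,
    otherwise the default PC + 4 is used."""
    if jp:
        return jump_target(pc, imm26)
    if br and taken:
        return branch_target(pc, imm16)
    return (pc + 4) & MASK32
```

For example, `next_pc(0x1000, br=1, jp=0, taken=True, imm16=3)` skips three instructions past PC+4, and a not-taken branch falls through to PC+4.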

“Continuous Read” Datapath Timing

• Works because writes (PC, RegFile, DMem) are independent
• And because no read logically follows any write

[Figure: single-cycle datapath with timing order: Read IMem, Read Registers, Read DMEM, Write DMEM, Write Registers, Write PC]

What Is Control?

• 9 signals control flow of data through this datapath
  • MUX selectors, or register/memory write enable signals
  • A real datapath has 300-500 control signals

[Figure: full datapath annotated with control signals Rwe, ALUinB, DMwe, JP, ALUop, BR, Rwd, Rdst]

Example: Control for add

[Figure: datapath annotated with control values for add]

BR=0, JP=0, ALUinB=0, ALUop=0, DMwe=0, Rwe=1, Rdst=0, Rwd=0

Example: Control for sw

• Difference between sw and add is 5 signals
  • 3 if you don’t count the X (don’t care) signals

[Figure: datapath annotated with control values for sw]

BR=0, JP=0, ALUinB=1, ALUop=0, DMwe=1, Rwe=0, Rdst=X, Rwd=X

Example: Control for beq

• Difference between sw and beq is only 4 signals

[Figure: datapath annotated with control values for beq]

BR=1, JP=0, ALUinB=0, ALUop=1, DMwe=0, Rwe=0, Rdst=X, Rwd=X

You all figure LW

•  How would these control signals be set for LW?

[Figure: datapath with control signals Rwe, ALUinB, DMwe, JP, ALUop, BR, Rwd, Rdst left blank to fill in for lw]

Example: Control for LW

[Figure: datapath annotated with control values for lw]

BR=0, JP=0, ALUinB=1, ALUop=0, DMwe=0, Rwe=1, Rdst=1, Rwd=1

How Is Control Implemented?

[Figure: datapath whose control signals (Rwe, ALUinB, DMwe, JP, ALUop, BR, Rwd, Rdst) are driven by a block labeled “Control?”]

Implementing Control

• Each insn has a unique set of control signals
  • Most are function of opcode
  • Some may be encoded in the instruction itself
    • E.g., the ALUop signal is some portion of the MIPS Func field
    + Simplifies controller implementation
    • Requires careful ISA design

Control Implementation: ROM

• ROM (read only memory): think rows of bits
  • Bits in data words are control signals
  • Lines indexed by opcode
• Example: ROM control for 6-insn MIPS datapath
  • X is “don’t care”

opcode  BR  JP  ALUinB  ALUop  DMwe  Rwe  Rdst  Rwd
add     0   0   0       0      0     1    0     0
addi    0   0   1       0      0     1    1     0
lw      0   0   1       0      0     1    1     1
sw      0   0   1       0      1     0    X     X
beq     1   0   0       1      0     0    X     X
j       0   1   0       0      0     0    X     X
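The ROM on this slide is just a lookup table, so it can be modeled as a dictionary keyed by opcode. The sketch below copies the rows from the slide; the names `CONTROL_ROM`, `control_for`, and `differing_signals` are hypothetical, and `None` stands in for X (don't care).

```python
# Column order follows the slide: BR JP ALUinB ALUop DMwe Rwe Rdst Rwd.
SIGNALS = ("BR", "JP", "ALUinB", "ALUop", "DMwe", "Rwe", "Rdst", "Rwd")

# One ROM row per opcode; None stands for X ("don't care").
CONTROL_ROM = {
    "add":  (0, 0, 0, 0, 0, 1, 0, 0),
    "addi": (0, 0, 1, 0, 0, 1, 1, 0),
    "lw":   (0, 0, 1, 0, 0, 1, 1, 1),
    "sw":   (0, 0, 1, 0, 1, 0, None, None),
    "beq":  (1, 0, 0, 1, 0, 0, None, None),
    "j":    (0, 1, 0, 0, 0, 0, None, None),
}

def control_for(opcode):
    """Read one ROM row and label each bit with its signal name."""
    return dict(zip(SIGNALS, CONTROL_ROM[opcode]))

def differing_signals(op_a, op_b):
    """List the signals whose ROM entries differ between two opcodes."""
    a, b = control_for(op_a), control_for(op_b)
    return [name for name in SIGNALS if a[name] != b[name]]
```

With these rows, `differing_signals("sw", "beq")` returns four signals and `differing_signals("add", "sw")` returns five (two of them don't cares), which lines up with the counts quoted on the control example slides.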

Control Implementation: Random Logic

• Real machines have 100+ insns, 300+ control signals
  • 30,000+ control bits (~4KB)
  – Not huge, but hard to make faster than datapath (important!)

• Alternative: random logic (random = ‘non-repeating’)
  • Exploits the observation: many signals have few 1s or few 0s
  • Example: random logic control for 6-insn MIPS datapath

[Figure: random logic control; decoded opcode lines (add, addi, lw, sw, beq, j) are combined by gates to drive BR, JP, ALUinB, ALUop, DMwe, Rwe, Rdst, Rwd]
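As a rough software analogue of random logic, each control signal can be written as an OR of the decoded opcode lines on which it is 1. The sketch below is an illustration only: the function name is hypothetical, the equations are read off the 6-insn ROM table, and every X (don't care) is arbitrarily resolved to 0.

```python
OPCODES = ("add", "addi", "lw", "sw", "beq", "j")

def random_logic_control(opcode):
    """Compute control signals as small boolean functions of one-hot opcode lines.

    Don't-care outputs are resolved to 0 here (an assumption; real logic
    would pick whichever value minimizes the gates).
    """
    if opcode not in OPCODES:
        raise ValueError(f"unknown opcode: {opcode}")
    # Decoded (one-hot) opcode lines, as in the figure.
    line = {name: opcode == name for name in OPCODES}
    return {
        "BR": int(line["beq"]),
        "JP": int(line["j"]),
        "ALUinB": int(line["addi"] or line["lw"] or line["sw"]),
        "ALUop": int(line["beq"]),
        "DMwe": int(line["sw"]),
        "Rwe": int(line["add"] or line["addi"] or line["lw"]),
        "Rdst": int(line["addi"] or line["lw"]),
        "Rwd": int(line["lw"]),
    }
```

Note how most signals depend on one or two opcode lines only, which is the "few 1s or few 0s" observation from the slide.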

Datapath and Control Timing

[Figure: single-cycle datapath with Control ROM/random logic; timing order: Read IMem, Read Registers (Read Control ROM), Read DMEM, Write DMEM, Write Registers, Write PC]

Single-Cycle Datapath Performance

• Goes against make common case fast (MCCF) principle
  + Low Cycles Per Instruction (CPI): 1
  – Long clock period: to accommodate slowest insn

[Figure: single-cycle datapath with Control ROM/random logic]

Interlude: Performance

• Previous slide alludes to something new: Performance
  • Don’t just want it to work…
  • But want it to go fast!


• Three components to performance:
    Number of instructions  <- Compiler’s Job
    x Cycles per instruction (CPI)
    x Clock Period (1 / Clock frequency)

$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Seconds}}{\text{Program}}$$

• Insns/Program: determined by compiler + ISA
• Generally assume fixed program when doing micro-architecture

Micro-architectural factors

• Micro-architecture:
  • The details of how the ISA is implemented
  • Affects CPI and Clock frequency

• Often will look at fixed program, and consider MIPS
  • Million Instructions Per Second
  • MIPS = IPC * Frequency (in MHz)
  • IPC = Instructions Per Cycle (1 / CPI)
  • Gives “Bigger is better” number

$$\underbrace{\frac{\text{Instructions}}{\text{Cycle}}}_{\text{IPC}} \times \underbrace{\frac{\text{Cycles}}{\text{Second}}}_{\text{Frequency}} = \underbrace{\frac{\text{Instructions}}{\text{Second}}}_{\text{Throughput}}$$
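The MIPS formula is a one-line calculation. The sketch below uses hypothetical function names, and the clock rates in the usage note are illustrative values, not measurements from the slides.

```python
def ipc_from_cpi(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi

def mips_rating(ipc, frequency_mhz):
    """MIPS = IPC * Frequency (in MHz)."""
    return ipc * frequency_mhz
```

For instance, a design with CPI = 1 at 20 MHz rates 20 MIPS, while a design with CPI = 4 at 100 MHz rates 25 MIPS: the higher clock rate more than makes up for the lower IPC.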

“Best” IPC

• For now, best we can do: IPC = 1 (CPI = 1)
  • Do 1 instruction every cycle

• Later:
  • Real processors can do multiple instructions at once!
  • Potentially: IPC > 1!
  • Best possible IPC depends on design

Performance vs ….

• 1990s: Performance at all cost
  • Actually more “clock frequency” at all cost…

• Now: Care about other things
  • Energy (electric bill, battery life)
  • Power (cooling, also affects energy)
  • Area (chip cost)
  • Reliability (tolerance of transient faults: e.g., charged particle strikes)
  • …

• Important metric these days: “Performance / Watt”
  • Throughput divided by power consumption
  • Why?

Performance Modeling and Analysis

• Speaking of performance
  • Making a processor takes time (years) and money (millions)
  • Want to know it will perform well before you finish
    • If it’s wrong, doing it all over is painful…
  • Performance can be simulated in software
    • Estimate what IPC will be
    • Guide design

• This is my other job by the way…


Alternative: Multi-Cycle Datapath

• Multi-cycle datapath: attacks high clock period
  • Cut datapath into multiple stages (5 here), isolate using FFs
  • FSM control “walks” insns thru stages (by staging control signals)
  + Insns can bypass stages and exit early

[Figure: multi-cycle datapath; registers IR, A, B, O, and D isolate the stages, and control signals are labeled by stage (s3, s4, s5)]

Finite State Machine (FSM)

• FSM = States + Transitions
  • Next state: function of current state + inputs
  • Outputs: function of current state + inputs

• Canonical Example: Combination Lock
  • Must enter 3 8 4 to unlock

• P.S. Useful in software too

Finite State Machines: Example

• Combination Lock Example:
  • Need to enter 3 8 4 to unlock

• Initial State (Start): no valid piece of combo seen
  • Input of 3: transition to new state (state 1)
  • Any other input: stay in same state

• State 1:
  • Input = 8? Goto state 2

• State 2:
  • Input = 4? Goto state 3

• State 3: Unlock!

[Figure: state diagram with states Start, 1, 2, 3. Edge labels: Start to 1 on 3; Start to Start on 0-2,4-9; 1 to 2 on 8; 1 to 1 on 3; 1 to Start on 0-2,4-7,9; 2 to 3 on 4; 2 to 1 on 3; 2 to Start on 0-2,5-9]
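Since FSMs are "useful in software too", the combination lock can be sketched as a small Python state machine. The transitions follow the edge labels in the state diagram; the names are hypothetical, and the behavior after unlocking (return to Start on the next input) is an assumption, since the diagram shows no edges leaving state 3.

```python
START, S1, S2, UNLOCKED = 0, 1, 2, 3

def next_state(state, digit):
    """Transition function for the 3 8 4 combination lock."""
    if state == UNLOCKED:
        return START  # assumption: relock and start over on the next input
    if digit == 3:
        return S1     # a 3 always counts as the first digit of the combo
    if state == S1 and digit == 8:
        return S2
    if state == S2 and digit == 4:
        return UNLOCKED
    return START      # any other input discards progress

def unlocks(digits):
    """Return True if the digit sequence ever reaches the unlocked state."""
    state = START
    for digit in digits:
        state = next_state(state, digit)
        if state == UNLOCKED:
            return True
    return False
```

For example, `unlocks([3, 3, 8, 4])` is True because a repeated 3 keeps the machine in state 1, while `unlocks([3, 8, 5, 4])` is False because the 5 sends it back to Start.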

FSM in Hardware

• Flip-flop(s) to hold state(s)
• Combinational logic to determine next state/output
  • (Assumes FF enable on input_valid)

FSM Hardware Example


FSM Implementation: ROM

• Just saw: FSM implemented with sum-of-products
  • Remind us what that is?

• Can also be implemented with a ROM

[Figure: a 2^(N+K)-entry ROM addressed by N + K bits (the K-bit input plus the N-bit state held in a register); each entry supplies the N-bit next state and the M-bit output]

FSM ROM Implementation Example

• Combination Lock (3 8 4) Example
  • 4-bit input
  • 2-bit state
  • 64-entry ROM (indexed with S1 S0 I3 I2 I1 I0)
    • Each entry needs 3 bits (S1 S0 U)
      • 2 for next state
      • 1 for unlock signal

• Example entries in ROM
  • 0x00 = 000
  • 0x03 = 010
  • 0x18 = 100
  • 0x13 = 010
  • 0x3_ = 001
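The ROM contents for the lock can be generated mechanically from the state diagram. The sketch below is one way to do it, with a hypothetical function name; it treats the unused input codes 10 to 15 like any other non-matching digit, which is an assumption the slides do not address.

```python
def build_lock_rom():
    """Build the 64-entry ROM for the 3 8 4 lock.

    Index bits: S1 S0 I3 I2 I1 I0 (2-bit state, 4-bit input).
    Entry bits: S1 S0 U (2-bit next state, 1-bit unlock).
    """
    rom = []
    for index in range(64):
        state, digit = index >> 4, index & 0xF
        if state == 3:
            next_state, unlock = 0, 1   # matches the slide's 0x3_ = 001
        elif digit == 3:
            next_state, unlock = 1, 0
        elif state == 1 and digit == 8:
            next_state, unlock = 2, 0
        elif state == 2 and digit == 4:
            next_state, unlock = 3, 0
        else:
            next_state, unlock = 0, 0   # includes input codes 10-15 (assumed)
        rom.append((next_state << 1) | unlock)
    return rom
```

Indexing the result reproduces the example entries above, e.g. `build_lock_rom()[0x18] == 0b100` and every index of the form 0x3_ holds 0b001.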

Multi-cycle Datapath FSM

• First state: Get a New Instruction
  • Output signals to fetch (e.g., read enable IMEM)
  • Next State: Always Decode

• Second State: Decode
  • Output signals to decode instruction (RdEn RegFile)
  • Go to Next Insn if NOP
  • Otherwise Execute

• Execute State
  • Execute Insn (varies by insn type)
  • Next State: Also depends on insn type
    • Branches: Next Insn
    • ALU op: write register
    • Load: Read Memory
    • Store: Write Memory

• Read DMEM State
  • Control signals enable DMEM Read
  • Next state is writeback

• Writeback state
  • Control signals enable regfile write
  • Next state: Next Insn

• Write DMEM state
  • Control signals enable memory write
  • Next state: Next Insn

[Figure: FSM state diagram with states Next Insn, Decode Insn, Execute Insn, Read DMEM, Write DMEM, and Writeback; edges labeled NOP, Branch, ALU, Load, Store]
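The control FSM described above can be walked in software to count how many cycles each instruction type spends in the datapath. This is a minimal sketch: the state names come from the diagram, while the function names and the instruction-kind strings ("nop", "branch", "alu", "load", "store") are hypothetical labels.

```python
def next_state(state, kind):
    """Next-state function of the multi-cycle control FSM."""
    if state == "Next Insn":
        return "Decode Insn"
    if state == "Decode Insn":
        return "Next Insn" if kind == "nop" else "Execute Insn"
    if state == "Execute Insn":
        return {
            "branch": "Next Insn",
            "alu": "Writeback",
            "load": "Read DMEM",
            "store": "Write DMEM",
        }[kind]
    if state == "Read DMEM":
        return "Writeback"
    if state in ("Writeback", "Write DMEM"):
        return "Next Insn"
    raise ValueError(f"unknown state: {state}")

def cycles_for(kind):
    """Count the states visited, one per cycle, until control returns to Next Insn."""
    state, cycles = "Next Insn", 0
    while True:
        cycles += 1
        state = next_state(state, kind)
        if state == "Next Insn":
            return cycles
```

Walking the FSM this way gives 3 cycles for a branch, 4 for an ALU op or a store, and 5 for a load.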

Multi-Cycle Datapath Example: Add


• Example: Add
  • Cycle 1: Read IMEM
  • Cycle 2: Decode + Read RF
  • Cycle 3: ALU
  • Cycle 4: Writeback + Increment PC

[Figure: multi-cycle datapath with stage registers IR, A, B, O, and D, highlighting the stage active in each cycle]

Multi-Cycle Datapath Performance

• Opposite performance split of single-cycle datapath
  + Short clock period
  – High CPI

[Figure: multi-cycle datapath with stage registers IR, A, B, O, and D]

Multi-cycle Data-path CPI


• CPI depends on instructions
  • Branches / Jumps: 3 cycles
  • ALU: 4 cycles
  • Stores: 4 cycles
  • Loads: 5 cycles

• Overall CPI is weighted average

• Example:
  • 20% loads, 15% stores, 20% branches, 45% ALU

CPI = 0.20 * 5 + 0.15 * 4 + 0.20 * 3 + 0.45 * 4 = 4.0

Multi-cycle Datapath Performance

•  Single-cycle
   •  Clock period = 50ns, CPI = 1
   •  Performance = 50 ns/insn

•  Multi-cycle
   •  Clock period = 10ns
   •  CPI = (0.2*3+0.2*5+0.6*4) = 4
   •  Performance = 40 ns/insn

•  But wait…


Multi-Cycle Datapath Performance

•  Did not just cut up existing logic into 5 pieces
   •  Also added logic (flip flops)

•  So clock period not 1/5 of single cycle, but slightly longer

[Figure: multi-cycle datapath (PC, Insn Mem, Register File, SX, Data Mem, + 4, << 2) with the added registers IR, A, B, O, D]

Multi-cycle Datapath Performance

•  Single-cycle
   •  Clock period = 50ns, CPI = 1
   •  Performance = 50 ns/insn

•  Multi-cycle
   •  Clock period = 12ns
   •  CPI = (0.2*3+0.2*5+0.6*4) = 4
   •  Performance = 48 ns/insn

•  Better, but not as exciting…
   •  Can we do better still?
   •  Have our cake (low CPI) and eat it too (high clock frequency)?


Clock Period and CPI

•  Single-cycle datapath
   +  Low CPI: 1
   –  Long clock period: to accommodate slowest insn

•  Multi-cycle datapath
   +  Short clock period
   –  High CPI

•  Can we have both low CPI and short clock period?
   –  No good way to make a single insn go faster
   +  Insn latency doesn’t matter anyway … insn throughput matters
   •  Key: exploit inter-insn parallelism

[Figure: timelines for insn0 and insn1. Single-cycle: each insn does fetch, dec, exec in one long cycle. Multi-cycle: insn0.fetch, insn0.dec, insn0.exec, then insn1.fetch, insn1.dec, insn1.exec, one short cycle each]

Pipelining

•  Pipelining: important performance technique
   •  Improves insn throughput rather than insn latency
   •  Exploits parallelism at insn-stage level to do so

•  Begin with multi-cycle design
   •  When insn advances from stage 1 to 2, next insn enters stage 1
   •  Individual insns take same number of stages
   +  But insns enter and leave at a much faster rate

•  Physically breaks “atomic” VN loop ... but must maintain illusion

•  Automotive assembly line analogy

[Figure: timelines for insn0 and insn1. Multi-cycle: insn1.fetch starts only after insn0.exec. Pipelined: insn1.fetch overlaps insn0.dec, and insn1.dec overlaps insn0.exec]

5 Stage Multi-Cycle Datapath

[Figure: 5-stage multi-cycle datapath (PC, Insn Mem, Register File, SX, Data Mem, + 4, << 2) with registers IR, A, B, O, D]

5 Stage Pipelined Datapath

•  Temporary values (PC,IR,A,B,O,D) re-latched every stage
   •  Why? 5 insns may be in pipeline at once, they share a single PC?
   •  Notice, PC not latched after ALU stage (why not?)

[Figure: 5-stage pipelined datapath. Successive latches hold (PC, IR), (PC, A, B, IR), (O, B, IR), and (O, D, IR)]

Pipeline Terminology

•  Stages: Fetch, Decode, eXecute, Memory, Writeback
•  Latches (pipeline registers): PC, F/D, D/X, X/M, M/W

[Figure: 5-stage pipelined datapath with latches labeled PC, F/D, D/X, X/M, M/W]

Some More Terminology

•  Scalar pipeline: one insn per stage per cycle
   •  Alternative: “superscalar” (next unit)

•  In-order pipeline: insns enter execute stage in VN order
   •  Alternative: “out-of-order” (not covered in CSE 371)

•  Pipeline depth: number of pipeline stages
   •  Nothing magical about five
   •  Trend has been to deeper pipelines


Pipeline Example: Cycle 1

•  3 instructions

[Figure: pipelined datapath, cycle 1. add $3,$2,$1 in F]

Pipeline Example: Cycle 2

[Figure: pipelined datapath, cycle 2. lw $4,0($5) in F, add $3,$2,$1 in D]

Pipeline Example: Cycle 3

[Figure: pipelined datapath, cycle 3. sw $6,4($7) in F, lw $4,0($5) in D, add $3,$2,$1 in X]

Pipeline Example: Cycle 4

•  3 instructions

[Figure: pipelined datapath, cycle 4. sw $6,4($7) in D, lw $4,0($5) in X, add $3,$2,$1 in M]

Pipeline Example: Cycle 5

[Figure: pipelined datapath, cycle 5. sw $6,4($7) in X, lw $4,0($5) in M, add $3,$2,$1 in W]

Pipeline Example: Cycle 6

[Figure: pipelined datapath, cycle 6. sw $6,4($7) in M, lw $4,0($5) in W]

Pipeline Example: Cycle 7

[Figure: pipelined datapath, cycle 7. sw $6,4($7) in W]

Pipeline Diagram

•  Pipeline diagram: shorthand for what we just saw
   •  Across: cycles
   •  Down: insns
   •  Convention: X means lw $4,0($5) finishes execute stage and writes into X/M latch at end of cycle 4

                 1  2  3  4  5  6  7  8  9
add $3,$2,$1     F  D  X  M  W
lw $4,0($5)         F  D  X  M  W
sw $6,4($7)            F  D  X  M  W
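A stall-free diagram like the one above can be generated mechanically. The sketch below assumes an ideal scalar in-order pipeline with no hazards; the names `ideal_schedule` and `render` are made up for illustration.

```python
STAGES = ["F", "D", "X", "M", "W"]

def ideal_schedule(insns):
    """Return a list of (insn, {cycle: stage}) for a stall-free scalar pipeline."""
    rows = []
    for i, insn in enumerate(insns):
        # Insn i is fetched in cycle i+1 and advances one stage per cycle.
        rows.append((insn, {i + 1 + s: stage for s, stage in enumerate(STAGES)}))
    return rows

def render(rows, n_cycles=9):
    """Format the schedule with cycles across and insns down."""
    width = max(len(insn) for insn, _ in rows)
    header = " " * width + "".join(f"{c:>3}" for c in range(1, n_cycles + 1))
    lines = [header]
    for insn, stages in rows:
        cells = "".join(f"{stages.get(c, ''):>3}" for c in range(1, n_cycles + 1))
        lines.append((insn.ljust(width) + cells).rstrip())
    return "\n".join(lines)

rows = ideal_schedule(["add $3,$2,$1", "lw $4,0($5)", "sw $6,4($7)"])
print(render(rows))
```

Looking up cycle 4 for the second row gives X, matching the convention stated on the slide.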


What About Pipelined Control?

•  Should it be like single-cycle control?
   •  But individual insn signals must be staged

•  Should it be like multi-cycle control?
   •  But all stages are simultaneously active

•  How many different controllers are we going to need?
   •  One for each insn in pipeline?

•  Solution: use simple single-cycle control, but pipeline it
   •  Single controller


Pipelined Control

[Figure: pipelined datapath with a single CTRL block. Control signals travel with each insn through the latches: xC, mC, wC in D/X; mC, wC in X/M; wC in M/W]

Pipeline Performance Calculation

•  Single-cycle
   •  Clock period = 50ns, CPI = 1
   •  Performance = 50ns/insn

•  Multi-cycle
   •  Branch: 20% (3 cycles), load: 20% (5 cycles), other: 60% (4 cycles)
   •  Clock period = 12ns, CPI = (0.2*3+0.2*5+0.6*4) = 4
      •  Remember: latching overhead makes it 12, not 10
   •  Performance = 48ns/insn

•  Pipelined
   •  Clock period = 12ns
   •  CPI = 1.5 (on average insn completes every 1.5 cycles)
   •  Performance = 18ns/insn
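The three results follow from time per insn = clock period * CPI. A small sketch, using only the numbers on this slide (the helper name `ns_per_insn` is illustrative):

```python
def ns_per_insn(clock_ns, cpi):
    """Average time per instruction = clock period * cycles per instruction."""
    return clock_ns * cpi

# Multi-cycle CPI from the slide's mix: 20% branch, 20% load, 60% other.
multi_cpi = 0.2 * 3 + 0.2 * 5 + 0.6 * 4

designs = {
    "single-cycle": ns_per_insn(50, 1.0),
    "multi-cycle": ns_per_insn(12, multi_cpi),
    "pipelined": ns_per_insn(12, 1.5),
}
for name, t in designs.items():
    print(f"{name:12s} {t:5.1f} ns/insn")
```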


Q1: Why Is Pipeline Clock Period …

•  … > delay thru datapath / number of pipeline stages?
   •  Latches (FFs) add delay
   •  Pipeline stages have different delays, clock period is max delay

•  Both factors have implications for ideal number of pipeline stages


Q2: Why Is Pipeline CPI…

•  … > 1?
   •  CPI for scalar in-order pipeline is 1 + stall penalties
   •  Stalls used to resolve hazards
      •  Hazard: condition that jeopardizes VN illusion
      •  Stall: artificial pipeline delay introduced to restore VN illusion

•  Calculating pipeline CPI
   •  Frequency of stall * stall cycles
   •  Penalties add (stalls generally don’t overlap in in-order pipelines)
   •  1 + stall-freq1*stall-cyc1 + stall-freq2*stall-cyc2 + …

•  Correctness/performance/MCCF
   •  Long penalties OK if they happen rarely, e.g., 1 + 0.01 * 10 = 1.1
   •  Stalls also have implications for ideal number of pipeline stages
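The CPI formula above is a sum of (stall frequency * stall cycles) terms on top of a base of 1. A minimal sketch, assuming penalties simply add as the slide says; `pipeline_cpi` is a hypothetical helper name, and the second example reuses the 20%/5% interlock mix that appears elsewhere in these slides.

```python
def pipeline_cpi(stalls, base_cpi=1.0):
    """stalls: iterable of (frequency per insn, stall cycles) pairs."""
    return base_cpi + sum(freq * cycles for freq, cycles in stalls)

# A rare but long penalty barely moves CPI.
rare_long = pipeline_cpi([(0.01, 10)])

# Two kinds of stall: 20% of insns stall 1 cycle, 5% stall 2 cycles.
interlock = pipeline_cpi([(0.20, 1), (0.05, 2)])

print(f"rare long stall: CPI = {rare_long:.2f}")
print(f"interlock mix:   CPI = {interlock:.2f}")
```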


Dependences and Hazards

•  Dependence: relationship between two insns
   •  Data: two insns use same storage location
   •  Control: one insn affects whether another executes at all
   •  Not a bad thing, programs would be boring without them
   •  Enforced by making older insn go before younger one
      •  Happens naturally in single-/multi-cycle designs
      •  But not in a pipeline

•  Hazard: dependence & possibility of wrong insn order
   •  Effects of wrong insn order cannot be externally visible
      •  Stall: enforce order by keeping younger insn in same stage
   •  Hazards are a bad thing: stalls reduce performance


Why Does Every Insn Take 5 Cycles?

•  Could/should we allow add to skip M and go to W? No
   –  It wouldn’t help: peak fetch still only 1 insn per cycle
   –  Structural hazards: imagine add follows lw

[Figure: pipelined datapath with add $3,$2,$1 following lw $4,0($5); if add skipped M, both insns would reach W in the same cycle]

Structural Hazards

•  Structural hazards
   •  Two insns trying to use same circuit at same time
      •  E.g., structural hazard on regfile write port

•  To fix structural hazards: proper ISA/pipeline design
   •  Each insn uses every structure exactly once
   •  For at most one cycle
   •  Always at same stage relative to F


Data Hazards

•  Let’s forget about branches and the control for a while
•  The three insn sequence we saw earlier executed fine…
   •  But it wasn’t a real program
   •  Real programs have data dependences
      •  They pass values via registers and memory

[Figure: pipelined datapath (F/D, D/X, X/M, M/W latches) running add $3,$2,$1, lw $4,0($5), sw $6,0($7)]

Data Hazards

•  Would this “program” execute correctly on this pipeline?
   •  Which insns would execute with correct inputs?
   •  add is writing its result into $3 in current cycle
   –  lw read $3 2 cycles ago → got wrong value
   –  addi read $3 1 cycle ago → got wrong value
   •  sw is reading $3 this cycle → OK (regfile timing: write first half)

[Figure: pipelined datapath snapshot with add $3,$2,$1 in W, lw $4,0($3) in M, addi $6,$3,1 in X, sw $3,0($7) in D]

Memory Data Hazards

•  What about data hazards through memory? No
   •  lw following sw to same address in next cycle, gets right value
   •  Why? DMem read/write take place in same stage

•  Data hazards through registers? Yes (previous slide)
   •  Occur because register write is 3 stages after register read
   •  Can only read a register value 3 cycles after writing it

[Figure: pipelined datapath with sw $5,0($1) followed by lw $4,0($1)]

Fixing Register Data Hazards

•  Can only read register value 3 cycles after writing it

•  One way to enforce this: make sure programs don’t do it
   •  Compiler puts two independent insns between write/read insn pair
      •  If they aren’t there already
   •  Independent means: “do not interfere with register in question”
      •  Do not write it: otherwise meaning of program changes
      •  Do not read it: otherwise create new data hazard
   •  Code scheduling: compiler moves around existing insns to do this
      •  If none can be found, must use nops

•  This is called software interlocks
   •  MIPS: Microprocessor without Interlocked Pipeline Stages


Software Interlock Example

   add $3,$2,$1
   lw $4,0($3)
   sw $7,0($3)
   add $6,$2,$8
   addi $3,$5,4

•  Can any of last three insns be scheduled between first two?
   •  sw $7,0($3)? No, creates hazard with add $3,$2,$1
   •  add $6,$2,$8? OK
   •  addi $3,$5,4? No, lw would read $3 from it
   •  Still need one more insn, use nop

   add $3,$2,$1
   add $6,$2,$8
   nop
   lw $4,0($3)
   sw $7,0($3)
   addi $3,$5,4
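The nop-padding step can be sketched in Python. This is an illustration only, not a real compiler pass: it assumes the "read at least 3 slots after the write" rule from the earlier slides, does not reorder insns (the scheduling is done by hand), and ignores MIPS's hardwired $0. The `Insn` record and `insert_nops` function are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    text: str
    dst: Optional[int]       # register written, or None
    srcs: Tuple[int, ...]    # registers read

NOP = Insn("nop", None, ())

def insert_nops(prog: List[Insn], distance: int = 3) -> List[Insn]:
    """Pad with nops so each register read is >= `distance` slots after its last write."""
    out: List[Insn] = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for insn in prog:
        # Earliest slot at which every source register is safely readable.
        ready = max((last_write[r] + distance for r in insn.srcs if r in last_write),
                    default=0)
        while len(out) < ready:
            out.append(NOP)
        if insn.dst is not None:
            last_write[insn.dst] = len(out)
        out.append(insn)
    return out

add3 = Insn("add $3,$2,$1", 3, (2, 1))
lw4 = Insn("lw $4,0($3)", 4, (3,))
sw7 = Insn("sw $7,0($3)", None, (7, 3))
add6 = Insn("add $6,$2,$8", 6, (2, 8))
addi3 = Insn("addi $3,$5,4", 3, (5,))

original = insert_nops([add3, lw4, sw7, add6, addi3])
scheduled = insert_nops([add3, add6, lw4, sw7, addi3])
for insn in scheduled:
    print(insn.text)
```

With the hand-scheduled order, only one nop is needed, as on the slide; the original order needs two.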


Software Interlock Performance

•  Same deal
   •  Branch: 20%, load: 20%, store: 10%, other: 50%

•  Software interlocks
   •  20% of insns require insertion of 1 nop
   •  5% of insns require insertion of 2 nops

•  CPI is still 1 technically
   •  But now there are more insns
   •  #insns = 1 + 0.20*1 + 0.05*2 = 1.3
   –  30% more insns (30% slowdown) due to data hazards


Hardware Interlocks

•  Problem with software interlocks? Not compatible
   •  Where does 3 in “read register 3 cycles after writing” come from?
      •  From structure (depth) of pipeline
   •  What if next MIPS version uses a 7 stage pipeline?
      •  Programs compiled assuming 5 stage pipeline will break

•  A better (more compatible) way: hardware interlocks
   •  Processor detects data hazards and fixes them
   •  Two aspects to this
      •  Detecting hazards
      •  Fixing hazards


Detecting Data Hazards

•  Compare F/D insn input register names with output register names of older insns in pipeline

Hazard =
   (F/D.IR.RS1 == D/X.IR.RD) ||
   (F/D.IR.RS2 == D/X.IR.RD) ||
   (F/D.IR.RS1 == X/M.IR.RD) ||
   (F/D.IR.RS2 == X/M.IR.RD)
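The hazard equation translates directly into software. The sketch below is a behavioral model, not hardware: the `IR` record is a hypothetical stand-in for the latch fields, and treating `None` as "field unused" (so nops never match) is an added assumption the slide's equation does not spell out.

```python
from typing import NamedTuple, Optional

class IR(NamedTuple):
    rs1: Optional[int] = None   # first source register, None if unused
    rs2: Optional[int] = None   # second source register, None if unused
    rd: Optional[int] = None    # destination register, None if unused

NOP = IR()

def data_hazard(fd: IR, dx: IR, xm: IR) -> bool:
    """True if the insn in F/D reads a register that an older insn in D/X or X/M will write."""
    def match(src, dst):
        return src is not None and src == dst
    return (match(fd.rs1, dx.rd) or match(fd.rs2, dx.rd) or
            match(fd.rs1, xm.rd) or match(fd.rs2, xm.rd))

add = IR(rs1=2, rs2=1, rd=3)    # add $3,$2,$1
lw = IR(rs1=3, rd=4)            # lw $4,0($3)

cycle1 = data_hazard(lw, add, NOP)   # add in D/X
cycle2 = data_hazard(lw, NOP, add)   # add in X/M, nop inserted behind it
cycle3 = data_hazard(lw, NOP, NOP)   # add has moved on to M/W
print(cycle1, cycle2, cycle3)
```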

[Figure: pipelined datapath with hazard logic comparing F/D.IR source registers against the D/X.IR and X/M.IR destination registers]

Fixing Data Hazards

•  Prevent F/D insn from reading (advancing) this cycle
   •  Write nop into D/X.IR (effectively, insert nop in hardware)
   •  Also reset (clear) the datapath control signals
   •  Disable F/D latch and PC write enables (why?)

•  Re-evaluate situation next cycle

[Figure: pipelined datapath in which the hazard signal writes a nop into D/X and disables the PC and F/D write enables]

Aside: Insert NOP/Reset Register

•  Earlier: registers support separate clock, write enable
   •  Useful for writes into register file
   •  Also useful for implementing stalls

•  Registers should also support synchronous reset (clear)
   •  Useful for implementing stalls
   •  Implement as additional hardwired 0 input to FF data mux
   •  Resetting pipeline registers equivalent to inserting a NOP
      •  If NOP is all zeros
      •  If zero means “don’t write” for all write-enable control signals
      •  Design ISA/control signals to make sure this is the case

[Figure: flip-flop (D, Q) with write enable, next to a version whose data mux is selected by [RST:WE] and has a hardwired 0 input for reset]

Hardware Interlock Example: cycle 1

(F/D.IR.RS1 == D/X.IR.RD) || (F/D.IR.RS2 == D/X.IR.RD) || (F/D.IR.RS1 == X/M.IR.RD) || (F/D.IR.RS2 == X/M.IR.RD)

= 1

[Figure: lw $4,0($3) in F/D, add $3,$2,$1 in D/X; hazard asserted, nop to be written into D/X]

Hardware Interlock Example: cycle 2

(F/D.IR.RS1 == D/X.IR.RD) || (F/D.IR.RS2 == D/X.IR.RD) || (F/D.IR.RS1 == X/M.IR.RD) || (F/D.IR.RS2 == X/M.IR.RD)

= 1

[Figure: lw $4,0($3) held in F/D, nop in D/X, add $3,$2,$1 in X/M; hazard still asserted]

Hardware Interlock Example: cycle 3

(F/D.IR.RS1 == D/X.IR.RD) || (F/D.IR.RS2 == D/X.IR.RD) || (F/D.IR.RS1 == X/M.IR.RD) || (F/D.IR.RS2 == X/M.IR.RD)

= 0

[Figure: lw $4,0($3) in F/D, nops in D/X and X/M, add $3,$2,$1 in M/W; hazard cleared, lw can advance]

Pipeline Control Terminology

•  Hardware interlock maneuver is called stall or bubble

•  Mechanism is called stall logic
   •  Part of more general pipeline control mechanism
      •  Controls advancement of insns through pipeline

•  Distinguish from pipelined datapath control
   •  Controls datapath at each stage
   •  Pipeline control controls advancement of datapath control


Pipeline Diagram with Data Hazards

•  Data hazard stall indicated with d*
•  Stall propagates to younger insns
   •  This is not good (why?)

                 1  2  3  4  5  6  7  8  9
add $3,$2,$1     F  D  X  M  W
lw $4,0($3)         F  d* d* D  X  M  W
sw $6,4($7)                  F  D  X  M  W

Page 110: CS104 Computer Organization and Design · • Started on single-cycle datapath • We’ll review/continue with single cycle • Then jump into more things! CS104 (Hilton): Datapaths

CS104 (Hilton): Datapaths [Slides adapted from A. Roth’s] 110

Hardware Interlock Performance

•  Same deal
   •  Branch: 20%, load: 20%, store: 10%, other: 50%

•  Hardware interlocks: same as software interlocks
   •  20% of insns require 1 cycle stall (i.e., insertion of 1 nop)
   •  5% of insns require 2 cycle stall (i.e., insertion of 2 nops)

•  CPI = 1 + 0.20*1 + 0.05*2 = 1.3
   •  So, either CPI stays at 1 and #insns increases 30% (software)
   •  Or, #insns stays at 1 (relative) and CPI increases 30% (hardware)
   •  Same difference

•  Anyway, we can do better


Observe

•  Technically, this situation is broken
   •  lw $4,0($3) has already read $3 from regfile
   •  add $3,$2,$1 hasn’t yet written $3 to regfile

•  But fundamentally, everything is OK
   •  lw $4,0($3) hasn’t actually used $3 yet
   •  add $3,$2,$1 has already computed $3

[Figure: pipelined datapath with lw $4,0($3) in D/X and add $3,$2,$1 in X/M]

Bypassing

•  Bypassing
   •  Reading a value from an intermediate (µarchitectural) source
   •  Not waiting until it is available from primary source
   •  Here, we are bypassing the register file
   •  Also called forwarding

[Figure: bypass path from X/M.O to an ALU input MUX; add $3,$2,$1 in M, lw $4,0($3) in X]

WX Bypassing

•  What about this combination?
   •  Add another bypass path and MUX input
   •  First one was an MX bypass
   •  This one is a WX bypass

[Figure: second bypass path, from M/W to the ALU input MUX; add $3,$2,$1 in W, lw $4,0($3) in X]

ALUinB Bypassing

•  Can also bypass to ALU input B

[Figure: bypass paths added to ALU input B; add $3,$2,$1 followed by add $4,$2,$3]

WM Bypassing?

•  Does WM bypassing make sense?
   •  Not to the address input (why not?)
   •  But to the store data input, yes

[Figure: WM bypass path from M/W to the Data Mem store data input; lw $3,0($2) followed by sw $3,0($4)]

Bypass Logic

•  Each MUX has its own, here it is for MUX ALUinA
   (D/X.IR.RS1 == X/M.IR.RD) => 0
   (D/X.IR.RS1 == M/W.IR.RD) => 1
   Else => 2
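The select logic for the ALUinA MUX can be written as a small priority function. This is a sketch: the function name is illustrative, and the `None` check for an unused source or destination field is an added assumption. The X/M comparison comes first because it holds the younger (most recent) value when both latches write the same register.

```python
def alu_in_a_select(dx_rs1, xm_rd, mw_rd):
    """Return the ALUinA MUX select: 0 = MX bypass, 1 = WX bypass, 2 = regfile value."""
    if dx_rs1 is not None and dx_rs1 == xm_rd:
        return 0   # take X/M.O (result of the insn one ahead)
    if dx_rs1 is not None and dx_rs1 == mw_rd:
        return 1   # take the value in M/W (insn two ahead)
    return 2       # no bypass needed, use D/X.A

# lw $4,0($3) in X with add $3,$2,$1 one stage ahead: MX bypass.
print(alu_in_a_select(dx_rs1=3, xm_rd=3, mw_rd=None))
```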

[Figure: pipelined datapath with bypass logic driving the ALU input MUX selects]

Bypass and Stall Logic

•  Two separate things
   •  Stall logic controls pipeline registers
   •  Bypass logic controls MUXs

•  But complementary
   •  For a given data hazard: if can’t bypass, must stall

•  Slide #43 shows full bypassing: all bypasses possible
   •  Is stall logic still necessary?


Yes, Load Output to ALU Input

Stall = (D/X.IR.OP == LOAD) &&
        ((F/D.IR.RS1 == D/X.IR.RD) ||
         ((F/D.IR.RS2 == D/X.IR.RD) && (F/D.IR.OP != STORE)))
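The load-use stall condition can be modeled the same way. In this sketch the opcode strings and argument names are illustrative, and RS2 is assumed to be the store-data register, which is why a store whose data comes from the load does not stall (the WM bypass covers it).

```python
def load_use_stall(fd_op, fd_rs1, fd_rs2, dx_op, dx_rd):
    """True if the insn in F/D must wait one cycle for the load in D/X."""
    return dx_op == "LOAD" and (
        fd_rs1 == dx_rd or
        (fd_rs2 == dx_rd and fd_op != "STORE")
    )

# lw $3,0($2) in D/X, add $4,$2,$3 in F/D: stall one cycle.
stall_add = load_use_stall("ALU", 2, 3, "LOAD", 3)

# lw $3,0($2) in D/X, sw $3,0($4) in F/D: no stall, store data uses the WM bypass.
stall_sw = load_use_stall("STORE", 4, 3, "LOAD", 3)

print(stall_add, stall_sw)
```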

[Figure: pipelined datapath with stall logic. lw $3,0($2) followed by add $4,$2,$3: add stalls for one cycle (nop inserted) before it can use the loaded value]

Pipeline Diagram With Bypassing

•  Use compiler scheduling to reduce load-use stall frequency
   •  Like software interlocks, but for performance not correctness

With the load-use stall:

                 1  2  3  4  5  6  7  8  9
add $3,$2,$1     F  D  X  M  W
lw $4,0($3)         F  D  X  M  W
addi $6,$4,1           F  d* D  X  M  W

With sub scheduled between lw and addi:

                 1  2  3  4  5  6  7  8  9
add $3,$2,$1     F  D  X  M  W
lw $4,0($3)         F  D  X  M  W
sub $8,$3,$1           F  D  X  M  W
addi $6,$4,1              F  D  X  M  W


Control Hazards

•  Control hazards
   •  Must fetch post branch insns before branch outcome is known
   •  Default: assume “not-taken” (at fetch, can’t tell it’s a branch)

[Figure: pipelined datapath through X/M (PC, Insn Mem, Register File, + 4, << 2, SX; latches F/D, D/X, X/M)]

Branch Recovery

•  Branch recovery: what to do when branch is actually taken
   •  Insns that will be written into F/D and D/X are wrong
   •  Flush them, i.e., replace them with nops
   +  They haven’t written permanent state yet (regfile, DMem)

[Figure: same datapath with nops written into F/D and D/X on a taken branch]

Branch Recovery Pipeline Diagram

•  Convention: don’t fill in flushed insns
•  Taken branch penalty is 2 cycles

                    1  2  3  4  5  6  7  8  9
addi $3,$0,1        F  D  X  M  W
bnez $3,targ           F  D  X  M  W
sw $6,4($7)               F  D
targ: addi $8,$7,1           F
targ: addi $8,$7,1              F  D  X  M  W

                    1  2  3  4  5  6  7  8  9
addi $3,$0,1        F  D  X  M  W
bnez $3,targ           F  D  X  M  W
targ: addi $8,$7,1              F  D  X  M  W


Branch Performance

•  Back of the envelope calculation
   •  Branch: 20%, load: 20%, store: 10%, other: 50%
   •  75% of branches are taken

•  CPI = 1 + 0.20*0.75*2 = 1.3
   –  Branches cause 30% slowdown
   •  How do we reduce this penalty?


Fast Branch

•  Fast branch: can decide at D, not X
   •  Test must be comparison to zero or equality, no time for ALU
   +  New taken branch penalty is 1
   –  Additional insns (slt) for more complex tests, must bypass to D too
   •  25% of branches have complex tests that require extra insn
   •  CPI = 1 + 0.20*0.75*1(branch) + 0.20*0.25*1(extra insn) = 1.2
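The two branch CPI figures (branch resolved in X versus in D) can be reproduced side by side. A sketch using only the percentages from these slides; the variable names are illustrative.

```python
branch_frac = 0.20      # fraction of insns that are branches
taken_frac = 0.75       # fraction of branches that are taken
complex_frac = 0.25     # fraction of branches needing an extra insn (fast branch only)

# Branch resolved in X: 2-cycle penalty on every taken branch.
cpi_branch_in_x = 1 + branch_frac * taken_frac * 2

# Fast branch resolved in D: 1-cycle taken penalty, plus one extra insn
# for the branches whose test is too complex for the D-stage comparator.
cpi_fast_branch = (1
                   + branch_frac * taken_frac * 1
                   + branch_frac * complex_frac * 1)

print(f"branch in X: CPI = {cpi_branch_in_x:.2f}")
print(f"fast branch: CPI = {cpi_fast_branch:.2f}")
```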

[Figure: pipelined datapath with a <> 0 comparator and a second SX/<< 2 path so the branch can be resolved in D]

Speculative Execution

•  Speculation: “risky transactions on chance of profit”
•  Speculative execution
   •  Execute before all parameters known with certainty
   •  Correct speculation
      +  Avoid stall, improve performance
   •  Incorrect speculation (mis-speculation)
      –  Must abort/flush/squash incorrect insns
      –  Must undo incorrect changes (recover pre-speculation state)
•  The “game”: [%correct * gain] – [(1–%correct) * penalty]
•  Control speculation: speculation aimed at control hazards
   •  Unknown parameter: are these the correct insns to execute next?


Control Speculation Mechanics

•  Guess branch target, start fetching at guessed position
   •  Doing nothing is implicitly guessing target is PC+4
   •  Can actively guess other targets: dynamic branch prediction
•  Execute branch to verify (check) guess
   •  Correct speculation? Keep going
   •  Mis-speculation? Flush mis-speculated insns
      •  Hopefully haven’t modified permanent state (Regfile, DMem)
      +  Happens naturally in in-order 5-stage pipeline
•  “Game” for in-order 5-stage pipeline
   •  %correct = ?
   •  Gain = 2 cycles
   +  Penalty = 0 cycles → mis-speculation no worse than stalling
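The speculation “game” is an expected-value calculation, measured in cycles saved per branch relative to always stalling. A minimal sketch with a hypothetical function name. The 25% figure below is an assumption carried over from the earlier back-of-the-envelope numbers: if 75% of branches are taken, implicitly guessing PC+4 is right 25% of the time.

```python
def speculation_payoff(correct_frac, gain_cycles, penalty_cycles):
    """[%correct * gain] - [(1 - %correct) * penalty], in cycles per branch."""
    return correct_frac * gain_cycles - (1 - correct_frac) * penalty_cycles


# In-order 5-stage pipeline: gain = 2 cycles, penalty = 0 cycles.
for correct in (0.0, 0.25, 0.75, 0.95):
    saved = speculation_payoff(correct, gain_cycles=2, penalty_cycles=0)
    print(f"%correct = {correct:.0%}: {saved:.2f} cycles saved per branch")
```

With a penalty of 0 the payoff can never go negative, which is the slide's point: mis-speculation is no worse than stalling.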


Dynamic Branch Prediction

•  Dynamic branch prediction: guess outcome
   •  Start fetching from guessed address
   •  Flush on mis-prediction (notice new recovery circuit)

[Figure: pipelined datapath with a branch predictor (BP) at fetch, TG and PC fields in the F/D and D/X latches, and a <> comparator with nop injection for recovery]


Branch Prediction: Short Summary

•  Key principle of micro-architecture:
   •  Programs do the same thing over and over (why?)
•  Exploit for performance:
   •  Learn what a program did before
   •  Guess that it will do the same thing again
•  Details of branch prediction: later (~1 month)
   •  For now, just know it can be done and is important to performance


Branch Prediction Performance

•  Dynamic branch prediction
   •  Simple predictor: branches predicted with 75% accuracy
      •  CPI = 1 + 0.20*0.25*2 = 1.1
   •  More advanced predictor: 95% accuracy
      •  CPI = 1 + 0.20*0.05*2 = 1.02
•  Branch mis-predictions still a big problem though
   •  Pipelines are long: typical mis-prediction penalty is 10+ cycles
   •  Pipelines have full bypassing: compiler schedules the rest
   •  Pipelines are superscalar (later)
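To see why mis-predictions remain a problem, the same CPI formula can be evaluated with a longer penalty. A minimal sketch with a hypothetical function name; the 10-cycle case uses the lower end of the slide's “10+ cycles” figure and is only illustrative.

```python
def cpi_with_prediction(branch_frac, accuracy, mispredict_penalty):
    """Base CPI of 1 plus cycles lost to mis-predicted branches."""
    return 1 + branch_frac * (1 - accuracy) * mispredict_penalty


for accuracy in (0.75, 0.95):
    for penalty in (2, 10):
        cpi = cpi_with_prediction(0.20, accuracy, penalty)
        print(f"accuracy {accuracy:.0%}, penalty {penalty:2d} cycles: CPI = {cpi:.2f}")
```

Under this model, a 95%-accurate predictor with a 10-cycle penalty costs as much (CPI 1.1) as the 75%-accurate predictor does in the 5-stage pipeline.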


Pipelining And Exceptions

•  Pipelining makes exceptions nasty
   •  5 insns in pipeline at once
   •  Exception happens, how do you know which insn caused it?
      •  Exceptions propagate along pipeline in latches
   •  Two exceptions happen, how do you know which one to take first?
      •  One belonging to oldest insn
   •  When handling exception, have to flush younger insns
      •  Piggy-back on branch mis-prediction machinery to do this
   •  What about multi-cycle operations?
•  Just FYI


Pipeline Depth

•  No magic about 5 stages, trend had been to deeper pipelines
   •  486: 5 stages (50+ gate delays / clock)
   •  Pentium: 7 stages
   •  Pentium II/III: 12 stages
   •  Pentium 4: 22 stages (~10 gate delays / clock) “super-pipelining”
   •  Core1/2: 14 stages
•  Increasing pipeline depth
   +  Increases clock frequency (reduces period)
   –  But decreases IPC (increases CPI)
      •  Branch mis-prediction penalty becomes longer
      •  Non-bypassed data hazard stalls become longer
   •  At some point, CPI losses offset clock gains, question is when?
      •  1GHz Pentium 4 was slower than 800 MHz Pentium III
      •  What was the point? People buy frequency, not frequency * IPC
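The “when?” has a simple break-even form: execution time is insn count * CPI / frequency, so the deeper design loses once its CPI grows by more than its clock speeds up. A minimal sketch with hypothetical function names. The CPI values in the example are made up purely to illustrate the crossover; they are not measurements of any real processor.

```python
def execution_time(insn_count, cpi, freq_hz):
    """Seconds to run insn_count insns at the given CPI and clock frequency."""
    return insn_count * cpi / freq_hz


def break_even_cpi_ratio(freq_new_hz, freq_old_hz):
    """The new design is slower if CPI_new / CPI_old exceeds this ratio."""
    return freq_new_hz / freq_old_hz


ratio = break_even_cpi_ratio(1e9, 800e6)
print(f"1 GHz vs 800 MHz: slower if CPI grows by more than {ratio:.2f}x")

# Hypothetical CPIs (illustration only): 1.0 at 800 MHz vs 1.3 at 1 GHz.
t_old = execution_time(1_000_000, 1.0, 800e6)
t_new = execution_time(1_000_000, 1.3, 1e9)
print("deeper pipeline is slower" if t_new > t_old else "deeper pipeline is faster")
```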