47
Lecture 7 Slide 1 EECS 470 © Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar EECS 470 Lecture 7 Speculation & Precise Interrupts Fall 2019 Prof. Ron Dreslinski http://www.eecs.umich.edu/courses/eecs470 Instruction/Decode Buffer Fetch Dispatch Buffer Decode Reservation Dispatch Reorder/ Store Buffer Complete Retire Stations Issue Execute Finish In Order Out of Order In Order Completion Buffer Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 1EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

EECS 470Lecture 7Speculation &Precise InterruptsFall 2019

Prof. Ron Dreslinski

http://www.eecs.umich.edu/courses/eecs470

Instruction/Decode Buffer

Fetch

Dispatch Buffer

Decode

Reservation

Dispatch

Reorder/

Store Buffer

Complete

Retire

StationsIssue

Execute

Finish

In O

rder

Out

of

Ord

erIn

Ord

er

Completion Buffer

Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

Page 2: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 2EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

r Programming assignment #3 posted, start ASAP

r HW3 Posted, Due 10/11

r Midterm Exam: October 17th, 7:00-8:30pm, 1571 GGBL

r Final Project Information on Wednesday, start thinking about groups

m Project Description Posted on Website shortly

Announcements

Page 3: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 3EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Readings

For Today:r Smith & Pleszkun “Implementing Precise Interrupts”

For Wednesday:r H & P Chapter 3.8-3.9r H & P Chapter 3.13

Have you read yet?r D. Sima “Design Space of Register Renaming Techniques”

Page 4: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 4EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Dynamic Scheduling Summary• Dynamic scheduling: out-of-order execution

r Higher pipeline/FU utilization, improved performancer Easier and more effective in hardware than software

+ More storage locations than architectural registers+ Dynamic handling of cache misses

• Instruction buffer: multiple F/D latchesr Implements large scheduling scope + “passing” functionalityr Split decode into in-order dispatch and out-of-order issue

m Stall vs. wait

• Dynamic scheduling algorithmsr Scoreboard: no register renaming, limited out-of-orderr Tomasulo: copy-based register renaming, full out-of-order

Page 5: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 5

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Value/Copy-Based Register RenamingTomasulo-style register renaming

• Called “value-based” or “copy-based”

• Names: architectural registers

• Storage locations: register file and reservation stations

• Values can and do exist in both

• Register file holds master (i.e., most recent) values

+ RS copies eliminate WAR hazards

• Storage locations referred to internally by RS# tags

• Register table translates names to tags

• Tag == 0 value is in register file

• Tag != 0 value is not ready and is being computed by RS#

• CDB broadcasts values with tags attached

• So insns know what value they are looking at

Page 6: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 6

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Are we done?When can Tomasulo’s machine go wrong?

– Control transfers!

Branch instructions• Need a branch predictor to predict what to fetch next

• More on branch prediction later• Need speculative execution to “clean up” state when we guess wrong

Interrupts & Exception (today’s focus)• Need to handle uncommon execution cases

• Branch to a specialized software handler

Both face similar problem• Don’t know relative order of instructions in RS

Page 7: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 7EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Interrupts

An unexpected transfer of control flowr Pick up where you left off once handled (restartable)

r Transparent to interrupted program

Kinds:r Asynchronous

m I/O device wants attentionm Can “defer” interrupt until convenient

r Synchronous (aka exceptions, traps)m Unusual condition for some instruction

Examples?m Explicit calls to operating system software

i1

i2

i3

H1

H2

Hn

….

interrupt

restart

Page 8: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

Lecture 7 Slide 8EECS 470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Precise Interrupts

Sequential Code Semantics Overlapped Execution

i1: xxxx

i2: xxxx

i3: xxxx

i2

i1

i3

i2: i1:

i3:

Precise interrupt appears to happen between two instructions

Page 9: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 9

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Speculation and Precise InterruptsWhy are we discussing these together?

• Sequential (vN) semantics for interrupts• All insns before interrupt should be complete• All insns after interrupt should look as if never started (abort)• Basically want same thing for mis-predicted branch

• What makes precise interrupts difficult?• OoO completion ® must undo post-interrupt writebacks• Same thing for branches• In-order ® branches complete before younger insns writeback• OoO ® not necessarily

Precise interrupts, mis-speculation recovery: same problem Same problem ® same solution

Page 10: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 10

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Precise StateSpeculative execution requires

• (Ability to) abort & restart at every branch• Abort & restart at every load useful for load speculation (later)

• And for shared memory multiprocessing (much later)Precise synchronous (program-internal) interrupts require

• Abort & restart at every load, store, ??

Precise asynchronous (external) interrupts require• Abort & restart at every ??

Bite the bullet• Implement abort & restart at every insn• Called “precise state”

Page 11: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 11

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Precise State OptionsImprecise state: ignore the problem!

– Makes page faults (any restartable exceptions) difficult– Makes speculative execution almost impossible• IEEE standard strongly suggests precise state• Compromise: Alpha implemented precise state only for integer ops

Force in-order completion (W): stall pipe if necessary– Slow

Precise state in software: trap to recovery routine– Implementation dependent• Trap on every mis-predicted branch (you must be joking)

Precise state in hardware+ Everything is better in hardware (except policy)

Page 12: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 12

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

The Problem with Precise State

Problem: writeback combines two separate functions• Forwards values to younger insns: OK for this to be out-of-order• Write values to registers: would like this to be in-order

Similar problem (decode) for OoO execution: solution?• Split decode (D) ® in-order dispatch (D) + out-of-order issue (S)• Separate using insn buffer: scoreboard or reservation station

regfile

D$I$BP

insn buffer

SD

Page 13: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 13

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Re-Order Buffer (ROB)

Insn buffer ® re-order buffer (ROB)• Buffers completed results en route to register file• May be combined with RS or separate

• Combined in picture: register-update unit RUU (Sohi’s method)• Separate (more common today): P6-style

Split writeback (W) into two stages• Why is there no latch between W1 and W2?

regfile

D$I$BP

Reorder buffer (ROB)

W1 W2

Page 14: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 14

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Complete and Retire

Complete (C): second part of writeback• Completed insns write results into ROB+ Out-of-order: wait doesn’t back-propagate to younger insns

Retire (R): aka commit, graduate• ROB writes results to register file• In order: stall back-propagates to younger insns

regfile

D$I$BP

Reorder buffer (ROB)

C R

Page 15: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 15

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Load/Store Queue (LSQ)ROB makes register writes in-order, but what about stores?

As usual, i.e., to D$ in X stage?• Not even close, imprecise memory worse than imprecise registers

Load/store queue (LSQ)• Completed stores write to LSQ• When store retires, head of LSQ written to D$• When loads execute, access LSQ and D$ in parallel

• Forward from LSQ if older store with matching address• More modern design: loads and stores in separate queues• More on this later

Page 16: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 16

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

ROB + LSQ

Modulo gross simplifications, this picture is almost realistic!

regfile

D$

I$BP

ROB

C R

LSQload/store

store dataaddr

load data

Page 17: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 17

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6P6: Start with Tomasulo’s algorithm… add ROB

• Separate ROB and RS

Simple-P6• Our old RS organization: 1 ALU, 1 load, 1 store, 2 3-cycle FP

Page 18: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 18

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Data StructuresReservation Stations are same as beforeROB

• head, tail: pointers maintain sequential order• R: insn output register, V: insn output value

Tags are different• Tomasulo: RS# ® P6: ROB#

Map Table is different• T+: tag + “ready-in-ROB” bit• T==0 ® Value is ready in regfile• T!=0 ® Value is not ready• T!=0+ ® Value is ready in the ROB

Page 19: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 19

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Data Structures

• Insn fields and status bits• Tags• Values

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 20: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 20

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Data StructuresROBht # Insn R V S X C

1 ldf X(r1),f12 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1f2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no

CDBT V

Page 21: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 21

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 PipelineNew pipeline structure: F, D, S, X, C, R

• D (dispatch)• Structural hazard (ROB/LSQ/RS) ? Stall• Allocate ROB/LSQ/RS• Set RS tag to ROB#• Set Map Table entry to ROB# and clear “ready-in-ROB” bit• Read ready registers into RS (from either ROB or Regfile)

• X (execute)• Free RS entry• Use to be at W, can be earlier because RS# are not tags

Page 22: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 22

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Pipeline• C (complete)

• Structural hazard (CDB)? wait• Write value into ROB entry indicated by RS tag• Mark ROB entry as complete• If not overwritten, mark Map Table entry “ready-in-ROB” bit (+)

• R (retire)• Insn at ROB head not complete ? stall• Handle any exceptions• Write ROB head value to register file• If store, write LSQ head to D$• Free ROB/LSQ entries

Page 23: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 23

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Dispatch (D): Part I

• RS/ROB full ? stall • Allocate RS/ROB entries, assign ROB# to RS output tag• Set output register Map Table entry to ROB#, clear “ready-in-ROB”

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 24: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 24

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Dispatch (D): Part II

• Read tags for register inputs from Map Table• Tag==0 ® copy value from Regfile (not shown)• Tag!=0 ® copy Map Table tag to RS• Tag!=0+ ® copy value from ROB

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 25: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 25

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Complete (C)

• Structural hazard (CDB) ? Stall : broadcast <value,tag> on CDB

• Write result into ROB, if still valid clear MapTable “ready-in-ROB” bit

• Match tags, write CDB.V into RS slots of dependent insns

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 26: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 26

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Retire (R)

• ROB head not complete ? stall : free ROB entry• Write ROB head result to Regfile• If still valid, clear Map Table entry

value

V1 V2

FU

T

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 27: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 27

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Data Structures

• Insn fields and status bits• Tags• Values

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch

Page 28: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 28

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 Data StructuresROBht # Insn R V S X C

1 ldf X(r1),f12 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1f2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no

CDBT V

Page 29: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 29

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 1ROBht # Insn R V S X Cht 1 ldf X(r1),f1 f1

2 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#1f2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#1 [r1]3 ST no4 FP1 no5 FP2 no

CDBT V

allocateset ROB# tag

Page 30: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 30

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 2ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 c2t 2 mulf f0,f1,f2 f2

3 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#1f2 ROB#2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#1 [r1]3 ST no4 FP1 yes mulf ROB#2 ROB#1 [f0]5 FP2 no

CDBT V

allocateset ROB# tag

Page 31: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 31

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 3ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 c2 c3

2 mulf f0,f1,f2 f2t 3 stf f2,Z(r1)

4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#1f2 ROB#2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#2 ROB#1 [f0]5 FP2 no

CDBT V

allocatefree

Page 32: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 32

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 4ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 [f1] c2 c3 c4

2 mulf f0,f1,f2 f2 c43 stf f2,Z(r1)

t 4 addi r1,4,r1 r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#1+f2 ROB#2r1 ROB#4

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes add ROB#4 [r1]2 LD no3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#2 ROB#1 [f0] CDB.V5 FP2 no

CDBT VROB#1 [f1]

allocate

ROB#1 readygrab CDB.V

ldf finished1. set “ready-in-ROB” bit2. write result to ROB3. CDB broadcast

Page 33: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 33

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 5ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5

3 stf f2,Z(r1) 4 addi r1,4,r1 r1 c5

t 5 ldf X(r1),f1 f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5f2 ROB#2r1 ROB#4

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes add ROB#4 [r1]2 LD yes ldf ROB#5 ROB#43 ST yes stf ROB#3 ROB#2 [r1]4 FP1 no5 FP2 no

CDBT V

allocate

free

ldf retires1. write ROB result to regfile

Page 34: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 34

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 6ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5+

3 stf f2,Z(r1) 4 addi r1,4,r1 r1 c5 c65 ldf X(r1),f1 f1

t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#5 ROB#43 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no

CDBT V

allocate

free

Page 35: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 35

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe,

Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 7ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5+

3 stf f2,Z(r1) 4 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 c7

t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#5 ROB#4 CDB.V3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no

CDBT VROB#4 [r1]

ROB#4 ready

grab CDB.V

stall D (no free ST RS)

Page 36: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 36

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 8ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

3 stf f2,Z(r1) c84 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 c7 c8

t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 ROB#2 [f2] [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no

CDBT VROB#2 [f2]

ROB#2 readygrab CDB.V

stall R for addi (in-order)

ROB#2 invalid in MapTabledon’t set “ready-in-ROB”

Page 37: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 37

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe,

Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 9ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

h 3 stf f2,Z(r1) c8 c94 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9

t 7 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 yes mulf ROB#6 ROB#5 [f0] CDB.V5 FP2 no

CDBT VROB#5 [f1]

ROB#5 ready

grab CDB.V

retire mulf

all pipe stages active at once!

free, re-allocate

Page 38: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 38

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 10ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

h 3 stf f2,Z(r1) c8 c9 c104 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9 c10

t 7 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 no5 FP2 no

CDBT V

free

Page 39: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 39

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 11ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5 c83 stf f2,Z(r1) c8 c9 c10

h 4 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9 c10

t 7 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 no5 FP2 no

CDBT V

retire stf

Page 40: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 40

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Precise State in P6Point of ROB is maintaining precise state

• How does that work?• Easy as 1,2,3

1. Wait until last good insn retires, first bad insn at ROB head2. Clear contents of ROB, RS, and Map Table3. Start over

• Works because zero (0) means the right thing…• 0 in ROB/RS ® entry is empty• Tag == 0 in Map Table ® register is in regfile

• …and because regfile and D$ writes take place at R• Example: page fault in first stf

Page 41: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 41

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 9 (with precise state)ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

h 3 stf f2,Z(r1) c8 c94 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9

t 7 stf f2,Z(r1)

Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 yes mulf ROB#6 ROB#5 [f0] CDB.V5 FP2 no

CDBT VROB#5 [f1]

PAGE FAULT

Page 42: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 42

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 10 (with precise state)ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c83 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1f2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no

CDBT V

faulting insn at ROB head?CLEAR EVERYTHING

Page 43: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 43

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 11 (with precise state)ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

ht 3 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1f2r1

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 [f4] [r1]4 FP1 no5 FP2 no

CDBT V

START OVER(after OS fixes page fault)

Page 44: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 44

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6: Cycle 12 (with precise state)ROBht # Insn R V S X C

1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8

h 3 stf f2,Z(r1) c12t 4 addi r1,4,r1 r1

5 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)

Map TableReg T+f0f1f2r1 ROB#4

Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes addi ROB#4 [r1]2 LD no3 ST yes stf ROB#3 [f4] [r1]4 FP1 no5 FP2 no

CDBT V

Page 45: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 45

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 PerformanceIn other words: what is the cost of precise state?

+ In general: same performance as “plain” Tomasulo• ROB is not a performance device• Maybe a little better (RS freed earlier ® fewer struct hazards)

– Unless ROB is too small• In which case ROB struct hazards become a problem

• Rules of thumb for ROB size• At least N (width) * number of pipe stages between D and R• At least N * thit-L2

• Can add a factor of 2 to both if you want• What is the rationale behind these?

Page 46: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 46

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

P6 (Tomasulo+ROB) ReduxPopular design for a while

• (Relatively) easy to implement correctly• Anything goes wrong (mispredicted branch, fault, interrupt)?• Just clear everything and start again

• Examples: Intel PentiumPro, IBM/Motorola PowerPC, AMD K6

Actually making a comeback…• Examples: Intel PentiumM

But went away for a while, why?

Page 47: Fetch EECS 470 Lecture 7 Dispatch Buffer - eecs.umich.edu · Tag != 0 value is not ready and is being computed by RS# CDB broadcasts values with tags aached So insns know what value

EECS 470Lecture 7 Slide 47

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

The Problem with P6

Problem for high performance implementations– Too much value movement (regfile/ROB®RS®ROB®regfile)– Multi-input muxes, long buses complicate routing and slow clock

value

V1 V2

FU

T+

T2T1Top========

Map Table

RS

CD

B.V

CD

B.T

Dispatch

Regfile

T

========

R value

ROB

HeadRetire

TailDispatch