Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Lecture 7 Slide 1EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
EECS 470Lecture 7Speculation &Precise InterruptsFall 2019
Prof. Ron Dreslinski
http://www.eecs.umich.edu/courses/eecs470
Instruction/Decode Buffer
Fetch
Dispatch Buffer
Decode
Reservation
Dispatch
Reorder/
Store Buffer
Complete
Retire
StationsIssue
Execute
Finish
In O
rder
Out
of
Ord
erIn
Ord
er
Completion Buffer
Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Lecture 7 Slide 2EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
r Programming assignment #3 posted, start ASAP
r HW3 Posted, Due 10/11
r Midterm Exam: October 17th, 7:00-8:30pm, 1571 GGBL
r Final Project Information on Wednesday, start thinking about groups
m Project Description Posted on Website shortly
Announcements
Lecture 7 Slide 3EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Readings
For Today:r Smith & Pleszkun “Implementing Precise Interrupts”
For Wednesday:r H & P Chapter 3.8-3.9r H & P Chapter 3.13
Have you read yet?r D. Sima “Design Space of Register Renaming Techniques”
Lecture 7 Slide 4EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Dynamic Scheduling Summary• Dynamic scheduling: out-of-order execution
r Higher pipeline/FU utilization, improved performancer Easier and more effective in hardware than software
+ More storage locations than architectural registers+ Dynamic handling of cache misses
• Instruction buffer: multiple F/D latchesr Implements large scheduling scope + “passing” functionalityr Split decode into in-order dispatch and out-of-order issue
m Stall vs. wait
• Dynamic scheduling algorithmsr Scoreboard: no register renaming, limited out-of-orderr Tomasulo: copy-based register renaming, full out-of-order
EECS 470Lecture 7 Slide 5
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Value/Copy-Based Register RenamingTomasulo-style register renaming
• Called “value-based” or “copy-based”
• Names: architectural registers
• Storage locations: register file and reservation stations
• Values can and do exist in both
• Register file holds master (i.e., most recent) values
+ RS copies eliminate WAR hazards
• Storage locations referred to internally by RS# tags
• Register table translates names to tags
• Tag == 0 value is in register file
• Tag != 0 value is not ready and is being computed by RS#
• CDB broadcasts values with tags attached
• So insns know what value they are looking at
EECS 470Lecture 7 Slide 6
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Are we done?When can Tomasulo’s machine go wrong?
– Control transfers!
Branch instructions• Need a branch predictor to predict what to fetch next
• More on branch prediction later• Need speculative execution to “clean up” state when we guess wrong
Interrupts & Exception (today’s focus)• Need to handle uncommon execution cases
• Branch to a specialized software handler
Both face similar problem• Don’t know relative order of instructions in RS
Lecture 7 Slide 7EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Interrupts
An unexpected transfer of control flowr Pick up where you left off once handled (restartable)
r Transparent to interrupted program
Kinds:r Asynchronous
m I/O device wants attentionm Can “defer” interrupt until convenient
r Synchronous (aka exceptions, traps)m Unusual condition for some instruction
Examples?m Explicit calls to operating system software
i1
i2
i3
H1
H2
Hn
….
interrupt
restart
Lecture 7 Slide 8EECS 470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Precise Interrupts
Sequential Code Semantics Overlapped Execution
i1: xxxx
i2: xxxx
i3: xxxx
i2
i1
i3
i2: i1:
i3:
Precise interrupt appears to happen between two instructions
EECS 470Lecture 7 Slide 9
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Speculation and Precise InterruptsWhy are we discussing these together?
• Sequential (vN) semantics for interrupts• All insns before interrupt should be complete• All insns after interrupt should look as if never started (abort)• Basically want same thing for mis-predicted branch
• What makes precise interrupts difficult?• OoO completion ® must undo post-interrupt writebacks• Same thing for branches• In-order ® branches complete before younger insns writeback• OoO ® not necessarily
Precise interrupts, mis-speculation recovery: same problem Same problem ® same solution
EECS 470Lecture 7 Slide 10
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Precise StateSpeculative execution requires
• (Ability to) abort & restart at every branch• Abort & restart at every load useful for load speculation (later)
• And for shared memory multiprocessing (much later)Precise synchronous (program-internal) interrupts require
• Abort & restart at every load, store, ??
Precise asynchronous (external) interrupts require• Abort & restart at every ??
Bite the bullet• Implement abort & restart at every insn• Called “precise state”
EECS 470Lecture 7 Slide 11
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Precise State OptionsImprecise state: ignore the problem!
– Makes page faults (any restartable exceptions) difficult– Makes speculative execution almost impossible• IEEE standard strongly suggests precise state• Compromise: Alpha implemented precise state only for integer ops
Force in-order completion (W): stall pipe if necessary– Slow
Precise state in software: trap to recovery routine– Implementation dependent• Trap on every mis-predicted branch (you must be joking)
Precise state in hardware+ Everything is better in hardware (except policy)
EECS 470Lecture 7 Slide 12
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
The Problem with Precise State
Problem: writeback combines two separate functions• Forwards values to younger insns: OK for this to be out-of-order• Write values to registers: would like this to be in-order
Similar problem (decode) for OoO execution: solution?• Split decode (D) ® in-order dispatch (D) + out-of-order issue (S)• Separate using insn buffer: scoreboard or reservation station
regfile
D$I$BP
insn buffer
SD
EECS 470Lecture 7 Slide 13
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Re-Order Buffer (ROB)
Insn buffer ® re-order buffer (ROB)• Buffers completed results en route to register file• May be combined with RS or separate
• Combined in picture: register-update unit RUU (Sohi’s method)• Separate (more common today): P6-style
Split writeback (W) into two stages• Why is there no latch between W1 and W2?
regfile
D$I$BP
Reorder buffer (ROB)
W1 W2
EECS 470Lecture 7 Slide 14
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Complete and Retire
Complete (C): second part of writeback• Completed insns write results into ROB+ Out-of-order: wait doesn’t back-propagate to younger insns
Retire (R): aka commit, graduate• ROB writes results to register file• In order: stall back-propagates to younger insns
regfile
D$I$BP
Reorder buffer (ROB)
C R
EECS 470Lecture 7 Slide 15
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Load/Store Queue (LSQ)ROB makes register writes in-order, but what about stores?
As usual, i.e., to D$ in X stage?• Not even close, imprecise memory worse than imprecise registers
Load/store queue (LSQ)• Completed stores write to LSQ• When store retires, head of LSQ written to D$• When loads execute, access LSQ and D$ in parallel
• Forward from LSQ if older store with matching address• More modern design: loads and stores in separate queues• More on this later
EECS 470Lecture 7 Slide 16
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
ROB + LSQ
Modulo gross simplifications, this picture is almost realistic!
regfile
D$
I$BP
ROB
C R
LSQload/store
store dataaddr
load data
EECS 470Lecture 7 Slide 17
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6P6: Start with Tomasulo’s algorithm… add ROB
• Separate ROB and RS
Simple-P6• Our old RS organization: 1 ALU, 1 load, 1 store, 2 3-cycle FP
EECS 470Lecture 7 Slide 18
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Data StructuresReservation Stations are same as beforeROB
• head, tail: pointers maintain sequential order• R: insn output register, V: insn output value
Tags are different• Tomasulo: RS# ® P6: ROB#
Map Table is different• T+: tag + “ready-in-ROB” bit• T==0 ® Value is ready in regfile• T!=0 ® Value is not ready• T!=0+ ® Value is ready in the ROB
EECS 470Lecture 7 Slide 19
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Data Structures
• Insn fields and status bits• Tags• Values
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 20
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Data StructuresROBht # Insn R V S X C
1 ldf X(r1),f12 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1f2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no
CDBT V
EECS 470Lecture 7 Slide 21
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 PipelineNew pipeline structure: F, D, S, X, C, R
• D (dispatch)• Structural hazard (ROB/LSQ/RS) ? Stall• Allocate ROB/LSQ/RS• Set RS tag to ROB#• Set Map Table entry to ROB# and clear “ready-in-ROB” bit• Read ready registers into RS (from either ROB or Regfile)
• X (execute)• Free RS entry• Use to be at W, can be earlier because RS# are not tags
EECS 470Lecture 7 Slide 22
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Pipeline• C (complete)
• Structural hazard (CDB)? wait• Write value into ROB entry indicated by RS tag• Mark ROB entry as complete• If not overwritten, mark Map Table entry “ready-in-ROB” bit (+)
• R (retire)• Insn at ROB head not complete ? stall• Handle any exceptions• Write ROB head value to register file• If store, write LSQ head to D$• Free ROB/LSQ entries
EECS 470Lecture 7 Slide 23
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Dispatch (D): Part I
• RS/ROB full ? stall • Allocate RS/ROB entries, assign ROB# to RS output tag• Set output register Map Table entry to ROB#, clear “ready-in-ROB”
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 24
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Dispatch (D): Part II
• Read tags for register inputs from Map Table• Tag==0 ® copy value from Regfile (not shown)• Tag!=0 ® copy Map Table tag to RS• Tag!=0+ ® copy value from ROB
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 25
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Complete (C)
• Structural hazard (CDB) ? Stall : broadcast <value,tag> on CDB
• Write result into ROB, if still valid clear MapTable “ready-in-ROB” bit
• Match tags, write CDB.V into RS slots of dependent insns
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 26
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Retire (R)
• ROB head not complete ? stall : free ROB entry• Write ROB head result to Regfile• If still valid, clear Map Table entry
value
V1 V2
FU
T
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 27
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Data Structures
• Insn fields and status bits• Tags• Values
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch
EECS 470Lecture 7 Slide 28
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 Data StructuresROBht # Insn R V S X C
1 ldf X(r1),f12 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1f2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no
CDBT V
EECS 470Lecture 7 Slide 29
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 1ROBht # Insn R V S X Cht 1 ldf X(r1),f1 f1
2 mulf f0,f1,f23 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#1f2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#1 [r1]3 ST no4 FP1 no5 FP2 no
CDBT V
allocateset ROB# tag
EECS 470Lecture 7 Slide 30
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 2ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 c2t 2 mulf f0,f1,f2 f2
3 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#1f2 ROB#2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#1 [r1]3 ST no4 FP1 yes mulf ROB#2 ROB#1 [f0]5 FP2 no
CDBT V
allocateset ROB# tag
EECS 470Lecture 7 Slide 31
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 3ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 c2 c3
2 mulf f0,f1,f2 f2t 3 stf f2,Z(r1)
4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#1f2 ROB#2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#2 ROB#1 [f0]5 FP2 no
CDBT V
allocatefree
EECS 470Lecture 7 Slide 32
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 4ROBht # Insn R V S X Ch 1 ldf X(r1),f1 f1 [f1] c2 c3 c4
2 mulf f0,f1,f2 f2 c43 stf f2,Z(r1)
t 4 addi r1,4,r1 r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#1+f2 ROB#2r1 ROB#4
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes add ROB#4 [r1]2 LD no3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#2 ROB#1 [f0] CDB.V5 FP2 no
CDBT VROB#1 [f1]
allocate
ROB#1 readygrab CDB.V
ldf finished1. set “ready-in-ROB” bit2. write result to ROB3. CDB broadcast
EECS 470Lecture 7 Slide 33
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 5ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5
3 stf f2,Z(r1) 4 addi r1,4,r1 r1 c5
t 5 ldf X(r1),f1 f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5f2 ROB#2r1 ROB#4
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes add ROB#4 [r1]2 LD yes ldf ROB#5 ROB#43 ST yes stf ROB#3 ROB#2 [r1]4 FP1 no5 FP2 no
CDBT V
allocate
free
ldf retires1. write ROB result to regfile
EECS 470Lecture 7 Slide 34
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 6ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5+
3 stf f2,Z(r1) 4 addi r1,4,r1 r1 c5 c65 ldf X(r1),f1 f1
t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#5 ROB#43 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no
CDBT V
allocate
free
EECS 470Lecture 7 Slide 35
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe,
Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 7ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 c4 c5+
3 stf f2,Z(r1) 4 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 c7
t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD yes ldf ROB#5 ROB#4 CDB.V3 ST yes stf ROB#3 ROB#2 [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no
CDBT VROB#4 [r1]
ROB#4 ready
grab CDB.V
stall D (no free ST RS)
EECS 470Lecture 7 Slide 36
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 8ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c4h 2 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
3 stf f2,Z(r1) c84 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 c7 c8
t 6 mulf f0,f1,f2 f27 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 ROB#2 [f2] [r1]4 FP1 yes mulf ROB#6 ROB#5 [f0]5 FP2 no
CDBT VROB#2 [f2]
ROB#2 readygrab CDB.V
stall R for addi (in-order)
ROB#2 invalid in MapTabledon’t set “ready-in-ROB”
EECS 470Lecture 7 Slide 37
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe,
Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 9ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
h 3 stf f2,Z(r1) c8 c94 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9
t 7 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 yes mulf ROB#6 ROB#5 [f0] CDB.V5 FP2 no
CDBT VROB#5 [f1]
ROB#5 ready
grab CDB.V
retire mulf
all pipe stages active at once!
free, re-allocate
EECS 470Lecture 7 Slide 38
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 10ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
h 3 stf f2,Z(r1) c8 c9 c104 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9 c10
t 7 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 no5 FP2 no
CDBT V
free
EECS 470Lecture 7 Slide 39
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 11ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5 c83 stf f2,Z(r1) c8 c9 c10
h 4 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9 c10
t 7 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 no5 FP2 no
CDBT V
retire stf
EECS 470Lecture 7 Slide 40
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Precise State in P6Point of ROB is maintaining precise state
• How does that work?• Easy as 1,2,3
1. Wait until last good insn retires, first bad insn at ROB head2. Clear contents of ROB, RS, and Map Table3. Start over
• Works because zero (0) means the right thing…• 0 in ROB/RS ® entry is empty• Tag == 0 in Map Table ® register is in regfile
• …and because regfile and D$ writes take place at R• Example: page fault in first stf
EECS 470Lecture 7 Slide 41
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 9 (with precise state)ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
h 3 stf f2,Z(r1) c8 c94 addi r1,4,r1 r1 [r1] c5 c6 c75 ldf X(r1),f1 f1 [f1] c7 c8 c96 mulf f0,f1,f2 f2 c9
t 7 stf f2,Z(r1)
Map TableReg T+f0f1 ROB#5+f2 ROB#6r1 ROB#4+
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#7 ROB#6 ROB#4.V4 FP1 yes mulf ROB#6 ROB#5 [f0] CDB.V5 FP2 no
CDBT VROB#5 [f1]
PAGE FAULT
EECS 470Lecture 7 Slide 42
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 10 (with precise state)ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c83 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1f2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST no4 FP1 no5 FP2 no
CDBT V
faulting insn at ROB head?CLEAR EVERYTHING
EECS 470Lecture 7 Slide 43
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 11 (with precise state)ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
ht 3 stf f2,Z(r1) 4 addi r1,4,r15 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1f2r1
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU no2 LD no3 ST yes stf ROB#3 [f4] [r1]4 FP1 no5 FP2 no
CDBT V
START OVER(after OS fixes page fault)
EECS 470Lecture 7 Slide 44
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6: Cycle 12 (with precise state)ROBht # Insn R V S X C
1 ldf X(r1),f1 f1 [f1] c2 c3 c42 mulf f0,f1,f2 f2 [f2] c4 c5+ c8
h 3 stf f2,Z(r1) c12t 4 addi r1,4,r1 r1
5 ldf X(r1),f16 mulf f0,f1,f27 stf f2,Z(r1)
Map TableReg T+f0f1f2r1 ROB#4
Reservation Stations# FU busy op T T1 T2 V1 V21 ALU yes addi ROB#4 [r1]2 LD no3 ST yes stf ROB#3 [f4] [r1]4 FP1 no5 FP2 no
CDBT V
EECS 470Lecture 7 Slide 45
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 PerformanceIn other words: what is the cost of precise state?
+ In general: same performance as “plain” Tomasulo• ROB is not a performance device• Maybe a little better (RS freed earlier ® fewer struct hazards)
– Unless ROB is too small• In which case ROB struct hazards become a problem
• Rules of thumb for ROB size• At least N (width) * number of pipe stages between D and R• At least N * thit-L2
• Can add a factor of 2 to both if you want• What is the rationale behind these?
EECS 470Lecture 7 Slide 46
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
P6 (Tomasulo+ROB) ReduxPopular design for a while
• (Relatively) easy to implement correctly• Anything goes wrong (mispredicted branch, fault, interrupt)?• Just clear everything and start again
• Examples: Intel PentiumPro, IBM/Motorola PowerPC, AMD K6
Actually making a comeback…• Examples: Intel PentiumM
But went away for a while, why?
EECS 470Lecture 7 Slide 47
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
The Problem with P6
Problem for high performance implementations– Too much value movement (regfile/ROB®RS®ROB®regfile)– Multi-input muxes, long buses complicate routing and slow clock
value
V1 V2
FU
T+
T2T1Top========
Map Table
RS
CD
B.V
CD
B.T
Dispatch
Regfile
T
========
R value
ROB
HeadRetire
TailDispatch