
ECE ECE1773 Spring ‘02© A. Moshovos (Toronto)

Simplescalar’s out-of-order simulator (v3)
ECE1773
Andreas Moshovos

Visit www.simplescalar.com for additional info

Simplescalar was developed by Todd Austin, now at Michigan. The first version was written while at UWisconsin. It builds on the experience with other simulators that existed at the time at UWisc. Introduced many simulation speed enhancements.

Can be used for free for academic purposes.


What is sim-outorder

• Approximate model of dynamically scheduled processor
• Simulates:
  – I and D caches
  – Branch prediction
  – I and D TLBs (constant latency)
  – Combined Reorder buffer and scheduler
  – Register renaming
  – Support for speculative execution after branches
  – Load/Store scheduler


How is sim-outorder structured

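[Figure: block diagram of the simulated machine. Pipeline stages: fetch, disp., sched., mem sched., exec, mem, WB, commit. Supporting structures: bpred, I-cache (L1), I-TLB, U-cache, D-cache (L1), D-TLB, Main Memory (virtual).]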


Main Simulator Loop
• sim_main: forever do
  – ruu_commit()
  – ruu_release_fu()
    • Internal bookkeeping of which functional units are available
  – ruu_writeback()
  – lsq_refresh()
    • Load/store scheduler
  – ruu_issue()
    • Non-load/store instruction scheduler
  – ruu_dispatch()
  – ruu_fetch()
• These correspond to the green boxes on the previous slide
• Every iteration is a single cycle: the sim_cycle variable counts them
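The loop order can be illustrated with a toy version. This is a minimal sketch, not the simulator's code: the stage functions are stubs that only record the order in which they are called, and `trace`, `record` and `run_cycles` are hypothetical names. Only the stage names and `sim_cycle` come from the slide. Stages are called from commit back to fetch, so within one iteration each stage works on what the stage behind it produced in an earlier cycle.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer: each stub appends one letter when called. */
static char trace[64];
static unsigned long sim_cycle = 0;   /* counts simulated cycles */

static void record(char c) {
    size_t n = strlen(trace);
    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

/* Stubs standing in for the real pipeline stages. */
static void ruu_commit(void)     { record('C'); }
static void ruu_release_fu(void) { record('R'); }
static void ruu_writeback(void)  { record('W'); }
static void lsq_refresh(void)    { record('L'); }
static void ruu_issue(void)      { record('I'); }
static void ruu_dispatch(void)   { record('D'); }
static void ruu_fetch(void)      { record('F'); }

/* One iteration models one cycle; stages run from commit back to fetch. */
static void run_cycles(unsigned long n) {
    for (unsigned long i = 0; i < n; i++) {
        ruu_commit();
        ruu_release_fu();
        ruu_writeback();
        lsq_refresh();
        ruu_issue();
        ruu_dispatch();
        ruu_fetch();
        sim_cycle++;
    }
}
```

Calling `run_cycles(2)` leaves the per-cycle call order in `trace`, repeated once per simulated cycle.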


ruu_fetch()
• Fetch and predict up to ruu_decode_width instructions
• Place them into fetch_data[] buffer
• Inputs: 2 globals
  – fetch_regs_PC: what fetch thinks is the next PC to fetch from
  – fetch_pred_PC: what is the predicted PC for after this instruction
• Output: fetch_data[] buffer
  – fetch_tail used by ruu_fetch()
  – fetch_head used by ruu_dispatch()
  – fetch_num = total number of occupied fetch_data entries
  – ruu_ifq_size = total number of fetch_data entries
• Fetch places insts and Dispatch consumes them
• On misprediction:
  – PCs are reset to appropriate values and fetch_data is drained


ruu_fetch() - loop
• If not a bogus address
• Access I-Cache with fetch_regs_PC, get latency of access
• Access I-TLB: hit/miss
• Determine overall latency as max of the two
• If prediction is enabled:
• Access predictor and get fetch_pred_PC plus a back-pointer to predictor entry
• Instruction, PCs and prediction info go into fetch_data[fetch_tail]
• fetch_num++, fetch_tail++ MOD ruu_ifq_size
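The head/tail/count bookkeeping described above is a standard circular buffer. The following is a minimal sketch under stated assumptions: `IFQ_SIZE` stands in for ruu_ifq_size, the record is reduced to two fields, and `ifq_push`, `ifq_pop` and `ifq_flush` are hypothetical helper names that do not exist in the simulator.

```c
#include <assert.h>

#define IFQ_SIZE 4   /* stands in for ruu_ifq_size */

/* Reduced record: the real fetch_rec carries more fields. */
struct fetch_rec { unsigned pc; unsigned pred_pc; };

static struct fetch_rec fetch_data[IFQ_SIZE];
static int fetch_head = 0, fetch_tail = 0, fetch_num = 0;

/* Producer side (the role of ruu_fetch): returns 0 when the queue is full. */
static int ifq_push(unsigned pc, unsigned pred_pc) {
    assert(fetch_num >= 0 && fetch_num <= IFQ_SIZE);
    if (fetch_num == IFQ_SIZE)
        return 0;
    fetch_data[fetch_tail].pc = pc;
    fetch_data[fetch_tail].pred_pc = pred_pc;
    fetch_tail = (fetch_tail + 1) % IFQ_SIZE;   /* fetch_tail++ MOD size */
    fetch_num++;
    return 1;
}

/* Consumer side (the role of ruu_dispatch): returns 0 when empty. */
static int ifq_pop(struct fetch_rec *out) {
    if (fetch_num == 0)
        return 0;
    *out = fetch_data[fetch_head];
    fetch_head = (fetch_head + 1) % IFQ_SIZE;
    fetch_num--;
    return 1;
}

/* Misprediction recovery: drain the queue in one step. */
static void ifq_flush(void) {
    fetch_head = fetch_tail = fetch_num = 0;
}
```

Keeping an explicit count next to head and tail avoids the usual full-versus-empty ambiguity when the two indices are equal.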


I-Cache Interface – cache.[ch]
• cache_access (*cache_il1, Read/Write, Address, *IObuffer, nbytes, CycleNow, UserData, *repl_address)
  – IObuffer, UserData and repl_address are usually NULL
  – See cache.h
• What it returns is a latency in cycles
  – Checks if hit
  – If miss, accesses L2 which in turn may access main memory
  – Look for il1_access_fn() and ul2_access_fn()
• An approximation:
  – No real, event-driven simulation of the memory system
• Be careful how you interpret the simulation results
• I-TLB is also simulated as a cache with few entries and a constant, still large, miss latency
• The cache does not hold memory data, only the tags of cached blocks; the simulator accesses memory directly to get insts (an optimization, be careful)
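A tags-only cache that answers with a latency can be sketched in a few lines. This is an illustration of the idea, not the cache.c implementation: the direct-mapped organization, the sizes, the latencies and the names `tag_cache` and `tag_cache_access` are all assumptions made for the example.

```c
#include <assert.h>

#define NSETS      8    /* assumption: direct-mapped, 8 sets */
#define BLOCK_BITS 5    /* assumption: 32-byte blocks */
#define HIT_LAT    1    /* assumed latencies, in cycles */
#define MISS_LAT   10

/* Only tags and valid bits are stored; there is no data array. */
struct tag_cache {
    int      valid[NSETS];
    unsigned tag[NSETS];
};

/* Returns the latency of the access and updates the tag store. */
static int tag_cache_access(struct tag_cache *c, unsigned addr) {
    unsigned block = addr >> BLOCK_BITS;
    unsigned set = block % NSETS;
    unsigned tag = block / NSETS;

    assert(c != 0);
    if (c->valid[set] && c->tag[set] == tag)
        return HIT_LAT;

    /* Miss: install the new tag. A real model would add the latency
       of the next level here instead of a constant. */
    c->valid[set] = 1;
    c->tag[set] = tag;
    return MISS_LAT;
}
```

Because the function only returns a number of cycles, overlapping misses and contention are not modeled, which is the approximation the slide warns about.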


Branch Prediction Interface – bpred.[ch]

• bpred_lookup (*pred, PC, *target_address, opcode, Call?, Return?, *back-pointer for updates, *back-pointer for stack updates)

• Returns a predicted PC
  – Can check whether it is taken or not by comparing with the next sequential PC
    • Next sequential PC = PC + sizeof (md_inst_t)
• Eventually, call bpred_update (*pred, PC, actual target_address, taken?, pred_taken?, opcode, back_pointer, stack back-pointer)
  – Can be called at writeback or commit
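The taken/not-taken check on the predicted PC amounts to one comparison. A minimal sketch, assuming 32-bit addresses and an 8-byte instruction encoding; the typedefs below are stand-ins for the simulator's own, and `next_seq_pc` and `predicted_taken` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t md_addr_t;                    /* assumption: 32-bit addresses */
typedef struct { uint32_t a, b; } md_inst_t;   /* assumption: 8-byte instructions */

/* Next sequential PC, i.e. the fall-through address. */
static md_addr_t next_seq_pc(md_addr_t pc) {
    return (md_addr_t)(pc + sizeof(md_inst_t));
}

/* A prediction counts as taken when it differs from the fall-through PC. */
static int predicted_taken(md_addr_t pc, md_addr_t pred_pc) {
    assert((pc & (sizeof(md_inst_t) - 1)) == 0);   /* PC must be aligned */
    return pred_pc != next_seq_pc(pc);
}
```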


Fetch buffer: fetch_data[]

struct fetch_rec {
  md_inst_t IR;                       /* complete instruction */
  md_addr_t regs_PC;                  /* current PC */
  md_addr_t pred_PC;                  /* predicted PC */
  struct bpred_update_t dir_update;   /* bpred back-pointer */
  int stack_recover_idx;              /* stack back-pointer */
  unsigned int ptrace_seq;            /* print trace sequence id */
};

• fetch_tail: ruu_fetch writes there
• fetch_head: ruu_dispatch reads from there
• fetch_num: how many valid
• ruu_ifq_size: max entries


ruu_fetch()

for (i=0, branch_cnt=0;
     /* fetch up to as many instruction as the DISPATCH stage can decode */
     i < (ruu_decode_width * fetch_speed)
     /* fetch until IFETCH -> DISPATCH queue fills */
     && fetch_num < ruu_ifq_size
     /* and no IFETCH blocking condition encountered */
     && !done;
     i++)
  {
    MAIN LOOP
  }

done is used for enforcing fetch break conditions. Currently this happens only when the number of branches exceeds fetch_speed.


ruu_fetch() – Invalid Address Check

if (ld_text_base <= fetch_regs_PC
    && fetch_regs_PC < (ld_text_base+ld_text_size)
    && !(fetch_regs_PC & (sizeof(md_inst_t)-1)))
  {
    /* read instruction from memory */
    MD_FETCH_INST(inst, mem, fetch_regs_PC);


ruu_fetch() – I-Cache Access

if (cache_il1)
  {
    /* access the I-cache */
    lat = cache_access(cache_il1, Read, IACOMPRESS(fetch_regs_PC),
                       NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
                       NULL, NULL);
    if (lat > cache_il1_lat)
      last_inst_missed = TRUE;
  }
if (itlb)
  tlb_lat = cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC)
...
lat = MAX(tlb_lat, lat);
if (lat != cache_il1_lat)
  {
    /* I-cache miss, block fetch until it is resolved */
    ruu_fetch_issue_delay += lat - 1;
    break;
  }


sim_main() ruu_fetch() code

if (!ruu_fetch_issue_delay)
  ruu_fetch();
else
  ruu_fetch_issue_delay--;
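The interaction between the miss latency and this countdown can be traced with a small stand-alone model. This is a sketch only: `ruu_fetch` here is a stub, and `next_lat`, `fetch_calls`, `IL1_HIT_LAT` and `fetch_stage_one_cycle` are hypothetical names introduced for the example; just `ruu_fetch_issue_delay` and the if/else come from the slides.

```c
#include <assert.h>

#define IL1_HIT_LAT 1                  /* assumed I-cache hit latency */

static int ruu_fetch_issue_delay = 0;  /* cycles fetch must still wait */
static int fetch_calls = 0;            /* hypothetical counter for the sketch */
static int next_lat = IL1_HIT_LAT;     /* hypothetical: latency seen by next fetch */

/* Stub fetch: on a miss, charge the extra cycles to the delay counter. */
static void ruu_fetch(void) {
    fetch_calls++;
    if (next_lat != IL1_HIT_LAT)
        ruu_fetch_issue_delay += next_lat - 1;
}

/* What the main loop does for the fetch stage each cycle. */
static void fetch_stage_one_cycle(void) {
    assert(ruu_fetch_issue_delay >= 0);
    if (!ruu_fetch_issue_delay)
        ruu_fetch();
    else
        ruu_fetch_issue_delay--;
}
```

With a 4-cycle miss, fetch is called once, skipped for the next three cycles while the counter drains, and called again on the following cycle.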


ruu_dispatch()
• Get next inst from fetch buffer
• Functionally execute the instruction
• Split load/stores into
  – 1. Address calculation
  – 2. Memory operation
• Rename input dependences
• Rename target register
• Place into scheduler RUU[] and load/store scheduler LSQ[] if necessary
• Determine if misprediction
• Issue if ready


Functional and timing execution

• Ignore mispredictions for the time being
• Simplescalar executes all instructions in-order during dispatch
  – They update registers and memory at that time
• Then it tries to determine when they would actually execute taking into consideration dependences and latencies
• This is simulation so we can do this
  – Pros: fast, easy to debug
  – Cons: the timing model can be wrong and the simulation will still produce correct program results, so timing errors can go unnoticed


Handling Mispredictions

• Two modes: correct & mis-speculated
• ruu_dispatch switches to the 2nd when it decodes a mispredicted branch
• It knows about it because it executes the branch and figures out whether the prediction is correct
• Global “spec_mode” is 1 when in mis-speculated mode
• Switch back to correct when branch is resolved


Handling Mispredictions
• Keep two states: correct and mis-speculated
  – For regs there is regs_R[] and spec_regs_R[] (and _F)
  – For memory, there is mem_access and spec_mem_access
  – Speculative memory updates are kept in a temporary hash table
    • Loads access this table first and then memory if needed
    • Stores only write to it when in spec mode
• If in correct state access the correct state
• If in spec_mode access the mis-speculated state
• Effect: No need to restore state
  – Incorrect, speculative updates do not clobber the correct state
• When squashing we simply return to the correct state
  – i.e., disregard the spec. hash mem table.
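The two-state memory scheme can be sketched as follows. This is an illustration under stated assumptions, not the simulator's spec_mem_access: memory is a tiny word array, the hash table is a small open-addressed table with linear probing, and `spec_find`, `store_word`, `load_word` and `squash` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS  16
#define SPEC_SLOTS 8    /* tiny table; overflow is not handled in this sketch */

static uint32_t mem[MEM_WORDS];   /* correct (architectural) memory */
static int spec_mode = 0;

struct spec_ent { int valid; unsigned addr; uint32_t val; };
static struct spec_ent spec_tab[SPEC_SLOTS];

/* Look up addr in the speculative table, optionally creating an entry. */
static struct spec_ent *spec_find(unsigned addr, int create) {
    for (unsigned i = 0; i < SPEC_SLOTS; i++) {
        struct spec_ent *e = &spec_tab[(addr + i) % SPEC_SLOTS];
        if (e->valid && e->addr == addr)
            return e;
        if (!e->valid) {
            if (!create)
                return 0;
            e->valid = 1;
            e->addr = addr;
            return e;
        }
    }
    return 0;
}

/* Stores go to the table in spec mode, to real memory otherwise. */
static void store_word(unsigned addr, uint32_t val) {
    assert(addr < MEM_WORDS);
    if (spec_mode) {
        struct spec_ent *e = spec_find(addr, 1);
        assert(e != 0);
        e->val = val;
    } else {
        mem[addr] = val;
    }
}

/* Loads in spec mode check the table first, then fall back to memory. */
static uint32_t load_word(unsigned addr) {
    assert(addr < MEM_WORDS);
    if (spec_mode) {
        struct spec_ent *e = spec_find(addr, 0);
        if (e)
            return e->val;
    }
    return mem[addr];
}

/* Squash: discard the table; the correct state was never touched. */
static void squash(void) {
    for (unsigned i = 0; i < SPEC_SLOTS; i++)
        spec_tab[i].valid = 0;
    spec_mode = 0;
}
```

Nothing is copied back on a squash, which is why no restore step is needed.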


ruu_dispatch(): reading from fetch buffer

inst = fetch_data[fetch_head].IR;
regs.regs_PC = fetch_data[fetch_head].regs_PC;
pred_PC = fetch_data[fetch_head].pred_PC;
dir_update_ptr = &(fetch_data[fetch_head].dir_update);
stack_recover_idx = fetch_data[fetch_head].stack_recover_idx;
pseq = fetch_data[fetch_head].ptrace_seq;

Ignore all pseq: they are for a “debugging/tracing” facility.


Scheduler Structure
• Circular buffer named RUU
• Each entry contains
  – The instruction, PC and pred_PC
  – Valid bits for input registers
  – A linked list of consumers per target register
  – Branch prediction back-pointers
  – Status flags, e.g., what state is this in, is it an address op
• An instruction can execute when all source registers are available: readyq in ruu_issue()
• On writeback:
  – Walk the target list, set the ready bits of consumers and place them on readyq if they become ready


Scheduler structure: RUU_station

struct RUU_station {
  md_inst_t IR;                         /* instruction bits */
  enum md_opcode op;                    /* decoded instruction opcode */
  md_addr_t PC, next_PC, pred_PC;       /* inst PC, next PC, predicted PC */
  int in_LSQ;                           /* non-zero if op is in LSQ */
  int ea_comp;                          /* non-zero if op is an addr comp */
  int recover_inst;                     /* start of mis-speculation? */
  int stack_recover_idx;                /* non-speculative TOS for RSB pred */
  struct bpred_update_t dir_update;     /* bpred direction update info */
  int spec_mode;                        /* non-zero if issued in spec_mode */
  md_addr_t addr;                       /* effective address for ld/st's */
  INST_TAG_TYPE tag;                    /* RUU slot tag, increment to squash operation */
  INST_SEQ_TYPE seq;                    /* used to sort the ready list and tag inst */
  int queued;                           /* operands ready and queued */
  int issued;                           /* operation is/was executing */
  int completed;                        /* operation has completed execution */
  int onames[MAX_ODEPS];                /* output logical names (NA=unused) */
  struct RS_link *odep_list[MAX_ODEPS]; /* chains to consuming operations */
  int idep_ready[MAX_IDEPS];            /* input operand ready? */
  …


Scheduler State

• RUU[]: in-order instructions to be executed
  – Allocated at dispatch
  – Deallocated at commit or on squash (ruu_recover())
• RUU_head, RUU_tail, RUU_num, RUU_size
• LSQ[]: in order loads and stores
  – Same as above
  – Scheduling is done by comparing addresses
  – More on this soon


Determining Dependences

• ruu_link_idep(rs, /* idep_ready[] index */0, reg_name);
• ruu_install_odep (rs, /* odep_list[] index*/0, reg_name);
• Rename table: CREATE_VECTOR(reg_name)
  – Returns pointer to RUU entry of producer or NULL if result is available
    • Actual data type is CV_link (RUU_station *, odep_num)
• SET_CREATE_VECTOR(reg_name, RUU station)
  – Make this RUU_station the current producer of reg_name
• Two copies of the create vector:
  – create_vector and spec_create_vector


Renaming Non-Load/Store Instructions

ruu_link_idep(rs, /* idep_ready[] index */0, in1);
ruu_link_idep(rs, /* idep_ready[] index */1, in2);
ruu_link_idep(rs, /* idep_ready[] index */2, in3);

ruu_install_odep(rs, /* odep_list[] index */0, out1);
ruu_install_odep(rs, /* odep_list[] index */1, out2);


Renaming loads/stores

/* address computation, placed in the RUU */
ruu_link_idep(rs, /* idep_ready[] index */0, NA);
ruu_link_idep(rs, /* idep_ready[] index */1, in2);
ruu_link_idep(rs, /* idep_ready[] index */2, in3);

ruu_install_odep(rs, /* odep_list[] index */0, DTMP);
ruu_install_odep(rs, /* odep_list[] index */1, NA);

/* memory operation, placed in the LSQ; waits for the address via DTMP */
ruu_link_idep(lsq, /* idep_ready[] index */STORE_OP_INDEX/* 0 */, in1);
ruu_link_idep(lsq, /* idep_ready[] index */STORE_ADDR_INDEX/* 1 */, DTMP);
ruu_link_idep(lsq, /* idep_ready[] index */2, NA);

ruu_install_odep(lsq, /* odep_list[] index */0, out1);
ruu_install_odep(lsq, /* odep_list[] index */1, out2);


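ruu_link_idep (rs, idep_num, idep_name)

struct CV_link head;
struct RS_link *link;

/* unused input: nothing to wait for */
if (idep_name == NA)
  {
    rs->idep_ready[idep_num] = TRUE;
    return;
  }

/* look up the current producer of this register */
head = CREATE_VECTOR(idep_name);

/* no producer in flight: the value is already available */
if (!head.rs)
  {
    rs->idep_ready[idep_num] = TRUE;
    return;
  }

rs->idep_ready[idep_num] = FALSE;

/* add this instruction to the producer's consumer list */
RSLINK_NEW(link, rs);
link->x.opnum = idep_num;
link->next = head.rs->odep_list[head.odep_num];
head.rs->odep_list[head.odep_num] = link;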


CREATE_VECTOR(N): Register Rename Table Read

(BITMAP_SET_P(use_spec_cv, CV_BMAP_SZ, (N)) ? spec_create_vector[N] : create_vector[N])

use_spec_cv(N) is set when we rename the target register N while in spec_mode

It is a bit vector: one bit per register


ruu_install_odep(rs, odep_num, odep_name)

struct CV_link cv;

/* unused output: nothing to rename */
if (odep_name == NA)
  {
    rs->onames[odep_num] = NA;
    return;
  }

rs->onames[odep_num] = odep_name;
rs->odep_list[odep_num] = NULL;

/* indicate this operation is latest creator of ODEP_NAME */
CVLINK_INIT(cv, rs, odep_num);
SET_CREATE_VECTOR(odep_name, cv);


SET_CREATE_VECTOR(odep_name, cv)

• Set the current producer of register odep_name to the RUU entry stored in the cv

SET_CREATE_VECTOR(N, L)
  if (spec_mode)
    {
      BITMAP_SET(use_spec_cv, CV_BMAP_SZ, (N));
      spec_create_vector[N] = (L);
    }
  else
    create_vector[N] = (L);

No need to keep old mapping around since we never have to restore
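Reading and writing the two create vectors can be put together in one small model. This is a minimal sketch, assuming eight registers and a single unsigned word as the bitmap; `cv_read`, `cv_write` and `cv_recover` are hypothetical function names standing in for the CREATE_VECTOR and SET_CREATE_VECTOR macros and for the bitmap reset done on recovery.

```c
#include <assert.h>

#define NREGS 8                      /* assumption: 8 logical registers */

struct RUU_station { int id; };      /* reduced to one field for the sketch */
struct CV_link { struct RUU_station *rs; int odep_num; };

static struct CV_link create_vector[NREGS];
static struct CV_link spec_create_vector[NREGS];
static unsigned use_spec_cv = 0;     /* one bit per register */
static int spec_mode = 0;

/* Rename table read: the bitmap selects which vector holds the mapping. */
static struct CV_link cv_read(int n) {
    assert(n >= 0 && n < NREGS);
    return (use_spec_cv & (1u << n)) ? spec_create_vector[n]
                                     : create_vector[n];
}

/* Rename table write: speculative writes never touch the correct vector. */
static void cv_write(int n, struct CV_link l) {
    assert(n >= 0 && n < NREGS);
    if (spec_mode) {
        use_spec_cv |= 1u << n;
        spec_create_vector[n] = l;
    } else {
        create_vector[n] = l;
    }
}

/* Recovery: clearing the bitmap makes the correct mappings visible again. */
static void cv_recover(void) {
    use_spec_cv = 0;
    spec_mode = 0;
}
```

A register renamed only in spec_mode falls back to its pre-speculation producer as soon as the bitmap is cleared, so no old mapping has to be saved.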


ruu_dispatch(): determining ready to issue insts

if (OPERANDS_READY(rs))
  {
    /* eff addr computation ready, queue it on ready list */
    readyq_enqueue(rs);
  }
/* issue may continue when the load/store is issued */
RSLINK_INIT(last_op, lsq);   // for in-order simulation

/* issue stores only, loads are issued by lsq_refresh() */
if (((MD_OP_FLAGS(op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE))
    && OPERANDS_READY(lsq))
  {
    /* put operation on ready list, ruu_issue() issue it later */
    readyq_enqueue(lsq);
  }


Misprediction Detection

if (MD_OP_FLAGS(op) & F_CTRL)
  {
    sim_num_branches++;
    if (pred && bpred_spec_update == spec_ID)
      {
        /* update predictor if configured for spec. updates */
      }
  }

if (pred_PC != regs.regs_NPC && !fetch_redirected)
  {
    spec_mode = TRUE;
    rs->recover_inst = TRUE;
    recover_PC = regs.regs_NPC;
  }


ruu_issue(): Dynamic scheduling of non loads/stores

• Walk the readyq
• Try to get resources (FUs)
• Get latency of execution
• Put an entry into the eventq for the completion time
• If cannot execute place back into readyq
• eventq is serviced by ruu_writeback


Who places instructions in readyq?
• In readyq means the instruction is ready to issue
• From dispatch:
  – Non-load/store if all sources are available
    • This includes the address component of lds/sts
  – Stores if data is available. Recall address computation is a separate “instruction”
• From writeback:
  – Producer writes last result a consumer waits for
• From lsq_refresh
  – Called every cycle: Load is ready
    • Address is known, all preceding store addresses are known and there is no conflict with unavailable store data


ruu_issue(): main loop

• Get next entry from readyq
• If still valid (RSLINK_VALID(rs)) try to execute
• If store: complete instantaneously, nothing to produce
• fu = res_get (fu_pool, MD_OP_class (rs->op))
  – Get functional unit for instruction based on operation
• Get latency of execution
  – For loads access data cache and tlb
• Queue event in eventq for completion (ruu_writeback)
  – eventq_queue_event(rs, sim_cycle + latency);
• If cannot execute place back in readyq
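Scheduling a completion event at sim_cycle + latency needs a queue ordered by time. The following is a minimal sketch, assuming a singly linked list sorted by completion cycle; `struct event`, `eventq_queue` and `eventq_next` are hypothetical names and the integer `id` stands in for the RUU/LSQ entry the real event would point to.

```c
#include <assert.h>
#include <stdlib.h>

struct event {
    unsigned long when;   /* cycle at which the operation completes */
    int id;               /* stands in for the RUU/LSQ entry */
    struct event *next;
};

static struct event *eventq = NULL;

/* Insert so that the list stays sorted by completion time;
   events with equal times keep their insertion order. */
static void eventq_queue(int id, unsigned long when) {
    struct event *ev = malloc(sizeof *ev);
    struct event **pp = &eventq;

    assert(ev != NULL);
    ev->id = id;
    ev->when = when;
    while (*pp && (*pp)->when <= when)
        pp = &(*pp)->next;
    ev->next = *pp;
    *pp = ev;
}

/* Writeback side: pop the next event due at or before `now`.
   Returns its id, or -1 if nothing has completed yet. */
static int eventq_next(unsigned long now) {
    struct event *ev = eventq;
    int id;

    if (!ev || ev->when > now)
        return -1;
    eventq = ev->next;
    id = ev->id;
    free(ev);
    return id;
}
```

Because the list is sorted, the writeback side only ever has to look at the head to find out whether anything completes in the current cycle.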


ruu_issue(): Loads

• Get mem port resource
• Scan LSQ for matching preceding store
  – For the load to be executing, it must be that if there is a matching store then it has its data
  – This is called store-load forwarding
• If no match, access cache_dl1 and dtlb
• Get latency to be the max of the two


ruu_issue(): High-Level Structure
• Temporary list: node = readyq; readyq = NULL
• So long as there are issue slots available
  – Get next element from node
  – If still valid
    • Try to get resource
      – Determine latency
      – Schedule eventq event
    • Otherwise place back in readyq
• Place remaining nodes back into readyq (readyq_enqueue() sorted by latency and age)
• Order in readyq is the implicit issue priority


lsq_refresh(): Placing loads into readyq

• LSQ uses same elements as RUU
• Scheduling is done based on addr field and availability of operands
• Scan forward (LSQ_head, counting to LSQ_num)
  – If store
    • Stop if address is unknown: loads after it should wait
    • If data unavailable record address in std_unknowns
      – Loads that need this data should wait
  – If load and all register ops are ready
    • Scan std_unknowns for match
    • Place in readyq if no match


lsq_refresh(): stores

if (!STORE_ADDR_READY(&LSQ[index]))
  break;
else if (!OPERANDS_READY(&LSQ[index]))
  std_unknowns[n_std_unknowns++] = LSQ[index].addr;
else /* STORE_ADDR_READY() && OPERANDS_READY() */
  {
    /* a later STD known hides an earlier STD unknown */
    for (j=0; j<n_std_unknowns; j++)
      {
        if (std_unknowns[j] == /* STA/STD known */LSQ[index].addr)
          std_unknowns[j] = /* bogus addr */0;
      }
  }


lsq_refresh(): Loads

if (/* load? */
    ((MD_OP_FLAGS(LSQ[index].op) & (F_MEM|F_LOAD)) == (F_MEM|F_LOAD))
    && /* queued? */!LSQ[index].queued
    && /* waiting? */!LSQ[index].issued
    && /* completed? */!LSQ[index].completed
    && /* regs ready? */OPERANDS_READY(&LSQ[index]))
  {
    for (j=0; j<n_std_unknowns; j++)
      {
        if (std_unknowns[j] == LSQ[index].addr)
          break;
      }
    if (j == n_std_unknowns)
      {
        /* no STA or STD unknown conflicts, put load on ready queue */
        readyq_enqueue(&LSQ[index]);
      }
  }
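The store and load halves of the scan can be combined into one runnable model. This is a sketch under stated assumptions: the LSQ is a plain array scanned from index 0, each entry is reduced to a few flags, and `struct lsq_ent` and `lsq_refresh_sketch` are hypothetical names. As in the code above, address 0 is used as a bogus marker, so the sketch assumes no load ever targets address 0.

```c
#include <assert.h>

#define MAX_STD_UNKNOWNS 64

enum kind { LOAD, STORE };

struct lsq_ent {
    enum kind kind;
    unsigned  addr;
    int addr_ready;   /* effective address computed? */
    int data_ready;   /* store data available (stores only) */
    int ready;        /* output: load may be placed on the ready queue */
};

static void lsq_refresh_sketch(struct lsq_ent *lsq, int n) {
    unsigned std_unknowns[MAX_STD_UNKNOWNS];
    int n_std = 0;

    assert(n <= MAX_STD_UNKNOWNS);
    for (int i = 0; i < n; i++) {
        struct lsq_ent *e = &lsq[i];
        if (e->kind == STORE) {
            if (!e->addr_ready)
                break;                        /* later loads must wait */
            if (!e->data_ready) {
                std_unknowns[n_std++] = e->addr;
            } else {
                /* a later store with data hides an earlier one without */
                for (int j = 0; j < n_std; j++)
                    if (std_unknowns[j] == e->addr)
                        std_unknowns[j] = 0;  /* bogus address */
            }
        } else if (e->addr_ready) {
            int j;
            for (j = 0; j < n_std; j++)
                if (std_unknowns[j] == e->addr)
                    break;
            if (j == n_std)
                e->ready = 1;                 /* no conflict: can issue */
        }
    }
}
```

A load is held back in exactly two cases: an older store to the same address has no data yet, or some older store does not yet know its address.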


ruu_writeback(): Producer notifies consumers
• Get next event from eventq
• If this is a recover instruction
  – Squash all that follows
    • ruu_recover(), tracer_recover() & bpred_recover()
• If branch update predictor
• Update rename table if still the creator
  – rs->spec_mode determines which one
  – Subsequent consumers can get result from register file
• Walk output dependence lists
  – If link still valid
  – Set idep_ready flags
  – If consumer becomes ready place on readyq for ruu_issue()


Recovering from Mispredictions

• The instruction with rs->recover_inst set by ruu_dispatch writes back
• ruu_recover()
  – From the end of RUU
    • Clean up output dependence lists freeing RS_links
    • Same for LSQ entry if it exists (1-to-1 correspondence with RUU entries that have rs->ea_comp set)
    • rs->tag++ (invalidate all RS_links to this RUU entry; could be that we linked to a producer that will not be squashed)
  – Clear use_spec_cv (create vector)
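The tag trick makes stale links harmless without hunting them down. A minimal sketch of the idea, with reduced structures: `struct station`, `struct rs_link`, `link_init`, `link_valid`, `squash_station` and `wake_consumers` are hypothetical names, not the simulator's macros.

```c
#include <assert.h>

struct station {
    unsigned tag;          /* bumped when the entry is squashed */
    int idep_ready[3];     /* input operand ready flags */
};

struct rs_link {
    struct station *rs;    /* consumer this link points at */
    unsigned tag;          /* snapshot of rs->tag taken at link time */
    int opnum;             /* which input operand of the consumer */
    struct rs_link *next;
};

static void link_init(struct rs_link *l, struct station *rs, int opnum) {
    l->rs = rs;
    l->tag = rs->tag;
    l->opnum = opnum;
    l->next = 0;
}

/* A link is valid only while the tags still match. */
static int link_valid(const struct rs_link *l) {
    return l->tag == l->rs->tag;
}

/* Squashing an entry invalidates every link that points at it. */
static void squash_station(struct station *rs) {
    rs->tag++;
}

/* Writeback: mark operands ready, skipping links to squashed consumers.
   Returns the number of consumers notified. */
static int wake_consumers(struct rs_link *list) {
    int n = 0;
    for (struct rs_link *l = list; l; l = l->next) {
        if (!link_valid(l))
            continue;
        assert(l->opnum >= 0 && l->opnum < 3);
        l->rs->idep_ready[l->opnum] = 1;
        n++;
    }
    return n;
}
```

This covers the case named on the slide: a surviving producer may still hold a link to a squashed consumer, and the tag mismatch makes writeback skip it.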


tracer_recover()

• Clear use_spec_R etc.
  – Bitmaps indicating where register values are
  – Set when writing to register file in spec_mode
• Cleanup speculative memory store state
• Reset fetch stage by emptying fetch_data
  – fetch_tail = fetch_head = fetch_num = 0
• For bpred_recover look into bpred.c


ruu_commit()

• Scan starting from the oldest inst in RUU (RUU_head)
• If completed then try to commit
• If store get memory port and write to memory
  – Fail if can’t get resource
  – Does not simulate writebuffer
  – Access data cache
• If load/store release LSQ entry
• If branch update predictor if so configured
• Release RUU entry
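In-order commit from the head of the RUU can be sketched as below. This is a reduced model, not the simulator's ruu_commit(): entries carry only two flags, the LSQ, the data cache access and the predictor update are left out, and `commit_sketch` with its `width` and `mem_ports` parameters is a hypothetical interface.

```c
#include <assert.h>

#define RUU_SIZE 4

struct station { int completed; int is_store; };

static struct station RUU[RUU_SIZE];
static int RUU_head = 0, RUU_num = 0;

/* Commit up to `width` instructions this cycle. Each store needs one of
   `mem_ports` memory ports. Returns the number of instructions committed. */
static int commit_sketch(int width, int mem_ports) {
    int committed = 0;

    assert(width >= 0 && mem_ports >= 0);
    while (RUU_num > 0 && committed < width) {
        struct station *rs = &RUU[RUU_head];
        if (!rs->completed)
            break;                 /* in order: stop at first incomplete */
        if (rs->is_store) {
            if (mem_ports == 0)
                break;             /* no port: retry next cycle */
            mem_ports--;
        }
        RUU_head = (RUU_head + 1) % RUU_SIZE;   /* release the RUU entry */
        RUU_num--;
        committed++;
    }
    return committed;
}
```

A completed instruction behind an incomplete one stays in the RUU, which is what keeps commit in program order.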


How is sim-outorder structured

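[Figure: block diagram of the simulated machine. Pipeline stages: fetch, disp., sched., mem sched., exec, mem, WB, commit. Supporting structures: bpred, I-cache (L1), I-TLB, U-cache, D-cache (L1), D-TLB, Main Memory (virtual).]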


fetch_data[]

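[Figure: fetch_data[] as a circular buffer of ruu_ifq_size entries, each holding IR, regs_PC, pred_PC and bpred ptrs. ruu_fetch() inserts at fetch_tail; ruu_dispatch() removes at fetch_head; fetch_num counts occupied entries. ruu_writeback, through tracer_recover, flushes the buffer when resolving a mispredicted branch.]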


struct RUU_station RUU[WINDOW]

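[Figure: RUU as a circular buffer of RUU_size entries. ruu_dispatch() inserts at RUU_tail; ruu_commit() removes at RUU_head; RUU_num counts occupied entries. ruu_writeback, through ruu_recover, flushes entries when resolving a mispredicted branch.]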


struct RUU_station Scheduling Related Entries

[Figure: scheduling-related fields of struct RUU_station.
• idep_ready[0], idep_ready[1], idep_ready[2]: input ready flags; all must be 1 to be ready
• onames[0], onames[1]: output registers
• odep_list[0], odep_list[1]: consumer lists of struct RS_link nodes (next, &RUU[consumer], tag, x.opnum)
• tag: unique ID
Annotations: ruu_dispatch() sets fields through ruu_link_idep and ruu_install_odep; ruu_writeback() sets ready flags and walks and frees the consumer lists; ruu_writeback and ruu_recover invalidate.]


LSQ: Load/Store Scheduler

• Same as RUU

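[Figure: LSQ as a circular buffer of LSQ_size entries. ruu_dispatch() inserts at LSQ_tail; ruu_commit() removes at LSQ_head; LSQ_num counts occupied entries. ruu_writeback(), through ruu_recover, flushes entries when resolving a mispredicted branch.]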


Register Renaming Structures

[Figure: create_vector and spec_create_vector, each with one entry per register (reg 1, reg 2, ... reg N). Every entry links to an RUU or LSQ entry (*rs or *lsq) and one of its output registers (opnum, 0 or 1). The use_spec_cv bitmap selects which vector to use. Annotations: entries are set by ruu_install_odep in ruu_dispatch(), with use_spec_cv set when in spec_mode; ruu_writeback() points an entry back to the register file (RF); ruu_recover resets use_spec_cv to 0.]


Register State e.g., reg_R[]

[Figure: regs.regs_R and spec_regs_R, each holding one value per register (reg 1, reg 2, ... reg N), set during functional simulation. The use_spec_R bitmap selects which copy to use: it is set by ruu_dispatch() when in spec_mode and reset to 0 by tracer_recover, called from ruu_writeback().]


Ready Queue

[Figure: readyq as a linked list of RS_link nodes (*rs or *lsq, next, tag). ruu_dispatch() and ruu_writeback() insert non-loads if ready; lsq_refresh() inserts loads; ruu_issue() removes entries and tries to execute them.]


Event Queue

[Figure: eventq as a linked list of RS_link nodes (*rs or *lsq, next, x.when, tag). ruu_issue() inserts at sim_cycle + latency; ruu_writeback() removes upon completion.]


Summary of Concepts/Interfaces

• ruu_fetch to ruu_dispatch via fetch_data buffer
• ruu_dispatch executes instructions in order
  – Breaks load/store into addr and memory op
  – Links to producer of input regs
  – Renames output reg to RUU or LSQ
  – Determines if entering misprediction mode
    • Marks inst via rs->recover_inst
  – Two states: mis-speculated and correct (reg files, memory, rename tables, etc.)
  – May place insts in readyq if ready


Summary contd.
• ruu_issue:
  – Scan readyq trying to issue
  – Insts in readyq?
    • ruu_dispatch: non-loads if inputs are ready
    • lsq_refresh: loads when certain that there are no conflicts
    • ruu_writeback: producer places consumers if they become ready
  – Get fu, get latency, schedule event for writeback
• lsq_refresh
  – When loads can issue
  – Wait until all preceding stores calculate their address
  – Stall if conflict with store that has no data


Summary contd.

• ruu_writeback:
  – Producer notifies consumers of result
  – Determines if a consumer is ready and places it in readyq
  – Updates rename tables to indicate that the result is now in the register file
  – Calls recovery routines if this is a recover instruction (first mispredicted)
• ruu_commit:
  – Perform stores
  – Release RUU and LSQ entry


Caveats

• Simplescalar uses optimizations that favor simulation speed
• Does not simulate an event driven memory system
• Be careful to make sure that you use it appropriately