CS 7810 Lecture 8
Memory Dependence Prediction using Store Sets
G.Z. Chrysos and J.S. Emer, Proceedings of ISCA-25, 1998
Lifetime of a Load
LSQ Basics
Ld/St   Address      Data   Completed
Store   Unknown      1000   --
Load    x40000000    --     --
Store   x50000000    --     --
Load    x50000000    --     --
Load    x30000000    --     --
• No Speculation: an incomplete store stalls all later loads. The paper's version is overly conservative because it also waits for store values
• Most of these stalls are unnecessary – artificial dependences
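The no-speculation rule can be sketched in a few lines. This is a minimal illustration rather than the paper's hardware: the `LsqEntry` type and the function names are hypothetical, and the queue contents mirror the LSQ table on this slide (first store has an unknown address, second store has unknown data).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LsqEntry:
    kind: str                # "store" or "load"
    address: Optional[int]   # None while the address is still unknown
    data: Optional[int]      # None while the data is still unknown


def store_complete(entry: LsqEntry) -> bool:
    # Conservative rule: a store counts as complete only when
    # both its address and its data value are known.
    return entry.address is not None and entry.data is not None


def can_issue_no_speculation(lsq: List[LsqEntry], i: int) -> bool:
    # A load may issue only if every earlier store is complete.
    return all(store_complete(e) for e in lsq[:i] if e.kind == "store")


# Queue contents in program order, taken from the slide's table.
lsq = [
    LsqEntry("store", None, 1000),
    LsqEntry("load", 0x40000000, None),
    LsqEntry("store", 0x50000000, None),
    LsqEntry("load", 0x50000000, None),
    LsqEntry("load", 0x30000000, None),
]

blocked_loads = [
    i for i, e in enumerate(lsq)
    if e.kind == "load" and not can_issue_no_speculation(lsq, i)
]
print(blocked_loads)
```

Every load in the example is blocked by the first store, even though at most one of them can truly depend on it.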
Aggressive Approach
• Naive Speculation: assume that loads do not conflict with earlier stores, so all loads and stores execute out of order
• When there is a conflict, the load behaves like a branch mispredict – all subsequent instructions are squashed and re-fetched
• Expensive: 30-cycle penalty
• Needs rename checkpoints for all instructions
• Re-execute only the dependent instructions? More complex, but better performance
Ideal Model
• In the perfect model, loads only wait for conflicting stores – no artificial dependences and no memory-order violations
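To make the three policies concrete, the sketch below counts, for a small made-up trace, how many artificial dependences the no-speculation policy introduces, which loads could cause a memory-order violation under naive speculation, and which stores each load waits for in the ideal model. The trace, its addresses, and all function names are assumptions for illustration only.

```python
# Hypothetical trace in program order: (kind, address).
TRACE = [
    ("st", 0x1000),
    ("ld", 0x4000),
    ("st", 0x5000),
    ("ld", 0x5000),
    ("ld", 0x3000),
]


def earlier_stores(trace, i):
    # Indices of all stores that precede instruction i.
    return [j for j in range(i) if trace[j][0] == "st"]


def true_conflicts(trace, i):
    # Earlier stores that write the address load i reads.
    return [j for j in earlier_stores(trace, i) if trace[j][1] == trace[i][1]]


def summarize(trace):
    false_deps = 0    # no speculation: waits on non-conflicting stores
    at_risk = []      # naive speculation: loads that may violate ordering
    ideal_waits = {}  # ideal model: load index -> stores it must wait for
    for i, (kind, _) in enumerate(trace):
        if kind != "ld":
            continue
        conflicts = true_conflicts(trace, i)
        false_deps += len(earlier_stores(trace, i)) - len(conflicts)
        if conflicts:
            at_risk.append(i)
        ideal_waits[i] = conflicts
    return false_deps, at_risk, ideal_waits


FALSE_DEPS, AT_RISK, IDEAL_WAITS = summarize(TRACE)
print(FALSE_DEPS, AT_RISK, IDEAL_WAITS)
```

In this trace only the load at index 3 has a real dependence; every other wait imposed by the no-speculation policy is artificial.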
False Dependences and Violations
Store Sets Concept
• For every load, keep track of all stores that it has conflicted with in the past
• A load does not issue if members of its store set have not finished (dependences are introduced at the time of dispatch)
• The implementation is easy if:
  – a load depends on only one store
  – a store is present in only one store set
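The concept above can be sketched as an unbounded software model: each load PC maps to the set of store PCs it has conflicted with, and a load is held back while any member of that set is still in flight. The class and method names are hypothetical, and real hardware would not keep unbounded sets like this.

```python
class IdealStoreSets:
    """Unbounded model: one store set per load PC (illustrative only)."""

    def __init__(self):
        self.store_sets = {}  # load PC -> set of store PCs

    def record_violation(self, load_pc, store_pc):
        # Called when the load was found to have executed before a
        # conflicting earlier store: remember the pair.
        self.store_sets.setdefault(load_pc, set()).add(store_pc)

    def may_issue(self, load_pc, unfinished_store_pcs):
        # The load issues only if no store in its set is unfinished.
        members = self.store_sets.get(load_pc, set())
        return members.isdisjoint(unfinished_store_pcs)


sets = IdealStoreSets()
before = sets.may_issue(0x108, {0x100, 0x104})   # nothing learned yet
sets.record_violation(0x108, 0x104)
held = sets.may_issue(0x108, {0x100, 0x104})     # 0x104 is in the set
released = sets.may_issue(0x108, {0x100})        # only an unrelated store
print(before, held, released)
```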
Trivial Implementations
• Execution time normalized to an ideal store set implementation
Ideal Store Set Predictor
• An occasional memory-order violation can introduce many false dependences, so use saturating counters
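One way a saturating counter can filter out occasional conflicts is sketched below. The slide does not give the counter width, threshold, or update rule, so the 2-bit range, the threshold of 2, and the increment/decrement policy are all assumptions chosen for illustration.

```python
class SaturatingCounter:
    """Confidence counter for one load/store pair (assumed 2-bit)."""

    def __init__(self, max_value=3, threshold=2):
        self.max_value = max_value
        self.threshold = threshold
        self.value = 0

    def on_violation(self):
        # The pair really conflicted: gain confidence, saturating at max.
        self.value = min(self.max_value, self.value + 1)

    def on_no_conflict(self):
        # The pair turned out to be independent: lose confidence.
        self.value = max(0, self.value - 1)

    def enforce(self):
        # Only enforce the dependence once conflicts look recurrent.
        return self.value >= self.threshold


c = SaturatingCounter()
c.on_violation()
after_one = c.enforce()    # a single violation is not enough
c.on_violation()
after_two = c.enforce()    # repeated violations enable the dependence
c.on_no_conflict()
after_decay = c.enforce()  # confidence decays again
print(after_one, after_two, after_decay)
```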
Implementation Overview
• Every ld/st depends on the last store in its set
• Causes serialized stores and false dependences
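A minimal sketch of the two-table scheme follows, assuming a Store Set ID Table (SSIT) indexed by PC that yields a store set ID, and a Last Fetched Store Table (LFST) indexed by that ID that holds the most recent in-flight store of the set. The table sizes follow the figures quoted later in these slides; the modulo indexing, the `assign` helper, and the instruction numbers are assumptions for illustration.

```python
class StoreSetPredictor:
    def __init__(self, ssit_entries=4096, lfst_entries=128):
        self.ssit = [None] * ssit_entries  # PC index -> store set ID
        self.lfst = [None] * lfst_entries  # store set ID -> last store inum

    def _index(self, pc):
        # Assumed indexing: simple modulo on the PC.
        return pc % len(self.ssit)

    def assign(self, pc, ssid):
        # Hypothetical helper: place an instruction in a store set.
        self.ssit[self._index(pc)] = ssid

    def dispatch_store(self, pc, inum):
        # A store depends on the last store in its set, then becomes
        # the new last store. This is what serializes stores.
        ssid = self.ssit[self._index(pc)]
        if ssid is None:
            return None
        dep = self.lfst[ssid]
        self.lfst[ssid] = inum
        return dep

    def dispatch_load(self, pc):
        # A load depends on the last store in its set, if any.
        ssid = self.ssit[self._index(pc)]
        return None if ssid is None else self.lfst[ssid]

    def store_issued(self, pc, inum):
        # Clear the entry only if this store is still the last one.
        ssid = self.ssit[self._index(pc)]
        if ssid is not None and self.lfst[ssid] == inum:
            self.lfst[ssid] = None


p = StoreSetPredictor()
for pc in (0x100, 0x104, 0x108):
    p.assign(pc, 5)
dep_a = p.dispatch_store(0x100, inum=1)  # first store: no dependence
dep_b = p.dispatch_store(0x104, inum=2)  # ordered after store 1
dep_l = p.dispatch_load(0x108)           # ordered after store 2
p.store_issued(0x104, inum=2)
dep_later = p.dispatch_load(0x108)       # set has no store in flight
print(dep_a, dep_b, dep_l, dep_later)
```

The second store waits for the first even if they write different addresses, which is the serialization and false dependence the slide points out.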
Store Set Implementation
• Every load and store belongs to one color (store set); keep track of the last writer for each color. Mispredicts can pose problems
• Colors are merged as memory-order violations are discovered
Store Set Merging
• Store set merging improves performance by 12%
• Note that merging happens gradually: no need to instantly correct all entries in the table
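Gradual merging can be sketched as an update applied only to the two instructions involved in a violation. The rule used here (when both already have a set, the smaller ID wins) is stated as an assumption, and the dictionary table, the `new_ssid` parameter, and the instruction labels are hypothetical.

```python
def merge_on_violation(ssit, load_pc, store_pc, new_ssid):
    """Update only the two table entries involved in the violation."""
    l, s = ssit.get(load_pc), ssit.get(store_pc)
    if l is None and s is None:
        # Neither has a set: start a new one for the pair.
        ssit[load_pc] = ssit[store_pc] = new_ssid
    elif l is None:
        ssit[load_pc] = s      # load joins the store's set
    elif s is None:
        ssit[store_pc] = l     # store joins the load's set
    else:
        # Both have sets: assumed rule, the smaller ID wins.
        ssit[load_pc] = ssit[store_pc] = min(l, s)


ssit = {}
merge_on_violation(ssit, "L1", "S1", new_ssid=0)  # set 0 = {L1, S1}
merge_on_violation(ssit, "L2", "S2", new_ssid=1)  # set 1 = {L2, S2}
merge_on_violation(ssit, "L2", "S3", new_ssid=2)  # S3 joins set 1
merge_on_violation(ssit, "L1", "S2", new_ssid=3)  # S2 moves to set 0
snapshot = dict(ssit)                             # L2, S3 still in set 1
merge_on_violation(ssit, "L2", "S2", new_ssid=4)  # L2 follows later
print(snapshot, ssit)
```

After the fourth violation only `S2` has moved; `L2` and `S3` stay in the old set until their own violations pull them across, which is the gradual behavior noted above.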
Design Details
• Merging store sets
• To deal with occasional dependences and conflicts:
  – clear the table every million cycles
  – use saturating counters for each entry
• The SSIT needs 4K entries and the LFST needs 128 entries
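Periodic clearing is simple to model. The sketch below wipes a 4K-entry table once a million cycles have elapsed since the last clear; the class name, the `tick` interface, and the choice to measure from the last clear are assumptions, not details from the paper.

```python
CLEAR_INTERVAL = 1_000_000  # cycles between clears, per the slide
SSIT_ENTRIES = 4096         # table size quoted on the slide


class ClearingTable:
    def __init__(self, entries=SSIT_ENTRIES):
        self.table = [None] * entries
        self.last_clear = 0

    def tick(self, cycle):
        # Returns True when the table was cleared on this call.
        if cycle - self.last_clear >= CLEAR_INTERVAL:
            self.table = [None] * len(self.table)
            self.last_clear = cycle
            return True
        return False


t = ClearingTable()
t.table[42] = 7                  # a learned store set ID
early = t.tick(999_999)          # too soon: entry survives
kept = t.table[42]
cleared = t.tick(1_000_000)      # interval reached: table wiped
print(early, kept, cleared, t.table[42])
```

Clearing discards stale sets that grew through merging, at the cost of relearning the real dependences afterwards.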
Results
Related Work
• Store barrier cache: identify stores that are likely to pose conflicts
• Keep track of all store-load conflict pairs and associatively check for dependences while dispatching instructions
Next Week’s Paper
• “Effective Hardware-Based Prefetching for High-Performance Microprocessors”, T.F. Chen and J.L. Baer, IEEE Transactions on Computers, May 1995