CS 7810 Lecture 8 Memory Dependence Prediction using Store Sets G.Z. Chrysos and J.S. Emer Proceedings of ISCA-25 1998

CS 7810 Lecture 8

Memory Dependence Prediction using Store Sets

G.Z. Chrysos and J.S. Emer

Proceedings of ISCA-25, 1998

Lifetime of a Load

LSQ Basics

Ld/St   Address     Data   Completed
Store   Unknown     1000   --
Load    x40000000   --     --
Store   x50000000   --     --
Load    x50000000   --     --
Load    x30000000   --     --

• An incomplete store stalls all later loads (No Speculation) – the paper's version of this policy is overly conservative because it also waits for store values

• Most of these stalls are unnecessary – artificial dependences
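The No Speculation policy can be sketched against the LSQ contents shown above. This is a minimal model, not the paper's hardware: the `LsqEntry` type and the function name are made up for illustration, and a store counts as incomplete until its `completed` flag is set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LsqEntry:
    kind: str                # "load" or "store"
    address: Optional[int]   # None means the address is not yet known
    completed: bool = False


def stalled_loads_no_speculation(lsq: List[LsqEntry]) -> List[int]:
    """Return the indices of loads that sit behind any incomplete older store."""
    stalled = []
    older_incomplete_store = False
    for i, entry in enumerate(lsq):  # entries are in program order
        if entry.kind == "load" and older_incomplete_store:
            stalled.append(i)
        if entry.kind == "store" and not entry.completed:
            older_incomplete_store = True
    return stalled


# The five LSQ entries from the slide, oldest first.
lsq = [
    LsqEntry("store", None),
    LsqEntry("load", 0x40000000),
    LsqEntry("store", 0x50000000),
    LsqEntry("load", 0x50000000),
    LsqEntry("load", 0x30000000),
]
print(stalled_loads_no_speculation(lsq))  # [1, 3, 4]
```

All three loads stall, even though at most one of them (the load from x50000000) visibly matches an older store's address.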

Aggressive Approach

• Assume that loads do not conflict with earlier stores – all loads and stores execute out of order -- Naive Speculation

• When there is a conflict, the load behaves like a branch mispredict – all subsequent instructions are squashed and re-fetched

– Expensive: 30-cycle penalty
– Rename checkpoints for all instructions
– Re-execute only the dependent instructions? More complex, but better performance
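A rough sketch of how a violation might be detected and recovered from under naive speculation. The names (`ExecutedLoad`, `find_violation`, `squash_from`) and the use of sequence numbers to stand in for the reorder buffer are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutedLoad:
    seq: int       # program-order sequence number
    address: int   # address the load already read from


def find_violation(store_seq: int, store_address: int,
                   executed_loads: List[ExecutedLoad]) -> Optional[int]:
    """When a store's address resolves, find the oldest younger load that
    already executed to the same address. Returns its seq, or None."""
    victims = [ld.seq for ld in executed_loads
               if ld.seq > store_seq and ld.address == store_address]
    return min(victims) if victims else None


def squash_from(in_flight: List[int], seq: int) -> List[int]:
    """Treat the load like a mispredicted branch: keep only older instructions;
    the load and everything after it are re-fetched."""
    return [s for s in in_flight if s < seq]


loads = [
    ExecutedLoad(1, 0x50000000),  # older than the store: not a violation
    ExecutedLoad(3, 0x50000000),  # younger, same address: violation
    ExecutedLoad(4, 0x30000000),  # younger, different address: fine
]
bad = find_violation(store_seq=2, store_address=0x50000000, executed_loads=loads)
print(bad)                                # 3
print(squash_from(list(range(8)), bad))   # [0, 1, 2]
```

Re-executing only the dependent instructions would replace `squash_from` with a walk over the load's dependents, which is where the extra complexity comes from.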

Ideal Model

• In the perfect model, loads only wait for conflicting stores – no artificial dependences and no memory-order violations
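For contrast, the perfect model can be sketched as an oracle that knows every store's eventual address. The tuple layout is an assumption, and the address given to the first store (x20000000) is a made-up value standing in for the "Unknown" entry on the LSQ slide.

```python
from typing import List, Tuple

# (kind, true_address, completed), in program order
Entry = Tuple[str, int, bool]


def stalled_loads_ideal(entries: List[Entry]) -> List[int]:
    """A load waits only if an older, incomplete store truly writes its address."""
    stalled = []
    for i, (kind, addr, _) in enumerate(entries):
        if kind != "load":
            continue
        for older_kind, older_addr, done in entries[:i]:
            if older_kind == "store" and not done and older_addr == addr:
                stalled.append(i)
                break
    return stalled


entries = [
    ("store", 0x20000000, False),  # hypothetical true address of the "Unknown" store
    ("load",  0x40000000, False),
    ("store", 0x50000000, False),
    ("load",  0x50000000, False),
    ("load",  0x30000000, False),
]
print(stalled_loads_ideal(entries))  # [3]
```

Under these assumed addresses only one load waits, compared with all three under No Speculation.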

False Dependences and Violations

Store Sets Concept

• For every load, keep track of all stores that it has conflicted with in the past

• A load does not issue if members of its store set have not finished (dependences are introduced at the time of dispatch)

• The implementation is easy if a load depends on only one store and a store is present in only one store set
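The concept can be sketched with unbounded tables before any hardware limits are imposed. The class and method names below are illustrative, and instructions are identified by PC as an assumption.

```python
from collections import defaultdict
from typing import Iterable


class IdealStoreSets:
    """Per-load sets of store PCs, with no limit on table size (a sketch)."""

    def __init__(self) -> None:
        self.store_sets = defaultdict(set)  # load PC -> set of store PCs

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        # The load conflicted with this store, so remember the pair.
        self.store_sets[load_pc].add(store_pc)

    def must_wait(self, load_pc: int, inflight_store_pcs: Iterable[int]) -> bool:
        # The load is held back if any unfinished store belongs to its store set.
        return bool(self.store_sets.get(load_pc, set()) & set(inflight_store_pcs))


pred = IdealStoreSets()
pred.record_violation(load_pc=0x100, store_pc=0x80)
print(pred.must_wait(0x100, [0x80, 0x90]))  # True
print(pred.must_wait(0x100, [0x90]))        # False
print(pred.must_wait(0x200, [0x80]))        # False
```

Here a load may accumulate many stores and a store may appear in many sets, which is exactly what makes a direct hardware implementation hard.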

Trivial Implementations

• Execution time normalized to an ideal store set implementation

Ideal Store Set Predictor

• An occasional memory-order violation can introduce many false dependences – hence, use saturating counters
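One way a saturating counter could filter out occasional violations is sketched below. The 2-bit width, the threshold, and the update policy (increment on a violation, decrement when an enforced wait turned out to be unnecessary) are assumptions, not details taken from the slides.

```python
class SaturatingCounter:
    """An n-bit up/down counter that sticks at 0 and at its maximum."""

    def __init__(self, bits: int = 2, value: int = 0) -> None:
        self.max = (1 << bits) - 1
        self.value = value

    def increment(self) -> None:
        self.value = min(self.max, self.value + 1)

    def decrement(self) -> None:
        self.value = max(0, self.value - 1)

    def predicts_dependence(self) -> bool:
        # Enforce the dependence only in the upper half of the range.
        return self.value > self.max // 2


c = SaturatingCounter()
c.increment()                    # one isolated violation
print(c.predicts_dependence())   # False
c.increment()                    # the conflict repeats
print(c.predicts_dependence())   # True
```

With this policy a single violation does not create a lasting dependence, while a repeated one does.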

Implementation Overview

• Every ld/st depends on the last store in its set

• Causes serialized stores and false dependences

Store Set Implementation

• Every load and store belongs to one color (store set) – keep track of the last writer for each color – mispredictions can pose problems

• Colors are merged as you discover memory-order violations
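The color scheme can be sketched with two tables: one mapping an instruction PC to its color (the SSIT) and one mapping a color to its last in-flight writer (the LFST). This is a simplified model under stated assumptions: the PC hash (`pc >> 2`), round-robin color allocation, and the "smaller color wins" merge rule are illustrative choices, and the method names are made up.

```python
from typing import List, Optional


class StoreSetPredictor:
    def __init__(self, ssit_entries: int = 4096, lfst_entries: int = 128) -> None:
        self.ssit: List[Optional[int]] = [None] * ssit_entries  # PC -> color
        self.lfst: List[Optional[int]] = [None] * lfst_entries  # color -> last store
        self.next_color = 0

    def _index(self, pc: int) -> int:
        return (pc >> 2) % len(self.ssit)  # assumes 4-byte instructions

    def dispatch(self, pc: int, is_store: bool, inum: int) -> Optional[int]:
        """Return the in-flight store this instruction must wait for, if any."""
        color = self.ssit[self._index(pc)]
        if color is None:
            return None
        dep = self.lfst[color]
        if is_store:
            self.lfst[color] = inum  # this store becomes the last writer
        return dep

    def store_issued(self, pc: int, inum: int) -> None:
        color = self.ssit[self._index(pc)]
        if color is not None and self.lfst[color] == inum:
            self.lfst[color] = None  # nothing left in flight for this color

    def violation(self, load_pc: int, store_pc: int) -> None:
        li, si = self._index(load_pc), self._index(store_pc)
        lc, sc = self.ssit[li], self.ssit[si]
        if lc is None and sc is None:
            color = self.next_color
            self.next_color = (self.next_color + 1) % len(self.lfst)
        elif lc is None:
            color = sc
        elif sc is None:
            color = lc
        else:
            color = min(lc, sc)  # merge: only these two entries are rewritten
        self.ssit[li] = self.ssit[si] = color


p = StoreSetPredictor()
p.violation(load_pc=0x100, store_pc=0x80)
print(p.dispatch(0x80, is_store=True, inum=10))    # None
print(p.dispatch(0x100, is_store=False, inum=11))  # 10
p.store_issued(0x80, inum=10)
print(p.dispatch(0x100, is_store=False, inum=12))  # None
```

Because `dispatch` makes a store wait on the previous store of the same color, stores within a color are serialized. In `violation`, a merge rewrites only the two instructions involved, so other members of the losing color migrate later, one violation at a time.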

Store Set Merging

• Store set merging improves performance by 12%

• Note that merging happens gradually – no need to instantly correct all entries in the table

Design Details

• Merging store sets

• To deal with occasional dependences and conflicts:
– clear the table every million cycles
– use saturating counters for each entry

• The SSIT needs 4K entries and the LFST needs 128 entries
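Periodic clearing can be sketched as a per-cycle check that wipes the table at a fixed interval. The function name and the choice to represent an invalid entry as `None` are assumptions; the one-million-cycle interval and the 4K-entry size come from the slide.

```python
from typing import List, Optional

CLEAR_INTERVAL = 1_000_000  # cycles between clears, per the slide


def maybe_clear(ssit: List[Optional[int]], cycle: int,
                interval: int = CLEAR_INTERVAL) -> bool:
    """Invalidate every SSIT entry once per interval. Returns True if it cleared."""
    if cycle > 0 and cycle % interval == 0:
        for i in range(len(ssit)):
            ssit[i] = None
        return True
    return False


ssit: List[Optional[int]] = [None] * 4096
ssit[64] = 0                          # a stale dependence learned long ago
print(maybe_clear(ssit, 999_999))     # False
print(maybe_clear(ssit, 1_000_000))   # True
print(ssit[64])                       # None
```

Clearing discards useful entries along with stale ones, so any real dependences have to be relearned through fresh violations.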

Results

Related Work

• Store barrier cache: identify stores that are likely to pose conflicts

• Keep track of all store-load conflict pairs and associatively check for dependences while dispatching instructions

Next Week’s Paper

• “Effective Hardware-Based Prefetching for High-Performance Microprocessors”, T.F. Chen and J.L. Baer, IEEE Transactions on Computers, May 1995
