44
Memory Dependence Prediction Andreas Moshovos [email protected] www.ece.nwu.edu/~moshovos Electrical And Computer Engineering Department Northwestern University ECE Dept., Purdue University, Sept. 9, 1999

Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence PredictionAndreas Moshovos

[email protected]

www.ece.nwu.edu/~moshovos

Electrical And Computer Engineering Department

Northwestern University

ECE Dept., Purdue University, Sept. 9, 1999

Page 2: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 2A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Exploi t ing Program BehaviorBuilding faster/better machines:

1. Faster Circuits -> Faster Execution

2. Be “smarter” about program execution:

Exploit Idiosynchrasies in Program Behavior

Example? Caches

Idiosyncrasies in access pattern

Idea: Use small/fast storage for recently accessed data.

What to do next? One Possibility is:

Identify/Exploit Other Idiosyncrasies in Typical Program Behavior

This might be the right time! Technology and Existing Knowledge.

Page 3: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 3A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Memory Dependences are Qui te Regular• Identified a new form of regularity:

Memory Dependence Locality

1. Load/Store has a Dependence?

2. Which Dependence a Load/Store has?

(loadPC, storePC)

Opportunity to Exploit this Regularity

How to exploit? Need Techniques

Tim

e

load

store

load

store

for i

a[i] = a[i - 1] + 1

store load

Page 4: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 4A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Memory Dependence Predict ion

1. Load/Store has a Dependence?

2. Which Dependence a Load/Store has?

Past Behavior ->

Good Indicator of Future Behavior

Basis for Three Micro-Architectural Techniques:

1. Exploit Load/Store Parallelism

2. Reduce Memory Latency

3. Memory Bandwidth Amplification

Memory is a major and ever increasing bottleneck

Guess:

How?

load

store

load

store

1. Observe

2. Predict

GO

AL

Page 5: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 5A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Roadmap

• Exploiting Load/Store Parallelism

- Two directions for improving memory performance

- Load/Store Parallelism

- Speculation/Synchronization

- Performance

• Speculative Memory Cloaking and Bypassing

• Transient Value Cache

• Other Applications

• Future Directions: Slice Prediction & Slice-Processors

Page 6: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 6A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

In Search of Higher Performance #1

Timeline

Min

imiz

e

send load

load

Instruction

address

value

Sequence

Memory

1. Higher Performance: Memory Responds Faster

computationstarts

Page 7: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 7A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

#1. Making Memory Respond Faster

OK, we did the best we could, but ...

...memory is still not that fast

...and it is getting slower

Is this the end?

L1

L2

Main Memory

Ideally: Memory is Large and Fast

Can have it! Technology - Cost trade-off

Solution: Memory Hierarchy

slowerlarger

cheaper

Page 8: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 8A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

In Search of Higher Performance #2

2. Higher Performance:

Send Load Request As Far In Advance As Possible

But, will the program still run correctly?

A. Can we ever move loads up? B. How do we do it?

Timeline

Max

imiz

e

send load

startsload

old position

old position

load

OriginalSequence

ModifiedSequence

computation

Page 9: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 9A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

A. Can We Ever Move Loads Up?

loadA

Instruction Sequence

loadA

load Aload

Aload

A

A and load can execute in any order

if load does not use a value produced by A

dependence

load ..., [r3 + 5]r1 = r2 + 10

load ..., [r1 + 5]r1 = r2 + 10

Valid Execution Order

parallelism

Page 10: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 10A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

B. How To Move Loads Up?Instruction Level Parallel Processors:

load

grab a chunkof instructions

Step #1 Step #2

find thedependences

load

Step #3

load

execute

Instruction Window

memorylatency

depe

nden

ces

Use Parallelism to Tolerate Memory Latency

Inspect Names

Page 11: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 11A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Moving Loads Past Stores

load

store

100

???

Use Of Addresses Hinders Parallelism

load

store

Instructions

depe

nden

ce?

load addr

store addr. calc.

load addr. calc.

➔ Limits Us in Our Effort to Tolerate Memory Latency

store addrload executes

Timeline

Page 12: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 12A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Naive Memory Dependence Speculat ion• Don’t give up, be optimistic, guess no dependences exist

• State-of-the-art in modern processors

Instructions

store addr

Timeline

depe

nden

ce?

load re-executes if yes

Guess no: load executes

dependence?

penalty

Need to Balance: Gain vs. Penalty

load

store

load addr

Page 13: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 13A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Dependence Speculat ion and Performance

CA

BC

load

Program

storeC

load

loadCB

D

AB

D

No Dependence DependenceOrder

D

store

ABC

load

D store

No Speculation Speculation

Speculation may affect performance either way

free load

AB

store D

Penalty:

free

Balance: Gain vs. Penalty

(b) opportunity cost

(a) work thrown away

Page 14: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 14A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

How About Future Systems?

Future Systems: Wish Dependences Were Known

Soon • Memory will be slower

➔ Need to move loads further up

Guessing Naively:

Penalty becomes significant

Loads — Ideal Behavior:

No Dependence: execute at will

Dependence: Synchronize w/ store

Common Case:

Today

Page 15: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 15A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Speculat ion/Synchronizat ionLearn from mistakes:

1. Start with Naive Speculation

2. Remember mispeculations and predict next time the same will happen

- Q1? Which loads should wait

- Q2? How long

Best performing scheme

- A1: Loads w/ dependences

- A2: Synchronize w/ appropriate store

Page 16: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 16A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

How it works

LD

① Mispeculation

② Allocate entry

LDPC STPC PRED

① Execute?

② No! Wait

LD

① Synchronize?② Resume

ST

LDPC STPC PRED LDPC STPC PRED

STPC LDPC

LD

LDPCSTPCLDPC

LDST

Mem. Dependence Prediction TablePredict Loads w/ DependencesST

LD

LD

STLD

1

2

3

1 2 3

Mem. Dependence Synchronization Table

LDPC STPC 0 1

F/EVLDPC STPC 0 1

F/EVMDST

MDPT

Enforce Synchronization

Page 17: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 17A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Memory Dependence Speculat ion/Synchronizat ion

Timeline

1. Predict load

Allocate Sync. bit

2. Predict store

Wait on Sync. bit

3. Store Signals

Load executes

Correct Prediction: Loads wait only as long as it is necessaryIncorrect: Same as Naive or Delay

store addr

load addr

store

load

1

2

3

A. Predict Dependence

2. Load executes

3. Store verifies

B. Predict No Dependence

Page 18: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 18A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Speedup over Naive Speculat ion

0%

20%

40%

60%

80%

ORACLE 2 STORES/LOAD CENTRALIZED SYNONYMS09

912

412

612

913

013

213

414

710

110

210

310

410

711

012

514

114

514

6

256-Entries4K MDPT64 MDST

better

Page 19: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 19A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Evaluat ion - Superscalar1. Can loads inspect Store addresses before obtaining a value?

Requires an address-based load/store scheduler (ABS)

2. Is speculation used?

• Speculation a win (naive over no-speculation):

No ABS: ~29% int, ~113% fp

mispeculations are an issue

ABS: 4.6% int, 5.3% fp (0-cycle scheduling latency)

virtually no mispeculations

• Next:

1. Compare “Oracle + no ABS” with “naive + ABS”

2. Speculation/Synchronization on “no Abs” close to Oracle

Speculation/Synchronization as an Alternative to ABS

Page 20: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 20A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

“Oracle + no ABS” vs. “Naive + ABS”

099

124

126

129

130

132

134

147

101

102

103

104

107

110

125

141

145

146-20%

-10%

0%

10%

20%

30%

40%

better

NAIVE 0-CYCLES

NAIVE 1-CYCLE NAIVE 2-CYCLES

ORACLE

Base case: “ABS + no-speculation”

“Oracle + no ABS” close to “ABS + naive”

Potential for Spec./Sync.

Page 21: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 21A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Speculat ion/Synchronizat ion

0%

1%

2%

3%

09912

412

612

913

013

213

414

710

110

210

310

410

711

012

514

114

514

6

better

“Ora

cle

+ n

o A

BS

” ov

er S

pec/

Syn

c

On the average within 0.93% (int) and 1.001% (fp) of oracle

Page 22: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 22A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Summary

• Improving Processor/Memory Performance:

move loads up in time

• Addresses Hinder Parallelism

• Memory Dependence Speculation/Synchronization

Predict dependences and move loads up in time

Classify loads in two categories:

1. No Dependence: Execute at will

2. Dependence: Synchronize with specific store

Page 23: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 23A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Related Work

• In the context of Dynamically Scheduled Processors

1. J. H. Hesson, J. LeBlanc, S. J. Ciavaglia, IBM Patent,‘97

Store Barrier

2. G. Chrysos, J. Emer (DEC), Superscalar Env., ISCA ‘98

This work: UW Tech. Report, March ‘96, ISCA ‘97

A. Moshovos, S. Breach, T. N. Vijaykumar, G. Sohi

• In the context of Statically Scheduled Processors

1. Static Disambiguation

2. Speculation in Software

• Dynamic Compilation

3. Software/Hardware Hybrids

Page 24: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 24A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Roadmap

• Exploiting Load/Store Parallelism

• Speculative Memory Cloaking and Bypassing

- Memory as a communication mechanism

- Speculative Memory Cloaking

- Other uses of Memory

- Evaluation

• Transient Value Cache

• Other Applications

• Future Directions: Operation Prediction & Slice-Processors

Page 25: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 25A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Memory

Memory as a Communicat ion Mechanism

load

addrcalc

address

address

store

Memory can be an inter-operation communication mechanism

Communication Specification: implicit via Addresses

Is this the best way in terms of performance?

addrcalc

value

Page 26: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 26A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Memory Communicat ion Speci f icat ion - L imitat ions

del

ay

store value

store addr

load

load addr

Timeline

Store - Load: Direct Link

store2 addr

Implicit

1. Calculate address2. Establish Dependence

?

addr

ess

valu

e

Memory

Pro

gram

Ord

er store

load

store 2

Instructions

Explicit

Delays: No Delays

Implicit

Explicit

address

Page 27: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 27A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Predict ing Memory Dependences1. Build Dependence history: Dependence Detection Table

2. Use history to predict forthcoming dependences

assign synonyms to detected dependences

use synonym to locate value

Record: (store PC, address) Loads: (load PC, address)

➔ (store PC, load PC)

Page 28: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 28A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Dynamically & Transparently convert implicit into explicit

• Dependence prediction ➔ direct store-load or load-load links

• Speculative and has to be eventually verified

Speculat ive Memory Cloaking

store PC synonym

Dependence Prediction

MemoryHierarchy

Traditional

value f/e

Synonym File

address

address

1

2

3

4speculative

verify

value

Timeline

store value

addr

addr

load

load PC synonym

Page 29: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 29A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

synonym TAG1

• Extents over multiple store-load dependences

• DEF and USE must co-exist in the instruction window

Speculat ive Memory Bypassing

store R1

USE R2

load R2

1

2

R2 TAG1 TAG2

3

4

R1 TAG1

Observe: Store and Load are used to just pass values

USE R2

Takes load-store or loads off the communication path

load R2store R1

DEF R1DEF R1

Page 30: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 30A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Speculat ive Memory Cloaking/BypassingAddress-based Memory as: Storage vs. Interface

1. What is memory used for?

2. How addresses impact the action?Ask:

LOAD RY

USE RY

CloakingLOAD RZ

USE RY

USE RZ

Cloaking

Bypas

sing

DEF RX

STORE RX

LOAD RY

Bypassing

Inter-operation Communication Data-Sharing

registeraddress

Memory

Dynamically Create Direct Links Between Producers/Consumers

Page 31: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 31A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Cloaking Coverage

10110

210

310

410

711

012

514

114

514

6

DATA STACK HEAP

0%

20%

40%

60%

80%

100%

09912

412

612

913

013

213

414

7

Most Dependences Correctly Predicted

Page 32: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 32A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Cloaking - Mispeculat ion Rates

• Relatively low mispeculation rates

• Causes:

Unstable Dependences - Recursive Functions

099

124

126

129

130

132

134

147 10

110

210

310

410

711

012

514

114

514

60%

1%

2%

3%

4%

5%

6%

0.0%

0.2%

0.4%

0.6%

0.8%

1.0%

DATA STACK HEAP

Page 33: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 33A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Performance - Select ive Inval idat ion

099

124

126

129

130

132

134

147

10110

210

310

410

711

012

514

114

514

6

Oracle Selective

0%

2%

4%

6%

8%

10%

12%

14%

Page 34: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 34A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Summary

• Address-based memory ➔ unnecessary delays

• Used Memory Dependence Prediction to:

1. Improve Store-Load and Load-Load Communication

address-based ➔ dependence-based

2. Take Store-Load off the Communication path

Treat Program as a specification of what to do

Not as also of how to do it.

Page 35: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 35A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Increasing Memory Bandwidth• Parallelism ➔ more bandwidth ➔ more L1 ports

L1

CPU

L1

CPU

• A very small cache is easier to multi-port

• But, it increases the latency for all accesses that miss in it

L1

CPU

L0

Page 36: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 36A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Transient Value Cache

Observe:A. Often RAW or RAR exists with recent store or loadB. Many recent stores are killed

+ Small cache can service these

- Latency for other loads will increase

Yes - in series No - in parallel

CPU

TVC

L1

CPU

TVC

L1

Avoid L1 access Avoid latency

dependence?

A & B / Dependence Status is predictableC.

L1 DCache Bandwidth/Port Requirements are Reduced

• Support for multiple/simultaneous Load/Store Requests

Few Loads Observe a Latency Increase

Page 37: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 37A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

TVC vs. L0 - Hi t Rates

0%

20%

40%

60%

80%

100% TVC

L0

099

124

126

129

130

132

134

147

10110

210

310

410

711

012

514

114

514

6

Page 38: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 38A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

TVC vs. L0 - “Miss” Rates

099

124

126

129

130

132

134

147

10110

210

310

410

711

012

514

114

514

6

0%10%20%30%40%50%60%70%

TVC

L0

0%1%2%3%4%5%6%7%8%

Page 39: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 39A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Other Appl icat ions• Transient Value Cache:

Bandwidth amplification

• Prefetching of recursive data structures

Load-to-Load-EAC dependences

• Virtual Function Call Precomputation

Surgical extension to previous one

Page 40: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 40A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Current Research• Cloaking for Multiprocessors

- Prediction in coherence protocol

• Slice Prediction & Slice Processors

- Use the Program to Predict its actions

Identify Performance critical computation slices

Run them speculatively in advance

Annotate program with predicability information

• Embedded Processor Optimizations

• Multi-media application evolution

• Self-micro-reconfigurable processors

- Run-time extraction of convertable slices

- Interfaces with OOO

Page 41: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 41A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Future Direct ions• Prediction is becoming prevalent

- Branch Prediction

- Value Prediction

- Dependence Prediction

- Coherence Prediction

Thus Far:

Perform Computation as Specified but as fast as possible

Moving toward:

A predict-and-verify model of computation

Q1? How a processor should look like then?

Q2? How to predict accurately?

Page 42: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 42A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Current Predictors• Treat Program as a black box

• Observe outcomes

• Guess: soon predictors of this kind will be very accurate

say 90% or more

• How about the rest 10%?

Amdahl’s law -> may dominate performance

• Slice prediction: execute SCOUT threads in parallel

1. Predict performance critical computation slice

2. Execute as far in advance as possible

Page 43: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 43A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Current Predictors• Treat Program as a black box

• Build an approx. representation of program’s function

(PC1, taken)(PC2, not-taken) (PC1, taken)

“INPUT” “OUTPUT”

f()?

• Guess: soon predictors of this kind will be very accurate

say 90% or more

• How about the rest 10%?

Amdahl’s law -> may dominate performance

program

Page 44: Memory Dependence Predictionmoshovos/research/talk.pdf · 2000-11-28 · A. Moshovos Memory Dependence Prediction 3 Permission to use these slides is granted provided that a reference

Memory Dependence Prediction 44A. Moshovos

Permission to use these slides is granted provided that a reference to their origin is included.

Sl ice Processors

while (list)

if (list->data == KEY)

foo0(list)

else...

foo1(list)

list = list->next

Time

slices

predict

• But program itself is a representation of f()!

• USE PROGRAM TO PREDICT IT’S ACTIONS

list = list->next

list->data == KEY

then branch else branch