Instruction Based Memory Distance Analysis and its Application to Optimization

Changpeng Fang, Steve Carr, Soner Önder, Zhenlin Wang


Page 1: Instruction Based Memory Distance Analysis and its Application to Optimization

1

Instruction Based Memory Distance Analysis and its Application to Optimization

Changpeng Fang, Steve Carr, Soner Önder, Zhenlin Wang

Page 2: Instruction Based Memory Distance Analysis and its Application to Optimization

2

Motivation

• Widening gap between processor and memory speed
  • memory wall
• Static compiler analysis has limited capability
  • regular array references only
  • index arrays
  • integer code
• Reuse distance prediction across program inputs
  • number of distinct memory locations accessed between two references to the same memory location
  • applicable to more than just regular scientific code
  • locality as a function of data size
  • predictable on whole program and per instruction basis for scientific codes
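
The reuse distance definition above (distinct memory locations accessed between two references to the same location) can be illustrated with a small sketch. This is a naive, linear-time-per-access version written for clarity; the function name `reuse_distances` and the use of an ordered dictionary as an LRU stack are assumptions for illustration, not the tool used in the talk.

```python
from collections import OrderedDict

def reuse_distances(trace):
    """For each access, return the number of distinct locations touched
    since the previous access to the same location (None on first use)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Everything after addr in the stack was touched since its last use.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)  # first reference: no previous use
            stack[addr] = True
    return distances

if __name__ == "__main__":
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

In the trace `a b c a b b`, the second `a` has seen two distinct locations (`b`, `c`) since its previous use, while the final `b` has seen none.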

Page 3: Instruction Based Memory Distance Analysis and its Application to Optimization

3

Motivation

• Memory distance
  • A dynamic, quantifiable distance, in terms of memory references, between two accesses to the same memory location
    • reuse distance
    • access distance
    • value distance
• Is memory distance predictable across both integer and floating-point codes?
  • predict miss rates
  • predict critical instructions
  • identify instructions for load speculation

Page 4: Instruction Based Memory Distance Analysis and its Application to Optimization

4

Related Work

• Reuse distance
  • Mattson, et al. ’70
  • Sugumar and Abraham ’94
  • Beyls and D’Hollander ’02
  • Ding and Zhong ’03
  • Zhong, Dropsho and Ding ’03
  • Shen, Zhong and Ding ’04
  • Fang, Carr, Önder and Wang ’04
  • Marin and Mellor-Crummey ’04
• Load speculation
  • Moshovos and Sohi ’98
  • Chrysos and Emer ’98
  • Önder and Gupta ’02

Page 5: Instruction Based Memory Distance Analysis and its Application to Optimization

5

Background

• Memory distance
  • can use any granularity (cache line, address, etc.)
  • either forward or backward
  • represented as a pattern
• Represent memory distance as a pattern
  • divide consecutive ranges into intervals
  • we use powers of 2 up to 1K and then 1K intervals
• Data size
  • the largest reuse distance for an input set
  • characterize reuse distance as a function of the data size
  • Given two sets of patterns for two runs, can we predict a third set of patterns given its data size?
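
The interval scheme (powers of 2 up to 1K, then 1K-wide intervals) can be sketched as a mapping from a distance to an interval index. The exact boundaries are not given on the slide, so the choices here ([0,0], [1,1], [2,3], ..., [512,1023], then [1024,2047], [2048,3071], ...) and the names `interval_index` and `interval_bounds` are assumptions.

```python
def interval_index(distance):
    """Map a non-negative distance to an interval index.
    Below 1K the intervals are power-of-2 ranges; from 1K on they are 1K wide."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance < 1024:
        # bit_length: 0 -> 0, 1 -> 1, 2..3 -> 2, ..., 512..1023 -> 10
        return distance.bit_length()
    return 11 + (distance - 1024) // 1024

def interval_bounds(index):
    """Return the inclusive (low, high) range covered by an interval index."""
    if index == 0:
        return (0, 0)
    if index <= 10:
        return (1 << (index - 1), (1 << index) - 1)
    low = 1024 + (index - 11) * 1024
    return (low, low + 1023)

if __name__ == "__main__":
    for d in (0, 3, 700, 1024, 5000):
        print(d, interval_index(d), interval_bounds(interval_index(d)))
```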

Page 6: Instruction Based Memory Distance Analysis and its Application to Optimization

6

Background

• Let $d_i^1$ be the distance of the ith bin in the first pattern and $d_i^2$ be that of the second pattern. Given the data sizes $s_1$ and $s_2$, we can fit the memory distances using

$$d_i^1 = c_i + e_i \cdot f_i(s_1)$$
$$d_i^2 = c_i + e_i \cdot f_i(s_2)$$

• Given $c_i$, $e_i$, and $f_i$, we can predict the memory distance of another input set with its data size
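
As a rough illustration of the fitting step, two training observations of the form d = c + e * f(s) give two equations in the two unknowns c and e once a candidate f is chosen. The sketch below assumes the candidate shapes constant, linear and square root (shapes that appear in the results later in the talk); the table `FITS` and the functions `fit` and `predict` are hypothetical names, and how the talk's tool selects f is not shown here.

```python
import math

# Candidate shapes for f (an assumption for this sketch).
FITS = {
    "constant": lambda s: 0.0,
    "linear": lambda s: float(s),
    "sqrt": lambda s: math.sqrt(s),
}

def fit(d1, s1, d2, s2, f):
    """Solve d1 = c + e*f(s1) and d2 = c + e*f(s2) for (c, e)."""
    denom = f(s2) - f(s1)
    if denom == 0:
        # f cannot tell the two data sizes apart; only a constant distance fits.
        if d1 != d2:
            raise ValueError("distances differ but f is flat")
        return float(d1), 0.0
    e = (d2 - d1) / denom
    c = d1 - e * f(s1)
    return c, e

def predict(c, e, f, s3):
    """Predict the distance for a third input with data size s3."""
    return c + e * f(s3)

if __name__ == "__main__":
    f = FITS["linear"]
    c, e = fit(64, 128, 128, 256, f)
    print(c, e, predict(c, e, f, 512))
```

With distances 64 and 128 at data sizes 128 and 256, the linear fit gives c = 0 and e = 0.5, so a third run with data size 512 is predicted to have distance 256.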

Page 7: Instruction Based Memory Distance Analysis and its Application to Optimization

7

Instruction Based Memory Distance Analysis

• How can we represent the memory distance of an instruction?
  • For each active interval, we record 4 words of data
    • min, max, mean, frequency
  • Some locality patterns cross interval boundaries
    • merge adjacent intervals, i and i + 1, if $\min_{i+1} - \max_i \le \max_i - \min_i$
    • merging process stops when a minimum frequency is found
    • needed to make reuse distance predictable
  • The set of merged intervals makes up the memory distance patterns
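
The merging rule can be sketched as a single left-to-right pass: interval i + 1 is folded into interval i when the gap between them is no larger than the width of interval i. The `Interval` record, the frequency-weighted mean, and comparing against the already-merged interval are assumptions of this sketch; the minimum-frequency stopping condition mentioned on the slide is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: int      # min distance observed in the interval
    hi: int      # max distance observed in the interval
    mean: float  # mean distance
    freq: int    # number of accesses that fell in the interval

def merge_intervals(intervals):
    """Merge adjacent intervals when (next.lo - cur.hi) <= (cur.hi - cur.lo)."""
    merged = []
    for cur in sorted(intervals, key=lambda iv: iv.lo):
        if merged:
            prev = merged[-1]
            if cur.lo - prev.hi <= prev.hi - prev.lo:
                total = prev.freq + cur.freq
                # Frequency-weighted mean of the two intervals (assumption).
                mean = (prev.mean * prev.freq + cur.mean * cur.freq) / total
                merged[-1] = Interval(prev.lo, max(prev.hi, cur.hi), mean, total)
                continue
        merged.append(cur)
    return merged

if __name__ == "__main__":
    patterns = merge_intervals([
        Interval(10, 14, 12.0, 5),
        Interval(16, 20, 18.0, 5),
        Interval(1000, 1010, 1005.0, 2),
    ])
    for p in patterns:
        print(p)
```

Here the first two intervals merge (gap 2, width 4) while the third stays separate, leaving two patterns.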

Page 8: Instruction Based Memory Distance Analysis and its Application to Optimization

8

Merging Example

Page 9: Instruction Based Memory Distance Analysis and its Application to Optimization

9

What do we do with patterns?

• Verify that we can predict patterns given two training runs
  • coverage
  • accuracy
• Predict miss rates for instructions
• Predict loads that may be speculated

Page 10: Instruction Based Memory Distance Analysis and its Application to Optimization

10

Prediction Coverage

• Prediction coverage indicates the percentage of instructions whose memory distance can be predicted
  • appears in both training runs
  • access pattern appears in both runs and memory distance does not decrease with increase in data size (spatial locality)
    • same number of intervals in both runs
  • Called a regular pattern
• For each instruction, we predict its ith pattern by
  • curve fitting the ith pattern of both training runs
  • applying the fitting function to construct a new min, max and mean for the third run
• Simple, fast prediction

Page 11: Instruction Based Memory Distance Analysis and its Application to Optimization

11

Prediction Accuracy

• An instruction’s memory distance is correctly predicted if all of its patterns are predicted correctly
  • predicted and observed patterns fall in same interval
  • or, given two patterns A and B such that $B.min \le A.max \le B.max$,

$$\frac{A.max - \max(A.min, B.min)}{\max(B.max - B.min,\ A.max - A.min)} \ge 0.9$$
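
The overlap test for two patterns can be sketched as follows, using the ratio of the overlapping span (A.max minus the larger of the two minimums) to the wider of the two patterns and the 0.9 cutoff from the slide. Patterns are passed as `(min, max)` pairs; the function name `patterns_overlap`, trying both orderings of A and B, and treating two identical single-point patterns as matching are assumptions.

```python
def patterns_overlap(a, b, cutoff=0.9):
    """a and b are (min, max) pairs. Returns True when the overlap ratio
    (A.max - max(A.min, B.min)) / max(B.max - B.min, A.max - A.min) >= cutoff,
    with A and B ordered so that B.min <= A.max <= B.max."""
    for (a_min, a_max), (b_min, b_max) in ((a, b), (b, a)):
        if b_min <= a_max <= b_max:
            widest = max(b_max - b_min, a_max - a_min)
            if widest == 0:
                return True  # both patterns are the same single point
            return (a_max - max(a_min, b_min)) / widest >= cutoff
    return False  # neither ordering satisfies the precondition

if __name__ == "__main__":
    print(patterns_overlap((100, 200), (105, 210)))
    print(patterns_overlap((100, 200), (150, 300)))
```

For (100, 200) against (105, 210) the ratio is 95/105, just above 0.9, so the prediction counts as correct; against (150, 300) it is 50/150 and does not.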

Page 12: Instruction Based Memory Distance Analysis and its Application to Optimization

12

Experimental Methodology

• Use 11 CFP2000 and 11 CINT2000 benchmarks
  • others don’t compile correctly
• Use ATOM to collect reuse distance statistics
• Use test and train data sets for training runs
• Evaluation based on dynamic weighting
• Report reuse distance prediction accuracy
  • value and access very similar

Page 13: Instruction Based Memory Distance Analysis and its Application to Optimization

13

Reuse Distance Prediction

Suite      Patterns: %constant   Patterns: %linear   Coverage%   Accuracy%
CFP2000    85.1                  7.7                 93.0        97.6
CINT2000   81.2                  5.1                 91.6        93.8

Page 14: Instruction Based Memory Distance Analysis and its Application to Optimization

14

Coverage issues

Reasons for no coverage
1. instruction does not appear in at least one test run
2. reuse distance of test is larger than train
3. number of patterns does not remain constant in both training runs

Suite      Reason 1   Reason 2   Reason 3
CFP2000    4.2%       0.3%       2.5%
CINT2000   2.2%       4.4%       1.8%

Page 15: Instruction Based Memory Distance Analysis and its Application to Optimization

15

Prediction Details

• Other patterns
  • 183.equake has 13.6% square root patterns
  • 200.sixtrack, 186.crafty all constant (no data size change)
• Low coverage
  • 189.lucas – 31% of static memory operations do not appear in training runs
  • 164.gzip – the test reuse distance greater than train reuse distance
    • cache-line alignment

Page 16: Instruction Based Memory Distance Analysis and its Application to Optimization

16

Number of Patterns

Suite      1       2       3      4      5
CFP2000    81.8%   10.5%   4.8%   1.4%   1.5%
CINT2000   72.3%   10.9%   7.6%   4.6%   5.3%

Page 17: Instruction Based Memory Distance Analysis and its Application to Optimization

17

Miss Rate Prediction

• Predict a miss for a reference if the backward reuse distance is greater than the cache size.
  • neglects conflict misses
• Accurate miss rate prediction

$$1 - \frac{|\text{actual} - \text{predicted}|}{\max(\text{actual}, \text{predicted})}$$
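
A minimal sketch of both ideas on this slide: estimating an instruction's miss rate from its reuse-distance patterns, and scoring a prediction as 1 - |actual - predicted| / max(actual, predicted). Patterns are `(min, max, frequency)` tuples and the cache size is given in the same units as the distances. Spreading accesses uniformly across a pattern that straddles the cache size, and ignoring cold (first-reference) misses, are assumptions of this sketch; the function names are hypothetical.

```python
def predicted_miss_rate(patterns, cache_size):
    """patterns: list of (min, max, frequency) reuse-distance patterns.
    A reference misses when its reuse distance exceeds cache_size."""
    total = sum(freq for _, _, freq in patterns)
    if total == 0:
        return 0.0
    misses = 0.0
    for lo, hi, freq in patterns:
        if lo > cache_size:
            misses += freq  # whole pattern lies beyond the cache size
        elif hi > cache_size:
            # Pattern straddles the cache size: assume a uniform spread.
            misses += freq * (hi - cache_size) / (hi - lo)
    return misses / total

def prediction_accuracy(actual, predicted):
    """1 - |actual - predicted| / max(actual, predicted)."""
    top = max(actual, predicted)
    if top == 0:
        return 1.0  # both rates are zero: a perfect match
    return 1.0 - abs(actual - predicted) / top

if __name__ == "__main__":
    rate = predicted_miss_rate([(0, 100, 60), (600, 1000, 40)], 512)
    print(rate, prediction_accuracy(0.5, rate))
```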

Page 18: Instruction Based Memory Distance Analysis and its Application to Optimization

18

Miss Rate Prediction Methodology

• Three miss-rate prediction schemes
  • TCS – test cache simulation
    • Use the actual miss rates from running the program on the test data for the reference data miss rates
  • RRD – reference reuse distance
    • Use the actual reuse distance of the reference data set to predict the miss rate for the reference data set
    • An upper bound on using reuse distance
  • PRD – predicted reuse distance
    • Use the predicted reuse distance for the reference data set to predict the miss rate.

Page 19: Instruction Based Memory Distance Analysis and its Application to Optimization

19

Cache Configurations

config no.   L1                  L2
1            32K, fully assoc.   1M, fully assoc.
2            32K, 2-way          1M, 8-way
3            32K, 2-way          1M, 4-way
4            32K, 2-way          1M, 2-way

Page 20: Instruction Based Memory Distance Analysis and its Application to Optimization

20

L1 Miss Rate Prediction Accuracy

Suite      PRD    RRD    TCS
CFP2000    97.5   98.4   95.1
CINT2000   94.4   96.7   93.9

Page 21: Instruction Based Memory Distance Analysis and its Application to Optimization

21

L2 Miss Rate Accuracy

Suite      2-way                Fully Associative
           PRD    RRD    TCS    PRD    RRD     TCS
CFP2000    91%    93%    87%    97%    99.9%   91%
CINT2000   91%    95%    87%    94%    99.9%   89%

Page 22: Instruction Based Memory Distance Analysis and its Application to Optimization

22

Critical Instructions

• Given reuse distance for an instruction
  • Can we determine which instructions are critical in terms of cache performance?
• An instruction is critical if it is in the set of instructions that generate the most L2 cache misses
  • Those top miss-rate instructions whose cumulative total misses account for 95% of the misses in a program.
• Use the execution frequency of one training run to determine the relative contribution of each instruction to the number of misses
• Compare the actual critical instructions with predicted
• Use cache configuration 2
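
Selecting the critical set can be sketched as sorting instructions by their estimated misses and taking them until the running total reaches 95% of all misses. The miss count per instruction is estimated here as predicted miss rate times execution frequency from a training run; the function names and the dictionary-based inputs are assumptions for illustration.

```python
def estimated_misses(miss_rates, exec_freq):
    """Estimate misses per instruction as miss rate times execution count."""
    return {pc: miss_rates[pc] * exec_freq.get(pc, 0) for pc in miss_rates}

def critical_instructions(misses, coverage=0.95):
    """Return the smallest prefix of instructions, ordered by descending
    misses, whose cumulative misses reach `coverage` of the total."""
    total = sum(misses.values())
    if total == 0:
        return set()
    critical, running = set(), 0
    ranked = sorted(misses.items(), key=lambda kv: kv[1], reverse=True)
    for pc, count in ranked:
        if running >= coverage * total:
            break
        critical.add(pc)
        running += count
    return critical

if __name__ == "__main__":
    misses = {"ld1": 900, "ld2": 60, "ld3": 30, "ld4": 10}
    print(sorted(critical_instructions(misses)))
```

In this example `ld1` and `ld2` together cover 960 of 1000 misses, so they form the critical set and the remaining loads are left out.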

Page 23: Instruction Based Memory Distance Analysis and its Application to Optimization

23

Critical Instruction Prediction

Suite      PRD   RRD   TCS   %pred   %act
CFP2000    92%   98%   51%   1.66%   1.67%
CINT2000   89%   98%   53%   0.94%   0.97%

Page 24: Instruction Based Memory Distance Analysis and its Application to Optimization

24

Critical Instruction Patterns

Suite      1      2      3      4      5
CFP2000    22.1   38.4   20.0   12.8   6.7
CINT2000   18.7   14.5   25.5   22.5   18

Page 25: Instruction Based Memory Distance Analysis and its Application to Optimization

25

Miss Rate Discussion

• PRD performs better than TCS when data size is a factor
• TCS performs better when data size doesn’t change much and there are conflict misses
• PRD is much better at identifying the critical instructions than TCS
  • these instructions should be targets of optimization

Page 26: Instruction Based Memory Distance Analysis and its Application to Optimization

26

Memory Disambiguation

• Load speculation
  • Can a load safely be issued prior to a preceding store?
  • Use a memory distance to predict the likelihood that a store to the same address has not finished
• Access distance
  • The number of memory operations between a store to and load from the same address
  • Correlated to instruction distance and window size
  • Use only two runs
    • If access distance not constant, use the access distance of larger of two data sets as a lower bound on access distance
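
Access distance, as defined above, can be measured from a memory trace by remembering the position of the most recent store to each address. The trace format of `(op, addr)` pairs and the function name `access_distances` are assumptions of this sketch; loads with no earlier store to the same address are skipped.

```python
def access_distances(trace):
    """trace: iterable of (op, addr) with op in {"load", "store"}.
    Returns (load_index, distance) pairs, where distance is the number of
    memory operations between the load and the latest store to its address."""
    last_store = {}  # addr -> index of the most recent store
    result = []
    for i, (op, addr) in enumerate(trace):
        if op == "store":
            last_store[addr] = i
        elif op == "load" and addr in last_store:
            result.append((i, i - last_store[addr] - 1))
    return result

if __name__ == "__main__":
    trace = [("store", 0x10), ("load", 0x20), ("store", 0x30), ("load", 0x10)]
    print(access_distances(trace))
```

The final load of `0x10` has two memory operations between it and the store to `0x10`, so its access distance is 2.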

Page 27: Instruction Based Memory Distance Analysis and its Application to Optimization

27

When to Speculate

• Definitely “no”
  • access distance less than threshold
• Definitely “yes”
  • access distance greater than threshold
• Threshold lies between intervals
  • compute predicted mis-speculation frequency (PMSF)
    • speculate if PMSF < 5%
  • When threshold does not intersect intervals
    • total of frequencies that lie below the threshold
  • Otherwise

$$\frac{\text{threshold} - \min}{\max - \min} \times \text{frequency}$$
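
The predicted mis-speculation frequency can be sketched by summing the frequencies of access-distance patterns that lie entirely below the threshold and adding the fraction (threshold - min) / (max - min) of any pattern the threshold cuts through. Combining the two cases into one sum, expressing frequencies as fractions of the load's executions, and the names `pmsf` and `should_speculate` are assumptions of this sketch.

```python
def pmsf(patterns, threshold):
    """patterns: list of (min, max, frequency) access-distance patterns,
    with frequency given as a fraction of the load's executions."""
    total = 0.0
    for lo, hi, freq in patterns:
        if hi < threshold:
            total += freq  # pattern lies entirely below the threshold
        elif lo < threshold:
            # Threshold intersects the pattern: take the part below it.
            total += freq * (threshold - lo) / (hi - lo)
    return total

def should_speculate(patterns, threshold, limit=0.05):
    """Speculate when the predicted mis-speculation frequency is under 5%."""
    return pmsf(patterns, threshold) < limit

if __name__ == "__main__":
    patterns = [(1, 4, 0.02), (10, 30, 0.10), (100, 200, 0.88)]
    print(pmsf(patterns, 15), should_speculate(patterns, 15))
```

With a threshold of 15, the first pattern contributes 0.02 and a quarter of the second contributes 0.025, giving a PMSF of about 4.5%, which is under the 5% limit.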

Page 28: Instruction Based Memory Distance Analysis and its Application to Optimization

28

Value-based Prediction

• Memory dependence only if addresses and values match

    store a1, v1
    store a2, v2
    store a3, v3
    load a4, v4

• Can move ahead if a1=a2=a3=a4, v2=v3 and v1≠v2
• The access distance of a load to the first store in a sequence of stores storing the same value is called the value distance
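
Value distance can be measured much like access distance, except that a store which writes the value already written by the preceding store to that address does not restart the count. The trace format of `(op, addr, value)` triples and the function name `value_distances` are assumptions of this sketch.

```python
def value_distances(trace):
    """trace: iterable of (op, addr, value); value is ignored for loads.
    Returns (load_index, distance) pairs, where distance counts the memory
    operations between the load and the first store in the latest run of
    stores that wrote the same value to the load's address."""
    run_start = {}  # addr -> (index of first store in current run, value)
    result = []
    for i, (op, addr, value) in enumerate(trace):
        if op == "store":
            prev = run_start.get(addr)
            if prev is None or prev[1] != value:
                run_start[addr] = (i, value)  # a new value starts a new run
        elif op == "load" and addr in run_start:
            result.append((i, i - run_start[addr][0] - 1))
    return result

if __name__ == "__main__":
    trace = [
        ("store", "a", 1),
        ("store", "a", 2),
        ("store", "a", 2),
        ("load", "a", None),
    ]
    print(value_distances(trace))
```

In this trace the load immediately follows a store to the same address, so its access distance is 0, but its value distance is 1 because the second and third stores write the same value.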

Page 29: Instruction Based Memory Distance Analysis and its Application to Optimization

29

Experimental Design

• SPEC CPU2000 programs
  • SPEC CFP2000
    • 171.swim, 172.mgrid, 173.applu, 177.mesa, 179.art, 183.equake, 188.ammp, 301.apsi
  • SPEC CINT2000
    • 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 253.perlbmk, 300.twolf
• Compile with gcc 2.7.2 -O3
• Comparison
  • Access distance, value distance
  • Store set with 16KB table, also with values
  • Perfect disambiguation

Page 30: Instruction Based Memory Distance Analysis and its Application to Optimization

30

Micro-architecture

Parameter          Value
issue width        8
fetch width        8
retire width       16
window size        128
load/store queue   128
functional units   8
fetch              multiblock gshare
data cache         perfect
memory ports       2

Operation        Latency
load             2
int division     8
int multiply     4
other int        1
float multiply   4
float addition   3
float division   8
other float      2

Page 31: Instruction Based Memory Distance Analysis and its Application to Optimization

31

IPC and Mis-speculation

Suite      Access Distance   Store Set 16KB Table   Perfect
CFP2000    3.21              3.37                   3.71
CINT2000   2.90              3.22                   3.35

Suite      Mis-speculation Rate     % Speculated Loads
           Access    Store Set      Access    Store Set
CFP2000    2.36      0.07           57.2      62.0
CINT2000   2.33      0.08           26.9      34.7

Page 32: Instruction Based Memory Distance Analysis and its Application to Optimization

32

Value-based Disambiguation

Suite      Value Distance   Store Set 16KB Value
CFP2000    3.34             3.55
CINT2000   3.00             3.23

Suite      Mis-speculation Rate   % Speculated Loads
CFP2000    1.22                   59.3
CINT2000   1.55                   27.6

Page 33: Instruction Based Memory Distance Analysis and its Application to Optimization

33

Cache Model

Suite      Access   Store Set 16K
CFP2000    1.55     1.61
CINT2000   1.53     1.60

Suite      Value    Store Set 16K
CFP2000    1.59     1.63
CINT2000   1.55     1.65

Page 34: Instruction Based Memory Distance Analysis and its Application to Optimization

34

Summary

• The reuse distance of over 90% of memory operations can be predicted, with 97% and 93% accuracy for floating-point and integer programs, respectively
• We can accurately predict miss rates for floating-point and integer codes
• We can identify 92% of the instructions that cause 95% of the L2 misses
• Access- and value-distance-based memory disambiguation are competitive with the best hardware techniques without a hardware table

Page 35: Instruction Based Memory Distance Analysis and its Application to Optimization

35

Future Work

• Develop a prefetching mechanism that uses the identified critical loads.
• Develop an MLP system that uses critical loads and access distance.
• Path-sensitive memory distance analysis
• Apply memory distance to working-set based cache optimizations
• Apply access distance to EPIC style architectures for memory disambiguation.