28
The square-root law: a rule of thumb for data dependencies Application to understanding the impact of instruction fetching, and miscellaneous other things Pierre Michaud, January 2001

The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

The square-root law:a rule of thumb for data

dependencies

Application to understanding theimpact of instruction fetching, and

miscellaneous other things

Pierre Michaud, January 2001

Page 2: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

2

Dynamic data-flow graph• Consider a program instruction trace• Take a slice of S consecutive instructions in this trace• Consider the data-flow graph of the slice

– take into account true data dependencies only

• Determine the length L of the longest chain– label each instruction with its latency

• L is the minimum execution time of the slice– assuming no value prediction

Page 3: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

3

Example

1 2 3 4 5 6 7 8 9 10 11 12 13

13=S

Longest chain length = 5

* * * * *

instruction latency = 1 cycle

Page 4: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

4

The square-root law

• What is, on average, the length LS of the longestchain in a slice of S consecutive instructions ?

SLS α1

parameter quantifies the “density” of parallelismαfor 1-cycle latencies7.0≈α

Page 5: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

5

IBS benchmarks

load latency = 1 cycle load latency = 4 cycles

slice size S from 8 to 4k instructions

Page 6: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

6

SPECint 95

load latency = 1 cycle load latency = 4 cycles

slice size S from 8 to 4k instructions

Page 7: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

7

Another way to observe the sqrt law

• Processor with– reorder buffer of W entries– perfect branch prediction– unlimited fetch– no resource constraints

• IPC proportional to– NB: average fetch bandwidth needed equals the IPC

W

WIPCW αα 2≤≤demonstration in following slides...

WIPC ≈

for 1-cycle latencies

Page 8: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

8

Demonstration (1)• Consider the whole instruction trace and its data-flow graph• Let’s add a “window-dependence” edge between every pair of

instructions (I,J) such that• The total execution time T is the length of the longest chain in this

graph

WIJ ≥− )(

W W WW W

WLN

T2

WN × instructions

WLW

TWN

IPC2

≤×

=

Page 9: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

9

Demonstration (2)• Consider the whole instruction trace and its data-flow graph• Let’s partition the trace into N slices of W instructions each• Let’s add an edge between every pair of instructions (I,J) such that I

and J belong to different slices• Let’s L be the length of the longest chain in this graph

W W W

WLNLT ×=≤

NB: the graph of the previousslide is a subgraph of this one

WLW

TWN

IPC ≥×

=

Page 10: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

10

A good rule of thumb• Already observed by Riseman and Foster in early 70’s• Works well for slice sizes < 1k instructions• Models only average behavior

– several programs or program with complex control flow– do not use it to predict the performance of a tight loop

• Why the square root ? I don’t really know but ...– it is normal to find a sub-linear law– many programs have similar instruction-type distributions– for S<1k, it is an approximation, for S>1k it is no longer valid

Page 11: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

11

Processor model• Unlimited execution resources• Unlimited reorder buffer• 1-stage pipeline• Limited fetch bandwidth• Control-flow breaks (mainly branch mispredictions)

– processor window flushed after a control-flow break– slice = instructions between consecutive control-flow breaks– the processor executes slices one after another

Page 12: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

12

Observations from experiments• Once the instruction fetch bandwidth is twice the IPC,

extra fetch bandwidth is superfluous

• To double the IPC, we should both double the instructionfetch bandwidth and decrease the number of control-flowbreaks fourfold– the fetch bandwidth requirement grows as the square root of the

distance between branch mispredictions

Page 13: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

13

Fetch cycle constraints

1 2 3 4 5 6 7 8 9 10 11 12 13

0f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) f(9) f(10) f(11) f(12) f(13)

f(i) = cycle in which instruction i is fetched

))()((max 1 ilifT Si += = Lexecution time

l(i) = length of the longest chain emanating from instruction i

]1[)( SLil K∈

Page 14: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

14

Milestone instructions

Instructions fetched in program order : fetch function f is increasing

For a given l(i)=n , we only need to consider the instruction fetchedthe latest

))(/]1[max( nilSimn =∈= Lmilestone instruction of rank n

))((max 1 nmfT nLn S+= = L

Optimal fetch: one milestone per cycle

Page 15: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

15

Example

l(i) 122123123455 4

1 2 3 4 5 6 7 8 9 10 11 12 13i

Optimal fetch: cycle 0 : fetch m5= 3 cycle 1 : fetch m4= 4 cycle 2 : fetch m3= 8 cycle 3 : fetch m2= 12 cycle 4 : fetch m1= 13

Page 16: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

16

Milestone model with sqrt law• Let’s define a model of the form

• Coherence with

• The optimal fetch is not uniform– the fetch bandwidth requirement (FBR) decreases gradually as

we get closer to the end of the slice

)(ngSmn −=

SLS α1

≈ 22nSmn α−=

sFBR α2≈ s = distance from the end of the slice

When half the slice is fetched, the FBR is 30 % lower that at the start

Page 17: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

17

24αF

FS

T +=

Uniform fetch

F instructions per cycleFi

if =)(

)(max22

1 FnS

nTSLn

α−+= = L

SFt α2=

tFF ≤

SLT S α1

==tFF >

threshold fetch rate

fetch theslice

execute whatremains inthe window

Page 18: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

18

Ft = 2 IPCmax

S=200

0123456789

0 4 8 12 16 20

F

IPC

2

1

FIPC

+

=

tFF

2IPCmax

tF=

Ft

Page 19: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

19

Fetch bandwidth / mispredict ratio

Curves are homothetic

To double the IPC, we should both double the fetchrate and decrease the mispredict ratio four-fold

Page 20: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

20

Variant: “leaky bucket” model

ghv 2=

h

ttt gNFNN 21 −+=+

Nt = number of instructions waiting forexecution in the window in cycle t

ttt gNNN 21 −=+

FS

t <

FS

t >

tt gNILP 2= g = “gravity” constant

Page 21: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

21

ILP: rising, constant, falling

ggNgN tt −≈−+ 22 1

gF

FS

T +≈

24 2

≈= αg

S=200

0

2

4

6

8

10

12

0 10 20 30 40 50

cycle

ILP

F=5

F=8

F=12

NB: instructions on correct path only

Page 22: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

22

Experimental corroboration

IBS gcc

IBS gs

Average on slices ofS=200 instructions

0

2

4

6

8

10

12

0 10 20 30 40 50

cycle

ILP

exp F=5

exp F=8

exp F=12

model F=5

model F=8

model F=12

0

2

4

6

8

10

12

0 10 20 30 40 50

cycle

ILP

exp F=5

exp F=8

exp F=12

model F=5

model F=8

model F=12

Page 23: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

23

Scaling the pipeline length• We wish to multiply the IPC by k• We multiply the pipeline width by k and we divide the mispredict

ratio by k2

• What pipeline lengthening can we tolerate ? answer : k

P = pipeline penalty ona control-flow breakwidthdepth CPICPI

gSF

FSP

CPI +=

++=

1

Keep proportional toSP

CPI depth =S

1

SP ∝we tolerate

Page 24: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

24

A closer look at control-flow breaks ...

instructiontrace

CFB

SxxSLLT )1(21min −+=+=

xSS =1 SxS )1(2 −=

For ILP, clustered CFBs arebetter than evenly-spaced ones xxxf −+= 1)(

Page 25: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

25

Impact of CFB distributionEach CFB defines a slice

Distribution F(s) = probability that slice size s≤

∫∫∞∞

−=′×=00

))(1(2

1)( dssF

sdssFsTslice

sT spacedevenly =_

sTgeometric 2π

≈ 13.1/2 ≈π

Experiment with real branch predictor:ILP ~10 % higher than constant S model

Page 26: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

26

Simultaneous multithreading• SMT processor with fixed instruction window W• We wish to double the global IPC• How many threads ?

N independent threads WNNW

NIPC ≈≈

To double the global IPC, the number of threads should bemultiplied by 4

Alternative: double both the number of threads and the window

Page 27: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

27

Window partitioning

16 16 16 16

oldestinstructions

recentinstructions

416 =41664 =−

8=unifiedIPC

624 =+=dpartitioneIPC30 % CPI increase from partitioning

Page 28: The square-root law: a rule of thumb for data dependencies · 2009. 11. 5. · The square-root law: a rule of thumb for data dependencies Application to understanding the impact of

28

Conclusion

• The impact of data dependencies is not always intuitive

• Square-root law useful for having a rough idea of whatsimulation results should be expected

• Use it for predicting relative rather than absolute behavior– “ what speedup processor A will have on processor B ? ”