Static Transformation for Heap Layout Using Memory Access Patterns
Jinseong Jeon Computer Science, KAIST
2006-12-12 CS @ KAIST 2
Static Transformation
[Diagram: user, compiler + static transformation, computing machine]
• Compilers can transform program memory layout.
  – program behaviors: memory access patterns
  – machine properties: memory hierarchy
Heap Layout Transformation
• Pool Allocation
  – complex pointer analysis
• Field Layout Reconstruction
  – profiling
Node { int key; char data[6]; Node *next; } * T;
char* search(int k) {
  ...
  while (...) {
    if (h->key == k) return h->data;
    h = h->next;
  }
  ...
}
[Figure: reconstructed layout with k (key) and n (next) placed together and d (data) stored apart]
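One way to picture field layout reconstruction for the Node example is a hot/cold split, sketched below. The names `NodeHot`, `NodeCold` and `search_hot`, and the extra `cold` pointer, are assumptions made for illustration; the slides do not say how the reconstructed record is actually linked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cold part: the field search() reads only on a hit. */
struct NodeCold {
    char data[6];
};

/* Hypothetical hot part: key and next are touched on every iteration,
   so they are kept together; data is reached through a pointer. */
struct NodeHot {
    int key;
    struct NodeHot *next;
    struct NodeCold *cold;
};

/* Same traversal as the slide's search(), written against the split layout. */
char *search_hot(struct NodeHot *h, int k) {
    while (h != NULL) {
        if (h->key == k) {
            assert(h->cold != NULL);
            return h->cold->data;
        }
        h = h->next;
    }
    return NULL; /* not found */
}
```

With this layout the loop walks only key/next pairs, so more list nodes fit in each cache line than with the original record.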
Goal & Direction
• Build a static transformation for heap layout
  – based on both heap layout transformations
• Predict program behaviors
  – how to represent memory access behaviors
    • regular expressions
  – how to extract run-time behaviors from code
    • Code → CFG → Automaton → R.E.
• Then apply optimizing techniques
  – how to interpret predicted behaviors
Overview
Structure Selection Analysis
Field Affinity Analysis
Access Pattern Analysis
Layout Transformer
Source code
Optimized code
Structure Selection
S1.x = T1.c; for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }
Structure type projection: S = T; for ( ) { T = ; = T; U = ; }
Conversion: TS, then TTU for the loop body, giving TS(TTU)*
Candidate selection for pool allocation: T, U
Field Affinity Estimation
S1.x = T1.c; for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }
Field usage projection: = .c; for ( ) { .a = ; = .b; }
Conversion: c, then ab for the loop body, giving c(ab)*
Symbolic estimation: [Figure: affinity graph over fields a, b and c]
Field layout reconstruction: a b ... c ...
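The two projections can be sketched on a straight-line access list as follows. The `"<Type><index>.<field>"` string encoding, the single-letter type and field names, and the function names `project_types` and `project_fields` are assumptions of this sketch; loops (and hence the closures in TS(TTU)* and c(ab)*) are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Structure type projection: keep only the type letter of every access.
   `out` must have room for n + 1 characters. */
void project_types(const char *const accesses[], size_t n, char *out) {
    for (size_t i = 0; i < n; i++) {
        assert(accesses[i][0] != '\0');
        out[i] = accesses[i][0];
    }
    out[n] = '\0';
}

/* Field usage projection: keep the field letter of the accesses that
   touch the given structure type and drop all other accesses. */
void project_fields(const char *const accesses[], size_t n, char type,
                    char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const char *dot = strchr(accesses[i], '.');
        assert(dot != NULL && dot[1] != '\0');
        if (accesses[i][0] == type) {
            out[len++] = dot[1];
        }
    }
    out[len] = '\0';
}
```

For the access order of the slide's example (T1.c, S1.x, then Ti.a, Ti.b, Uj.y for one loop iteration), the type projection is TSTTU and the field projection for T is cab.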
Field Affinity Estimation
• Symbolic approach
  – record closure marks together with their nesting information
  – treat all closure marks as the same variable
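[Figure: affinity graph over fields n, k and d; edge weights are first recorded as nested closure marks (*, **, ***) and then rewritten as polynomials in a single variable x, with terms such as x, x^2 and 2x^2 + 3x^3]
(kdn(n)*)*
((kn)*(kd+))*
((kn)*(kd+))*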
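The idea of turning closure nesting into powers of one variable x can be sketched as below. This is a simplified illustration, not the slides' estimator: it weights single field occurrences rather than field pairs, it handles only letters, parentheses and `*`, and the names `closure_polynomial`, `eval_polynomial`, `MAX_RE` and `MAX_DEG` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RE 128
#define MAX_DEG 8

/* Adds x^depth to coeff[] for every occurrence of `field` in `re`,
   where depth is the number of `*` closures enclosing that occurrence. */
void closure_polynomial(const char *re, char field, int coeff[MAX_DEG]) {
    size_t n = strlen(re);
    size_t open_stack[MAX_RE];
    int starred[MAX_RE] = {0}; /* starred[i]: group opened at i ends in ")*" */
    int flag_stack[MAX_RE];
    size_t top = 0;
    int depth = 0;

    assert(n < MAX_RE);

    /* Pass 1: find which parenthesized groups are followed by '*'. */
    for (size_t i = 0; i < n; i++) {
        if (re[i] == '(') {
            open_stack[top++] = i;
        } else if (re[i] == ')') {
            assert(top > 0);
            starred[open_stack[--top]] = (re[i + 1] == '*');
        }
    }
    assert(top == 0);

    /* Pass 2: track the closure depth and record each occurrence. */
    for (size_t i = 0; i < n; i++) {
        if (re[i] == '(') {
            flag_stack[top++] = starred[i];
            depth += starred[i];
        } else if (re[i] == ')') {
            depth -= flag_stack[--top];
        } else if (re[i] == field) {
            int d = depth + (re[i + 1] == '*');
            assert(d < MAX_DEG);
            coeff[d]++;
        }
    }
}

/* Evaluates the polynomial for an assumed concrete value of x. */
long eval_polynomial(const int coeff[MAX_DEG], long x) {
    long value = 0;
    for (int d = MAX_DEG - 1; d >= 0; d--) {
        value = value * x + coeff[d];
    }
    return value;
}
```

For (kdn(n)*)* the field n occurs once at depth 1 and once at depth 2, so its weight is x + x^2; substituting a guessed iteration count for x then allows weights to be compared numerically.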
Code Transformation
• Explicit field names → field accesses on modified layouts
  – Oi.next is converted into *(Oi + offset(next)).
  – Arbitrary pointer dereferences such as *(p + 4) are not allowed.
  – Some accesses require extra instructions.
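The rewrite of a named field access into base-plus-offset form can be sketched with the standard `offsetof` macro. The helper names `next_slot` and `key_slot` are hypothetical, and the offsets here come from the original struct; in the transformed program they would come from the modified layout instead.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int key;
    char data[6];
    struct Node *next;
};

/* O.next rewritten as *(O + offset(next)): the byte offset is applied to
   a char pointer and the result is cast back to the field's type. */
struct Node **next_slot(struct Node *o) {
    assert(o != NULL);
    return (struct Node **)((char *)o + offsetof(struct Node, next));
}

/* O.key rewritten the same way. */
int *key_slot(struct Node *o) {
    assert(o != NULL);
    return (int *)((char *)o + offsetof(struct Node, key));
}
```

A statement such as `a.next = &b;` then becomes `*next_slot(&a) = &b;`, and `a.next->key` becomes `*key_slot(*next_slot(&a))`.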
Code Transformation
• Type-aware malloc → pool allocation routines
  – For custom allocators, supply hints that pair each target structure with its custom allocator
Direct malloc, before:
  ... = malloc(sizeof(T)); ...
Direct malloc, after:
  ... = _T_alloc_(); ...
  char* _T_alloc_() { // pool allocation }
Custom allocator, before:
  ... = my_malloc(sizeof(T)); ...
  char* my_malloc(int s) { ... ... = malloc(s); }
Custom allocator, after:
  ... = my_malloc(sizeof(T)); ...
  char* my_malloc(int s) { ... ... = _T_alloc_(); }
  char* _T_alloc_() { // pool allocation }
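A per-type pool routine of the kind `_T_alloc_()` stands for might look like the bump allocator below. This is a minimal sketch under several assumptions: the chunk size `POOL_CHUNK` is arbitrary, chunks are never freed, the body of `T` is borrowed from the Node example, and the routine is named `T_alloc` because identifiers starting with an underscore and a capital letter are reserved in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct T {
    int key;
    char data[6];
    struct T *next;
} T;

#define POOL_CHUNK 256 /* objects per chunk; an arbitrary choice */

static T *pool_cur = NULL;   /* next free slot in the current chunk */
static size_t pool_left = 0; /* free slots remaining in that chunk */

/* Pool allocation: objects of type T are carved out of contiguous chunks,
   so consecutive allocations are adjacent in memory. */
T *T_alloc(void) {
    if (pool_left == 0) {
        pool_cur = malloc(POOL_CHUNK * sizeof(T));
        if (pool_cur == NULL) {
            return NULL;
        }
        pool_left = POOL_CHUNK;
    }
    pool_left--;
    return pool_cur++;
}

/* Custom allocator after the transformation. The hint "my_malloc serves
   structure T" is modeled here as an assertion on the requested size. */
void *my_malloc(size_t s) {
    assert(s == sizeof(T));
    return T_alloc();
}
```

Call sites keep calling `my_malloc(sizeof(T))` unchanged; only the allocator body is redirected to the pool.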
Experimental Environment
• Using the CIL compiler and OCaml
• Red Hat 9.0 Linux PC
  – 2.6 GHz Pentium 4 processor
  – 8 KB L1D cache, 512 KB L2 cache, 1.7 GB main memory
• GCC 3.2.2 with -O3
Analysis Time
Benchmark      Program    Lines of Code  Structure Selection  Field Affinity  Code Transform  Total
SPECINT 2000   175.vpr    11301          7.220                0.324           0.107           7.651
SPECINT 2000   300.twolf  17821          15.598               3.455           1.126           20.179
FreeBench      analyzer   763            0.096                0.027           0.012           0.135
McGill         chomp      378            0.021                0.006           0.003           0.030
McGill         misr       181            0.003                0.002           0.001           0.006
Olden suite    bisort     597            0.020                0.003           0.002           0.025
Olden suite    health     474            0.024                0.004           0.002           0.030
Olden suite    mst        408            0.031                0.004           0.002           0.037
Olden suite    perimeter  345            0.012                0.012           0.001           0.025
Olden suite    treeadd    154            0.002                0.000           0.000           0.002
Olden suite    tsp        433            0.011                0.004           0.002           0.017
Olden suite    voronoi    975            0.048                0.004           0.003           0.055
Ptrdist suite  anagram    355            0.031                0.003           0.001           0.035
Ptrdist suite  bc         4303           2.028                0.634           0.193           2.855
Ptrdist suite  ft         926            0.050                0.014           0.010           0.074
Ptrdist suite  ks         551            0.055                0.012           0.020           0.087
Cache Miss - L1D
[Figure: bar chart of normalized L1D cache misses (1.0 = Original) for Pool and Pool + Re on 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft and ks; two bars exceed the 1.4 axis limit and are labeled 1.99 and 2.23; summary values: Pool 0.86, Pool + Re 0.84]
Cache Miss - L2
[Figure: bar chart of normalized L2 cache misses (1.0 = Original) for Pool and Pool + Re on 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft and ks; two bars exceed the 1.4 axis limit and are labeled 4.10 and 4.18; summary values: Pool 1.06, Pool + Re 1.00]
Performance
Benchmark      Program    Lines of Code  Original (seconds)  Pool / Original  Pool + Re / Original
SPECINT 2000   175.vpr    11301          10.959              1.01             1.01
SPECINT 2000   300.twolf  17821          435.19              0.98             0.99
FreeBench      analyzer   763            66.64               0.41             0.45
McGill         chomp      378            7.44                0.59             0.47
McGill         misr       181            31.39               0.99             1.01
Olden suite    bisort     597            24.29               0.99             0.99
Olden suite    health     474            86.05               0.71             0.63
Olden suite    mst        408            65.73               0.82             0.82
Olden suite    perimeter  345            7.19                0.78             0.84
Olden suite    treeadd    154            10.17               0.48             0.55
Olden suite    tsp        433            20.44               0.96             0.97
Olden suite    voronoi    975            11.03               0.99             0.99
Ptrdist suite  anagram    355            1.53                0.99             1.11
Ptrdist suite  bc         4303           1.95                0.82             0.81
Ptrdist suite  ft         926            8.25                0.83             0.73
Ptrdist suite  ks         551            7.46                1.03             1.03
Avg.                                                         0.84             0.84
Contribution
• Predict memory access patterns at compile time
  – Regular expressions
  – Automata reduction algorithm
• Interpret predicted patterns according to heap layout transformations
• Cache misses are reduced by 16%
• Execution times are reduced by 14%
Backup Slides
From CFG to Automaton
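[Figure: CFG of search(): start → h == NULL? (T: NotFound; F: h->key == k? (T: return h->data; F: h = h->next, then back to the first test)), shown next to the corresponding automaton whose transitions are labeled k, d and n for accesses to key, data and next]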
State Elimination
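[Figure: a state with self-loop e is eliminated; the paths through it become direct edges labeled ae*c and be*d]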
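Textbook state elimination on a small label matrix can be sketched as follows. The matrix representation, the fixed sizes and the names `label`, `add_edge`, `clear_automaton` and `eliminate` are assumptions of this sketch; it uses `|` for alternation, which may differ from the notation on the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSTATES 4
#define LABEL_MAX 64
#define TMP_MAX 256

/* label[p][q] is the regular expression on edge p -> q; "" means no edge. */
char label[NSTATES][NSTATES][LABEL_MAX];

void clear_automaton(void) { memset(label, 0, sizeof label); }

void add_edge(int p, int q, const char *re) {
    assert(strlen(re) < LABEL_MAX);
    strcpy(label[p][q], re);
}

/* Copies re into out, parenthesized when it contains an alternation. */
static void as_factor(const char *re, char out[TMP_MAX]) {
    if (strchr(re, '|') != NULL) snprintf(out, TMP_MAX, "(%s)", re);
    else snprintf(out, TMP_MAX, "%s", re);
}

/* Writes the closure of re into out; an absent self-loop stays empty. */
static void as_star(const char *re, char out[TMP_MAX]) {
    if (re[0] == '\0') out[0] = '\0';
    else if (re[1] == '\0') snprintf(out, TMP_MAX, "%s*", re);
    else snprintf(out, TMP_MAX, "(%s)*", re);
}

/* Removes state s: every path p -> s -> q becomes one edge labeled
   in loop* out, merged with an existing p -> q label through '|'. */
void eliminate(int s) {
    char loop[TMP_MAX], in[TMP_MAX], out[TMP_MAX];
    char path[TMP_MAX], merged[TMP_MAX];

    as_star(label[s][s], loop);
    for (int p = 0; p < NSTATES; p++) {
        for (int q = 0; q < NSTATES; q++) {
            if (p == s || q == s) continue;
            if (label[p][s][0] == '\0' || label[s][q][0] == '\0') continue;
            as_factor(label[p][s], in);
            as_factor(label[s][q], out);
            snprintf(path, TMP_MAX, "%s%s%s", in, loop, out);
            if (label[p][q][0] != '\0')
                snprintf(merged, TMP_MAX, "%s|%s", label[p][q], path);
            else
                snprintf(merged, TMP_MAX, "%s", path);
            assert(strlen(merged) < LABEL_MAX);
            strcpy(label[p][q], merged);
        }
    }
    /* Drop every edge into and out of the eliminated state. */
    for (int t = 0; t < NSTATES; t++) {
        label[t][s][0] = '\0';
        label[s][t][0] = '\0';
    }
}
```

With edges 0 -a-> 1, 1 -e-> 1 and 1 -c-> 2, calling `eliminate(1)` leaves a single edge 0 -> 2 labeled ae*c, matching the first label in the figure.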
From Automaton to R.E.
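[Figure: state elimination applied step by step to the search() automaton with edges k, d and n; intermediate edge labels include nk, kn and kd+e, then (kn)*(kd+e), and the final expression is (kn)*(kd+)]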
State Compare
state_compare(state s1, state s2)
  b1 ← whether ∃s'.(s' → s1, s1.dfn ≤ s'.dfn) // 0 or 1
  b2 ← whether ∃s'.(s' → s2, s2.dfn ≤ s'.dfn) // 0 or 1
  if b1 and not b2 then 1 // s1 > s2
  else if not b1 and b2 then -1 // s1 < s2
  else if b1 and b2 then compare(s2.dfn, s1.dfn) // dfn = Depth First Numbering
  else compare(s1.dfn, s2.dfn)
  end if
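The comparison can be transcribed into C as below. Two assumptions are made: the existence test for an incoming edge from a state with an equal or larger dfn is precomputed into a flag (`has_back_edge`, a hypothetical field), and `compare` is taken to be the usual three-way integer comparison returning -1, 0 or 1.

```c
#include <assert.h>
#include <stddef.h>

struct state {
    int dfn;           /* depth-first number */
    int has_back_edge; /* nonzero if some predecessor s' has s.dfn <= s'.dfn */
};

/* Three-way comparison of two integers: -1, 0 or 1. */
static int compare_int(int a, int b) {
    return (a > b) - (a < b);
}

int state_compare(const struct state *s1, const struct state *s2) {
    assert(s1 != NULL && s2 != NULL);
    int b1 = s1->has_back_edge != 0;
    int b2 = s2->has_back_edge != 0;

    if (b1 && !b2) return 1;  /* s1 > s2 */
    if (!b1 && b2) return -1; /* s1 < s2 */
    if (b1 && b2) return compare_int(s2->dfn, s1->dfn); /* reversed dfn order */
    return compare_int(s1->dfn, s2->dfn);
}
```

States that are targets of back edges therefore sort after all other states, and among themselves they are ordered by decreasing dfn.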
Automata Reduction
worklist ← ∅
workhorse(state s)
  if s ≠ start state and s ≠ end state then
    for all s' ∈ s.successor do
      delete s' from worklist
    end for
    eliminate(s)
    for all s' ∈ s.successor do
      push s' into worklist
    end for
  end if
Automata Reduction
reduce()
  E ← {s ∈ S | ∃s'. s →ε s'}
  R ← {s ∈ E | ∄s'. s' → s, s.dfn ≤ s'.dfn}
  for all s ∈ R do
    workhorse(s)
  end for
  worklist ← S\R
  while worklist ≠ ∅ do
    workhorse(pop(worklist))
  end while
From Intra- to Inter-proc.
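• Functions are analyzed in reverse topological order of the call graph
• For self-recursive function calls,
f() { ... = s.a; if (!end) f(); ... = s.b; }
  – exact access pattern: F → ab | aFb, i.e. a^i b^i
  – regular approximation: a*abb*
[Figure: automaton for f() with transitions a and b around the recursive call]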
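The gap between the exact pattern a^i b^i and the regular approximation a*abb* can be checked with a small trace. In this sketch the elided `end` test is modeled by a call budget, and `trace`, `record`, `reset_trace` and `matches_approximation` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_MAX 64

char trace[TRACE_MAX];
size_t trace_len = 0;

static void record(char field) {
    assert(trace_len + 1 < TRACE_MAX);
    trace[trace_len++] = field;
    trace[trace_len] = '\0';
}

void reset_trace(void) {
    trace_len = 0;
    trace[0] = '\0';
}

/* The slide's f(): read s.a, possibly recurse, then read s.b. */
void f(int calls_left) {
    assert(calls_left >= 1);
    record('a'); /* ... = s.a; */
    if (calls_left > 1) {
        f(calls_left - 1);
    }
    record('b'); /* ... = s.b; */
}

/* Accepts a*abb*: one or more a followed by one or more b. */
int matches_approximation(const char *w) {
    size_t i = 0, na = 0, nb = 0;
    while (w[i] == 'a') { i++; na++; }
    while (w[i] == 'b') { i++; nb++; }
    return na >= 1 && nb >= 1 && w[i] == '\0';
}
```

Every trace f produces has equal numbers of a and b, and the approximation accepts all of them; it also accepts strings such as aab that f can never produce, which is the price of staying within regular expressions.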
Structure Selection
• “One structure per pool”
  – Most pools are used in a type-consistent manner
• Identify which structures are used intensively
  – Structure access patterns
  – Repeatedly used ones
• Structure detection in closures
Closure Detection
• Presence of closures
  – EMPTY, NORMAL, HAVE
main: . . foo(); . .
foo:  . . bar1(); . .
bar1: . . while(..) bar2(); . .
bar2: . . s->f1; s->f2; . .
After bar2: bar2 ↦ NORMAL
After bar1: bar1 ↦ HAVE, bar2 ↦ NORMAL
After foo:  foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
After main: main ↦ HAVE, foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
exc. exc.
Field Affinity
[Figure: field affinity graph over key, next and data, including a combined key,next node, with numeric edge weights; it is derived from an access trace containing o3.next, o4.key, o4.next, o5.key, o5.next and o6.key]
Affinity Relation Abstraction
main: . s->f3; foo(); . .
foo:  . s->f1; bar1(); s->f2; .
bar1: . s->f3; while(..) bar2(); s->f1; .
bar2: . . s->f1; s->f2; . .
bar2.s ↦ {f1}, bar2.e ↦ {f2}, bar2.r ↦ [(f1,f2) ↦ {(0,1)}]
bar1.s ↦ {f3}, bar1.e ↦ {f1}, bar1.r ↦ [(f1,f3) ↦ {(0,1)}, (f1,f2) ↦ {(1,1), (0,1)}]
foo.s ↦ {f1}, foo.e ↦ {f2}, foo.r ↦ [(f1,f3) ↦ {(0,2)}, (f1,f2) ↦ {(1,1), (0,2)}]
main.s ↦ {f3}, main.e ↦ {f2}, main.r ↦ [(f1,f3) ↦ {(0,3)}, (f1,f2) ↦ {(1,1), (0,2)}]
where F is the set of fields and VAR is the set of function names
Offset Calculation (1/2)
Offset Calculation (2/2)
Traditional vs. WTO based
Program    SLOC   Traditional time  Traditional peak  Traditional total  WTO time  WTO peak  WTO total
175.vpr    11301  N.A.              N.A.              N.A.               0.154     15.97     178.68
300.twolf  17821  N.A.              N.A.              N.A.               0.360     27.03     313.14
analyzer   763    0.022             1.47              16.59              0.007     1.23      11.97
chomp      378    0.003             0.74              5.01               0.003     0.74      4.96
misr       181    0.003             0.49              2.87               0.003     0.49      2.69
bisort     597    0.002             0.74              4.79               0.002     0.74      4.66
health     474    0.004             0.74              5.90               0.002     0.74      5.47
mst        408    0.003             0.74              5.92               0.002     0.74      5.51
perimeter  345    0.003             0.74              4.52               0.002     0.74      4.19
treeadd    154    0.003             0.49              1.52               0.000     0.49      1.51
tsp        433    0.004             0.74              5.31               0.002     0.74      4.94
voronoi    975    0.005             1.72              14.64              0.003     1.72      14.28
anagram    355    0.002             0.74              5.33               0.002     0.74      5.32
bc         4303   572.897           612.93            4379.97            0.059     9.34      114.09
ft         926    0.006             0.98              9.07               0.004     0.98      8.67
ks         551    0.008             0.98              9.37               0.004     0.98      7.92
Instruction Reference
[Figure: bar chart of normalized instruction references for Pool and Pool + Re on 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft and ks; summary values: Pool 0.94, Pool + Re 0.97]