Experiments on the Effectiveness of an Automatic Insertion of Memory Reuses into ML-like Programs
Oukseh Lee (Hanyang University)
Kwangkeun Yi (Seoul National University)
Question
Our SAS 2003 paper* presented an algorithm that replaces allocations with memory reuse (destructive update), together with some promising yet preliminary experimental numbers.
When, and by how much, is it cost-effective in space and time? We want to know before launching it inside our nML compiler.
* Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. Inserting Safe Memory Reuse Commands into ML-like Programs. In Proceedings of the Annual International Static Analysis Symposium, volume 2694 of Lecture Notes in Computer Science, pp. 171-188, San Diego, California, June 2003.
Brief Overview of Our Algorithm
Example: insert
l = 1 2 3 4 6 nil
result = insert 5 l
Without reuse:
fun insert i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert i t
                 in h::z
With reuse (free l before building h::z):
fun insert i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert i t
                 in free l; h::z
With a flag b that controls the free:
fun insert b i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert b i t
                 in free l when b; h::z
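The effect of "free l when b" can be sketched on a toy heap. This is only an illustrative model, not the nML runtime: the heap record and the functions cons, free_cell, and run are hypothetical names, cells are addressed by integers, and a freed address is simply handed back by the next allocation.

```ocaml
(* Toy heap of cons cells; every name here is an assumption for illustration. *)
type cell = { hd : int; tl : int }            (* tl = nil encodes the empty list *)

type heap = {
  cells : (int, cell) Hashtbl.t;              (* address -> live cell *)
  mutable next : int;                         (* next fresh address *)
  mutable free_list : int list;               (* addresses released by free_cell *)
  mutable fresh : int;                        (* allocations served with fresh memory *)
  mutable reused : int;                       (* allocations served from free_list *)
}

let nil = -1

let make_heap () =
  { cells = Hashtbl.create 16; next = 0; free_list = []; fresh = 0; reused = 0 }

(* Allocate a cons cell, taking a freed address when one is available. *)
let cons hp x t =
  let addr =
    (match hp.free_list with
     | a :: rest ->
       hp.free_list <- rest;
       hp.reused <- hp.reused + 1;
       a
     | [] ->
       let a = hp.next in
       hp.next <- a + 1;
       hp.fresh <- hp.fresh + 1;
       a)
  in
  Hashtbl.replace hp.cells addr { hd = x; tl = t };
  addr

let free_cell hp addr =
  Hashtbl.remove hp.cells addr;
  hp.free_list <- addr :: hp.free_list

(* Mirrors the slide: free l when b; h::z *)
let rec insert hp b i l =
  if l = nil then cons hp i nil
  else
    let c = Hashtbl.find hp.cells l in
    if i < c.hd then cons hp i l
    else begin
      let z = insert hp b i c.tl in
      let x = c.hd in
      if b then free_cell hp l;
      cons hp x z
    end

let rec of_list hp = function
  | [] -> nil
  | x :: xs -> let t = of_list hp xs in cons hp x t

let rec to_list hp l =
  if l = nil then []
  else let c = Hashtbl.find hp.cells l in c.hd :: to_list hp c.tl

(* Returns (result list, fresh allocations, reused allocations, live cells). *)
let run b =
  let hp = make_heap () in
  let l = of_list hp [1; 2; 3; 4; 6] in
  let r = insert hp b 5 l in
  (to_list hp r, hp.fresh, hp.reused, Hashtbl.length hp.cells)
```

In this model, run false serves all ten allocations with fresh cells and leaves ten cells live, while run true needs six fresh cells, reuses four, and leaves six cells live. With b=true the input list is destroyed, which is why the flag must be false whenever l is still needed.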
Analysis
fun insert i l =
  case l of
    [] => i::[]
  | h::t =>
      if i<h then
        i::l
      else
        let z = insert i t in h::z
(figure: the code above annotated with the heap regions X1, X2, X3, X4, Z, L, L.hd, and L.tl)
X2 ∪ L
X4 ∪ Z
Z ⊆ X3 ∪ L.tl
result: X1 ∪ X2 ∪ L ∪ X4 ∪ Z ⊆ X ∪ L, where X = X1 ∪ X2 ∪ X3 ∪ X4
usage: L.hd ∪ L.tl
Transformation [1/3]
fun insert i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert i t
                 in h::z
Add a flag parameter b:
fun insert b i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert i t
                 in h::z
When b=true, the transformed insert function deallocates the cons cells of the input list l, excluding those of the result list.
Transformation [2/3]
When is it safe to free the tail cells t not in the result z (L.tl \ Z)?
must not be freed                  area      overlap?   necessary condition
the input list l (when b=false)    L         yes        b=true
the result list                    X4 ∪ Z    no         none
fun insert b i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert b i t
                 in h::z
(flag passed to the recursive call: b)
Transformation [3/3]
When is it safe to free the head cell (L.hd)?
must not be freed                                         area        overlap?   necessary condition
the input list l (when b=false)                           L           yes        b=true
the cons cells freed during insert b i t (when b=true)    L.tl \ Z    no         none
the result list                                           X4 ∪ Z      no         none
fun insert b i l =
  case l of
    [] => i::[]
  | h::t => if i<h then i::l
            else let z = insert b i t
                 in free l when b; h::z
(condition on the free: b)
Experiments
Analysis & Transformation Cost
program    lines  cost (s)
sieve      29     0.001
merge      40     0.001
qsort      41     0.001
queens     44     0.003
msort      73     0.003
professor  193    0.012
mirage     245    0.015
life       366    0.017
k-eval     645    0.220
kb         808    0.095
nucleic    3019   0.488
slope = 1.46; 1,500~29,000 lines/sec
(plot: analysis & transformation cost against program size, both on logarithmic scales)
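A slope in a log-log plot can be estimated by a least-squares fit of log(cost) against log(lines). The sketch below assumes that this is what the slide's slope refers to; the function name loglog_slope is illustrative. Because the costs in the table are rounded to milliseconds, the fit on these values need not reproduce 1.46 exactly.

```ocaml
(* (lines, cost in seconds) pairs copied from the cost table *)
let data = [
  (29., 0.001); (40., 0.001); (41., 0.001); (44., 0.003);
  (73., 0.003); (193., 0.012); (245., 0.015); (366., 0.017);
  (645., 0.220); (808., 0.095); (3019., 0.488);
]

(* Least-squares slope of log(cost) against log(lines). *)
let loglog_slope pts =
  let n = float_of_int (List.length pts) in
  let xs = List.map (fun (lines, _) -> log lines) pts in
  let ys = List.map (fun (_, cost) -> log cost) pts in
  let mean v = List.fold_left (+.) 0. v /. n in
  let mx = mean xs and my = mean ys in
  let sxy =
    List.fold_left2 (fun acc x y -> acc +. (x -. mx) *. (y -. my)) 0. xs ys in
  let sxx = List.fold_left (fun acc x -> acc +. (x -. mx) ** 2.) 0. xs in
  sxy /. sxx

let () = Printf.printf "log-log slope = %.2f\n" (loglog_slope data)
```

A slope above 1 means the cost grows faster than linearly with program size, although the measured range stays well under a second.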
Reuse Ratio
program  total allocation B (kilo words)  reuse C (kilo words)  reuse ratio C/B
sieve 18694 15760 84.3%
merge 11719 5860 50.0%
qsort 95450 89664 93.9%
queens 122505 5206 4.2%
msort 45455 40573 89.3%
professor 822589 344794 41.9%
mirage 101972 86054 84.4%
life 48305 5125 10.6%
k-eval 132234 41710 31.5%
kb 57705 1948 3.4%
nucleic 31487 5307 16.9%
3.4%~93.9% of allocations are avoided.
Low reuse ratio is due to much sharing.
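The reuse ratio is plain arithmetic on the two measured columns, C/B. A minimal sketch that recomputes it from the table rows (the function name reuse_ratio is illustrative):

```ocaml
(* (program, total allocation B, reuse C), both in kilo words, from the table *)
let rows = [
  ("sieve", 18694, 15760); ("merge", 11719, 5860); ("qsort", 95450, 89664);
  ("queens", 122505, 5206); ("msort", 45455, 40573); ("professor", 822589, 344794);
  ("mirage", 101972, 86054); ("life", 48305, 5125); ("k-eval", 132234, 41710);
  ("kb", 57705, 1948); ("nucleic", 31487, 5307);
]

(* Reuse ratio C/B as a percentage. *)
let reuse_ratio b c = 100. *. float_of_int c /. float_of_int b

let () =
  List.iter
    (fun (name, b, c) -> Printf.printf "%-10s %5.1f%%\n" name (reuse_ratio b c))
    rows
```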
Memory Peak Reduction
program  reuse ratio  peak D (words)  peak with reuse E (words)  reduction (D-E)/D
sieve 84.3% 690 300 56.5%
merge 50.0% 1197 606 49.4%
qsort 93.9% 1189 334 71.9%
queens 4.2% 255 255 0.0%
msort 89.3% 714 321 55.0%
professor 41.9% 1394 1281 8.1%
mirage 84.4% 1398 1361 2.6%
life 10.6% 2346 1746 25.6%
k-eval 31.5% 1044 944 9.6%
kb 3.4% 27125 26501 2.3%
nucleic 16.9% 103677 89352 13.8%
0.0%~71.9% peak reduction
much reuse = much peak reduction
(plot: memory peak reduction against memory reuse ratio; labeled points: mirage 84.4% / 2.6%, life 10.6% / 25.6%, professor 41.9% / 8.1%)
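The reduction values in the table are consistent with (D-E)/D, the drop relative to the peak without reuse; for sieve, (690-300)/690 is about 56.5%. A small sketch of that arithmetic (the function name peak_reduction is illustrative):

```ocaml
(* (program, peak D, peak with reuse E), in words, from the table *)
let rows = [
  ("sieve", 690, 300); ("merge", 1197, 606); ("qsort", 1189, 334);
  ("queens", 255, 255); ("msort", 714, 321); ("professor", 1394, 1281);
  ("mirage", 1398, 1361); ("life", 2346, 1746); ("k-eval", 1044, 944);
  ("kb", 27125, 26501); ("nucleic", 103677, 89352);
]

(* Peak reduction relative to the peak without reuse, as a percentage. *)
let peak_reduction d e = 100. *. float_of_int (d - e) /. float_of_int d

let () =
  List.iter
    (fun (name, d, e) -> Printf.printf "%-10s %5.1f%%\n" name (peak_reduction d e))
    rows
```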
Difference in Live Cells
(per-program plots; each label gives the program, its reuse ratio, and its memory peak reduction)
sieve      84.3%  56.5%
merge      50.0%  49.4%
qsort      93.9%  71.9%
msort      89.3%  55.0%
queens     4.2%   0.0%
kb         3.4%   2.3%
nucleic    16.9%  13.8%
k-eval     31.5%  9.6%
life       10.6%  25.6%
mirage     84.4%  2.6%
professor  41.9%  8.1%
GC Time & Runtime Changes
A = runtime, B = GC time, C = GC time (reuse), D = runtime (flags), E = runtime (reuse)
program    reuse ratio  A      B      B/A    C      (B-C)/B  D      (A-D)/A  E      (A-E)/A
(Intel Pentium4 3.0GHz, Linux RedHat 9.0)
sieve      84.3%        0.40   0.178  44.2%  0.087  51.0%    0.41   -2.6%    0.41   -0.7%
merge      50.0%        0.62   0.470  76.0%  0.243  48.3%    0.68   -9.8%    0.47   24.0%
qsort      93.9%        2.08   1.312  63.2%  0.124  90.5%    2.16   -4.1%    1.26   39.1%
queens     4.2%         1.58   0.822  52.2%  0.812  1.3%     1.68   -6.8%    1.65   -4.7%
msort      89.3%        0.95   0.572  59.9%  0.140  75.6%    0.98   -2.9%    0.75   21.6%
professor  41.9%        2.99   0.215  7.2%   0.134  37.8%    3.27   -9.3%    3.16   -5.5%
mirage     84.4%        1.06   0.060  5.6%   0.011  82.0%    1.12   -5.1%    1.09   -2.6%
life       10.6%        3.44   0.050  1.4%   0.051  -2.8%    3.64   -6.0%    3.57   -3.8%
k-eval     31.5%        1.01   0.019  1.9%   0.015  21.0%    1.04   -3.2%    1.04   -2.9%
kb         3.4%         0.80   0.255  31.7%  0.255  -0.3%    0.83   -3.6%    0.85   -5.8%
nucleic    16.9%        0.44   0.230  52.1%  0.147  36.0%    0.43   2.5%     0.41   7.2%
(Sun UltraSparc 400MHz, Solaris 2.7)
sieve      84.3%        4.14   1.464  35.4%  0.740  49.5%    4.39   -6.1%    4.04   2.4%
merge      50.0%        4.47   3.492  78.2%  1.835  47.5%    5.02   -12.4%   3.13   30.0%
qsort      93.9%        15.87  9.073  57.2%  0.901  90.1%    16.71  -5.3%    11.40  28.2%
queens     4.2%         13.23  6.132  46.4%  6.557  -6.9%    14.34  -8.4%    14.20  -7.3%
msort      89.3%        7.47   4.126  55.3%  0.999  75.8%    7.74   -3.7%    5.92   20.7%
professor  41.9%        34.35  1.465  4.3%   0.969  33.8%    36.32  -5.7%    32.71  4.8%
mirage     84.4%        9.79   0.409  4.2%   0.073  82.1%    10.29  -5.1%    9.79   0.1%
life       10.6%        32.56  0.370  1.1%   0.365  1.5%     33.50  -2.9%    32.84  -0.9%
k-eval     31.5%        9.34   0.120  1.3%   0.091  24.5%    9.46   -1.3%    9.29   0.6%
kb         3.4%         5.76   1.420  24.7%  1.509  -6.2%    6.28   -9.1%    6.15   -6.7%
nucleic    16.9%        2.57   1.188  46.3%  0.770  35.2%    2.58   -0.7%    2.34   8.8%
-6.9%~90.5% GC-time reduction
-7.3%~39.1% runtime reduction
in Objective Caml system
High reuse ratio & big GC portion: runtime speedup (highlighted rows: merge, qsort, msort, nucleic)
Low reuse ratio: flags overhead (highlighted rows: queens, kb)
Small GC portion: almost no effect (highlighted rows: professor, mirage, life, k-eval)
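The percentage columns of the GC table follow directly from the measured columns A to E. A small sketch using the qsort row measured on the Pentium4; the record type and function names are illustrative only. Since the printed A to E are rounded, the recomputed percentages can differ slightly from the table.

```ocaml
(* One table row: A = runtime, B = GC time, C = GC time (reuse),
   D = runtime (flags), E = runtime (reuse). *)
type row = { name : string; a : float; b : float; c : float; d : float; e : float }

let pct x = 100. *. x

let gc_portion r = pct (r.b /. r.a)               (* B/A *)
let gc_reduction r = pct ((r.b -. r.c) /. r.b)    (* (B-C)/B *)
let flags_change r = pct ((r.a -. r.d) /. r.a)    (* (A-D)/A, negative = slowdown *)
let reuse_change r = pct ((r.a -. r.e) /. r.a)    (* (A-E)/A, positive = speedup *)

let qsort = { name = "qsort"; a = 2.08; b = 1.312; c = 0.124; d = 2.16; e = 1.26 }

let () =
  Printf.printf
    "%s: GC portion %.1f%%, GC-time reduction %.1f%%, flags %.1f%%, reuse %.1f%%\n"
    qsort.name (gc_portion qsort) (gc_reduction qsort)
    (flags_change qsort) (reuse_change qsort)
```

The flags column isolates the cost of passing and testing the extra flag parameters, so comparing it with the reuse column shows how much of that overhead the saved GC work wins back.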
GC-time & Runtime Changes
much reuse = much GC-time reduction
(plot: GC-time reduction against memory reuse ratio)
much reuse & big GC-time portion = much runtime reduction
(plot: runtime reduction against GC portion x memory reuse ratio)
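The x-axis of the second plot, GC portion x memory reuse ratio, can be tabulated from the Pentium4 rows to see how far it tracks the measured runtime reduction. This is only a sketch: the function name predictor and the choice to rank by it are assumptions made for illustration.

```ocaml
(* (program, reuse ratio %, GC portion B/A %, runtime reduction (A-E)/A %),
   copied from the Intel Pentium4 rows of the GC table *)
let rows = [
  ("sieve", 84.3, 44.2, -0.7); ("merge", 50.0, 76.0, 24.0);
  ("qsort", 93.9, 63.2, 39.1); ("queens", 4.2, 52.2, -4.7);
  ("msort", 89.3, 59.9, 21.6); ("professor", 41.9, 7.2, -5.5);
  ("mirage", 84.4, 5.6, -2.6); ("life", 10.6, 1.4, -3.8);
  ("k-eval", 31.5, 1.9, -2.9); ("kb", 3.4, 31.7, -5.8);
  ("nucleic", 16.9, 52.1, 7.2);
]

(* GC portion x reuse ratio, both given in percent, result in percent. *)
let predictor reuse gc = reuse *. gc /. 100.

(* Rows ordered by decreasing predictor value. *)
let ranked =
  List.sort
    (fun (_, r1, g1, _) (_, r2, g2, _) ->
       compare (predictor r2 g2) (predictor r1 g1))
    rows

let () =
  List.iter
    (fun (name, r, g, dt) ->
       Printf.printf "%-10s predictor %5.1f%%  runtime reduction %5.1f%%\n"
         name (predictor r g) dt)
    ranked
```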
Conclusion
program → transformation → result program → performance
not much sharing → high reuse ratio → memory peak reduction & GC-time speedup
high reuse ratio + big GC-time portion → runtime speedup