37
Efficient Reflection String Analysis via Graph Coloring Neville Grech George Kastrinis Yannis Smaragdakis

Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Efficient Reflection String Analysis via Graph Coloring

Neville GrechGeorge KastrinisYannis Smaragdakis

Page 2: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Strings important for static analysis?s1 = “script error in file {0} : {1}”

s2 = “count”

s3 = “Usage: {0} [options] [arguments...]\n\nwhere...”

s4 = “Manager”

Page 3: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Enter reflections1 = “script error in file {0} : {1}”

s2 = “count”

s3 = “Usage: {0} [options] [arguments...]\n\nwhere...”

s4 = “Manager”

Class c = Class.forName(s4)

Method m = c.getMethod(s2 + “Sales”)

m.invoke(...)

Page 4: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Reflection - The backbone of dynamic features● e.g. Dynamic Proxy Pattern in Java (~ 21% of open source programs)

● ignoring reflection ⇒ top causes of unsoundness

● highly controlled through string values (member selectors)

Page 5: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Naive string analysis is expensive● Doop & DaCapo-Bach avrora (context-insensitive):

2.9M Strings VS 2M regular objects in Var-Points-To

● IBM WALA & DaCapo-2006 antlr (0-1-CFA):

6.7M VS 1.7M

dominated by string values

Page 6: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated
Page 7: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

compress Stringconstants

Retain member selection ability

Page 8: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

sneak peek

Page 9: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated
Page 10: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

size reduction for points-to sets (with string values)over very aggressive string interning techniques ~2x

Page 11: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

size reduction for points-to sets (with string values)over very aggressive string interning techniques

size reduction for computed sets

~2x

~1.5x

Page 12: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

size reduction for points-to sets (with string values)over very aggressive string interning techniques

size reduction for computed sets

speedup~1.5x

~2x

~20%

Page 13: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

size reduction for points-to sets (with string values)over very aggressive string interning techniques

size reduction for computed sets

speedup

transparent approach - no pitfalls!!

~1.5x~2x

~20%

Page 14: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

The idea - Color a conflict graph● string constants as nodes

● edge iff two nodes match* distinct members in same class

● fast graph coloring (?)

● nodes with the same color can be merged

Page 15: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

“bar”

“fro”

“gand”

“foo”

“baz”

“frotz”

“gru”

“bara”

“zork”“alf”

Page 16: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 17: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 18: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 19: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 20: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Suboptimal is good enough● minimum #colors required already too large ● several thousands ⇒ few hundreds already beneficial

● benefit not proportional to the reduction

near-linear-time greedy algorithm

(strings) (colors)

Page 21: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 22: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“foo”

“frotz”

“gru”

“gand” “bara”

“bar”

“baz”

“zork”

“alf”

class A { int foo; void bar() {..} void baz() {..}}

class B { int frotz; int grue; String zork() {..}}

class C { int frodo; void gandalf() {..} void barahir() {..}}

Page 23: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“foo” “gru”

“gand”

“baz”

“zork”

“alf”“fro”

“frotz”

“bara”“bar”

Page 24: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

“fro”

“frotz”

“bara”“bar”

“foo” “gru”

“gand”

“baz”

“zork”

“alf”

Page 25: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

under the hood

Page 26: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

BeforeString a = “zork”;

Class cls = unknown() ? A.getClass() : B.getClass();

Method m = cls.getMethod(a); B:zo ()

Page 27: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

foo gru

gand

baz

zork

alf

after - unwanted imprecisionString a = ;

Class cls = unknown() ? A.getClass() : B.getClass();

Method m = cls.getMethod(a); B:zo ()A:ba ()

Page 28: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

foo gru

gand

baz

zork

alf

backward analysisString a = ;

Class cls = unknown() ? A.getClass() : B.getClass();

Method m = cls.getMethod(a);

String s = (String) m.invoke();

B:zo ()A:ba ()

vo A:ba ()

Page 29: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated
Page 30: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Analysis speedup~20% ~14%

Page 31: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

Memory reduction (Var-Points-To)~1.5x ~1.16x

Page 32: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

String Var-Points-To reduction~1.97x

Page 33: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

precision & soundness ε < 0.2%

effectiveness ~1sec

compression ratio ~6.5x

in most cases zero

Page 34: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

conclusion

● string values important in analyzing reflection

Page 35: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

conclusion

● string values important in analyzing reflection

● they would dominate a naive analysis

Page 36: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated

conclusion

● string values important in analyzing reflection

● they would dominate a naive analysis

● compress while retaining member selection ability(with no practical drawbacks)

Page 37: Efficient Reflection String Analysis via Graph Coloring · 2019-03-28 · Analysis via Graph Coloring Neville Grech ... IBM WALA & DaCapo-2006 antlr (0-1-CFA): 6.7M VS 1.7M dominated