Upload
yolanda-terrell
View
33
Download
1
Tags:
Embed Size (px)
DESCRIPTION
On Abstraction Refinement for Program Analyses in Datalog. Xin Zhang , Ravi Mangal , Mayur Naik Georgia Tech. Radu Grigore , Hongseok Yang Oxford University. Datalog for program a nalysis. Datalog. What is Datalog ?. Datalog. What is Datalog ?. Input relations: - PowerPoint PPT Presentation
Citation preview
On Abstraction Refinement for Program Analyses in
DatalogXin Zhang, Ravi Mangal,
Mayur NaikGeorgia Tech
Radu Grigore, Hongseok YangOxford University
2 6/10/2014
Datalog for program analysis
Datalog
Programming Language Design and Implementation, 2014
3 6/10/2014
What is Datalog?
Datalog
Programming Language Design and Implementation, 2014
4 6/10/2014
What is Datalog?
Datalog
Input relations:Output relations:Rules:
Programming Language Design and Implementation, 2014
Least fixpoint computation:
edge(i, j).path(i, j).
(1) path(i, i).(2) path(i, k) :- path(i, j), edge(j, k).
Input: edge(0, 1), edge(1, 2).path(0, 0).path(1, 1).path(2, 2).path(0, 1) :- path(0, 0), edge(0, 1).path(0, 2) :- path(0, 1), edge(1, 2).
5 6/10/2014
Why Datalog?
Programming Language Design and Implementation, 2014
Datalog
If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c:
path(a, c) :- path(a, b), edge(b, c).
6 6/10/2014
Why Datalog?
Programming Language Design and Implementation, 2014
k-object-sensitivity,
k = 2, ~100KLOC
7 6/10/2014
Limitation
Programming Language Design and Implementation, 2014
k-object-sensitivity,
k = 2, ~100KLOC
k-object-sensitivity,
k = 10, ~500KLOC
8 6/10/2014
Program abstraction
Programming Language Design and Implementation, 2014
Precision
Scalability
Abstraction
9 6/10/2014
Parametric program abstraction
Programming Language Design and Implementation, 2014
Precision
Scalability
Abstraction
1 1 1 1 1
10 6/10/2014
Parametric program abstraction
Programming Language Design and Implementation, 2014
Precision
Scalability
Abstraction
1 0 1 1 0
11 6/10/2014
Parametric program abstraction: Example 1
Programming Language Design and Implementation, 2014
1 0 1 1 0
Cloning depth K for each call site and
allocation site
Pointer Analysis
12 6/10/2014
Parametric program abstraction: Example 2
Programming Language Design and Implementation, 2014
1 0 1 1 0
Predicates to use as
abstraction predicates
Shape Analysis
13 6/10/2014
Program abstraction
Programming Language Design and Implementation, 2014
1 0 1 1 0 0 1 1 0 0
Datalog Progra
m
Datalog Progra
m
alias(p, q)?
alias(m, n)?
14 6/10/2014
Program abstraction
Programming Language Design and Implementation, 2014
1 0 1 1 0 0 1 1 0 0
Datalog Progra
m
Datalog Progra
m
alias(p, q)?
alias(m, n)?
Counterexample guided refinement
(CEGAR) via MAXSAT
15 6/10/2014
Pointer analysis example
Programming Language Design and Implementation, 2014
f(){ v1 = new ...; v2 = id1(v1); v3 = id2(v2);q2:assert(v3!= v1);}
id1(v){return v;}
g(){ v4 = new ...; v5 = id1(v4); v6 = id2(v5);q1:assert(v6!= v1);}
id2(v){return v;}
16 6/10/2014
Pointer analysis as graph reachability
Programming Language Design and Implementation, 2014
0
1
2
3
4
5
6’
7
66’’
7’7’’
a1 a0 b0
c1 c0 d0 d1
a1
c1 c0
b0 b1
d1d0
a0
b1
17 6/10/2014
Graph reachability in Datalog
Programming Language Design and Implementation, 2014
Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
0
1
2
3
4
5
6’
7
66’’
7’7’’
a1 a0 b0
c1 c0 d0 d1
a1
c1 c0
b0 b1
d1d0
a0
b1
Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),
…
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
Query Tuple
Original Query
q1: path(0, 5)
assert(v6!= v1)
q2: path(0, 2)
assert(v3!= v1)16 possible abstractions
in total
18 6/10/2014
Desired result
Programming Language Design and Implementation, 2014
0
1
2
3
4
5
6’
7
6
7’
6’’
7’’
a1 b0
c1 d0
a1
c1
b0
d0
a0
c0 d1
c0
b1
d1
a0
b1
Input tuples:edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),
…
Query Answer
q1: path(0, 5)
a1b0c1d0
q2: path(0, 2)
Impossibility
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules: (1) path(i, i).(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
19 6/10/2014
Iteration 1
Programming Language Design and Implementation, 2014
0
1
2
3
4
5
7
66’
7’
6’’
7’’
b0
d0
b0
d0
a0
c0
c0
a0
a1
c1
a1
c1
d1
b1
d1
b1
Query Eliminated Abstractions
q1: path(0, 5)
q2: path(0, 2)
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
path(0, 0).path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).…
20 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
0
1
2
3
4
5
7
66’
7’
6’’
7’’
b0
d0
b0
d0
a0
c0
c0
a0
a1
c1
a1
c1
d1
b1
d1
b1
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
Query Eliminated Abstractions
q1: path(0, 5)
q2: path(0, 2)
21 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
abs(d0)
path(0,6)edge(6,1,a0) edge(6,4,b0)
path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)
path(0,7)edge(7,2,c0) edge(7,5,d0)
path(0,2) path(0,5)
abs(a0)edge(0,6,a0)path(0,0)
abs(a0) abs(b0)
abs(c0) abs(d0)
22 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
abs(d0)
path(0,6)edge(6,1,a0) edge(6,4,b0)
path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)
path(0,7)edge(7,2,c0) edge(7,5,d0)
path(0,2) path(0,5)
abs(a0)edge(0,6,a0)path(0,0)
abs(a0) abs(b0)
abs(c0) abs(d0)
a0c0
23 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
abs(d0)
path(0,6)edge(6,1,a0) edge(6,4,b0)
path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)
path(0,7)edge(7,2,c0) edge(7,5,d0)
path(0,2) path(0,5)
abs(a0)edge(0,6,a0)path(0,0)
abs(a0) abs(b0)
abs(c0) abs(d0)
a0c0
24 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
abs(d0)
path(0,6)edge(6,1,a0) edge(6,4,b0)
path(0,1) path(0,4)abs(c0)edge(1,7,c0) edge(4,7,d0)
path(0,7)edge(7,2,c0) edge(7,5,d0)
path(0,2) path(0,5)
abs(a0)edge(0,6,a0)path(0,0)
abs(a0) abs(b0)
abs(c0) abs(d0)
a0b0d0
25 6/10/2014
Iteration 1 - derivation graph
Programming Language Design and Implementation, 2014
0
1
2
3
4
5
7
66’
7’
6’’
7’’
b0
d0
b0
d0
a0
c0
c0
a0
a1
c1
a1
c1
d1
b1
d1
b1
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
Query Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0 (4/16)
q2: path(0, 2)
a0c0 (4/16)
26 6/10/2014
Encoded as MAXSAT
Programming Language Design and Implementation, 2014
MAXSAT(Find thatMaximize Subject to
Hard Constraints
Soft Constraints
27 6/10/2014
Encoded as MAXSAT
Programming Language Design and Implementation, 2014
abs(a0)abs(a1), abs(b0)abs(b1),abs(c0)abs(c1), abs(d0)abs(d1).
Hard constraints:
…
Soft constraints:
Avoid all the counterexam
ples
Minimize the abstraction
cost
28 6/10/2014
Encoded as MAXSAT
Programming Language Design and Implementation, 2014
Hard constraints:
…
Soft constraints:
Query Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0 (4/16)
q2: path(0, 2)
a0c0 (4/16)
Solution:
a1c0d0
29 6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c0d0
Iteration 1 Derivation
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)
a0c0, a1c0 (4/16)
Constraint s
𝑪𝟏 𝑪𝟏∧
30 6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c0d0
Iteration 2 Derivation
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)
a0c0, a1c0 (4/16)
Constraint s
𝑪𝟏 𝑪𝟏∧
31 6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c0d0
Iteration 2 Derivation
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1d0 (4/16)
q2: path(0, 2)
a0c0
(4/16)
Constraint s
𝑪𝟏
32 6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c1d0
Iteration 2 Derivation
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1c0d0
(6/16)
q2: path(0, 2)
a0c0, a1c0 (8/16)
Constraint s
𝑪𝟏
33
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1c0d0 (6/16)
q2: path(0, 2)
a0c0, a1c0 (8/16) 6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c1d0
Iteration 3
q1 is proven.
a1c1d0
Derivation
Constraint s
𝑪𝟏
34
Constraint s
𝑪𝟏
6/10/2014
Iteration 2 and beyond
Programming Language Design and Implementation, 2014
MAXSAT solver
Datalog
solver
a1c1d0
Query Answer Eliminated Abstractions
q1: path(0, 5)
a0c0d0, a0b0d0, a1c0d0 (6/16)
q2: path(0, 2)
a0c0, a1c0, a1c1, a0c1 (16/16)
Iteration 3
q2 is impossible to prove.
Impossibility
Derivation
q1 is proven.
a1c1d0
35 6/10/2014
Mixing counterexamples
Programming Language Design and Implementation, 2014
Iteration 1 Iteration 3
a0c0 a1c1Eliminated Abstractio
ns:
36 6/10/2014
Mixing counterexamples
Programming Language Design and Implementation, 2014
Iteration 1 Iteration 3
a0c0 a1c1a0c1
Mixed!
Eliminated Abstractio
ns:
37
Implemented in JChord using off-the-shelf solvers: Datalog: bddbddb MAXSAT: MiFuMaX
Applied to two analyses that are challenging to scale: k-object-sensitivity pointer analysis:
flow-insensitive, weak updates, cloning-based typestate analysis:
flow-sensitive, strong updates, summary-based
Evaluated on 8 Java programs from DaCapo and Ashes.
6/10/2014
Experimental setup
Programming Language Design and Implementation, 2014
38 6/10/2014
Benchmark characteristics
Programming Language Design and Implementation, 2014
classes methods bytecode(KB)
KLOC
toba-s 1K 6K 423 258
javasrc-p 1K 6.5K 434 265
weblech 1.2K 8K 504 326
hedc 1K 7K 442 283
antlr 1.1K 7.7K 532 303
luindex 1.3K 7.9K 508 295
lusearch 1.2K 8K 511 314
schroeder-m
1.9k 12K 708 460
39 6/10/2014
Results: pointer analysis
Programming Language Design and Implementation, 2014
queries abstraction size
iterationstotal
resolved
current
baseline
final max
toba-s 7 7 0 170 18K 10
javasrc-p 46 46 0 470 18K 13
weblech 5 5 2 140 31K 10
hedc 47 47 6 730 29K 18
antlr 143 143 5 970 29K 15
luindex 138 138 67 1K 40K 26
lusearch 322 322 29 1K 39K 17
schroeder-m
51 51 25 450 58K 15
4-object-sensitivity
< 50%
< 3% of max
40 6/10/2014
Performance of Datalog: pointer analysis
Programming Language Design and Implementation, 2014
lusearchk = 1, 153s
k = 2, 214s
k = 4, 3h28m
k = 3, 590s
Baseline
41 6/10/2014
Performance of MAXSAT: pointer analysis
Programming Language Design and Implementation, 2014
lusearch
42 6/10/2014
Statistics of MAXSAT formulae
Programming Language Design and Implementation, 2014
pointer analysis
variables
clauses
toba-s 0.7M 1.5M
javasrc-p 0.5M 0.9M
weblech 1.6M 3.3M
hedc 1.2M 2.7M
antlr 3.6M 6.9M
luindex 2.4M 5.6M
lusearch 2.1M 5M
schroeder-m
6.7M 23.7M
43 6/10/2014
Conclusion
Programming Language Design and Implementation, 2014
Abstraction
Datalog MAXSAT
44 6/10/2014
Conclusion
Programming Language Design and Implementation, 2014
Datalog
Soundness
Tradeoffs
MAXSAT
Hard Constraints
Soft Constraints
Scalability vs. Precision
Sound vs. Complete
…
A(x, y):- B(x, z), C(z, y)
𝐴 (𝑥 , 𝑦 )∨¬𝐵 (𝑥 , 𝑧 )∨¬𝐶 (𝑧 , 𝑦 )
1 0 1 1 0
𝑎𝑏𝑠 (𝑎0 ) 𝐰𝐞𝐢𝐠𝐡𝐭𝟏