Random Interpretation Sumit Gulwani UC-Berkeley


Random Interpretation

Sumit Gulwani

UC-Berkeley


Program Analysis

Applications in all aspects of software development, e.g.

• Program correctness
• Compiler optimizations
• Translation validation

Parameters

• Completeness (precision, no false positives)
• Computational complexity
• Ease of implementation
• What if we allow probabilistic soundness?
  – We obtain a new class of analyses: random interpretation


Random Interpretation

= Random Testing + Abstract Interpretation

• Almost as simple as random testing but better soundness guarantees.

• Almost as sound as abstract interpretation but more precise, efficient, and simple.


Example 1

  if (*) then { a := 0; b := i; }
         else { a := i - 2; b := 2; }
  if (*) then { c := b - a; d := i - 2b; }
         else { c := 2a + b; d := b - 2i; }
  assert(c + d = 0); assert(c = a + i)

(* marks a non-deterministic conditional.)

• Random testing needs to execute all 4 paths to verify assertions.

• Abstract interpretation analyzes statements once but uses complicated operations.

• Random interpretation executes program once, but in a way that captures effect of all paths.


Outline

Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications


Problem: Linear relationships in linear programs

• Does not mean inapplicability to “real” programs
  – “abstract” other program statements as non-deterministic assignments (standard practice in program analysis)

• Linear relationships are useful for
  – Program correctness
    • Buffer overflows
  – Compiler optimizations
    • Constant propagation, copy propagation, common subexpression elimination, induction variable elimination.


Basic idea in random interpretation

Generic algorithm:

• Choose random values for input variables.

• Execute both branches of a conditional.

• Combine the values of variables at join points.

• Test the assertion.


Idea #1: The Affine Join operation

• Affine join of v1 and v2 w.r.t. weight w:

  $\phi_w(v_1, v_2) \equiv w\,v_1 + (1-w)\,v_2$

• Affine join preserves common linear relationships (e.g. a + b = 5).

• It does not introduce false relationships w.h.p.

• Unfortunately, non-linear relationships are not preserved (e.g. a × (1 + b) = 8).

Example with w = 7:

  State on one branch:   a = 2, b = 3
  State on other branch: a = 4, b = 1
  State after the join:  a = $\phi_7(2, 4)$ = -10, b = $\phi_7(3, 1)$ = 15
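The numbers on this slide can be checked with a short Python sketch. The function name `affine_join` and the choice of a dictionary to represent a program state are illustrative assumptions, not something the slides prescribe.

```python
def affine_join(w, s1, s2):
    """Combine two program states variable by variable: w*v1 + (1-w)*v2."""
    return {var: w * s1[var] + (1 - w) * s2[var] for var in s1}

# The two states from the slide, joined with weight w = 7.
left = {"a": 2, "b": 3}
right = {"a": 4, "b": 1}
joined = affine_join(7, left, right)

print(joined)                                # {'a': -10, 'b': 15}
print(joined["a"] + joined["b"] == 5)        # True: a + b = 5 is preserved
print(joined["a"] * (1 + joined["b"]) == 8)  # False: a * (1 + b) = 8 is lost
```

Both input states satisfy a + b = 5 and a × (1 + b) = 8, but only the linear relationship survives the join.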


Geometric Interpretation of Affine Join

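[Figure: plot in the (a, b) plane showing the lines a + b = 5 and b = 2 and the points (a = 2, b = 3) and (a = 4, b = 1). Legend: state before the join; state after the join.]

• The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).

• Given any relationship that is not satisfied by any of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.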

Example 1 (random choices: i = 3, w1 = 5, w2 = 2)

  if (*) then { a := 0; b := i; }                 state: i=3, a=0, b=3
         else { a := i - 2; b := 2; }             state: i=3, a=1, b=2
  join with weight w1 = 5                         state: i=3, a=-4, b=7
  if (*) then { c := b - a; d := i - 2b; }        state: i=3, a=-4, b=7, c=11, d=-11
         else { c := 2a + b; d := b - 2i; }       state: i=3, a=-4, b=7, c=-1, d=1
  join with weight w2 = 2                         state: i=3, a=-4, b=7, c=23, d=-23
  assert (c + d = 0);                             holds: 23 + (-23) = 0
  assert (c = a + i)                              fails: 23 ≠ -4 + 3

• Choose a random weight for each join independently.

• All choices of random weights verify first assertion

• Almost all choices contradict second assertion
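The run above can be reproduced with a small Python sketch. It hard-codes the Example 1 program; the names `affine_join` and `interpret_example1` and the dictionary state representation are assumptions made for illustration only.

```python
import random

def affine_join(w, s1, s2):
    """Combine two program states variable by variable: w*v1 + (1-w)*v2."""
    return {var: w * s1[var] + (1 - w) * s2[var] for var in s1}

def interpret_example1(i, w1, w2):
    """Run Example 1 once, executing both branches of each conditional."""
    # First conditional: run both branches, then join with weight w1.
    t = {"i": i, "a": 0, "b": i}
    f = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, t, f)
    # Second conditional: run both branches on the joined state, join with w2.
    t = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    f = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    s = affine_join(w2, t, f)
    # Test both assertions on the single resulting state.
    return s, s["c"] + s["d"] == 0, s["c"] == s["a"] + s["i"]

# The slide's choices: i = 3, w1 = 5, w2 = 2.
state, first_ok, second_ok = interpret_example1(i=3, w1=5, w2=2)
print(state)                 # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(first_ok, second_ok)   # True False

# Fresh random choices: the first assertion holds for every choice.
rng = random.Random()
i, w1, w2 = (rng.randrange(1, 2**31) for _ in range(3))
print(interpret_example1(i, w1, w2)[1:])
```

This sketch uses unbounded Python integers for simplicity; the error analysis in the slides is stated for values drawn from a finite field of size d.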


Example 2

  a := x + y;
  if (x = y) then { b := a; }
             else { b := 2x; }
  assert (b = 2x)

We need to make use of the conditional x = y on the true branch to prove the assertion.


Idea #2: The Adjust Operation

• Execute multiple runs of the program in parallel.

• Sample = Collection of states at a program point

• Combine states in the sample before a conditional s.t.
  – The equality conditional is satisfied.
  – Original relationships are preserved.

• Use adjusted sample on the true branch.


Geometric Interpretation of Adjust

• Program states = points
• Adjust = projection onto the hyperplane
• S’ satisfies e = 0 and all relationships satisfied by S

[Figure: Algorithm to obtain S’ = Adjust(S, e = 0). The figure shows the sample points S1, S2, S3, S4, the hyperplane e = 0, and the adjusted points S’1, S’2, S’3.]
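One way to realize Adjust with affine joins is sketched below in Python, applied to Example 2. It rests on assumptions that are not spelled out on the slides: one state of the sample serves as a pivot, every other state is joined with it using the weight that forces e = 0, and exact rationals (`fractions.Fraction`) stand in for finite-field arithmetic. The names `adjust` and `affine_join` are illustrative.

```python
from fractions import Fraction

def affine_join(w, s1, s2):
    """Combine two program states variable by variable: w*v1 + (1-w)*v2."""
    return {var: w * s1[var] + (1 - w) * s2[var] for var in s1}

def adjust(sample, e):
    """Return k-1 states satisfying e = 0, built from a sample of k states.

    `e` maps a state to the value of an affine expression. The last state is
    used as the pivot (an assumption of this sketch).
    """
    *others, pivot = sample
    e_pivot = Fraction(e(pivot))
    adjusted = []
    for state in others:
        e_state = Fraction(e(state))
        if e_state == e_pivot:
            # No weight can move this pair onto e = 0; unlikely for random samples.
            raise ValueError("degenerate sample; draw new random values")
        # e is affine, so e(join) = w*e_state + (1-w)*e_pivot; solve for zero.
        w = e_pivot / (e_pivot - e_state)
        adjusted.append(affine_join(w, state, pivot))
    return adjusted

# Example 2: three runs with arbitrary x, y, each after executing a := x + y.
sample = [
    {"x": 1, "y": 4, "a": 5},
    {"x": 3, "y": -2, "a": 1},
    {"x": 7, "y": 1, "a": 8},
]
true_branch = adjust(sample, lambda s: s["x"] - s["y"])
for s in true_branch:
    s["b"] = s["a"]                  # true branch: b := a
    print(s["b"] == 2 * s["x"])      # True: assertion b = 2x holds
```

Because each adjusted state is an affine combination of original states, it still satisfies a = x + y, and it now also satisfies x = y, which is what the true branch needs.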


Correctness of Random Interpreter R

• Completeness: If e1 = e2, then R ⇒ e1 = e2
  – assuming non-deterministic conditionals

• Soundness: If e1 ≠ e2, then R ⇏ e1 = e2
  – error probability ≤ [bound missing in the source], where
    • b: number of branches
    • j: number of joins
    • d: size of the field
    • k: number of points in the sample
  – If j = b = 10, k = 15, d ≈ 2^32, then error ≤ [value missing in the source]


Outline

Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications


Problem: Global value numbering

• Goal: Detect expression equivalence in programs that have been abstracted using “uninterpreted functions”

• Axiom of the theory of uninterpreted functions: If x = y, then F(x) = F(y)

• Applications
  – Compiler optimizations
  – Translation validation

Example

  if (*) then { x := a; y := a; z := F(a); }
         else { x := b; y := b; z := F(b); }
  assert(x = y);
  assert(z = F(y));

After the join:

  x = $\phi$(a, b)
  y = $\phi$(a, b)
  z = $\phi$(F(a), F(b))
  F(y) = F($\phi$(a, b))

• Typical algorithms treat $\phi$ as uninterpreted
  – Hence cannot verify the second assertion

• The randomized algorithm interprets $\phi$ as the affine join operation $\phi_w$


How to “execute” uninterpreted functions

e := y | F(e1, e2)

• Choose a random interpretation for F

• Non-linear interpretation
  – E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
  – Preserves all equivalences in straight-line code
  – But not across join points

• Let's try linear interpretation
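The difference between the two interpretations at a join point can be seen in a small Python sketch. The values r1 = 3, r2 = 5, and w = 7 are fixed here only to make the run reproducible (the scheme on the slide picks them at random), and the helper `commutes_with_join` is a hypothetical name.

```python
def affine_join(w, v1, v2):
    """Affine join of two values with weight w."""
    return w * v1 + (1 - w) * v2

# Fixed for reproducibility; chosen randomly in the scheme described above.
r1, r2, w = 3, 5, 7

def f_nonlinear(e1, e2):
    return r1 * e1**2 + r2 * e2**2

def f_linear(e1, e2):
    return r1 * e1 + r2 * e2

def commutes_with_join(f, left_args, right_args):
    """Does joining the results of F equal applying F to the joined arguments?"""
    join_of_results = affine_join(w, f(*left_args), f(*right_args))
    joined_args = [affine_join(w, l, r) for l, r in zip(left_args, right_args)]
    return join_of_results == f(*joined_args)

left, right = (2, 3), (4, 1)   # argument values on the two branches
print(commutes_with_join(f_linear, left, right))     # True
print(commutes_with_join(f_nonlinear, left, right))  # False
```

The linear encoding commutes with the affine join for every choice of values, which is why an equivalence such as z = F(y) survives the join; the non-linear encoding does not.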


Random Linear Interpretation

• Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$

• Preserves all equivalences across a join point

• Introduces false equivalences in straight-line code. E.g. e and e’ have the same encodings even though e ≠ e’:

  e  = F(F(a, b), F(c, d))
  e’ = F(F(a, c), F(b, d))

  Encodings:

  $e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$

  $e' = r_1(r_1 a + r_2 c) + r_2(r_1 b + r_2 d) = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$

• Problem: Scalar multiplication is commutative.

• Solution: Evaluate expressions to vectors and choose r1 and r2 to be random matrices.
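The false equivalence, and the way matrices avoid it, can be illustrated with a simplified Python sketch. It is not the exact encoding from the POPL 2004 work: the vector length (2), the tuple representation of terms, and the fixed sample values are all assumptions chosen so the arithmetic can be followed by hand.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]

def vec_add(u, v):
    return [x + y for x, y in zip(u, v)]

# A term is a variable name, or a pair (left, right) standing for F(left, right).
def encode_scalar(term, env, r1, r2):
    if isinstance(term, str):
        return env[term]
    left, right = term
    return r1 * encode_scalar(left, env, r1, r2) + r2 * encode_scalar(right, env, r1, r2)

def encode_vector(term, env, m1, m2):
    if isinstance(term, str):
        return env[term]
    left, right = term
    return vec_add(mat_vec(m1, encode_vector(left, env, m1, m2)),
                   mat_vec(m2, encode_vector(right, env, m1, m2)))

e = (("a", "b"), ("c", "d"))        # F(F(a, b), F(c, d))
e_prime = (("a", "c"), ("b", "d"))  # F(F(a, c), F(b, d))

# Scalar encoding: r1*r2 == r2*r1, so the two distinct terms collide.
scalars = {"a": 1, "b": 2, "c": 3, "d": 4}
print(encode_scalar(e, scalars, 5, 7), encode_scalar(e_prime, scalars, 5, 7))  # 396 396

# Vector encoding with non-commuting matrices tells them apart.
m1 = [[1, 1], [0, 1]]
m2 = [[1, 0], [1, 1]]
vectors = {"a": [0, 0], "b": [1, 0], "c": [0, 1], "d": [0, 0]}
print(encode_vector(e, vectors, m1, m2), encode_vector(e_prime, vectors, m1, m2))  # [3, 3] [2, 2]
```

With randomly chosen matrices, r1·r2 and r2·r1 differ with high probability, so the coefficients of b and c are no longer interchangeable.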


Outline

Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications


Example

  if (*) then { a := 0; b := i; }
         else { a := i - 2; b := 2; }
  if (*) then { c := b - a; d := i - 2b; }
         else { c := 2a + b; d := b - 2i; }
  assert (c + d = 0); assert (c = a + i)

• The second assertion is true in the context i = 2.

• Interprocedural analysis requires computing procedure summaries.

Idea #1: Keep input variables symbolic

• Do not choose random values for input variables (to later instantiate by any context).

• Resulting program state at the end is a random procedure summary.

  if (*) then { a := 0; b := i; }                 state: a=0, b=i
         else { a := i - 2; b := 2; }             state: a=i-2, b=2
  join with weight w1 = 5                         state: a=8-4i, b=5i-8
  if (*) then { c := b - a; d := i - 2b; }        state: a=8-4i, b=5i-8, c=9i-16, d=16-9i
         else { c := 2a + b; d := b - 2i; }       state: a=8-4i, b=5i-8, c=8-3i, d=3i-8
  join with weight w2 = 2                         state: a=8-4i, b=5i-8, c=21i-40, d=40-21i
  assert (c + d = 0); assert (c = a + i)

Instantiating the summary in the context i = 2 gives a=0, b=2, c=2, d=-2.
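The symbolic run can be sketched in Python by representing every value as an affine expression c0 + c1·i, stored as the pair (c0, c1). The representation and the names `summarize` and `instantiate` are assumptions for illustration; the program and the weights w1 = 5, w2 = 2 are the ones on the slide.

```python
# An affine expression c0 + c1*i in the input variable i is the pair (c0, c1).
def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(k, p):
    return (k * p[0], k * p[1])

def sub(p, q):
    return add(p, scale(-1, q))

def join(w, p, q):
    """Affine join applied to symbolic values."""
    return add(scale(w, p), scale(1 - w, q))

I = (0, 1)  # the input variable i itself

def summarize(w1, w2):
    """Symbolic state at the end of the example procedure (a random summary)."""
    a = join(w1, (0, 0), sub(I, (2, 0)))            # a := 0      | a := i - 2
    b = join(w1, I, (2, 0))                         # b := i      | b := 2
    c = join(w2, sub(b, a), add(scale(2, a), b))    # c := b - a  | c := 2a + b
    d = join(w2, sub(I, scale(2, b)), sub(b, scale(2, I)))  # d := i - 2b | d := b - 2i
    return {"a": a, "b": b, "c": c, "d": d}

def instantiate(summary, i):
    """Plug a concrete calling context into the summary."""
    return {var: c0 + c1 * i for var, (c0, c1) in summary.items()}

summary = summarize(5, 2)
print(summary)                  # {'a': (8, -4), 'b': (-8, 5), 'c': (-40, 21), 'd': (40, -21)}
print(instantiate(summary, 2))  # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
```

In this summary c + d is identically zero, so the first assertion holds in every context, while c = a + i reduces to 24i - 48 = 0, which holds only for i = 2.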


Idea #2: Generate fresh summaries

Procedure P
Input: i

  if (*) then { x := i + 1; }       state: x = i+1
         else { x := 3; }           state: x = 3
  join with weight 5                state: x = 5i-7
  return x;

Procedure Q

  u := P(2); v := P(1); w := P(1);
  Assert (u = 3); Assert (v = w);

  With the summary x = 5i-7 plugged into every call:
  u = 5·2 - 7 = 3
  v = 5·1 - 7 = -2
  w = 5·1 - 7 = -2

• Plugging the same summary twice is unsound.

• Fresh summaries can be generated by random affine combination of few independent summaries!
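The unsoundness and its fix can be seen in a short Python sketch. The pair representation (c0, c1) for the summary x = c0 + c1·i, the second join weight 11, and the combination weights 4, 2, and 3 are all illustrative assumptions; in the scheme on the slide these would be chosen at random.

```python
def summary_P(w):
    """Summary of P for join weight w: x = w*(i+1) + (1-w)*3 = (3 - 2w) + w*i."""
    return (3 - 2 * w, w)

def call(summary, i):
    """Evaluate a summary in the calling context i."""
    c0, c1 = summary
    return c0 + c1 * i

def fresh_summary(s1, s2, alpha):
    """Affine combination of two independent summaries."""
    return tuple(alpha * p + (1 - alpha) * q for p, q in zip(s1, s2))

s1 = summary_P(5)    # x = 5i - 7, as on the slide
s2 = summary_P(11)   # a second, independently generated summary

# Reusing one summary for both P(1) calls makes v = w look true.
print(call(s1, 1), call(s1, 1))   # -2 -2

# A fresh summary per call site removes the spurious equality.
f_u, f_v, f_w = (fresh_summary(s1, s2, alpha) for alpha in (4, 2, 3))
u_val, v_val, w_val = call(f_u, 2), call(f_v, 1), call(f_w, 1)
print(u_val)          # 3: P(2) returns 3 on both paths, so u = 3 still holds
print(v_val, w_val)   # 4 10: v = w is correctly not verified
```

An affine combination of two summaries of P is itself a summary of P (for the join weight alpha·5 + (1 - alpha)·11), so facts that hold on all paths, such as P(2) = 3, are kept.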


Experiments

[Figure: two bar charts comparing the Randomized and Deterministic algorithms on the benchmarks go (29K), ijpeg (28K), li (23K), and gzip (8K). First chart: # of input constants (axis 0 to 75). Second chart: time in s (axis 0 to 50).]

• Randomized algorithm discovers 10-70% more facts.

• Randomized algorithm is slower by a factor of 2.


Experimental measure of error

The % of incorrect relationships decreases with increase in

• S = size of set from which random values are chosen.
• N = # of random summaries used.

% of incorrect relationships:

          S = 10^3    S = 10^5    S = 10^8
  N = 2     95.5        95.5        95.5
  N = 3     64.3         3.2         0
  N = 4      0.2         0           0
  N = 5      0           0           0
  N = 6      0           0           0

The experimental results are better than what is predicted by theory.


Outline

Random Interpretation

– Linear arithmetic (POPL 2003)

– Uninterpreted functions (POPL 2004)

– Inter-procedural analysis (POPL 2005)

– Other applications


Other applications of random interpretation

• Model Checking
  – Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)

• Theorem Proving
  – Randomized decision procedure for linear arithmetic and uninterpreted functions. This runs an order of magnitude faster than the deterministic algorithm. (CADE 03)

• Ideas for deterministic algorithms
  – PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)


Future Work and Limitations

Future Work

• Random interpreters for other theories
  – E.g. data-structures

• Combining random interpreters
  – E.g. random interpreter for the combined theory of linear arithmetic and uninterpreted functions.

Limitations

• Does not discover “never equal” information
  – Only detects “always equal” information

Summary

                         Key Idea                  Complexity                 Complexity
                                                   (random interpretation)    (abstract interpretation)
  Linear Arithmetic      Affine Join               O(n^2)                     O(n^4)
  Uninterpreted Fns.     Vectors                   O(n^3)                     O(n^4)
  Interproc. Analysis    Symbolic i/p variables    Poly blowup                ?

Lessons Learned

• Randomization buys efficiency, simplicity at cost of prob. soundness.

• Randomization suggests ideas for deterministic algorithms.

• Combining randomized techniques with symbolic is powerful.