Verifying Dereference Safety via Expanding-Scope Analysis

Alexey Loginov (GrammaTech, Inc.)

Joint work with: E. Yahav, S. Chandra, S. Fink (IBM TJ Watson); N. Rinetzky (Tel-Aviv University); M.G. Nanda (IBM IRL)

Why Null-Dereference Analysis?

Common problem …or symptom of other problems
  › Null-dereference warning may help in identifying root cause
Relevant to all software
Specification is obvious (absence of NPE)
  › Requires no user interaction

Why Sound Null-Dereference Analysis?

Safety guarantees are important in some domains
Results can become an in-code specification, e.g., via JSR 305
  › Annotations can help with code understanding
  › Annotations can simplify future analyses (e.g., after modifications)
Precise and efficient sound analysis is challenging
  › Lessons carry over to other static analyses

Example (answers expected)

     1. class A {
     2.   final A a = new A();
     3.   static main() {
     4.     B b = new B();
     5.     initB(b);
     6.     a.foo(b);       // okay
     7.   }
     8.   foo(B b) {
     9.     b.f.fun();      // okay
    10.     b.f.f.gun();    // null-deref.
    11.   }
    12.   static initB(B b) {
    13.     b.f = new F();  // okay
    14.     b.f.f = null;   // okay
    15.   }
    16. }

Interprocedural information is needed often
  – Allocations in callers (e.g., new B()) common
  – Allocations in callees (e.g., new F()) common

Common approaches

Most existing tools perform intraprocedural analysis
Have to make assumptions about callers/callees
Option 1: pessimistic assumptions about callers/callees
  › Result: a sea of false alarms

Results of pessimistic intraproc. analysis

     1. class A {
     2.   final A a = new A();
     3.   static main() {
     4.     B b = new B();
     5.     initB(b);
     6.     a.foo(b);       // null deref.
     7.   }
     8.   foo(B b) {
     9.     b.f.fun();      // two null derefs.
    10.     b.f.f.gun();    // null deref.
    11.   }
    12.   static initB(B b) {
    13.     b.f = new F();  // null deref.
    14.     b.f.f = null;   // okay
    15.   }
    16. }

Reports four false alarms
  – Only real error is on line 10

Common approaches

Most existing tools perform intraprocedural analysis
Have to make assumptions about callers/callees
Option 2: optimistic assumptions about callers/callees
  › Result: missing real errors (catching the most glaring ones)

Results of optimistic intraproc. analysis

     1. class A {
     2.   final A a = new A();
     3.   static main() {
     4.     B b = new B();
     5.     initB(b);
     6.     a.foo(b);       // okay
     7.   }
     8.   foo(B b) {
     9.     b.f.fun();      // okay
    10.     b.f.f.gun();    // okay
    11.   }
    12.   static initB(B b) {
    13.     b.f = new F();  // okay
    14.     b.f.f = null;   // okay
    15.   }
    16. }

Misses the real error on line 10

Common approaches

Most existing tools perform intraprocedural analysis
Have to make assumptions about callers/callees
Option 3: mostly optimistic assumptions
  › Detects inconsistencies in programmer’s beliefs
      • Test x == null: belief that x could be null before test
      • Dereference of x without a test: belief that x cannot be null
  › Allow analysis to dismiss assumptions contradicted by beliefs
  › Result: missing real errors, reporting safe dereferences as unsafe
      • Generally, few false alarms but many missed errors
      • Same result as option 2 (optimistic assumptions) in our example
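The two kinds of belief listed above can be seen side by side in a small example. This is only an illustration: the class `Beliefs` and the method `lengthOf` are hypothetical and do not come from the talk.

```java
/** Hypothetical method whose body expresses two contradictory beliefs about s. */
final class Beliefs {
    static int lengthOf(String s) {
        // Dereference without a test: belief that s cannot be null.
        int n = s.length();
        // Test against null: belief that s could be null before the test.
        if (s == null) {
            return -1;
        }
        return n;
    }
}
```

A belief-based checker would treat the late test as evidence that the earlier dereference may be unsafe. It has nothing to go on when the code contains no such contradiction, which is why this style of analysis can miss real errors.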


Prospects for interprocedural analysis

Whole-program analysis cannot scale to large software
  › Majority of instructions are relevant to null-dereference analysis
      • Can’t prune down program to a small relevant subset
Need mechanism to break down a program’s complexity

Expanding-Scope Analysis

Holy Grail
  › Cost: INTRAprocedural analysis
  › Precision: INTERprocedural (whole-program) analysis
Staged approach
  › Analyze dereferences with limited interprocedural context
  › Verify dereferences with the least amount of context
  › Increase interprocedural context for harder cases
  › In simplest form
      • Start with local analysis (with pessimistic assumptions)
          – Verify some dereferences without considering context
      • Consider remaining dereferences with extra level of context
          – Verify some dereferences within a call subtree of immediate callers
      • …
  › We refer to individual analyses as Limited-Scope Analyses
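The staged approach above can be sketched as a driver loop that retries only the still-unverified dereferences with one more level of context. This is a minimal sketch under stated assumptions, not SALSA's implementation: `LimitedScopeAnalysis`, `ExpandingScopeDriver`, and the string ids for dereferences are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical limited-scope analysis: tries to verify one dereference with `scope` levels of context. */
interface LimitedScopeAnalysis {
    boolean verify(String deref, int scope);
}

final class ExpandingScopeDriver {
    /**
     * Maps each verified dereference to the smallest scope at which it was verified.
     * Dereferences still unverified at scopeLimit are absent from the result (they become warnings).
     */
    static Map<String, Integer> run(List<String> derefs, LimitedScopeAnalysis analysis, int scopeLimit) {
        Map<String, Integer> verifiedAt = new LinkedHashMap<>();
        List<String> remaining = new ArrayList<>(derefs);
        for (int scope = 0; scope <= scopeLimit && !remaining.isEmpty(); scope++) {
            List<String> stillOpen = new ArrayList<>();
            for (String d : remaining) {
                if (analysis.verify(d, scope)) {
                    verifiedAt.put(d, scope);   // cheapest scope that suffices
                } else {
                    stillOpen.add(d);           // retry with more context
                }
            }
            remaining = stillOpen;
        }
        return verifiedAt;
    }
}
```

The point of the design is that each dereference pays only for the context it needs: cases that are easy locally never trigger the more expensive scopes.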


Expanding-Scope Analysis

[Figure: diagram of nodes labeled f around the dereference … f.foo() …]

Expanding-Scope Analysis

[Figure: call graph of the running example]
    main:  B b = new B(); initB(b); a.foo(b);
    foo:   b.f.fun(); b.f.f.gun();
    initB: b.f = new F(); b.f.f = null

Abstract Domain

Product of three abstract domains
  1. Abstract domain for may-alias analysis
      • Implementation: flow- & context-insensitive Andersen-style
  2. Abstract domain for must-alias analysis
      • Implementation: demand-driven (based on def-use chains)
  3. Set APnn of non-null access paths
      • Access paths denote l-value expressions:
          – (VarId | StaticFieldId).InstanceFieldId*
      • Finiteness of domain guaranteed by (parameterized) bounds on
          – Size of APnn
          – Maximal length of access paths in APnn
  › Only the final component (set of non-null access paths APnn) changes
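The two bounds that keep the set of non-null access paths finite can be sketched as follows. This is an illustrative sketch: the class `NonNullPaths`, the dotted-string encoding of access paths, and the convention that a path's length counts its root plus its instance fields are all assumptions, not details given in the talk.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical bounded set of non-null access paths such as "b" or "b.f". */
final class NonNullPaths {
    private final int maxPaths;    // bound on the size of the set
    private final int maxLength;   // bound on the number of components per path
    private final Set<String> paths = new LinkedHashSet<>();

    NonNullPaths(int maxPaths, int maxLength) {
        this.maxPaths = maxPaths;
        this.maxLength = maxLength;
    }

    /** Assumed convention: the root (variable or static field) counts as one component. */
    static int length(String path) {
        return path.split("\\.").length;
    }

    /** Records a non-null fact if both bounds allow it; dropping a fact only loses precision. */
    boolean add(String path) {
        if (paths.contains(path)) return true;
        if (length(path) > maxLength || paths.size() >= maxPaths) return false;
        return paths.add(path);
    }

    boolean isNonNull(String path) {
        return paths.contains(path);
    }

    Set<String> view() {
        return Collections.unmodifiableSet(paths);
    }
}
```

Because the set holds only facts of the form "this path is non-null", refusing to record a fact never makes the analysis unsound; it can only leave a dereference unverified.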


Transfer Functions (statements)

Let Γ = InstanceFieldId* (sequences of instance fields)

Statement                        Transfer function
v = null                         APnn \ { v.γ | γ ∈ Γ }
v = new T()                      APnn ∪ { v }
v = w                            APnn ∪ { v.γ | w.γ ∈ APnn }
v = w.f                          APnn ∪ { v.γ | w.f.γ ∈ APnn } ∪ mustAlias(w)
v.f = null                       APnn \ { e′.f.γ | e′ ∈ mayAlias(v), γ ∈ Γ } ∪ mustAlias(v)
v.f = w                          APnn ∪ { e′.f.γ | w.γ ∈ APnn, e′ ∈ mustAlias(v) } ∪ mustAlias(v)
…v.foo()…, …v[i]…, …v.length…    APnn ∪ mustAlias(v)
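A few of the statement transfer functions can be sketched over a plain set of strings. This sketch reads the operators lost from the table as set union and membership, assumes that mustAlias(v) contains v itself, and encodes access paths as dotted strings; the class `StatementTransfer` and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

final class StatementTransfer {
    /** True if path is root itself or root followed by instance fields. */
    static boolean hasPrefix(String path, String root) {
        return path.equals(root) || path.startsWith(root + ".");
    }

    /** v = null: remove v and every path that extends v. */
    static Set<String> assignNull(Set<String> apnn, String v) {
        Set<String> out = new HashSet<>(apnn);
        out.removeIf(p -> hasPrefix(p, v));
        return out;
    }

    /** v = new T(): v itself becomes non-null. */
    static Set<String> assignNew(Set<String> apnn, String v) {
        Set<String> out = new HashSet<>(apnn);
        out.add(v);
        return out;
    }

    /** v = w: every non-null path rooted at w yields the same path rooted at v. */
    static Set<String> copy(Set<String> apnn, String v, String w) {
        Set<String> out = new HashSet<>(apnn);
        for (String p : apnn) {
            if (hasPrefix(p, w)) out.add(v + p.substring(w.length()));
        }
        return out;
    }

    /** v.foo(), v[i], v.length: past the dereference, v's must-aliases are non-null. */
    static Set<String> deref(Set<String> apnn, String v, Function<String, Set<String>> mustAlias) {
        Set<String> out = new HashSet<>(apnn);
        out.addAll(mustAlias.apply(v));
        return out;
    }
}
```

The sketch follows the rules literally as read here; in particular, it makes no attempt to discard stale facts about v before adding new ones on an assignment.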

Transfer Functions (conditions)

Condition         On true branch                        On false branch
v == null         v ∈ APnn ? ⊥ : APnn                   APnn ∪ mustAlias(v)
v instanceof T    APnn ∪ mustAlias(v)                   APnn
v == w            APnn ∪ (mustAlias(w) if v ∈ APnn)     APnn
                       ∪ (mustAlias(v) if w ∈ APnn)
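The null-test row of the condition table can be sketched as two small functions. The sketch assumes that the symbol lost from the true-branch entry denotes an infeasible branch, modeled here as an empty `Optional`; the class `NullTestTransfer` is a hypothetical name, and the must-alias set of v is passed in by the caller.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class NullTestTransfer {
    /** True branch of `v == null`: infeasible (empty result) if v is already known non-null. */
    static Optional<Set<String>> onTrue(Set<String> apnn, String v) {
        if (apnn.contains(v)) {
            return Optional.empty();
        }
        return Optional.of(new HashSet<>(apnn));
    }

    /** False branch of `v == null`: everything that must alias v is non-null. */
    static Set<String> onFalse(Set<String> apnn, Set<String> mustAliasOfV) {
        Set<String> out = new HashSet<>(apnn);
        out.addAll(mustAliasOfV);
        return out;
    }
}
```

This is how an explicit null check in the source lets the analysis verify dereferences guarded by it without any interprocedural context.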

Staged Analysis in SALSA (Scalable Analysis via Lazy Scope expAnsion)

Real OO applications (e.g., web applications) have wide call graphs
  › High scope limits are too expensive to analyze
New stages help stave off the need for high scope limits
  1. Pruning
      • Verifies dereferences of (non-null) final and stationary fields
  2. Special local (scope-0) analyses
      a. Caller-guarantee analysis (top-down in call graph)
          – Propagates callers’ guarantees to callees
          – E.g., for references passed as arguments down deep call chains
      b. Callee-guarantee analysis (bottom-up in call graph)
          – Propagates callees’ guarantees up to callers
          – E.g., for field initializations in deep initialization call chains
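The idea behind caller-guarantee analysis, propagating non-null arguments down call chains, can be sketched on a toy call graph. This is a simplified sketch and not SALSA's algorithm: it assumes one parameter per method, and `CallSite`, `CallerGuarantee`, and the two boolean flags are hypothetical. It starts by assuming every called method receives a non-null argument and then withdraws that assumption wherever some call site contradicts it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical call site in a toy program where every method has one parameter. */
final class CallSite {
    final String caller;
    final String callee;
    final boolean argLocallyNonNull;   // e.g., the argument was just allocated with `new`
    final boolean argIsCallerParam;    // the caller forwards its own parameter

    CallSite(String caller, String callee, boolean argLocallyNonNull, boolean argIsCallerParam) {
        this.caller = caller;
        this.callee = callee;
        this.argLocallyNonNull = argLocallyNonNull;
        this.argIsCallerParam = argIsCallerParam;
    }
}

final class CallerGuarantee {
    /** Returns the methods whose parameter is non-null at every call site. */
    static Set<String> nonNullParams(List<CallSite> sites) {
        Set<String> guaranteed = new HashSet<>();
        for (CallSite s : sites) guaranteed.add(s.callee);   // start from all called methods
        boolean changed = true;
        while (changed) {
            changed = false;
            for (CallSite s : sites) {
                boolean ok = s.argLocallyNonNull
                        || (s.argIsCallerParam && guaranteed.contains(s.caller));
                if (!ok && guaranteed.remove(s.callee)) changed = true;
            }
        }
        return guaranteed;
    }
}
```

In a chain main → foo → bar → baz where main passes a freshly allocated object and the others forward their parameter, all three callees end up guaranteed, even though no individual method body contains a null check.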


Staged Analysis in SALSA (Scalable Analysis via Lazy Scope expAnsion)

[Figure: pipeline of stages]
    Stages: pruning, caller-guarantee, callee-guarantee, scope-1, scope-2, …, symbolic
    scope-1: subtrees of depth 1 from parents
    scope-2: subtrees of depth 2 from grandparents
    symbolic: splits remaining warnings into high priority and low priority

Steps of staged interproc. analysis

     1. class A {
     2.   final A a = new A();
     3.   static main() {
     4.     B b = new B();
     5.     initB(b);
     6.     a.foo(b);
     7.   }
     8.   foo(B b) {
     9.     b.f.fun();
    10.     b.f.f.gun();
    11.   }
    12.   static initB(B b) {
    13.     b.f = new F();
    14.     b.f.f = null;
    15.   }
    16. }

Pruning (final & stationary fields)
Limited-scope analysis
  1. Scope-0 (local analysis)
  2. Scope-1 analysis
With the special local analyses
  1. Caller-guarantee (local) analysis
  2. Callee-guarantee (local) analysis
  3. Scope-1 analysis
Non-null facts shown: b.f ∈ APnn; b ∈ APnn (twice)

Experimental results

21 (mostly open-source) applications
  › ~3K-465K bytecodes; ~300-37K dereferences
Avg: ~90% of dereferences verified soundly and automatically
  › ~8% dismissed by Pruning
  › ~77% dismissed by caller-guarantee analysis
  › ~5% dismissed by remaining stages
Final scope limit: between 2 and 5 (chosen heuristically)
  › Diminishing returns after local analyses (caller-/callee-guarantee)
  › Higher scope limits useful in the absence of caller/callee guarantees
Max. access-path length: 2 for all but four applications
  › Higher access-path lengths had no effect for most applications
  › Helped C-like applications (direct field dereferences without getters)

Experimental results

Expected many false alarms due to simple abstract domain
Implemented heuristic symbolic path-validity checking
  › This phase selected ~20% as high-priority warnings
  › Surprisingly low incidence of false alarms due to path-correlation
Biggest domain shortcoming: not tracking access-path types
  › Causes unnecessarily high cost of verifying certain dereferences
      • Includes too many irrelevant code portions when verifying a dereference
  › Produces false alarms due to examining type-infeasible paths
Results are encouraging for the simplicity of the domain

Tool-User Interaction

The output includes suggested annotations
  › Ordered by the number of warnings guaranteed to be dismissed
      • Actual number would require an alternate abstract domain
  › Current annotation options
      • Field f is non-null
      • Parameter p or return value of method foo() is non-null
User may choose to accept some annotations
  › We studied annotations for 8 benchmarks with high warning counts
  › A few hours’ effort for unfamiliar code
      • Result: 30% decrease in warning counts
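The ordering of suggested annotations by the number of warnings they are guaranteed to dismiss can be sketched as a simple sort. The classes `Suggestion` and `SuggestionRanking` and the sample counts in the usage below are hypothetical; the talk does not describe the tool's output format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical suggested annotation with the number of warnings it is guaranteed to dismiss. */
final class Suggestion {
    final String annotation;
    final int warningsDismissed;

    Suggestion(String annotation, int warningsDismissed) {
        this.annotation = annotation;
        this.warningsDismissed = warningsDismissed;
    }
}

final class SuggestionRanking {
    /** Returns a new list with the most effective suggestions first; the input is left untouched. */
    static List<Suggestion> rank(List<Suggestion> suggestions) {
        List<Suggestion> out = new ArrayList<>(suggestions);
        out.sort(Comparator.comparingInt((Suggestion s) -> s.warningsDismissed).reversed());
        return out;
    }
}
```

Presenting the list in this order lets a user who has only a few hours start with the annotations that remove the most warnings.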


Summary

Novel expanding-scope analysis
  › Applicable to multiple abstract domains
Scalable and precise null-dereference analysis
  › Staged analysis makes a simple abstract domain effective
Vision: improve programs’ specifications and robustness
  › Cleanse programs by examining warnings and suggested annotations
  › Check accepted annotations with assertions or symbolic techniques
  › Extend the program’s specification and analyzability via annotations
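Checking an accepted annotation with an assertion could look like the following. This is one possible sketch, assuming an accepted "field f is non-null" annotation on a hypothetical class `Holder`; it uses the standard `java.util.Objects.requireNonNull` as the runtime check.

```java
import java.util.Objects;

/** Hypothetical class whose field f is assumed to carry an accepted non-null annotation. */
final class Holder {
    private final Object f;

    Holder(Object f) {
        // Fails fast with NullPointerException if the accepted annotation is violated.
        this.f = Objects.requireNonNull(f, "f must be non-null");
    }

    int describe() {
        // Safe dereference: the constructor check guarantees f is non-null.
        return f.hashCode();
    }
}
```

With the check at the single point where f is assigned, every later dereference of f is covered, and a violation surfaces where the bad value enters rather than where it is used.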
