Convergence of Model Checking & Program Analysis Philippe Giabbanelli CMPT 894 – Spring 2008


Some of you know more than me on the topic… do not hesitate to point out my mistakes at the end!


Overview

Convergence

Customization

Toward customization

Formalism

Analysis

Results


• We are doing static analysis: we want the properties of programs without having to execute them.

• We have the code of a program, and we want to be guaranteed that there are no errors.

• We don’t work with the raw code but with higher-level representations of it, like the Control Flow Graph (CFG).

• Basic example from the CFG: too many nested control-flow instructions are a bad pattern, because they make the program overly complex.
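As a rough illustration of the nesting example, here is a minimal sketch that measures the deepest nesting of control-flow statements without executing the program. It is an assumption-laden stand-in: it walks Python's syntax tree (the standard `ast` module) rather than a real CFG, and the helper name `max_nesting` and the sample function are hypothetical.

```python
import ast

# Statement kinds counted as "control flow" in this sketch (an arbitrary choice).
CONTROL = (ast.If, ast.For, ast.While, ast.Try, ast.With)

def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest nesting level of control-flow statements under node."""
    here = depth + 1 if isinstance(node, CONTROL) else depth
    # Note: an `elif` is parsed as an `if` nested in the `else` branch,
    # so long elif chains count as deeper nesting here.
    return max([here] + [max_nesting(child, here)
                         for child in ast.iter_child_nodes(node)])

source = """
def f(xs):
    for x in xs:
        if x > 0:
            while x:
                x -= 1
"""

# The code is only parsed, never run: this is what makes the analysis static.
print(max_nesting(ast.parse(source)))
```

A checker could then flag any function whose depth exceeds some chosen threshold.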



Model Checking

I ⊨ S? Given an implementation I of the system, does it satisfy the specification S?

• The implementation is represented as an automaton.

• If we represent every state explicitly, it will be huge… thus we work with sets of states (OBDD).

• Accurate analysis but very costly: small programs!

Program Analysis

• If we have big programs, we cannot afford overly costly operations: we use approximations.

• The aim is to compute more or less basic properties efficiently.

Graphs from Patrick Cousot



Model Checking vs. Program Analysis

There are different paths through the states of the program. They define a « reachability tree » (i.e. states that ‘can happen’).

Model Checking

• Not every path can happen: we want to be accurate, so we keep the paths separate.

→ Nodes are never merged.

Program Analysis

• We consider that every path of the CFG can be executed: we approximate by ignoring things such as the conditions of conditional statements.

→ Two nodes are merged if they refer to the same control location.

The merge operator is what differs between the two.
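The two merge behaviours can be sketched as follows. This is only an illustration under assumptions: an abstract state is reduced to a control location plus an interval for a single variable, and the names `AbsState`, `merge_sep` and `merge_join` are hypothetical.

```python
from typing import NamedTuple

class AbsState(NamedTuple):
    loc: int   # control location
    lo: int    # lower bound of the interval for one variable
    hi: int    # upper bound of the interval

def merge_sep(new: AbsState, old: AbsState) -> AbsState:
    """Model-checking style: never merge, the existing node is left untouched."""
    return old

def merge_join(new: AbsState, old: AbsState) -> AbsState:
    """Program-analysis style: join two nodes at the same control location."""
    if new.loc != old.loc:
        return old
    return AbsState(old.loc, min(new.lo, old.lo), max(new.hi, old.hi))

a = AbsState(loc=3, lo=0, hi=0)   # reached through one path
b = AbsState(loc=3, lo=5, hi=5)   # reached through another path

print(merge_sep(a, b))    # b is kept as is; a will live in its own node
print(merge_join(a, b))   # a single node covering the interval [0, 5]
```

With the join, the values 1 to 4 are now considered possible at location 3 although no path produces them: this is where precision is traded for a smaller set of nodes.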




Model Checking vs. Program Analysis

We don’t explore the reachability tree forever: we stop at some point. This is what we call « termination »: the moment we stop at a node.

Model Checking

• We stop when the set of states computed for the next step is included in the current one.

Program Analysis

• We stop when the abstract state does not represent new concrete states (it is a fixpoint).

The termination operator is what differs between the two.
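To make the difference concrete, here is a minimal sketch of two possible notions of "covered". It assumes that an abstract state is simply the set of concrete values it represents; the names `stop_sep` and `stop_join` are hypothetical labels for checking against each reached state individually versus against all of them together.

```python
from typing import FrozenSet, List

State = FrozenSet[int]   # an abstract state, seen as the concrete values it stands for

def stop_sep(e: State, reached: List[State]) -> bool:
    """Stop only if one single reached state already covers e."""
    return any(e <= r for r in reached)

def stop_join(e: State, reached: List[State]) -> bool:
    """Stop if the reached states, taken together, cover e."""
    union = frozenset().union(*reached)
    return e <= union

reached = [frozenset({1, 2}), frozenset({3})]
e = frozenset({2, 3})

print(stop_sep(e, reached))    # False: no single state contains both 2 and 3
print(stop_join(e, reached))   # True: 2 and 3 have both been seen already
```

The separate check keeps exploring in more situations, which costs time but preserves the distinction between paths.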



• The model checker BLAST has been extended to allow customized program analyses.

• We have a set of abstract interpreters. Let’s call the overall engine the meta-engine.

• We configure the meta-engine by defining a composite merge operator and a composite termination operator (composite because each one combines the operators of several interpreters).



So, what do we have here?

• It is neither strictly a model checker nor a program analyzer.

• The difference between a model checker and a program analyzer is, in a way, just a parameter of our configurable model.

• This illustrates the convergence: we have a bit of both approaches!



• We consider simple imperative programming languages.

• What do we have in a program?

∙ Lines, the current one being indicated by Pc (the program counter).

∙ The control flow is transferred from one location to another.

∙ A set X of variables.

• The ways to move in a program are already given by a CFG. If we add information about variables, we turn it into an automaton: the control-flow automaton (CFA).

∙ L is the set of program locations (values taken by Pc).

∙ G is the set of control-flow edges.

∙ A concrete state c assigns a value to every variable in X and to Pc.

∙ C is the set of all possible concrete states.

∙ A subset r of C is called a region.



• We have the automaton, but how do we move in it?

∙ G is the set of transitions (or edges). g is a transition if it belongs to G.

∙ c → c’: we can go from the concrete state c to the concrete state c’ if there exists a transition g in G that leads from c to c’.

∙ A state c_n is reachable from a region r if there is a path from a state of r to c_n. This path is defined as a sequence of transitions:

c_n ∈ Reach(r) if there exists a sequence c_0, c_1, …, c_n with c_0 ∈ r such that c_{i-1} → c_i for all 1 ≤ i ≤ n.
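The definition of Reach(r) can be turned into a small worklist computation. The sketch below is illustrative only: the toy program, the single variable x and the names `successors` and `reach` are assumptions, and a concrete state is reduced to a pair (Pc, x).

```python
from collections import deque
from typing import List, Set, Tuple

Concrete = Tuple[int, int]   # a concrete state: (Pc, value of x)

# Hypothetical toy program, one location per line:
#   0: x = 0
#   1: while x < 3:
#   2:     x = x + 1
#   3: end
def successors(c: Concrete) -> List[Concrete]:
    """All c' such that c -> c' (the transition relation of the toy CFA)."""
    pc, x = c
    if pc == 0:
        return [(1, 0)]
    if pc == 1:
        return [(2, x)] if x < 3 else [(3, x)]
    if pc == 2:
        return [(1, x + 1)]
    return []                # location 3 has no outgoing edge

def reach(region: Set[Concrete]) -> Set[Concrete]:
    """States reachable from the region, including the region itself (n = 0)."""
    seen = set(region)
    work = deque(region)
    while work:
        c = work.popleft()
        for c2 in successors(c):
            if c2 not in seen:
                seen.add(c2)
                work.append(c2)
    return seen

print(sorted(reach({(0, 7)})))
```

This explicit enumeration only terminates here because the toy program has finitely many reachable states; in general that is not the case, which is why the abstract domains are needed.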



• Let’s see what an abstract domain is through an example…

∙ What is the result of 48176 * 59876 * 285561?

∙ What is the sign of 48176 * 59876 * 285561?

∙ We replaced the domain of integers by the domain of signs.

∙ As always in abstract interpretation, we are simplifying the problem a bit. Here, we consider an abstract domain of signs: we still compute something relevant, but a bit less ambitious.

∙ We take the problem, we abstract its domain and we get an answer in a reasonable time. To get back from the abstract domain to the concrete one, we use a concretization function.

C(negative) = [-∞, -1]

C(positive) = [0, +∞]
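The sign question can be answered without ever computing the product. The sketch below is a minimal illustration with hypothetical names (`Sign`, `alpha`, `mul`). One deliberate difference from the slide: zero gets its own abstract value instead of being folded into "positive", because otherwise negative × positive could not be answered exactly (the product may be 0).

```python
from enum import Enum
from functools import reduce

class Sign(Enum):
    NEG = "negative"   # stands for the integers <= -1
    ZERO = "zero"      # stands for 0 only (an assumption of this sketch)
    POS = "positive"   # stands for the integers >= 1

def alpha(n: int) -> Sign:
    """Abstraction: map a concrete integer to its sign."""
    if n < 0:
        return Sign.NEG
    if n == 0:
        return Sign.ZERO
    return Sign.POS

def mul(a: Sign, b: Sign) -> Sign:
    """Abstract multiplication: works on signs only, never on the numbers."""
    if Sign.ZERO in (a, b):
        return Sign.ZERO
    return Sign.POS if a == b else Sign.NEG

factors = [48176, 59876, 285561]
sign = reduce(mul, (alpha(n) for n in factors))
print(sign.value)
```

The abstract computation needs two table lookups, whatever the size of the numbers involved.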



Schemes from Patrick Cousot, « Abstract Interpretation Based Formal Methods and Future Challenges » (Springer-Verlag 2001)



• Now that we’ve got the basics, what is a configurable program analysis?

∙ An abstract domain: it determines the objective of the analysis.

∙ A transfer relation: it assigns to each abstract state its successors.

∙ The merge operator: it combines two abstract states.

∙ The termination check: if the abstract state given as first parameter is covered by the set of abstract states given as second parameter, then we stop. The way we define « covered » is where the customization happens.

• Each of the four components independently influences precision and cost.



• Among the program analyses used in the experiments, we have: location analysis, predicate abstraction and shape analysis.

∙ In location analysis, we track the reachability of program locations. Can we go there?

∙ The predicate abstraction defined by Ball, Podelski and Rajamani considers programs where the only type is boolean. Their tool c2bp turns a program P with predicates E into a boolean program B(P, E), on which the model checker bebop is then run. Developed between 1999 and 2001.

∙ There might be destructive updating on dynamically allocated storage. Shape analysis keeps track of the data structures stored on the heap (i.e. dynamic allocation).



• The authors have developed the reachability algorithm CPA. Given a configurable program analysis and an initial abstract state, it computes a set of reachable abstract states that over-approximates the set of reachable concrete states.

∙ Let’s see what an over-approximation is through an example.

int i = 0;
for (int j = 0; j < 10; j++)
    if (rand() > 0.3)
        i++;

At the end, i is in [0, 10]. An over-approximation of those concrete states would be [-5,15].

∙ If we set the merge operator to join, then it considers that x might be equal to z and declares that the program is unsafe. This is not possible in the concrete program: false alarm!

∙ Now, we configure the merge to keep things separate, and we find that x and y will have different values: no division by 0, the program is safe, hurray!

∙ Why don’t we always go for the accurate analysis? Because if we have n if statements, it might lead to 2^n abstract states, and the algorithm might not terminate because of loops…
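The effect of the merge setting can be reproduced with a small worklist sketch in the spirit of the CPA algorithm. Everything here is an assumption made for illustration: the program is a hypothetical stand-in (not the example shown on the slide), the abstract domain is a simple set of possible values per variable, and the function names are invented.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Vals = FrozenSet[int]
State = Tuple[int, Vals, Vals]   # (location, possible values of x, possible values of y)

# Hypothetical program (x and y start at 0):
#   0: if (*) goto 1 else goto 2
#   1: x = 1; y = 2; goto 3
#   2: x = 2; y = 1; goto 3
#   3: z = 1 / (x - y)            unsafe only if x == y
#   4: end
def transfer(e: State) -> List[State]:
    loc, xs, ys = e
    if loc == 0:
        return [(1, xs, ys), (2, xs, ys)]
    if loc == 1:
        return [(3, frozenset({1}), frozenset({2}))]
    if loc == 2:
        return [(3, frozenset({2}), frozenset({1}))]
    if loc == 3:
        return [(4, xs, ys)]
    return []

def merge_sep(new: State, old: State) -> State:
    return old                                    # never merge

def merge_join(new: State, old: State) -> State:
    if new[0] != old[0]:
        return old
    return (old[0], old[1] | new[1], old[2] | new[2])

def stop_sep(e: State, reached: Set[State]) -> bool:
    # e is covered if a single reached state at the same location contains it
    return any(e[0] == r[0] and e[1] <= r[1] and e[2] <= r[2] for r in reached)

def cpa(e0: State, merge: Callable, stop: Callable) -> Set[State]:
    reached, waitlist = {e0}, [e0]
    while waitlist:
        e = waitlist.pop()
        for succ in transfer(e):
            for old in list(reached):
                new = merge(succ, old)
                if new != old:                    # old is replaced by the merged state
                    reached.remove(old)
                    reached.add(new)
                    if old in waitlist:
                        waitlist.remove(old)
                    waitlist.append(new)
            if not stop(succ, reached):
                reached.add(succ)
                waitlist.append(succ)
    return reached

def may_divide_by_zero(reached: Set[State]) -> bool:
    # x - y can be 0 only if some value is possible for both x and y
    return any(loc == 3 and bool(xs & ys) for loc, xs, ys in reached)

e0 = (0, frozenset({0}), frozenset({0}))
print(may_divide_by_zero(cpa(e0, merge_join, stop_sep)))  # join: alarm raised
print(may_divide_by_zero(cpa(e0, merge_sep, stop_sep)))   # separate: no alarm
```

With the join, location 3 ends up with x ∈ {1, 2} and y ∈ {1, 2}, so x = y looks possible and a false alarm is raised. Kept separate, the two states at location 3 each have distinct values for x and y. The price is one state per branch combination, which is where the 2^n comes from.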



• We have many tools, some being accurate but quite slow, others converging faster but losing accuracy… Let’s combine them!

∙ For example, predicate abstraction and shape analysis can be combined. The shape graph can become more accurate by using information from the predicate abstraction. The accuracy of the combination is the degree of sharpness.

• Three composite program analyses are used in the experiments.

∙ Basic BLAST. The components are the configurable program analysis for program locations and the configurable program analysis for predicates. Merge and stop are both set to separate.

∙ We can add to it a third component: shape analysis.

∙ We can also add to it pointer analysis as a third component (tracks pointer aliases, memory allocations, etc.)



• Let’s consider BLAST + shape analysis. We want an efficient configuration for it. Let’s see what we’ve got from the experiments…

∙ A. Stop is on separate and merge is on join. Bad: many highly expensive operations.

∙ B. Stop is on separate and merge is on separate (lazy shape analysis). Better cost, as operations on small sets of shape graphs are much more efficient than on large ones (like in A).

∙ C. Stop is on separate, merge is on separate, and we use the predicates to sharpen the shape information. This sharpening has a small cost and gives a more precise analysis!

∙ D. To do better than B, let’s turn stop on join. On the examples, it doesn’t do well: the time spent on termination checks was already very small, and as the join has an overhead, we are losing performance.

∙ E. Turning both on join loses precision and has a high cost.



Source for this presentation: « Configurable Software Verification: Concretizing the Convergence of Model Checking and Program Analysis », Dirk Beyer, Thomas A. Henzinger and Grégory Théoduloz

Thanks For Your Time
