The Code Validation Tool (CVT)

Paper by: A. Pnueli, O. Shtrichman, M. Siegel

Course: 236814

Presented to: Professor Orna Grumberg

Presented by: Yehonatan Rubin

Before we start

This article is from 1998.

So I would like you all to join me on a trip to the past.

I would like to introduce you all to a language called DC+.

DC+

An intermediate language for multiple synchronous languages.

Advantages:
Portability.
Multiple languages need only one optimizer.
Models written in different languages can be combined.

DC+'s General Principles (from the official manual)

Requirement 1: It should be general enough to greet a variety of source languages.

Requirement 2: Since it is intended to be used in industrial compilers, which are submitted to hard performance requirements, its use should neither complexify too much the compiling process, nor impede the performances of the generated code.

Single optimizer

A single intermediate language for many popular languages.

Very complex, efficient code generators were written.

Can we even trust those code generators?

Two feasible solutions

Formal validation of the code generator.
Validating semantic equivalence between the original DC+ code and the resulting C code (today's topic).

So let's start with why the first option is bad.

Formal validation of the code generator

Meaning: full formal verification of the code generator.

Problems:
It is extremely hard to do for industrial-size code generators.
Once finished, it freezes the design. (Who would dare to change something?)

Code Validation

Instead of validating the code generator once, validate each run.

Meaning: making sure that the resulting C code is semantically equivalent to the initial DC+ code.

It has to be automatic.

CVT in one slide

[Diagram: the DC+ code and the C code go into CVT, which answers Approved or Not Approved.]

What is CVT good for?

Production of safety-critical systems.

It enables the use of code generation tools in such high-quality systems.

The combination of automatic code generation and validation improves the design flow of embedded systems in both safety and productivity by eliminating the need for hand-coding the target code.

When does C code correctly implement the DC+ source?

Hard question.

First, we need to understand DC+'s semantics.

Semantics of DC+ programs

Synchronous.

A DC+ program describes a reactive system whose behavior over time is observable as an infinite sequence of states.

State changes are triggered by the arrival of new values for the input variables.

No interrupts.

Semantics of DC+ programs II

A DC+ program is a list of constraints on the program variables.

When new values arrive for the input variables, the values of the other variables are determined by the list of constraints.

At each instant in time, all constraints have to be satisfied by the values that the variables have at that instant.

The list of constraints determines the transition relation of the system.

Semantics of DC+ programs III

There are four kinds of variables:
Input variables: the trigger for state changes.
Output variables: observable variables.
Internal variables: for internal use.
Register variables: store information about the history of the current computation.

Comparing the C program and the DC+ program

Both the DC+ program and the generated C program need to be translated into a common semantic domain.

Synchronous transition systems (STS) will be used.

This is how STS works

An STS is a tuple $S = (V, \Theta, \rho)$:

$V$: a finite set of typed variables.
$\Theta$: a satisfiable assertion characterizing the initial states of system $S$.
$\rho$: the transition relation.

For a DC+ program:

$V$: all original program variables.
$\Theta$: the initial state.
$\rho$: obtained by a one-to-one translation of the list of constraints into logical formulas.

This is how STS works II

Solutions of $\rho$ for given values of the input variables determine the values of the remaining variables.
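The STS definition can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the `STS` class, the `successors` helper, and the toy counter program (input `x`, register `count`, output `y`) do not come from the paper, and finite value domains are used so that solutions of the transition relation can be enumerated by brute force.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


@dataclass
class STS:
    """A synchronous transition system S = (V, theta, rho) over finite domains."""
    variables: Dict[str, Tuple[int, ...]]   # V: variable name -> finite value domain
    theta: Callable[[State], bool]          # assertion characterizing initial states
    rho: Callable[[State, State], bool]     # transition relation over (previous, next)

    def successors(self, prev: State, inputs: State) -> List[State]:
        """Enumerate the next states that satisfy rho for the given input values."""
        free = [v for v in self.variables if v not in inputs]
        result = []
        for values in product(*(self.variables[v] for v in free)):
            nxt = dict(inputs, **dict(zip(free, values)))
            if self.rho(prev, nxt):
                result.append(nxt)
        return result


# Hypothetical program: count the 1s seen on x modulo 4, raise y when count is 3.
counter = STS(
    variables={"x": (0, 1), "count": (0, 1, 2, 3), "y": (0, 1)},
    theta=lambda s: s["count"] == 0 and s["y"] == 0,
    rho=lambda prev, nxt: (
        nxt["count"] == (prev["count"] + nxt["x"]) % 4
        and nxt["y"] == int(nxt["count"] == 3)
    ),
)

initial = {"x": 0, "count": 0, "y": 0}
step = counter.successors(initial, {"x": 1})
print(step)
```

Because the constraints of this toy program fix every non-input variable, `successors` returns exactly one state for each input value, which mirrors the statement that solutions of the transition relation determine the remaining variables.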

The observable behavior of such a system can be understood in the following way:

[Figure labels: initial state of the system; legal state along the computation.]

This is how STS works III

Reminder: DC+ is synchronous.

According to the synchrony hypothesis:

There is no time delay between the reception of new values for the input variables and the generation of the corresponding output values.

All variables are updated simultaneously.

Atomic, bounded, deterministic.

The C code

The result of the code generator is C code.

It has the following structure:
ANSI C.
One control loop.
Each iteration corresponds to one step of the DC+ program.

The C code II

Unlike in DC+, here the variables are not updated simultaneously.

The control loop consumes new values for the input variables and successively computes (one by one) the values of the remaining variables.

The C code III

States marked with a bullet correspond to the beginning (and end) of the control loop.

Those states match the states of the original DC+ program.

Intermediate states, where only some variables have been updated, are not depicted since they do not correspond to any state of the DC+ program.
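The shape of such a control loop can be sketched as follows. The sketch is written in Python rather than C and uses a hypothetical counter program (input `x`, register `count`, output `y`) that is not taken from the paper; it only illustrates that variables are updated one by one and that the state is recorded at the loop boundary alone.

```python
from typing import Dict, Iterable, List


def control_loop(inputs: Iterable[int]) -> List[Dict[str, int]]:
    """Run one iteration per input value; record the state at each loop boundary."""
    count = 0      # register variable (hypothetical)
    y = 0          # output variable (hypothetical)
    observed = []
    for x in inputs:                 # consume a new input value
        count = (count + x) % 4      # first update: y is still stale at this point
        y = int(count == 3)          # second update: the state is consistent again
        # Only the state at the end of the iteration is recorded; the
        # intermediate state between the two assignments is never observed.
        observed.append({"x": x, "count": count, "y": y})
    return observed


trace = control_loop([1, 1, 1, 0])
print(trace[-1])
```

Between the two assignments, `count` already holds its new value while `y` still holds the old one. That is exactly the kind of intermediate state that has no counterpart in the DC+ program.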

The C code IV

For the purpose of semantic comparison, the C program is also translated into an STS representation.

Correct Implementation

Now we have a common semantic domain for both the C and the DC+ programs.

We'll say that:

Program C implements DC+ if for every computation $\sigma$ of C there exists a computation $\sigma'$ of DC+ such that $\sigma$ and $\sigma'$ agree state-wise on the values of the observable variables, i.e., the input and output variables.
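The definition can be illustrated on finite data. Real computations are infinite sequences of states, so the sketch below works on finite prefixes only; the function names and the example traces are hypothetical and serve purely as illustration.

```python
from typing import Dict, Iterable, Sequence

State = Dict[str, int]
Trace = Sequence[State]


def agree_statewise(sigma_c: Trace, sigma_a: Trace, observables: Iterable[str]) -> bool:
    """True if both traces have equal length and match on every observable variable."""
    obs = list(observables)
    return len(sigma_c) == len(sigma_a) and all(
        all(c[v] == a[v] for v in obs) for c, a in zip(sigma_c, sigma_a)
    )


def implements(c_traces: Iterable[Trace], dc_traces: Sequence[Trace],
               observables: Iterable[str]) -> bool:
    """Every C trace must be matched by some DC+ trace on the observables."""
    obs = list(observables)
    return all(
        any(agree_statewise(c, a, obs) for a in dc_traces) for c in c_traces
    )


# Hypothetical finite prefixes; the internal variables differ on purpose.
dc_traces = [
    [{"x": 1, "count": 1, "y": 0}, {"x": 1, "count": 2, "y": 1}],
    [{"x": 0, "count": 0, "y": 0}, {"x": 1, "count": 1, "y": 0}],
]
good_c = [[{"x": 1, "tmp": 7, "y": 0}, {"x": 1, "tmp": 8, "y": 1}]]
bad_c = [[{"x": 1, "tmp": 7, "y": 1}, {"x": 1, "tmp": 8, "y": 1}]]

print(implements(good_c, dc_traces, ["x", "y"]))
print(implements(bad_c, dc_traces, ["x", "y"]))
```

Internal variables such as `tmp` and `count` are free to differ, since only the input and output variables are compared.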

Correct Implementation-Formal definition

One last thing: a mapping

Now we'll need a mapping from DC+ to C (abstract to concrete).

The use-case code generator applies more than 100 optimizations.

Thus, the mapping domain will be the observable variables (I/O).

The mapping will assign a term over the concrete variables to every abstract variable.

Finally: a logical rule

If both of these proof obligations are found to be valid, we can conclude that C is a correct refinement of the corresponding DC+ program.

So?

Now we'll move to the practical part.

CVT's architecture.

CVT's Architecture

What we talked about so far

Auto Decomposition

The right-hand side of the implication is in the form of a conjunction.

Since the time it takes to verify a program using BDD-based tools is worst-case exponential in the size and complexity of the formula, it is the size of the single formula that has to be verified that determines the bottleneck of the validity checking.

Before (practical) SAT solvers?

Auto Decomposition II

The paper claims it will soon explain why.

I couldn't find where this "soon" is.

Auto Decomposition II

Why decomposition?

Verifying each formula is exponential in its size.

Decomposition causes a linear increase in the number of validation tasks and a linear decrease in the size of each task, which means an exponential decrease in the verification time of each formula.

This is why decomposition is so important.

Cone of Influence

After breaking up the right-hand side, the module returns to the left-hand side of the implication and calculates the Cone of Influence.

COI: the portion of the formula in the left-hand side that is needed for proving the selected conjuncts on the right-hand side.

Cone of Influence: How to calculate

Cone of Influence: Example

Recap

COI: the portion of the formula in the left-hand side that is needed for proving the selected conjuncts on the right-hand side.

Now we have many pairs of files to be checked (possibly even in parallel).

Each pair consists of:
A conjunct from the right-hand side.
That conjunct's COI from the left-hand side.
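The pairing of conjuncts with their cones of influence can be sketched as a dependency closure. This is a generic illustration, not CVT's actual procedure. It assumes that each left-hand-side constraint defines one variable in terms of others, and all names (`defs`, `out1_ok`, and so on) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cone_of_influence(defs: Dict[str, Set[str]], targets: Set[str]) -> Set[str]:
    """Collect the defined variables that the targets transitively depend on."""
    cone: Set[str] = set()
    work = list(targets)
    while work:
        var = work.pop()
        if var in cone or var not in defs:
            continue  # already visited, or an input with no defining constraint
        cone.add(var)
        work.extend(defs[var])
    return cone


def decompose(defs: Dict[str, Set[str]],
              conjuncts: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Pair every right-hand-side conjunct with its own cone of influence."""
    return [(name, cone_of_influence(defs, used)) for name, used in conjuncts.items()]


# Left-hand side: variable -> variables that its defining constraint mentions.
defs = {"t1": {"x"}, "t2": {"t1", "y"}, "out1": {"t2"}, "out2": {"z"}}
# Right-hand side: conjunct name -> variables that it mentions.
conjuncts = {"out1_ok": {"out1"}, "out2_ok": {"out2"}}

pairs = decompose(defs, conjuncts)
for name, cone in pairs:
    print(name, sorted(cone))
```

Each pair can then be handed to the verifier on its own. In this example the second task needs only one of the four left-hand-side constraints, which is where the reduction in task size comes from.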

CVT's Architecture

What we talked about so far

The Abstraction Module

Abstraction is needed since we are trying to verify a formula that contains integer and float variables, as well as functions over these variables, using a BDD-based decision procedure for finite-state models.

The abstraction module treats these functions as uninterpreted functions, replacing them by new symbols.

The Abstraction Module II

The faithfulness of this technique depends on two things:
The way that the compiler manipulates these functions.
The kind of functions we leave uninterpreted.
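Replacing function applications by new symbols can be sketched as below. The expression encoding (nested tuples), the `abstract` function, and the `u0`, `u1` symbol names are assumptions made for illustration and are not CVT's data structures.

```python
from typing import Dict, Set, Tuple, Union

Expr = Union[str, Tuple]  # a variable name, or (operator, arg1, arg2, ...)


def abstract(expr: Expr, interpreted: Set[str], symbols: Dict[Tuple, str]) -> Expr:
    """Replace every application of a non-interpreted operator by a fresh symbol."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    term = (op, *(abstract(a, interpreted, symbols) for a in args))
    if op in interpreted:
        return term
    # Uninterpreted: identical applications share one symbol, so equal
    # arguments still give equal results (functional consistency).
    if term not in symbols:
        symbols[term] = f"u{len(symbols)}"
    return symbols[term]


# Hypothetical set of operators that stay interpreted.
INTERPRETED = {"=", "and", "or", "not", "ite"}

symbols: Dict[Tuple, str] = {}
source = abstract(("=", "out", ("*", "a", "b")), INTERPRETED, symbols)
same = abstract(("=", "out", ("*", "a", "b")), INTERPRETED, symbols)
swapped = abstract(("=", "out", ("*", "b", "a")), INTERPRETED, symbols)

print(source, same, swapped)
```

The last call shows the faithfulness issue: if the compiler swaps the operands of a commutative function, the two applications receive different symbols and the abstract formulas no longer match, even though the concrete programs are equivalent.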

Should we interpret more functions?

The more we interpret, the more faithful the model is. (It is also hard to interpret complex functions.)

The less we interpret, the smaller the model is.

The Abstraction Module III

The abstraction works in an incremental manner.

CVT begins with maximum abstraction: all functions except equalities, Boolean operators, and if-then-else are left uninterpreted.

If the proof fails, CVT invokes the next level of abstraction.

At that level, comparison operators on integers (<, >, etc.) are interpreted as well.

There are no more levels.

Example

An example of why this is necessary: the compiler reads a < b and writes b > a.

The first level of abstraction will result in a false negative.

The second level of abstraction will result in a true positive.

This leaves us with a quantifier-free first-order logic formula that enjoys the small model property (i.e., it is satisfiable iff it is satisfiable over a finite domain).

Therefore the next issue is the calculation of a finite domain such that the formula is valid if and only if it is valid over all interpretations into this domain.

Once we have a valid domain, checking whether the formula is satisfiable is a relatively easy thing to do (BDDs).
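Checking a formula over a finite domain can be sketched by plain enumeration, which stands in here for the BDD-based check. The sketch assumes a domain of 1..n for n integer variables that are only compared with each other; the helper name and both example formulas are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Sequence


def valid_over_domain(formula: Callable[[Dict[str, int]], bool],
                      names: Sequence[str],
                      domain: Iterable[int]) -> bool:
    """True if the formula holds under every assignment of domain values."""
    values = list(domain)
    return all(
        formula(dict(zip(names, combo)))
        for combo in product(values, repeat=len(names))
    )


def swapped_comparison(v: Dict[str, int]) -> bool:
    """a < b is equivalent to b > a (hypothetical compiler rewrite)."""
    return (v["a"] < v["b"]) == (v["b"] > v["a"])


def wrong_rewrite(v: Dict[str, int]) -> bool:
    """a < b is NOT equivalent to a != b; the check should reject this."""
    return (v["a"] < v["b"]) == (v["a"] != v["b"])


names = ["a", "b"]
domain = range(1, len(names) + 1)   # assumed small domain 1..n for n variables

print(valid_over_domain(swapped_comparison, names, domain))
print(valid_over_domain(wrong_rewrite, names, domain))
```

With two variables only four assignments are checked, yet they cover every relative order of `a` and `b`, which is all that formulas of this restricted kind can observe.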

So, which domain should we use?

Choosing a domain

Which functions do we interpret?
Level 0: equalities, Boolean operators, and if-then-else.
Level 1: comparison operators on integers (<, >, etc.).

Only the order is important.

If there are n variables, the domain [1..n] contains all possible orderings.

The domain [1..n] is valid.

CVT's Architecture

What we talked about so far

The Range Minimization Module

Range minimization

That's R:

The main algorithm

TLV

TLV: the verifier module

An SMV-based tool.

Invoked for each pair of files (as created from the COI).

If the equivalence proof fails:

It is possible to isolate the conjunct that caused the failure.

Evaluation

Case study: a turbine from the SACRES project.

5 units (manually separated).

The DC+ code is a few thousand lines long, with over 1000 variables.

Evaluation: Results

Unverified conjuncts had a very large Cone of Influence.

The End

Thank you
