Code-Carrying Theory

Aytekin Vargun

Rensselaer Polytechnic Institute

Outline

Introduction
Proof-Carrying Code (PCC)
Code-Carrying Theory (CCT)
Generic Proofs
Organizing Theorems and Proofs
Conclusions and Future Work

Potential Problems to be Solved

Memory Safety: illegal operations or illegal access to memory
Security: unauthorized access to data or system resources
Functional Correctness: whether the code does correctly what it is formally required to do

Two Solutions

Proof-Carrying Code (PCC)
Code-Carrying Theory (CCT)

Proof-Carrying Code (PCC)

Developed by Necula and Lee [1996] at CMU.

Basic Idea: Use machine-checkable proofs as certificates.

Proof construction is harder than proof checking
The code producer provides the proof
The code consumer checks it

Code-Carrying Theory (CCT)

The consumer gives the specification of the function
The producer starts with axioms that define functions
The form of the axioms is such that it is easy to extract executable code from them
Prove that the defined functions obey certain requirements:
  Termination
  Consistency
  Correctness

Code-Carrying Theory (CCT)

The producer transmits axioms, theorems, and proofs
No explicit code is transmitted
The consumer checks the proofs to see whether the theorems are proved
If proof checking succeeds, the consumer applies the code extractor to the axioms and obtains the executable code

PCC/CCT Differences

PCC starts from code and assertions; CCT starts from assertions only and later extracts code from them
PCC concentrates on safety properties, which are relatively easy to prove fully automatically; we have concentrated on functional correctness properties, which are more difficult
We concentrate more on the proof issues raised by these more challenging properties, and less on programming language issues or other issues that PCC deals with more directly

Code-Carrying Theory (CCT)

Proving Termination:
  Use TCGEN to produce the termination condition (TC) and termination axioms
  Prove TC
Proving Consistency:
  Use CCGEN to produce the consistency condition (CC)
  Prove CC
Proving Correctness:
  Prove the correctness conditions (CTC) given by the consumer

[Figure: CCT workflow. Code Producer: from the general requirements, generate the Termination Condition (TC) and the Consistency Condition (CC); if both CC and TC are proved, assert the function-defining axioms (FDA); prove the correctness condition (CTC) taken from the application-specific requirements; if the CTC is proved, send the axioms (FDA) and the proofs. Code Consumer: the proof checker checks the proof of TC and the proof of CC; if both proofs check, assert the FDA and check the proof of CTC; if the CTC is proved, extract the CODE and run it on the CPU.]

[Figures: tampering scenarios. A hacker sits between the Code Producer and the Code Consumer and intercepts the axioms (FDA) and proofs. In two of the panels the consumer's conditions are labeled "Different TC" and "Different CC".]
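The consumer side of the workflow can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the consumer regenerates TC and CC from the received axioms (which is how the "Different TC" and "Different CC" labels are read here), and every function name, the `proofs` dictionary layout, and the toy checker are hypothetical stand-ins, not the actual CCT tools.

```python
def consumer_check(axioms, proofs, ctc, tcgen, ccgen, check, codegen):
    """Return extracted code, or None if any proof fails to check."""
    # Assumption: the consumer regenerates TC and CC from the received axioms.
    tc, cc = tcgen(axioms), ccgen(axioms)
    # The axioms are not asserted until both TC and CC have been checked.
    if not (check(proofs["TC"], tc, ()) and check(proofs["CC"], cc, ())):
        return None
    # Only now are the function-defining axioms available as assumptions.
    if not check(proofs["CTC"], ctc, tuple(axioms)):
        return None
    return codegen(axioms)


# Toy stand-ins, purely illustrative: a "proof" is just the goal it proves.
toy_tcgen = lambda axioms: ("TC", tuple(axioms))
toy_ccgen = lambda axioms: ("CC", tuple(axioms))
toy_check = lambda proof, goal, assumptions: proof == goal
toy_codegen = lambda axioms: "code for " + ", ".join(axioms)

axioms = ["power-zero", "power-succ"]
proofs = {"TC": toy_tcgen(axioms), "CC": toy_ccgen(axioms), "CTC": "power-correct"}

accepted = consumer_check(axioms, proofs, "power-correct",
                          toy_tcgen, toy_ccgen, toy_check, toy_codegen)
# Altered axioms yield a different TC and CC, so the old proofs no longer check.
tampered = consumer_check(axioms + ["extra-axiom"], proofs, "power-correct",
                          toy_tcgen, toy_ccgen, toy_check, toy_codegen)
```

In this toy setting the untouched transmission is accepted and the altered one is rejected, because the proofs were written for conditions that no longer match.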

Issues

Encoding axioms and proofs
Proof checking
Implementation of CCGEN, TCGEN, and CODEGEN

ATHENA

Implemented by K. Arkoudas
A language for both:
  Ordinary Computation
  Logical Deduction

ATHENA Ordinary Computation Language

Provides higher-order functions
Has primitive functions for:
  Unification
  Matching
  Substitution

ATHENA Logical Language

Special deductive forms: dcheck, dseq, assume, …
Primitive deduction methods: mp, both, left-and, …
Declarations: structure, declare, …
Directives: load-file, clear-assumption-base, …
Calls to external automatic resolution theorem provers such as SPASS and Vampire

ATHENA Advantages

Better proof readability
Machine-checkable proofs
Makes it possible to formulate and write proofs as methods
Good for writing generic proofs: write the proof once and instantiate it to prove specific cases

But:

ATHENA: No built-in rewriting methods

We added the following methods to be able to use equational rewriting:
  (setup c t): initializes c with t
  (reduce c u E): attempts to transform the term t in c to be identical with the given term u by using theorem E as a left-to-right rewriting rule
  (expand c u E): attempts to transform the term t in c to be identical with the given term u by using theorem E as a right-to-left rewriting rule
  (combine left right): deduces (= t u) if left contains (= t t'), right contains (= u u'), and t' and u' are identical terms

CCT - Tools

Small trusted computing base: TCGEN + CCGEN + CODEGEN ≈ 1000 lines
Tested with hundreds of axioms/theorems and more than 10,000 lines of proofs

Termination of a function

Termination is undecidable in general
But it can be solved in special cases
Does a measure of the arguments decrease in the ordering with each recursive call of the function?
This requires an ordering relation to be defined every time

TCGEN: Termination of a function

Our approach is similar but does not use an ordering relation
We construct the proof of termination as a proof by induction that mirrors the recursion structure in the axioms
We generate a termination axiom for each axiom
Construct a termination condition
Prove the termination condition using the termination axioms

Function-defining Axioms:

(forall ?x (= (power ?x zero) one))

(forall ?x ?n (= (power ?x (succ ?n))
                 (Times ?x (power ?x ?n))))

Termination Axioms:

(forall ?x (power_t ?x zero))

(forall ?x ?n (if (and (power_t ?x ?n)
                       (Times_t ?x (power ?x ?n)))
                  (power_t ?x (succ ?n))))

Steps:
  Rename power to power_t
  Check the right-hand sides: if the rhs is a constant, eliminate it; if there are nested function applications in the rhs, conjoin them
  Construct an implication from the new lhs and rhs: "if rhs lhs"
  Eliminate the applications of known total functions
  Assert these and prove the termination condition

Note: one is a constant

Termination Axioms:

(forall ?x (power_t ?x zero))

(forall ?x ?n (if (power_t ?x ?n)
                  (power_t ?x (succ ?n))))

Termination Condition:

(forall ?x ?n (power_t ?x ?n))

Note: Times is total, so the (Times_t ...) hypothesis is eliminated
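The TCGEN steps can be sketched in Python on the power example. This is an illustrative model only, not the actual TCGEN: terms are nested tuples, quantifiers are left out, and the names `CONSTRUCTORS`, `applications`, and `termination_axiom` are assumptions introduced for the sketch.

```python
CONSTRUCTORS = {"succ"}  # assumption: constructor applications need no _t hypothesis

def show(term):
    """Render a nested-tuple term in the deck's prefix notation."""
    if isinstance(term, str):
        return term
    return "(" + " ".join(show(t) for t in term) + ")"

def applications(term):
    """Yield every non-constructor function application in term, innermost first."""
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from applications(arg)
        if term[0] not in CONSTRUCTORS:
            yield term

def termination_axiom(lhs, rhs, known_total=frozenset()):
    """Build the termination axiom for the equation lhs = rhs."""
    rename = lambda t: (t[0] + "_t",) + t[1:]
    # A constant or variable rhs contributes no hypotheses at all.
    hyps = [rename(t) for t in applications(rhs) if t[0] not in known_total]
    conclusion = rename(lhs)
    if not hyps:
        return conclusion
    body = hyps[0] if len(hyps) == 1 else ("and",) + tuple(hyps)
    return ("if", body, conclusion)

base = termination_axiom(("power", "?x", "zero"), "one")
step_lhs = ("power", "?x", ("succ", "?n"))
step_rhs = ("Times", "?x", ("power", "?x", "?n"))
raw_step = termination_axiom(step_lhs, step_rhs)
simplified_step = termination_axiom(step_lhs, step_rhs, known_total={"Times"})

print(show(base))             # (power_t ?x zero)
print(show(raw_step))
print(show(simplified_step))  # (if (power_t ?x ?n) (power_t ?x (succ ?n)))
```

Passing `known_total={"Times"}` corresponds to the step that eliminates applications of known total functions; without it the sketch produces the longer axiom with the Times_t conjunct.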

CCGEN: Consistency of axioms

Input is the function-defining axioms
Output is a predicate (the consistency condition)
It states that it is possible to define a function that satisfies the axioms: for every tuple of values of the function domain, there exists a range value y


Function-defining Axioms:

(forall ?y (= (f ?y zero) one))

(forall ?x ?y (if (not (= ?y zero)) (= (f ?x ?y) two)))

After renaming variables and adding conditions:

(forall ?x ?w (if (= ?w zero) (= (f ?x ?w) one)))

(forall ?x ?w (if (not (= ?w zero)) (= (f ?x ?w) two)))

Consistency Condition is:

(forall ?x ?w (exists ?y (and (if (= ?w zero) (= ?y one))
                              (if (not (= ?w zero)) (= ?y two)))))

Steps:
  Rename ?y to ?w
  Add or update conditions
  Replace (f ?x ?w) with ?y, conjoin the propositions, and add "exists ?y"
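The last CCGEN step can be sketched in Python, starting from clauses that have already been renamed and given conditions. This is a hedged illustration rather than the real CCGEN: the nested-tuple term encoding and the helper names `substitute` and `consistency_condition` are assumptions of the sketch.

```python
def show(term):
    """Render a nested-tuple term in the deck's prefix notation."""
    if isinstance(term, str):
        return term
    return "(" + " ".join(show(t) for t in term) + ")"

def substitute(term, old, new):
    """Replace every occurrence of the subterm old in term by new."""
    if term == old:
        return new
    if isinstance(term, tuple):
        return tuple(substitute(t, old, new) for t in term)
    return term

def consistency_condition(variables, call, clauses):
    """Replace the call by ?y, conjoin the clauses, and quantify."""
    bodies = tuple(substitute(c, call, "?y") for c in clauses)
    conj = bodies[0] if len(bodies) == 1 else ("and",) + bodies
    return ("forall",) + tuple(variables) + (("exists", "?y", conj),)

call = ("f", "?x", "?w")
clauses = [
    ("if", ("=", "?w", "zero"), ("=", call, "one")),
    ("if", ("not", ("=", "?w", "zero")), ("=", call, "two")),
]
cc = consistency_condition(["?x", "?w"], call, clauses)
print(show(cc))
```

For the two clauses of f this prints the consistency condition shown above.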

Proving Correctness

Application-specific Requirements (from the consumer)

(define sum-list@-empty
  (= (sum-list@ Nil) zero))

(define sum-list@-nonempty
  (forall ?L ?x
    (= (sum-list@ (Cons ?x ?L)) (Plus ?x (sum-list@ ?L)))))

Correctness Condition:

(define sum-list-correctness
  (forall ?L (= (sum-list ?L) (sum-list@ ?L))))

Function-defining Axioms (Producer)

(define sum-list-empty
  (= (sum-list Nil) zero))

(define sum-list-nonempty
  (forall ?L ?x
    (= (sum-list (Cons ?x ?L)) (sum-list-compute ?L ?x))))

(define sum-list-compute-empty
  (forall ?x (= (sum-list-compute Nil ?x) ?x)))

(define sum-list-compute-nonempty
  (forall ?L ?x ?y
    (= (sum-list-compute (Cons ?y ?L) ?x)
       (sum-list-compute ?L (Plus ?x ?y)))))

Correctness Proof (Producer)

(by-induction sum-list-correctness
  (Nil
    (dseq
      (!setup left (sum-list Nil))
      (!setup right (sum-list@ Nil))
      (!reduce left zero sum-list-empty)
      (!reduce right zero sum-list@-empty)
      (!combine left right)))
  ((Cons x L)
    (dseq
      (!setup left (sum-list (Cons x L)))
      (!setup right (sum-list@ (Cons x L)))
      (!reduce left (sum-list-compute L x) sum-list-nonempty)
      (!reduce right (sum-list-compute L x) sum-list-compute-relation)
      (!combine left right))))

Note: Executable but inefficient code can be extracted from the consumer's requirement axioms (sum-list@).

Note: The producer defines an efficient function (sum-list, by way of sum-list-compute).
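To make the relationship between the two definitions concrete, here is a Python transcription of the consumer's sum-list@ axioms and the producer's sum-list axioms, with the correctness condition checked on a few sample lists. The function names and the use of Python lists and integers for Nil, Cons, Plus, and zero are assumptions of this sketch, and testing samples is of course no substitute for the inductive proof.

```python
def sum_list_spec(xs):
    """Consumer's specification sum-list@: not tail-recursive."""
    if not xs:
        return 0
    return xs[0] + sum_list_spec(xs[1:])

def sum_list_compute(xs, x):
    """Producer's helper: x accumulates the running sum (tail call)."""
    if not xs:
        return x
    return sum_list_compute(xs[1:], x + xs[0])

def sum_list(xs):
    """Producer's sum-list, defined through sum-list-compute."""
    if not xs:
        return 0
    return sum_list_compute(xs[1:], xs[0])

samples = [[], [7], [1, 2, 3], [5, -5, 10, 20]]
# Correctness condition: (= (sum-list L) (sum-list@ L)) for every sample L.
agrees = all(sum_list(xs) == sum_list_spec(xs) for xs in samples)
```

The accumulator version makes its recursive call in tail position, which is what lets a target language with last call optimization run it in constant stack space.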

Application-specific Requirements (from the consumer)

(define reverse-range-Correctness
  (forall ?i ?j
    (if (valid (range ?i ?j))
        (forall ?M
          (= (access-range (reverse-range ?M (range ?i ?j)) (range ?i ?j))
             (reverse (access-range ?M (range ?i ?j))))))))

Function-defining Axioms (Producer)

(define reverse-empty-range-axiom
  (forall ?i ?M
    (= (reverse-range ?M (range ?i ?i)) ?M)))

(define reverse-nonempty-range-axiom1
  (forall ?i ?j ?M
    (if (and (not (= ?i ?j)) (= (++ ?i) ?j))
        (= (reverse-range ?M (range ?i ?j)) ?M))))

(define reverse-nonempty-range-axiom2
  (forall ?i ?j ?M
    (if (and (valid (range ?i ?j))
             (and (not (= ?i ?j)) (not (= (++ ?i) ?j))))
        (= (reverse-range ?M (range ?i ?j))
           (reverse-range (swap ?M (* ?i) (* (-- ?j)))
                          (range (++ ?i) (-- ?j)))))))

Note: The specification is not executable. The correctness condition itself is a specification.

Note: The proof is by range induction.
  Basis cases:
    Empty range: (range i i)
    Range of one element: (range i (++ i))
  Induction step:
    Assume the property for (range (++ i) (-- j))
    Show that it is true for (range i j)
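The three reverse-range axioms and the correctness condition can be mirrored in a small Python sketch. It rests on simplifying assumptions that are not in the slides: iterators are integer indices, (range i j) is the half-open interval [i, j), and memory is a Python list that is copied on each swap rather than updated in place.

```python
def reverse_range(mem, i, j):
    """Mirror the three function-defining axioms on the index range [i, j)."""
    if i == j:            # empty range: memory is unchanged
        return mem
    if i + 1 == j:        # range of one element: memory is unchanged
        return mem
    swapped = list(mem)   # swap the elements at i and at (-- j)
    swapped[i], swapped[j - 1] = swapped[j - 1], swapped[i]
    return reverse_range(swapped, i + 1, j - 1)

def satisfies_correctness(mem, i, j):
    """Correctness condition: the accessed range of the result is reversed."""
    out = reverse_range(mem, i, j)
    return out[i:j] == list(reversed(mem[i:j]))

mem = [1, 2, 3, 4, 5]
# Check every valid range of this sample memory.
all_ok = all(satisfies_correctness(mem, i, j)
             for i in range(len(mem) + 1)
             for j in range(i, len(mem) + 1))
middle = reverse_range(mem, 1, 4)
```

The recursion follows the shape of the range induction: two base cases, and a step from (range (++ i) (-- j)) to (range i j).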

CODEGEN: Code Extraction

Quantified equations and conditional equations
These are clauses of a recursive function definition
CODEGEN has to be able to combine these into a recursive function
Target language is currently Oz
Oz has pattern matching
Possible to extract efficient code: Oz has "last call optimization" and executes tail-recursive functions in constant stack size

Can extract both:
  Memory-observing functions (examine data structures but don't make any changes): access, access-range, sum-list, find, find-if, power
  Memory-updating functions (make in-place changes): assign, assign-range, swap, reverse-range, rotate, copy
Does optimizations when necessary

Code Extraction (Consumer)

fun {SumList L}
   case L
   of nil then 0
   [] X|L then {SumListCompute L X}
   end
end

fun {SumListCompute L X}
   case L
   of nil then X
   [] Y|L then {SumListCompute L (X + Y)}
   end
end

Note: There are two variables, but "case [L X]" has been optimized to "case L" by CODEGEN


Code needs to be optimized

fun {ReverseRange M R}
   case R
   of range(I I) then M
   [] range(I J) then
      if {And {Not (I == J)} {Not ({`++` I} == J)}} then
         {ReverseRange {Swap M {`*` I} {`*` {`--` J}}} range({`++` I} {`--` J})}
      elseif {And {Not (I == J)} ({`++` I} == J)} then M
      end
   end
end

Note: CODEGEN optimizes it

Optimized Code

fun {ReverseRange M R}
   case R
   of range(I I) then M
   [] range(I J) then
      if {Not (I == J)} then
         if ({`++` I} == J) then M
         else
            {ReverseRange {Swap M {`*` I} {`*` {`--` J}}} range({`++` I} {`--` J})}
         end
      end
   end
end

We have been working on simple functions. But:
In analogy to STL, it is useful to have a library of simple functions from which more complex functions can be composed, especially if the functions are generic
It is possible for CODEGEN to extract complex functions composed of such simple functions


Generic Proof Writing

Proofs are very large
Generic proofs might be a solution
No need to develop and transmit similar proofs to the consumer
It is harder to write generic proofs, but once the consumer has the generic proofs, he can instantiate them in many different ways
Athena is a higher-order language: we can express generic functions and proofs

Generic Proof Writing

Generic property definitions and proofs are constructed in the form of programs that are parameterized with operator mappings
Generic theorem: a generic property that contains a single property, for which there is an associated generic proof
Provide functions which perform operator mappings
Instantiate the generic proof with a particular operator mapping later

Generic Property Definitions in CCT

(define (sum-list@-definition name ops)
  (let ((Plus (ops 'Plus))
        (Zero (ops 'Zero)))
    (match name
      ('sum-list@-empty
        (= (sum-list@ Nil) Zero))
      ('sum-list@-nonempty
        (forall ?L ?y
          (= (sum-list@ (Cons ?y ?L)) (Plus ?y (sum-list@ ?L))))))))

Parts of the definition: name and parameter list (the define line), local declarations (the let bindings), and the generic axioms or theorems (the match clauses).

(define (sum-list-compute-relation name ops)
  (match name
    ('sum-list-compute-relation
      (forall ?L ?x
        (= (sum-list@ (Cons ?x ?L)) (sum-list-compute ?L ?x))))))

Parts of the definition: name and parameter list, and the axiom or theorems.

A Generic Proof Method in CCT

(define (sum-list-correctness-proof name ops)
  (dlet ((Zero (ops 'Zero))
         (left (cell true))
         (right (cell true))
         (prop (method (name) (!property name ops Sum-list-theory)))
         (theorem (sum-list-correctness name ops)))
    (by-induction theorem
      (Nil
        (dseq
          (!setup left (sum-list Nil))
          (!setup right (sum-list@ Nil))
          (!reduce left Zero (!prop 'sum-list-empty))
          (!reduce right Zero (!prop 'sum-list@-empty))
          (!combine left right)))
      ((Cons x L)
        (dseq
          (!setup left (sum-list (Cons x L)))
          (!setup right (sum-list@ (Cons x L)))
          (!reduce left (sum-list-compute L x) (!prop 'sum-list-nonempty))
          (!reduce right (sum-list-compute L x) (!prop 'sum-list-compute-relation))
          (!combine left right))))))

Parts of the method: name and parameter list (the define line), local declarations (the dlet bindings), and the generic proof (the by-induction form).

Instantiation of a Generic Axiom

Operator Mapping:

(define (Monoid-ops op)
  (match op
    ('Plus Plus)
    ('Zero zero)))

Instantiated Axioms:

(= (sum-list@ Nil) zero)

(forall ?L ?y
  (= (sum-list@ (Cons ?y ?L)) (Plus ?y (sum-list@ ?L))))

Operator Mapping:

(define (Times-ops op)
  (match op
    ('Plus Times)
    ('Zero one)))

Instantiated Axioms:

(= (sum-list@ Nil) one)

(forall ?L ?y
  (= (sum-list@ (Cons ?y ?L)) (Times ?y (sum-list@ ?L))))

Operator Mapping:

(define (Monoid-ops op)
  (match op
    ('Plus Append)
    ('Zero Nil)))

Instantiated Axioms:

(= (sum-list@ Nil) Nil)

(forall ?L ?y
  (= (sum-list@ (Cons ?y ?L)) (Append ?y (sum-list@ ?L))))
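The instantiation mechanism can be imitated in Python with ordinary functions: a generic property is a function of a property name and an operator mapping, and each mapping is itself a function from abstract operator names to concrete ones. This is only a sketch of the idea; the Python names are made up for the illustration, and `append_ops` in particular is a hypothetical name for the third mapping, which the slides also call Monoid-ops.

```python
def sum_list_at_definition(name, ops):
    """Generic axioms for sum-list@, parameterized by an operator mapping."""
    plus, zero = ops("Plus"), ops("Zero")       # local declarations
    properties = {
        "sum-list@-empty":
            f"(= (sum-list@ Nil) {zero})",
        "sum-list@-nonempty":
            f"(forall ?L ?y (= (sum-list@ (Cons ?y ?L)) "
            f"({plus} ?y (sum-list@ ?L))))",
    }
    return properties[name]

# Operator mappings: abstract operator name -> concrete operator name.
def monoid_ops(op):
    return {"Plus": "Plus", "Zero": "zero"}[op]

def times_ops(op):
    return {"Plus": "Times", "Zero": "one"}[op]

def append_ops(op):  # hypothetical name, see the note above
    return {"Plus": "Append", "Zero": "Nil"}[op]

for mapping in (monoid_ops, times_ops, append_ops):
    print(sum_list_at_definition("sum-list@-empty", mapping))
    print(sum_list_at_definition("sum-list@-nonempty", mapping))
```

Each pass through the loop prints one pair of instantiated axioms, so the same generic definition yields the sum, product, and append instances.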

Conclusions

CCT provides strong assurance for correctness
Only very small examples so far, but a basis for tackling larger examples
Readable proofs
Generic proof writing
Tools for organizing theorems and proofs

Future Work

Test CCT with more examples, including larger and more complex ones
Complete the extension of CODEGEN to check preconditions where necessary
Use CCT to prove safety properties
A really longer-term goal: Verifying Compiler, Tony Hoare's grand challenge problem

Organizing Theorems and Proofs

We have a few hundred axioms, theorems, and proofs

Prove some lemmas and use them in the proofs of other theorems

Main idea: Group the related properties under the same theories

Searches for a stored theorem are faster

Organizing Theorems and Proofs

We define a structured theory as an abstract data type with the following functions:
  theory: creates a structured theory from a generic property function containing axioms
  evolve: extends an existing structured theory with a new generic theorem and its proof
  refine: creates a new structured theory as a composition of one or more existing structured theories and a generic property function
  property: retrieves an instance of a generic property function, and its corresponding proof
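One way to picture this abstract data type is the following Python sketch. It is a guess at a plausible shape, not the Athena implementation: the class layout, the convention that a generic property function returns None for names it does not define, and the name `get_property` (used because `property` is a Python builtin) are all assumptions.

```python
class StructuredTheory:
    def __init__(self, axioms, parents=()):
        self.axioms = axioms          # generic property function: (name, ops) -> prop
        self.parents = tuple(parents) # theories this one is composed from
        self.theorems = []            # pairs (generic theorem, generic proof)

def theory(axioms):
    return StructuredTheory(axioms)

def evolve(th, theorem, proof):
    th.theorems.append((theorem, proof))
    return th

def refine(parents, axioms):
    return StructuredTheory(axioms, parents)

def get_property(name, ops, th):
    """Return (instantiated property, proof or None), or None if not found."""
    prop = th.axioms(name, ops)
    if prop is not None:
        return prop, None             # axioms carry no proof
    for theorem, proof in th.theorems:
        prop = theorem(name, ops)
        if prop is not None:
            return prop, proof(name, ops)
    for parent in th.parents:         # fall back to the composed theories
        found = get_property(name, ops, parent)
        if found is not None:
            return found
    return None

# Illustrative use with properties from the sum-list example.
def sum_list_axioms(name, ops):
    table = {"sum-list-empty": f"(= (sum-list Nil) {ops('Zero')})"}
    return table.get(name)

def sum_list_correctness(name, ops):
    if name == "sum-list-correctness":
        return "(forall ?L (= (sum-list ?L) (sum-list@ ?L)))"
    return None

def sum_list_correctness_proof(name, ops):
    return "<proof of sum-list-correctness>"   # placeholder for a real proof

lists = theory(lambda name, ops: None)         # stand-in parent theory
sum_list_theory = evolve(refine([lists], sum_list_axioms),
                         sum_list_correctness, sum_list_correctness_proof)
ops = lambda op: {"Plus": "Plus", "Zero": "zero"}[op]
```

Looking a property up by name within one theory, instead of scanning every stored theorem, is in the spirit of the grouping idea: related properties live together and searches stay local.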

[Figure: theory hierarchy]
  Iterator Theory: ++, --, *, I-, I+, I-I
  Range Theory: valid, range
  Memory Theory: Access, Assign, Swap
  Memory Range Theory: Access-range, Assign-range
  Naturals: zero, succ
  Lists: Nil, Cons

Legend:
  ++   preincrement
  --   predecrement
  I-   iterator subtraction
  I+   iterator addition
  I-I  iterator difference
