Abstractions, instantiations, and proofs of marking algorithms


Lawrence Yelowitz Department of Computer Science University of Pittsburgh Pittsburgh, PA 15260

Arthur G. Duncan Department of Mathematical Sciences I. U. P. U. I. Indianapolis, IN 46205

Abstract A detailed look is taken at the problem of factoring program proofs into a proof of the underlying algorithm, followed by a proof of correct implementation of abstract variables at the concrete level. We do this by considering four different concrete "marking" algorithms and formulating a single abstract algorithm and set of abstract specifications that can be instantiated to each of the four concrete cases. An intermediate assertion, as well as sufficient conditions for correct initialization, invariance, and correctness at termination, are given at the abstract level. Proofs at the concrete level are then given by exhibiting appropriate mapping functions (from the concrete state vector to the abstract variables), and showing that the sufficient conditions are true. Proofs of termination are given by instantiating "termination schemas".

1. INTRODUCTION

The notion of proving correctness of a program by first showing correctness of the underlying algorithm (which may manipulate high level variables or data structures), and then showing correctness of the implementation (but without reproving the underlying logic) has long intrigued researchers [e.g. Ye 72] and has played a part in modern programming language design [e.g. Wu Lo Sh 76]. In this paper we explore and illustrate this idea in detail, by considering four different concrete programs that perform the same task [Kn 73, Section 2.3.5, Alg. A, B, C, E]. We develop a single abstract algorithm which operates on high level variables, and which performs the essential function of the four concrete programs. We give an abstract intermediate assertion, and sufficient conditions for initialization, correctness at termination, and invariance of the intermediate assertion. Proving correctness of the concrete programs then can focus entirely on showing that certain sufficient conditions based on a mapping from concrete-to-abstract are satisfied. We also give three termination schemas which can be instantiated to prove that all four concrete programs terminate.

The concrete programs are designed to "mark" all memory cells in a certain starting set S0, or in the transitive closure of a relation R0 applied to S0. Such programs are useful in the "garbage collection" aspect of memory management systems.

Previous researchers [Mo 72, To 74] have proved the correctness at the concrete level of one of the four algorithms (algorithm E [Sc Wa 67]), and [Su 76] has given a mechanical proof of this algorithm. These results provide valuable insights into the nature of proving marking algorithms, particularly since pointer variables are manipulated. We hope that by making a clear distinction between the underlying mathematical aspects of marking and the concrete representation of abstract variables, and by providing an abstract algorithm that can be instantiated in several different ways, we are contributing to the understanding of marking algorithms in particular and program development and refinement in general. This research might also be viewed as a further exploration into control structure abstraction [Ge Ye 76].

Below we use the control structures if p then f fi, if p then f else g fi, while p do f od, loop f until p od. Begin .. end brackets are used as a notational convenience in separating a program fragment from surrounding text. The statement S, "select x∈X such that P(x)", is taken to have the following semantics (using Dijkstra's weakest precondition formula [Di 76]):

wp(S,R) = (∀x)(x∈X and P(x) ⇒ R) and (∃y∈X)P(y).

Thus an attempt to "select" a nonexistent x is equivalent to Dijkstra's "abort" statement.

2. SPECIFICATIONS AND DEVELOPMENT OF AN ABSTRACT MARKING ALGORITHM

2.1 Preliminaries

If W is a set and R is a relation on W, that is, R ⊆ W×W, then:

<x,y>∈R is written (x,R,y);

R(x) ≜ {y | (x,R,y)};

if W1 ⊆ W, then R(W1) ≜ ∪ R(x), the union being taken over all x∈W1;

dom(R) and ran(R) denote the domain and range of R;

R*(W1) ≜ W1 ∪ R*(R(W1)).
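The definitions above can be made executable. The following Python sketch is an illustration only and is not part of the original development; the names `image` and `closure` are hypothetical, a relation is assumed to be stored as a set of pairs, and R*(W1) is computed as the least set satisfying the recursive equation.

```python
from typing import Hashable, Iterable, Set, Tuple

Node = Hashable
Relation = Set[Tuple[Node, Node]]


def image(rel: Relation, nodes: Iterable[Node]) -> Set[Node]:
    """R(W1): every y such that (x, R, y) for some x in W1."""
    sources = set(nodes)
    return {y for (x, y) in rel if x in sources}


def closure(rel: Relation, start: Iterable[Node]) -> Set[Node]:
    """R*(W1): the least set containing W1 and closed under R."""
    reached = set(start)
    frontier = set(start)
    while frontier:
        # Only nodes not seen before need to be expanded again.
        frontier = image(rel, frontier) - reached
        reached |= frontier
    return reached


R = {(1, 2), (2, 3), (3, 1), (4, 5)}
print(sorted(image(R, {1, 4})))   # [2, 5]
print(sorted(closure(R, {1})))    # [1, 2, 3]
```

The loop stops as soon as no new node is reached, which is why a finite set W is assumed.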

2.2 Specifications for Marking Algorithms

Input: 1. N (a finite static set; intuitively, the set of all memory cells to be considered for marking).

2. S0 ⊆ N (a static set corresponding to all "immediately accessible" cells).

3. R0 ⊆ N×N (a static relation corresponding to "direct reachability").

Output: Marked = R0*(S0).

2.3 Additional Abstract Program Variables

In addition to the constants N, S0, R0, and the output variable, Marked, two auxiliary program variables, S (a dynamic subset of N), and R (a dynamic subset of R0), are used. Intuitively, S contains elements which have been marked but which have descendants perhaps not yet marked, and R is the means of reaching these descendants. Let the abstract state vector <N, S0, R0, Marked, S, R> be denoted by abst.

2.4 Abstract Intermediate Assertion

The output specification "Marked = R0*(S0)" may be rewritten "Marked ⊆ R0*(S0) ⊆ Marked", and in the style of Dijkstra [Di 76] generalized to:

BI1: S ⊆ Marked

BI2: Marked ⊆ R0*(S0)

BI3: R0*(S0) ⊆ Marked ∪ R*(S)

BI4: R ⊆ R0.

Let "BI(abst)" denote the conjunction of these four assertions ("BI" stands for "basic invariant").
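The four conjuncts can be checked mechanically on small examples. The Python sketch below is an illustration under stated assumptions: sets are Python sets, relations are sets of pairs, and the function names (`image`, `closure`, `basic_invariant`) are hypothetical rather than taken from the paper.

```python
def image(rel, nodes):
    """R(W1) for a relation stored as a set of pairs."""
    src = set(nodes)
    return {y for (x, y) in rel if x in src}


def closure(rel, start):
    """R*(W1): least set containing W1 and closed under R."""
    reached, frontier = set(start), set(start)
    while frontier:
        frontier = image(rel, frontier) - reached
        reached |= frontier
    return reached


def basic_invariant(S0, R0, marked, S, R):
    """Return True iff BI1 through BI4 all hold for the given abstract state."""
    target = closure(R0, S0)
    bi1 = S <= marked
    bi2 = marked <= target
    bi3 = target <= marked | closure(R, S)
    bi4 = R <= R0
    return bi1 and bi2 and bi3 and bi4


S0 = {1}
R0 = {(1, 2), (2, 3)}
# The state (S, Marked, R) = (S0, S0, R0) satisfies the invariant.
print(basic_invariant(S0, R0, marked={1}, S={1}, R=set(R0)))     # True
# Node 3 is unmarked and nothing left in S can reach it: BI3 fails.
print(basic_invariant(S0, R0, marked={1, 2}, S=set(), R=set(R0)))  # False
```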

2.5 Initialization

Lemma 2.5-1: true ⇒ wp("(S,Marked,R) := (S0,S0,R0)", BI(abst)).

Proof: S0 ⊆ S0, proving BI1(abst). BI2, BI3, and BI4 are equally trivial.

2.6 Correctness at Termination (Partial Correctness)

Lemma 2.6-1: BI(abst) and R*(S) ⊆ Marked ⇒ Marked = R0*(S0).

Corollary 2.6-1: BI(abst) and S={ } ⇒ Marked = R0*(S0).

Corollary 2.6-2: BI(abst) and R(S)={ } ⇒ Marked = R0*(S0).

2.7 Maintaining the Invariance of BI(abst)

Consider the concurrent assignment (S, Marked, R) := (S', Marked', R'), and let abst' denote the updated state vector. Suppose there exist sets A, B, C which are subsets of N, and a set D ⊆ N×N with the properties that

(i) S' = S ∪ A - B

(ii) Marked' = Marked ∪ C

(iii) R' = R - D

Theorem 2.7-1: Letting A, B, C, D and abst' be as described, and letting D2 = ran(D), suppose the following six properties are satisfied:

1. A ⊆ Marked ∪ C

2. C ⊆ R0*(S0)

3. B ⊆ Marked ∪ C

4. R(B) ⊆ S ∪ A - B

5. D2 ⊆ Marked ∪ C ∪ S ∪ A - B

6. R(D2) ⊆ S ∪ A - B ∪ (R-D)(S ∪ A - B).

Then BI(abst) ⇒ BI(abst').

Proof: Assume BI1(abst), i.e., S ⊆ Marked. To prove BI1(abst'), i.e., to prove S ∪ A - B ⊆ Marked ∪ C, it is clearly sufficient to prove A ⊆ Marked ∪ C, which is the first condition listed in theorem 2.7-1. Similarly, the second condition listed in theorem 2.7-1 is sufficient to prove the invariance of BI2. The invariance of BI4, i.e., the proof that R ⊆ R0 ⇒ R-D ⊆ R0, is immediate.

Proof of the invariance of BI3 is now given. It must be shown that R0*(S0) ⊆ Marked ∪ C ∪ (R-D)*(S ∪ A - B). Based on BI3(abst) it is sufficient to prove that Marked ∪ R*(S) ⊆ Marked ∪ C ∪ (R-D)*(S ∪ A - B). In proving this latter condition, it suffices to show that

R*(S) ⊆ Marked ∪ C ∪ (R-D)*(S ∪ A - B).

Let x∈R*(S). We will show that x ∈ Marked ∪ C ∪ (R-D)*(S ∪ A - B).

Case 1: x∈S ∪ A - B. Then x∈(R-D)*(S ∪ A - B) "immediately".

Case 2: x∈S and x∉S ∪ A - B. Then x∈B, and by condition 3 in theorem 2.7-1, x∈Marked ∪ C.

Case 3: x∉S and x∉S ∪ A - B. Since x∈R*(S), there is a sequence of elements <x1,...,xp> in which p>1, x1∈S, (xi,xi+1)∈R for i=1,...,p-1, and xp=x. We will show that there exists a sequence <y1,...,yq> in which q≥1, y1∈S ∪ A - B, (yi,yi+1)∈R-D for i=1,...,q-1, and yq=x, which proves that x∈(R-D)*(S ∪ A - B), as required. If <x1,...,xp> has all the requisite properties of <y1,...,yq>, we are done, so assume that not all the requisite properties hold.

Subcase 3.1: Some pair (xi,xi+1)∉R-D, i.e., (xi,xi+1)∈D. Let i be the largest index in the range 1 to p-1 for which (xi,xi+1)∈D. By condition 5 in theorem 2.7-1, xi+1∈Marked ∪ C ∪ S ∪ A - B, and by condition 6, xj∈(R-D)*(S ∪ A - B) for i+1<j≤p. Thus xp∈Marked ∪ C ∪ (R-D)*(S ∪ A - B).

Subcase 3.2: Each pair (xi,xi+1)∈R-D, 1≤i≤p-1, but x1∉S ∪ A - B, i.e., x1∈B. By condition 4, x2∈S ∪ A - B, and thus the sequence <x2,...,xp> satisfies the required properties of <y1,...,yq>. QED.

Theorem 2.7-2: Suppose

1. BI(abst)
2. (∀y∈S) y∉R(y)
3. x∈S
4. A = R(x) ∪ A' where A' ⊆ Marked
5. B ⊆ {x}
6. C = R(x)
7. D2 = R(x)
8. D = {<z,w>∈R | w∈D2}

Then BI(abst').

Proof: We will show that the six conditions in theorem 2.7-1 are implied by the hypotheses of theorem 2.7-2.

1. Show A ⊆ Marked ∪ C, i.e., show R(x) ∪ A' ⊆ Marked ∪ R(x). Trivial since A' ⊆ Marked.

2. Show C ⊆ R0*(S0). By hypothesis 3 x∈S, by BI1(abst) x∈Marked and by BI2, x∈R0*(S0). By BI4, R(x) ⊆ R0(x). Since R0(x) ⊆ R0*(S0) by the nature of the closure operator "*", the desired result follows.

3. Show B ⊆ Marked ∪ C. Trivial, since x∈Marked as shown in the previous step.

4. Show R(B) ⊆ S ∪ A - B. If B={ } this is trivial, so assume B={x}. By hypothesis 2, x∉R(x), so R(x) ⊆ S ∪ R(x) ∪ A' - {x}.

5. Show R(x) ⊆ Marked ∪ C ∪ S ∪ A - B. Proof is as in the previous step.

6. Show R(D2) ⊆ S ∪ A - B ∪ (R-D)(S ∪ A - B). Let z∈R(D2), i.e., there exists a y such that (x,y)∈R and (y,z)∈R. Note that y∈S ∪ A - B. If z∈R(x) then z∈S ∪ A - B. Otherwise (y,z)∈R-D and thus z∈(R-D)(S ∪ A - B). QED.

Corollary 2.7-1: Suppose

1. BI(abst)
2. (∀y∈S) y∉R(y)
3. x∈S
4. A = A' ∪ A'' where A' ⊆ R(x) and A'' ⊆ Marked
5. B = { }
6. C = A'
7. D2 = A'
8. D = {<z,w>∈R | w∈D2}

Then BI(abst').

Corollary 2.7-2: Suppose

1. BI(abst)
2. x∈S
3. R(x) = { }
4. A = { }
5. B = {x}
6. C = { }
7. D = { }

Then BI(abst').
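To make the role of the sets A, B, C, D concrete, the following Python sketch runs an abstract marking loop that uses the choices of theorem 2.7-2 (with A' = { } and B = {x}) and checks the six conditions of theorem 2.7-1 at every step. It is an illustrative addition, not the authors' program: the function names are hypothetical, "S ∪ A - B" is read as (S ∪ A) - B, and R0 is assumed to contain no arc from a node to itself.

```python
def image(rel, nodes):
    src = set(nodes)
    return {y for (x, y) in rel if x in src}


def closure(rel, start):
    reached, frontier = set(start), set(start)
    while frontier:
        frontier = image(rel, frontier) - reached
        reached |= frontier
    return reached


def six_conditions(S0, R0, marked, S, R, A, B, C, D):
    """Conditions 1-6 of theorem 2.7-1, evaluated on the state before the update."""
    D2 = {w for (_, w) in D}                  # ran(D)
    new_S, new_R = (S | A) - B, R - D
    return [
        A <= marked | C,
        C <= closure(R0, S0),
        B <= marked | C,
        image(R, B) <= new_S,
        D2 <= marked | C | new_S,
        image(R, D2) <= new_S | image(new_R, new_S),
    ]


def mark(S0, R0):
    """Abstract marking loop; each step instantiates theorem 2.7-2."""
    S, marked, R = set(S0), set(S0), set(R0)  # initialization of lemma 2.5-1
    while S:
        x = min(S)                            # any element of S would do
        A = C = image(R, {x})
        B = {x}
        D = {(z, w) for (z, w) in R if w in C}
        if not all(six_conditions(S0, R0, marked, S, R, A, B, C, D)):
            raise ValueError("a sufficient condition of theorem 2.7-1 failed")
        S, marked, R = (S | A) - B, marked | C, R - D
    return marked                             # S = {} here (corollary 2.6-1)


R0 = {(1, 2), (1, 3), (2, 3), (3, 4), (5, 4)}
print(sorted(mark({1}, R0)))                  # [1, 2, 3, 4]
```

Node 5 stays unmarked because it is not reachable from S0 = {1}, even though one of its arcs is deleted from R along the way.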

3. TERMINATION SCHEMAS

Three "termination schemas" are presented which can be instantiated to prove termination of the four concrete algorithms given later in this paper.

Termination Schema 3-1

begin
  (x,y) := (x0,y0);
  while y<ymax do
    (x,y) := f(x,y)
  od
end

Suppose x, y take on only integer values, xmax and ymax are integer constants, and suppose also that for any "consecutive" state vectors (xi,yi), (xi+1,yi+1) at the while test, xi+1>xi or (xi+1=xi and yi+1>yi). Then the schema terminates. Proof: this is a simple variant of the well-founded sets technique of Floyd [Fl 67].
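The progress condition of schema 3-1 can be checked at run time. The Python sketch below is an illustrative addition with hypothetical names (`run_schema_3_1`, `toy_step`); the step budget is a safety guard that is not part of the schema, and the toy body keeps x below a fixed bound of 3, which plays the role of xmax.

```python
def run_schema_3_1(x0, y0, y_max, f, max_steps=10_000):
    """Run `while y < y_max: (x, y) = f(x, y)`, checking progress at each step."""
    x, y = x0, y0
    trace = [(x, y)]
    for _ in range(max_steps):            # guard: an assumption, not in the schema
        if not y < y_max:
            return trace
        nx, ny = f(x, y)
        # Consecutive states must satisfy: x grows, or x is unchanged and y grows.
        if not (nx > x or (nx == x and ny > y)):
            raise ValueError("progress condition violated")
        x, y = nx, ny
        trace.append((x, y))
    raise RuntimeError("step budget exhausted")


def toy_step(x, y):
    """x counts 'marks' (at most 3); y is a scan index that may be reset."""
    if x < 3 and y == 2:
        return (x + 1, 0)                 # x increases, y is allowed to drop
    return (x, y + 1)                     # x unchanged, y must increase


trace = run_schema_3_1(0, 0, 4, toy_step)
print(trace[-1], len(trace))              # (3, 4) 14
```

The toy body mirrors the later use of the schema for algorithm A, where x stands for the number of marked nodes and y for the scan index.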


Termination Schema 3-2

begin
  (x,y) := (x0,y0);
  while y>ymin do
    (x,y) := f(x,y)
  od
end

Suppose x, y assume only integer values, ymin and xmax are integer constants with x≤xmax, y≥ymin. Suppose that for consecutive state vectors (xi,yi), (xi+1,yi+1), xi+1>xi or (xi+1=xi and yi+1<yi). Then the schema terminates.


Termination Schema 3-3

begin
  loop
    while Q1 do f1 od
    loop
      if not Q2 then f2 fi
    until Q1 or Q2 od
  until not Q1 and Q2 od
end

Suppose there is an upper bound on the total number of times f1 and f2 can be executed. Then the schema terminates. Proof: it is routine to verify that we can never get into a situation in which the outer loop executes forever, but in which neither f1 nor f2 is executed in any given iteration of the outer loop.

4. FROM ABSTRACT TO CONCRETE

Let V=<v1,...,vn> be a concrete state vector. There are mapping functions taking a given value of V into the abstract sets R, S, and Marked, and taking the initial value V0 into R0 and S0. In addition there are mapping functions taking a given value of V into the sets A, B, C, D as described in section 2.7. Let us denote these functions by V-R, V-S, V-Marked, V-R0, V-S0, V-A, V-B, V-C, and V-D. Let V-abst denote the entire abstract state vector based on these mappings. There is a single set of functions comprising V-abst for any one algorithm, but there may be several V-A, V-B, V-C, and V-D functions (e.g. one set of these functions for each branch of an if then else). This is allowable since in section 2.7 one need only show the existence of the sets A, B, C, D for each operation.

For convenience and simplicity in this paper, we occasionally use abstract variables in concrete programs. For example, in algorithm B below, the abstract set S consists of those elements whose address is on a concrete stack. We use the abstract assignment statement "S:=S-{x}U R(x)" rather than the concrete assignments that perform the appropriate stack operations. Naturally a proof of the concrete algorithm must show that the concrete stack is manipulated correctly, but these routine details are omitted from this paper. Similar comments apply to the use of the abstract relation R rather than the appropriate concrete variables.

As a simple example, suppose we wanted to prove that the concrete program "V:=V0; while Q(V) do V:=f(V) od" is a totally correct marking program. The creative aspect of this proof would be to construct the mapping functions mentioned above. Once this has been accomplished, it should be fairly routine to use the appropriate theory. Thus we would show correct initialization by showing V0-S=V0-S0, V0-Marked=V0-S0, and V0-R=V0-R0 (lemma 2.5-1). Showing correctness at termination might use any of the three results in section 2.6, e.g., not Q(V) ⇒ V-S={ }.

5. CONCRETE ALGORITHM A

5.1 Concrete State Vector

N - total set of nodes considered for marking

Let |N| = M ≥ 1.

ADDR: N → {1,...,M} is a 1-1 onto mapping.

For K ∈ {1,...,M} let nK ≜ ADDR⁻¹(K) (i.e., nK = the node at address K).

ALINK: N → N - static partial function.

BLINK: N → N - static partial function.

MARK: N → {true, false} - total predicate.

Concrete state vector V=<N,ADDR,ALINK,BLINK,K,MARK>

5.2 Mapping Functions

V-S0 - an arbitrary subset of N, specified prior to running algorithm A.

V-R0 ≜ {<x,y> | x,y∈N and y∉S0 and y∈{ALINK(x), BLINK(x)}}.

V-R ≜ {<x,y> | x,y∈N and y∉Marked and y∈{ALINK(x), BLINK(x)}}.

V-Marked ≜ {x | x∈N and MARK(x)}.

V-S ≜ {x | x∈Marked and ADDR(x)≥K}.

5.3 Algorithm A

begin
  (K,Marked) := (1,S0);
  while K≤M do
    if nK∉Marked then
      K := K+1
    else
      (K,Marked) := (min(ADDR(R(nK)) ∪ {K+1}), Marked ∪ R(nK))
    fi
  od
end
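For readers who want to experiment, algorithm A can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the paper's text: nodes are identified with their addresses 1..M (so ADDR is the identity), ALINK and BLINK are dictionaries in which a missing key means "no link", and the name `mark_a` is hypothetical.

```python
def mark_a(M, alink, blink, S0):
    """Algorithm A: repeated address-ordered scans, with no stack."""
    marked = set(S0)
    K = 1
    while K <= M:
        if K not in marked:
            K += 1
        else:
            # R(nK): the unmarked nodes directly reachable from the node at K.
            succ = {y for y in (alink.get(K), blink.get(K))
                    if y is not None and y not in marked}
            marked |= succ
            # Resume at the lowest newly marked address, or at K+1 if none.
            K = min(succ | {K + 1})
    return marked


alink = {1: 3, 3: 2, 4: 5}
blink = {3: 1, 2: 2}
print(sorted(mark_a(5, alink, blink, {1})))   # [1, 2, 3]
```

In this run K moves backwards once (from 3 to 2) after node 2 is marked, which is the "perhaps with a decrease in K" case of the termination argument.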

5.4 Correctness

5.4.1 Initialization

It is trivial that the first statement implements the abstract variable assignments required by lemma 2.5-1.

5.4.2 Correctness at Termination

K>M ⇒ S={ }, so use corollary 2.6-1.

5.4.3 Invariance of BI(abst)

Using the sets A, B, C, D as defined in section 2.7, in the if branch A=B=C=D={ }, so BI(abst) ⇒ BI(abst'), since abst' = abst. In the else branch it is routine to check that: A=R(nK), B ⊆ {nK}, C=R(nK), D2=R(nK). Furthermore, any element y∈S cannot be an element of R(y) since otherwise y would be in Marked as well as not in Marked. Thus, by theorem 2.7-2, BI(abst') holds, i.e., BI remains invariant.

5.4.4 Termination

|Marked| is bounded above by |R0*(S0)|, and K is bounded above by M+1. Each iteration shows either a strict increase in |Marked| (perhaps with a decrease in K), or a strict increase in K (and Marked remains the same). Thus use termination schema 3-1.

6. CONCRETE ALGORITHM B


6.1 Concrete State Vector

N, ADDR, ALINK, BLINK, MARK - as in algorithm A.

STACK - an array indexed from 1 to T, containing elements of ADDR(N).

Concrete state vector V=<N,ADDR,ALINK,BLINK,STACK,T,MARK>.

6.2 Mapping functions

V-S0,V-R0,V-R,V-Marked - as in algorithm A.

V-S ≜ {x | x∈Marked and ADDR(x)∈{STACK[1],...,STACK[T]}}.

6.3 Algorithm B

begin
  (S,Marked) := (S0,S0);
  while S≠{ } do
    select K∈S;
    (S,Marked) := (S-{K} ∪ R(K), Marked ∪ R(K))
  od
end

Algorithm B is expressed here in terms of the abstract variables. The relationship to the concrete variables STACK and T is immediate. A useful auxiliary assertion (in showing that S:=S-{K} is implemented correctly) is that there are no duplicate occurrences of any address in STACK[1] ... STACK[T].
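One possible concrete reading of algorithm B, with the abstract set S held on an explicit stack, is sketched below in Python. It is an illustration only: nodes are identified with their addresses, the links are dictionaries with missing keys meaning "no link", the name `mark_b` is hypothetical, and "select K∈S" is resolved by always taking the top of the stack.

```python
def mark_b(alink, blink, S0):
    """Algorithm B: marking with an unbounded stack holding the set S."""
    marked = set(S0)
    stack = sorted(S0)          # STACK[1..T]; no address occurs twice
    while stack:
        K = stack.pop()         # select K in S, and S := S - {K}
        for y in (alink.get(K), blink.get(K)):
            if y is not None and y not in marked:   # y is in R(K)
                marked.add(y)   # marking before pushing prevents duplicates
                stack.append(y)
    return marked


alink = {1: 3, 3: 2, 4: 5}
blink = {3: 1, 2: 2}
print(sorted(mark_b(alink, blink, {1})))   # [1, 2, 3]
```

Because a node is marked at the moment it is pushed, the "no duplicate addresses" assertion holds throughout this sketch.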

6.4 Correctness

Correct initialization and correctness at termination follow from lemma 2.5-1 and corollary 2.6-1. Invariance of BI(abst') (using A, B, C, D as in section 2.7) follows from theorem 2.7-2, since K∈S, A=R(K), B={K}, C=R(K), D2=R(K). Termination is assured due to termination schema 3-2, since |Marked| is bounded above by |R0*(S0)|, |S| is bounded below by zero, and each iteration shows an increase in |Marked| (and perhaps an increase in |S|), or a decrease in |S| (and Marked stays the same).

7. CONCRETE ALGORITHM C

Algorithm C combines the stack of algorithm B with the basic idea of algorithm A, by utilizing a fixed-size stack. As long as the stack is not empty, algorithm C runs in "algorithm B mode". When the stack is empty the present algorithm runs in "algorithm A mode", but reverts to the more efficient "algorithm B mode" as soon as the stack once again becomes non-empty.


7.1 Concrete State Vector

V=<N,ADDR,ALINK,BLINK,K1,STACK,T,B,H,MARK>

These concrete variables are as in algorithms A and B. H represents the maximum stack size, where STACK is circular, and is indexed from 0 to H-1. T points to the top stack element, and B points to 1-below the bottom element.

7.2 Mapping Functions

V-S0, V-R0, V-R, V-Marked - as in algorithms A and B.

V-S ≜ V-S1 ∪ V-S2, where

V-S1 ≜ { } if T=B; otherwise V-S1 ≜ {x | x∈N and ADDR(x)∈{STACK[(B+1) mod H], STACK[(B+2) mod H], ..., STACK[T]}}.

V-S2 ≜ {x | x∈Marked and ADDR(x)≥K1}.

There is a constraint that |V-S1|<H. The abstract assignment "S:=S∪{x}" is given by:

begin
  S1 := S1 ∪ {x};
  if |S1|=H then
    select y∈S1;
    (S1,K1) := (S1-{y}, min(ADDR(y),K1))
  fi
end

Thus S1 is momentarily allowed to achieve size H, but is immediately reduced to size H-1.

The abstract assignment "S:=S∪W" for a set W is given by:

begin
  for each x∈W do
    S := S∪{x}
  od
end


7.3 Algorithm C

begin
  S2 := { };
  S := S0;
  Marked := S0;
  while S≠{ } do
    if S1≠{ } then
      select x∈S1;
      S1 := S1-{x};
      S := S∪R(x);
      Marked := Marked∪R(x)
    else
      K1 := min(ADDR(S2));
      (S,Marked,K1) := (S∪R(nK1), Marked∪R(nK1), min(ADDR(R(nK1)) ∪ {K1+1}))
    fi
  od
end
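The interplay between the bounded stack S1 and the scan pointer K1 may be easier to follow in executable form. The Python sketch below is one possible reading of algorithm C, offered as an illustration and not as the authors' implementation: nodes are identified with their addresses 1..M, the circular array is replaced by a `collections.deque`, overflow always discards the bottom entry, H ≥ 1 is assumed, and the name `mark_c` is hypothetical.

```python
from collections import deque


def mark_c(M, alink, blink, S0, H):
    """Algorithm C: a stack of capacity below H, backed by a rescan pointer K1."""
    marked = set(S0)
    stack = deque()                  # S1
    K1 = M + 1                       # S2 = marked nodes at address >= K1 (empty)

    def push(x):                     # the abstract assignment S := S u {x}
        nonlocal K1
        stack.append(x)
        if len(stack) == H:          # overflow: drop the bottom entry, but
            K1 = min(stack.popleft(), K1)   # lower K1 so the scan revisits it

    def successors(n):               # R(n): unmarked ALINK/BLINK targets
        return {y for y in (alink.get(n), blink.get(n))
                if y is not None and y not in marked}

    for x in sorted(S0):
        push(x)
    while True:
        if stack:                    # "algorithm B mode"
            x = stack.pop()
            succ = successors(x)
            marked |= succ
            for y in sorted(succ):
                push(y)
        else:                        # "algorithm A mode"
            K1 = next((a for a in range(K1, M + 1) if a in marked), None)
            if K1 is None:           # S1 and S2 are both empty, so S = {}
                return marked
            succ = successors(K1)
            marked |= succ
            next_K1 = min(succ | {K1 + 1})
            for y in sorted(succ):
                push(y)
            K1 = next_K1             # covers anything the pushes discarded


alink = {1: 2, 2: 4, 3: 5}
blink = {1: 3}
print(sorted(mark_c(6, alink, blink, {1}, H=2)))   # [1, 2, 3, 4, 5]
```

With H=1 the stack can never hold an entry and the sketch degenerates to a pure address scan; with a large H the scan branch is entered only to detect termination.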

7.4 Correctness

Correct initialization and correctness at termination follow from lemma 2.5-1 and corollary 2.6-1. Invariance of BI(abst') (using A, B, C, D as in section 2.7) follows from theorem 2.7-2 (hopefully no confusion arises over the abstract set B in section 2.7, and the concrete variable B here). In the if branch: x∈S1 (thus x∈S); A=R(x)∪A', where A' ≜ {y | y∈Marked and ADDR(y)≥min(ADDR(R(x)) ∪ {K1})}, B ⊆ {x}, C=R(x), D2=R(x). In the else branch: nK1∈S2 (thus nK1∈S), A=R(nK1)∪A', where A' ≜ {y | y∈Marked and ADDR(y)≥min(ADDR(R(nK1)) ∪ {K1+1})}, B ⊆ {nK1}, C=R(nK1), D2=R(nK1).

Termination is assured using arguments identical to those in showing termination of algorithm B, for the if branch as well as the else branch.

8. CONCRETE ALGORITHM E (Schorr and Waite, Deutsch)

In the previous algorithms A, B, and C, the concrete relations ALINK and BLINK were static. In algorithm E these relations vary dynamically, but must be restored to their original values at the termination of the algorithm. Additional notation and proof rules are required to express appropriate assertions about these dynamic relations, and to express the mapping functions between the concrete variables and the abstract variables of section 2. Such notation and proof rules have been developed and illustrated [Du 76, Ye 77], but there is not enough room here to give all the details. Instead we shall present the appropriate assertions formally and explain their meaning informally. Proofs of the invariance of the assertions can be given using the model in [Du 76] or [Ye 77], but we shall not give the proofs here.


8.1 Concrete State Vector

γ - a set of functions which can change dynamically, to be described below.

λ - the "null" node; λ∉N.

N+ ≜ N∪{λ}.

P0 - a node constant; P0∈N.

P - a node variable assuming values in N.

T - a node variable assuming values in N+.

ATOM: N → {0,1} - a dynamically changing function which must be restored to its original value at the termination of the algorithm.

In algorithm E in [Kn 73] ATOM is used for two distinct purposes - to help implement the abstract relations R and R0 (described below), and to help implement the set of functions γ. It is only for the second purpose that ATOM changes dynamically. In this paper the algorithm will not deal with the detailed implementation of γ; instead we will treat γ as an abstract variable which can be manipulated directly, and thus in our version of algorithm E, ATOM will remain static. We will, however, present the assertions (without proof) that are needed in showing the correctness of the detailed implementation of γ using the dynamic function ATOM.

To describe the set of functions γ, let us define nameset ≜ {ALINK,BLINK}, and codeset ≜ {+,a,b}. Let G be a digraph over the set of nodes N+, in which every arc which has λ as a source node also has λ as a sink node. Every arc is labelled with two items of information, namely, an element from nameset×codeset. No two distinct arcs from the same source node x may both have the "ALINK" label, nor may they both have the "BLINK" label. Intuitively, "ALINK" and "BLINK" are used as names of pointer fields, as in [Kn 73]. A code of "+" on an arc means the arc was present originally in the digraph (with the same "ALINK" or "BLINK" label it now has), and a code of "a" or "b" on an arc means the arc was not present originally in G.

The set of functions γ describes the entire current status of the digraph G. For example, if there is an arc from node x to node y with label <ALINK,+>, we express this as γALINK(x)=<+,y>, or (notationally equivalently) as con(x,<ALINK,+>,y,γ). To redirect this arc to node z and change its arc code to "a", we use the assignment statement "γ := insert-arc(x,<ALINK,a>,z,γ)".

The notation con(x,(<ALINK,a>,<BLINK,b>)*,y,γ) means that x=y, or there is a directed path of length ≥ 1 from x to y in which the concatenation of arc labels is a word in the language given by the regular expression (<ALINK,a>,<BLINK,b>)+. That is, each arc in the directed path is labelled with <ALINK,a> or <BLINK,b>.


In summary the concrete state vector is V=<γ,N+,{ALINK,BLINK},{+,a,b},P0,P,T,ATOM,MARK>.

8.2 Mapping Functions

In the concrete state vector V the initial values of γ, P0, and ATOM are set prior to running algorithm E. Letting γ0 and ATOM0 denote the respective initial values in which each arc code in γ0 is '+', we have the following mapping functions:

V-R ≜ (x,R,y) iff
1. (con(x,<ALINK,+>,y,γ) or con(x,<BLINK,+>,y,γ)) and
2. y≠λ and
3. ATOM0(x)=0 and
4. y∉Marked

V-R0 ≜ V-R with γ0 in place of γ in condition (1)

V-S ≜ {P}∪S1, where S1 ≜ {x∈N | con(T,(<ALINK,a>,<BLINK,b>)*,x,γ)}

(Note that λ cannot be in S1).

V-S0 ≜ {P0}.

8.3 Additional Specifications

In addition to setting Marked=R0*(S0), the following additional specifications are required at termination:

1. γ = γ0 (i.e., any ALINK or BLINK pointers that were changed during execution have been restored to their original values).

2. ATOM = ATOM0

3. P = P0

8.4 Algorithm E - Outer Level and Refinements

The following is an outer level version of Algorithm E.

begin
  (P,T,Marked) := (P0,λ,{P0});
  loop
    while R(P)≠{ } do
      (γ,T,P) := advance(γ,T,P)
    od
    loop
      (γ,T,P) := backup(γ,T,P)
    until T=λ or R(P)≠{ } od
  until T=λ and R(P)={ } od
end

The specifications for the statement "(γ,T,P):=advance(γ,T,P)" are given abstractly as follows:

begin
  select x∈R(P);
  (S,Marked,R) := (S∪{x}, Marked∪{x}, R-{<z,x>})
end

By corollary 2.7-1, if we can manipulate the concrete variables (γ,T,P) to achieve these specifications, then BI(abst') remains invariant. To verify that corollary 2.7-1 applies here, substitute the following for the variables in the corollary: P for x; {x} for A'; { } for A''; { } for B; {x} for C; {x} for D2. To check that the second hypothesis of the corollary also holds, i.e., to show that (∀y∈S)(y∉R(y)), note that if y∈S then y∈Marked (due to BI(abst)), but if y is in the range of R, then y∉Marked.

The refinement of the statement "(γ,T,P):=advance(γ,T,P)" is as follows (introducing local variables "name" and "code"):

begin
  select name∈{ALINK,BLINK} and x∈R(P) such that con(P,<name,+>,x,γ);
  code := if name='ALINK' then 'a' else 'b' fi;
  (γ,T,P) := (insert-arc(P,<name,code>,T,γ),P,x);
  Marked := Marked∪{x}
end

The proof that this refinement implements the abstract specifications is too detailed to give here, since it requires the semantics of the "insert-arc" operation (see [Ye 77, Du 76] for details of the semantics).

The abstract specifications for the statement "(γ,T,P):=backup(γ,T,P)" are as follows:

begin
  if S1≠{ } then
    S := S-{P}
  fi
end

It is easily seen from the definition of S1 that S1={ } iff T=λ. Thus, from the outer level description of algorithm E, the statement "S:=S-{P}" (as it occurs in the refinement of the "backup" operation) is executed only when R(P)={ }. By corollary 2.7-2, BI(abst') remains invariant. The refinement of this code is as follows:

begin
  if T≠λ then
    select x∈S1, name∈{ALINK,BLINK}, and code∈{a,b} such that con(T,<name,code>,x,γ);
    (γ,T,P) := (insert-arc(T,<name,+>,P,γ),x,T)
  fi
end


As with the previous refinement, the correctness of this refinement is too detailed to give here (but see [Ye 77, Du 76] for semantics of the "insert-arc" operation).
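The outer level of algorithm E and the two refinements can be exercised together in a short Python sketch. This is an illustrative addition under several assumptions that are not in the paper: γ is modelled as a dictionary with gamma[x][name] = (code, target), the null node is Python's None, every node is assumed to have ATOM0 = 0 (so ATOM is ignored), "select" takes the first candidate, and the names `mark_e`, `advance`, and `backup` are only borrowed labels for this sketch.

```python
import copy


def mark_e(gamma, P0):
    """Pointer-reversal marking; gamma is modified in place and then restored."""
    marked = {P0}
    P, T = P0, None

    def R(p):   # unmarked, non-null targets of p's '+' arcs
        return [(name, y) for name, (code, y) in gamma[p].items()
                if code == '+' and y is not None and y not in marked]

    def advance():
        nonlocal P, T
        name, x = R(P)[0]
        code = 'a' if name == 'ALINK' else 'b'
        gamma[P][name] = (code, T)      # reversed arc now points back at T
        T, P = P, x
        marked.add(x)

    def backup():
        nonlocal P, T
        if T is not None:
            # T has exactly one redirected arc; it leads to the next stack entry.
            name, x = next((n, y) for n, (c, y) in gamma[T].items() if c != '+')
            gamma[T][name] = ('+', P)   # restore the arc to its original target
            T, P = x, T

    while True:                          # loop ... until T = null and R(P) = {}
        while R(P):
            advance()
        while True:                      # loop ... until T = null or R(P) != {}
            backup()
            if T is None or R(P):
                break
        if T is None and not R(P):
            return marked, P


gamma = {
    1: {'ALINK': ('+', 2), 'BLINK': ('+', 3)},
    2: {'ALINK': ('+', 4), 'BLINK': ('+', 1)},
    3: {'ALINK': ('+', None), 'BLINK': ('+', 2)},
    4: {'ALINK': ('+', None), 'BLINK': ('+', None)},
    5: {'ALINK': ('+', 1), 'BLINK': ('+', None)},
}
original = copy.deepcopy(gamma)
marked, final_P = mark_e(gamma, 1)
print(sorted(marked), final_P, gamma == original)   # [1, 2, 3, 4] 1 True
```

The final comparison checks, on this one example, the additional termination requirements that γ is restored to its original value and that P = P0.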

8.5 Auxiliary Assertions

In addition to the assertion BI(abst) which holds for marking algorithms in general, several additional assertions about the concrete state vector are needed to prove the additional requirements at termination, as well as the correctness of the refinements of the outer level operations "advance" and "backup". The following definition is useful in stating these assertions.

Definition 8.5-1: The relation R2 (based on the concrete state vector V) is defined as:

(x,R2,y) ≜ (con(x,(<ALINK,a>,<BLINK,b>),y,γ) or (T≠λ and x=P and y=T))

This relation formalizes the intuitive idea of building up a stack in-situ by reversal of pointers, which is fundamental in algorithm E.

We use the notation "arc-code(x,name,γ)=code" as an abbreviation for "(∃z∈N+)(con(x,<name,code>,z,γ))".

The auxiliary assertions are as follows.

1. (∀x,y∈S)((x,R2,y) and arc-code(y,ALINK,γ)=a ⇒ con(y,<ALINK,+>,x,γ0)).

2. (∀x,y∈S)((x,R2,y) and arc-code(y,BLINK,γ)=b ⇒ con(y,<BLINK,+>,x,γ0)).

Assertion (1) formalizes the "reversal-of-pointers stack" by saying that if <x,y>∈R2 and y currently has an outgoing arc with label <ALINK,a>, then y originally had an arc leading into node x, with arc label <ALINK,+>. Assertion (2) is similar for BLINK.

3. (∄x,y∈N+)(con(x,<ALINK,b>,y,γ) or con(x,<BLINK,a>,y,γ)).

That is, there is no "mixing" of arc names and arc codes; each arc label is in the set {<ALINK,+>,<ALINK,a>,<BLINK,+>,<BLINK,b>}.

4. (∀x∈S1)(arc-code(x,ALINK,γ)=a ⇔ arc-code(x,BLINK,γ)≠b).

That is, each node in the "reverse stack" (other than node P, which is not in S1) has exactly one of its pointers temporarily redirected.

5. P∉S1.

6. T≠λ ⇒ (con(T,(<ALINK,a>,<BLINK,b>)*,λ,γ) and con(T,(<ALINK,a>,<BLINK,b>)*,P0,γ) and con(P0,(<ALINK,a>,<BLINK,b>),λ,γ)).

That is, if T is not the null node λ, then T can "reach" λ along the appropriate path, T can reach P0 along this path, and P0 is the final non-null node along this path. The path is unique due to assertion (4).

7. T=λ ⇒ P=P0.

8. (∀x∈N-S1)(γALINK(x)=(γ0)ALINK(x) and γBLINK(x)=(γ0)BLINK(x)).

Thus, at termination (when S1={ }), all ALINK and BLINK pointers have been restored to their original values.

8.6 Further Refinement

Operations and tests on γ are implemented by using the following assertions relating the abstract notion of arc code labels {a,b,+} and the concrete flag field ATOM in each node:

1. (∀x∈S1)((ATOM(x)=1 ⇔ arc-code(x,ALINK,γ)=a) and (ATOM(x)=0 ⇔ arc-code(x,BLINK,γ)=b) and ATOM0(x)=0).

2. (∀x∈N-S1)(ATOM(x)=ATOM0(x)).

Since the test whether an arc code is "a" or "b" occurs only for elements in S1 (in the refinement of the "backup" operation), these assertions are adequate for implementing the arc codes {a,b,+} correctly, as well as for guaranteeing that at termination the function ATOM has been restored to its original value ATOM0 (since S1={ } at termination).

8.7 Conclusions About Algorithm E

Correct initialization of the outer level version of algorithm E follows immediately from lemma 2.5-1. At termination, T=λ and R(P)={ }, which implies that R(S)={ }. Thus, at termination Marked=R0*(S0) due to corollary 2.6-2. We have also indicated in section 8.4 that if the refinements of the outer level operations "advance" and "backup" meet their specifications, then the refined version of algorithm E is correct.

To show that algorithm E terminates, consider termination schema 3-3 in which: Q1 ≜ R(P)≠{ }, f1 ≜ (γ,T,P):=advance(γ,T,P), Q2 ≜ T=λ, and "if not Q2 then f2 fi" ≜ the refinement of (γ,T,P):=backup(γ,T,P). The total number of times f1 or f2 can be executed is bounded above by |N|, since: each time f1 is executed, an element of N not previously in S is added to S, and each time f2 is executed, an element in S not previously deleted from S is now removed from S.

9. CONCLUSIONS

A detailed illustration of a proof of an abstract algorithm has been given, followed by proofs of correct representation for four different concrete cases. It is perhaps noteworthy that in the concrete programs one is not required to think of an intermediate assertion, since the abstract assertion instantiates to all cases. The creative part was developing the mapping functions for the sets A, B, C, D. Although many details necessarily were omitted, we feel that not too much additional effort is required to check these proofs mechanically. We also feel that despite the complexity of algorithm E, it is significant that one need only verify the correctness of the refinements of the operations "advance" and "backup", without worrying about when or why they are called, to obtain a proof of total correctness of algorithm E. Hopefully, this sort of "abstract/concrete factorization" will be useful in proving programs approaching real-world complexity.

10. ACKNOWLEDGEMENTS

We are grateful to Dr. S. L. Gerhart for comments on drafts of this manuscript, and for showing us similar research of her own [Ge 77]. Dr. R. London indicated the importance of a unified approach to several concrete marking algorithms after seeing different proofs of algorithm A by the first author. Professor Knuth unwittingly helped us by providing five different concrete algorithms to accomplish the same task [Kn 73]. Finally, we acknowledge the detailed comments and valuable suggestions of one of the referees.

BIBLIOGRAPHY

[Di 76] Dijkstra, E. W., A discipline of programming, Prentice-Hall, 1976.

[Du 76] Duncan, A. G., Studies in program correctness, Ph.D. dissertation, University of California, Irvine, May 1976.

[Fl 67] Floyd, R. W., Assigning meanings to programs, Proceedings of a Symposium in Applied Mathematics 19, (ed. Schwartz, J. T.), Providence, Rhode Island: American Mathematical Society, 1967, pp. 19-32.

[Ge Ye 76] Gerhart, S., and Yelowitz, L., Control structure abstractions of the backtracking programming technique, IEEE Transactions on Software Engineering, vol. SE-2, no. 4, Dec. 1976, pp. 285-292.

[Ge 77] Gerhart, S. L., Abstractions and proofs of marking algorithms, (private correspondence).

[Kn 73] Knuth, D. E., The Art of Computer Programming, vol. 1, Fundamental Algorithms, Addison-Wesley, 1973.

[Mo 72] Morris, J. H., Verification-oriented language design, Technical Report 7, Computer Science Dept., U. of California, Berkeley, Dec. 1972.

[Sc Wa 67] Schorr, H., and Waite, W., An efficient machine-independent procedure for garbage collection in various list structures, CACM 10.

[Su 76] Suzuki, N., Automatic verification of programs with complex data structures, Stanford U. Computer Science Dept. Report No. STAN-CS-76-552, Feb. 1976 (Ph.D. dissertation).

[To 74] Topor, R., The correctness of the Schorr-Waite list marking algorithm, Report MIP-R-104, School of Artificial Intelligence, U. of Edinburgh, July 1974.

[Ye 72] Yelowitz, L., A symmetric, top-down structured approach to computer program/proof development, Ph.D. dissertation, The Johns Hopkins University, May 1972, published as IBM Technical Report FSC 73-5001, Bethesda, Maryland, July 1973.

[Ye Du 77] Yelowitz, L., and Duncan, A. G., Data structures and program correctness: bridging the gap, Proceedings of the 1977 Conference on Information Sciences and Systems, March 30-April 1, 1977, sponsored by the Dept. of Electrical Engineering, The Johns Hopkins University, pp. 113-117.

[Wu Lo Sh 76] Wulf, W. A., London, R. L., and Shaw, M., An introduction to the construction and verification of Alphard programs, IEEETSE SE-2, 4, Dec. 1976, pp. 253-265.
