
Pattern Recognition Letters 13 (1992) 411-415, June 1992, North-Holland

A shortest path metric on unlabeled binary trees

André Bonnin and Jean-Marcel Pallo

Département d'Informatique, Université de Bourgogne, B.P. 138, 21004 Dijon, France

Received 20 June 1990

Abstract

Bonnin, A. and J.-M. Pallo, A shortest path metric on unlabeled binary trees, Pattern Recognition Letters 13 (1992) 411-415.

We consider a transformation on rooted unlabeled binary trees which is a special instance of the well-known general rotation. Using lattice-theoretic results, we give an efficient algorithm for computing the shortest path distance between two binary trees.

Keywords. Binary trees, distance metric, rotation lattice.

Introduction

Distance and similarity between rooted trees have been studied for various applications. In pattern recognition (Tanaka and Tanaka, 1988) and behavioral science (Boorman and Olivier, 1973), one is frequently faced with the problem of comparing different rooted labeled trees. However, the distance between two rooted unlabeled trees may also be of interest. Indeed, an attractive problem is to compare the structure or the shape of two different trees, paying no attention to the data possibly maintained by the trees.

There seem to be two basic approaches to constructing tree metrics (Boorman and Olivier, 1973). The first is to define elementary transformations on trees, and to define the distance between two trees as the minimum number of such transformations needed to obtain the first from the second. Shortest path metrics are conceptually simple. Unfortunately, this kind of metric may have a serious drawback: unless the transformation is carefully selected, the distance between trees may be very hard to compute effectively.

Correspondence to: J.-M. Pallo, Département d'Informatique, Université de Bourgogne, B.P. 138, 21004 Dijon, France.

A second approach to tree metrics is to represent a tree in terms of simpler structures, e.g. partitions (Day, 1981), or to define valuations on a poset of trees (Monjardet, 1981). These metrics are generally quite tractable from a computational point of view but not always conceptually clear.

In this paper, we restrict our study of tree metrics to rooted, ordered, unlabeled, binary trees, i.e., trees in which every internal node has exactly two successors.

Rotation is a well-known transformation on binary trees which is a local restructuring that alters the depths of some of the nodes in the tree but maintains the symmetric order of the nodes. See Figure 1.

Figure 1. The general definition of a rotation. Triangles denote subtrees. The tree shown could be a part of a larger tree.

According to the first approach, the rotation distance between two binary trees is the minimum number of left and right rotations necessary to transform one tree into the other (Sleator et al., 1988). A heuristic search algorithm with $O(n^4)$ time complexity has been proposed (Pallo, 1987), which computes the rotation distance between two given binary trees of size n.

0167-8655/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

Volume 13, Number 6, PATTERN RECOGNITION LETTERS, June 1992

According to the latter approach, we have defined a supervaluation on binary trees (Pallo, 1990) from the property that the left rotation transformation induces a lattice structure on binary trees. This supervaluation can be used to define a metric on binary trees with time complexity $O(n^{3/2})$.

Since the rotation transformation leads to a metric that is hard to compute, we consider in this paper a simpler transformation called restricted rotation. We show that the left restricted rotation induces a graded lower semimodular meet-semilattice structure on the set of binary trees. The restricted rotation distance between two binary trees is the minimum number of left and right restricted rotations needed to convert one tree into the other. This shortest path distance between binary trees of size n can be computed in time $O(n^2)$.

Preliminaries

In a (rooted, ordered, unlabeled) binary tree, every node except the root has a parent. Every internal node ○ has a left and a right child. External nodes □ have no children. The external nodes of a binary tree T are numbered by a preorder traversal of T.

The weight $|T|$ of a binary tree T is the number of external nodes of T. Let $B_n$ denote the set of binary trees of weight n + 1.

Definition 1. Given $T \in B_n$, the weight sequence of T is the integer sequence $(w_T(1), \ldots, w_T(n))$ where $w_T(i)$ is the weight of the largest subtree of T whose last external node is i (Pallo, 1986).
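Definition 1 can be made concrete with a short Python sketch. It is not taken from the paper: it assumes a hypothetical encoding in which an external node is `None` and an internal node is a pair `(left, right)`, and it relies on the decomposition used in the proof of Lemma 1 (sequence of the left subtree, then the weight of the left subtree, then the sequence of the right subtree). The function names are illustrative.

```python
def weight(tree):
    """Number of external nodes of a tree (None encodes an external node)."""
    if tree is None:
        return 1
    left, right = tree
    return weight(left) + weight(right)


def weight_sequence(tree):
    """Weight sequence (w_T(1), ..., w_T(n)) of a tree with n internal nodes."""
    if tree is None:
        return []
    left, right = tree
    # The largest subtree ending at the last external node of the left
    # subtree is the left subtree itself, so that entry is weight(left).
    return weight_sequence(left) + [weight(left)] + weight_sequence(right)


right_comb = (None, (None, (None, None)))
left_comb = (((None, None), None), None)
balanced = ((None, None), (None, None))
print(weight_sequence(right_comb))  # [1, 1, 1]
print(weight_sequence(left_comb))   # [1, 2, 3]
print(weight_sequence(balanced))    # [1, 2, 1]
```

The last external node of the whole tree is not reported, which is why a tree of weight n + 1 yields a sequence of length n.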

Lemma 1. An integer sequence $(w_1, \ldots, w_n)$ is the weight sequence of a binary tree of $B_n$ iff for all $i \in [1, n]$:

$$1 \le w_i \le i,$$

$$\text{for all } i' \in [i - w_i + 1,\, i]: \quad i - w_i \le i' - w_{i'}.$$

Proof. If $T \in B_n$ then $1 \le w_T(i) \le i$ by Definition 1. Moreover, $i - w_T(i) + 1$ is the first external node of the largest subtree $T_i \subseteq T$ whose last external node is i. If i' is an external node of $T_i$, i.e.,

$$i - w_T(i) + 1 \le i' \le i,$$

then the largest subtree of T whose last external node is i' is a subtree of $T_i$ and thus

$$i - w_T(i) + 1 \le i' - w_T(i') + 1.$$

We prove that the conditions are sufficient by induction on n, using the fact that the weight sequence of the tree admitting $T' \in B_{n'}$ as left subtree and $T'' \in B_{n''}$ as right subtree is

$$(w_{T'}(1), \ldots, w_{T'}(n'),\; n' + 1,\; w_{T''}(1), \ldots, w_{T''}(n'')). \qquad \square$$
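The two conditions of Lemma 1 translate directly into a membership test. The following Python sketch is an illustration only (the names `is_weight_sequence` and `all_weight_sequences` are not from the paper); it filters candidate sequences by brute force, which is adequate for small n but is not meant to be efficient.

```python
from itertools import product


def is_weight_sequence(w):
    """Check the two conditions of Lemma 1 on w = (w_1, ..., w_n)."""
    n = len(w)
    for i in range(1, n + 1):
        wi = w[i - 1]
        if not 1 <= wi <= i:
            return False
        # Every i' in [i - w_i + 1, i] must satisfy i - w_i <= i' - w_{i'}.
        for ip in range(i - wi + 1, i + 1):
            if i - wi > ip - w[ip - 1]:
                return False
    return True


def all_weight_sequences(n):
    """All weight sequences of length n, found by brute-force filtering."""
    ranges = [range(1, i + 1) for i in range(1, n + 1)]
    return [list(w) for w in product(*ranges) if is_weight_sequence(w)]


print(len(all_weight_sequences(3)))  # 5
print(len(all_weight_sequences(4)))  # 14
```

The counts 5 and 14 are the Catalan numbers, as expected for binary trees with 3 and 4 internal nodes.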

Definition 2. Left restricted rotation is a transformation $\to$ on $B_n$ such that a subtree of the form $(\square, (B, C))$ of a tree of $B_n$, i.e. a subtree whose left child is an external node □ and whose right child has subtrees B and C, is replaced by the subtree $((\square, B), C)$. Right restricted rotation is defined by $\leftarrow$. Let $\xrightarrow{*}$ denote the reflexive-transitive closure of $\to$.

Note that this transformation is called a 'restricted' rotation because it is a special instance of the general rotation (see Figure 1) in which the first subtree A is always chosen to be an external node □.
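As a sketch of Definition 2, the Python function below lists every tree obtained from a given tree by one left restricted rotation. It assumes the same hypothetical encoding as before (`None` for an external node, a pair `(left, right)` for an internal node); the function name is illustrative and not part of the paper.

```python
def left_restricted_rotations(tree):
    """All trees obtained from `tree` by one left restricted rotation."""
    if tree is None:
        return []
    left, right = tree
    results = []
    # Rotation at this node: (leaf, (B, C)) becomes ((leaf, B), C).
    if left is None and right is not None:
        b, c = right
        results.append(((None, b), c))
    # Rotations applied somewhere inside the left or the right subtree.
    results += [(l2, right) for l2 in left_restricted_rotations(left)]
    results += [(left, r2) for r2 in left_restricted_rotations(right)]
    return results


right_comb = (None, (None, (None, None)))
for t in left_restricted_rotations(right_comb):
    print(t)
```

For the right comb with three internal nodes this yields two trees, one from rotating at the root and one from rotating at the root's right child.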


For an example (n = 4) see Figure 2.

Lemma 2. Given T and $T' \in B_n$, $T \to T'$ iff for all $i \in [1, n]$: $w_T(i) = w_{T'}(i)$, with the exception of some unique j satisfying $w_{T'}(j) = 1 + w_T(j)$.

Proof. Let j be the last external node of subtree B which appears in the figure of Definition 2. It can be seen that $w_{T'}(j) = 1 + w_T(j)$ and $w_T(i) = w_{T'}(i)$ for all $i \ne j$. □
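Taking Lemma 2 at face value, left restricted rotations can be enumerated on weight sequences alone: increment one entry and keep the result if it still satisfies Lemma 1. The Python sketch below illustrates this; `upper_covers` and `is_weight_sequence` are names chosen here, not the paper's.

```python
def is_weight_sequence(w):
    """Check the two conditions of Lemma 1 on w = (w_1, ..., w_n)."""
    for i in range(1, len(w) + 1):
        wi = w[i - 1]
        if not 1 <= wi <= i:
            return False
        for ip in range(i - wi + 1, i + 1):
            if i - wi > ip - w[ip - 1]:
                return False
    return True


def upper_covers(w):
    """Weight sequences reachable from w by one left restricted rotation."""
    covers = []
    for j in range(len(w)):
        candidate = list(w)
        candidate[j] += 1
        # Lemma 2: T -> T' iff the sequences differ by +1 at a unique j,
        # provided the candidate is itself a weight sequence (Lemma 1).
        if is_weight_sequence(candidate):
            covers.append(candidate)
    return covers


print(upper_covers([1, 1, 1]))  # [[1, 2, 1], [1, 1, 2]]
print(upper_covers([1, 2, 1]))  # []
```

The second call shows that the order has several maximal elements: no left restricted rotation applies to the tree with weight sequence (1, 2, 1).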

Figure 2. The 14 trees of $B_4$ ordered by $\to$ and their weight sequences.

The results

Main Theorem. Given two trees T and $T' \in B_n$, $T \xrightarrow{*} T'$ iff for all $i \in [1, n]$ we have $w_T(i) \le w_{T'}(i)$ and

$$w_T(i - w_T(i)) = w_T(i - w_T(i) - 1) = \cdots = w_T(i - w_{T'}(i) + 1) = 1$$

in case that $w_T(i) < w_{T'}(i)$.

Proof. For the sake of brevity, let us denote by $T \preceq T'$ the conditions to be satisfied in the theorem. The first condition, $w_T(i) \le w_{T'}(i)$ for all i, is necessary by Lemma 1. In order to increase $w_T(i)$ by 1, it must be that $w_T(i - w_T(i)) = 1$. Hence the second condition must be verified if $w_T(i) < w_{T'}(i)$.

Conversely, let us assume $T \preceq T'$ and $T \ne T'$. Let

$$i = \max\{k \in [1, n] \mid w_T(k) < w_{T'}(k)\},$$

and $j = i - w_T(i) + 1$. Thus $w_T(j - 1) = 1$. Let

$$l = \max\{k \in [1, n] \mid j = k - w_T(k) + 1\}.$$

Let us show that $i = l$. If $l > i$ then we have $w_T(l) = w_{T'}(l)$, $j = l - w_{T'}(l) + 1$ and

$$i - w_{T'}(i) + 1 < i - w_T(i) + 1 = j,$$

which gives a contradiction. Indeed, as $i \in [l - w_T(l) + 1,\, l]$, by Lemma 1 it follows that

$$l - w_{T'}(l) \le i - w_{T'}(i)$$

and then we get

$$j = l - w_{T'}(l) + 1 \le i - w_{T'}(i) + 1 < i - w_T(i) + 1 = j.$$

Therefore T includes the subtree

[Figure: subtree of T with external nodes j and i marked]

which, through application of the left restricted rotation $\to$, becomes the subtree

[Figure: rotated subtree with external nodes j and i marked]

of the tree $T_1$. The weight sequence of $T_1$ is the same as the weight sequence of T except at i, because $w_{T_1}(i) = 1 + w_T(i)$. Thus

$$w_{T_1}(i) \le w_{T'}(i) \quad \text{for all } i.$$

If $w_{T_1}(i) < w_{T'}(i)$ then

$$w_{T_1}(i - w_{T_1}(i)) = \cdots = w_{T_1}(i - w_{T'}(i) + 1) = 1.$$

We have built a tree $T_1$ such that $T_1 \preceq T'$ and $T \to T_1$. By repeating this process, we shall find a finite sequence of trees $T_i$ such that

$$T \preceq T_1 \preceq T_2 \preceq \cdots \preceq T_p = T'$$

and

$$T \to T_1 \to T_2 \to \cdots \to T_p = T'.$$

By transitivity $T \xrightarrow{*} T'$. □
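The characterization in the Main Theorem gives a direct test for the order on weight sequences, without performing any rotation. The following Python sketch is illustrative only; the name `precedes` is an assumption, and both arguments are assumed to be valid weight sequences of the same length.

```python
def precedes(w, w2):
    """Test the Main Theorem conditions for T ->* T'.

    w and w2 are the weight sequences of T and T', stored as 0-based
    Python lists holding the 1-based sequences of the paper.
    """
    n = len(w)
    for i in range(1, n + 1):
        a, b = w[i - 1], w2[i - 1]
        if a > b:
            return False
        # If w_T(i) < w_T'(i), the entries of w_T at positions
        # i - w_T'(i) + 1, ..., i - w_T(i) must all be equal to 1.
        for p in range(i - b + 1, i - a + 1):
            if w[p - 1] != 1:
                return False
    return True


print(precedes([1, 1, 1], [1, 2, 3]))  # True
print(precedes([1, 2, 1], [1, 2, 3]))  # False
```

The second pair satisfies the componentwise inequality but fails the second condition, since raising the third entry would require the second entry to be 1.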

Corollary 1. $(B_n, \xrightarrow{*})$ is a poset whose least element 0 has weight sequence

$$w_0 = (1, 1, \ldots, 1).$$

Proof. Assume that $T \xrightarrow{*} T'$ and $T' \xrightarrow{*} T$. From the Main Theorem, for each i we have

$$w_T(i) \le w_{T'}(i) \quad \text{and} \quad w_{T'}(i) \le w_T(i).$$

Thus $w_T(i) = w_{T'}(i)$ for all i, i.e., $T = T'$ and the antisymmetry property holds. □

Corollary 2. The poset $B_n$ is a meet-semilattice.

Proof. Using the Main Theorem, the following algorithm computes the weight sequence of the meet $T \wedge T'$ of the trees T and $T' \in B_n$.

Meet Algorithm

Given w_T = (w_T(1), ..., w_T(n)) and w_T' = (w_T'(1), ..., w_T'(n))

    for i := 1 to n do
        w_m(i) := min(w_T(i), w_T'(i))
        w_M(i) := max(w_T(i), w_T'(i))
    enddo
    for i := 2 to n do
        if w_m(i) < w_M(i) then
            for k := w_m(i) to w_M(i) - 1 do
                if w_m(i - k) ≠ 1 then w_m(i - k) := 1 endif
            enddo
        endif
    enddo
    w_{T ∧ T'} := w_m   □
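The Meet Algorithm can be transcribed almost line by line into Python. The sketch below is an illustration rather than the authors' implementation: the function name `meet` and the padding trick that keeps the paper's 1-based indices are choices made here, and both inputs are assumed to be valid weight sequences of the same length.

```python
def meet(w, w2):
    """Weight sequence of T ^ T', transcribing the Meet Algorithm."""
    n = len(w)
    # 1-based arrays as in the paper; index 0 is unused padding.
    w_min = [0] + [min(a, b) for a, b in zip(w, w2)]
    w_max = [0] + [max(a, b) for a, b in zip(w, w2)]
    for i in range(2, n + 1):
        # The loop bounds are read once, before the inner loop starts.
        lo, hi = w_min[i], w_max[i]
        for k in range(lo, hi):  # k = w_m(i), ..., w_M(i) - 1
            w_min[i - k] = 1
    return w_min[1:]


print(meet([1, 2, 1], [1, 2, 3]))  # [1, 1, 1]
print(meet([1, 1, 2], [1, 1, 3]))  # [1, 1, 2]
```

In the first call the two trees are incomparable and their meet is the least element; in the second, the first tree lies below the second and is returned unchanged.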

Lemma 3. The poset $B_n$ is graded, i.e., there exists an integer-valued function r defined on $B_n$:

$$r(T) = \sum_{1 \le i \le n} w_T(i)$$

such that $T \xrightarrow{*} T'$ and $r(T') = 1 + r(T)$ iff $T \to T'$.

Proof. If $T \to T'$ then $r(T') = 1 + r(T)$ by Lemma 2, since $w_T(i) = w_{T'}(i)$ for all i with the exception of some unique j satisfying $w_{T'}(j) = 1 + w_T(j)$. □

Lemma 4. The poset $B_n$ is lower semimodular, i.e., for all $T_1, T_2, T_3 \in B_n$ with $T_1 \to T_3$, $T_2 \to T_3$, $T_1 \ne T_2$, there exists $T_4 \in B_n$ such that $T_4 \to T_1$ and $T_4 \to T_2$.

Proof. Since $T_1 \to T_3$ (resp. $T_2 \to T_3$), by Lemma 2 we get $w_{T_1}(i) = w_{T_3}(i)$ (resp. $w_{T_2}(i) = w_{T_3}(i)$) for all i with the exception of some unique $i_1$ (resp. $i_2$) satisfying

$$w_{T_3}(i_1) = 1 + w_{T_1}(i_1)$$

(resp. $w_{T_3}(i_2) = 1 + w_{T_2}(i_2)$).

We have $i_1 \ne i_2$ since $T_1 \ne T_2$. Consider the sequence w defined by $w_i = w_{T_3}(i)$ for $i \ne i_1$, $i \ne i_2$ and by

$$w_{i_1} = w_{T_3}(i_1) - 1 = w_{T_1}(i_1)$$

and

$$w_{i_2} = w_{T_3}(i_2) - 1 = w_{T_2}(i_2).$$

This sequence satisfies the two conditions of Lemma 1, hence there exists a tree $T_4$ satisfying $w_{T_4} = w$. Following Lemma 2, we obtain $T_4 \to T_1$ and $T_4 \to T_2$. □

Figure 3. Definition of the rotation on regular ternary trees.

The metric

$B_n$ is a meet-semilattice (Corollary 2), with least element 0 (Corollary 1), graded by r (Lemma 3) and lower semimodular (Lemma 4). Let d denote the restricted rotation distance, i.e., the minimum number of left and right restricted rotations needed to convert one tree into the other. Then, from Monjardet (1981), we have:

$$d(T, T') = r(T) + r(T') - 2\, r(T \wedge T').$$

Hence

$$d(T, T') = \sum_{1 \le i \le n} \bigl( w_T(i) + w_{T'}(i) - 2\, w_{T \wedge T'}(i) \bigr).$$
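Putting the rank function and the meet together gives the distance in a few lines. The Python sketch below is a self-contained illustration under the same assumptions as before (valid weight sequences of equal length as input); the names `meet`, `rank` and `distance` are chosen here and are not the paper's.

```python
def meet(w, w2):
    """Weight sequence of T ^ T', transcribing the Meet Algorithm."""
    n = len(w)
    w_min = [0] + [min(a, b) for a, b in zip(w, w2)]  # index 0 unused
    w_max = [0] + [max(a, b) for a, b in zip(w, w2)]
    for i in range(2, n + 1):
        lo, hi = w_min[i], w_max[i]
        for k in range(lo, hi):
            w_min[i - k] = 1
    return w_min[1:]


def rank(w):
    """Rank r(T): the sum of the entries of the weight sequence."""
    return sum(w)


def distance(w, w2):
    """d(T, T') = r(T) + r(T') - 2 r(T ^ T')."""
    return rank(w) + rank(w2) - 2 * rank(meet(w, w2))


print(distance([1, 1, 1], [1, 2, 3]))  # 3
print(distance([1, 1, 2], [1, 1, 3]))  # 1
```

The first value corresponds to the chain of three left restricted rotations (1,1,1), (1,1,2), (1,1,3), (1,2,3); the second to a single rotation.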

Let us remark that

$$0 \le d(T, T') \le n(n - 1)/2.$$

The complexity of computing $d(T, T')$ is proportional to the number of integer comparisons needed to find $T \wedge T'$. In the worst (resp. average) case, the meet algorithm gives $w_m(i) = 1$ and $w_M(i) = i$ (resp. $w_M(i) = i/2$). Thus the algorithm requires at most

$$\sum_{2 \le i \le n-1} (i - 1) \ \text{comparisons}$$

and in the average case

$$\sum_{2 \le i \le n-1} (i/2 - 1) \ \text{comparisons}.$$

Finally we find a time complexity $O(n^2)$ and a space complexity $O(n)$.

Conclusion

The metric on binary trees defined in a previous paper (Pallo, 1990), which uses a supervaluation on the rotation lattice of binary trees, has a time complexity $O(n^{3/2})$. However, it is not a shortest path metric. The metric defined in the present paper has a higher time complexity $O(n^2)$, but it is a shortest path metric using a transformation on binary trees which is a special case of the general rotation.

Rotation has been generalized to regular k-ary trees (Bonnin and Pallo, 1983). See for example Figure 3. We may hope that the method used here can be generalized to regular k-ary trees using a restricted rotation in which the first subtree A of Figure 3 is always chosen to be an external node □.

References

Barthélemy, J.P., B. Leclerc and B. Monjardet (1986). On the use of ordered sets in problems of comparison and consensus of classifications. J. Classification 3, 187-224.

Bonnin, A. and J.M. Pallo (1983). A-transformation dans les arbres n-aires. Discrete Math. 45, 153-163.

Boorman, S.A. and D.C. Olivier (1973). Metrics on spaces of finite trees. J. Math. Psychol. 10, 26-59.

Culik, K. and D. Wood (1982). A note on some tree similarity measures. Inform. Process. Lett. 15, 39-42.

Day, W.H.E. (1981). The complexity of computing metric distances between partitions. Math. Social Sciences 1, 269-287.

Luccio, F. and L. Pagli (1989). On the upper bound on the rotation distance of binary trees. Inform. Process. Lett. 31, 57-60.

Monjardet, B. (1981). Metrics on partially ordered sets - A survey. Discrete Math. 35, 173-184.

Pallo, J.M. (1986). Enumerating, ranking and unranking binary trees. Computer J. 29, 171-175.

Pallo, J. (1987). On the rotation distance in the lattice of binary trees. Inform. Process. Lett. 25, 369-373.

Pallo, J.M. (1990). A distance metric on binary trees using lattice-theoretic measures. Inform. Process. Lett. 34, 113-116.

Sleator, D.D., R.E. Tarjan and W.P. Thurston (1988). Rotation distance, triangulations and hyperbolic geometry. J. Amer. Math. Soc. 1, 647-681.

Tanaka, E. and K. Tanaka (1988). The tree-to-tree editing problem. Int. J. Pattern Recognition and Artificial Intelligence 2, 221-240.