Pattern Matching in Trees


CHRISTOPH M. HOFFMANN AND MICHAEL J. O'DONNELL
Purdue University, West Lafayette, Indiana

ABSTRACT. Tree pattern matching is an interesting special problem which occurs as a crucial step in a number of programming tasks, for instance, design of interpreters for nonprocedural programming languages, automatic implementations of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors, and automatic theorem proving. As with the sorting problem, the variations in requirements and resources for each application seem to preclude a uniform, universal solution to the tree-pattern-matching problem. Instead, a collection of well-analyzed techniques, from which specific applications may be selected and adapted, should be sought. Five new techniques for tree pattern matching are presented, analyzed for time and space complexity, and compared with previously known methods. Particularly important are applications where the same patterns are matched against many subjects and where a subject may be modified incrementally. Therefore, methods which spend some time preprocessing patterns in order to improve the actual matching time are included.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--pattern matching; G.2.2 [Discrete Mathematics]: Graph Theory--trees

General Terms: Algorithms, Theory

Additional Key Words and Phrases: incremental pattern matching, bottom-up matching, top-down matching, subtree replacement systems, interpreter generation, theorem proving

1. Introduction

Many computing techniques involve simplifying expressions (trees) by repeatedly replacing special types of subexpressions (subtrees) according to a set of replacement rules. For example,

(1) Hoffmann and O'Donnell [14] show how tree replacements may be used in automatically generated interpreters for nonprocedural programming languages. The defining equations for the programming language are taken as the replacement rules. An interpreter may then process an input expression by replacing subexpressions according to the given rules until no more replacements are possible. Interpreters may be generated which are absolutely faithful to the semantics of the language as given by the defining equations. The tree-replacement approach is very convenient for producing interpreters for existing languages such as LISP and LUCID or for implementing experimental languages. Elsewhere, the merits of the language of equations as a programming language in its own right are examined [15].

(2) Guttag et al. [12] and Wand [34] suggest that defining equations may be treated as tree replacement rules to yield direct implementations of abstract data types. Guttag et al. [13] describe a working system based on this idea, as does Goguen [11]. Such a system does not differ in essence from the interpreters or equational programs in (1) but in this case would be embedded into a procedural language as a subroutine.

This work was supported in part by the National Science Foundation under Grant MCS 78-01812.
Authors' address: Department of Computer Science, Purdue University, West Lafayette, IN 47907.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1982 ACM 0004-5411/82/0100-0068 $00.75
Journal of the Association for Computing Machinery, Vol. 29, No. 1, January 1982, pp. 68-95.

(3) Intermediate code produced by a compiler may be represented by trees. Certain types of code optimizations, for example, the elimination of redundant operations and constant propagation, may be viewed as replacement rules [10, 16, 33].

(4) In [7] Collins represents algebraic terms as trees and formulates symbolic computation as tree replacements. The replacement rules formalize operations such as differentiation and certain algebraic simplifications.

(5) One approach to the automatic proving of equational theorems is to treat a set of equational axioms as replacement rules and transform one side of the equation to be proved into the other by a sequence of tree replacements. Knuth and Bendix [20] discuss some of the cases in which tree replacements yield efficient theorem provers. Most studies of equational theorem proving, such as [9, 22, 25, 31], have not used the replacement system approach. Chew [6] has recently developed an algorithm combining replacement systems with the methods of Nelson and Oppen [25].

Many of the theoretical properties of tree replacement systems have been studied in [3a, 11, 23, 26, 30]. In this paper we develop theoretically and practically efficient algorithms for one of the key technical issues in implementing replacement systems. An implementation of a tree replacement system requires practical solutions for the following:
(a) a method for finding subtrees which may be replaced;
(b) a way of choosing the next replacement to be performed;
(c) a way of actually replacing the subtree.
Part (c) is an easy programming problem; (b) is a question which is quite complicated in its theoretical effects. It has been treated abstractly in [26] and algorithmically in [14]. Part (a) is the subject of this paper.

A large part of the overhead in implementing tree replacements comes from the repeated searching for the next subtree to be replaced. This is essentially a tree-pattern-matching problem. We believe that good solutions to the problem of tree pattern matching are a prerequisite for making implementations based on tree replacements competitive in efficiency with ad hoc methods, especially in the realm of interpreters for nonprocedural languages.

Tree pattern matching is analogous to the problem of pattern matching in strings studied in [1, 4, 21]. We consider two essentially different ways of extending the Knuth-Morris-Pratt string-matching algorithm to tree patterns, each with several variations.

One may view first-order unification as a tree-pattern-matching problem [3, 28, 29]. However, first-order unification differs from the tree pattern matching considered here in that a pattern is matched against the entire subject tree and not against proper subtrees as well. Pattern matching in our sense has been studied in [18, 23, 24, 27]. With the exception of [23], these papers examine the problem without considering the specific requirements of subtree replacement systems. Karp et al. [18] give an algorithm which finds all matches of a pattern tree to subtrees of a subject. By preprocessing the pattern(s) involved we get more efficient methods. Recently, Overmars and van Leeuwen [27] have studied tree pattern matching, but with a different class of trees. They discovered independently many of the techniques we develop in Section 8, and their fastest algorithm has a performance equal to our Algorithm D. We discuss their results and the relationship to our work in Section 9. Kron's work [23] is related to the bottom-up techniques of Sections 3 and 4. We discuss the details at the end of Section 4.

In applications of tree replacements the same set of rules is typically used many times. Preprocessing of the rules is advantageous if it speeds up their application. Each replacement causes a local change in the subject tree. So our pattern-matching techniques should be able to respond incrementally to local changes in the subject to avoid repeated rescanning of the entire tree. For the sake of a simple presentation we discuss each algorithm in terms of a static subject first and then introduce adaptations to handle changing subjects.

In Section 2 we precisely define the matching problem and our criteria for a good solution. The remainder of the paper divides into two parts, corresponding to the two basic approaches we give. Sections 3-7 develop the bottom-up approach to pattern matching. Here we match in a subject tree by traversing it from the leaves to the root. This method is a significant generalization of the Knuth-Morris-Pratt string-matching algorithm. In Sections 8 and 9 we give our second approach, matching top down by traversing the subject root to leaves. While the bottom-up method generalizes string matching, the top-down method reduces tree matching to a string-matching problem.

The bottom-up method is characterized by more expensive preprocessing but faster matching and a better response to local changes. It is developed from the notion of match sets--sets of subpatterns which match at a particular tree node. The basic matching algorithm is introduced in Section 3. Properties of match sets are studied in Section 4. Since it turns out that certain tree patterns have exponentially many different match sets, which would lead to an exponential preprocessing algorithm, we introduce in Section 5 a restriction on tree patterns which allows efficient preprocessing algorithms. Section 6 gives the preprocessing algorithm and discusses its relationship with the preprocessing algorithms in [1, 21]. In Section 7 we sketch a better preprocessing algorithm for binary tree patterns.

Sections 8 and 9 give our top-down algorithm and discuss possible improvements. These algorithms have better preprocessing times than the bottom-up method, but the matching times and update behavior are inferior to the bottom-up method. Tree patterns are reduced to strings which are matched along paths in the subject, as in [18]. The preprocessing for this technique is little more than the preprocessing algorithm for string matching [1]. The basic idea of the top-down method lies in the use of counters for coordinating the matches of different path strings. This counting also turns out to be the limiting factor of the algorithm and is responsible for the worst-case bound. We can improve this bound on machines with bit-string operations, as indicated in Section 9.

For the restricted class of tree patterns introduced in Section 5 we have preprocessing algorithms which require

$O(\mathit{patsize}^2 + \mathit{patsize}^{\mathit{rank}} \times \mathit{ht})$

steps. Here patsize is the sum of the pattern sizes, ht the height of a specific tree which has to be constructed as part of preprocessing, and rank the highest rank in the alphabet. In the worst case ht may be as big as patsize. The actual match, bottom up, requires $O(\mathit{subsize} + \mathit{match})$ time, where subsize is the size of the subject tree and match is the number of matches found. For binary alphabets we have a preprocessing algorithm which requires only $O(\mathit{patsize} \times \mathit{ht}^2)$ steps when coupled with a modified bottom-up matching algorithm requiring

$O(\mathit{subsize} \times \mathit{ht} + \mathit{match})$.


        a
       / \
      a   b
     / \
    b   b

Figure 1

For top-down matching we have an $O(\mathit{patsize})$ preprocessing algorithm. Here we need no restrictions on the tree patterns. The matching requires

$O(\mathit{subsize} \times \mathit{suf} \times \mathit{patno})$

steps, where suf is a quantity depending on the structure of the pattern suffixes (at most equal to the maximum height of a pattern) and patno is the number of tree patterns to be matched. For machines with bit-string operations we can, within the same time bound for preprocessing, match using a different technique in only $O(\mathit{subsize} \times \mathit{patno})$ steps. If each pattern has a height not exceeding the number of bits in a machine word, then this algorithm is of practical importance.

In Section 10 we discuss other possibilities of bottom-up tree pattern matching on machines with bit-string operations, and a trade-off principle for matching time versus preprocessing time and space.

2. The Tree-Matching Problem

We are given a finite ranked alphabet Σ of function symbols, including constants as nullary functions. S denotes the set of Σ-terms, formally defined as follows.

Definition 2.1
(i) For all b in Σ of rank 0, b is a Σ-term.
(ii) If a is a symbol of rank q in Σ, then $a(t_1, \ldots, t_q)$ is a Σ-term provided each of the $t_i$ is.
(iii) Nothing else is a Σ-term.

We view Σ-terms as labeled ordered trees. Thus the term a(a(b, b), b) is the tree of Figure 1. Note that the trees a(a(b, b), b) and a(b, a(b, b)) are considered to be different. In the following we use "Σ-tree" and "Σ-term" interchangeably.
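Definition 2.1 can be made concrete with a small sketch. The representation below is an assumption made for illustration only: a term is a Python tuple whose first entry is the symbol and whose remaining entries are the sons in left-to-right order, and the dictionary RANK is a hypothetical ranked alphabet with a binary a and a nullary b.

```python
# Hypothetical ranked alphabet for illustration: symbol -> rank (arity).
RANK = {"a": 2, "b": 0}


def is_term(t, rank=RANK):
    """Check clauses (i)-(iii) of Definition 2.1 for a tuple-encoded term."""
    if not isinstance(t, tuple) or not t:
        return False
    symbol, *sons = t
    # The number of sons must equal the rank of the symbol.
    if rank.get(symbol) != len(sons):
        return False
    return all(is_term(s, rank) for s in sons)


def show(t):
    """Render a tuple-encoded term in the usual functional notation."""
    symbol, *sons = t
    if not sons:
        return symbol
    return f"{symbol}({', '.join(show(s) for s in sons)})"


b = ("b",)
left_heavy = ("a", ("a", b, b), b)   # the tree of Figure 1
right_heavy = ("a", b, ("a", b, b))  # same symbols, different ordered tree

print(show(left_heavy))              # a(a(b, b), b)
print(left_heavy == right_heavy)     # False
```

Because tuples compare position by position, the encoding respects the order of the sons, so the two trees above are distinct, as the text requires.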

We are also given a special nullary symbol v, not in Σ, to serve as placeholder for any Σ-tree. We define the set of $\Sigma \cup \{v\}$-terms just as Σ-terms but add to (i) that v is a $\Sigma \cup \{v\}$-term. $S_v$ denotes the set of $\Sigma \cup \{v\}$-terms.

Definition 2.2. A tree pattern is any term in $S_v$. If $b(t_1, \ldots, t_q)$ is a term, then define $\mathit{son}_i(b(t_1, \ldots, t_q))$ to be $t_i$ for $1 \le i$


FIG. 2. (a) Subject tree. (b) Pattern.

Definition 2.4 (The Matching Problem). A matching problem consists of a finite set of patterns $p_1, \ldots, p_k$ in $S_v$ and a subject tree t in S. A solution to a matching problem is a list of all the pairs (n, i), where n is a node in t and $p_i$ matches at n.

Our definition is motivated principally by algorithmic problems arising in the implementation of subtree replacement systems. Allowing different substitutions for different occurrences of v is equivalent to using a different variable symbol at each occurrence. This restriction is motivated by theoretical problems which arise when repeated variables are permitted in the specification of the replacement axioms [26, Sec. VII].

Note that $S_v$ contains S as a subset. Thus every Σ-tree is also a pattern. We develop our results assuming patterns contain at least one occurrence of v, since patterns without variable occurrences are uninteresting from a practical viewpoint. This assumption does not limit our results.

Our matching problem is in some ways more specific, and in some ways more general, than first-order unification. Our use of v corresponds to allowing terms with nonrepeated variables as patterns, while in first-order unification repeated variables are allowed and variables may also appear in the subject. On the other hand, in unification only two trees are matched against each other, and only at the root, whereas we match any number of patterns anywhere in the subject tree.

Definition 2.5. The size of a tree is the total number of subtrees (equivalently, nodes) in it. The size of a forest is the sum of the sizes of all trees in it. The height of a tree is the number of edges in a longest path from the root to a leaf of the tree.

We are especially interested in applications in which the set of patterns remains fixed and is to be matched against a sequence of subject trees. We therefore consider preprocessing the tree patterns and distinguish preprocessing time, involving operations on the patterns independent of any subject tree, and matching time, involving all subject-dependent operations. Minimizing matching time is the first priority. Preprocessing time is then minimized with respect to a fixed process for matching. Trade-offs between preprocessing time and matching time are considered if the improvement in preprocessing is dramatic and the degradation in matching is small. We also consider the space requirements in preprocessing and matching.

We are especially interested in algorithms which may deftly be adapted to assimilate local changes to the subject without rescanning the entire tree. For bottom-up matching we achieve linear matching times, but preprocessing time may be exponential. To keep bottom-up preprocessing time polynomial, we need some additional constraints on patterns. For top-down matching we lower the preprocessing time to linear, with no restrictions on patterns, at the cost of a slight increase in matching time. The bottom-up method adapts more easily to changes in the subject.

For the remainder of this paper, complexities will be expressed in terms of

patno: the number of different patterns involved
patsize: the size of the pattern forest
subsize: the size of the subject tree
sym: the number of symbols in the alphabet Σ
rank: the highest rank (arity) of any symbol in Σ
match: the number of matches which are found
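As a baseline for these measures, the matching problem of Definition 2.4 can be solved by brute force: try every pattern at every node of the subject. The sketch below is illustrative only. It assumes the tuple encoding of terms (symbol first, then sons), takes a pattern to match at a node when replacing each occurrence of v independently by some Σ-tree yields the subtree at that node, and identifies a node by its path of son indices from the root. The two patterns and the subject are hypothetical sample data.

```python
V = ("v",)  # the placeholder symbol v


def matches_at(pattern, subject):
    """True if the pattern matches at the root of the subject tree."""
    if pattern == V:
        return True  # v stands for any tree
    if pattern[0] != subject[0] or len(pattern) != len(subject):
        return False
    return all(matches_at(p, s) for p, s in zip(pattern[1:], subject[1:]))


def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path lists son indices from the root."""
    yield path, tree
    for i, son in enumerate(tree[1:], start=1):
        yield from nodes(son, path + (i,))


def naive_solution(patterns, subject):
    """All pairs (n, i) such that pattern number i matches at node n."""
    return [(path, i)
            for path, subtree in nodes(subject)
            for i, p in enumerate(patterns, start=1)
            if matches_at(p, subtree)]


p1 = ("a", ("a", V, V), ("b",))                  # a(a(v, v), b)
p2 = ("a", ("b",), V)                            # a(b, v)
subject = ("a", ("a", ("b",), ("c",)), ("b",))   # a(a(b, c), b)

print(naive_solution([p1, p2], subject))         # [((), 1), ((1,), 2)]
```

Each call to matches_at visits at most as many nodes as the pattern has, and every pattern is tried once at each subject node, so the work is bounded by the product of patsize and subsize.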

All suggested methods for tree matching should be compared to the naive algorithm (based on a simple form of unification), which merely tries every pattern at every position in the subject tree. The naive algorithm does no preprocessing but takes $O(\mathit{patsize} \times \mathit{subsize})$ matching time.

3. The Bottom-Up Matching Algorithm

The key idea of the bottom-up matching algorithm is to find, at each point in the subject tree, all patterns and all parts of patterns which match at this point. Let n be a node in the subject labeled with the q-ary symbol b, and suppose we wish to compute the set M of all those pattern subtrees other than v which match at n in the sense of Definition 2.3. (Since v matches anywhere, we always have a match of v.) Suppose we have already computed such sets for each of the sons of n, and call these sets, from left to right, $M_1, \ldots, M_q$. Then M contains v plus exactly those pattern subtrees $b(t_1, \ldots, t_q)$ such that $t_i$ is in $M_i$ for $1 \le i \le q$. Therefore we could compute M by forming trees $b(t_1, \ldots, t_q)$ for all combinations $(t_1, \ldots, t_q)$, where the $t_i$ are chosen from $M_i$, and then asking whether each candidate for membership in M is a subpattern. Once we have assigned these sets to each node in the subject tree, we have essentially solved the matching problem, since each match is signaled by the presence of a complete pattern in some set.

Note that there can be only finitely many such sets M, because both Σ and the set of subpatterns are finite. Thus we could precompute these sets, code them by some enumeration, and then construct tables. Given a node symbol b and the codes of the $M_i$, these tables give the code for M. In the case of a q-ary symbol b, we would have a q-dimensional matrix for that symbol.
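The precomputation just described can be sketched as a closure computation: start with no sets, repeatedly combine already known sets under every symbol, and stop when nothing new appears. This is a straightforward illustration and not an efficient preprocessing method. The tuple encoding of terms, the function names, and the dictionary-based table layout are assumptions of the sketch; the alphabet and patterns are those of Example 3.1.

```python
from itertools import product

V = ("v",)
# Alphabet of Example 3.1: a is binary, b and c are nullary.
RANK = {"c": 0, "b": 0, "a": 2}
P1 = ("a", ("a", V, V), ("b",))  # a(a(v, v), b)
P2 = ("a", ("b",), V)            # a(b, v)


def subtrees(t):
    """Yield t and all of its subtrees."""
    yield t
    for son in t[1:]:
        yield from subtrees(son)


def combine(symbol, son_sets, pattern_subtrees):
    """The set M for a node labeled `symbol` whose sons carry the sets `son_sets`."""
    found = {V}  # v matches anywhere
    for p in pattern_subtrees:
        if p != V and p[0] == symbol and len(p) - 1 == len(son_sets):
            if all(t in m for t, m in zip(p[1:], son_sets)):
                found.add(p)
    return frozenset(found)


def build_tables(patterns, rank):
    """Enumerate the reachable sets (code k is sets[k - 1]) and fill the tables."""
    pattern_subtrees = {s for p in patterns for s in subtrees(p)}
    sets, table = [], {}
    changed = True
    while changed:
        changed = False
        for symbol, q in rank.items():
            for codes in product(range(1, len(sets) + 1), repeat=q):
                if (symbol, codes) in table:
                    continue
                m = combine(symbol, [sets[c - 1] for c in codes], pattern_subtrees)
                if m not in sets:
                    sets.append(m)
                table[(symbol, codes)] = sets.index(m) + 1
                changed = True
    return sets, table


sets, table = build_tables([P1, P2], RANK)
print(len(sets))               # 5
print(table[("a", (3, 2))])    # 5
```

With the symbol order chosen in RANK, the codes produced for these two patterns happen to coincide with the set numbering used in Example 3.1; a different order would give the same sets under different codes.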

Given such tables, the matching algorithm becomes trivial: Traverse the subject tree in postorder and assign to each node n the code c representing the set of partial matches at n as discussed. The tables consist of arrays, one for each alphabet symbol. If node n is labeled with the q-ary symbol b, then the q-dimensional array for b is used. The code c at n is the value indexed by the tuple $(c_1, \ldots, c_q)$, where $c_i$ is the code assigned to the ith son of n (from the left). If the set represented by c contains the pattern $p_i$, then the pair (n, i) is added to the solution.

The matching time of this algorithm is clearly $O(\mathit{subsize})$ for computing all codes plus $O(\mathit{match})$ for listing the solution. The constant of linearity involves one array reference for computing the codes, a single test to determine whether a complete pattern match is present, plus the overhead for the postorder traversal. Note that the codes may be assigned so that all codes indicating matches are contiguous. The space requirements depend on the table size and are discussed in Section 4.

Example 3.1. Consider a matching problem in which the patterns

$p_1 = a(a(v, v), b)$ and $p_2 = a(b, v)$

are to be matched. Assume the alphabet Σ is {a, b, c}, where a is binary and b and c are nullary symbols. For reasons to be explained later, of the thirty-two possible sets of pattern subtrees only the following five can arise as a result of matching:

Set 1 = {v},
Set 2 = {b, v},
Set 3 = {a(v, v), v},
Set 4 = {a(b, v), a(v, v), v},
Set 5 = {a(a(v, v), b), a(v, v), v}.


Table for node label a.

                 Right son
    Left son   1   2   3   4   5
       1       3   3   3   3   3
       2       4   4   4   4   4
       3       3   5   3   3   3
       4       3   5   3   3   3
       5       3   5   3   3   3

Table for node label b: 2
Table for node label c: 1

Figure 3

[Figure 4: a subject tree with the code assigned to each node.]

Thus, assigning a 4 to some node n of a subject would indicate that each of the members of Set 4 matches at n. In particular, $p_2$ matches. Assigning 5 implies a match of $p_1$.

Figure 3 shows the tables for a, b, and c. For instance, the entry at (3, 2) in the table for a is 5, because at the left son we have a match of both a(v, v) and v, and at the right son of b and of v. For the nullary symbols b and c the tables are 0-dimensional, consisting of one entry each.

Figure 4 shows the complete assignment of codes when using the bottom-up algorithm with these tables. Note that $p_1$ matches at the node with code 5 and $p_2$ at the node with code 4. □
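The table-driven traversal can be sketched directly from the tables of Figure 3. The code below is a minimal illustration that hard-codes those tables for the alphabet {a, b, c}. The tuple encoding of trees, the path-based node names, and the subject a(a(b, c), b) are assumptions chosen for the sketch and are not taken from Figure 4.

```python
# Table for node label a from Figure 3: TABLE_A[left - 1][right - 1].
TABLE_A = [
    [3, 3, 3, 3, 3],
    [4, 4, 4, 4, 4],
    [3, 5, 3, 3, 3],
    [3, 5, 3, 3, 3],
    [3, 5, 3, 3, 3],
]
# 0-dimensional tables for the nullary symbols.
NULLARY = {"b": 2, "c": 1}
# Codes whose sets contain a complete pattern: Set 4 holds p2, Set 5 holds p1.
COMPLETE = {4: [2], 5: [1]}


def assign_codes(tree, path, codes):
    """Postorder traversal: code the sons first, then look up the code of the node."""
    symbol, *sons = tree
    son_codes = [assign_codes(son, path + (i,), codes)
                 for i, son in enumerate(sons, start=1)]
    if symbol == "a":
        left, right = son_codes
        code = TABLE_A[left - 1][right - 1]
    else:
        code = NULLARY[symbol]
    codes[path] = code
    return code


def solve(subject):
    """Return the code of every node and the list of pairs (n, i)."""
    codes = {}
    assign_codes(subject, (), codes)
    matches = [(path, i)
               for path, code in codes.items()
               for i in COMPLETE.get(code, [])]
    return codes, matches


subject = ("a", ("a", ("b",), ("c",)), ("b",))  # a(a(b, c), b)
codes, matches = solve(subject)
print(codes[()], codes[(1,)])  # 5 4
print(matches)                 # [((1,), 2), ((), 1)]
```

Each node costs one table lookup and one membership test, which is the constant of linearity mentioned above; the solution is listed in postorder because the dictionary preserves insertion order.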

There is some similarity between bottom-up matching and formal parsing methods such as LR(k) parsing. In both cases a finite number of possible configurations are precomputed, and tables are formed to drive the parsing/matching process. As with LR(k) parsing, our tables will sometimes be very large, but we isolate a significant class of problems in which the table size is kept small.

When a local change is made to a subject tree, matching codes must be recomputed for the changed portion and some ancestors of the changed portion. In Section 4 we see that the number of ancestors whose codes must be recomputed is bounded by the largest height of a pattern. Note that in these ancestors new matches could appear or old matches disappear. Thus it seems intuitively unlikely that any method could update with less recomputation.

4. Pattern Relations and Match Sets

We now turn to studying the sets of partial matches used in the bottom-up matching algorithm of Section 3. We begin by precisely defining these sets and deriving properties which we will later exploit in designing good preprocessing algorithms.

    D e f i n i t i o n 4 .1 . L e t F = ( p l . . . . . p k } b e a s e t o f p a t t e r n s i n S v a n d P F t h e s e t o fa l l su b t r e e s o f t h e p , . A s u b s e t M o f P F is a m a t c h s e t f o r F i f t h e r e e x i s t s a t r e e t i nS s u c h th a t e v e r y p a t t e r n i n M m a t c h e s t a t t h e r o o t a n d e v e r y p a t t e r n m P F - Md o e s n o t m a t c h t a t t h e r o o t .

Note that if v is in PF, then v is in every match set. Observe also that the concept of match sets depends on the pattern forest F.

Example 4.1. Consider the pattern forest $F = \{p_1, p_2\}$, where $p_1$ and $p_2$ are as in Example 3.1. Then the set M = {a(b, v), a(v, v), v} is a match set because of the tree a(b, c). However, M' = {a(b, v), v} is not a match set, because a match of a(b, v) implies a match of a(v, v) at the same node. □

Observe that the set of all possible match sets contains all sets which the bottom-up matching algorithm could assign (in encoded form) in any subject tree, given the pattern forest F.

Given F, let Match(t) denote the match set which must be assigned at the root of the subject tree t. PF is the set of all pattern subtrees from F. We can now formally state the two properties on which the bottom-up matching algorithm is based.

Definition 4.2

(1) If a is a nullary symbol, then

$$\mathrm{Match}(a) = \begin{cases} \{a, v\} & \text{if } a \text{ is in } PF, \\ \{v\} & \text{otherwise.} \end{cases}$$

(2) If a is q-ary, q > 0, then

$$\mathrm{Match}(a(t_1, \ldots, t_q)) = \{v\} \cup \{\, p' \mid p' \text{ has root } a \text{ and is in } PF, \text{ and for } 1 \le j \le q,\ \mathrm{son}_j(p') \text{ is in } \mathrm{Match}(t_j) \,\}.$$

Note that because of (2), Match(t) does not depend on any node in t whose distance from the root exceeds the maximum height of a pattern. Because of this and the manner in which codes are assigned, the bottom-up matching algorithm responds well to local changes in a subject tree. See [15] for details.
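The two rules of Definition 4.2 translate almost directly into a recursive function. The following is a minimal sketch, not the table-driven algorithm of Section 3: it recomputes match sets as explicit sets of patterns rather than as codes. The tuple encoding of trees and the names `V`, `subtrees`, and `match_set` are assumptions made for illustration, and the forest is the one that Example 5.1 uses.

```python
V = "v"  # the variable; every other node is a tuple (symbol, son_1, ..., son_q)

def subtrees(p):
    """Yield p and all of its subtrees."""
    yield p
    if p != V:
        for son in p[1:]:
            yield from subtrees(son)

def match_set(t, PF):
    """Match(t) for a variable-free subject tree t, following Definition 4.2."""
    son_sets = [match_set(s, PF) for s in t[1:]]
    result = {V}  # v belongs to every match set
    for p in PF:
        if p == V or p[0] != t[0] or len(p) != len(t):
            continue  # p must have the same root symbol (and arity) as t
        # Rule (2): each son of p must be in the match set of the matching son of t.
        # For a nullary symbol there are no sons, which gives rule (1).
        if all(p[j + 1] in son_sets[j] for j in range(len(son_sets))):
            result.add(p)
    return frozenset(result)

b, c = ("b",), ("c",)
forest = [("a", ("a", V, V), b), ("a", b, V)]
PF = {s for p in forest for s in subtrees(p)}
print(match_set(("a", b, c), PF))
```

For the subject tree a(b, c) the function returns the set {a(b, v), a(v, v), v}, which is the match set M of Example 4.1.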

In principle, the required enumeration of sets and tables may be generated by a simple closure strategy which starts with Match(a) for all nullary symbols a and repeatedly closes under the operation (2) of Definition 4.2. Such an algorithm would require

$$O(\mathit{set}^{\mathit{rank}+1} \times \mathit{sym} \times \mathit{patsize})$$

time, where set is the number of distinct match sets generated. The table size would be $O(\mathit{set}^{\mathit{rank}} \times \mathit{sym})$. In order to improve this time limit and to bound the size of set, which could be as bad as $O(2^{\mathit{patsize}})$, we need to understand certain relations between patterns and members of match sets. We define the following relations on tree patterns.

Definition 4.3. Let p and p' be patterns in $S_v$. Then p is inconsistent with p' (written $p \parallel p'$) if there is no subject tree t in S with both p and p' in Match(t). p and p' are independent (written $p \sim p'$) if there are trees $t_1$, $t_2$, $t_3$ in S such that p is in Match($t_1$), p' is not in Match($t_1$), p is not in Match($t_2$), p' is in Match($t_2$), and p and p' are in Match($t_3$). p subsumes p' ($p \ge p'$) if, for all t in S, p in Match(t) implies that p' is in Match(t). p strictly subsumes p' ($p > p'$) if $p \ge p'$ and $p \ne p'$. $p < p'$ iff $p' > p$.

Example 4.2. $a(b, v) \parallel a(c, v)$, since b and c cannot both be matched in the same position. $a(b, v) \sim a(v, c)$, since a(b, v) is in Match(a(b, b)) and a(v, c) is not in Match(a(b, b)); a(b, v) is not in Match(a(c, c)) and a(v, c) is in Match(a(c, c)); a(b, v) is in Match(a(b, c)) and a(v, c) is in Match(a(b, c)). Finally, $a(b, v) > a(v, v)$. □
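The relations of Definition 4.3 are stated in terms of all subject trees, but for patterns whose only variable is v they can be decided by comparing the two patterns node by node. The sketch below is one way to do this; it assumes the alphabet contains enough symbols that a variable position can always be filled by a subtree that the other pattern fails to match. The function names and the tuple encoding are illustrative assumptions, not part of the paper.

```python
V = "v"  # the variable; other nodes are tuples (symbol, son_1, ..., son_q)

def subsumes(p, q):
    """p >= q: every subject tree matched by p is also matched by q."""
    if q == V:
        return True
    if p == V or p[0] != q[0] or len(p) != len(q):
        return False
    return all(subsumes(ps, qs) for ps, qs in zip(p[1:], q[1:]))

def inconsistent(p, q):
    """No subject tree is matched by both p and q."""
    if p == V or q == V:
        return False
    if p[0] != q[0] or len(p) != len(q):
        return True  # different root symbols can never match the same node
    return any(inconsistent(ps, qs) for ps, qs in zip(p[1:], q[1:]))

def relation(p, q):
    """Classify two distinct patterns as in Definition 4.3."""
    if inconsistent(p, q):
        return "inconsistent"
    if subsumes(p, q):
        return ">"
    if subsumes(q, p):
        return "<"
    return "independent"

b, c = ("b",), ("c",)
print(relation(("a", b, V), ("a", c, V)))  # the first pair of Example 4.2
print(relation(("a", b, V), ("a", V, c)))  # the second pair
print(relation(("a", b, V), ("a", V, V)))  # the third pair
```

On the three pairs of Example 4.2 the classifier reports inconsistency, independence, and strict subsumption, in that order.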

Figure 5. [Independence graph with vertices $p_1, \ldots, p_6$.]

Given distinct patterns p and p', exactly one of the relations $\parallel$, $\sim$, $>$, and $<$ must hold between p and p'. The elementary properties of the three relations are summarized below. Note that in the absence of variables distinct patterns must be inconsistent.

PROPOSITION 4.1. For trees $p_1$, $p_2$, $p_3$ in $S_v$:
(a) $p_1 > p_2$ and $p_2 > p_3$ implies $p_1 > p_3$;
(b) $p_1 \parallel p_2$ iff $p_2 \parallel p_1$;
(c) $p_1 \sim p_2$ iff $p_2 \sim p_1$;
(d) $p_1 \parallel p_2$ and $p_3 > p_2$ implies $p_1 \parallel p_3$;
(e) $p_1 \sim p_2$ and $p_2 > p_3$ implies $p_1 \sim p_3$ or $p_1 > p_3$.

Recall that M' of Example 4.1 is not a match set because a(b, v) subsumes a(v, v). The inclusion of one pattern (e.g., a(v, v)) in M may be the consequence of the presence of another pattern which subsumes it (e.g., a(b, v)). Therefore, there may be a subset of patterns in M which completely determines M. We partition each match set M into a set $M_0$ of pairwise independent trees and a set $M_1$ of trees subsumed by some tree in $M_0$. $M_0$ is called the base of M.

PROPOSITION 4.2. Given a pattern forest F and match set M for F, there is a unique partition of M into sets $M_0$ and $M_1$ such that for distinct $p_1$, $p_2$ in $M_0$, $p_1 \sim p_2$ holds, and for each p' in $M_1$ there is a p in $M_0$ such that $p > p'$.

Observe that different match sets must have different base sets, owing to Proposition 4.1a. Thus we may represent match sets by their base sets.

Definition 4.4. Given a pattern forest F, the independence graph $G_I$ of F is as follows: The vertices of $G_I$ are distinct trees in PF. There is an undirected edge between p and p' iff $p \sim p'$.

Example 4.3. Consider the pattern forest $F = \{p_1, p_2, p_3\}$, where $p_1 = a(b(b(v)), v)$, $p_2 = a(b(v), b(v))$, and $p_3 = a(v, b(b(v)))$. There are three additional trees in PF: $p_4 = b(b(v))$, $p_5 = b(v)$, and $p_6 = v$. Since the trees $p_1$, $p_2$, $p_3$ are pairwise independent, whereas no other tree pairs are, the independence graph $G_I$ of F is as shown in Figure 5, with a connected component $p_1$, $p_2$, $p_3$ and three isolated points. □

From the independence graph we can derive an upper bound on the number of possible match sets of a given pattern forest.

THEOREM 4.3. The number of possible match sets of a pattern forest F is at most the number of cliques in the independence graph $G_I$ of F, counting all subcliques, including the trivial ones.

This theorem follows easily from Proposition 4.2. To illustrate it, consider F of Example 4.3. The theorem would limit the number of match sets of F to ten, for $G_I$ has six trivial cliques, three cliques of size 2, and one clique of size 3. We would thus expect six match sets with a base set of a singleton, three match sets with base sets consisting of two trees each, and one match set with a base set of three elements. However, in this example there is no match set with the base $\{p_1, p_3\}$, since matching

Note that mutual subsumption, in opposite directions, of disjoint subtrees is necessary but not sufficient for independence, since it does not rule out the possibility that other subtrees are inconsistent. For example, a(b, v, c) and a(v, b, d) are inconsistent, yet there are disjoint subtree pairs satisfying the "only if" condition of Proposition 4.5.

Proposition 4.5 is used when testing the restrictions imposed on tree patterns in the next section.

We have recently learned that the idea of bottom-up tree pattern matching was discovered independently by Kron [23]. He calls match sets "batches" and defines the relations $>$, $\parallel$, $\sim$ (which he calls "more specific than," "not overlapping," and "intersecting," respectively) equivalently by containment and intersection properties of the sets of Σ-terms which two patterns match at the root.

He matches patterns in a subject tree using an automaton as well. Instead of using matrices as tables, however, he computes the match set to be assigned to node n with q sons by a subautomaton which, in q transition steps reading the match set codes of the sons, determines the code for the new match set. There is one subautomaton per alphabet symbol. As a result, his match time is O(subsize). One can visualize each subautomaton as a trie encoding of one of our matrices. Depending on the pattern structure, this leads to smaller space requirements in certain cases.

The preprocessing of Kron is essentially the method sketched in the paragraphs following Definition 4.2. Because of Theorem 4.4, this preprocessing takes time exponential in the pattern size in the worst case. As Kron tells us, he was aware of this, but it was not a concern of his research in [23]. We are going further and analyzing match sets, seeking a definition of a subclass of tree patterns with polynomial preprocessing time. We give such a definition in the following section.

Preprocessing in Kron's sense has been used in practical situations by Wilhelm [10]. Since this work seems to accomplish practically viable preprocessing times, we conclude that the exponential worst case of bottom-up matching does not arise frequently in these applications.

5. Simple Pattern Forests

Because of the exponential growth of the number of match sets for certain pattern forests (Theorem 4.4), we wish to restrict patterns when generating tables to drive the bottom-up matching algorithm of Section 3. Theorem 4.3 suggests disallowing independence among pattern subtrees. This restriction is not as drastic as it might seem and has not seriously hindered us when generating interpreters for LISP, LUCID, and the Combinator Calculus using these techniques [14].

Definition 5.1. A pattern forest F is simple if it contains no independent subtrees.

For simple forests, the independence graph has no edges; hence, by Theorem 4.3, the number of distinct match sets is at most the size of the forest. Furthermore, simple forests have a number of useful properties which can be exploited in the design of efficient matching algorithms.

Definition 5.2. If F is a pattern forest, and p, p' are subpatterns in PF, then p immediately subsumes p', $p >_i p'$, iff $p > p'$ and there is no other subpattern p'' in PF such that $p > p''$ and $p'' > p'$. Immediate subsumption is the transitive reduction of subsumption on the set of all subpatterns of F.

Definition 5.3. The immediate subsumption graph $G_S$ of the forest F has as vertices all distinct subpatterns in F. There is a directed edge from p to p' iff $p >_i p'$. In general, $G_S$ is a directed acyclic graph with v as the only leaf.
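Definitions 5.2 and 5.3 can be made concrete by computing the immediate subsumption edges by brute force. The sketch below assumes the structural subsumption test (valid for patterns whose only variable is v) and a tuple encoding of trees; `subsumes`, `subtrees`, and `immediate_subsumption` are illustrative names. It takes time quadratic or worse in the number of subpatterns, so it demonstrates the definitions rather than an efficient construction.

```python
V = "v"  # the variable; other nodes are tuples (symbol, son_1, ..., son_q)

def subtrees(p):
    """Yield p and all of its subtrees."""
    yield p
    if p != V:
        for son in p[1:]:
            yield from subtrees(son)

def subsumes(p, q):
    """p >= q, decided structurally."""
    if q == V:
        return True
    if p == V or p[0] != q[0] or len(p) != len(q):
        return False
    return all(subsumes(ps, qs) for ps, qs in zip(p[1:], q[1:]))

def immediate_subsumption(forest):
    """Map each subpattern p to the list of q with p >_i q (Definition 5.2)."""
    PF = {s for p in forest for s in subtrees(p)}
    edges = {}
    for p in PF:
        below = [q for q in PF if q != p and subsumes(p, q)]  # all q with p > q
        # Keep q only if no other r satisfies p > r > q.
        edges[p] = [q for q in below
                    if not any(r != q and subsumes(r, q) for r in below)]
    return edges

b = ("b",)
G_S = immediate_subsumption([("a", ("a", V, V), b), ("a", b, V)])
for p, targets in G_S.items():
    print(p, "->", targets)
```

For the forest {a(a(v, v), b), a(b, v)}, both patterns point to a(v, v), while a(v, v) and b each point to v, and v has no outgoing edge.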

FIG. 6. The immediate subsumption graph of F. [Vertices include a(a(v, v), b) and a(b, v).]

LEMMA 5.1. The immediate subsumption graph $G_S$ of a simple forest F is an inverted tree with v as root.

PROOF. Let p, p', and p'' be distinct subtrees in F, and assume that p subsumes both p' and p'', but neither $p' > p''$ nor $p'' > p'$. Since p subsumes both trees, $p' \parallel p''$ is impossible (Proposition 4.1d); hence p' and p'' must be independent. But then F cannot be simple. Hence either $p' > p''$ or $p'' > p'$. □

Observe that for simple forests, the base set $M_0$ of any match set must be a singleton. Using Lemma 5.1 and Proposition 4.2, we thus easily obtain

THEOREM 5.2. Let F be a simple forest and M any match set for F with base set {p}. Then M consists precisely of the trees encountered on the path from p to v in $G_S$.

This theorem is the central result for simple forests. It frees us from having to construct explicitly the individual match sets, for $G_S$ provides them at once along with their structure and interrelation. We conclude the section with an example illustrating Theorem 5.2, and a discussion of the relationship between $G_S$ and the failure function f constructed in the algorithm for string pattern matching in [1, 21].

Example 5.1. The pattern forest F = {a(a(v, v), b), a(b, v)} is simple, since there are no independent trees or subtrees. Its immediate subsumption relation is

$b >_i v$, $a(v, v) >_i v$,
$a(b, v) >_i a(v, v)$, $a(a(v, v), b) >_i a(v, v)$,

which has the graph $G_S$ shown in Figure 6. From this graph we then obtain as possible match sets the five sets of Example 3.1:

{v},
{b, v},
{a(v, v), v},
{a(b, v), a(v, v), v},
{a(a(v, v), b), a(v, v), v}.

Note the correspondence of these sets to the paths in $G_S$. □

There is a connection between the immediate subsumption graph $G_S$ and the failure function f used in string-pattern-matching algorithms in [1, 21]. This connection is observed by visualizing a string pattern $a_1 a_2 \ldots a_m$ as the nonbranching tree $a_m(\ldots a_2(a_1(v)) \ldots)$. Note the reversal of the character sequence. The addition of v as a leaf permits us to conceptualize the $a_i$ as symbols of arity 1 and permits sliding the nonbranching tree in the subject. Matching this pattern in the subject $b_1 b_2 \ldots b_n$ is now equivalent to matching the nonbranching tree pattern in the tree $b_n(\ldots b_2(b_1(c)) \ldots)$, where c is a new nullary symbol. Having translated the string-matching problem into a tree-matching problem in this way, we now observe that $G_S$ is just the graph of the failure function f constructed for the original string problem by the algorithms in [1, 21]. To observe this, note that a subtree corresponds to a

pattern prefix, and that $p > p'$ iff p' is a pattern prefix which matches, as suffix, in the pattern prefix p. Hence $p >_i p'$ iff p' is the longest proper prefix of p which matches, as suffix, in the prefix p, which is just the definition of the failure function. Note also that because of Proposition 4.5, pattern forests derived from string patterns must be simple, because nonbranching trees cannot have disjoint subtrees. Hence there is no counterpart in string matching to the exponential explosion of match sets, which can occur for nonsimple forests in tree matching.

6. Table Construction for Simple Forests

For a simple pattern forest F, the tables to drive the bottom-up algorithm of Section 3 may be constructed in two steps. First, construct the subsumption graph $\tilde{G}_S$ whose vertices are the trees in PF. $\tilde{G}_S$ has a directed edge from p to p' iff $p \ge p'$. Observe that this is equivalent to finding all match sets which can occur when matching in any subject. Then, for each alphabet symbol a of arity m, we use $\tilde{G}_S$ to construct a table $T_a$ such that $T_a[p_1, \ldots, p_m]$ is the match-set code which should be assigned to any node labeled a at whose sons we have assigned the match-set codes $p_1$ to $p_m$ from left to right, respectively.

We find it convenient to represent a match set M by its base set tree, that is, by the largest (in the sense of >) tree in M. This is a reasonable choice since, by Proposition 4.2 and Theorem 5.2, the largest tree in M completely determines M. The advantage of this coding is that we can now define the entry $T_a[p_1, \ldots, p_m]$ as the largest tree in PF subsumed by $a(p_1, \ldots, p_m)$, because of observation (2) below. Note that the tree $a(p_1, \ldots, p_m)$ need not occur in PF.
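The definition of a table entry as the largest tree in PF subsumed by $a(p_1, \ldots, p_m)$ can be evaluated directly, one entry at a time. The sketch below does exactly that by brute force; it is not the paper's table-generation algorithm, and the names `build_table`, `subsumes`, and `subtrees`, as well as the tuple encoding, are assumptions for illustration. It relies on the forest being simple, so that the subsumed trees form a chain with a unique largest element.

```python
from itertools import product

V = "v"  # the variable; other nodes are tuples (symbol, son_1, ..., son_q)

def subtrees(p):
    yield p
    if p != V:
        for son in p[1:]:
            yield from subtrees(son)

def subsumes(p, q):
    """p >= q, decided structurally."""
    if q == V:
        return True
    if p == V or p[0] != q[0] or len(p) != len(q):
        return False
    return all(subsumes(ps, qs) for ps, qs in zip(p[1:], q[1:]))

def build_table(sym, arity, PF):
    """T_sym[p_1, ..., p_m] = largest tree in PF subsumed by sym(p_1, ..., p_m)."""
    table = {}
    for args in product(PF, repeat=arity):
        t = (sym, *args)  # this tree need not occur in PF
        candidates = [p for p in PF if subsumes(t, p)]  # always contains v
        # In a simple forest the candidates form a chain; pick its top element.
        table[args] = next(p for p in candidates
                           if all(subsumes(p, q) for q in candidates))
    return table

b = ("b",)
avv, abv, aavvb = ("a", V, V), ("a", b, V), ("a", ("a", V, V), b)
PF = sorted({s for p in [aavvb, abv] for s in subtrees(p)}, key=repr)
T_a = build_table("a", 2, PF)
print(T_a[(avv, b)])
```

With the forest {a(a(v, v), b), a(b, v)}, the entry for left son a(v, v) and right son b comes out as a(a(v, v), b), and every entry whose left son is b comes out as a(b, v). For a forest that is not simple the chain assumption fails and `next` may raise `StopIteration`.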

To construct $\tilde{G}_S$, observe that for distinct patterns p, p',

(1) If $p > p'$, then height(p) ≥ height(p').
(2) Let $p = a(p_1, \ldots, p_m)$. Then $p > p'$ iff either p' = v or $p' = a(p'_1, \ldots, p'_m)$, where $p_j \ge p'_j$ for $1 \le j \le m$.

So we may process patterns in order of increasing height and compare each pattern to all patterns of no greater height using observation (2). Since the subpatterns $p_j$ and $p'_j$ in (2) above are of strictly smaller height than p and p', respectively, $p_j \ge p'_j$ has already been checked by the time p is compared to p'.

Algorithm A
Input: Simple pattern forest F.
Output: Subsumption graph $\tilde{G}_S$ for F.
Method:
1. List the trees in PF by increasing height.
2. Initialize $\tilde{G}_S$ to the graph with vertices PF and no edges.
3. For each $p = a(p_1, \ldots, p_m)$, m ≥ 0, of height h, by increasing order of height, do
4. for each p' in PF of height ≤ h do
5. If p' = v or $p' = a(p'_1, \ldots, p'_m)$ where, for 1

To generate the table $T_a$, recall that for the m-ary symbol a and trees $p_1, \ldots, p_m$ in PF, $T_a[p_1, \ldots, p_m] = p$, where p is the largest (in the sense of >) tree in PF such that $a(p_1, \ldots, p_m) \ge p$. This can be seen as follows. If $a(p_1, \ldots, p_m) \ge t$, then either t = v or $t = a(p'_1, \ldots, p'_m)$ and, for 1 ≤ i

TABLE I. TABLE $T_a$ GENERATED FOR THE SYMBOL a
(rows: left subtree match; columns: right subtree match)

| Left \ Right  | v       | b             | a(v, v) | a(b, v) | a(a(v, v), b) |
| v             | a(v, v) | a(v, v)       | a(v, v) | a(v, v) | a(v, v)       |
| b             | a(b, v) | a(b, v)       | a(b, v) | a(b, v) | a(b, v)       |
| a(v, v)       | a(v, v) | a(a(v, v), b) | a(v, v) | a(v, v) | a(v, v)       |
| a(b, v)       | a(v, v) | a(a(v, v), b) | a(v, v) | a(v, v) | a(v, v)       |
| a(a(v, v), b) | a(v, v) | a(a(v, v), b) | a(v, v) | a(v, v) | a(v, v)       |

subtrees in corresponding positions which subsume each other in opposite directions. If such a pair exists, then the pattern forest is not simple.

Example 6.1. We illustrate Algorithm B with the table $T_a$ generated for the symbol a, given the pattern forest of Example 5.1. The table is essentially that of Example 3.1; however, for readability we represent entries and index values by trees, rather than enumerating them.

In this example, all table entries are assigned by step 5, so none of them is v. Consider p = a(a(v, v), b) in the traversal of step 3. The m-tuples of steps 4 and 5 now range over the sets $p_1$ in {a(v, v), a(a(v, v), b), a(b, v)}, since a(a(v, v), b) and a(b, v) are the two trees subsuming a(v, v), and $p_2$ in {b}, since there is no other tree subsuming b. So a(a(v, v), b) is entered in $T_a$[a(v, v), b], $T_a$[a(a(v, v), b), b], and $T_a$[a(b, v), b]. The entry $T_a$[a(v, v), b] had already been assigned the smaller pattern a(v, v), since a(v, v) > v and b > v, but this entry is overwritten by a(a(v, v), b) at this time. Table I shows the table $T_a$. □

Clearly Algorithm B constitutes the bottleneck of preprocessing, both in space and in time requirements. Often the situation can be improved by introducing one or more pairing functions, thereby reducing rank to 2. Although pairing is always possible, it need not preserve simplicity of the forest and is thus of limited value.

Example 6.2. Consider the pattern forest {a(b, v, c), a(v, b, d), a(e, c, v)}. All subtrees other than v are pairwise inconsistent, and thus the forest is simple. Introducing a pairing function, no matter which subtrees are paired, will introduce independence. For example, pairing the first and second subtree results in a new forest {a'(pair(b, v), c), a'(pair(v, b), d), a'(pair(e, c), v)} in which pair(b, v) and pair(v, b) are independent subtrees. □

There is a different approach to speeding up preprocessing. Recall that $G_S$ generalizes the failure function of string matching. We suspect that there is an efficient bottom-up matching algorithm using $G_S$ directly, without any tables. So far we have only achieved a running time of

$$O(\mathit{subsize} \times \mathit{patsize} \times \mathit{ht})$$

by this approach, which is inferior to the naive method.

7. Faster Preprocessing for Binary Simple Forests

Algorithm A is quadratic in patsize since it constructs $\tilde{G}_S$, the transitive closure of $G_S$, rather than $G_S$. It seems there should be an algorithm for computing $G_S$ for simple pattern forests which requires O(patsize) steps only. So far, we have not found an algorithm this efficient, but in the special case of binary simple pattern forests we can construct $G_S$ in $O(\mathit{patsize} \times \mathit{ht}^2)$ steps. Here ht may be as large as patsize, but it is usually much smaller. Given the algorithm for computing $G_S$, it is then possible to

Figure 7. [Tree a(a(b, v), c) with branch numbers 1 and 2 on its edges.]

adapt it to do the pattern matching as well, bypassing the expensive step of table generation. We sketch the idea of this algorithm next.

Recall that in a simple forest F, for each subpattern p in PF there is exactly one largest subsumed subpattern p' in PF, except when p = v. Let f(p) denote this tree p', that is, the tree immediately subsumed by p. Denote the ith iterate of f by $f^i(p)$, 0 ≤ i, where

$$f^0(p) = p, \qquad f^{i+1}(p) = f(f^i(p)).$$

Note that $G_S$ is the graph of the function f.

Consider computing f(p), where the root of p is a binary symbol, that is, $p = a(p_1, p_2)$. We should examine trees of the form $a(f^i(p_1), f^j(p_2))$, i + j > 0, as possible candidates for f(p). For this purpose we will maintain sets $S(a, p_1)$, where a is in Σ and $p_1$ is a pattern subtree. Each set contains pairs $(p_2, p)$ of subpatterns. The pair $(p_2, p)$ is in $S(a, p_1)$ iff $p = a(p_1, p_2)$ is in PF. In computing f(p) we now probe in the sets $S(a, p_1)$, $S(a, f(p_1))$, $S(a, f^2(p_1))$, ... for pairs whose first component is $p_2$, $f(p_2)$, etc. The first such pair found (other than the pair $(p_2, p)$ in $S(a, p_1)$) must be f(p), since F is a simple forest. We make at most $O(\mathit{ht}^2)$ probes, since $f^{\mathit{ht}}(t) = v$, for any subpattern.

We can make a single probe efficiently by representing the set $S(a, p_1)$ by an array in which the second component of a pair is stored as the element indexed by the first component. In order to avoid an $O(\mathit{patsize}^2)$ overhead for initializing all vectors, we use the constant time array initialization of [2, Ex. 2.12]. The running time of the algorithm is thus $O(\mathit{patsize} \times \mathit{ht}^2)$.
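The probing scheme just described can be sketched as follows. The sketch assumes a simple forest over nullary and binary symbols only, encodes trees as tuples, and uses Python dictionaries for the sets $S(a, p_1)$ in place of the arrays with constant-time initialization; all function names are illustrative assumptions. Probes are made in lexicographic order of (i, j), so the first hit is the largest candidate, and f(p) is taken to be v when no probe succeeds.

```python
V = "v"  # the variable; other nodes are tuples (symbol,) or (symbol, son_1, son_2)

def subtrees(p):
    yield p
    if p != V:
        for son in p[1:]:
            yield from subtrees(son)

def height(p):
    return 0 if p == V or len(p) == 1 else 1 + max(height(s) for s in p[1:])

def chain(p, f):
    """The sequence p, f(p), f(f(p)), ..., v."""
    out = [p]
    while out[-1] != V:
        out.append(f[out[-1]])
    return out

def probe(p, S, f):
    """Return the pattern stored for the first hit among a(f^i(p1), f^j(p2)), i + j > 0."""
    a, p1, p2 = p
    for q1 in chain(p1, f):
        for q2 in chain(p2, f):
            hit = S.get((a, q1), {}).get(q2)
            if hit is not None and hit != p:
                return hit
    return V

def failure_links(PF):
    """Compute f(p) for every p in PF other than v (binary simple forest assumed)."""
    S = {}  # S[(a, p1)] maps p2 to the pattern a(p1, p2) in PF
    for p in PF:
        if p != V and len(p) == 3:
            S.setdefault((p[0], p[1]), {})[p[2]] = p
    f = {}
    for p in sorted(PF, key=height):  # sons are handled before their fathers
        if p != V:
            f[p] = probe(p, S, f) if len(p) == 3 else V
    return f

b = ("b",)
avv, abv, aavvb = ("a", V, V), ("a", b, V), ("a", ("a", V, V), b)
f = failure_links({s for p in [aavvb, abv] for s in subtrees(p)})
print(f[aavvb], f[abv], f[avv], f[b])
```

On the forest {a(a(v, v), b), a(b, v)} this yields f(a(a(v, v), b)) = f(a(b, v)) = a(v, v) and f(a(v, v)) = f(b) = v, which agrees with the immediate subsumption relation listed in Example 5.1.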

Observe that the algorithm can be adapted to do the matching using the sets $S(a, p_1)$ without using the table generation (Algorithm B). This leads to a matching algorithm which requires at most $O(\mathit{subsize} \times \mathit{ht}^2)$ steps.

8. Top-Down Matching Algorithm

Like the bottom-up matching algorithm, our top-down matching algorithm is related to the Knuth-Morris-Pratt string-matching algorithm. Instead of generalizing string matching, however, the top-down approach reduces tree matching to string matching. The top-down method has slower matching time than the bottom-up, but better preprocessing time.

The key idea of reducing tree pattern matching to string matching is to regard each path from root to leaf in a tree as a string in which symbols in the alphabet are interleaved with numbers indicating which branch from father to son has been followed. Since variables always match, we do not include them in these strings.

Example 8.1. The tree pattern a(a(b, v), c) is associated with the set of strings {a1a1b, a1a2, a2c}. Note that we have omitted the symbol v from the end of the second string. Figure 7 shows how the set of strings appears in the given tree. □
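The path strings of a pattern can be enumerated with a short recursive walk. This is a minimal sketch under the assumption that symbols are single characters and trees are encoded as tuples; `path_strings` is a hypothetical helper name. Note that it generates every string explicitly, which the construction discussed later deliberately avoids for efficiency.

```python
V = "v"  # the variable; other nodes are tuples (symbol, son_1, ..., son_q)

def path_strings(p, prefix=""):
    """Set of root-to-leaf path strings of pattern p; variables are left out."""
    if p == V:
        return {prefix}          # the path ends just before the variable
    sym, sons = p[0], p[1:]
    if not sons:
        return {prefix + sym}    # nullary symbol: the path ends with the symbol
    out = set()
    for i, son in enumerate(sons, start=1):
        # Interleave the symbol with the number of the branch being followed.
        out |= path_strings(son, prefix + sym + str(i))
    return out

b, c = ("b",), ("c",)
pattern = ("a", ("a", b, V), c)
print(sorted(path_strings(pattern)))
```

For a(a(b, v), c) the result is the set {a1a1b, a1a2, a2c} of Example 8.1.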

FIG. 8. (a) Tree pattern. (b) Associated trie.

    T h i s i d e a w a s f i rs t n o t i c e d b y K a r p e t a l. [1 8 ] a n d u s e d i n a t r e e - m a t c h i n g a l g o r i t h mw i t h n o p r e p r o ce s s in g . T h e i r a l g o r i t h m a c h i e v e d a m a t c h i n g t i m e o f

    O((pats ize + subsize) x log(pats ize))f o r o n e p a t t e r n , w h i c h m u s t b e a f u l l b i n a r y t re e . F o r s e v e r a l p a t t e r n s t h e i r a l g o r i t h mw o u l d r e q u i re

    O((pats ize + subsize) x log(pats ize) x patno) .O u r c o n t r i b u t i o n i s t o s h o w h o w , u s i n g t h e K n u t h - M o r r i s - P r a t t a l g o r i t h m f o rs t r i n g m a t c h i n g , w e c a n i m p r o v e t h e b o u n d s t o O(pats i ze) p r e p r o c e s s i n g , p l u sO(subsize x patno) f o r m a t c h i n g , i n t h e c a s e o f p a t t e r n s w h i c h a r e f u l l t re e s . I f t h ep a t t e rn s a r e n o t f u l l t re e s, m o r e t i m e f o r m a t c h i n g i s n e e d e d . W e t h u s i m p r o v e t h eb o u n d o f K a r p e t a l. b y a f a c t o r o f log(pats ize) .

    F o r s i m p l i c it y o f p r e s e n t a t i o n w e d e v e l o p o u r r e s u l t s f o r th e c a s e o f a s i n g le t re ep a t t e r n f ir s t. G i v e n t h e p a t t e r n p , i t i s e a s y t o g e n e r a t e a l l p a t h s t r in g s f o r t h e r o o t - t o -l e a f p a t h s . W e c o u l d t h e n u s e th e a l g o r i t h m o f A h o a n d C o r a s i c k [1] t o p r o d u c e a na u t o m a t o n w h i c h r e c o g n i z e s e v e r y i n s t a n c e o f a p a t h s t r i n g w i t h i n a s u b j e c t t r e e .S i n ce t h e c o m b i n e d l e n g t h o f a l l st ri n g s c o u l d b e O(patsize2), w e n e e d t o m o d i f y th isc o n s t r u c t i o n s o a s to a v o i d g e n e r a t i n g t h e s t r i n g s e x p l ic i tl y . I n t h i s w a y w e c a n l o w e rt h e p r e p r o c e s s i n g t o O(pats ize) .T h e f i r s t s t e p i n t h e A h o - C o r a s i c k a l g o r i t h m i s t o b u i l d a t r i e f o r t h e p a t h s t r i n g so f t h e t r e e p a t t e r n p . T h i s t r i e is c a ll e d t h e " g o t o f u n c t i o n " i n [ 1 ]. A t r ie i s a t re ew h o s e n o d e s r e p r e s e n t t h e d i s t in c t p r e fi x e s o f t h e p a t h s t ri n g s . I f n o d e n r e p r e s e n tsx a n d n ' r e p r e s e n t s x a , a i n ~ t3 N , t h e n n i s f a t h e r o f n ' , a n d t h e e d g e f r o m n t o n 'i s l a b e l e d a . W e i l l u s t r a t e t h e c o n s t r u c t i o n w i t h a n e x a m p l e . S i n c e i t a m o u n t s t o as i m p l e t re e t r a n s f o r m a t i o n , w e d o n o t f o r m a l l y g i v e a n a l g o r i th m .

    Example 8.2. The pattern tree a(a(b, v), c) has the associated trie shown in Figure 8. For example, the marked node represents the prefix a2. □

    Informally, the trie is constructed by first enumerating the outedges of every pattern node and then splitting every node labeled with a symbol other than v into two nodes connected by an edge which is labeled with the original node label.

    The subsequent steps in constructing a matching automaton are exactly as in [1], for we are now dealing with a string problem. Thus the entire construction requires O(patsize) steps if we use a failure-function representation of the automaton and O(patsize × sym) if we use a transition-matrix representation.

    We need to include in this construction a simple modification which records, with each accepting state of the automaton, the length(s) of the accepted string(s). The length of a path string is the number of alphabet symbols in it (numbers are ignored). Thus the length for a2c and a1a2 is 2 in both cases.
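    The path strings, their lengths, and the trie can be illustrated with a small sketch. It assumes a representation that is ours, not the paper's: a tree is a Python tuple (label, son_1, ..., son_k), the variable is the label "v", and a path string is a tuple alternating labels and branch numbers. The function names are hypothetical. Unlike the construction described above, the sketch generates the path strings explicitly, so it does not achieve the O(patsize) preprocessing bound.

```python
def path_strings(pattern):
    """Return the root-to-leaf path strings of a pattern as tuples.

    A path string alternates node labels (str) and 1-based branch
    numbers (int). A path ending at the variable "v" stops after the
    branch number, because the variable contributes no symbol.
    """
    label, *sons = pattern
    if label == "v":            # variable leaf
        return [()]
    if not sons:                # constant leaf
        return [(label,)]
    paths = []
    for i, son in enumerate(sons, start=1):
        for rest in path_strings(son):
            paths.append((label, i) + rest)
    return paths


def length(path):
    """Length of a path string: alphabet symbols only, numbers ignored."""
    return sum(isinstance(item, str) for item in path)


def build_trie(paths):
    """Build a trie as nested dicts; each dict is one trie node."""
    root = {}
    for path in paths:
        node = root
        for item in path:
            node = node.setdefault(item, {})
    return root


def count_nodes(trie):
    """Number of trie nodes, i.e., distinct prefixes (including the empty one)."""
    return 1 + sum(count_nodes(son) for son in trie.values())


# The pattern a(a(b, v), c) of Example 8.2.
example = ("a", ("a", ("b",), ("v",)), ("c",))
example_paths = path_strings(example)
```

    For the pattern of Example 8.2 this yields the path strings a1a1b, a1a2, and a2c, of lengths 3, 2, and 2.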

    FIG. 9. (a) Pattern. (b) Matching automaton.

    Example 8.3. In Figure 9 we give the automaton associated with the pattern of the previous example. Accepting states are circled twice and are labeled with the length of the accepted path string. □

    We now have to solve the problem of how the matching algorithm can decide whether two different path strings begin at the same node and thus contribute to a pattern match at that node. For this purpose we associate with each node a counter, initialized to zero. Each counter will record the number of distinct root-to-leaf paths which match beginning at that node.

    Let us traverse the subject tree t in preorder, computing the automaton states as we visit nodes and traverse edges. For recovering former states when returning from a completely traversed subtree we can use the traversal stack. Every time the matching automaton enters a final state, we have matched one or more path strings, and we should indicate this fact at the points at which the matched paths begin. So we increment the counters of those nodes by 1. The traversal stack for the preorder traversal is kept in an array. Thus we can find the beginning node of a matched path string in the traversal stack and can access it in constant time once we know the length of the matched string.

    At the end of the traversal the pattern matches at each node whose counter equals the number of leaves in the pattern (i.e., the number of path strings). We can now give the matching algorithm.

    We will use an array of triples (n, s, j) as traversal stack, where n is a node in the subject tree, s the state the automaton has entered when the traversal visits n, and j a number indicating how many sons of n have been visited. Additionally, we have an array Count, indexed by nodes n of the subject tree, which contains the associated counters.

    We assume that the algorithm uses a transition-table representation of the automaton and indicate by A[s, c] the state the automaton enters when in state s reading symbol c.

    We use a procedure Tabulate, which maintains the counters and updates the list of matches found. This procedure can access the stack of triples.

    Algorithm D (Top-Down Matching)
    Input. A string-matching automaton for tree pattern p in transition-matrix representation, and a subject tree t.
    Output. A list, Match, of all nodes in t at which p matches.
    Comment. A[s, c] is the state entered from s under input c in the matching automaton. Stack[i].j denotes the jth component of the triple stacked at position i in the array Stack. son_i(n) denotes the ith son of tree node n.
    Method:

        Match := empty;
        For all nodes n in t do Count[n] := 0;
        Nextstate := A[start state, label(root of t)];
        Top := 1;
        Stack[Top] := (root of t, Nextstate, 0);
        Tabulate(Nextstate);
        While Top > 0 do begin
            (Thisnode, Thisstate, Nsons) := Stack[Top];
            If Nsons = arity(Thisnode) then Top := Top - 1
            else begin
                Nsons := Nsons + 1;
                Stack[Top].3 := Nsons;
                Intstate := A[Thisstate, Nsons];
                Tabulate(Intstate);
                Nextnode := son_Nsons(Thisnode);
                Nextstate := A[Intstate, label(Nextnode)];
                Top := Top + 1;
                Stack[Top] := (Nextnode, Nextstate, 0);
                Tabulate(Nextstate);
            end (if)
        end (while)

    Procedure Tabulate(State)

        For all s such that State has a match of length s
        do begin
            n := Stack[Top - s + 1].1;
            Count[n] := Count[n] + 1;
            If Count[n] = number of leaves in pattern then
                Add n to Match;
        end (for)
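    As an illustration, Algorithm D can be sketched in Python under assumptions of our own. Trees are tuples (label, son_1, ..., son_k) with the variable written "v"; the automaton is a standard Aho-Corasick automaton in failure-function form rather than the transition matrix assumed above; and subject nodes are identified by their address (the tuple of branch numbers leading from the root), so Match is a list of addresses. All function names are hypothetical, and the pattern root is assumed not to be the variable.

```python
from collections import deque


def path_strings(pattern):
    """Root-to-leaf path strings as tuples of labels and branch numbers."""
    label, *sons = pattern
    if label == "v":
        return [()]
    if not sons:
        return [(label,)]
    return [(label, i) + rest
            for i, son in enumerate(sons, start=1)
            for rest in path_strings(son)]


def build_automaton(paths):
    """Aho-Corasick automaton; out[s] lists lengths of strings accepted in s."""
    goto, fail, out = [{}], [0], [[]]
    for path in paths:
        s = 0
        for item in path:
            if item not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][item] = len(goto) - 1
            s = goto[s][item]
        out[s].append(sum(isinstance(x, str) for x in path))
    queue = deque(goto[0].values())
    while queue:                          # breadth-first failure links
        s = queue.popleft()
        for item, t in goto[s].items():
            f = fail[s]
            while f and item not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(item, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def step(goto, fail, s, item):
    """State entered from s under input item (plays the role of A[s, c])."""
    while s and item not in goto[s]:
        s = fail[s]
    return goto[s].get(item, 0)


def match(pattern, subject):
    """Addresses of all subject nodes at which the pattern matches."""
    paths = path_strings(pattern)
    goto, fail, out = build_automaton(paths)
    count, matches, stack = {}, [], []    # stack entries: [node, state, sons visited, address]

    def tabulate(state):
        for length in out[state]:
            addr = stack[len(stack) - length][3]   # node where the matched path begins
            count[addr] = count.get(addr, 0) + 1
            if count[addr] == len(paths):
                matches.append(addr)

    state = step(goto, fail, 0, subject[0])
    stack.append([subject, state, 0, ()])
    tabulate(state)
    while stack:
        node, state, nsons, addr = stack[-1]
        if nsons == len(node) - 1:        # all sons visited
            stack.pop()
            continue
        nsons += 1
        stack[-1][2] = nsons
        mid = step(goto, fail, state, nsons)        # read the branch number
        tabulate(mid)
        son = node[nsons]
        nxt = step(goto, fail, mid, son[0])         # read the son's label
        stack.append([son, nxt, 0, addr + (nsons,)])
        tabulate(nxt)
    return matches
```

    For instance, matching a(a(b, v), c) against the subject a(a(a(b, c), c), c) reports a single match, at the first son of the root.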

    Except for the work of procedure Tabulate, the complexity of Algorithm D is O(subsize), since each edge is traversed at most twice. This is also true for the failure-function representation of the matching automaton (see [1]). The total work of procedure Tabulate is proportional to the number of times any counter has been incremented, or equivalently, to the sum of all counter values upon completion of the traversal. We can estimate this sum by deriving a bound on the number of different counters which can be incremented in an accepting state, for this will also bound the work done for each call of the procedure.

    Definition 8.1. Given a tree pattern p and a path string s of p, the suffix number of s is the number of path strings s' of p which are suffixes of s, including s itself. The suffix index of p is the maximum suffix number of the path strings of p.

    Equivalently, the suffix index is the largest number of counters which could be incremented in any accept state of the automaton.

    Example 8.4. For the pattern p = a(a(a(v, b), c), b) we have the path strings a1a1a1, a1a1a2b, a1a2c, a2b. The suffix number of a1a1a1 is 1, whereas the suffix number of a1a1a2b is 2, since a2b is a suffix which occurs as a root-to-leaf path in p. The suffix index of p is also 2. □
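    Definition 8.1 can be checked mechanically with a short sketch. As an assumption of ours, trees are written as tuples (label, son_1, ..., son_k) with the variable "v", and the function names are hypothetical. The computation compares every pair of path strings, so it is meant only to illustrate the definition, not to be efficient.

```python
def path_strings(pattern):
    """Root-to-leaf path strings as tuples of labels and branch numbers."""
    label, *sons = pattern
    if label == "v":            # the variable contributes no symbol
        return [()]
    if not sons:
        return [(label,)]
    return [(label, i) + rest
            for i, son in enumerate(sons, start=1)
            for rest in path_strings(son)]


def suffix_number(s, paths):
    """Number of path strings in paths that are suffixes of s (s included)."""
    return sum(1 for q in paths
               if len(q) <= len(s) and s[len(s) - len(q):] == q)


def suffix_index(pattern):
    """Maximum suffix number over all path strings of the pattern."""
    paths = path_strings(pattern)
    return max(suffix_number(s, paths) for s in paths)


# The pattern a(a(a(v, b), c), b) of Example 8.4.
p = ("a", ("a", ("a", ("v",), ("b",)), ("c",)), ("b",))
```

    Applied to the pattern of Example 8.4, suffix_index returns 2, in agreement with the example.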

    THEOREM 8.1. Algorithm D requires O(subsize × suf) steps, where suf is the suffix index of the pattern to be matched.

    For patterns which are full trees, that is, all path strings are of equal length, suf must be 1, since a distinct path string s1 can be a proper suffix of a distinct path string s2 only if s1 is shorter than s2. This gives us

    COROLLARY 8.2. If Algorithm D matches a pattern which is a full tree, then only O(subsize) steps are needed.

    In the worst case, suf could be O(patsize).

    Example 8.5. Consider the pattern

        p_k = a(a(... a(v, b) ..., b), b),

    in which the symbol a occurs k times. Its suffix index is k, owing to the path string (a1)^(k-1)a2b, which has every shorter path string as suffix. Note that patsize is 2k + 1. □

    COROLLARY 8.3. The bound of O(subsize × patsize) for Algorithm D is attained for certain patterns.

    PROOF. Consider matching the pattern p_k of Example 8.5 in the subject

        t_n = a(a(... a(c, b) ..., b), b),

    in which the symbol a occurs n times, where n = k + m. Then the sum of the counter values in t_n after Algorithm D has finished exceeds mk. Note that patsize is 2k + 1 and subsize is 2n + 1. □
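    The counting step in this proof can be spelled out as follows. This is a sketch of the argument; the names x_d and d are introduced here for illustration only. Let x_0, ..., x_(n-1) be the nodes labeled a in t_n, with x_d at depth d. The second son of x_d is labeled b and is reached from the root by the string (a1)^d a2b. The path strings of p_k which are suffixes of this string are (a1)^j a2b for 0 ≤ j ≤ min(d, k - 1), so entering that son increments min(d + 1, k) counters. Summing over all d gives

```latex
\[
\sum_{d=0}^{n-1} \min(d+1,\,k)
\;\ge\; \sum_{d=k-1}^{n-1} k
\;=\; (n-k+1)\,k
\;=\; (m+1)\,k
\;>\; mk .
\]
```

    Increments caused by the remaining path string (a1)^k only add to this total.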

    We thus have in Algorithm D a performance range anywhere between that of the bottom-up algorithm and that of the naive matching algorithm, depending on the structure of the pattern.

    Without going into details we note that Algorithm D may be adapted to assimilate local changes in the subject tree. As in the case of the bottom-up algorithm, we need to reprocess only a small area surrounding the part which has changed. However, the algorithmic details are far more complicated than in the case of the bottom-up algorithm, although in principle quite straightforward.

    We conclude this section with a brief discussion of how to match more than one tree pattern, using the approach of Algorithm D.

    Recall that we represent a tree pattern by its root-to-leaf path strings. We can do this for several patterns as well, but we should keep track of which pattern(s) each path string comes from. The preprocessing algorithm can be adapted to process several patterns by building separately for each pattern the associated trie and then merging these tries, keeping track of which pattern(s) each path string at a leaf of the trie belongs to. This can be done in O(patsize) steps resulting in a trie of O(patsize) nodes. Now apply the methods of [1] to complete the trie to a matching automaton.
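    The bookkeeping for several patterns can be sketched as follows. The representation is an assumption of ours: trees are tuples (label, son_1, ..., son_k) with the variable "v", patterns are numbered from 0, and the function names are hypothetical. Instead of merging tries, the sketch collects the path strings of all patterns in one table and records, for each distinct string, its length together with the index of every pattern it belongs to. This is the information that the merged trie would carry at the node for that string.

```python
def path_strings(pattern):
    """Root-to-leaf path strings as tuples of labels and branch numbers."""
    label, *sons = pattern
    if label == "v":            # the variable contributes no symbol
        return [()]
    if not sons:
        return [(label,)]
    return [(label, i) + rest
            for i, son in enumerate(sons, start=1)
            for rest in path_strings(son)]


def merged_path_table(patterns):
    """Map each distinct path string to a list of (length, pattern index) pairs."""
    table = {}
    for index, pattern in enumerate(patterns):
        for path in path_strings(pattern):
            # Length counts alphabet symbols only; branch numbers are ignored.
            length = sum(isinstance(item, str) for item in path)
            table.setdefault(path, []).append((length, index))
    return table


# Two patterns sharing the path string a1b: a(b, v) and a(b, c).
forest = [("a", ("b",), ("v",)), ("a", ("b",), ("c",))]
table = merged_path_table(forest)
```

    Here the shared path string a1b is recorded once, with one pair for each of the two patterns.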

    Figure 10

    In the case of a single pattern we associated with each of the final states a list of the lengths of the matched path strings. For multiple patterns we now associate with final states lists of pairs. Each pair gives the length of the matched path string and the pattern to which it belongs.

    It remains to explain how we can correlate matches of individual path strings. We do this simply by associating patno counters with each node in the subject tree and dedicating the ith counter to counting how many path strings of the ith pattern have been matched, beginning at that node. If the ith counter reaches a value equal to the number of leaves of the ith pattern, then we have just matched the ith pattern.

    As before, the work is proportional to the subject size plus the sum of all counter values and can be estimated as

        O(subsize × max(suf) × patno),

    where the maximum is taken over all tree patterns in the forest. This bound is easily shown to be the best possible, generalizing Corollary 8.2. Furthermore, if no path string is a suffix of another, then we have only O(subsize) steps for matching such a pattern forest.

    9. Improvements to Top-Down Matching and Related Work

    Recently, Lang et al. [24] improved Algorithm D by basing the matching of path strings on the Boyer-Moore algorithm [4]. Since the Boyer-Moore algorithm requires the ability to skip portions of the subject string, a different representation of trees is used: Trees are represented by ordered lists of left paths.

    Example 9.1. For the tree t = a(b(c), a(d, c)) the list of left paths is (abc, ad, c), as shown in Figure 10. □

    We can obtain left paths by first deleting from each path string the longest prefix ending with a branch number greater than 1 and then deleting the remaining branch numbers. Thus, from a2a1d we obtain ad, and from a2a2c we get c. The list of these left paths uniquely determines a binary tree. For alphabet