
Anti-commutative languages and n-codes


Discrete Applied Mathematics 24 (1989) 187-196

North-Holland

ANTI-COMMUTATIVE LANGUAGES AND n-CODES*

Received 27 October 1965

1. Introduction

Languages derived from or related to codes have an important role in the study of the combinatorics of words. There are various mechanisms and tools to define and analyse codes. In particular, many classes of codes can be obtained as the classes of antichains with respect to certain partial orders on free monoids. As very simple examples we only mention the prefix codes, the hypercodes, and the block codes. Some details are provided in [2, 5, 6, 10].

In this paper we study a hierarchy

$$C(X) \subseteq \cdots \subseteq C_n(X) \subseteq C_{n-1}(X) \subseteq \cdots \subseteq C_2(X) \subseteq C_1(X)$$

of classes of languages over an alphabet $X$, with $C(X)$ the class of codes over $X$. The languages in $C_n(X)$ are called $n$-codes; an $n$-code is a language each of whose $n$-element subsets is a code. The original motivation for considering this hierarchy came from the analysis of $C_2(X)$, which had been shown to be the set of antichains with respect to a partial order derived from anti-commutativity [2]. However, the $n$-code property seems to be interesting in its own right.

We first prove that we are indeed dealing with a proper hierarchy. There is a major difference between languages in $C_2(X)$ and those in $C_n(X)$ for $n > 2$. Whereas $C_2(X)$ is the class of antichains with respect to a certain partial order, there is no binary relation $\varrho$ such that $C_n(X)$ would be the set of $\varrho$-independent languages for $n > 2$. In this way, $C_n(X)$ is similar to $C(X)$. It is known that there is no length-preserving binary relation nor any positive compatible partial order with the class of codes being their antichains [5, 9]. The latter statement can be extended to $C_2(X)$ as well.

* This work was supported by the Natural Sciences and Engineering Research Council of Canada, Grants A7877 and A0233.

The $n$-code hierarchy is "skew" with respect to the Chomsky hierarchy of languages; that is, for any given language classes $F \subsetneq F'$ of the Chomsky hierarchy and for any $n$ there are $n$-codes in $F' \setminus F$ which are not $(n+1)$-codes. For example, the Thue set $T$ of square-free words over an alphabet $X$ with $|X| > 2$ is a non-algebraic type 1 language and also a 2-code, but not a 3-code.

It is obvious that, as a consequence of the decidability of the code property, the $n$-code property can also be decided for finite languages. We show that for rational languages the 2-code property is decidable. For $n > 2$ the decidability question is open.

The decidability result for $C_2(X)$ is based on certain structural properties of 2-codes concerning primitive words. In particular, the 2-codes over $X$ are subsets of cross sections of the equivalence relation on $X^+$ defined by equality of roots. Using this fact, further insight into the structure of 2-codes can be gained.

In addition to this introduction, the paper has the following sections. In Section 2 we introduce notation and basic notions. Items not defined there or in the subsequent sections can be found in the books [1, 4, 8, 9], which we use as standard references. In Section 3 the hierarchy of $n$-codes is introduced and their properties concerning binary relations are proved. The role of primitive words is investigated in Section 4. Moreover, in Section 4, the $n$-code hierarchy is compared to the Chomsky hierarchy, and some decidability results are proved. Finally, Section 5 contains a few concluding remarks.

2. Notation and basic notions

An alphabet is a finite nonempty set. Let $X$ be an alphabet. Then $X^*$ denotes the free monoid generated by $X$, that is, the set of all words over $X$, including the empty word $1$, and $X^+ = X^* \setminus \{1\}$. For $w \in X^*$, by $|w|$ we denote the length of $w$.

A language over $X$ is a set $L \subseteq X^*$. For any language $L$ and any $n \in \mathbb{N}$, where $\mathbb{N} = \{0, 1, 2, \ldots\}$, let

$$L^{(n)} = \{ w \mid \exists v \in L : v^n = w \}.$$

A word $w$ is called primitive if $w = u^n$ implies $n = 1$. Let $Q$ denote the set of all primitive words over $X$, where the alphabet $X$ is understood. For $w \in X^+$ let $\sqrt{w}$ denote the unique word $u \in Q$ such that $w = u^n$ for some $n \in \mathbb{N}$.
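The root $\sqrt{w}$ can be computed directly from this definition. The following is a minimal sketch, with words represented as Python strings; the function names `primitive_root` and `is_primitive` are chosen here for illustration and do not come from the paper.

```python
def primitive_root(w: str) -> tuple[str, int]:
    """Return (u, n) with u primitive and w == u * n, for a nonempty word w."""
    if not w:
        raise ValueError("the root is only defined for nonempty words")
    length = len(w)
    # Try candidate root lengths in increasing order. The first divisor d of
    # len(w) for which w is a power of its prefix of length d yields the root,
    # because a shorter root would have been found earlier.
    for d in range(1, length + 1):
        if length % d == 0 and w[:d] * (length // d) == w:
            return w[:d], length // d
    raise AssertionError("unreachable: d == len(w) always succeeds")


def is_primitive(w: str) -> bool:
    """A nonempty word is primitive iff it equals its own root."""
    return bool(w) and primitive_root(w)[1] == 1
```

For instance, `primitive_root("abab")` splits the word into the root `"ab"` and the exponent 2, while `"aba"` is its own root.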

Let $\varrho$ be a binary relation on $X^*$. A language $L \subseteq X^*$ is said to be $\varrho$-independent or a $\varrho$-antichain if $u, v \in L$ and $u \mathrel{\varrho} v$ implies $u = v$.


As standard reference for formal languages and acceptors we use [4]. In particular, we use the following notation for families of languages over an alphabet $X$:

- Fin($X$) = finite languages,
- Rat($X$) = rational (= regular = type-3) languages,
- Alg($X$) = algebraic (= context-free = type-2) languages,
- CS($X$) = context-sensitive (= type-1) languages,
- Rec($X$) = recursive languages,
- RE($X$) = recursively enumerable (= type-0) languages,
- P($X$) = $2^{X^*}$ = general languages,
- D0L($X$) = deterministic 0L Lindenmayer languages.

3. n-codes and binary relations

Let $X$ be a finite alphabet, $|X| \ge 2$, and let $n \in \mathbb{N}$. A language $L$ over $X$ is said to be an $n$-code if $L \subseteq X^+$, $L$ is nonempty, and every subset of $L$ with at most $n$ elements is a code. $L$ is said to be anti-commutative if $L \subseteq X^+$ and $uv \ne vu$ for $u, v \in L$, $u \ne v$. Let $C(X)$, $C_n(X)$, and $A(X)$ denote the families of codes, $n$-codes, and anti-commutative languages over $X$, respectively.

Clearly

$$C(X) \subseteq \cdots \subseteq C_n(X) \subseteq C_{n-1}(X) \subseteq \cdots \subseteq C_2(X) \subseteq C_1(X),$$

where $C_1(X)$ is trivial, that is,

$$C_1(X) = 2^{X^+} \setminus \{\emptyset\}.$$

Furthermore,

$$C_2(X) = A(X)$$

from the fact that a set $\{u, v\}$ is a code if and only if $uv \ne vu$ (see e.g. [9]). Clearly also

$$C(X) = \bigcap_{i \ge 1} C_i(X).$$

That the above inclusions are proper is easily seen by the following example from [9]: Let

$$C = \{a_1, \ldots, a_n\} \subseteq X^+$$

be a code over $X$ with $|C| = n$. The existence of $C$ is guaranteed by the fact that any finitely generated free monoid can be embedded into $X^*$ when $|X| \ge 2$. Now consider

$$L = \{a_1, \ldots, a_n, a_1 \cdots a_n\}.$$

Obviously, $L$ is an $n$-code but not an $(n+1)$-code.
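For finite languages the $n$-code property can be tested mechanically. The sketch below is an illustration and not part of the paper: it uses the classical Sardinas-Patterson test for the code property (a standard technique that the paper itself does not spell out) and then checks the subsets of the relevant size. The names `is_code` and `is_n_code` and the sample code `C` are assumptions made for this example.

```python
from itertools import combinations


def is_code(words) -> bool:
    """Sardinas-Patterson test for a finite set of words."""
    words = set(words)
    if "" in words:
        return False

    def quotient(A, B):
        # A^{-1}B = { w : a + w == b for some a in A, b in B }
        return {b[len(a):] for a in A for b in B if b.startswith(a)}

    # Dangling suffixes produced by two *distinct* words of the set.
    frontier = {b[len(a):] for a in words for b in words
                if a != b and b.startswith(a)}
    seen = set()
    while frontier:
        seen |= frontier
        nxt = quotient(words, frontier) | quotient(frontier, words)
        if "" in nxt:          # a dangling suffix is itself a word: ambiguity
            return False
        frontier = nxt - seen  # only suffixes not processed before
    return True


def is_n_code(language, n: int) -> bool:
    """Check the n-code property of a finite language."""
    words = sorted(set(language))
    if not words or "" in words or n < 1:
        return False
    # Subsets of codes are codes, so it suffices to test the subsets of
    # size exactly min(n, |L|).
    size = min(n, len(words))
    return all(is_code(subset) for subset in combinations(words, size))


C = ["a", "ba", "bb"]      # a prefix code, chosen only for illustration
L = C + ["".join(C)]       # adjoin the product a_1 a_2 a_3 = "ababb"
print(is_n_code(L, 3), is_n_code(L, 4))
```

Testing only the subsets of size $\min(n, |L|)$ is a design shortcut that relies on the fact that every subset of a code is again a code.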

In fact this example is just a special case of a more general construction.


Proposition 3.1. Let $C$ be any $k$-code or a code over $X$, and let $1 < n \le |C|$ and $2n - 1 \le k$. If $a_1, \ldots, a_n$ are any $n$ distinct elements of $C$, then the set

$$L = C \cup \{a_1 \cdots a_n\}$$

is an $n$-code but not an $(n+1)$-code.

Proof. It is immediate that $L$ is not an $(n+1)$-code. In order to show that it is an $n$-code consider a set

$$L' = \{w_1, \ldots, w_{n-1}\} \cup \{a_1 \cdots a_n\},$$

where $w_1, \ldots, w_{n-1}$ are $n - 1$ distinct elements of $C$. Consider also the set

$$L'' = \{w_1, \ldots, w_{n-1}\} \cup \{a_1, \ldots, a_n\}.$$

Since $L'' \subseteq C$ and $C$ is a $k$-code or a code, also $L''$ is a code as $k \ge 2n - 1$.

Suppose $L'$ is not a code. Thus there exists a word with two different representations over $L'$,

$$x_1 \cdots x_r = y_1 \cdots y_m$$

with $x_i, y_j \in L'$ for $i = 1, \ldots, r$ and $j = 1, \ldots, m$.

Now, if $x_1 = w_k$, $y_1 = w_h$ for some $k, h$, then $x_1 = y_1$ as $L''$ is a code. Therefore, we may assume that

$$x_1 = w_1 \quad\text{and}\quad y_1 = a_1 \cdots a_n,$$

that is,

$$w_1 x_2 \cdots x_r = a_1 a_2 \cdots a_n y_2 \cdots y_m.$$

Viewing the word over $L''$ yields the factorizations

$$w_1 (x_{21} \cdots x_{2k_2}) \cdots (x_{r1} \cdots x_{rk_r}) = a_1 \cdots a_n (y_{21} \cdots y_{2h_2}) \cdots (y_{m1} \cdots y_{mh_m}),$$

where

$$x_i = (x_{i1} \cdots x_{ik_i}) \quad\text{and}\quad y_j = (y_{j1} \cdots y_{jh_j}).$$

Therefore, $w_1 = a_1$ and

$$(x_{21} \cdots x_{2k_2}) \cdots (x_{r1} \cdots x_{rk_r}) = a_2 \cdots a_n (y_{21} \cdots y_{2h_2}) \cdots (y_{m1} \cdots y_{mh_m}).$$

If

$$x_2 = x_{21} \cdots x_{2k_2} = a_1 \cdots a_n,$$

then $a_1 = a_2$, a contradiction! Therefore, $x_2 = w_s$ for some $s$ and $w_s = a_2$. Iterating this argument yields $x_i = a_i$ for $i = 1, \ldots, n$ and also $x_i \in \{w_1, \ldots, w_{n-1}\}$. This is impossible as all words $a_1, \ldots, a_n$ were chosen distinct. □

In Proposition 3.1, $k = n$ is not possible. Indeed, the set

$$C = a^+b^+ \cup b^+a^+ \cup \{aba^2b^2a^3b^3\}$$

is a 3-code while

$$C \cup \{a^2b^2a^3b^3ab\}$$

is not a 3-code. The set

$$B = a^+b^+ \cup b^+a^+$$

is an example of a language in $C_3(X) \setminus C_4(X)$ which is not obtained by the construction given in Proposition 3.1. So far no generalization of this example to arbitrary $n$ is known.
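The negative halves of these two claims can be witnessed by explicit double factorizations. The following brute-force sketch (not from the paper) searches for a string with two different factorizations over a small set of words; the function name `find_double_factorization` and the bound `max_factors` are hypothetical, and a result of `None` only means that no witness with that few factors exists.

```python
from itertools import product


def find_double_factorization(words, max_factors=3):
    """Search for one string with two different factorizations over `words`.

    Returns a pair of tuples of words, or None if no such pair exists among
    the products of at most `max_factors` factors.
    """
    seen = {}
    for r in range(1, max_factors + 1):
        for factors in product(words, repeat=r):
            s = "".join(factors)
            if s in seen and seen[s] != factors:
                return seen[s], factors
            seen.setdefault(s, factors)
    return None


x = "ab" + "aabb" + "aaabbb"   # the extra word of C
y = "aabb" + "aaabbb" + "ab"   # the word adjoined to C
# Three elements of C ∪ {y} that do not form a code, since x·ab = ab·y:
print(find_double_factorization([x, y, "ab"]))
# Four elements of B that do not form a code, since ab·bba = abb·ba:
print(find_double_factorization(["ab", "abb", "ba", "bba"]))
```

The positive claims (that $C$ and $B$ are 3-codes) concern infinite languages and are not established by such a finite search.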

The family $C_2(X)$ of 2-codes is of particular interest. It was proved in [2] that $C_2(X)$ coincides with the family of antichains with respect to the partial order $\le_c$ on $X^*$ defined by

$$x \le_c y \iff \exists u \in X^* : y = xu = ux.$$

For further results concerning the relation between partial orders on $X^*$ and codes the reader is referred to [2, 5, 9, 10].

A binary relation $\varrho \subseteq X^* \times X^*$ is called length-preserving if it satisfies the following conditions:

(1) $\forall u \in X^* : u \mathrel{\varrho} u$;
(2) $u \mathrel{\varrho} v$ implies $|u| \le |v|$;
(3) $u \mathrel{\varrho} v$ and $|u| = |v|$ together imply $u = v$.

A length-preserving binary relation on $X^*$ is reflexive and anti-symmetric, but not necessarily transitive or compatible. A binary relation $\varrho \subseteq X^* \times X^*$ is said to be positive if

(4) $\forall u \in X^* : 1 \mathrel{\varrho} u$

holds true. Observe that the partial order $\le_c$ is both length-preserving and positive, but not compatible. Positive length-preserving partial orders are called strict in [9] and elsewhere.
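The order $\le_c$ and its antichains are easy to experiment with on concrete words. The sketch below is illustrative only; the names `leq_c` and `is_antichain` are introduced here, and the compatibility check assumes the usual meaning of the term (that $x \mathrel{\varrho} y$ implies $uxv \mathrel{\varrho} uyv$), which the text above does not define explicitly.

```python
def leq_c(x: str, y: str) -> bool:
    """x <=_c y  iff  y = xu = ux for some word u (possibly empty)."""
    if not y.startswith(x):
        return False
    u = y[len(x):]      # the only candidate with y == x + u
    return u + x == y   # the same u must also satisfy y == u + x


def is_antichain(language) -> bool:
    """No two distinct words of the language are comparable under <=_c."""
    words = list(set(language))
    return not any(u != v and leq_c(u, v) for u in words for v in words)


# Reflexive and positive on a few samples (the empty string plays the role of 1):
print(all(leq_c(w, w) and leq_c("", w) for w in ["a", "ab", "abab"]))
# Not compatible under the assumed definition: a <=_c aa, but ba <=_c baa fails.
print(leq_c("a", "aa"), leq_c("b" + "a", "b" + "aa"))
```

By the result of [2] quoted above, `is_antichain` applied to a finite nonempty set of nonempty words tests exactly the 2-code property.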

Whereas quite a few interesting classes of codes (the prefix codes, suffix codes, bi-prefix codes, and hypercodes, to mention only a few examples) can be characterized as the classes of antichains of certain partial orders on $X^*$, the class $C(X)$ of codes cannot be described in such a way: there is no length-preserving binary relation nor any positive compatible partial order on $X^*$, say $\varrho$, such that $C(X)$ coincides with the class of $\varrho$-antichains [5, 9]. A far stronger statement can be proved for the classes $C_n(X)$.

Proposition 3.2. Let $n \ge 3$ and $|X| \ge 2$. There is no binary relation $\varrho$ on $X^*$ such that $C_n(X)$ is the class of all $\varrho$-antichains.

For $n = 2$ a slightly weaker statement can be made.


Proposition 3.3. Let $|X| \ge 2$. There is no compatible binary relation $\varrho$ over $X^*$ such that $C_2(X)$ is the class of all $\varrho$-antichains.

These results seem to indicate that n-codes are rather complex objects. This will be clarified to a certain extent in the sequel.

4. n-codes and primitive words

It is well known that a pair $x, y \in X^+$ of words forms a code if and only if $xy \ne yx$ or, equivalently, if $\sqrt{x} \ne \sqrt{y}$ (see [9], for example). For words $x, y \in X^+$ define the relation $\sim$ by

$$x \sim y \iff \sqrt{x} = \sqrt{y}.$$

Obviously, $\sim$ is an equivalence relation on $X^+$.

Lemma 4.1. Let $|X| \ge 2$ and $L \subseteq X^+$. The following statements are equivalent:

(1) $L \in A(X)$;
(2) $L \in C_2(X)$;
(3) $L$ is contained in a cross section of $\sim$.

$L$ is a maximal 2-code if and only if it is a cross section of $\sim$.

For a proof of Lemma 4.1 see [2]. Its assertion (3) allows for a useful 1-1-correspondence between 2-codes $L$ over $X$ and mappings

$$f_L : Q \to \mathbb{N}$$

which is given by

$$u \in L \iff f_L(\sqrt{u}) \ne 0 \wedge (\sqrt{u})^{f_L(\sqrt{u})} = u.$$

Conversely, if $f$ is a mapping of $Q$ into $\mathbb{N}$, then

$$L_f = \{ u^{f(u)} \mid u \in Q \wedge f(u) \ne 0 \}.$$

This representation of $C_2(X)$ implies the following corollary:

Corollary 4.2. For $|X| \ge 2$ one has $|C_2(X)| = \aleph_1$, and thus $C_2(X)$ is not recursively enumerable. In particular there are 2-codes which are not even type 0 languages.

This result seems to indicate that the classes of $n$-codes may be "skew" with respect to standard language classes. Further details substantiating this impression will be provided in a follow-up paper.
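For finite languages, the correspondence between 2-codes and mappings of $Q$ into $\mathbb{N}$ can be made concrete. The sketch below represents a mapping with finite support as a Python dictionary from primitive words to exponents, with absent keys read as the value 0; the names `mapping_of` and `language_of` are assumptions of this illustration.

```python
def primitive_root(w: str) -> tuple[str, int]:
    """Return (u, n) with u primitive and w == u * n, for a nonempty word w."""
    for d in range(1, len(w) + 1):
        if len(w) % d == 0 and w[:d] * (len(w) // d) == w:
            return w[:d], len(w) // d
    raise ValueError("the root is only defined for nonempty words")


def mapping_of(language) -> dict[str, int]:
    """Return f_L as a dict root -> exponent; fail if L is not a 2-code."""
    f = {}
    for w in language:
        root, exponent = primitive_root(w)
        # Two different powers of the same root violate the 2-code property.
        if f.get(root, exponent) != exponent:
            raise ValueError(f"two different powers of {root!r} occur")
        f[root] = exponent
    return f


def language_of(f: dict[str, int]) -> set[str]:
    """Return L_f = { u^f(u) : f(u) != 0 }; keys are assumed to be primitive."""
    return {root * k for root, k in f.items() if k != 0}


f = mapping_of(["abab", "a", "bbb"])
print(f, language_of(f))
```

Round-tripping a finite 2-code through `mapping_of` and `language_of` returns the original set of words.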

Proposition 4.3. Let $|X| \ge 2$ and let $L \in C_2(X)$. The following properties obtain:

(1) if $L$ is rational, then $f_L$ is bounded;
(2) if $f_L$ is unbounded, then the order of the elements of the syntactic monoid syn $L$ of $L$ is unbounded;
(3) there is a context-free 2-code $L$ with $f_L$ unbounded.

Proof. Assume that $f_L$ is unbounded. Then for any $k \in \mathbb{N}$ there exists $m > k$ such that $f^m \in L$ for some $f \in Q$, and therefore, as $L$ is a 2-code, $f^n \notin L$ for $n \ne m$. Thus, the words $f, f^2, \ldots, f^m$ are pairwise incongruent modulo the principal congruence $P_L$ of $L$. This implies (2), and therefore syn $L$ is infinite, that is, $L$ is not rational, which proves (1).

For (3), consider the language

$$L = \bigcup_{i=0}^{\infty} ab^i(ab^+)^i$$

over the alphabet $X = \{a, b\}$. Obviously, $L$ is context-free. To show that $L$ is a 2-code suppose that

$$ab^i ab^{h_1} ab^{h_2} \cdots ab^{h_i} = f^s, \qquad ab^j ab^{k_1} ab^{k_2} \cdots ab^{k_j} = f^t$$

for some $f \in Q$ and $s, t \ge 1$. If $f = ab^i$, then $f = ab^j$, that is, $i = j$ and therefore $s = t$. Otherwise, we have to assume that

$$f = ab^i ab^{h_1} \cdots ab^{h_p} = ab^j ab^{k_1} \cdots ab^{k_q}$$

for some $p, q \ge 1$. Then obviously $i = j$ and $p = q$ and thus $s = t$. Therefore, $L$ is a 2-code; in fact, it is a code. As $(ab^i)^{i+1} \in L$ and $ab^i \in Q$ for every $i \ge 1$ it follows that $f_L$ is unbounded. □
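The last step of the proof can be checked on small instances. The following sketch implements a membership test for the language $L = \bigcup_{i \ge 0} ab^i(ab^+)^i$ and lists which powers of the primitive word $ab^i$ belong to it; the names `in_L` and `exponents_in_L` are hypothetical helpers introduced here.

```python
def in_L(w: str) -> bool:
    """Membership test for the union over i >= 0 of a b^i (a b^+)^i."""
    if not w or w[0] != "a" or set(w) - {"a", "b"}:
        return False
    blocks = w.split("a")[1:]   # the runs of b following each letter a
    i = len(blocks[0])          # exponent of b in the first block a b^i
    # Exactly i further blocks must follow, each with at least one b.
    return len(blocks) == i + 1 and all(blocks[1:])


def exponents_in_L(i: int, bound: int) -> list[int]:
    """All k in 1..bound such that (a b^i)^k belongs to L."""
    root = "a" + "b" * i
    return [k for k in range(1, bound + 1) if in_L(root * k)]


for i in range(1, 5):
    print(i, exponents_in_L(i, 10))
```

For each $i$ only the exponent $i + 1$ is reported, in line with $f_L(ab^i) = i + 1$ and hence with $f_L$ being unbounded.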

Observe that boundedness of $f_L$ does not imply rationality for $L$. The language $L = Q$ is such an example of a nonrational language with $f_L$ bounded.

For a language $L$ over $X$ let $\sqrt{L}$ denote the language

$$\sqrt{L} = \{ u \mid u \in Q \wedge \exists v \in L : u = \sqrt{v} \}.$$

If $L \ne \emptyset$, then $\sqrt{L} \in C_2(X)$. For $L \in C_2(X)$ one has $\sqrt{L} = Q$ if and only if $L$ is a maximal 2-code. However, the following result implies that $\sqrt{L} \ne Q$ if $L \in C_n(X)$ for $n \ge 3$.

Proposition 4.4. Let $|X| \ge 2$ and $n \ge 3$. For every $n$-code $L \subseteq X^*$ the set $Q \setminus \sqrt{L}$ is infinite.

Corollary 4.5. Let $|X| \ge 2$. Then the following statements hold true:

(1) if $L \in C_2(X) \cap \mathrm{Rat}(X)$ and $L$ is infinite, then $L \cap Q$ is infinite;
(2) if $L \in C_2(X) \cap \mathrm{Rat}(X)$ and $L$ is infinite, then $L \cap Q$ is rational if and only if $L \cap (X^* \setminus Q)$ is finite;
(3) for any finite set $M \ne \emptyset$, $M \subseteq \{1, 2, \ldots\}$, there is a rational 2-code $L$ such that $L \cap Q^{(m)}$ is infinite for all $m \in M$.


Corollary 4.6. Let $|X| \ge 2$. Every infinite rational code over $X$ contains infinitely many primitive words.

We conclude this section with a description of the relation between D0L languages and 2-codes.

Proposition 4.7. Let $|X| \ge 2$ and let $L \in \mathrm{D0L}(X)$ be infinite. If $L \notin C_2(X)$, then there is an integer $k$ such that $|L'| \le k$ for every $L' \subseteq L$ which is a 2-code.

Proof. Let $L$ be an infinite D0L language generated by the D0L system $G = (X, h, w_0)$, and let $w_i = h^i(w_0)$ for $i \in \mathbb{N}$. Suppose that $L$ is not a 2-code. Then there exist $i, k \in \mathbb{N}$, $i < k$, such that $\{w_i, w_k\}$ is not a code, that is, $w_i = p^n$, $w_k = p^m$ for some $p \in Q$ and $n, m \ge 1$, $n \ne m$. Choose $k$ minimal with this property. Then the set

$$\bar{L} = \{w_0, w_1, \ldots, w_{i-1}\}$$

is a 2-code.

Now let $t = k - i$. From $w_i = p^n$ and $w_k = w_{i+t} = p^m = h^t(p^n) = (h^t(p))^n$ it follows that $h^t(p) = p^l$ for some $l \ge 1$. Therefore, $m = ln$ and

$$w_{i+rt+s} = h^s(h^{rt}(w_i)) = (h^{rt}(h^s(p)))^n = (h^s(p))^{l^r n}$$

for $r \in \mathbb{N}$ and $s = 0, 1, \ldots, t-1$. Let

$$L_s = \{ w_{i+rt+s} \mid r \in \mathbb{N} \}.$$

Then

$$L = \bar{L} \cup \bigcup_{s=0}^{t-1} L_s,$$

with $L_s$ infinite for all $s$. If $L'$ is any 2-code contained in $L$, then $|L' \cap L_s| \le 1$ and, therefore, $|L'| \le t + i = k$. □
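The dichotomy behind Proposition 4.7 can be observed on finite prefixes of D0L sequences. In the sketch below, a D0L system is given by a dictionary of rules and an axiom; by Lemma 4.1, the largest 2-code contained in a finite set of words has as many elements as there are distinct roots. The function names and the two sample systems are illustrative assumptions, not taken from the proof.

```python
def root_of(w: str) -> str:
    """Primitive root of a nonempty word."""
    for d in range(1, len(w) + 1):
        if len(w) % d == 0 and w[:d] * (len(w) // d) == w:
            return w[:d]
    raise ValueError("the root is only defined for nonempty words")


def d0l_words(rules: dict[str, str], axiom: str, count: int) -> list[str]:
    """The first `count` words w_0, w_1, ... with w_{i+1} = h(w_i)."""
    words, w = [], axiom
    for _ in range(count):
        words.append(w)
        w = "".join(rules[letter] for letter in w)
    return words


def max_two_code_subset_size(words) -> int:
    """Size of a largest 2-code inside a finite set: one word per root."""
    return len({root_of(w) for w in set(words)})


doubling = d0l_words({"a": "aa"}, "a", 6)             # a, aa, aaaa, ...
fibonacci = d0l_words({"a": "b", "b": "ab"}, "a", 6)  # a, b, ab, bab, ...
print(max_two_code_subset_size(doubling), max_two_code_subset_size(fibonacci))
```

The doubling system generates only powers of `a`, so no 2-code inside it has more than one element, whereas the six words of the second system have pairwise different roots.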

Corollary 4.8. Every infinite context-free D0L language is a 2-code.

Proof. This result follows from the preceding proof by the "pumping lemma" for context-free languages. It can also be obtained as a weak version of a result due to [3] which states that every infinite context-free D0L language is a prefix code or a suffix code, which implies that it is a 2-code. □

As an example of a D0L language which is a 2-code but not a 3-code consider the set

$$\{a, b, ab, bab, abbab, \ldots\},$$

that is, the Fibonacci language over $X = \{a, b\}$ which is generated by the D0L rules $a \to b$, $b \to ab$.

We now proceed to prove that the property of being a 2-code is decidable for rational languages. The following statement provides the main argument for the proof.

Proposition 4.9. Let $L \in \mathrm{Rat}(X)$. It is decidable whether there exists a word $w \in X^+$ such that $w^i \in L$ and $w^j \in L$ for two different powers of $w$.

Proof. Let $A = (X, S, \delta, q_0, F)$ be a finite state acceptor with $L = L(A)$. For $q_\alpha, q_\beta \in S$ let

$$A_{q_\alpha q_\beta} = (X, S, \delta, q_\alpha, \{q_\beta\}),$$

and let

$$L_{q_\alpha q_\beta} = L(A_{q_\alpha q_\beta}).$$

Obviously, the following two statements are equivalent:

(1) $\exists w \in X^+\ \exists i > j > 0 : w^i, w^j \in L$, and
(2) $\exists q_1, q_2, \ldots \in S$:

$$\bigcap_{i \ge 1} L_{q_{i-1} q_i} \setminus \{1\} \ne \emptyset \quad\text{and}\quad |\{q_1, q_2, \ldots\} \cap F| \ge 2.$$

Thus, in order to decide (1) one could try to decide (2).

Observe first that the sequence of states reached by consecutive powers of a fixed input word is ultimately periodic. Thus, if

$$q_i = \delta(q_0, w^i),$$

then the sequence has the form

$$q_0, q_1, \ldots, q_i, q_{i+1}, \ldots, q_{i+p} = q_i.$$

Therefore in deciding (2) we may restrict ourselves to considering sequences of this form where $i + p \le n - 1$ with $n = |S|$. There are no more than $n!\,2^n$ such sequences and one checks each of them separately.

Now consider such a sequence. The condition

$$|\{q_1, q_2, \ldots\} \cap F| \ge 2$$

is satisfied if and only if

(a) there are two indices $0 \le \alpha < \beta < i$ with $q_\alpha, q_\beta \in F$, or
(b) there is $i \le \alpha < i + p$ with $q_\alpha \in F$.

Thus the above condition can easily be checked. Finally,

$$\bigcap_{i \ge 1} L_{q_{i-1} q_i} = \bigcap_{j=1}^{i+p} L_{q_{j-1} q_j}$$

and therefore also the condition

$$\bigcap_{i \ge 1} L_{q_{i-1} q_i} \setminus \{1\} \ne \emptyset$$

is decidable. □
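The periodicity argument can be illustrated for a single fixed word $w$. The sketch below does not implement the full decision procedure, which ranges over all admissible state sequences; it only follows the states reached by $w, w^2, \ldots$ in a deterministic acceptor until a state repeats and then applies conditions (a) and (b), counting only positive powers as in statement (1). The encoding of the transition function as a dictionary keyed by (state, letter) pairs, and the names `run` and `accepts_two_powers`, are assumptions of this example.

```python
def run(delta, state, word):
    """Extended transition function of a complete deterministic acceptor."""
    for letter in word:
        state = delta[state, letter]
    return state


def accepts_two_powers(delta, start, finals, w) -> bool:
    """True iff w^i and w^j are accepted for some i > j > 0."""
    if not w:
        raise ValueError("w must be a nonempty word")
    orbit, index = [], {}
    q = start
    while q not in index:   # follow q_0, q_1, ... until a state repeats
        index[q] = len(orbit)
        orbit.append(q)
        q = run(delta, q, w)
    i = index[q]            # pre-period: the cycle is orbit[i:]
    # (b) a final state on the cycle is reached by infinitely many powers.
    if any(state in finals for state in orbit[i:]):
        return True
    # (a) otherwise at least two positive powers must be accepted
    #     before the cycle is entered.
    return sum(state in finals for state in orbit[1:i]) >= 2


# Acceptor for (aa)*: the powers a^2 and a^4 of w = a are both accepted.
even = {(0, "a"): 1, (1, "a"): 0}
print(accepts_two_powers(even, 0, {0}, "a"))
# Acceptor for the finite language {a, aa}; state 3 is a sink.
short = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
print(accepts_two_powers(short, 0, {1, 2}, "a"))
```

Both sample languages contain two different powers of `a` and hence are not 2-codes.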


Proposition 4.10. Let $L \in \mathrm{Rat}(X)$. It is decidable whether $L \in C_2(X)$ holds. If $L \in \mathrm{Fin}(X)$, then it is also decidable whether $L \in C_n(X)$ for $n > 2$.

Proof. The first statement is a consequence of Proposition 4.9. The second one follows from the fact that for testing the $n$-code property on a finite set it is sufficient to check the code property on its (finitely many) subsets of size $n$. □

5. Concluding remarks

This paper focuses on the following problem areas concerning $n$-codes:

(1) $n$-code hierarchy;
(2) definition by binary relations;
(3) relation to the set $Q$ of primitive words;
(4) comparison with the Chomsky hierarchy;
(5) decidability of the $n$-code property.

Several results concerning more detailed structural descriptions of $n$-codes, their syntactic monoids, and properties like maximality have been omitted here and will be presented in a follow-up paper. Open problems abound: we mentioned the decidability or undecidability of the $n$-code property for rational languages; a more precise comparison to other language classes would be another simple example; some other more intricate ones have been omitted to keep the presentation concise.

References

[1] J. Berstel and D. Perrin, Theory of Codes (Academic Press, New York, 1985).

[2] P.H. Day and H.J. Shyr, Languages defined by some partial orders, Soochow J. Math. 9 (1983) 53-62.

[3] T. Head and G. Thierrin, Polynomially bounded D0L systems yield codes, in: L.J. Cummings, ed., Combinatorics on Words (Academic Press, New York, 1983) 167-174.

[4] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, Reading, MA, 1979).

[5] H. Jürgensen, H.J. Shyr and G. Thierrin, Codes and compatible partial orders on free monoids, Astérisque, to appear.

[6] H. Jürgensen and G. Thierrin, Infix codes, in: Proceedings Comp. Sci. Conf., Győr (1985).

[7] J. Karhumäki, On three-element codes, in: J. Paredaens, ed., Proceedings ICALP 1984, Lecture Notes in Computer Science 172 (Springer, Berlin, 1984) 292-302.

[8] M. Lothaire, Combinatorics on Words (Addison-Wesley, Reading, MA, 1983).

[9] H.J. Shyr, Free monoids and languages, Lecture Notes, Dept. Math., Soochow Univ., 1979.

[10] H.J. Shyr and G. Thierrin, Codes and binary relations, in: Séminaire d'Algèbre P. Dubreil, 1975-76, Lecture Notes in Mathematics 586 (Springer, Berlin, 1977) 180-188.