Discrete Applied Mathematics 24 (1989) 187-196
North-Holland
ANTI-COMMUTATIVE LANGUAGES AND n-CODES*
Received 27 October 1965
1. Introduction
Languages derived from or related to codes have an important role in the study
of the combinatorics of words. There are various mechanisms and tools to define
and analyse codes. In particular, many classes of codes can be obtained as the
classes of antichains with respect to certain partial orders on free monoids. As very
simple examples we only mention the prefix codes, the hypercodes, and the block
codes. Some details are provided in [2, 5, 6, 10].
In this paper we study a hierarchy
$$C(X) \subseteq \cdots \subseteq C_n(X) \subseteq C_{n-1}(X) \subseteq \cdots \subseteq C_2(X) \subseteq C_1(X)$$
of classes of languages over an alphabet $X$ with $C(X)$ the class of codes over $X$. The
languages in $C_n(X)$ are called n-codes; an n-code is a language each of whose
n-element subsets is a code. The original motivation for considering this hierarchy
came from the analysis of $C_2(X)$ which had been shown to be the set of antichains
with respect to a partial order derived from anti-commutativity [2]. However, the
n-code property seems to be interesting in its own right.
We first prove that we are indeed dealing with a proper hierarchy. There is a major difference between languages in $C_2(X)$ and those in $C_n(X)$ for $n > 2$. Whereas $C_2(X)$ is the class of antichains with respect to a certain partial order, there is no binary relation $\varrho$ such that $C_n(X)$ would be the set of $\varrho$-independent languages for $n > 2$. In this way, $C_n(X)$ is similar to $C(X)$. It is known that there is no length-preserving binary relation nor any positive compatible partial order with the class of codes being their antichains [5, 9]. The latter statement can be extended to $C_2(X)$ as well.
* This work was supported by the Natural Science and Engineering Research Council of Canada, Grants A7877 and A0233.
M. Ito et al.
The n-code hierarchy is "skew" with respect to the Chomsky hierarchy of languages; that is, for any given language classes $F \subseteq F'$ of the Chomsky hierarchy and for any $n$ there are n-codes in $F' \setminus F$ which are not (n + 1)-codes. For example, the Thue set $T$ of square-free words over an alphabet $X$ with $|X| > 2$ is a non-algebraic type 1 language and also a 2-code, but not a 3-code.
It is obvious that, as a consequence of the decidability of the code property, the n-code property can also be decided for finite languages. We show that for rational languages the 2-code property is decidable. For $n > 2$ the decidability question is open.
The decidability result for $C_2(X)$ is based on certain structural properties of 2-codes concerning primitive words. In particular, the 2-codes over $X$ are subsets of cross sections of the equivalence relation on $X^*$ defined by equality of roots. Using this fact, further insight into the structure of 2-codes can be gained.
This paper has the following sections in addition to this introduction. In Section 2 we introduce notation and basic notions. Items not defined there or in the subsequent sections can be found in the books [1, 4, 8, 9] which we use as standard references. In Section 3 the hierarchy of n-codes is introduced and their properties concerning binary relations are proved. The role of primitive words is investigated in Section 4. Moreover, in Section 4, the n-code hierarchy is compared to the Chomsky hierarchy, and some decidability results are proved. Finally, Section 5 contains a few concluding remarks.
2. Notation and basic notions
An alphabet is a finite nonempty set. Let $X$ be an alphabet. Then $X^*$ denotes the free monoid generated by $X$, that is, the set of all words over $X$, including the empty word 1, and $X^+ = X^* \setminus \{1\}$. For $w \in X^*$, by $|w|$ we denote the length of $w$.
A language over $X$ is a set $L \subseteq X^*$. For any language $L$ and any $n \in \mathbb{N}$ where $\mathbb{N} = \{0, 1, 2, \ldots\}$ let
$$L^{(n)} = \{ w \mid \exists v \in L : v^n = w \}.$$
A word $w$ is called primitive if $w = u^n$ implies $n = 1$. Let $Q$ denote the set of all primitive words over $X$, where the alphabet $X$ is understood. For $w \in X^+$ let $\sqrt{w}$ denote the unique word $u \in Q$ such that $w = u^n$ for some $n \in \mathbb{N}$.
Let $\varrho$ be a binary relation on $X^*$. A language $L \subseteq X^*$ is said to be $\varrho$-independent or a $\varrho$-antichain if $u, v \in L$ and $u \varrho v$ implies $u = v$.
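The notions of root and primitivity are easy to compute for concrete words. The following is a minimal sketch, not part of the original paper, which assumes that words are represented as Python strings; the function names `primitive_root`, `is_primitive` and `nth_powers` are hypothetical and chosen here only for illustration.

```python
def primitive_root(w: str) -> str:
    """Return the unique primitive word u with w == u * k for some k >= 1."""
    if not w:
        raise ValueError("the root is only defined for nonempty words")
    n = len(w)
    for d in range(1, n + 1):
        # The shortest prefix whose repetition gives w is the primitive root.
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    return w  # not reached: d == n always succeeds


def is_primitive(w: str) -> bool:
    """A nonempty word is primitive iff it equals its own root."""
    return bool(w) and primitive_root(w) == w


def nth_powers(language, n: int):
    """The language L^(n) = { v^n : v in L } for a finite language L."""
    return {v * n for v in language}
```

For instance, `primitive_root("abab")` yields `"ab"`, while `"aba"` is its own root.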
As standard reference for formal languages and acceptors we use [4]. In particular, we use the following notation for families of languages over an alphabet $X$:
- Fin(X) = finite languages,
- Rat(X) = rational (= regular = type-3) languages,
- Alg(X) = algebraic (= context-free = type-2) languages,
- Cs(X) = context-sensitive (= type-1) languages,
- Rec(X) = recursive languages,
- RE(X) = recursively enumerable (= type-0) languages,
- P(X) = $2^{X^*}$ = general languages,
- D0L(X) = deterministic 0L Lindenmayer languages.
3. n-codes and binary relations
Let $X$ be a finite alphabet, $|X| \ge 2$, and let $n \in \mathbb{N}$. A language $L$ over $X$ is said to be an n-code if $L \subseteq X^+$, $L$ is nonempty, and every subset of $L$ with at most $n$ elements is a code. $L$ is said to be anti-commutative if $L \subseteq X^+$ and $uv \ne vu$ for $u, v \in L$, $u \ne v$. Let $C(X)$, $C_n(X)$, and $A(X)$ denote the families of codes, n-codes, and anti-commutative languages over $X$, respectively.
Clearly
$$C(X) \subseteq \cdots \subseteq C_n(X) \subseteq C_{n-1}(X) \subseteq \cdots \subseteq C_2(X) \subseteq C_1(X),$$
where $C_1(X)$ is trivial, that is,
$$C_1(X) = 2^{X^+} \setminus \{\emptyset\}.$$
Furthermore,
$$C_2(X) = A(X)$$
from the fact that a set $\{u, v\}$ is a code if and only if $uv \ne vu$ (see e.g. [9]). Clearly also
$$C(X) = \bigcap_{i \ge 1} C_i(X).$$
That the above inclusions are proper is easily seen by the following example from [9]: Let
$$C = \{a_1, \ldots, a_n\} \subseteq X^+$$
be a code over $X$ with $|C| = n$. The existence of $C$ is guaranteed by the fact that any finitely generated free monoid can be embedded into $X^+$ when $|X| \ge 2$. Now consider
$$L = \{a_1, \ldots, a_n, a_1 \cdots a_n\}.$$
Obviously, $L$ is an n-code but not an (n + 1)-code.
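For finite languages the example above can be checked mechanically. The following sketch is an illustration under stated assumptions, not the authors' method: it uses the standard Sardinas-Patterson test for the code property (a technique not discussed in this paper) and then tests the n-code property by inspecting subsets. The function names and the concrete code $C = \{a, ba, bb\}$ are chosen here for illustration only.

```python
from itertools import combinations


def is_code(words) -> bool:
    """Sardinas-Patterson test for a finite set of words."""
    code = set(words)
    if "" in code:
        return False  # the empty word never belongs to a code

    def left_quotient(prefixes, targets):
        # All s with p + s == t for some p in prefixes and t in targets.
        return {t[len(p):] for p in prefixes for t in targets if t.startswith(p)}

    dangling = left_quotient(code, code) - {""}
    seen = set()
    while dangling and frozenset(dangling) not in seen:
        if "" in dangling:
            return False  # some word has two different factorizations
        seen.add(frozenset(dangling))
        dangling = left_quotient(code, dangling) | left_quotient(dangling, code)
    return True  # the sets of dangling suffixes died out or started to cycle


def is_n_code(language, n: int) -> bool:
    """Check the n-code property of a finite language."""
    lang = set(language)
    if not lang or "" in lang:
        return False
    # Subsets of codes are codes, so subsets of size min(n, |L|) suffice.
    k = min(n, len(lang))
    return all(is_code(subset) for subset in combinations(sorted(lang), k))


# Illustrative instance of the example: C is a prefix code with n = 3 words.
C = {"a", "ba", "bb"}
L = C | {"a" + "ba" + "bb"}
```

With these definitions `is_n_code(L, 3)` holds while `is_n_code(L, 4)` fails, because the word `ababb` factorizes both as itself and as `a`, `ba`, `bb`.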
In fact this example is just a special case of a more general construction.
Proposition 3.1. Let $C$ be any k-code or a code over $X$, and let $1 < n \le |C|$ and $2n - 1 \le k$. If $a_1, \ldots, a_n$ are any $n$ distinct elements of $C$, then the set
$$L = C \cup \{a_1 \cdots a_n\}$$
is an n-code but not an (n + 1)-code.
Proof. It is immediate that $L$ is not an (n + 1)-code. In order to show that it is an n-code consider a set
$$L' = \{w_1, \ldots, w_{n-1}\} \cup \{a_1 \cdots a_n\},$$
where $w_1, \ldots, w_{n-1}$ are $n - 1$ distinct elements of $C$. Consider also the set
$$L'' = \{w_1, \ldots, w_{n-1}\} \cup \{a_1, \ldots, a_n\}.$$
Since $L'' \subseteq C$ and $C$ is a k-code or a code, also $L''$ is a code as $k \ge 2n - 1$.
Suppose $L'$ is not a code. Thus there exists a word with two different representations over $L'$,
$$x_1 \cdots x_r = y_1 \cdots y_m$$
with $x_i, y_j \in L'$ for $i = 1, \ldots, r$ and $j = 1, \ldots, m$.
Now, if $x_1 = w_k$, $y_1 = w_h$ for some $k, h$, then $x_1 = y_1$ as $L''$ is a code. Therefore, we may assume that
$$x_1 = w_1 \quad\text{and}\quad y_1 = a_1 \cdots a_n,$$
that is,
$$w_1 x_2 \cdots x_r = a_1 a_2 \cdots a_n y_2 \cdots y_m.$$
Viewing the word over $L''$ yields the factorizations
$$w_1 (x_{21} \cdots x_{2k_2}) \cdots (x_{r1} \cdots x_{rk_r}) = a_1 \cdots a_n (y_{21} \cdots y_{2h_2}) \cdots (y_{m1} \cdots y_{mh_m}),$$
where
$$x_i = (x_{i1} \cdots x_{ik_i}) \quad\text{and}\quad y_j = (y_{j1} \cdots y_{jh_j}).$$
Therefore, $w_1 = a_1$ and
$$(x_{21} \cdots x_{2k_2}) \cdots (x_{r1} \cdots x_{rk_r}) = a_2 \cdots a_n (y_{21} \cdots y_{2h_2}) \cdots (y_{m1} \cdots y_{mh_m}).$$
If
$$x_2 = x_{21} \cdots x_{2k_2} = a_1 \cdots a_n,$$
then $a_1 = a_2$, a contradiction! Therefore, $x_2 = w_s$ for some $s$ and $w_s = a_2$. Iterating this argument yields $x_i = a_i$ for $i = 1, \ldots, n$ and also $x_i \in \{w_1, \ldots, w_{n-1}\}$. This is impossible as all words $a_1, \ldots, a_n$ were chosen distinct. □
In Proposition 3.1, $k = n$ is not possible. Indeed, the set
$$C = a^+b^+ \cup b^+a^+ \cup \{aba^2b^2a^3b^3\}$$
is a 3-code while
$$C \cup \{a^2b^2a^3b^3ab\}$$
is not a 3-code. The set
$$B = a^+b^+ \cup b^+a^+$$
is an example of a language in $C_3(X) \setminus C_4(X)$ which is not obtained by the construction given in Proposition 3.1. So far no generalization of this example to arbitrary $n$ is known.
The family $C_2(X)$ of 2-codes is of particular interest. It was proved in [2] that $C_2(X)$ coincides with the family of antichains with respect to the partial order $\le_c$ on $X^*$ defined by
$$x \le_c y \iff \exists u \in X^* : y = xu = ux.$$
For further results concerning the relation between partial orders on $X^*$ and codes the reader is referred to [2, 5, 9, 10].
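The order $\le_c$ is simple to evaluate on concrete words, which makes the coincidence of $\le_c$-antichains with anti-commutative languages easy to observe on small finite examples. The sketch below is illustrative only; it assumes words are Python strings, and the names `leq_c`, `is_antichain` and `is_anti_commutative` are hypothetical.

```python
def leq_c(x: str, y: str) -> bool:
    """x <=_c y holds iff y = x u = u x for some (possibly empty) word u."""
    if not y.startswith(x):
        return False
    u = y[len(x):]
    return u + x == y


def is_antichain(language) -> bool:
    """No two distinct words of the finite language are comparable under <=_c."""
    words = list(set(language))
    return not any(leq_c(x, y) for x in words for y in words if x != y)


def is_anti_commutative(language) -> bool:
    """uv != vu for all distinct u, v in the finite language."""
    words = list(set(language))
    return all(u + v != v + u for u in words for v in words if u != v)
```

For example, `leq_c("ab", "abab")` holds with `u = "ab"`, whereas `leq_c("ab", "aba")` fails because `"a" + "ab"` differs from `"aba"`.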
A binary relation $\varrho \subseteq X^* \times X^*$ is called length-preserving if it satisfies the following conditions:
(1) $\forall u \in X^* : u \varrho u$;
(2) $u \varrho v$ implies $|u| \le |v|$;
(3) $u \varrho v$ and $|u| = |v|$ together imply $u = v$.
A length-preserving binary relation on $X^*$ is reflexive and anti-symmetric, but not necessarily transitive or compatible. A binary relation $\varrho \subseteq X^* \times X^*$ is said to be positive if
(4) $\forall u \in X^* : 1 \varrho u$
holds true. Observe that the partial order $\le_c$ is both length-preserving and positive, but not compatible. Positive length-preserving partial orders are called strict in [9] and elsewhere.
Whereas quite a few interesting classes of codes (the prefix codes, suffix codes, bi-prefix codes, and hypercodes, to mention only a few examples) can be characterized as the classes of antichains of certain partial orders on $X^*$, the class $C(X)$ of codes cannot be described in such a way: there is no length-preserving binary relation nor any positive compatible partial order on $X^*$, say $\varrho$, such that $C(X)$ coincides with the class of $\varrho$-antichains [5, 9]. A far stronger statement can be proved for the classes $C_n(X)$.
Proposition 3.2. Let $n \ge 3$ and $|X| \ge 2$. There is no binary relation $\varrho$ on $X^*$ such that $C_n(X)$ is the class of all $\varrho$-antichains.
For $n = 2$ a slightly weaker statement can be made.
Proposition 3.3. Let $|X| \ge 2$. There is no compatible binary relation $\varrho$ over $X^*$ such that $C_2(X)$ is the class of all $\varrho$-antichains.
These results seem to indicate that n-codes are rather complex objects. This will be clarified to a certain extent in the sequel.
4. n-codes and primitive words
It is well known that a pair $x, y \in X^+$ of words forms a code if and only if $xy \ne yx$ or, equivalently, if $\sqrt{x} \ne \sqrt{y}$ (see [9], for example). For words $x, y \in X^+$ define the relation $\sim$ by
$$x \sim y \iff \sqrt{x} = \sqrt{y}.$$
Obviously, $\sim$ is an equivalence relation on $X^+$.
Lemma 4.1. Let $|X| \ge 2$ and $L \subseteq X^+$. The following statements are equivalent:
(1) $L \in A(X)$;
(2) $L \in C_2(X)$;
(3) $L$ is contained in a cross section of $\sim$.
$L$ is a maximal 2-code if and only if it is a cross section of $\sim$.
For a proof of Lemma 4.1 see [2]. Its assertion (3) allows for a useful 1-1-correspondence between 2-codes $L$ over $X$ and mappings
$$f_L : Q \to \mathbb{N}$$
which is given by
$$u \in L \iff f_L(\sqrt{u}) \ne 0 \wedge (\sqrt{u})^{f_L(\sqrt{u})} = u.$$
Conversely if $f$ is a mapping of $Q$ into $\mathbb{N}$, then
$$L_f = \{ u^{f(u)} \mid u \in Q \wedge f(u) \ne 0 \}.$$
This representation of $C_2(X)$ implies the following corollary:
Corollary 4.2. For $|X| \ge 2$ one has $|C_2(X)| = \aleph_1$, and thus $C_2(X)$ is not recursively enumerable. In particular there are 2-codes which are not even type 0 languages.
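For finite languages the correspondence between 2-codes and mappings of $Q$ into $\mathbb{N}$ described before Corollary 4.2 can be made concrete. The following sketch is only an illustration: it stores the nonzero values of $f_L$ in a Python dictionary, and the helper names `root_and_exponent`, `f_of` and `language_of` are hypothetical.

```python
def root_and_exponent(w: str):
    """Return (u, k) with u primitive and w == u * k, for a nonempty word w."""
    if not w:
        raise ValueError("roots are only defined for nonempty words")
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d], n // d
    raise AssertionError("not reached: d == n always succeeds")


def f_of(language):
    """Nonzero part of f_L for a finite 2-code L, as a dict root -> exponent."""
    f = {}
    for word in set(language):
        root, exponent = root_and_exponent(word)
        if root in f:
            # Two distinct words with a common root commute: not a 2-code.
            raise ValueError("language is not a 2-code")
        f[root] = exponent
    return f


def language_of(f):
    """The language L_f = { u^f(u) : f(u) != 0 } for a finitely supported f."""
    return {root * exponent for root, exponent in f.items() if exponent != 0}
```

Applying `language_of` to `f_of(L)` returns `L` again for every finite 2-code `L`, which mirrors the 1-1-correspondence on finite examples.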
This result seems to indicate that the classes of n-codes may be "skew" with respect to standard language classes. Further details substantiating this impression will be provided in a follow-up paper.
Proposition 4.3. Let $|X| \ge 2$ and let $L \in C_2(X)$. The following properties obtain:
(1) if $L$ is rational, then $f_L$ is bounded;
(2) if $f_L$ is unbounded, then the order of the elements of the syntactic monoid syn $L$ of $L$ is unbounded;
(3) there is a context-free 2-code $L$ with $f_L$ unbounded.
Proof. As $f_L$ is unbounded, for any $k \in \mathbb{N}$ there exists $m > k$ such that $f^m \in L$ for some $f \in Q$, and therefore, $f^n \notin L$ for $n \ne m$. Thus, the words $f, f^2, \ldots, f^m$ are pairwise incongruent modulo the principal congruence $P_L$ of $L$. This implies (2), and therefore syn $L$ is infinite, that is, $L$ is not rational. Now consider the language
$$L = \bigcup_{i=0}^{\infty} ab^i(ab^+)^i$$
over the alphabet $X = \{a, b\}$. Obviously, $L$ is context-free. To show that $L$ is a 2-code suppose that
$$ab^i ab^{h_1} ab^{h_2} \cdots ab^{h_i} = f^s, \qquad ab^j ab^{k_1} ab^{k_2} \cdots ab^{k_j} = f^t$$
for some $f \in Q$ and $s, t \ge 1$. If $f = ab^i$, then $f = ab^j$, that is, $i = j$ and therefore $s = t$. Otherwise, we have to assume that
$$f = ab^i ab^{h_1} \cdots ab^{h_p} = ab^j ab^{k_1} \cdots ab^{k_q}$$
for some $p, q \ge 1$. Then obviously $i = j$ and $p = q$ and thus $s = t$. Therefore, $L$ is a 2-code, in fact, it is a code. As $(ab^i)^{i+1} \in L$ and $ab^i \in Q$ for every $i \ge 1$ it follows that $f_L$ is unbounded. □
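Membership in the language $\bigcup_{i \ge 0} ab^i(ab^+)^i$ used in the proof can be tested directly, which makes the unboundedness of the exponents visible on examples. This is a minimal sketch assuming words over $\{a, b\}$ are Python strings; the name `in_L` is hypothetical. It parses the leading block $ab^i$ with a regular expression and then counts the remaining blocks $ab^+$.

```python
import re

# First block a b^i, followed by any number of blocks a b^+.
_SHAPE = re.compile(r"a(b*)((?:ab+)*)")


def in_L(w: str) -> bool:
    """True iff w = a b^i (a b^+)^i for some i >= 0."""
    m = _SHAPE.fullmatch(w)
    if m is None:
        return False
    i = len(m.group(1))              # exponent of the first block
    blocks = m.group(2).count("a")   # each later block starts with one 'a'
    return blocks == i
```

For every $i \ge 1$ the word $(ab^i)^{i+1}$ passes this test, so the primitive word $ab^i$ occurs in the language with exponent $i + 1$.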
Observe that boundedness of $f_L$ does not imply rationality for $L$. The language $L = Q$ is such an example of a nonrational language with $f_L$ bounded.
For a language $L$ over $X$ let $\sqrt{L}$ denote the language
$$\sqrt{L} = \{ u \mid u \in Q \wedge \exists v \in L : u = \sqrt{v} \}.$$
If $L \ne \emptyset$, then $\sqrt{L} \in C_2(X)$. For $L \in C_2(X)$ one has $\sqrt{L} = Q$ if and only if $L$ is a maximal 2-code. However, the following result implies that $\sqrt{L} \ne Q$ if $L \in C_n(X)$ for $n \ge 3$.
Proposition 4.4. Let $|X| \ge 2$ and $n \ge 3$. For every n-code $L \subseteq X^*$ the set $Q \setminus \sqrt{L}$ is infinite.
Corollary 4.5. Let $|X| \ge 2$. Then the following statements hold true:
(1) if $L \in C_2(X) \cap \mathrm{Rat}(X)$ and $L$ is infinite, then $L \cap Q$ is infinite;
(2) if $L \in C_2(X) \cap \mathrm{Rat}(X)$ and $L$ is infinite, then $L \cap Q$ is rational if and only if $L \cap (X^* \setminus Q)$ is finite;
(3) for any finite set $M \ne \emptyset$, $M \subseteq \{1, 2, \ldots\}$, there is a rational 2-code $L$ such that $L \cap Q^{(m)}$ is infinite for all $m \in M$.
Corollary 4.6. Let $|X| \ge 2$. Every infinite rational code over $X$ contains infinitely many primitive words.
We conclude this section with a description of the relation between D0L languages and 2-codes.
Proposition 4.7. Let $|X| \ge 2$ and let $L \in \mathrm{D0L}(X)$ be infinite. If $L \notin C_2(X)$, then there is an integer $k$ such that $|L'| \le k$ for every $L' \subseteq L$ which is a 2-code.
Proof. Let $L$ be an infinite D0L language generated by the D0L system $G = (X, h, w_0)$, and let $w_i = h^i(w_0)$ for $i \in \mathbb{N}$. Suppose that $L$ is not a 2-code. Then there exist $i, k \in \mathbb{N}$, $i < k$, such that $\{w_i, w_k\}$ is not a code, that is, $w_i = p^n$, $w_k = p^m$ for some $p \in Q$ and $n, m \ge 1$, $n \ne m$. Choose $k$ minimal with this property. Then the set
$$\bar{L} = \{w_0, w_1, \ldots, w_{i-1}\}$$
is a 2-code.
Now let $t = k - i$. From $w_i = p^n$ and $w_k = w_{i+t} = p^m = h^t(p^n) = (h^t(p))^n$ it follows that $h^t(p) = p^l$ for some $l \ge 1$. Therefore, $m = ln$ and
$$w_{i+rt+s} = h^s(h^{rt}(w_i)) = (h^{rt}(h^s(p)))^n = (h^s(p))^{l^r n}$$
for $r \in \mathbb{N}$ and $s = 0, 1, \ldots, t - 1$. Let
$$L_s = \{ w_{i+rt+s} \mid r \in \mathbb{N} \}.$$
Then
$$L = \bar{L} \cup \bigcup_{s=0}^{t-1} L_s$$
with $L_s$ infinite for all $s$. If $L'$ is any 2-code contained in $L$, then $|L' \cap L_s| \le 1$ and, therefore, $|L'| \le t + i = k$. □
Corollary 4.8. Every infinite context-free D0L language is a 2-code.
Proof. This result follows from the preceding proof by the "pumping lemma" for context-free languages. It can also be obtained as a weak version of a result due to [3] which states that every infinite context-free D0L language is a prefix code or a suffix code, which implies that it is a 2-code. □
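The words of a D0L language are obtained by iterating a morphism $h$ on an axiom $w_0$, so the pairs $w_i, w_k$ with a common root that drive the proof of Proposition 4.7 can be searched for on any finite initial segment of the sequence. The sketch below is illustrative only and decides nothing about the infinite language; the names `d0l_words` and `first_root_clash`, and the sample rules in the usage note, are assumptions made here.

```python
def d0l_words(rules, axiom: str, count: int):
    """Return the first `count` words w_0, w_1, ... with w_{i+1} = h(w_i)."""
    words, w = [], axiom
    for _ in range(count):
        words.append(w)
        w = "".join(rules[letter] for letter in w)  # apply the morphism h
    return words


def primitive_root(w: str) -> str:
    """Shortest word u such that w is a power of u (w must be nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("the root is only defined for nonempty words")


def first_root_clash(words):
    """Indices (i, k), i < k, of two distinct words sharing a root, or None."""
    first_index = {}
    for k, w in enumerate(words):
        root = primitive_root(w)
        if root in first_index and words[first_index[root]] != w:
            return first_index[root], k
        first_index.setdefault(root, k)
    return None
```

With the rules `{"a": "b", "b": "ab"}` and axiom `"a"` the first five words are `a`, `b`, `ab`, `bab`, `abbab`, and no two of them share a root; with the single rule `{"a": "aa"}` the words `a` and `aa` clash immediately.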
As an example of a D0L language which is a 2-code but not a 3-code consider the set
$$\{a, b, ab, bab, abbab, \ldots\},$$
that is, the Fibonacci language over $X = \{a, b\}$ which is generated by the D0L rules $a \to b$, $b \to ab$.
We now proceed to prove that the property of being a 2-code is decidable for rational languages. The following statement provides the main argument for the proof.
Proposition 4.9. Let $L \in \mathrm{Rat}(X)$. It is decidable whether there exists a word $w \in X^+$ such that $w^i \in L$ and $w^j \in L$ for two different powers of $w$.
Proof. Let $A = (X, S, \delta, q_0, F)$ be a finite state acceptor with $L = L(A)$. For $q_\alpha, q_\beta \in S$ let
$$A_{q_\alpha q_\beta} = (X, S, \delta, q_\alpha, \{q_\beta\}),$$
and let
$$L_{q_\alpha q_\beta} = L(A_{q_\alpha q_\beta}).$$
Obviously, the following two statements are equivalent:
(1) $\exists w \in X^+ \; \exists i > j > 0 : w^i, w^j \in L$, and
(2) $\exists q_1, q_2, \ldots \in S$: $\bigcap_{k \ge 1} L_{q_{k-1} q_k} \setminus \{1\} \ne \emptyset$, and $|\{q_1, q_2, \ldots\} \cap F| \ge 2$.
Thus, in order to decide (1) one could try to decide (2).
Observe first that the sequence of states reached by consecutive powers of a fixed input word is ultimately periodic. Thus, if
$$q_i = \delta(q_0, w^i),$$
then the sequence has the form
$$q_0, q_1, \ldots, q_i, q_{i+1}, \ldots, q_{i+p} = q_i.$$
Therefore in deciding (2) we may restrict ourselves to considering sequences of this form where $i + p \le n - 1$ with $n = |S|$. There are no more than $n!\,2^n$ such sequences and one checks each of them separately.
Now consider such a sequence. The condition
$$|\{q_1, q_2, \ldots\} \cap F| \ge 2$$
is satisfied if and only if
(a) there are two indices $0 \le \alpha < \beta < i$ with $q_\alpha, q_\beta \in F$, or
(b) there is $i \le \alpha < i + p$ with $q_\alpha \in F$.
Thus the above condition can easily be checked. Finally, since the sequence is periodic from $q_i$ on, the infinite intersection reduces to a finite one,
$$\bigcap_{k \ge 1} L_{q_{k-1} q_k} = \bigcap_{k=1}^{i+p} L_{q_{k-1} q_k},$$
and therefore also the condition
$$\bigcap_{k \ge 1} L_{q_{k-1} q_k} \setminus \{1\} \ne \emptyset$$
is decidable. □
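The decision problem of Proposition 4.9 can also be approached by a route different from the one used in the proof above: instead of enumerating state sequences, one can enumerate the transformations of the state set induced by nonempty words (the transition semigroup of the acceptor) and follow the orbit of $q_0$ under each of them. The sketch below takes this swapped-in approach and is not the authors' procedure. It assumes a complete deterministic acceptor with states `0, ..., n_states - 1` and a transition table given as a dictionary; all names are hypothetical, and the number of transformations may be exponential in the number of states.

```python
from collections import deque


def accepting_powers_capped(t, q0, finals) -> int:
    """Number of k >= 1 with t^k(q0) accepting, capped at 2."""
    first_visit, hits = {}, 0
    q, k = t[q0], 1                      # state reached by w^1
    while q not in first_visit:
        first_visit[q] = k
        hits += q in finals
        q, k = t[q], k + 1
    cycle_start = first_visit[q]         # orbit is periodic from here on
    if any(s in finals for s, idx in first_visit.items() if idx >= cycle_start):
        return 2                         # infinitely many accepted powers
    return min(hits, 2)


def word_with_two_powers(n_states, alphabet, delta, q0, finals):
    """Return a nonempty word with two different powers accepted, or None."""
    letters = {a: tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet}
    witness = {}                         # transformation -> a word inducing it
    queue = deque()
    for a, t in letters.items():
        if t not in witness:
            witness[t] = a
            queue.append(t)
    while queue:
        t = queue.popleft()
        if accepting_powers_capped(t, q0, finals) >= 2:
            return witness[t]
        for a, ta in letters.items():
            nxt = tuple(ta[t[q]] for q in range(n_states))  # word w followed by a
            if nxt not in witness:
                witness[nxt] = witness[t] + a
                queue.append(nxt)
    return None
```

For a language $L \subseteq X^+$ a result of `None` means that no two distinct words of $L$ are powers of a common word, which is the situation used for the 2-code test on rational languages.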
Proposition 4.10. Let $L \in \mathrm{Rat}(X)$. It is decidable whether $L \in C_2(X)$ holds. If $L \in \mathrm{Fin}(X)$, then it is also decidable whether $L \in C_n(X)$ for $n > 2$.
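Proof. The first statement is a consequence of Proposition 4.9, since a nonempty language $L \subseteq X^+$ fails to be a 2-code exactly when it contains two different powers of some word. The second one follows from the fact that for testing the n-code property on a finite set it is sufficient to check the code property on its (finitely many) subsets of size $n$. □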
5. Concluding remarks
This paper focusses on the following problem areas concerning n-codes:
(1) n-code hierarchy;
(2) definition by binary relations;
(3) relation to the set $Q$ of primitive words;
(4) comparison with the Chomsky hierarchy;
(5) decidability of the n-code property.
Several results concerning more detailed structural descriptions of n-codes, their syntactic monoids, and properties like maximality have been omitted here and will be presented in a follow-up paper. Open problems abound: we mentioned the decidability or undecidability of the n-code property for rational languages; a more precise comparison to other language classes would be another simple example; some other more intricate ones have been omitted to keep the presentation concise.
References
[1] J. Berstel and D. Perrin, Theory of Codes (Academic Press, New York, 1985).
[2] P.H. Day and H.J. Shyr, Languages defined by some partial orders, Soochow J. Math. 9 (1983) 53-62.
[3] T. Head and G. Thierrin, Polynomially bounded D0L systems yield codes, in: L. Cummings, ed., Combinatorics on Words (Academic Press, New York, 1983) 167-174.
[4] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, Reading, MA, 1979).
[5] H. Jürgensen, H.J. Shyr and G. Thierrin, Codes and compatible partial orders on free monoids, Astérisque, to appear.
[6] H. Jürgensen and G. Thierrin, Infix codes, in: Proceedings Comp. Sci. Conf., Győr (1985).
[7] J. Karhumäki, On three-element codes, in: J. Paredaens, ed., Proceedings ICALP 1984, Lecture Notes Computer Science 172 (Springer, Berlin, 1984) 292-302.
[8] M. Lothaire, Combinatorics on Words (Addison-Wesley, Reading, MA, 1983).
[9] H.J. Shyr, Free monoids and languages, Lecture Notes, Dept. Math., Soochow Univ., 1979.
[10] H.J. Shyr and G. Thierrin, Codes and binary relations, in: Séminaire d'Algèbre P. Dubreil, 1975-76, Lecture Notes in Mathematics 586 (Springer, Berlin, 1977) 180-188.