COMP209
Automata and Formal Languages
Section 1
Introduction
Consider the picture below: a 'black-box' reads an input stream and produces an output stream,

    x1 x2 . . . xk   →   [black-box]   →   y1 y2 . . . yk

where
    y1 = f (x1)
    y2 = f (x1x2)
    . . .
    yk = f (x1x2. . .xk)

• x1, x2, . . .: (finite) input stream (of symbols);
• A 'black-box' reads these and computes an output yi = f (x1x2. . .xi−1xi)
Informally, one concern of Automata Theory is:
What 'happens' inside the 'black-box'?
e.g. how can the following input-output needs be realised?
Some Simple Example Cases
Suppose the input symbols are single decimal digits, i.e.
xi ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
a) The output yk is 1 if x1x2. . .xk−1xk is a k-bit binary string; otherwise it is 0.
If the x symbols are from the set of binary values {0, 1}:
b) The output yk is 1 if more than 3 symbols are read.
c) The output yk is 1 if an odd number of symbols have been read.
Some ‘more difficult’ Examples
Again using {0, 1} as the input possibilities:
d) yk is 1 if the sequence x1x2. . .xk is a palindrome, i.e.
x1x2. . .xk−2xk−1xk = xk xk−1xk−2. . .x2x1
e) yk is 1 if x1. . .xk has exactly the same number of 0s as it has 1s.
Using {(, )} (left and right brackets) as possible inputs:
f) yk is 1 if the sequence x1x2. . .xk is a properly matched sequence of left and right brackets, e.g.
() or ()() or (()()) or (()(()())) etc.
but not, e.g. ( or ) or )( or (() or ()()) etc.
and ‘more difficult still’ Examples
Input symbols {0, 1}:
g) yk is 1 if k symbols are read and k is a prime number.
h) yk is 1 if
k = i + j and
x1x2. . .xi = 111. . .111
xi+1xi+2. . .xk = 000. . .000
(i.e. i 1s followed by j 0s)
and j = i².
We shall see that there is a very precise sense in which
(d), (e) and (f)
are 'more difficult' than
(a), (b), and (c).
There is a similar precise sense in which
(g) and (h)
are 'more difficult' than
(d), (e), and (f).
One aspect of Automata and Formal Language Theory involves formalising these ideas.
Below is a possible view of how the first 'simple example' could be treated:
St0: Read the first symbol (if present) (x1);
     If x1 is a 0 or a 1 then output a 1 (i.e. y1 = 1); go to St1.
     Otherwise output a 0 (i.e. y1 = 0); go to St2.
St1: Read the next symbol (if present);
     If it is a 0 or a 1 then output a 1; go to St1.
     Otherwise output a 0; go to St2.
St2: Read the next symbol (if present); output a 0; go to St2.
An 'obvious' weakness of this description is its verbosity.
But consider,
[Diagram: A Finite State Transducer for Example (a). States St0, St1, St2; edges labelled 0,1/1 from St0 to St1 and looping at St1; edges labelled 2,3,4,...,8,9/0 from St0 and St1 into St2; St2 loops on 0,1,2,...,8,9/0.]
Instead of the 'verbose' description a directed 'graph' model is used.
• Vertices are labelled with 'State' names, St0, St1, St2.
• There is an (unlabelled) edge to indicate the 'starting point'.
• Other edges have a label of the form Xin / yout, where
a) Xin is a subset of possible input symbols;
b) yout is a single output symbol.
In total, an edge from Sti to Stj with label Xin / yout can be viewed as saying:
If the 'program' has reached 'Step' (i.e. 'State') i and the next input symbol is in the set Xin then
output the symbol yout; and go to 'Step' (State) j.
We only have 0 and 1 as outputs, so this can be further simplified:
[Diagram: A Finite State Recogniser for Example (a). States St0, St1 (accepting), St2; edges labelled 0,1 from St0 to St1 and looping at St1; edges labelled 2,3,4,5,...,8,9 from St0 and St1 into St2; St2 loops on 0,1,2,3,...,8,9.]
The 'recogniser' has exactly the same structure as the 'transducer'.
The difference is that instead of explicitly indicating the output as part of the edge label this is implied by distinguishing 'special' states. Namely,
states for which a 1 would be output when they are entered. St1 is such a case.
This modification will be useful in presenting several formal results.
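The transducer and recogniser for example (a) can be sketched directly in Python. This is a minimal illustration, not part of the notes: the state names St0, St1, St2 follow the diagram, and the dictionaries encoding the transitions are an assumption about how one might implement them.

```python
# Sketch of example (a): emit 1 while every symbol read so far is binary.
def transduce(word):
    """Mealy-style transducer: output stream y1..yk for input x1..xk."""
    state, out = "St0", []
    for sym in word:
        if state != "St2" and sym in ("0", "1"):
            state, y = "St1", "1"   # still a binary prefix
        else:
            state, y = "St2", "0"   # spoiled: output 0 forever
        out.append(y)
    return "".join(out)

def recognise(word):
    """Recogniser: accept iff the whole word is a binary string."""
    state = "St0"
    for sym in word:
        state = "St1" if state != "St2" and sym in ("0", "1") else "St2"
    return state == "St1"           # St1 is the 'special' (accepting) state

assert transduce("0172") == "1100"
assert recognise("0110") and not recognise("0120")
```

Note how `recognise` is the same machine with the per-symbol output dropped and St1 marked as accepting, exactly the simplification described above.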
The properties of such Finite State Automata are examined in the first part of this module.
As one answer to 'what happens in the black-box' these offer a rich set of ideas and applications.
Among these applications:
• Lexical analysis in compilers: determining identifiers, numeric constants when analysing a program statement.
• Describing hardware systems.
In the latter context, all hardware systems (µprocessors etc) can, ultimately, be described as compositions of finite state transducers with output symbols {0, 1}.
Design of a finite state machine is a standard approach when developing a digital system.
As an illustration of this, what do you think the following machine does?
Its possible input 'symbols' are {00, 01, 10, 11}; its possible output symbols are {0, 1}.
[Diagram: a two-state transducer with states q0 and q1. From q0: 00/0 and 01,10/1 loop at q0, and 11/0 goes to q1. From q1: 11/1 and 01,10/0 loop at q1, and 00/1 goes to q0.]
Hint: Think of the input 'symbols' as pairs < xi yi >. What is the output stream z if the 'black-box' below

    x1 . . xi . . xn
    y1 . . yi . . yn   →   [black-box]   →   z1 . . zi . . zn = f (x1. . xn, y1. . yn)

uses the transducer shown above?
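One plausible reading of the two-state machine, suggested by the hint, is that it is a serial binary adder: it reads the pairs < xi yi > least-significant bit first, with q0 playing the role of 'no carry' and q1 of 'carry'. The sketch below is an assumption based on that reading of the diagram, not something stated in the notes.

```python
# Hypothetical reading of the two-state transducer as a serial binary adder.
# xs, ys: equal-length bit strings, least-significant bit first.
def serial_add(xs, ys):
    carry, out = 0, []            # carry == 1 corresponds to state q1
    for x, y in zip(xs, ys):
        s = int(x) + int(y) + carry
        out.append(str(s % 2))    # the output symbol on the edge taken
        carry = s // 2            # the next state (carry / no carry)
    return "".join(out)

# 3 + 6 = 9: LSB-first, 3 = "110", 6 = "011"; 9 = "1001" truncated to 3 bits.
assert serial_add("110", "011") == "100"
```

Under this reading, z is the bitwise sum of x and y (modulo 2^n), produced one bit per input pair.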
Alphabets, Words, Languages
The informal descriptions used notions of
'input' and 'output' 'symbol'.
The 'black-box' view is in terms of a
'mapping' from 'sequences' of input symbols to ('sequences' of) output symbols.
These are rather vague and imprecise: e.g. the input example {00, 01, 10, 11}.
We now present the formal framework within which subsequent ideas will be set.
Alphabets
An alphabet is a (finite) set of symbols,
Σ = {σ1, σ2, . . . , σk}
Σ denotes an arbitrary alphabet;
σ an arbitrary symbol in Σ.
Examples
The alphabet, Decimal, of decimal digits: Decimal = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
The alphabet, Binary: Binary = {0, 1}
The alphabets Roman and Greek:
Roman = {A, B, C, . . , X, Y, Z, a, b, c, . . , x, y, z}
Greek = {Α, Β, Γ, . . , Χ, Ψ, Ω, α, β, γ, . . , χ, ψ, ω}
Words over Alphabets
A word, w, over an alphabet, Σ, is a finite sequence of symbols from Σ.
Note the following points:
A word is a sequence not a set.
The order of symbols is important.
Examples
Decimal words: 3, 10, 112289, 982211
Binary words: 0, 1, 10, 01, 100010.
Roman words: Java, ROMAN, word, SeQuEncE, Verbum.
Greek words: λογος, αγαπη, θεος, θεωρια, θηριον.
Properties of and Operations on Words
Length of Words
The length of a word w is the number of symbols in w.
|w| denotes this value.
The word which has length 0 is called the empty word.
This will always be denoted by ε.
Examples
Decimal: |3| = 1; |10| = 2; |112289| = |982211| = 6
Binary: |0| = |1| = 1; |10| = |01| = 2; |100010| = 6
Roman: |Java| = 4; |ROMAN| = 5; |SeQuEncE| = 8
Greek: |λογος| = |αγαπη| = 5; |θηριον| = 6
Concatenation
If u and v are words over Σ, the word w formed by concatenating u with v is the word whose sequence of symbols is u followed by v.
The length of w is |u| + |v|.
If either u or v is the empty word, ε, then u ε = u; ε v = v; ε ε = ε.
We will, on occasion, use the notation u ⋅ v to indicate concatenation of words u and v, and
w^k = w⋅w⋅w⋅ . . . ⋅w
(i.e. k concatenations of w, where k ≥ 0).
Examples
Decimal: 3⋅10 = 310; 22⋅33⋅22⋅ε = 223322
Binary: 1⋅0 = 10; 0⋅0⋅ε⋅0⋅1 = 00⋅ε⋅01 = 0001
Roman: W⋅o⋅r⋅d = Word;
Greek: δια⋅δηµατα = διαδηµατα
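The basic facts about concatenation can be checked with words represented as Python strings (an assumption of this sketch; the empty word ε is the empty string):

```python
# Words as Python strings; '+' is concatenation, '*' gives powers w^k.
u, v, eps = "22", "33", ""
assert len(u + v) == len(u) + len(v)   # |u·v| = |u| + |v|
assert u + eps == u and eps + v == v   # ε is the identity for concatenation
assert "10" * 3 == "101010"            # w^3 = w·w·w
assert "10" * 0 == eps                 # w^0 = ε
```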
Languages of Words over Alphabets
The concept of a language is of great significance in Computer Science.
In contrast to the other terms introduced above, 'languages' of interest are usually infinite objects.
A language, L, over an alphabet Σ is a subset of the set of all possible words over Σ.
It is convenient to have a shorthand for 'set of all possible words over Σ'.
Since we consider only finite length words, this set comprises:
0: all words over Σ of length 0, i.e. {ε}, and
1: all words over Σ of length 1, and
2: all words over Σ of length 2, and
. . .
k: all words over Σ of length k, and
. . .
The notation Σ^k is used for
all words over Σ of length k
so that the set we are describing is
∪_{k=0}^∞ Σ^k
for which the shorthand Σ* is employed.
It is sometimes convenient to consider the set of all non-empty words over Σ, and for this purpose, the notation Σ+ is used.
Examples
L(a) ⊂ Decimal*:  L(a) = {w : w ∈ {0, 1}+}
L(b) ⊂ Binary*:   L(b) = {w : |w| > 3}
L(c) ⊂ Binary*:   L(c) = {w : ∃ m ≥ 0 s.t. |w| = 2m + 1}
L(d) ⊂ Binary*:   L(d) = {w : w = Reverse(w)}
L(h) ⊂ Binary*:   L(h) = {w : w = 1^i 0^j and j = i²}
Operations on Sets/Languages
One of the issues of interest is the properties of languages formed by applying certain operations to one or more languages.
Suppose L and M are languages over Σ.
The 'basic' operations are:
• Union (∪): L ∪ M = {w ∈ Σ* : w ∈ L or w ∈ M}
• Intersection (∩): L ∩ M = {w ∈ Σ* : w ∈ L and w ∈ M}
• Complement (Co−): Co−(L) = {w ∈ Σ* : w ∉ L}
• Concatenation (⋅): L ⋅ M = {w ∈ Σ* : w = u⋅v and u ∈ L and v ∈ M}
• *-Closure (*): L* = ∪_{k=0}^∞ L^(k)
where
L^(k) = {w : w ∈ L⋅L⋅L⋅ . . . ⋅L (k times)}
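These operations can be illustrated on small finite fragments of languages over Σ = {0, 1}, using Python sets of strings (an illustration only: since Σ* is infinite, complement is shown here relative to the finite universe of words of length at most 2):

```python
# Finite fragments of languages over {0,1}, as Python sets of strings.
sigma_star_upto2 = {"", "0", "1", "00", "01", "10", "11"}
L = {"0", "00", "01"}
M = {"0", "1"}

assert L | M == {"0", "1", "00", "01"}                # union
assert L & M == {"0"}                                 # intersection
assert sigma_star_upto2 - L == {"", "1", "10", "11"}  # complement (restricted universe)
# concatenation L·M: every u in L followed by every v in M
assert {u + v for u in L for v in M} == {"00", "01", "000", "001", "010", "011"}
```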
The Empty Language
The language over Σ that contains no words at all is called
The Empty Language (over Σ)
and is denoted by ∅, the empty set sign.
Very Important
The empty language, ∅, is not the same as the language whose sole member is the empty word, ε, i.e.
∅ ≠ {ε}
For the operations ∪, ∩, ⋅, Co−, *:
Outcome                      Outcome
L ∪ ∅ = L                    L ∪ {ε} = L ⇔ ε ∈ L
L ∩ ∅ = ∅                    L ∩ {ε} = {ε} ⇔ ε ∈ L
L ⋅ ∅ = ∅                    L ⋅ {ε} = L
Co−(∅) = Σ*                  Co−({ε}) = Σ+
∅* = {ε}                     {ε}* = {ε}
Examples
Using the languages defined earlier:
L(a) ∪ L(b) = L(a)
L(b) ∩ L(c) = {w : ∃ m ≥ 2 s.t. |w| = 2m + 1}
Co−(L(c)) = {w : ∃ m ≥ 0 s.t. |w| = 2m}
L(h)⋅L(h) = {w : w = 1^i 0^j 1^r 0^s, j = i² and s = r²}
(L(a))* = Binary*
Equivalence Relations (Reminders)
Let S be a (possibly infinite) set. A relation, R, over S, is a set of ordered pairs of elements from S, i.e.
R ⊆ S × S
R is an equivalence relation if it satisfies all of the following:
a) ∀ x ∈ S, < x, x > ∈ R
b) ∀ x, y ∈ S, < x, y > ∈ R ⇔ < y, x > ∈ R
c) ∀ x, y, z ∈ S, < x, y > ∈ R and < y, z > ∈ R ⇒ < x, z > ∈ R
These properties are respectively called: Reflexivity, Symmetry, Transitivity.
Any equivalence relation, R, over S, induces a partition of S,
< C1 ; C2 ; . . . ; Cr >
The Ci ⊆ S (equivalence classes) are such that for all x, y ∈ S:
(x ∈ Ci and y ∈ Cj and < x, y > ∈ R) ⇔ (i = j)
Example
Suppose S = N (the positive integers).
The relation ≡k (k ≥ 2) is defined by
x ≡k y if the remainder when dividing x by k equals the remainder when dividing y by k.
≡k is an equivalence relation. (Trivial exercise.)
≡k partitions N into exactly k equivalence classes,
< C0 ; C1 ; . . . ; Ck−1 >
for which x ∈ Ci if and only if the remainder on dividing x by k equals i.
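The partition induced by ≡k can be computed directly; a small sketch for k = 3 over {1, . . . , 12}:

```python
# The equivalence classes of ≡_3 over {1, ..., 12}: x ≡_3 y iff x % 3 == y % 3.
classes = {}
for x in range(1, 13):
    classes.setdefault(x % 3, []).append(x)   # class index i = remainder mod 3

assert classes[0] == [3, 6, 9, 12]   # C_0
assert classes[1] == [1, 4, 7, 10]   # C_1
assert classes[2] == [2, 5, 8, 11]   # C_2
```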
Describing Languages of Words
Given a language L ⊆ Σ*,
how are the words in L described/defined?
If L is infinite then one cannot simply present a list of all the words in L.
Of course, one way is to give an 'ad hoc' (finite) description, e.g. L(c) is
'the set of all odd length binary words'
This is computationally unhelpful.
An alternative is to describe the operation of a 'black-box', M, that outputs 1 when given w ∈ L as input and outputs 0 otherwise.
Thus, L is
'the set of all words on which M outputs 1'.
A Finite State Recogniser is an example of such a description, and a formal definition of L(M),
'the language L recognised by M',
is given later.
There is, however, a third approach that is of great importance in Computer Science:
Define a formal grammar, G, against which w ∈? L can be tested.
i.e. a set of rules that 'generates' each w ∈ L, or
(equivalently), a process by which any w ∈ L can be 'decomposed' or parsed using the rules in G.
Formal Grammars
A formal grammar, G, is defined by a quadruple,
G = ( V, T, P, S )
V: a set of variable (or 'non-terminal') symbols.
T: a set of terminal symbols (V ∩ T = ∅).
S: the start symbol (S ∈ V).
P: a set of production rules, of the form
li → ri,
both li and ri being words in (V ∪ T)*, li containing at least one symbol in V.
A formal grammar, G, defines how 'acceptable' words, w, may be generated from a starting point, S.
The production, L → R, is interpreted as:
'A word w ∈ (V ∪ T)* such that w = u⋅L⋅v generates the word u⋅R⋅v ∈ (V ∪ T)*'.
Of course such derivations of words from w ∈ (V ∪ T)* can only continue while w contains (at least) one symbol from V.
If w ∈ T* can be derived from S in the grammar G = (V, T, P, S) then w is in the language, L(G), generated by G.
Notice that L(G) ⊆ T*.
We treat Formal Grammars in greater depth later.
For now we observe that 'different types' of grammar are distinguished by differing restrictions on the form of production rules.
Such restrictions specify 'allowable' combinations of variable and terminal symbols on the left and right 'sides'.
Derivations in Formal Grammars
Suppose G = (V, T, P, S) is a formal grammar and that x, y ∈ (V ∪ T)*.
y is said to be directly derived from x in G (x ⇒G y) if there is a production pi ∈ P such that applying pi to x results in the word y.
y is said to be derived from x in G (x ⇒*G y) if
x = y or
∃ z ∈ (V ∪ T)* : x ⇒G z and z ⇒*G y
Finally, x ∈ T* is derivable in G if S ⇒*G x.
Thus, the language, L(G), generated by the grammar G is
{ x ∈ T* : S ⇒*G x }
A Simple Example Grammar
The following ought to be familiar:
EXPR = ( V, T, P, S ) where
V = {E, op, opd, num, digit}
T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −, *, /, (, )}
S = E
P = {p1, p2, . . . , p19, p20}
Production          Left → Right
p1, p2, p3          E → (E) | E op E | opd
p4, p5, p6, p7      op → + | − | * | /
p8                  opd → num
p9, p10             num → digit | digit ⋅ num
p11, . . , p20      digit → 0|1|2|3|4|5|6|7|8|9
and Some Derivations in it
E op E op E ⇒ opd op E op E (by p3)
E op E op E ⇒* num + num + num (by < p3, p8, p4, p3, p8, p4, p3, p8 >)
25 ∈ L(EXPR):
S ⇒ E; E ⇒ opd; opd ⇒ num;
num ⇒ digit⋅num; digit⋅num ⇒ 2⋅num;
2⋅num ⇒ 2⋅digit; 2⋅digit ⇒ 25
∴ S ⇒* 25
A fundamental discovery of Automata and Formal Language Theory may, informally, be stated as:
There is a
'hierarchy' of 'black-box capabilities'
that exactly matches a
'hierarchy' of 'formal grammar types'.
In other words,
L can be recognised by a machine M in a class T of machines if and only if L can be described by a grammar G in a particular class TG of formal grammars.
These 'hierarchies' are 'natural'.
Languages recognised by Finite State Automata are at the lowest level of this.
COMP209
Automata and Formal Languages
Section 2
Finite Automata
(Determinism and Non-Determinism)
We now consider the 'simplest' class of machine model,
Finite State Automata
We concentrate on these as recognisers.
Formally,
A deterministic finite state automaton (DFA), M, is described by a quintuple,
M = ( Σ, Q, S, F, δ )
Σ: a finite alphabet.
Q = {q0, q1, . . . , qk}: finite set of states.
S ∈ Q: initial state.
F ⊆ Q: final (or accepting) states.
δ: Q × Σ → Q: state-transition mapping.
Consider the example given earlier (with states renamed q0, q1, q2):
[Diagram: the recogniser for example (a): edges labelled 0,1 from q0 to q1 and looping at q1; edges labelled 2,3,4,5,...,8,9 from q0 and q1 into q2; q2 loops on 0,1,2,3,...,8,9.]
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; Q = {q0, q1, q2}
S = q0; F = {q1}
δ: Q × Σ → Q
q    σ                        → δ(q, σ)
q0   0, 1                     → q1
q0   2, 3, 4, 5, 6, 7, 8, 9   → q2
q1   0, 1                     → q1
q1   2, 3, 4, 5, 6, 7, 8, 9   → q2
q2   0, 1, . . . , 8, 9       → q2
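The quintuple above translates directly into a table-driven simulation; the following Python sketch (state names as strings, an implementation choice of this example) encodes δ as a dictionary and scans a word left to right:

```python
# The example DFA: accept a non-empty word iff every symbol is 0 or 1.
delta = {}
for q in ("q0", "q1"):
    for s in "0123456789":
        delta[(q, s)] = "q1" if s in "01" else "q2"
for s in "0123456789":
    delta[("q2", s)] = "q2"          # q2 is a 'dead' state

def delta_star(q, w):
    """Follow δ symbol by symbol: the state reached from q on word w."""
    for s in w:
        q = delta[(q, s)]
    return q

F = {"q1"}
assert delta_star("q0", "0110") in F       # a binary word: accepted
assert delta_star("q0", "0150") not in F   # contains a 5: rejected
```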
A Larger Example
[Diagram: DFA MX recognising a language LX, with states q0, . . . , q6 and edges labelled 0 and 1.]
Σ = {0, 1}; Q = {q0, q1, q2, q3, q4, q5, q6};
S = q0; F = {q3, q4, q5}.
δ is easily extracted from the diagram.
It is not obvious by inspection exactly what LX is in the larger example.
We shall use this example to illustrate the general concept of L(M) ⊆ Σ*:
'the language over Σ recognised by M'
Consider any DFA, M = (Σ, Q, S, F, δ). Notice the following:
For each state q ∈ Q, δ maps each σ ∈ Σ to exactly one 'next' state q′ ∈ Q, i.e.
δ is total and deterministic.
Thus for every w ∈ Σ*, the 'processing' of w by M can be described by the sequence of states visited as each symbol in w is 'read'.
Whether w is accepted (in L(M)) or rejected (not in L(M)) depends on whether or not the last state visited is an accepting one, i.e. in the set F.
Example using the DFA MX
w = w1w2w3w4w5w6w7w8 ∈ {0, 1}*.
If w = 00111100, which is in LX:
i   wi   Current State   Next State
0   -    -               q0
1   0    q0              q0
2   0    q0              q0
3   1    q0              q1
4   1    q1              q4
5   1    q4              q4
6   1    q4              q4
7   0    q4              q5
8   0    q5              q5 ∈ F
If w = 10100001, which is not in LX:
i   wi   Current State   Next State
0   -    -               q0
1   1    q0              q1
2   0    q1              q2
3   1    q2              q3
4   0    q3              q3
5   0    q3              q3
6   0    q3              q3
7   0    q3              q3
8   1    q3              q6 ∉ F
In order formally to capture the concept of 'sequence of states visited', we proceed by generalising the state-transition mapping, δ, from single symbols in Σ to words in Σ*.
For M = (Σ, Q, S, F, δ), the mapping
δ*: Q × Σ* → Q
is defined as:
δ*(q, w) = q                   if w = ε
δ*(q, w) = δ(δ*(q, u), σ)      if w = u ⋅ σ
Notice that this definition is recursive:
The case w = ε indicates the state q. The general case (w ≠ ε) indicates that
'the state reached by M from q on the word w = u⋅σ is the state reached by applying the transition function δ to the state reached by M on the word u and the symbol σ ∈ Σ'.
Example using MX
With w = 110, q = S = q0:
δ*(q, 110) = δ( δ*(q, 11), 0 )
           = δ( δ( δ*(q, 1), 1 ), 0 )
           = δ( δ( δ( δ*(q, ε), 1 ), 1 ), 0 )
           = δ( δ( δ( q0, 1 ), 1 ), 0 )
           = δ( δ( q1, 1 ), 0 )
           = δ( q4, 0 )
           = q5 ∈ F
δ* admits a formal definition of
'the language, L(M), recognised by the DFA M = (Σ, Q, S, F, δ)':
L(M) = { w ∈ Σ* : δ*(q0, w) ∈ F }
Notice that this definition is computationally based, i.e. it is not an 'ad hoc' description.
Thus for 'suitable' languages, L, over Σ, one can present a finite computational definition of all the words in L by describing any DFA, M = (Σ, Q, S, F, δ), for which
L(M) = L
[i.e. for all w ∈ Σ*, w ∈ L ⇔ w ∈ L(M)].
This, however, raises the question:
'What is meant by a "suitable" language?'
We (temporarily) defer discussion of this.
Non-deterministic Finite State Automata
It was noted above that δ: Q × Σ → Q was defined to be
total and deterministic,
i.e. every state/symbol pair has a defined transition to exactly one next state.
We now consider the effect of changing this to allow non-determinism in the state-transition function δ.
Two mechanisms are used to do this:
• more than one 'choice' of next state.
• 'null' transitions between states.
In the first: δ(q, σ) need not be unique.
In the second: the 'current' state may change without any input occurring.
['Simple' Example of a Non-deterministic FSA: states q0, q1, q2; q0 has 0-transitions to both q1 and q2; q1 has a 0-transition to q2 and a 1-transition to q0; q2 has 1-transitions to both q0 and q1.]
In the example above, the initial state, q0, has transitions to state q1 and state q2, both of which are labelled 0.
Similarly, q2 has transitions labelled 1 to states q0 and q1.
Notice, also, that q0 has no transition labelled 1.
Formally, a non-deterministic FSA (NDFA), M, is described by a quintuple,
M = ( Σ, Q, S, F, δ )
where Σ, Q, S, and F are as before, but δ is now a mapping,
δ: Q × Σ → ℘(Q)
℘(Q) denoting the powerset (set of all subsets) of Q.
For the example NDFA, δ is:
q    σ   → δ(q, σ)
q0   0   → {q1, q2}
q0   1   → ∅
q1   0   → {q2}
q1   1   → {q0}
q2   0   → ∅
q2   1   → {q0, q1}
Interpretation
The change from Q to ℘(Q) as the range of δ clearly has implications for the state-sequence function δ* and consequently for how the term
'language, L(M), recognised by the NDFA M'
is defined.
Before considering these we describe a 'physical' interpretation of how non-determinism should be viewed in this context.
Consider the transition δ(q0, 0) = {q1, q2} for the example machine.
We view this as modelling:
'If 0 is read when in state q0 then either one of states q1 or q2 could occur as the "next" state'.
It is important to remember that:
a) Exactly one of these states is chosen.
b) Which one is not predictable.
Thus if δ(q, σ) = R ⊆ Q, then in state q reading σ the next state will be some (unknown) state in R.
Non-determinism (in the sense used here) should not be thought of as a 'random process'.
Although 'random' or 'probabilistic' methods define a particular type of non-determinism, the concept of 'non-deterministic choice' that we wish to use precludes any possibility of modelling by a stochastic process.
[Probabilistic automata have been widely studied; however, a treatment of these is outwith the scope of this module.]
Languages Recognised by NDFAs
Recall that the language recognised by a DFA, M, was defined with respect to its associated 'state-sequence' function
δ*: Q × Σ* → Q
In a deterministic automaton, starting from the initial state, any word w ∈ Σ* can be thought of as traversing a unique sequence (or path) of states.
In a non-deterministic automaton, this 'sequence' may not be unique: there may be many possible sequences of states that are consistent with a single w.
Example
If w = 010, the following 'tree structure' describes the possible computations:
[Diagram: the computation tree of the example NDFA on 010, branching from q0 on each symbol read.]
5 'computation paths'; 3 accept (ending in q2); 2 reject (ending in q1).
Question: Is 010 accepted or not???
The definition of w ∈ Σ* being accepted by the NDFA M is, informally, stated as:
'w is accepted by the NDFA, M, if there is at least one computation path of M on w that ends in a state q ∈ F'
So in the example, 010 is accepted.
The significance of the computation being ended should be noted, i.e. all symbols in w must be read.
If we consider w = 0100 in the example, then since q2 has no '0-transition' the processing of w would be 'stuck' at this point. We cannot conclude that 0100 is accepted at this point (even though q2 cannot be left).
[0100 is accepted by continuing with the 0-transition from q1.]
We can now define an analogue of δ* for non-deterministic automata.
For the NDFA M = (Σ, Q, S, F, δ) the Reachability Function,
ρ*: Q × Σ* → ℘(Q)
is:
ρ*(q, w) = {q}                              if w = ε
ρ*(q, w) = ∪_{q′ ∈ ρ*(q, u)} δ(q′, σ)       if w = u ⋅ σ
And we now have:
The language, L(M), recognised by the NDFA M = (Σ, Q, S, F, δ) is
L(M) = { w ∈ Σ* : ρ*(q0, w) ∩ F ≠ ∅ }
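The reachability function ρ* can be computed exactly as defined, by carrying a set of states through the word. A Python sketch for the example NDFA (state names as strings, δ as a dictionary of sets — representation choices of this example):

```python
# The example NDFA's transition function δ: (state, symbol) -> set of states.
delta = {
    ("q0", "0"): {"q1", "q2"}, ("q0", "1"): set(),
    ("q1", "0"): {"q2"},       ("q1", "1"): {"q0"},
    ("q2", "0"): set(),        ("q2", "1"): {"q0", "q1"},
}

def rho_star(q, w):
    """ρ*(q, w): the set of states reachable from q by reading w."""
    states = {q}
    for s in w:
        # union of δ(p, s) over all currently reachable states p
        states = set().union(*(delta[(p, s)] for p in states))
    return states

F = {"q2"}
assert rho_star("q0", "010") == {"q1", "q2"}
assert rho_star("q0", "010") & F != set()   # 010 is accepted
```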
Example
w      ∪_{q′ ∈ ρ*(q0, u)} δ(q′, σ)   ρ*(q0, w)
ε      -                             {q0}
ε⋅0    δ(q0, 0)                      {q1, q2}
0⋅1    δ(q1, 1) ∪ δ(q2, 1)           {q0, q1}
01⋅0   δ(q0, 0) ∪ δ(q1, 0)           {q1, q2}
ρ*(q0, 010) = {q1, q2};  F = {q2}
F ∩ ρ*(q0, 010) = {q2} ≠ ∅.
∴ 010 ∈ L: accepted by the NDFA example.
For 'suitable' languages we now have two methods for describing the set of words L ⊆ Σ*:
a) By a DFA, M, for which L(M) = L.
b) By an NDFA, M′, for which L(M′) = L.
Of course, since the transition function δ with deterministic machines is a 'restricted' form of that allowed in non-deterministic automata, it follows that:
'Any language that is "suitable" (in sense (a)) is also "suitable" (in sense (b))'
More formally,
{ L ⊆ Σ* : ∃ DFA, M, with L(M) = L } ⊆ { L ⊆ Σ* : ∃ NDFA, M′, with L(M′) = L }
It turns out, however, that we do not 'gain anything' by way of languages recognisable using NDFAs but not recognisable using DFAs.
Theorem 1:
For any NDFA, M = (Σ, Q, S, F, δ), there is a deterministic FA, M′ = (Σ, Q′, S′, F′, δ′), such that
L(M) = L(M′)
We first illustrate the idea behind the proof.
Consider the 'general' computation tree for the NDFA exemplar:
[Diagram: the distinct subsets of Q reached by the example NDFA, with 0- and 1-transitions between them.]
Tree vertices: labelled with distinct subsets of Q arising in the reachability function ρ*.
Graph edges: transitions between these under 0 or 1.
This automaton is deterministic and equivalent to the non-deterministic example (it accepts exactly the same language).
Proof of Theorem 1
Let
Mnd = (Σ, Qnd, Snd, Fnd, δnd)
be some NDFA recognising L(Mnd) ⊆ Σ*.
We construct a DFA,
Md = (Σ, Qd, Sd, Fd, δd)
such that L(Md) = L(Mnd).
The idea is that each single state in Qd will correspond to some subset of states from Qnd.
More precisely, R ⊆ Qnd will map to some state qR ∈ Qd if and only if
∃ w ∈ Σ* : ρ*(Snd, w) = R.
Once all of the required states in Qd are found, the state transition function
δd: Qd × Σ → Qd
is a (total) function defined so that ∀ Ri ⊆ Qnd for which there is a corresponding state qRi ∈ Qd:
δd(qRi, σ) = qRj ⇔ ∪_{q ∈ Ri} δnd(q, σ) = Rj ⊆ Qnd
Algorithmic Construction
We use a stack, ∆, to keep track of the subsets of Qnd.
Qd := {q_Snd};           // Starting point
Sd := q_Snd;
push {Snd} onto ∆;
while ∆ is not empty
    R := Top(∆); pop(∆); // Remove top of stack.
    for each σ ∈ Σ
        V := ∪_{q ∈ R} δnd(q, σ);   // (a)
        if qV ∉ Qd                   // 'new' subset of Qnd
            push V onto ∆;
            Qd := Qd ∪ {qV};
            if V ∩ Fnd ≠ ∅
                Fd := Fd ∪ {qV};    // (b)
        δd(qR, σ) := qV;            // (c)
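The stack-based construction can be sketched as a worklist algorithm in Python, with each DFA state represented as a frozenset of NDFA states (a representation choice of this sketch; the function and variable names below are illustrative, not from the notes):

```python
# Subset construction: build a DFA equivalent to a given NDFA.
def determinise(sigma, delta_nd, start, final_nd):
    start_d = frozenset({start})
    states, stack = {start_d}, [start_d]      # Qd and the stack ∆
    delta_d, final_d = {}, set()
    if start_d & final_nd:
        final_d.add(start_d)
    while stack:
        R = stack.pop()
        for sym in sigma:
            # (a): V = union of δ_nd(q, sym) over q in R
            V = frozenset().union(*(delta_nd.get((q, sym), set()) for q in R))
            if V not in states:               # a 'new' subset of Q_nd
                states.add(V)
                stack.append(V)
                if V & final_nd:              # (b)
                    final_d.add(V)
            delta_d[(R, sym)] = V             # (c)
    return states, delta_d, start_d, final_d

# The example NDFA (missing pairs map to the empty set).
delta_nd = {
    ("q0", "0"): {"q1", "q2"},
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q0"},
    ("q2", "1"): {"q0", "q1"},
}
states, delta_d, start_d, final_d = determinise("01", delta_nd, "q0", {"q2"})
assert len(states) == 5   # {q0}, {q1,q2}, {q0,q1}, {q2}, ∅
assert delta_d[(frozenset({"q0"}), "0")] == frozenset({"q1", "q2"})
```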
Let
Md = ( Σ, Qd, Sd, Fd, δd )
be the DFA constructed from the NDFA
Mnd = (Σ, Qnd, Snd, Fnd, δnd).
Each state qR ∈ Qd is formed from some set of states R ⊆ Qnd by the algorithm.
L(Md) = L(Mnd) will follow by proving: ∀ w ∈ Σ*,
δ*d(qR, w) = qV ⇔ V = ∪_{q ∈ R} ρ*nd(q, w)
We use induction on |w| ≥ 0.
Inductive Base: |w| = 0, i.e. w = ε.
δ*d(qR, ε) = qR;
∪_{q ∈ R} ρ*nd(q, ε) = ∪_{q ∈ R} {q} = R.
∴ Inductive base holds.
Inductive Step: (|w| ≤ k) ⇒ (|w| = k + 1)
Let w = u⋅σ, where |u| ≤ k and σ ∈ Σ.
δ*d(qR, u⋅σ) = δd( δ*d(qR, u), σ ) = δd( qV, σ ) = qY
The sets V, Y are subsets of Qnd. Which?
V = ∪_{q ∈ R} ρ*nd(q, u)   [Ind. Hyp.]
Y = ∪_{q ∈ V} δnd(q, σ)    [(a)+(c)]
And,
∪_{q ∈ R} ρ*nd(q, u⋅σ) = ∪_{q ∈ R} ∪_{q′ ∈ ρ*nd(q, u)} δnd(q′, σ)
                       = ∪_{q′ ∈ V} δnd(q′, σ) = Y
∴ δ*d(qR, u⋅σ) = qY ⇔ ∪_{q ∈ R} ρ*nd(q, u⋅σ) = Y
Completing the induction.
That L(Mnd) = L(Md) now follows by noting that
qR ∈ Fd ⇔ R ∩ Fnd ≠ ∅   [(b)]
So,
w ∈ L(Md) ⇔ δ*d(q_Snd, w) ∈ Fd
          ⇔ ρ*nd(Snd, w) ∩ Fnd ≠ ∅
          ⇔ w ∈ L(Mnd)
Example
Applying the process to our example NDFA leads to the DFA with:
Qd = { q0, q1,2, q0,1, q2, q∅ };  Sd = q0;
Fd = { q1,2, q2 }.
δd: Qd × Σ → Qd
q      σ   → δd(q, σ)
q0     0   → q1,2
q0     1   → q∅
q1,2   0   → q2
q1,2   1   → q0,1
q0,1   0   → q1,2
q0,1   1   → q0
q2     0   → q∅
q2     1   → q0,1
q∅     0   → q∅
q∅     1   → q∅
Note: The empty subset must be included in the set of reachable subsets if it arises (the state q∅ above).
ε-Transition Automata
We have seen that NDFAs using a 'multiple choices' sense of non-determinism do not extend the range of 'suitable' languages for DFAs.
What about the other mechanism: 'null' transitions?
Formally, an ε-NDFA, M, is a quintuple,
M = ( Σ, Q, q0, qF, δ )
where Σ and Q are as before;
q0 ∈ Q: single initial state, having only outgoing transitions, all of which are labelled ε.
qF ∈ Q: single accepting state, having only incoming transitions, all of which are labelled ε.
δ: Q × (Σ ∪ {ε}) → ℘(Q): the state-transition function as for an NDFA but augmented to allow ε-moves between some states.
Example ε-Transition Automaton
[Diagram: an ε-NDFA with states q0, q1, q2, q3, q4, q5, qF; ε-transitions out of q0 and into qF, further ε-moves among the internal states, and transitions labelled 0 and 1 among q1, . . . , q5.]
Interpretation
In an ε-NDFA, M, suppose δ has a transition of the form
δ(qi, ε) = R ⊆ Q
If M can reach state qi, then M can move to any q ∈ R without any further input occurring. Whether the ε-move is made or not is decided non-deterministically.
The insistence on unique accepting and starting states such that:
a) q0 is left via an ε-move and never re-entered;
b) qF is entered via an ε-move and never left
is purely for technical convenience.
Trivial exercise: Show that any (normal) NDFA M can be transformed to an 'equivalent' ε-NDFA with unique start and accepting states meeting these requirements.
Languages Accepted by ε-NDFAs
We can define the concept of w ∈ Σ* being accepted by an ε-NDFA, M, by extending the definition of ρ* to an ε-NDFA.
Unfortunately, this involves some additional complications due to the fact that we must take account of states reached by ε-moves at any point during the processing of w.
Let M = ( Σ, Q, q0, qF, δ ) be an ε-NDFA.
First we define k-reachability (k ≥ 0),
ρ(k): Q × Σ* → ℘(Q)
to capture
'the set of states reachable from q on w after exactly k moves'.
ρ(0)(q, w) = {q} if w = ε;  ∅ if w ≠ ε
i.e. 'in 0 moves state q cannot be left, and only the empty word can be "read"'.
For k > 0, ρ(k)(q, w) is:
∅                                                                        if |w| > k
∪_{q′ ∈ ρ(k−1)(q, ε)} δ(q′, ε)                                           if w = ε
[ ∪_{q′ ∈ ρ(k−1)(q, u)} δ(q′, σ) ] ∪ [ ∪_{q′ ∈ ρ(k−1)(q, w)} δ(q′, ε) ]  if w = u⋅σ
Finally,
ρ*(q, w) = ∪_{k=0}^∞ ρ(k)(q, w)
w is accepted by the ε-NDFA, M, if qF ∈ ρ*(q0, w).
So that,
L(M) = { w ∈ Σ* : qF ∈ ρ*(q0, w) }
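In practice ρ* for an ε-NDFA is usually computed via the ε-closure of a set of states (a fixpoint) rather than the staged ρ(k) functions. The sketch below uses a tiny hypothetical ε-NDFA of its own (not the one in the diagram, whose full transition table is not reproduced here):

```python
# ε-closure and acceptance for a small hypothetical ε-NDFA:
# q0 -ε-> q1 -0-> q2 -ε-> qF, so the machine accepts exactly "0".
delta = {
    ("q0", "eps"): {"q1"},
    ("q1", "0"): {"q2"},
    ("q2", "eps"): {"qF"},
}

def eps_closure(states):
    """All states reachable from `states` by zero or more ε-moves."""
    closure = set(states)
    while True:
        extra = set().union(*(delta.get((q, "eps"), set()) for q in closure)) - closure
        if not extra:
            return closure
        closure |= extra

def accepts(word, start="q0", final="qF"):
    states = eps_closure({start})
    for sym in word:
        step = set().union(*(delta.get((q, sym), set()) for q in states))
        states = eps_closure(step)      # allow ε-moves after each symbol
    return final in states

assert accepts("0")
assert not accepts("1")
```

Taking the ε-closure before and after each symbol plays the same role as the extra ε-move clauses in the definition of ρ(k).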
Example
qF ∈ ρ*(q0, 0):
k   ρ(k−1)(q0, ε)   ρ(k−1)(q0, 0)   ρ(k)(q0, 0)
0   -               -               ∅
1   {q0}            ∅               ∅
2   {q1}            ∅               ∅
3   {q2}            ∅               {q2}
4   {q5}            {q2}            {q5}
5   {q1, qF}        {q5}            {q1, qF}
qF ∉ ρ*(q0, 1):
ρ*(q0, 1) = {q3}
∴ 0 ∈ L(M) and 1 ∉ L(M).
ε-transitions allow automata to be 'glued together' to give an elegant method of 'combining' languages.
e.g. if MR and MT recognise R, T ⊆ Σ*:
[Diagrams: (i) a new start state q0 with ε-moves into copies of MR and MT in parallel; (ii) MR and MT joined in series by ε-moves from a new start state q0 through MR into MT.]
These recognise R ∪ T and R⋅T respectively.
From these, and similar examples, it might appear that
{ L ⊆ Σ* : ∃ ε-NDFA, M, s.t. L(M) = L }
properly contains
{ L ⊆ Σ* : ∃ NDFA, M, s.t. L(M) = L }
Or, in informal terms,
'there are "more" "suitable" languages for ε-NDFAs than there are for "ordinary" NDFAs'.
In fact this is not the case, as we shall show (constructively) in
Theorem 2:
{ L ⊆ Σ* : ∃ ε-NDFA, M, s.t. L(M) = L } = { L ⊆ Σ* : ∃ NDFA, M, s.t. L(M) = L }
Proof of Theorem 2
Let Mε = (Σ, Qε, q0, qF, δε) be an ε-NDFA.
An equivalent NDFA without ε-moves is built in 2 stages:
a) Forming an ε-NDFA without ε-loops.
b) Forming an equivalent NDFA to this.
Stage 1: Removal of ε-loops
R = {c1, c2, . . . , ct} is an ε-loop if
ci ∈ δ(ci−1, ε) ∀ 2 ≤ i ≤ t, and
c1 ∈ δ(ct, ε)
[e.g. {q1, q2, q5} in the example].
Note that:
a) q0 and qF cannot occur in any ε-loop. (Exercise: Why?)
b) Any ε-move such that q ∈ δ(q, ε) is redundant.
∴ ε-loops have at least 2 distinct states, and do not contain the start or final state.
If R = {c1, . . . , ct} is an ε-loop in Mε, form a new ε-NDFA M′ = (Σ, Q′, q0, qF, δ′) as follows:
Q′ = (Q − R) ∪ {qR}
i.e. remove all the states in R from Q, adding a new state qR to 'represent' these.
For each transition such that qk ∈ δ(qi, α) of Mε (where α ∈ Σ ∪ {ε}), form transitions in δ′ using the following:
qk ∈ δ′(qi, α)   if qi ∉ R and qk ∉ R
qk ∈ δ′(qR, α)   if qi ∈ R and qk ∉ R
qR ∈ δ′(qi, α)   if qi ∉ R and qk ∈ R
qR ∈ δ′(qR, α)   if qi ∈ R, qk ∈ R and α ∈ Σ
The process is continued until no ε-loops remain.
Example
Removing the ε-loop {q1, q2, q5} from the example produces:
[Diagram: the ε-NDFA with q1, q2, q5 merged into a single state q125, retaining the transitions labelled 0 and 1 and the remaining ε-moves among q0, q125, q3, q4, qF.]
Stage 2: Removal of ε-moves
On completion of Stage 1, an ε-NDFA, M, without ε-loops has been built.
Let Mε = (Σ, Qε, q0, qF, δε) be the ε-loop free ε-NDFA.
An equivalent NDFA (without ε-moves), Mnd = (Σ, Qnd, q0, Fnd, δnd), is built by:
Qnd := Qε − {qF}
For each q ∈ Qnd ∩ Qε, σ ∈ Σ:
δnd(q, σ) = ∪_{q′ ∈ ρ*ε(q, ε)} δε(q′, σ)
Finally,
Fnd = { q ∈ Qnd ∩ Qε : qF ∈ ρ*ε(q, ε) }
Example
The ε-move free NDFA resulting from the example automaton is:
[Diagram: an NDFA on states q0, q125, q3, q4, with transitions labelled 0 and 1.]
Fnd = { q0, q125, q4 }.
Correctness Proof
Stage 1:
Let M1 = (Σ, Q1, q0, qF, δ1) have an ε-loop through R ⊆ Q1.
Let M2 = (Σ, Q2, q0, qF, δ2) result by removing this ε-loop.
Suppose w ∈ L(M1). Let k be the least value for which
qF ∈ ρ(k)1(q0, w).
There is a sequence of states and transitions in M1,
q0 →α1→ s1 →α2→ s2 →α3→ . . . →αk−1→ sk−1 →αk→ qF     (w1)
such that α1⋅α2⋅ . . . ⋅αk = w.
Since k is minimal, this sequence cannot contain an ε-loop.
Consider the sequence of states in M2 defined by:
tj = qR if sj ∈ R; sj otherwise     (w2)
The construction ensures that, with the exception of moves sj →ε→ sj+1 where sj, sj+1 ∈ R, each move si →αi→ si+1 has a matching move in M2. The missing moves are of the form qR →ε→ qR (i.e. an ε-move from qR to itself) and these can be replaced by the single state qR.
The proof that
qF ∈ ρ*2(q0, w) ⇒ qF ∈ ρ*1(q0, w)
is similar.
Stage 2: Similar to Stage 1. (Exercise.)
COMP209
Automata and Formal Languages
Section 3
Regular Languages
and
Finite Automata
We have now seen three different forms of finite automaton:
a) Deterministic (DFA)
b) Non-deterministic (NDFA)
c) ε-transition (ε-NDFA)
These are 'equally powerful' in the sense that the set
of 'suitable' languages for DFAs
is exactly the same as the set
of 'suitable' languages for NDFAs,
which is exactly the same as the set
of 'suitable' languages for ε-NDFAs.
We now return to the question:
What is meant by a 'suitable' language?
Formal Grammars and Finite Automata
Recall that a formal grammar was introduced as

G = ( V, T, P, S )

V: a set of variable (or 'non-terminal') symbols.
T: a set of terminal symbols (V ∩ T = ∅).
S: the start symbol (S ∈ V).
P: a set of production rules, of the form

li → ri

both li and ri being words in (V ∪ T)*, li containing at least one symbol in V.

It was also claimed that

'Different grammar types' match 'different machine capabilities'.
Finite State Automata formed the 'simplest' class of machine model.

Intuition would suggest a 'match' with the 'simplest' structure of grammar, i.e. that which imposes the greatest restrictions on the form of grammar productions.

What form would this be?

If pi : Li → Ri is a production:

Vi → σ   or   Vi → σ·Vj   or   Vi → ε

Vi, Vj variable symbols in V; σ a terminal symbol in T.
Such grammars are called
Right Linear Grammars (RLG):
Thus, if G = ( V, T, P, S ) is a RLG and w = x Vi y ∈ (V ∪ T)*, then only words of the form

x σ y      if Vi → σ ∈ P(G)
x σ Vj y   if Vi → σ·Vj ∈ P(G)

can be generated from w by G.

Question: Is this intuition justified? i.e. is it the case that, ∀ L ⊆ Σ*:

∃ RLG G: L(G) = L ⇔ ∃ DFA M: L(M) = L
Answer: Yes.
Theorem 3:
{ L ⊆ Σ* : ∃ RLG G with L(G) = L } = { L ⊆ Σ* : ∃ DFA M with L(M) = L }
Proof of Theorem 3
Given a DFA M = ( Σ, Q, S, F, δ ), form the RLG, GM = ( VM, Σ, PM, SM ) by:

VM = { Vi : qi ∈ Q } ;  SM = V0

PM is formed by the rules:

   M has               P(GM) has
1  qi ∈ F              Vi → ε
2  δ( qi, σ ) ∈ F      Vi → σ
3  δ( qi, σ ) = qj     Vi → σ·Vj
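The three rules in the table can be sketched directly in code (a sketch under my own hypothetical encoding: states are integers, productions are pairs of strings; not a definitive implementation):

```python
# Sketch of the DFA -> right-linear-grammar table above (encoding is mine).
def dfa_to_rlg(states, sigma, delta, finals):
    """Return productions as (lhs, rhs) pairs; "" encodes the empty word."""
    prods = []
    for i in states:
        if i in finals:
            prods.append((f"V{i}", ""))             # rule 1: V_i -> eps
        for a in sigma:
            j = delta[(i, a)]
            prods.append((f"V{i}", f"{a}V{j}"))      # rule 3: V_i -> sigma.V_j
            if j in finals:
                prods.append((f"V{i}", f"{a}"))      # rule 2: V_i -> sigma
    return prods
```

Running it on the two-state DFA for "odd number of 1s" yields, among others, V0 → 1V1, V0 → 1 and V1 → ε, exactly the rules the table prescribes.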
Similarly,

Given the RLG, G = ( V, Σ, P, S ), form the NDFA, MG = ( Σ, QG, qG0, FG, δG ) with

QG = { qi : Vi ∈ V } ∪ { qF }

qG0 = qS(G) ;  qF ∈ FG

and δG given by:

   G has           δG has
1  Vi → ε          qi ∈ FG
2  Vi → σ          qF ∈ δG( qi, σ )
3  Vi → σ·Vj       qj ∈ δG( qi, σ )

We claim that:

∀ w ∈ Σ*:

δ*( q0, w ) ∈ F ⇔ SM ⇒*GM w

S ⇒*G w ⇔ ρ*MG( qG0, w ) ∩ FG ≠ ∅
Only the first of these will be proved. The second is similar.

Suppose w = σ1σ2…σk has δ*( q0, w ) ∈ F.

Let < s1, s2, …, sk > be the state-sequence such that

δ( q0, σ1 ) = s1
δ( si, σi+1 ) = si+1   1 ≤ i < k
sk ∈ F

GM has productions with left-hand sides < L1, L2, …, Lk > such that

S = V0 → σ1 L1
Li → σi+1 Li+1   1 ≤ i < k
Lk−1 → σk  or  Lk → ε

∴ SM ⇒*GM σ1…σk = w

In the same way, from the derivation sequence proving w ∈ L(GM), a sequence of states in M giving δ*( q0, w ) ∈ F is formed.
Regular Sets and Regular Expressions
The correspondence between DFA and RLGs is given by (rather obvious) direct translations

< Q, δ > ↔ < V, P >

It is debatable, however, to what extent either of these mechanisms assists with the following questions:

a) Given a DFA, M, give a 'succinct' description of L(M).
b) Given a RLG, G, give a 'succinct' description of L(G).
c) Given L ⊆ Σ*, ∃? DFA, M: L(M) = L.
d) Given L ⊆ Σ*, ∃? RLG, G: L(G) = L.
e) Given L ⊆ Σ*, construct a DFA, M, such that L(M) = L.
f) Given L ⊆ Σ*, construct a RLG, G, such that L(G) = L.

Note the difference between (c, d) (existence questions) and (e, f) (synthesis questions).
One possible approach might be to try and find a set of operations, Φ, on sets of words over an alphabet Σ which allow new sets of words to be formed by applying operations in Φ to 'previously built' sets.

Thus if one takes the individual symbols in Σ as the 'initial' terms, then, depending on our choice of Φ, we can describe a language, L, by giving the sequence of operations in Φ which must be applied in composition to generate L.

Now, for any particular choice of operations, Φ, there will be some class of languages that can be described using Φ.
Question 1:
Is it possible to choose operations, Φ, so that this process defines exactly the class of languages recognisable by DFA?
Question 2:
If it is possible, what choice of operations achieves this?
The answer to the first question is that it is possible to define such a set.

In this section we describe how this is done and prove that the class of resulting languages is exactly that captured by DFA.
Regular Sets
A regular set (or regular language) is a set L ⊆ Σ* that can be formed by sufficiently many applications of the following operations:

a) ∅ (the empty set) is a regular set.
b) { ε } (the set containing the empty word) is a regular set.
c) ∀ σ ∈ Σ: { σ } is a regular set.
d) If V, W ⊆ Σ* are regular sets, then so are all of the sets:

V ∪ W ;  V·W ;  V*

Examples

( {0} ∪ {1} )*

( {0} ∪ {1} )·( ( {0} ∪ {1} )·( {0} ∪ {1} ) )*

{0}*·{1}·( {0}·{1}·{0}* ∪ {1} )·{1}*·{0}*

[Note: Every one of these examples has been seen earlier.]
Regular Expressions
The formalism used to describe regular sets becomes rather cumbersome even when describing 'simple' set structures.

A more convenient (and clearly equivalent) approach is that of regular expressions.

A regular expression over Σ is recursively defined as follows:

a) ∅ is a regular expression.
b) ε is a regular expression.
c) ∀ σ ∈ Σ: σ is a regular expression.
d) If V, W are regular expressions over Σ, then so are all of

( V + W ) ;  ( V·W ) ;  ( V* )

[Note: Where no ambiguity arises, brackets are omitted.]
Examples
( 0 + 1 )*

( 0 + 1 )·( ( 0 + 1 )·( 0 + 1 ) )*

0*·1·( 0·1·0* + 1 )·1*·0*
Interpreting + as ∪ gives an obvious mapping between regular sets and regular expressions.

Thus a Regular Expression over Σ is a description of a particular Regular Set over Σ. We denote by L(R) the regular set described by the regular expression, R.

Formally, for R a regular expression the regular set L(R) is:

L( R ) =
  ∅             if R = ∅
  { ε }         if R = ε
  { σ }         if R = σ
  L(S) ∪ L(T)   if R = S + T
  L(S)·L(T)     if R = S·T
  L(S)*         if R = S*
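The recursive definition of L(R) can be turned into a small evaluator if every set is truncated at a length bound n, so that even starred expressions stay finite (the tuple encoding of expressions and the bound are my own choices; a sketch, not a definitive implementation):

```python
# Sketch: the recursive definition of L(R), restricted to words of length <= n.
def lang(r, n):
    """r is ('empty',), ('eps',), ('sym', a), ('+', s, t), ('.', s, t), ('*', s)."""
    op = r[0]
    if op == 'empty':
        return set()
    if op == 'eps':
        return {''}
    if op == 'sym':
        return {r[1]} if len(r[1]) <= n else set()
    if op == '+':
        return lang(r[1], n) | lang(r[2], n)
    if op == '.':
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if op == '*':
        result, frontier = {''}, {''}
        base = lang(r[1], n) - {''}
        while frontier:                 # iterate concatenation to a fixpoint
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(op)
```

For example, (0 + 1)* evaluated with n = 2 yields exactly the seven binary words of length at most 2.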
Note that:
a) There may be many different expressions for a single set.

b) There is exactly one set corresponding to a single expression.

c) A regular expression can be seen both as a description of the set of words in a (regular) language and as an operational process for generating these.
Properties of +, ⋅, *
R, S, T denote arbitrary regular expressions.
R + S = S + R

( R + S ) + T = R + ( S + T )

( R·S )·T = R·( S·T )

R·( S + T ) = R·S + R·T

The important properties of * are:

( R* )* = R*

R·R* = R*·R

R·R* + ε = R*

R·( S·R )* = ( R·S )*·R

( R + S )* = ( R* + S* )* = ( R*·S* )* = R*·( S·R* )*

These are easily proved using the basic definitions of ·, ∪ (i.e. +) and *.

Exercise: Do this.
Equivalence of Regular Expressions and Finite Automata

Theorem 4:
L ⊆ Σ* is a regular set if and only if there is a DFA, M, for which L( M ) = L.

First we outline the proof structure:

Recall that

a) Any regular set is described by a regular expression.
b) DFA, NDFA, and ε-NDFA describe exactly the same class of languages.

The proof is carried out in two stages:

I) For any regular expression, R, construct some FA, M, for which L( M ) = L( R ).

II) For any DFA, M, construct some regular expression, R, for which L( R ) = L( M ).
Regular Expressions → Finite Automata: Base Cases

[Diagram: DFA constructions for the base cases R = ∅, R = ε, and R = σ.]
Regular Expressions → Finite Automata: Composite Cases

R and S are regular expressions.

[Diagram: ε-NDFA constructions for R + S, R·S, and R*, linking the automata for R and S with ε-transitions.]
Correctness of Construction
Formally, this is by induction on the total number of occurrences of +, ·, *.

Thus let T be any regular expression containing k ≥ 0 operations.

Inductive Base: k = 0

T ∈ { ∅, ε, σ }. The correctness of these constructions is obvious.

Inductive Step: ( ≤ k − 1 ) ⇒ k

Assume correctness for expressions with ≤ k − 1 operations. Let T be a regular expression having k operations.

T ∈ { R + S, R·S, R* }. Inductively we may construct (correct) DFA for R and S since these expressions use fewer than k operations.
If T = R + S, we add ε-transitions from the accepting states of R, S to a new (single) accepting state, and ε-transitions from a new start state to the initial states of R and S.

If T = R·S, ε-transitions connect accepting states of R to the initial state of S (these accepting states in R being changed to 'ordinary', i.e. non-accepting, states).

Finally, if T = R*, an ε-loop from accepting states in R back to its initial state is arranged.

Exercise: Complete the formal details of the inductive step, describing the exact form of the ε-NDFA

MT = ( QT, Σ, qT0, FT, δT )

in terms of the DFA

MR = ( QR, Σ, qR0, FR, δR )
MS = ( QS, Σ, qS0, FS, δS )

for each of the cases T = R + S ; T = R·S ; T = R*.
Finite Automata → Regular Expressions
This is a little more complicated.
The key idea is to form a system of 'simultaneous' equations, E(M), from M = ( Q, Σ, q0, F, δ ):

a) There are exactly |Q| equations in this system, one for each state in Q.
b) Each equation, Ei, is a regular expression over Σ and { Ej : 1 ≤ j ≤ |Q| }.
c) Ei describes the set of words that are accepted starting from the state qi.
d) The aim is to 'reduce' this system so that a 'closed form' solution for E0 is found, i.e. a solution

E0 = R

where R is a regular expression involving only the operations +, ·, * and ∅, ε, Σ.

So, for all i, R contains no occurrence of Ei.
Construction of the Equational System E( M )

The closed form solution for Ei (the equation corresponding to the state qi ∈ Q) should be a regular expression, Ri, such that:

L( Ri ) = { w ∈ Σ* : δ*( qi, w ) ∈ F }

i.e. the set of words that would be accepted if qi were the initial state.

But this set, L( Ri ), is just

∪σ∈Σ σ·{ u ∈ Σ* : δ*( δ( qi, σ ), u ) ∈ F }    (*)

(together with ε when qi ∈ F).

Let qi,σ denote the state δ( qi, σ ).
Question

What is the set in (*) in terms of { Ei,σ : σ ∈ Σ }?

Answer

Σσ∈Σ σ·Ei,σ + Λi

where

Λi = ε if qi ∈ F ;  ∅ if qi ∉ F

In other words, for each qi in Q, Ei is

Ei = Σσ∈Σ σ·Ei,σ + ε   if qi ∈ F
Ei = Σσ∈Σ σ·Ei,σ + ∅   if qi ∉ F

(**)
Example
[Diagram: a DFA with states q0, …, q6 over the alphabet {0, 1}; its transitions are read off in the equations below.]

E0 = 0·E0 + 1·E1 + ∅
E1 = 0·E2 + 1·E4 + ∅
E2 = 0·E6 + 1·E3 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅
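Reading the equations (**) off a DFA's transition table can be sketched as follows (the string formatting, with '∅' and 'ε' literals, is my own; a sketch, not a definitive implementation):

```python
# Sketch: forming the equational system E(M) from a dict-encoded DFA.
def equations(states, sigma, delta, finals):
    """One equation per state: E_i = sum of a.E_{delta(i,a)} plus Lambda_i."""
    eqs = {}
    for i in states:
        terms = [f"{a}.E{delta[(i, a)]}" for a in sigma]
        terms.append("ε" if i in finals else "∅")   # the Lambda_i term
        eqs[i] = " + ".join(terms)
    return eqs
```

On the two-state "odd number of 1s" DFA this produces E0 = 0.E0 + 1.E1 + ∅ and E1 = 0.E1 + 1.E0 + ε, matching the pattern of the seven equations above.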
Reduction of Equational System
In the example, there are seven interdependent relationships that in total define the language accepted by the automaton illustrated.

In general, there will be |Q| such relationships describing L( M ).

The problem now is how to use these relationships to construct a solution

E0 = R

with R a regular expression over Σ for which

L( R ) = L( M ) = { w ∈ Σ* : δ*( q0, w ) ∈ F }
First notice that a typical relationship

Ei = Σσ∈Σ σ·Ei,σ + Λi

satisfies exactly one of the following:

A) Ei does not occur on the right-hand side.
B) Ei does occur on the right-hand side.

Case A arises if ∀ σ ∈ Σ, δ( qi, σ ) ≠ qi (i.e. no symbol in Σ yields a transition from qi to itself).

Case B arises if ∃ σ ∈ Σ, δ( qi, σ ) = qi (i.e. there is a transition from qi to itself, labelled σ ∈ Σ).

We subsequently refer to Case A as a non-iterative and Case B as an iterative relationship for Ei.
Examples
E1 = 0·E2 + 1·E4 + ∅
E2 = 0·E6 + 1·E3 + ∅

(Non-iterative)

E0 = 0·E0 + 1·E1 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅

(Iterative)
Since the operation of concatenation (·) distributes over the operation of union (+), i.e.

R·( S + T ) = R·S + R·T

we may substitute the right-hand side of the relationship governing any non-iterative Ei for any other occurrence of Ei.

Of course, this may lead to relationships, Ei, which had been non-iterative, becoming iterative.

For example, if

E1 = 0·E2 + 1·E3 + ∅
E2 = 0·E1 + 1·E3 + ∅

both of which are non-iterative, substituting the RHS of E1 in E2 gives:

E2 = 0·( 0·E2 + 1·E3 + ∅ ) + 1·E3 + ∅
   = 00·E2 + ( 01 + 1 )·E3 + ∅
Substitution Rule
Given the system of k relationships < E0, E1, …, Ek > in which Ej is non-iterative:

Form the j-substituted system of ( k − 1 ) relationships

< E0, E1, …, Ej−1, Ej+1, …, Ek >

in which the RHS of the relationship Ej is substituted for every occurrence of Ej in the system.

We say a system is fully substituted under this rule if every relationship within it is iterative, i.e. no more applications are possible (within the current system).
Example
The 1-substituted system for the example:

E0 = 0·E0 + 1·( 0·E2 + 1·E4 + ∅ ) + ∅
E2 = 0·E6 + 1·E3 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅

The 2-substituted system from this is:

E0 = 0·E0 + 1·( 0·( 0·E6 + 1·E3 + ∅ ) + 1·E4 + ∅ ) + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅
This system, which is fully substituted, can be written as:

E0 = 0·E0 + 100·E6 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = ( 0 + 1 )·E6 + ∅

Every relationship of which is iterative.
Reduction of Iterative Relationships: Arden's Rule

Except in rather trivial cases, the system E(M) associated with the DFA, M, will not give the desired solution for E0 after the first set of applications of the reduction rule described above.

The resulting system < E0, …, Er > will contain relationships of the form

Ei = ( Wi )·Ei + Ui    (§)

where Wi is some regular expression over Σ, and Ui is a regular expression over

Σ ∪ { Ej : 0 ≤ j ≤ r and j ≠ i }

How can this expression be rewritten in a non-iterative form, i.e. so that Ei does not occur on the RHS?
In order to get some insight into the process, consider the diagrammatic representation of the identity (§):

[Diagram: state qi with a self-loop labelled Wi and an exit labelled Ui.]

The language recognised from qi is: concatenations of words in L( Wi ), concatenated with one word in L( Ui ), i.e. ( Wi )*·Ui.

This suggests the iterative relationship

Ei = ( Wi )·Ei + Ui

can be replaced by the non-iterative form:

Ei = ( Wi )*·Ui
Pictures aren't Proofs

The discussion above motivates the statement of the second reduction rule, which is used to reduce iterative relationships.

It did not constitute a formal proof.

The result we need is known as

Arden's Rule

Let L(R) be a regular language described by the iterative relationship

R = S·R + T

where S and T are regular expressions.

a) R = S*·T is a solution of R = S·R + T.
b) If ε ∉ L(S), R = S*·T is the unique solution of R = S·R + T.
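Part (a) of Arden's Rule can be spot-checked on a concrete instance by comparing both sides of R = S·R + T with every set truncated at a length bound (the choices S = {'0'}, T = {'1'} and the bound n = 6 are my own; this illustrates the identity, it is not the uniqueness proof):

```python
# Sketch: checking R = S*.T against S.R + T, all sets truncated at length n.
def upto(ws, n):
    return {w for w in ws if len(w) <= n}

def star(S, n):
    """S* restricted to words of length <= n, by iterating concatenation."""
    result, frontier = {''}, {''}
    while frontier:
        frontier = upto({u + v for u in frontier for v in S}, n) - result
        result |= frontier
    return result

S, T, n = {'0'}, {'1'}, 6
R = upto({u + v for u in star(S, n) for v in T}, n)   # R = S*.T
rhs = upto({u + v for u in S for v in R}, n) | T      # S.R + T
```

Here R comes out as { 0^k 1 : 0 ≤ k ≤ 5 } and equals the right-hand side, as part (a) predicts.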
Proof of Arden’s Rule
a) It must be shown that R = S*·T satisfies L(R) = L(S·R + T), i.e.

S*·T = S·( S*·T ) + T

S·( S*·T ) + T = S·S*·T + T = ( S·S* + ε )·T = S*·T

b) Suppose ε ∉ L(S) and L(V) = L(S·V + T).

We show L(V) = L(S*·T).

L( V ) ⊆ L( S*·T ):

Assume the contrary. Let w be a shortest word in

L(V) − L(S*·T)

w ∈ L(V) = L(S·V + T). As w ∉ L( S*·T ), w ∉ L(T) and so w ∈ L( S·V ).

∴ w = ws·wv, for words ws ∈ L(S) and wv ∈ L(V).
Furthermore |ws| > 0, as ε ∉ L(S).

If wv ∈ L( S*·T ), then w = ws·wv ∈ L( S·S*·T + T ) = L( S*·T ), contradicting w ∉ L( S*·T ).

If wv ∉ L( S*·T ), then wv ∈ L(V) and |wv| < |w|, contradicting w being a shortest word in L(V) − L( S*·T ).

∴ L( V ) ⊆ L( S*·T ).

L( S*·T ) ⊆ L( V ):

The argument is similarly by contradiction. Let w be a shortest word in

L(S*·T) − L(V)

w ∈ L( S·( S*·T ) + T ). w ∉ L( S·V + T ). ∴ w ∉ L(T) and so w ∈ L( S·S*·T ).

∴ w = ws·ws*·wt, for words ws ∈ L(S), ws* ∈ L(S*), wt ∈ L(T).
Again, |ws| > 0, as ε ∉ L(S).

If ws*·wt ∈ L( V ), then w = ws·ws*·wt ∈ L( S·V + T ) = L( V ), contradicting w ∉ L( V ).

If ws*·wt ∉ L( V ), then ws*·wt ∈ L(S*·T) and |ws*·wt| < |w|, contradicting w being a shortest word in L(S*·T) − L(V).

∴ L( S*·T ) ⊆ L( V ).

We have proved that if V satisfies L(V) = L(S·V + T) when ε ∉ L(S), then

L( V ) ⊆ L( S*·T ) and L( S*·T ) ⊆ L(V),

i.e. L( V ) = L( S*·T ).
Necessity of the Condition ε ∉ L(S)

If ε ∈ L(S), then there is no unique solution, R, such that L(R) = L(S·R + T).

Let L(S) = L(X) ∪ { ε }, with ε ∉ L(X). For any Y with L(Y) ⊆ Σ*,

R = S*·T + X*·Y

is a solution of R = S·R + T.

Proof: We need to show

L( S*·T + X*·Y ) = L( S·( S*·T + X*·Y ) + T )

The right-hand side is:

S·S*·T + T + ( X + ε )·X*·Y
= ( S·S* + ε )·T + ( X·X* + X* )·Y    (†)
= S*·T + X*·Y

Notice that the derivation of (†) requires S = X + ε, without which the final line of the derivation can only be reduced to

S*·T + X*·Y = S*·T + X·X*·Y

whose unique solution is Y = ∅.
Summary
To obtain a solution for E0 in the system E( M ) = < E0, E1, …, Ek >, i.e. a regular expression, R, such that L(R) = L(M):

Repeat the following 2 steps until a regular expression for L( E0 ) over Σ is obtained.

1) Construct a fully substituted system

E′( M ) = < E0′, E1′, …, Er′ >

from E(M) (r ≤ k).

2) Apply Arden's Rule to remove iterative relationships. Apply the substitution rule to the non-iterative relationships that result from this.

Of course, the standard simplifications using properties of regular expressions can be employed at any stage to obtain more manageable forms.
Example
The fully substituted example was:

E0 = 0·E0 + 100·E6 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = ( 0 + 1 )·E6 + ∅

Applying Arden's Rule to E6:

E6 = ( 0 + 1 )*·∅ = ∅

The 6-substituted system resulting:

E0 = 0·E0 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + ε
Applying Arden's Rule to E5:

E5 = 0*·ε = 0*

Substituting for E5:

E0 = 0·E0 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 1·E4 + 0·0* + ε

Applying Arden's Rule to E4:

E4 = 1*·( 0·0* + ε ) = 1*·0*

and substituting for E4:

E0 = 0·E0 + 101·E3 + 11·1*·0*
E3 = 0·E3 + 1·1*·0* + ε

Applying Arden's Rule to E3 gives

E3 = 0*·( 1·1*·0* + ε )
   = 0*·1·1*·0* + 0*
   = 0*·( 1·1* + ε )·0*
   = 0*·1*·0*
Substituting for E3:

E0 = 0·E0 + 101·0*·1*·0* + 11·1*·0*
   = 0·E0 + 1·( 010* + 1 )·1*·0*

Finally, applying Arden's Rule to E0:

E0 = 0*·1·( 010* + 1 )·1*·0*

Thus, LX, the language recognised by the automaton first introduced on p.38, is that defined by the regular expression

0*·1·( 010* + 1 )·1*·0*
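The derived expression can be spot-checked with Python's re module, translating + into | (this checks sample words only; it does not re-verify the derivation):

```python
import re

# The derived expression 0*.1.(010* + 1).1*.0* in Python regex syntax.
pattern = re.compile(r"0*1(010*|1)1*0*")

def accepted(w):
    """True iff w is in the language of the derived regular expression."""
    return pattern.fullmatch(w) is not None
```

For instance, 11 and 0110 are accepted, while the empty word, 0, and 10 are rejected, since every accepted word must contain a 1 followed by either 010* or another 1.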
With this example, the proof of Theorem 4,

L ⊆ Σ* is a regular set if and only if there is a DFA, M, for which L( M ) = L,

is complete.
Summary
Theorems (1-4) have established that for L ⊆ Σ*, the following are equivalent:

a) ∃ DFA, M, for which L( M ) = L.
b) ∃ NDFA, M, for which L( M ) = L.
c) ∃ ε-NDFA, M, for which L( M ) = L.
d) ∃ RLG, G, for which L( G ) = L.
e) ∃ reg. expr., R, for which L( R ) = L.

(a) ≡ (b): Theorem 1
(b) ≡ (c): Theorem 2
(a) ≡ (d): Theorem 3
(a) ≡ (e): Theorem 4

It should be noted that the conversion from automata to regular expressions in Theorem 4 may be applied directly to NDFA (the use of DFA in the proof is merely for simplification).

For ε-NDFA, however, to ensure the uniqueness of solutions resulting from Arden's Rule, ε-loops ought to be removed.
COMP209
Automata and Formal Languages
Section 4
Properties of Regular Languages

Periodicity, Closure, and Decision
124 Properties of Regular Languages
Limitations of Finite Automata: Periodicity and the Pumping Lemma

Suppose M is a DFA with n states and that L( M ) contains words,

w = σ1σ2…σk,

whose length, k, is at least n.

What can be deduced about the process by which M reaches an accepting state given such w?

Certainly, there is a sequence of k + 1 states of M,

q0 qσ1 qσ2 … qσk

with

qσ1 = δ( q0, σ1 )
qσj = δ( qσj−1, σj )   2 ≤ j ≤ k

which are traversed.

Since M has only n states and k + 1 > n, this sequence must contain at least two occurrences of some state q.
[Diagram: the run of M on w, with a state qX entered after σi and re-entered after σj, forming a LOOP; σj+1…σk then lead to an accepting state.]

Thus, there is some state qX, entered with σi and re-entered with σj: i.e. qσi = qσj.

Taking this view, we can regard w as divided into 3 parts:

x = σ1σ2…σi ;  y = σi+1…σj ;  z = σj+1…σk

From our assumptions, w = x·y·z ∈ L(M).

It must also be the case, however, that:

x·z ∈ L( M )  (by ignoring LOOP)
x·y·y·z ∈ L( M )  (by going through LOOP twice)
x·y^k·z ∈ L( M )  (by going through LOOP k times)
The Pumping Lemma (for Regular Languages)

The informal development indicates that:

An n state DFA, M, that accepts words, w, of length at least n, must accept all words x·y^i·z, ∀ i ≥ 0, for some x, y, z such that w = x·y·z, |x·y| ≤ n, and |y| ≥ 1.

The formal statement of The Pumping Lemma for Regular Languages, as this is known, is:

Let L be a regular language. There is a constant, m, such that if w ∈ L with |w| ≥ m, then w may be written as x·y·z where |x·y| ≤ m, |y| ≥ 1, and ∀ i ≥ 0, x·y^i·z ∈ L.
Proof of Pumping Lemma
Let L be a regular language and M a DFA with L( M ) = L. Fix m = |Q| and let

w = σ1…σk ∈ L,  k ≥ m.

There must be positions i and j within w such that

δ*( q0, σ1…σi ) = qX = δ*( q0, σ1…σj )

Let i be the smallest such index. Since w ∈ L,

δ*( qX, σj+1…σk ) ∈ F

Setting

x = σ1…σi ;  y = σi+1…σj ;  z = σj+1…σk

|x·y| ≤ m ;  |y| ≥ 1 ;  w = x·y·z

Since for all t ≥ 0,

δ*( q0, x ) = δ*( q0, x·y^t ) = qX

∴ δ*( q0, x·y^t·z ) ∈ F

i.e. x·y^t·z ∈ L, ∀ t ≥ 0.
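The proof's decomposition can be sketched as code: run the DFA on w and cut at the first repeated state (the dict-encoded DFA and the parity example are my own, not from the notes):

```python
# Sketch: extracting a pumpable decomposition w = x.y.z from a DFA run.
def pump_split(delta, q0, w):
    """Return (x, y, z) with w = x+y+z and y looping on a repeated state,
    or None when no state repeats (i.e. |w| < |Q|)."""
    q, first_seen = q0, {q0: 0}
    for k, a in enumerate(w, start=1):
        q = delta[(q, a)]
        if q in first_seen:          # state revisited: the y-part loops here
            i = first_seen[q]
            return w[:i], w[i:k], w[k:]
        first_seen[q] = k
    return None

# Example DFA over {0,1}: tracks the parity of the number of 1s.
parity = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
```

On w = 11 the parity DFA returns to state 0 after both symbols, giving x = ε, y = 11, z = ε; pumping y then produces 1111, 111111, …, all with an even number of 1s.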
Applications
The Pumping Lemma provides a very powerful tool with which to demonstrate that specific languages are not regular.

Note that, although it has been hinted that there are languages that cannot be recognised by DFA, we have given no concrete evidence of this fact.

With the property of regular languages described by the Pumping Lemma, we are now able to provide such evidence.

In particular, we can make precise the assertion at the start of the module concerning

L(d) = { w ∈ { 0, 1 }* : w = Reverse(w) }
L(e) = { w ∈ { 0, 1 }* : w has equal numbers of 0s and 1s }
L(g) = { w ∈ { 1 }* : |w| is a prime number }
L(h) = { w ∈ { 0, 1 }* : w = 1^i 0^j and j = i^2 }
Proving L is not Regular Using the Pumping Lemma

Suppose we have been given a description of some language L ⊆ Σ*. How may it be shown, using the Pumping Lemma, that L is not regular?

Certainly, if L is not regular, then it must contain arbitrarily long words. (Exercise: Why?)

The argument proceeds by contradiction: Assume L is regular.

a) Given any constant, m ≥ 1, choose some word w ∈ L with |w| ≥ m.
b) Given any partition of w into x, y, and z for which

|x·y| ≤ m ;  |y| ≥ 1 ;  w = x·y·z

prove that for some t ≥ 0, the word x·y^t·z cannot belong to L.

If both are possible: any DFA accepting L also accepts words that are not in L.
Some Examples
Example 1: L(d) = { w ∈ { 0, 1 }* : w = Reverse(w) } is not a regular language.

Proof: For any constant m, let w = 0^m 1 0^m ∈ L(d). For any partition of w into x, y, z for which |xy| ≤ m, |y| ≥ 1 and w = xyz, we must have

z = 0^(m−|xy|)·1·0^m

∴ w = 0^|x|·0^|y|·0^(m−|xy|)·1·0^m

Now choose t = 0; then 0^|x| 0^(m−|xy|) 1 0^m = 0^(m−|y|) 1 0^m ∉ L(d).

Example 2:

L(e) = { w ∈ { 0, 1 }* : w has equal numbers of 0s and 1s }

is not a regular language.

Proof: Exercise. (Use w = 0^m·1^m ∈ L(e) to construct the counterexample.)
Example 3: L(g) = { w ∈ { 1 }* : |w| is a prime number } is not a regular language.

Proof: Given m, let p be any prime number such that p > m + 1, and set w = 1^p ∈ L(g). Consider any x, y, z for which

|xy| ≤ m ;  |y| ≥ 1 ;  x·y·z = 1^p

so that x·z = 1^(p−|y|) with 1 ≤ |y| ≤ m.

Now set t = p − |y|, to give

x·y^t·z = 1^( (p−|y|) + (p−|y|)|y| ) = 1^( (p−|y|)(1+|y|) )

Since p > m + 1 ≥ |y| + 1, both p − |y| and 1 + |y| exceed 1, so this length is not prime and x·y^t·z ∉ L(g).

Example 4: L(h) = { w ∈ { 0, 1 }* : w = 1^i 0^j and j = i^2 } is not a regular language.

Proof: Exercise. (Use w = 1^p 0^(p^2), for p > m.)
Non-trivial Exercise (Maths. and Joint Maths/C.S. Only)

Suppose we view w ∈ 1·{ 0, 1 }* as the binary representation of some natural number, bin( w ) ∈ N.

[The condition that w starts with a 1 is to ensure that every natural number has a unique representation.]

Show that the language, PRIMES,

{ w ∈ 1·{ 0, 1 }* : bin(w) is a prime number }

is not a regular language.

Note that whereas L(g) uses a unary encoding system for numbers, PRIMES uses a binary encoding, and it cannot be deduced that PRIMES is not regular from L(g) being so.

[Hint: Use Fermat's Theorem: 2^(p−1) ≡ 1 mod p for all primes p > 2.]
Closure Properties
Suppose L1 and L2 are both regular languages over Σ.

We know, from the definition of regular language, that

L1 ∪ L2 ;  L1·L2 ;  ( L1 )*

are also regular languages.

What, however, can we say about, e.g.

L1 ∩ L2 ;  Co−( L1 ) ;  etc.?
In general, suppose

Ψ : ( ℘( Σ* ) )^k → ℘( Σ* )

is an arbitrary operation defining some language over Σ from any collection of k ≥ 1 languages over Σ.

For example: Ψ = ∪ with k = 2; Ψ = Co− with k = 1.

The class of properties which we are concerned with here are known as:

Closure Properties of (Families of) Languages

Formally,

A family of languages over Σ is a subset of all possible languages over Σ. We use ℜ to denote an arbitrary family, so that

ℜ ∈ ℘( ℘( Σ* ) )

A family, ℜ, is said to be closed under an operation Ψ of k arguments if

∀ < L1, L2, …, Lk > ∈ ( ℜ )^k :  Ψ( L1, L2, …, Lk ) ∈ ℜ

That is, applying Ψ to any collection of k languages in ℜ always produces some language in ℜ.
Example
For the family

Reg = { L ⊆ Σ* : L is a regular language }

a) Reg is closed under ∪, i.e. the union of regular languages is a regular language.
b) Reg is closed under concatenation (·), i.e. the concatenation of regular languages is a regular language.
c) Reg is closed under *, i.e. the *-closure of a regular language is a regular language.

The questions introduced above can be phrased as:

Is Reg closed under intersection (∩)?
Is Reg closed under complement (Co−)?
Theorem 5:

a) Reg is closed under complement.
b) Reg is closed under intersection.

Proof:
a) Let L ∈ Reg. From Theorem 4, there is a DFA,

M = ( Q, Σ, q0, F, δ )

for which L( M ) = L.

The DFA, MCo = ( QCo, Σ, qCo0, FCo, δCo ) with

L( MCo ) = { w ∈ Σ* : w ∉ L } = Co−( L )

is formed by

QCo = Q ;  qCo0 = q0 ;  δCo = δ ;

and

FCo = { qk ∈ QCo : qk ∉ F }

i.e. q is an accepting state in QCo if and only if the corresponding state in M is not an accepting state.
It is obvious that:

L( MCo ) = { w ∈ Σ* : δ*Co( qCo0, w ) ∈ FCo }
         = { w ∈ Σ* : δ*( q0, w ) ∉ F }
         = Σ* − { w ∈ Σ* : δ*( q0, w ) ∈ F }
         = Co−( L( M ) ) = Co−( L )

b) Since < ∪, ∩, Co− > defines a Boolean algebra with respect to sets of words over Σ, from De Morgan's Laws:

Co−( L1 ∪ L2 ) = Co−( L1 ) ∩ Co−( L2 )

L1 ∩ L2 = Co−( Co−( L1 ) ∪ Co−( L2 ) )
(Non-trivial) Exercise

Give a direct construction of a DFA, M∩, recognising L1 ∩ L2 from DFA M1 and M2 with L( M1 ) = L1 and L( M2 ) = L2, i.e. without using De Morgan's Laws.

Hint: If

M1 = ( Q1, Σ, q10, F1, δ1 )
M2 = ( Q2, Σ, q20, F2, δ2 )

consider the DFA

M∩ = ( Q∩, Σ, q∩0, F∩, δ∩ )

for which

Q∩ = Q1 × Q2 ;  q∩0 = < q10, q20 > ;

and

δ∩( < qi, qj >, σ ) = < δ1( qi, σ ), δ2( qj, σ ) >

If F∩ = F1 × F2, what is

{ w ∈ Σ* : δ*∩( < q10, q20 >, w ) ∈ F∩ }?
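The hinted product construction can be sketched by simulating both machines in lockstep on the pair state (the dict-encoded DFAs and the two example machines are my own):

```python
# Sketch: simulating the product DFA M_cap on a word; a pair state accepts
# iff both components accept.
def run_product(d1, d2, q1, q2, f1, f2, w):
    for a in w:
        q1, q2 = d1[(q1, a)], d2[(q2, a)]
    return q1 in f1 and q2 in f2

# M1: even number of 0s (accepting state 0); M2: last symbol is 1 (accepting 1).
d1 = {(0, '0'): 1, (1, '0'): 0, (0, '1'): 0, (1, '1'): 1}
d2 = {(0, '0'): 0, (1, '0'): 0, (0, '1'): 1, (1, '1'): 1}
```

The word 001 has an even number of 0s and ends in 1, so it lies in the intersection; 01 has an odd number of 0s and is rejected.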
Some More Closure Properties
We give two further properties under which regular languages are closed.

The first of these, substitution, provides a useful mechanism for mapping between different alphabets.

The second, quotient, illustrates that closure properties do not require explicit effective constructions in order for the property to hold.
Substitution
Let Σ1 and Σ2 be alphabets; ℘( Σ2* ) the set of all languages over Σ2.

A substitution function, f, is a mapping from symbols in Σ1 to languages over Σ2, i.e.

f : Σ1 → ℘( Σ2* )

f is extended to f(word), mapping from words over Σ1, by

f(word)( w ) =
  { ε }                if w = ε
  f( σ )               if w = σ ∈ Σ1
  f( σ )·f(word)( u )  if w = σ·u ∈ Σ1*

Finally, f(word) is extended to f(lang), mapping from languages over Σ1 to languages over Σ2, by

f(lang)( L ) = ∪w∈L f(word)( w )
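For finite languages, f(word) and f(lang) can be written exactly as defined above (the sets-of-strings encoding and the example substitution are my own):

```python
# Sketch: substitution on finite languages, following the recursion above.
def f_word(f, w):
    """Substitute each symbol of w by the language f[symbol]; returns a set."""
    if w == "":
        return {""}
    return {u + v for u in f[w[0]] for v in f_word(f, w[1:])}

def f_lang(f, language):
    """Union of f_word over every word of the (finite) language."""
    out = set()
    for w in language:
        out |= f_word(f, w)
    return out
```

With f(0) = {aa} and f(1) = {b, bb}, for instance, f(word)(01) = {aab, aabb}.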
Despite the, superficially, involved definition of substitution, the proof of the following result is quite easy.

Theorem 6:
Let f : Σ1 → ℘( Σ2* ) be such that ∀ σ ∈ Σ1, f( σ ) ∈ Reg.

If L is a regular language over Σ1 then f(lang)( L ) is a regular language over Σ2.

Proof: (Outline)
Let R be a regular expression over Σ1 for which L( R ) = L. Let Rσ be the regular expression over Σ2 for the language f( σ ).

To obtain a regular expression R^f for f(lang)( L ), replace each occurrence of the symbol σ in R by the regular expression Rσ.
It is easy to show that

L( ( S + T )^f ) = L( S^f ) ∪ L( T^f )
L( ( S·T )^f ) = L( S^f )·L( T^f )
L( ( S* )^f ) = L( ( S^f )* )

i.e. if R ∈ { S + T, S·T, S* }, where S and T are regular expressions, then the language described by applying the substitution f to S and T separately is exactly the same as the language obtained by applying f to the combination of these.

An easy induction on the number of operations defining R completes the proof.
Example
With Σ1 = { 0, 1 }, Σ2 = { a, b }:

R = 0·1·( 0·0 + 1·1 )*·1·0

Let f : { 0, 1 } → ℘( { a, b }* ) be given by:

f( 0 ) = ( a·a )
f( 1 ) = ( b·b·a·a )*

Then

R^f = (aa)·(bbaa)*·( (aaaa) + (bbaa)*·(bbaa)* )*·(bbaa)*·(aa)
    = (aa)·( aaaa + (bbaa)* )*·(aa)

For w = 0110 ∈ L(R):

f(word)( w ) = aa·(bbaa)*·(bbaa)*·aa = aa·(bbaa)*·aa

and L( aa·(bbaa)*·aa ) ⊂ L( R^f ).
Exercise
Let Bk be the alphabet containing 2^k symbols, Bk = { 0, 1, 2, …, 2^k − 1 }, e.g.

B4 = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F }

Any word in Bk* starting with a symbol other than 0 can be interpreted as the unique base-2^k representation of some natural number basek( w ) ∈ N.

Using the fact that the language PRIMES,

{ w ∈ 1·{ 0, 1 }* : bin(w) is a prime number },

defined on p.130, is not regular, show that ∀ k > 1, the language PRIMES_base_k,

{ w ∈ Bk* : w = σ·u, σ ∈ Bk − { 0 }, basek(w) is a prime number },

is not a regular language, without using the Pumping Lemma.

[Hint: Use an appropriate substitution, f : Bk → ℘( { 0, 1 }* ).]
Quotient
Let L1 and L2 be languages over the same alphabet Σ.

The quotient of L1 with respect to L2 (denoted L1 / L2) is

L1 / L2 = { v : ∃ u ∈ L2 such that v·u ∈ L1 }

In other words, L1/L2 comprises those words v for which there is some word u ∈ L2 such that concatenating v and u gives a word in L1.

Example: Σ = { 0, 1, 2, 3 }.

L1 = L( (00 + 11)*·(22 + 33)·2* )
L2 = L( 33·2* )

L1/L2 = L( ( 00 + 11 )* ), since given any w ∈ L( (00 + 11)* ), w·33 ∈ L1 and 33 ∈ L2.

[N.B. In general, ( L1/L2 )·L2 ≠ L1.]
Theorem 7:
Let L1 be a regular language over Σ and L2 be any language over Σ. Then

L1/L2 ∈ Reg.

Proof:
Let M = ( Q, Σ, q0, F, δ ) be a DFA with L(M) = L1. Define Fquot to be the subset of Q such that:

Fquot = { q ∈ Q : ∃ u ∈ L2 such that δ*( q, u ) ∈ F }

The DFA, M′, given by

( Q, Σ, q0, Fquot, δ )

is such that L( M′ ) = L1/L2.

To see this, observe that

w ∈ L( M′ ) ⇔ δ*( q0, w ) = q ∈ Fquot
⇔ ∃ u ∈ L2 such that δ*( q, u ) ∈ F
⇔ ∃ u ∈ L2 such that δ*( q0, w·u ) ∈ F
⇔ ∃ u ∈ L2 such that w·u ∈ L1
⇔ w ∈ L1/L2
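When L2 is a finite, explicitly listed language, the set Fquot is computable by direct simulation, which makes the construction effective in that special case (the dict encoding and example are my own):

```python
# Sketch: computing F_quot for a finite, explicitly given L2.
def f_quot(states, delta, finals, L2):
    """States q from which some u in L2 drives M into an accepting state."""
    def run(q, u):
        for a in u:
            q = delta[(q, a)]
        return q
    return {q for q in states if any(run(q, u) in finals for u in L2)}

# Example: M accepts a.a* (one or more a's).
delta = {(0, 'a'): 1, (1, 'a'): 1}
```

With L2 = {a}, both states enter Fquot, so M′ accepts a*, which is exactly ( a·a* ) / {a}.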
Commentary on the Proof
A 'problem' with the argument above is that it is not constructive, i.e. given L1 and L2, we are not told how to identify the set Fquot that will form the set of accepting states in M′.

The proof relies simply on the observation that, irrespective of the language L2, there exists such a subset among the states of M, the DFA recognising L1. (Of course this subset may be anything from the empty set to every state in Q.)

Notice that inspecting every state q ∈ Q in turn will effect a constructive solution, provided that it is possible correctly to determine the following:

Is { w ∈ Σ* : δ*( q, w ) ∈ F } ∩ L2 = ∅?    (*)
Decision Methods for Regular Sets
In discussing how a DFA recognising L1/L2 could be constructed, it was noted (in (*)) that this was possible in those cases for which it could be decided whether the intersection of two languages was empty or not.

Given a DFA, M = ( Q, Σ, q0, F, δ ), the language

Li = { w ∈ Σ* : δ*( qi, w ) ∈ F }

is a regular language.

From Theorem 5(b), we know that the intersection of regular languages is a regular language.

∴ A DFA, M, accepting L1/L2 when both L1, L2 ∈ Reg can be explicitly constructed if:

There is an algorithm which, given a description of a regular language, L, as input, returns true if L = ∅ and false otherwise.
150 Properties of Regular Languages
Decision Questions for Languages
There are many questions of the form
‘Does a given language L, have a particularproperty of interest?’
In dealing with such language properties we are generally interested in 2 types of result:

a) Positive (algorithmic) results: a description of an algorithm that takes as input a (finite) description of a language L, returning true if and only if L has the property concerned.

b) Negative (‘undecidability’) results: a formal proof that no algorithm for deciding the property concerned is possible.

We shall deal with the latter category much more extensively in the final section of this module.
Properties of Regular Languages 151
Properties of Interestfor Regular Languages
Assume, without loss of generality, that to present a finite description of some regular language, L, a regular expression, R, with L( R ) = L is used.

Three basic questions we can seek algorithms for are: Given R a regular expression over Σ:

Is L( R ) = ∅?
Is L( R ) a finite language?
Is L( R ) an infinite language?

The relationships between the languages described by different regular expressions, R1 and R2, also present important decision questions: Given R1, R2 regular expressions over Σ:

Is L( R1 ) = L( R2 )?
Is L( R1 ) ⊆ L( R2 )?

We shall show that all of these properties for regular languages have decision algorithms.
152 Properties of Regular Languages
Deciding if L isEmpty, Finite, or Infinite
Theorem 8: Let M = ( Q, Σ, q0, F, δ ) be a DFA.

a) L( M ) ≠ ∅ ⇔ ∃ w ∈ L(M) s.t. |w| < |Q|

b) L( M ) is infinite ⇔ ∃ w ∈ L(M) s.t. |Q| ≤ |w| < 2|Q|

Proof:
a) That L(M) is non-empty if it accepts something is obvious. Suppose L( M ) ≠ ∅, and let w be a shortest word in L(M). From the Pumping Lemma, |w| < |Q|: otherwise we could write w = xyz with |y| ≥ 1 and xz ∈ L(M), contradicting the choice of w.

b) If w ∈ L(M) with |Q| ≤ |w| < 2|Q|, then the Pumping Lemma shows L(M) to be infinite. Conversely, if L(M) is infinite, let w be the shortest word in L(M) of length ≥ |Q|. If |w| ≥ 2|Q| the Pumping Lemma gives a contradiction, since we can write w = xyz with |xy| ≤ |Q|, |y| ≥ 1 and xz ∈ L(M) (and |xz| ≥ |Q|, so xz is a shorter such word).
Properties of Regular Languages 153
Consequences of Theorem 8
From Theorem 8(a), we get a (not very good) algorithm to test if L(M) = ∅:

Generate each word over Σ of length up to |Q| − 1, and test if M accepts any. If all are rejected then L( M ) = ∅.

And similarly, from Theorem 8(b), another (not very good) algorithm to test if L(M) is infinite:

Generate each word over Σ of length between |Q| and 2|Q| − 1. If all are rejected then L(M) is finite.
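These brute-force tests are easy to state in code. A minimal Python sketch (illustrative only; the two-state example DFA below, accepting words with an odd number of 1s, is an assumption, not from the notes):

```python
from itertools import product

def accepts(delta, q0, F, word):
    """Run the DFA on the word and report acceptance."""
    q = q0
    for a in word:
        q = delta[q, a]
    return q in F

def is_empty(n_states, sigma, delta, q0, F):
    # Theorem 8(a): L(M) is non-empty iff some word of length < |Q| is accepted.
    return not any(accepts(delta, q0, F, w)
                   for n in range(n_states)
                   for w in product(sigma, repeat=n))

def is_finite(n_states, sigma, delta, q0, F):
    # Theorem 8(b): L(M) is infinite iff some word w with |Q| <= |w| < 2|Q|
    # is accepted.
    return not any(accepts(delta, q0, F, w)
                   for n in range(n_states, 2 * n_states)
                   for w in product(sigma, repeat=n))

# Assumed example: 2-state DFA over {0,1} accepting words with an odd
# number of 1s -- a non-empty, infinite language.
delta = {("e", "0"): "e", ("e", "1"): "o",
         ("o", "0"): "o", ("o", "1"): "e"}
```

Both tests enumerate |Σ|^O(|Q|) words, which is exactly why the slide calls them "not very good"; the exercises that follow ask for reachability-based improvements.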
154 Properties of Regular Languages
(Not too difficult) Exercise
Describe a more efficient algorithm for testing if L(M) = ∅ which works by constructing the set of states that could be reached from the initial state.

[Obviously, if no q ∈ F can be reached then L(M) = ∅.]

(Slightly more difficult) Exercise

Using a similar approach, describe a more efficient algorithm for testing if L( M ) is infinite.

[Consider constructing (non-simple) paths of states starting from q0 which contain two occurrences of some state in Q and end with some state in F.]
Properties of Regular Languages 155
Comparison of Languages Accepted by DFA
We now give two easy constructions for determining if DFA M1 and M2 satisfy

L( M1 ) = L( M2 )
L( M1 ) ⊆ L( M2 )

To simplify notation, write Co-( L ) for the complement Σ* − L.

For the first construction, using Theorem 5(a, b), build a DFA, M3, for which L( M3 ) is

( L(M1) ∩ Co-( L(M2) ) ) ∪ ( L(M2) ∩ Co-( L(M1) ) )

Then L( M3 ) = ∅ ⇔ L( M1 ) = L( M2 ).

For the second, construct M3 so that

L( M3 ) = L( M1 ) ∩ Co-( L(M2) )

If L( M3 ) = ∅ then L(M1) ⊆ L(M2). Equality can be ruled out by using the first test described.
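In code, the first construction need not build M3 explicitly: exploring the product automaton and looking for a reachable pair that is accepting in exactly one component performs the same symmetric-difference emptiness test. A minimal Python sketch (illustrative; the two odd-number-of-1s machines are assumed examples):

```python
from collections import deque

def equivalent(d1, s1, F1, d2, s2, F2, sigma):
    """Decide L(M1) = L(M2): explore the product automaton from (s1, s2).
    The symmetric-difference machine M3 accepts iff some reachable pair is
    accepting in exactly one component, so equality holds iff none is."""
    seen = {(s1, s2)}
    frontier = deque(seen)
    while frontier:
        p, r = frontier.popleft()
        if (p in F1) != (r in F2):   # a witness word leads here: L1 != L2
            return False
        for a in sigma:
            nxt = (d1[p, a], d2[r, a])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Assumed example: both machines accept words over {0,1} with an odd
# number of 1s; the second is simply a renamed copy of the first.
dA = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
dB = {("x", "0"): "x", ("x", "1"): "y", ("y", "0"): "y", ("y", "1"): "x"}
```

Swapping F2 to the even-1s state makes the languages differ already at the empty word, so the test fails immediately.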
156 Properties of Regular Languages
COMP209
Automata and Formal Languages
Section 5
Construction and Uniqueness of Minimum Number of States
Finite Automata
Minimum State DFA 157
A very important property of regular languages is that for every regular language R over Σ there is a ‘unique’ minimum number of states DFA, M, such that L( M ) = R.
In addition, this automaton can be recoveredby an efficient algorithm from any DFArecognisingR.
Question
Why ‘unique’ rather thanunique?
Answer
Because choosing any renaming f : Q ↔ Q of the states of a DFA, M, obviously gives a DFA, M′, recognising exactly the same language as M.
158 Minimum State DFA
To make this concept of ‘unique’ precise, we can formally define such renaming processes via an equivalence relation ≡iso, so that:

Let

M1 = ( Q1, Σ, q0¹, F1, δ1 )
M2 = ( Q2, Σ, q0², F2, δ2 )

Then M1 ≡iso M2 if there exists a bijection β : Q1 ↔ Q2 such that:

β( q0¹ ) = q0²

qk ∈ F1 ⇔ β( qk ) ∈ F2

∀ σ ∈ Σ : δ1( qi, σ ) = qj ⇔ δ2( β( qi ), σ ) = β( qj )

Thus, ‘unique’ is with respect to membership of an equivalence class under the relation ≡iso.
Minimum State DFA 159
Example
[Diagram: two DFAs over { 0, 1 }, each with states q0, …, q6; the second is the first with its states renamed. The renaming is the bijection β listed below.]
160 Minimum State DFA
Example Continued
Using the bijection:
β ( q0 ) = q4
β ( q1 ) = q6
β ( q2 ) = q5
β ( q3 ) = q0
β ( q4 ) = q1
β ( q5 ) = q2
β ( q6 ) = q3
We have:

q0¹ = q0 ; q0² = q4 = β( q0¹ )

F1 = { q3, q4, q5 }

F2 = { q0, q1, q2 } = { β( q3 ), β( q4 ), β( q5 ) }

The transition functions meet the requirements, so, as is evident from the diagram, the automata are equivalent under ≡iso.
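Checking the three ≡iso conditions mechanically is straightforward once a candidate bijection is given. A minimal Python sketch (illustrative only; the two tiny two-state DFAs and the renaming below are assumed examples, not the seven-state machines of this section):

```python
def is_isomorphism(beta, M1, M2, sigma):
    """Check the three conditions defining ≡iso for a candidate bijection beta.
    Each machine is a triple (delta, q0, F) with delta keyed by (state, symbol)."""
    d1, q01, F1 = M1
    d2, q02, F2 = M2
    if beta[q01] != q02:                      # start states must correspond
        return False
    if {beta[q] for q in F1} != set(F2):      # accepting sets must correspond
        return False
    # transitions must commute with the renaming
    return all(beta[d1[q, a]] == d2[beta[q], a]
               for q in beta for a in sigma)

# Assumed tiny example: two 2-state DFAs over {0,1}, identical up to the
# renaming p0 -> r1, p1 -> r0.
d1 = {("p0", "0"): "p1", ("p0", "1"): "p0",
      ("p1", "0"): "p0", ("p1", "1"): "p1"}
d2 = {("r1", "0"): "r0", ("r1", "1"): "r1",
      ("r0", "0"): "r1", ("r0", "1"): "r0"}
beta = {"p0": "r1", "p1": "r0"}
```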
Minimum State DFA 161
Overview of Minimisation Algorithm

The key idea underlying the method for determining if a DFA, M, has the minimum number of states needed to recognise L(M) is that of identifying sets of indistinguishable states in Q(M).

What is meant by two states qi, qj of a DFA, M, being indistinguishable?
If there is some w ∈ Σ* for which

δ*( qi, w ) ∈ F AND δ*( qj, w ) ∉ F

OR

δ*( qi, w ) ∉ F AND δ*( qj, w ) ∈ F

then, clearly, the states qi and qj perform different roles within M.

Formally, the language recognised with qi as initial state is different from the language recognised with qj as initial state: w is a member of exactly one of these.
162 Minimum State DFA
If, on the other hand,

∀ w ∈ Σ*

EITHER

δ*( qi, w ) ∈ F and δ*( qj, w ) ∈ F

OR

δ*( qi, w ) ∉ F and δ*( qj, w ) ∉ F

then the language recognised starting from qi is identical to that recognised starting from qj.

But if this is the case, why are two separate states (qi and qj) necessary in M?

The answer, of course, is that it is not necessary: a DFA recognising the same language as M is given by:

a) Deleting qi, qj from Q(M).
b) Adding a new state q{i,j}.
c) Modifying δ: transitions into qi or qj become transitions into q{i,j}; transitions from qi become transitions from q{i,j}.

The new DFA recognises L( M ) with one less state.
Minimum State DFA 163
Thus, we say that the states qi and qj of a DFA, M, are indistinguishable if

Li = { w ∈ Σ* : δ*( qi, w ) ∈ F }

is the same as

Lj = { w ∈ Σ* : δ*( qj, w ) ∈ F }

If Li ≠ Lj, then qi and qj are distinguishable states in M.

It is obvious that in a minimum number of states DFA, M, every state qi of M is distinguishable from every other state qj of M.
164 Minimum State DFA
Detecting Indistinguishable State Sets
Let M = ( Q, Σ, q0, F, δ ). Define a relation, ∼, over pairs of states in Q by

qi ∼ qj if Li = Lj,

i.e. if the two states are indistinguishable.
The relation∼ is an equivalence relation,i.e.
∀ i , j , k
qi ∼ qi
qi ∼ q j ⇔ q j ∼ qi
qi ∼ q j andq j ∼ qk ⇒ qi ∼ qk
The number of equivalence classes defined by ∼ for the DFA, M, will thus correspond to the number of states in a minimised DFA accepting L(M).
Minimum State DFA 165
Exercise
LetM = ( Q, Σ, q0, F ,δ )
be aDFA.
Using the results already proved that:

a) DFA accept exactly the class of regular languages (hence L(M) for any DFA M is described by some regular expression);

b) there is an algorithm to test if two regular expressions, S and T, describe the same language, i.e. L(S) = L(T);

describe a method of constructing the partition of Q into the equivalence classes induced by the indistinguishability relation ∼.

[Note: this algorithm is far from being the most efficient approach that could be used.]
166 Minimum State DFA
A better approach (in comparison with the exercise method suggested above) to constructing the required partition of Q is to take some initial ‘approximation’ to it, and ‘refine’ this until the final set of equivalence classes has been identified.

From our definition, we know that two states qi and qj are distinguishable if Li ≠ Lj, i.e. there is some word w ∈ Σ* belonging to exactly one of Li and Lj.

How long would a distinguishing w have to be?

At most |Q| − 1. (Exercise: Why?)

The fact that there is an upper bound on the lengths of ‘distinguishing words’ indicates that the process below terminates.
Minimum State DFA 167
k-indistinguishability
For a DFA, M = ( Q, Σ, q0, F, δ ):

The states qi, qj are 0-indistinguishable, qi ∼0 qj, if both are in F or neither of them is.

States qi, qj are k-indistinguishable (k > 0), qi ∼k qj, if

∀ σ ∈ Σ : δ( qi, σ ) ∼k−1 δ( qj, σ )

and

qi ∼0 qj
qi is k-distinguishable from qj if it is not the case that qi ∼k qj.

Thus, if qi ∼k qj then there is no word w of length ≤ k for which exactly one of δ*( qi, w ) ∈ F, δ*( qj, w ) ∈ F holds, i.e. Li and Lj contain exactly the same subset of words of length ≤ k.
168 Minimum State DFA
It should be clear that qi ∼ qj if and only if qi ∼k qj for all 0 ≤ k < |Q|.

This gives the following procedure for determining the equivalence classes for M under the relation ∼, which works by refining the partition of Q induced by ∼k to form that induced by ∼k+1.

We use

Pk = < C1 ; C2 ; … ; Cm >

to denote the equivalence classes induced by ∼k. Thus

Ci ⊆ Q and ∀ q, q′ ∈ Ci : q ∼k q′.

In the algorithm description, Ci will be referred to as a block of the partition Pk.
Minimum State DFA 169
State Partitioning Algorithm
1) k := 0; P0 := < Q − F ; F >

2) k := k + 1;

3) Form the partition Pk: if

Pk−1 = < C1 ; C2 ; … ; Cr >

then two states q, q′ are in the same block Ci of Pk if and only if:

a) q and q′ are in the same block Cj of Pk−1

AND

b) ∀ σ : δ( q, σ ) and δ( q′, σ ) are in the same block Cj,σ of Pk−1.

4) If Pk ≠ Pk−1 go to (2).
170 Minimum State DFA
Commentary
a) On each iteration, each block Ci of the partition Pk either remains unchanged or is split into smaller sets, which correspond to equivalence classes of (k + 1)-indistinguishable states within the k-indistinguishable block Ci.

b) In implementing Step (3), the ‘obvious’ approach of considering all pairs of distinct states can be improved upon using appropriate data structures.

c) The final partition produced corresponds to the equivalence classes induced by the relation ∼ on Q.

d) In constructing the final equivalent automaton, we would always ensure that states unreachable from q0 are eliminated.
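The refinement loop can be sketched directly in Python (an illustrative implementation, not part of the notes). The eight-state transition function below is an assumption: it is one table consistent with the partition work-through on the following pages (the diagram itself is taken from Hopcroft and Ullman, 1979):

```python
def minimise_partition(states, sigma, delta, F):
    """State Partitioning Algorithm: start from <Q - F ; F> and repeatedly
    split blocks until, for every symbol, two states in the same block
    always move to the same block."""
    partition = [blk for blk in (set(states) - set(F), set(F)) if blk]
    while True:
        ids = {q: i for i, blk in enumerate(partition) for q in blk}
        groups = {}
        for q in states:
            # signature: current block plus the block of each successor
            sig = (ids[q],) + tuple(ids[delta[q, a]] for a in sigma)
            groups.setdefault(sig, set()).add(q)
        refined = list(groups.values())
        if len(refined) == len(partition):   # P_k = P_{k-1}: stable
            return refined
        partition = refined

# Assumed transition table for the eight-state example, F = {q2}.
delta = {("q0", "0"): "q1", ("q0", "1"): "q5",
         ("q1", "0"): "q6", ("q1", "1"): "q2",
         ("q2", "0"): "q0", ("q2", "1"): "q2",
         ("q3", "0"): "q2", ("q3", "1"): "q6",
         ("q4", "0"): "q7", ("q4", "1"): "q5",
         ("q5", "0"): "q2", ("q5", "1"): "q6",
         ("q6", "0"): "q6", ("q6", "1"): "q4",
         ("q7", "0"): "q6", ("q7", "1"): "q2"}
states = ["q%d" % i for i in range(8)]
blocks = minimise_partition(states, "01", delta, {"q2"})
```

Since refinement only ever splits blocks, comparing block counts is a valid stability test.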
Minimum State DFA 171
Example
[Diagram: an eight-state DFA over { 0, 1 } with start state q0 and accepting set F = { q2 }. Its transition function (recovered from the partition tables below) is:

state : q0 q1 q2 q3 q4 q5 q6 q7
on 0  : q1 q6 q0 q2 q7 q2 q6 q6
on 1  : q5 q2 q2 q6 q5 q6 q4 q2 ]

[From Hopcroft and Ullman, 1979, p.68]
172 Minimum State DFA
Initial Partition P0:

C0,1 = { q0, q1, q3, q4, q5, q6, q7 }   C0,2 = { q2 }

Formation of Partition P1:

Block C0,1
State   0-block   1-block
q0      C0,1      C0,1
q1      C0,1      C0,2
q3      C0,2      C0,1
q4      C0,1      C0,1
q5      C0,2      C0,1
q6      C0,1      C0,1
q7      C0,1      C0,2

Block C0,2
State   0-block   1-block
q2      C0,1      C0,2

∴ P1 is

C1,1 = { q0, q4, q6 }   C1,2 = { q1, q7 }   C1,3 = { q3, q5 }   C1,4 = { q2 }
Minimum State DFA 173
Formation of Partition P2:

Block C1,1
State   0-block   1-block
q0      C1,2      C1,3
q4      C1,2      C1,3
q6      C1,1      C1,1

Block C1,2
State   0-block   1-block
q1      C1,1      C1,4
q7      C1,1      C1,4

Block C1,3
State   0-block   1-block
q3      C1,4      C1,1
q5      C1,4      C1,1

Block C1,4
State   0-block   1-block
q2      C1,1      C1,4

∴ P2 is

C2,1 = { q0, q4 }   C2,2 = { q6 }   C2,3 = { q1, q7 }   C2,4 = { q3, q5 }   C2,5 = { q2 }
174 Minimum State DFA
It is easy to check that no further changes occur to this partition, i.e. P3 = P2.

So the minimised form of the example automaton has 5 states (rather than 8) and is shown below:
[Diagram: the minimised DFA. Writing qi,j for the merged state { qi, qj }, its states are q0,4 (start), q1,7, q3,5, q6 and q2 (accepting), with transitions:

δ( q0,4, 0 ) = q1,7   δ( q0,4, 1 ) = q3,5
δ( q1,7, 0 ) = q6     δ( q1,7, 1 ) = q2
δ( q2, 0 )  = q0,4    δ( q2, 1 )  = q2
δ( q3,5, 0 ) = q2     δ( q3,5, 1 ) = q6
δ( q6, 0 )  = q6      δ( q6, 1 )  = q0,4 ]
Minimum State DFA 175
But how do we know it is minimal? (The Myhill-Nerode Theorem)

The construction that we have just presented takes a given DFA, M, recognising L and, by identifying equivalence classes of indistinguishable states, forms a DFA accepting L but with (possibly) fewer states.

There are, however, infinitely many DFA that recognise any one regular language, L.

Suppose we had started with a ‘different’ DFA, M′, for L.

Question: Could it happen that the ‘reduced’ DFA formed from M′ has a different number of states compared to the ‘reduced’ form of M?

Answer: No (assuming states unreachable from q0 are removed from both automata).
176 Minimum State DFA
More Equivalence Relations

In order to establish uniqueness of the minimised DFA, we introduce (another) equivalence relation, this time over words in Σ*.

Let L ⊆ Σ*. The relation ≈L between words in Σ* is:

x ≈L y ⇔ ∀ z ∈ Σ* : ( x⋅z ∈ L ⇔ y⋅z ∈ L )

Notice ≈L is properly defined, since (at worst) each w ∈ Σ* could be the only member of its equivalence class.

For L ⊆ Σ*, we define Index( L ) to be the total number of equivalence classes induced by ≈L.
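The classes of ≈L can be explored experimentally: bucket words x by how they behave under a bounded set of extensions z. A minimal Python sketch (illustrative; the language "strings ending in 01" and the length bounds are assumptions — the bounds happen to suffice here because a minimal DFA for this language has 3 states):

```python
from itertools import product

def words(sigma, max_len):
    """All words over sigma of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(sigma, repeat=n):
            yield "".join(w)

def approx_index(in_L, sigma, max_w=4, max_z=3):
    """Bucket words x by the finite signature (x.z in L for |z| <= max_z).
    This counts a lower bound on Index(L); for regular L it reaches the
    true value once the bounds exceed the minimal DFA's state count."""
    tests = list(words(sigma, max_z))
    return len({tuple(in_L(x + z) for z in tests)
                for x in words(sigma, max_w)})
```

For L = strings over {0,1} ending in "01", the three classes correspond to the three states of its minimal DFA, so the count comes out as 3.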
Our statement of the next theorem is technically different from the original form proved. The version we give is, however, trivially deducible from the ‘usual’ form.
Minimum State DFA 177
Theorem 9 (Myhill-Nerode Theorem): Let L be a regular language over Σ, and

M = ( Q, Σ, S, F, δ )

a minimum number of states DFA with L( M ) = L. Then

Index( L ) = |Q( M )|

Proof:
a) |Q( M )| ≤ Index( L ).
We may assume that Index( L ) is finite (otherwise the inequality is trivially correct); let r = Index( L ) with

< C1 ; C2 ; … ; Cr >

the partition of Σ* induced by ≈L.

Let M′ = ( Q, Σ, S, F, δ ) be the DFA:

Q = { q1, q2, …, qr }

F = { qi : L ∩ Ci ≠ ∅ }

S = qi, where ε ∈ Ci

δ( qi, σ ) = qj if ( ∀ w ∈ Ci : w⋅σ ∈ Cj )
178 Minimum State DFA
Note that δ is well-defined since, from the definition of ≈L,

w1 ≈L w2 ⇒ ( ∀ σ ∈ Σ : w1⋅σ ≈L w2⋅σ )

We claim that L( M′ ) = L:

L( M′ ) = { w ∈ Σ* : δ*( S, w ) ∈ F }
        = ∪ { Ci : L ∩ Ci ≠ ∅ }
        = L

(the last step holds because any class Ci meeting L lies wholly inside L: if x ≈L y and x ∈ L then, taking z = ε in the definition of ≈L, y ∈ L).

So we have an Index( L )-state DFA, M′, recognising L.

b) |Q(M)| ≥ Index( L ).
Define the relation ≈M over Σ* by

x ≈M y ⇔ δ*( q0, x ) = δ*( q0, y ).

Obviously the number of equivalence classes induced by ≈M is exactly |Q(M)|.

If we take any equivalence class Si in the induced partition of Σ*, we have

∀ x, y ∈ Si ∀ z ∈ Σ* :

x⋅z ≈M y⋅z

and

x⋅z ∈ L ⇔ y⋅z ∈ L
Minimum State DFA 179
Thus, x ≈M y ⇒ x ≈L y,

showing that the number of equivalence classes of ≈M (i.e. |Q(M)|) is at least as large as the number of equivalence classes of ≈L (i.e. Index( L )).
So we have both |Q( M )| ≤ Index( L ) and |Q( M )| ≥ Index( L ).
Hence,
|Q( M ) | = Index( L )
180 Minimum State DFA
Uniqueness of Minimal State Construction
Theorem 10: Let

M′ = ( Qm, Σ, Sm, Fm, δm )

be the DFA resulting from the State Minimisation Algorithm when given

M = ( Q, Σ, S, F, δ )

where we assume that Qm has no unreachable states. Then

|Qm| = Index( L( M ) ).

Proof: We know that |Qm| ≥ Index( L(M) ).

Suppose |Qm| > Index( L(M) ) and let

< D1 ; D2 ; … ; Dm >

< C1 ; C2 ; … ; Cr >

be the partitions of Σ* induced by the relations ≈M′ and ≈L(M) respectively.
Minimum State DFA 181
Since x ≈M′ y ⇒ x ≈L(M) y, there must be sets Di, Dj, and Ck for which Di ∪ Dj ⊆ Ck.

Consider the corresponding states qi and qj in Qm.

These must be distinguishable (or the algorithm would have merged them into one state). Thus

∃ w ∈ Σ* such that exactly one of δ*( qi, w ) ∈ Fm, δ*( qj, w ) ∈ Fm holds,

i.e. ∃ x ∈ Di, y ∈ Dj such that exactly one of x⋅w ∈ L(M), y⋅w ∈ L(M) holds.

But this contradicts x ∈ Ck and y ∈ Ck, since

x, y ∈ Ck ⇒ x ≈L(M) y ⇔ ∀ w ( x⋅w ∈ L(M) ⇔ y⋅w ∈ L(M) )

This contradiction shows that we must have |Qm| = Index( L(M) ).
182 Minimum State DFA
Summary
a) The Minimisation Algorithm and its correctness proof via the Myhill-Nerode Theorem complete our development of the first part of the module.

b) DFA and regular languages provide a very extensive collection of ideas, a number of which will recur when examining more ‘powerful’ ‘black-box’ configurations.

c) Despite the flexibility of DFA, we have seen that there are some quite simple languages which are beyond their recognition capabilities, e.g. palindromes, equal numbers of zeros and ones.

d) In the next part of the module we consider a ‘natural’ enhancement of DFA capabilities that does extend the range of ‘suitable’ languages.
Minimum State DFA 183
COMP209
Automata and Formal Languages
Section 6
Context-Free Grammars
184 Context-Free Grammars
Introduction
Over the next few lectures we examine asecond class of methods for describing andrecognising languages.
We introduce this by considering a simpleextension to the one class ofgrammars thathas been seen.
Later we shall see how this extension can beparalleled by a similarly simple extension tothe ‘black-box’ functionality offered byDFA.
Context-Free Grammars 185
Why another class of grammar?
Recall that the class of grammars corresponding to DFA — Right Linear Grammars — restrict production rules to the form:

Vi → σ⋅Vj ; Vi → σ

where σ ∈ T (i.e. a terminal symbol).

Consider an application such as:

• Checking if a statement in a Java program is syntactically correct.

Among the ‘sub-tasks’ that one might have to carry out in order to do this are:

a) Checking if arithmetic expressions are ‘well-formed’;
b) Checking if an "if .. then .. else" statement has correct nesting of sub-statements: i.e. that there are no unmatched { or } symbols.
186 Context-Free Grammars
Neither of these can be recognised using RLGs.

Exercise: Both (a) and (b) involve recognising properly balanced sequences of left and right brackets — ‘(’ and ‘)’ in (a); ‘{’ and ‘}’ in (b). Prove that these languages are not regular.
[Hint: Use the Pumping Lemma and words of the form (^m⋅)^m]

One of the most important applications of Formal Grammars in Computer Science is as a means of providing a finite description of all syntactically correct statements within a High-Level Programming Language.

Clearly, RLGs are insufficient for this purpose.

What is the ‘minimal’ extension in the form of allowed production rules that would remove this problem?
Context-Free Grammars 187
Context-Free Grammars
A context-free grammar (CFG), G = ( V, T, P, S ), is a formal grammar in which all productions p ∈ P take the form

Vi → w

where Vi ∈ V and w ∈ ( V ∪ T )*.

So while the left-hand side of any production is (still) only allowed to contain exactly one non-terminal symbol, the right-hand side may comprise an arbitrary word built from terminal and non-terminal symbols.

It should be obvious that any language defined by a RLG (i.e. regular language) is defined by a CFG (since RLGs are just a restricted type of CFG), i.e. CFGs are at least as ‘expressive’ as RLGs.

Before examining their properties in greater detail, a simple example showing that CFGs are more ‘expressive’ than RLGs is given.
188 Context-Free Grammars
A Context-Free Grammar for aNon-regular Language
It was shown earlier (Ex. 1, p.130) that the language

L(d) = { w ∈ { 0, 1 }+ : w = Reverse( w ) }

is not a regular language.

L(d) is, however, generated by the CFG G = ( V, T, P, S ) with

V = { S } ; T = { 0, 1 }

and P having 6 production rules:

S → 0S0 ; S → 1S1
S → 00 ; S → 11
S → 0 ; S → 1

Exercise: Show, using induction on |w| ≥ 1, that S ⇒*G w if and only if w is a palindrome, i.e. w = Reverse( w ).
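The grammar's structure suggests a direct recursive membership test. A minimal Python sketch (not part of the notes): the base cases mirror the four short productions, and peeling a matching first and last symbol mirrors S → 0S0 and S → 1S1.

```python
def derives(w):
    """Check that S =>* w in the palindrome grammar
    S -> 0S0 | 1S1 | 00 | 11 | 0 | 1."""
    if w in ("0", "1", "00", "11"):   # the four terminal-only productions
        return True
    # otherwise w must start and end with the same symbol,
    # with the middle itself derivable from S
    return len(w) > 2 and w[0] == w[-1] and derives(w[1:-1])
```

Note the empty word is (correctly) rejected: the grammar generates palindromes in { 0, 1 }+, not { 0, 1 }*.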
Context-Free Grammars 189
Derivation Trees
Given some description of a language L ⊆ Σ* as a Formal Grammar, G, and a word w ∈ Σ*, the most basic question to address is:

Is w ∈ L( G )?

With the operational mechanisms provided by DFA and RLGs when L is a regular language, such questions are ‘straightforward’.

In effect, if G is a RLG then the ‘chain of productions’ used to show S ⇒*G w is immediate (there is at most one non-terminal to expand at each step).

Derivations in CFGs are complicated by the possible presence of several non-terminal symbols in a production rule.

Derivation (a.k.a. Parse) trees provide a means with which to illustrate how w is derived, and suggest an automated method of testing if w is accepted by a given CFG. This is important in Compiler Construction.
190 Context-Free Grammars
Let G = ( V,T, P, S) be a CFG.
A derivation tree in G is a tree D, each vertex u of which has a label λ( u ) satisfying:

1) ∀ u : λ( u ) ∈ V ∪ T ∪ { ε }.

2) λ( root of D ) = S.

3) If u is a non-leaf vertex in D, then λ( u ) ∈ V.

4) If λ( u ) = A ∈ V and u has children < u1, …, un > (from ‘left-to-right’) then

A → λ( u1 )⋅λ( u2 )⋅…⋅λ( un ) ∈ P.

5) If λ( u ) = ε, then u is a leaf and its parent in D has no other children.

Note: Two or more vertices may have exactly the same label from V ∪ T ∪ { ε }.
Context-Free Grammars 191
Example 1
[Diagram: derivation tree with yield 0010100 in the ‘Palindrome’ grammar G, corresponding to the derivation S ⇒ 0S0 ⇒ 00S00 ⇒ 001S100 ⇒ 0010100.]

Derivation tree (for word 0010100) using ‘Palindrome’ G.
192 Context-Free Grammars
Example 2
op
opd
num
digit
6
3 5
+
*
E
E E
E Eop
opd opd
num num
digit digit
EE
E
E
op
opdop E
numopd +
*
num
opd
digitnum
digit
6
digit
3
5
Tw o derivation trees (for 6+ 3 * 5) in EXPRCFG.
Context-Free Grammars 193
Properties and Attributesof Derivation Trees
A derivation tree illustrates how a word w may be derived from S( G ) using the production rules in P( G ).

The word produced by concatenating the terminal symbols labelling the leaf vertices of a derivation tree (using the ‘natural’ left-right ordering) is called the yield of the tree.

[Hence the three example trees have yields 0010100, 6+3*5, and 6+3*5.]

There may be many derivation trees for G with the same yield.
194 Context-Free Grammars
A sub-tree of a derivation tree is formedfrom any vertex of the tree together with allof its descendants.
A sub-tree whose root vertex is labelled with a non-terminal X ∈ V is called an X-tree.

Obviously if W is a derivation tree with yield w, and there is a vertex labelled X in W, then the yield, x, of that X-tree is a sub-word of w, i.e. w = u⋅x⋅v for some u and v.

In the examples:

There are four different S-trees in Example 1 (including the entire tree). These have yields

0, 101, 01010 and 0010100.

In Example 2, each tree has 5 E-trees. The first with yields:

6, 3, 5, 3 * 5 and 6 + 3 * 5.

The second with yields:

6, 3, 5, 6 + 3 and 6 + 3 * 5.
Context-Free Grammars 195
We will merely state the following result, which captures the precise connection between Derivation Trees and Context-Free Grammars.

Theorem 11: Let G = ( V, T, P, S ) be a CFG and w ∈ T*. Then

w ∈ L( G ) (i.e. S ⇒*G w) if and only if there is a derivation tree in G which has yield w.

The languages L ⊆ Σ* for which there exists a context-free grammar, G, with L( G ) = L are known as the

Context-Free Languages (CFL)
As we saw above,
Regular Languages over Σ ⊂ Context-Free Languages over Σ
196 Context-Free Grammars
Simplification of CFGs
It may appear as if the extension in production rule forms from

V → σ ; V → σ⋅W (RLGs)

to

X → w ∈ ( V ∪ T )* (CFGs)

is rather ‘extreme’ in the sense of forming a ‘natural hierarchy’ of grammar/language types.

e.g. why not use an extension which bounds the total number of variable symbols that can appear on the right-hand side of a production, so that RLGs (allowing at most one) are developed by allowing two, three, etc.

In fact, as we shall see as a consequence of the following processes, the class of CFGs implicitly embodies such an extension already:

i.e. an arbitrary CFG, G, can be expressed as a CFG, G′, where every production of G′ contains at most two variable symbols in its right-hand side.
Context-Free Grammars 197
Before proving this result, we consider procedures which allow ‘redundant’ variable and terminal symbols to be removed from any given CFG.

Let G = ( V, T, P, S ) be a CFG and let X ∈ V be some variable symbol in G.

How can it be determined if X is ‘actually needed’ with respect to L( G ) as defined by the CFG, G?

We can identify 2 ‘obvious’ necessary conditions:

a) ∃ u, v ∈ ( V ∪ T )* s.t. S ⇒*G u⋅X⋅v.

b) ∃ w ∈ T* s.t. X ⇒*G w.

i.e. (a) states that there is a derivation from S (the start symbol) that leads to some word containing the variable X; (b) that there is a derivation from X that results in a word comprising only terminal symbols.
198 Context-Free Grammars
(a) and (b) do not guarantee that X is needed in G, since the words u and v in condition (a) may involve redundant non-terminal symbols.

As a formal definition for a variable X ∈ V in a CFG, G = ( V, T, P, S ), being ‘productive’ we have:

A symbol X is productive in the CFG G = ( V, T, P, S ) if there are words w ∈ T*, u, v ∈ ( V ∪ T )* such that

S ⇒*G u⋅X⋅v ⇒*G w

A symbol which is not productive is called redundant.

The following procedures construct a CFG, G′ = ( V′, T′, P′, S′ ), from G = ( V, T, P, S ) in such a way that L(G′) = L(G) and with every symbol of V′ being productive.
Context-Free Grammars 199
1) Vold := ∅;
2) Vnew := { X : X → w ∈ P, for some w ∈ T* };
3) Vold := Vnew;
4) Vnew := Vold ∪ { X : X → w ∈ P, for some w ∈ ( T ∪ Vold )* };
5) if Vnew ≠ Vold then go to (3);
6) V′ := Vnew

Procedure 6.1

The productions P′ of G′ are those productions of G in which only symbols in V′ ∪ T occur.
200 Context-Free Grammars
The CFG, G′, generated from G by the process above ensures that each symbol in it satisfies condition (b). To ensure condition (a) the following suffices:

1) V′ := { S };
2) For each symbol X ∈ V′ and each production X → w, add the variable symbols in w to V′ and the terminal symbols in w to T′;
3) Repeat (2) until no changes occur in V′ ∪ T′.

Procedure 6.2

Again, the productions that are included are those involving only symbols from the final set V′ ∪ T′.
Context-Free Grammars 201
Suppose G0 = ( V0, T0, P0, S0 ) is a CFG, applying Procedure 6.1 to G0 results in the CFG

G1 = ( V1, T1, P1, S1 )

and applying Procedure 6.2 to G1 produces the CFG

G2 = ( V2, T2, P2, S2 ).

What properties do G2 and G1 have w.r.t. G0?

It is certainly the case that, if L( G0 ) ≠ ∅:

a) S0 = S1 = S2 [the start symbol is the same in all 3 CFGs];
b) V2 ⊆ V1 ⊆ V0; T2 ⊆ T1 = T0;
c) ∀ X ∈ V0 : X ∈ V1 ⇔ ( ∃ w ∈ T0* : X ⇒*G0 w );
d) ∀ X ∈ V1 : X ∈ V2 ⇔ ( ∃ u, v ∈ ( V1 ∪ T1 )* s.t. S1 ⇒*G1 u⋅X⋅v ).

(a)-(d) are immediate from the definitions of Procedures 6.1 and 6.2.
202 Context-Free Grammars
Combining (c) and (d) we deduce that:

∀ X ∈ V0 : X ∈ V2 ⇔ X is productive

∀ σ ∈ T0 : σ ∈ T2 ⇔ ( ∃ u, v ∈ T0* : S0 ⇒*G0 u⋅σ⋅v )

i.e. G2 contains no redundant variables or ‘unused’ terminal symbols.

We can interpret the operations of Procedures 6.1 and 6.2 as follows:

Procedure 6.1 iterates ‘backwards’, starting from variables with productions yielding words containing only terminal symbols, so that V1 eventually contains all symbols in V0 with derivations in P0 leading to terminal words.

Procedure 6.2 iterates ‘forwards’ from S1 = S0, the start symbol, so that V2 eventually contains all variables in V1 that ‘can be reached’ from the start symbol. Similarly, T2 will contain all terminal symbols that can occur in the words of L( G0 ).
Context-Free Grammars 203
Example
Let G = ( V, T, P, S ) with V = { A, B, S }, T = { a }, and P = { S → AB, S → a, A → a }.

Applying Procedure 6.1 gives

V1 = { S, A }

P1 = { S → a, A → a }

Applying Procedure 6.2 leaves V2 = { S } and P2 = { S → a }.

Note that the order of application is important: applying Procedure 6.2 first to G would leave a CFG with the symbol A, which could not then be eliminated by 6.1.
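The two procedures can be sketched in Python (an illustrative implementation, not from the notes; grammars are represented as lists of (head, body) pairs) and re-run on exactly this example:

```python
def productive_symbols(T, P):
    """Procedure 6.1: collect variables that derive a terminal word,
    iterating until no new variable qualifies."""
    good = set()
    changed = True
    while changed:
        changed = False
        for X, body in P:
            if X not in good and all(s in T or s in good for s in body):
                good.add(X)
                changed = True
    return good

def reachable_symbols(S, P):
    """Procedure 6.2: collect symbols reachable from the start symbol."""
    reach = {S}
    changed = True
    while changed:
        changed = False
        for X, body in P:
            if X in reach:
                for s in body:
                    if s not in reach:
                        reach.add(s)
                        changed = True
    return reach

# The example grammar: V = {A, B, S}, T = {a}, P = {S->AB, S->a, A->a}.
T, S = {"a"}, "S"
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("a",))]
good = productive_symbols(T, P)
P1 = [(X, b) for X, b in P if X in good and all(s in T or s in good for s in b)]
reach = reachable_symbols(S, P1)
P2 = [(X, b) for X, b in P1 if X in reach and all(s in reach for s in b)]
```

Running 6.1 first drops S → AB (B is unproductive); 6.2 then removes the now-unreachable A, reproducing the slide's result.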
204 Context-Free Grammars
Nullable Symbols,ε -Productionsand Unit Productions
We now turn to two further simplifications that can effectively be carried out on CFGs.

Consider the following cases for G = ( V, T, P, S ):

i) ε ∉ L( G );
ii) ∃ X, Y ∈ V s.t. X → Y ∈ P.

Intuitively, one would expect that:

In the former case, since ¬( S ⇒*G ε ), any symbol X ∈ V for which X ⇒*G ε is ‘unnecessary’.

In the latter case, the production X → Y ought to be eliminable by ‘substituting’ ‘appropriate’ words over ( V ∪ T ) for occurrences of X in P.
These intuitions are justified.
Context-Free Grammars 205
A production of the form X → ε is called an ε-production.

A production of the form X → Y (where Y ∈ V) is called a unit production.

Theorem 12: Let G = ( V, T, P, S ) be a CFG. The language L( G ) − { ε } is generated by some CFG, G′ = ( V′, T′, P′, S′ ), without ε-productions or redundant symbols.

Proof: Given G, we say that X ∈ V is nullable if X ⇒*G ε.

First we find all nullable symbols in G:

1) N0 := { X ∈ V : X → ε ∈ P };
2) N1 := N0 ∪ { X ∈ V : X → Y1⋅…⋅Yk ∈ P and ∀ i : Yi ∈ N0 };
3) if N1 ≠ N0 then N0 := N1 and go to (2).

Correctness of this method is obvious.
206 Context-Free Grammars
Next the productions P of G are modified so that no derivation X ⇒*G′ ε is possible in G′, for any X ∈ V′.

Suppose that

X → Y1⋅Y2⋅…⋅Yk

is a production in P.

In P′ this is replaced by a set of productions

PX = { X → z1⋅z2⋅…⋅zk }

using:

1) If Yi ∉ N0 (i.e. Yi is not nullable), zi = Yi.
2) If Yi ∈ N0, add both rules (with zi = Yi and with zi omitted) to PX.
3) If a rule X → ε results (i.e. all Yi are nullable) it is not added to PX.
Context-Free Grammars 207
Example
If X → A⋅B⋅C⋅A ∈ P, with A and C both nullable, then P′ would contain all of the following:

PX = { X → ABCA, X → ABA,
       X → ABC,  X → AB,
       X → BCA,  X → BA,
       X → BC,   X → B }

In general, a production X → Y1⋅Y2⋅…⋅Yk in which r of the Yi symbols were nullable creates 2^r productions in P′ (2^r − 1 if r = k), one for each (non-empty, when r = k) subset of nullable occurrences.
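The expansion step can be sketched in a few lines of Python (illustrative only) and checked against exactly this example:

```python
from itertools import product

def expand(body, nullable):
    """For X -> Y1...Yk, produce every right-hand side obtained by keeping
    or dropping each nullable occurrence; an empty result (X -> eps) is
    discarded, as in the construction above."""
    # each nullable occurrence offers two options: keep it or drop it
    options = [((s,), ()) if s in nullable else ((s,),) for s in body]
    out = set()
    for choice in product(*options):
        rhs = tuple(s for part in choice for s in part)
        if rhs:                      # skip the would-be X -> eps rule
            out.add(rhs)
    return out

# The example: X -> A.B.C.A with A and C nullable (three nullable
# occurrences, so 2^3 = 8 productions, none empty since B remains).
PX = expand(("A", "B", "C", "A"), {"A", "C"})
```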
To complete the proof of the Theorem it suffices to apply Procedures 6.1 and 6.2 to the (ε-production free) CFG, noting that neither process introduces new variable symbols or productions.
208 Context-Free Grammars
Unit Production Elimination
Theorem 13: Let G = ( V, T, P, S ) be a CFG. The language L( G ) − { ε } is generated by some CFG, G′ = ( V′, T′, P′, S′ ), without ε-productions, unit productions or redundant symbols.

Proof: From Theorem 12 we may assume that G contains no ε-productions. We build P′, the unit production free set, as follows.

P′ is initially set to contain

{ X → w ∈ P : w ∉ V }

i.e. all non-unit productions in P.

Suppose ∃ X, Y ∈ V s.t. X ⇒*G Y.

[Note: this is easily tested for: recall G has no ε-productions, so if X ⇒*G Y, then some derivation must have the form

X ⇒G Y1 ⇒G Y2 ⇒G … ⇒G Yk ⇒G Y

where the Yi are all different symbols in V.]
Context-Free Grammars 209
If X ⇒*G Y, then P′ has added to it all productions

{ X → w : Y → w ∈ P and w ∉ V }

Let G′ = ( V, T, P′, S ) be the resulting CFG.

Certainly if X → w ∈ P′ then X ⇒*G w, i.e. if S ⇒*G′ w then S ⇒*G w.

The converse, that if S ⇒*G w then S ⇒*G′ w, is established by considering a left-most derivation of w in G, i.e. one in which the left-most variable is expanded at each step. The, somewhat tedious, argument which shows that any sequence of unit productions in the former has a corresponding single non-unit production in the latter is omitted.

Again, to complete the proof, it suffices to note that removal of redundant symbols via Procedures 6.1 and 6.2 cannot create any unit productions.
210 Context-Free Grammars
Normal Forms forContext-Free Grammars
While the mechanisms for simplifying CFGs described above are of independent interest in terms of removing some ‘inefficiencies’, the principal reasons for reviewing these are:

a) It can be assumed, when considering any CFG, G, for which ε ∉ L(G), that G contains no redundant symbols, ε-productions, or unit productions.

b) To assist in proving that any CFG can be expressed in a Normal Form.

The concept of a Normal Form, i.e. that a given structure can be described using rules that obey precisely defined restrictions, is fundamental in Computer Science.
Context-Free Grammars 211
In addition to the mechanism that we are about to describe, you may already have met the idea of Normal Forms in:

Defining Boolean logic functions of n arguments.

[Any such function can be expressed as a
‘disjunction of elementary conjuncts’ (sum of products),
‘conjunction of elementary disjuncts’ (product of sums), or
‘modulo-2 sum of products’ (Zhegalkin/Reed-Muller ring-sum expansion);
these are 3 normal forms for Boolean functions.]

In addition, there is an extensively developed theory of Normal Forms in the context of
Relational Database Design
that is of importance in identifying potentialsavings and improvements in the organisa-tion of data within such a system.
212 Context-Free Grammars
We will principally be concerned with the representation of CFGs in

Chomsky Normal Form (CNF)

but, for completeness, will mention (and no more than mention) the other important Normal Form for CFGs, known as

Greibach Normal Form

Theorem 14 (Chomsky Normal Form Theorem for CFGs):

Any context-free language, L, for which ε ∉ L, is generated by a CFG,

G = ( V, T, P, S )

in which all productions are of the form

X → Y⋅Z or X → σ

where X, Y, Z ∈ V and σ ∈ T.

[Note: X, Y, Z in X → Y⋅Z are not required to be distinct variables of V.]
Context-Free Grammars 213
Before proving this, it may be useful tohighlight some significant consequences ofthe Theorem:
Recall the opening discussion re. CFG simplification wherein the apparent 'leap' from
V → σ ⋅ W ; V → σ (RLGs)
to
X → w ∈ (V ∪ T)* (CFGs)
was remarked upon.
Using CNF to describe any CFG, it is seen that the former can be replaced by,
V → U ⋅ W or V → σ
i.e. the only change is to allow a single variable to be used instead of a single terminal in the first production rule form.
A further important point concerning CNF is with respect to the form of Derivation Trees in a G which is in CNF.
214 Context-Free Grammars
Each production in G either replaces a variable by a single terminal or expands it as exactly two variable symbols.
It follows from this that any derivation tree is a binary tree: each internal vertex has either exactly 1 child (which will be a terminal symbol) or exactly 2 children (both of which will be variables).
This fact means that the number of steps, k, in a derivation in G implies an upper bound (in terms of k) on the length of a word thereby derived.
We thus have a mechanism for relating the number of variables in G, the number of steps in a derivation, and the length of words in L(G).
Similar observations earlier resulted in a means for proving that particular languages are not regular (the Pumping Lemma).
Context-Free Grammars 215
Proof of Theorem 14
Let G = (V, T, P, S) be a CFG with ε ∉ L(G). Without loss of generality it may be assumed that G has no ε-productions, unit productions, or redundant symbols.
Consider any production in P which violates the conditions of CNF:
X → Y1 ⋅ Y2 ⋅ … ⋅ Ym   m ≥ 2   (6.1)
Suppose Yi = σ ∈ T, i.e. a terminal symbol.
Modify G by adding a new variable Cσ to V, the production Cσ → σ to P, and changing the (terminal) Yi in the production (6.1) to the new (non-terminal) Cσ.
If G' = (V', T, P', S) is the resulting CFG, it is obvious that L(G') = L(G).
216 Context-Free Grammars
Applying the process of replacing terminals (in productions such as (6.1)) by (new) non-terminal symbols and adding appropriate production rules, it follows that G = (V, T, P, S) eventually becomes a CFG G' = (V', T, P', S) for which L(G) = L(G') and any productions that are not in CNF have the form,
X → Y1 ⋅ Y2 ⋅ … ⋅ Yn   n ≥ 3   (6.2)
For each production of the form (6.2) introduce n − 2 new variables
D1, D2, …, Dn−2
and replace the production
X → Y1 ⋅ Y2 ⋅ … ⋅ Yn
with the 'chain' of productions,
Context-Free Grammars 217
X → Y1 D1 ; D1 → Y2 D2 ; D2 → Y3 D3 ; D3 → Y4 D4 ; … ; Di → Yi+1 Di+1 ; … ; Dn−2 → Yn−1 Yn   (6.3)
Let GC = (VC, T, PC, S) be the final CFG resulting. For the replacement of (6.2) by the set in (6.3), clearly,
X ⇒*GC Y1 ⋅ Y2 ⋅ … ⋅ Yn
thus L(GC) = L(G') = L(G) and GC is in CNF.
218 Context-Free Grammars
Example
Using G = (V, T, P, S) with
V = {S, A, B}
T = {a, b}
P = { S→bA, S→aB,
A→bAA, A→aS, A→a,
B→aBB, B→bS, B→b }
Only the productions A → a and B → b are valid CNF.
As a first step, we remove illegal occurrences of terminal symbols, to give
V' = {S, A, B, Ca, Cb}
P' = { S→CbA, S→CaB,
A→CbAA, A→CaS, A→a,
B→CaBB, B→CbS, B→b,
Ca→a, Cb→b }
Context-Free Grammars 219
Then we deal with the productions
A → Cb AA ; B → CaBB
to give
VC = {S, A, B, Ca, Cb, D1, D2}
PC = { S→CbA, S→CaB,
A→CbD1, A→CaS, A→a,
B→CaD2, B→CbS, B→b,
Ca→a, Cb→b,
D1→AA, D2→BB }
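The two steps just illustrated - replacing terminals inside long right-hand sides by fresh variables Cσ, then splitting long bodies with chain variables Di - can be sketched in code. The following Python sketch is not part of the notes: it assumes terminals are single lower-case letters, variables are strings beginning with an upper-case letter, productions are (head, body-tuple) pairs, and the grammar already has no ε-productions, unit productions, or redundant symbols.

```python
# Sketch of the two CNF steps above (not the notes' exact procedure).
# Assumes the fresh names 'C'+terminal and 'D'+number do not clash with
# existing variables.

def to_cnf(productions):
    # Step 1: in any body of length >= 2, replace each terminal y
    # by a fresh variable Cy with the production Cy -> y.
    step1 = set()
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for sym in body:
                if sym.islower():            # a terminal
                    step1.add(('C' + sym, (sym,)))
                    new_body.append('C' + sym)
                else:
                    new_body.append(sym)
            step1.add((head, tuple(new_body)))
        else:
            step1.add((head, body))
    # Step 2: split X -> Y1 ... Yn (n >= 3) into the chain
    # X -> Y1 D1, D1 -> Y2 D2, ..., Dn-2 -> Yn-1 Yn.
    result, counter = set(), 0
    for head, body in step1:
        while len(body) >= 3:
            counter += 1
            d = 'D' + str(counter)           # assumed fresh
            result.add((head, (body[0], d)))
            head, body = d, body[1:]
        result.add((head, body))
    return result
```

Applied to the example grammar above, every resulting production is either a single terminal or a pair of variables (the particular D-numbering may differ from the notes).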
220 Context-Free Grammars
Greibach Normal Form
There is another Normal Form for CFGs which is of some theoretical interest, but which we describe only for the sake of completeness.
Theorem 15:(Greibach Normal Form Theorem for CFGs)
Let L be a CFL with ε ∉ L. There is a CFG, G = (V, T, P, S), with L(G) = L and every production in P having the form
X → σ w   σ ∈ T, w ∈ V*
[Note: w ∈ V* means w can be ε.]
Proof: Omitted.
Algorithms for converting CFGs (even those in CNF) to GNF are rather involved, and the simplest of these may exponentially increase the size of V.
This increase is avoidable using a moresophisticated algorithm.
Context-Free Grammars 221
COMP209
Automata and Formal Languages
Section 7
Pushdown Automata
222 Pushdown Automata
Introduction
We have seen that,
Regular Languages
≡ Languages recognised by DFA
≡ Languages described by Regular Expressions
≡ Languages generated by RLGs
(7.1)
By 'minimally' changing one of the restrictions imposed on the form of productions in RLGs, i.e. permitting
V → U ⋅ W (U, W variables)
instead of
V → σ ⋅ W (σ a terminal)
a class of languages (the context-free languages) that properly contains the regular languages is obtained.
Pushdown Automata 223
So the ‘picture’ given in (7.1) has become
Regular Languages
⊂ Context-Free Languages
≡ ????
≡ Languages generated by CFGs
⊃ Languages generated by RLGs
(7.2)
We now wish to consider the minimal 'black-box' capabilities that are needed to capture the class of Context-Free Languages, i.e. to answer the question,
Regular Languages are to DFA
as
Context-Free Languages are to ????
224 Pushdown Automata
If we consider the definition of DFA given, one 'obvious' limitation of these is apparent:
A DFA can only 'remember' information about a fixed, constant number of symbols from any input word it is given.
Thus, if acceptance or otherwise of w ∈ Σ* is predicated on precise relationships between sub-words of w that may be of arbitrary length and separated by an arbitrary distance, then, except in special cases, a DFA will be unable to deal with these.
c.f. the derivation of the Pumping Lemmafor Regular Languages
e.g. an informal argument that the language {0^m ⋅ 1^m : m ≥ 1} is not regular observes that a DFA has to recognise 'how many 0s are seen' before testing if the number of 1s 'matches': m can be arbitrarily large, so the 'counting step' cannot be done with a 'finite memory'.
Pushdown Automata 225
A more subtle limitation, but one which alsoarises from the ‘finite memory’ restriction isthe following:
The 'processing' of an input word w ∈ Σ* is rather 'passive': symbols are read in order and used to decide the next machine state; there is no mechanism for rescanning symbols, or recording these or some 'transformed' version.
These observations suggest that, in order to enrich the functionality of DFA so that a 'machine model' with the minimal capability to recognise Context-Free Languages results, the 'new' machine class must have some method of
recording arbitrarily large amounts of information.
Of course, this capability must be limited to that necessary to move from regular languages to CFLs - i.e. it must not allow non-CFLs to be recognised.
226 Pushdown Automata
Adding a Stack
The extension made toDFA, is to allow amemory (storage) facility that is used underthe following restrictions:
a) This storage is organised as a stack.
b) There is no limit on its capacity, although (obviously) only a finite amount of space will be used during a single computation.
c) The input word will still be processed one symbol at a time moving from left to right, and cannot be re-read.
Pushdown Automata 227
Overview of Pushdown Automaton Organisation
[Diagram: an input tape x1 x2 x3 … xk … xn divided into 'Input Read so Far' and 'Input to Read'; a finite control M; and a Stack Store containing # µ1 … µt]
228 Pushdown Automata
Example of M's Structure
[Diagram: states q0, q1, q2 (q2 final); self-loops on q0 labelled (0, #, #0) and (0, 0, 00); an edge q0 → q1 labelled (1, 0, ε); a self-loop on q1 labelled (1, 0, ε); an edge q1 → q2 labelled (ε, #, #)]
Pushdown Automaton (PDA) Example M
A directed edge from qi to qj labelled (σ, γ, u) indicates that in state qi, when scanning σ with the symbol at the stack top being γ, the next state could be qj with the stack top replaced by the word u.
Pushdown Automata 229
Formal Definition of Pushdown Automaton
A pushdown automaton (PDA) is described by a septuple,
M = (Q, Σ, Γ, δ, q0, Z0, F)
where
Q: finite set of states.
Σ: finite input alphabet.
Γ: finite stack alphabet.
δ: Q × (Σ ∪ {ε}) × Γ → ℘(Q × Γ*): state transition function.
q0 ∈ Q: initial state.
Z0 ∈ Γ: initial stack symbol.
F ⊆ Q: final states.
It should be noted that δ(q, σ, γ) must be a finite subset of Q × Γ*.
230 Pushdown Automata
Interpretation
Consider the definition of δ:
δ: Q × (Σ ∪ {ε}) × Γ → ℘(Q × Γ*)
Suppose M is in state q, the symbol being scanned on the input is σ, and the symbol at the top of the stack storage is γ.
δ prescribes as the outcome of this scenario:
δ(q, σ, γ) = { (qi1, u1), …, (qik, uk) }
where the qij are states in Q and the uj words in Γ*.
A non-deterministic choice of one of the pairs (qij, uj) is made and then:
P1) γ is replaced by the word uj.
P2) The state changes to qij.
Pushdown Automata 231
ε-transitions
The interpretation above deals with the casewhere an input symbolσ is actually ‘pro-cessed’.
The transition function, however, allows ε-transitions:
δ(q, ε, γ) = { (qi1, u1), …, (qik, uk) }
As with ε-NDFA, an ε-transition can be chosen (non-deterministically), and the process P1 is carried out. The important distinction in this case is that
no input symbol is read
so if the next input is σ and an ε-transition is performed, the next symbol to read is still σ.
232 Pushdown Automata
Important Features
PDA as defined here are non-deterministic. There are important technical reasons why this form is used.
As with NDFA, w ∈ Σ* is accepted by a PDA, M, if there is at least one computation of M on w which ends in some state q ∈ F after scanning all of w.
In order for a transition in δ(q, σ, γ) to be applicable both of the following must hold:
σ is the next input symbol (unless σ = ε);
γ is the 'top of stack' symbol.
The stack is 'empty' at the start of a computation, i.e. contains only the symbol Z0.
Γ (the stack alphabet) does not have to be the same as Σ (the input alphabet).
Pushdown Automata 233
Example
For the example PDA, M = (Q, Σ, Γ, δ, q0, #, F) has
Q = {q0, q1, q2} ; Σ = {0, 1}
Γ = {0, #} ; F = {q2}
δ is easily extracted from the diagram, and it should be noted that there is a single ε-transition available: from state q1 when the top of stack symbol is the initial symbol #.
Given 000111 as input:

Step  qi  σ  Read    Rest    Stack  qj
1     q0  0  ε       000111  #      q0
2     q0  0  0       00111   #0     q0
3     q0  0  00      0111    #00    q0
4     q0  1  000     111     #000   q1
5     q1  1  0001    11      #00    q1
6     q1  1  00011   1       #0     q1
7     q1  ε  000111  ε       #      q2

So 000111 ∈ L(M).
[Exercise: What, in fact, is L(M)?]
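One way to experiment with the example (and with the exercise) is to simulate M directly. The sketch below is ours, not part of the notes: it performs a breadth-first search over triples (state, unread input, stack), keeping the stack as a string whose rightmost symbol is the top, with a step limit guarding against unproductive ε-loops.

```python
from collections import deque

# Our encoding: delta maps (state, input symbol or '' for epsilon,
# stack top) to a set of (next state, pushed word) pairs; the pushed
# word replaces the stack top, rightmost character = top of stack.

def accepts(delta, start, stack0, finals, w, limit=100000):
    queue, seen = deque([(start, w, stack0)]), set()
    while queue and limit > 0:
        limit -= 1
        q, rest, stack = queue.popleft()
        if not rest and q in finals:
            return True                     # all input read, final state
        if (q, rest, stack) in seen or not stack:
            continue
        seen.add((q, rest, stack))
        top = stack[-1]
        for q2, push in delta.get((q, '', top), ()):      # epsilon-moves
            queue.append((q2, rest, stack[:-1] + push))
        if rest:                                          # ordinary moves
            for q2, push in delta.get((q, rest[0], top), ()):
                queue.append((q2, rest[1:], stack[:-1] + push))
    return False

# The example PDA M from the diagram:
delta = {
    ('q0', '0', '#'): {('q0', '#0')},   # (0, #, #0)
    ('q0', '0', '0'): {('q0', '00')},   # (0, 0, 00)
    ('q0', '1', '0'): {('q1', '')},     # (1, 0, eps)
    ('q1', '1', '0'): {('q1', '')},     # (1, 0, eps)
    ('q1', '',  '#'): {('q2', '#')},    # (eps, #, #)
}
```

For instance, `accepts(delta, 'q0', '#', {'q2'}, '000111')` reproduces the accepting computation traced above.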
234 Pushdown Automata
Discussion
Comparing the capabilities of PDA with ε-NDFA (≡ DFA), it is seen that transitions in the latter depend only on:
the current state (q);
the current input symbol (σ);
whether an ε-transition is available.
Furthermore a transition has no 'side-effects': the 'next' state is chosen and the 'next' input symbol read.
For PDA, transitions also depend on
the current stack top symbol (γ)
and, as well as a 'next' state being chosen and the next input symbol being scanned, the symbol at the stack top is replaced by a 'new' word.
Question: Are these justifiable as 'minimal' extensions to the functionality of DFA?
Pushdown Automata 235
(Partial) Answer(s)
Of course, one justification of the definitionof PDA is that which we will prove later, i.e.
Theorem 16: L ⊆ Σ* is a Context-Free Lan-guage if and only if there is aPDA, M , forwhich L( M ) = L.
Despite this result, however, some features of PDA (as defined) may appear 'non-minimal': e.g.
'Minimal' Extensions to DFA?
a) 'non-determinism' in the definition of δ.
b) 'infinite' storage capacity of the Stack.
c) allowing arbitrary length words (albeit specified in δ) to replace single symbols on the stack.
236 Pushdown Automata
The presence of ‘non-determinism’ in thebasic definition has already been remarkedupon. That this is required will be shownformally in a later lecture.
As regards (b) - 'infinite' stack capacity - it was noted that in any 'effective' computation of a PDA, M, only a finite portion of this will be used: of course, by using ε-transitions appropriately it is a trivial exercise to design a (non-terminating) PDA that increases the size of the stack on every move.
Equally, however, one could design an ε-NDFA which (in principle) could loop indefinitely.
Given that we are concerned with recognising languages of finite length words, it only matters that we have a model that can accept such in a finite number of moves.
Note also,
Pushdown Automata 237
Difficult(ish) Exercise:
A PDA with f(|Q|)-bounded stack is one in which the
stack size
is limited to f(|Q|) for some function f: N → N, e.g.
|Q|², 2^|Q|, etc.
Attempting to exceed this causes an error (cf. having no available move in an NDFA).
Show that the class of languages recognisedby
PDAwith f (|Q|)-bounded stack,is exactly
the class of regular languages.
[N.B. The stack bound is given as a function of |Q| - the number of states in M - and not as a function of the length of the input word being scanned. The latter 'restriction' (it isn't!) makes no difference to PDA capabilities.]
238 Pushdown Automata
The case of (c) - allowing arbitrary finite length words to be placed on the Stack - is rather more complicated.
We state, without proof, the following result, which establishes that our model is, in fact, equivalent to the model in which the size of the stack changes by ±1 on each move:
For any PDA, M = (Q, Σ, Γ, δ, q0, Z0, F),
there is a PDA
M' = (Q', Σ, Γ', δ', q'0, Z'0, F')
such that L(M) = L(M') and, for any (qj, u) ∈ δ'(qi, α, γ) (α ∈ Σ ∪ {ε}),
u ∈ {ε, γ, γ⋅β}   (β ∈ Γ')
thus the Stack size decreases by one (u = ε),
or increases by one (u = γ⋅β),
or remains unchanged (u = γ).
Pushdown Automata 239
Instantaneous Descriptions
Given M = (Q, Σ, Γ, δ, q0, Z0, F) and w ∈ Σ*,
we need a way to describe:
the current state of M;
the content of the Stack;
how much of w remains to be scanned.
An instantaneous description (ID) of M on w is a triple
I = (q, w, u)   q ∈ Q, w ∈ Σ*, u ∈ Γ*.
q represents the current state of M, w the unscanned input remaining, and u the current Stack content.
For I = (qi, σ⋅w, u⋅γ) and J = (qj, w, u⋅v), two IDs of M, we write
I ⊢M J
if (qj, v) ∈ δ(qi, σ, γ), and
I ⊢*M J
if there is a sequence I = I0, I1, …, Ik = J of IDs such that Im ⊢M Im+1 for all 0 ≤ m < k.
240 Pushdown Automata
PDA Accepting by ‘Empty Stack’
It is possible to consider an alternative con-cept of a PDA accepting an input wordw ∈Σ* .
Let I = (q0, w, Z0) be the initial ID of M.
Definition: The language L(M) recognised by the PDA,
M = (Q, Σ, Γ, δ, q0, Z0, F)
using empty stack is
{ w ∈ Σ* : ∃ q ∈ Q, I ⊢*M (q, ε, Z0) },
i.e. the set of inputs for which there is some sequence of moves in M which reads all of w and leaves the stack in its initial ('empty') condition.
Acceptance in this manner is clearly independent of which state q is reached, so, without any loss, for acceptance by empty stack it may be assumed that F = ∅.
Pushdown Automata 241
Theorem 17: For any PDA, M1, with L(M1) defined 'by final state' as
{ w : ∃ J = (q, ε, u) s.t. I ⊢*M1 J and q ∈ F }
there is a PDA, M2, with L(M2) defined by empty stack and
L(M1) = L(M2).
Proof: (Outline)
Form the state set of M2 by adding a new state qerase to Q of M1. Then for each state q ∈ F in M1 (i.e. each final state), add ε-transitions
δ(q, ε, γ) = { (qerase, γ) }   ∀ γ ∈ Γ
and
δ(qerase, ε, γ) =
{ (qerase, ε) } if γ ≠ Z0
{ (qerase, Z0) } if γ = Z0
i.e. on reaching a final state of M1, M2 can enter qerase, which simply uses ε-moves to empty the Stack. Note, the definition of 'acceptance' requires that all of the input is read: if the stack is empty while part of w remains, this is not an accepting computation.
242 Pushdown Automata
Proof of Theorem 16
L ⊆ Σ* is a Context-Free Language if and only if there is a PDA, M, for which L(M) = L. Only the construction of a PDA accepting L(G) defined by a CFG, G, is presented. The formation of a CFG from a PDA, M, can be done by a technically opaque 'simulation' of M's operation by appropriate production rules.
I) CFGs to PDA
Let L be a CFL and G = (V, T, P, S) be a CFG with L(G) = L − {ε}. We may assume that G is in CNF with no redundant symbols.
We first construct a PDA,
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
(accepting by final state) for which
L(MG) = L(G) = L − {ε}
Pushdown Automata 243
Outline of Construction
Suppose w = x1 … xn ∈ T* is a word we wish to test for membership in L(G) using the PDA MG.
MG as built from G relies significantly (forits correctness) on non-determinism.
The key idea is to use the Stack to build up a 'guess' for a possible left-most derivation S ⇒*G w, i.e. one in which, at each step, the left-most variable symbol is the one expanded, until it reduces to a single terminal.
Of course, this ‘guess’ will be consistentwith the productions ofG.
As each 'new' terminal appears in the guessed derivation it is compared with the next input xi from w. Should these match, the process continues; otherwise MG will halt in a non-accepting state.
244 Pushdown Automata
If it is the case that w ∈ L(G) then certainly there will be some (left-most) derivation S ⇒*G w, and thus in MG there will be some computation on w that reaches a final state with all of w having been read.
There is only one minor complication that arises: ensuring, when a production X → σ is activated, that the symbol σ, which will be compared with the next input symbol, is at the top of the stack.
From G = (V, T, P, S) we define
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
with
ΣG = T;
ΓG = V ∪ T ∪ {#};
QG = {q0, qOK} ∪ {qσ : σ ∈ T};
FG = {qOK}
Pushdown Automata 245
We can view δG as comprising three stages:
initiating the guessed derivation;
expanding the guess until a terminal is reached;
checking this against the next input.
With the exception of the final stage, all parts are performed using ε-transitions.
Initiation:
δG(q0, ε, #) = { (qOK, #S) }
Thus the start symbol of G is written on the Stack.
Derivation
Recall thatG is in CNF.
For each production X → U ⋅ W in G, δG contains a transition,
(qOK, W U) ∈ δG(qOK, ε, X)
i.e. if the top of stack symbol is X ∈ V ⊂ ΓG then MG may (non-deterministically) choose to replace X by the word W U (U, W ∈ V).
246 Pushdown Automata
The reason for reversing the order (from U W to W U) is that U will be the new top of stack symbol and so can be expanded at the next move of MG.
Thus, the process is consistent with aleft-mostderivation in G.
For each production X → σ of G, δG contains a transition,
(qσ, σ) ∈ δG(qOK, ε, X).
Checking
This is only performed in the states qσ, for σ ∈ T, and forms the only stage where the input w is examined. The checking move is simply,
δG(qσ, σ, σ) = { (qOK, ε) }
Thus, compare the top of stack symbol to the current input symbol; if these match, erase it from the stack, move to the next input symbol, and continue with the process of guessing a derivation.
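The three stages of δG can be generated mechanically from a CNF grammar. The Python sketch below follows the construction just described, but the encoding is ours: δ is a dictionary from (state, input symbol or '' for ε, stack top) to a set of (next state, pushed tuple) pairs, with pushed tuples whose last component becomes the new stack top.

```python
# Sketch of the construction of delta_G (assumed encoding, not the
# notes' notation); productions are (head, body-tuple) pairs in CNF.

def cfg_to_pda_delta(productions, start='S', bottom='#'):
    delta = {}
    def add(key, move):
        delta.setdefault(key, set()).add(move)
    # Initiation: write the start symbol on the stack.
    add(('q0', '', bottom), ('qOK', (bottom, start)))
    for head, body in productions:
        if len(body) == 2:                 # X -> U.W (two variables)
            U, W = body
            # push reversed, so U becomes the new top of stack
            add(('qOK', '', head), ('qOK', (W, U)))
        else:                              # X -> sigma (one terminal)
            (sigma,) = body
            add(('qOK', '', head), ('q' + sigma, (sigma,)))
            # Checking: match the stack top with the input, then pop
            add(('q' + sigma, sigma, sigma), ('qOK', ()))
    return delta
```

On the CNF grammar of the example that follows, this produces, for instance, (qOK, (A, Cb)) ∈ δ(qOK, ε, S) - the reversed push used in the accepting computation traced there.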
Pushdown Automata 247
Example
Using the CFG for which we derived a CNF form earlier,
G = (V, T, P, S) with
V = {S, A, B, Ca, Cb, D1, D2}
T = {a, b}
P = { S→CbA, S→CaB,
A→CbD1, A→CaS, A→a,
B→CaD2, B→CbS, B→b,
Ca→a, Cb→b,
D1→AA, D2→BB }
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
where
ΣG = {a, b};
ΓG = {a, b, S, A, B, Ca, Cb, D1, D2, #};
QG = {q0, qOK, qa, qb};
FG = {qOK}
δG is shown in the diagram below:
248 Pushdown Automata
[Diagram: states q0, qOK, qa, qb; an edge q0 → qOK labelled (ε, #, #S); checking edges qa → qOK labelled (a, a, ε) and qb → qOK labelled (b, b, ε); edges qOK → qa labelled (ε, Ca, a) and (ε, A, a); edges qOK → qb labelled (ε, Cb, b) and (ε, B, b); self-loops on qOK labelled (ε, S, BCa), (ε, S, ACb), (ε, A, SCa), …]
baab ∈ L(G); a leftmost derivation is,
S ⇒ CbA ⇒ bA ⇒ bCaS ⇒ baS ⇒ baCaB ⇒ baaB ⇒ baab
Pushdown Automata 249
Example Continued
Given baab as input to MG, an accepting computation could be constructed by the sequence,

Move  δ                                 Stack  Unread
0     δ(q0, ε, #) = (qok, #S)           #      baab
1     δ(qok, ε, S) = (qok, ACb)         #S     baab
2     δ(qok, ε, Cb) = (qok, b)          #ACb   baab
3     δ(qok, ε, b) = (qb, b)            #Ab    baab
4     δ(qb, b, b) = (qok, ε)            #Ab    baab
5     δ(qok, ε, A) = (qok, SCa)         #A     aab
6     δ(qok, ε, Ca) = (qok, a)          #SCa   aab
7     δ(qok, ε, a) = (qa, a)            #Sa    aab
8     δ(qa, a, a) = (qok, ε)            #Sa    aab
9     δ(qok, ε, S) = (qok, BCa)         #S     ab
10    δ(qok, ε, Ca) = (qok, a)          #BCa   ab
11    δ(qok, ε, a) = (qa, a)            #Ba    ab
12    δ(qa, a, a) = (qok, ε)            #Ba    ab
13    δ(qok, ε, B) = (qok, b)           #B     b
14    δ(qok, ε, b) = (qb, b)            #b     b
15    δ(qb, b, b) = (qok, ε)            #b     b
16    -                                 #      -
250 Pushdown Automata
The correctness of the construction is immediate. The only point of detail required concerns the case that ε ∈ L.
[Recall that the CFG, G, is in CNF and generates the language L − {ε}.]
In this case, all that is required is to add the initial state, q0, to the set of final states of MG.
This completes the proof that ifL is aCFL,then there is a PDA, M , for whichL( M ) = L.
II) PDA to CFGs.Omitted.
Pushdown Automata 251
COMP209
Automata and Formal Languages
Section 8
Properties ofContext-Free Languages:
Limitations, Closure, Decision
252 Properties of CFLs
When considering the class of Regular Lan-guages earlier, it was seen that:
a) there is a general technique -
The Pumping Lemma
- by which many non-regular languages can be proved not to be regular.
b) the class of regular languages satisfies anumber ofclosure properties, being closedunder:
ComplementationIntersection
Union
c) Given descriptions of regular languages, L1, L2, there are 'effective' (and 'efficient') methods that can determine if,
L1 = ∅, finite or infinite
L1 = L2
L1 ⊂ L2
w ∈ L1 (for w ∈ Σ*)
Properties of CFLs 253
In this section analogous questions regard-ing the class ofContext-Free Languagesare examined.
Thus,
a) How can a given language,L, be shownnot to be Context-Free?
b) What closure properties hold for the classof CFLs?
c) Given descriptions of CFLs, L1 and L2, do 'effective' methods exist for deciding
L1 = ∅, finite or infinite?
L1 = L2?
L1 ⊂ L2?
w ∈ L1 (for w ∈ Σ*)?
254 Properties of CFLs
A Pumping Lemma forContext-Free Languages
We know already that:
P1) If L is a CFL over Σ, there is a CFG,
G = (V, Σ, P, S)
with L(G) = L − {ε}, all productions in P taking the form
X → Y ⋅ Z or X → σ (Y, Z ∈ V, σ ∈ Σ)
and G having no redundant symbols.
[Chomsky Normal Form Theorem].
P2) w ∈ L(G) if and only if there is a derivation tree in G that has yield w [Theorem 11].
Suppose that L is a CFL over Σ such that infinitely many w ∈ Σ* belong to L.
What do (P1) and (P2) indicate aboutL?
Properties of CFLs 255
Given G = (V, Σ, P, S) with L(G) = L and G in CNF, consider a derivation tree in G for which the longest path (from S to a terminal leaf) contains exactly n non-leaf vertices.
Certainly each of these must be labelled with a variable from V.
It follows that if n > |V| then some variable, W say, must occur more than once on this path.
256 Properties of CFLs
[Diagram: a derivation tree with root S in which some path contains two occurrences of a variable W; the yield factors as u ⋅ v ⋅ w ⋅ x ⋅ y, the lower W yielding w and the upper W yielding v ⋅ w ⋅ x]
From this example, we see that,
S ⇒*G u ⋅ W ⋅ y,   u, y ∈ Σ*, W ∈ V
W ⇒*G v ⋅ W ⋅ x,   v, x ∈ Σ*
W ⇒*G v ⋅ w ⋅ x,   w ∈ Σ*
S ⇒*G u ⋅ v ⋅ w ⋅ x ⋅ y
Properties of CFLs 257
[Diagram: the same tree with the subtree rooted at the lower W replaced by a copy of the subtree rooted at the upper W, yielding u ⋅ v ⋅ v ⋅ w ⋅ x ⋅ x ⋅ y]
258 Properties of CFLs
From this, however, since
S ⇒*G u W y ⇒*G u v W x y
it must be the case that, for any w ∈ Σ* such that W ⇒*G w, we have
S ⇒*G u w y
S ⇒*G u v w x y
S ⇒*G u v² w x² y
…
S ⇒*G u v^k w x^k y
…
In summary: if W ∈ V is repeated in some derivation tree of u v w x y (where W ⇒*G v W x and W ⇒*G w) then all words
u v^k w x^k y   (k ≥ 0)
are in L(G).
Properties of CFLs 259
Pumping Lemma forContext-Free Languages
For any CFL, L, there is a constant, m, such that if z ∈ L and |z| ≥ m, then z may be written as
z = u ⋅ v ⋅ w ⋅ x ⋅ y
where
1) |v x| ≥ 1
2) |v w x| ≤ m
3) ∀ k ≥ 0: u v^k w x^k y ∈ L
Proof: Consider a CFG in CNF, G = (V, T, P, S), for which L(G) = L. If we consider a derivation tree in G in which the longest path from S contains at most n non-leaf vertices, then, since G is in CNF, such a derivation tree is a binary tree and therefore has at most 2^n leaf vertices.
260 Properties of CFLs
Recalling that the yield, z, of a derivation tree is formed by concatenating the terminal symbols labelling the leaves, we deduce that
|z| ≤ 2^n.
Suppose now that z ∈ L has |z| ≥ m = 2^(|V|+1). It follows that any derivation tree in G that yields z must have a path of non-leaf vertices of length > |V|, and therefore there is some variable, W, that occurs at least twice on this path.
This, with the prior analysis suffices to provethe result.
Properties of CFLs 261
Applying the Pumping Lemmafor Context-Free Languages
Suppose thatL ⊂ Σ* is some language thatwe wish to prove is not Context-Free.
The strategy adopted in using the PumpingLemma is that of proof by contradiction.
1) For any constant, m, choose some word z ∈ L with |z| ≥ m.
2) For any partition of z as z = u v w x y for which
|v x| ≥ 1 and |v w x| ≤ m
show that there is some value k ≥ 0 for which u v^k w x^k y ∉ L.
From which it follows, that any CFG gener-ating L must, in addition, generate wordsthat arenot in L.
262 Properties of CFLs
Examples
Example 1: The language L(g) ⊂ {1}* (p.5) with
L(g) = { 1^k : k is a prime number }
is not Context-Free.
Proof: Suppose G is a CFG with L(G) = L(g). Given m, choose a prime p > m and set z = 1^p. Let z be written as
u v w x y = 1^|u| 1^|v| 1^|w| 1^|x| 1^|y|
Then, for all k ≥ 0,
z_k = u v^k w x^k y = 1^(p + (k−1)(|v|+|x|)) ∈ L(G)
Setting k = p + 1 gives |z_(p+1)| = p(1 + |v| + |x|), which, since |vx| ≥ 1, is composite: a word in L(G) but not in L(g).
Properties of CFLs 263
Example 2: The language,
L = { w ∈ {a, b, c}* : w = a^n b^n c^n, n > 0 }
is not Context-Free.
Proof: Suppose G is a CFG with L(G) = L. Given m, choose z = a^m b^m c^m and consider any partition of z as u v w x y with |vwx| ≤ m, |vx| ≥ 1. The subword vwx cannot contain all of the symbols in {a, b, c}. Thus, either
vwx = a^r b^s or vwx = b^r c^s   (1 ≤ r + s ≤ m)
Without loss of generality, suppose the first applies. Then we can write w = a^p b^q, so that v and x together contain r − p a's and s − q b's, giving
z_0 = u v^0 w x^0 y = a^(m−(r−p)) b^(m−(s−q)) c^m ∈ L(G)
Since
|vwx| − |w| = |vx| and |vx| ≥ 1
we have (r − p) + (s − q) ≥ 1; it follows that z_0 has either fewer than m a's or fewer than m b's, and so is not of the form a^n b^n c^n.
Thus this language is not Context-Free.
264 Properties of CFLs
Example 3: The language L(h) (p.5),
L(h) = { w ∈ {0, 1}* : w = 1^n 0^(n²), n ≥ 1 }
is not Context-Free.
Proof: Suppose G is a CFG generating L(h). Given m, fix z = 1^m 0^(m²). Let z be written as u v w x y (|vwx| ≤ m and |vx| ≥ 1), so that
z = 1^(m−r) ⋅ vwx ⋅ 0^(m²−s)
where vwx contains r 1s and s 0s. The subword w must have the form 1^p 0^q for some p, q such that
0 ≤ p ≤ r;
0 ≤ q ≤ s;
1 ≤ r + s ≤ m;
r + s − (p + q) ≥ 1
By the Pumping Lemma,
u v^0 w x^0 y = 1^(m−(r−p)) 0^(m²−(s−q)) ∈ L(G)
and so, if L(G) = L(h), we must have
(m − (r − p))² = m² − (s − q)
Properties of CFLs 265
Rearranging, this is
(s − q) = (r − p)(2m − (r − p))   (8.1)
From r + s − (p + q) ≥ 1 it cannot be the case that both r = p and s = q; moreover, by (8.1), r = p forces s = q and vice versa. So both sides of (8.1) must be positive, and we can assume p < r and q < s.
We now know that,
s − q ≤ m   (since s ≤ r + s ≤ m)
(r − p)(2m − (r − p)) ≥ 2m − r ≥ m   (since 1 ≤ r − p ≤ r ≤ m)   (8.2)
The only case in which the first inequality is not strict is when s = m and q = 0; but then r + s ≤ m gives r = 0, hence p = 0, and (8.1) is not satisfied. Otherwise we have
s − q < m ≤ (r − p)(2m − (r − p))   (8.3)
and again (8.1) has no solution. In either case u v^0 w x^0 y ∉ L(h).
It follows that L(h) is not Context-Free.
266 Properties of CFLs
Notice that some care needs to be exercised when applying the Pumping Lemma for CFLs.
Having fixed z with |z| ≥ m, the argument must deal with any partition
u ⋅ v ⋅ w ⋅ x ⋅ y
of z that satisfies the conditions |vwx| ≤ m, |vx| ≥ 1.
Compare the proof that L(h) is not a regular language with the proof that it is not Context-Free.
In the former case, the partition into three parts means that the 'pumped' sub-word can consist entirely of 0s, whence deriving some w ∉ L(h) is easy.
In the second case, however, it is necessary to consider the possibility that the sub-word vwx has the form 1^r 0^s, where w = 1^p 0^q and only the range for p, q is known.
Properties of CFLs 267
Closure Properties of CFLs
Theorem 18: The class of Context-Free Languages is closed under the operations,
a) Union (∪)
b) Concatenation (⋅)
c) *-Closure (*)
Proof: Let L1 and L2 be CFLs over Σ, and
G1 = (V1, Σ, P1, S1)
G2 = (V2, Σ, P2, S2)
CFGs with L(G1) = L1, L(G2) = L2.
It may be assumed that V1 ∩ V2 = ∅ (by renaming variables if necessary).
a) G∪, a CFG generating L1 ∪ L2, is
G∪ = ( V1 ∪ V2 ∪ {S∪},
Σ,
P1 ∪ P2 ∪ {S∪ → S1, S∪ → S2},
S∪ )
268 Properties of CFLs
b) The construction is similar to (a), except that the new start symbol, S⋅, has the single production rule
S⋅ → S1 S2
associated with it in P⋅.
c) A CFG generating (L)* for the CFL, L, is formed by adding the productions
S(*) → ε and S(*) → S S(*)
to a CFG for L with start symbol S, the start symbol of the new CFG being S(*).
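The three constructions are simple enough to state as grammar transformers. In this sketch (ours, not the notes'), a grammar is a (variables, start, productions) triple with productions as (head, body-tuple) pairs; the new start symbols S_u, S_c, S_s are assumed fresh, and V1 ∩ V2 = ∅ is assumed as in the proof.

```python
# Closure constructions for CFGs (illustrative encoding).

def union(g1, g2):
    (v1, s1, p1), (v2, s2, p2) = g1, g2
    return (v1 | v2 | {'S_u'}, 'S_u',
            p1 | p2 | {('S_u', (s1,)), ('S_u', (s2,))})

def concat(g1, g2):
    (v1, s1, p1), (v2, s2, p2) = g1, g2
    return (v1 | v2 | {'S_c'}, 'S_c',
            p1 | p2 | {('S_c', (s1, s2))})

def star(g):
    v, s, p = g
    # S_s -> epsilon (empty body) and S_s -> S S_s
    return (v | {'S_s'}, 'S_s',
            p | {('S_s', ()), ('S_s', (s, 'S_s'))})
```

Note that the added productions are exactly the ones in (a), (b) and (c); the resulting grammars are not in CNF, but can be normalised afterwards if required.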
So far, all of these are properties shared byregular languages.
In contrast, however, we have:
Theorem 19: The class ofCFLs is notclosed under
a) Intersection (∩)
b) Complement (Co− )
Properties of CFLs 269
Proof: Consider the 3 languages over Σ = {a, b, c}:
L1 = { a^n b^n c^m : n, m ≥ 1 }
L2 = { a^m b^n c^n : n, m ≥ 1 }
L3 = { a^n b^n c^n : n ≥ 1 }
L3 is not aCFL. [Example 2]
L1 and L2 may be expressed as,
L1 = Xa ⋅ Yc = { a^n b^n : n ≥ 1 } ⋅ {c}^+
L2 = Ya ⋅ Xbc = {a}^+ ⋅ { b^n c^n : n ≥ 1 }
Since Xa, Xbc, Ya, and Yc are all CFLs and CFLs are closed under concatenation, it follows that L1 and L2 are both CFLs.
We now have,
L1 ∩ L2 = { a^n b^n c^n : n ≥ 1 } = L3
which is not aCFL.
Part (b) is now immediate, since closure under complement would, with CFLs being closed under union, contradict part (a) [cf. De Morgan's Laws].
270 Properties of CFLs
Decision Methods for CFLs
Theorem 20: Given a description of any CFL, L, there are effective algorithms that can decide:
a) if L = ∅;
b) if L is a finite language;
c) if L is an infinite language.
Proof:
a) Let L be described by a CFG, G = (V, T, P, S), and apply Procedure 6.1 to G. If S is identified as a redundant symbol during this, then, obviously, L(G) = ∅.
b), c) A condition using the Pumping Lemma for CFLs can be defined (as for Regular Languages); however, we use a direct algorithm.
Let L − {ε} be described by a CFG in CNF,
G = (V, T, P, S)
Properties of CFLs 271
Build a directed graph, H(V, F), from G as follows:
Each vertex v of H is labelled with a unique variable from V;
there is an edge from the vertex labelled X to the vertex labelled Y if and only if G contains a production:
X → Y Z or X → Z Y for some Z ∈ V
Then L(G) is infinite if and only if this graph contains a cycle.
To see this, first suppose that H(V, F) contains a cycle with vertices labelled,
W1 → W2 → … → Wk → W1
Since G contains no redundant symbols, there are derivations,
S ⇒*G u W1 y ⇒*G u w y   (u, w, y ∈ T*)
G, however, is in CNF, so the cycle must correspond to some chain of productions.
272 Properties of CFLs
For example,
W1 ⇒G X1 W2 ⇒G X1 X2 W3 ⇒G … ⇒G X1 X2 … Xk W1
and so,
S ⇒*G u W1 y
W1 ⇒*G v W1 x   (v, x ∈ T*, with v ≠ ε since k ≥ 1)
W1 ⇒*G w   (w ∈ T*)
Hence S ⇒*G u v^k w x^k y for every k ≥ 0, and so L(G) is infinite.
Similarly, if L(G) is infinite, then there must be some z ∈ L(G) whose shortest derivation involves two occurrences of the same variable (cf. the Pumping Lemma proof); thus H(V, F) contains a cycle involving this variable.
Properties of CFLs 273
Example
The CFG, G = (V, T, P, S), having
V = {S, A, B, C, D}, T = {a, b, c, d}
and
P = { S→AB, S→BC, B→CC, B→AC, C→AD,
A→a, B→b, C→c, D→d }
defines the directed graph H(V, F)
[Diagram: vertices S, A, B, C, D; edges S→A, S→B, S→C, B→A, B→C, C→A, C→D]
which is acyclic.
ThusL( G ) is finite.
274 Properties of CFLs
Example Continued
L(G) is the language
{ aac, acc, aadad, aadc, acad, ab, aaad,
ccc, ccad, adadc, adadad, adcc, adcad,
cadc, cadad, bc, bad }
Adding the production D → BB would cause an edge directed from D to B to be added, thereby creating a cycle,
B → C → D → B
corresponding to, e.g., the derivation
B ⇒ CC ⇒ ADC ⇒ ABBC ⇒* abBc ⇒ …
(note, trivially, S ⇒ AB ⇒* aabBc).
So, with this production, L(G) is infinite.
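The finiteness test is just cycle detection on H(V, F). A Python sketch (ours; it assumes a CNF grammar with no redundant symbols, given as a variable set and a set of (head, body-tuple) productions):

```python
# L(G) is finite iff the graph H(V, F) built from the binary
# productions has no cycle; we detect cycles by depth-first search.

def is_finite(variables, productions):
    edges = {v: set() for v in variables}
    for head, body in productions:
        if len(body) == 2:              # X -> Y Z gives edges X->Y, X->Z
            edges[head].update(body)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in variables}
    def on_cycle(v):                    # an edge into a GREY vertex
        colour[v] = GREY                # closes a cycle
        for u in edges[v]:
            if colour[u] == GREY or (colour[u] == WHITE and on_cycle(u)):
                return True
        colour[v] = BLACK
        return False
    return not any(on_cycle(v) for v in variables if colour[v] == WHITE)
```

On the example grammar above this reports a finite language; adding the production D → BB introduces the cycle B → C → D → B and the answer flips.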
Properties of CFLs 275
Deciding w ∈ L for CFLs(The CYK-Algorithm)
Since our machine model corresponding to CFLs - PDA - is non-deterministic, in contrast to the case of regular languages it is far from obvious how to define an algorithm that decides
w ∈? L   (L a Context-Free Language)
Notice that attempts to 'simulate' the PDA constructed in Theorem 16 must be able correctly to determine, within a finite number of steps, when an input w is not in L, i.e. such simulations must recognise when a derivation cannot result in w.
One method is to start a new 'guess' when the current one has more than |w| symbols.
This approach, however, is very inefficient.
276 Properties of CFLs
The CYK-Algorithm
The algorithm we now describe was discovered independently by Cocke, Younger, and Kasami.
It takes a CFG,
G = (V, Σ, P, S)
in CNF, together with a word w ∈ Σ+, and decides if w ∈ L(G) using a dynamic programming approach.
Suppose w = w1 w2 … wn ∈ Σ^n.
Define the subsets of variables, Di,j, by
Di,j = { X ∈ V : X ⇒*G wj wj+1 … wj+i−1 }
So Di,j is the set of variables from which the subword of w starting at wj and having length i can be derived.
Obviously w ∈ L(G) ⇔ S ∈ Dn,1.
The CYK-algorithm works by computing each of the subsets Di,j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n − i + 1.
Properties of CFLs 277
First consider the cases D1,j (1 ≤ j ≤ n).
The definition of D1,j indicates that this should contain those variables of V which derive the subword of w starting at wj and having length 1, i.e. wj.
But wj is a single terminal symbol, and so D1,j simply contains those variables X ∈ V for which X → wj ∈ P.
What about the cases wherei > 1?
The key observation in this case is that:
X ∈ Di,j ⇔ ∃ k (1 ≤ k < i), ∃ Y ∈ Dk,j, Z ∈ Di−k,j+k with X → YZ ∈ P
i.e. X ⇒*G wj wj+1 … wj+i−1 if and only if we can find a position k and variables Y, Z that satisfy:
Y ⇒*G wj wj+1 … wj+k−1   (Y ∈ Dk,j)
Z ⇒*G wj+k wj+k+1 … wj+i−1   (Z ∈ Di−k,j+k)
X → YZ ∈ P
278 Properties of CFLs
This gives the complete algorithm as:

Initiation Stage

For each j (1 ≤ j ≤ n)
  D1,j := { X ∈ V : X → wj ∈ P }

General Step

for (i := 2; i ≤ n; i++)
  for (j := 1; j ≤ n − i + 1; j++)
    Di,j := ∅
    for (k := 1; k < i; k++)
      Di,j := Di,j ∪ { X : X → YZ ∈ P and Y ∈ Dk,j, Z ∈ Di−k,j+k }
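The two stages above translate almost line-for-line into code. The following is a sketch of the CYK algorithm in Python (the grammar encoding as dicts and triples is our own choice, not part of the notes), run here on the worked example grammar used later in this section:

```python
# CYK membership test for a grammar in Chomsky Normal Form.
# terminal_prods: terminal -> set of variables X with X -> terminal in P
# binary_prods:   (X, Y, Z) triples for productions X -> YZ

def cyk(word, terminal_prods, binary_prods, start):
    n = len(word)
    if n == 0:
        return False          # CNF grammars as defined here derive no empty word
    # D[i][j]: variables deriving the subword of length i starting at
    # position j (0-based here; the notes number positions from 1).
    D = [[set() for _ in range(n)] for _ in range(n + 1)]
    for j, a in enumerate(word):                      # initiation stage
        D[1][j] = set(terminal_prods.get(a, ()))
    for i in range(2, n + 1):                         # general step
        for j in range(n - i + 1):
            for k in range(1, i):
                for (X, Y, Z) in binary_prods:
                    if Y in D[k][j] and Z in D[i - k][j + k]:
                        D[i][j].add(X)
    return start in D[n][0]

# The grammar of the worked example:
# S -> AB | CB | SS,  C -> AS,  A -> a,  B -> b
T = {'a': {'A'}, 'b': {'B'}}
B = [('S', 'A', 'B'), ('S', 'C', 'B'), ('S', 'S', 'S'), ('C', 'A', 'S')]
print(cyk('aababb', T, B, 'S'))   # True: aababb is in L(G)
print(cyk('aabab', T, B, 'S'))    # False
```

The triple loop over (i, j, k) makes the running time cubic in |w|, which is the source of the later remark that CYK is ‘too slow’ for practical compilers.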
Properties of CFLs 279
Example
Let G = ( V, Σ, P, S ) with

V = { A, B, C, S } ; Σ = { a, b }

P = { S → AB, S → CB, S → SS, C → AS, A → a, B → b }

Suppose w = a a b a b b

Step 1: D1,j

       j = 1  j = 2  j = 3  j = 4  j = 5  j = 6
wj       a      a      b      a      b      b
D1,j     A      A      B      A      B      B
280 Properties of CFLs
Step 2:D2, j
Here we use D1,j to consider possible derivations of the length-2 subwords aa, ab, ba, ab, bb:

          j = 1  j = 2  j = 3  j = 4  j = 5
wj wj+1    aa     ab     ba     ab     bb
D2,j       ∅      S      ∅      S      ∅

For D2,2, the subword ab of length 2 starting at position 2: we have A → a (A ∈ D1,2) and B → b (B ∈ D1,3). P contains S → AB, thus S ∈ D2,2.
Properties of CFLs 281
Step 3:D3, j
D1,j and D2,j are used to consider the subwords aab, aba, bab, and abb:

               j = 1  j = 2  j = 3  j = 4
wj wj+1 wj+2    aab    aba    bab    abb
D3,j            C      ∅      ∅      ∅

For D3,1 (the subword aab): A → a (A ∈ D1,1) and S ⇒(*) ab (S ∈ D2,2), with C → AS in P, giving C ∈ D3,1.
Step 4:D4, j
                     j = 1   j = 2   j = 3
wj wj+1 wj+2 wj+3     aaba    abab    babb
D4,j                  ∅       S       ∅

S ∈ D4,2 using D2,2 (S ⇒(*) ab) and D2,4 (S ⇒(*) ab) with the production S → SS.
282 Properties of CFLs
Step 5:D5, j
                          j = 1    j = 2
wj wj+1 wj+2 wj+3 wj+4     aabab    ababb
D5,j                       C        ∅

C ∈ D5,1 from A → a (D1,1), S ⇒(*) abab (D4,2), and C → AS.
Step 6:D6,1
We now find S ∈ D6,1 from

C ⇒(*) aabab   (D5,1)
B → b          (D1,6)
S → CB

and so conclude that aababb ∈ L(G).
Properties of CFLs 283
The complete tableDi , j is given below:
w1 w2 w3 w4 w5 w6 = a a b a b b

 i \ j   1   2   3   4   5   6
  1      A   A   B   A   B   B
  2      ∅   S   ∅   S   ∅
  3      C   ∅   ∅   ∅
  4      ∅   S   ∅
  5      C   ∅
  6      S
284 Properties of CFLs
COMP209
Automata and Formal Languages
Section 9
Deterministic Context Free Languages:Properties and Applications
Deterministic CFLs 285
Deterministic Pushdown Automata
The definition of PDA - the machine model that exactly describes the class of Context-Free Languages - allowed non-determinism in specifying the transition function δ, i.e. for a PDA,

M = ( Q, Σ, Γ, δ, q0, Z0, F )

given

q ∈ Q, α ∈ Σ ∪ { ε }, γ ∈ Γ

there could be several possible outcomes for δ( q, α, γ ).

A Deterministic Pushdown Automaton (DPDA) is a PDA for which there is at most one possible move available at each step.
286 Deterministic CFLs
More formally, in a DPDA, M = ( Q, Σ, Γ, δ, q0, Z0, F ),

the transition function is a mapping

δ : Q × ( Σ ∪ { ε } ) × Γ → Q × Γ*

such that

d1) For each q ∈ Q, σ ∈ Σ, γ ∈ Γ, a DPDA M when in state q, with σ the next input symbol and γ the stack top, has at most one (state, word) pair (<q′, u>, say) that defines its next move.

d2) If there is a move defined for ( q, σ, γ ) with σ ∈ Σ, then there is no move defined for ( q, ε, γ ).

Condition (d2) means that a DPDA can never make a choice between performing an ε-move (when the stack top symbol is γ) and reading the next input σ (when the stack top symbol is γ). At most one of

δ( q, σ, γ ) and δ( q, ε, γ )

has a defined outcome.
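Conditions (d1) and (d2) are mechanical enough to check by program. Below is a small sketch (the dictionary encoding of δ is our own, with None standing for ε) that tests both conditions on a transition table; the table shown is our reading of the example diagram that follows:

```python
# delta maps (state, input symbol or None for an ε-move, stack top)
# to a set of (next state, push word) moves.

def is_deterministic(delta):
    for (q, a, g), moves in delta.items():
        if len(moves) > 1:
            return False             # (d1): several moves for one triple
        if a is not None and delta.get((q, None, g)):
            return False             # (d2): an ε-move competes with a σ-move
    return True

# Our reading of the PDA diagram below (push 0s, pop them against 1s):
delta = {
    ('q0', '0', '#'): {('q0', '#0')},
    ('q0', '0', '0'): {('q0', '00')},
    ('q0', '1', '0'): {('q1', '')},
    ('q1', '1', '0'): {('q1', '')},
    ('q1', None, '#'): {('q2', '#')},
}
print(is_deterministic(delta))       # True

bad = dict(delta)
bad[('q1', '1', '#')] = {('q2', '#')}   # clashes with the ε-move on (q1, #)
print(is_deterministic(bad))         # False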
Deterministic CFLs 287
Example
The first example of a PDA we gave, i.e.

[State-transition graph with states q0, q1, q2: transitions (0, #, #0) and (0, 0, 00) at q0; (1, 0, ε) from q0 to q1 and again at q1; (ε, #, #) from q1 to q2]

is, in fact, a Deterministic PDA.
288 Deterministic CFLs
Deterministic Context-Free Languages
A DPDA is a restricted form of PDA, and we have seen in Theorem 16 that

The class of languages recognised by PDA
≡
The class of languages generated by CFGs
≡def
The Context-Free Languages (CFLs)

So it is certainly the case that

The class of languages recognised by DPDA
⊆
Context-Free Languages (CFLs)

If L ⊆ Σ* is recognised by a DPDA, then L is said to be a Deterministic Context-Free Language (DCFL).
Question: Is DCFL = CFL?
Answer: No.
Deterministic CFLs 289
CFLs not Recognisable byDeterministic Pushdown Automata
We now develop arguments that can be applied to show some CFLs are not DCFLs.

We first provide some motivation for the approach used.

Suppose L ⊆ Σ* is a CFL and a proof that L is not a DCFL is needed.

How could such a proof be constructed?

The only methods we have developed, so far, that can establish L is not in some class of languages are

Pumping Lemmata

Unfortunately, such a technique specifically for DCFLs has yet to be discovered.
We must consider rather ‘indirect’ methods.
290 Deterministic CFLs
Theorem 21: The class, DCFL, of Deterministic Context-Free Languages over Σ is

{ L ⊆ Σ* : L ∈ CFL and Σ* − L ∈ CFL }

i.e. those Context-Free Languages whose complement is also a Context-Free Language.

Proof: Omitted.

Theorem 21 shows that L is recognised by a DPDA, M, if and only if Co−( L ) is recognised by some DPDA, M′.

Since we know from Theorem 19(b) that the complement of an arbitrary CFL, L, may not be a CFL, we may deduce
DCFL ⊂ CFL.
Deterministic CFLs 291
Further Properties of DCFLs
The fact that a DPDA for a DCFL, L, can be changed to another DPDA recognising words that are not in L is only one difference that restricting PDAs to be deterministic creates; there are a number of others. These are summarised in

Theorem 22: The class DCFL is not closed under any of the operations:

Union (∪)
Intersection (∩)
Concatenation (⋅)
*-Closure (*)

i.e. if ⊕ is any of the first three, then there are DCFLs, L1, L2, for which L1 ⊕ L2 is not a DCFL; there are DCFLs, L, for which L* is not a DCFL.

Proof: Omitted.

Recall that CFLs are closed under all but the second (∩).
292 Deterministic CFLs
Discussion
One of the principal ideas underpinning thetheory of Formal Languages and Automatathat we have been emphasising is that of a
‘hierarchy’ of ‘ language classes’matching exactly with a
‘hierarchy’ of ‘ machine capabilities’matching exactly with a
‘hierarchy’ of ‘ formal grammar rules’
So we have, so far:

Language   Machine        Grammar Rules
Regular    DFA ≡ NDFA     V → σ ; V → σW
CFL        PDA            V → σ ; V → UW

with Regular ⊂ CFL ⊂ . . . (?).

To these have been added,

DCFL       DPDA           V → σ ; V → UW

with, now, Regular ⊂ DCFL ⊂ CFL ⊂ . . . (?)
Deterministic CFLs 293
We may further analyse the‘machine hierarchy’
Machine      Organisation
DFA          Finite Memory
≡ NDFA       Finite Memory
< DPDA       1 (unbounded) Stack
< PDA        1 (unbounded) Stack
Nevertheless, we view the ‘basic hierarchy’(built so far) as
Regular Languages (Lowest)Context-Free Languages (‘next level’)
. . .≡
DFA (Simplest machine type)PDA (‘next level’)
. . .
Question: Why do we not choose to view DCFL ≡ DPDAs as the ‘second’ level?
294 Deterministic CFLs
Answer(s)
a) The ‘machine hierarchy’ has been expressed in terms of increasing memory capabilities: from only finite, i.e. independent of input word length, through to a single stack whose size can grow arbitrarily.

It is not described in terms of ‘change of program state’ abilities, i.e. determinism, non-determinism, etc.

The distinction between DCFLs and general CFLs arises through the latter, not the former.

b) While there is a formal grammar characterisation of DCFLs, its definition is somewhat ‘contrived’ in comparison with RLGs (for Regular) and CNF (for Context-Free) Languages.
Deterministic CFLs 295
Of course this interpretation is, arguably,rather arbitrary.
One side-effect of it is that the question ofdeterministic
versusnon-deterministic
is seen as relating to‘program’ models
for specific‘machine’ types.
e.g. in finite memory machines (i.e. FA) the language recognition capabilities of

non-deterministic (NDFA)
and
deterministic (DFA)

programs (i.e. δ) are identical.
In machines with a single unbounded stack,deterministic programs (DPDA)
are ‘less powerful’ thannon-deterministic programs (PDA).
296 Deterministic CFLs
Applications of DCFLs:Programming Language Description
and Syntax Analysis
The earlier discussion of Context-FreeGrammars noted that one important applica-tion of these in Computer Science was in thearea of defining
Programming Language Syntax
In fact most (if not all) widely used High-Level Programming Languages are such that the set of all

syntactically correct

constructs in the language is exactly described by some deterministic CFL.
Why is this significant?
Deterministic CFLs 297
a) Any practical mechanisms for deciding if w ∈ L (for L a DCFL) are of considerable importance in Compiler Development, since these make it possible to determine if a program (and/or program statement) is a valid construct in the language.

In addition, if a HLL is described by an appropriate grammar generating a DCFL and we have a method of automatically building a parser for any (suitable) DCFL, then this provides a general technique that can be used to build the syntax analysis stage for any (suitable) HLL.

[So, e.g. minor changes to a language's definition need not necessitate developing a new syntax analyser from scratch: one can be constructed from the parser generator using the new language grammar.]
298 Deterministic CFLs
b) Using an appropriate formal grammarprovides a concise unambiguous descriptionof valid language statements. Hence, indi-viduals new to the language have a definitivereference against which not only to check
howa particular construct should be describedbut also with which to determine precisely
whythe compiler has indicated a program state-ment to be syntactically incorrect.
Of course, these could all be achieved usingCFGs and the techniques that have alreadybeen described.
What is gained by usingDeterministic CFLs?
Deterministic CFLs 299
Consider the method that was presented for testing if w ∈ L for L a Context-Free Language - the CYK-Algorithm.

Why is this ‘unsuitable’ as a ‘practical’ syntax checking method for HLL compilers?

a) It requires a CFG in Chomsky Normal Form (CNF).

b) It is ‘too slow’ for ‘practical’ purposes.

c) Knowing that a statement is syntactically correct is not, in itself, sufficient: to be useful, a parser for L should not only decide if w ∈ L but also return a derivation tree certifying this, i.e. a description of how the statement is generated from the grammar. While this could be extracted using the table in the CYK-Algorithm, CNF is not the most transparent form for describing or processing programming language syntax.
300 Deterministic CFLs
Example
Consider the ‘expression’ CFG defined earlier, i.e.

E → (E) | E op E | opd
op → + | − | * | /
opd → num
num → digit | digit num
digit → 0|1|2|3|4|5|6|7|8|9

In CNF (having removed the unit productions) this becomes

E → EL CR | Eop E | digit num | 0|1|2|3|4|5|6|7|8|9
EL → CL E
Eop → E op
op → + | − | * | /
num → digit num | 0|1|2|3|4|5|6|7|8|9
CL → (
CR → )
digit → 0|1|2|3|4|5|6|7|8|9

Notice that the opd variable of the original is redundant.
Deterministic CFLs 301
Parsing Techniques forDCFLs
We conclude the section on Context-Free Languages by presenting a brief overview of some standard methods that have been developed for parsing words when L is a DCFL.

By a parser for L, a DCFL, we mean a method that, with a CFG G = ( V, Σ, P, S ) defining L, takes:

Input: w ∈ Σ*

and returns as output:

Some derivation tree in G with yield w, if w ∈ L.
An error message if w ∉ L.
302 Deterministic CFLs
Overview of Parsing Methods
Given a CFG G = ( V, Σ, P, S ) describing some DCFL, L, there are two general approaches one could use to test if w ∈ L( G ):

Bottom-up Parsing
Build a derivation tree starting from the symbols in w as the leaves of such a tree.

Top-down Parsing
Try to construct a derivation tree with yield w starting from S (labelling the root of such a tree).

While there are a number of different algorithms that have been used in compilers in practice, any of these can be described in terms of one of these two approaches.
Deterministic CFLs 303
Shift-Reduce Parsers
These are one class of bottom-up parsers:
Given w = w1 w2 w3 . . . wn ∈ Σ*, such methods search w for some subword

wi wi+1 . . . wj

for which X → wi . . . wj is a production in the grammar.

This scanning process can then continue with the word

w1 w2 . . . wi−1 X wj+1 . . . wn

until either the start symbol S results (w ∈ L(G)) or no further productions are applicable.
Of course, in order to be effective some pol-icy for organising the subword search mustbe employed.
One such policy is to search forhandles.
304 Deterministic CFLs
Suppose X → α is some production and k a position in a word u. The pair <X → α, k> is a handle of the word u ∈ ( V ∪ Σ )* if

a) u = v α w, with v ∈ ( V ∪ Σ )*, w ∈ Σ*, and α beginning at position k of u;

b) ∃ a right-most derivation S ⇒(*)G v X w ⇒G u.

Shift-Reduce parsers can be viewed as constructing a sequence of right-most derivations:

< un, un−1, un−2, . . . , u1, u0 >

where w = un,

un−1 ⇒G un
. . .
uk ⇒G uk+1
. . .
S = u0 ⇒G u1
Deterministic CFLs 305
Example
Using Σ = { +, *, (, ), id } ; V = { E } ; S = E,

P = { E → ( E ), E → id, E → E + E, E → E * E }

Suppose w = id + id * id:

k   uk             Handle
5   id + id * id   < E → id, 1 >
4   E + id * id    < E → id, 3 >
3   E + E * id     < E → id, 5 >
2   E + E * E      < E → E * E, 3 >
1   E + E          < E → E + E, 1 >
0   E              Accept.
Notice that uk ⇒ uk+1 using a right-mostderivation.
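The reduction sequence in the table can be reproduced with a naive reducer. The sketch below is illustrative only: the fixed production order is a hand-chosen policy that happens to pick workable handles for this grammar, not a general handle-finding method (real shift-reduce parsers use LR tables for that):

```python
# Productions tried in order: id first, then brackets, * before +.
PRODS = [('E', ('id',)), ('E', ('(', 'E', ')')),
         ('E', ('E', '*', 'E')), ('E', ('E', '+', 'E'))]

def reduce_to_start(tokens, start='E'):
    u = list(tokens)
    while u != [start]:
        for lhs, rhs in PRODS:
            for i in range(len(u) - len(rhs) + 1):
                if tuple(u[i:i + len(rhs)]) == rhs:
                    u[i:i + len(rhs)] = [lhs]   # reduce this handle
                    break
            else:
                continue        # this production matched nowhere; try next
            break               # a reduction happened; rescan from the top
        else:
            return False        # no production applies and u is not S
    return True

print(reduce_to_start(['id', '+', 'id', '*', 'id']))   # True
print(reduce_to_start(['id', '+', '+']))               # False
```

Every reduction either shortens u or consumes an id token, so the loop always terminates; what it cannot guarantee in general is picking the handle of a right-most derivation, which is exactly the problem discussed next.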
306 Deterministic CFLs
Problems with Shift-Reduce Methods
There are two main problems to be solved in implementing Shift-Reduce methods for parsing:

Identifying an appropriate handle in w.
Deciding which production to apply.

For many CFGs there may well be words for which a unique handle is not defined: selecting the ‘wrong’ handle (subword and production) could lead to the reduction process rejecting a word.

One major advantage of DCFLs for defining programming languages is that for any DCFL, L, there is a CFG, G, with L(G) = L such that for any w ∈ L there is a unique right-most derivation S ⇒(*)G w.

This class of grammars for DCFLs is called the LR(k) grammars (Left-to-right scan with k symbols of lookahead).
Deterministic CFLs 307
Top-down Parsers
In a top-down parser the aim is to produce a derivation tree for w by searching for an appropriate sequence of production rules starting from S.

These can be thought of as building a left-most derivation.

Suppose G = ( V, Σ, P, S ) is a CFG.

One ‘simple’ method that can be used to construct a parser for G is to implement a separate method for each variable X ∈ V.

Such a parser scans along an input (one symbol at a time), invoking the methods for the variables that are required.
308 Deterministic CFLs
Problems with Top-down Parsers
For any ‘non-trivial’ CFG, such methods will be recursive. This can create two problems:

a) If more than one production could be tested as a possible derivation step then backtracking may be needed.

b) Productions of the form X → X u (‘left-recursion’) must be eliminated to remove the possibility of indefinite recursion.

A standard technique (which a number of you may see in COMP204 next semester) is that called ‘recursive descent’ parsing.

These are top-down parsers organised so that recursive backtracking is never required.

Problems of ‘left-recursion’ when defining programming languages are avoidable by careful restructuring of the grammar.
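As an illustration, here is a minimal recursive-descent recogniser for an expression grammar, with the left-recursive E → E op E rules rewritten into iterated forms (the rewrite and the token encoding are ours); each nested function plays the role of the ‘separate method’ for one variable:

```python
def parse_expr(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                       # E -> T ((+|-) T)*
        nonlocal pos
        if not term():
            return False
        while peek() in ('+', '-'):
            pos += 1
            if not term():
                return False
        return True

    def term():                       # T -> F ((*|/) F)*
        nonlocal pos
        if not factor():
            return False
        while peek() in ('*', '/'):
            pos += 1
            if not factor():
                return False
        return True

    def factor():                     # F -> ( E ) | num
        nonlocal pos
        if peek() == '(':
            pos += 1
            if not expr() or peek() != ')':
                return False
            pos += 1
            return True
        if isinstance(peek(), int):   # numbers stand in for `num`
            pos += 1
            return True
        return False

    return expr() and pos == len(tokens)

print(parse_expr([1, '+', 2, '*', '(', 3, '-', 4, ')']))   # True
print(parse_expr(['(', 1, '+', ')']))                      # False
```

Because each variable's method decides how to proceed from the single next symbol, no backtracking is ever needed here, which is exactly the property recursive descent is organised to achieve.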
Deterministic CFLs 309
COMP209
Automata and Formal Languages
Section 10
Turing Machines
Computability and Decidability
310 Turing Machines
Introduction
The three languages

{ 1^k : k is a prime number }
{ 1^k 0^(k^2) : k ≥ 1 }
{ a^n b^n c^n : n ≥ 1 }

have been shown earlier not to be Context-Free Languages.
So, with the machine models that have been considered -

Finite Automata
Pushdown Automata

- it is not possible to describe a program (i.e. state transition function, δ) that recognises

all of the words in each
and
only the words in each

irrespective of whether such a program is deterministic or non-deterministic.
Turing Machines 311
Yet, there are algorithms that can:

given 1^k, decide if k is a prime.
given 1^i 0^j, decide if j = i^2.
given a^i b^j c^k, decide if i = j = k.

Thus, if we wish to be able to associate some ‘minimal’ capability ‘machine model’ with each ‘recognisable’ language, it is clear that some model that is ‘more powerful’ than PDA is required.

Equivalently, some formal grammar type that is more ‘expressive’ than Context-Free Grammars is needed.
312 Turing Machines
In order to motivate the ‘next’ (and final) level of machine model, consider in what ways PDA organisation is limited:

A PDA reads its input word, w, once only.

What alternatives for processing are there?

1) w (or a subword) could be saved on the stack for ‘later decision making’.

Problem: there may be an arbitrary number of stack symbols that need to be saved on the stack in order for ‘correct’ decisions to be made:

In a finite number of states and one stack, a PDA cannot record for simultaneous use both an arbitrarily long subword of w and an arbitrarily long ‘stack word’.
Turing Machines 313
Informal Example
A (very!) informal argument that 1^k 0^(k^2) is not context-free could be derived by considering how the sub-word 1^k is treated:

A PDA has to recognise exactly k copies of 0^k once 1^k has been read.

But k can be arbitrarily large, so its value cannot be recorded using a finite number of states.

Thus, the value of k must somehow be ‘remembered within the stack’:

For example,

a) Push the sequence of 1s onto the stack and after each sequence of k 0s pop exactly one 1 from the stack, repeating until the stack is empty and no more input is left.
314 Turing Machines
Problem
It is not possible to count to k using a finite state set without destroying the value of k on the stack (i.e. the 1^k value).

No matter what approach is used, at some stage the value of k must be ‘remembered’ outside of the stack.

The only ‘available resource’ is the finite state set.

But this cannot store an arbitrarily large amount of information.
Turing Machines 315
Turing Machines
The machine model we introduce now - the Turing Machine - defines the ‘most powerful’ class of machine capabilities.

As with the earlier models - DFA and PDA - its operations are defined through a finite program, i.e. a state transition procedure; its increased power comes through a development of its memory organisation.

Although it may not be clear at first, we shall see that this development has a natural interpretation within our ‘hierarchy’ of machine capabilities.
316 Turing Machines
Turing Machine Overview
[Tape diagram: an infinite tape holding x1 x2 . . . xk−1 xk xk+1 . . . xn followed by Blank (B) cells, with M's read/write head positioned over some cell xk.]

Input x1 . . . xn is stored in the first n locations.

Locations which have not been used (yet) hold a Blank symbol.

In a single move, M can:

read the current location (xk);
write into this;
go to the location to the left (xk−1) or the location to the right (xk+1).

The starting location is x1.
Turing Machines 317
Example of Turing Machine Program (M)

[State-transition graph with states q0, . . . , q6, an accept state qA and a reject state qR; edges carry labels such as (0, #, R), (1, #, R), (σ, σ, R), (0, %, L), (1, %, L), (B, %, L), (#, #, R), (%, %, R) and (σ, σ, L). The full transition function can be read off from the worked example run given later in this section.]

Alphabet: { 0, 1, #, % } ; (σ ∈ { 0, 1 })

An edge from qi to qj labelled (x, y, L) indicates: if reading symbol x in state qi, replace it with symbol y, move to the symbol on the Left and enter state qj.

qA: unique accept state.
qR: unique reject state.
318 Turing Machines
Formally, a Turing Machine (TM), M, is described by an octuple,

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

Q: finite set of states;
Σ: input alphabet;
Γ: tape alphabet, Σ ⊂ Γ;
q0: initial state;
B ∈ Γ: Blank symbol;
δ : Q × Γ → Q × ( Γ − { B } ) × { L, R };
qA: Halt and accept state;
qR: Halt and reject state.

The input word w ∈ Σ* occupies the first |w| locations (or cells) of an infinite tape.

The tape is scanned by a tape head (positioned at cell 1 to start), that can move only one cell to the Left or Right after each move.

w is accepted if M reaches state qA.
w is rejected if M reaches state qR.

Important: M may fail to reach either.
Turing Machines 319
Turing Machine Operation
The actions of a TM, M, are completely prescribed by its transition function,

δ : Q × Γ → Q × ( Γ − { B } ) × { L, R }

Notice that this is deterministic.

For each combination of state qi and symbol σ ∈ Γ, δ defines:

the symbol γ ∈ Γ − { B } to be written;
whether the tape head should move to the Left or Right;
the next state to enter.

M starts its operation in state q0 with the tape head scanning the symbol in cell 1, i.e. the first symbol of w; any cell which does not contain an input symbol contains the Blank symbol.

This cannot be written to the tape (or appear inside the input w).
320 Turing Machines
TM Configurations andInstantaneous Descriptions
Informally, the language accepted by a TM, M, comprises those w ∈ Σ* upon which M reaches its halt and accept state, qA.

To make this precise, the concept of a TM configuration or

instantaneous description (ID)

is used.

An ID of a TM, M = ( Q, Σ, Γ, q0, B, δ, qA, qR ), on input w = w1 w2 . . . wn ∈ Σ* is a word of the form

c1 c2 . . . ck−1 q ck ck+1 . . . cm B

where ci ∈ Γ − { B } and q ∈ Q.
Turing Machines 321
This records the information that:

M is currently in state q;
the tape head of M is scanning the k'th cell;
the i'th tape cell contains the symbol ci ∈ Γ − { B } (1 ≤ i ≤ m).

So the initial configuration of M on input w (ID0) is

IDw0 = q0 w1 w2 . . . wn B
Suppose

IDr = c1 . . . ck−1 qi σ ck+1 . . . cm B

and that δ( qi, σ ) = ( qj, γ, D ), D ∈ { L, R }.

The next configuration of M (IDs) after applying δ will be

IDs = c1 . . . qj ck−1 γ ck+1 . . . cm B   if D = L
IDs = c1 . . . ck−1 γ qj ck+1 . . . cm B   if D = R

IDr ⊢M IDs denotes this, with IDi ⊢(*)M IDj indicating that there is a sequence of moves of M leading to IDj starting from IDi.
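The single-move rule can be transcribed directly into code. The sketch below (the list representation of the tape and the step cap are our own choices) applies δ one move at a time, and runs a machine until it halts or a step limit is hit, since a TM need not halt at all:

```python
BLANK = 'B'

def tm_step(state, tape, head, delta):
    """Apply delta once; returns (new_state, tape, new_head)."""
    symbol = tape[head]
    new_state, write, direction = delta[(state, symbol)]
    tape[head] = write                 # gamma replaces the scanned cell
    head += -1 if direction == 'R' else -1 if False else (1 if direction == 'R' else -1)
    return new_state, tape, head

def tm_step(state, tape, head, delta):
    """Apply delta once; returns (new_state, tape, new_head)."""
    symbol = tape[head]
    new_state, write, direction = delta[(state, symbol)]
    tape[head] = write                 # gamma replaces the scanned cell
    head += 1 if direction == 'R' else -1
    if head == len(tape):              # moved onto a fresh blank cell
        tape.append(BLANK)
    return new_state, tape, head

def run(delta, word, q0='q0', accept='qA', reject='qR', limit=10_000):
    state, tape, head = q0, list(word) + [BLANK], 0
    for _ in range(limit):
        if state in (accept, reject):
            return state == accept
        state, tape, head = tm_step(state, tape, head, delta)
    return None                        # no verdict within the step limit

# A toy machine (ours, not the example above) accepting words that
# begin with 1; it never moves left, so no left-end guard is needed.
delta = {('q0', '1'): ('qA', '1', 'R'), ('q0', '0'): ('qR', '0', 'R')}
print(run(delta, '10'), run(delta, '01'))   # True False
```

Returning None when the step limit is exhausted anticipates the point made later: acceptance only promises an answer for words in L(M), not a halting computation on every input.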
322 Turing Machines
Languages Accepted by Turing Machines
Given M = ( Q, Σ, Γ, q0, B, δ, qA, qR ),

let ID(i) = { IDs of M whose indicated state is qi ∈ Q }

The language L ⊆ Σ* accepted by M is

L( M ) = { w : ∃ IDk ∈ ID(A) with IDw0 ⊢(*)M IDk }

i.e. all words upon which the initial configuration (IDw0) will lead to a configuration indicating the halt and accept state qA.

Important
There are a number of ‘technical subtleties’ in this definition that are dealt with later. By far the most significant of these is the use of

language accepted
rather than
language recognised,

i.e. at this point we are not concerned with M's outcome for w ∉ L(M).
Turing Machines 323
Example
For the example TM, with input w = 0110:

k    IDk           δ( q, α )
0    q00110B       (q1, #, R)
1    #q1110B       (q1, 1, R)
2    #1q110B       (q1, 1, R)
3    #11q10B       (q1, 0, R)
4    #110q1B       (q2, %, L)
5    #11q20%B      (q5, %, L)
6    #1q51%%B      (q5, 1, L)
7    #q511%%B      (q5, 1, L)
8    q5#11%%B      (q6, #, R)
9    #q611%%B      (q3, #, R)
10   ##q31%%B      (q3, 1, R)
11   ##1q3%%B      (q4, %, L)
12   ##q41%%B      (q5, %, L)
13   #q5#%%%B      (q6, #, R)
14   ##q6%%%B      (qA, %, R)

Thus, 0110 is accepted by M.
324 Turing Machines
Example (continued)
The example TM in fact accepts all words

{ w ∈ { 0, 1 }+ : w = u reverse(u) }

i.e. even length palindromes.

Although this may not be immediately clear (from the state-transition graph), the ‘algorithm’ that M uses is:

a) Replace the leftmost 0/1 with #;
b) Check if this matches the rightmost 0/1
   (q1 and q2 do this if 0 occurs);
   (q3 and q4 do this if 1 occurs);
c) If they match, then replace the rightmost 0/1 with %;
d) Continue from (a) if any 0/1 symbols remain.
e) If all symbols are replaced by # or %, then accept.
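Steps (a)-(e) can be mirrored directly on a Python list standing in for the tape. This sketch follows the algorithm rather than the state graph: the two indices replace the head's back-and-forth sweeps, and the # and % markers are written exactly as the machine would write them:

```python
def accepts_even_palindrome(w):
    if not w:
        return False                      # the language is over {0,1}+
    tape = list(w)
    left, right = 0, len(tape) - 1
    while left < right:
        if tape[left] != tape[right]:     # step (b): mismatch, so reject
            return False
        tape[left] = '#'                  # step (a): mark the leftmost 0/1
        tape[right] = '%'                 # step (c): mark the rightmost 0/1
        left, right = left + 1, right - 1
    # step (e): accept only if every cell was replaced, i.e. even length
    return left > right

print(accepts_even_palindrome('0110'))   # True
print(accepts_even_palindrome('010'))    # False (odd length)
print(accepts_even_palindrome('0111'))   # False
```

An odd-length palindrome leaves one unmarked middle cell (left == right), which is why the final test is left > right rather than a plain loop exit.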
Turing Machines 325
The ‘Machine Hierarchy’and Turing Machines
We now have three basic machine models:

Finite Automata
Pushdown Automata
Turing Machines

In each, a finite length input word w is ‘processed’ by a program defined by a state transition function, δ.

The available ‘actions’ of the program depend on the precise memory regime associated with the model:

For FA: a fixed amount of memory.

For PDA: one unbounded stack.

For TMs: unbounded memory which a program can access ‘freely’ (i.e. without the regime imposed by a Stack).
326 Turing Machines
There are two questions that might be asked:
Question 1:
Why should removing the restrictionimposed by a
Stackorganisationbe significant?
i.e. why should‘one type’ of unlimited store
be ‘more powerful’ thananother type?
Question 2:
Wouldn’t a ‘more natural’ extension ofPDAbe to provide
asecond unbounded Stack?
Turing Machines 327
e.g. suppose we define a 2-Stack Automaton (2-SA) as

M = ( Q, Σ, Γ, q0, #, δ, F )

(as for a deterministic PDA, having no ε-moves.)

M has two unbounded Stacks, S1 and S2.

An input word w = x1 . . . xn is held on S1 (x1 at the top), with S2 holding only #.

The state transition function encodes moves

δ : Q × Γ × Γ → Q × Γ* × Γ*

i.e.

δ( qi, γ1, γ2 ) = ( qj, u1, u2 )

means

‘in state qi, if the top of S1 is γ1 and the top of S2 is γ2, replace these by the words u1 and u2 and go to state qj.’

ui must be one of: ε (‘pop’ the top symbol); γi (leave the stack unchanged); α γi (α ∈ Γ) (‘push’ α onto the stack).
328 Turing Machines
Theorem 23: L ⊆ Σ* is accepted by some TM, M, if and only if L is accepted by some 2-SA, M′.

Proof: (Outline)

1) TM ⇒ 2-SA: Given M, a TM accepting L, a 2-SA, M′, is configured so that S1 always holds the contents of M's tape up to and including the symbol currently scanned (which is at the top), while S2 holds the (non-blank) portion to the right of the current symbol.

The current configuration of M is easily reflected by appropriate moves in M′.
Turing Machines 329
For example, for ID = c1 . . . ck−1 qi σ ck+1 . . . cm B with δ( qi, σ ) = ( qj, γ, D ), listing each stack from top to bottom:

D = Left:
S1: σ, ck−1, . . . , c1, #      S2: ck+1, ck+2, . . . , cm, #
becomes
S1: ck−1, ck−2, . . . , c1, #   S2: γ, ck+1, . . . , cm, #

D = Right:
S1: σ, ck−1, . . . , c1, #      S2: ck+1, ck+2, . . . , cm, #
becomes
S1: ck+1, γ, ck−1, . . . , c1, #   S2: ck+2, ck+3, . . . , cm, #
330 Turing Machines
2) 2-SA ⇒ TM: A TM, M, represents the content γi vi of Si (i ∈ { 1, 2 }) of the 2-SA, M′, on its tape, with new symbols %, @ in M's tape alphabet separating these, so that the tape has the form

% γ1 u1 @ γ2 u2

M can recover both top-of-stack symbols and remember these. The changes to the tape of M reflecting a move of M′ are performed over several stages, e.g.

If S1 = c1 c2 . . . ck and S2 = d1 d2 . . . dm, then M's tape holds

% c1 . . . ck @ d1 . . . dm B

Suppose c1 is replaced by u1 and d1 by u2 (|ui| ≤ 2). If u1 = α c1 and u2 = β d1, then:

a) M copies d1 . . . dm 2 cells to the right;
b) inserts β;
c) copies @, c1 . . . ck 1 cell right;
d) inserts α;

etc. for the other possibilities.
Turing Machines 331
Discussion
In addition to the fact that Theorem 23 shows the increase in storage capabilities to be exactly the same as providing a second stack store, there is one point to be noted concerning its proof:

in the second part we did not present a detailed description of how the TM transition function, δ, was defined in terms of that of the 2-SA: instead an

‘algorithmic’

overview of M was given.

It will, in fact, always suffice to describe specific TM programs and actions using this level of description, i.e.

For the design of a TM, M, if it is possible algorithmically to describe M's actions, then it may be assumed that it is possible formally to define δ for M.
332 Turing Machines
The Class of LanguagesAccepted by Turing Machines
Recall that the language over Σ accepted by a TM,

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

comprises those w ∈ Σ* upon which M on input w reaches the halt and accept state qA.

Any L ⊆ Σ* for which there is a TM, M, with L( M ) = L is called a

Recursively Enumerable (r.e.) language.

Any L ⊆ Σ* for which there is a TM, M, such that

∀ w ∈ L, M reaches the accept state qA on input w;
∀ w ∉ L, M reaches the reject state qR on input w

is called a Recursive Language.
Turing Machines 333
Terminology and Interpretation
The terms recursively enumerable and recursive date from the earliest studies into the questions:

a) Is it possible to construct ‘effective’ algorithmic methods for accepting any language?

b) Can some specific language be proved not to have an ‘effective’ acceptance algorithm?

c) Are different ‘general’ ‘machine models’ equally ‘powerful’: i.e. are there ‘reasonable’ machine models that can accept languages not accepted by other ‘reasonable’ models?
Much of the remainder of this module isconcerned with these questions.
334 Turing Machines
Suppose a language L ⊆ Σ* is recursively enumerable.

What does this mean in ‘computational’ terms?

It means that there is an algorithmic method (‘program’) that:

1) Takes as input any w ∈ Σ*.
2) If w ∈ L then this can be confirmed in

a finite number of steps

If L is recursive then there is an algorithmic method that not only reports if w ∈ L but also reports if w ∉ L.
The TM, M , provides (a possible) algorith-mic method.
Turing Machines 335
For a language, L, to be recursive may be seen as a ‘minimal’ requirement in computational terms:

For such languages, methods exist that allow the status of any word w with respect to L to be decided in a finite number of steps.

The term decidable is often used of languages with this property, i.e.

Recursive ≡ Decidable

If L is r.e. but not recursive, the ‘best’ that is possible is a method that will halt and accept any w ∈ L within a finite time.

Such methods, however, may fail to reach a halting state when given words w not belonging to L.

The term semi-decidable is sometimes used to describe such languages.
336 Turing Machines
Formal Grammars and r.e. Languages
Turing machines have been presented as the ‘final’ level of the hierarchy of ‘black-box’ capabilities.

Implicit in this description of TMs are several claims:

a) If a language L is not r.e. (or not recursive) then no ‘reasonable’ ‘extension’ of TM capabilities will provide an ‘effective’ model that can be used to accept (or recognise) L.

b) A class of formal grammars corresponding to the r.e. languages ought to correspond to the ‘most expressive’ form.
Turing Machines 337
Thus (a)+(b) can be viewed as stating:
Intuitively, the class of formal grammarscorresponding to r.e. languages, should besuch that:
For L ⊆ Σ* there issomeformal grammar,G, with L( G ) = L
if and only ifL is recursively enumerable
(i.e. there is aTM, M , acceptingL).
Of course, given that we have definedclasses of formal grammar so far byrestricting the form of allowable productionrules, the ‘most expressive’ class of gram-mars can only be those for which
no restrictions whatsoever areplaced on the form ofgrammar productions
338 Turing Machines
Unrestricted Grammars
G = ( V, Σ, P, S ) is an unrestricted formal grammar when productions pi ∈ P take the form

Li → Ri

with Li ∈ ( V ∪ Σ )+ (Li containing at least one variable from V) and Ri ∈ ( V ∪ Σ )*.

As before, w ∈ Σ* is in the language L( G ) generated by G if S ⇒(*)G w.

A further justification for the machine model defined by Turing machines is

Theorem 24: L ⊆ Σ* is r.e. if and only if there is a formal grammar (i.e. unrestricted) G = ( V, Σ, P, S ) for which L( G ) = L.

Proof: (Omitted).
Turing Machines 339
SummaryLanguage, Machines, Grammars
The Chomsky Hierarchy
Theorem 24 ‘completes’ our ‘hierarchy’ as:

The Chomsky Language Hierarchy

Type   Name           Grammar        Machine
3      Regular        RLG            FA
2      Context-Free   CFG            PDA
0      r.e.           Unrestricted   TM

We have shown (earlier),

Regular ⊂ CFL ⊂ r.e.
≡
FA < PDA < TM
≡
RLG < CFG < Unrestricted
Of course this table raises some questions.
We first deal with the ‘most obvious’ one.
340 Turing Machines
What about ‘Type 1’ Languages?Digression
(Context-Sensitive Languages)
There is, in fact, a ‘layer’ of this hierarchy that falls strictly between CFLs (Type 2) and r.e. languages (Type 0).

There are several technical reasons why these are not treated in any depth within this module, and they are discussed now only for reasons of completeness.

This ‘missing’ level is the class of Context-Sensitive Languages (CSL),

the corresponding grammar class (CSG) imposing the restriction that production rules Li → Ri must satisfy

Li ∈ ( V ∪ Σ )+ ; Ri ∈ ( V ∪ Σ )+ ; |Ri| ≥ |Li|

i.e. whenever u ⇒(*)G w, |w| ≥ |u|.
Turing Machines 341
The matching machine model for CSLs (LBA - Linear Bounded Automata) is ‘similar’ to Turing Machines, but with the following differences:

a) The transition function is non-deterministic.

b) Only the space occupied by the input is available (this, of course, can be used in any way consistent with TM capabilities: overwritten, scanned repeatedly, etc.).

The (non-Context-Free) languages:

{ 1^k : k is a prime number }
{ 1^k 0^(k^2) : k ≥ 1 }
{ a^n b^n c^n : n ≥ 1 }

are all CSLs.
342 Turing Machines
So why ‘ignore’ CSLs?

a) Important as this class is, in comparison with the other 3 levels very little is known about the properties of CSLs, e.g.

1) Whether deterministic LBA accept exactly the same languages as non-deterministic LBA is a major open question, cf. DCFL v CFL.

2) Closure under complementation has only recently (1988) been proved: in the other classes ‘most’ questions were resolved between 1935 and 1970.

b) Very few ‘natural’ examples of r.e. languages that are not CSLs are known, and the proofs are highly non-trivial.

c) The machine restriction suggests that issues regarding CSLs, in particular proving that some L ∉ CSL, are more ‘naturally’ considered within Computational Complexity Theory (COMP202) rather than Automata and Formal Languages.
Turing Machines 343
Example
The following are recursive but not CSLs:

Σ = { ∃, ∀, =, +, ′, (, ), x, ∧, ∨, ¬, 0, 1 }

Th+N = { w : w is a well-formed First-Order sentence about addition over N which is true }

e.g. ∀x ( ∃x′ ( (x′ + 1) = x ) ) ∨ ( x = 1 )

[for any positive integer (x): either there is a positive integer (x′) less than it (x′ + 1 = x) or the integer is 1].

Σ = { 0, 1, (, ), *, ⋅, ∩, +, ¬ }

TOT = { w : w is a well-formed ‘extended’ regular expression for which L( w ) = { 0, 1 }* }

an ‘extended’ regular expression being one in which the additional operations ∩ (intersection) and ¬ (complement) may be used.
Turing Machines and the Computation of Arithmetic Functions

With the exception of the ‘finite state transducers’ at the start of this module, the view of computation that has been presented has been in terms of:

‘Given a (description of) some language, L over Σ, and a word, w, determine if w ∈ L’.

When compared with ‘real’ computations, this view may look rather over-simplified and contrived, e.g. what does it say about computing arbitrary arithmetic functions

f : (N ∪ {0})^k → N ∪ {0},

i.e. where the argument comprises k non-negative integers and the result is a non-negative integer?

Let Σ = { 0, 1 }, so that values are written in unary, with 0 used to separate arguments; e.g. the values 2, 0, 5, 17, 0 are ‘coded’ as

0 11 0 0 11111 0 11111111111111111 0
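The 0-separated unary coding just described can be sketched in Python (the function names `encode_args` and `decode_args` are illustrative, not part of the module's notation):

```python
def encode_args(args):
    """Code a tuple of non-negative integers in unary:
    a leading 0, then 1^x for each argument x, with a 0 between blocks."""
    return "0" + "0".join("1" * x for x in args)

def decode_args(word):
    """Invert the coding: the lengths of the 1-blocks are the arguments."""
    assert word.startswith("0"), "a coded tuple starts with the separator 0"
    return [len(block) for block in word[1:].split("0")]
```

Here `encode_args([2, 0, 5, 17, 0])` reproduces the coded string shown above: the value 0 contributes an empty 1-block, which is why adjacent separator 0s appear.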
We may view a Turing machine, M, whose input words take the form

# x1 # x2 # . . . # xk−1 # xk

(with xi ∈ 1*)

as computing some function f : (N ∪ {0})^k → N ∪ {0}.

How is this done?

By interpreting the content of M’s tape, when M halts, as the result.

For example, for f ( x1, . . . , xk ) = y,

M could write the symbol 1 in each of the first y locations, and the symbol 0 in all the remaining (non-blank) cells.

Notice that it is only the computation halting that determines the result, not whether this happens in qR or qA.
In order to formalise these ideas, let

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

with Σ = { 0, 1 }, Γ = { 0, 1, B }.

M is said to compute the function f : (N ∪ {0})^k → N ∪ {0} if

∀ w ∈ ( 0 ⋅ 1* )^k, w = 0 1^x1 0 1^x2 0 . . . 0 1^xk,

∃ u ∈ 0*, such that M halts with the word 1^y u written on its tape, if and only if

f ( x1, . . . , xk ) = y.

We use f^(k)_M ( x1, . . . , xk ) to denote the k-argument function computed by the TM, M.
Partial and Total Computable Functions

It is not difficult to construct TMs that compute all of the standard one and two argument functions, e.g.

m + n ; m * n ; 2^n

m − n ; m / n ; log n

Obviously for the first 3 functions there is always a uniquely defined result.

What about the cases

m − n (when n > m) ; m / n (when n = 0) ; log n (when n = 0)

though?

For cases such as these we have to distinguish between total functions, with a defined outcome on every point of their domain, and partial functions, which for some arguments may not have a defined result.
Functions with possibly undefined results, and the respective TM computations, are distinguished in the following.

Definition: f : (N ∪ {0})^k → N ∪ {0} is said to be a partial recursive function if there is a TM, M, for which

f^(k)_M ( x1, . . . , xk ) = f ( x1, . . . , xk )

whenever f has a defined result for < x1, . . . , xk >.

f : (N ∪ {0})^k → N ∪ {0} is a total recursive function if there is a TM, M, for which

f^(k)_M ( x1, . . . , xk ) = f ( x1, . . . , xk )

and f is defined for every < x1, . . . , xk >.
Discussion

The first studies of ‘computability’ were couched in terms of describing what we have defined as ‘partial recursive’ functions: the (archaic sounding) term

Recursive Function Theory

still survives as the general name for this field of research.

With some minor technical development it is easy to translate between the (superficially) different concepts:

r.e. language ↔ Partial recursive function
Recursive language ↔ Total recursive function

When dealing with languages and testing membership, we use

Decidable

When dealing with functions and their evaluation, we use

Computable
The Church-Turing Hypothesis (and its Implications)

Turing Machines were described as the most ‘powerful’ machine class within the ‘hierarchy’ of machine types.

In the commentary following Theorem 23, it was claimed that

‘When designing a TM, M: if it is possible algorithmically to describe M’s actions, then it may be assumed it is possible formally to define δ for M.’

We have further seen that the class of languages accepted by some TM is exactly the class of languages for which some formal grammar can be defined.
Finally, it has been argued that TMs provide a ‘natural’ mechanism for computing the values of multi-argument numeric functions.

Question

What basis is there for accepting the first two assertions?

Certainly there are a number of reasons why these may appear to be ‘exaggerations’.

For example:

a) the available ‘actions’ are very limited: a ‘program’ can change a single symbol at a time, and can only ‘move’ left or right of its currently scanned tape cell;

b) ‘real’ computers have a much more versatile set of ‘instructions’ provided.
It is, however, not of any importance that the actions available to a TM program (i.e. δ) are rather limited (by comparison with a ‘typical’ processor instruction set).

What matters, and goes some way to justify the assertions made, is that:

The capabilities of Turing Machines are sufficient

to simulate

the operation of any ‘real’ computer, whether we look at such as

carrying out a decision process (i.e. recognising a language)

or

calculating a function.
The formal definition of Turing machines can be seen as embodying the following fundamental assertion.

All ‘reasonable’ models of computation in defining ‘effective’ algorithms must be such that:

a) ‘Programs’ specified in the model are finite (one cannot ‘write’ infinite programs).

b) A ‘program’ may only employ ‘operations’ that are within the capability provided by the ‘model’.

c) There can only be a finite number of ‘basic’ operations provided within any such model.

The Church-Turing Hypothesis is a precise formulation of this assertion.
The Church-Turing Hypothesis

Version 1: (Decidability)
If L ⊆ Σ* is accepted within some ‘reasonable’ model of computation then L is recursively enumerable, i.e. there is a TM, M, accepting L.

If L ⊆ Σ* is recognised within some ‘reasonable’ model of computation then L is recursive, i.e. there is a TM, M, accepting all w ∈ L and rejecting all w ∉ L.

Version 2: (Computability)
If f : ( N ∪ {0} )^k → N is a partial function that can be computed within some ‘reasonable’ model of computation then f is a partial recursive function, i.e. there is a TM, M, with f^(k)_M ≡ f.

If f is a total function computed within some ‘reasonable’ model of computation then f is a total recursive function.
Discussion

The Church-Turing Hypothesis may be summarised, informally, as:

"any ‘computational problem’ (function computation, language membership) that can be ‘effectively programmed’ within some ‘reasonable’ model, can be ‘solved’ using a Turing Machine."

This hypothesis cannot be proved.
[It assumes some ‘intuitive view’ of what ‘reasonable model’ means: there is, however, no precise definition that can capture this.]

There is one consequence of the CTH that is, perhaps, not immediate from the wording in terms of TMs being able to replicate the operation of any ‘reasonable model’.
Suppose it can be proved that some language, L, is not recursive (resp. not recursively enumerable).

What would such a result suggest when taken in conjunction with the Church-Turing Hypothesis?

It would indicate that

no effective algorithm whatsoever

could be found to recognise L (resp. accept L).

Thus, a proof that L is not recursive is much more than a result about ‘technical limitations’ of Turing Machines:

in view of the CTH, such a proof constitutes a demonstration of the impossibility of recognising L by any ‘realistic’ algorithm.
Supporting Evidence for the CTH

Although the Church-Turing Hypothesis cannot be proved, its validity has not been ‘seriously’ challenged since its formulation in 1936.

Since then a great number of different ‘models of computation’ have been proposed: many of these bear no resemblance to ‘machine’ models or other ‘computer-like’ systems.

So we have:

λ-calculus
Post Systems
Unlimited Register Machines
Markov Algorithms
Gödel-Herbrand-Kleene Calculus
Horn Clauses
Quantum Turing Machines

etc. etc.
The λ-calculus will have been seen by some of you in COMP205.

Horn Clauses are foundational to Logic Programming (COMP208).

Every one of these systems (and all other models that have been put forward) can be shown to be no more powerful than Turing Machines.

The Church-Turing Hypothesis justifies our subsequent practice of developing Turing Machine descriptions using a

high-level algorithmic description

instead of indicating the form of δ.

The advantages of this approach will be seen in the next section.
COMP209
Automata and Formal Languages
Section 11
Universal Turing Machines
360 Universal Turing Machines
Introduction

Even taking the Church-Turing Hypothesis as a basis for the existence of a TM, M_L, that can accept/recognise any decidable language L, there is one facet of TM description which does not reflect ‘real programming’ practice.

Consider the process of realising, e.g., a Java implementation of some algorithm.

The program is (ultimately) interpreted within a single framework, i.e. the Java Virtual Machine.

It is obvious that one does not ‘create’ a new ‘instance’ of such a machine for every program.
In other words, in ‘standard’ programming environments there is a system, S, that:

given a suitable description of any ‘valid’ program P and input data x, ‘controls’ the execution of P on x.

It is often the case that such systems may be developed using the same HLL as that of the programs ‘controlled’.

For example: Compilers; Operating Systems (Unix and C).

The model for TMs that we have presented, however, specifies the ‘program’, δ, as ‘hard-wired’ into the machine.

In this section we construct a universal Turing Machine, i.e. a single TM that can simulate any given TM, M.
Universal Machines

We first give a precise definition of what will be understood by the concept of ‘universal machine’.

Suppose MC is some model of computation (e.g. Turing Machines, PDA, Java VM) and P some valid program in the model MC (i.e. δ, δ, Java code).

UMC is a Universal Machine for MC if:

u1) UMC is an MC-program.
u2) UMC takes as input any MC-program P and any input w for the program P.
u3) UMC halts on < P, w > ⇔ P halts on w.
u4) For language recognition, < P, w > is accepted/rejected ⇔ P accepts/rejects w.
u5) For function computation, the value returned by UMC equals that returned by P on input w.
Thus, in the specific context of Turing Machines, a Universal Turing Machine (UTM) is one which:

a) Starts with an ‘encoding’ (η(M)w) of a TM M and input w for M on its tape.

b) Halts if and only if M halts on w.

c) UTM accepts/rejects η(M)w ⇔ M accepts/rejects w.

d) f_UTM( η(M)w ) = y ⇔ f_M ( w ) = y.

So if one can construct a universal Turing machine, this can be viewed as a ‘natural’ counterpart to a ‘standard’ stored-program computer.

The remainder of this section describes such a construction.
Design of a Universal Turing Machine

In order to simplify the design it will be helpful to establish some simple technical lemmata.

These in turn will show that:

The alphabets can be restricted to Σ = { 0, 1 } and Γ = { 0, 1, B }.

UTM can be described as a multiple tape machine.

Using the second property, we can dedicate separate tapes to holding

Input
Workspace
Output
etc.
Using only a Binary Alphabet

Given any TM, M = ( Q, Σ, Γ, q0, B, δ, qA, qR ), there is a mapping φ : Γ → { 0, 1 }* and a TM

Mb = ( Qb, { 0, 1 }, { 0, 1, B }, q0, B, δb, qA, qR )

such that

w ∈ L( M ) ⇔ φ( w ) ∈ L( Mb ).

Proof: (Outline)
Suppose Γ − { B } = { γ1, . . . , γk }. Use the word 1^j 0^(k+1−j) to code γj; e.g. if Γ = { 1, 2, 3, 4, B } then 143 would be coded

10000 11110 11100.

Each transition of δ in M is simulated by a sequence of transitions of δb in Mb which will always end with the tape-head of Mb positioned over the start of some block of k + 1 symbols. Any move of M just requires changing the current block of k + 1 symbols scanned by Mb and moving k + 1 places left or right. Since k is constant this is easily achieved.
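The block coding used in this outline can be sketched directly (a minimal illustration; `block_code` is an assumed helper name):

```python
def block_code(symbols):
    """Map the j-th symbol (j = 1..k) of Gamma - {B} to the
    fixed-width binary block 1^j 0^(k+1-j)."""
    k = len(symbols)
    return {s: "1" * (j + 1) + "0" * (k - j) for j, s in enumerate(symbols)}

phi = block_code(["1", "2", "3", "4"])       # Gamma - {B}, so k = 4
coded = " ".join(phi[s] for s in "143")      # the word 143, block by block
```

Since every block has the same length k + 1, Mb can always locate block boundaries simply by counting cells; `coded` here reproduces the example above.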
Multiple Tracks and Multiple Tapes

In a multi-track TM the tape is considered as divided into

k tracks,

each track square recording one symbol.

The tape head reads and prints k-tuples.
[Figure: a 3-track tape. Track 1 holds the input x1 x2 . . . xi . . . xn followed by blanks; Tracks 2 and 3 are initially all blank; the machine M scans one tape square, i.e. one 3-tuple, at a time.]
The use of multiple tracks is simply a

programming ‘trick’

and not a change in the definition of TM we have been using.

Since the number of tracks is fixed (k), all that is being used is the observation that:

If Γ is the alphabet for a 1-track TM, M, then there are

|Γ|^k possibilities

for the k-tuples from Γ on a k-track tape.

i.e. ‘multi-track’ TMs simply use a larger finite alphabet.
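The observation can be checked directly: a k-track symbol is just a k-tuple over Γ, and there are only |Γ|^k of these (a small sketch):

```python
import itertools

gamma = ["0", "1", "B"]     # the 1-track alphabet
k = 3                       # number of tracks
# a k-track 'symbol' is a k-tuple of 1-track symbols
track_alphabet = list(itertools.product(gamma, repeat=k))
# |Gamma|^k tuples: still a finite alphabet, so still an ordinary TM
count = len(track_alphabet)
```

Here `count` is 3³ = 27, confirming that the multi-track machine is an ordinary TM over a larger finite alphabet.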
k-Tape Turing Machines

A k-tape TM employs k ≥ 1 distinct tapes, each of which is scanned by a separate tape head moving independently.

The combination of current state and k scanned symbols determines the symbol written to each tape and the direction each head moves in.

Thus δ is now

δ : Q × Γ^k → Q × Γ^k × { L, R }^k.

The input is held on Tape 1, and to start every tape head is scanning its left-most tape cell.

[A k-tape TM should not be thought of as a ‘parallel’ computer model: its actions are still controlled by a single ‘program’.]
k-tapes vs. 1-tape

Theorem 25:

If L ⊆ Σ* is accepted by a k-tape TM, Mk, then L is r.e., i.e. accepted by some single tape TM, M1.

If L ⊆ Σ* is recognised by a k-tape TM, then L is recursive, i.e. recognised by some single tape TM.

Proof: (Outline)
Given a k-tape TM, Mk, its actions are simulated by a 2k-track single tape TM, M1.

For each tape, Tape_i of Mk, the tape of M1 uses one track to record the contents of Tape_i and one track to record the current position of the tape head for Tape_i.

A move is simulated by M1 scanning its tape to find the k symbols currently scanned on each tape, and then updating the track symbols and head positions.
[Figure: the k tapes of the k-tape Turing machine Mk, with Tape j holding cells c^j_1 c^j_2 c^j_3 . . . c^j_i . . ., each scanned by its own head; and the corresponding 2k-track tape of the 1-tape Turing machine M1, where for each j one track (Head j) marks with ⊕ the position scanned on Tape j, and the next track (Tape j) records its contents c^j_1 c^j_2 c^j_3 . . .]
We may now assume that any TM, M, is

( Q, { 0, 1 }, { 0, 1, B }, q1, B, δ, q2, q3 )

i.e. for any TM, M, we can construct an equivalent machine, M’, which employs a binary alphabet, has q1 as its start state, q2 as its accept state, and q3 as its reject state.

A TM satisfying this is in standard form:

Standard = { M : M is in standard form }

Theorem 26: There is an ‘encoding’ scheme,

η : Standard → { 0, 1 }*

such that

Lcode = { η( M ) : M ∈ Standard }

is recursive, i.e. we can build a TM that halts and accepts any input that defines the encoding of some TM in standard form, and halts and rejects any input not corresponding to such an encoding.
Proof: The key observation is that the actions of M ∈ Standard are completely described by its transition function.

There is no need explicitly to encode the fact that q1 is the start state and q2, q3 the halt states: for M in standard form this is always the case.

Thus, to describe M, all that η( M ) must represent is the set of moves

δ( qi, σ ) = ( qj, γ, D ), D ∈ { L, R }.

Since |Γ| = 3 there are exactly 3( |Q| − 2 ) such moves (there are no transitions from halt states).

Let Γ = { 0, 1, B } = { γ1, γ2, γ3 }. Each move δ( qh, γi ) = ( qj, γk, D ) is encoded as a binary word

move_h,i = 0^h 1 0^i 1 0^j 1 0^k 0 if D = L
move_h,i = 0^h 1 0^i 1 0^j 1 0^k 00 if D = R     (m1)

Finally, η( M ) is given by

111 move_1,1 11 move_1,2 11 . . . 11 move_|Q|,2 11 move_|Q|,3 111
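A sketch of this encoding in Python (hypothetical helper names; each move is given as a tuple (h, i, j, k, D) for δ( qh, γi ) = ( qj, γk, D )):

```python
def encode_move(h, i, j, k, D):
    """Binary code (m1) for delta(q_h, gamma_i) = (q_j, gamma_k, D):
    0^h 1 0^i 1 0^j 1 0^k, then 0 for D = L or 00 for D = R."""
    return ("0" * h + "1" + "0" * i + "1" + "0" * j + "1" + "0" * k
            + ("0" if D == "L" else "00"))

def encode_machine(moves):
    """eta(M): the move codes separated by 11 and bracketed by 111."""
    return "111" + "11".join(encode_move(*m) for m in moves) + "111"
```

For instance, `encode_move(1, 1, 2, 2, "L")` gives `0101001000`: one 0 for q1, a 1, one 0 for γ1, a 1, two 0s for q2, a 1, two 0s for γ2, then the trailing 0 for a left move.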
To see that the language Lcode is recursive, it suffices to observe that, given w ∈ { 0, 1 }*, for w to be in Lcode all of the following must be true:

a) w begins and ends with the word 111.
b) w consists of words move_h,i separated by 11.
c) There are exactly 3 words move_h,? for any qh.

Given w ∈ { 0, 1 }* that satisfies (a), (b) and (c), it is easy to check that each move_h,i sub-word is valid, i.e. that it observes the form of (m1), with

1 ≤ i ≤ 3 (γi ∈ { 0, 1, B })

1 ≤ k ≤ 2 (γk ∈ { 0, 1 })

move_h,i ending with 0 or 00.
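The structural part of this check can be sketched with a regular expression (a partial validator only: condition (c), three moves per state, and the index bounds on i and k are omitted):

```python
import re

# 0^h 1 0^i 1 0^j 1, then 0^k fused with the trailing 0 or 00,
# so the final 0-run has length at least 2
MOVE = re.compile(r"0+10+10+10{2,}")

def looks_like_code(w):
    """Necessary-condition check for membership of Lcode:
    w is bracketed by 111 and splits into 11-separated move words."""
    if not (w.startswith("111") and w.endswith("111") and len(w) >= 7):
        return False
    body = w[3:-3]
    return all(MOVE.fullmatch(m) for m in body.split("11"))
```

Because a move word never contains two adjacent 1s, the 11 separators are unambiguous and `split("11")` recovers the move words exactly.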
Theorem 27: There is a Turing machine, UM, which given u ∈ { 0, 1 }* acts as follows:

U1) If u = η(M)x for some η( M ) ∈ Lcode, then UM simulates M on input x, i.e.

UM halts and accepts (rejects) η(M)x
⇔
M halts and accepts (rejects) x

UM fails to halt on η(M)x
⇔
M fails to halt on x

U2) If u ≠ η( M )x for every η(M) ∈ Lcode, UM does nothing.
Proof: (Outline)
UM is formed as a 3-tape machine:

Tape 1 holds the input word u ∈ { 0, 1 }*.
Tape 2 will hold the contents of M’s tape (assuming u = η(M)x).
Tape 3 holds the word 0^i to indicate that M (at this stage) is in state qi.

To start, UM checks if u = η(M)x and, if so, copies x to Tape 2 and writes 0 to Tape 3 (the start state q1 of M). The tape head on Tape 2 is set to location 1.

For each move of M, UM looks up the current state and the symbol scanned on Tape 2 in the encoding η(M). Using these data, UM can update Tape 2 and the recorded state.

If the state recorded on Tape 3 is 00 (accept) or 000 (reject), then UM halts and accepts (00), resp. halts and rejects (000).
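The simulation loop UM performs can be sketched with the transition table held as a dictionary (an illustrative model, not the 3-tape construction itself; state 1 is the start state, and an optional step bound stands in for ‘may fail to halt’):

```python
def simulate(delta, q_accept, q_reject, word, max_steps=10_000):
    """Run delta : (state, symbol) -> (state, symbol, direction)
    from state 1 on the given word; 'B' is the blank symbol."""
    tape = dict(enumerate(word))
    q, pos, steps = 1, 0, 0
    while q not in (q_accept, q_reject):
        if steps == max_steps:
            return None                  # undetermined within the bound
        symbol = tape.get(pos, "B")
        q, tape[pos], d = delta[(q, symbol)]
        pos += 1 if d == "R" else -1
        steps += 1
    return q == q_accept                 # True = accept, False = reject

# A toy machine (accept state 2, reject state 3) accepting words
# whose first symbol is 1:
delta = {(1, "1"): (2, "1", "R"),
         (1, "0"): (3, "0", "R"),
         (1, "B"): (3, "B", "R")}
```

The design choice mirrors UM: the ‘program’ `delta` is ordinary input data to `simulate`, just as η(M) is ordinary tape content for UM.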
COMP209
Automata and Formal Languages
Section 12
Undecidable Languages
Undecidable Languages 377
Introduction

Every language that we have seen on the module so far is

Recursively Enumerable.

In fact, they have all been

Recursive.

In a ‘practical’ computing context, this equates to these languages having ‘effective’ decision algorithms, i.e. we can

‘write a program’

in, say, Java, that given a binary word, w, as its input:

a) Always comes to a halt.
b) Returns the result accept if w is in L.
c) Returns the result reject if w is not in L.

for any of the specific L seen so far.
Albeit that only a very small number of different languages have been viewed, one might conjecture from these examples that:

a) All languages L ⊆ { 0, 1 }* are recursive (decidable),

or, failing this,

b) All languages L ⊆ { 0, 1 }* are r.e. (semi-decidable),

or, failing this,

c) All ‘interesting’ languages are decidable,

or, failing this,

d) All ‘interesting’ languages are semi-decidable.
Leaving aside the (arguably subjective) notion of what constitutes an ‘interesting’ language, what can be said about the first two, i.e.

Is it the case that all languages are decidable?
or
(at least) semi-decidable?

Before addressing these, it is worth noting that there are two approaches to proving that neither is true:

1) By an existence proof: i.e. giving an ‘indirect’ argument that there must be some languages that are not r.e. (semi-decidable).

2) By a proof that an explicitly defined language is not decidable.
Explicitly Defined Languages

Suppose L ⊆ { 0, 1 }* is suspected to be a non-r.e. language, i.e. not semi-decidable.

L must contain infinitely many words.

How can we ‘describe’ L?

We cannot ‘list’ all of the words in L.
We cannot give a grammar for L.
We cannot give a program (e.g. TM) for L.

(because either of the last two would imply L is semi-decidable)

Informally, L is an explicitly defined language if there is a

finite description

that characterises which words belong to L.

In trying to find explicitly defined non-r.e. languages, such descriptions must be ‘ad hoc’, i.e. non-computational.
Some Examples

Two ‘non-computational’ definitions we have seen already are:

{ 1^k : k is a prime number }

{ 1^k 0^(k²) : k ≥ 1 }

Such descriptions do not explain:

given 1^k, how to show if k is prime;
given 1^i 0^j, how to show if j = i².

Each can be described by a TM and by a formal grammar.

A finite description of a language L may give no indication of how to construct a decision algorithm for L.

[Regarding ‘interesting’ languages, that these be ‘explicitly defined’ is a minimal criterion.]
Using Closure Properties

The existence of CFLs which are not deterministic was proved by showing that DCFLs are closed under complementation.

Since it was known that CFLs (in general) do not have this property, an explicit construction of L ∈ CFL − DCFL would follow from a proof that

L ∈ CFL but Co−( L ) ∉ CFL.

Can we use a ‘similar’ indirect method to assist in finding explicitly defined non-r.e. languages?

Theorem 28: (Closure Properties of Recursive Languages)
If L1, L2 are recursive (i.e. decidable) then so are:

L1 ∪ L2 ; L1 ∩ L2 ; Co−( L1 ).

Proof: Easy exercise.
On the other hand:

Theorem 29: (Closure Properties of r.e. Languages)
If L1, L2 are r.e. (i.e. semi-decidable) then so are:

L1 ∪ L2 ; L1 ∩ L2.

If L is r.e. but not recursive then

Co−( L ) is not r.e. (semi-decidable).

Proof: Let M1, M2 accept L1, L2.
M∪ accepts w ∈ L1 ∪ L2 by alternating a simulation of one move of M1 on w with one move of M2 on w. If either simulation reaches its accept state, M∪ halts and accepts.

L1 ∩ L2: exercise.
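The alternating (‘dovetailed’) simulation used for M∪ can be sketched by modelling each accepter as a generator that yields True exactly when it accepts (illustrative names; if neither machine accepts, the loop runs forever, which is exactly semi-decidability):

```python
def union_accepter(run1, run2, w):
    """M-union: alternate one simulated step of each machine on w;
    halt and accept as soon as either accepts."""
    gens = [run1(w), run2(w)]
    while True:
        for g in gens:
            if next(g):
                return True

def never(w):
    """A machine that runs forever without accepting."""
    while True:
        yield False

def after_three(w):
    """A machine that accepts on its third step."""
    yield False
    yield False
    yield True
```

For example, `union_accepter(never, after_three, "w")` accepts, even though the first machine never halts: the step-by-step alternation means neither simulation can starve the other.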
The r.e. languages, however, are not closed under complement.

Suppose L is r.e. but not recursive.

If Co−( L ) were r.e. there would be TMs

ML accepting L
MCL accepting Co−( L ).

Consider the TM, M, that on input w alternates a simulation of ML on w with a simulation of MCL on w.

Certainly exactly one of these must reach its accept state:

if ML accepts w, then M halts and accepts;
if MCL accepts w, then M halts and rejects.

This implies that L is recursive: a contradiction, whence Co−( L ) cannot be r.e.
It follows from Theorem 29 that an explicitly defined language which is

not semi-decidable

can be constructed from an explicitly defined language, L, if it can be proved that:

L is semi-decidable
and
L is not decidable.

Since for such languages L the language

{ w : w ∉ L }

is not semi-decidable.
The Halting Problem (for Turing Machines)

One aspect of the encoding function η( M ) for Turing machines in standard form is that it provides a means by which

computational questions

concerning the behaviour of

specific Turing machine programs

can be formulated.

Examples
The question ‘Does M accept ε?’ is equivalent to deciding

η( M ) ∈? { η( M ) : ε ∈ L( M ) }

The question

‘Does M make at least n² moves on some w of length n, for every n?’

is equivalent to deciding

η( M ) ∈? { η( M ) : ∀ n ∃ w ∈ { 0, 1 }^n on which M makes ≥ n² moves }
Notice that questions of this form,

‘Does this Turing machine belong to a particular class of Turing machines?’,

and the existence or otherwise of decision algorithms that treat them, are of much greater significance than merely technical questions about one formalism.

Suppose we take any high-level programming language and an appropriate abstract platform upon which programs may be executed. We may equally phrase such questions as

‘Does this program have a particular behaviour?’

e.g.

‘Does this program compute a specific function?’
‘Does this program always terminate?’
‘Does this program behave identically to another?’

etc. etc.
We shall return to the implications of this interpretation later.

We now consider a particular property of Turing machines and its associated decision (language membership) problem.

This is known as

The Halting Problem

and is the language (over { 0, 1 }*):

LHP = { u : u = η( M )w with M ∈ Standard, and M on input w halts }

Thus, LHP comprises the set of

pairs of TM programs (M) and inputs (w)

for which a positive answer would be returned to the question:

‘Does this Turing machine, M, eventually reach one of the halting states (qA or qR) when given the input w?’
Theorem 30: LHP is undecidable, i.e. there is no TM, M, such that

∀ w ∈ LHP, M halts and accepts,
and
∀ w ∉ LHP, M halts and rejects.     (HP)

Proof: Suppose, by way of contradiction, that there is a TM, MHP, such that

∀ w ∈ LHP, MHP halts and accepts,
and
∀ w ∉ LHP, MHP halts and rejects.

Without loss of generality, let MHP be in standard form.

Now consider a TM, MNOT−HP, whose behaviour is the following:
a) MNOT−HP checks if its input u is in Lcode, i.e. takes the form η( M ) (an encoding of a TM, M).

b) If u ∉ Lcode then

MNOT−HP goes into an infinite loop;

otherwise (// u = η(M) ∈ Lcode)

MNOT−HP simulates the actions of MHP on input η( M )η( M ).

c) If MHP would accept η(M)η(M):

MNOT−HP goes into an infinite loop.

If MHP would reject η(M)η(M):

MNOT−HP halts and accepts.
Notice that the construction of MNOT−HP assumes only the existence of a TM, MHP, deciding the Halting Problem language LHP.

MHP is used as a ‘sub-procedure’ by MNOT−HP in order to determine if

the program, (M),
(given in the encoded form η(M)),

halts when

given its own description, η(M), as input.

Thus MNOT−HP determines the answer to the question:

‘Does this program halt when given its own description as input?’

If the answer is ‘Yes’: MNOT−HP enters an infinite loop;
If the answer is ‘No’: MNOT−HP halts and accepts.
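The same diagonal construction can be replayed against any claimed halting tester written in, say, Python (a sketch; `halts` is the hypothetical decider, and only the ‘No’ branch can safely be run):

```python
def make_spoiler(halts):
    """Build the analogue of M_NOT-HP for a claimed total decider
    halts(f): loop if halts says f halts, halt otherwise."""
    def spoiler():
        if halts(spoiler):
            while True:          # answer 'Yes' -> enter an infinite loop
                pass
        return "halted"          # answer 'No'  -> halt (and accept)
    return spoiler

# A decider that always answers 'does not halt' is refuted by
# running its own spoiler, which promptly halts:
s = make_spoiler(lambda f: False)
outcome = s()                    # halts, contradicting the decider
# (A decider answering 'halts' would make s() loop forever instead,
#  so no total halts() can be correct on its own spoiler.)
```

Whatever total `halts` is supplied, its spoiler does the opposite of what `halts` predicts for it, mirroring the contradiction developed on the following slides.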
[Note: there is nothing ‘exceptional’ as regards a program, P, being given its own description as input. This is common in ‘practical’ computing: a Pascal compiler may be written in Pascal; an editor may edit its own source code, etc.]

Question
What is the input for MNOT−HP?

Answer
A TM program description, η( M ).

Question
What is MNOT−HP?

Answer
A TM (in standard form).

∴ MNOT−HP has an encoding

η( MNOT−HP ) ∈ Lcode

and this code can be given to MNOT−HP itself as a possible input.
What would happen in this case? i.e. what happens when MNOT−HP is given

its own description, η( MNOT−HP ),

as input?

The description of MNOT−HP’s operation shows that there are only 2 possibilities:

Either

MNOT−HP enters an infinite loop on η( MNOT−HP )

or

MNOT−HP halts and accepts η( MNOT−HP ).
Suppose

MNOT−HP enters an infinite loop on η( MNOT−HP ).

This means that, at Step (c), the simulation of MHP indicates that

η( MNOT−HP )η( MNOT−HP )

would be accepted (by MHP).

MHP, however, is assumed to decide the Halting Problem language, LHP, and so

η( MNOT−HP )η( MNOT−HP ) ∈ LHP,

i.e. the TM MNOT−HP halts on the input η( MNOT−HP ).

This contradicts the premise:

MNOT−HP enters an infinite loop on η( MNOT−HP ).
On the other hand, suppose

MNOT−HP halts and accepts η( MNOT−HP ).

This means that, at Step (c), the simulation of MHP indicates that

η( MNOT−HP )η( MNOT−HP )

would be rejected (by MHP).

Hence

η( MNOT−HP )η( MNOT−HP ) ∉ LHP,

i.e. the TM MNOT−HP does not halt (i.e. enters an infinite loop) on the input η( MNOT−HP ).

Again this contradicts the premise:

MNOT−HP halts and accepts η( MNOT−HP ).
In summary, the definition of the TM MNOT−HP is such that there is

no consistent outcome when its input is η( MNOT−HP ).

From the assumption that MHP recognises LHP, and the design of MNOT−HP, we see

MNOT−HP halts on η( MNOT−HP )
⇔
MNOT−HP does not halt on η( MNOT−HP ).

So we deduce that MHP cannot be constructed,

i.e. LHP is undecidable.
The Halting Problem language, LHP, is, however,

semi-decidable (recursively enumerable)

i.e. there is a TM, M^(a)_HP, that halts and accepts inputs η( M )w whenever the TM M halts on w; M^(a)_HP may, however, fail to halt when M does not halt on w.

[All that M^(a)_HP does, having checked that its input is of the form η( M )w, is to simulate the machine M on input w: if the simulation halts then M^(a)_HP halts and accepts.]

This gives the following,

Corollary: The language Co−( LHP ), i.e.

{ u : u = η( M )w with M ∈ Standard, and M on input w does not halt }

is not semi-decidable (i.e. not r.e.).
Discussion

The exact result established in Theorem 30 may be, informally, phrased as:

‘It is not possible to construct a Turing machine (program) that can distinguish Turing machine (programs) that halt on a given input from those that fail to halt on a given input.’

On the surface, this merely seems to be an eclectic technical detail regarding the scope and power of Turing machine programs.

The Church-Turing Hypothesis implies that this result has more far-reaching consequences.
Two Consequences of Theorem 30 from the Church-Turing Hypothesis

‘It is not possible to construct

any ‘effective’ algorithm

that can distinguish TMs that halt on a given input from those that fail to halt on a given input.’

If S is any ‘reasonable’ model of computation that is ‘at least as powerful’ as Turing Machines, then:

‘It is not possible to construct

any ‘effective’ algorithm

that can distinguish programs in the system S that halt on a given input from those that fail to halt on a given input.’
In other words, not only is

‘the Halting Problem for Turing Machines’

impossible to solve by using ‘effective’ algorithms, but so is

any analogous ‘Halting Problem’ on any general model of computation.

For example: for any sufficiently general high-level programming language (Java, Ada, etc.) there will never be algorithms that can distinguish between programs that halt on given input data and those which fail to halt.

So, for example, it is not possible to ‘embed’ within a compiler a test for whether a source program and data might loop indefinitely.
Summary

In proving ‘undecidability’ properties of languages, our concern is to argue that

no ‘effective’ algorithmic method exists.

Turing Machines (by virtue of the Church-Turing Hypothesis) provide one model upon which to build such arguments.

The Church-Turing Hypothesis contends:

If there is no Turing machine deciding (semi-deciding) some language L, then there is

no ‘effective’ algorithm whatsoever

for deciding (semi-deciding) L.

i.e. The critical point is not merely

the impossibility of a TM program

but

the impossibility of any program (i.e. algorithm).
Undecidable Languages ‘related’ to Halting Problems

The construction used in the proof of Theorem 30 relies on treating the encoding of a particular TM (MNOT−HP) as both a

program

and an

input word for a program.

This was all right since the general Halting Problem, i.e. any TM with any input word, was being examined.

From what has been presented, however, it might be argued that:

‘Even though, in general, there is no method that ‘works’ for every combination of program and input, it might be possible to design algorithms that work for every program provided that only certain input words to these are tested.’
For example, in the context of trying to embed a ‘halting test’ into a high-level language compiler, Theorem 30 leaves open whether deciding halting on, e.g., empty input data is possible.

In fact, even such ‘simplified’ Halting Problems are undecidable.

For w ∈ { 0, 1 }* the language L^w_HP is

L^w_HP = { u : u = η( M ) with M ∈ Standard, and M on input w halts }

Theorem 31: ∀ w ∈ { 0, 1 }*, L^w_HP is undecidable.
Proof: Suppose that M^w_HP recognises L^w_HP, i.e.

∀ u ∈ L^w_HP, M^w_HP halts and accepts,
and
∀ u ∉ L^w_HP, M^w_HP halts and rejects.

M^w_HP can be used to decide the general Halting Problem, LHP, as follows:

Build a TM, Reduce, which given an input of the form η( M )x proceeds by constructing a new TM, M’, using η( M ) and x.

The machine M’ simply simulates the actions of M on the input x.

Having constructed M’, Reduce then tests if η( M’ ) ∈ L^w_HP, using M^w_HP.

Notice that M’ ignores any word on its input.
Reduce:

halts and accepts η(M)x if M^w_HP halts and accepts η( M’ );
halts and rejects η(M)x if M^w_HP halts and rejects η( M’ ).

Exactly one of these must occur, i.e. M^w_HP always reaches some halting state.

But,

M^w_HP halts and accepts η( M’ )
⇔ M’ halts (on input w)
⇔ M halts on input x.

Similarly,

M^w_HP halts and rejects η( M’ )
⇔ M’ does not halt (on input w)
⇔ M does not halt on input x.
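The machine M’ that Reduce builds can be sketched with programs modelled as Python functions (illustrative only: the real construction manipulates encodings η(M), not closures):

```python
def build_M_prime(M, x):
    """The machine M' constructed by Reduce: it ignores its own
    input word and simply runs M on the fixed word x."""
    def M_prime(_input_word):
        return M(x)              # halts iff M halts on x
    return M_prime

# A toy 'machine' that always halts, returning its input's length:
M = lambda word: len(word)
M_prime = build_M_prime(M, "101")
```

Because M’ ignores its input, it halts on w (indeed on every word) exactly when M halts on x, which is why a decider for halting-on-w would yield a decider for the general Halting Problem.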
406 Undecidable Languages
Thus if Reduce could be built then L_HP would be decidable.

From Theorem 30, L_HP is undecidable, so the TM Reduce cannot be constructed.

Since the only assumption made in its definition is that a TM recognising L^w_HP exists, we conclude that L^w_HP is undecidable.
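The reduction above can be sketched in Python, modelling Turing machines as plain functions (halting = returning a value, looping = never returning). All names here, including the machine model itself, are illustrative and stand in for the formal encoding η(M); no genuine decider M^w_HP can exist, which is the point of the theorem.

```python
def make_M_prime(M, x):
    """Build M': a machine that ignores its own input word and
    simply simulates M on the fixed word x.
    So M' halts on any input (in particular on w) iff M halts on x."""
    def M_prime(_ignored_input):
        return M(x)
    return M_prime

def reduce_halting(M_w_HP, M, x):
    """Decide 'does M halt on x?' using a hypothetical decider
    M_w_HP(M') for 'does M' halt on the fixed word w?'."""
    M_prime = make_M_prime(M, x)
    # M' halts on w  <=>  M halts on x, so the two answers agree.
    return M_w_HP(M_prime)
```

For machines that do halt, a naive "decider" that just runs its argument on w illustrates the agreement between the two questions.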
In addition to the result that ‘restricted Halting Problems’ of this type remain undecidable, there is one important feature of the proof that should be noted:

Use of reduction: i.e. defining an algorithm for one problem (L_HP) using an algorithm for another (L^w_HP).
Informally, it provides a means for demonstrating a property of a language L' by relating it to another language L.
Deciding Properties of Languages:
Rice’s Theorem
An important consequence of the existence of a universal Turing machine is that we can define a universal language, Luniv ⊆ {0, 1}*, i.e.

Luniv = { u : u = η(M)x, η(M) ∈ Lcode, x ∈ L(M) }

This language is recursively enumerable (semi-decidable) but it is not recursive (decidable).
A further important consequence of the encoding mechanism is that it provides a computational process for examining properties of r.e. languages, i.e.

L ⊆ {0, 1}* is r.e. ⇔ ∃ M with L(M) = L
⇔ (∀ x ∈ L, η(M)x ∈ Luniv and ∀ x ∉ L, η(M)x ∉ Luniv)
Thus identifying r.e. languages with the encoding of any TM accepting them provides a means for examining questions such as,
Is ε ∈ L? Is L = ∅? Is L recursive, context-free, regular? etc.
via the language of related TM encodings:
So, if M is such that L(M) = L, the questions above correspond to,
η(M) ∈? { η(M) : ε ∈ L(M) }
η(M) ∈? { η(M) : L(M) = ∅ }
η(M) ∈? { η(M) : L(M) is recursive }
Properties and Families of Languages
Recall that a family of languages is just a subset, ℜ, of the set of all possible languages over some alphabet, Σ.
Thus, the question, ‘Does the language L have a particular property Π?’ is equivalent to, ‘Is L a member of the family, ℜΠ, of languages having property Π?’
We now examine the issue of which properties of the r.e. languages it is possible to develop decision algorithms for, i.e. given some r.e. language L, for which properties, Π, can one decide if L ∈ Π?
Some Example Properties
We use Σ = {0, 1}.
The property of L being empty is the property, Π∅, containing exactly one language, i.e.

Π∅ = { ∅ }
The property of L containing the empty word is

Πε = { L ⊆ {0, 1}* : ε ∈ L }
The property of L being a unary language is,

Πunary = { L ⊆ {0, 1}* : L ⊆ 1* or L ⊆ 0* }
Very Important
The property Π∅ - the empty language - is not the same as the empty property.

The family ℜ∅ corresponding to the former contains exactly one language (i.e. the empty language ∅).

The family corresponding to the latter contains no languages at all.
Of course the phrasing,

‘Given some r.e. language L and property Π; decide if L ∈ Π?’

is not usable in the sense of formulating a decision question.
2 issues have to be addressed:

a) How is a language, L, to be presented?
b) How is a ‘property’, Π, to be viewed?
Without loss of generality, it suffices to consider languages over {0, 1}.
Furthermore, since a finite description of a language has to be given, we can only consider properties of recursively enumerable languages.
[A general algorithm testing if L ∈ Π must be able to treat descriptions of different L in some ‘uniform’ manner: this rules out trying to interpret ‘ad hoc’ natural language definitions.]
Concentrating on properties of r.e. languages now gives a solution to (a) and (b) above.
a) L is r.e. if and only if there is a TM, M, in standard form accepting L.
∴ The question L ∈ Π can be viewed as a decision problem concerning Turing machine (encodings) - η(M).
Similarly, a property, Π, is a subset of (r.e.) languages, and so Π can be interpreted as a set of Turing machine (encodings).
So the question, ‘Is L ∈ Π?’ can be addressed by defining,

LΠ = { η(M) : L(M) ∈ Π }

and then, given ML with L(ML) = L, deciding L ∈ Π is equivalent to deciding η(ML) ∈? LΠ
Examples
For Π∅ being the property ‘L is the empty language’:

LΠ∅ = { η(M) : L(M) = ∅ }

(i.e. for all inputs x, M never reaches its accept state on x).
For the empty property:

LΠ = ∅

(i.e. no r.e. language (TM) would be accepted).
For the property ΠR.E. of L being r.e.:

LΠR.E. = { η(M) : L(M) is r.e. }

(i.e. every r.e. language (TM) is accepted).
The ‘Trivial’ Properties
The 2 properties,

The Empty Property
The Property of being r.e.

will be denoted subsequently by ℵ and R.E. Thus,

ℵ = ∅
R.E = { L : L is r.e. }
Both of the corresponding languages

Lℵ = ∅
LR.E. = { η(M) : L(M) is r.e. }

are decidable (recursive).
For Lℵ, a TM simply enters its reject state on every input. For LR.E., a TM simply checks if its input is a TM encoding and, if so, enters its accept state (rejecting otherwise). Hence

LR.E. ≡ Lcode.
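The two deciders just described can be sketched directly. Here `looks_like_tm_encoding` is purely an illustrative stand-in for the real membership test for Lcode (for the sketch, an "encoding" is taken to be any non-empty binary string beginning with 1):

```python
def decide_L_aleph(u):
    """Decider for L_aleph = the empty set: reject every input."""
    return False

def looks_like_tm_encoding(u):
    """Illustrative stand-in for membership in Lcode: a non-empty
    binary string beginning with '1' (an assumption, not the
    module's actual encoding scheme)."""
    return len(u) > 0 and u[0] == "1" and set(u) <= {"0", "1"}

def decide_L_RE(u):
    """Decider for L_R.E. = Lcode: accept exactly the TM encodings."""
    return looks_like_tm_encoding(u)
```

Both functions halt on every input, which is exactly what makes these two properties ‘trivial’ to decide.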
These properties - ℵ and R.E. - are called

Trivial Properties
The reason being that:

In the case of ℵ: no r.e. language has it.
In the case of R.E.: every r.e. language has it.
Thus, the decision processes η(M) ∈ Lℵ and η(M) ∈ LR.E. are ‘trivial’.
Again we emphasise that The Empty Property (ℵ) and The Property of being an Empty Language (Π∅) are different.
Notice that the property Π∅ of a language being empty has been considered earlier in the context of a language description being supplied in the form of a DFA or CFG.
In both cases there were easy decision methods available.
When L is presented as a TM encoding η(M), the question

‘Does M accept any words?’ (i.e. ‘Is η(M) ∈ LΠ∅?’)

is not straightforward.
As a simple illustrative exercise consider the following r.e. languages:
Fermat = { 1^n : n > 2 and ∃ x, y, z ≥ 1 with z^n = x^n + y^n }

G’bach = { 1^(2n) : n > 1 and there are no primes p, q with p + q = 2n }

(G’bach is actually recursive)
What would a decision method for LΠ∅ imply in these cases?
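A semi-decision procedure for Fermat only has to enumerate candidate triples. The bounded search below is a sketch of that enumeration; the `bound` cap is added purely so the illustration terminates, whereas a true semi-decider would keep searching without it:

```python
def fermat_witness(n, bound):
    """Search for x, y, z with 1 <= x <= y < z <= bound and
    x**n + y**n == z**n; return the first triple found, else None."""
    for z in range(2, bound + 1):
        zn = z ** n
        for x in range(1, z):
            for y in range(x, z):
                if x ** n + y ** n == zn:
                    return (x, y, z)
    return None  # no witness up to this bound; a semi-decider keeps going
```

For n = 2 the search quickly finds the Pythagorean triple (3, 4, 5); for any n > 2 it runs forever in vain, by Fermat’s Last Theorem. A decider for LΠ∅ applied to a TM accepting Fermat would settle, in advance, whether any witness exists at all.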
We now have a complete framework with which to consider the question:

‘Which properties, Π, of r.e. languages are decidable?’
That is, for Π a set of r.e. languages, when is it the case that

LΠ = { η(M) : L(M) ∈ Π }

is decidable (recursive)?
The final result that will be proved in this module is

Rice’s Theorem

LΠ is decidable ⇔ Π is a Trivial Property.

Thus, only Lℵ and LR.E are decidable.
Proof of Rice’s Theorem
It has already been shown that if Π is trivial then LΠ is decidable. To complete the proof it must be shown that: if LΠ is decidable then Π = ℵ or Π = R.E.
Suppose the contrary, i.e. there is some property, Π, such that

Π ≠ ℵ (i.e. Π is not empty) and Π ≠ R.E (i.e. Π ⊊ R.E);
LΠ is decidable.
From the fact that Π ≠ ∅, it follows that Π contains at least one r.e. language. From the fact that Π ≠ R.E, it follows that there is at least one r.e. language not in Π.
Since LΠ is decidable, there is a TM, MΠ, such that

∀ η(M) with L(M) ∈ Π, MΠ halts and accepts η(M);
∀ η(M) with L(M) ∉ Π, MΠ halts and rejects η(M).
Using the TM, MΠ, we show that the universal language,

Luniv = { u : u = η(M)x, x ∈ L(M) }

is decidable.
Since we know that Luniv is not decidable, this would imply the contradiction needed to complete the proof.
First observe we may assume that the Empty Language is not one of the r.e. languages in Π:
[If ∅ ∈ Π, then we can use the property R.E − Π, i.e. the property ‘L is r.e. and L ∉ Π’. This property is not trivial, does not contain the empty language, and is decidable if LΠ is decidable.]
Let L be any r.e. language in Π (thus, L ≠ ∅).
In summary, we have so far:

MΠ, a TM recognising LΠ (by the assumptions made earlier);
L ∈ Π, a (non-empty) r.e. language in Π;
ML, a TM accepting L (since L is r.e.).
We now show how to combine ML and MΠ in order to build a TM, Mu, that on input y = η(M)x:

halts and accepts if x ∈ L(M);
halts and rejects if x ∉ L(M),

i.e. Mu would prove that Luniv is decidable.
The TM, Mu, behaves as follows given η(M)x:

1) Mu uses η(M)x to compile the description of a TM, Check(M, x), that acts as follows on input w ∈ {0, 1}*:

2) Check(M, x):
a) Simulates M on input x;
b) If x ∉ L(M) then Check(M, x) enters an infinite loop;
c) If x ∈ L(M) (M halts and accepts x) then Check(M, x) simulates ML on input w; Check(M, x) halts and accepts w only if ML halts and accepts w.

3) Having constructed Check(M, x), Mu then simulates MΠ with input η(Check(M, x));
If MΠ halts and accepts then Mu halts and accepts η(M)x;
If MΠ halts and rejects then Mu halts and rejects η(M)x.
(Figure: The Machine Check(M, x). M is simulated on input x; on YES, ML is run on the input w and its YES/NO answer becomes the answer of Check; on NO, Check enters LOOP.)
What do we know about the TM, Check(M, x)?
Suppose that x is not accepted by M: then Check(M, x) accepts no words at all (its input w is never read).

∴ x ∉ L(M) ⇒ L(Check(M, x)) = ∅

If x is accepted by M then Check(M, x) accepts exactly the same language as ML.

∴ x ∈ L(M) ⇒ L(Check(M, x)) = L(ML)
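The behaviour of Check(M, x) can be sketched in the same informal function model of TMs used earlier (accept = return True, reject = return False, loop = never return); `make_check` and its arguments are illustrative names only, not part of the formal construction:

```python
def make_check(M, x, M_L):
    """Build Check(M, x): on input w, first simulate M on x.
    If M rejects x, enter an infinite loop, so no word is accepted
    and L(Check) is empty.  If M accepts x, behave exactly like
    M_L on w, so L(Check) = L(M_L) = L."""
    def check(w):
        if not M(x):
            while True:      # x not in L(M): accept no words at all
                pass
        return M_L(w)        # x in L(M): accept exactly L(M_L)
    return check
```

Feeding η(Check(M, x)) to MΠ then answers ‘x ∈ L(M)?’, since L(Check(M, x)) lies in Π exactly when x ∈ L(M).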
Either

L(Check(M, x)) = ∅ (when x ∉ L(M))

or

L(Check(M, x)) = L(ML) = L (when x ∈ L(M)).
Since L ∈ Π and ∅ ∉ Π, we see that the TM, Mu:

halts and accepts η(M)x if η(Check(M, x)) ∈ LΠ, i.e. if x ∈ L(M);
halts and rejects η(M)x if η(Check(M, x)) ∉ LΠ, i.e. if x ∉ L(M),
so if LΠ were decidable, then Mu decides Luniv.

This contradiction establishes that LΠ is not decidable.