COMP209
Automata and Formal Languages
Section 1
Introduction
Consider the picture below: a 'black-box' reads an input stream and produces an output stream,

    x1 x2 . . . xk   →   [black-box]   →   y1 y2 . . . yk

where
    y1 = f (x1)
    y2 = f (x1x2)
    . . .
    yk = f (x1x2. . .xk)

• x1, x2, . . .: (finite) input stream (of symbols);
• A 'black-box' reads these and computes an output yi = f (x1x2. . .xi−1xi)
Informally, one concern of Automata Theory is:
What 'happens' inside the 'black-box'?
e.g. how can the following input-output needs be realised?
Some Simple Example Cases
Suppose the input symbols are single decimal digits, i.e.
xi ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
a) The output yk is 1 if x1x2. . .xk−1xk is a k-bit binary string; otherwise it is 0.
If the x symbols are from the set of binary values {0, 1}:
b) The output yk is 1 if more than 3 symbols are read.
c) The output yk is 1 if an odd number of symbols have been read.
Some ‘more difficult’ Examples
Again using {0, 1} as the input possibilities:
d) yk is 1 if the sequence x1x2. . .xk is a palindrome, i.e.
x1x2. . .xk−2xk−1xk = xk xk−1xk−2. . .x2x1
e) yk is 1 if x1. . .xk has exactly the same number of 0s as it has 1s.
Using {(, )} (left and right brackets) as possible inputs:
f) yk is 1 if the sequence x1x2. . .xk is a properly matched sequence of left and right brackets, e.g.
() or ()() or (()()) or (()(()())) etc.
but not, e.g. ( or ) or )( or (() or ()()) etc.
and ‘more difficult still’ Examples
Input symbols {0, 1}:
g) yk is 1 if k symbols are read and k is a prime number.
h) yk is 1 if
k = i + j and
x1x2. . .xi = 111. . .111
xi+1xi+2. . .xk = 000. . .000
(i.e. i 1s followed by j 0s)
and j = i².
We shall see that there is a very precise sense in which
(d), (e) and (f)
are 'more difficult' than
(a), (b), and (c).
There is a similar precise sense in which
(g) and (h)
are 'more difficult' than
(d), (e), and (f).
One aspect of Automata and Formal Language Theory involves formalising these ideas.
Below is a possible view of how the first 'simple example' could be treated:
St0: Read the first symbol (if present) (x1);
     If x1 is a 0 or a 1 then output a 1 (i.e. y1 = 1); go to St1.
     Otherwise output a 0 (i.e. y1 = 0); go to St2.
St1: Read the next symbol (if present);
     If it is a 0 or a 1 then output a 1; go to St1.
     Otherwise output a 0; go to St2.
St2: Read the next symbol (if present); output a 0; go to St2.
An 'obvious' weakness of this description is its verbosity.
But consider,
[Diagram: A Finite State Transducer for Example (a). States St0, St1, St2; edges labelled 0,1/1 from St0 to St1 and looping at St1; edges labelled 2,3,4,...,8,9/0 from St0 and St1 into St2; St2 loops on 0,1,2,...,8,9/0.]
Instead of the 'verbose' description a directed 'graph' model is used.
• Vertices are labelled with 'State' names, St0, St1, St2.
• There is an (unlabelled) edge to indicate the 'starting point'.
• Other edges have a label of the form Xin / yout, where
a) Xin is a subset of possible input symbols;
b) yout is a single output symbol.
In total, an edge from Sti to Stj with label Xin / yout can be viewed as saying:
If the 'program' has reached 'Step' (i.e. 'State') i and the next input symbol is in the set Xin then
output the symbol yout; and go to 'Step' (State) j.
We only have 0 and 1 as outputs, so this can be further simplified:
[Diagram: A Finite State Recogniser for Example (a). States St0, St1 (accepting), St2; edges labelled 0,1 from St0 to St1 and looping at St1; edges labelled 2,3,4,5,...,8,9 from St0 and St1 into St2; St2 loops on 0,1,2,3,...,8,9.]
The 'recogniser' has exactly the same structure as the 'transducer'.
The difference is that instead of explicitly indicating the output as part of the edge label this is implied by distinguishing 'special' states. Namely,
states for which a 1 would be output when they are entered. St1 is such a case.
This modification will be useful in presenting several formal results.
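The transducer and recogniser for example (a) can be sketched directly in Python. This is a minimal illustration, not part of the notes: the state names St0, St1, St2 follow the diagram, and the dictionaries encoding the transitions are an assumption about how one might implement them.

```python
# Sketch of example (a): emit 1 while every symbol read so far is binary.
def transduce(word):
    """Mealy-style transducer: output stream y1..yk for input x1..xk."""
    state, out = "St0", []
    for sym in word:
        if state != "St2" and sym in ("0", "1"):
            state, y = "St1", "1"   # still a binary prefix
        else:
            state, y = "St2", "0"   # spoiled: output 0 forever
        out.append(y)
    return "".join(out)

def recognise(word):
    """Recogniser: accept iff the whole word is a binary string."""
    state = "St0"
    for sym in word:
        state = "St1" if state != "St2" and sym in ("0", "1") else "St2"
    return state == "St1"           # St1 is the 'special' (accepting) state

assert transduce("0172") == "1100"
assert recognise("0110") and not recognise("0120")
```

Note how `recognise` is the same machine with the per-symbol output dropped and St1 marked as accepting, exactly the simplification described above.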
The properties of such Finite State Automata are examined in the first part of this module.
As one answer to 'what happens in the black-box' these offer a rich set of ideas and applications.
Among these applications:
• Lexical analysis in compilers: determining identifiers, numeric constants when analysing a program statement.
• Describing hardware systems.
In the latter context, all hardware systems (µprocessors etc) can, ultimately, be described as compositions of finite state transducers with output symbols {0, 1}.
Design of a finite state machine is a standard approach when developing a digital system.
As an illustration of this, what do you think the following machine does?
Its possible input 'symbols' are {00, 01, 10, 11}; its possible output symbols are {0, 1}.
[Diagram: a two-state transducer with states q0 and q1. From q0: 00/0 and 01,10/1 loop at q0, and 11/0 goes to q1. From q1: 11/1 and 01,10/0 loop at q1, and 00/1 goes to q0.]
Hint: Think of the input 'symbols' as pairs < xi yi >. What is the output stream z if the 'black-box' below

    x1 . . xi . . xn
    y1 . . yi . . yn   →   [black-box]   →   z1 . . zi . . zn = f (x1. . xn, y1. . yn)

uses the transducer shown above?
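One plausible reading of the two-state machine, suggested by the hint, is that it is a serial binary adder: it reads the pairs < xi yi > least-significant bit first, with q0 playing the role of 'no carry' and q1 of 'carry'. The sketch below is an assumption based on that reading of the diagram, not something stated in the notes.

```python
# Hypothetical reading of the two-state transducer as a serial binary adder.
# xs, ys: equal-length bit strings, least-significant bit first.
def serial_add(xs, ys):
    carry, out = 0, []            # carry == 1 corresponds to state q1
    for x, y in zip(xs, ys):
        s = int(x) + int(y) + carry
        out.append(str(s % 2))    # the output symbol on the edge taken
        carry = s // 2            # the next state (carry / no carry)
    return "".join(out)

# 3 + 6 = 9: LSB-first, 3 = "110", 6 = "011"; 9 = "1001" truncated to 3 bits.
assert serial_add("110", "011") == "100"
```

Under this reading, z is the bitwise sum of x and y (modulo 2^n), produced one bit per input pair.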
Alphabets, Words, Languages
The informal descriptions used notions of
'input' and 'output' 'symbol'.
The 'black-box' view is in terms of a
'mapping' from 'sequences' of input symbols to ('sequences' of) output symbols.
These are rather vague and imprecise: e.g. the input example {00, 01, 10, 11}.
We now present the formal framework within which subsequent ideas will be set.
Alphabets
An alphabet is a (finite) set of symbols,
Σ = {σ1, σ2, . . . , σk}
Σ denotes an arbitrary alphabet;
σ an arbitrary symbol in Σ.
Examples
The alphabet, Decimal, of decimal digits: Decimal = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
The alphabet, Binary: Binary = {0, 1}
The alphabets Roman and Greek:
Roman = {A, B, C, . . , X, Y, Z, a, b, c, . . , x, y, z}
Greek = {Α, Β, Γ, . . , Χ, Ψ, Ω, α, β, γ, . . , χ, ψ, ω}
Words over Alphabets
A word, w, over an alphabet, Σ, is a finite sequence of symbols from Σ.
Note the following points:
A word is a sequence not a set.
The order of symbols is important.
Examples
Decimal words: 3, 10, 112289, 982211
Binary words: 0, 1, 10, 01, 100010.
Roman words: Java, ROMAN, word, SeQuEncE, Verbum.
Greek words: λογος, αγαπη, θεος, θεωρια, θηριον.
Properties of and Operations on Words
Length of Words
The length of a word w is the number of symbols in w.
|w| denotes this value.
The word which has length 0 is called the empty word.
This will always be denoted by ε.
Examples
Decimal: |3| = 1; |10| = 2; |112289| = |982211| = 6
Binary: |0| = |1| = 1; |10| = |01| = 2; |100010| = 6
Roman: |Java| = 4; |ROMAN| = 5; |SeQuEncE| = 8
Greek: |λογος| = |αγαπη| = 5; |θηριον| = 6
Concatenation
If u and v are words over Σ, the word w formed by concatenating u with v is the word whose sequence of symbols is u followed by v.
The length of w is |u| + |v|.
If either u or v is the empty word, ε, then u ε = u; ε v = v; ε ε = ε.
We will, on occasion, use the notation u ⋅ v to indicate concatenation of words u and v, and
w^k = w⋅w⋅w⋅ . . . ⋅w
(i.e. k concatenations of w, where k ≥ 0).
Examples
Decimal: 3⋅10 = 310; 22⋅33⋅22⋅ε = 223322
Binary: 1⋅0 = 10; 0⋅0⋅ε⋅0⋅1 = 00⋅ε⋅01 = 0001
Roman: W⋅o⋅r⋅d = Word;
Greek: δια⋅δηµατα = διαδηµατα
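The basic facts about concatenation can be checked with words represented as Python strings (an assumption of this sketch; the empty word ε is the empty string):

```python
# Words as Python strings; '+' is concatenation, '*' gives powers w^k.
u, v, eps = "22", "33", ""
assert len(u + v) == len(u) + len(v)   # |u·v| = |u| + |v|
assert u + eps == u and eps + v == v   # ε is the identity for concatenation
assert "10" * 3 == "101010"            # w^3 = w·w·w
assert "10" * 0 == eps                 # w^0 = ε
```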
Languages of Words over Alphabets
The concept of a language is of great significance in Computer Science.
In contrast to the other terms introduced above, 'languages' of interest are usually infinite objects.
A language, L, over an alphabet Σ is a subset of the set of all possible words over Σ.
It is convenient to have a shorthand for 'set of all possible words over Σ'.
Since we consider only finite length words, this set comprises:
0: all words over Σ of length 0, i.e. {ε}, and
1: all words over Σ of length 1, and
2: all words over Σ of length 2, and
. . .
k: all words over Σ of length k, and
. . .
The notation Σ^k is used for
all words over Σ of length k
so that the set we are describing is
∪_{k=0}^∞ Σ^k
for which the shorthand Σ* is employed.
It is sometimes convenient to consider the set of all non-empty words over Σ, and for this purpose, the notation Σ+ is used.
Examples
L(a) ⊂ Decimal*:  L(a) = {w : w ∈ {0, 1}+}
L(b) ⊂ Binary*:   L(b) = {w : |w| > 3}
L(c) ⊂ Binary*:   L(c) = {w : ∃ m ≥ 0 s.t. |w| = 2m + 1}
L(d) ⊂ Binary*:   L(d) = {w : w = Reverse(w)}
L(h) ⊂ Binary*:   L(h) = {w : w = 1^i 0^j and j = i²}
Operations on Sets/Languages
One of the issues of interest is the properties of languages formed by applying certain operations to one or more languages.
Suppose L and M are languages over Σ.
The 'basic' operations are:
• Union (∪): L ∪ M = {w ∈ Σ* : w ∈ L or w ∈ M}
• Intersection (∩): L ∩ M = {w ∈ Σ* : w ∈ L and w ∈ M}
• Complement (Co−): Co−(L) = {w ∈ Σ* : w ∉ L}
• Concatenation (⋅): L ⋅ M = {w ∈ Σ* : w = u⋅v and u ∈ L and v ∈ M}
• *-Closure (*): L* = ∪_{k=0}^∞ L^(k)
where
L^(k) = {w : w ∈ L⋅L⋅L⋅ . . . ⋅L (k times)}
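These operations can be illustrated on small finite fragments of languages over Σ = {0, 1}, using Python sets of strings (an illustration only: since Σ* is infinite, complement is shown here relative to the finite universe of words of length at most 2):

```python
# Finite fragments of languages over {0,1}, as Python sets of strings.
sigma_star_upto2 = {"", "0", "1", "00", "01", "10", "11"}
L = {"0", "00", "01"}
M = {"0", "1"}

assert L | M == {"0", "1", "00", "01"}                # union
assert L & M == {"0"}                                 # intersection
assert sigma_star_upto2 - L == {"", "1", "10", "11"}  # complement (restricted universe)
# concatenation L·M: every u in L followed by every v in M
assert {u + v for u in L for v in M} == {"00", "01", "000", "001", "010", "011"}
```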
The Empty Language
The language over Σ that contains no words at all is called
The Empty Language (over Σ)
and is denoted by ∅, the empty set sign.
Very Important
The empty language, ∅, is not the same as the language whose sole member is the empty word, ε, i.e.
∅ ≠ {ε}
For the operations ∪, ∩, ⋅, Co−, *:
Outcome                      Outcome
L ∪ ∅ = L                    L ∪ {ε} = L ⇔ ε ∈ L
L ∩ ∅ = ∅                    L ∩ {ε} = {ε} ⇔ ε ∈ L
L ⋅ ∅ = ∅                    L ⋅ {ε} = L
Co−(∅) = Σ*                  Co−({ε}) = Σ+
∅* = {ε}                     {ε}* = {ε}
Examples
Using the languages defined earlier:
L(a) ∪ L(b) = L(a)
L(b) ∩ L(c) = {w : ∃ m ≥ 2 s.t. |w| = 2m + 1}
Co−(L(c)) = {w : ∃ m ≥ 0 s.t. |w| = 2m}
L(h)⋅L(h) = {w : w = 1^i 0^j 1^r 0^s, j = i² and s = r²}
(L(a))* = Binary*
Equivalence Relations (Reminders)
Let S be a (possibly infinite) set. A relation, R, over S, is a set of ordered pairs of elements from S, i.e.
R ⊆ S × S
R is an equivalence relation if it satisfies all of the following:
a) ∀ x ∈ S, < x, x > ∈ R
b) ∀ x, y ∈ S, < x, y > ∈ R ⇔ < y, x > ∈ R
c) ∀ x, y, z ∈ S, < x, y > ∈ R and < y, z > ∈ R ⇒ < x, z > ∈ R
These properties are respectively called: Reflexivity, Symmetry, Transitivity.
Any equivalence relation, R, over S, induces a partition of S,
< C1 ; C2 ; . . . ; Cr >
The Ci ⊆ S (equivalence classes) are such that for all x, y ∈ S:
(x ∈ Ci and y ∈ Cj and < x, y > ∈ R) ⇔ (i = j)
Example
Suppose S = N (the positive integers).
The relation ≡k (k ≥ 2) is defined by
x ≡k y if the remainder when dividing x by k equals the remainder when dividing y by k.
≡k is an equivalence relation. (Trivial exercise.)
≡k partitions N into exactly k equivalence classes,
< C0 ; C1 ; . . . ; Ck−1 >
for which x ∈ Ci if and only if the remainder on dividing x by k equals i.
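The partition induced by ≡k can be computed directly; a small sketch for k = 3 over {1, . . . , 12}:

```python
# The equivalence classes of ≡_3 over {1, ..., 12}: x ≡_3 y iff x % 3 == y % 3.
classes = {}
for x in range(1, 13):
    classes.setdefault(x % 3, []).append(x)   # class index i = remainder mod 3

assert classes[0] == [3, 6, 9, 12]   # C_0
assert classes[1] == [1, 4, 7, 10]   # C_1
assert classes[2] == [2, 5, 8, 11]   # C_2
```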
Describing Languages of Words
Given a language L ⊆ Σ*,
how are the words in L described/defined?
If L is infinite then one cannot simply present a list of all the words in L.
Of course, one way is to give an 'ad hoc' (finite) description, e.g. L(c) is
'the set of all odd length binary words'
This is computationally unhelpful.
An alternative is to describe the operation of a 'black-box', M, that outputs 1 when given w ∈ L as input and outputs 0 otherwise.
Thus, L is
'the set of all words on which M outputs 1'.
A Finite State Recogniser is an example of such a description, and a formal definition of L(M),
'the language L recognised by M',
is given later.
There is, however, a third approach that is of great importance in Computer Science:
Define a formal grammar, G, against which w ∈? L can be tested.
i.e. a set of rules that 'generates' each w ∈ L, or
(equivalently), a process by which any w ∈ L can be 'decomposed' or parsed using the rules in G.
Formal Grammars
A formal grammar, G, is defined by a quadruple,
G = ( V, T, P, S )
V: a set of variable (or 'non-terminal') symbols.
T: a set of terminal symbols (V ∩ T = ∅).
S: the start symbol (S ∈ V).
P: a set of production rules, of the form
li → ri,
both li and ri being words in (V ∪ T)*, li containing at least one symbol in V.
A formal grammar, G, defines how 'acceptable' words, w, may be generated from a starting point, S.
The production, L → R, is interpreted as:
'A word w ∈ (V ∪ T)* such that w = u⋅L⋅v generates the word u⋅R⋅v ∈ (V ∪ T)*'.
Of course such derivations of words from w ∈ (V ∪ T)* can only continue while w contains (at least) one symbol from V.
If w ∈ T* can be derived from S in the grammar G = (V, T, P, S) then w is in the language, L(G), generated by G.
Notice that L(G) ⊆ T*.
We treat Formal Grammars in greater depth later.
For now we observe that 'different types' of grammar are distinguished by differing restrictions on the form of production rules.
Such restrictions specify 'allowable' combinations of variable and terminal symbols on the left and right 'sides'.
Derivations in Formal Grammars
Suppose G = (V, T, P, S) is a formal grammar and that x, y ∈ (V ∪ T)*.
y is said to be directly derived from x in G (x ⇒G y) if there is a production pi ∈ P such that applying pi to x results in the word y.
y is said to be derived from x in G (x ⇒*G y) if
x = y or
∃ z ∈ (V ∪ T)* : x ⇒G z and z ⇒*G y
Finally, x ∈ T* is derivable in G if S ⇒*G x.
Thus, the language, L(G), generated by the grammar G is
{ x ∈ T* : S ⇒*G x }
A Simple Example Grammar
The following ought to be familiar:
EXPR = ( V, T, P, S ) where
V = {E, op, opd, num, digit}
T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −, *, /, (, )}
S = E
P = {p1, p2, . . . , p19, p20}
Production          Left → Right
p1, p2, p3          E → (E) | E op E | opd
p4, p5, p6, p7      op → + | − | * | /
p8                  opd → num
p9, p10             num → digit | digit ⋅ num
p11, . . , p20      digit → 0|1|2|3|4|5|6|7|8|9
and Some Derivations in it
E op E op E ⇒ opd op E op E (by p3)
E op E op E ⇒* num + num + num (by < p3, p8, p4, p3, p8, p4, p3, p8 >)
25 ∈ L(EXPR):
S ⇒ E; E ⇒ opd; opd ⇒ num;
num ⇒ digit⋅num; digit⋅num ⇒ 2⋅num;
2⋅num ⇒ 2⋅digit; 2⋅digit ⇒ 25
∴ S ⇒* 25
A fundamental discovery of Automata and Formal Language Theory may, informally, be stated as:
There is a
'hierarchy' of 'black-box capabilities'
that exactly matches a
'hierarchy' of 'formal grammar types'.
In other words,
L can be recognised by a machine M in a class T of machines if and only if L can be described by a grammar G in a particular class TG of formal grammars.
These 'hierarchies' are 'natural'.
Languages recognised by Finite State Automata are at the lowest level of this.
COMP209
Automata and Formal Languages
Section 2
Finite Automata
(Determinism and Non-Determinism)
We now consider the 'simplest' class of machine model,
Finite State Automata
We concentrate on these as recognisers.
Formally,
A deterministic finite state automaton (DFA), M, is described by a quintuple,
M = ( Σ, Q, S, F, δ )
Σ: a finite alphabet.
Q = {q0, q1, . . . , qk}: finite set of states.
S ∈ Q: initial state.
F ⊆ Q: final (or accepting) states.
δ: Q × Σ → Q: state-transition mapping.
Consider the example given earlier (with states renamed q0, q1, q2):
[Diagram: the recogniser for example (a): edges labelled 0,1 from q0 to q1 and looping at q1; edges labelled 2,3,4,5,...,8,9 from q0 and q1 into q2; q2 loops on 0,1,2,3,...,8,9.]
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; Q = {q0, q1, q2}
S = q0; F = {q1}
δ: Q × Σ → Q
q    σ                        → δ(q, σ)
q0   0, 1                     → q1
q0   2, 3, 4, 5, 6, 7, 8, 9   → q2
q1   0, 1                     → q1
q1   2, 3, 4, 5, 6, 7, 8, 9   → q2
q2   0, 1, . . . , 8, 9       → q2
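The quintuple above translates directly into a table-driven simulation; the following Python sketch (state names as strings, an implementation choice of this example) encodes δ as a dictionary and scans a word left to right:

```python
# The example DFA: accept a non-empty word iff every symbol is 0 or 1.
delta = {}
for q in ("q0", "q1"):
    for s in "0123456789":
        delta[(q, s)] = "q1" if s in "01" else "q2"
for s in "0123456789":
    delta[("q2", s)] = "q2"          # q2 is a 'dead' state

def delta_star(q, w):
    """Follow δ symbol by symbol: the state reached from q on word w."""
    for s in w:
        q = delta[(q, s)]
    return q

F = {"q1"}
assert delta_star("q0", "0110") in F       # a binary word: accepted
assert delta_star("q0", "0150") not in F   # contains a 5: rejected
```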
A Larger Example
[Diagram: DFA MX recognising a language LX, with states q0, . . . , q6 and edges labelled 0 and 1.]
Σ = {0, 1}; Q = {q0, q1, q2, q3, q4, q5, q6};
S = q0; F = {q3, q4, q5}.
δ is easily extracted from the diagram.
It is not obvious by inspection exactly what LX is in the larger example.
We shall use this example to illustrate the general concept of L(M) ⊆ Σ*:
'the language over Σ recognised by M'
Consider any DFA, M = (Σ, Q, S, F, δ). Notice the following:
For each state q ∈ Q, δ maps each σ ∈ Σ to exactly one 'next' state q′ ∈ Q, i.e.
δ is total and deterministic.
Thus for every w ∈ Σ*, the 'processing' of w by M can be described by the sequence of states visited as each symbol in w is 'read'.
Whether w is accepted (in L(M)) or rejected (not in L(M)) depends on whether or not the last state visited is an accepting one, i.e. in the set F.
Example using the DFA MX
w = w1w2w3w4w5w6w7w8 ∈ {0, 1}*.
If w = 00111100, which is in LX:
i   wi   Current State   Next State
0   -    -               q0
1   0    q0              q0
2   0    q0              q0
3   1    q0              q1
4   1    q1              q4
5   1    q4              q4
6   1    q4              q4
7   0    q4              q5
8   0    q5              q5 ∈ F
If w = 10100001, which is not in LX:
i   wi   Current State   Next State
0   -    -               q0
1   1    q0              q1
2   0    q1              q2
3   1    q2              q3
4   0    q3              q3
5   0    q3              q3
6   0    q3              q3
7   0    q3              q3
8   1    q3              q6 ∉ F
In order formally to capture the concept of 'sequence of states visited', we proceed by generalising the state-transition mapping, δ, from single symbols in Σ to words in Σ*.
For M = (Σ, Q, S, F, δ), the mapping
δ*: Q × Σ* → Q
is defined as:
δ*(q, w) = q                   if w = ε
δ*(q, w) = δ(δ*(q, u), σ)      if w = u ⋅ σ
Notice that this definition is recursive:
The case w = ε indicates the state q. The general case (w ≠ ε) indicates that
'the state reached by M from q on the word w = u⋅σ is the state reached by applying the transition function δ to the state reached by M on the word u and the symbol σ ∈ Σ'.
Example using MX
With w = 110, q = S = q0:
δ*(q, 110) = δ( δ*(q, 11), 0 )
           = δ( δ( δ*(q, 1), 1 ), 0 )
           = δ( δ( δ( δ*(q, ε), 1 ), 1 ), 0 )
           = δ( δ( δ( q0, 1 ), 1 ), 0 )
           = δ( δ( q1, 1 ), 0 )
           = δ( q4, 0 )
           = q5 ∈ F
δ* admits a formal definition of
'the language, L(M), recognised by the DFA M = (Σ, Q, S, F, δ)':
L(M) = { w ∈ Σ* : δ*(q0, w) ∈ F }
Notice that this definition is computationally based, i.e. it is not an 'ad hoc' description.
Thus for 'suitable' languages, L, over Σ, one can present a finite computational definition of all the words in L by describing any DFA, M = (Σ, Q, S, F, δ), for which
L(M) = L
[i.e. for all w ∈ Σ*, w ∈ L ⇔ w ∈ L(M)].
This, however, raises the question:
'What is meant by a "suitable" language?'
We (temporarily) defer discussion of this.
Non-deterministic Finite State Automata
It was noted above that δ: Q × Σ → Q was defined to be
total and deterministic,
i.e. every state/symbol pair has a defined transition to exactly one next state.
We now consider the effect of changing this to allow non-determinism in the state-transition function δ.
Two mechanisms are used to do this:
• more than one 'choice' of next state.
• 'null' transitions between states.
In the first: δ(q, σ) need not be unique.
In the second: the 'current' state may change without any input occurring.
['Simple' Example of a Non-deterministic FSA: states q0, q1, q2; q0 has 0-transitions to both q1 and q2; q1 has a 0-transition to q2 and a 1-transition to q0; q2 has 1-transitions to both q0 and q1.]
In the example above, the initial state, q0, has transitions to state q1 and state q2, both of which are labelled 0.
Similarly, q2 has transitions labelled 1 to states q0 and q1.
Notice, also, that q0 has no transition labelled 1.
Formally, a non-deterministic FSA (NDFA), M, is described by a quintuple,
M = ( Σ, Q, S, F, δ )
where Σ, Q, S, and F are as before, but δ is now a mapping,
δ: Q × Σ → ℘(Q)
℘(Q) denoting the powerset (set of all subsets) of Q.
For the example NDFA, δ is:
q    σ   → δ(q, σ)
q0   0   → {q1, q2}
q0   1   → ∅
q1   0   → {q2}
q1   1   → {q0}
q2   0   → ∅
q2   1   → {q0, q1}
Interpretation
The change from Q to ℘(Q) as the range of δ clearly has implications for the state-sequence function δ* and consequently for how the term
'language, L(M), recognised by the NDFA M'
is defined.
Before considering these we describe a 'physical' interpretation of how non-determinism should be viewed in this context.
Consider the transition δ(q0, 0) = {q1, q2} for the example machine.
We view this as modelling:
'If 0 is read when in state q0 then either one of states q1 or q2 could occur as the "next" state'.
It is important to remember that:
a) Exactly one of these states is chosen.
b) Which one is not predictable.
Thus if δ(q, σ) = R ⊆ Q, then in state q reading σ the next state will be some (unknown) state in R.
Non-determinism (in the sense used here) should not be thought of as a 'random process'.
Although 'random' or 'probabilistic' methods define a particular type of non-determinism, the concept of 'non-deterministic choice' that we wish to use precludes any possibility of modelling by a stochastic process.
[Probabilistic automata have been widely studied; however, a treatment of these is outwith the scope of this module.]
Languages Recognised by NDFAs
Recall that the language recognised by a DFA, M, was defined with respect to its associated 'state-sequence' function
δ*: Q × Σ* → Q
In a deterministic automaton, starting from the initial state, any word w ∈ Σ* can be thought of as traversing a unique sequence (or path) of states.
In a non-deterministic automaton, this 'sequence' may not be unique: there may be many possible sequences of states that are consistent with a single w.
Example
If w = 010, the following 'tree structure' describes the possible computations:
[Diagram: the computation tree of the example NDFA on 010, branching from q0 on each symbol read.]
5 'computation paths'; 3 accept (ending in q2); 2 reject (ending in q1).
Question: Is 010 accepted or not???
The definition of w ∈ Σ* being accepted by the NDFA M is, informally, stated as:
'w is accepted by the NDFA, M, if there is at least one computation path of M on w that ends in a state q ∈ F'
So in the example, 010 is accepted.
The significance of the computation being ended should be noted, i.e. all symbols in w must be read.
If we consider w = 0100 in the example, then since q2 has no '0-transition' the processing of w would be 'stuck' at this point. We cannot conclude that 0100 is accepted at this point (even though q2 cannot be left).
[0100 is accepted by continuing with the 0-transition from q1.]
We can now define an analogue of δ* for non-deterministic automata.
For the NDFA M = (Σ, Q, S, F, δ) the Reachability Function,
ρ*: Q × Σ* → ℘(Q)
is:
ρ*(q, w) = {q}                              if w = ε
ρ*(q, w) = ∪_{q′ ∈ ρ*(q, u)} δ(q′, σ)       if w = u ⋅ σ
And we now have:
The language, L(M), recognised by the NDFA M = (Σ, Q, S, F, δ) is
L(M) = { w ∈ Σ* : ρ*(q0, w) ∩ F ≠ ∅ }
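The reachability function ρ* can be computed exactly as defined, by carrying a set of states through the word. A Python sketch for the example NDFA (state names as strings, δ as a dictionary of sets — representation choices of this example):

```python
# The example NDFA's transition function δ: (state, symbol) -> set of states.
delta = {
    ("q0", "0"): {"q1", "q2"}, ("q0", "1"): set(),
    ("q1", "0"): {"q2"},       ("q1", "1"): {"q0"},
    ("q2", "0"): set(),        ("q2", "1"): {"q0", "q1"},
}

def rho_star(q, w):
    """ρ*(q, w): the set of states reachable from q by reading w."""
    states = {q}
    for s in w:
        # union of δ(p, s) over all currently reachable states p
        states = set().union(*(delta[(p, s)] for p in states))
    return states

F = {"q2"}
assert rho_star("q0", "010") == {"q1", "q2"}
assert rho_star("q0", "010") & F != set()   # 010 is accepted
```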
Example
w      ∪_{q′ ∈ ρ*(q0, u)} δ(q′, σ)   ρ*(q0, w)
ε      -                             {q0}
ε⋅0    δ(q0, 0)                      {q1, q2}
0⋅1    δ(q1, 1) ∪ δ(q2, 1)           {q0, q1}
01⋅0   δ(q0, 0) ∪ δ(q1, 0)           {q1, q2}
ρ*(q0, 010) = {q1, q2};  F = {q2}
F ∩ ρ*(q0, 010) = {q2} ≠ ∅.
∴ 010 ∈ L: accepted by the NDFA example.
For 'suitable' languages we now have two methods for describing the set of words L ⊆ Σ*:
a) By a DFA, M, for which L(M) = L.
b) By an NDFA, M′, for which L(M′) = L.
Of course, since the transition function δ with deterministic machines is a 'restricted' form of that allowed in non-deterministic automata, it follows that:
'Any language that is "suitable" (in sense (a)) is also "suitable" (in sense (b))'
More formally,
{ L ⊆ Σ* : ∃ DFA, M, with L(M) = L } ⊆ { L ⊆ Σ* : ∃ NDFA, M′, with L(M′) = L }
It turns out, however, that we do not 'gain anything' by way of languages recognisable using NDFAs but not recognisable using DFAs.
Theorem 1:
For any NDFA, M = (Σ, Q, S, F, δ), there is a deterministic FA, M′ = (Σ, Q′, S′, F′, δ′), such that
L(M) = L(M′)
We first illustrate the idea behind the proof.
Consider the 'general' computation tree for the NDFA exemplar:
[Diagram: the distinct subsets of Q reached by the example NDFA, with 0- and 1-transitions between them.]
Tree vertices: labelled with distinct subsets of Q arising in the reachability function ρ*.
Graph edges: transitions between these under 0 or 1.
This automaton is deterministic and equivalent to the non-deterministic example (it accepts exactly the same language).
Proof of Theorem 1
Let
Mnd = (Σ, Qnd, Snd, Fnd, δnd)
be some NDFA recognising L(Mnd) ⊆ Σ*.
We construct a DFA,
Md = (Σ, Qd, Sd, Fd, δd)
such that L(Md) = L(Mnd).
The idea is that each single state in Qd will correspond to some subset of states from Qnd.
More precisely, R ⊆ Qnd will map to some state qR ∈ Qd if and only if
∃ w ∈ Σ* : ρ*(Snd, w) = R.
Once all of the required states in Qd are found, the state transition function
δd: Qd × Σ → Qd
is a (total) function defined so that ∀ Ri ⊆ Qnd for which there is a corresponding state qRi ∈ Qd:
δd(qRi, σ) = qRj ⇔ ∪_{q ∈ Ri} δnd(q, σ) = Rj ⊆ Qnd
Algorithmic Construction
We use a stack, ∆, to keep track of the subsets of Qnd.
Qd := {q_Snd};           // Starting point
Sd := q_Snd;
push {Snd} onto ∆;
while ∆ is not empty
    R := Top(∆); pop(∆); // Remove top of stack.
    for each σ ∈ Σ
        V := ∪_{q ∈ R} δnd(q, σ);   // (a)
        if qV ∉ Qd                   // 'new' subset of Qnd
            push V onto ∆;
            Qd := Qd ∪ {qV};
            if V ∩ Fnd ≠ ∅
                Fd := Fd ∪ {qV};    // (b)
        δd(qR, σ) := qV;            // (c)
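The stack-based construction can be sketched as a worklist algorithm in Python, with each DFA state represented as a frozenset of NDFA states (a representation choice of this sketch; the function and variable names below are illustrative, not from the notes):

```python
# Subset construction: build a DFA equivalent to a given NDFA.
def determinise(sigma, delta_nd, start, final_nd):
    start_d = frozenset({start})
    states, stack = {start_d}, [start_d]      # Qd and the stack ∆
    delta_d, final_d = {}, set()
    if start_d & final_nd:
        final_d.add(start_d)
    while stack:
        R = stack.pop()
        for sym in sigma:
            # (a): V = union of δ_nd(q, sym) over q in R
            V = frozenset().union(*(delta_nd.get((q, sym), set()) for q in R))
            if V not in states:               # a 'new' subset of Q_nd
                states.add(V)
                stack.append(V)
                if V & final_nd:              # (b)
                    final_d.add(V)
            delta_d[(R, sym)] = V             # (c)
    return states, delta_d, start_d, final_d

# The example NDFA (missing pairs map to the empty set).
delta_nd = {
    ("q0", "0"): {"q1", "q2"},
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q0"},
    ("q2", "1"): {"q0", "q1"},
}
states, delta_d, start_d, final_d = determinise("01", delta_nd, "q0", {"q2"})
assert len(states) == 5   # {q0}, {q1,q2}, {q0,q1}, {q2}, ∅
assert delta_d[(frozenset({"q0"}), "0")] == frozenset({"q1", "q2"})
```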
Let
Md = ( Σ, Qd, Sd, Fd, δd )
be the DFA constructed from the NDFA
Mnd = (Σ, Qnd, Snd, Fnd, δnd).
Each state qR ∈ Qd is formed from some set of states R ⊆ Qnd by the algorithm.
L(Md) = L(Mnd) will follow by proving: ∀ w ∈ Σ*,
δ*d(qR, w) = qV ⇔ V = ∪_{q ∈ R} ρ*nd(q, w)
We use induction on |w| ≥ 0.
Inductive Base: |w| = 0, i.e. w = ε.
δ*d(qR, ε) = qR;
∪_{q ∈ R} ρ*nd(q, ε) = ∪_{q ∈ R} {q} = R.
∴ Inductive base holds.
Inductive Step: (|w| ≤ k) ⇒ (|w| = k + 1)
Let w = u⋅σ, where |u| ≤ k and σ ∈ Σ.
δ*d(qR, u⋅σ) = δd( δ*d(qR, u), σ ) = δd( qV, σ ) = qY
The sets V, Y are subsets of Qnd. Which?
V = ∪_{q ∈ R} ρ*nd(q, u)   [Ind. Hyp.]
Y = ∪_{q ∈ V} δnd(q, σ)    [(a)+(c)]
And,
∪_{q ∈ R} ρ*nd(q, u⋅σ) = ∪_{q ∈ R} ∪_{q′ ∈ ρ*nd(q, u)} δnd(q′, σ)
                       = ∪_{q′ ∈ V} δnd(q′, σ) = Y
∴ δ*d(qR, u⋅σ) = qY ⇔ ∪_{q ∈ R} ρ*nd(q, u⋅σ) = Y
Completing the induction.
That L(Mnd) = L(Md) now follows by noting that
qR ∈ Fd ⇔ R ∩ Fnd ≠ ∅   [(b)]
So,
w ∈ L(Md) ⇔ δ*d(q_Snd, w) ∈ Fd
          ⇔ ρ*nd(Snd, w) ∩ Fnd ≠ ∅
          ⇔ w ∈ L(Mnd)
Example
Applying the process to our example NDFA leads to the DFA with:
Qd = { q0, q1,2, q0,1, q2, q∅ };  Sd = q0;
Fd = { q1,2, q2 }.
δd: Qd × Σ → Qd
q      σ   → δd(q, σ)
q0     0   → q1,2
q0     1   → q∅
q1,2   0   → q2
q1,2   1   → q0,1
q0,1   0   → q1,2
q0,1   1   → q0
q2     0   → q∅
q2     1   → q0,1
q∅     0   → q∅
q∅     1   → q∅
Note: The empty subset must be included in the set of reachable subsets if it arises (the state q∅ above).
ε-Transition Automata
We have seen that NDFAs using a 'multiple choices' sense of non-determinism do not extend the range of 'suitable' languages for DFAs.
What about the other mechanism: 'null' transitions?
Formally, an ε-NDFA, M, is a quintuple,
M = ( Σ, Q, q0, qF, δ )
where Σ and Q are as before;
q0 ∈ Q: single initial state, having only outgoing transitions, all of which are labelled ε.
qF ∈ Q: single accepting state, having only incoming transitions, all of which are labelled ε.
δ: Q × (Σ ∪ {ε}) → ℘(Q): the state-transition function as for an NDFA but augmented to allow ε-moves between some states.
Example ε-Transition Automaton
[Diagram: an ε-NDFA with states q0, q1, q2, q3, q4, q5, qF; ε-transitions out of q0 and into qF, further ε-moves among the internal states, and transitions labelled 0 and 1 among q1, . . . , q5.]
Interpretation
In an ε-NDFA, M, suppose δ has a transition of the form
δ(qi, ε) = R ⊆ Q
If M can reach state qi, then M can move to any q ∈ R without any further input occurring. Whether the ε-move is made or not is decided non-deterministically.
The insistence on unique accepting and starting states such that:
a) q0 is left via an ε-move and never re-entered;
b) qF is entered via an ε-move and never left
is purely for technical convenience.
Trivial exercise: Show that any (normal) NDFA M can be transformed to an 'equivalent' ε-NDFA with unique start and accepting states meeting these requirements.
Languages Accepted by ε-NDFAs
We can define the concept of w ∈ Σ* being accepted by an ε-NDFA, M, by extending the definition of ρ* to an ε-NDFA.
Unfortunately, this involves some additional complications due to the fact that we must take account of states reached by ε-moves at any point during the processing of w.
Let M = ( Σ, Q, q0, qF, δ ) be an ε-NDFA.
First we define k-reachability (k ≥ 0),
ρ(k): Q × Σ* → ℘(Q)
to capture
'the set of states reachable from q on w after exactly k moves'.
ρ(0)(q, w) = {q} if w = ε;  ∅ if w ≠ ε
i.e. 'in 0 moves state q cannot be left, and only the empty word can be "read"'.
For k > 0, ρ(k)(q, w) is:
∅                                                                        if |w| > k
∪_{q′ ∈ ρ(k−1)(q, ε)} δ(q′, ε)                                           if w = ε
[ ∪_{q′ ∈ ρ(k−1)(q, u)} δ(q′, σ) ] ∪ [ ∪_{q′ ∈ ρ(k−1)(q, w)} δ(q′, ε) ]  if w = u⋅σ
Finally,
ρ*(q, w) = ∪_{k=0}^∞ ρ(k)(q, w)
w is accepted by the ε-NDFA, M, if qF ∈ ρ*(q0, w).
So that,
L(M) = { w ∈ Σ* : qF ∈ ρ*(q0, w) }
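In practice ρ* for an ε-NDFA is usually computed via the ε-closure of a set of states (a fixpoint) rather than the staged ρ(k) functions. The sketch below uses a tiny hypothetical ε-NDFA of its own (not the one in the diagram, whose full transition table is not reproduced here):

```python
# ε-closure and acceptance for a small hypothetical ε-NDFA:
# q0 -ε-> q1 -0-> q2 -ε-> qF, so the machine accepts exactly "0".
delta = {
    ("q0", "eps"): {"q1"},
    ("q1", "0"): {"q2"},
    ("q2", "eps"): {"qF"},
}

def eps_closure(states):
    """All states reachable from `states` by zero or more ε-moves."""
    closure = set(states)
    while True:
        extra = set().union(*(delta.get((q, "eps"), set()) for q in closure)) - closure
        if not extra:
            return closure
        closure |= extra

def accepts(word, start="q0", final="qF"):
    states = eps_closure({start})
    for sym in word:
        step = set().union(*(delta.get((q, sym), set()) for q in states))
        states = eps_closure(step)      # allow ε-moves after each symbol
    return final in states

assert accepts("0")
assert not accepts("1")
```

Taking the ε-closure before and after each symbol plays the same role as the extra ε-move clauses in the definition of ρ(k).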
Example
qF ∈ ρ*(q0, 0):
k   ρ(k−1)(q0, ε)   ρ(k−1)(q0, 0)   ρ(k)(q0, 0)
0   -               -               ∅
1   {q0}            ∅               ∅
2   {q1}            ∅               ∅
3   {q2}            ∅               {q2}
4   {q5}            {q2}            {q5}
5   {q1, qF}        {q5}            {q1, qF}
qF ∉ ρ*(q0, 1):
ρ*(q0, 1) = {q3}
∴ 0 ∈ L(M) and 1 ∉ L(M).
ε-transitions allow automata to be 'glued together' to give an elegant method of 'combining' languages.
e.g. if MR and MT recognise R, T ⊆ Σ*:
[Diagrams: (i) a new start state q0 with ε-moves into copies of MR and MT in parallel; (ii) MR and MT joined in series by ε-moves from a new start state q0 through MR into MT.]
These recognise R ∪ T and R⋅T respectively.
From these, and similar examples, it might appear that
{ L ⊆ Σ* : ∃ ε-NDFA, M, s.t. L(M) = L }
properly contains
{ L ⊆ Σ* : ∃ NDFA, M, s.t. L(M) = L }
Or, in informal terms,
'there are "more" "suitable" languages for ε-NDFAs than there are for "ordinary" NDFAs'.
In fact this is not the case, as we shall show (constructively) in
Theorem 2:
{ L ⊆ Σ* : ∃ ε-NDFA, M, s.t. L(M) = L } = { L ⊆ Σ* : ∃ NDFA, M, s.t. L(M) = L }
Proof of Theorem 2
Let Mε = (Σ, Qε, q0, qF, δε) be an ε-NDFA.
An equivalent NDFA without ε-moves is built in 2 stages:
a) Forming an ε-NDFA without ε-loops.
b) Forming an equivalent NDFA to this.
Stage 1: Removal of ε-loops
R = {c1, c2, . . . , ct} is an ε-loop if
ci ∈ δ(ci−1, ε) ∀ 2 ≤ i ≤ t, and
c1 ∈ δ(ct, ε)
[e.g. {q1, q2, q5} in the example].
Note that:
a) q0 and qF cannot occur in any ε-loop. (Exercise: Why?)
b) Any ε-move such that q ∈ δ(q, ε) is redundant.
∴ ε-loops have at least 2 distinct states, and do not contain the start or final state.
If R = {c1, . . . , ct} is an ε-loop in Mε, form a new ε-NDFA M′ = (Σ, Q′, q0, qF, δ′) as follows:
Q′ = (Q − R) ∪ {qR}
i.e. remove all the states in R from Q, adding a new state qR to 'represent' these.
For each transition such that qk ∈ δ(qi, α) of Mε (where α ∈ Σ ∪ {ε}), form transitions in δ′ using the following:
qk ∈ δ′(qi, α)   if qi ∉ R and qk ∉ R
qk ∈ δ′(qR, α)   if qi ∈ R and qk ∉ R
qR ∈ δ′(qi, α)   if qi ∉ R and qk ∈ R
qR ∈ δ′(qR, α)   if qi ∈ R, qk ∈ R and α ∈ Σ
The process is continued until no ε-loops remain.
Example
Removing the ε-loop {q1, q2, q5} from the example produces:
[Diagram: the ε-NDFA with q1, q2, q5 merged into a single state q125, retaining the transitions labelled 0 and 1 and the remaining ε-moves among q0, q125, q3, q4, qF.]
Stage 2: Removal of ε-moves
On completion of Stage 1, an ε-NDFA, M, without ε-loops has been built.
Let Mε = (Σ, Qε, q0, qF, δε) be the ε-loop free ε-NDFA.
An equivalent NDFA (without ε-moves), Mnd = (Σ, Qnd, q0, Fnd, δnd), is built by:
Qnd := Qε − {qF}
For each q ∈ Qnd ∩ Qε, σ ∈ Σ:
δnd(q, σ) = ∪_{q′ ∈ ρ*ε(q, ε)} δε(q′, σ)
Finally,
Fnd = { q ∈ Qnd ∩ Qε : qF ∈ ρ*ε(q, ε) }
Example
The ε-move free NDFA resulting from the example automaton is:
[Diagram: an NDFA on states q0, q125, q3, q4, with transitions labelled 0 and 1.]
Fnd = { q0, q125, q4 }.
Correctness Proof
Stage 1:
Let M1 = (Σ, Q1, q0, qF, δ1) have an ε-loop through R ⊆ Q1.
Let M2 = (Σ, Q2, q0, qF, δ2) result by removing this ε-loop.
Suppose w ∈ L(M1). Let k be the least value for which
qF ∈ ρ(k)1(q0, w).
There is a sequence of states and transitions in M1,
q0 →α1→ s1 →α2→ s2 →α3→ . . . →αk−1→ sk−1 →αk→ qF     (w1)
such that α1⋅α2⋅ . . . ⋅αk = w.
Since k is minimal, this sequence cannot contain an ε-loop.
Consider the sequence of states in M2 defined by:
tj = qR if sj ∈ R; sj otherwise     (w2)
The construction ensures that, with the exception of moves sj →ε→ sj+1 where sj, sj+1 ∈ R, each move si →αi→ si+1 has a matching move in M2. The missing moves are of the form qR →ε→ qR (i.e. an ε-move from qR to itself) and these can be replaced by the single state qR.
The proof that
qF ∈ ρ*2(q0, w) ⇒ qF ∈ ρ*1(q0, w)
is similar.
Stage 2: Similar to Stage 1. (Exercise.)
COMP209
Automata and Formal Languages
Section 3
Regular Languages
and
Finite Automata
We have now seen three different forms of finite automaton:
a) Deterministic (DFA)
b) Non-deterministic (NDFA)
c) ε-transition (ε-NDFA)
These are 'equally powerful' in the sense that the set
of 'suitable' languages for DFAs
is exactly the same as the set
of 'suitable' languages for NDFAs,
which is exactly the same as the set
of 'suitable' languages for ε-NDFAs.
We now return to the question:
What is meant by a 'suitable' language?
Formal Grammars and Finite Automata
Recall that a formal grammar was introduced as

G = ( V, T, P, S )

V: a set of variable (or 'non-terminal') symbols.
T: a set of terminal symbols (V ∩ T = ∅).
S: the start symbol (S ∈ V).
P: a set of production rules, of the form

li → ri

both li and ri being words in (V ∪ T)*, li containing at least one symbol in V.

It was also claimed that

'Different grammar types' match 'different machine capabilities'.
Finite State Automata formed the 'simplest' class of machine model.

Intuition would suggest a 'match' with the 'simplest' structure of grammar, i.e. that which imposes the greatest restrictions on the form of grammar productions.

What form would this be?

If pi : Li → Ri is a production:

Vi → σ   or   Vi → σ·Vj   or   Vi → ε

Vi, Vj variable symbols in V; σ a terminal symbol in T.
Such grammars are called
Right Linear Grammars (RLG):
Thus, if G = ( V, T, P, S ) is a RLG and w = x Vi y ∈ (V ∪ T)*, then only words of the form

x σ y      if Vi → σ ∈ P(G)
x σ Vj y   if Vi → σ·Vj ∈ P(G)

can be generated from w by G.

Question: Is this intuition justified? i.e. is it the case that, ∀ L ⊆ Σ*:

∃ RLG G: L(G) = L ⇔ ∃ DFA M: L(M) = L
Answer: Yes.
Theorem 3:
{ L ⊆ Σ* : ∃ RLG G with L(G) = L } = { L ⊆ Σ* : ∃ DFA M with L(M) = L }
Proof of Theorem 3
Given a DFA M = ( Σ, Q, S, F, δ ), form the RLG, GM = ( VM, Σ, PM, SM ) by:

VM = { Vi : qi ∈ Q } ;  SM = V0

PM is formed by the rules:

   M has               P(GM) has
1  qi ∈ F              Vi → ε
2  δ( qi, σ ) ∈ F      Vi → σ
3  δ( qi, σ ) = qj     Vi → σ·Vj
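The three rules in the table can be sketched directly in code (a sketch under my own hypothetical encoding: states are integers, productions are pairs of strings; not a definitive implementation):

```python
# Sketch of the DFA -> right-linear-grammar table above (encoding is mine).
def dfa_to_rlg(states, sigma, delta, finals):
    """Return productions as (lhs, rhs) pairs; "" encodes the empty word."""
    prods = []
    for i in states:
        if i in finals:
            prods.append((f"V{i}", ""))             # rule 1: V_i -> eps
        for a in sigma:
            j = delta[(i, a)]
            prods.append((f"V{i}", f"{a}V{j}"))      # rule 3: V_i -> sigma.V_j
            if j in finals:
                prods.append((f"V{i}", f"{a}"))      # rule 2: V_i -> sigma
    return prods
```

Running it on the two-state DFA for "odd number of 1s" yields, among others, V0 → 1V1, V0 → 1 and V1 → ε, exactly the rules the table prescribes.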
Similarly,

Given the RLG, G = ( V, Σ, P, S ), form the NDFA, MG = ( Σ, QG, qG0, FG, δG ) with

QG = { qi : Vi ∈ V } ∪ { qF }

qG0 = qS(G) ;  qF ∈ FG

and δG given by:

   G has           δG has
1  Vi → ε          qi ∈ FG
2  Vi → σ          qF ∈ δG( qi, σ )
3  Vi → σ·Vj       qj ∈ δG( qi, σ )

We claim that:

∀ w ∈ Σ*:

δ*( q0, w ) ∈ F ⇔ SM ⇒*GM w

S ⇒*G w ⇔ ρ*MG( qG0, w ) ∩ FG ≠ ∅
Only the first of these will be proved. The second is similar.

Suppose w = σ1σ2…σk has δ*( q0, w ) ∈ F.

Let < s1, s2, …, sk > be the state-sequence such that

δ( q0, σ1 ) = s1
δ( si, σi+1 ) = si+1   1 ≤ i < k
sk ∈ F

GM has productions with left-hand sides < L1, L2, …, Lk > such that

S = V0 → σ1 L1
Li → σi+1 Li+1   1 ≤ i < k
Lk−1 → σk  or  Lk → ε

∴ SM ⇒*GM σ1…σk = w

In the same way, from the derivation sequence proving w ∈ L(GM), a sequence of states in M giving δ*( q0, w ) ∈ F is formed.
Regular Sets and Regular Expressions
The correspondence between DFA and RLGs is given by (rather obvious) direct translations

< Q, δ > ↔ < V, P >

It is debatable, however, to what extent either of these mechanisms assists with the following questions:

a) Given a DFA, M, give a 'succinct' description of L(M).
b) Given a RLG, G, give a 'succinct' description of L(G).
c) Given L ⊆ Σ*, ∃? DFA, M: L(M) = L.
d) Given L ⊆ Σ*, ∃? RLG, G: L(G) = L.
e) Given L ⊆ Σ*, construct a DFA, M, such that L(M) = L.
f) Given L ⊆ Σ*, construct a RLG, G, such that L(G) = L.

Note the difference between (c, d) (existence questions) and (e, f) (synthesis questions).
One possible approach might be to try and find a set of operations, Φ, on sets of words over an alphabet Σ which allow new sets of words to be formed by applying operations in Φ to 'previously built' sets.

Thus if one takes the individual symbols in Σ as the 'initial' terms, then, depending on our choice of Φ, we can describe a language, L, by giving the sequence of operations in Φ which must be applied in composition to generate L.

Now, for any particular choice of operations, Φ, there will be some class of languages that can be described using Φ.
Question 1:
Is it possible to choose operations, Φ, so that this process defines exactly the class of languages recognisable by DFA?
Question 2:
If it is possible, what choice of operations achieves this?
The answer to the first question is that it is possible to define such a set.

In this section we describe how this is done and prove that the class of resulting languages is exactly that captured by DFA.
Regular Sets
A regular set (or regular language) is a set L ⊆ Σ* that can be formed by sufficiently many applications of the following operations:

a) ∅ (the empty set) is a regular set.
b) { ε } (the set containing the empty word) is a regular set.
c) ∀ σ ∈ Σ: { σ } is a regular set.
d) If V, W ⊆ Σ* are regular sets, then so are all of the sets:

V ∪ W ;  V·W ;  V*

Examples

( {0} ∪ {1} )*

( {0} ∪ {1} )·( ( {0} ∪ {1} )·( {0} ∪ {1} ) )*

{0}*·{1}·( {0}·{1}·{0}* ∪ {1} )·{1}*·{0}*

[Note: Every one of these examples has been seen earlier.]
Regular Expressions
The formalism used to describe regular sets becomes rather cumbersome even when describing 'simple' set structures.

A more convenient (and clearly equivalent) approach is that of regular expressions.

A regular expression over Σ is recursively defined as follows:

a) ∅ is a regular expression.
b) ε is a regular expression.
c) ∀ σ ∈ Σ: σ is a regular expression.
d) If V, W are regular expressions over Σ, then so are all of

( V + W ) ;  ( V·W ) ;  ( V* )

[Note: Where no ambiguity arises, brackets are omitted.]
Examples
( 0 + 1 )*

( 0 + 1 )·( ( 0 + 1 )·( 0 + 1 ) )*

0*·1·( 0·1·0* + 1 )·1*·0*
Interpreting + as ∪ gives an obvious mapping between regular sets and regular expressions.

Thus a Regular Expression over Σ is a description of a particular Regular Set over Σ. We denote by L(R) the regular set described by the regular expression, R.

Formally, for R a regular expression the regular set L(R) is:

L( R ) =
  ∅             if R = ∅
  { ε }         if R = ε
  { σ }         if R = σ
  L(S) ∪ L(T)   if R = S + T
  L(S)·L(T)     if R = S·T
  L(S)*         if R = S*
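The recursive definition of L(R) can be turned into a small evaluator if every set is truncated at a length bound n, so that even starred expressions stay finite (the tuple encoding of expressions and the bound are my own choices; a sketch, not a definitive implementation):

```python
# Sketch: the recursive definition of L(R), restricted to words of length <= n.
def lang(r, n):
    """r is ('empty',), ('eps',), ('sym', a), ('+', s, t), ('.', s, t), ('*', s)."""
    op = r[0]
    if op == 'empty':
        return set()
    if op == 'eps':
        return {''}
    if op == 'sym':
        return {r[1]} if len(r[1]) <= n else set()
    if op == '+':
        return lang(r[1], n) | lang(r[2], n)
    if op == '.':
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if op == '*':
        result, frontier = {''}, {''}
        base = lang(r[1], n) - {''}
        while frontier:                 # iterate concatenation to a fixpoint
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(op)
```

For example, (0 + 1)* evaluated with n = 2 yields exactly the seven binary words of length at most 2.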
Note that:
a) There may be many different expressions for a single set.

b) There is exactly one set corresponding to a single expression.

c) A regular expression can be seen both as a description of the set of words in a (regular) language and as an operational process for generating these.
Properties of +, ⋅, *
R, S, T denote arbitrary regular expressions.
R + S = S + R

( R + S ) + T = R + ( S + T )

( R·S )·T = R·( S·T )

R·( S + T ) = R·S + R·T

The important properties of * are:

( R* )* = R*

R·R* = R*·R

R·R* + ε = R*

R·( S·R )* = ( R·S )*·R

( R + S )* = ( R* + S* )* = ( R*·S* )* = R*·( S·R* )*

These are easily proved using the basic definitions of ·, ∪ (i.e. +) and *.

Exercise: Do this.
Equivalence of Regular Expressions and Finite Automata

Theorem 4:
L ⊆ Σ* is a regular set if and only if there is a DFA, M, for which L( M ) = L.

First we outline the proof structure:

Recall that

a) Any regular set is described by a regular expression.
b) DFA, NDFA, and ε-NDFA describe exactly the same class of languages.

The proof is carried out in two stages:

I) For any regular expression, R, construct some FA, M, for which L( M ) = L( R ).

II) For any DFA, M, construct some regular expression, R, for which L( R ) = L( M ).
Regular Expressions → Finite Automata: Base Cases

[Diagram: DFA constructions for the base cases R = ∅, R = ε, and R = σ.]
Regular Expressions → Finite Automata: Composite Cases

R and S are regular expressions.

[Diagram: ε-NDFA constructions for R + S, R·S, and R*, linking the automata for R and S with ε-transitions.]
Correctness of Construction
Formally, this is by induction on the total number of occurrences of +, ·, *.

Thus let T be any regular expression containing k ≥ 0 operations.

Inductive Base: k = 0

T ∈ { ∅, ε, σ }. The correctness of these constructions is obvious.

Inductive Step: ( ≤ k − 1 ) ⇒ k

Assume correctness for expressions with ≤ k − 1 operations. Let T be a regular expression having k operations.

T ∈ { R + S, R·S, R* }. Inductively we may construct (correct) DFA for R and S since these expressions use fewer than k operations.
If T = R + S, we add ε-transitions from the accepting states of R, S to a new (single) accepting state, and ε-transitions from a new start state to the initial states of R and S.

If T = R·S, ε-transitions connect accepting states of R to the initial state of S (these accepting states in R being changed to 'ordinary', i.e. non-accepting, states).

Finally, if T = R*, an ε-loop from accepting states in R back to its initial state is arranged.

Exercise: Complete the formal details of the inductive step, describing the exact form of the ε-NDFA

MT = ( QT, Σ, qT0, FT, δT )

in terms of the DFA

MR = ( QR, Σ, qR0, FR, δR )
MS = ( QS, Σ, qS0, FS, δS )

for each of the cases T = R + S ; T = R·S ; T = R*.
Finite Automata → Regular Expressions
This is a little more complicated.
The key idea is to form a system of 'simultaneous' equations, E(M), from M = ( Q, Σ, q0, F, δ ):

a) There are exactly |Q| equations in this system, one for each state in Q.
b) Each equation, Ei, is a regular expression over Σ and { Ej : 1 ≤ j ≤ |Q| }.
c) Ei describes the set of words that are accepted starting from the state qi.
d) The aim is to 'reduce' this system so that a 'closed form' solution for E0 is found, i.e. a solution

E0 = R

where R is a regular expression involving only the operations +, ·, * and ∅, ε, Σ.

So, for all i, R contains no occurrence of Ei.
Construction of the Equational System E( M )

The closed form solution for Ei (the equation corresponding to the state qi ∈ Q) should be a regular expression, Ri, such that:

L( Ri ) = { w ∈ Σ* : δ*( qi, w ) ∈ F }

i.e. the set of words that would be accepted if qi were the initial state.

But this set, L( Ri ), is just

∪σ∈Σ σ·{ u ∈ Σ* : δ*( δ( qi, σ ), u ) ∈ F }    (*)

(together with ε when qi ∈ F).

Let qi,σ denote the state δ( qi, σ ).
Question

What is the set in (*) in terms of { Ei,σ : σ ∈ Σ }?

Answer

Σσ∈Σ σ·Ei,σ + Λi

where

Λi = ε if qi ∈ F ;  ∅ if qi ∉ F

In other words, for each qi in Q, Ei is

Ei = Σσ∈Σ σ·Ei,σ + ε   if qi ∈ F
Ei = Σσ∈Σ σ·Ei,σ + ∅   if qi ∉ F

(**)
Example
[Diagram: a DFA with states q0, …, q6 over the alphabet {0, 1}; its transitions are read off in the equations below.]

E0 = 0·E0 + 1·E1 + ∅
E1 = 0·E2 + 1·E4 + ∅
E2 = 0·E6 + 1·E3 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅
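Reading the equations (**) off a DFA's transition table can be sketched as follows (the string formatting, with '∅' and 'ε' literals, is my own; a sketch, not a definitive implementation):

```python
# Sketch: forming the equational system E(M) from a dict-encoded DFA.
def equations(states, sigma, delta, finals):
    """One equation per state: E_i = sum of a.E_{delta(i,a)} plus Lambda_i."""
    eqs = {}
    for i in states:
        terms = [f"{a}.E{delta[(i, a)]}" for a in sigma]
        terms.append("ε" if i in finals else "∅")   # the Lambda_i term
        eqs[i] = " + ".join(terms)
    return eqs
```

On the two-state "odd number of 1s" DFA this produces E0 = 0.E0 + 1.E1 + ∅ and E1 = 0.E1 + 1.E0 + ε, matching the pattern of the seven equations above.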
Reduction of Equational System
In the example, there are seven interdependent relationships that in total define the language accepted by the automaton illustrated.

In general, there will be |Q| such relationships describing L( M ).

The problem now is how to use these relationships to construct a solution

E0 = R

with R a regular expression over Σ for which

L( R ) = L( M ) = { w ∈ Σ* : δ*( q0, w ) ∈ F }
First notice that a typical relationship

Ei = Σσ∈Σ σ·Ei,σ + Λi

satisfies exactly one of the following:

A) Ei does not occur on the right-hand side.
B) Ei does occur on the right-hand side.

Case A arises if ∀ σ ∈ Σ, δ( qi, σ ) ≠ qi (i.e. no symbol in Σ yields a transition from qi to itself).

Case B arises if ∃ σ ∈ Σ, δ( qi, σ ) = qi (i.e. there is a transition from qi to itself, labelled σ ∈ Σ).

We subsequently refer to Case A as a non-iterative and Case B as an iterative relationship for Ei.
Examples
E1 = 0·E2 + 1·E4 + ∅
E2 = 0·E6 + 1·E3 + ∅

(Non-iterative)

E0 = 0·E0 + 1·E1 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅

(Iterative)
Since the operation of concatenation (·) distributes over the operation of union (+), i.e.

R·( S + T ) = R·S + R·T

we may substitute the right-hand side of the relationship governing any non-iterative Ei for any other occurrence of Ei.

Of course, this may lead to relationships, Ei, which had been non-iterative, becoming iterative.

For example, if

E1 = 0·E2 + 1·E3 + ∅
E2 = 0·E1 + 1·E3 + ∅

both of which are non-iterative, substituting the RHS of E1 in E2 gives:

E2 = 0·( 0·E2 + 1·E3 + ∅ ) + 1·E3 + ∅
   = 00·E2 + ( 01 + 1 )·E3 + ∅
Substitution Rule
Given the system of k relationships < E0, E1, …, Ek > in which Ej is non-iterative:

Form the j-substituted system of ( k − 1 ) relationships

< E0, E1, …, Ej−1, Ej+1, …, Ek >

in which the RHS of the relationship Ej is substituted for every occurrence of Ej in the system.

We say a system is fully substituted under this rule if every relationship within it is iterative, i.e. no more applications are possible (within the current system).
Example
The 1-substituted system for the example:

E0 = 0·E0 + 1·( 0·E2 + 1·E4 + ∅ ) + ∅
E2 = 0·E6 + 1·E3 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅

The 2-substituted system from this is:

E0 = 0·E0 + 1·( 0·( 0·E6 + 1·E3 + ∅ ) + 1·E4 + ∅ ) + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = 0·E6 + 1·E6 + ∅
This system, which is fully substituted, can be written as:

E0 = 0·E0 + 100·E6 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = ( 0 + 1 )·E6 + ∅

Every relationship of which is iterative.
Reduction of Iterative Relationships: Arden's Rule

Except in rather trivial cases, the system E(M) associated with the DFA, M, will not give the desired solution for E0 after the first set of applications of the reduction rule described above.

The resulting system < E0, …, Er > will contain relationships of the form

Ei = ( Wi )·Ei + Ui    (§)

where Wi is some regular expression over Σ, and Ui is a regular expression over

Σ ∪ { Ej : 0 ≤ j ≤ r and j ≠ i }

How can this expression be rewritten in a non-iterative form, i.e. so that Ei does not occur on the RHS?
In order to get some insight into the process, consider the diagrammatic representation of the identity (§):

[Diagram: state qi with a self-loop labelled Wi and an exit labelled Ui.]

The language recognised from qi is: concatenations of words in L( Wi ), concatenated with one word in L( Ui ), i.e. ( Wi )*·Ui.

This suggests the iterative relationship

Ei = ( Wi )·Ei + Ui

can be replaced by the non-iterative form:

Ei = ( Wi )*·Ui
Pictures aren't Proofs

The discussion above motivates the statement of the second reduction rule, which is used to reduce iterative relationships.

It did not constitute a formal proof.

The result we need is known as

Arden's Rule

Let L(R) be a regular language described by the iterative relationship

R = S·R + T

where S and T are regular expressions.

a) R = S*·T is a solution of R = S·R + T.
b) If ε ∉ L(S), R = S*·T is the unique solution of R = S·R + T.
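Part (a) of Arden's Rule can be spot-checked on a concrete instance by comparing both sides of R = S·R + T with every set truncated at a length bound (the choices S = {'0'}, T = {'1'} and the bound n = 6 are my own; this illustrates the identity, it is not the uniqueness proof):

```python
# Sketch: checking R = S*.T against S.R + T, all sets truncated at length n.
def upto(ws, n):
    return {w for w in ws if len(w) <= n}

def star(S, n):
    """S* restricted to words of length <= n, by iterating concatenation."""
    result, frontier = {''}, {''}
    while frontier:
        frontier = upto({u + v for u in frontier for v in S}, n) - result
        result |= frontier
    return result

S, T, n = {'0'}, {'1'}, 6
R = upto({u + v for u in star(S, n) for v in T}, n)   # R = S*.T
rhs = upto({u + v for u in S for v in R}, n) | T      # S.R + T
```

Here R comes out as { 0^k 1 : 0 ≤ k ≤ 5 } and equals the right-hand side, as part (a) predicts.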
Proof of Arden’s Rule
a) It must be shown that R = S*·T satisfies L(R) = L(S·R + T), i.e.

S*·T = S·( S*·T ) + T

S·( S*·T ) + T = S·S*·T + T = ( S·S* + ε )·T = S*·T

b) Suppose ε ∉ L(S) and L(V) = L(S·V + T).

We show L(V) = L(S*·T).

L( V ) ⊆ L( S*·T ):

Assume the contrary. Let w be a shortest word in

L(V) − L(S*·T)

w ∈ L(V) = L(S·V + T). As w ∉ L( S*·T ), w ∉ L(T) and so w ∈ L( S·V ).

∴ w = ws·wv, for words ws ∈ L(S) and wv ∈ L(V).
Furthermore |ws| > 0, as ε ∉ L(S).

If wv ∈ L( S*·T ), then w = ws·wv ∈ L( S·S*·T + T ) = L( S*·T ), contradicting w ∉ L( S*·T ).

If wv ∉ L( S*·T ), then wv ∈ L(V) and |wv| < |w|, contradicting w being a shortest word in L(V) − L( S*·T ).

∴ L( V ) ⊆ L( S*·T ).

L( S*·T ) ⊆ L( V ):

The argument is similarly by contradiction. Let w be a shortest word in

L(S*·T) − L(V)

w ∈ L( S·( S*·T ) + T ). w ∉ L( S·V + T ). ∴ w ∉ L(T) and so w ∈ L( S·S*·T ).

∴ w = ws·ws*·wt, for words ws ∈ L(S), ws* ∈ L(S*), wt ∈ L(T).
Again, |ws| > 0, as ε ∉ L(S).

If ws*·wt ∈ L( V ), then w = ws·ws*·wt ∈ L( S·V + T ) = L( V ), contradicting w ∉ L( V ).

If ws*·wt ∉ L( V ), then ws*·wt ∈ L(S*·T) and |ws*·wt| < |w|, contradicting w being a shortest word in L(S*·T) − L(V).

∴ L( S*·T ) ⊆ L( V ).

We have proved that if V satisfies L(V) = L(S·V + T) when ε ∉ L(S), then

L( V ) ⊆ L( S*·T ) and L( S*·T ) ⊆ L(V),

i.e. L( V ) = L( S*·T ).
Necessity of the Condition ε ∉ L(S)

If ε ∈ L(S), then there is no unique solution, R, such that L(R) = L(S·R + T).

Let L(S) = L(X) ∪ { ε }, with ε ∉ L(X). For any Y with L(Y) ⊆ Σ*,

R = S*·T + X*·Y

is a solution of R = S·R + T.

Proof: We need to show

L( S*·T + X*·Y ) = L( S·( S*·T + X*·Y ) + T )

The right-hand side is:

S·S*·T + T + ( X + ε )·X*·Y
= ( S·S* + ε )·T + ( X·X* + X* )·Y    (†)
= S*·T + X*·Y

Notice that the derivation of (†) requires S = X + ε, without which the final line of the derivation can only be reduced to

S*·T + X*·Y = S*·T + X·X*·Y

whose unique solution is Y = ∅.
Summary
To obtain a solution for E0 in the system E( M ) = < E0, E1, …, Ek >, i.e. a regular expression, R, such that L(R) = L(M):

Repeat the following 2 steps until a regular expression for L( E0 ) over Σ is obtained.

1) Construct a fully substituted system

E′( M ) = < E0′, E1′, …, Er′ >

from E(M) (r ≤ k).

2) Apply Arden's Rule to remove iterative relationships. Apply the substitution rule to the non-iterative relationships that result from this.

Of course, the standard simplifications using properties of regular expressions can be employed at any stage to obtain more manageable forms.
Example
The fully substituted example was:

E0 = 0·E0 + 100·E6 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + 1·E6 + ε
E6 = ( 0 + 1 )·E6 + ∅

Applying Arden's Rule to E6:

E6 = ( 0 + 1 )*·∅ = ∅

The 6-substituted system resulting:

E0 = 0·E0 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 0·E5 + 1·E4 + ε
E5 = 0·E5 + ε
Applying Arden's Rule to E5:

E5 = 0*·ε = 0*

Substituting for E5:

E0 = 0·E0 + 101·E3 + 11·E4 + ∅
E3 = 0·E3 + 1·E4 + ε
E4 = 1·E4 + 0·0* + ε

Applying Arden's Rule to E4:

E4 = 1*·( 0·0* + ε ) = 1*·0*

and substituting for E4:

E0 = 0·E0 + 101·E3 + 11·1*·0*
E3 = 0·E3 + 1·1*·0* + ε

Applying Arden's Rule to E3 gives

E3 = 0*·( 1·1*·0* + ε )
   = 0*·1·1*·0* + 0*
   = 0*·( 1·1* + ε )·0*
   = 0*·1*·0*
Substituting for E3:

E0 = 0·E0 + 101·0*·1*·0* + 11·1*·0*
   = 0·E0 + 1·( 010* + 1 )·1*·0*

Finally, applying Arden's Rule to E0:

E0 = 0*·1·( 010* + 1 )·1*·0*

Thus, LX, the language recognised by the automaton first introduced on p.38, is that defined by the regular expression

0*·1·( 010* + 1 )·1*·0*
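The derived expression can be spot-checked with Python's re module, translating + into | (this checks sample words only; it does not re-verify the derivation):

```python
import re

# The derived expression 0*.1.(010* + 1).1*.0* in Python regex syntax.
pattern = re.compile(r"0*1(010*|1)1*0*")

def accepted(w):
    """True iff w is in the language of the derived regular expression."""
    return pattern.fullmatch(w) is not None
```

For instance, 11 and 0110 are accepted, while the empty word, 0, and 10 are rejected, since every accepted word must contain a 1 followed by either 010* or another 1.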
With this example, the proof of Theorem 4,

L ⊆ Σ* is a regular set if and only if there is a DFA, M, for which L( M ) = L,

is complete.
Summary
Theorems (1-4) have established that for L ⊆ Σ*, the following are equivalent:

a) ∃ DFA, M, for which L( M ) = L.
b) ∃ NDFA, M, for which L( M ) = L.
c) ∃ ε-NDFA, M, for which L( M ) = L.
d) ∃ RLG, G, for which L( G ) = L.
e) ∃ reg. expr., R, for which L( R ) = L.

(a) ≡ (b): Theorem 1
(b) ≡ (c): Theorem 2
(a) ≡ (d): Theorem 3
(a) ≡ (e): Theorem 4

It should be noted that the conversion from automata to regular expressions in Theorem 4 may be applied directly to NDFA (the use of DFA in the proof is merely for simplification).

For ε-NDFA, however, to ensure the uniqueness of solutions resulting from Arden's Rule, ε-loops ought to be removed.
COMP209
Automata and Formal Languages
Section 4
Properties of Regular Languages

Periodicity, Closure, and Decision
124 Properties of Regular Languages
Limitations of Finite Automata: Periodicity and the Pumping Lemma

Suppose M is a DFA with n states and that L( M ) contains words,

w = σ1σ2…σk,

whose length, k, is at least n.

What can be deduced about the process by which M reaches an accepting state given such w?

Certainly, there is a sequence of k + 1 states of M,

q0 qσ1 qσ2 … qσk

with

qσ1 = δ( q0, σ1 )
qσj = δ( qσj−1, σj )   2 ≤ j ≤ k

which are traversed.

Since M has only n states and k + 1 > n, this sequence must contain at least two occurrences of some state q.
[Diagram: the run of M on w, with a state qX entered after σi and re-entered after σj, forming a LOOP; σj+1…σk then lead to an accepting state.]

Thus, there is some state qX, entered with σi and re-entered with σj: i.e. qσi = qσj.

Taking this view, we can regard w as divided into 3 parts:

x = σ1σ2…σi ;  y = σi+1…σj ;  z = σj+1…σk

From our assumptions, w = x·y·z ∈ L(M).

It must also be the case, however, that:

x·z ∈ L( M )  (by ignoring LOOP)
x·y·y·z ∈ L( M )  (by going through LOOP twice)
x·y^k·z ∈ L( M )  (by going through LOOP k times)
The Pumping Lemma (for Regular Languages)

The informal development indicates that:

An n state DFA, M, that accepts words, w, of length at least n, must accept all words x·y^i·z, ∀ i ≥ 0, for some x, y, z such that w = x·y·z, |x·y| ≤ n, and |y| ≥ 1.

The formal statement of The Pumping Lemma for Regular Languages, as this is known, is:

Let L be a regular language. There is a constant, m, such that if w ∈ L with |w| ≥ m, then w may be written as x·y·z where |x·y| ≤ m, |y| ≥ 1, and ∀ i ≥ 0, x·y^i·z ∈ L.
Proof of Pumping Lemma
Let L be a regular language and M a DFA with L( M ) = L. Fix m = |Q| and let

w = σ1…σk ∈ L,  k ≥ m.

There must be positions i and j within w such that

δ*( q0, σ1…σi ) = qX = δ*( q0, σ1…σj )

Let i be the smallest such index. Since w ∈ L,

δ*( qX, σj+1…σk ) ∈ F

Setting

x = σ1…σi ;  y = σi+1…σj ;  z = σj+1…σk

|x·y| ≤ m ;  |y| ≥ 1 ;  w = x·y·z

Since for all t ≥ 0,

δ*( q0, x ) = δ*( q0, x·y^t ) = qX

∴ δ*( q0, x·y^t·z ) ∈ F

i.e. x·y^t·z ∈ L, ∀ t ≥ 0.
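The proof's decomposition can be sketched as code: run the DFA on w and cut at the first repeated state (the dict-encoded DFA and the parity example are my own, not from the notes):

```python
# Sketch: extracting a pumpable decomposition w = x.y.z from a DFA run.
def pump_split(delta, q0, w):
    """Return (x, y, z) with w = x+y+z and y looping on a repeated state,
    or None when no state repeats (i.e. |w| < |Q|)."""
    q, first_seen = q0, {q0: 0}
    for k, a in enumerate(w, start=1):
        q = delta[(q, a)]
        if q in first_seen:          # state revisited: the y-part loops here
            i = first_seen[q]
            return w[:i], w[i:k], w[k:]
        first_seen[q] = k
    return None

# Example DFA over {0,1}: tracks the parity of the number of 1s.
parity = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
```

On w = 11 the parity DFA returns to state 0 after both symbols, giving x = ε, y = 11, z = ε; pumping y then produces 1111, 111111, …, all with an even number of 1s.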
Applications
The Pumping Lemma provides a very powerful tool with which to demonstrate that specific languages are not regular.

Note that, although it has been hinted that there are languages that cannot be recognised by DFA, we have given no concrete evidence of this fact.

With the property of regular languages described by the Pumping Lemma, we are now able to provide such evidence.

In particular, we can make precise the assertion at the start of the module concerning

L(d) = { w ∈ { 0, 1 }* : w = Reverse(w) }
L(e) = { w ∈ { 0, 1 }* : w has equal numbers of 0s and 1s }
L(g) = { w ∈ { 1 }* : |w| is a prime number }
L(h) = { w ∈ { 0, 1 }* : w = 1^i 0^j and j = i^2 }
Proving L is not Regular Using the Pumping Lemma

Suppose we have been given a description of some language L ⊆ Σ*. How may it be shown, using the Pumping Lemma, that L is not regular?

Certainly, if L is not regular, then it must contain arbitrarily long words. (Exercise: Why?)

The argument proceeds by contradiction: Assume L is regular.

a) Given any constant, m ≥ 1, choose some word w ∈ L with |w| ≥ m.
b) Given any partition of w into x, y, and z for which

|x·y| ≤ m ;  |y| ≥ 1 ;  w = x·y·z

prove that for some t ≥ 0, the word x·y^t·z cannot belong to L.

If both are possible: any DFA accepting L also accepts words that are not in L.
Some Examples
Example 1: L(d) = { w ∈ { 0, 1 }* : w = Reverse(w) } is not a regular language.

Proof: For any constant m, let w = 0^m 1 0^m ∈ L(d). For any partition of w into x, y, z for which |xy| ≤ m, |y| ≥ 1 and w = xyz, we must have

z = 0^(m−|xy|)·1·0^m

∴ w = 0^|x|·0^|y|·0^(m−|xy|)·1·0^m

Now choose t = 0; then 0^|x| 0^(m−|xy|) 1 0^m = 0^(m−|y|) 1 0^m ∉ L(d).

Example 2:

L(e) = { w ∈ { 0, 1 }* : w has equal numbers of 0s and 1s }

is not a regular language.

Proof: Exercise. (Use w = 0^m·1^m ∈ L(e) to construct the counterexample.)
Example 3: L(g) = { w ∈ { 1 }* : |w| is a prime number } is not a regular language.

Proof: Given m, let p be any prime number such that p > m + 1, and set w = 1^p ∈ L(g). Consider any x, y, z for which

|xy| ≤ m ;  |y| ≥ 1 ;  x·y·z = 1^p

so that x·z = 1^(p−|y|) with 1 ≤ |y| ≤ m.

Now set t = p − |y|, to give

x·y^t·z = 1^( (p−|y|) + (p−|y|)|y| ) = 1^( (p−|y|)(1+|y|) )

Since p > m + 1 ≥ |y| + 1, both p − |y| and 1 + |y| exceed 1, so this length is not prime and x·y^t·z ∉ L(g).

Example 4: L(h) = { w ∈ { 0, 1 }* : w = 1^i 0^j and j = i^2 } is not a regular language.

Proof: Exercise. (Use w = 1^p 0^(p^2), for p > m.)
Non-trivial Exercise (Maths. and Joint Maths/C.S. Only)

Suppose we view w ∈ 1·{ 0, 1 }* as the binary representation of some natural number, bin( w ) ∈ N.

[The condition that w starts with a 1 is to ensure that every natural number has a unique representation.]

Show that the language, PRIMES,

{ w ∈ 1·{ 0, 1 }* : bin(w) is a prime number }

is not a regular language.

Note that whereas L(g) uses a unary encoding system for numbers, PRIMES uses a binary encoding, and it cannot be deduced that PRIMES is not regular from L(g) being so.

[Hint: Use Fermat's Theorem: 2^(p−1) ≡ 1 mod p for all primes p > 2.]
Closure Properties
Suppose L1 and L2 are both regular languages over Σ.

We know, from the definition of regular language, that

L1 ∪ L2 ;  L1·L2 ;  ( L1 )*

are also regular languages.

What, however, can we say about, e.g.

L1 ∩ L2 ;  Co−( L1 ) ;  etc.?
In general, suppose

Ψ : ( ℘( Σ* ) )^k → ℘( Σ* )

is an arbitrary operation defining some language over Σ from any collection of k ≥ 1 languages over Σ.

For example: Ψ = ∪ with k = 2; Ψ = Co− with k = 1.

The class of properties which we are concerned with here are known as:

Closure Properties of (Families of) Languages

Formally,

A family of languages over Σ is a subset of all possible languages over Σ. We use ℜ to denote an arbitrary family, so that

ℜ ∈ ℘( ℘( Σ* ) )

A family, ℜ, is said to be closed under an operation Ψ of k arguments if

∀ < L1, L2, …, Lk > ∈ ( ℜ )^k :  Ψ( L1, L2, …, Lk ) ∈ ℜ

That is, applying Ψ to any collection of k languages in ℜ always produces some language in ℜ.
Example
For the family

Reg = { L ⊆ Σ* : L is a regular language }

a) Reg is closed under ∪, i.e. the union of regular languages is a regular language.
b) Reg is closed under concatenation (·), i.e. the concatenation of regular languages is a regular language.
c) Reg is closed under *, i.e. the *-closure of a regular language is a regular language.

The questions introduced above can be phrased as:

Is Reg closed under intersection (∩)?
Is Reg closed under complement (Co−)?
Theorem 5:

a) Reg is closed under complement.
b) Reg is closed under intersection.

Proof:
a) Let L ∈ Reg. From Theorem 4, there is a DFA,

M = ( Q, Σ, q0, F, δ )

for which L( M ) = L.

The DFA, MCo = ( QCo, Σ, qCo0, FCo, δCo ) with

L( MCo ) = { w ∈ Σ* : w ∉ L } = Co−( L )

is formed by

QCo = Q ;  qCo0 = q0 ;  δCo = δ ;

and

FCo = { qk ∈ QCo : qk ∉ F }

i.e. q is an accepting state in QCo if and only if the corresponding state in M is not an accepting state.
It is obvious that:

L( MCo ) = { w ∈ Σ* : δ*Co( qCo0, w ) ∈ FCo }
         = { w ∈ Σ* : δ*( q0, w ) ∉ F }
         = Σ* − { w ∈ Σ* : δ*( q0, w ) ∈ F }
         = Co−( L( M ) ) = Co−( L )

b) Since < ∪, ∩, Co− > defines a Boolean algebra with respect to sets of words over Σ, from De Morgan's Laws:

Co−( L1 ∪ L2 ) = Co−( L1 ) ∩ Co−( L2 )

L1 ∩ L2 = Co−( Co−( L1 ) ∪ Co−( L2 ) )
(Non-trivial) Exercise

Give a direct construction of a DFA, M∩, recognising L1 ∩ L2 from DFA M1 and M2 with L( M1 ) = L1 and L( M2 ) = L2, i.e. without using De Morgan's Laws.

Hint: If

M1 = ( Q1, Σ, q10, F1, δ1 )
M2 = ( Q2, Σ, q20, F2, δ2 )

consider the DFA

M∩ = ( Q∩, Σ, q∩0, F∩, δ∩ )

for which

Q∩ = Q1 × Q2 ;  q∩0 = < q10, q20 > ;

and

δ∩( < qi, qj >, σ ) = < δ1( qi, σ ), δ2( qj, σ ) >

If F∩ = F1 × F2, what is

{ w ∈ Σ* : δ*∩( < q10, q20 >, w ) ∈ F∩ }?
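The hinted product construction can be sketched by simulating both machines in lockstep on the pair state (the dict-encoded DFAs and the two example machines are my own):

```python
# Sketch: simulating the product DFA M_cap on a word; a pair state accepts
# iff both components accept.
def run_product(d1, d2, q1, q2, f1, f2, w):
    for a in w:
        q1, q2 = d1[(q1, a)], d2[(q2, a)]
    return q1 in f1 and q2 in f2

# M1: even number of 0s (accepting state 0); M2: last symbol is 1 (accepting 1).
d1 = {(0, '0'): 1, (1, '0'): 0, (0, '1'): 0, (1, '1'): 1}
d2 = {(0, '0'): 0, (1, '0'): 0, (0, '1'): 1, (1, '1'): 1}
```

The word 001 has an even number of 0s and ends in 1, so it lies in the intersection; 01 has an odd number of 0s and is rejected.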
Some More Closure Properties
We give two further properties under which regular languages are closed.

The first of these, substitution, provides a useful mechanism for mapping between different alphabets.

The second, quotient, illustrates that closure properties do not require explicit effective constructions in order for the property to hold.
Substitution
Let Σ1 and Σ2 be alphabets; ℘( Σ2* ) the set of all languages over Σ2.

A substitution function, f, is a mapping from symbols in Σ1 to languages over Σ2, i.e.

f : Σ1 → ℘( Σ2* )

f is extended to f(word), mapping from words over Σ1, by

f(word)( w ) =
  { ε }                if w = ε
  f( σ )               if w = σ ∈ Σ1
  f( σ )·f(word)( u )  if w = σ·u ∈ Σ1*

Finally, f(word) is extended to f(lang), mapping from languages over Σ1 to languages over Σ2, by

f(lang)( L ) = ∪w∈L f(word)( w )
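For finite languages, f(word) and f(lang) can be written exactly as defined above (the sets-of-strings encoding and the example substitution are my own):

```python
# Sketch: substitution on finite languages, following the recursion above.
def f_word(f, w):
    """Substitute each symbol of w by the language f[symbol]; returns a set."""
    if w == "":
        return {""}
    return {u + v for u in f[w[0]] for v in f_word(f, w[1:])}

def f_lang(f, language):
    """Union of f_word over every word of the (finite) language."""
    out = set()
    for w in language:
        out |= f_word(f, w)
    return out
```

With f(0) = {aa} and f(1) = {b, bb}, for instance, f(word)(01) = {aab, aabb}.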
Despite the, superficially, involved definition of substitution, the proof of the following result is quite easy.

Theorem 6:
Let f : Σ1 → ℘( Σ2* ) be such that ∀ σ ∈ Σ1, f( σ ) ∈ Reg.

If L is a regular language over Σ1 then f(lang)( L ) is a regular language over Σ2.

Proof: (Outline)
Let R be a regular expression over Σ1 for which L( R ) = L. Let Rσ be the regular expression over Σ2 for the language f( σ ).

To obtain a regular expression R^f for f(lang)( L ), replace each occurrence of the symbol σ in R by the regular expression Rσ.
It is easy to show that

L( ( S + T )^f ) = L( S^f ) ∪ L( T^f )
L( ( S·T )^f ) = L( S^f )·L( T^f )
L( ( S* )^f ) = L( ( S^f )* )

i.e. if R ∈ { S + T, S·T, S* }, where S and T are regular expressions, then the language described by applying the substitution f to S and T separately is exactly the same as the language obtained by applying f to the combination of these.

An easy induction on the number of operations defining R completes the proof.
Example
With Σ1 = { 0, 1 }, Σ2 = { a, b }:

R = 0·1·( 0·0 + 1·1 )*·1·0

Let f : { 0, 1 } → ℘( { a, b }* ) be given by:

f( 0 ) = ( a·a )
f( 1 ) = ( b·b·a·a )*

Then

R^f = (aa)·(bbaa)*·( (aaaa) + (bbaa)*·(bbaa)* )*·(bbaa)*·(aa)
    = (aa)·( aaaa + (bbaa)* )*·(aa)

For w = 0110 ∈ L(R):

f(word)( w ) = aa·(bbaa)*·(bbaa)*·aa = aa·(bbaa)*·aa

and L( aa·(bbaa)*·aa ) ⊂ L( R^f ).
Exercise
Let Bk be the alphabet containing 2^k symbols, Bk = { 0, 1, 2, …, 2^k − 1 }, e.g.

B4 = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F }

Any word in Bk* starting with a symbol other than 0 can be interpreted as the unique base-2^k representation of some natural number basek( w ) ∈ N.

Using the fact that the language PRIMES,

{ w ∈ 1·{ 0, 1 }* : bin(w) is a prime number },

defined on p.130, is not regular, show that ∀ k > 1, the language PRIMES_base_k,

{ w ∈ Bk* : w = σ·u, σ ∈ Bk − { 0 }, basek(w) is a prime number },

is not a regular language, without using the Pumping Lemma.

[Hint: Use an appropriate substitution, f : Bk → ℘( { 0, 1 }* ).]
Quotient
Let L1 and L2 be languages over the same alphabet Σ.

The quotient of L1 with respect to L2 (denoted L1 / L2) is

L1 / L2 = { v : ∃ u ∈ L2 such that v·u ∈ L1 }

In other words, L1/L2 comprises those words v for which there is some word u ∈ L2 such that concatenating v and u gives a word in L1.

Example: Σ = { 0, 1, 2, 3 }.

L1 = L( (00 + 11)*·(22 + 33)·2* )
L2 = L( 33·2* )

L1/L2 = L( ( 00 + 11 )* ), since given any w ∈ L( (00 + 11)* ), w·33 ∈ L1 and 33 ∈ L2.

[N.B. In general, ( L1/L2 )·L2 ≠ L1.]
Theorem 7:
Let L1 be a regular language over Σ and L2 be any language over Σ. Then

L1/L2 ∈ Reg.

Proof:
Let M = ( Q, Σ, q0, F, δ ) be a DFA with L(M) = L1. Define Fquot to be the subset of Q such that:

Fquot = { q ∈ Q : ∃ u ∈ L2 such that δ*( q, u ) ∈ F }

The DFA, M′, given by

( Q, Σ, q0, Fquot, δ )

is such that L( M′ ) = L1/L2.

To see this, observe that

w ∈ L( M′ ) ⇔ δ*( q0, w ) = q ∈ Fquot
⇔ ∃ u ∈ L2 such that δ*( q, u ) ∈ F
⇔ ∃ u ∈ L2 such that δ*( q0, w·u ) ∈ F
⇔ ∃ u ∈ L2 such that w·u ∈ L1
⇔ w ∈ L1/L2
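When L2 is a finite, explicitly listed language, the set Fquot is computable by direct simulation, which makes the construction effective in that special case (the dict encoding and example are my own):

```python
# Sketch: computing F_quot for a finite, explicitly given L2.
def f_quot(states, delta, finals, L2):
    """States q from which some u in L2 drives M into an accepting state."""
    def run(q, u):
        for a in u:
            q = delta[(q, a)]
        return q
    return {q for q in states if any(run(q, u) in finals for u in L2)}

# Example: M accepts a.a* (one or more a's).
delta = {(0, 'a'): 1, (1, 'a'): 1}
```

With L2 = {a}, both states enter Fquot, so M′ accepts a*, which is exactly ( a·a* ) / {a}.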
Commentary on the Proof
A 'problem' with the argument above is that it is not constructive, i.e. given L1 and L2, we are not told how to identify the set Fquot that will form the set of accepting states in M′.

The proof relies simply on the observation that, irrespective of the language L2, there exists such a subset among the states of M, the DFA recognising L1. (Of course this subset may be anything from the empty set to every state in Q.)

Notice that inspecting every state q ∈ Q in turn will effect a constructive solution, provided that it is possible correctly to determine the following:

Is { w ∈ Σ* : δ*( q, w ) ∈ F } ∩ L2 = ∅?    (*)
Decision Methods for Regular Sets
In discussing how a DFA recognising L1/L2 could be constructed, it was noted (in (*)) that this was possible in those cases for which it could be decided whether the intersection of two languages was empty or not.

Given a DFA, M = ( Q, Σ, q0, F, δ ), the language

Li = { w ∈ Σ* : δ*( qi, w ) ∈ F }

is a regular language.

From Theorem 5(b), we know that the intersection of regular languages is a regular language.

∴ A DFA, M, accepting L1/L2 when both L1, L2 ∈ Reg can be explicitly constructed if:

There is an algorithm which, given a description of a regular language, L, as input, returns true if L = ∅ and false otherwise.
150 Properties of Regular Languages
Decision Questions for Languages
There are many questions of the form
‘Does a given language L, have a particularproperty of interest?’
In dealing with such language properties we are generally interested in 2 types of result:

a) Positive (algorithmic) results: a description of an algorithm that takes as input a (finite) description of a language L, returning true if and only if L has the property concerned.

b) Negative (‘undecidability’) results: a formal proof that no algorithm for deciding the property concerned is possible.

We shall deal with the latter category much more extensively in the final section of this module.
Properties of Regular Languages 151
Properties of Interestfor Regular Languages
Assume, without loss of generality, that to present a finite description of some regular language, L, a regular expression, R, with L( R ) = L is used.

Three basic questions we can seek algorithms for are: Given R a regular expression over Σ:

Is L( R ) = ∅?
Is L( R ) a finite language?
Is L( R ) an infinite language?

The relationships between the languages described by different regular expressions, R1 and R2, also present important decision questions: Given R1, R2 regular expressions over Σ:

Is L( R1 ) = L( R2 )?
Is L( R1 ) ⊆ L( R2 )?

We shall show that all of these properties for regular languages have decision algorithms.
152 Properties of Regular Languages
Deciding if L isEmpty, Finite, or Infinite
Theorem 8: Let M = ( Q, Σ, q0, F, δ ) be a DFA.

a) L( M ) ≠ ∅ ⇔ ∃ w ∈ L(M) s.t. |w| < |Q|

b) L( M ) is infinite ⇔ ∃ w ∈ L(M) s.t. |Q| ≤ |w| < 2|Q|

Proof:
a) That L(M) is non-empty if it accepts something is obvious. Suppose L( M ) ≠ ∅, and let w be a shortest word in L(M). From the Pumping Lemma, |w| < |Q|: otherwise we could write w = xyz with |y| ≥ 1 and xz ∈ L(M), contradicting the choice of w.

b) If w ∈ L(M) with |Q| ≤ |w| < 2|Q|, then the Pumping Lemma shows L(M) to be infinite. Conversely, if L(M) is infinite, let w be the shortest word in L(M) of length ≥ |Q|. If |w| ≥ 2|Q| the Pumping Lemma gives a contradiction, since we can write w = xyz with |xy| ≤ |Q|, |y| ≥ 1 and xz ∈ L(M) (and |xz| ≥ |Q|, so xz is a shorter such word).
Properties of Regular Languages 153
Consequences of Theorem 8
From Theorem 8(a), we get a (not very good) algorithm to test if L(M) = ∅:

Generate each word over Σ of length up to |Q| − 1, and test if M accepts any. If all are rejected then L( M ) = ∅.

And similarly, from Theorem 8(b), another (not very good) algorithm to test if L(M) is infinite:

Generate each word over Σ of length between |Q| and 2|Q| − 1. If all are rejected then L(M) is finite.
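These brute-force tests are easy to state in code. A minimal Python sketch (illustrative only; the two-state example DFA below, accepting words with an odd number of 1s, is an assumption, not from the notes):

```python
from itertools import product

def accepts(delta, q0, F, word):
    """Run the DFA on the word and report acceptance."""
    q = q0
    for a in word:
        q = delta[q, a]
    return q in F

def is_empty(n_states, sigma, delta, q0, F):
    # Theorem 8(a): L(M) is non-empty iff some word of length < |Q| is accepted.
    return not any(accepts(delta, q0, F, w)
                   for n in range(n_states)
                   for w in product(sigma, repeat=n))

def is_finite(n_states, sigma, delta, q0, F):
    # Theorem 8(b): L(M) is infinite iff some word w with |Q| <= |w| < 2|Q|
    # is accepted.
    return not any(accepts(delta, q0, F, w)
                   for n in range(n_states, 2 * n_states)
                   for w in product(sigma, repeat=n))

# Assumed example: 2-state DFA over {0,1} accepting words with an odd
# number of 1s -- a non-empty, infinite language.
delta = {("e", "0"): "e", ("e", "1"): "o",
         ("o", "0"): "o", ("o", "1"): "e"}
```

Both tests enumerate |Σ|^O(|Q|) words, which is exactly why the slide calls them "not very good"; the exercises that follow ask for reachability-based improvements.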
154 Properties of Regular Languages
(Not too difficult) Exercise
Describe a more efficient algorithm for testing if L(M) = ∅ which works by constructing the set of states that could be reached from the initial state.

[Obviously, if no q ∈ F can be reached then L(M) = ∅.]

(Slightly more difficult) Exercise

Using a similar approach, describe a more efficient algorithm for testing if L( M ) is infinite.

[Consider constructing (non-simple) paths of states starting from q0 which contain two occurrences of some state in Q and end with some state in F.]
Properties of Regular Languages 155
Comparison of Languages Accepted by DFA
We now give two easy constructions for determining if DFA M1 and M2 satisfy

L( M1 ) = L( M2 )
L( M1 ) ⊆ L( M2 )

To simplify notation, write Co-( L ) for the complement Σ* − L.

For the first construction, using Theorem 5(a, b), build a DFA, M3, for which L( M3 ) is

( L(M1) ∩ Co-( L(M2) ) ) ∪ ( L(M2) ∩ Co-( L(M1) ) )

Then L( M3 ) = ∅ ⇔ L( M1 ) = L( M2 ).

For the second, construct M3 so that

L( M3 ) = L( M1 ) ∩ Co-( L(M2) )

If L( M3 ) = ∅ then L(M1) ⊆ L(M2). Equality can be ruled out by using the first test described.
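In code, the first construction need not build M3 explicitly: exploring the product automaton and looking for a reachable pair that is accepting in exactly one component performs the same symmetric-difference emptiness test. A minimal Python sketch (illustrative; the two odd-number-of-1s machines are assumed examples):

```python
from collections import deque

def equivalent(d1, s1, F1, d2, s2, F2, sigma):
    """Decide L(M1) = L(M2): explore the product automaton from (s1, s2).
    The symmetric-difference machine M3 accepts iff some reachable pair is
    accepting in exactly one component, so equality holds iff none is."""
    seen = {(s1, s2)}
    frontier = deque(seen)
    while frontier:
        p, r = frontier.popleft()
        if (p in F1) != (r in F2):   # a witness word leads here: L1 != L2
            return False
        for a in sigma:
            nxt = (d1[p, a], d2[r, a])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Assumed example: both machines accept words over {0,1} with an odd
# number of 1s; the second is simply a renamed copy of the first.
dA = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
dB = {("x", "0"): "x", ("x", "1"): "y", ("y", "0"): "y", ("y", "1"): "x"}
```

Swapping F2 to the even-1s state makes the languages differ already at the empty word, so the test fails immediately.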
156 Properties of Regular Languages
COMP209
Automata and Formal Languages
Section 5
Construction and Uniqueness of Minimum Number of States
Finite Automata
Minimum State DFA 157
A very important property of regular languages is that for every regular language R over Σ there is a ‘unique’ minimum number of states DFA, M, such that L( M ) = R.
In addition, this automaton can be recoveredby an efficient algorithm from any DFArecognisingR.
Question
Why ‘unique’ rather thanunique?
Answer
Because choosing any renaming f : Q ↔ Q of the states of a DFA, M, obviously gives a DFA, M′, recognising exactly the same language as M.
158 Minimum State DFA
To make this concept of ‘unique’ precise, we can formally define such renaming processes via an equivalence relation ≡iso, so that:

Let

M1 = ( Q1, Σ, q0¹, F1, δ1 )
M2 = ( Q2, Σ, q0², F2, δ2 )

Then M1 ≡iso M2 if there exists a bijection β : Q1 ↔ Q2 such that:

β( q0¹ ) = q0²

qk ∈ F1 ⇔ β( qk ) ∈ F2

∀ σ ∈ Σ : δ1( qi, σ ) = qj ⇔ δ2( β( qi ), σ ) = β( qj )

Thus, ‘unique’ is with respect to membership of an equivalence class under the relation ≡iso.
Minimum State DFA 159
Example
[Diagram: two DFAs over { 0, 1 }, each with states q0, …, q6; the second is the first with its states renamed. The renaming is the bijection β listed below.]
160 Minimum State DFA
Example Continued
Using the bijection:
β ( q0 ) = q4
β ( q1 ) = q6
β ( q2 ) = q5
β ( q3 ) = q0
β ( q4 ) = q1
β ( q5 ) = q2
β ( q6 ) = q3
We have:

q0¹ = q0 ; q0² = q4 = β( q0¹ )

F1 = { q3, q4, q5 }

F2 = { q0, q1, q2 } = { β( q3 ), β( q4 ), β( q5 ) }

The transition functions meet the requirements, so, as is evident from the diagram, the automata are equivalent under ≡iso.
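Checking the three ≡iso conditions mechanically is straightforward once a candidate bijection is given. A minimal Python sketch (illustrative only; the two tiny two-state DFAs and the renaming below are assumed examples, not the seven-state machines of this section):

```python
def is_isomorphism(beta, M1, M2, sigma):
    """Check the three conditions defining ≡iso for a candidate bijection beta.
    Each machine is a triple (delta, q0, F) with delta keyed by (state, symbol)."""
    d1, q01, F1 = M1
    d2, q02, F2 = M2
    if beta[q01] != q02:                      # start states must correspond
        return False
    if {beta[q] for q in F1} != set(F2):      # accepting sets must correspond
        return False
    # transitions must commute with the renaming
    return all(beta[d1[q, a]] == d2[beta[q], a]
               for q in beta for a in sigma)

# Assumed tiny example: two 2-state DFAs over {0,1}, identical up to the
# renaming p0 -> r1, p1 -> r0.
d1 = {("p0", "0"): "p1", ("p0", "1"): "p0",
      ("p1", "0"): "p0", ("p1", "1"): "p1"}
d2 = {("r1", "0"): "r0", ("r1", "1"): "r1",
      ("r0", "0"): "r1", ("r0", "1"): "r0"}
beta = {"p0": "r1", "p1": "r0"}
```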
Minimum State DFA 161
Overview of Minimisation Algorithm

The key idea underlying the method for determining if a DFA, M, has the minimum number of states needed to recognise L(M) is that of identifying sets of indistinguishable states in Q(M).

What is meant by two states qi, qj of a DFA, M, being indistinguishable?
If there is some w ∈ Σ* for which

δ*( qi, w ) ∈ F AND δ*( qj, w ) ∉ F

OR

δ*( qi, w ) ∉ F AND δ*( qj, w ) ∈ F

then, clearly, the states qi and qj perform different roles within M.

Formally, the language recognised with qi as initial state is different from the language recognised with qj as initial state: w is a member of exactly one of these.
162 Minimum State DFA
If, on the other hand,

∀ w ∈ Σ*

EITHER

δ*( qi, w ) ∈ F and δ*( qj, w ) ∈ F

OR

δ*( qi, w ) ∉ F and δ*( qj, w ) ∉ F

then the language recognised starting from qi is identical to that recognised starting from qj.

But if this is the case, why are two separate states (qi and qj) necessary in M?

The answer, of course, is that it is not necessary: a DFA recognising the same language as M is given by:

a) Deleting qi, qj from Q(M).
b) Adding a new state q{i,j}.
c) Modifying δ: transitions into qi or qj become transitions into q{i,j}; transitions from qi become transitions from q{i,j}.

The new DFA recognises L( M ) with one less state.
Minimum State DFA 163
Thus, we say that the states qi and qj of a DFA, M, are indistinguishable if

Li = { w ∈ Σ* : δ*( qi, w ) ∈ F }

is the same as

Lj = { w ∈ Σ* : δ*( qj, w ) ∈ F }

If Li ≠ Lj, then qi and qj are distinguishable states in M.

It is obvious that in a minimum number of states DFA, M, every state qi of M is distinguishable from every other state qj of M.
164 Minimum State DFA
Detecting Indistinguishable State Sets
Let M = ( Q, Σ, q0, F, δ ). Define a relation, ∼, over pairs of states in Q by

qi ∼ qj if Li = Lj,

i.e. if the two states are indistinguishable.
The relation∼ is an equivalence relation,i.e.
∀ i , j , k
qi ∼ qi
qi ∼ q j ⇔ q j ∼ qi
qi ∼ q j andq j ∼ qk ⇒ qi ∼ qk
The number of equivalence classes defined by ∼ for the DFA, M, will thus correspond to the number of states in a minimised DFA accepting L(M).
Minimum State DFA 165
Exercise
LetM = ( Q, Σ, q0, F ,δ )
be aDFA.
Using the results already proved that:

a) DFA accept exactly the class of regular languages (hence L(M) for any DFA M is described by some regular expression);

b) there is an algorithm to test if two regular expressions, S and T, describe the same language, i.e. L(S) = L(T);

describe a method of constructing the partition of Q into the equivalence classes induced by the indistinguishability relation ∼.

[Note: this algorithm is far from being the most efficient approach that could be used.]
166 Minimum State DFA
A better approach (in comparison with the exercise method suggested above) to constructing the required partition of Q is to take some initial ‘approximation’ to it, and ‘refine’ this until the final set of equivalence classes has been identified.

From our definition, we know that two states qi and qj are distinguishable if Li ≠ Lj, i.e. there is some word w ∈ Σ* belonging to exactly one of Li and Lj.

How long would a distinguishing w have to be?

At most |Q| − 1. (Exercise: Why?)

The fact that there is an upper bound on the lengths of ‘distinguishing words’ indicates that the process below terminates.
Minimum State DFA 167
k-indistinguishability
For a DFA, M = ( Q, Σ, q0, F, δ ):

The states qi, qj are 0-indistinguishable, qi ∼0 qj, if both are in F or neither of them is.

States qi, qj are k-indistinguishable (k > 0), qi ∼k qj, if

∀ σ ∈ Σ : δ( qi, σ ) ∼k−1 δ( qj, σ )

and

qi ∼0 qj
qi is k-distinguishable from qj if it is not the case that qi ∼k qj.

Thus, if qi ∼k qj then there is no word w of length ≤ k for which exactly one of δ*( qi, w ) ∈ F, δ*( qj, w ) ∈ F holds, i.e. Li and Lj contain exactly the same subset of words of length ≤ k.
168 Minimum State DFA
It should be clear that qi ∼ qj if and only if qi ∼k qj for all 0 ≤ k < |Q|.

This gives the following procedure for determining the equivalence classes for M under the relation ∼, which works by refining the partition of Q induced by ∼k to form that induced by ∼k+1.

We use

Pk = < C1 ; C2 ; … ; Cm >

to denote the equivalence classes induced by ∼k. Thus

Ci ⊆ Q and ∀ q, q′ ∈ Ci : q ∼k q′.

In the algorithm description, Ci will be referred to as a block of the partition Pk.
Minimum State DFA 169
State Partitioning Algorithm
1) k := 0; P0 := < Q − F ; F >

2) k := k + 1;

3) Form the partition Pk: if

Pk−1 = < C1 ; C2 ; … ; Cr >

then two states q, q′ are in the same block Ci of Pk if and only if:

a) q and q′ are in the same block Cj of Pk−1

AND

b) ∀ σ : δ( q, σ ) and δ( q′, σ ) are in the same block Cj,σ of Pk−1.

4) If Pk ≠ Pk−1 go to (2).
170 Minimum State DFA
Commentary
a) On each iteration, each block Ci of the partition Pk either remains unchanged or is split into smaller sets, which correspond to equivalence classes of (k + 1)-indistinguishable states within the k-indistinguishable block Ci.

b) In implementing Step (3), the ‘obvious’ approach of considering all pairs of distinct states can be improved upon using appropriate data structures.

c) The final partition produced corresponds to the equivalence classes induced by the relation ∼ on Q.

d) In constructing the final equivalent automaton, we would always ensure that states unreachable from q0 are eliminated.
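The refinement loop can be sketched directly in Python (an illustrative implementation, not part of the notes). The eight-state transition function below is an assumption: it is one table consistent with the partition work-through on the following pages (the diagram itself is taken from Hopcroft and Ullman, 1979):

```python
def minimise_partition(states, sigma, delta, F):
    """State Partitioning Algorithm: start from <Q - F ; F> and repeatedly
    split blocks until, for every symbol, two states in the same block
    always move to the same block."""
    partition = [blk for blk in (set(states) - set(F), set(F)) if blk]
    while True:
        ids = {q: i for i, blk in enumerate(partition) for q in blk}
        groups = {}
        for q in states:
            # signature: current block plus the block of each successor
            sig = (ids[q],) + tuple(ids[delta[q, a]] for a in sigma)
            groups.setdefault(sig, set()).add(q)
        refined = list(groups.values())
        if len(refined) == len(partition):   # P_k = P_{k-1}: stable
            return refined
        partition = refined

# Assumed transition table for the eight-state example, F = {q2}.
delta = {("q0", "0"): "q1", ("q0", "1"): "q5",
         ("q1", "0"): "q6", ("q1", "1"): "q2",
         ("q2", "0"): "q0", ("q2", "1"): "q2",
         ("q3", "0"): "q2", ("q3", "1"): "q6",
         ("q4", "0"): "q7", ("q4", "1"): "q5",
         ("q5", "0"): "q2", ("q5", "1"): "q6",
         ("q6", "0"): "q6", ("q6", "1"): "q4",
         ("q7", "0"): "q6", ("q7", "1"): "q2"}
states = ["q%d" % i for i in range(8)]
blocks = minimise_partition(states, "01", delta, {"q2"})
```

Since refinement only ever splits blocks, comparing block counts is a valid stability test.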
Minimum State DFA 171
Example
[Diagram: an eight-state DFA over { 0, 1 } with start state q0 and accepting set F = { q2 }. Its transition function (recovered from the partition tables below) is:

state : q0 q1 q2 q3 q4 q5 q6 q7
on 0  : q1 q6 q0 q2 q7 q2 q6 q6
on 1  : q5 q2 q2 q6 q5 q6 q4 q2 ]

[From Hopcroft and Ullman, 1979, p.68]
172 Minimum State DFA
Initial Partition P0:

C0,1 = { q0, q1, q3, q4, q5, q6, q7 }   C0,2 = { q2 }

Formation of Partition P1:

Block C0,1
State   0-block   1-block
q0      C0,1      C0,1
q1      C0,1      C0,2
q3      C0,2      C0,1
q4      C0,1      C0,1
q5      C0,2      C0,1
q6      C0,1      C0,1
q7      C0,1      C0,2

Block C0,2
State   0-block   1-block
q2      C0,1      C0,2

∴ P1 is

C1,1 = { q0, q4, q6 }   C1,2 = { q1, q7 }   C1,3 = { q3, q5 }   C1,4 = { q2 }
Minimum State DFA 173
Formation of Partition P2:

Block C1,1
State   0-block   1-block
q0      C1,2      C1,3
q4      C1,2      C1,3
q6      C1,1      C1,1

Block C1,2
State   0-block   1-block
q1      C1,1      C1,4
q7      C1,1      C1,4

Block C1,3
State   0-block   1-block
q3      C1,4      C1,1
q5      C1,4      C1,1

Block C1,4
State   0-block   1-block
q2      C1,1      C1,4

∴ P2 is

C2,1 = { q0, q4 }   C2,2 = { q6 }   C2,3 = { q1, q7 }   C2,4 = { q3, q5 }   C2,5 = { q2 }
174 Minimum State DFA
It is easy to check that no further changes occur to this partition, i.e. P3 = P2.

So the minimised form of the example automaton has 5 states (rather than 8) and is shown below:
[Diagram: the minimised DFA. Writing qi,j for the merged state { qi, qj }, its states are q0,4 (start), q1,7, q3,5, q6 and q2 (accepting), with transitions:

δ( q0,4, 0 ) = q1,7   δ( q0,4, 1 ) = q3,5
δ( q1,7, 0 ) = q6     δ( q1,7, 1 ) = q2
δ( q2, 0 )  = q0,4    δ( q2, 1 )  = q2
δ( q3,5, 0 ) = q2     δ( q3,5, 1 ) = q6
δ( q6, 0 )  = q6      δ( q6, 1 )  = q0,4 ]
Minimum State DFA 175
But how do we know it is minimal? (The Myhill-Nerode Theorem)

The construction that we have just presented takes a given DFA, M, recognising L and, by identifying equivalence classes of indistinguishable states, forms a DFA accepting L but with (possibly) fewer states.

There are, however, infinitely many DFA that recognise any one regular language, L.

Suppose we had started with a ‘different’ DFA, M′, for L.

Question: Could it happen that the ‘reduced’ DFA formed from M′ has a different number of states compared to the ‘reduced’ form of M?

Answer: No (assuming states unreachable from q0 are removed from both automata).
176 Minimum State DFA
More Equivalence Relations

In order to establish uniqueness of the minimised DFA, we introduce (another) equivalence relation, this time over words in Σ*.

Let L ⊆ Σ*. The relation ≈L between words in Σ* is:

x ≈L y ⇔ ∀ z ∈ Σ* : ( x⋅z ∈ L ⇔ y⋅z ∈ L )

Notice ≈L is properly defined, since (at worst) each w ∈ Σ* could be the only member of its equivalence class.

For L ⊆ Σ*, we define Index( L ) to be the total number of equivalence classes induced by ≈L.
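The classes of ≈L can be explored experimentally: bucket words x by how they behave under a bounded set of extensions z. A minimal Python sketch (illustrative; the language "strings ending in 01" and the length bounds are assumptions — the bounds happen to suffice here because a minimal DFA for this language has 3 states):

```python
from itertools import product

def words(sigma, max_len):
    """All words over sigma of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(sigma, repeat=n):
            yield "".join(w)

def approx_index(in_L, sigma, max_w=4, max_z=3):
    """Bucket words x by the finite signature (x.z in L for |z| <= max_z).
    This counts a lower bound on Index(L); for regular L it reaches the
    true value once the bounds exceed the minimal DFA's state count."""
    tests = list(words(sigma, max_z))
    return len({tuple(in_L(x + z) for z in tests)
                for x in words(sigma, max_w)})
```

For L = strings over {0,1} ending in "01", the three classes correspond to the three states of its minimal DFA, so the count comes out as 3.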
Our statement of the next theorem is technically different from the original form proved. The version we give is, however, trivially deducible from the ‘usual’ form.
Minimum State DFA 177
Theorem 9 (Myhill-Nerode Theorem): Let L be a regular language over Σ, and

M = ( Q, Σ, S, F, δ )

a minimum number of states DFA with L( M ) = L. Then

Index( L ) = |Q( M )|

Proof:
a) |Q( M )| ≤ Index( L ).
We may assume that Index( L ) is finite (otherwise the inequality is trivially correct); let r = Index( L ) with

< C1 ; C2 ; … ; Cr >

the partition of Σ* induced by ≈L.

Let M′ = ( Q, Σ, S, F, δ ) be the DFA:

Q = { q1, q2, …, qr }

F = { qi : L ∩ Ci ≠ ∅ }

S = qi, where ε ∈ Ci

δ( qi, σ ) = qj if ( ∀ w ∈ Ci : w⋅σ ∈ Cj )
178 Minimum State DFA
Note that δ is well-defined since, from the definition of ≈L,

w1 ≈L w2 ⇒ ( ∀ σ ∈ Σ : w1⋅σ ≈L w2⋅σ )

We claim that L( M′ ) = L:

L( M′ ) = { w ∈ Σ* : δ*( S, w ) ∈ F }
        = ∪ { Ci : L ∩ Ci ≠ ∅ }
        = L

(the last step holds because any class Ci meeting L lies wholly inside L: if x ≈L y and x ∈ L then, taking z = ε in the definition of ≈L, y ∈ L).

So we have an Index( L )-state DFA, M′, recognising L.

b) |Q(M)| ≥ Index( L ).
Define the relation ≈M over Σ* by

x ≈M y ⇔ δ*( q0, x ) = δ*( q0, y ).

Obviously the number of equivalence classes induced by ≈M is exactly |Q(M)|.

If we take any equivalence class Si in the induced partition of Σ*, we have

∀ x, y ∈ Si ∀ z ∈ Σ* :

x⋅z ≈M y⋅z

and

x⋅z ∈ L ⇔ y⋅z ∈ L
Minimum State DFA 179
Thus, x ≈M y ⇒ x ≈L y,

showing that the number of equivalence classes of ≈M (i.e. |Q(M)|) is at least as large as the number of equivalence classes of ≈L (i.e. Index( L )).
So we have both |Q( M )| ≤ Index( L ) and |Q( M )| ≥ Index( L ).
Hence,
|Q( M ) | = Index( L )
180 Minimum State DFA
Uniqueness of Minimal State Construction
Theorem 10: Let

M′ = ( Qm, Σ, Sm, Fm, δm )

be the DFA resulting from the State Minimisation Algorithm when given

M = ( Q, Σ, S, F, δ )

where we assume that Qm has no unreachable states. Then

|Qm| = Index( L( M ) ).

Proof: We know that |Qm| ≥ Index( L(M) ).

Suppose |Qm| > Index( L(M) ) and let

< D1 ; D2 ; … ; Dm >

< C1 ; C2 ; … ; Cr >

be the partitions of Σ* induced by the relations ≈M′ and ≈L(M) respectively.
Minimum State DFA 181
Since x ≈M′ y ⇒ x ≈L(M) y, there must be sets Di, Dj, and Ck for which Di ∪ Dj ⊆ Ck.

Consider the corresponding states qi and qj in Qm.

These must be distinguishable (or the algorithm would have merged them into one state). Thus

∃ w ∈ Σ* such that exactly one of δ*( qi, w ) ∈ Fm, δ*( qj, w ) ∈ Fm holds,

i.e. ∃ x ∈ Di, y ∈ Dj such that exactly one of x⋅w ∈ L(M), y⋅w ∈ L(M) holds.

But this contradicts x ∈ Ck and y ∈ Ck, since

x, y ∈ Ck ⇒ x ≈L(M) y ⇔ ∀ w ( x⋅w ∈ L(M) ⇔ y⋅w ∈ L(M) )

This contradiction shows that we must have |Qm| = Index( L(M) ).
182 Minimum State DFA
Summary
a) The Minimisation Algorithm and its correctness proof via the Myhill-Nerode Theorem complete our development of the first part of the module.

b) DFA and regular languages provide a very extensive collection of ideas, a number of which will recur when examining more ‘powerful’ ‘black-box’ configurations.

c) Despite the flexibility of DFA, we have seen that there are some quite simple languages which are beyond their recognition capabilities, e.g. palindromes, equal numbers of zeros and ones.

d) In the next part of the module we consider a ‘natural’ enhancement of DFA capabilities that does extend the range of ‘suitable’ languages.
Minimum State DFA 183
COMP209
Automata and Formal Languages
Section 6
Context-Free Grammars
184 Context-Free Grammars
Introduction
Over the next few lectures we examine asecond class of methods for describing andrecognising languages.
We introduce this by considering a simpleextension to the one class ofgrammars thathas been seen.
Later we shall see how this extension can beparalleled by a similarly simple extension tothe ‘black-box’ functionality offered byDFA.
Context-Free Grammars 185
Why another class of grammar?
Recall that the class of grammars corresponding to DFA — Right Linear Grammars — restrict production rules to the form:

Vi → σ⋅Vj ; Vi → σ

where σ ∈ T (i.e. a terminal symbol).

Consider an application such as:

• Checking if a statement in a Java program is syntactically correct.

Among the ‘sub-tasks’ that one might have to carry out in order to do this are:

a) Checking if arithmetic expressions are ‘well-formed’;
b) Checking if an "if .. then .. else" statement has correct nesting of sub-statements: i.e. that there are no unmatched { or } symbols.
186 Context-Free Grammars
Neither of these can be recognised using RLGs.

Exercise: Both (a) and (b) involve recognising properly balanced sequences of left and right brackets — ‘(’ and ‘)’ in (a); ‘{’ and ‘}’ in (b). Prove that these languages are not regular.
[Hint: Use the Pumping Lemma and words of the form (^m⋅)^m]

One of the most important applications of Formal Grammars in Computer Science is as a means of providing a finite description of all syntactically correct statements within a High-Level Programming Language.

Clearly, RLGs are insufficient for this purpose.

What is the ‘minimal’ extension in the form of allowed production rules that would remove this problem?
Context-Free Grammars 187
Context-Free Grammars
A context-free grammar (CFG), G = ( V, T, P, S ), is a formal grammar in which all productions p ∈ P take the form

Vi → w

where Vi ∈ V and w ∈ ( V ∪ T )*.

So while the left-hand side of any production is (still) only allowed to contain exactly one non-terminal symbol, the right-hand side may comprise an arbitrary word built from terminal and non-terminal symbols.

It should be obvious that any language defined by a RLG (i.e. regular language) is defined by a CFG (since RLGs are just a restricted type of CFG), i.e. CFGs are at least as ‘expressive’ as RLGs.

Before examining their properties in greater detail, a simple example showing that CFGs are more ‘expressive’ than RLGs is given.
188 Context-Free Grammars
A Context-Free Grammar for aNon-regular Language
It was shown earlier (Ex. 1, p.130) that the language

L(d) = { w ∈ { 0, 1 }+ : w = Reverse( w ) }

is not a regular language.

L(d) is, however, generated by the CFG G = ( V, T, P, S ) with

V = { S } ; T = { 0, 1 }

and P having 6 production rules:

S → 0S0 ; S → 1S1
S → 00 ; S → 11
S → 0 ; S → 1

Exercise: Show, using induction on |w| ≥ 1, that S ⇒*G w if and only if w is a palindrome, i.e. w = Reverse( w ).
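The grammar's structure suggests a direct recursive membership test. A minimal Python sketch (not part of the notes): the base cases mirror the four short productions, and peeling a matching first and last symbol mirrors S → 0S0 and S → 1S1.

```python
def derives(w):
    """Check that S =>* w in the palindrome grammar
    S -> 0S0 | 1S1 | 00 | 11 | 0 | 1."""
    if w in ("0", "1", "00", "11"):   # the four terminal-only productions
        return True
    # otherwise w must start and end with the same symbol,
    # with the middle itself derivable from S
    return len(w) > 2 and w[0] == w[-1] and derives(w[1:-1])
```

Note the empty word is (correctly) rejected: the grammar generates palindromes in { 0, 1 }+, not { 0, 1 }*.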
Context-Free Grammars 189
Derivation Trees
Given some description of a language L ⊆ Σ* as a Formal Grammar, G, and a word w ∈ Σ*, the most basic question to address is:

Is w ∈ L( G )?

With the operational mechanisms provided by DFA and RLGs when L is a regular language, such questions are ‘straightforward’.

In effect, if G is a RLG then the ‘chain of productions’ used to show S ⇒*G w is immediate (there is at most one non-terminal to expand at each step).

Derivations in CFGs are complicated by the possible presence of several non-terminal symbols in a production rule.

Derivation (a.k.a. Parse) trees provide a means with which to illustrate how w is derived, and suggest an automated method of testing if w is accepted by a given CFG. This is important in Compiler Construction.
190 Context-Free Grammars
Let G = ( V,T, P, S) be a CFG.
A derivation tree in G is a tree D, each vertex u of which has a label λ( u ) satisfying:

1) ∀ u : λ( u ) ∈ V ∪ T ∪ { ε }.

2) λ( root of D ) = S.

3) If u is a non-leaf vertex in D, then λ( u ) ∈ V.

4) If λ( u ) = A ∈ V and u has children < u1, …, un > (from ‘left-to-right’) then

A → λ( u1 )⋅λ( u2 )⋅…⋅λ( un ) ∈ P.

5) If λ( u ) = ε, then u is a leaf and its parent in D has no other children.

Note: Two or more vertices may have exactly the same label from V ∪ T ∪ { ε }.
Context-Free Grammars 191
Example 1
[Diagram: derivation tree with yield 0010100 in the ‘Palindrome’ grammar G, corresponding to the derivation S ⇒ 0S0 ⇒ 00S00 ⇒ 001S100 ⇒ 0010100.]

Derivation tree (for word 0010100) using ‘Palindrome’ G.
192 Context-Free Grammars
Example 2
op
opd
num
digit
6
3 5
+
*
E
E E
E Eop
opd opd
num num
digit digit
EE
E
E
op
opdop E
numopd +
*
num
opd
digitnum
digit
6
digit
3
5
Tw o derivation trees (for 6+ 3 * 5) in EXPRCFG.
Context-Free Grammars 193
Properties and Attributesof Derivation Trees
A derivation tree illustrates how a word w may be derived from S( G ) using the production rules in P( G ).

The word produced by concatenating the terminal symbols labelling the leaf vertices of a derivation tree (using the ‘natural’ left-right ordering) is called the yield of the tree.

[Hence the three example trees have yields 0010100, 6+3*5, and 6+3*5.]

There may be many derivation trees for G with the same yield.
194 Context-Free Grammars
A sub-tree of a derivation tree is formedfrom any vertex of the tree together with allof its descendants.
A sub-tree whose root vertex is labelled with a non-terminal X ∈ V is called an X-tree.

Obviously if W is a derivation tree with yield w, and there is a vertex labelled X in W, then the yield, x, of that X-tree is a sub-word of w, i.e. w = u⋅x⋅v for some u and v.

In the examples:

There are four different S-trees in Example 1 (including the entire tree). These have yields

0, 101, 01010 and 0010100.

In Example 2, each tree has 5 E-trees. The first with yields:

6, 3, 5, 3 * 5 and 6 + 3 * 5.

The second with yields:

6, 3, 5, 6 + 3 and 6 + 3 * 5.
Context-Free Grammars 195
We will merely state the following result, which captures the precise connection between Derivation Trees and Context-Free Grammars.

Theorem 11: Let G = ( V, T, P, S ) be a CFG and w ∈ T*. Then

w ∈ L( G ) (i.e. S ⇒*G w) if and only if there is a derivation tree in G which has yield w.

The languages L ⊆ Σ* for which there exists a context-free grammar, G, with L( G ) = L are known as the

Context-Free Languages (CFL)
As we saw above,
Regular Languages over Σ ⊂ Context-Free Languages over Σ
196 Context-Free Grammars
Simplification of CFGs
It may appear as if the extension in production rule forms from

V → σ ; V → σ⋅W (RLGs)

to

X → w ∈ ( V ∪ T )* (CFGs)

is rather ‘extreme’ in the sense of forming a ‘natural hierarchy’ of grammar/language types.

e.g. why not use an extension which bounds the total number of variable symbols that can appear on the right-hand side of a production, so that RLGs (allowing at most one) are developed by allowing two, three, etc.

In fact, as we shall see as a consequence of the following processes, the class of CFGs implicitly embodies such an extension already:

i.e. an arbitrary CFG, G, can be expressed as a CFG, G′, where every production of G′ contains at most two variable symbols in its right-hand side.
Context-Free Grammars 197
Before proving this result, we consider procedures which allow ‘redundant’ variable and terminal symbols to be removed from any given CFG.

Let G = ( V, T, P, S ) be a CFG and let X ∈ V be some variable symbol in G.

How can it be determined if X is ‘actually needed’ with respect to L( G ) as defined by the CFG, G?

We can identify 2 ‘obvious’ necessary conditions:

a) ∃ u, v ∈ ( V ∪ T )* s.t. S ⇒*G u⋅X⋅v.

b) ∃ w ∈ T* s.t. X ⇒*G w.

i.e. (a) states that there is a derivation from S (the start symbol) that leads to some word containing the variable X; (b) that there is a derivation from X that results in a word comprising only terminal symbols.
198 Context-Free Grammars
(a) and (b) do not guarantee that X is needed in G, since the words u and v in condition (a) may involve redundant non-terminal symbols.

As a formal definition for a variable X ∈ V in a CFG, G = ( V, T, P, S ), being ‘productive’ we have:

A symbol X is productive in the CFG G = ( V, T, P, S ) if there are words w ∈ T*, u, v ∈ ( V ∪ T )* such that

S ⇒*G u⋅X⋅v ⇒*G w

A symbol which is not productive is called redundant.

The following procedures construct a CFG, G′ = ( V′, T′, P′, S′ ), from G = ( V, T, P, S ) in such a way that L(G′) = L(G) and with every symbol of V′ being productive.
Context-Free Grammars 199
1) Vold := ∅;
2) Vnew := { X : X → w ∈ P, for some w ∈ T* };
3) Vold := Vnew;
4) Vnew := Vold ∪ { X : X → w ∈ P, for some w ∈ ( T ∪ Vold )* };
5) if Vnew ≠ Vold then go to (3);
6) V′ := Vnew

Procedure 6.1

The productions P′ of G′ are those productions of G in which only symbols in V′ ∪ T occur.
200 Context-Free Grammars
The CFG, G′, generated from G by the process above ensures that each symbol in it satisfies condition (b). To ensure condition (a) the following suffices:

1) V′ := { S };
2) For each symbol X ∈ V′ and each production X → w, add the variable symbols in w to V′ and the terminal symbols in w to T′;
3) Repeat (2) until no changes occur in V′ ∪ T′.

Procedure 6.2

Again, the productions that are included are those involving only symbols from the final set V′ ∪ T′.
Context-Free Grammars 201
Suppose G0 = ( V0, T0, P0, S0 ) is a CFG, applying Procedure 6.1 to G0 results in the CFG

G1 = ( V1, T1, P1, S1 )

and applying Procedure 6.2 to G1 produces the CFG

G2 = ( V2, T2, P2, S2 ).

What properties do G2 and G1 have w.r.t. G0?

It is certainly the case that, if L( G0 ) ≠ ∅:

a) S0 = S1 = S2 [the start symbol is the same in all 3 CFGs];
b) V2 ⊆ V1 ⊆ V0; T2 ⊆ T1 = T0;
c) ∀ X ∈ V0 : X ∈ V1 ⇔ ( ∃ w ∈ T0* : X ⇒*G0 w );
d) ∀ X ∈ V1 : X ∈ V2 ⇔ ( ∃ u, v ∈ ( V1 ∪ T1 )* s.t. S1 ⇒*G1 u⋅X⋅v ).

(a)-(d) are immediate from the definitions of Procedures 6.1 and 6.2.
202 Context-Free Grammars
Combining (c) and (d) we deduce that:

∀ X ∈ V0 : X ∈ V2 ⇔ X is productive

∀ σ ∈ T0 : σ ∈ T2 ⇔ ( ∃ u, v ∈ T0* : S0 ⇒*G0 u⋅σ⋅v )

i.e. G2 contains no redundant variables or ‘unused’ terminal symbols.

We can interpret the operations of Procedures 6.1 and 6.2 as follows:

Procedure 6.1 iterates ‘backwards’, starting from variables with productions yielding words containing only terminal symbols, so that V1 eventually contains all symbols in V0 with derivations in P0 leading to terminal words.

Procedure 6.2 iterates ‘forwards’ from S1 = S0, the start symbol, so that V2 eventually contains all variables in V1 that ‘can be reached’ from the start symbol. Similarly, T2 will contain all terminal symbols that can occur in the words of L( G0 ).
Context-Free Grammars 203
Example
Let G = ( V, T, P, S ) with V = { A, B, S }, T = { a }, and P = { S → AB, S → a, A → a }.

Applying Procedure 6.1 gives

V1 = { S, A }

P1 = { S → a, A → a }

Applying Procedure 6.2 leaves V2 = { S } and P2 = { S → a }.

Note that the order of application is important: applying Procedure 6.2 first to G would leave a CFG with the symbol A, which could not then be eliminated by 6.1.
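The two procedures can be sketched in Python (an illustrative implementation, not from the notes; grammars are represented as lists of (head, body) pairs) and re-run on exactly this example:

```python
def productive_symbols(T, P):
    """Procedure 6.1: collect variables that derive a terminal word,
    iterating until no new variable qualifies."""
    good = set()
    changed = True
    while changed:
        changed = False
        for X, body in P:
            if X not in good and all(s in T or s in good for s in body):
                good.add(X)
                changed = True
    return good

def reachable_symbols(S, P):
    """Procedure 6.2: collect symbols reachable from the start symbol."""
    reach = {S}
    changed = True
    while changed:
        changed = False
        for X, body in P:
            if X in reach:
                for s in body:
                    if s not in reach:
                        reach.add(s)
                        changed = True
    return reach

# The example grammar: V = {A, B, S}, T = {a}, P = {S->AB, S->a, A->a}.
T, S = {"a"}, "S"
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("a",))]
good = productive_symbols(T, P)
P1 = [(X, b) for X, b in P if X in good and all(s in T or s in good for s in b)]
reach = reachable_symbols(S, P1)
P2 = [(X, b) for X, b in P1 if X in reach and all(s in reach for s in b)]
```

Running 6.1 first drops S → AB (B is unproductive); 6.2 then removes the now-unreachable A, reproducing the slide's result.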
204 Context-Free Grammars
Nullable Symbols,ε -Productionsand Unit Productions
We now turn to two further simplifications that can effectively be carried out on CFGs.

Consider the following cases for G = ( V, T, P, S ):

i) ε ∉ L( G );
ii) ∃ X, Y ∈ V s.t. X → Y ∈ P.

Intuitively, one would expect that:

In the former case, since ¬( S ⇒*G ε ), any symbol X ∈ V for which X ⇒*G ε is ‘unnecessary’.

In the latter case, the production X → Y ought to be eliminable by ‘substituting’ ‘appropriate’ words over ( V ∪ T ) for occurrences of X in P.
These intuitions are justified.
Context-Free Grammars 205
A production of the form X → ε is called an ε-production.

A production of the form X → Y (where Y ∈ V) is called a unit production.

Theorem 12: Let G = ( V, T, P, S ) be a CFG. The language L( G ) − { ε } is generated by some CFG, G′ = ( V′, T′, P′, S′ ), without ε-productions or redundant symbols.

Proof: Given G, we say that X ∈ V is nullable if X ⇒*G ε.

First we find all nullable symbols in G:

1) N0 := { X ∈ V : X → ε ∈ P };
2) N1 := N0 ∪ { X ∈ V : X → Y1⋅…⋅Yk ∈ P and ∀ i : Yi ∈ N0 };
3) if N1 ≠ N0 then N0 := N1 and go to (2).

Correctness of this method is obvious.
206 Context-Free Grammars
Next the productions P of G are modified so that no derivation X ⇒*G′ ε is possible in G′, for any X ∈ V′.

Suppose that

X → Y1⋅Y2⋅…⋅Yk

is a production in P.

In P′ this is replaced by a set of productions

PX = { X → z1⋅z2⋅…⋅zk }

using:

1) If Yi ∉ N0 (i.e. Yi is not nullable), zi = Yi.
2) If Yi ∈ N0, add both rules (with zi = Yi and with zi omitted) to PX.
3) If a rule X → ε results (i.e. all Yi are nullable) it is not added to PX.
Context-Free Grammars 207
Example
If X → A⋅B⋅C⋅A ∈ P, with A and C both nullable, then P′ would contain all of the following:

PX = { X → ABCA, X → ABA,
       X → ABC,  X → AB,
       X → BCA,  X → BA,
       X → BC,   X → B }

In general, a production X → Y1⋅Y2⋅…⋅Yk in which r of the Yi symbols were nullable creates 2^r productions in P′ (2^r − 1 if r = k), one for each (non-empty, when r = k) subset of nullable occurrences.
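The expansion step can be sketched in a few lines of Python (illustrative only) and checked against exactly this example:

```python
from itertools import product

def expand(body, nullable):
    """For X -> Y1...Yk, produce every right-hand side obtained by keeping
    or dropping each nullable occurrence; an empty result (X -> eps) is
    discarded, as in the construction above."""
    # each nullable occurrence offers two options: keep it or drop it
    options = [((s,), ()) if s in nullable else ((s,),) for s in body]
    out = set()
    for choice in product(*options):
        rhs = tuple(s for part in choice for s in part)
        if rhs:                      # skip the would-be X -> eps rule
            out.add(rhs)
    return out

# The example: X -> A.B.C.A with A and C nullable (three nullable
# occurrences, so 2^3 = 8 productions, none empty since B remains).
PX = expand(("A", "B", "C", "A"), {"A", "C"})
```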
To complete the proof of the Theorem it suffices to apply Procedures 6.1 and 6.2 to the (ε-production free) CFG, noting that neither process introduces new variable symbols or productions.
208 Context-Free Grammars
Unit Production Elimination
Theorem 13: Let G = ( V, T, P, S ) be a CFG. The language L( G ) − { ε } is generated by some CFG, G′ = ( V′, T′, P′, S′ ), without ε-productions, unit productions or redundant symbols.

Proof: From Theorem 12 we may assume that G contains no ε-productions. We build P′, the unit production free set, as follows.

P′ is initially set to contain

{ X → w ∈ P : w ∉ V }

i.e. all non-unit productions in P.

Suppose ∃ X, Y ∈ V s.t. X ⇒*G Y.

[Note: this is easily tested for: recall G has no ε-productions, so if X ⇒*G Y, then some derivation must have the form

X ⇒G Y1 ⇒G Y2 ⇒G … ⇒G Yk ⇒G Y

where the Yi are all different symbols in V.]
Context-Free Grammars 209
If X ⇒*G Y, then P′ has added to it all productions

{ X → w : Y → w ∈ P and w ∉ V }

Let G′ = ( V, T, P′, S ) be the resulting CFG.

Certainly if X → w ∈ P′ then X ⇒*G w, i.e. if S ⇒*G′ w then S ⇒*G w.

The converse, that if S ⇒*G w then S ⇒*G′ w, is established by considering a left-most derivation of w in G, i.e. one in which the left-most variable is expanded at each step. The, somewhat tedious, argument which shows that any sequence of unit productions in the former has a corresponding single non-unit production in the latter is omitted.

Again, to complete the proof, it suffices to note that removal of redundant symbols via Procedures 6.1 and 6.2 cannot create any unit productions.
210 Context-Free Grammars
Normal Forms forContext-Free Grammars
While the mechanisms for simplifying CFGs described above are of independent interest in terms of removing some ‘inefficiencies’, the principal reasons for reviewing these are:

a) It can be assumed, when considering any CFG, G, for which ε ∉ L(G), that G contains no redundant symbols, ε-productions, or unit productions.

b) To assist in proving that any CFG can be expressed in a Normal Form.

The concept of a Normal Form, i.e. that a given structure can be described using rules that obey precisely defined restrictions, is fundamental in Computer Science.
Context-Free Grammars 211
In addition to the mechanism that we are about to describe, you may already have met the idea of Normal Forms in:

Defining Boolean logic functions of n arguments.

[Any such function can be expressed as a
‘disjunction of elementary conjuncts’ (sum of products),
‘conjunction of elementary disjuncts’ (product of sums), or
‘modulo-2 sum of products’ (Zhegalkin/Reed-Muller ring-sum expansion);
these are 3 normal forms for Boolean functions.]

In addition, there is an extensively developed theory of Normal Forms in the context of
Relational Database Design
that is of importance in identifying potentialsavings and improvements in the organisa-tion of data within such a system.
212 Context-Free Grammars
We will principally be concerned with the representation of CFGs in

Chomsky Normal Form (CNF)

but, for completeness, will mention (and no more than mention) the other important Normal Form for CFGs, known as

Greibach Normal Form

Theorem 14 (Chomsky Normal Form Theorem for CFGs):

Any context-free language, L, for which ε ∉ L, is generated by a CFG,

G = ( V, T, P, S )

in which all productions are of the form

X → Y⋅Z or X → σ

where X, Y, Z ∈ V and σ ∈ T.

[Note: X, Y, Z in X → Y⋅Z are not required to be distinct variables of V.]
Context-Free Grammars 213
Before proving this, it may be useful tohighlight some significant consequences ofthe Theorem:
Recall the opening discussion re. CFG simplification wherein the apparent 'leap' from
V → σ ⋅ W ; V → σ (RLGs)
to
X → w ∈ (V ∪ T)* (CFGs)
was remarked upon.
Using CNF to describe any CFG, it is seen that the former can be replaced by,
V → U ⋅ W or V → σ
i.e. the only change is to allow a single variable to be used instead of a single terminal in the first production rule form.
A further important point concerning CNF is with respect to the form of Derivation Trees in a G which is in CNF.
214 Context-Free Grammars
Each production in G either replaces a variable by a single terminal or expands it as exactly two variable symbols.
It follows from this that any derivation tree is a binary tree: each internal vertex has either exactly 1 child (which will be a terminal symbol) or exactly 2 children (both of which will be variables).
This fact means that the number of steps, k, in a derivation in G implies an upper bound (in terms of k) on the length of a word thereby derived.
We thus have a mechanism for relating the number of variables in G, the number of steps in a derivation, and the length of words in L(G).
Similar observations earlier resulted in a means for proving that particular languages are not regular (the Pumping Lemma).
Context-Free Grammars 215
Proof of Theorem 14
Let G = (V, T, P, S) be a CFG with ε ∉ L(G). Without loss of generality it may be assumed that G has no ε-productions, unit productions, or redundant symbols.
Consider any production in P which violates the conditions of CNF:
X → Y1 ⋅ Y2 ⋅ … ⋅ Ym   m ≥ 2   (6.1)
Suppose Yi = σ ∈ T, i.e. a terminal symbol.
Modify G by adding a new variable Cσ to V, the production Cσ → σ to P, and changing the (terminal) Yi in the production (6.1) to the new (non-terminal) Cσ.
If G' = (V', T, P', S) is the resulting CFG, it is obvious that L(G') = L(G).
216 Context-Free Grammars
Applying the process of replacing terminals (in productions such as (6.1)) by (new) non-terminal symbols and adding appropriate production rules, it follows that G = (V, T, P, S) eventually becomes a CFG G' = (V', T, P', S) for which L(G) = L(G') and any productions that are not in CNF have the form,
X → Y1 ⋅ Y2 ⋅ … ⋅ Yn   n ≥ 3   (6.2)
For each production of the form (6.2) introduce n − 2 new variables
D1, D2, …, Dn−2
and replace the production
X → Y1 ⋅ Y2 ⋅ … ⋅ Yn
with the 'chain' of productions,
Context-Free Grammars 217
X → Y1 D1 ; D1 → Y2 D2 ; D2 → Y3 D3 ; D3 → Y4 D4 ; … ; Di → Yi+1 Di+1 ; … ; Dn−2 → Yn−1 Yn   (6.3)
Let GC = (VC, T, PC, S) be the final CFG resulting. For the replacement of (6.2) by the set in (6.3), clearly,
X ⇒*GC Y1 ⋅ Y2 ⋅ … ⋅ Yn
thus L(GC) = L(G') = L(G) and GC is in CNF.
218 Context-Free Grammars
Example
Using G = (V, T, P, S) with
V = {S, A, B}
T = {a, b}
P = { S→bA, S→aB,
A→bAA, A→aS, A→a,
B→aBB, B→bS, B→b }
Only the productions A → a and B → b are valid CNF.
As a first step, we remove illegal occurrences of terminal symbols, to give
V' = {S, A, B, Ca, Cb}
P' = { S→CbA, S→CaB,
A→CbAA, A→CaS, A→a,
B→CaBB, B→CbS, B→b,
Ca→a, Cb→b }
Context-Free Grammars 219
Then we deal with the productions
A → Cb AA ; B → CaBB
to give
VC = {S, A, B, Ca, Cb, D1, D2}
PC = { S→CbA, S→CaB,
A→CbD1, A→CaS, A→a,
B→CaD2, B→CbS, B→b,
Ca→a, Cb→b,
D1→AA, D2→BB }
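The two steps just illustrated - replacing terminals inside long right-hand sides by fresh variables Cσ, then splitting long bodies with chain variables Di - can be sketched in code. The following Python sketch is not part of the notes: it assumes terminals are single lower-case letters, variables are strings beginning with an upper-case letter, productions are (head, body-tuple) pairs, and the grammar already has no ε-productions, unit productions, or redundant symbols.

```python
# Sketch of the two CNF steps above (not the notes' exact procedure).
# Assumes the fresh names 'C'+terminal and 'D'+number do not clash with
# existing variables.

def to_cnf(productions):
    # Step 1: in any body of length >= 2, replace each terminal y
    # by a fresh variable Cy with the production Cy -> y.
    step1 = set()
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for sym in body:
                if sym.islower():            # a terminal
                    step1.add(('C' + sym, (sym,)))
                    new_body.append('C' + sym)
                else:
                    new_body.append(sym)
            step1.add((head, tuple(new_body)))
        else:
            step1.add((head, body))
    # Step 2: split X -> Y1 ... Yn (n >= 3) into the chain
    # X -> Y1 D1, D1 -> Y2 D2, ..., Dn-2 -> Yn-1 Yn.
    result, counter = set(), 0
    for head, body in step1:
        while len(body) >= 3:
            counter += 1
            d = 'D' + str(counter)           # assumed fresh
            result.add((head, (body[0], d)))
            head, body = d, body[1:]
        result.add((head, body))
    return result
```

Applied to the example grammar above, every resulting production is either a single terminal or a pair of variables (the particular D-numbering may differ from the notes).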
220 Context-Free Grammars
Greibach Normal Form
There is another Normal Form for CFGs which is of some theoretical interest, but which we describe only for the sake of completeness.
Theorem 15:(Greibach Normal Form Theorem for CFGs)
Let L be a CFL with ε ∉ L. There is a CFG, G = (V, T, P, S), with L(G) = L and every production in P having the form
X → σ w   σ ∈ T, w ∈ V*
[Note: w ∈ V* means w can be ε.]
Proof: Omitted.
Algorithms for converting CFGs (even those in CNF) to GNF are rather involved, and the simplest of these may exponentially increase the size of V.
This increase is avoidable using a moresophisticated algorithm.
Context-Free Grammars 221
COMP209
Automata and Formal Languages
Section 7
Pushdown Automata
222 Pushdown Automata
Introduction
We have seen that,
Regular Languages
≡ Languages recognised by DFA
≡ Languages described by Regular Expressions
≡ Languages generated by RLGs
(7.1)
By 'minimally' changing one of the restrictions imposed on the form of productions in RLGs, i.e. permitting
V → U ⋅ W (U, W variables)
instead of
V → σ ⋅ W (σ a terminal)
a class of languages (the context-free languages) that properly contains the regular languages is obtained.
Pushdown Automata 223
So the ‘picture’ given in (7.1) has become
Regular Languages
⊂ Context-Free Languages
≡ ????
≡ Languages generated by CFGs
⊃ Languages generated by RLGs
(7.2)
We now wish to consider the minimal 'black-box' capabilities that are needed to capture the class of Context-Free Languages, i.e. to answer the question,
Regular Languages are to DFA
as
Context-Free Languages are to ????
224 Pushdown Automata
If we consider the definition of DFA given, one 'obvious' limitation of these is apparent:
A DFA can only 'remember' information about a fixed, constant number of symbols from any input word it is given.
Thus, if acceptance or otherwise of w ∈ Σ* is predicated on precise relationships between sub-words of w that may be of arbitrary length and separated by an arbitrary distance, then, except in special cases, a DFA will be unable to deal with these.
c.f. the derivation of the Pumping Lemmafor Regular Languages
e.g. an informal argument that the language {0^m ⋅ 1^m : m ≥ 1} is not regular observes that a DFA has to recognise 'how many 0s are seen' before testing if the number of 1s 'matches': m can be arbitrarily large, so the 'counting step' cannot be done with a 'finite memory'.
Pushdown Automata 225
A more subtle limitation, but one which alsoarises from the ‘finite memory’ restriction isthe following:
The 'processing' of an input word w ∈ Σ* is rather 'passive': symbols are read in order and used to decide the next machine state; there is no mechanism for rescanning symbols, or recording these or some 'transformed' version.
These observations suggest that, in order to enrich the functionality of DFA so that a 'machine model' with the minimal capability to recognise Context-Free Languages results, the 'new' machine class must have some method of
recording arbitrarily large amounts of information.
Of course, this capability must be limited to that necessary to move from regular languages to CFLs - i.e. it must not allow non-CFLs to be recognised.
226 Pushdown Automata
Adding a Stack
The extension made toDFA, is to allow amemory (storage) facility that is used underthe following restrictions:
a) This storage is organised as a stack.
b) There is no limit on its capacity, although (obviously) only a finite amount of space will be used during a single computation.
c) The input word will still be processed one symbol at a time moving from left to right, and cannot be re-read.
Pushdown Automata 227
Overview of Pushdown Automaton Organisation
[Diagram: an input tape x1 x2 x3 … xk … xn divided into 'Input Read so Far' and 'Input to Read'; a finite control M; and a Stack Store containing # µ1 … µt]
228 Pushdown Automata
Example of M's Structure
[Diagram: states q0, q1, q2 (q2 final); self-loops on q0 labelled (0, #, #0) and (0, 0, 00); an edge q0 → q1 labelled (1, 0, ε); a self-loop on q1 labelled (1, 0, ε); an edge q1 → q2 labelled (ε, #, #)]
Pushdown Automaton (PDA) Example M
A directed edge from qi to qj labelled (σ, γ, u) indicates that in state qi, when scanning σ with the symbol at the stack top being γ, the next state could be qj with the stack top replaced by the word u.
Pushdown Automata 229
Formal Definition of Pushdown Automaton
A pushdown automaton (PDA) is described by a septuple,
M = (Q, Σ, Γ, δ, q0, Z0, F)
where
Q: finite set of states.
Σ: finite input alphabet.
Γ: finite stack alphabet.
δ: Q × (Σ ∪ {ε}) × Γ → ℘(Q × Γ*): state transition function.
q0 ∈ Q: initial state.
Z0 ∈ Γ: initial stack symbol.
F ⊆ Q: final states.
It should be noted that δ(q, σ, γ) must be a finite subset of Q × Γ*.
230 Pushdown Automata
Interpretation
Consider the definition of δ:
δ: Q × (Σ ∪ {ε}) × Γ → ℘(Q × Γ*)
Suppose M is in state q, the symbol being scanned on the input is σ, and the symbol at the top of the stack storage is γ.
δ prescribes as the outcome of this scenario:
δ(q, σ, γ) = { (qi1, u1), …, (qik, uk) }
where the qij are states in Q and the uj words in Γ*.
A non-deterministic choice of one of the pairs (qij, uj) is made and then:
P1) γ is replaced by the word uj.
P2) The state changes to qij.
Pushdown Automata 231
ε-transitions
The interpretation above deals with the casewhere an input symbolσ is actually ‘pro-cessed’.
The transition function, however, allows ε-transitions:
δ(q, ε, γ) = { (qi1, u1), …, (qik, uk) }
As with ε-NDFA, an ε-transition can be chosen (non-deterministically), and the process P1 is carried out. The important distinction in this case is that
no input symbol is read
so if the next input is σ and an ε-transition is performed, the next symbol to read is still σ.
232 Pushdown Automata
Important Features
PDA as defined here are non-deterministic. There are important technical reasons why this form is used.
As with NDFA, w ∈ Σ* is accepted by a PDA, M, if there is at least one computation of M on w which ends in some state q ∈ F after scanning all of w.
In order for a transition in δ(q, σ, γ) to be applicable both of the following must hold:
σ is the next input symbol (unless σ = ε);
γ is the 'top of stack' symbol.
The stack is 'empty' at the start of a computation, i.e. contains only the symbol Z0.
Γ (the stack alphabet) does not have to be the same as Σ (the input alphabet).
Pushdown Automata 233
Example
For the example PDA, M = (Q, Σ, Γ, δ, q0, #, F) has
Q = {q0, q1, q2} ; Σ = {0, 1}
Γ = {0, #} ; F = {q2}
δ is easily extracted from the diagram, and it should be noted that there is a single ε-transition available: from state q1 when the top of stack symbol is the initial symbol #.
Given 000111 as input:

Step  qi  σ  Read    Rest    Stack  qj
1     q0  0  ε       000111  #      q0
2     q0  0  0       00111   #0     q0
3     q0  0  00      0111    #00    q0
4     q0  1  000     111     #000   q1
5     q1  1  0001    11      #00    q1
6     q1  1  00011   1       #0     q1
7     q1  ε  000111  ε       #      q2

So 000111 ∈ L(M).
[Exercise: What, in fact, is L(M)?]
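One way to experiment with the example (and with the exercise) is to simulate M directly. The sketch below is ours, not part of the notes: it performs a breadth-first search over triples (state, unread input, stack), keeping the stack as a string whose rightmost symbol is the top, with a step limit guarding against unproductive ε-loops.

```python
from collections import deque

# Our encoding: delta maps (state, input symbol or '' for epsilon,
# stack top) to a set of (next state, pushed word) pairs; the pushed
# word replaces the stack top, rightmost character = top of stack.

def accepts(delta, start, stack0, finals, w, limit=100000):
    queue, seen = deque([(start, w, stack0)]), set()
    while queue and limit > 0:
        limit -= 1
        q, rest, stack = queue.popleft()
        if not rest and q in finals:
            return True                     # all input read, final state
        if (q, rest, stack) in seen or not stack:
            continue
        seen.add((q, rest, stack))
        top = stack[-1]
        for q2, push in delta.get((q, '', top), ()):      # epsilon-moves
            queue.append((q2, rest, stack[:-1] + push))
        if rest:                                          # ordinary moves
            for q2, push in delta.get((q, rest[0], top), ()):
                queue.append((q2, rest[1:], stack[:-1] + push))
    return False

# The example PDA M from the diagram:
delta = {
    ('q0', '0', '#'): {('q0', '#0')},   # (0, #, #0)
    ('q0', '0', '0'): {('q0', '00')},   # (0, 0, 00)
    ('q0', '1', '0'): {('q1', '')},     # (1, 0, eps)
    ('q1', '1', '0'): {('q1', '')},     # (1, 0, eps)
    ('q1', '',  '#'): {('q2', '#')},    # (eps, #, #)
}
```

For instance, `accepts(delta, 'q0', '#', {'q2'}, '000111')` reproduces the accepting computation traced above.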
234 Pushdown Automata
Discussion
Comparing the capabilities of PDA with ε-NDFA (≡ DFA), it is seen that transitions in the latter depend only on:
the current state (q);
the current input symbol (σ);
whether an ε-transition is available.
Furthermore a transition has no 'side-effects': the 'next' state is chosen and the 'next' input symbol read.
For PDA, transitions also depend on
the current stack top symbol (γ)
and, as well as a 'next' state being chosen and the next input symbol being scanned, the symbol at the stack top is replaced by a 'new' word.
Question: Are these justifiable as 'minimal' extensions to the functionality of DFA?
Pushdown Automata 235
(Partial) Answer(s)
Of course, one justification of the definitionof PDA is that which we will prove later, i.e.
Theorem 16: L ⊆ Σ* is a Context-Free Lan-guage if and only if there is aPDA, M , forwhich L( M ) = L.
Despite this result, however, some features of PDA (as defined) may appear 'non-minimal': e.g.
'Minimal' Extensions to DFA?
a) 'non-determinism' in the definition of δ.
b) 'infinite' storage capacity of the Stack.
c) allowing arbitrary length words (albeit specified in δ) to replace single symbols on the stack.
236 Pushdown Automata
The presence of ‘non-determinism’ in thebasic definition has already been remarkedupon. That this is required will be shownformally in a later lecture.
As regards (b) - 'infinite' stack capacity - it was noted that in any 'effective' computation of a PDA, M, only a finite portion of this will be used: of course, by using ε-transitions appropriately it is a trivial exercise to design a (non-terminating) PDA that increases the size of the stack on every move.
Equally, however, one could design an ε-NDFA which (in principle) could loop indefinitely.
Given that we are concerned with recognising languages of finite length words, it only matters that we have a model that can accept such in a finite number of moves.
Note also,
Pushdown Automata 237
Difficult(ish) Exercise:
A PDA with f(|Q|)-bounded stack is one in which the
stack size
is limited to f(|Q|) for some function f: N → N, e.g.
|Q|², 2^|Q|, etc.
Attempting to exceed this causes an error (cf. having no available move in an NDFA).
Show that the class of languages recognisedby
PDAwith f (|Q|)-bounded stack,is exactly
the class of regular languages.
[N.B. The stack bound is given as a function of |Q| - the number of states in M - and not as a function of the length of the input word being scanned. The latter 'restriction' (it isn't!) makes no difference to PDA capabilities.]
238 Pushdown Automata
The case of (c) - allowing arbitrary finite length words to be placed on the Stack - is rather more complicated.
We state, without proof, the following result, which establishes that our model is, in fact, equivalent to the model in which the size of the stack changes by ±1 on each move:
For any PDA, M = (Q, Σ, Γ, δ, q0, Z0, F),
there is a PDA
M' = (Q', Σ, Γ', δ', q'0, Z'0, F')
such that L(M) = L(M') and, for any (qj, u) ∈ δ'(qi, α, γ) (α ∈ Σ ∪ {ε}),
u ∈ {ε, γ, γ⋅β}   (β ∈ Γ')
thus the Stack size decreases by one (u = ε),
or increases by one (u = γ⋅β),
or remains unchanged (u = γ).
Pushdown Automata 239
Instantaneous Descriptions
Given M = (Q, Σ, Γ, δ, q0, Z0, F) and w ∈ Σ*,
we need a way to describe:
the current state of M;
the content of the Stack;
how much of w remains to be scanned.
An instantaneous description (ID) of M on w is a triple
I = (q, w, u)   q ∈ Q, w ∈ Σ*, u ∈ Γ*.
q represents the current state of M, w the unscanned input remaining, and u the current Stack content.
For I = (qi, σ⋅w, u⋅γ) and J = (qj, w, u⋅v), two IDs of M, we write
I ⊢M J
if (qj, v) ∈ δ(qi, σ, γ), and
I ⊢*M J
if there is a sequence I = I0, I1, …, Ik = J of IDs such that Im ⊢M Im+1 for all 0 ≤ m < k.
240 Pushdown Automata
PDA Accepting by ‘Empty Stack’
It is possible to consider an alternative con-cept of a PDA accepting an input wordw ∈Σ* .
Let I = (q0, w, Z0) be the initial ID of M.
Definition: The language L(M) recognised by the PDA,
M = (Q, Σ, Γ, δ, q0, Z0, F)
using empty stack is
{ w ∈ Σ* : ∃ q ∈ Q, I ⊢*M (q, ε, Z0) },
i.e. the set of inputs for which there is some sequence of moves in M which reads all of w and leaves the stack in its initial ('empty') condition.
Acceptance in this manner is clearly independent of which state q is reached, so, without any loss, for acceptance by empty stack it may be assumed that F = ∅.
Pushdown Automata 241
Theorem 17: For any PDA, M1, with L(M1) defined 'by final state' as
{ w : ∃ J = (q, ε, u) s.t. I ⊢*M1 J and q ∈ F }
there is a PDA, M2, with L(M2) defined by empty stack and
L(M1) = L(M2).
Proof: (Outline)
Form the state set of M2 by adding a new state qerase to Q of M1. Then for each state q ∈ F in M1 (i.e. each final state), add ε-transitions
δ(q, ε, γ) = { (qerase, γ) }   ∀ γ ∈ Γ
and
δ(qerase, ε, γ) =
{ (qerase, ε) } if γ ≠ Z0
{ (qerase, Z0) } if γ = Z0
i.e. on reaching a final state of M1, M2 can enter qerase, which simply uses ε-moves to empty the Stack. Note, the definition of 'acceptance' requires that all of the input is read: if the stack is empty while part of w remains, this is not an accepting computation.
242 Pushdown Automata
Proof of Theorem 16
L ⊆ Σ* is a Context-Free Language if and only if there is a PDA, M, for which L(M) = L. Only the construction of a PDA accepting L(G) defined by a CFG, G, is presented. The formation of a CFG from a PDA, M, can be done by a technically opaque 'simulation' of M's operation by appropriate production rules.
I) CFGs to PDA
Let L be a CFL and G = (V, T, P, S) be a CFG with L(G) = L − {ε}. We may assume that G is in CNF with no redundant symbols.
We first construct a PDA,
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
(accepting by final state) for which
L(MG) = L(G) = L − {ε}
Pushdown Automata 243
Outline of Construction
Suppose w = x1 … xn ∈ T* is a word we wish to test for membership in L(G) using the PDA MG.
MG as built from G relies significantly (forits correctness) on non-determinism.
The key idea is to use the Stack to build up a 'guess' for a possible left-most derivation S ⇒*G w, i.e. one in which, at each step, the left-most variable symbol is the one expanded, until it reduces to a single terminal.
Of course, this ‘guess’ will be consistentwith the productions ofG.
As each 'new' terminal appears in the guessed derivation it is compared with the next input xi from w. Should these match, the process continues; otherwise MG will halt in a non-accepting state.
244 Pushdown Automata
If it is the case that w ∈ L(G) then certainly there will be some (left-most) derivation S ⇒*G w, and thus in MG there will be some computation on w that reaches a final state with all of w having been read.
There is only one minor complication that arises: ensuring, when a production X → σ is activated, that the symbol σ, which will be compared with the next input symbol, is at the top of the stack.
From G = (V, T, P, S) we define
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
with
ΣG = T;
ΓG = V ∪ T ∪ {#};
QG = {q0, qOK} ∪ {qσ : σ ∈ T};
FG = {qOK}
Pushdown Automata 245
We can view δG as comprising three stages:
initiating the guessed derivation;
expanding the guess until a terminal is reached;
checking this against the next input.
With the exception of the final stage, all parts are performed using ε-transitions.
Initiation:
δG(q0, ε, #) = { (qOK, #S) }
Thus the start symbol of G is written on the Stack.
Derivation
Recall thatG is in CNF.
For each production X → U ⋅ W in G, δG contains a transition,
(qOK, W U) ∈ δG(qOK, ε, X)
i.e. if the top of stack symbol is X ∈ V ⊂ ΓG then MG may (non-deterministically) choose to replace X by the word W U (U, W ∈ V).
246 Pushdown Automata
The reason for reversing the order (from U W to W U) is that U will be the new top of stack symbol and so can be expanded at the next move of MG.
Thus, the process is consistent with aleft-mostderivation in G.
For each production X → σ of G, δG contains a transition,
(qσ, σ) ∈ δG(qOK, ε, X).
Checking
This is only performed in the states qσ, for σ ∈ T, and forms the only stage where the input w is examined. The checking move is simply,
δG(qσ, σ, σ) = { (qOK, ε) }
Thus, compare the top of stack symbol to the current input symbol; if these match, erase it from the stack, move to the next input symbol, and continue with the process of guessing a derivation.
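The three stages of δG can be generated mechanically from a CNF grammar. The Python sketch below follows the construction just described, but the encoding is ours: δ is a dictionary from (state, input symbol or '' for ε, stack top) to a set of (next state, pushed tuple) pairs, with pushed tuples whose last component becomes the new stack top.

```python
# Sketch of the construction of delta_G (assumed encoding, not the
# notes' notation); productions are (head, body-tuple) pairs in CNF.

def cfg_to_pda_delta(productions, start='S', bottom='#'):
    delta = {}
    def add(key, move):
        delta.setdefault(key, set()).add(move)
    # Initiation: write the start symbol on the stack.
    add(('q0', '', bottom), ('qOK', (bottom, start)))
    for head, body in productions:
        if len(body) == 2:                 # X -> U.W (two variables)
            U, W = body
            # push reversed, so U becomes the new top of stack
            add(('qOK', '', head), ('qOK', (W, U)))
        else:                              # X -> sigma (one terminal)
            (sigma,) = body
            add(('qOK', '', head), ('q' + sigma, (sigma,)))
            # Checking: match the stack top with the input, then pop
            add(('q' + sigma, sigma, sigma), ('qOK', ()))
    return delta
```

On the CNF grammar of the example that follows, this produces, for instance, (qOK, (A, Cb)) ∈ δ(qOK, ε, S) - the reversed push used in the accepting computation traced there.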
Pushdown Automata 247
Example
Using the CFG for which we derived a CNF form earlier,
G = (V, T, P, S) with
V = {S, A, B, Ca, Cb, D1, D2}
T = {a, b}
P = { S→CbA, S→CaB,
A→CbD1, A→CaS, A→a,
B→CaD2, B→CbS, B→b,
Ca→a, Cb→b,
D1→AA, D2→BB }
MG = (QG, ΣG, ΓG, δG, q0, #, FG)
where
ΣG = {a, b};
ΓG = {a, b, S, A, B, Ca, Cb, D1, D2, #};
QG = {q0, qOK, qa, qb};
FG = {qOK}
δG is shown in the diagram below:
248 Pushdown Automata
[Diagram: states q0, qOK, qa, qb; an edge q0 → qOK labelled (ε, #, #S); checking edges qa → qOK labelled (a, a, ε) and qb → qOK labelled (b, b, ε); edges qOK → qa labelled (ε, Ca, a) and (ε, A, a); edges qOK → qb labelled (ε, Cb, b) and (ε, B, b); self-loops on qOK labelled (ε, S, BCa), (ε, S, ACb), (ε, A, SCa), …]
baab ∈ L(G); a leftmost derivation is,
S ⇒ CbA ⇒ bA ⇒ bCaS ⇒ baS ⇒ baCaB ⇒ baaB ⇒ baab
Pushdown Automata 249
Example Continued
Given baab as input to MG, an accepting computation could be constructed by the sequence,

Move  δ                                 Stack  Unread
0     δ(q0, ε, #) = (qok, #S)           #      baab
1     δ(qok, ε, S) = (qok, ACb)         #S     baab
2     δ(qok, ε, Cb) = (qok, b)          #ACb   baab
3     δ(qok, ε, b) = (qb, b)            #Ab    baab
4     δ(qb, b, b) = (qok, ε)            #Ab    baab
5     δ(qok, ε, A) = (qok, SCa)         #A     aab
6     δ(qok, ε, Ca) = (qok, a)          #SCa   aab
7     δ(qok, ε, a) = (qa, a)            #Sa    aab
8     δ(qa, a, a) = (qok, ε)            #Sa    aab
9     δ(qok, ε, S) = (qok, BCa)         #S     ab
10    δ(qok, ε, Ca) = (qok, a)          #BCa   ab
11    δ(qok, ε, a) = (qa, a)            #Ba    ab
12    δ(qa, a, a) = (qok, ε)            #Ba    ab
13    δ(qok, ε, B) = (qok, b)           #B     b
14    δ(qok, ε, b) = (qb, b)            #b     b
15    δ(qb, b, b) = (qok, ε)            #b     b
16    -                                 #      -
250 Pushdown Automata
The correctness of the construction is immediate. The only point of detail required concerns the case that ε ∈ L.
[Recall that the CFG, G, is in CNF and generates the language L − {ε}.]
In this case, all that is required is to add the initial state, q0, to the set of final states of MG.
This completes the proof that ifL is aCFL,then there is a PDA, M , for whichL( M ) = L.
II) PDA to CFGs.Omitted.
Pushdown Automata 251
COMP209
Automata and Formal Languages
Section 8
Properties ofContext-Free Languages:
Limitations, Closure, Decision
252 Properties of CFLs
When considering the class of Regular Lan-guages earlier, it was seen that:
a) there is a general technique -
The Pumping Lemma
- by which many non-regular languages can be proved not to be regular.
b) the class of regular languages satisfies anumber ofclosure properties, being closedunder:
ComplementationIntersection
Union
c) Given descriptions of regular languages, L1, L2, there are 'effective' (and 'efficient') methods that can determine if,
L1 = ∅, finite or infinite
L1 = L2
L1 ⊂ L2
w ∈ L1 (for w ∈ Σ*)
Properties of CFLs 253
In this section analogous questions regard-ing the class ofContext-Free Languagesare examined.
Thus,
a) How can a given language,L, be shownnot to be Context-Free?
b) What closure properties hold for the classof CFLs?
c) Given descriptions of CFLs, L1 and L2, do 'effective' methods exist for deciding
L1 = ∅, finite or infinite?
L1 = L2?
L1 ⊂ L2?
w ∈ L1 (for w ∈ Σ*)?
254 Properties of CFLs
A Pumping Lemma forContext-Free Languages
We know already that:
P1) If L is a CFL over Σ, there is a CFG,
G = (V, Σ, P, S)
with L(G) = L − {ε}, all productions in P taking the form
X → Y ⋅ Z or X → σ (Y, Z ∈ V, σ ∈ Σ)
and G having no redundant symbols.
[Chomsky Normal Form Theorem].
P2) w ∈ L(G) if and only if there is a derivation tree in G that has yield w [Theorem 11].
Suppose that L is a CFL over Σ such that infinitely many w ∈ Σ* belong to L.
What do (P1) and (P2) indicate aboutL?
Properties of CFLs 255
Given G = (V, Σ, P, S) with L(G) = L and G in CNF, consider a derivation tree in G for which the longest path (from S to a terminal leaf) contains exactly n non-leaf vertices.
Certainly each of these must be labelled with a variable from V.
It follows that if n > |V| then some variable, W say, must occur more than once on this path.
256 Properties of CFLs
[Diagram: a derivation tree with root S in which some path contains two occurrences of a variable W; the yield factors as u ⋅ v ⋅ w ⋅ x ⋅ y, the lower W yielding w and the upper W yielding v ⋅ w ⋅ x]
From this example, we see that,
S ⇒*G u ⋅ W ⋅ y,   u, y ∈ Σ*, W ∈ V
W ⇒*G v ⋅ W ⋅ x,   v, x ∈ Σ*
W ⇒*G v ⋅ w ⋅ x,   w ∈ Σ*
S ⇒*G u ⋅ v ⋅ w ⋅ x ⋅ y
Properties of CFLs 257
[Diagram: the same tree with the subtree rooted at the lower W replaced by a copy of the subtree rooted at the upper W, yielding u ⋅ v ⋅ v ⋅ w ⋅ x ⋅ x ⋅ y]
258 Properties of CFLs
From this, however, since
S ⇒*G u W y ⇒*G u v W x y
it must be the case that, for any w ∈ Σ* such that W ⇒*G w, we have
S ⇒*G u w y
S ⇒*G u v w x y
S ⇒*G u v² w x² y
…
S ⇒*G u v^k w x^k y
…
In summary: if W ∈ V is repeated in some derivation tree of u v w x y (where W ⇒*G v W x and W ⇒*G w) then all words
u v^k w x^k y   (k ≥ 0)
are in L(G).
Properties of CFLs 259
Pumping Lemma forContext-Free Languages
For any CFL, L, there is a constant, m, such that if z ∈ L and |z| ≥ m, then z may be written as
z = u ⋅ v ⋅ w ⋅ x ⋅ y
where
1) |v x| ≥ 1
2) |v w x| ≤ m
3) ∀ k ≥ 0: u v^k w x^k y ∈ L
Proof: Consider a CFG in CNF, G = (V, T, P, S), for which L(G) = L. If we consider a derivation tree in G in which the longest path from S contains at most n non-leaf vertices, then, since G is in CNF, such a derivation tree is a binary tree and therefore has at most 2^n leaf vertices.
260 Properties of CFLs
Recalling that the yield, z, of a derivation tree is formed by concatenating the terminal symbols labelling the leaves, we deduce that
|z| ≤ 2^n.
Suppose now that z ∈ L has |z| ≥ m = 2^(|V|+1). It follows that any derivation tree in G that yields z must have a path of non-leaf vertices of length > |V|, and therefore there is some variable, W, that occurs at least twice on this path.
This, with the prior analysis suffices to provethe result.
Properties of CFLs 261
Applying the Pumping Lemmafor Context-Free Languages
Suppose thatL ⊂ Σ* is some language thatwe wish to prove is not Context-Free.
The strategy adopted in using the PumpingLemma is that of proof by contradiction.
1) For any constant, m, choose some word z ∈ L with |z| ≥ m.
2) For any partition of z as z = u v w x y for which
|v x| ≥ 1 and |v w x| ≤ m
show that there is some value k ≥ 0 for which u v^k w x^k y ∉ L.
From which it follows, that any CFG gener-ating L must, in addition, generate wordsthat arenot in L.
262 Properties of CFLs
Examples
Example 1: The language L(g) ⊂ {1}* (p.5) with
L(g) = { 1^k : k is a prime number }
is not Context-Free.
Proof: Suppose G is a CFG with L(G) = L(g). Given m, choose a prime p > m and set z = 1^p. Let z be written as
u v w x y = 1^|u| 1^|v| 1^|w| 1^|x| 1^|y|
Then, for all k ≥ 0,
z_k = u v^k w x^k y = 1^(p + (k−1)(|v|+|x|)) ∈ L(G)
Setting k = p + 1 gives |z_(p+1)| = p(1 + |v| + |x|), which, since |vx| ≥ 1, is composite: a word in L(G) but not in L(g).
Properties of CFLs 263
Example 2: The language,
L = { w ∈ {a, b, c}* : w = a^n b^n c^n, n > 0 }
is not Context-Free.
Proof: Suppose G is a CFG with L(G) = L. Given m, choose z = a^m b^m c^m and consider any partition of z as u v w x y with |vwx| ≤ m, |vx| ≥ 1. The subword vwx cannot contain all of the symbols in {a, b, c}. Thus, either
vwx = a^r b^s or vwx = b^r c^s   (1 ≤ r + s ≤ m)
Without loss of generality, suppose the first applies. Then we can write w = a^p b^q, so that v and x together contain r − p a's and s − q b's, giving
z_0 = u v^0 w x^0 y = a^(m−(r−p)) b^(m−(s−q)) c^m ∈ L(G)
Since
|vwx| − |w| = |vx| and |vx| ≥ 1
we have (r − p) + (s − q) ≥ 1; it follows that z_0 has either fewer than m a's or fewer than m b's, and so is not of the form a^n b^n c^n.
Thus this language is not Context-Free.
264 Properties of CFLs
Example 3: The language L(h) (p.5),
L(h) = { w ∈ {0, 1}* : w = 1^n 0^(n²), n ≥ 1 }
is not Context-Free.
Proof: Suppose G is a CFG generating L(h). Given m, fix z = 1^m 0^(m²). Let z be written as u v w x y (|vwx| ≤ m and |vx| ≥ 1), so that
z = 1^(m−r) ⋅ vwx ⋅ 0^(m²−s)
where vwx contains r 1s and s 0s. The subword w must have the form 1^p 0^q for some p, q such that
0 ≤ p ≤ r;
0 ≤ q ≤ s;
1 ≤ r + s ≤ m;
r + s − (p + q) ≥ 1
By the Pumping Lemma,
u v^0 w x^0 y = 1^(m−(r−p)) 0^(m²−(s−q)) ∈ L(G)
and so, if L(G) = L(h), we must have
(m − (r − p))² = m² − (s − q)
Properties of CFLs 265
Rearranging, this is
(s − q) = (r − p)(2m − (r − p))   (8.1)
From r + s − (p + q) ≥ 1 it cannot be the case that both r = p and s = q; moreover, by (8.1), r = p forces s = q and vice versa. So both sides of (8.1) must be positive, and we can assume p < r and q < s.
We now know that,
s − q ≤ m   (since s ≤ r + s ≤ m)
(r − p)(2m − (r − p)) ≥ 2m − r ≥ m   (since 1 ≤ r − p ≤ r ≤ m)   (8.2)
The only case in which the first inequality is not strict is when s = m and q = 0; but then r + s ≤ m gives r = 0, hence p = 0, and (8.1) is not satisfied. Otherwise we have
s − q < m ≤ (r − p)(2m − (r − p))   (8.3)
and again (8.1) has no solution. In either case u v^0 w x^0 y ∉ L(h).
It follows that L(h) is not Context-Free.
266 Properties of CFLs
Notice that some care needs to be exercised when applying the Pumping Lemma for CFLs.
Having fixed z with |z| ≥ m, the argument must deal with any partition
u ⋅ v ⋅ w ⋅ x ⋅ y
of z that satisfies the conditions |vwx| ≤ m, |vx| ≥ 1.
Compare the proof that L(h) is not a regular language with the proof that it is not Context-Free.
In the former case, the partition into three parts means that the 'pumped' sub-word can consist entirely of 0s, whence deriving some w ∉ L(h) is easy.
In the second case, however, it is necessary to consider the possibility that the sub-word vwx has the form 1^r 0^s, where w = 1^p 0^q and only the range for p, q is known.
Properties of CFLs 267
Closure Properties of CFLs
Theorem 18: The class of Context-Free Languages is closed under the operations,
a) Union (∪)
b) Concatenation (⋅)
c) *-Closure (*)
Proof: Let L1 and L2 be CFLs over Σ, and
G1 = (V1, Σ, P1, S1)
G2 = (V2, Σ, P2, S2)
CFGs with L(G1) = L1, L(G2) = L2.
It may be assumed that V1 ∩ V2 = ∅ (by renaming variables if necessary).
a) G∪, a CFG generating L1 ∪ L2, is
G∪ = ( V1 ∪ V2 ∪ {S∪},
Σ,
P1 ∪ P2 ∪ {S∪ → S1, S∪ → S2},
S∪ )
268 Properties of CFLs
b) The construction is similar to (a), except that the new start symbol, S⋅, has the single production rule
S⋅ → S1 S2
associated with it in P⋅.
c) A CFG generating (L)* for the CFL, L, is formed by adding the productions
S(*) → ε and S(*) → S S(*)
to a CFG for L with start symbol S, the start symbol of the new CFG being S(*).
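The three constructions are simple enough to state as grammar transformers. In this sketch (ours, not the notes'), a grammar is a (variables, start, productions) triple with productions as (head, body-tuple) pairs; the new start symbols S_u, S_c, S_s are assumed fresh, and V1 ∩ V2 = ∅ is assumed as in the proof.

```python
# Closure constructions for CFGs (illustrative encoding).

def union(g1, g2):
    (v1, s1, p1), (v2, s2, p2) = g1, g2
    return (v1 | v2 | {'S_u'}, 'S_u',
            p1 | p2 | {('S_u', (s1,)), ('S_u', (s2,))})

def concat(g1, g2):
    (v1, s1, p1), (v2, s2, p2) = g1, g2
    return (v1 | v2 | {'S_c'}, 'S_c',
            p1 | p2 | {('S_c', (s1, s2))})

def star(g):
    v, s, p = g
    # S_s -> epsilon (empty body) and S_s -> S S_s
    return (v | {'S_s'}, 'S_s',
            p | {('S_s', ()), ('S_s', (s, 'S_s'))})
```

Note that the added productions are exactly the ones in (a), (b) and (c); the resulting grammars are not in CNF, but can be normalised afterwards if required.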
So far, all of these are properties shared byregular languages.
In contrast, however, we have:
Theorem 19: The class ofCFLs is notclosed under
a) Intersection (∩)
b) Complement (Co− )
Properties of CFLs 269
Proof: Consider the 3 languages over Σ = {a, b, c}:
L1 = { a^n b^n c^m : n, m ≥ 1 }
L2 = { a^m b^n c^n : n, m ≥ 1 }
L3 = { a^n b^n c^n : n ≥ 1 }
L3 is not aCFL. [Example 2]
L1 and L2 may be expressed as,
L1 = Xa ⋅ Yc = { a^n b^n : n ≥ 1 } ⋅ {c}^+
L2 = Ya ⋅ Xbc = {a}^+ ⋅ { b^n c^n : n ≥ 1 }
Since Xa, Xbc, Ya, and Yc are all CFLs and CFLs are closed under concatenation, it follows that L1 and L2 are both CFLs.
We now have,
L1 ∩ L2 = { a^n b^n c^n : n ≥ 1 } = L3
which is not aCFL.
Part (b) is now immediate, since closure under complement would, with CFLs being closed under union, contradict part (a) [cf. De Morgan's Laws].
270 Properties of CFLs
Decision Methods for CFLs
Theorem 20: Given a description of any CFL, L, there are effective algorithms that can decide:
a) if L = ∅;
b) if L is a finite language;
c) if L is an infinite language.
Proof:
a) Let L be described by a CFG, G = (V, T, P, S), and apply Procedure 6.1 to G. If S is identified as a redundant symbol during this, then, obviously, L(G) = ∅.
b), c) A condition using the Pumping Lemma for CFLs can be defined (as for Regular Languages); however, we use a direct algorithm.
Let L − {ε} be described by a CFG in CNF,
G = (V, T, P, S)
Properties of CFLs 271
Build a directed graph, H(V, F), from G as follows:
Each vertex v of H is labelled with a unique variable from V;
there is an edge from the vertex labelled X to the vertex labelled Y if and only if G contains a production:
X → Y Z or X → Z Y for some Z ∈ V
Then L(G) is infinite if and only if this graph contains a cycle.
To see this, first suppose that H(V, F) contains a cycle with vertices labelled,
W1 → W2 → … → Wk → W1
Since G contains no redundant symbols, there are derivations,
S ⇒*G u W1 y ⇒*G u w y   (u, w, y ∈ T*)
G, however, is in CNF, so the cycle must correspond to some chain of productions.
272 Properties of CFLs
For example,
W1 ⇒G X1 W2 ⇒G X1 X2 W3 ⇒G … ⇒G X1 X2 … Xk W1
and so,
S ⇒*G u W1 y
W1 ⇒*G v W1 x   (v, x ∈ T*, with v ≠ ε since k ≥ 1)
W1 ⇒*G w   (w ∈ T*)
Hence S ⇒*G u v^k w x^k y for every k ≥ 0, and so L(G) is infinite.
Similarly, if L(G) is infinite, then there must be some z ∈ L(G) whose shortest derivation involves two occurrences of the same variable (cf. the Pumping Lemma proof); thus H(V, F) contains a cycle involving this variable.
Properties of CFLs 273
Example
The CFG, G = (V, T, P, S), having
V = {S, A, B, C, D}, T = {a, b, c, d}
and
P = { S→AB, S→BC, B→CC, B→AC, C→AD,
A→a, B→b, C→c, D→d }
defines the directed graph H(V, F)
[Diagram: vertices S, A, B, C, D; edges S→A, S→B, S→C, B→A, B→C, C→A, C→D]
which is acyclic.
ThusL( G ) is finite.
274 Properties of CFLs
Example Continued
L(G) is the language
{ aac, acc, aadad, aadc, acad, ab, aaad,
ccc, ccad, adadc, adadad, adcc, adcad,
cadc, cadad, bc, bad }
Adding the production D → BB would cause an edge directed from D to B to be added, thereby creating a cycle,
B → C → D → B
corresponding to, e.g., the derivation
B ⇒ CC ⇒ ADC ⇒ ABBC ⇒* abBc ⇒ …
(note, trivially, S ⇒ AB ⇒* aabBc).
So, with this production, L(G) is infinite.
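The finiteness test is just cycle detection on H(V, F). A Python sketch (ours; it assumes a CNF grammar with no redundant symbols, given as a variable set and a set of (head, body-tuple) productions):

```python
# L(G) is finite iff the graph H(V, F) built from the binary
# productions has no cycle; we detect cycles by depth-first search.

def is_finite(variables, productions):
    edges = {v: set() for v in variables}
    for head, body in productions:
        if len(body) == 2:              # X -> Y Z gives edges X->Y, X->Z
            edges[head].update(body)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in variables}
    def on_cycle(v):                    # an edge into a GREY vertex
        colour[v] = GREY                # closes a cycle
        for u in edges[v]:
            if colour[u] == GREY or (colour[u] == WHITE and on_cycle(u)):
                return True
        colour[v] = BLACK
        return False
    return not any(on_cycle(v) for v in variables if colour[v] == WHITE)
```

On the example grammar above this reports a finite language; adding the production D → BB introduces the cycle B → C → D → B and the answer flips.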
Properties of CFLs 275
Deciding w ∈ L for CFLs(The CYK-Algorithm)
Since our machine model corresponding to CFLs - PDA - is non-deterministic, in contrast to the case of regular languages it is far from obvious how to define an algorithm that decides
w ∈? L   (L a Context-Free Language)
Notice that attempts to 'simulate' the PDA constructed in Theorem 16 must be able correctly to determine, within a finite number of steps, when an input w is not in L, i.e. such simulations must recognise when a derivation cannot result in w.
One method is to start a new 'guess' when the current one has more than |w| symbols.
This approach, however, is very inefficient.
276 Properties of CFLs
The CYK-Algorithm
The algorithm we now describe was discovered independently by Cocke, Younger, and Kasami.
It takes a CFG,
G = (V, Σ, P, S)
in CNF, together with a word w ∈ Σ+, and decides if w ∈ L(G) using a dynamic programming approach.
Suppose w = w1 w2 … wn ∈ Σ^n.
Define the subsets of variables, Di,j, by
Di,j = { X ∈ V : X ⇒*G wj wj+1 … wj+i−1 }
So Di,j is the set of variables from which the subword of w starting at wj and having length i can be derived.
Obviously w ∈ L(G) ⇔ S ∈ Dn,1.
The CYK-algorithm works by computing each of the subsets Di,j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n − i + 1.
Properties of CFLs 277
First consider the cases D1,j (1 ≤ j ≤ n).
The definition of D1,j indicates that this should contain those variables of V which derive the subword of w starting at wj and having length 1, i.e. wj.
But wj is a single terminal symbol, and so D1,j simply contains those variables X ∈ V for which X → wj ∈ P.
What about the cases wherei > 1?
The key observation in this case is that:
X ∈ Di,j ⇔ ∃ k (1 ≤ k < i), ∃ Y ∈ Dk,j, Z ∈ Di−k,j+k with X → YZ ∈ P
i.e. X ⇒*G wj wj+1 … wj+i−1 if and only if we can find a position k and variables Y, Z that satisfy:
Y ⇒*G wj wj+1 … wj+k−1   (Y ∈ Dk,j)
Z ⇒*G wj+k wj+k+1 … wj+i−1   (Z ∈ Di−k,j+k)
X → YZ ∈ P
278 Properties of CFLs
This gives the complete algorithm as:

Initiation Stage

For each j (1 ≤ j ≤ n)
  D1,j := { X ∈ V : X → wj ∈ P }

General Step

for (i := 2; i ≤ n; i++)
  for (j := 1; j ≤ n − i + 1; j++)
    Di,j := ∅
    for (k := 1; k < i; k++)
      Di,j := Di,j ∪ { X : X → YZ ∈ P and Y ∈ Dk,j, Z ∈ Di−k,j+k }
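The two stages above translate almost line-for-line into code. The following is a sketch of the CYK algorithm in Python (the grammar encoding as dicts and triples is our own choice, not part of the notes), run here on the worked example grammar used later in this section:

```python
# CYK membership test for a grammar in Chomsky Normal Form.
# terminal_prods: terminal -> set of variables X with X -> terminal in P
# binary_prods:   (X, Y, Z) triples for productions X -> YZ

def cyk(word, terminal_prods, binary_prods, start):
    n = len(word)
    if n == 0:
        return False          # CNF grammars as defined here derive no empty word
    # D[i][j]: variables deriving the subword of length i starting at
    # position j (0-based here; the notes number positions from 1).
    D = [[set() for _ in range(n)] for _ in range(n + 1)]
    for j, a in enumerate(word):                      # initiation stage
        D[1][j] = set(terminal_prods.get(a, ()))
    for i in range(2, n + 1):                         # general step
        for j in range(n - i + 1):
            for k in range(1, i):
                for (X, Y, Z) in binary_prods:
                    if Y in D[k][j] and Z in D[i - k][j + k]:
                        D[i][j].add(X)
    return start in D[n][0]

# The grammar of the worked example:
# S -> AB | CB | SS,  C -> AS,  A -> a,  B -> b
T = {'a': {'A'}, 'b': {'B'}}
B = [('S', 'A', 'B'), ('S', 'C', 'B'), ('S', 'S', 'S'), ('C', 'A', 'S')]
print(cyk('aababb', T, B, 'S'))   # True: aababb is in L(G)
print(cyk('aabab', T, B, 'S'))    # False
```

The triple loop over (i, j, k) makes the running time cubic in |w|, which is the source of the later remark that CYK is ‘too slow’ for practical compilers.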
Properties of CFLs 279
Example
Let G = ( V, Σ, P, S ) with

V = { A, B, C, S } ; Σ = { a, b }

P = { S → AB, S → CB, S → SS, C → AS, A → a, B → b }

Suppose w = a a b a b b

Step 1: D1,j

       j = 1  j = 2  j = 3  j = 4  j = 5  j = 6
wj       a      a      b      a      b      b
D1,j     A      A      B      A      B      B
280 Properties of CFLs
Step 2:D2, j
Here we use D1,j to consider possible derivations of the length-2 subwords aa, ab, ba, ab, bb:

          j = 1  j = 2  j = 3  j = 4  j = 5
wj wj+1    aa     ab     ba     ab     bb
D2,j       ∅      S      ∅      S      ∅

For D2,2, the subword ab of length 2 starting at position 2: we have A → a (A ∈ D1,2) and B → b (B ∈ D1,3). P contains S → AB, thus S ∈ D2,2.
Properties of CFLs 281
Step 3:D3, j
D1,j and D2,j are used to consider the subwords aab, aba, bab, and abb:

               j = 1  j = 2  j = 3  j = 4
wj wj+1 wj+2    aab    aba    bab    abb
D3,j            C      ∅      ∅      ∅

For D3,1 (the subword aab): A → a (A ∈ D1,1) and S ⇒(*) ab (S ∈ D2,2), with C → AS in P, giving C ∈ D3,1.
Step 4:D4, j
                     j = 1   j = 2   j = 3
wj wj+1 wj+2 wj+3     aaba    abab    babb
D4,j                  ∅       S       ∅

S ∈ D4,2 using D2,2 (S ⇒(*) ab) and D2,4 (S ⇒(*) ab) with the production S → SS.
282 Properties of CFLs
Step 5:D5, j
                          j = 1    j = 2
wj wj+1 wj+2 wj+3 wj+4     aabab    ababb
D5,j                       C        ∅

C ∈ D5,1 from A → a (D1,1), S ⇒(*) abab (D4,2), and C → AS.
Step 6:D6,1
We now find S ∈ D6,1 from

C ⇒(*) aabab   (D5,1)
B → b          (D1,6)
S → CB

and so conclude that aababb ∈ L(G).
Properties of CFLs 283
The complete tableDi , j is given below:
w1 w2 w3 w4 w5 w6 = a a b a b b

 i \ j   1   2   3   4   5   6
  1      A   A   B   A   B   B
  2      ∅   S   ∅   S   ∅
  3      C   ∅   ∅   ∅
  4      ∅   S   ∅
  5      C   ∅
  6      S
284 Properties of CFLs
COMP209
Automata and Formal Languages
Section 9
Deterministic Context Free Languages:Properties and Applications
Deterministic CFLs 285
Deterministic Pushdown Automata
The definition of PDA - the machine model that exactly describes the class of Context-Free Languages - allowed non-determinism in specifying the transition function δ, i.e. for a PDA,

M = ( Q, Σ, Γ, δ, q0, Z0, F )

given

q ∈ Q, α ∈ Σ ∪ { ε }, γ ∈ Γ

there could be several possible outcomes for δ( q, α, γ ).

A Deterministic Pushdown Automaton (DPDA) is a PDA for which there is at most one possible move available at each step.
286 Deterministic CFLs
More formally, in a DPDA, M = ( Q, Σ, Γ, δ, q0, Z0, F ),

the transition function is a mapping

δ : Q × ( Σ ∪ { ε } ) × Γ → Q × Γ*

such that

d1) For each q ∈ Q, σ ∈ Σ, γ ∈ Γ, a DPDA M when in state q, with σ the next input symbol and γ the stack top, has at most one (state, word) pair (<q′, u>, say) that defines its next move.

d2) If there is a move defined for ( q, σ, γ ) with σ ∈ Σ, then there is no move defined for ( q, ε, γ ).

Condition (d2) means that a DPDA can never make a choice between performing an ε-move (when the stack top symbol is γ) and reading the next input σ (when the stack top symbol is γ). At most one of

δ( q, σ, γ ) and δ( q, ε, γ )

has a defined outcome.
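Conditions (d1) and (d2) are mechanical enough to check by program. Below is a small sketch (the dictionary encoding of δ is our own, with None standing for ε) that tests both conditions on a transition table; the table shown is our reading of the example diagram that follows:

```python
# delta maps (state, input symbol or None for an ε-move, stack top)
# to a set of (next state, push word) moves.

def is_deterministic(delta):
    for (q, a, g), moves in delta.items():
        if len(moves) > 1:
            return False             # (d1): several moves for one triple
        if a is not None and delta.get((q, None, g)):
            return False             # (d2): an ε-move competes with a σ-move
    return True

# Our reading of the PDA diagram below (push 0s, pop them against 1s):
delta = {
    ('q0', '0', '#'): {('q0', '#0')},
    ('q0', '0', '0'): {('q0', '00')},
    ('q0', '1', '0'): {('q1', '')},
    ('q1', '1', '0'): {('q1', '')},
    ('q1', None, '#'): {('q2', '#')},
}
print(is_deterministic(delta))       # True

bad = dict(delta)
bad[('q1', '1', '#')] = {('q2', '#')}   # clashes with the ε-move on (q1, #)
print(is_deterministic(bad))         # False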
Deterministic CFLs 287
Example
The first example of a PDA we gave, i.e.

[State-transition graph with states q0, q1, q2: transitions (0, #, #0) and (0, 0, 00) at q0; (1, 0, ε) from q0 to q1 and again at q1; (ε, #, #) from q1 to q2]

is, in fact, a Deterministic PDA.
288 Deterministic CFLs
Deterministic Context-Free Languages
A DPDA is a restricted form of PDA, and we have seen in Theorem 16 that

The class of languages recognised by PDA
≡
The class of languages generated by CFGs
≡def
The Context-Free Languages (CFLs)

So it is certainly the case that

The class of languages recognised by DPDA
⊆
Context-Free Languages (CFLs)

If L ⊆ Σ* is recognised by a DPDA, then L is said to be a Deterministic Context-Free Language (DCFL).
Question: Is DCFL = CFL?
Answer: No.
Deterministic CFLs 289
CFLs not Recognisable byDeterministic Pushdown Automata
We now develop arguments that can be applied to show some CFLs are not DCFLs.

We first provide some motivation for the approach used.

Suppose L ⊆ Σ* is a CFL and a proof that L is not a DCFL is needed.

How could such a proof be constructed?

The only methods we have developed, so far, that can establish L is not in some class of languages are

Pumping Lemmata

Unfortunately, such a technique specifically for DCFLs has yet to be discovered.
We must consider rather ‘indirect’ methods.
290 Deterministic CFLs
Theorem 21: The class, DCFL, of Deterministic Context-Free Languages over Σ is

{ L ⊆ Σ* : L ∈ CFL and Σ* − L ∈ CFL }

i.e. those Context-Free Languages whose complement is also a Context-Free Language.

Proof: Omitted.

Theorem 21 shows that L is recognised by a DPDA, M, if and only if Co−( L ) is recognised by some DPDA, M′.

Since we know from Theorem 19(b) that the complement of an arbitrary CFL, L, may not be a CFL, we may deduce
DCFL ⊂ CFL.
Deterministic CFLs 291
Further Properties of DCFLs
The fact that a DPDA for a DCFL, L, can be changed to another DPDA recognising words that are not in L is only one difference that restricting PDAs to be deterministic creates; there are a number of others. These are summarised in

Theorem 22: The class DCFL is not closed under any of the operations:

Union (∪)
Intersection (∩)
Concatenation (⋅)
*-Closure (*)

i.e. if ⊕ is any of the first three, then there are DCFLs, L1, L2, for which L1 ⊕ L2 is not a DCFL; there are DCFLs, L, for which L* is not a DCFL.

Proof: Omitted.

Recall that CFLs are closed under all but the second (∩).
292 Deterministic CFLs
Discussion
One of the principal ideas underpinning thetheory of Formal Languages and Automatathat we have been emphasising is that of a
‘hierarchy’ of ‘ language classes’matching exactly with a
‘hierarchy’ of ‘ machine capabilities’matching exactly with a
‘hierarchy’ of ‘ formal grammar rules’
So we have, so far:

Language   Machine        Grammar Rules
Regular    DFA ≡ NDFA     V → σ ; V → σW
CFL        PDA            V → σ ; V → UW

with Regular ⊂ CFL ⊂ . . . (?).

To these have been added,

DCFL       DPDA           V → σ ; V → UW

with, now, Regular ⊂ DCFL ⊂ CFL ⊂ . . . (?)
Deterministic CFLs 293
We may further analyse the‘machine hierarchy’
Machine      Organisation
DFA          Finite Memory
≡ NDFA       Finite Memory
< DPDA       1 (unbounded) Stack
< PDA        1 (unbounded) Stack
Nevertheless, we view the ‘basic hierarchy’(built so far) as
Regular Languages (Lowest)Context-Free Languages (‘next level’)
. . .≡
DFA (Simplest machine type)PDA (‘next level’)
. . .
Question: Why do we not choose to view DCFL ≡ DPDAs as the ‘second’ level?
294 Deterministic CFLs
Answer(s)
a) The ‘machine hierarchy’ has been expressed in terms of increasing memory capabilities: from only finite, i.e. independent of input word length, through to a single stack whose size can grow arbitrarily.

It is not described in terms of ‘change of program state’ abilities, i.e. determinism, non-determinism, etc.

The distinction between DCFLs and general CFLs arises through the latter, not the former.

b) While there is a formal grammar characterisation of DCFLs, its definition is somewhat ‘contrived’ in comparison with RLGs (for Regular) and CNF (for Context-Free) Languages.
Deterministic CFLs 295
Of course this interpretation is, arguably,rather arbitrary.
One side-effect of it is that the question ofdeterministic
versusnon-deterministic
is seen as relating to‘program’ models
for specific‘machine’ types.
e.g. in finite memory machines (i.e. FA) the language recognition capabilities of

non-deterministic (NDFA)
and
deterministic (DFA)

programs (i.e. δ) are identical.
In machines with a single unbounded stack,deterministic programs (DPDA)
are ‘less powerful’ thannon-deterministic programs (PDA).
296 Deterministic CFLs
Applications of DCFLs:Programming Language Description
and Syntax Analysis
The earlier discussion of Context-FreeGrammars noted that one important applica-tion of these in Computer Science was in thearea of defining
Programming Language Syntax
In fact most (if not all) widely used High-Level Programming Languages are such that the set of all

syntactically correct

constructs in the language is exactly described by some deterministic CFL.
Why is this significant?
Deterministic CFLs 297
a) Any practical mechanisms for deciding if w ∈ L (for L a DCFL) are of considerable importance in Compiler Development, since these make it possible to determine if a program (and/or program statement) is a valid construct in the language.

In addition, if a HLL is described by an appropriate grammar generating a DCFL and we have a method of automatically building a parser for any (suitable) DCFL, then this provides a general technique that can be used to build the syntax analysis stage for any (suitable) HLL.

[So, e.g. minor changes to a language's definition need not necessitate developing a new syntax analyser from scratch: one can be constructed from the parser generator using the new language grammar.]
298 Deterministic CFLs
b) Using an appropriate formal grammarprovides a concise unambiguous descriptionof valid language statements. Hence, indi-viduals new to the language have a definitivereference against which not only to check
howa particular construct should be describedbut also with which to determine precisely
whythe compiler has indicated a program state-ment to be syntactically incorrect.
Of course, these could all be achieved usingCFGs and the techniques that have alreadybeen described.
What is gained by usingDeterministic CFLs?
Deterministic CFLs 299
Consider the method that was presented for testing if w ∈ L for L a Context-Free Language - the CYK-Algorithm.

Why is this ‘unsuitable’ as a ‘practical’ syntax checking method for HLL compilers?

a) It requires a CFG in Chomsky Normal Form (CNF).

b) It is ‘too slow’ for ‘practical’ purposes.

c) Knowing that a statement is syntactically correct is not, in itself, sufficient: to be useful, a parser for L should not only decide if w ∈ L but also return a derivation tree certifying this, i.e. a description of how the statement is generated from the grammar. While this could be extracted using the table in the CYK-Algorithm, CNF is not the most transparent form for describing or processing programming language syntax.
300 Deterministic CFLs
Example
Consider the ‘expression’ CFG defined earlier, i.e.

E → (E) | E op E | opd
op → + | − | * | /
opd → num
num → digit | digit num
digit → 0|1|2|3|4|5|6|7|8|9

In CNF (having removed the unit productions) this becomes

E → EL CR | Eop E | digit num | 0|1|2|3|4|5|6|7|8|9
EL → CL E
Eop → E op
op → + | − | * | /
num → digit num | 0|1|2|3|4|5|6|7|8|9
CL → (
CR → )
digit → 0|1|2|3|4|5|6|7|8|9

Notice that the opd variable of the original is redundant.
Deterministic CFLs 301
Parsing Techniques forDCFLs
We conclude the section on Context-Free Languages by presenting a brief overview of some standard methods that have been developed for parsing words when L is a DCFL.

By a parser for L, a DCFL, we mean a method that, with a CFG G = ( V, Σ, P, S ) defining L, takes:

Input: w ∈ Σ*

and returns as output:

Some derivation tree in G with yield w, if w ∈ L.
An error message if w ∉ L.
302 Deterministic CFLs
Overview of Parsing Methods
Given a CFG G = ( V, Σ, P, S ) describing some DCFL, L, there are two general approaches one could use to test if w ∈ L( G ):

Bottom-up Parsing
Build a derivation tree starting from the symbols in w as the leaves of such a tree.

Top-down Parsing
Try to construct a derivation tree with yield w starting from S (labelling the root of such a tree).

While there are a number of different algorithms that have been used in compilers in practice, any of these can be described in terms of one of these two approaches.
Deterministic CFLs 303
Shift-Reduce Parsers
These are one class of bottom-up parsers:
Given w = w1 w2 w3 . . . wn ∈ Σ*, such methods search w for some subword

wi wi+1 . . . wj

for which X → wi . . . wj is a production in the grammar.

This scanning process can then continue with the word

w1 w2 . . . wi−1 X wj+1 . . . wn

until either the start symbol S results (w ∈ L(G)) or no further productions are applicable.
Of course, in order to be effective some pol-icy for organising the subword search mustbe employed.
One such policy is to search forhandles.
304 Deterministic CFLs
Suppose X → α is some production and k a position in a word u. The pair <X → α, k> is a handle of the word u ∈ ( V ∪ Σ )* if

a) u = v α w, with v ∈ ( V ∪ Σ )*, w ∈ Σ*, and α beginning at position k of u;

b) ∃ a right-most derivation S ⇒(*)G v X w ⇒G u.

Shift-Reduce parsers can be viewed as constructing a sequence of right-most derivations:

< un, un−1, un−2, . . . , u1, u0 >

where w = un,

un−1 ⇒G un
. . .
uk ⇒G uk+1
. . .
S = u0 ⇒G u1
Deterministic CFLs 305
Example
Using Σ = { +, *, (, ), id } ; V = { E } ; S = E,

P = { E → ( E ), E → id, E → E + E, E → E * E }

Suppose w = id + id * id:

k   uk             Handle
5   id + id * id   < E → id, 1 >
4   E + id * id    < E → id, 3 >
3   E + E * id     < E → id, 5 >
2   E + E * E      < E → E * E, 3 >
1   E + E          < E → E + E, 1 >
0   E              Accept.
Notice that uk ⇒ uk+1 using a right-mostderivation.
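The reduction sequence in the table can be reproduced with a naive reducer. The sketch below is illustrative only: the fixed production order is a hand-chosen policy that happens to pick workable handles for this grammar, not a general handle-finding method (real shift-reduce parsers use LR tables for that):

```python
# Productions tried in order: id first, then brackets, * before +.
PRODS = [('E', ('id',)), ('E', ('(', 'E', ')')),
         ('E', ('E', '*', 'E')), ('E', ('E', '+', 'E'))]

def reduce_to_start(tokens, start='E'):
    u = list(tokens)
    while u != [start]:
        for lhs, rhs in PRODS:
            for i in range(len(u) - len(rhs) + 1):
                if tuple(u[i:i + len(rhs)]) == rhs:
                    u[i:i + len(rhs)] = [lhs]   # reduce this handle
                    break
            else:
                continue        # this production matched nowhere; try next
            break               # a reduction happened; rescan from the top
        else:
            return False        # no production applies and u is not S
    return True

print(reduce_to_start(['id', '+', 'id', '*', 'id']))   # True
print(reduce_to_start(['id', '+', '+']))               # False
```

Every reduction either shortens u or consumes an id token, so the loop always terminates; what it cannot guarantee in general is picking the handle of a right-most derivation, which is exactly the problem discussed next.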
306 Deterministic CFLs
Problems with Shift-Reduce Methods
There are two main problems to be solved in implementing Shift-Reduce methods for parsing:

Identifying an appropriate handle in w.
Deciding which production to apply.

For many CFGs there may well be words for which a unique handle is not defined: selecting the ‘wrong’ handle (subword and production) could lead to the reduction process rejecting a word.

One major advantage of DCFLs for defining programming languages is that for any DCFL, L, there is a CFG, G, with L(G) = L such that for any w ∈ L there is a unique right-most derivation S ⇒(*)G w.

This class of grammars for DCFLs is called the LR(k) grammars (Left-to-right scan with k symbols of lookahead).
Deterministic CFLs 307
Top-down Parsers
In a top-down parser the aim is to produce a derivation tree for w by searching for an appropriate sequence of production rules starting from S.

These can be thought of as building a left-most derivation.

Suppose G = ( V, Σ, P, S ) is a CFG.

One ‘simple’ method that can be used to construct a parser for G is to implement a separate method for each variable X ∈ V.

Such a parser scans along an input (one symbol at a time), invoking the methods for the variables that are required.
308 Deterministic CFLs
Problems with Top-down Parsers
For any ‘non-trivial’ CFG, such methods will be recursive. This can create two problems:

a) If more than one production could be tested as a possible derivation step then backtracking may be needed.

b) Productions of the form X → X u (‘left-recursion’) must be eliminated to remove the possibility of indefinite recursion.

A standard technique (which a number of you may see in COMP204 next semester) is that called ‘recursive descent’ parsing.

These are top-down parsers organised so that recursive backtracking is never required.

Problems of ‘left-recursion’ when defining programming languages are avoidable by careful restructuring of the grammar.
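As an illustration, here is a minimal recursive-descent recogniser for an expression grammar, with the left-recursive E → E op E rules rewritten into iterated forms (the rewrite and the token encoding are ours); each nested function plays the role of the ‘separate method’ for one variable:

```python
def parse_expr(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                       # E -> T ((+|-) T)*
        nonlocal pos
        if not term():
            return False
        while peek() in ('+', '-'):
            pos += 1
            if not term():
                return False
        return True

    def term():                       # T -> F ((*|/) F)*
        nonlocal pos
        if not factor():
            return False
        while peek() in ('*', '/'):
            pos += 1
            if not factor():
                return False
        return True

    def factor():                     # F -> ( E ) | num
        nonlocal pos
        if peek() == '(':
            pos += 1
            if not expr() or peek() != ')':
                return False
            pos += 1
            return True
        if isinstance(peek(), int):   # numbers stand in for `num`
            pos += 1
            return True
        return False

    return expr() and pos == len(tokens)

print(parse_expr([1, '+', 2, '*', '(', 3, '-', 4, ')']))   # True
print(parse_expr(['(', 1, '+', ')']))                      # False
```

Because each variable's method decides how to proceed from the single next symbol, no backtracking is ever needed here, which is exactly the property recursive descent is organised to achieve.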
Deterministic CFLs 309
COMP209
Automata and Formal Languages
Section 10
Turing Machines
Computability and Decidability
310 Turing Machines
Introduction
The three languages

{ 1^k : k is a prime number }
{ 1^k 0^(k^2) : k ≥ 1 }
{ a^n b^n c^n : n ≥ 1 }

have been shown earlier not to be Context-Free Languages.
So, with the machine models that have been considered -

Finite Automata
Pushdown Automata

- it is not possible to describe a program (i.e. state transition function, δ) that recognises

all of the words in each
and
only the words in each

irrespective of whether such a program is deterministic or non-deterministic.
Turing Machines 311
Yet, there are algorithms that can:

given 1^k, decide if k is a prime.
given 1^i 0^j, decide if j = i^2.
given a^i b^j c^k, decide if i = j = k.

Thus, if we wish to be able to associate some ‘minimal’ capability ‘machine model’ with each ‘recognisable’ language, it is clear that some model that is ‘more powerful’ than PDA is required.

Equivalently, some formal grammar type that is more ‘expressive’ than Context-Free Grammars is needed.
312 Turing Machines
In order to motivate the ‘next’ (and final) level of machine model, consider in what ways PDA organisation is limited:

A PDA reads its input word, w, once only.

What alternatives for processing are there?

1) w (or a subword) could be saved on the stack for ‘later decision making’.

Problem: there may be an arbitrary number of stack symbols that need to be saved on the stack in order for ‘correct’ decisions to be made:

In a finite number of states and one stack, a PDA cannot record for simultaneous use both an arbitrarily long subword of w and an arbitrarily long ‘stack word’.
Turing Machines 313
Informal Example
A (very!) informal argument that 1^k 0^(k^2) is not context-free could be derived by considering how the sub-word 1^k is treated:

A PDA has to recognise exactly k copies of 0^k once 1^k has been read.

But k can be arbitrarily large, so its value cannot be recorded using a finite number of states.

Thus, the value of k must somehow be ‘remembered within the stack’:

For example,

a) Push the sequence of 1s onto the stack and after each sequence of k 0s pop exactly one 1 from the stack, repeating until the stack is empty and no more input is left.
314 Turing Machines
Problem
It is not possible to count to k using a finite state set without destroying the value of k on the stack (i.e. the 1^k value).

No matter what approach is used, at some stage the value of k must be ‘remembered’ outside of the stack.

The only ‘available resource’ is the finite state set.

But this cannot store an arbitrarily large amount of information.
Turing Machines 315
Turing Machines
The machine model we introduce now - the Turing Machine - defines the ‘most powerful’ class of machine capabilities.

As with the earlier models - DFA and PDA - its operations are defined through a finite program, i.e. a state transition procedure; its increased power comes through a development of its memory organisation.

Although it may not be clear at first, we shall see that this development has a natural interpretation within our ‘hierarchy’ of machine capabilities.
316 Turing Machines
Turing Machine Overview
[Tape diagram: an infinite tape holding x1 x2 . . . xk−1 xk xk+1 . . . xn followed by Blank (B) cells, with M's read/write head positioned over some cell xk.]

Input x1 . . . xn is stored in the first n locations.

Locations which have not been used (yet) hold a Blank symbol.

In a single move, M can:

read the current location (xk);
write into this;
go to the location to the left (xk−1) or the location to the right (xk+1).

The starting location is x1.
Turing Machines 317
Example of Turing Machine Program (M)

[State-transition graph with states q0, . . . , q6, an accept state qA and a reject state qR; edges carry labels such as (0, #, R), (1, #, R), (σ, σ, R), (0, %, L), (1, %, L), (B, %, L), (#, #, R), (%, %, R) and (σ, σ, L). The full transition function can be read off from the worked example run given later in this section.]

Alphabet: { 0, 1, #, % } ; (σ ∈ { 0, 1 })

An edge from qi to qj labelled (x, y, L) indicates: if reading symbol x in state qi, replace it with symbol y, move to the symbol on the Left and enter state qj.

qA: unique accept state.
qR: unique reject state.
318 Turing Machines
Formally, a Turing Machine (TM), M, is described by an octuple,

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

Q: finite set of states;
Σ: input alphabet;
Γ: tape alphabet, Σ ⊂ Γ;
q0: initial state;
B ∈ Γ: Blank symbol;
δ : Q × Γ → Q × ( Γ − { B } ) × { L, R };
qA: Halt and accept state;
qR: Halt and reject state.

The input word w ∈ Σ* occupies the first |w| locations (or cells) of an infinite tape.

The tape is scanned by a tape head (positioned at cell 1 to start), that can move only one cell to the Left or Right after each move.

w is accepted if M reaches state qA.
w is rejected if M reaches state qR.

Important: M may fail to reach either.
Turing Machines 319
Turing Machine Operation
The actions of a TM, M, are completely prescribed by its transition function,

δ : Q × Γ → Q × ( Γ − { B } ) × { L, R }

Notice that this is deterministic.

For each combination of state qi and symbol σ ∈ Γ, δ defines:

the symbol γ ∈ Γ − { B } to be written;
whether the tape head should move to the Left or Right;
the next state to enter.

M starts its operation in state q0 with the tape head scanning the symbol in cell 1, i.e. the first symbol of w; any cell which does not contain an input symbol contains the Blank symbol.

This cannot be written to the tape (or appear inside the input w).
320 Turing Machines
TM Configurations andInstantaneous Descriptions
Informally, the language accepted by a TM, M, comprises those w ∈ Σ* upon which M reaches its halt and accept state, qA.

To make this precise, the concept of a TM configuration or

instantaneous description (ID)

is used.

An ID of a TM, M = ( Q, Σ, Γ, q0, B, δ, qA, qR ), on input w = w1 w2 . . . wn ∈ Σ* is a word of the form

c1 c2 . . . ck−1 q ck ck+1 . . . cm B

where ci ∈ Γ − { B } and q ∈ Q.
Turing Machines 321
This records the information that:

M is currently in state q;
the tape head of M is scanning the k'th cell;
the i'th tape cell contains the symbol ci ∈ Γ − { B } (1 ≤ i ≤ m).

So the initial configuration of M on input w (ID0) is

IDw0 = q0 w1 w2 . . . wn B
Suppose

IDr = c1 . . . ck−1 qi σ ck+1 . . . cm B

and that δ( qi, σ ) = ( qj, γ, D ), D ∈ { L, R }.

The next configuration of M (IDs) after applying δ will be

IDs = c1 . . . qj ck−1 γ ck+1 . . . cm B   if D = L
IDs = c1 . . . ck−1 γ qj ck+1 . . . cm B   if D = R

IDr ⊢M IDs denotes this, with IDi ⊢(*)M IDj indicating that there is a sequence of moves of M leading to IDj starting from IDi.
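The single-move rule can be transcribed directly into code. The sketch below (the list representation of the tape and the step cap are our own choices) applies δ one move at a time, and runs a machine until it halts or a step limit is hit, since a TM need not halt at all:

```python
BLANK = 'B'

def tm_step(state, tape, head, delta):
    """Apply delta once; returns (new_state, tape, new_head)."""
    symbol = tape[head]
    new_state, write, direction = delta[(state, symbol)]
    tape[head] = write                 # gamma replaces the scanned cell
    head += -1 if direction == 'R' else -1 if False else (1 if direction == 'R' else -1)
    return new_state, tape, head

def tm_step(state, tape, head, delta):
    """Apply delta once; returns (new_state, tape, new_head)."""
    symbol = tape[head]
    new_state, write, direction = delta[(state, symbol)]
    tape[head] = write                 # gamma replaces the scanned cell
    head += 1 if direction == 'R' else -1
    if head == len(tape):              # moved onto a fresh blank cell
        tape.append(BLANK)
    return new_state, tape, head

def run(delta, word, q0='q0', accept='qA', reject='qR', limit=10_000):
    state, tape, head = q0, list(word) + [BLANK], 0
    for _ in range(limit):
        if state in (accept, reject):
            return state == accept
        state, tape, head = tm_step(state, tape, head, delta)
    return None                        # no verdict within the step limit

# A toy machine (ours, not the example above) accepting words that
# begin with 1; it never moves left, so no left-end guard is needed.
delta = {('q0', '1'): ('qA', '1', 'R'), ('q0', '0'): ('qR', '0', 'R')}
print(run(delta, '10'), run(delta, '01'))   # True False
```

Returning None when the step limit is exhausted anticipates the point made later: acceptance only promises an answer for words in L(M), not a halting computation on every input.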
322 Turing Machines
Languages Accepted by Turing Machines
Given M = ( Q, Σ, Γ, q0, B, δ, qA, qR ),

let ID(i) = { IDs of M whose indicated state is qi ∈ Q }

The language L ⊆ Σ* accepted by M is

L( M ) = { w : ∃ IDk ∈ ID(A) with IDw0 ⊢(*)M IDk }

i.e. all words upon which the initial configuration (IDw0) will lead to a configuration indicating the halt and accept state qA.

Important
There are a number of ‘technical subtleties’ in this definition that are dealt with later. By far the most significant of these is the use of

language accepted
rather than
language recognised,

i.e. at this point we are not concerned with M's outcome for w ∉ L(M).
Turing Machines 323
Example
For the example TM, with input w = 0110:

k    IDk           δ( q, α )
0    q00110B       (q1, #, R)
1    #q1110B       (q1, 1, R)
2    #1q110B       (q1, 1, R)
3    #11q10B       (q1, 0, R)
4    #110q1B       (q2, %, L)
5    #11q20%B      (q5, %, L)
6    #1q51%%B      (q5, 1, L)
7    #q511%%B      (q5, 1, L)
8    q5#11%%B      (q6, #, R)
9    #q611%%B      (q3, #, R)
10   ##q31%%B      (q3, 1, R)
11   ##1q3%%B      (q4, %, L)
12   ##q41%%B      (q5, %, L)
13   #q5#%%%B      (q6, #, R)
14   ##q6%%%B      (qA, %, R)

Thus, 0110 is accepted by M.
324 Turing Machines
Example (continued)
The example TM in fact accepts all words

{ w ∈ { 0, 1 }+ : w = u reverse(u) }

i.e. even length palindromes.

Although this may not be immediately clear (from the state-transition graph), the ‘algorithm’ that M uses is:

a) Replace the leftmost 0/1 with #;
b) Check if this matches the rightmost 0/1
   (q1 and q2 do this if 0 occurs);
   (q3 and q4 do this if 1 occurs);
c) If they match, then replace the rightmost 0/1 with %;
d) Continue from (a) if any 0/1 symbols remain.
e) If all symbols are replaced by # or %, then accept.
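Steps (a)-(e) can be mirrored directly on a Python list standing in for the tape. This sketch follows the algorithm rather than the state graph: the two indices replace the head's back-and-forth sweeps, and the # and % markers are written exactly as the machine would write them:

```python
def accepts_even_palindrome(w):
    if not w:
        return False                      # the language is over {0,1}+
    tape = list(w)
    left, right = 0, len(tape) - 1
    while left < right:
        if tape[left] != tape[right]:     # step (b): mismatch, so reject
            return False
        tape[left] = '#'                  # step (a): mark the leftmost 0/1
        tape[right] = '%'                 # step (c): mark the rightmost 0/1
        left, right = left + 1, right - 1
    # step (e): accept only if every cell was replaced, i.e. even length
    return left > right

print(accepts_even_palindrome('0110'))   # True
print(accepts_even_palindrome('010'))    # False (odd length)
print(accepts_even_palindrome('0111'))   # False
```

An odd-length palindrome leaves one unmarked middle cell (left == right), which is why the final test is left > right rather than a plain loop exit.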
Turing Machines 325
The ‘Machine Hierarchy’and Turing Machines
We now have three basic machine models:

Finite Automata
Pushdown Automata
Turing Machines

In each, a finite length input word w is ‘processed’ by a program defined by a state transition function, δ.

The available ‘actions’ of the program depend on the precise memory regime associated with the model:

For FA: a fixed amount of memory.

For PDA: one unbounded stack.

For TMs: unbounded memory which a program can access ‘freely’ (i.e. without the regime imposed by a Stack).
326 Turing Machines
There are two questions that might be asked:
Question 1:
Why should removing the restrictionimposed by a
Stackorganisationbe significant?
i.e. why should‘one type’ of unlimited store
be ‘more powerful’ thananother type?
Question 2:
Wouldn’t a ‘more natural’ extension ofPDAbe to provide
asecond unbounded Stack?
Turing Machines 327
e.g. suppose we define a 2-Stack Automaton (2-SA) as

M = ( Q, Σ, Γ, q0, #, δ, F )

(as for a deterministic PDA, having no ε-moves.)

M has two unbounded Stacks, S1 and S2.

An input word w = x1 . . . xn is held on S1 (x1 at the top), with S2 holding only #.

The state transition function encodes moves

δ : Q × Γ × Γ → Q × Γ* × Γ*

i.e.

δ( qi, γ1, γ2 ) = ( qj, u1, u2 )

means

‘in state qi, if the top of S1 is γ1 and the top of S2 is γ2, replace these by the words u1 and u2 and go to state qj.’

ui must be one of: ε (‘pop’ the top symbol); γi (leave the stack unchanged); α γi (α ∈ Γ) (‘push’ α onto the stack).
328 Turing Machines
Theorem 23: L ⊆ Σ* is accepted by some TM, M, if and only if L is accepted by some 2-SA, M′.

Proof: (Outline)

1) TM ⇒ 2-SA: Given M, a TM accepting L, a 2-SA, M′, is configured so that S1 always holds the contents of M's tape up to and including the symbol currently scanned (which is at the top), while S2 holds the (non-blank) portion to the right of the current symbol.

The current configuration of M is easily reflected by appropriate moves in M′.
Turing Machines 329
For example, for ID = c1 . . . ck−1 qi σ ck+1 . . . cm B with δ( qi, σ ) = ( qj, γ, D ), listing each stack from top to bottom:

D = Left:
S1: σ, ck−1, . . . , c1, #      S2: ck+1, ck+2, . . . , cm, #
becomes
S1: ck−1, ck−2, . . . , c1, #   S2: γ, ck+1, . . . , cm, #

D = Right:
S1: σ, ck−1, . . . , c1, #      S2: ck+1, ck+2, . . . , cm, #
becomes
S1: ck+1, γ, ck−1, . . . , c1, #   S2: ck+2, ck+3, . . . , cm, #
330 Turing Machines
2) 2-SA ⇒ TM: A TM, M, represents the content γi vi of Si (i ∈ { 1, 2 }) of the 2-SA, M′, on its tape, with new symbols %, @ in M's tape alphabet separating these, so that the tape has the form

% γ1 u1 @ γ2 u2

M can recover both top-of-stack symbols and remember these. The changes to the tape of M reflecting a move of M′ are performed over several stages, e.g.

If S1 = c1 c2 . . . ck and S2 = d1 d2 . . . dm, then M's tape holds

% c1 . . . ck @ d1 . . . dm B

Suppose c1 is replaced by u1 and d1 by u2 (|ui| ≤ 2). If u1 = α c1 and u2 = β d1, then:

a) M copies d1 . . . dm 2 cells to the right;
b) inserts β;
c) copies @, c1 . . . ck 1 cell right;
d) inserts α;

etc. for the other possibilities.
Turing Machines 331
Discussion
In addition to the fact that Theorem 23 shows the increase in storage capabilities to be exactly the same as providing a second stack store, there is one point to be noted concerning its proof:

in the second part we did not present a detailed description of how the TM transition function, δ, was defined in terms of that of the 2-SA: instead an

‘algorithmic’

overview of M was given.

It will, in fact, always suffice to describe specific TM programs and actions using this level of description, i.e.

For the design of a TM, M, if it is possible algorithmically to describe M's actions, then it may be assumed that it is possible formally to define δ for M.
332 Turing Machines
The Class of LanguagesAccepted by Turing Machines
Recall that the language over Σ accepted by a TM,

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

comprises those w ∈ Σ* upon which M on input w reaches the halt and accept state qA.

Any L ⊆ Σ* for which there is a TM, M, with L( M ) = L is called a

Recursively Enumerable (r.e.) language.

Any L ⊆ Σ* for which there is a TM, M, such that

∀ w ∈ L, M reaches the accept state qA on input w;
∀ w ∉ L, M reaches the reject state qR on input w

is called a Recursive Language.
Turing Machines 333
Terminology and Interpretation
The terms recursively enumerable and recursive date from the earliest studies into the questions:

a) Is it possible to construct ‘effective’ algorithmic methods for accepting any language?

b) Can some specific language be proved not to have an ‘effective’ acceptance algorithm?

c) Are different ‘general’ ‘machine models’ equally ‘powerful’: i.e. are there ‘reasonable’ machine models that can accept languages not accepted by other ‘reasonable’ models?
Much of the remainder of this module isconcerned with these questions.
334 Turing Machines
Suppose a language L ⊆ Σ* is recursively enumerable.

What does this mean in ‘computational’ terms?

It means that there is an algorithmic method (‘program’) that:

1) Takes as input any w ∈ Σ*.
2) If w ∈ L then this can be confirmed in

a finite number of steps

If L is recursive then there is an algorithmic method that not only reports if w ∈ L but also reports if w ∉ L.
The TM, M , provides (a possible) algorith-mic method.
Turing Machines 335
For a language, L, to be recursive may be seen as a ‘minimal’ requirement in computational terms:

For such languages, methods exist that allow the status of any word w with respect to L to be decided in a finite number of steps.

The term decidable is often used of languages with this property, i.e.

Recursive ≡ Decidable

If L is r.e. but not recursive, the ‘best’ that is possible is a method that will halt and accept any w ∈ L within a finite time.

Such methods, however, may fail to reach a halting state when given words w not belonging to L.

The term semi-decidable is sometimes used to describe such languages.
336 Turing Machines
Formal Grammars and r.e. Languages
Turing machines have been presented as the ‘final’ level of the hierarchy of ‘black-box’ capabilities.

Implicit in this description of TMs are several claims:

a) If a language L is not r.e. (or not recursive) then no ‘reasonable’ ‘extension’ of TM capabilities will provide an ‘effective’ model that can be used to accept (or recognise) L.

b) A class of formal grammars corresponding to the r.e. languages ought to correspond to the ‘most expressive’ form.
Turing Machines 337
Thus (a)+(b) can be viewed as stating:
Intuitively, the class of formal grammarscorresponding to r.e. languages, should besuch that:
For L ⊆ Σ* there issomeformal grammar,G, with L( G ) = L
if and only ifL is recursively enumerable
(i.e. there is aTM, M , acceptingL).
Of course, given that we have definedclasses of formal grammar so far byrestricting the form of allowable productionrules, the ‘most expressive’ class of gram-mars can only be those for which
no restrictions whatsoever areplaced on the form ofgrammar productions
338 Turing Machines
Unrestricted Grammars
G = ( V, Σ, P, S ) is an unrestricted formal grammar when productions pi ∈ P take the form

Li → Ri

with Li ∈ ( V ∪ Σ )+ (Li containing at least one variable from V) and Ri ∈ ( V ∪ Σ )*.

As before, w ∈ Σ* is in the language L( G ) generated by G if S ⇒(*)G w.

A further justification for the machine model defined by Turing machines is

Theorem 24: L ⊆ Σ* is r.e. if and only if there is a formal grammar (i.e. unrestricted) G = ( V, Σ, P, S ) for which L( G ) = L.

Proof: (Omitted).
Turing Machines 339
SummaryLanguage, Machines, Grammars
The Chomsky Hierarchy
Theorem 24 ‘completes’ our ‘hierarchy’ as:

The Chomsky Language Hierarchy

Type   Name           Grammar        Machine
3      Regular        RLG            FA
2      Context-Free   CFG            PDA
0      r.e.           Unrestricted   TM

We have shown (earlier),

Regular ⊂ CFL ⊂ r.e.
≡
FA < PDA < TM
≡
RLG < CFG < Unrestricted
Of course this table raises some questions.
We first deal with the ‘most obvious’ one.
340 Turing Machines
What about ‘Type 1’ Languages?Digression
(Context-Sensitive Languages)
There is, in fact, a ‘layer’ of this hierarchy that falls strictly between CFLs (Type 2) and r.e. languages (Type 0).

There are several technical reasons why these are not treated in any depth within this module, and they are discussed now only for reasons of completeness.

This ‘missing’ level is the class of Context-Sensitive Languages (CSL),

the corresponding grammar class (CSG) imposing the restriction that production rules Li → Ri must satisfy

Li ∈ ( V ∪ Σ )+ ; Ri ∈ ( V ∪ Σ )+ ; |Ri| ≥ |Li|

i.e. whenever u ⇒(*)G w, |w| ≥ |u|.
Turing Machines 341
The matching machine model for CSLs (LBA - Linear Bounded Automata) is ‘similar’ to Turing Machines, but with the following differences:

a) The transition function is non-deterministic.

b) Only the space occupied by the input is available (this, of course, can be used in any way consistent with TM capabilities: overwritten, scanned repeatedly, etc.).

The (non-Context-Free) languages:

{ 1^k : k is a prime number }
{ 1^k 0^(k^2) : k ≥ 1 }
{ a^n b^n c^n : n ≥ 1 }

are all CSLs.
342 Turing Machines
So why ‘ignore’ CSLs?

a) Important as this class is, in comparison with the other 3 levels very little is known about the properties of CSLs, e.g.

1) Whether deterministic LBA accept exactly the same languages as non-deterministic LBA is a major open question, cf. DCFL v CFL.

2) Closure under complementation has only recently (1988) been proved: in the other classes ‘most’ questions were resolved between 1935 and 1970.

b) Very few ‘natural’ examples of r.e. languages that are not CSLs are known, and the proofs are highly non-trivial.

c) The machine restriction suggests that issues regarding CSLs, in particular proving that some L ∉ CSL, are more ‘naturally’ considered within Computational Complexity Theory (COMP202) rather than Automata and Formal Languages.
Turing Machines 343
Example
The following are recursive but not CSLs:

Σ = { ∃, ∀, =, +, ′, (, ), x, ∧, ∨, ¬, 0, 1 }

Th+N = { w : w is a well-formed First-Order sentence about addition over N which is true }

e.g. ∀x ( ∃x′ ( (x′ + 1) = x ) ) ∨ ( x = 1 )

[for any positive integer (x): either there is a positive integer (x′) less than it (x′ + 1 = x) or the integer is 1].

Σ = { 0, 1, (, ), *, ⋅, ∩, +, ¬ }

TOT = { w : w is a well-formed ‘extended’ regular expression for which L( w ) = { 0, 1 }* }

an ‘extended’ regular expression being one in which the additional operations ∩ (intersection) and ¬ (complement) may be used.
Turing Machines and the Computation of Arithmetic Functions

With the exception of the ‘finite state transducers’ at the start of this module, the view of computation that has been presented has been in terms of:

‘Given a (description of) some language, L over Σ, and a word, w, determine if w ∈ L’.

When compared with ‘real’ computations, this view may look rather over-simplified and contrived, e.g. what does it say about computing arbitrary arithmetic functions

f : (N ∪ {0})^k → N ∪ {0},

i.e. where the argument comprises k non-negative integers and the result is a non-negative integer?

Let Σ = { 0, 1 }, so that values are written in unary, with 0 used to separate arguments; e.g. the values 2, 0, 5, 17, 0 are ‘coded’ as

0 11 0 0 11111 0 11111111111111111 0
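The 0-separated unary coding just described can be sketched in Python (the function names `encode_args` and `decode_args` are illustrative, not part of the module's notation):

```python
def encode_args(args):
    """Code a tuple of non-negative integers in unary:
    a leading 0, then 1^x for each argument x, with a 0 between blocks."""
    return "0" + "0".join("1" * x for x in args)

def decode_args(word):
    """Invert the coding: the lengths of the 1-blocks are the arguments."""
    assert word.startswith("0"), "a coded tuple starts with the separator 0"
    return [len(block) for block in word[1:].split("0")]
```

Here `encode_args([2, 0, 5, 17, 0])` reproduces the coded string shown above: the value 0 contributes an empty 1-block, which is why adjacent separator 0s appear.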
We may view a Turing machine, M, whose input words take the form

# x1 # x2 # . . . # xk−1 # xk

(with xi ∈ 1*)

as computing some function f : (N ∪ {0})^k → N ∪ {0}.

How is this done?

By interpreting the content of M’s tape, when M halts, as the result.

For example, for f ( x1, . . . , xk ) = y,

M could write the symbol 1 in each of the first y locations, and the symbol 0 in all the remaining (non-blank) cells.

Notice that it is only the computation halting that determines the result, not whether this happens in qR or qA.
In order to formalise these ideas, let

M = ( Q, Σ, Γ, q0, B, δ, qA, qR )

with Σ = { 0, 1 }, Γ = { 0, 1, B }.

M is said to compute the function f : (N ∪ {0})^k → N ∪ {0} if

∀ w ∈ ( 0 ⋅ 1* )^k, w = 0 1^x1 0 1^x2 0 . . . 0 1^xk,

∃ u ∈ 0*, such that M halts with the word 1^y u written on its tape, if and only if

f ( x1, . . . , xk ) = y.

We use f^(k)_M ( x1, . . . , xk ) to denote the k-argument function computed by the TM, M.
Partial and Total Computable Functions

It is not difficult to construct TMs that compute all of the standard one and two argument functions, e.g.

m + n ; m * n ; 2^n

m − n ; m / n ; log n

Obviously for the first 3 functions there is always a uniquely defined result.

What about the cases

m − n (when n > m) ; m / n (when n = 0) ; log n (when n = 0)

though?

For cases such as these we have to distinguish between total functions, with a defined outcome on every point of their domain, and partial functions, which for some arguments may not have a defined result.
Functions with possibly undefined results, and the respective TM computations, are distinguished in the following.

Definition: f : (N ∪ {0})^k → N ∪ {0} is said to be a partial recursive function if there is a TM, M, for which

f^(k)_M ( x1, . . . , xk ) = f ( x1, . . . , xk )

whenever f has a defined result for < x1, . . . , xk >.

f : (N ∪ {0})^k → N ∪ {0} is a total recursive function if there is a TM, M, for which

f^(k)_M ( x1, . . . , xk ) = f ( x1, . . . , xk )

and f is defined for every < x1, . . . , xk >.
Discussion

The first studies of ‘computability’ were couched in terms of describing what we have defined as ‘partial recursive’ functions: the (archaic sounding) term

Recursive Function Theory

still survives as the general name for this field of research.

With some minor technical development it is easy to translate between the (superficially) different concepts:

r.e. language ↔ Partial recursive function
Recursive language ↔ Total recursive function

When dealing with languages and testing membership, we use

Decidable

When dealing with functions and their evaluation, we use

Computable
The Church-Turing Hypothesis (and its Implications)

Turing Machines were described as the most ‘powerful’ machine class within the ‘hierarchy’ of machine types.

In the commentary following Theorem 23, it was claimed that

‘When designing a TM, M: if it is possible algorithmically to describe M’s actions, then it may be assumed it is possible formally to define δ for M.’

We have further seen that the class of languages accepted by some TM is exactly the class of languages for which some formal grammar can be defined.
Finally, it has been argued that TMs provide a ‘natural’ mechanism for computing the values of multi-argument numeric functions.

Question

What basis is there for accepting the first two assertions?

Certainly there are a number of reasons why these may appear to be ‘exaggerations’.

For example:

a) the available ‘actions’ are very limited: a ‘program’ can change a single symbol at a time, and can only ‘move’ left or right of its currently scanned tape cell;

b) ‘real’ computers have a much more versatile set of ‘instructions’ provided.
It is, however, not of any importance that the actions available to a TM program (i.e. δ) are rather limited (by comparison with a ‘typical’ processor instruction set).

What matters, and goes some way to justify the assertions made, is that:

The capabilities of Turing Machines are sufficient

to simulate

the operation of any ‘real’ computer, whether we look at such as

carrying out a decision process (i.e. recognising a language)

or

calculating a function.
The formal definition of Turing machines can be seen as embodying the following fundamental assertion.

All ‘reasonable’ models of computation in defining ‘effective’ algorithms must be such that:

a) ‘Programs’ specified in the model are finite (one cannot ‘write’ infinite programs).

b) A ‘program’ may only employ ‘operations’ that are within the capability provided by the ‘model’.

c) There can only be a finite number of ‘basic’ operations provided within any such model.

The Church-Turing Hypothesis is a precise formulation of this assertion.
The Church-Turing Hypothesis

Version 1: (Decidability)
If L ⊆ Σ* is accepted within some ‘reasonable’ model of computation then L is recursively enumerable, i.e. there is a TM, M, accepting L.

If L ⊆ Σ* is recognised within some ‘reasonable’ model of computation then L is recursive, i.e. there is a TM, M, accepting all w ∈ L and rejecting all w ∉ L.

Version 2: (Computability)
If f : ( N ∪ {0} )^k → N is a partial function that can be computed within some ‘reasonable’ model of computation then f is a partial recursive function, i.e. there is a TM, M, with f^(k)_M ≡ f.

If f is a total function computed within some ‘reasonable’ model of computation then f is a total recursive function.
Discussion

The Church-Turing Hypothesis may be summarised, informally, as:

"any ‘computational problem’ (function computation, language membership) that can be ‘effectively programmed’ within some ‘reasonable’ model, can be ‘solved’ using a Turing Machine."

This hypothesis cannot be proved.
[It assumes some ‘intuitive view’ of what ‘reasonable model’ means: there is, however, no precise definition that can capture this.]

There is one consequence of the CTH that is, perhaps, not immediate from the wording in terms of TMs being able to replicate the operation of any ‘reasonable model’.
Suppose it can be proved that some language, L, is not recursive (resp. not recursively enumerable).

What would such a result suggest when taken in conjunction with the Church-Turing Hypothesis?

It would indicate that

no effective algorithm whatsoever

could be found to recognise L (resp. accept L).

Thus, a proof that L is not recursive is much more than a result about ‘technical limitations’ of Turing Machines:

in view of the CTH, such a proof constitutes a demonstration of the impossibility of recognising L by any ‘realistic’ algorithm.
Supporting Evidence for the CTH

Although the Church-Turing Hypothesis cannot be proved, its validity has not been ‘seriously’ challenged since its formulation in 1936.

Since then a great number of different ‘models of computation’ have been proposed: many of these bear no resemblance to ‘machine’ models or other ‘computer-like’ systems.

So we have:

λ-calculus
Post Systems
Unlimited Register Machines
Markov Algorithms
Gödel-Herbrand-Kleene Calculus
Horn Clauses
Quantum Turing Machines

etc. etc.
The λ-calculus will have been seen by some of you in COMP205.

Horn Clauses are foundational to Logic Programming (COMP208).

Every one of these systems (and all other models that have been put forward) can be shown to be no more powerful than Turing Machines.

The Church-Turing Hypothesis justifies our subsequent practice of developing Turing Machine descriptions using a

high-level algorithmic description

instead of indicating the form of δ.

The advantages of this approach will be seen in the next section.
COMP209
Automata and Formal Languages
Section 11
Universal Turing Machines
360 Universal Turing Machines
Introduction

Even taking the Church-Turing Hypothesis as a basis for the existence of a TM, M_L, that can accept/recognise any decidable language L, there is one facet of TM description which does not reflect ‘real programming’ practice.

Consider the process of realising, e.g., a Java implementation of some algorithm.

The program is (ultimately) interpreted within a single framework, i.e. the Java Virtual Machine.

It is obvious that one does not ‘create’ a new ‘instance’ of such a machine for every program.
In other words, in ‘standard’ programming environments there is a system, S, that:

given a suitable description of any ‘valid’ program P and input data x, ‘controls’ the execution of P on x.

It is often the case that such systems may be developed using the same HLL as that of the programs ‘controlled’.

For example: Compilers; Operating Systems (Unix and C).

The model for TMs that we have presented, however, specifies the ‘program’, δ, as ‘hard-wired’ into the machine.

In this section we construct a universal Turing Machine, i.e. a single TM that can simulate any given TM, M.
Universal Machines

We first give a precise definition of what will be understood by the concept of ‘universal machine’.

Suppose MC is some model of computation (e.g. Turing Machines, PDA, Java VM) and P some valid program in the model MC (i.e. δ, δ, Java code).

UMC is a Universal Machine for MC if:

u1) UMC is an MC-program.
u2) UMC takes as input any MC-program P and any input w for the program P.
u3) UMC halts on < P, w > ⇔ P halts on w.
u4) For language recognition, < P, w > is accepted/rejected ⇔ P accepts/rejects w.
u5) For function computation, the value returned by UMC equals that returned by P on input w.
Thus, in the specific context of Turing Machines, a Universal Turing Machine (UTM) is one which:

a) Starts with an ‘encoding’ (η(M)w) of a TM M and input w for M on its tape.

b) Halts if and only if M halts on w.

c) UTM accepts/rejects η(M)w ⇔ M accepts/rejects w.

d) f_UTM( η(M)w ) = y ⇔ f_M ( w ) = y.

So if one can construct a universal Turing machine, this can be viewed as a ‘natural’ counterpart to a ‘standard’ stored-program computer.

The remainder of this section describes such a construction.
Design of a Universal Turing Machine

In order to simplify the design it will be helpful to establish some simple technical lemmata.

These in turn will show that:

The alphabets can be restricted to Σ = { 0, 1 } and Γ = { 0, 1, B }.

UTM can be described as a multiple tape machine.

Using the second property, we can dedicate separate tapes to holding

Input
Workspace
Output
etc.
Using only a Binary Alphabet

Given any TM, M = ( Q, Σ, Γ, q0, B, δ, qA, qR ), there is a mapping φ : Γ → { 0, 1 }* and a TM

Mb = ( Qb, { 0, 1 }, { 0, 1, B }, q0, B, δb, qA, qR )

such that

w ∈ L( M ) ⇔ φ( w ) ∈ L( Mb ).

Proof: (Outline)
Suppose Γ − { B } = { γ1, . . . , γk }. Use the word 1^j 0^(k+1−j) to code γj; e.g. if Γ = { 1, 2, 3, 4, B } then 143 would be coded

10000 11110 11100.

Each transition of δ in M is simulated by a sequence of transitions of δb in Mb which will always end with the tape-head of Mb positioned over the start of some block of k + 1 symbols. Any move of M just requires changing the current block of k + 1 symbols scanned by Mb and moving k + 1 places left or right. Since k is constant this is easily achieved.
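The block coding used in this outline can be sketched directly (a minimal illustration; `block_code` is an assumed helper name):

```python
def block_code(symbols):
    """Map the j-th symbol (j = 1..k) of Gamma - {B} to the
    fixed-width binary block 1^j 0^(k+1-j)."""
    k = len(symbols)
    return {s: "1" * (j + 1) + "0" * (k - j) for j, s in enumerate(symbols)}

phi = block_code(["1", "2", "3", "4"])       # Gamma - {B}, so k = 4
coded = " ".join(phi[s] for s in "143")      # the word 143, block by block
```

Since every block has the same length k + 1, Mb can always locate block boundaries simply by counting cells; `coded` here reproduces the example above.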
Multiple Tracks and Multiple Tapes

In a multi-track TM the tape is considered as divided into

k tracks,

each track square recording one symbol.

The tape head reads and prints k-tuples.
[Figure: a 3-track tape. Track 1 holds the input x1 x2 . . . xi . . . xn followed by blanks; Tracks 2 and 3 are initially all blank; the machine M scans one tape square, i.e. one 3-tuple, at a time.]
The use of multiple tracks is simply a

programming ‘trick’

and not a change in the definition of TM we have been using.

Since the number of tracks is fixed (k), all that is being used is the observation that:

If Γ is the alphabet for a 1-track TM, M, then there are

|Γ|^k possibilities

for the k-tuples from Γ on a k-track tape.

i.e. ‘multi-track’ TMs simply use a larger finite alphabet.
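The observation can be checked directly: a k-track symbol is just a k-tuple over Γ, and there are only |Γ|^k of these (a small sketch):

```python
import itertools

gamma = ["0", "1", "B"]     # the 1-track alphabet
k = 3                       # number of tracks
# a k-track 'symbol' is a k-tuple of 1-track symbols
track_alphabet = list(itertools.product(gamma, repeat=k))
# |Gamma|^k tuples: still a finite alphabet, so still an ordinary TM
count = len(track_alphabet)
```

Here `count` is 3³ = 27, confirming that the multi-track machine is an ordinary TM over a larger finite alphabet.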
k-Tape Turing Machines

A k-tape TM employs k ≥ 1 distinct tapes, each of which is scanned by a separate tape head moving independently.

The combination of current state and k scanned symbols determines the symbol written to each tape and the direction each head moves in.

Thus δ is now

δ : Q × Γ^k → Q × Γ^k × { L, R }^k.

The input is held on Tape 1, and to start every tape head is scanning its left-most tape cell.

[A k-tape TM should not be thought of as a ‘parallel’ computer model: its actions are still controlled by a single ‘program’.]
k-tapes vs. 1-tape

Theorem 25:

If L ⊆ Σ* is accepted by a k-tape TM, Mk, then L is r.e., i.e. accepted by some single tape TM, M1.

If L ⊆ Σ* is recognised by a k-tape TM, then L is recursive, i.e. recognised by some single tape TM.

Proof: (Outline)
Given a k-tape TM, Mk, its actions are simulated by a 2k-track single tape TM, M1.

For each tape, Tape_i of Mk, the tape of M1 uses one track to record the contents of Tape_i and one track to record the current position of the tape head for Tape_i.

A move is simulated by M1 scanning its tape to find the k symbols currently scanned on each tape, and then updating the track symbols and head positions.
[Figure: the k tapes of the k-tape Turing machine Mk, with Tape j holding cells c^j_1 c^j_2 c^j_3 . . . c^j_i . . ., each scanned by its own head; and the corresponding 2k-track tape of the 1-tape Turing machine M1, where for each j one track (Head j) marks with ⊕ the position scanned on Tape j, and the next track (Tape j) records its contents c^j_1 c^j_2 c^j_3 . . .]
We may now assume that any TM, M, is

( Q, { 0, 1 }, { 0, 1, B }, q1, B, δ, q2, q3 )

i.e. for any TM, M, we can construct an equivalent machine, M’, which employs a binary alphabet, has q1 as its start state, q2 as its accept state, and q3 as its reject state.

A TM satisfying this is in standard form:

Standard = { M : M is in standard form }

Theorem 26: There is an ‘encoding’ scheme,

η : Standard → { 0, 1 }*

such that

Lcode = { η( M ) : M ∈ Standard }

is recursive, i.e. we can build a TM that halts and accepts any input that defines the encoding of some TM in standard form, and halts and rejects any input not corresponding to such an encoding.
Proof: The key observation is that the actions of M ∈ Standard are completely described by its transition function.

There is no need explicitly to encode the fact that q1 is the start state and q2, q3 the halt states: for M in standard form this is always the case.

Thus, to describe M, all that η( M ) must represent is the set of moves

δ( qi, σ ) = ( qj, γ, D ), D ∈ { L, R }.

Since |Γ| = 3 there are exactly 3( |Q| − 2 ) such moves (there are no transitions from halt states).

Let Γ = { 0, 1, B } = { γ1, γ2, γ3 }. Each move δ( qh, γi ) = ( qj, γk, D ) is encoded as a binary word

move_h,i = 0^h 1 0^i 1 0^j 1 0^k 0 if D = L
move_h,i = 0^h 1 0^i 1 0^j 1 0^k 00 if D = R     (m1)

Finally, η( M ) is given by

111 move_1,1 11 move_1,2 11 . . . 11 move_|Q|,2 11 move_|Q|,3 111
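A sketch of this encoding in Python (hypothetical helper names; each move is given as a tuple (h, i, j, k, D) for δ( qh, γi ) = ( qj, γk, D )):

```python
def encode_move(h, i, j, k, D):
    """Binary code (m1) for delta(q_h, gamma_i) = (q_j, gamma_k, D):
    0^h 1 0^i 1 0^j 1 0^k, then 0 for D = L or 00 for D = R."""
    return ("0" * h + "1" + "0" * i + "1" + "0" * j + "1" + "0" * k
            + ("0" if D == "L" else "00"))

def encode_machine(moves):
    """eta(M): the move codes separated by 11 and bracketed by 111."""
    return "111" + "11".join(encode_move(*m) for m in moves) + "111"
```

For instance, `encode_move(1, 1, 2, 2, "L")` gives `0101001000`: one 0 for q1, a 1, one 0 for γ1, a 1, two 0s for q2, a 1, two 0s for γ2, then the trailing 0 for a left move.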
To see that the language Lcode is recursive, it suffices to observe that, given w ∈ { 0, 1 }*, for w to be in Lcode all of the following must be true:

a) w begins and ends with the word 111.
b) w consists of words move_h,i separated by 11.
c) There are exactly 3 words move_h,? for any qh.

Given w ∈ { 0, 1 }* that satisfies (a), (b) and (c), it is easy to check that each move_h,i sub-word is valid, i.e. that it observes the form of (m1), with

1 ≤ i ≤ 3 (γi ∈ { 0, 1, B })

1 ≤ k ≤ 2 (γk ∈ { 0, 1 })

move_h,i ending with 0 or 00.
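The structural part of this check can be sketched with a regular expression (a partial validator only: condition (c), three moves per state, and the index bounds on i and k are omitted):

```python
import re

# 0^h 1 0^i 1 0^j 1, then 0^k fused with the trailing 0 or 00,
# so the final 0-run has length at least 2
MOVE = re.compile(r"0+10+10+10{2,}")

def looks_like_code(w):
    """Necessary-condition check for membership of Lcode:
    w is bracketed by 111 and splits into 11-separated move words."""
    if not (w.startswith("111") and w.endswith("111") and len(w) >= 7):
        return False
    body = w[3:-3]
    return all(MOVE.fullmatch(m) for m in body.split("11"))
```

Because a move word never contains two adjacent 1s, the 11 separators are unambiguous and `split("11")` recovers the move words exactly.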
Theorem 27: There is a Turing machine, UM, which given u ∈ { 0, 1 }* acts as follows:

U1) If u = η(M)x for some η( M ) ∈ Lcode, then UM simulates M on input x, i.e.

UM halts and accepts (rejects) η(M)x
⇔
M halts and accepts (rejects) x

UM fails to halt on η(M)x
⇔
M fails to halt on x

U2) If u ≠ η( M )x for every η(M) ∈ Lcode, UM does nothing.
Proof: (Outline)
UM is formed as a 3-tape machine:

Tape 1 holds the input word u ∈ { 0, 1 }*.
Tape 2 will hold the contents of M’s tape (assuming u = η(M)x).
Tape 3 holds the word 0^i to indicate that M (at this stage) is in state qi.

To start, UM checks if u = η(M)x and, if so, copies x to Tape 2 and writes 0 to Tape 3 (the start state q1 of M). The tape head on Tape 2 is set to location 1.

For each move of M, UM looks up the current state and the symbol scanned on Tape 2 in the encoding η(M). Using these data, UM can update Tape 2 and the recorded state.

If the state recorded on Tape 3 is 00 (accept) or 000 (reject), then UM halts and accepts (00), resp. halts and rejects (000).
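The simulation loop UM performs can be sketched with the transition table held as a dictionary (an illustrative model, not the 3-tape construction itself; state 1 is the start state, and an optional step bound stands in for ‘may fail to halt’):

```python
def simulate(delta, q_accept, q_reject, word, max_steps=10_000):
    """Run delta : (state, symbol) -> (state, symbol, direction)
    from state 1 on the given word; 'B' is the blank symbol."""
    tape = dict(enumerate(word))
    q, pos, steps = 1, 0, 0
    while q not in (q_accept, q_reject):
        if steps == max_steps:
            return None                  # undetermined within the bound
        symbol = tape.get(pos, "B")
        q, tape[pos], d = delta[(q, symbol)]
        pos += 1 if d == "R" else -1
        steps += 1
    return q == q_accept                 # True = accept, False = reject

# A toy machine (accept state 2, reject state 3) accepting words
# whose first symbol is 1:
delta = {(1, "1"): (2, "1", "R"),
         (1, "0"): (3, "0", "R"),
         (1, "B"): (3, "B", "R")}
```

The design choice mirrors UM: the ‘program’ `delta` is ordinary input data to `simulate`, just as η(M) is ordinary tape content for UM.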
COMP209
Automata and Formal Languages
Section 12
Undecidable Languages
Undecidable Languages 377
Introduction

Every language that we have seen on the module so far is

Recursively Enumerable.

In fact, they have all been

Recursive.

In a ‘practical’ computing context, this equates to these languages having ‘effective’ decision algorithms, i.e. we can

‘write a program’

in, say, Java, that given a binary word, w, as its input:

a) Always comes to a halt.
b) Returns the result accept if w is in L.
c) Returns the result reject if w is not in L.

for any of the specific L seen so far.
Albeit that only a very small number of different languages have been viewed, one might conjecture from these examples that:

a) All languages L ⊆ { 0, 1 }* are recursive (decidable),

or, failing this,

b) All languages L ⊆ { 0, 1 }* are r.e. (semi-decidable),

or, failing this,

c) All ‘interesting’ languages are decidable,

or, failing this,

d) All ‘interesting’ languages are semi-decidable.
Leaving aside the (arguably subjective) notion of what constitutes an ‘interesting’ language, what can be said about the first two, i.e.

Is it the case that all languages are decidable?
or
(at least) semi-decidable?

Before addressing these, it is worth noting that there are two approaches to proving that neither is true:

1) By an existence proof: i.e. giving an ‘indirect’ argument that there must be some languages that are not r.e. (semi-decidable).

2) By a proof that an explicitly defined language is not decidable.
Explicitly Defined Languages

Suppose L ⊆ { 0, 1 }* is suspected to be a non-r.e. language, i.e. not semi-decidable.

L must contain infinitely many words.

How can we ‘describe’ L?

We cannot ‘list’ all of the words in L.
We cannot give a grammar for L.
We cannot give a program (e.g. TM) for L.

(because either of the last two would imply L is semi-decidable)

Informally, L is an explicitly defined language if there is a

finite description

that characterises which words belong to L.

In trying to find explicitly defined non-r.e. languages, such descriptions must be ‘ad hoc’, i.e. non-computational.
Some Examples

Two ‘non-computational’ definitions we have seen already are:

{ 1^k : k is a prime number }

{ 1^k 0^(k²) : k ≥ 1 }

Such descriptions do not explain:

given 1^k, how to show if k is prime;
given 1^i 0^j, how to show if j = i².

Each can be described by a TM and by a formal grammar.

A finite description of a language L may give no indication of how to construct a decision algorithm for L.

[Regarding ‘interesting’ languages, that these be ‘explicitly defined’ is a minimal criterion.]
Using Closure Properties

The existence of CFLs which are not deterministic was proved by showing that DCFLs are closed under complementation.

Since it was known that CFLs (in general) do not have this property, an explicit construction of L ∈ CFL − DCFL would follow from a proof that

L ∈ CFL but Co−( L ) ∉ CFL.

Can we use a ‘similar’ indirect method to assist in finding explicitly defined non-r.e. languages?

Theorem 28: (Closure Properties of Recursive Languages)
If L1, L2 are recursive (i.e. decidable) then so are:

L1 ∪ L2 ; L1 ∩ L2 ; Co−( L1 ).

Proof: Easy exercise.
On the other hand:

Theorem 29: (Closure Properties of r.e. Languages)
If L1, L2 are r.e. (i.e. semi-decidable) then so are:

L1 ∪ L2 ; L1 ∩ L2.

If L is r.e. but not recursive then

Co−( L ) is not r.e. (semi-decidable).

Proof: Let M1, M2 accept L1, L2.
M∪ accepts w ∈ L1 ∪ L2 by alternating a simulation of one move of M1 on w with one move of M2 on w. If either simulation reaches its accept state, M∪ halts and accepts.

L1 ∩ L2: exercise.
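The alternating (‘dovetailed’) simulation used for M∪ can be sketched by modelling each accepter as a generator that yields True exactly when it accepts (illustrative names; if neither machine accepts, the loop runs forever, which is exactly semi-decidability):

```python
def union_accepter(run1, run2, w):
    """M-union: alternate one simulated step of each machine on w;
    halt and accept as soon as either accepts."""
    gens = [run1(w), run2(w)]
    while True:
        for g in gens:
            if next(g):
                return True

def never(w):
    """A machine that runs forever without accepting."""
    while True:
        yield False

def after_three(w):
    """A machine that accepts on its third step."""
    yield False
    yield False
    yield True
```

For example, `union_accepter(never, after_three, "w")` accepts, even though the first machine never halts: the step-by-step alternation means neither simulation can starve the other.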
The r.e. languages, however, are not closed under complement.

Suppose L is r.e. but not recursive.

If Co−( L ) were r.e. there would be TMs

ML accepting L
MCL accepting Co−( L ).

Consider the TM, M, that on input w alternates a simulation of ML on w with a simulation of MCL on w.

Certainly exactly one of these must reach its accept state:

if ML accepts w, then M halts and accepts;
if MCL accepts w, then M halts and rejects.

This implies that L is recursive: a contradiction, whence Co−( L ) cannot be r.e.
It follows from Theorem 29 that an explicitly defined language which is

not semi-decidable

can be constructed from an explicitly defined language, L, if it can be proved that:

L is semi-decidable
and
L is not decidable.

Since for such languages L the language

{ w : w ∉ L }

is not semi-decidable.
The Halting Problem (for Turing Machines)

One aspect of the encoding function η( M ) for Turing machines in standard form is that it provides a means by which

computational questions

concerning the behaviour of

specific Turing machine programs

can be formulated.

Examples
The question ‘Does M accept ε?’ is equivalent to deciding

η( M ) ∈? { η( M ) : ε ∈ L( M ) }

The question

‘Does M make at least n² moves on some w of length n, for every n?’

is equivalent to deciding

η( M ) ∈? { η( M ) : ∀ n ∃ w ∈ { 0, 1 }^n on which M makes ≥ n² moves }
Notice that questions of this form,

‘Does this Turing machine belong to a particular class of Turing machines?’,

and the existence or otherwise of decision algorithms that treat them, are of much greater significance than merely technical questions about one formalism.

Suppose we take any high-level programming language and an appropriate abstract platform upon which programs may be executed. We may equally phrase such questions as

‘Does this program have a particular behaviour?’

e.g.

‘Does this program compute a specific function?’
‘Does this program always terminate?’
‘Does this program behave identically to another?’

etc. etc.
We shall return to the implications of this interpretation later.

We now consider a particular property of Turing machines and its associated decision (language membership) problem.

This is known as

The Halting Problem

and is the language (over { 0, 1 }*):

LHP = { u : u = η( M )w with M ∈ Standard, and M on input w halts }

Thus, LHP comprises the set of

pairs of TM programs (M) and inputs (w)

for which a positive answer would be returned to the question:

‘Does this Turing machine, M, eventually reach one of the halting states (qA or qR) when given the input w?’
Theorem 30: LHP is undecidable, i.e. there is no TM, M, such that

∀ w ∈ LHP, M halts and accepts,
and
∀ w ∉ LHP, M halts and rejects.     (HP)

Proof: Suppose, by way of contradiction, that there is a TM, MHP, such that

∀ w ∈ LHP, MHP halts and accepts,
and
∀ w ∉ LHP, MHP halts and rejects.

Without loss of generality, let MHP be in standard form.

Now consider a TM, MNOT−HP, whose behaviour is the following:
a) MNOT−HP checks if its input u is in Lcode, i.e. takes the form η( M ) (an encoding of a TM, M).

b) If u ∉ Lcode then

MNOT−HP goes into an infinite loop;

otherwise (// u = η(M) ∈ Lcode)

MNOT−HP simulates the actions of MHP on input η( M )η( M ).

c) If MHP would accept η(M)η(M):

MNOT−HP goes into an infinite loop.

If MHP would reject η(M)η(M):

MNOT−HP halts and accepts.
Notice that the construction of MNOT−HP assumes only the existence of a TM, MHP, deciding the Halting Problem language LHP.

MHP is used as a ‘sub-procedure’ by MNOT−HP in order to determine if

the program, (M),
(given in the encoded form η(M)),

halts when

given its own description, η(M), as input.

Thus MNOT−HP determines the answer to the question:

‘Does this program halt when given its own description as input?’

If the answer is ‘Yes’: MNOT−HP enters an infinite loop;
If the answer is ‘No’: MNOT−HP halts and accepts.
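The same diagonal construction can be replayed against any claimed halting tester written in, say, Python (a sketch; `halts` is the hypothetical decider, and only the ‘No’ branch can safely be run):

```python
def make_spoiler(halts):
    """Build the analogue of M_NOT-HP for a claimed total decider
    halts(f): loop if halts says f halts, halt otherwise."""
    def spoiler():
        if halts(spoiler):
            while True:          # answer 'Yes' -> enter an infinite loop
                pass
        return "halted"          # answer 'No'  -> halt (and accept)
    return spoiler

# A decider that always answers 'does not halt' is refuted by
# running its own spoiler, which promptly halts:
s = make_spoiler(lambda f: False)
outcome = s()                    # halts, contradicting the decider
# (A decider answering 'halts' would make s() loop forever instead,
#  so no total halts() can be correct on its own spoiler.)
```

Whatever total `halts` is supplied, its spoiler does the opposite of what `halts` predicts for it, mirroring the contradiction developed on the following slides.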
[Note: there is nothing ‘exceptional’ as regards a program, P, being given its own description as input. This is common in ‘practical’ computing: a Pascal compiler may be written in Pascal; an editor may edit its own source code, etc.]

Question
What is the input for MNOT−HP?

Answer
A TM program description, η( M ).

Question
What is MNOT−HP?

Answer
A TM (in standard form).

∴ MNOT−HP has an encoding

η( MNOT−HP ) ∈ Lcode

and this code can be given to MNOT−HP itself as a possible input.
What would happen in this case? i.e. what happens when MNOT−HP is given

its own description, η( MNOT−HP ),

as input?

The description of MNOT−HP’s operation shows that there are only 2 possibilities:

Either

MNOT−HP enters an infinite loop on η( MNOT−HP )

or

MNOT−HP halts and accepts η( MNOT−HP ).
Suppose

MNOT−HP enters an infinite loop on η( MNOT−HP ).

This means that, at Step (c), the simulation of MHP indicates that

η( MNOT−HP )η( MNOT−HP )

would be accepted (by MHP).

MHP, however, is assumed to decide the Halting Problem language, LHP, and so

η( MNOT−HP )η( MNOT−HP ) ∈ LHP,

i.e. the TM MNOT−HP halts on the input η( MNOT−HP ).

This contradicts the premise:

MNOT−HP enters an infinite loop on η( MNOT−HP ).
On the other hand, suppose

MNOT−HP halts and accepts η( MNOT−HP ).

This means that, at Step (c), the simulation of MHP indicates that

η( MNOT−HP )η( MNOT−HP )

would be rejected (by MHP).

Hence

η( MNOT−HP )η( MNOT−HP ) ∉ LHP,

i.e. the TM MNOT−HP does not halt (i.e. enters an infinite loop) on the input η( MNOT−HP ).

Again this contradicts the premise:

MNOT−HP halts and accepts η( MNOT−HP ).
In summary, the definition of the TM MNOT−HP is such that there is

no consistent outcome when its input is η( MNOT−HP ).

From the assumption that MHP recognises LHP, and the design of MNOT−HP, we see

MNOT−HP halts on η( MNOT−HP )
⇔
MNOT−HP does not halt on η( MNOT−HP ).

So we deduce that MHP cannot be constructed,

i.e. LHP is undecidable.
The Halting Problem language, LHP, is, however,

semi-decidable (recursively enumerable)

i.e. there is a TM, M^(a)_HP, that halts and accepts inputs η( M )w whenever the TM M halts on w; M^(a)_HP may, however, fail to halt when M does not halt on w.

[All that M^(a)_HP does, having checked that its input is of the form η( M )w, is to simulate the machine M on input w: if the simulation halts then M^(a)_HP halts and accepts.]

This gives the following,

Corollary: The language Co−( LHP ), i.e.

{ u : u = η( M )w with M ∈ Standard, and M on input w does not halt }

is not semi-decidable (i.e. not r.e.).
Discussion

The exact result established in Theorem 30 may be, informally, phrased as:

‘It is not possible to construct a Turing machine (program) that can distinguish Turing machine (programs) that halt on a given input from those that fail to halt on a given input.’

On the surface, this merely seems to be an eclectic technical detail regarding the scope and power of Turing machine programs.

The Church-Turing Hypothesis implies that this result has more far-reaching consequences.
Two Consequences of Theorem 30 from the Church-Turing Hypothesis

‘It is not possible to construct

any ‘effective’ algorithm

that can distinguish TMs that halt on a given input from those that fail to halt on a given input.’

If S is any ‘reasonable’ model of computation that is ‘at least as powerful’ as Turing Machines, then:

‘It is not possible to construct

any ‘effective’ algorithm

that can distinguish programs in the system S that halt on a given input from those that fail to halt on a given input.’
In other words, not only is

‘the Halting Problem for Turing Machines’

impossible to solve by using ‘effective’ algorithms, but so is

any analogous ‘Halting Problem’ on any general model of computation.

For example: for any sufficiently general high-level programming language (Java, Ada, etc.) there will never be algorithms that can distinguish between programs that halt on given input data and those which fail to halt.

So, for example, it is not possible to ‘embed’ within a compiler a test for whether a source program and data might loop indefinitely.
Summary

In proving ‘undecidability’ properties of languages, our concern is to argue that

no ‘effective’ algorithmic method exists.

Turing Machines (by virtue of the Church-Turing Hypothesis) provide one model upon which to build such arguments.

The Church-Turing Hypothesis contends:

If there is no Turing machine deciding (semi-deciding) some language L, then there is

no ‘effective’ algorithm whatsoever

for deciding (semi-deciding) L.

i.e. The critical point is not merely

the impossibility of a TM program

but

the impossibility of any program (i.e. algorithm).
Undecidable Languages ‘related’ to Halting Problems

The construction used in the proof of Theorem 30 relies on treating the encoding of a particular TM (MNOT−HP) as both a

program

and an

input word for a program.

This was all right since the general Halting Problem, i.e. any TM with any input word, was being examined.

From what has been presented, however, it might be argued that:

‘Even though, in general, there is no method that ‘works’ for every combination of program and input, it might be possible to design algorithms that work for every program provided that only certain input words to these are tested.’
For example, in the context of trying to embed a ‘halting test’ into a high-level language compiler, Theorem 30 leaves open whether deciding halting on, e.g., empty input data is possible.

In fact, even such ‘simplified’ Halting Problems are undecidable.

For w ∈ { 0, 1 }* the language L^w_HP is

L^w_HP = { u : u = η( M ) with M ∈ Standard, and M on input w halts }

Theorem 31: ∀ w ∈ { 0, 1 }*, L^w_HP is undecidable.
Proof: Suppose that M^w_HP recognises L^w_HP, i.e.

∀ u ∈ L^w_HP, M^w_HP halts and accepts,
and
∀ u ∉ L^w_HP, M^w_HP halts and rejects.

M^w_HP can be used to decide the general Halting Problem, LHP, as follows:

Build a TM, Reduce, which given an input of the form η( M )x proceeds by constructing a new TM, M’, using η( M ) and x.

The machine M’ simply simulates the actions of M on the input x.

Having constructed M’, Reduce then tests if η( M’ ) ∈ L^w_HP, using M^w_HP.

Notice that M’ ignores any word on its input.
Reduce:

halts and accepts η(M)x if M^w_HP halts and accepts η( M’ );
halts and rejects η(M)x if M^w_HP halts and rejects η( M’ ).

Exactly one of these must occur, i.e. M^w_HP always reaches some halting state.

But,

M^w_HP halts and accepts η( M’ )
⇔ M’ halts (on input w)
⇔ M halts on input x.

Similarly,

M^w_HP halts and rejects η( M’ )
⇔ M’ does not halt (on input w)
⇔ M does not halt on input x.
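The machine M’ that Reduce builds can be sketched with programs modelled as Python functions (illustrative only: the real construction manipulates encodings η(M), not closures):

```python
def build_M_prime(M, x):
    """The machine M' constructed by Reduce: it ignores its own
    input word and simply runs M on the fixed word x."""
    def M_prime(_input_word):
        return M(x)              # halts iff M halts on x
    return M_prime

# A toy 'machine' that always halts, returning its input's length:
M = lambda word: len(word)
M_prime = build_M_prime(M, "101")
```

Because M’ ignores its input, it halts on w (indeed on every word) exactly when M halts on x, which is why a decider for halting-on-w would yield a decider for the general Halting Problem.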
406 Undecidable Languages
Thus if Reduce could be built then L_HP would be decidable.

From Theorem 30, L_HP is undecidable, so the TM Reduce cannot be constructed.

Since the only assumption made in its definition is that a TM recognising L^w_HP exists, we conclude that L^w_HP is undecidable.
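The reduction above can be sketched in Python, modelling Turing machines as plain functions (halting = returning a value, looping = never returning). All names here, including the machine model itself, are illustrative and stand in for the formal encoding η(M); no genuine decider M^w_HP can exist, which is the point of the theorem.

```python
def make_M_prime(M, x):
    """Build M': a machine that ignores its own input word and
    simply simulates M on the fixed word x.
    So M' halts on any input (in particular on w) iff M halts on x."""
    def M_prime(_ignored_input):
        return M(x)
    return M_prime

def reduce_halting(M_w_HP, M, x):
    """Decide 'does M halt on x?' using a hypothetical decider
    M_w_HP(M') for 'does M' halt on the fixed word w?'."""
    M_prime = make_M_prime(M, x)
    # M' halts on w  <=>  M halts on x, so the two answers agree.
    return M_w_HP(M_prime)
```

For machines that do halt, a naive "decider" that just runs its argument on w illustrates the agreement between the two questions.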
In addition to the result that ‘restricted Halting Problems’ of this type remain undecidable, there is one important feature of the proof that should be noted:

Use of reduction: i.e. defining an algorithm for one problem (L_HP) using an algorithm for another (L^w_HP).
Informally, it provides a means for demonstrating a property of a language L' by relating it to another language L.
Deciding Properties of Languages:
Rice’s Theorem
An important consequence of the existence of a universal Turing machine is that we can define a universal language, Luniv ⊆ {0, 1}*, i.e.

Luniv = { u : u = η(M)x, η(M) ∈ Lcode, x ∈ L(M) }

This language is recursively enumerable (semi-decidable) but it is not recursive (decidable).
A further important consequence of the encoding mechanism is that it provides a computational process for examining properties of r.e. languages, i.e.

L ⊆ {0, 1}* is r.e. ⇔ ∃ M with L(M) = L
⇔ (∀ x ∈ L, η(M)x ∈ Luniv and ∀ x ∉ L, η(M)x ∉ Luniv)
Thus identifying r.e. languages with the encoding of any TM accepting them provides a means for examining questions such as,
Is ε ∈ L? Is L = ∅? Is L recursive, context-free, regular? etc.
via the language of related TM encodings:
So, if M is such that L(M) = L, the questions above correspond to,
η(M) ∈? { η(M) : ε ∈ L(M) }
η(M) ∈? { η(M) : L(M) = ∅ }
η(M) ∈? { η(M) : L(M) is recursive }
Properties and Families of Languages
Recall that a family of languages is just a subset, ℜ, of the set of all possible languages over some alphabet, Σ.
Thus, the question, ‘Does the language L have a particular property Π?’ is equivalent to, ‘Is L a member of the family, ℜΠ, of languages having property Π?’
We now examine the issue of which properties of the r.e. languages it is possible to develop decision algorithms for, i.e. given some r.e. language L, for which properties, Π, can one decide if L ∈ Π?
Some Example Properties
We use Σ = {0, 1}.
The property of L being empty is the property, Π∅, containing exactly one language, i.e.

Π∅ = { ∅ }
The property of L containing the empty word is

Πε = { L ⊆ {0, 1}* : ε ∈ L }
The property of L being a unary language is,

Πunary = { L ⊆ {0, 1}* : L ⊆ 1* or L ⊆ 0* }
Very Important
The property Π∅ - the empty language - is not the same as the empty property.

The family ℜ∅ corresponding to the former contains exactly one language (i.e. the empty language ∅).

The family corresponding to the latter contains no languages at all.
Of course the phrasing,

‘Given some r.e. language L and property Π; decide if L ∈ Π?’

is not usable in the sense of formulating a decision question.
2 issues have to be addressed:

a) How is a language, L, to be presented?
b) How is a ‘property’, Π, to be viewed?
Without loss of generality, it suffices to consider languages over {0, 1}.
Furthermore, since a finite description of a language has to be given, we can only consider properties of recursively enumerable languages.
[A general algorithm testing if L ∈ Π must be able to treat descriptions of different L in some ‘uniform’ manner: this rules out trying to interpret ‘ad hoc’ natural language definitions.]
Concentrating on properties of r.e. languages now gives a solution to (a) and (b) above.
a) L is r.e. if and only if there is a TM, M, in standard form accepting L.
∴ The question L ∈ Π can be viewed as a decision problem concerning Turing machine (encodings) - η(M).
Similarly, a property, Π, is a subset of (r.e.) languages, and so Π can be interpreted as a set of Turing machine (encodings).
So the question, ‘Is L ∈ Π?’ can be addressed by defining,

LΠ = { η(M) : L(M) ∈ Π }

and then, given ML with L(ML) = L, deciding L ∈ Π is equivalent to deciding η(ML) ∈? LΠ
Examples
For Π∅ being the property ‘L is the empty language’:

LΠ∅ = { η(M) : L(M) = ∅ }

(i.e. for all inputs x, M never reaches its accept state on x).
For the empty property:

LΠ = ∅

(i.e. no r.e. language (TM) would be accepted).
For the property ΠR.E. of L being r.e.:

LΠR.E. = { η(M) : L(M) is r.e. }

(i.e. every r.e. language (TM) is accepted).
The ‘Trivial’ Properties
The 2 properties,

The Empty Property
The Property of being r.e.

will be denoted subsequently by ℵ and R.E. Thus,

ℵ = ∅
R.E = { L : L is r.e. }
Both of the corresponding languages

Lℵ = ∅
LR.E. = { η(M) : L(M) is r.e. }

are decidable (recursive).
For Lℵ, a TM simply enters its reject state on every input. For LR.E., a TM simply checks if its input is a TM encoding and, if so, enters its accept state (rejecting otherwise). Hence

LR.E. ≡ Lcode.
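The two deciders just described can be sketched directly. Here `looks_like_tm_encoding` is purely an illustrative stand-in for the real membership test for Lcode (for the sketch, an "encoding" is taken to be any non-empty binary string beginning with 1):

```python
def decide_L_aleph(u):
    """Decider for L_aleph = the empty set: reject every input."""
    return False

def looks_like_tm_encoding(u):
    """Illustrative stand-in for membership in Lcode: a non-empty
    binary string beginning with '1' (an assumption, not the
    module's actual encoding scheme)."""
    return len(u) > 0 and u[0] == "1" and set(u) <= {"0", "1"}

def decide_L_RE(u):
    """Decider for L_R.E. = Lcode: accept exactly the TM encodings."""
    return looks_like_tm_encoding(u)
```

Both functions halt on every input, which is exactly what makes these two properties ‘trivial’ to decide.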
These properties - ℵ and R.E. - are called

Trivial Properties
The reason being that:

In the case of ℵ: no r.e. language has it.
In the case of R.E.: every r.e. language has it.
Thus, the decision processes η(M) ∈ Lℵ and η(M) ∈ LR.E. are ‘trivial’.
Again we emphasise that The Empty Property (ℵ) and The Property of being an Empty Language (Π∅) are different.
Notice that the property Π∅ of a language being empty has been considered earlier in the context of a language description being supplied in the form of a DFA or CFG.
In both cases there were easy decision methods available.
When L is presented as a TM encoding η(M), the question

‘Does M accept any words?’ (i.e. ‘Is η(M) ∈ LΠ∅?’)

is not straightforward.
As a simple illustrative exercise consider the following r.e. languages:
Fermat = { 1^n : n > 2 and ∃ x, y, z ≥ 1 with z^n = x^n + y^n }

G’bach = { 1^(2n) : n > 1 and there are no primes p, q with p + q = 2n }

(G’bach is actually recursive)
What would a decision method for LΠ∅ imply in these cases?
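A semi-decision procedure for Fermat only has to enumerate candidate triples. The bounded search below is a sketch of that enumeration; the `bound` cap is added purely so the illustration terminates, whereas a true semi-decider would keep searching without it:

```python
def fermat_witness(n, bound):
    """Search for x, y, z with 1 <= x <= y < z <= bound and
    x**n + y**n == z**n; return the first triple found, else None."""
    for z in range(2, bound + 1):
        zn = z ** n
        for x in range(1, z):
            for y in range(x, z):
                if x ** n + y ** n == zn:
                    return (x, y, z)
    return None  # no witness up to this bound; a semi-decider keeps going
```

For n = 2 the search quickly finds the Pythagorean triple (3, 4, 5); for any n > 2 it runs forever in vain, by Fermat’s Last Theorem. A decider for LΠ∅ applied to a TM accepting Fermat would settle, in advance, whether any witness exists at all.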
We now have a complete framework with which to consider the question:

‘Which properties, Π, of r.e. languages are decidable?’
That is, for Π a set of r.e. languages, when is it the case that

LΠ = { η(M) : L(M) ∈ Π }

is decidable (recursive)?
The final result that will be proved in this module is

Rice’s Theorem

LΠ is decidable ⇔ Π is a Trivial Property.

Thus, only Lℵ and LR.E are decidable.
Proof of Rice’s Theorem
It has already been shown that if Π is trivial then LΠ is decidable. To complete the proof it must be shown that: if LΠ is decidable then Π = ℵ or Π = R.E.
Suppose the contrary, i.e. there is some property, Π, such that

Π ≠ ℵ (i.e. Π is not empty) and Π ≠ R.E (i.e. Π ⊊ R.E);
LΠ is decidable.
From the fact that Π ≠ ∅, it follows that Π contains at least one r.e. language. From the fact that Π ≠ R.E, it follows that there is at least one r.e. language not in Π.
Since LΠ is decidable, there is a TM, MΠ, such that

∀ η(M) with L(M) ∈ Π, MΠ halts and accepts η(M);
∀ η(M) with L(M) ∉ Π, MΠ halts and rejects η(M).
Using the TM, MΠ, we show that the universal language,

Luniv = { u : u = η(M)x, x ∈ L(M) }

is decidable.
Since we know that Luniv is not decidable, this would imply the contradiction needed to complete the proof.
First observe we may assume that the Empty Language is not one of the r.e. languages in Π:
[If ∅ ∈ Π, then we can use the property R.E − Π, i.e. the property ‘L is r.e. and L ∉ Π’. This property is not trivial, does not contain the empty language, and is decidable if LΠ is decidable.]
Let L be any r.e. language in Π (thus, L ≠ ∅).
In summary, we have so far:

MΠ, a TM recognising LΠ (by the assumptions made earlier);
L ∈ Π, a (non-empty) r.e. language in Π;
ML, a TM accepting L (since L is r.e.).
We now show how to combine ML and MΠ in order to build a TM, Mu, that on input y = η(M)x:

halts and accepts if x ∈ L(M);
halts and rejects if x ∉ L(M),

i.e. Mu would prove that Luniv is decidable.
The TM, Mu, behaves as follows given η(M)x:

1) Mu uses η(M)x to compile the description of a TM, Check(M, x), that acts as follows on input w ∈ {0, 1}*:

2) Check(M, x):
a) Simulates M on input x;
b) If x ∉ L(M) then Check(M, x) enters an infinite loop;
c) If x ∈ L(M) (M halts and accepts x) then Check(M, x) simulates ML on input w; Check(M, x) halts and accepts w only if ML halts and accepts w.

3) Having constructed Check(M, x), Mu then simulates MΠ with input η(Check(M, x));
If MΠ halts and accepts then Mu halts and accepts η(M)x;
If MΠ halts and rejects then Mu halts and rejects η(M)x.
(Figure: The Machine Check(M, x). M is simulated on input x; on YES, ML is run on the input w and its YES/NO answer becomes the answer of Check; on NO, Check enters LOOP.)
What do we know about the TM, Check(M, x)?
Suppose that x is not accepted by M: then Check(M, x) accepts no words at all (its input w is never read).

∴ x ∉ L(M) ⇒ L(Check(M, x)) = ∅

If x is accepted by M then Check(M, x) accepts exactly the same language as ML.

∴ x ∈ L(M) ⇒ L(Check(M, x)) = L(ML)
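The behaviour of Check(M, x) can be sketched in the same informal function model of TMs used earlier (accept = return True, reject = return False, loop = never return); `make_check` and its arguments are illustrative names only, not part of the formal construction:

```python
def make_check(M, x, M_L):
    """Build Check(M, x): on input w, first simulate M on x.
    If M rejects x, enter an infinite loop, so no word is accepted
    and L(Check) is empty.  If M accepts x, behave exactly like
    M_L on w, so L(Check) = L(M_L) = L."""
    def check(w):
        if not M(x):
            while True:      # x not in L(M): accept no words at all
                pass
        return M_L(w)        # x in L(M): accept exactly L(M_L)
    return check
```

Feeding η(Check(M, x)) to MΠ then answers ‘x ∈ L(M)?’, since L(Check(M, x)) lies in Π exactly when x ∈ L(M).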
Either

L(Check(M, x)) = ∅ (when x ∉ L(M))

or

L(Check(M, x)) = L(ML) = L (when x ∈ L(M)).
Since L ∈ Π and ∅ ∉ Π, we see that the TM, Mu:

halts and accepts η(M)x if η(Check(M, x)) ∈ LΠ, i.e. if x ∈ L(M);
halts and rejects η(M)x if η(Check(M, x)) ∉ LΠ, i.e. if x ∉ L(M),
so if LΠ were decidable, then Mu decides Luniv.

This contradiction establishes that LΠ is not decidable.