Unit 7 - PowerPoint · Unit 9 More Pushdown Automata Context-free Languages Pumping Lemma for CFL...


Unit 9

More Pushdown Automata

Context-free Languages

Pumping Lemma for CFL

Reading: Sipser, chapter 2.3

2

Properties of PDAs

• An NFA can only distinguish between |Q| different characterizations.

• A PDA can distinguish between an unlimited number of characterizations.

• A PDA can recognize non-regular languages because the stack can ‘count’.

• A PDA can count ‘more than once’, but it cannot mix counters: only one counter can be active at a time.

3

Example: L = {a^i b^i c^j d^j | i, j ≥ 0}

Construct a PDA to recognize:

L = {a^i b^i c^j d^j | i, j ≥ 0}

We will use the empty stack model.

The basic idea:

• Push to the stack an A for each a, pop an A for each b.

• Push to the stack a C for each c, pop a C for each d.
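The basic idea above can be sketched as a small Python simulation. This is only an illustration: the function name `accepts`, the use of a Python list as the stack, and the `phase` counter that stands in for the PDA's states are assumptions, not part of the slides.

```python
def accepts(word):
    """Accept a^i b^i c^j d^j (i, j >= 0) by empty stack."""
    stack = []
    phase = 0                          # index into "abcd": block currently being read
    for ch in word:
        idx = "abcd".find(ch)
        if idx < phase:                # unknown symbol (-1) or block order violated
            return False
        if idx >= 2 and phase < 2 and stack:
            return False               # c or d arrived while some A is still unmatched
        phase = idx
        if ch == "a":
            stack.append("A")          # push an A for each a
        elif ch == "c":
            stack.append("C")          # push a C for each c
        else:                          # ch is b or d: pop the matching symbol
            expected = "A" if ch == "b" else "C"
            if not stack or stack[-1] != expected:
                return False
            stack.pop()
    return not stack                   # empty stack model: accept iff nothing is left


print(accepts("aabbcd"), accepts("aabcd"))
```

The single stack is reused: the A counter is finished before the C counter starts, which is the "one active counter at a time" restriction in action.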


4

Properties of PDAs

• We had two ways to describe regular languages: regular expressions (syntactic) and DFA / NFA (computational).

• How about context-free languages? CFG (syntactic) and PDA (computational).

CFG=PDA

Theorem: A language is context-free iff some pushdown automaton recognizes it.

Proof:

• CFL ⇒ PDA: we show that if L is a CFL then some PDA recognizes it.

• PDA ⇒ CFL: we show that if a PDA recognizes L then L is a CFL.

5

From CFG to PDA

6

• Proof idea: Use PDA to simulate leftmost derivations.

• Leftmost derivation : A derivation of a string is a leftmost derivation if at every step the leftmost remaining variable is the one replaced.

• We use the stack to store the suffix that has not been derived so far.

• Any terminal symbols appearing before the leftmost variable are matched right away.
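A leftmost derivation can be replayed mechanically. The sketch below uses this unit's expression grammar E → ExE | E+E | 0 | … | 9; the function name `leftmost_derivation` and the encoding of right-hand sides as strings are assumptions made for illustration, and the function does not check that each chosen right-hand side is actually a rule of the grammar.

```python
def leftmost_derivation(start, variables, chosen_rules):
    """Apply each chosen right-hand side to the leftmost variable, in order."""
    form = [start]
    steps = ["".join(form)]
    for rhs in chosen_rules:
        # position of the leftmost remaining variable
        i = next(k for k, sym in enumerate(form) if sym in variables)
        form[i:i + 1] = list(rhs)      # replace that variable by the right-hand side
        steps.append("".join(form))
    return steps


steps = leftmost_derivation("E", {"E"}, ["ExE", "E+E", "5", "3", "2"])
print(" => ".join(steps))   # E => ExE => E+ExE => 5+ExE => 5+3xE => 5+3x2
```

In every sentential form, everything to the left of the leftmost variable is already terminal, which is why the PDA can match it against the input right away and keep only the rest on the stack.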

7

Different derivations for the same parse tree

CFG: E → ExE | E+E

E → 0 | 1 | 2 | … | 9

Leftmost derivation of 5 + 3 x 2: E ⇒ ExE ⇒ E+ExE ⇒ 5+ExE ⇒ 5+3xE ⇒ 5+3x2

Rightmost derivation of 5 + 3 x 2: E ⇒ ExE ⇒ Ex2 ⇒ E+Ex2 ⇒ E+3x2 ⇒ 5+3x2

[Figure: the parse tree shared by both derivations, with root E]

The PDA follows the leftmost derivation (stack top on the left):

Starting configuration: input 5 + 3 x 2, stack E$

After E → ExE: input 5 + 3 x 2, stack ExE$

After E → E+E: input 5 + 3 x 2, stack E+ExE$

After E → 5: input 5 + 3 x 2, stack 5+ExE$

After matching 5 and +: remaining input 3 x 2, stack ExE$

After E → 3: remaining input 3 x 2, stack 3xE$

After matching 3 and x: remaining input 2, stack E$

After E → 2: remaining input 2, stack 2$

After matching 2: input consumed, stack $

The string ‘5 + 3 x 2’ is accepted

Informally:

1. Place the marker symbol $ and the start variable S on the stack.

2. Repeat the following steps:

– If the top of the stack is a variable A: choose a rule A → α1…αk and substitute A with α1…αk.

– If the top of the stack is a terminal a: read the next input symbol and compare it to a. If they don’t match, reject (this branch dies).

– If the top of the stack is $: go to the accept state; the input is accepted if it has all been read.
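The informal procedure above can be sketched in Python. Everything specific here is an assumption made for illustration: the name `cfg_pda_accepts`, the dictionary encoding of the rules, the breadth-first exploration of the nondeterministic choices, and the pruning test (a branch is dropped once the stack holds more terminals than there is input left). The pruning never discards an accepting run, but the search is only guaranteed to stay finite for grammars like the two used in this unit, so a budget guards against running forever.

```python
from collections import deque


def cfg_pda_accepts(rules, start, word, max_configs=100_000):
    """Explore the runs of the CFG-derived PDA breadth-first.

    rules maps each variable to a list of right-hand sides (tuples of symbols).
    A configuration is (input position, stack), with the stack top at index 0.
    """
    variables = set(rules)
    first = (0, (start, "$"))          # step 1: push $ and the start variable
    seen, queue = {first}, deque([first])
    while queue:
        if len(seen) > max_configs:    # give up instead of looping forever
            raise RuntimeError("search budget exhausted")
        pos, stack = queue.popleft()
        top, rest = stack[0], stack[1:]
        if top == "$":                 # marker reached: accept iff input is used up
            if pos == len(word):
                return True
            continue
        if top in variables:           # try every rule for the variable on top
            nxt = [(pos, tuple(rhs) + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            nxt = [(pos + 1, rest)]    # terminal matches the next input symbol
        else:
            nxt = []                   # mismatch: this branch dies
        for cfg in nxt:
            pending = sum(1 for s in cfg[1] if s not in variables and s != "$")
            if pending <= len(word) - cfg[0] and cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False


digits = [(d,) for d in "0123456789"]
expr = {"E": [("E", "x", "E"), ("E", "+", "E")] + digits}
print(cfg_pda_accepts(expr, "E", "5+3x2"))   # True
```

The same function can be run on the grammar S → aTb | b, T → Ta | ε by encoding the ε-rule as the empty tuple `()`.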

13

From CFG to PDA

• For a given CFG G = (V, Σ, S, R), we construct a PDA P = (Q, Σ, Γ, δ, q0, F) where:

– Q = {qstart, qloop, qaccpt}

– Γ = V ∪ Σ ∪ {$}

– q0 = qstart

– F = {qaccpt}

14

From CFG to PDA

• We define δ as follows (shorthand notation):

– δ(qstart, ε, ε) = {(qloop, S$)}

– δ(qloop, ε, A) = {(qloop, α1…αk) | for each rule A → α1…αk in R}

– δ(qloop, a, a) = {(qloop, ε)} for each a ∈ Σ

– δ(qloop, ε, $) = {(qaccpt, ε)}

15

From CFG to PDA

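[State diagram: qstart → qloop on ε, ε → S$; self-loops on qloop labelled ε, A → α1…αk (for each rule A → α1…αk) and a, a → ε (for all a ∈ Σ); qloop → qaccpt on ε, $ → ε]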

• Construct a PDA for the following CFG G, for which L(G) = a*b:

S → aTb | b

T → Ta | ε

16

Example:

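[State diagram: qstart → qloop on ε, ε → S$; self-loops on qloop labelled ε, S → aTb; ε, S → b; ε, T → Ta; ε, T → ε; a, a → ε; b, b → ε; qloop → qaccpt on ε, $ → ε]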

17

From PDA to CFG

• First, we simplify the PDA:

– It has a single accept state qf

– $ is always popped exactly before accepting

– Each transition is either a push, or a pop, but not both

18

From PDA to CFG

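• Single accept state qf: add a new state qf and an ε, ε → ε transition from every original accept state to qf; qf becomes the only accept state.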

19

From PDA to CFG

• $ is always popped exactly before accepting: before entering qf, loop on ε, A → ε for every A ∈ Γ with A ≠ $ to empty the stack, then take ε, $ → ε into qf.

20

From PDA to CFG

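• Each transition is either a push, or a pop:

– A transition σ, a → b that both pops and pushes is replaced by σ, a → ε followed by ε, ε → b through a new intermediate state.

– A transition σ, ε → ε that does neither is replaced by σ, ε → z followed by ε, z → ε, where z is a dummy stack symbol.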

21

From PDA to CFG

• For any word w accepted by a PDA P = (Q, Σ, Γ, δ, q0, qf), the computation starts at q0 with an empty stack and ends at qf with an empty stack.

• Definition: for any two states p, q ∈ Q we define L_{p,q} to be the language of all words w such that if we start at p with an empty stack and run on w, we can end at q with an empty stack.

• For each L_{p,q} we define a variable A_{p,q} such that L_{p,q} = {w | A_{p,q} ⇒* w}.

• Note that L(P) = L_{q0,qf}.

22

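From PDA to CFG

• Consider a word w ∈ L_{p,q}.

• While running w on P, the stack is empty at p and at q, but what happens in the middle?

• Two possibilities:

– Option 1: The stack is also empty at some point in the middle

– Option 2: The stack is never empty in the middle

[Figure: stack height over time from p to q; in option 1 the height returns to zero at an intermediate state r, in option 2 it stays positive between p and q]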

23

From PDA to CFG

Option 1: The stack also empty in the middle

• If the stack becomes empty at some state r, then the word w ∈ L_{p,q} is a concatenation of a word from L_{p,r} and a word from L_{r,q}; thus L_{p,r} L_{r,q} ⊆ L_{p,q}.

• In the CFG we express this by a rule: A_{p,q} → A_{p,r} A_{r,q}

[Figure: stack height from p to q touching zero at r; the first part is generated by A_{p,r}, the second by A_{r,q}]

24

From PDA to CFG

Option 2: The stack never empty in the middle

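• The symbol that is pushed in the first move (from p) is the symbol that is popped in the last move (into q).

• Thus, if at p we read a symbol a and moved to r, while from state s we read a symbol b and moved to q, then a L_{r,s} b ⊆ L_{p,q}, and in the CFG we have A_{p,q} → a A_{r,s} b

[Figure: stack height from p to q staying positive in between; a is read from p to r, the middle part is generated by A_{r,s}, and b is read from s to q]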

25

From PDA to CFG

Let P = (Q, Σ, Γ, δ, q0, qf) be a given PDA.

We construct a CFG G = (V, Σ, S, R) as follows*:

• V = {A_{p,q} | p, q ∈ Q}

• S = A_{q0,qf}

• R is a set of rules constructed as follows:

* Proof of correctness and further reading at the supplementary material in the course web page.

26

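From PDA to CFG

• Add the following rules to R:

1. For each p, q, r, s ∈ Q, t ∈ Γ, and a, b ∈ Σ ∪ {ε}, if (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t), add the rule A_{p,q} → a A_{r,s} b

2. For each p, q, r ∈ Q, add the rule A_{p,q} → A_{p,r} A_{r,q}

3. For each p ∈ Q, add the rule A_{p,p} → ε

[Figure: rule 1 pairs a transition p → r labelled a, ε → t (push t) with a transition s → q labelled b, t → ε (pop t); rule 2 splits a run from p to q at r; rule 3 is the empty run at p]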

27

Example:

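L(P) = {0^n # 1^n | n ≥ 0}

Original PDA: qs → q0 on ε, ε → $; loop on q0 labelled 0, ε → A; q0 → q1 on #, ε → ε; loop on q1 labelled 1, A → ε; q1 → q2 on ε, $ → ε.

Simplified PDA: the transition q0 → q1 on #, ε → ε is replaced by q0 → q3 on #, ε → z followed by q3 → q1 on ε, z → ε, so that every transition is either a push or a pop.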

28

Example:

start variable: A_{s2}

productions:

A_{ss} → A_{ss} A_{ss}

A_{ss} → A_{s0} A_{0s}

A_{ss} → A_{s1} A_{1s}

A_{ss} → A_{s2} A_{2s}

A_{s1} → A_{ss} A_{s1}

A_{s1} → A_{s0} A_{01}

A_{s1} → A_{s1} A_{11}

...

A_{ss} → ε

A_{00} → ε

A_{11} → ε

A_{22} → ε

A_{33} → ε

A_{s2} → A_{01}

A_{01} → 0 A_{01} 1

A_{01} → # A_{33}
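The three rule schemes can be generated mechanically from a simplified PDA. The following sketch is an illustration under assumptions: the function name `pda_to_cfg`, the encoding of push and pop transitions as 4-tuples, the use of `""` for ε, and the representation of the variable A_{p,q} as the pair `(p, q)` are all choices made here, not part of the slides. It is run on the simplified PDA for 0^n # 1^n, with states named "s", "0", "1", "2", "3".

```python
from itertools import product


def pda_to_cfg(states, pushes, pops):
    """Build the rules for the variables A_pq of a simplified PDA.

    pushes: set of (p, a, t, r): in state p read a ("" for ε), push t, go to r.
    pops:   set of (s, b, t, q): in state s read b ("" for ε), pop t, go to q.
    Returns a set of (head, body); head is the pair (p, q) standing for A_pq,
    body is a tuple of terminals (str) and variables (pairs).
    """
    rules = set()
    # Scheme 1: a push of t at p matched with a pop of the same t into q.
    for (p, a, t, r), (s, b, t2, q) in product(pushes, pops):
        if t == t2:
            body = tuple(x for x in (a, (r, s), b) if x != "")
            rules.add(((p, q), body))
    # Scheme 2: split a run from p to q at an intermediate state r.
    for p, q, r in product(states, repeat=3):
        rules.add(((p, q), ((p, r), (r, q))))
    # Scheme 3: the empty run at p.
    for p in states:
        rules.add(((p, p), ()))
    return rules


states = ["s", "0", "1", "2", "3"]
pushes = {("s", "", "$", "0"), ("0", "0", "A", "0"), ("0", "#", "z", "3")}
pops = {("1", "1", "A", "1"), ("3", "", "z", "1"), ("1", "", "$", "2")}
rules = pda_to_cfg(states, pushes, pops)
print((("0", "1"), ("0", ("0", "1"), "1")) in rules)   # A_01 -> 0 A_01 1
```

Most of the generated rules are useless (their variables derive no terminal string); only A_{s2} → A_{01}, A_{01} → 0 A_{01} 1, A_{01} → # A_{33} and A_{33} → ε are needed to derive the words of the language.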

CFG=PDA

29

• We have shown that a language is context-free iff some pushdown automaton recognizes it.

• In particular, all regular languages can be generated by CFGs and so can be recognized by PDAs.

• The class of languages accepted by nondeterministic PDAs is larger than the class accepted by deterministic PDAs.

The Context-free Languages

[Figure: nested language classes: the regular languages ⊂ languages of deterministic PDAs (DPDA) ⊂ context-free languages]

31

Non context-free Languages

• Consider the language L = {a^i b^i c^i | i ≥ 0}.

• When trying to build a pushdown automaton that recognizes L, we can compare the number of a's with the number of b's or with the number of c's, but not both.

• If we compare the number of a's to the number of b's, then we can't compare the c's with either of them, as at this stage the stack (or counter) is empty.

32

Non context-free Languages

• So some languages seem not to be context-free.

• The question is: which ones?

• This can be determined using the pumping lemma for context-free languages.

33

The Pumping Lemma - background

• Let L be a CFL and let G be a simple grammar (no unit rules or ε-rules) generating it.

• Let w ∈ L be a long enough word (we will say later what is long).

• The parse tree of w contains a long path from S to some leaf (terminal).

• On this long path some variable R must repeat (remember, w is long).

34

The Pumping Lemma - background

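• Divide w into uvxyz according to the parse tree, as in the figure.

• Each occurrence of R has a subtree under it.

[Figure: parse tree with root S and two nested occurrences of R; the leaves read u v x y z, the upper R spans vxy and the lower R spans x]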

35

The Pumping Lemma - background

• The upper occurrence of R has a larger subtree and generates vxy.

• The lower occurrence of R has a smaller subtree and generates only x.

• Both subtrees are generated by the same variable R.

• That means if we substitute one for the other we will still obtain valid parse trees.

36

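Replacing the smaller subtree by the larger one repeatedly generates the string u v^i x y^i z for each i > 0.

Replacing the larger subtree by the smaller one generates the string uxz, i.e. u v^i x y^i z with i = 0.

Therefore, for all i ≥ 0, w_i = u v^i x y^i z is also in L.

[Figure: two parse trees with root S; in the first the lower R is replaced by a copy of the upper R's subtree, giving an extra v and y; in the second the upper R directly generates x, leaving u x z]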

37

The Pumping Length

• That means that every CFL has a special value called the pumping length such that all strings in the language of length at least the pumping length can be "pumped".

• Such a string can be divided into 5 parts w = uvxyz.

• The second and fourth parts can be pumped to produce additional words in L.

• For all k ≥ 0, w_k = u v^k x y^k z can also be generated by the grammar.

38

Pumping Lemma for CFL

Lemma: Let L be a context-free language. There is a positive integer p (the pumping length) such that every string w ∈ L with |w| ≥ p can be divided into five pieces w = uvxyz satisfying the following conditions:

1. |vy| > 0

2. |vxy| ≤ p

3. for each i ≥ 0, u v^i x y^i z ∈ L
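To see what condition 3 says on a concrete string, here is a small Python sketch. The language {a^n b^n | n ≥ 0} is used as an assumed example of a CFL (it is not taken from these slides), and the helper names `pump` and `in_anbn` are hypothetical.

```python
def pump(u, v, x, y, z, i):
    """Return the pumped string u v^i x y^i z."""
    return u + v * i + x + y * i + z


def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


# One valid division of w = aaabbb: v and y sit on either side of the middle.
u, v, x, y, z = "aa", "a", "", "b", "bb"
print([pump(u, v, x, y, z, i) for i in range(3)])   # ['aabb', 'aaabbb', 'aaaabbbb']
```

Pumping v and y together keeps the numbers of a's and b's equal, so every pumped string stays in the language; the lemma only promises that at least one such division exists for each long enough string.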

39

Proof - value of p

• First we find out the value of p.

• Let G be a CFG for the CFL L.

• Let b be the maximum number of symbols in the right side of any rule in G.

• So we know that in any parse tree of G a node can't have more than b children.

• So if the height of a parse tree for w ∈ L is h, then |w| ≤ b^h (equivalently, h ≥ log_b |w|).

[Figure: a rule A → α1 α2 α3 … αb drawn as a node A with b children]

40

Proof - value of p

• Let |V| be the number of variables in G.

• We set p = b^{|V|+2}.

• Then for any string of length at least p the parse tree requires height at least |V|+2, since h ≥ log_b p = |V|+2 (note that b > 1 since there are no unit rules).

• Given a string w ∈ L such that |w| ≥ p, since G has only |V| variables, at least one of the variables repeats on the longest path (height |V|+2 ⇒ at least |V|+1 variables + a terminal).

• W.l.o.g. assume this variable is R.
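The value p = b^{|V|+2} from the proof can be computed directly from a grammar. In this sketch the function name `pumping_length` and the dictionary encoding of the rules are assumptions; the result is the bound used in this proof, not the smallest pumping length of the language.

```python
def pumping_length(rules):
    """Return b ** (|V| + 2), where b is the longest right-hand side.

    rules maps each variable to a list of right-hand sides (tuples of symbols).
    """
    b = max(len(rhs) for bodies in rules.values() for rhs in bodies)
    return b ** (len(rules) + 2)


# Expression grammar from this unit: E -> ExE | E+E | 0 | ... | 9
expr = {"E": [("E", "x", "E"), ("E", "+", "E")] + [(d,) for d in "0123456789"]}
print(pumping_length(expr))   # 27, since b = 3 and |V| = 1
```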

41

Proof – condition 1

• To prove condition 1 (|vy| > 0) we have to show it is impossible that both v and y are ε.

• We use a grammar without unit rules or ε-rules.

• But the only way to have v = y = ε is for the upper R to derive the lower R without producing any other symbol, which needs a rule such as R → R, a unit rule. Contradiction.

• So condition 1 is satisfied.

42

Proof – condition 2

• To prove condition 2 (|vxy| ≤ p) we check the height of the subtree rooted at the upper R, i.e. the subtree that generates vxy.

• Its height is at most |V|+2 (R was selected as a variable that has two occurrences within the bottom |V|+1 levels of the parse tree).

• So it can generate a string of length at most b^{|V|+2}.

• Since p = b^{|V|+2}, condition 2 is satisfied.

Proof – condition 3

• Replacing the smaller subtree by the larger one repeatedly generates the string u v^i x y^i z for each i > 0.

• Replacing the larger subtree by the smaller one generates the string uxz, i.e. u v^i x y^i z with i = 0.

• Therefore, for all i ≥ 0, w_i = u v^i x y^i z is also in L.

44

Usage of the lemma

• We use the pumping lemma to prove that a language is not context-free.

General Structure:

• Assume (by contradiction) that L is context-free and therefore satisfies the lemma.

• Let p be the pumping length for L.

• Select a word w ∈ L s.t. |w| ≥ p.

• Show that for any partition of w into five parts uvxyz such that |vy| > 0 and |vxy| ≤ p, there exists an i such that w_i = u v^i x y^i z ∉ L.

• Contradiction!

45

Usage of the lemma - Example

• We use the pumping lemma to prove that the language L = {a^n b^n c^n | n ≥ 0} is not context-free.

Proof:

• Assume, by contradiction, that L is context-free and thus satisfies the lemma.

• Let p be the pumping length for L.

• We select the string w = a^p b^p c^p. Since w ∈ L and |w| > p, it can be pumped.

46

L = {a^n b^n c^n | n ≥ 0}

• Divide w into five parts uvxyz such that |vy| > 0 and |vxy| ≤ p.

• There are two cases:

1. v and y are both homogeneous (each contains only one type of symbol).

2. v or y is heterogeneous (contains more than one type of symbol).

47

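L = {a^n b^n c^n | n ≥ 0}. Let w = a^p b^p c^p = uvxyz.

[Figure: w drawn as the blocks a^p, b^p, c^p with possible positions of v and y. Case 1: v and y are homogeneous (each lies inside a single block). Case 2: v or y is heterogeneous (straddles a block boundary).]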

48

L = {a^n b^n c^n | n ≥ 0}

Case 1: v and y each contain only one type of alphabet symbol. By choosing i = 2 we get w_2 = u v^2 x y^2 z, in which the number of appearances of one or two symbols increased (since |vy| > 0) while the count of at least one symbol remains unchanged. So w_2 cannot contain the same number of a's, b's and c's, and w_2 ∉ L.

49

L = {a^n b^n c^n | n ≥ 0}

Case 2: v or y contains two types of alphabet symbols (it cannot contain three because condition 2 holds). So if we choose i = 2, the order of the symbols is destroyed in v^2 or y^2, and w_2 = u v^2 x y^2 z ∉ L.

Conclusion: The assumption that L is context-free is false, and L = {a^n b^n c^n | n ≥ 0} is not a CFL.
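The case analysis can be sanity-checked by brute force for small values of p. The sketch below is not a proof: it only enumerates every division w = uvxyz of a^p b^p c^p that satisfies conditions 1 and 2 and checks that pumping with i = 0 or i = 2 leaves the language. The names `in_L` and `every_split_fails` are hypothetical.

```python
def in_L(s):
    """Membership test for {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def every_split_fails(p):
    """True if no division of a^p b^p c^p with |vy| > 0 and |vxy| <= p survives pumping."""
    w = "a" * p + "b" * p + "c" * p
    n = len(w)
    for i in range(n + 1):                           # u = w[:i]
        for j in range(i, n + 1):                    # v = w[i:j]
            for k in range(j, n + 1):                # x = w[j:k]
                for l in range(k, min(i + p, n) + 1):    # y = w[k:l], so |vxy| = l - i <= p
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:l], w[l:]
                    if not v and not y:              # condition 1 requires |vy| > 0
                        continue
                    if all(in_L(u + v * m + x + y * m + z) for m in (0, 2)):
                        return False                 # this division can be pumped
    return True


print(all(every_split_fails(p) for p in range(1, 5)))   # True
```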

50

Pumping Lemma – another example

Let L = {ww | w ∈ {0,1}*}

[ 00, 110110 ∈ L ; 010, 0010 ∉ L ]

Prove that L is not context-free

Proof: In class.

51

Intersection 1

CFL are not closed under intersection.

Proof: by counterexample

L1 = {a^n b^n c^k | n, k > 0}

L2 = {a^k b^n c^n | n, k > 0}

Both L1 and L2 are CFL, but L1 ∩ L2 = {a^j b^j c^j | j > 0} is not a CFL.
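The intersection can be checked on short strings with a few lines of Python. This is a sketch for illustration only: the membership tests `in_L1` and `in_L2` are hypothetical helpers written with regular expressions plus a length comparison, and the check covers only words of length at most 6.

```python
import re
from itertools import product


def in_L1(s):
    """a^n b^n c^k with n, k > 0."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))


def in_L2(s):
    """a^k b^n c^n with n, k > 0."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(m) and len(m.group(2)) == len(m.group(3))


# All words over {a, b, c} of length 0..6, shortest first.
words = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
both = [w for w in words if in_L1(w) and in_L2(w)]
print(both)   # ['abc', 'aabbcc']
```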

52

Intersection 2

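• The intersection of a CFL and a regular language (RL) is a CFL.

Proof:

Build a product automaton of a PDA and a DFA: both read the same input in parallel, and the product accepts iff both accept.

Note that the resulting automaton is a PDA, since only the PDA component uses the stack.

[Figure: the input feeds both the DFA and the PDA (with its stack); an AND gate combines their accept/reject outputs]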

53

Complement

Theorem: CFL are not closed under complement.

Proof: by contradiction.

• Assume CFLs are closed under complement, and take two CFLs L1 and L2; then ~L1 and ~L2 are CFL. (~L denotes complement)

• CFL are closed under the union operation, so ~L1 ∪ ~L2 is a CFL.

• Using our assumption again we get that ~(~L1 ∪ ~L2) is a CFL as well.

• But ~(~L1 ∪ ~L2) = L1 ∩ L2, and we already know that CFLs are not closed under intersection.
