CSE 3813 Introduction to Formal Languages and Automata

Chapter 8 Properties of Context-free Languages

These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 4th ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA, 2006. They are intended for classroom use only and are not a substitute for reading the textbook.

The pumping lemma for context-free languages

• Suppose you have a CFG G in which the variable A is used in two different rules, to derive two different strings, e.g.,

(1) $S \to vAz$
(2) $A \to wAy$
(3) $A \to x$

• We can use these rules, applying rule 2 recursively, to generate the following string:

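$S \Rightarrow vAz \Rightarrow vwAyz \Rightarrow vwwAyyz \Rightarrow vwwwAyyyz \Rightarrow \dots \Rightarrow vw^nAy^nz \Rightarrow vw^nxy^nz$.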

The pumping lemma for CFLs

Of course, we can apply rule 3 at any point along the way to bring the process to a halt. Thus, the following strings are all legitimate strings in the language:

vwxyz, vwwxyyz, vwwwxyyyz, etc.

In fact, with rules 2 and 3 in the grammar, there is no way to prevent the language from containing an infinite number of strings of the form $vw^nxy^nz$ (provided w and y are not both empty).
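As a rough illustration, the derivation above can be replayed in code. The sketch below assumes single-letter placeholder strings for v, w, x, y, and z (these values are hypothetical, chosen only for the example) and applies rule 2 n times before halting with rule 3.

```python
def derive(n, v="v", w="w", x="x", y="y", z="z"):
    """Apply S -> vAz, then A -> wAy n times, then A -> x.

    The default terminal strings are placeholders, not part of the notes.
    Returns the list of sentential forms in the derivation.
    """
    forms = ["S"]
    form = v + "A" + z                 # rule 1
    forms.append(form)
    for _ in range(n):                 # rule 2, applied n times
        form = form.replace("A", w + "A" + y, 1)
        forms.append(form)
    form = form.replace("A", x, 1)     # rule 3 halts the recursion
    forms.append(form)
    return forms


for n in range(4):
    print(n, derive(n)[-1])
```

Each value of n gives a different string, which is why the language cannot avoid containing the whole family.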

The pumping lemma for CFLs

Remember the definition of Chomsky Normal Form grammars: A CFG is in Chomsky Normal Form if every production is of one of these two types:

$A \to BC$
$A \to a$

where A, B, and C are variables and a is a terminal.

Remember also that we can put any CFG into CNF (omitting the null string, if it belongs to the original language).

The pumping lemma for CFLs

If a grammar is in CNF, then its derivation tree will be binary; that is, every node will have at most two children. Why? There are only 3 possibilities:

(1) The node represents the first type of rule above, in which a single variable produces two variables.
(2) The node represents the second type of rule above, in which a single variable produces a single terminal.
(3) The node is a terminal node and so has no children.

The pumping lemma for CFLs

• A path in a binary tree is either empty, or consists of a node, one of its descendants, and all of the nodes in between.
• The length of a path is the number of nodes it contains (for this class, we will use this definition; however, length and height are more commonly measured in edges, not nodes).
• The height of a binary tree is the length of its longest path.

The pumping lemma for CFLs

• You could create a very tall binary tree by having all branches be unary.
• You can create the shortest possible binary tree by having all of its branches be binary, except possibly for some or all of the branches at the bottom level of the tree.

The pumping lemma for CFLs

• What is the smallest height possible in a binary tree of 7 nodes? How many leaf nodes does it have?

height = 3

num. leaves = 4

The pumping lemma for CFLs

•What is the smallest height possible in a binary tree of 15 nodes? How many leaf nodes does it have?

height = 4

num. leaves = 8

The pumping lemma for CFLs

• What is the smallest height possible in a binary tree of 31 nodes? How many leaf nodes does it have?

height = 5

num. leaves = 16

The pumping lemma for CFLs

• What is the smallest height possible in a binary tree of $2^n - 1$ nodes? How many leaf nodes does it have?

• height = $n$

• num. leaves = $2^{n-1}$

The pumping lemma for CFLs

Note the pattern here: In a completely filled binary tree with $2^n - 1$ nodes, half of the nodes (rounding up) will be leaves. That is, $2^n / 2$ nodes will be leaf nodes. And we can rewrite $2^n / 2$ as $2^{n-1}$.

This leads us to the following lemma:

The pumping lemma for CFLs

Lemma: For any $h \ge 1$, a binary tree which has more than $2^{h-1}$ leaf nodes must have a height greater than h.

Example: If a binary tree has 17 leaf nodes, can it have a height of 5?

No; a complete binary tree of height 5 has only 16 leaf nodes. A binary tree with 17 leaves must have a height greater than 5.
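The numbers in the preceding examples can be checked with a small sketch. It assumes the class convention that height is counted in nodes; the function names `min_height` and `max_leaves` are made up for this illustration.

```python
def min_height(num_nodes):
    """Smallest height (counted in nodes) of a binary tree with num_nodes nodes."""
    height = 0
    capacity = 0                      # nodes that fit in a tree of this height
    while capacity < num_nodes:
        height += 1
        capacity = 2 ** height - 1    # a completely filled tree
    return height


def max_leaves(height):
    """Largest number of leaves in a binary tree of the given height (in nodes)."""
    return 2 ** (height - 1)


for n in (7, 15, 31):
    h = min_height(n)
    print(n, h, max_leaves(h))

# The lemma's example: 17 leaves exceed what a tree of height 5 can hold.
print(17 > max_leaves(5))
```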

The pumping lemma for CFLs

Here is the point of all this:

If the derivation tree (for a grammar in CNF) of a given string in the language has height h, then its longest path contains h - 1 variables followed by one terminal. If the grammar has fewer than h - 1 variables, then at least one variable must recur on that path in the derivation of this string.

The pumping lemma for CFLs

For a variable to recur farther down in the same path, it must be either:
• self-recursive (e.g., $A \to aA$)
or
• path-recursive (e.g., $A \to aB$ and $B \to bA$)

In either case, the part of the derivation between the two occurrences of the variable may be repeated ("pumped") an unrestricted number of times.

Theorem 8.1

Let L be a CFL. Then there is an integer m so that for any $w \in L$ satisfying $|w| \ge m$, there are strings u, v, x, y, and z satisfying

$w = uvxyz$
$|vy| > 0$
$|vxy| \le m$
for any $i \ge 0$, $uv^ixy^iz \in L$

The pumping lemma for CFLs

• We can use the pumping lemma for context-free languages to prove that a particular language is not context-free.
• We do this by assuming that the language is context-free; this means that there must be an m satisfying the conditions given above.
• If we find that this causes a contradiction, then we know the language can’t be a CFL.

Proof

• Given the language $L = \{a^ib^ic^i \mid i \ge 1\}$, assume that L is context-free.
• Let $w = a^mb^mc^m$, so that $|w| \ge m$.
• According to Theorem 8.1, $|vy| > 0$. Thus, v and y together must contain at least one symbol.
• According to Theorem 8.1, $|vxy| \le m$. Thus, the string vxy can contain at most two distinct types of symbols, because the a’s and the c’s in w are separated by m b’s.

Proof

• The string vxy can’t contain all three symbols, a, b, and c. (Why? Because $|vxy| \le m$.)
• The string $uv^2xy^2z$ contains additional occurrences of the symbols in v and y, but no additional occurrences of the symbol missing from vxy.
• Therefore, $uv^2xy^2z$ cannot contain equal numbers of all three symbols.
• But the pumping lemma says that $uv^2xy^2z$ must be a legitimate string in L. This is a contradiction.
• Consequently, L cannot be a context-free language.
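For small values of m, the argument above can be sanity-checked by brute force. The sketch below is an illustration, not a proof: it tries every decomposition of $a^mb^mc^m$ allowed by the lemma and only tests pumping for i = 0 to 3. The helper names (`in_L`, `in_anbn`, `pumpable_split`) are invented for this example.

```python
def in_L(s):
    """Membership test for L = { a^i b^i c^i : i >= 1 }."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


def in_anbn(s):
    """Membership test for { a^n b^n : n >= 1 }, used as a contrast."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def pumpable_split(w, m, member, max_i=3):
    """Return a split (u, v, x, y, z) that survives pumping for i = 0..max_i, or None."""
    n = len(w)
    for start in range(n):                                    # where vxy begins
        for end in range(start + 1, min(start + m, n) + 1):   # enforce |vxy| <= m
            for p in range(start, end + 1):                   # v = w[start:p]
                for q in range(p, end + 1):                   # x = w[p:q], y = w[q:end]
                    u, v, x, y, z = w[:start], w[start:p], w[p:q], w[q:end], w[end:]
                    if not v and not y:
                        continue                              # need |vy| > 0
                    if all(member(u + v * i + x + y * i + z) for i in range(max_i + 1)):
                        return (u, v, x, y, z)
    return None


for m in range(1, 5):
    w = "a" * m + "b" * m + "c" * m
    print(m, pumpable_split(w, m, in_L))      # no surviving split is found

print(pumpable_split("aabb", 2, in_anbn))     # a context-free contrast: a split exists
```

The contrast case shows that the search does find a decomposition when the language is context-free, so the failures for $a^mb^mc^m$ are not an artifact of the search itself.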

Example

Given the language $L = \{a^ib^ic^i \mid i \ge 1\}$, how would you try to recognize this language using a push-down automaton?

We can ensure that we have an equal number of a’s and b’s by pushing the a’s onto the stack one at a time, then popping them off and matching them up with the b’s one by one.

Example

• However, once we have done that, we don’t have anything left to match the c’s with, so we can’t guarantee that we have the same number of c’s as a’s and b’s.
• We can’t solve this problem by pushing a’s or b’s back onto the stack.
• This is due to the limitations of the type of memory we have in a PDA.
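The strategy described above can be simulated with an ordinary list used as a stack. This is a simplified sketch rather than a formal PDA, and the function name `match_a_b_then_c` is made up for the example; it reports whether the a’s and b’s balanced and what is left on the stack when the c’s begin.

```python
def match_a_b_then_c(s):
    """Simulate the single-stack strategy on a string of the form a...ab...bc...c.

    Returns (a_b_balanced, stack_when_cs_start).
    """
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":             # push one marker per a
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b" and stack:   # pop one marker per b
        stack.pop()
        i += 1
    # Balanced only if the stack is empty and no unmatched b remains.
    a_b_balanced = not stack and (i == len(s) or s[i] == "c")
    return a_b_balanced, list(stack)


print(match_a_b_then_c("aabbcc"))
print(match_a_b_then_c("aabbccccc"))
```

Both calls leave an empty stack once the b’s are consumed, so the simulated machine has no record of how many c’s to expect.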

Pumping lemma (again)

The pumping lemma for regular languages states: every sufficiently long string in a regular language contains a short substring that can be pumped.

The pumping lemma for context-free languages states: every sufficiently long string in a context-free language contains two short (and close-together) substrings that can be pumped (the same number of times).

Formal statement (again)

Let L be a context-free language. Then there exists some positive integer m such that any string $w \in L$ of length $|w| \ge m$ can be decomposed into substrings u, v, x, y, z, such that $w = uvxyz$, and

$|vxy| \le m$,

$|v| > 0$ or $|y| > 0$,

$uv^kxy^kz \in L$, for $k \ge 0$

Informal statement

Every context-free language has a “pumping length” such that every string in the language that is at least this long can be pumped to yield another string in the language.

The string can be divided into five parts such that the second and fourth parts can be repeated together, or “pumped,” any number of times, and the resulting string remains in the language.

What is m?

In the pumping lemma for regular languages, the “pumping length” m reflects the number of states of the finite automaton.

In the pumping lemma for context-free languages, what does m reflect? Roughly, it is the length of the longest string that can be generated by a parse tree in which the same nonterminal never occurs twice on the same path through the tree.

In a sufficiently large parse tree, some nonterminal must repeat along some path from the root. This follows from the pigeonhole principle.

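[Figure: parse tree with root S. One path from the root passes through two occurrences of the nonterminal A. The leaves, read left to right, spell u v x y z: the upper A yields vxy and the lower A yields x.]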

Proof Idea

The repetition of some nonterminal along a path through the parse tree allows us to replace the subtree under the last occurrence of the nonterminal with the subtree under an earlier occurrence of the nonterminal and still get a valid parse tree.

This corresponds to pumping v and y.

Note that the parse tree of the previous slide corresponds to the following derivation:

$S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$

Important to remember

You can use a pumping lemma to prove that a language is not context-free (or regular).

You cannot use a pumping lemma to prove that a language is context-free (or regular).

Exercise

The language $L = \{ww \mid w \in \{a, b\}^*\}$ is not context-free.

Pick a string in L. Try $a^mb^ma^mb^m$. Then note that you must consider three cases. Because $|vxy| \le m$, it must be the case that vxy is a substring of the prefix $a^mb^m$, or the “middle” $b^ma^m$, or the suffix $a^mb^m$.

Intuitively, why can’t a PDA accept this language, although it can accept the language $\{ww^R \mid w \in \{a, b\}^*\}$?
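One way to build intuition for the last question is to compare a stack with a queue. The sketch below is an informal illustration, not a proof (the proof is the pumping argument): it mimics a PDA’s nondeterministic guess of the midpoint by trying every position. The function names are invented for this example.

```python
from collections import deque


def accepts_w_wr(s):
    """Stack discipline: push the first half, then pop while reading the second half."""
    for mid in range(len(s) + 1):          # try every guess for the midpoint
        stack = list(s[:mid])
        rest = s[mid:]
        if len(rest) == len(stack) and all(stack.pop() == ch for ch in rest):
            return True
    return False


def accepts_ww_with_queue(s):
    """Queue discipline (not available to a PDA): first in, first out."""
    for mid in range(len(s) + 1):
        queue = deque(s[:mid])
        rest = s[mid:]
        if len(rest) == len(queue) and all(queue.popleft() == ch for ch in rest):
            return True
    return False


print(accepts_w_wr("abba"), accepts_w_wr("abab"))
print(accepts_ww_with_queue("abab"), accepts_ww_with_queue("abba"))
```

A stack returns symbols in reverse order, which is exactly what $ww^R$ needs; matching ww requires the symbols back in their original order.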

Pumping Lemma for Linear Languages

Let L be an infinite linear language. Then there exists some positive integer m, such that any $w \in L$ with $|w| \ge m$ can be decomposed as $w = uvxyz$ with

$|uvyz| \le m$
$|vy| \ge 1$

such that $uv^ixy^iz \in L$ for all i = 0, 1, 2, …

Pumping Lemma for Linear Languages

Note that the conclusion of this theorem (Theorem 8.2) is different from that of Theorem 8.1, since in Theorem 8.1 we have

$|vxy| \le m$

and in Theorem 8.2 we have

$|uvyz| \le m$

This implies that the strings v and y to be pumped must now be within m symbols of the left and right ends of w, respectively. The middle string x can be of arbitrary length.

Theorem 8.2 helps establish the fact that the family of linear languages is a proper subset of the family of context-free languages.

Closure properties for context-free languages

The family of context-free languages is closed under the operations of:

• Union
• Concatenation
• Kleene closure

but not under the operations of:

• Intersection
• Complementation

Definition

A context-free grammar (CFG) is a 4-tuple $G = (V, T, S, P)$ where V and T are disjoint sets, $S \in V$, and P is a finite set of rules of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$.

V = non-terminals or variables
T = terminals
S = start symbol
P = productions or grammar rules

Closure properties of CFLs

CFLs are closed under Union, Concatenation and Kleene closure.

Proof by construction: Let $G_1 = (V_1, T_1, S_1, P_1)$ and $G_2 = (V_2, T_2, S_2, P_2)$ be CFGs with $L_1 = L(G_1)$ and $L_2 = L(G_2)$.

Union

We create grammar $G_u = (V_u, T_1 \cup T_2, S_u, P_u)$ generating $L_1 \cup L_2$.

1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.

2. Create a new start symbol $S_u$, not already in $V_1$ or $V_2$.

3. Set $V_u = V_1 \cup V_2 \cup \{S_u\}$

4. Set $P_u = P_1 \cup P_2 \cup \{S_u \to S_1 \mid S_2\}$

Construction completed.
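The four steps above can be sketched in code. The grammar representation used here (a dict with keys 'V', 'T', 'S', 'P', where 'P' maps each variable to a list of right-hand sides) is an assumption made for the example, as are the sample grammars; step 1 is assumed to have been carried out already.

```python
def union_grammar(g1, g2, new_start="S_u"):
    """Build a grammar for L(g1) union L(g2), following steps 1-4 above.

    A grammar is a dict with keys 'V', 'T', 'S', 'P'; 'P' maps each variable
    to a list of right-hand sides (tuples of symbols). Hypothetical layout.
    """
    # Step 1 is assumed done: the variable sets must already be disjoint.
    assert not (g1["V"] & g2["V"]), "rename the variables of g2 first"
    # Step 2: the new start symbol must be fresh.
    assert new_start not in g1["V"] | g2["V"]
    variables = g1["V"] | g2["V"] | {new_start}        # step 3
    rules = {**g1["P"], **g2["P"]}                      # step 4: P1 union P2 ...
    rules[new_start] = [(g1["S"],), (g2["S"],)]         # ... plus S_u -> S1 | S2
    return {"V": variables, "T": g1["T"] | g2["T"], "S": new_start, "P": rules}


# Sample grammars, invented for illustration.
g1 = {"V": {"S1"}, "T": {"a", "b"}, "S": "S1",
      "P": {"S1": [("a", "S1", "b"), ("a", "b")]}}
g2 = {"V": {"S2"}, "T": {"c"}, "S": "S2",
      "P": {"S2": [("c", "S2"), ("c",)]}}

gu = union_grammar(g1, g2)
print(gu["S"], gu["P"]["S_u"])
```

The concatenation construction differs only in the rule added for the new start symbol.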

Concatenation

We create grammar $G_c = (V_c, T_1 \cup T_2, S_c, P_c)$ generating $L_1L_2$.

1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.

2. Create a new start symbol $S_c$, not already in $V_1$ or $V_2$.

3. Set $V_c = V_1 \cup V_2 \cup \{S_c\}$

4. Set $P_c = P_1 \cup P_2 \cup \{S_c \to S_1S_2\}$

Construction completed.

Closure under Kleene star

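Let $G_1$ be any context-free grammar with the starting symbol S, and suppose S does not appear on the right-hand side of any rule. Adding the rules

$S \to \lambda$ and

$S \to SS$

creates a new context-free grammar $G_2$ such that $L(G_2)$ is the result of applying the Kleene star operator to $L(G_1)$. If S does appear on a right-hand side (e.g., $S \to aSb$), the added rules can be used in the middle of other derivations and may generate strings outside $L(G_1)^*$; in that case, introduce a new start symbol, as in the construction that follows.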

Kleene Closure

We create grammar $G_* = (V_*, T_1, S, P_*)$ generating $L_1^*$.

1. Create a new start symbol S, not already in $V_1$.

2. Set $V_* = V_1 \cup \{S\}$

3. Set $P_* = P_1 \cup \{S \to S_1S \mid \lambda\}$

Construction completed. (See text for justification.)
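The reason for introducing a new start symbol can be seen on a small example. The sketch below assumes a toy encoding in which uppercase letters are variables, lowercase letters are terminals, and "" stands for λ; the bounded search in `language` is a hypothetical helper, so a string it fails to find is only suggestive, not a proof of non-membership.

```python
from collections import deque


def language(rules, start, max_len):
    """Terminal strings of length <= max_len found by a bounded leftmost search.

    rules maps a variable (uppercase letter) to a list of right-hand-side
    strings; lowercase letters are terminals and "" is lambda.
    """
    seen, found = {start}, set()
    todo = deque([start])
    while todo:
        form = todo.popleft()
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:                      # no variables left: a terminal string
            found.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(ch.islower() for ch in new)
            # Prune long forms; the slack of 2 leaves room for a few lambda steps.
            if terminals <= max_len and len(new) <= max_len + 2 and new not in seen:
                seen.add(new)
                todo.append(new)
    return found


naive = {"S": ["aSb", "ab", "", "SS"]}          # star rules added to the old start symbol
fresh = {"T": ["ST", ""], "S": ["aSb", "ab"]}   # new start symbol T with T -> ST | lambda

print("aababb" in language(naive, "S", 6))
print("aababb" in language(fresh, "T", 6))
```

Here the base language is $\{a^nb^n \mid n \ge 1\}$. The string aababb is not a concatenation of such blocks, yet the first grammar derives it by applying the added rule inside $S \to aSb$; the grammar with the new start symbol does not.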

Not closed under intersection

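The context-free languages are not closed under intersection. For example, $\{a^nb^nc^m \mid n, m \ge 1\}$ and $\{a^mb^nc^n \mid n, m \ge 1\}$ are both context-free, but their intersection is $\{a^nb^nc^n \mid n \ge 1\}$, which is not. However, the intersection of a context-free language with a regular language is always a context-free language.

The context-free languages are not closed under complementation. (If they were, closure under union together with De Morgan’s law would give closure under intersection.)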

Corollary:

Are Regular Languages context free?

Yes.

Why?

We can express any Regular language in the form of a CFG.

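The family of regular languages is a proper subset of the family of CFLs (proper because, for example, $\{a^nb^n \mid n \ge 0\}$ is context-free but not regular).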

Are Regular Languages context free?

Proof: According to your textbook, the set of regular languages is the smallest set that contains all languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ (for every $a \in \Sigma$) and is closed under the operations of union, concatenation, and Kleene*. We just demonstrated that the operations of union, concatenation, and Kleene* on CFLs produce CFLs, so all we need to do is show that the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ have CFGs.

Are Regular Languages context free?

The empty language can be written

$S \to S$

The language consisting of the null string can be written

$S \to \lambda$

The language consisting of a single character a can be written

$S \to a$

QED

Decision properties of context-free languages

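Can decide:
• Membership (is a given string in the language?)
• Emptiness (is the language empty?)
• Infiniteness (is the language infinite?)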

But there is no algorithm for deciding whether two CFGs generate the same language!
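As an example of one of the decidable questions, emptiness can be checked by marking the variables that are able to derive some terminal string. The notes do not spell out an algorithm, so the sketch below is one common approach under an assumed representation: rules map each variable to a list of right-hand sides (tuples of symbols), and any symbol that is not a key of the dict is treated as a terminal.

```python
def is_empty(rules, start):
    """Decide whether the grammar generates no terminal string at all.

    rules maps a variable to a list of right-hand sides (tuples of symbols);
    any symbol that is not a key of rules is treated as a terminal.
    """
    generating = set()        # variables known to derive some terminal string
    changed = True
    while changed:            # repeat until no new variable can be marked
        changed = False
        for var, bodies in rules.items():
            if var in generating:
                continue
            for body in bodies:
                # A body works if every symbol is a terminal or already marked.
                if all(sym in generating or sym not in rules for sym in body):
                    generating.add(var)
                    changed = True
                    break
    return start not in generating


print(is_empty({"S": [("S",)]}, "S"))                        # the grammar S -> S
print(is_empty({"S": [("a", "S", "b"), ("a", "b")]}, "S"))   # a^n b^n
```

The loop marks at least one new variable per pass or stops, so it terminates after at most as many passes as there are variables.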