Upload
meljun-cortes-mbampa
View
224
Download
0
Embed Size (px)
Citation preview
8/13/2019 MELJUN CORTES Automata Theory 8
1/26
CSC 3130: Automata theory and formal languages
Context-free languages
MELJUN P. CORTES MBA MPA BSCS ACS
MELJUN CORTES
8/13/2019 MELJUN CORTES Automata Theory 8
2/26
Context-free grammar
This is an a different model for describinglanguages
The language is specified by productions
(substitution rules) that tell how strings can be
obtained, e.g.
Using these rules, we can derivestrings like this:
A 0A1
A B
B #
A, Bare variables
0, 1, #are terminals
Ais the start variable
A 0A1 00A11 000A111 000B111 000#111
8/13/2019 MELJUN CORTES Automata Theory 8
3/26
Some natural examples
Context-free grammars were first used for naturallanguages
a girl with a flower likes the boy
SENTENCE
NOUN-PHRASE VERB-PHRASE
ART NOUN PREP ART NOUN VERB ART NOUN
CMPLX-NOUNCMPLX-NOUN CMPLX-NOUN
PREP-PHRASE
CMPLX-VERB
NOUN-PHRASE
8/13/2019 MELJUN CORTES Automata Theory 8
4/26
Natural languages
We can describe (some fragments) of the Englishlanguage by a context-free grammar:
SENTENCE NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE CMPLX-NOUN
NOUN-PHRASE CMPLX-NOUN PREP-PHRASE
VERB-PHRASE CMPLX-VERBVERB-PHRASE CMPLX-VERB PREP-PHRASE
PREP-PHRASE PREP CMPLX-NOUN
CMPLX-NOUN ARTICLE NOUN
CMPLX-VERB VERB NOUN-PHRASE
CMPLX-VERB VERB
ARTICLE a
ARTICLE the
NOUN boy
NOUN girlNOUN flower
VERB likes
VERB touches
VERB sees
PREP with
variables: SENTENCE, NOUN-PHRASE,
terminals: a, the, boy, girl, flower, likes, touches, sees, with
start variable:SENTENCE
8/13/2019 MELJUN CORTES Automata Theory 8
5/26
Programming languages
Context-free grammars are also used to describe(parts of) programming languages
For instance, expressions like (2 + 3) * 5or
3 + 8 + 2 * 7can be described by the CFG
+
*
()
0 1
9
Variables:
Terminals:+, *, (, ), 0, 1, , 9
8/13/2019 MELJUN CORTES Automata Theory 8
6/26
Motivation for studying CFGs
Context-free grammars are essential forunderstanding the meaning of computer
programs
They are used in compilers
code:(2 + 3) * 5
meaning: add 2and 3, and then multiply by 5
8/13/2019 MELJUN CORTES Automata Theory 8
7/26
Definition of context-free grammar
A context-free grammar(CFG) is a 4-tuple(V, T, P, S)where
Vis a finite set of variablesor non-terminals
Tis a finite set of terminals(VT= )
Pis a set of productionsor substitution rulesof theform
whereAis a symbol in Vand a is a stringover VT Sis a variable inVcalled the start variable
A a
8/13/2019 MELJUN CORTES Automata Theory 8
8/26
Shorthand notation for productions
When we have multiple productions with thesame variable on the left like
we can write this in shorthandas
E E + E
E E * E
E (E)
E N
E E + E | E * E | (E) | 0 | 1
N 0N | 1N | 0 | 1
Variables:E, N
Terminals:+, *, (, ), 0, 1
Start variable: E
N 0N
N 1N
N 0
N 1
8/13/2019 MELJUN CORTES Automata Theory 8
9/26
Derivation
A derivationis a sequential application ofproductions:
E
derivation
E * E
(E) * E
(E) * N
(E + E ) * N
(E + E ) * 1
(E + N) * 1
(N + N) * 1
(N + 1N) * 1(N + 10) * 1
(1 + 10) * 1
a b
means bcan be obtained
from awith one production
a b*
means bcan be obtained
from aafter zero or more
productions
8/13/2019 MELJUN CORTES Automata Theory 8
10/26
Language of a CFG
The language of a CFG(V, T, P, S)is the set of allstrings containing only terminals that can bederived from the start variable S
This is a language over the alphabet T
A language Lis context-freeif it is the language
of some CFG
L = {| T* and S }*
8/13/2019 MELJUN CORTES Automata Theory 8
11/26
Example 1
Is the string 00#11in L?
How about 00#111, 00#0#1#11?
What is the language of this CFG?
A 0A1 | B
B #
variables:A, B
terminals: 0, 1, #
start variable:A
L= {0n#1n: n 0}
8/13/2019 MELJUN CORTES Automata Theory 8
12/26
Example 2
Give derivations of (), (()())
How about ())?
S SS | (S) | convention:variables in uppercase,
terminals in lowercase, start variable first
S (S) (rule 2)
() (rule 3)
S (S) (rule 2)
(SS) (rule 1)((S)S) (rule 2)
((S)(S)) (rule 2)(()(S)) (rule 3)(()()) (rule 3)
8/13/2019 MELJUN CORTES Automata Theory 8
13/26
Examples: Designing CFGs
Write a CFG for the following languages Linear equations over x, y, z, like:
x + 5yz = 911xy = 2
Numbers withoutleading zeros, e.g., 109, 0but not 019
The language L= {anbncmdm| n0, m0}
The language L= {anbmcmdn| n0, m0}
8/13/2019 MELJUN CORTES Automata Theory 8
14/26
Context-free versus regular
Write a CFG for the language (0 + 1)*111
Can you do so for everyregular language?
Proof:
S A111
A | 0A | 1A
Every regular language is context-free
regular
expressionDFANFA
8/13/2019 MELJUN CORTES Automata Theory 8
15/26
From regular to context-free
regular expression
a (alphabet symbol)
E1+ E2
CFG
E1E2
E1*
grammar with no rules
S
S a
S S1| S2
S S1S2
S SS1 |
In all cases, Sbecomes the newstart symbol
8/13/2019 MELJUN CORTES Automata Theory 8
16/26
Context-free versus regular
Is every context-free language regular? No! We already saw some examples:
This language is context-free but not regular
A 0A1 | BB # L= {0n#1n: n 0}
8/13/2019 MELJUN CORTES Automata Theory 8
17/26
Parse tree
Derivations can also be represented using parsetrees
E E + E
V + E
x + E
x + (E)
x + (E -E)
x + (V -E)x + (y -E)
x + (y -V)
x + (y -z)
E
E E+
V ( E )
E - E
V V
x
y z
E E + E | E - E | (E) | V
V x | y | z
8/13/2019 MELJUN CORTES Automata Theory 8
18/26
Definition of parse tree
A parse treefor a CFG Gis an ordered tree withlabels on the nodes such that
Every internal node is labeled by a variable
Every leaf is labeled by a terminal or
Leaves labeled by have no siblings
If a node is labeledAand has childrenA1, , Akfrom
left to right, then the rule
is a production in G.
A A1Ak
8/13/2019 MELJUN CORTES Automata Theory 8
19/26
Left derivation
Always derive the leftmost variablefirst:
Corresponds to a left-to-right traversalof parse
tree
E E + E
V + E
x + Ex + (E)
x + (E -E)
x + (V -E)
x + (y -E)
x + (y -V)
x + (y -z)
E
E E+
V ( E )
E - E
V V
x
y z
8/13/2019 MELJUN CORTES Automata Theory 8
20/26
Ambiguity
A grammar is ambiguousif some strings havemore than one parse tree
Example: E E + E | E -E | (E) | VV x | y | z
x + y + z
E
E E+
E E+VV Vx
y z
E
E E+
E E+ V
V V
x y
z
8/13/2019 MELJUN CORTES Automata Theory 8
21/26
Why ambiguity matters
The parse tree represents the intended meaning:
x + y + z
E
E E+
E E+VV Vx
y z
E
E E+
E E+ V
V V
x y
z
first add yand z,and then add this to x
first add xand y,and then add zto this
8/13/2019 MELJUN CORTES Automata Theory 8
22/26
Why ambiguity matters
Suppose we also had multiplication:E E + E | E -E | E E| (E) | VV x | y | z
x y + z
E
E E*
E E+V
V Vx
y z
E
E E+
E E V
V V
x y
z
first x y, then + zfirst y + z, then x
8/13/2019 MELJUN CORTES Automata Theory 8
23/26
Disambiguation
Sometimes we can rewrite the grammartoremove the ambiguity
Rewrite grammar so cannot be broken by +:
E E + E | E -E | E E | (E) | VV x | y | z
E T | E + T | E -T
T F | T FF (E) | V
V x | y | z
Tstands for term:x * (y + z)
Fstands for factor: x, (y + z)
A term always splits into factors
A factor is either a variable or a
parenthesized expression
8/13/2019 MELJUN CORTES Automata Theory 8
24/26
Disambiguation
Example
x y + z
E T | E + T | E -T
T F | T FF (E) | V
V x | y | z
V V V
F
T
FT
E
E
T
F
8/13/2019 MELJUN CORTES Automata Theory 8
25/26
Disambiguation
Can we always disambiguate a grammar?
No, for two reasons
There exists an inherently ambiguouscontext-free L:Every CFG for this language is ambiguous
There is no general procedure that can tell if a
grammar is ambiguous
However, grammars used in programming
languages can typically be disambiguated
8/13/2019 MELJUN CORTES Automata Theory 8
26/26
Another Example
Is ab, baba, abbbaain L?
How about a, bba?
What is the language of this CFG?
Is the CFG ambiguous?
S aB | bAA a | aS | bAA
B b | bS | aBB