26
CSC 3130: Automata theory and formal languages Context-free languages MELJUN P. CORTES, MBA,MPA,BSCS,ACS MELJUN P. CORTES, MBA,MPA,BSCS,ACS MELJUN CORTES MELJUN CORTES

MELJUN CORTES Automata Theory (Automata8)

Embed Size (px)

DESCRIPTION

MELJUN CORTES Automata Theory (Automata8)

Citation preview

Page 1: MELJUN CORTES Automata Theory (Automata8)

CSC 3130: Automata theory and formal languages

Context-free languages

MELJUN P. CORTES, MBA,MPA,BSCS,ACSMELJUN P. CORTES, MBA,MPA,BSCS,ACS

MELJUN CORTESMELJUN CORTES

Page 2: MELJUN CORTES Automata Theory (Automata8)

Context-free grammar

• This is an a different model for describing languages

• The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g.

• Using these rules, we can derive strings like this:

A → 0A1A → BB → #

A, B are variables0, 1, # are terminalsA is the start variable

A 0A1 00A11 000A111 000B111 000#111

Page 3: MELJUN CORTES Automata Theory (Automata8)

Some natural examples

• Context-free grammars were first used for natural languages

a girl with a flower likes the boy

SENTENCE

NOUN-PHRASE VERB-PHRASE

ART NOUN PREP ART NOUN VERB ART NOUN

CMPLX-NOUNCMPLX-NOUN CMPLX-NOUN

PREP-PHRASE

CMPLX-VERB

NOUN-PHRASE

Page 4: MELJUN CORTES Automata Theory (Automata8)

Natural languages

• We can describe (some fragments) of the English language by a context-free grammar:

SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB

ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → withvariables: SENTENCE, NOUN-PHRASE, …

terminals: a, the, boy, girl, flower, likes, touches, sees, with

start variable: SENTENCE

Page 5: MELJUN CORTES Automata Theory (Automata8)

Programming languages

• Context-free grammars are also used to describe (parts of) programming languages

• For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG

<expr> <expr> + <expr>

<expr> <expr> * <expr>

<expr> (<expr>)

<expr> 0

<expr> 1

<expr> 9

Variables: <expr>

Terminals: +, *, (, ), 0, 1, …, 9

Page 6: MELJUN CORTES Automata Theory (Automata8)

Motivation for studying CFGs

• Context-free grammars are essential for understanding the meaning of computer programs

• They are used in compilers

code: (2 + 3) * 5

meaning: “add 2 and 3, and then multiply by 5”

Page 7: MELJUN CORTES Automata Theory (Automata8)

Definition of context-free grammar

• A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where– V is a finite set of variables or non-terminals– T is a finite set of terminals (V T = )– P is a set of productions or substitution rules of the

form

where A is a symbol in V and is a string over V T

– S is a variable in V called the start variable

A →

Page 8: MELJUN CORTES Automata Theory (Automata8)

Shorthand notation for productions

• When we have multiple productions with the same variable on the left like

we can write this in shorthand as

E E + E

E E * E

E (E)

E N

E E + E | E * E | (E) | 0 | 1

N 0N | 1N | 0 | 1

Variables: E, N

Terminals: +, *, (, ), 0, 1

Start variable: E

N 0N

N 1N

N 0

N 1

Page 9: MELJUN CORTES Automata Theory (Automata8)

Derivation

• A derivation is a sequential application of productions:

E

deri

vati

on

E * E (E) * E (E) * N (E + E ) * N (E + E ) * 1 (E + N) * 1 (N + N) * 1 (N + 1N) * 1 (N + 10) * 1 (1 + 10) * 1

means can be obtainedfrom with one production

*

means can be obtainedfrom after zero or moreproductions

Page 10: MELJUN CORTES Automata Theory (Automata8)

Language of a CFG

• The language of a CFG (V, T, P, S) is the set of all strings containing only terminals that can be derived from the start variable S

• This is a language over the alphabet T

• A language L is context-free if it is the language of some CFG

L = { | T* and S }*

Page 11: MELJUN CORTES Automata Theory (Automata8)

Example 1

• Is the string 00#11 in L?

• How about 00#111, 00#0#1#11?

• What is the language of this CFG?

A → 0A1 | BB → #

variables: A, Bterminals: 0, 1, # start variable: A

L = {0n#1n: n ≥ 0}

Page 12: MELJUN CORTES Automata Theory (Automata8)

Example 2

• Give derivations of (), (()())

• How about ())?

S SS | (S) | convention: variables in uppercase, terminals in lowercase, start variable first

S (S) (rule 2) () (rule 3)

S (S) (rule 2) (SS)

(rule 1) ((S)S) (rule 2) ((S)(S)) (rule 2) (()(S)) (rule 3) (()())

(rule 3)

Page 13: MELJUN CORTES Automata Theory (Automata8)

Examples: Designing CFGs

• Write a CFG for the following languages– Linear equations over x, y, z, like:

x + 5y – z = 911x – y = 2

– Numbers without leading zeros, e.g., 109, 0 but not 019

– The language L = {anbncmdm | n 0, m 0}

– The language L = {anbmcmdn | n 0, m 0}

Page 14: MELJUN CORTES Automata Theory (Automata8)

Context-free versus regular

• Write a CFG for the language (0 + 1)*111

• Can you do so for every regular language?

• Proof:

S A111A | 0A | 1A

Every regular language is context-free

regularexpression

DFANFA

Page 15: MELJUN CORTES Automata Theory (Automata8)

From regular to context-free

regular expression

a (alphabet symbol)

E1 + E2

CFG

E1E2

E1*

grammar with no rules

S→

S →a

S→ S1 | S2

S→ S1S2

S→ SS1 |

In all cases, S becomes the new start symbol

Page 16: MELJUN CORTES Automata Theory (Automata8)

Context-free versus regular

• Is every context-free language regular?

• No! We already saw some examples:

• This language is context-free but not regular

A → 0A1 | BB → #

L = {0n#1n: n ≥ 0}

Page 17: MELJUN CORTES Automata Theory (Automata8)

Parse tree

• Derivations can also be represented using parse trees

E E + E V + E x + E x + (E) x + (E E) x + (V E) x + (y E) x + (y V) x + (y z)

E

E E+

V ( E )

E E

V V

x

y z

E E + E | E - E | (E) | V V x | y | z

Page 18: MELJUN CORTES Automata Theory (Automata8)

Definition of parse tree

• A parse tree for a CFG G is an ordered tree with labels on the nodes such that– Every internal node is labeled by a variable– Every leaf is labeled by a terminal or – Leaves labeled by have no siblings

– If a node is labeled A and has children A1, …, Ak from left to right, then the rule

is a production in G.

A → A1…Ak

Page 19: MELJUN CORTES Automata Theory (Automata8)

Left derivation

• Always derive the leftmost variable first:

• Corresponds to a left-to-right traversal of parse tree

E E + E V + E x + E x + (E) x + (E E) x + (V E) x + (y E) x + (y V) x + (y z)

E

E E+

V ( E )

E E

V V

x

y z

Page 20: MELJUN CORTES Automata Theory (Automata8)

Ambiguity

• A grammar is ambiguous if some strings have more than one parse tree

• Example: E E + E | E E | (E) | V V x | y | z

x + y + z

E

E E+

E E+V

V Vx

y z

E

E E+

E E+ V

V V

x y

z

Page 21: MELJUN CORTES Automata Theory (Automata8)

Why ambiguity matters

• The parse tree represents the intended meaning:

x + y + z

E

E E+

E E+V

V Vx

y z

E

E E+

E E+ V

V V

x y

z

“first add y and z, and then add this to x”

“first add x and y, and then add z to this”

Page 22: MELJUN CORTES Automata Theory (Automata8)

Why ambiguity matters

• Suppose we also had multiplication:E E + E | E E | E E | (E) | V V x | y | z

x y + z

E

E E*

E E+V

V Vx

y z

E

E E+

E E V

V V

x y

z

“first x y, then + z”“first y + z, then x ”

Page 23: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

• Sometimes we can rewrite the grammar to remove the ambiguity

• Rewrite grammar so cannot be broken by +:

E E + E | E E | E E | (E) | V V x | y | z

E T | E + T | E TT F | T FF (E) | VV x | y | z

T stands for term: x * (y + z)F stands for factor: x, (y + z)

A term always splits into factors

A factor is either a variable or a parenthesized expression

Page 24: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

• Example

x y + z

E T | E + T | E TT F | T FF (E) | VV x | y | z

V V V

F

T

FT

E

E

T

F

Page 25: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

• Can we always disambiguate a grammar?

• No, for two reasons– There exists an inherently ambiguous context-

free L:Every CFG for this language is ambiguous

– There is no general procedure that can tell if a grammar is ambiguous

• However, grammars used in programming languages can typically be disambiguated

Page 26: MELJUN CORTES Automata Theory (Automata8)

Another Example

• Is ab, baba, abbbaa in L?

• How about a, bba?

• What is the language of this CFG?

• Is the CFG ambiguous?

S aB | bAA a | aS | bAAB b | bS | aBB