MELJUN CORTES Automata Theory 8


  • 8/13/2019 MELJUN CORTES Automata Theory 8


    CSC 3130: Automata theory and formal languages

    Context-free languages

    MELJUN P. CORTES MBA MPA BSCS ACS

    MELJUN CORTES


    Context-free grammar

    This is a different model for describing languages
    The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g.
        A → 0A1
        A → B
        B → #
    A, B are variables
    0, 1, # are terminals
    A is the start variable
    Using these rules, we can derive strings like this:
        A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
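The derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides; the names `PRODUCTIONS`, `apply_leftmost` and `derive` are hypothetical, and symbols are assumed to be single characters.

```python
# Productions of the example grammar: each variable maps to its alternative bodies.
PRODUCTIONS = {
    "A": ["0A1", "B"],
    "B": ["#"],
}

def apply_leftmost(form, variable, body):
    """Rewrite the leftmost occurrence of `variable` in `form` with `body`."""
    if body not in PRODUCTIONS.get(variable, []):
        raise ValueError(f"{variable} -> {body} is not a production")
    index = form.find(variable)
    if index < 0:
        raise ValueError(f"{variable} does not occur in {form}")
    return form[:index] + body + form[index + 1:]

def derive(start, steps):
    """Return every sentential form obtained by applying `steps` in order."""
    forms = [start]
    for variable, body in steps:
        forms.append(apply_leftmost(forms[-1], variable, body))
    return forms

# Three uses of A -> 0A1, then A -> B, then B -> #.
forms = derive("A", [("A", "0A1"), ("A", "0A1"), ("A", "0A1"), ("A", "B"), ("B", "#")])
print(" => ".join(forms))
```

Each step rewrites exactly one variable, which is what a single ⇒ in the derivation stands for.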


    Some natural examples

    Context-free grammars were first used for natural languages

    a girl with a flower likes the boy

        SENTENCE
        ├─ NOUN-PHRASE
        │  ├─ CMPLX-NOUN
        │  │  ├─ ART: a
        │  │  └─ NOUN: girl
        │  └─ PREP-PHRASE
        │     ├─ PREP: with
        │     └─ CMPLX-NOUN
        │        ├─ ART: a
        │        └─ NOUN: flower
        └─ VERB-PHRASE
           └─ CMPLX-VERB
              ├─ VERB: likes
              └─ NOUN-PHRASE
                 └─ CMPLX-NOUN
                    ├─ ART: the
                    └─ NOUN: boy


    Natural languages

    We can describe (some fragments of) the English language by a context-free grammar:

        SENTENCE → NOUN-PHRASE VERB-PHRASE
        NOUN-PHRASE → CMPLX-NOUN
        NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
        VERB-PHRASE → CMPLX-VERB
        VERB-PHRASE → CMPLX-VERB PREP-PHRASE
        PREP-PHRASE → PREP CMPLX-NOUN
        CMPLX-NOUN → ARTICLE NOUN
        CMPLX-VERB → VERB NOUN-PHRASE
        CMPLX-VERB → VERB

        ARTICLE → a
        ARTICLE → the
        NOUN → boy
        NOUN → girl
        NOUN → flower
        VERB → likes
        VERB → touches
        VERB → sees
        PREP → with

    variables: SENTENCE, NOUN-PHRASE, …
    terminals: a, the, boy, girl, flower, likes, touches, sees, with
    start variable: SENTENCE
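To see the grammar at work, here is a small Python sketch of a recognizer for this fragment. It is an illustration only: the dictionary `GRAMMAR` and the function `is_sentence` are hypothetical names, and the span-splitting search relies on the grammar having no ε-productions and no cyclic unit rules, which holds for the rules listed above.

```python
from functools import lru_cache

# Each variable maps to its production bodies (tuples of symbols).
GRAMMAR = {
    "SENTENCE": [("NOUN-PHRASE", "VERB-PHRASE")],
    "NOUN-PHRASE": [("CMPLX-NOUN",), ("CMPLX-NOUN", "PREP-PHRASE")],
    "VERB-PHRASE": [("CMPLX-VERB",), ("CMPLX-VERB", "PREP-PHRASE")],
    "PREP-PHRASE": [("PREP", "CMPLX-NOUN")],
    "CMPLX-NOUN": [("ARTICLE", "NOUN")],
    "CMPLX-VERB": [("VERB", "NOUN-PHRASE"), ("VERB",)],
    "ARTICLE": [("a",), ("the",)],
    "NOUN": [("boy",), ("girl",), ("flower",)],
    "VERB": [("likes",), ("touches",), ("sees",)],
    "PREP": [("with",)],
}

def is_sentence(text):
    """True if the space-separated words of `text` can be derived from SENTENCE."""
    words = tuple(text.split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` derive exactly words[i:j]?
        if symbol not in GRAMMAR:  # terminal: must match a single word
            return j - i == 1 and words[i] == symbol
        return any(body_derives(body, i, j) for body in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def body_derives(body, i, j):
        if len(body) == 1:
            return derives(body[0], i, j)
        # The first symbol covers words[i:k]; the remaining symbols cover words[k:j].
        return any(derives(body[0], i, k) and body_derives(body[1:], k, j)
                   for k in range(i + 1, j))

    return derives("SENTENCE", 0, len(words))
```

For example, `is_sentence("a girl with a flower likes the boy")` holds, while `is_sentence("boy likes the girl")` does not, because every CMPLX-NOUN needs an ARTICLE.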


    Programming languages

    Context-free grammars are also used to describe (parts of) programming languages

    For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG

        E → E + E
        E → E * E
        E → (E)
        E → 0 | 1 | … | 9

    Variables: E
    Terminals: +, *, (, ), 0, 1, …, 9


    Motivation for studying CFGs

    Context-free grammars are essential for understanding the meaning of computer programs

        code: (2 + 3) * 5
        meaning: "add 2 and 3, and then multiply by 5"

    They are used in compilers


    Definition of context-free grammar

    A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where
      V is a finite set of variables or non-terminals
      T is a finite set of terminals (V ∩ T = ∅)
      P is a set of productions or substitution rules of the form
          A → α
        where A is a symbol in V and α is a string over V ∪ T
      S is a variable in V called the start variable
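The 4-tuple can be written down directly as a data structure. The sketch below is one possible Python encoding, not something the slides prescribe; the class name `CFG` and its field names are assumptions, and a production is stored as a pair (A, body) with the body a tuple of symbols.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset    # V
    terminals: frozenset    # T
    productions: frozenset  # P: pairs (A, body), body a tuple over V ∪ T
    start: str              # S

    def __post_init__(self):
        # Check the side conditions stated in the definition.
        if self.variables & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start variable must belong to V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"left-hand side {head!r} is not a variable")
            if any(symbol not in symbols for symbol in body):
                raise ValueError(f"body {body!r} uses an unknown symbol")

# The grammar A -> 0A1 | B, B -> # with start variable A.
example = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    productions=frozenset({("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))}),
    start="A",
)
```

An ε-production would be stored with the empty tuple as its body.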


    Shorthand notation for productions

    When we have multiple productions with the same variable on the left, like

        E → E + E
        E → E * E
        E → (E)
        E → N
        N → 0N
        N → 1N
        N → 0
        N → 1

    we can write this in shorthand as

        E → E + E | E * E | (E) | N
        N → 0N | 1N | 0 | 1

    Variables: E, N
    Terminals: +, *, (, ), 0, 1
    Start variable: E
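The shorthand is purely notational, so it can be expanded back mechanically. A minimal Python sketch follows, assuming the arrow is typed as ASCII `->` and that `|` never occurs as a terminal; the function name `expand_shorthand` is hypothetical.

```python
def expand_shorthand(line):
    """Split 'X -> a | b | c' into [('X', 'a'), ('X', 'b'), ('X', 'c')]."""
    head, _, alternatives = line.partition("->")
    head = head.strip()
    # Every alternative between the bars becomes its own production.
    return [(head, body.strip()) for body in alternatives.split("|")]

rules = (expand_shorthand("E -> E + E | E * E | (E) | N")
         + expand_shorthand("N -> 0N | 1N | 0 | 1"))
for head, body in rules:
    print(head, "->", body)
```

The two shorthand lines expand to the eight individual productions listed on the slide.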


    Derivation

    A derivation is a sequential application of productions:

        E ⇒ E * E
          ⇒ (E) * E
          ⇒ (E) * N
          ⇒ (E + E) * N
          ⇒ (E + E) * 1
          ⇒ (E + N) * 1
          ⇒ (N + N) * 1
          ⇒ (N + 1N) * 1
          ⇒ (N + 10) * 1
          ⇒ (1 + 10) * 1

    α ⇒ β means β can be obtained from α with one production
    α ⇒* β means β can be obtained from α after zero or more productions
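The one-step relation ⇒ is easy to check by brute force: try every production at every position. The Python sketch below does this for the grammar E → E + E | E * E | (E) | N, N → 0N | 1N | 0 | 1; the helper names are hypothetical, and spaces are dropped so that every symbol is a single character.

```python
PRODUCTIONS = [
    ("E", "E+E"), ("E", "E*E"), ("E", "(E)"), ("E", "N"),
    ("N", "0N"), ("N", "1N"), ("N", "0"), ("N", "1"),
]

def derives_in_one_step(a, b):
    """True if b results from rewriting one variable occurrence in a."""
    for head, body in PRODUCTIONS:
        for i, symbol in enumerate(a):
            if symbol == head and a[:i] + body + a[i + 1:] == b:
                return True
    return False

def is_derivation(forms):
    """True if each form follows from the previous one by a single production."""
    return all(derives_in_one_step(a, b) for a, b in zip(forms, forms[1:]))

# The derivation from the slide, written without spaces.
SLIDE_DERIVATION = [
    "E", "E*E", "(E)*E", "(E)*N", "(E+E)*N", "(E+E)*1",
    "(E+N)*1", "(N+N)*1", "(N+1N)*1", "(N+10)*1", "(1+10)*1",
]
print(is_derivation(SLIDE_DERIVATION))
```

Since `is_derivation` accepts any chain of one-step rewrites, including a chain with a single form, it corresponds to the relation ⇒* between the first and last form.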


    Language of a CFG

    The language of a CFG (V, T, P, S) is the set of all strings containing only terminals that can be derived from the start variable S

        L = {ω | ω ∈ T* and S ⇒* ω}

    This is a language over the alphabet T

    A language L is context-free if it is the language of some CFG
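For small grammars the set L can be explored by exhaustive search over sentential forms. The following Python sketch lists all terminal strings up to a given length; it is an illustration under two assumptions that are not in the slides: symbols are single characters, and the grammar has no ε-productions, so forms never shrink and overlong forms can be discarded. The function name `language_up_to` is hypothetical.

```python
from collections import deque

def language_up_to(productions, start, max_length):
    """Terminal strings of length <= max_length derivable from `start`."""
    variables = set(productions)
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Expanding only the leftmost variable suffices to reach every terminal string.
        position = next((i for i, s in enumerate(form) if s in variables), None)
        if position is None:
            words.add(form)  # no variables left: the form is a string of L
            continue
        for body in productions[form[position]]:
            new_form = form[:position] + body + form[position + 1:]
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(words, key=lambda w: (len(w), w))

GRAMMAR = {"A": ["0A1", "B"], "B": ["#"]}
print(language_up_to(GRAMMAR, "A", 5))
```

For the grammar A → 0A1 | B, B → # the search with bound 5 finds the strings #, 0#1 and 00#11.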


    Example 1

        A → 0A1 | B
        B → #

    variables: A, B
    terminals: 0, 1, #
    start variable: A

    Is the string 00#11 in L?
    How about 00#111, 00#0#1#11?
    What is the language of this CFG?

        L = {0^n # 1^n : n ≥ 0}


    Example 2

        S → SS | (S) | ε

    convention: variables in uppercase, terminals in lowercase, start variable first

    Give derivations of (), (()())

        S ⇒ (S)         (rule 2)
          ⇒ ()          (rule 3)

        S ⇒ (S)         (rule 2)
          ⇒ (SS)        (rule 1)
          ⇒ ((S)S)      (rule 2)
          ⇒ ((S)(S))    (rule 2)
          ⇒ (()(S))     (rule 3)
          ⇒ (()())      (rule 3)

    How about ())?
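The grammar S → SS | (S) | ε generates exactly the strings of balanced parentheses, so membership can also be tested without building a derivation. The Python sketch below uses a depth counter; it is an equivalent check rather than a parser for the grammar, and the name `is_balanced` is hypothetical.

```python
def is_balanced(word):
    """True if `word` is a balanced string over the alphabet {'(', ')'}."""
    depth = 0
    for symbol in word:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
        else:
            return False   # not a terminal of the grammar
    return depth == 0      # every '(' must have been closed

for candidate in ["()", "(()())", "())"]:
    print(candidate, is_balanced(candidate))
```

The string ()) fails because the counter drops below zero at the last symbol, which matches the fact that no derivation from S produces it.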


    Examples: Designing CFGs

    Write a CFG for the following languages

      Linear equations over x, y, z, like:
          x + 5y - z = 9
          11x - y = 2

      Numbers without leading zeros, e.g., 109, 0 but not 019

      The language L = {a^n b^n c^m d^m | n ≥ 0, m ≥ 0}

      The language L = {a^n b^m c^m d^n | n ≥ 0, m ≥ 0}


    Context-free versus regular

    Write a CFG for the language (0 + 1)*111

        S → A111
        A → ε | 0A | 1A

    Can you do so for every regular language?

    Every regular language is context-free

    Proof: [diagram: regular expression, DFA, NFA]
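As a sanity check, the grammar S → A111, A → ε | 0A | 1A can be compared with the regular expression on all short strings. The Python sketch below is only a bounded test, not a proof; `in_A`, `in_S` and `agree_up_to` are hypothetical names, and the slide's (0 + 1)*111 is written as `[01]*111` in Python's `re` syntax.

```python
import re
from itertools import product

def in_A(word):
    """Membership for A -> ε | 0A | 1A."""
    return word == "" or (word[0] in "01" and in_A(word[1:]))

def in_S(word):
    """Membership for S -> A111: a string derived from A followed by 111."""
    return word.endswith("111") and in_A(word[:-3])

PATTERN = re.compile(r"[01]*111")

def agree_up_to(max_length):
    """Compare grammar and regular expression on all binary strings up to max_length."""
    for length in range(max_length + 1):
        for letters in product("01", repeat=length):
            word = "".join(letters)
            if in_S(word) != (PATTERN.fullmatch(word) is not None):
                return False
    return True

print(agree_up_to(8))
```

The two recursive functions mirror the productions one to one, which is why this style of check works for such a simple grammar.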


    From regular to context-free

    regular expression        CFG
    ∅                         grammar with no rules
    ε                         S → ε
    a (alphabet symbol)       S → a
    E1 + E2                   S → S1 | S2
    E1E2                      S → S1S2
    E1*                       S → SS1 | ε

    Here S1 and S2 are the start symbols of the grammars for E1 and E2.
    In all cases, S becomes the new start symbol
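The table translates into a short recursive procedure. The Python sketch below is one way to write it, under assumptions of its own: regular expressions are given as nested tuples such as `("star", ("symbol", "a"))`, fresh variables are named S1, S2, …, and an ε body is the empty tuple. None of these conventions come from the slides.

```python
from itertools import count

def regex_to_cfg(regex):
    """Return (start_variable, productions) for a regex given as nested tuples."""
    fresh = count(1)
    productions = {}

    def build(node):
        kind = node[0]
        # Build the grammars of the sub-expressions first.
        if kind in ("union", "concat"):
            s1, s2 = build(node[1]), build(node[2])
        elif kind == "star":
            s1 = build(node[1])
        start = f"S{next(fresh)}"
        if kind == "empty":        # ∅: grammar with no rules
            productions[start] = []
        elif kind == "epsilon":    # S -> ε
            productions[start] = [()]
        elif kind == "symbol":     # S -> a
            productions[start] = [(node[1],)]
        elif kind == "union":      # S -> S1 | S2
            productions[start] = [(s1,), (s2,)]
        elif kind == "concat":     # S -> S1 S2
            productions[start] = [(s1, s2)]
        elif kind == "star":       # S -> S S1 | ε
            productions[start] = [(start, s1), ()]
        else:
            raise ValueError(f"unknown regex node {kind!r}")
        return start

    return build(regex), productions

start, productions = regex_to_cfg(("star", ("symbol", "a")))
print(start, productions)
```

Each case adds one new variable on top of the grammars already built, so the resulting grammar has as many variables as the expression has operators and symbols.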


    Context-free versus regular

    Is every context-free language regular?

    No! We already saw some examples:

        A → 0A1 | B
        B → #
        L = {0^n # 1^n : n ≥ 0}

    This language is context-free but not regular


    Parse tree

    Derivations can also be represented using parse trees

        E → E + E | E - E | (E) | V
        V → x | y | z

        E ⇒ E + E
          ⇒ V + E
          ⇒ x + E
          ⇒ x + (E)
          ⇒ x + (E - E)
          ⇒ x + (V - E)
          ⇒ x + (y - E)
          ⇒ x + (y - V)
          ⇒ x + (y - z)

        E
        ├─ E ─ V ─ x
        ├─ +
        └─ E
           ├─ (
           ├─ E
           │  ├─ E ─ V ─ y
           │  ├─ -
           │  └─ E ─ V ─ z
           └─ )


    Definition of parse tree

    A parse tree for a CFG G is an ordered tree with labels on the nodes such that
      Every internal node is labeled by a variable
      Every leaf is labeled by a terminal or ε
      Leaves labeled by ε have no siblings
      If a node is labeled A and has children A1, …, Ak from left to right, then the rule
          A → A1…Ak
        is a production in G.
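The conditions in this definition can be checked node by node. The Python sketch below does so for the grammar E → E + E | E - E | (E) | V, V → x | y | z; the tree encoding is an assumption made for the example: an internal node is a pair (variable, list of children), a leaf is a plain string, and the empty string stands for ε.

```python
PRODUCTIONS = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("(", "E", ")"), ("V",)],
    "V": [("x",), ("y",), ("z",)],
}

def label(node):
    return node if isinstance(node, str) else node[0]

def is_parse_tree(node):
    """Check the conditions of the definition on `node` and all nodes below it."""
    if isinstance(node, str):   # leaf: must not be labeled by a variable
        return node not in PRODUCTIONS
    variable, children = node
    if variable not in PRODUCTIONS or not children:
        return False
    body = tuple(label(child) for child in children)
    if body == ("",):           # a lone ε leaf corresponds to the rule A -> ε
        body = ()
    elif "" in body:            # an ε leaf must not have siblings
        return False
    return body in PRODUCTIONS[variable] and all(is_parse_tree(c) for c in children)

def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])

# Parse tree of x + (y - z).
TREE = ("E", [
    ("E", [("V", ["x"])]),
    "+",
    ("E", ["(",
           ("E", [("E", [("V", ["y"])]), "-", ("E", [("V", ["z"])])]),
           ")"]),
])
print(is_parse_tree(TREE), tree_yield(TREE))
```

Reading the leaves from left to right gives back the derived string, here x+(y-z).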


    Left derivation

    Always derive the leftmost variable first:

        E ⇒ E + E
          ⇒ V + E
          ⇒ x + E
          ⇒ x + (E)
          ⇒ x + (E - E)
          ⇒ x + (V - E)
          ⇒ x + (y - E)
          ⇒ x + (y - V)
          ⇒ x + (y - z)

    Corresponds to a left-to-right traversal of parse tree

        E
        ├─ E ─ V ─ x
        ├─ +
        └─ E
           ├─ (
           ├─ E
           │  ├─ E ─ V ─ y
           │  ├─ -
           │  └─ E ─ V ─ z
           └─ )
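The correspondence between a parse tree and its leftmost derivation can be made concrete: keep the current sentential form as a list of nodes and always expand the leftmost internal node. The Python sketch below assumes a tree encoding chosen for this example, with internal nodes as pairs (variable, list of children) and leaves as plain strings; the function names are hypothetical.

```python
def render(form):
    """Write a list of nodes as a sentential form, using each node's label."""
    return "".join(node if isinstance(node, str) else node[0] for node in form)

def leftmost_derivation(tree):
    """Return the sentential forms of the leftmost derivation encoded by `tree`."""
    form = [tree]
    steps = [render(form)]
    while True:
        # Find the leftmost node that is still a variable (an internal node).
        index = next((i for i, node in enumerate(form) if not isinstance(node, str)), None)
        if index is None:
            return steps
        # Replace that node by its children, i.e. apply its production.
        form[index:index + 1] = form[index][1]
        steps.append(render(form))

# Parse tree of x + (y - z) for E -> E + E | E - E | (E) | V, V -> x | y | z.
TREE = ("E", [
    ("E", [("V", ["x"])]),
    "+",
    ("E", ["(",
           ("E", [("E", [("V", ["y"])]), "-", ("E", [("V", ["z"])])]),
           ")"]),
])
print(" => ".join(leftmost_derivation(TREE)))
```

The forms produced for this tree are the same ones listed in the derivation above, written without spaces.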


    Ambiguity

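    A grammar is ambiguous if some strings have more than one parse tree

    Example:
        E → E + E | E - E | (E) | V
        V → x | y | z

    x + y + z has two parse trees:

        E
        ├─ E ─ V ─ x
        ├─ +
        └─ E
           ├─ E ─ V ─ y
           ├─ +
           └─ E ─ V ─ z

        E
        ├─ E
        │  ├─ E ─ V ─ x
        │  ├─ +
        │  └─ E ─ V ─ y
        ├─ +
        └─ E ─ V ─ z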
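For a concrete string, the number of parse trees can be counted by trying every way of splitting the string among the symbols of a production body. The Python sketch below is an illustration, with `count_parse_trees` a hypothetical name; it assumes single-character tokens, no ε-productions and no cyclic unit rules, which holds for the example grammar.

```python
from functools import lru_cache

def count_parse_trees(grammar, start, tokens):
    """Number of parse trees with root `start` whose yield is `tokens`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_body(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_body(body, i, j):
        if len(body) == 1:
            return count_symbol(body[0], i, j)
        first, rest = body[0], body[1:]
        # Each symbol covers at least one token, so `first` ends at some k
        # that leaves len(rest) tokens or more for the remaining symbols.
        return sum(count_symbol(first, i, k) * count_body(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count_symbol(start, 0, len(tokens))

AMBIGUOUS = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("(", "E", ")"), ("V",)],
    "V": [("x",), ("y",), ("z",)],
}
print(count_parse_trees(AMBIGUOUS, "E", "x+y+z"))
```

A count above one for some string shows that the grammar is ambiguous; a count of one for a particular string says nothing about other strings.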


    Why ambiguity matters

    The parse tree represents the intended meaning:

    x + y + z

        E
        ├─ E ─ V ─ x
        ├─ +
        └─ E
           ├─ E ─ V ─ y
           ├─ +
           └─ E ─ V ─ z

      "first add y and z, and then add this to x"

        E
        ├─ E
        │  ├─ E ─ V ─ x
        │  ├─ +
        │  └─ E ─ V ─ y
        ├─ +
        └─ E ─ V ─ z

      "first add x and y, and then add z to this"


    Why ambiguity matters

    Suppose we also had multiplication:

        E → E + E | E - E | E * E | (E) | V
        V → x | y | z

    x * y + z

        E
        ├─ E ─ V ─ x
        ├─ *
        └─ E
           ├─ E ─ V ─ y
           ├─ +
           └─ E ─ V ─ z

      "first y + z, then x *"

        E
        ├─ E
        │  ├─ E ─ V ─ x
        │  ├─ *
        │  └─ E ─ V ─ y
        ├─ +
        └─ E ─ V ─ z

      "first x * y, then + z"


    Disambiguation

    Sometimes we can rewrite the grammar to remove the ambiguity

        E → E + E | E - E | E * E | (E) | V
        V → x | y | z

    Rewrite grammar so * cannot be broken by +:

        E → T | E + T | E - T
        T → F | T * F
        F → (E) | V
        V → x | y | z

    T stands for term: x * (y + z)
    F stands for factor: x, (y + z)

    A term always splits into factors
    A factor is either a variable or a parenthesized expression


    Disambiguation

    Example

        E → T | E + T | E - T
        T → F | T * F
        F → (E) | V
        V → x | y | z

    x * y + z

        E
        ├─ E ─ T
        │      ├─ T ─ F ─ V ─ x
        │      ├─ *
        │      └─ F ─ V ─ y
        ├─ +
        └─ T ─ F ─ V ─ z
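Because the rewritten grammar gives each expression a single tree, it can drive an evaluator directly. The Python sketch below is a recursive-descent evaluator with one function per variable E, T and F; the left-recursive rules E → E + T and T → T * F are handled with loops, which keeps the left-to-right grouping. The function `evaluate` and the dictionary of values for x, y, z are assumptions added for illustration.

```python
def evaluate(expression, env):
    """Evaluate an expression over + - * ( ) and the variable names in `env`."""
    tokens = [c for c in expression if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():                      # E -> T | E + T | E - T
        value = parse_T()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            right = parse_T()
            value = value + right if op == "+" else value - right
        return value

    def parse_T():                      # T -> F | T * F
        value = parse_F()
        while peek() == "*":
            advance()
            value *= parse_F()
        return value

    def parse_F():                      # F -> (E) | V
        token = peek()
        if token == "(":
            advance()
            value = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token in env:                # V -> x | y | z
            advance()
            return env[token]
        raise SyntaxError(f"unexpected token {token!r}")

    result = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

values = {"x": 2, "y": 3, "z": 4}
print(evaluate("x * y + z", values), evaluate("x * (y + z)", values))
```

With these sample values, x * y + z is computed as (2 * 3) + 4, so the multiplication is never split by the addition.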


    Disambiguation

    Can we always disambiguate a grammar?

    No, for two reasons
      There exists an inherently ambiguous context-free language L: every CFG for this language is ambiguous
      There is no general procedure that can tell if a grammar is ambiguous

    However, grammars used in programming languages can typically be disambiguated


    Another Example

        S → aB | bA
        A → a | aS | bAA
        B → b | bS | aBB

    Is ab, baba, abbbaa in L?
    How about a, bba?
    What is the language of this CFG?
    Is the CFG ambiguous?