901440 Syntax Analyzer2



    Compiler Construction

A Compulsory Module for Students in the Computer Science Department

Faculty of IT / Al al-Bayt University

    Second Semester 2010/2011


    Syntax Analyzer


    Top-down parsing

can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first).

Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
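For instance, with a grammar such as E → T E', E' → + T E' | ε, T → id (an illustrative grammar, not the slide's own example), a leftmost derivation of id + id expands the leftmost nonterminal at every step:

    E ⇒ T E' ⇒ id E' ⇒ id + T E' ⇒ id + id E' ⇒ id + id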


    Top-down Parsing (cont.)

At each step of a top-down parse, the key problem is determining the production to be applied for a nonterminal, say A. Once an A-production is chosen, the rest of the parsing process consists of matching the terminal symbols in the production body with the input string.

A general form of top-down parsing is called recursive-descent parsing (it may require backtracking to find the correct A-production to apply). A special case of recursive-descent parsing is predictive parsing, where no backtracking is required.

Another form of top-down parsing is nonrecursive predictive parsing (it maintains a stack explicitly, in addition to using a parsing table).


    Recursive-Descent Parsing

    Algorithm

A recursive-descent parsing program consists of a set of procedures, one for each nonterminal. Execution begins with the procedure for the start symbol. Recursive descent may require backtracking; that is, it may require repeated scans over the input (backtracking is not very efficient).
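For illustration (this is not the slides' own code), a minimal recursive-descent parser for the grammar E → T E', E' → + T E' | ε, T → id, with one procedure per nonterminal, could look like this in Python:

```python
# A minimal sketch: one procedure per nonterminal, for the grammar
#   E -> T E' ,  E' -> + T E' | epsilon ,  T -> id
# Token handling is simplified to a list of strings ending with '$'.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead() == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t!r}, found {self.lookahead()!r}")

    def E(self):              # E -> T E'
        self.T()
        self.Eprime()

    def Eprime(self):         # E' -> + T E' | epsilon
        if self.lookahead() == '+':
            self.match('+')
            self.T()
            self.Eprime()
        # otherwise take the epsilon production and consume nothing

    def T(self):              # T -> id
        self.match('id')

Parser(['id', '+', 'id', '$']).E()   # parses silently; raises SyntaxError on bad input
```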


    Recursive-Descent Parsing (cont.)

To allow backtracking, the algorithm above needs to be modified as follows:

We cannot choose a unique A-production at line (1), so we must try each of several productions in some order.

Failure at line (7) is not ultimate failure; it only suggests that we need to return to line (1) and try another A-production.

An input error is reported only if there are no more A-productions to try.

In order to try another A-production, we need to be able to reset the input pointer to where it was when we first reached line (1). Thus, a local variable is needed to store this input pointer for future use.


    Example:

Consider the following grammar:

S → c A d
A → a b | a

To construct a parse tree top-down for the input string w = cad:

We have a match for the first and the second input symbols, so we advance the input pointer to d and compare d against the next leaf, labeled b. Since b does not match d, we must try the other A-production; in this case we need to reset the input pointer and start parsing again using the new production (a sketch follows below).

A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop.
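A small sketch of this backtracking behaviour for the grammar S → c A d, A → a b | a and input w = cad; the saved position plays the role of the stored input pointer:

```python
# Backtracking recursive descent for  S -> c A d ,  A -> a b | a  (a sketch).
def parse_S(inp, pos):
    if pos < len(inp) and inp[pos] == 'c':
        pos = parse_A(inp, pos + 1)
        if pos is not None and pos < len(inp) and inp[pos] == 'd':
            return pos + 1
    return None

def parse_A(inp, pos):
    saved = pos                      # remember the input pointer
    # first alternative: A -> a b
    if pos + 1 < len(inp) and inp[pos] == 'a' and inp[pos + 1] == 'b':
        return pos + 2
    # failure: reset to the saved pointer and try  A -> a
    pos = saved
    if pos < len(inp) and inp[pos] == 'a':
        return pos + 1
    return None

print(parse_S("cad", 0) == len("cad"))   # True: the whole input is derived from S
```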


    Predictive Parsers

A CFG may be parsed by a recursive-descent parser that needs no backtracking if:

left recursion is eliminated

left factoring transformations are applied

A predictive parser can be built using recursive procedures by:

creating transition diagrams

matching terminals, and making a procedure call for each nonterminal.


Consider the following grammar, after left-recursion elimination and left factoring; the diagrams in the figure correspond to the productions E → T E' and E' → + T E' | ε.

[The figure shows the transition diagrams for E and E', together with a simplified, equivalent diagram.]

Transition diagrams can be simplified, provided the sequence of grammar symbols along paths is preserved. The diagrams in the figure above are equivalent: if we trace paths from E to an accepting state and substitute for E', then, in both sets of diagrams, the grammar symbols along the paths make up strings of the form T + T + ... + T.


    Transition Diagrams for Predictive Parsers

To construct the transition diagrams from a grammar, first eliminate left recursion and then left factor the grammar. Then, for each nonterminal A:

1. Create an initial and a final (return) state.

2. For each production A → X1 X2 ... Xn, create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn. If A → ε, the path is an edge labeled ε.

Transition diagrams for predictive parsers differ from those for lexical analyzers. Parsers have one diagram for each nonterminal, and the labels of edges can be tokens or nonterminals.

A transition on a token (terminal) means that we take that transition if that token is the next input symbol. A transition on a nonterminal A is a call of the procedure for A.

We can use tail-recursion removal and substitution of procedure bodies to optimize the procedure for a nonterminal (see the sketch below).
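For example, applying tail-recursion removal to the procedure for E' (grammar E → T E', E' → + T E' | ε, T → id, as in the figure) turns it into a loop inside the procedure for E; a sketch:

```python
# Predictive parsing of  E -> T E' ,  E' -> + T E' | epsilon ,  T -> id
# with the procedure for E' optimized into a loop (tail-recursion removal).
def parse_E(tokens, pos=0):
    pos = parse_T(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':   # E' -> + T E', repeated
        pos = parse_T(tokens, pos + 1)
    return pos                                        # E' -> epsilon ends the loop

def parse_T(tokens, pos):
    if pos < len(tokens) and tokens[pos] == 'id':
        return pos + 1
    raise SyntaxError("expected 'id'")

parse_E(['id', '+', 'id'])   # returns 3: the whole token list is consumed
```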


Non-recursive Predictive Parsing (table-driven predictive parsing)

[Figure: model of a table-driven predictive parser, with an input buffer (a + b $), a stack of grammar symbols (X Y Z $), the predictive parsing program/driver, and a parsing table M.]

A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation: if w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒* wα.

The table-driven parser has an input buffer, a stack containing a sequence of grammar symbols, a parsing table constructed by Algorithm 4.31, and an output stream. The input buffer contains the string to be parsed, followed by the end marker $.


    Nonrecursive Predictive Parsing

Method: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The algorithm below uses the predictive parsing table M to produce a predictive parse for the input.
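The algorithm itself is given in the figure; a Python sketch of the same loop, with an illustrative table for the grammar E → T E', E' → + T E' | ε, T → id, might look like this:

```python
# Nonrecursive predictive parsing driver (a sketch).  Nonterminals are the
# upper-case strings; everything else on the stack is treated as a terminal.
M = {
    ('E',  'id'): ['T', "E'"],
    ("E'", '+') : ['+', 'T', "E'"],
    ("E'", '$') : [],                # E' -> epsilon
    ('T',  'id'): ['id'],
}
NONTERMINALS = {'E', "E'", 'T'}

def predictive_parse(tokens, start='E'):
    stack = ['$', start]             # start symbol on top of $
    i = 0                            # input pointer; tokens end with '$'
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X == a:                   # terminal on top of the stack matches the input
            stack.pop(); i += 1
        elif X in NONTERMINALS and (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[X, a]))   # push the body, leftmost symbol on top
        else:
            raise SyntaxError(f"no rule for ({X}, {a})")
    return tokens[i] == '$'          # accept when both stack and input reach $

print(predictive_parse(['id', '+', 'id', '$']))   # True
```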


Predictive parsing table for the grammar below:


    FIRST and FOLLOW

FIRST(α) = the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

FOLLOW(A) = the set of terminals a that can appear immediately to the right of A in some sentential form. In other words, there exists a derivation S ⇒* αAaβ.

In addition, if A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A); recall that $ is a special "endmarker" symbol that is assumed not to be a symbol of any grammar.


    Computing FIRST

FIRST(X) is computed as follows:

If X is a terminal, then FIRST(X) = {X}.

If X → ε is a production, then add ε to FIRST(X).

If X → Y1 Y2 ... Yk is a production, place a in FIRST(X) if a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). Add ε to FIRST(X) if ε is in all FIRST(Yi).


    Computing FOLLOW

Place $ in FOLLOW(S), where S is the start symbol and $ is the endmarker.

If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).

If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
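These rules can be run to a fixed point. The sketch below is illustrative (not the course's code); it represents a grammar as a dictionary from each nonterminal to its list of bodies, with the empty list standing for an ε-production:

```python
# FIRST and FOLLOW by fixed-point iteration (a sketch).
EPS = 'ε'

def compute_first(grammar):
    first = {A: set() for A in grammar}

    def first_of(symbol):
        return first[symbol] if symbol in grammar else {symbol}  # terminal: {itself}

    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                add, nullable = set(), True
                for Y in body:
                    add |= first_of(Y) - {EPS}
                    if EPS not in first_of(Y):
                        nullable = False
                        break
                if nullable:                       # every Yi can derive epsilon
                    add.add(EPS)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add('$')

    def first_of_seq(seq):                         # FIRST of a string of symbols
        out = set()
        for Y in seq:
            fy = first[Y] if Y in grammar else {Y}
            out |= fy - {EPS}
            if EPS not in fy:
                return out
        return out | {EPS}

    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:           # only nonterminals have FOLLOW sets
                        continue
                    beta = first_of_seq(body[i + 1:])
                    new = (beta - {EPS}) | (follow[A] if EPS in beta else set())
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []], 'T': [['id']]}
print(compute_first(G))    # FIRST(E)={id}, FIRST(E')={+, ε}, FIRST(T)={id}
print(compute_follow(G, compute_first(G), 'E'))   # FOLLOW(E)=FOLLOW(E')={$}, FOLLOW(T)={+, $}
```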


    Example: computing First()

stmt → expr ;
expr → term expr1 | ε
expr1 → + term expr1 | ε
term → factor term1
term1 → * factor term1 | ε
factor → ( expr ) | number

Easy ones:
First(expr1) = {+, ε}
First(term1) = {*, ε}
First(factor) = {(, number}

Next step:
First(term) = First(factor)
First(expr) = First(term), plus ε from the expr → ε production
First(stmt) = First(expr ;): because expr can derive ε, add ; to First(stmt)


    Exercise: computing First and Follow

Given the following grammar:

1. E → T E'
2. E' → + T E' | - T E' | ε
3. T → F T'
4. T' → * F T' | / F T' | ε
5. F → ( E ) | id | num

Compute FIRST and FOLLOW for all nonterminals:

FIRST(E) = {(, id, num}
FIRST(E') = {+, -, ε}
FIRST(T) = {(, id, num}
FIRST(T') = {*, /, ε}
FIRST(F) = {(, id, num}

FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, -, ), $}
FOLLOW(T') = {+, -, ), $}
FOLLOW(F) = {*, /, +, -, ), $}


    Construction of a predictive parsing table
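The algorithm itself (Algorithm 4.31) appears in the figure; in outline, each production A → α is entered in M[A, a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), also in M[A, b] for every b in FOLLOW(A), including $. A sketch, reusing compute_first, compute_follow and EPS from the FIRST/FOLLOW sketch above:

```python
# Sketch of predictive-parsing-table construction from FIRST and FOLLOW.
def build_table(grammar, start):
    first = compute_first(grammar)
    follow = compute_follow(grammar, first, start)

    def first_of_seq(seq):
        out = set()
        for Y in seq:
            fy = first[Y] if Y in grammar else {Y}
            out |= fy - {EPS}
            if EPS not in fy:
                return out
        return out | {EPS}

    table = {}
    for A, bodies in grammar.items():
        for body in bodies:
            fb = first_of_seq(body)
            lookaheads = (fb - {EPS}) | (follow[A] if EPS in fb else set())
            for a in lookaheads:
                if (A, a) in table and table[A, a] != body:
                    print(f"M[{A}, {a}] is multiply defined: grammar is not LL(1)")
                table[A, a] = body
    return table

# For the small grammar G used earlier this yields, e.g., M[E', '+'] = ['+', 'T', "E'"]
# and M[E', '$'] = [] (the epsilon production), with no multiply defined entries.
```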


    Notes:

For some grammars, however, M may have some entries that are multiply defined.

For example: if G is left-recursive or ambiguous, then M[A, a] will have at least one multiply defined entry.

Although left-recursion elimination and left factoring are easy to do, there are some grammars for which no amount of alteration will produce an LL(1) grammar.

The language in the following example (which abstracts the dangling-else problem) has no LL(1) grammar at all.


    Cont.

    The parsing table for this grammar appears below.

The entry M[S', e] contains both S' → eS and S' → ε.

The grammar is ambiguous, and the ambiguity is manifested by a choice of which production to use when an e (else) is seen. We can resolve this ambiguity by choosing S' → eS, which associates each else with the closest previous unmatched then.



LL(1) Grammar: scanning the input from left to right (L), producing a leftmost derivation (L), using one input symbol of lookahead at each step to make parsing action decisions (1).

A grammar whose parsing table has no multiply defined entries is called LL(1).

The class of LL(1) grammars is rich enough to cover most programming constructs, although care is needed in writing a suitable grammar for the source language.

Properties of LL(1) Parsers

A correct, leftmost parse is guaranteed (perform left-recursion elimination and left factoring first).

All grammars in the LL(1) class are unambiguous.

All LL(1) parsers operate in linear time and, at most, linear space.


    LL(1) grammar properties (cont.)


    Exercise 1

Given A → α | β, which two of the following cases may cause the grammar to be NOT LL(1)?

1. a is in FIRST(α) and b is in FIRST(β)
2. a and b are both in FIRST(α)
3. a is in both FIRST(α) and FIRST(β)
4. a is in FIRST(α) and in FOLLOW(A), and ε is in FIRST(β)

Answer: 3 and 4. In case 3, both productions are entered in M[A, a]; in case 4, A → β is entered in M[A, a] via FOLLOW(A), clashing with A → α.


    Exercise 2

stmt → if expr then stmt else stmt
     | if expr then stmt

S → iEtS | iEtSeS | a
E → b

After left factoring, the new CFG is:

S → iEtSS' | a
S' → eS | ε
E → b

Why is this CFG not LL(1)?


    Exercise 2

S → iEtSS' | a
S' → eS | ε
E → b

FIRST(S) = {i, a}
FIRST(S') = {e, ε}
FOLLOW(S) = {e, $}
FOLLOW(S') = {e, $}

Where is the conflict?

S' → eS and S' → ε are both entered in the M[S', e] entry.


    Exercise 3

Show that the following CFG is not LL(1):

stmt → label unlabeled_stmt
label → id : | ε
unlabeled_stmt → id := expr

id is in both First(label) and Follow(label).

This means both label → id : and label → ε will be inserted into the parser table entry M[label, id].


    Bottom-Up Parsing

A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). It is convenient to describe parsing as the process of building parse trees.

Bottom-up parsing during a left-to-right scan of the input constructs a rightmost derivation in reverse.


We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production.

A reduction is the reverse of a step in a derivation (recall that in a derivation, a nonterminal in a sentential form is replaced by the body of one of its productions). The goal of bottom-up parsing is therefore to construct a derivation in reverse.


Bottom-up parsing algorithms are:

Shift-reduce parsers

LR parsers (for LR grammars); these need too much work to be built by hand, but tools called automatic parser generators make it easy to construct efficient LR parsers from suitable grammars.

The following derivation corresponds to the parse of id * id.

This derivation is in fact a rightmost derivation.
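The derivation itself is shown in the figure; assuming the usual expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id, it is:

    E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

Read in reverse, each step corresponds to one reduction performed by the bottom-up parser.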


Handle

Informally, a "handle" is a substring that matches the body of a production, and whose reduction represents one step along the reverse of a rightmost derivation.


    Shift-Reduce Parsing

Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed. As we shall see, the handle always appears at the top of the stack just before it is identified as the handle.

$ is used to mark the bottom of the stack and also the right end of the input.


While the primary operations are shift and reduce, there are actually four possible actions a shift-reduce parser can make:

1. Shift. Shift the next input symbol onto the top of the stack.
2. Reduce. The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what nonterminal to replace the string.
3. Accept. Announce successful completion of parsing.
4. Error. Discover a syntax error and call an error recovery routine.

The use of a stack in shift-reduce parsing is justified by an important fact: the handle will always eventually appear on top of the stack, never inside it.


Problems of shift-reduce parsing

Conflicts during shift-reduce parsing:

For some grammars (those that are not LR), every shift-reduce parser can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict).


    LR Parsing: Simple LR

The most prevalent type of bottom-up parser today is based on a concept called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions.

The cases k = 0 or k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1.


LR parsing is attractive for a variety of reasons:

LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. Non-LR context-free grammars exist, but these can generally be avoided for typical programming-language constructs.

The LR-parsing method is the most general nonbacktracking shift-reduce parsing method known.

An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production in a right-sentential form, with k input symbols of lookahead.

This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.


The principal drawback of the LR method is that it is too much work to construct an LR parser by hand for a typical programming-language grammar.

A specialized tool, an LR parser generator, is needed. Fortunately, many such generators are available; one of the most commonly used is Yacc. Such a generator takes a context-free grammar and automatically produces a parser for that grammar.

If the grammar contains ambiguities or other constructs that are difficult to parse in a left-to-right scan of the input, then the parser generator locates these constructs and provides detailed diagnostic messages.


    Items and the LR(0) Automaton

An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse.

An LR(0) item of a grammar G is a production of G with a dot at some position of the body. For example, a production A → XYZ yields the four items A → ·XYZ, A → X·YZ, A → XY·Z, and A → XYZ·.

One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing a deterministic finite automaton that is used to make parsing decisions, called an LR(0) automaton.


    augmented grammar

If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' → S.

The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.


    Closure of Item Sets
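The construction itself is in the figure; informally, CLOSURE(I) repeatedly adds an item B → ·γ for every item A → α·Bβ already in the set and every production B → γ, until nothing more can be added. A sketch, with an item represented as a (head, body, dot position) triple:

```python
# CLOSURE of a set of LR(0) items (a sketch).  An item is (head, body, dot),
# e.g. ('E', ('E', '+', 'T'), 1) stands for  E -> E . + T
def closure(items, grammar):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in grammar:      # dot before a nonterminal B
                B = body[dot]
                for gamma in grammar[B]:
                    item = (B, tuple(gamma), 0)               # add  B -> . gamma
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)
```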


The Function GOTO
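GOTO(I, X) is the closure of the set of all items A → αX·β such that A → α·Xβ is in I; it defines the transitions of the LR(0) automaton on grammar symbol X. Continuing the sketch above:

```python
# GOTO(I, X) for a set of LR(0) items I and grammar symbol X
# (a sketch, reusing closure() from above).
def goto(items, X, grammar):
    moved = {(head, body, dot + 1)
             for head, body, dot in items
             if dot < len(body) and body[dot] == X}           # advance the dot over X
    return closure(moved, grammar)
```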


    Example

For the grammar shown in the figure, CLOSURE(I) contains the set of items I0 in Fig. 4.31, which represents the canonical collection of sets of LR(0) items.


The sets of items of interest can be divided into two classes:

1. Kernel items: the initial item S' → ·S, and all items whose dots are not at the left end.

2. Nonkernel items: all items with their dots at the left end, except for S' → ·S.


    The LR-Parsing Algorithm

A schematic of an LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO).
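The parsing algorithm itself is given in the figure; a sketch of the driver loop, assuming ACTION and GOTO tables built elsewhere (for example by an SLR table constructor), could look like this:

```python
# LR parsing driver (a sketch).  action[(state, terminal)] is ('shift', s),
# ('reduce', A, n) for a production A -> X1...Xn, or ('accept',);
# goto_table[(state, A)] is the state to enter after reducing to A.
def lr_parse(tokens, action, goto_table):
    stack = [0]                              # stack of states; state 0 is the start state
    i = 0                                    # input pointer; tokens end with '$'
    while True:
        s, a = stack[-1], tokens[i]
        entry = action.get((s, a))
        if entry is None:
            raise SyntaxError(f"error in state {s} on {a!r}")
        if entry[0] == 'shift':
            stack.append(entry[1]); i += 1
        elif entry[0] == 'reduce':
            _, A, n = entry
            del stack[len(stack) - n:]       # pop n states (the body of the production)
            stack.append(goto_table[stack[-1], A])
        else:                                # 'accept'
            return True
```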


    Structure of the LR Parsing Table


SLR-parsing algorithm (simple LR)


Example: parse id * id + id using the following grammar:
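Assuming the grammar in the figure is the usual expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id, the parse proceeds as follows (the stack is shown as grammar symbols only; the states an SLR parser also keeps are omitted):

    STACK          INPUT              ACTION
    $              id * id + id $     shift
    $ id           * id + id $        reduce by F → id
    $ F            * id + id $        reduce by T → F
    $ T            * id + id $        shift
    $ T *          id + id $          shift
    $ T * id       + id $             reduce by F → id
    $ T * F        + id $             reduce by T → T * F
    $ T            + id $             reduce by E → T
    $ E            + id $             shift
    $ E +          id $               shift
    $ E + id       $                  reduce by F → id
    $ E + F        $                  reduce by T → F
    $ E + T        $                  reduce by E → E + T
    $ E            $                  accept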


[Figure: moves of the SLR parser on id * id + id.] See page 254 of the book.



A canonical LR(1) parser solves such problems, since it carries one symbol of lookahead in each item.


    Error Recovery

Selection of a synchronizing set:

Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A.

Add keywords that begin statements to the set.

Add symbols in FIRST(A) to the set; we may then re-parse A rather than pop A.

If a nonterminal can generate ε, then the ε-production can be used as a default: pop and continue parsing.
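As an illustration (not the course's own algorithm), a sketch of panic-mode recovery wired into the table-driven predictive parser shown earlier; sync_sets[A] is assumed to hold FOLLOW(A) plus any statement-starting keywords chosen for A:

```python
# Panic-mode error recovery in a table-driven predictive parser (a sketch).
def parse_with_recovery(tokens, M, nonterminals, sync_sets, start):
    stack = ['$', start]
    i = 0                                         # tokens end with '$'
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X not in nonterminals:                 # terminal on top of the stack
            if X == a:
                stack.pop(); i += 1
            else:
                print(f"error: expected {X!r}; assuming it was present and continuing")
                stack.pop()
        elif (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[X, a]))
        else:
            print(f"error at {a!r} while expanding {X}")
            if a != '$' and a not in sync_sets[X]:
                i += 1                            # skip input until a synchronizing token
            else:
                stack.pop()                       # give up on X and resume after it
    return tokens[i] == '$'
```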