Yu-Chen Kuo 1
Chapter 4
Syntax Analysis
4.1 The Role of The Parser
• A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.
Three Types of Parsers
1. Universal methods (CYK algorithm and Earley's algorithm): too inefficient to use in production compilers
2. Top-down method
3. Bottom-up method
Syntax Error Handling
• Lexical error: misspelling an identifier, keyword, or operator
• Syntactic error: an arithmetic expression with unbalanced parentheses
• Semantic error: an operator applied to an incompatible operand
• Logical error: an infinitely recursive call
Syntax Error Handling (Cont.)
• The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly and accurately
– It should recover from each error quickly enough to be able to detect subsequent errors
– It should not significantly slow down the processing of correct programs
Error-Recovery Strategies
• Panic mode
– Discard input symbols until one of a designated set of synchronizing tokens is found
– Synchronizing tokens: e.g., ; or end
– Guaranteed not to go into an infinite loop
• Phrase level
– The parser may perform local correction: replace a prefix of the remaining input by some string that allows parsing to continue
– E.g., replace , by ;, delete an extraneous ;, or insert a missing ;
– May lead to an infinite loop if we always insert something on the input ahead of the current input symbol
Error-Recovery Strategies (cont.)
• Error productions
– Augment the grammar with productions that generate common erroneous constructs, so the parser can detect and report them
• Global correction
– Given an incorrect input string x and grammar G, find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible
– Too costly in general
4.2 Context-Free Grammars
• stmt → if expr then stmt else stmt
1. Terminals: tokens
• if, then, else
2. Nonterminals: syntactic variables, each denoting a set of strings
• expr, stmt
3. Start symbol
• stmt
4. Productions
Example 4.2
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
• Terminals: id, +, -, *, /, ↑
• Nonterminals: expr, op
• Start symbol: expr
Notational Conventions
1. These symbols are terminals:
i) Lower-case letters early in the alphabet: a, b, c
ii) Operator symbols: +, -, etc.
iii) Punctuation symbols: parentheses, comma, etc.
iv) Digits: 0, 1, …, 9
v) Boldface strings: id, if
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet: A, B, C
ii) The letter S: start symbol
iii) Lower-case italic names: expr, stmt
Notational Conventions (cont.)
3. Upper-case letters late in the alphabet, X, Y, Z, represent grammar symbols (terminals or nonterminals)
4. Lower-case letters late in the alphabet, u, v, …, z, represent strings of terminals
5. Lower-case Greek letters, α, β, γ, represent strings of grammar symbols
6. A-productions (all productions for A): A → α1 | α2 | … | αk
7. Start symbol: the left side of the first production
Example 4.3
E → E A E | (E) | - E | id
A → + | - | * | / | ↑
By notational conventions:
Nonterminals: E, A
Terminals: the remaining symbols
Derivations
E → E+E | E*E | (E) | - E | id
• E derives -E: E ⇒ -E
• The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id)
• αAβ ⇒ αγβ, if A → γ is a production: one-step derivation
• ⇒*: zero or more derivation steps
• ⇒+: one or more derivation steps
Derivations (cont.)
• α ⇒* α, for any string α. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
• L(G) denotes the language generated by G: w ∈ L(G) if and only if S ⇒* w, where w is a string of terminals. Such a string w is called a sentence of G.
• If S ⇒* α, where α may contain nonterminals, we call α a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)
Leftmost & Rightmost Derivations
• Leftmost derivation (⇒lm)
– -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm)
– -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, we call α a left-sentential form of G.
• If S ⇒*rm α, we call α a right-sentential (canonical) form of G.
Parse Tree and Derivations
Ambiguity
• More than one parse tree for some sentence
• More than one leftmost derivation for some sentence
• More than one rightmost derivation for some sentence
4.3 Regular Expression vs. Context-free grammar
• Every language that can be described by a regular expression can also be described by a context-free grammar
– (a|b)*abb
– A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
• Every regular set is a context-free language
Why use regular expression to define the lexical syntax of a language ?
• Why not use a CFG for the lexical syntax?
1. Lexical rules of a language are frequently quite simple; we do not need a powerful grammar to describe them.
2. Regular expressions provide a more concise and easier-to-understand notation for tokens.
3. An efficient lexical analyzer can be constructed automatically from regular expressions.
4. Separating the syntactic structure of a language into lexical and nonlexical parts modularizes the compiler front end.
Why use regular expression to define the lexical syntax of a language ?
• Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords.
• Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's.
• Nested structures cannot be described by regular expressions.
Verifying the Language Generated by a Grammar
• Proof that L(G) = L:
– Every string generated by G is in L
– Every string in L can be generated by G
• S → (S)S | ε generates all strings of balanced parentheses
– Every sentence derivable from S is balanced (induction on the number of derivation steps):
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y (n steps)
• S ⇒* x in fewer than n steps, so x is balanced
• S ⇒* y in fewer than n steps, so y is balanced
– Every balanced string w of length 2n is derivable from S:
• Write w = (x)y
• x and y have length less than 2n; they are both balanced and derivable from S
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w
Eliminating Ambiguity
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Eliminating Ambiguity (cont.)
• Disambiguating rule: match each else with the closest previous unmatched then
• The statement between a then and an else must be matched
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
unmatched_stmt → if expr then matched_stmt else unmatched_stmt
| if expr then stmt
Eliminating Immediate Left Recursion
• A grammar is left recursive if it has a nonterminal A such that A ⇒+ Aα for some string α.
• Top-down parsing methods cannot handle left-recursive grammars, because top-down parsing corresponds to a leftmost derivation.
• Immediate left recursion
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
is eliminated by rewriting:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
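The rewrite above is mechanical enough to sketch in code. In this illustrative helper (our own, not the slides' notation), productions are tuples of symbols, the empty tuple () is ε, and the fresh nonterminal is named by appending a prime:

```python
# Eliminate immediate left recursion:
# A -> A a1 | ... | A am | b1 | ... | bn   becomes
# A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | epsilon
def eliminate_immediate_left_recursion(head, bodies):
    rec = [b[1:] for b in bodies if b and b[0] == head]       # the alpha_i
    nonrec = [b for b in bodies if not b or b[0] != head]     # the beta_j
    if not rec:
        return {head: list(bodies)}                           # nothing to do
    new = head + "'"
    return {
        head: [b + (new,) for b in nonrec],                   # A  -> beta_j A'
        new:  [a + (new,) for a in rec] + [()],               # A' -> alpha_i A' | epsilon
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
```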
Eliminating Immediate Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda
Eliminating General Left Recursion
• Input: Grammar G with no cycles (A ⇒+ A) or ε-productions
Eliminating General Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
• Substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε
• Eliminating the immediate left recursion:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
Left Factoring
• When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input.
stmt → if expr then stmt | if expr then stmt else stmt
becomes
stmt → if expr then stmt S'
S' → else stmt | ε
• In general, A → αβ1 | αβ2 | … | αβn | γ becomes
A → αA' | γ
A' → β1 | β2 | … | βn
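One pass of this transformation can be sketched as follows (an illustrative helper of ours, not the slides' algorithm): bodies sharing a first symbol are grouped, their longest common prefix α is pulled out, and a fresh A' derives the distinct suffixes.

```python
from collections import defaultdict

# One left-factoring pass: A -> alpha b1 | ... | alpha bn | gamma
# becomes A -> alpha A' | gamma and A' -> b1 | ... | bn.
def left_factor(head, bodies):
    groups = defaultdict(list)
    for b in bodies:
        groups[b[0] if b else None].append(b)
    result = {head: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)          # nothing to factor here
            continue
        pref = group[0]                         # longest common prefix of the group
        for b in group[1:]:
            i = 0
            while i < len(pref) and i < len(b) and pref[i] == b[i]:
                i += 1
            pref = pref[:i]
        new = head + "'"
        result[head].append(pref + (new,))      # A  -> alpha A'
        result[new] = [b[len(pref):] for b in group]   # A' -> suffixes (maybe epsilon)
    return result

# The dangling-else example: both bodies share the prefix "if E then S".
factored = left_factor("stmt",
                       [("if", "E", "then", "S"),
                        ("if", "E", "then", "S", "else", "S")])
```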
Non-Context-Free Language Constructs
• L1 = {wcw | w is in (a|b)*} is not context-free
• L1' = {wcwR | w is in (a|b)*} is context-free
– S → aSa | bSb | c
• L2 = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1} is not context-free
• L2' = {a^n b^m c^m d^n | n ≥ 1, m ≥ 1} is context-free
– S → aSd | aAd
A → bAc | bc
• L2'' = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} is context-free
Non-Context-Free Language Constructs
• L3 = {a^n b^n c^n | n ≥ 0} is not context-free
• L3' = {a^n b^n | n ≥ 1} is context-free
– S → aSb | ab
• A context-free grammar can keep count of two items but not three.
• A regular expression cannot keep count at all.
Top-Down Parsing
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
• It constructs a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.
Recursive Descent Parsing
• A general top-down parsing method that may involve backtracking
• E.g., S → cAd, A → ab | a, with input w = cad
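For this grammar, parsing cad requires backtracking: A first tries ab, fails to match ad, and must retry with a. A sketch in Python (names are ours): A yields every position it can reach, and S tries each in turn.

```python
# Recursive descent with backtracking for S -> cAd, A -> ab | a.
def parse(w):
    def A(pos):
        if w.startswith("ab", pos):   # try the longer alternative A -> ab first
            yield pos + 2
        if w.startswith("a", pos):    # then fall back to A -> a
            yield pos + 1
    def S(pos):
        if not w.startswith("c", pos):
            return False
        for p in A(pos + 1):          # backtracking: retry with A's next alternative
            if w.startswith("d", p) and p + 1 == len(w):
                return True
        return False
    return S(0)
```

On w = cad, the alternative A → ab fails (b does not match d), so the generator's second yield, A → a, is tried and succeeds.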
Predictive Parsers
• By carefully writing a grammar, eliminating left recursion, and left factoring the result, we obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking: a predictive parser.
• A predictive parser can be implemented with recursive procedures.
Predictive Parsers (cont.)
type → simple | ↑ id | array [simple] of type
simple → integer | char | num dotdot num
Transition Diagrams for Predictive Parsers
• We can create a transition diagram for a predictive parser
• For each nonterminal A:
1. Create an initial and a final state
2. For each production A → X1X2…Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn
• The parser follows the transition diagrams, matching terminals on the edges against lookahead input symbols
Nonrecursive Predictive Parsing
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than via recursive calls.
• The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table.
Nonrecursive Predictive Parsing(Cont.)
• The parser has an input buffer, a stack, a parsing table, and an output stream.
• The input buffer contains the string to be parsed followed by $, a symbol used to indicate the end of the input string.
• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.
Nonrecursive Predictive Parsing(Cont.)
• The output stream shows the derivation steps by which the grammar produces the input string.
• The parsing table is a two-dimensional array M[A, a] that gives the stack action when nonterminal A is on top of the stack and the next input is terminal a or the symbol $.
Predictive Parsing Algorithm
• Input. A string w and a parsing table M for G
• Output. A leftmost derivation of w, if w ∈ L(G)
• Method. – Put $S on stack where S is the start symbol of G– Put w$ in the input buffer– Execute the predictive parsing program (Fig. 4.14)
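The method above can be sketched as a short driver loop. In this illustrative version (names like `ll1_parse` are ours), M is the hand-built table for the non-left-recursive expression grammar used later in these slides:

```python
# Hand-built predictive parsing table for E -> TE', E' -> +TE' | eps,
# T -> FT', T' -> *FT' | eps, F -> (E) | id.  [] stands for epsilon.
M = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    stack = ["$", start]
    inp = tokens + ["$"]
    i, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], inp[i]
        if X == a:                     # top of stack is the terminal a: match it
            stack.pop(); i += 1
        elif (X, a) in M:              # expand nonterminal X by production M[X, a]
            stack.pop()
            output.append((X, M[(X, a)]))
            stack.extend(reversed(M[(X, a)]))
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")
    if inp[i] != "$":
        raise SyntaxError("input not fully consumed")
    return output

steps = ll1_parse(["id", "+", "id"])
```

The returned list of (nonterminal, body) pairs is exactly the leftmost derivation the output stream would show.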
Predictive Parsing Program
Example
• Consider the non-left-recursive grammar for arithmetic expressions:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → (E) | id
Example (parsing table M)
Example (Stack Moves)
FIRST and FOLLOW
• The construction of a predictive parser is aided by FIRST and FOLLOW functions.
• These functions help us construct the predictive parsing table.
• FOLLOW sets can also be used as synchronizing tokens during panic-mode error recovery.
FIRST function
• If α is a string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α.
• If α ⇒* ε, then ε is also in FIRST(α).
FIRST Sets
• Compute FIRST(X) for all grammar symbols X by applying the following rules until no terminal or ε can be added to any FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X → Y1Y2…Yk is a production, then place a in FIRST(X) if a ∈ FIRST(Yi) for some i and ε is in all of FIRST(Y1), …, FIRST(Yi-1), that is, Y1Y2…Yi-1 ⇒* ε.
FIRST sets (cont.)
3. (continued) If ε ∈ FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
• Everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, nothing more is added to FIRST(X); otherwise we add FIRST(Y2), and so on.
• For a string X1X2…Xn: FIRST(X1X2…Xn) contains FIRST(X1) - {ε}; it also contains FIRST(X2) - {ε} if ε ∈ FIRST(X1), and so on. Finally, ε ∈ FIRST(X1X2…Xn) if ε ∈ FIRST(Xi) for all i.
FOLLOW function
• Define FOLLOW(A), for a nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form.
• That is, if S ⇒* αAaβ, then a ∈ FOLLOW(A).
• If A can be the rightmost symbol in some sentential form, then $ ∈ FOLLOW(A).
FOLLOW Sets
• Compute FOLLOW(A) for all nonterminals A by applying the following rules until nothing can be added to any FOLLOW set:
1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B). (The rule adds FOLLOW(A) to FOLLOW(B), not the other way around.)
Example
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → (E) | id
• FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
• FIRST(E') = {+, ε}
• FIRST(T') = {*, ε}
• FOLLOW(E) = FOLLOW(E') = {), $}
• FOLLOW(T) = FOLLOW(T') = {+, ), $}
• FOLLOW(F) = {+, *, ), $}
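The fixed-point rules on the previous slides can be sketched directly in code; this illustrative version (helper names are ours, "" stands for ε) reproduces the sets listed above for the expression grammar:

```python
# FIRST and FOLLOW by iteration to a fixed point; "" stands for epsilon.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(G)
symbols = NT | {s for bodies in G.values() for b in bodies for s in b}
FIRST = {s: (set() if s in NT else {s}) for s in symbols}

def first_of(seq):
    """FIRST of a string of grammar symbols, under the current FIRST sets."""
    out = set()
    for X in seq:
        out |= FIRST[X] - {""}
        if "" not in FIRST[X]:
            return out
    out.add("")                       # every symbol in seq can derive epsilon
    return out

changed = True
while changed:                        # rules 1-3 until nothing can be added
    changed = False
    for A, bodies in G.items():
        for body in bodies:
            f = first_of(body)
            if not f <= FIRST[A]:
                FIRST[A] |= f
                changed = True

FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                  # rule 1: $ goes into FOLLOW(start)
changed = True
while changed:
    changed = False
    for A, bodies in G.items():
        for body in bodies:
            for i, B in enumerate(body):
                if B not in NT:
                    continue
                tail = first_of(body[i + 1:])
                add = tail - {""}     # rule 2: FIRST(beta) minus epsilon
                if "" in tail:        # rule 3: beta can vanish
                    add |= FOLLOW[A]
                if not add <= FOLLOW[B]:
                    FOLLOW[B] |= add
                    changed = True
```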
Construction of Predictive Parsing Table
• Suppose A → α is a production with a ∈ FIRST(α). Then the parser expands A by α when the current input symbol is a.
• If α = ε or α ⇒* ε, then the parser expands A by α when the input symbol is in FOLLOW(A), or when the $ at the end of the input has been reached and $ ∈ FOLLOW(A).
Construction of Predictive Parsing Table (Algorithm)
• Input. Grammar G.
• Output. Parsing table M.
• Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
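Steps 2 and 3 can be sketched for a toy grammar; in this illustrative fragment the FIRST and FOLLOW sets are supplied by hand ("" stands for ε), and a duplicate table entry would mean the grammar is not LL(1):

```python
# Build M for S -> a S b | epsilon.
# FIRST(aSb) = {a}, FIRST(eps) = {""}, FOLLOW(S) = {b, $} (hand-computed).
G = {"S": [("a", "S", "b"), ()]}
FIRST_ALPHA = {("a", "S", "b"): {"a"}, (): {""}}
FOLLOW = {"S": {"b", "$"}}

M, ll1 = {}, True
for A, bodies in G.items():
    for alpha in bodies:
        for a in FIRST_ALPHA[alpha] - {""}:       # step 2
            ll1 &= (A, a) not in M                # duplicate entry => not LL(1)
            M[(A, a)] = alpha
        if "" in FIRST_ALPHA[alpha]:              # step 3
            for b in FOLLOW[A]:
                ll1 &= (A, b) not in M
                M[(A, b)] = alpha
```

Every (A, a) pair absent from M plays the role of the error entry from step 4.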
Example
LL(1)
• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• The first “L” means scanning the input from left to right.
• The second “L”, means a leftmost derivation.
• And, “1” means using one input symbol of lookahead at each step.
Example (multiply-defined entry)
S → iEtSS' | a (ambiguous)
S' → eS | ε
E → b
• FIRST(S) = {i, a}, FIRST(S') = {e, ε}
• FOLLOW(S) = {e, $}, FOLLOW(S') = {e, $}
LL(1) Properties
• No ambiguous or left-recursive grammar can be LL(1).
• A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions, the following conditions hold:
1. FIRST(α) ∩ FIRST(β) = ∅.
2. At most one of α ⇒* ε and β ⇒* ε.
3. If β ⇒* ε, then FIRST(α) ∩ FOLLOW(A) = ∅.
• The if-then-else grammar violates condition 3, so it is not LL(1).
Error Recovery in Predictive Parsing
• In nonrecursive predictive parsing, an error is detected in one of the following two situations:
1. When the terminal on top of the stack does not match the next input symbol
2. When nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A,a] is empty.
Error Recovery in Predictive Parsing (cont.)
• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
• Its effectiveness depends on the choice of the synchronizing set.
• Some heuristics are as follows.
Error Recovery in Predictive Parsing (cont.)
1. We place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
2. There is a hierarchical structure on constructs in a language; e.g., expressions within statements, statements within blocks, and so on. We can add to the synchronizing set of a lower construct the symbols that begin higher constructs.
Error Recovery in Predictive Parsing (cont.)
3. If we add symbols in FIRST(A) to the synchronizing set of nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed.
Error Recovery in Predictive Parsing (cont.)
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.
Example
• Add “sync” to indicate synchronizing tokens obtained from FOLLOW sets.
Example
• If M[A, a] is blank, skip the input symbol a.
• If M[A,a]=sync, A is popped.
• If a token on top of stack does not match input, pop it.
Bottom-Up Parsing
• Shift-reduce parsing is a general style of bottom-up parsing.
• It attempts to construct a parse tree for an input string beginning at the leaves and working up towards the root.
• At each reduction step a particular substring matching the right side of a production is replaced by the nonterminal on the left side of that production.
Bottom-Up Parsing (cont.)
• If the substring is chosen correctly at each reduction step, a rightmost derivation is traced out in reverse.
Example
• Consider the following grammar
S → aABe
A → Abc | b
B → d
• The sentence “a b b c d e” can be reduced to S by the following reduction steps:
Example
1. a b b c d e (A → b, handle at position 2)
2. a A b c d e (A → Abc)
3. a A d e (B → d)
4. a A B e (S → aABe)
5. S
• These reductions trace out the following rightmost derivation in reverse:
S ⇒rm a A B e ⇒rm a A d e ⇒rm a A b c d e ⇒rm a b b c d e
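The reduction steps above can be replayed mechanically. In this sketch (our own illustration), the shift/reduce decisions are scripted by hand, since choosing when to reduce is exactly what an LR parser's tables decide later:

```python
# Replay the shift/reduce moves for "a b b c d e" with
# S -> aABe, A -> Abc | b, B -> d.
prods = {1: ("S", ["a", "A", "B", "e"]),
         2: ("A", ["A", "b", "c"]),
         3: ("A", ["b"]),
         4: ("B", ["d"])}

def run(tokens, script):
    stack, inp = ["$"], tokens + ["$"]
    for act in script:
        if act == "shift":
            stack.append(inp.pop(0))
        else:                                    # act is a production number
            head, body = prods[act]
            assert stack[-len(body):] == body    # the handle sits on top of the stack
            del stack[-len(body):]
            stack.append(head)
    return stack, inp

# shift a, shift b, A->b, shift b, shift c, A->Abc, shift d, B->d, shift e, S->aABe
script = ["shift", "shift", 3, "shift", "shift", 2, "shift", 4, "shift", 1]
final_stack, remaining = run(list("abbcde"), script)
```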
Handles
• Informally, a handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.
• Formally, a handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Handles (cont.)
• If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw.
• Note:
1. The string w to the right of the handle contains only terminal symbols.
2. If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle; otherwise, some right-sentential forms may have more than one handle.
Example
• Consider the following ambiguous grammar:
E → E+E | E*E | (E) | id
• Two rightmost derivations of id1+id2*id3:
1. E ⇒rm E+E ⇒rm E+E*E ⇒rm E+E*id3 ⇒rm E+id2*id3 ⇒rm id1+id2*id3
– id1 is a handle of the right-sentential form id1+id2*id3
– By E → id, replacing id1 by E gives E+id2*id3
2. E ⇒rm E*E ⇒rm E*id3 ⇒rm E+E*id3 ⇒rm E+id2*id3 ⇒rm id1+id2*id3
– so the right-sentential form E+E*id3 has two possible handles
Handles (cont.)
• The handle represents the leftmost complete subtree consisting of a node and all its children.
• Reducing β to A in αβw can be thought of as “pruning the handle”: removing the children of A from the parse tree.
Stack Implementation of Shift-Reduce Parsing
• We implement shift-reduce parsing by using a stack to hold grammar symbols and an input buffer to hold input string w.
• We use $ to mark the end of the stack and the input buffer
STACK INPUT
$ w$
Yu-Chen Kuo 81
Stack Implementation of Shift-Reduce Parsing
• The parser shifts input symbols onto the stack until a handle β is on top of the stack.
• It then reduces β to the left side of the production A → β.
• It repeats this cycle until an error occurs or the stack contains S and the input buffer is empty:

STACK INPUT
$S $
Example
Conflicts during Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce parsing cannot be used.
• It is possible to reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict).
Example of Shift/Reduce Conflict
stmt → if expr then stmt
| if expr then stmt else stmt
| other
STACK INPUT
$ … if expr then stmt else …$
• Note that if we resolve the conflict in favor of shifting, the parser will behave naturally.
Example of Reduce/Reduce Conflict
(1) stmt → id (parameter_list)
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id (expr_list)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

STACK INPUT
$ … id ( id , id ) …$
LR Parsers
• LR(k) parsing is an efficient, bottom-up parsing technique.
• The “L” stands for left-to-right scanning of the input, the “R” for constructing a rightmost derivation in reverse, and the “k” for the number of input symbols of lookahead that are used in making parsing decisions.
• When (k) is omitted, k is assumed to be 1.
LR Parsers (cont.)
• LR parsing can be used to parse a larger class of context-free grammars than LL parsing.
• The principal drawback of LR parsing is that it is too much work to construct an LR parser by hand for a typical programming-language grammar.
• We need a specialized tool, an LR parser generator. Fortunately, many such generators are available.
The LR Parsing Algorithm
• The schematic form of an LR parser:
The LR Parsing Algorithm (cont.)
• An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The driver program is the same for all LR parsers.
• The parsing table changes from one parser to another.
• The parsing program reads tokens from an input buffer one at a time.
The LR Parsing Algorithm (cont.)
• The parser uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state symbol. Each state symbol summarizes the information contained in the stack below it.
• The combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce decision.
The LR Parsing Algorithm (cont.)
• The parsing table consists of two parts: a parsing action function and a goto function.
• An action table entry can have one of four values:
1. shift s, where s is a state
2. reduce by a grammar production A → β
3. accept
4. error
• The function goto takes a state and a grammar symbol as arguments and produces a state.
The LR Parsing Algorithm (cont.)
• A configuration of an LR parser is a pair whose first component is the stack contents and whose second component is the unexpended input:
(s0X1s1X2s2…Xmsm , ai ai+1…an$)
• The next move of the parser is determined by reading ai, the current input symbol, and sm, the state on top of the stack, and then consulting the parsing action table entry action[sm, ai]
The LR Parsing Algorithm (cont.)
• The configurations resulting from each of the four types of move are as follows:
1. If action[sm , ai ] = shift s, the parser executes a shift move, entering the configuration
(s0X1s1X2s2…Xmsm ai s , ai+1…an$)
Here the parser has shifted both the current input symbol ai, and the next state s, which is given in action[sm , ai ], onto the stack; ai+1 becomes the current input symbol.
The LR Parsing Algorithm (cont.)
2. If action[sm , ai ] = reduce A → β, the parser executes a reduce move, entering the configuration

(s0X1s1X2s2…Xm-rsm-r A s, ai ai+1…an$)

where s = goto[sm-r, A] and r is the length of β.

Here the parser first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm-r. The parser then pushed both A and s, the entry for goto[sm-r, A], onto the stack.
The LR Parsing Algorithm (cont.)
3. If action[sm , ai ] = accept, parsing is completed.
4. If action[sm , ai ] = error, the parser has discovered an error and calls an error recovery routine.
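The four moves above fit in a short driver loop. This sketch (our own illustration) hard-codes the canonical LR table for the grammar S → CC, C → cC | d that is constructed later in these slides; ("s", n) means shift to state n and ("r", p) means reduce by production p:

```python
# LR driver for S -> CC, C -> cC | d with a hand-coded table.
prods = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}   # production: head, body length
action = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4), (1, "$"): ("acc", None),
    (2, "c"): ("s", 6), (2, "d"): ("s", 7),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", 3), (4, "d"): ("r", 3), (5, "$"): ("r", 1),
    (6, "c"): ("s", 6), (6, "d"): ("s", 7), (7, "$"): ("r", 3),
    (8, "c"): ("r", 2), (8, "d"): ("r", 2), (9, "$"): ("r", 2),
}
goto = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def lr_parse(tokens):
    stack, inp, out = [0], tokens + ["$"], []
    while True:
        s, a = stack[-1], inp[0]
        kind, arg = action[(s, a)]      # a KeyError here would signal a syntax error
        if kind == "s":                 # shift: push a and the next state
            stack += [a, arg]
            inp.pop(0)
        elif kind == "r":               # reduce: pop 2*|body| entries, then goto
            head, n = prods[arg]
            del stack[-2 * n:]
            stack += [head, goto[(stack[-1], head)]]
            out.append(arg)             # emit the production used
        else:
            return out                  # accept
```

The returned production numbers give the rightmost derivation in reverse.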
The LR Parsing Program
Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
Example (cont.)
Example (cont.)
See p.220
Constructing LR Parsing Tables
• There are three methods for constructing an LR parsing table for a grammar.
(1) Simple LR (SLR) is the easiest to implement, but least powerful.
(2) Canonical LR is the most powerful, and the most expensive.
(3) Lookahead LR(LALR) is intermediate in power and cost.
Constructing SLR Parsing Tables
1) FOLLOW(A) for every nonterminal A in G.
2) The augmented grammar G’
3) The canonical collection of sets of LR(0) items C
4) The transition diagram for viable prefixes
5) The parsing table action and goto function
Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
Step 1: FOLLOW sets for Nonterminals
• FOLLOW(E) = { +, ), $}
• FOLLOW(T)=FOLLOW(F)={+,*,),$}
Step 2: The Augmented Grammar
• If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' → S.
• The augmented grammar is as follows:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
Step 3: Sets of LR(0) Items
• An LR(0) item( item for short) of a grammar G is a production of G with a dot at some position of the right side.
• The production A → XYZ yields the four items:
A → •XYZ, A → X•YZ, A → XY•Z, A → XYZ•
• The production A → ε generates only one item, A → •
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of items constructed from I by the following two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production in G, then add the item B → •γ to closure(I), if it is not already there.
• We apply rule 2 until no more new items can be added to closure(I).
Example
• If I is the set containing the one item {[E' → •E]}, then closure(I) contains the items:
E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
The Goto Operation
• If I is a set of items and X is a grammar symbol, then goto(I, X) is the closure of the set of all items [A → αX•β] such that [A → α•Xβ] is in I.
The Goto Operation
• If I is the set of two items {[E' → E•], [E → E•+T]}, then goto(I, +) consists of:
E → E + •T
T → •T * F
T → •F
F → •(E)
F → •id
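Both operations are small enough to sketch in code. In this illustrative version (our own representation, not the slides'), an item is a triple (head, body, dot position) over the augmented expression grammar:

```python
# closure() and goto() for LR(0) items; an item is (head, body, dot).
G = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    closed = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in G:      # dot sits before a nonterminal B
            for prod in G[body[dot]]:
                item = (body[dot], prod, 0)         # add B -> .gamma
                if item not in closed:
                    closed.add(item)
                    work.append(item)
    return closed

def goto(I, X):
    # advance the dot over X in every item of I that has the dot before X
    moved = {(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})    # the 7 items listed on the previous slide
I1 = goto(I0, "E")                   # {E' -> E., E -> E.+T}
```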
The Sets-of-Items Construction
• The algorithm to construct C, the canonical collection of sets of LR(0) items (all possible items) for an augmented grammar G', is shown below.
Example
• closure({[E' → •E]}) = I0:
E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I0, E) = I1:
E' → E•
E → E• + T
Example (cont.)
• goto(I0, T) = I2:
E → T•
T → T• * F
• goto(I0, F) = I3:
T → F•
• goto(I0, () = I4:
F → (•E)
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I0, id) = I5:
F → id•
Example (cont.)
• goto(I1, +) = I6:
E → E + •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I2, *) = I7:
T → T * •F
F → •(E)
F → •id
Example (cont.)
• goto(I4, E) = I8:
F → (E•)
E → E• + T
• goto(I4, T) = I2
• goto(I4, F) = I3
• goto(I4, () = I4
• goto(I4, id) = I5
• goto(I6, T) = I9:
E → E + T•
T → T• * F
Example (cont.)
• goto(I6, F) = I3
• goto(I6, () = I4
• goto(I6, id) = I5
• goto(I7, F) = I10:
T → T * F•
• goto(I7, () = I4
• goto(I7, id) = I5
• goto(I8, )) = I11:
F → (E)•
• goto(I8, +) = I6
• goto(I9, *) = I7
Step 4: The Transition Diagram
• The goto function for the canonical collection of sets of items can be shown as a transition diagram.
Step 5: The Parsing Table
• State i is constructed from Ii.
1) The parsing actions for state i are determined as follows:
a) If [A → α•aβ] is in Ii and goto(Ii, a) = Ij, set action[i, a] to “shift j”. Here a must be a terminal.
b) If [A → α•] is in Ii, set action[i, a] to “reduce A → α” for all a in FOLLOW(A). Here A must not be S'.
c) If [S' → S•] is in Ii, set action[i, $] to “accept”.
Step 5: The Parsing Table (cont.)
2) The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
3) All entries not defined by rules 1) and 2) are set to “error”.
4) The start state of the parser is the one constructed from the set of items containing [S' → •S].
Step 5: The Parsing Table (cont.)
Example: an unambiguous grammar that is not SLR(1)
• S → L = R
• S → R
• L → * R
• L → id
• R → L
Example: an unambiguous grammar that is not SLR(1) (cont.)
Example: an unambiguous grammar that is not SLR(1) (cont.)
• Consider the set of items I2:
I2: S → L•=R
R → L•
– The first item gives action[2, =] = “shift 6”
– FOLLOW(R) contains =, so the second item sets action[2, =] = “reduce R → L”
• A shift/reduce conflict, although the grammar is not ambiguous
• In fact, no right-sentential form begins with R=…; the reduction is wrong when the viable prefix on the stack is just L, as opposed to *L
• Canonical LR splits such states according to the actual lookaheads rather than the whole FOLLOW set
Constructing Canonical LR Parsing Tables
• An LR(1) item is of the form [A → α•β, a], where A → αβ is a production and a is a terminal or $.
• The “1” refers to the length of the second component, called the lookahead of the item.
• The lookahead has no effect in an item of the form [A → α•β, a] where β is not ε, but an item of the form [A → α•, a] calls for a reduction by A → α only if the next input symbol is a.
Constructing Canonical LR Parsing Tables (cont.)
• Thus, we are compelled to reduce by A → α only on those input symbols a for which [A → α•, a] is an LR(1) item in the state on top of the stack.
• The set of such a's is always a subset of FOLLOW(A), but it may be a proper subset.
• The method for constructing the collection of sets of LR(1) items is essentially the same as the way we built the collection of sets of LR(0) items; we need only modify the two procedures closure and goto.
Constructing Canonical LR Parsing Tables (the closure function)
Constructing Canonical LR Parsing Tables (the items and goto functions)
Example
• Consider the following augmented grammar:
S' → S
S → CC
C → cC | d
• closure({[S' → •S, $]}) = I0:
S' → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d
• goto(I0, S) = I1:
S' → S•, $
Example (cont.)
• goto(I0, C) = I2:
S → C•C, $
C → •cC, $
C → •d, $
• goto(I0, c) = I3:
C → c•C, c/d
C → •cC, c/d
C → •d, c/d
• goto(I0, d) = I4:
C → d•, c/d
• goto(I2, C) = I5:
S → CC•, $
Example (cont.)
• goto(I2, c) = I6:
C → c•C, $
C → •cC, $
C → •d, $
• goto(I2, d) = I7:
C → d•, $
• goto(I3, C) = I8:
C → cC•, c/d
• goto(I3, c) = I3
• goto(I3, d) = I4
• goto(I6, C) = I9:
C → cC•, $
• goto(I6, c) = I6
• goto(I6, d) = I7
Example (Transition Diagram)
• Compare I6 and I3: they contain the same items apart from their lookaheads
Example (Parsing Table)
Yu-Chen Kuo 135
Canonical LR Parser vs. SLR Parser
• Every SLR(1) grammar is an LR(1) grammar.
• A canonical LR parser may have more states than an SLR parser built from the same grammar.
• Exercise: check whether the following grammar is an LR(1) grammar:
S → L = R
S → R
L → * R
L → id
R → L