Yu-Chen Kuo 1
Chapter 4
Syntax Analysis
4.1 The Role of The Parser
• A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.
Three Types of Parsers
1. Universal methods (CYK algorithm and Earley's algorithm): too inefficient to use in production compilers
2. Top-down method
3. Bottom-up method
Syntax Error Handling
• Lexical error: misspelling an identifier, keyword, or operator
• Syntactic error: an arithmetic expression with unbalanced parentheses
• Semantic error: an operator applied to an incompatible operand
• Logical error: an infinitely recursive call
Syntax Error Handling (Cont.)
• The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly and accurately
– It should recover from each error quickly enough to be able to detect subsequent errors
– It should not significantly slow down the processing of correct programs
Error-Recovery Strategies
• Panic mode
– Discard input symbols until one of a designated set of synchronizing tokens is found
– Synchronizing tokens: e.g., ; or end
– Guaranteed not to go into an infinite loop
• Phrase level
– The parser may perform local correction: replace a prefix of the remaining input by some string that allows parsing to continue
– E.g., replace , by ;, delete an extraneous ;, or insert a missing ;
– May lead to an infinite loop if we always insert something on the input ahead of the current input symbol
Error-Recovery Strategies (cont.)
• Error productions
– Augment the grammar with productions that generate common erroneous constructs, so the parser can detect and report them
• Global correction
– Given an incorrect input string x and grammar G, find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible
– Too costly in general
4.2 Context-Free Grammars
• stmt → if expr then stmt else stmt
1. Terminals: tokens
• if, then, else
2. Nonterminals: syntactic variables, each denoting a set of strings
• expr, stmt
3. Start symbol
• stmt
4. Productions
Example 4.2
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
• Terminals: id, +, -, *, /, ↑
• Nonterminals: expr, op
• Start symbol: expr
Notational Conventions
1. These symbols are terminals:
i) Lower-case letters early in the alphabet: a, b, c
ii) Operator symbols: +, -, etc.
iii) Punctuation symbols: parentheses, comma, etc.
iv) Digits: 0, 1, …, 9
v) Boldface strings: id, if
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet: A, B, C
ii) The letter S: start symbol
iii) Lower-case italic names: expr, stmt
Notational Conventions (cont.)
3. Upper-case letters late in the alphabet, X, Y, Z, represent grammar symbols (terminals or nonterminals)
4. Lower-case letters late in the alphabet, u, v, …, z, represent strings of terminals
5. Lower-case Greek letters, α, β, γ, represent strings of grammar symbols
6. A-productions (all productions for A): A → α1 | α2 | … | αk
7. Start symbol: the left side of the first production
Example 4.3
E → E A E | (E) | - E | id
A → + | - | * | / | ↑
By notational conventions:
Nonterminals: E, A
Terminals: the remaining symbols
Derivations
E → E+E | E*E | (E) | - E | id
• E derives -E: E ⇒ -E
• The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id)
• αAβ ⇒ αγβ, if A → γ is a production: one-step derivation
• ⇒*: zero or more derivation steps
• ⇒+: one or more derivation steps
Derivations (cont.)
• α ⇒* α, for any string α. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
• L(G) denotes the language generated by G: w ∈ L(G) if and only if S ⇒* w, where w is a string of terminals. Such a string w is called a sentence of G.
• If S ⇒* α, where α may contain nonterminals, we call α a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)
Leftmost & Rightmost Derivations
• Leftmost derivation (⇒lm)
– -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm)
– -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, we call α a left-sentential form of G.
• If S ⇒*rm α, we call α a right-sentential (canonical) form of G.
Parse Tree and Derivations
Ambiguity
• More than one parse tree for some sentence
• More than one leftmost derivation for some sentence
• More than one rightmost derivation for some sentence
4.3 Regular Expression vs. Context-free grammar
• Every language that can be described by a regular expression can also be described by a context-free grammar
– (a|b)*abb
– A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
• Every regular set is a context-free language
Why use regular expression to define the lexical syntax of a language ?
• Why not use a CFG for the lexical syntax?
1. Lexical rules of a language are frequently quite simple; we do not need a powerful grammar to describe them.
2. Regular expressions provide a more concise and easier-to-understand notation for tokens.
3. An efficient lexical analyzer can be constructed automatically from regular expressions.
4. Separating the syntactic structure of a language into lexical and nonlexical parts modularizes the compiler front end.
Why use regular expression to define the lexical syntax of a language ?
• Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords.
• Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's.
• Nested structures cannot be described by regular expressions.
Verifying the Language Generated by a Grammar
• Proof that L(G) = L:
– Every string generated by G is in L
– Every string in L can be generated by G
• S → (S)S | ε generates all strings of balanced parentheses
– Every sentence derivable from S is balanced (induction on the number of derivation steps):
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y (n steps)
• S ⇒* x in fewer than n steps, so x is balanced
• S ⇒* y in fewer than n steps, so y is balanced
– Every balanced string w of length 2n is derivable from S:
• Write w = (x)y
• x and y have length less than 2n; they are both balanced and derivable from S
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w
Eliminating Ambiguity
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Eliminating Ambiguity (cont.)
• Disambiguating rule: match each else with the closest previous unmatched then
• The statement between a then and an else must be matched
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
unmatched_stmt → if expr then matched_stmt else unmatched_stmt
| if expr then stmt
Eliminating Immediate Left Recursion
• A grammar is left recursive if it has a nonterminal A such that A ⇒+ Aα for some string α.
• Top-down parsing methods cannot handle left-recursive grammars, because top-down parsing corresponds to a leftmost derivation.
• Immediate left recursion
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
is eliminated by rewriting:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
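The rewrite above is mechanical enough to sketch in code. In this illustrative helper (our own, not the slides' notation), productions are tuples of symbols, the empty tuple () is ε, and the fresh nonterminal is named by appending a prime:

```python
# Eliminate immediate left recursion:
# A -> A a1 | ... | A am | b1 | ... | bn   becomes
# A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | epsilon
def eliminate_immediate_left_recursion(head, bodies):
    rec = [b[1:] for b in bodies if b and b[0] == head]       # the alpha_i
    nonrec = [b for b in bodies if not b or b[0] != head]     # the beta_j
    if not rec:
        return {head: list(bodies)}                           # nothing to do
    new = head + "'"
    return {
        head: [b + (new,) for b in nonrec],                   # A  -> beta_j A'
        new:  [a + (new,) for a in rec] + [()],               # A' -> alpha_i A' | epsilon
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
```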
Eliminating Immediate Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda
Eliminating General Left Recursion
• Input: Grammar G with no cycles (A ⇒+ A) or ε-productions
Eliminating General Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
• Substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε
• Eliminating the immediate left recursion:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
Left Factoring
• When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input.
stmt → if expr then stmt | if expr then stmt else stmt
becomes
stmt → if expr then stmt S'
S' → else stmt | ε
• In general, A → αβ1 | αβ2 | … | αβn | γ becomes
A → αA' | γ
A' → β1 | β2 | … | βn
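One pass of this transformation can be sketched as follows (an illustrative helper of ours, not the slides' algorithm): bodies sharing a first symbol are grouped, their longest common prefix α is pulled out, and a fresh A' derives the distinct suffixes.

```python
from collections import defaultdict

# One left-factoring pass: A -> alpha b1 | ... | alpha bn | gamma
# becomes A -> alpha A' | gamma and A' -> b1 | ... | bn.
def left_factor(head, bodies):
    groups = defaultdict(list)
    for b in bodies:
        groups[b[0] if b else None].append(b)
    result = {head: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)          # nothing to factor here
            continue
        pref = group[0]                         # longest common prefix of the group
        for b in group[1:]:
            i = 0
            while i < len(pref) and i < len(b) and pref[i] == b[i]:
                i += 1
            pref = pref[:i]
        new = head + "'"
        result[head].append(pref + (new,))      # A  -> alpha A'
        result[new] = [b[len(pref):] for b in group]   # A' -> suffixes (maybe epsilon)
    return result

# The dangling-else example: both bodies share the prefix "if E then S".
factored = left_factor("stmt",
                       [("if", "E", "then", "S"),
                        ("if", "E", "then", "S", "else", "S")])
```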
Non-Context-Free Language Constructs
• L1 = {wcw | w is in (a|b)*} is not context-free
• L1' = {wcwR | w is in (a|b)*} is context-free
– S → aSa | bSb | c
• L2 = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1} is not context-free
• L2' = {a^n b^m c^m d^n | n ≥ 1, m ≥ 1} is context-free
– S → aSd | aAd
A → bAc | bc
• L2'' = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} is context-free
Non-Context-Free Language Constructs
• L3 = {a^n b^n c^n | n ≥ 0} is not context-free
• L3' = {a^n b^n | n ≥ 1} is context-free
– S → aSb | ab
• A context-free grammar can keep count of two items but not three.
• A regular expression cannot keep count at all.
Top-Down Parsing
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
• It constructs a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.
Recursive Descent Parsing
• A general top-down parsing method that may involve backtracking
• E.g., S → cAd, A → ab | a, with input w = cad
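For this grammar, parsing cad requires backtracking: A first tries ab, fails to match ad, and must retry with a. A sketch in Python (names are ours): A yields every position it can reach, and S tries each in turn.

```python
# Recursive descent with backtracking for S -> cAd, A -> ab | a.
def parse(w):
    def A(pos):
        if w.startswith("ab", pos):   # try the longer alternative A -> ab first
            yield pos + 2
        if w.startswith("a", pos):    # then fall back to A -> a
            yield pos + 1
    def S(pos):
        if not w.startswith("c", pos):
            return False
        for p in A(pos + 1):          # backtracking: retry with A's next alternative
            if w.startswith("d", p) and p + 1 == len(w):
                return True
        return False
    return S(0)
```

On w = cad, the alternative A → ab fails (b does not match d), so the generator's second yield, A → a, is tried and succeeds.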
Predictive Parsers
• By carefully writing a grammar, eliminating left recursion, and left factoring the result, we obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking: a predictive parser.
• A predictive parser can be implemented with recursive procedures.
Predictive Parsers (cont.)
type → simple | ↑ id | array [simple] of type
simple → integer | char | num dotdot num
Transition Diagrams for Predictive Parsers
• We can create a transition diagram for a predictive parser
• For each nonterminal A:
1. Create an initial and a final state
2. For each production A → X1X2…Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn
• The parser follows the transition diagrams, matching terminals on the edges against lookahead input symbols
Nonrecursive Predictive Parsing
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than via recursive calls.
• The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table.
Nonrecursive Predictive Parsing(Cont.)
• The parser has an input buffer, a stack, a parsing table, and an output stream.
• The input buffer contains the string to be parsed followed by $, a symbol used to indicate the end of the input string.
• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.
Nonrecursive Predictive Parsing(Cont.)
• The output stream shows the derivation steps by which the grammar produces the input string.
• The parsing table is a two-dimensional array M[A, a] that gives the stack action when nonterminal A is on top of the stack and the next input is terminal a or the symbol $.
Predictive Parsing Algorithm
• Input. A string w and a parsing table M for G
• Output. A leftmost derivation of w, if w ∈ L(G)
• Method. – Put $S on stack where S is the start symbol of G– Put w$ in the input buffer– Execute the predictive parsing program (Fig. 4.14)
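The method above can be sketched as a short driver loop. In this illustrative version (names like `ll1_parse` are ours), M is the hand-built table for the non-left-recursive expression grammar used later in these slides:

```python
# Hand-built predictive parsing table for E -> TE', E' -> +TE' | eps,
# T -> FT', T' -> *FT' | eps, F -> (E) | id.  [] stands for epsilon.
M = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    stack = ["$", start]
    inp = tokens + ["$"]
    i, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], inp[i]
        if X == a:                     # top of stack is the terminal a: match it
            stack.pop(); i += 1
        elif (X, a) in M:              # expand nonterminal X by production M[X, a]
            stack.pop()
            output.append((X, M[(X, a)]))
            stack.extend(reversed(M[(X, a)]))
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")
    if inp[i] != "$":
        raise SyntaxError("input not fully consumed")
    return output

steps = ll1_parse(["id", "+", "id"])
```

The returned list of (nonterminal, body) pairs is exactly the leftmost derivation the output stream would show.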
Predictive Parsing Program
Example
• Consider the non-left-recursive grammar for arithmetic expressions:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → (E) | id
Example (parsing table M)
Example (Stack Moves)
FIRST and FOLLOW
• The construction of a predictive parser is aided by FIRST and FOLLOW functions.
• These functions help us construct the predictive parsing table.
• FOLLOW sets can also be used as synchronizing tokens during panic-mode error recovery.
FIRST function
• If α is a string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α.
• If α ⇒* ε, then ε is also in FIRST(α).
FIRST Sets
• Compute FIRST(X) for all grammar symbols X by applying the following rules until no terminal or ε can be added to any FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X → Y1Y2…Yk is a production, then place a in FIRST(X) if a ∈ FIRST(Yi) for some i and ε is in all of FIRST(Y1), …, FIRST(Yi-1), that is, Y1Y2…Yi-1 ⇒* ε.
FIRST sets (cont.)
3. (continued) If ε ∈ FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
• Everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, nothing more is added to FIRST(X); otherwise we add FIRST(Y2), and so on.
• For a string X1X2…Xn: FIRST(X1X2…Xn) contains FIRST(X1) - {ε}; it also contains FIRST(X2) - {ε} if ε ∈ FIRST(X1), and so on. Finally, ε ∈ FIRST(X1X2…Xn) if ε ∈ FIRST(Xi) for all i.
FOLLOW function
• Define FOLLOW(A), for a nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form.
• That is, if S ⇒* αAaβ, then a ∈ FOLLOW(A).
• If A can be the rightmost symbol in some sentential form, then $ ∈ FOLLOW(A).
FOLLOW Sets
• Compute FOLLOW(A) for all nonterminals A by applying the following rules until nothing can be added to any FOLLOW set:
1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B). (The rule adds FOLLOW(A) to FOLLOW(B), not the other way around.)
Example
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → (E) | id
• FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
• FIRST(E') = {+, ε}
• FIRST(T') = {*, ε}
• FOLLOW(E) = FOLLOW(E') = {), $}
• FOLLOW(T) = FOLLOW(T') = {+, ), $}
• FOLLOW(F) = {+, *, ), $}
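The fixed-point rules on the previous slides can be sketched directly in code; this illustrative version (helper names are ours, "" stands for ε) reproduces the sets listed above for the expression grammar:

```python
# FIRST and FOLLOW by iteration to a fixed point; "" stands for epsilon.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(G)
symbols = NT | {s for bodies in G.values() for b in bodies for s in b}
FIRST = {s: (set() if s in NT else {s}) for s in symbols}

def first_of(seq):
    """FIRST of a string of grammar symbols, under the current FIRST sets."""
    out = set()
    for X in seq:
        out |= FIRST[X] - {""}
        if "" not in FIRST[X]:
            return out
    out.add("")                       # every symbol in seq can derive epsilon
    return out

changed = True
while changed:                        # rules 1-3 until nothing can be added
    changed = False
    for A, bodies in G.items():
        for body in bodies:
            f = first_of(body)
            if not f <= FIRST[A]:
                FIRST[A] |= f
                changed = True

FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                  # rule 1: $ goes into FOLLOW(start)
changed = True
while changed:
    changed = False
    for A, bodies in G.items():
        for body in bodies:
            for i, B in enumerate(body):
                if B not in NT:
                    continue
                tail = first_of(body[i + 1:])
                add = tail - {""}     # rule 2: FIRST(beta) minus epsilon
                if "" in tail:        # rule 3: beta can vanish
                    add |= FOLLOW[A]
                if not add <= FOLLOW[B]:
                    FOLLOW[B] |= add
                    changed = True
```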
Construction of Predictive Parsing Table
• Suppose A → α is a production with a ∈ FIRST(α). Then the parser expands A by α when the current input symbol is a.
• If α = ε or α ⇒* ε, then the parser expands A by α when the input symbol is in FOLLOW(A), or when the $ at the end of the input has been reached and $ ∈ FOLLOW(A).
Construction of Predictive Parsing Table (Algorithm)
• Input. Grammar G.
• Output. Parsing table M.
• Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
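Steps 2 and 3 can be sketched for a toy grammar; in this illustrative fragment the FIRST and FOLLOW sets are supplied by hand ("" stands for ε), and a duplicate table entry would mean the grammar is not LL(1):

```python
# Build M for S -> a S b | epsilon.
# FIRST(aSb) = {a}, FIRST(eps) = {""}, FOLLOW(S) = {b, $} (hand-computed).
G = {"S": [("a", "S", "b"), ()]}
FIRST_ALPHA = {("a", "S", "b"): {"a"}, (): {""}}
FOLLOW = {"S": {"b", "$"}}

M, ll1 = {}, True
for A, bodies in G.items():
    for alpha in bodies:
        for a in FIRST_ALPHA[alpha] - {""}:       # step 2
            ll1 &= (A, a) not in M                # duplicate entry => not LL(1)
            M[(A, a)] = alpha
        if "" in FIRST_ALPHA[alpha]:              # step 3
            for b in FOLLOW[A]:
                ll1 &= (A, b) not in M
                M[(A, b)] = alpha
```

Every (A, a) pair absent from M plays the role of the error entry from step 4.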
Example
LL(1)
• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• The first “L” means scanning the input from left to right.
• The second “L”, means a leftmost derivation.
• And, “1” means using one input symbol of lookahead at each step.
Example (multiply-defined entry)
S → iEtSS' | a (ambiguous)
S' → eS | ε
E → b
• FIRST(S) = {i, a}, FIRST(S') = {e, ε}
• FOLLOW(S) = {e, $}, FOLLOW(S') = {e, $}
LL(1) Properties
• No ambiguous or left-recursive grammar can be LL(1).
• A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions, the following conditions hold:
1. FIRST(α) ∩ FIRST(β) = ∅.
2. At most one of α ⇒* ε and β ⇒* ε.
3. If β ⇒* ε, then FIRST(α) ∩ FOLLOW(A) = ∅.
• The if-then-else grammar violates condition 3, so it is not LL(1).
Error Recovery in Predictive Parsing
• In nonrecursive predictive parsing, an error is detected in one of the following two situations:
1. When the terminal on top of the stack does not match the next input symbol
2. When nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A,a] is empty.
Error Recovery in Predictive Parsing (cont.)
• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
• Its effectiveness depends on the choice of the synchronizing set.
• Some heuristics are as follows.
Error Recovery in Predictive Parsing (cont.)
1. We place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
2. There is a hierarchical structure on constructs in a language; e.g., expressions within statements, statements within blocks, and so on. We can add to the synchronizing set of a lower construct the symbols that begin higher constructs.
Error Recovery in Predictive Parsing (cont.)
3. If we add symbols in FIRST(A) to the synchronizing set of nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed.
Error Recovery in Predictive Parsing (cont.)
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.
Example
• Add “sync” to indicate synchronizing tokens obtained from FOLLOW sets.
Example
• If M[A, a] is blank, skip the input symbol a.
• If M[A,a]=sync, A is popped.
• If a token on top of stack does not match input, pop it.
Bottom-Up Parsing
• Shift-reduce parsing is a general style of bottom-up parsing.
• It attempts to construct a parse tree for an input string beginning at the leaves and working up towards the root.
• At each reduction step a particular substring matching the right side of a production is replaced by the nonterminal on the left side of that production.
Bottom-Up Parsing (cont.)
• If the substring is chosen correctly at each reduction step, a rightmost derivation is traced out in reverse.
Example
• Consider the following grammar
S → aABe
A → Abc | b
B → d
• The sentence “a b b c d e” can be reduced to S by the following reduction steps:
Example
1. a b b c d e (A → b, handle at position 2)
2. a A b c d e (A → Abc)
3. a A d e (B → d)
4. a A B e (S → aABe)
5. S
• These reductions trace out the following rightmost derivation in reverse:
S ⇒rm a A B e ⇒rm a A d e ⇒rm a A b c d e ⇒rm a b b c d e
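The reduction steps above can be replayed mechanically. In this sketch (our own illustration), the shift/reduce decisions are scripted by hand, since choosing when to reduce is exactly what an LR parser's tables decide later:

```python
# Replay the shift/reduce moves for "a b b c d e" with
# S -> aABe, A -> Abc | b, B -> d.
prods = {1: ("S", ["a", "A", "B", "e"]),
         2: ("A", ["A", "b", "c"]),
         3: ("A", ["b"]),
         4: ("B", ["d"])}

def run(tokens, script):
    stack, inp = ["$"], tokens + ["$"]
    for act in script:
        if act == "shift":
            stack.append(inp.pop(0))
        else:                                    # act is a production number
            head, body = prods[act]
            assert stack[-len(body):] == body    # the handle sits on top of the stack
            del stack[-len(body):]
            stack.append(head)
    return stack, inp

# shift a, shift b, A->b, shift b, shift c, A->Abc, shift d, B->d, shift e, S->aABe
script = ["shift", "shift", 3, "shift", "shift", 2, "shift", 4, "shift", 1]
final_stack, remaining = run(list("abbcde"), script)
```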
Handles
• Informally, a handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.
• Formally, a handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Handles (cont.)
• If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw.
• Note:
1. The string w to the right of the handle contains only terminal symbols.
2. If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle; otherwise, some right-sentential forms may have more than one handle.
Example
• Consider the following ambiguous grammar:
E → E+E | E*E | (E) | id
• Two rightmost derivations of id1+id2*id3:
1. E ⇒rm E+E ⇒rm E+E*E ⇒rm E+E*id3 ⇒rm E+id2*id3 ⇒rm id1+id2*id3
– id1 is a handle of the right-sentential form id1+id2*id3
– By E → id, replacing id1 by E gives E+id2*id3
2. E ⇒rm E*E ⇒rm E*id3 ⇒rm E+E*id3 ⇒rm E+id2*id3 ⇒rm id1+id2*id3
– so the right-sentential form E+E*id3 has two possible handles
Handles (cont.)
• The handle represents the leftmost complete subtree consisting of a node and all its children.
• Reducing β to A in αβw can be thought of as “pruning the handle”: removing the children of A from the parse tree.
Stack Implementation of Shift-Reduce Parsing
• We implement shift-reduce parsing by using a stack to hold grammar symbols and an input buffer to hold input string w.
• We use $ to mark the end of the stack and the input buffer
STACK INPUT
$ w$
Yu-Chen Kuo 81
Stack Implementation of Shift-Reduce Parsing
• The parser shifts input symbols onto the stack until a handle β is on top of the stack.
• It then reduces β to the left side of the production A → β.
• It repeats this cycle until an error occurs or the stack contains S and the input buffer is empty:

STACK INPUT
$S $
Example
Conflicts during Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce parsing cannot be used.
• It is possible to reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict).
Example of Shift/Reduce Conflict
stmt → if expr then stmt
| if expr then stmt else stmt
| other
STACK INPUT
$ … if expr then stmt else …$
• Note that if we resolve the conflict in favor of shifting, the parser will behave naturally.
Example of Reduce/Reduce Conflict
(1) stmt → id (parameter_list)
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id (expr_list)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

STACK INPUT
$ … id ( id , id ) …$
LR Parsers
• LR(k) parsing is an efficient, bottom-up parsing technique.
• The “L” stands for left-to-right scanning of the input, the “R” for constructing a rightmost derivation in reverse, and the “k” for the number of input symbols of lookahead that are used in making parsing decisions.
• When (k) is omitted, k is assumed to be 1.
LR Parsers (cont.)
• LR parsing can be used to parse a larger class of context-free grammars than LL parsing.
• The principal drawback of LR parsing is that it is too much work to construct an LR parser by hand for a typical programming-language grammar.
• We need a specialized tool, an LR parser generator. Fortunately, many such generators are available.
The LR Parsing Algorithm
• The schematic form of an LR parser:
The LR Parsing Algorithm (cont.)
• An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The driver program is the same for all LR parsers.
• The parsing table changes from one parser to another.
• The parsing program reads tokens from an input buffer one at a time.
The LR Parsing Algorithm (cont.)
• The parser uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state symbol. Each state symbol summarizes the information contained in the stack below it.
• The combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce decision.
The LR Parsing Algorithm (cont.)
• The parsing table consists of two parts: a parsing action function and a goto function.
• An action table entry can have one of four values:
1. shift s, where s is a state
2. reduce by a grammar production A → β
3. accept
4. error
• The function goto takes a state and a grammar symbol as arguments and produces a state.
The LR Parsing Algorithm (cont.)
• A configuration of an LR parser is a pair whose first component is the stack contents and whose second component is the unexpended input:
(s0X1s1X2s2…Xmsm , ai ai+1…an$)
• The next move of the parser is determined by reading ai, the current input symbol, and sm, the state on top of the stack, and then consulting the parsing action table entry action[sm, ai]
The LR Parsing Algorithm (cont.)
• The configurations resulting from each of the four types of move are as follows:
1. If action[sm , ai ] = shift s, the parser executes a shift move, entering the configuration
(s0X1s1X2s2…Xmsm ai s , ai+1…an$)
Here the parser has shifted both the current input symbol ai, and the next state s, which is given in action[sm , ai ], onto the stack; ai+1 becomes the current input symbol.
The LR Parsing Algorithm (cont.)
2. If action[sm , ai ] = reduce A → β, the parser executes a reduce move, entering the configuration

(s0X1s1X2s2…Xm-rsm-r A s, ai ai+1…an$)

where s = goto[sm-r, A] and r is the length of β.

Here the parser first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm-r. The parser then pushed both A and s, the entry for goto[sm-r, A], onto the stack.
The LR Parsing Algorithm (cont.)
3. If action[sm , ai ] = accept, parsing is completed.
4. If action[sm , ai ] = error, the parser has discovered an error and calls an error recovery routine.
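The four moves above fit in a short driver loop. This sketch (our own illustration) hard-codes the canonical LR table for the grammar S → CC, C → cC | d that is constructed later in these slides; ("s", n) means shift to state n and ("r", p) means reduce by production p:

```python
# LR driver for S -> CC, C -> cC | d with a hand-coded table.
prods = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}   # production: head, body length
action = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4), (1, "$"): ("acc", None),
    (2, "c"): ("s", 6), (2, "d"): ("s", 7),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", 3), (4, "d"): ("r", 3), (5, "$"): ("r", 1),
    (6, "c"): ("s", 6), (6, "d"): ("s", 7), (7, "$"): ("r", 3),
    (8, "c"): ("r", 2), (8, "d"): ("r", 2), (9, "$"): ("r", 2),
}
goto = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def lr_parse(tokens):
    stack, inp, out = [0], tokens + ["$"], []
    while True:
        s, a = stack[-1], inp[0]
        kind, arg = action[(s, a)]      # a KeyError here would signal a syntax error
        if kind == "s":                 # shift: push a and the next state
            stack += [a, arg]
            inp.pop(0)
        elif kind == "r":               # reduce: pop 2*|body| entries, then goto
            head, n = prods[arg]
            del stack[-2 * n:]
            stack += [head, goto[(stack[-1], head)]]
            out.append(arg)             # emit the production used
        else:
            return out                  # accept
```

The returned production numbers give the rightmost derivation in reverse.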
The LR Parsing Program
Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
Example (cont.)
Example (cont.)
See p.220
Constructing LR Parsing Tables
• There are three methods for constructing an LR parsing table for a grammar.
(1) Simple LR (SLR) is the easiest to implement, but least powerful.
(2) Canonical LR is the most powerful, and the most expensive.
(3) Lookahead LR(LALR) is intermediate in power and cost.
Constructing SLR Parsing Tables
1) FOLLOW(A) for every nonterminal A in G.
2) The augmented grammar G’
3) The canonical collection of sets of LR(0) items C
4) The transition diagram for viable prefixes
5) The parsing table action and goto function
Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
Step 1: FOLLOW sets for Nonterminals
• FOLLOW(E) = { +, ), $}
• FOLLOW(T)=FOLLOW(F)={+,*,),$}
Step 2: The Augmented Grammar
• If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' → S.
• The augmented grammar is as follows:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
Step 3: Sets of LR(0) Items
• An LR(0) item( item for short) of a grammar G is a production of G with a dot at some position of the right side.
• The production A → XYZ yields the four items:
A → •XYZ, A → X•YZ, A → XY•Z, A → XYZ•
• The production A → ε generates only one item, A → •
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of items constructed from I by the following two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production in G, then add the item B → •γ to closure(I), if it is not already there.
• We apply rule 2 until no more new items can be added to closure(I).
Example
• If I is the set containing the one item {[E' → •E]}, then closure(I) contains the items:
E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
The Goto Operation
• If I is a set of items and X is a grammar symbol, then goto(I, X) is the closure of the set of all items [A → αX•β] such that [A → α•Xβ] is in I.
The Goto Operation
• If I is the set of two items {[E' → E•], [E → E•+T]}, then goto(I, +) consists of:
E → E + •T
T → •T * F
T → •F
F → •(E)
F → •id
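Both operations are small enough to sketch in code. In this illustrative version (our own representation, not the slides'), an item is a triple (head, body, dot position) over the augmented expression grammar:

```python
# closure() and goto() for LR(0) items; an item is (head, body, dot).
G = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    closed = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in G:      # dot sits before a nonterminal B
            for prod in G[body[dot]]:
                item = (body[dot], prod, 0)         # add B -> .gamma
                if item not in closed:
                    closed.add(item)
                    work.append(item)
    return closed

def goto(I, X):
    # advance the dot over X in every item of I that has the dot before X
    moved = {(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})    # the 7 items listed on the previous slide
I1 = goto(I0, "E")                   # {E' -> E., E -> E.+T}
```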
The Sets-of-Items Construction
• The algorithm to construct C, the canonical collection of sets of LR(0) items (all possible items) for an augmented grammar G', is shown below.
Example
• closure({[E' → •E]}) = I0:
E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I0, E) = I1:
E' → E•
E → E• + T
Example (cont.)
• goto(I0, T) = I2:
E → T•
T → T• * F
• goto(I0, F) = I3:
T → F•
• goto(I0, () = I4:
F → (•E)
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I0, id) = I5:
F → id•
Example (cont.)
• goto(I1, +) = I6:
E → E + •T
T → •T * F
T → •F
F → •(E)
F → •id
• goto(I2, *) = I7:
T → T * •F
F → •(E)
F → •id
Example (cont.)
• goto(I4, E) = I8:
F → (E•)
E → E• + T
• goto(I4, T) = I2
• goto(I4, F) = I3
• goto(I4, () = I4
• goto(I4, id) = I5
• goto(I6, T) = I9:
E → E + T•
T → T• * F
Example (cont.)
• goto(I6, F) = I3
• goto(I6, () = I4
• goto(I6, id) = I5
• goto(I7, F) = I10:
T → T * F•
• goto(I7, () = I4
• goto(I7, id) = I5
• goto(I8, )) = I11:
F → (E)•
• goto(I8, +) = I6
• goto(I9, *) = I7
Step 4: The Transition Diagram
• The goto function for the canonical collection of sets of items can be shown as a transition diagram.
Step 5: The Parsing Table
• State i is constructed from Ii.
1) The parsing actions for state i are determined as follows:
a) If [A → α•aβ] is in Ii and goto(Ii, a) = Ij, set action[i, a] to “shift j”. Here a must be a terminal.
b) If [A → α•] is in Ii, set action[i, a] to “reduce A → α” for all a in FOLLOW(A). Here A must not be S'.
c) If [S' → S•] is in Ii, set action[i, $] to “accept”.
Step 5: The Parsing Table (cont.)
2) The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
3) All entries not defined by rules 1) and 2) are set to “error”.
4) The start state of the parser is the one constructed from the set of items containing [S' → •S].
Step 5: The Parsing Table (cont.)
Example: an unambiguous grammar that is not SLR(1)
• S → L = R
• S → R
• L → * R
• L → id
• R → L
Example: an unambiguous grammar that is not SLR(1) (cont.)
Example: an unambiguous grammar that is not SLR(1) (cont.)
• Consider the set of items I2:
I2: S → L•=R
R → L•
– The first item gives action[2, =] = “shift 6”
– FOLLOW(R) contains =, so the second item sets action[2, =] = “reduce R → L”
• A shift/reduce conflict, although the grammar is not ambiguous
• In fact, no right-sentential form begins with R=…; the reduction is wrong when the viable prefix on the stack is just L, as opposed to *L
• Canonical LR splits such states according to the actual lookaheads rather than the whole FOLLOW set
Constructing Canonical LR Parsing Tables
• An LR(1) item is of the form [A → α•β, a], where A → αβ is a production and a is a terminal or $.
• The “1” refers to the length of the second component, called the lookahead of the item.
• The lookahead has no effect in an item of the form [A → α•β, a] where β is not ε, but an item of the form [A → α•, a] calls for a reduction by A → α only if the next input symbol is a.
Constructing Canonical LR Parsing Tables (cont.)
• Thus, we are compelled to reduce by A → α only on those input symbols a for which [A → α•, a] is an LR(1) item in the state on top of the stack.
• The set of such a's is always a subset of FOLLOW(A), but it may be a proper subset.
• The method for constructing the collection of sets of LR(1) items is essentially the same as the way we built the collection of sets of LR(0) items; we need only modify the two procedures closure and goto.
Constructing Canonical LR Parsing Tables (the closure function)
Constructing Canonical LR Parsing Tables (the items and goto functions)
Example
• Consider the following augmented grammar:
S' → S
S → CC
C → cC | d
• closure({[S' → •S, $]}) = I0:
S' → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d
• goto(I0, S) = I1:
S' → S•, $
Example (cont.)
• goto(I0, C) = I2:
S → C•C, $
C → •cC, $
C → •d, $
• goto(I0, c) = I3:
C → c•C, c/d
C → •cC, c/d
C → •d, c/d
• goto(I0, d) = I4:
C → d•, c/d
• goto(I2, C) = I5:
S → CC•, $
Example (cont.)
• goto(I2, c) = I6:
C → c•C, $
C → •cC, $
C → •d, $
• goto(I2, d) = I7:
C → d•, $
• goto(I3, C) = I8:
C → cC•, c/d
• goto(I3, c) = I3
• goto(I3, d) = I4
• goto(I6, C) = I9:
C → cC•, $
• goto(I6, c) = I6
• goto(I6, d) = I7
Example (Transition Diagram)
• Compare I6 and I3: they contain the same items apart from their lookaheads
Example (Parsing Table)
Yu-Chen Kuo 135
Canonical LR Parser vs. SLR Parser
• Every SLR(1) grammar is an LR(1) grammar.
• A canonical LR parser may have more states than an SLR parser built from the same grammar.
• Exercise: check whether the following grammar is an LR(1) grammar:
S → L = R
S → R
L → * R
L → id
R → L