Winter 2012-2013
Compiler Principles
Syntax Analysis (Parsing) – Part 2
Mayer Goldberg and Roman Manevich
Ben-Gurion University
Today
• Review top-down parsing
  – Recursive descent
• LL(1) parsing
• Start bottom-up parsing
  – LR(k)
  – SLR(k)
  – LALR(k)
Top-down parsing
• Parser repeatedly faces the following problem
  – Given program P starting with terminal t
  – Non-terminal N with list of possible production rules: N → α1 … N → αk
• Predict which rule should be used to derive P
Recursive descent parsing
• Define a function for every nonterminal
• Every function works as follows
  – Find applicable production rule
  – Terminal function checks match with next input token
  – Nonterminal function calls (recursively) other functions
• If there are several applicable productions for a nonterminal, use lookahead
Boolean expressions example
Grammar:
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor

Input: not ( not true or false )

Derivation (production to apply known from next token):
E => not E
  => not ( E OP E )
  => not ( not E OP E )
  => not ( not LIT OP E )
  => not ( not true OP E )
  => not ( not true or E )
  => not ( not true or LIT )
  => not ( not true or false )

[Figure: parse tree of the derivation, with root E, children "not E", then "( E OP E )", and leaves not, true, or, false]
Flavors of top-down parsers
• Manually constructed
  – Recursive descent (previous lecture, review now)
• Generated (this lecture)
  – Based on pushdown automata
  – Does not use recursion
Matching tokens
• Variable current holds the current input token

match(token t) {
  if (current == t)
    current = next_token();
  else
    error;
}

Grammar:
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
Functions for nonterminals
E() {
  if (current ∈ {TRUE, FALSE})        // E → LIT
    LIT();
  else if (current == LPAREN) {       // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {        // E → not E
    match(NOT); E();
  } else
    error;
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}
Implementation via recursion
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor

E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) {
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {
    match(NOT); E();
  } else error;
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
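The recursive functions above can be sketched as runnable Python. This is a minimal illustration, assuming the input is already split into whitespace-separated tokens; the names Parser, ParseError and accepts, and the "$" end marker, are choices made for this sketch and do not come from the slides.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    # Grammar: E -> LIT | ( E OP E ) | not E ; LIT -> true | false ; OP -> and | or | xor
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Terminal check: consume the token if it is the expected one.
        if self.current == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.current!r}")

    def E(self):
        if self.current in ("true", "false"):   # E -> LIT
            self.LIT()
        elif self.current == "(":               # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":             # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"expected a literal, found {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"expected an operator, found {self.current!r}")


def accepts(text):
    """Return True if the whole token string derives from E."""
    parser = Parser(text.split())
    try:
        parser.E()
        parser.match("$")  # all input must be consumed
        return True
    except ParseError:
        return False
```

For example, accepts("not ( not true or false )") follows exactly the derivation shown on the Boolean expressions slide, choosing each production from the next token.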
How is prediction done?
• For simplicity, let's assume no null production rules
  – See book for general case (p. 189)
• Find out the token that can appear first in a rule – FIRST sets
FIRST sets
• For every production rule A → α
  – FIRST(α) = all terminals that α can start with
  – Every token that can appear as first in α under some derivation for α
• In our Boolean expressions example
  – FIRST( LIT ) = { true, false }
  – FIRST( ( E OP E ) ) = { '(' }
  – FIRST( not E ) = { not }
• No intersection between FIRST sets => can always pick a single rule
• If the FIRST sets intersect, may need longer lookahead
  – LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens
  – LL(1) is an important and useful class
Computing FIRST sets
• Assume no null productions A → ε
  1. Initially, for all nonterminals A, set
     FIRST(A) = { t | A → tω for some ω }
  2. Repeat the following until no changes occur:
     for each nonterminal A
       for each production A → Bω
         set FIRST(A) = FIRST(A) ∪ FIRST(B)
• This is known as fixed-point computation
FIRST sets computation example
Grammar (used on all of the following steps):
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant

1. Initialization
FIRST(TERM) = { id, constant }
FIRST(EXPR) = { zero?, not, ++, -- }
FIRST(STMT) = { if, while }

2. Iterate 1
FIRST(TERM) = { id, constant }
FIRST(EXPR) = { zero?, not, ++, -- }
FIRST(STMT) = { if, while, zero?, not, ++, -- }

2. Iterate 2
FIRST(TERM) = { id, constant }
FIRST(EXPR) = { zero?, not, ++, --, id, constant }
FIRST(STMT) = { if, while, zero?, not, ++, -- }

2. Iterate 3 – fixed-point
FIRST(TERM) = { id, constant }
FIRST(EXPR) = { zero?, not, ++, --, id, constant }
FIRST(STMT) = { if, while, zero?, not, ++, --, id, constant }
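The fixed-point computation traced above can be sketched in Python. This is only an illustration under the slides' assumption of no null productions; the dictionary encoding of the grammar and the name first_sets are assumptions of this sketch. The order in which sets grow may differ from the iterations shown on the slides, but the fixed point is the same.

```python
def first_sets(grammar):
    """grammar maps each nonterminal to a list of alternatives (lists of symbols).
    Assumes no null (epsilon) productions, as on the slides."""
    first = {a: set() for a in grammar}

    # Step 1: FIRST(A) = { t | A -> t w } for terminals t.
    for a, alternatives in grammar.items():
        for alt in alternatives:
            if alt[0] not in grammar:
                first[a].add(alt[0])

    # Step 2: for each A -> B w, add FIRST(B) to FIRST(A) until nothing changes.
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for alt in alternatives:
                b = alt[0]
                if b in grammar and not first[b] <= first[a]:
                    first[a] |= first[b]
                    changed = True
    return first


GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"], ["not", "EXPR"],
             ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}

FIRST = first_sets(GRAMMAR)
```

Here FIRST["STMT"] ends up containing if, while and everything in FIRST["EXPR"], matching the fixed point of the example.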
LL(k) grammars
• A grammar is in the class LL(k) when its sentences can be parsed via:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With lookahead of k tokens (k)
  – For k = 1: for every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions)
• A language is said to be LL(k) when it has an LL(k) grammar
• What can we do if grammar is not LL(k)?
LL(k) parsing via pushdown automata
• Pushdown automaton uses
  – Prediction stack
  – Input stream
  – Transition table
    • nonterminals x tokens -> production alternative
    • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when current input starts with t
LL(k) parsing via pushdown automata
• Two possible moves
  – Prediction
    • When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error
  – Match
    • When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t
    • If (t ≠ T) syntax error
• Parsing terminates when prediction stack is empty
  – If input is empty at that point, success. Otherwise, syntax error
Model of non-recursive predictive parser

[Figure: the predictive parsing program reads the input buffer (a + b $), uses a stack (X Y Z $) and the parsing table, and produces the output]

Example transition table

Rows: nonterminals. Columns: input tokens. Entries: rule numbers.

       (      )      not    true   false  and    or     xor    $
E      2             3      1      1
LIT                         4      5
OP                                        6      7      8

(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Running parser example

Grammar: A → aAb | c
Input: aacbb$

Transition table (which rule should be used):
     a           b    c
A    A → aAb          A → c

Input suffix   Stack content   Move
aacbb$         A$              predict(A,a) = A → aAb
aacbb$         aAb$            match(a,a)
acbb$          Ab$             predict(A,a) = A → aAb
acbb$          aAbb$           match(a,a)
cbb$           Abb$            predict(A,c) = A → c
cbb$           cbb$            match(c,c)
bb$            bb$             match(b,b)
b$             b$              match(b,b)
$              $               match($,$) – success
Illegal input example

Grammar: A → aAb | c
Input: abcbb$

Transition table:
     a           b    c
A    A → aAb          A → c

Input suffix   Stack content   Move
abcbb$         A$              predict(A,a) = A → aAb
abcbb$         aAb$            match(a,a)
bcbb$          Ab$             predict(A,b) = ERROR
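The two moves of the pushdown automaton (prediction and match) can be sketched in Python for the grammar A → aAb | c used in the two examples above. The table encoding and the names TABLE, NONTERMINALS and ll1_parse are assumptions of this sketch, not a generated parser.

```python
# Transition table for A -> aAb | c: (nonterminal, token) -> right-hand side.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}
NONTERMINALS = {"A"}


def ll1_parse(tokens, table=TABLE, start="A"):
    """Return True if the token sequence is accepted, False on a syntax error."""
    inp = list(tokens) + ["$"]   # append the end-of-input marker
    stack = ["$", start]         # prediction stack; the top is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        t = inp[pos]
        if top in NONTERMINALS:
            # Prediction move: replace the nonterminal by the predicted alternative.
            rhs = table.get((top, t))
            if rhs is None:
                return False                 # empty table entry: syntax error
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
        elif top == t:
            pos += 1                         # match move: consume the token
        else:
            return False                     # terminal mismatch: syntax error
    return pos == len(inp)                   # stack empty and input fully consumed
```

Calling ll1_parse("aacbb") goes through the same predict and match moves as the running example, and ll1_parse("abcbb") stops at the empty entry for (A, b).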
Using top-down parsing approach
• Compute parsing table
  – If table is conflict-free then we have an LL(k) parser
• If table contains conflicts investigate
  – If grammar is ambiguous try to disambiguate
  – Try using left-factoring/substitution/left-recursion elimination to remove conflicts
Marking "end-of-file"
• Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G' with a new start non-terminal S' and a new production rule S' → S $, where $ is not part of the set of tokens
• To parse an input P with G' we change it into P$
• Simplifies top-down parsing with null productions and LR parsing
Bottom-up parsing
• No problem with left recursion
• Widely used in practice
• Shift-reduce parsing: LR(k), SLR, LALR
  – All follow the same pushdown-based algorithm
  – Read input left-to-right producing rightmost derivation
  – Differ on type of "LR Items"
• Parser generator CUP implements LALR
Some terminology
• The opposite of derivation is called reduction
  – Let A → α be a production rule
  – Let βAµ be a sentential form
  – Derivation replaces the left-hand side of the rule in the sentential form: βAµ => βαµ
  – Reduction is the reverse step: βαµ is reduced to βAµ
• A handle is a substring that is reduced during a series of steps in a rightmost derivation

Rightmost derivation example

Grammar: E → E + (E) | i
Input: 1 + (2) + (3)

Rightmost derivation in reverse:
1 + (2) + (3)
E + (2) + (3)
E + (E) + (3)
E + (3)
E + (E)
E

[Figure: parse tree for 1 + (2) + (3)] Each non-leaf node represents a handle.
LR item
N → α ● β

• α (left of the dot): already matched
• β (right of the dot): to be matched against the input
• Hypothesis about αβ being a possible handle: so far we've matched α, expecting to see β
LR items
N → α ● β    Shift item
N → αβ ●     Reduce item
Example
Z → expr EOF
expr → term | expr + term
term → ID | ( expr )

Shorthand for the grammar above:
Z → E $
E → T | E + T
T → i | ( E )
Example: parsing with LR items
Grammar:
Z → E $
E → T | E + T
T → i | ( E )

Input: i + i $

Initial item set:
Z → ● E $
E → ● T
E → ● E + T
T → ● i
T → ● ( E )

• Why do we need these additional LR items?
• Where do they come from?
• What do they mean?
ε-closure

• Given a set S of LR(0) items
• If P → α ● Nβ is in S
• then for each rule N → γ in the grammar, S must also contain N → ● γ

Grammar:
Z → E $
E → T | E + T
T → i | ( E )

ε-closure({ Z → ● E $ }) = { Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ) }
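The closure rule above can be sketched as a small worklist algorithm in Python. An LR(0) item is encoded here as a tuple (left-hand side, right-hand side, dot position); this encoding and the names GRAMMAR and closure are assumptions of the sketch.

```python
# Grammar: Z -> E $ ; E -> T | E + T ; T -> i | ( E )
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items, grammar=GRAMMAR):
    """Add N -> . gamma for every item with the dot in front of nonterminal N."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:   # dot is before a nonterminal
            for alternative in grammar[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


START = closure({("Z", ("E", "$"), 0)})
```

START then holds the five items of the example: Z → ● E $, E → ● T, E → ● E + T, T → ● i and T → ● ( E ).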
Example: parsing with LR items

Grammar: Z → E $, E → T | E + T, T → i | ( E )
Input: i + i $

1. Start from the item set { Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ) }. Items denote possible future handles; we remember the position from which we're trying to reduce.
2. Match items with the current token i: T → i ● is a reduce item.
3. Reduce i to T (remaining: T + i $). From the initial set, E → T ● is a reduce item.
4. Reduce T to E (remaining: E + i $).
5. Match E: the new item set is { E → E ● + T, Z → E ● $ }.
6. Match +: the new item set is { E → E + ● T, T → ● i, T → ● ( E ) }.
7. Match i and reduce it to T (remaining: E + T $).
8. Match T: E → E + T ● is a reduce item.
9. Reduce E + T to E (remaining: E $). From the initial set, matching E gives { Z → E ● $, E → E ● + T }.
10. Match $: Z → E $ ● is a reduce item.
11. Reduce E $ to Z.

[Figures: each step shows the stack of item sets, the remaining input, and the partial parse tree built so far]
Computing item sets
• Initial set
  – Z is the start symbol
  – ε-closure({ Z → ● α | Z → α is in the grammar })
• Next set from a set S and the next symbol X
  – step(S,X) = { N → αX ● β | N → α ● Xβ in the item set S }
  – nextSet(S,X) = ε-closure(step(S,X))
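A sketch of step and nextSet in Python, together with a loop that explores all reachable item sets. Items are encoded as (left-hand side, right-hand side, dot position); the function build_automaton and the state numbering it produces are assumptions of this sketch, so its state indices need not coincide with the q0..q9 labels used on the slides.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def step(items, x):
    # Move the dot over x in every item that has the dot right before x.
    return {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == x}


def next_set(items, x):
    return frozenset(closure(step(items, x)))


def build_automaton(start_item):
    """Return (states, edges): edges maps (state index, symbol) to a state index."""
    states = [frozenset(closure({start_item}))]
    edges = {}
    i = 0
    while i < len(states):
        symbols = {r[d] for (l, r, d) in states[i] if d < len(r)}
        for x in sorted(symbols):
            target = next_set(states[i], x)
            if target not in states:
                states.append(target)
            edges[(i, x)] = states.index(target)
        i += 1
    return states, edges


STATES, EDGES = build_automaton(("Z", ("E", "$"), 0))
```

For this grammar the exploration yields ten item sets, the same number of states as the LR(0) automaton of the example.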
LR(0) automaton example
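States (item sets):
q0: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )
q1: Z → E ● $, E → E ● + T
q2: Z → E $ ●   (reduce state)
q3: E → E + ● T, T → ● i, T → ● ( E )
q4: E → E + T ●   (reduce state)
q5: T → i ●   (reduce state)
q6: E → T ●   (reduce state)
q7: T → ( ● E ), E → ● T, E → ● E + T, T → ● i, T → ● ( E )
q8: T → ( E ● ), E → E ● + T
q9: T → ( E ) ●   (reduce state)

Transitions (the remaining states are shift states):
q0: on i to q5, on ( to q7, on E to q1, on T to q6
q1: on + to q3, on $ to q2
q3: on i to q5, on ( to q7, on T to q4
q7: on i to q5, on ( to q7, on E to q8, on T to q6
q8: on + to q3, on ) to q9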
GOTO/ACTION tables

Columns i + ( ) $ E T form the GOTO table; the last column is the ACTION table. An empty entry is an error move.

State  i    +    (    )    $    E    T    action
q0     q5        q7             q1   q6   shift
q1          q3             q2             shift
q2                                        Z → E $
q3     q5        q7                  q4   shift
q4                                        E → E + T
q5                                        T → i
q6                                        E → T
q7     q5        q7             q8   q6   shift
q8          q3        q9                  shift
q9                                        T → ( E )
LR(0) parser tables
• Two types of rows:
  – Shift row – tells which state to GOTO for current token
  – Reduce row – tells which rule to reduce (independent of current token)
    • GOTO entries are blank
LR parser data structures
• Input – remainder of text to be processed
• Stack – sequence of pairs N, qi
  – N – symbol (terminal or non-terminal)
  – qi – state at which decisions are made
• Initial stack contains q0

Example:
input: + i $
stack: q0 i q5
LR(0) pushdown automaton
• Two moves: shift and reduce
• Shift move
  – Remove first token from input
  – Push it on the stack
  – Compute next state based on GOTO table
  – Push new state on the stack
  – If new state is error – report error

Example (row q0 of the table: i → q5, ( → q7, E → q1, T → q6, action shift):
before shift:  input: i + i $   stack: q0
after shift:   input: + i $     stack: q0 i q5
LR(0) pushdown automaton
• Reduce move
  – Using a rule N → α
  – Symbols in α and their following states are removed from stack
  – New state computed based on GOTO table (using top of stack, before pushing N)
  – N is pushed on the stack
  – New state pushed on top of N

Example (reduce T → i; row q0 of the table: i → q5, ( → q7, E → q1, T → q6, action shift):
before reduce:  input: + i $   stack: q0 i q5
after reduce:   input: + i $   stack: q0 T q6
GOTO/ACTION table

State  i    +    (    )    $    E    T
q0     s5        s7             s1   s6
q1          s3             s2
q2     r1   r1   r1   r1   r1   r1   r1
q3     s5        s7                  s4
q4     r3   r3   r3   r3   r3   r3   r3
q5     r4   r4   r4   r4   r4   r4   r4
q6     r2   r2   r2   r2   r2   r2   r2
q7     s5        s7             s8   s6
q8          s3        s9
q9     r5   r5   r5   r5   r5   r5   r5

(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

Warning: numbers mean different things!
rn = reduce using rule number n
sm = shift to state m
GOTO/ACTION table
Parsing i + i $ with the GOTO/ACTION table and rule numbers above (top of the stack is on the right):

Stack                  Input      Action
q0                     i + i $    s5
q0 i q5                + i $      r4
q0 T q6                + i $      r2
q0 E q1                + i $      s3
q0 E q1 + q3           i $        s5
q0 E q1 + q3 i q5      $          r4
q0 E q1 + q3 T q4      $          r3
q0 E q1                $          s2
q0 E q1 $ q2                      r1
q0 Z
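The shift and reduce moves can be sketched as a table-driven loop in Python, using the table for rules (1) to (5) above. The dictionaries NEXT, REDUCE and RULES are a hand-written encoding of that table for this sketch, and the choice to stop as soon as rule (1) is reduced is an assumption about how acceptance is detected.

```python
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("Z", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}

# Shift rows: state -> {symbol: next state} (the "sm" entries of the table).
NEXT = {
    0: {"i": 5, "(": 7, "E": 1, "T": 6},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7, "T": 4},
    7: {"i": 5, "(": 7, "E": 8, "T": 6},
    8: {"+": 3, ")": 9},
}

# Reduce rows: state -> rule number (the "rn" entries, independent of the token).
REDUCE = {2: 1, 4: 3, 5: 4, 6: 2, 9: 5}


def lr0_parse(tokens):
    """Return the list of actions taken, or None on a syntax error."""
    inp = list(tokens) + ["$"]
    stack = [0]          # states and symbols alternate; the top is on the right
    pos = 0
    actions = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            n = REDUCE[state]
            lhs, size = RULES[n]
            actions.append(f"r{n}")
            del stack[len(stack) - 2 * size:]    # pop symbols and their states
            if lhs == "Z":
                return actions                   # reduced to the start symbol
            goto = NEXT[stack[-1]].get(lhs)
            if goto is None:
                return None
            stack += [lhs, goto]
        else:
            if pos >= len(inp):
                return None
            token = inp[pos]
            nxt = NEXT[state].get(token)
            if nxt is None:
                return None                      # empty entry: error move
            actions.append(f"s{nxt}")
            stack += [token, nxt]
            pos += 1
```

lr0_parse(["i", "+", "i"]) returns the same action sequence as the trace above.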
Are we done?
• Can make a transition diagram for any grammar
• Can make a GOTO table for every grammar
• Cannot make a deterministic ACTION table for every grammar
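LR(0) conflicts

Grammar: Z → E $, E → T | E + T, T → i | ( E ) | i[E]

q0: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ), T → ● i[E]
q5 (reached from q0 on i): T → i ●, T → i ● [E]

Shift/reduce conflict in q5: T → i ● is a reduce item and T → i ● [E] is a shift item.
(Transitions from q0 on T, ( and E lead to further states: …)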
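LR(0) conflicts

Grammar: Z → E $, E → T | E + T, T → i | ( E ), V → i

q0: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ), …
q5 (reached from q0 on i): T → i ●, V → i ●

Reduce/reduce conflict in q5: both T → i ● and V → i ● are reduce items.
(Transitions from q0 on T, ( and E lead to further states: …)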
LR(0) conflicts
• Any grammar with an ε-rule cannot be LR(0)
• Inherent shift/reduce conflict
  – A → ε ● – reduce item
  – P → α ● Aβ – shift item
  – A → ε ● can always be predicted from P → α ● Aβ
Back to the GOTO/ACTIONS tables
(Same GOTO/ACTION table as shown earlier for states q0 to q9: GOTO columns i + ( ) $ E T, and one ACTION column with shift for q0, q1, q3, q7, q8 and the reductions Z → E $, E → E + T, T → i, E → T, T → ( E ) for q2, q4, q5, q6, q9.)

• ACTION table determined only by transition diagram, ignores input
SLR grammars

• A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N
• A reduce item N → α ● is applicable only when the look-ahead is in FOLLOW(N)
• Differs from LR(0) only on the ACTION table
SLR ACTION table
(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

The ACTION table is indexed by state and by the look-ahead token from the input (columns i + ( ) $):

q0: shift on i, (
q1: shift on +, $
q2: Z → E $
q3: shift on i, (
q4: E → E + T on +, ), $
q5: T → i on +, ), $
q6: E → T on +, ), $
q7: shift on i, (
q8: shift on +, )
q9: T → ( E ) on +, ), $

Remember: in contrast, the GOTO table is indexed by state and a grammar symbol from the stack
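SLR ACTION table

Grammar: … as before … T → i, T → i[E]

SLR – use 1 token look-ahead (columns i + ( ) [ ] $):
q0: shift (two entries)
q1: shift (two entries)
q2: Z → E $
q3: shift (two entries)
q4: E → E + T (three entries)
q5: T → i (three entries), shift on [
q6: E → T (three entries)
q7: shift (two entries)
q8: shift (two entries)
q9: T → ( E ) (three entries)

vs.

LR(0) – no look-ahead (one action per state):
q0: shift
q1: shift
q2: Z → E $
q3: shift
q4: E → E + T
q5: T → i
q6: E → T
q7: shift
q8: shift
q9: T → ( E )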
Are we done?
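(0) S’ → S
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L

LR(0) item sets of the automaton (states q0 to q9 in the figure):
{ S’ → ● S, S → ● L = R, S → ● R, L → ● * R, L → ● id, R → ● L }
{ S’ → S ● }
{ S → L ● = R, R → L ● }
{ S → R ● }
{ L → * ● R, R → ● L, L → ● * R, L → ● id }
{ L → id ● }
{ S → L = ● R, R → ● L, L → ● * R, L → ● id }
{ L → * R ● }
{ R → L ● }
{ S → L = R ● }

[Figure: LR(0) automaton over these item sets, with transitions labeled S, L, R, id, * and =]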
shift/reduce conflict

• S → L ● = R vs. R → L ●
• FOLLOW(R) contains =
  – S ⇒ L = R ⇒ * R = R
• SLR cannot resolve the conflict either

q2: S → L ● = R, R → L ●
q6: S → L = ● R, R → ● L, L → ● * R, L → ● id
q2 goes to q6 on =
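The claim that FOLLOW(R) contains = can be checked with a small FOLLOW computation. This is a sketch that assumes no null productions and uses "$" as the end marker placed in FOLLOW(S'); the names first_sets and follow_sets and the grammar encoding are assumptions of the sketch.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first_sets(g):
    first = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, alternatives in g.items():
            for alt in alternatives:
                new = first[alt[0]] if alt[0] in g else {alt[0]}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(g, start="S'"):
    first = first_sets(g)
    follow = {a: set() for a in g}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, alternatives in g.items():
            for alt in alternatives:
                for k, sym in enumerate(alt):
                    if sym not in g:
                        continue                   # only nonterminals have FOLLOW
                    if k + 1 < len(alt):
                        nxt = alt[k + 1]           # what can start the next symbol
                        new = first[nxt] if nxt in g else {nxt}
                    else:
                        new = follow[a]            # sym ends the rule: inherit FOLLOW(a)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR)
```

Since = is in FOLLOW["R"], the reduce item R → L ● in q2 is applicable on look-ahead =, which is also the token shifted by S → L ● = R.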
LR(1) grammars
• In SLR: a reduce item N → α ● is applicable only when the look-ahead is in FOLLOW(N)
• But FOLLOW(N) merges look-ahead for all alternatives for N
• LR(1) keeps look-ahead with each LR item
• Idea: a more refined notion of follows computed per item
LR(1) item
• LR(1) item is a pair
  – LR(0) item
  – Look-ahead token
• Meaning
  – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.
• Example
  – The production L → id yields the following LR(1) items
    [L → ● id, *]
    [L → ● id, =]
    [L → ● id, id]
    [L → ● id, $]
    [L → id ●, *]
    [L → id ●, =]
    [L → id ●, id]
    [L → id ●, $]

Grammar:
(0) S’ → S
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
ε-closure for LR(1)

• For every [A → α ● Bβ , c] in S
  – for every production B → δ and every token b in the grammar such that b ∈ FIRST(βc)
  – Add [B → ● δ , b] to S
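The LR(1) closure rule can be sketched in Python for the grammar S’ → S, S → L = R | R, L → * R | id, R → L. An LR(1) item is encoded as (left-hand side, right-hand side, dot position, look-ahead); the sketch assumes no null productions, so FIRST(βc) is determined by the first symbol of βc. The names closure1 and first_sets are assumptions of the sketch.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first_sets(g):
    first = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, alternatives in g.items():
            for alt in alternatives:
                new = first[alt[0]] if alt[0] in g else {alt[0]}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def closure1(items, g=GRAMMAR):
    first = first_sets(g)
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot, c = worklist.pop()
        if dot < len(rhs) and rhs[dot] in g:       # item [A -> alpha . B beta, c]
            beta_c = rhs[dot + 1:] + (c,)
            head = beta_c[0]
            # FIRST(beta c): with no null productions only the first symbol matters.
            lookaheads = first[head] if head in g else {head}
            for delta in g[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], delta, 0, b)  # [B -> . delta, b]
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


Q0 = closure1({("S'", ("S",), 0, "$")})
```

Q0 contains eight items: the L items appear with look-ahead = (from S → ● L = R) and with look-ahead $ (from R → ● L), which is the per-item refinement that FOLLOW sets cannot express.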
LR(1) item sets of the automaton (states q0 to q13 in the figure; the first set is the initial state):
{ (S’ → ● S, $), (S → ● L = R, $), (S → ● R, $), (L → ● * R, =), (L → ● id, =), (R → ● L, $), (L → ● id, $), (L → ● * R, $) }
{ (S’ → S ●, $) }
{ (S → L ● = R, $), (R → L ●, $) }
{ (S → R ●, $) }
{ (L → * ● R, =), (R → ● L, =), (L → ● * R, =), (L → ● id, =), (L → * ● R, $), (R → ● L, $), (L → ● * R, $), (L → ● id, $) }
{ (L → id ●, $), (L → id ●, =) }
{ (S → L = ● R, $), (R → ● L, $), (L → ● * R, $), (L → ● id, $) }
{ (L → * R ●, =), (L → * R ●, $) }
{ (R → L ●, =), (R → L ●, $) }
{ (S → L = R ●, $) }
{ (L → * ● R, $), (R → ● L, $), (L → ● * R, $), (L → ● id, $) }
{ (L → id ●, $) }
{ (R → L ●, $) }
{ (L → * R ●, $) }

[Figure: LR(1) automaton over these item sets, with transitions labeled S, L, R, id, * and =]
Back to the conflict
• Is there a conflict now?
q2: (S → L ● = R , $), (R → L ● , $)
q6: (S → L = ● R , $), (R → ● L , $), (L → ● * R , $), (L → ● id , $)
q2 goes to q6 on =
LALR
• LR tables have large number of entries
• Often don't need such refined observation (and cost)
• LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts
• LALR not as powerful as LR(1)
See you next time