Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Preview:

DESCRIPTION

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing. - PowerPoint PPT Presentation

Citation preview

Winter 2012-2013Compiler Principles

Syntax Analysis (Parsing) – Part 2Mayer Goldberg and Roman Manevich

Ben-Gurion University

2

Today

• Review top-down parsing– Recursive descent

• LL(1) parsing• Start bottom-up parsing– LR(k)– SLR(k)– LALR(k)

3

Top-down parsing

• Parser repeatedly faces the following problem– Given program P starting with terminal t– Non-terminal N with list of possible production

rules: N α1 … N αk

• Predict which rule can should be used to derive P

4

Recursive descent parsing

• Define a function for every nonterminal• Every function work as follows– Find applicable production rule– Terminal function checks match with next input

token– Nonterminal function calls (recursively) other

functions• If there are several applicable productions for

a nonterminal, use lookahead

5

Boolean expressions example

not ( not true or false )

E => not E => not ( E OP E ) =>not ( not E OP E ) =>not ( not LIT OP E ) =>not ( not true OP E ) =>not ( not true or E ) =>not ( not true or LIT ) =>not ( not true or false )

not E

E

( E OP E )

not LIT or LIT

true false

production to apply known from next token

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

6

Flavors of top-down parsers

• Manually constructed– Recursive descent (previous lecture, review now)

• Generated (this lecture)– Based on pushdown automata– Does not use recursion

7

Recursive descent parsing

• Define a function for every nonterminal• Every function work as follows– Find applicable production rule– Terminal function checks match with next input

token– Nonterminal function calls (recursively) other

functions• If there are several applicable productions for

a nonterminal, use lookahead

8

Matching tokens

• Variable current holds the current input token

match(token t) { if (current == t) current = next_token() else error}

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

9

Functions for nonterminals

E() { if (current {TRUE, FALSE}) // E LIT LIT(); else if (current == LPAREN) // E ( E OP E ) match(LPAREN); E(); OP(); E(); match(RPAREN); else if (current == NOT) // E not E match(NOT); E(); else error;}

LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error;}

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

Implementation via recursion

E → LIT | ( E OP E ) | not ELIT → true | falseOP → and | or | xor

E() {if (current {TRUE, FALSE}) LIT();else if (current == LPAREN) match(LPARENT);

E(); OP(); E();match(RPAREN);

else if (current == NOT) match(NOT); E();else error;

}

LIT() {if (current == TRUE) match(TRUE);else if (current == FALSE) match(FALSE);else error;

}

OP() {if (current == AND) match(AND);else if (current == OR) match(OR);else if (current == XOR) match(XOR);else error;

}10

11

How is prediction done?

• For simplicity, let’s assume no null production rules– See book for general case

• Find out the token that can appear first in a rule – FIRST sets

p. 189

FIRST sets• For every production rule Aα

– FIRST(α) = all terminals that α can start with – Every token that can appear as first in α under some derivation for α

• In our Boolean expressions example– FIRST( LIT ) = { true, false }– FIRST( ( E OP E ) ) = { ‘(‘ }– FIRST( not E ) = { not }

• No intersection between FIRST sets => can always pick a single rule

• If the FIRST sets intersect, may need longer lookahead– LL(k) = class of grammars in which production rule can be determined

using a lookahead of k tokens– LL(1) is an important and useful class

12

13

Computing FIRST sets

• Assume no null productions A 1. Initially, for all nonterminals A, set

FIRST(A) = { t | A tω for some ω }2. Repeat the following until no changes occur:

for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) FIRST(B)∪

• This is known as fixed-point computation

14

FIRST sets computation example

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

15

1. Initialization

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMTidconstant

zero?Not++--

ifwhile

16

2. Iterate 1

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMTidconstant

zero?Not++--

ifwhile

zero?Not++--

17

2. Iterate 2

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMTidconstant

zero?Not++--

ifwhile

idconstant

zero?Not++--

18

2. Iterate 3 – fixed-point

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMTidconstant

zero?Not++--

ifwhile

idconstant

zero?Not++--

idconstant

LL(k) grammars• A grammar is in the class LL(K) when it can be derived via:– Top-down derivation– Scanning the input from left to right (L)– Producing the leftmost derivation (L)– With lookahead of k tokens (k)– For every two productions Aα and Aβ we have FIRST(α) ∩

FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions)

• A language is said to be LL(k) when it has an LL(k) grammar

• What can we do if grammar is not LL(k)?

19

• Pushdown automaton uses– Prediction stack– Input stream– Transition table• nonterminals x tokens -> production alternative• Entry indexed by nonterminal N and token t contains

the alternative of N that must be predicated when current input starts with t

LL(k) parsing via pushdown automata

20

21

LL(k) parsing via pushdown automata

• Two possible moves– Prediction

• When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error

– Match• When top of prediction stack is a terminal T, must be equal to

next input token t. If (t == T), pop T and consume t• If (t ≠ T) syntax error

• Parsing terminates when prediction stack is empty– If input is empty at that point, success. Otherwise, syntax

error

22

Model of non-recursivepredictive parser

Predictive Parsing program

Parsing Table

X

Y

Z

$

Stack

$ b + a

Output

( ) not true false and or xor $E 2 3 1 1

LIT 4 5OP 6 7 8

(1) E → LIT(2) E → ( E OP E ) (3) E → not E(4) LIT → true(5) LIT → false(6) OP → and(7) OP → or(8) OP → xor

Non

term

inal

s

Input tokens

Example transition table

23

Which rule should be used

a b c

A A aAb A c

A aAb | caacbb$

Input suffix Stack content Move

aacbb$ A$ predict(A,a) = A aAbaacbb$ aAb$ match(a,a)

acbb$ Ab$ predict(A,a) = A aAbacbb$ aAbb$ match(a,a)

cbb$ Abb$ predict(A,c) = A ccbb$ cbb$ match(c,c)

bb$ bb$ match(b,b)

b$ b$ match(b,b)

$ $ match($,$) – success

Running parser example

24

a b c

A A aAb A c

A aAb | cabcbb$

Input suffix Stack content Move

abcbb$ A$ predict(A,a) = A aAbabcbb$ aAb$ match(a,a)

bcbb$ Ab$ predict(A,b) = ERROR

Illegal input example

25

26

Using top-down parsing approach

• Compute parsing table– If table is conflict-free then we have an LL(k)

parser• If table contains conflicts investigate– If grammar is ambiguous try to disambiguate– Try using left-factoring/substitution/left-recursion

elimination to remove conflicts

27

Marking “end-of-file”• Sometimes it will be useful to transform a

grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’ S $where $ is not part of the set of tokens

• To parse an input P with G’ we change it into P$• Simplifies top-down parsing with null

productions and LR parsing

28

Bottom-up parsing

• No problem with left recursion• Widely used in practice• Shift-reduce parsing: LR(k), SLR, LALR– All follow the same pushdown-based algorithm– Read input left-to-right producing rightmost

derivation– Differ on type of “LR Items”

• Parser generator CUP implements LALR

29

Some terminology

• The opposite of derivation is called reduction– Let Aα be a production rule– Let βAµ be a sentence– Replace left-hand side of rule in sentence:

βAµ => βᵕ A handle is a substring that is reduced during a

series of steps in a rightmost derivation

Rightmost derivation example

1 + (2) + (3)

E + (E) + (3)

+

E E + (E) E i

E

1 2 + 3

E

E + (3)

E

( ) ( )

E + (E)

E

E

E

E + (2) + (3) Each non-leaf node represents a handleRightmost

derivation in reverse

30

LR item

N αβ

Already matched To be matched

Input

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

31

32

LR items

N αβ Shift Item

N αβ Reduce Item

33

Example

Z expr EOFexpr term | expr + termterm ID | ( expr )

Z E $E T | E + TT i | ( E )

(just shorthand of the grammar on the top)

34

Example: parsing with LR items

Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

i + i $

Why do we need these additional LR items?Where do they come from?What do they mean?

35

-closure

• Given a set S of LR(0) items

• If P αNβ is in S• then for each rule N in the grammar

S must also contain N

-closure({Z E $}) = E T, E E + T,T i , T ( E ) }

{ Z E $,Z E $

E T | E + TT i | ( E )

36

Example: parsing with LR items

i + i $

E T E E + TT i T ( E )

Z E $

Z E $E T | E + TT i | ( E )

Items denote possible future handles

Remember position from which we’re trying to reduce

37

Example: parsing with LR items

T i Reduce item!

i + i $

E T E E + TT i T ( E )

Z E $

Z E $E T | E + TT i | ( E )

Match items with current token

38

Example: parsing with LR items

i

E T

Reduce item!

T + i $Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

39

Example: parsing with LR items

T

E T

Reduce item!

i

E + i $Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

40

Example: parsing with LR items

T

i

E + i $Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

E E+ T

Z E$

41

Example: parsing with LR items

T

i

E + i $Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

E E+ T

Z E$ E

E+TT i T ( E )

42

Example: parsing with LR items

E E+ T

Z E$ E

E+TT i T ( E )

E + T $

i

Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

Z E $

T

i

43

E T E E + TT i T ( E )

Z E $

Z E $E T | E + TT i | ( E )

E + T

Example: parsing with LR items

T

i

E E+ T

Z E$ E

E+TT i T ( E )

i

E E+T

$

Reduce item!

44

E T E E + TT i T ( E )

Z E $

E $

Example: parsing with LR items

E

T

i

+ T

Z E$E E+ T

i

Z E $E T | E + TT i | ( E )

45

E T E E + TT i T ( E )

Z E $

E $

Example: parsing with LR items

E

T

i

+ T

Z E$E E+ T

Z E$

Reduce item!

i

Z E $E T | E + TT i | ( E )

46

E T E E + TT i T ( E )

Z E $

Z

Example: parsing with LR items

E

T

i

+ T

Z E$E E+ T

Z E$

Reduce item!

E $

i

Z E $E T | E + TT i | ( E )

47

Computing item sets

• Initial set– Z is in the start symbol – -closure({ Zα | Zα is in the grammar } )

• Next set from a set S and the next symbol X– step(S,X) = { NαXβ | NαXβ in the item set S}– nextSet(S,X) = -closure(step(S,X))

48

LR(0) automaton example

Z E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) Z E$

Z E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6

q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

reduce stateshift state

GOTO/ACTION tables

49

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q1 q3 q2 shift

q2 ZE$q3 q5 q7 q4 Shift

q4 EE+Tq5 Tiq6 ETq7 q5 q7 q8 q6 shift

q8 q3 q9 shift

q9 TE

GOTO TableACTIONTable

empty – error move

50

LR(0) parser tables

• Two types of rows:– Shift row – tells which state to GOTO for current

token– Reduce row – tells which rule to reduce

(independent of current token)• GOTO entries are blank

51

LR parser data structures

• Input – remainder of text to be processed• Stack – sequence of pairs N, qi– N – symbol (terminal or non-terminal)– qi – state at which decisions are made

• Initial stack contains q0

+ i $input

q0stack i q5

52

LR(0) pushdown automaton

• Two moves: shift and reduce• Shift move

– Remove first token from input– Push it on the stack– Compute next state based on GOTO table– Push new state on the stack– If new state is error – report error

i + i $input

q0stack

+ i $input

q0stack

shift

i q5

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

53

LR(0) pushdown automaton

• Reduce move– Using a rule N α– Symbols in α and their following states are removed from stack– New state computed based on GOTO table (using top of stack, before pushing N)– N is pushed on the stack– New state pushed on top of N

+ i $input

q0stack i q5

ReduceT i + i $input

q0stack T q6

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

54

GOTO/ACTION tableState i + ( ) $ E T

q0 s5 s7 s1 s6

q1 s3 s2

q2 r1 r1 r1 r1 r1 r1 r1

q3 s5 s7 s4

q4 r3 r3 r3 r3 r3 r3 r3

q5 r4 r4 r4 r4 r4 r4 r4

q6 r2 r2 r2 r2 r2 r2 r2

q7 s5 s7 s8 s6

q8 s3 s9

q9 r5 r5 r5 r5 r5 r5 r5

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

Warning: numbers mean different things!rn = reduce using rule number nsm = shift to state m

55

GOTO/ACTION table

st i + ( ) $ E Tq0 s5 s7 s1 s6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 s4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 s8 s6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

Stack Input Actionq0 i + i $ s5q0 i q5 + i $ r4q0 T q6 + i $ r2q0 E q1 + i $ s3q0 E q1 + q3 i $ s5q0 E q1 + q3 i q5 $ r4q0 E q1 + q3 T q4

$ r3

q0 E q1 $ s2q0 E q1 $ q2 r1q0 Z

top is on the right

56

Are we done?

• Can make a transition diagram for any grammar

• Can make a GOTO table for every grammar

• Cannot make a deterministic ACTION table for every grammar

57

LR(0) conflicts

Z E $E T E E + TT i T ( E )T i[E]

Z E$E T

E E + TT i

T (E)T i[E] T i

T i[E]

q0

q5

T

(

i

E Shift/reduce conflict

58

LR(0) conflicts

Z E $E T E E + TT i V iT ( E )

Z E$E T

E E + TT i

T (E)T i[E] T i

V i

q0

q5

T

(

i

E reduce/reduce conflict

59

LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)• Inherent shift/reduce conflict– A – reduce item– P αAβ – shift item– A can always be predicted from P αAβ

60

Back to the GOTO/ACTIONS tables

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q1 q3 q2 shift

q2 ZE$q3 q5 q7 q4 Shift

q4 EE+Tq5 Tiq6 ETq7 q5 q7 q8 q6 shift

q8 q3 q9 shift

q9 TE

GOTO TableACTION

Table

ACTION table determined only by transition diagram, ignores input

61

SRL grammars

• A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

• A reduce item N α is applicable only when the look-ahead is in FOLLOW(N)

• Differs from LR(0) only on the ACTION table

62

SLR ACTION table

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

State i + ( ) $

q0 shift shift

q1 shift shift

q2 ZE$q3 shift shift

q4 EE+T EE+T EE+Tq5 Ti Ti Tiq6 ET ET ETq7 shift shift

q8 shift shift

q9 T(E) T(E) T(E)

Lookahead token from the

input

Remember: In contrast, GOTO table is indexed by state and a grammar symbol from the stack

63

SLR ACTION tableState i + ( ) [ ] $

q0 shift shift

q1 shift shift

q2 ZE$q3 shift shift

q4 EE+T EE+T EE+Tq5 Ti Ti shift Tiq6 ET ET ETq7 shift shift

q8 shift shift

q9 T(E) T(E) T(E)

vs.

state action

q0 shift

q1 shift

q2 ZE$q3 Shift

q4 EE+Tq5 Tiq6 ETq7 shift

q8 shift

q9 TE

SLR – use 1 token look-ahead LR(0) – no look-ahead

… as before…T i T i[E]

64

Are we done?

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

65

S’ → SS → L = RS → RL → * RL → idR → L

S’ → S

S → L = RR → L

S → R

L → * RR → LL → * RL → id

L → id

S → L = RR → LL → * RL → id

L → * R

R → L

S → L = R

S

L

R

id

*

=

R

*

id

R

L*

L

id

q0

q4

q7

q1

q3

q9

q6

q8

q2

q5

66

shift/reduce conflict

• S → L = R vs. R → L • FOLLOW(R) contains =

– S L ⇒ = R * R ⇒ = R

• SLR cannot resolve the conflict either

S → L = RR → L

S → L = RR → LL → * RL → id

=q6

q2(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

67

LR(1) grammars

• In SLR: a reduce item N α is applicable only when the look-ahead is in FOLLOW(N)

• But FOLLOW(N) merges look-ahead for all alternatives for N

• LR(1) keeps look-ahead with each LR item• Idea: a more refined notion of follows

computed per item

68

LR(1) item

• LR(1) item is a pair – LR(0) item– Look-ahead token

• Meaning– We matched the part left of the dot, looking to match the part on the

right of the dot, followed by the look-ahead token.• Example

– The production L id yields the following LR(1) items

[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

69

-closure for LR(1)

• For every [A → α ● Bβ , c] in S – for every production B→δ and every token b in

the grammar such that b FIRST(βc) – Add [B → ● δ , b] to S

70

(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )

(S’ → S ∙ , $)

(S → L ∙ = R , $)(R → L ∙ , $)

(S → R ∙ , $)

(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)(L → id ∙ , =)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , =)(L → * R ∙ , $)

(R → L ∙ , =)(R → L ∙ , $)

(S → L = R ∙ , $)

S

L

R

id*

=

R

*id

R

L

*

L

id

q0

q4 q5

q7

q6

q9

q3

q1

q2

q8

(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)

(R → L ∙ , $)

(L → * R ∙ , $)

q11

q12

q10

Rq13

id

71

Back to the conflict

• Is there a conflict now?

(S → L ∙ = R , $)(R → L ∙ , $)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

=

q6

q2

72

LALR

• LR tables have large number of entries• Often don’t need such refined observation

(and cost)• LALR idea: find states with the same LR(0)

component and merge their look-ahead component as long as there are no conflicts

• LALR not as powerful as LR(1)

73

See you next time

Recommended