Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Winter 2012-2013Compiler Principles

Syntax Analysis (Parsing) – Part 2Mayer Goldberg and Roman Manevich

Ben-Gurion University

• Review top-down parsing– Recursive descent

• LL(1) parsing• Start bottom-up parsing– LR(k)– SLR(k)– LALR(k)

Top-down parsing

• Parser repeatedly faces the following problem– Given program P starting with terminal t– Non-terminal N with list of possible production

rules: N α1 … N αk

• Predict which rule can should be used to derive P

Recursive descent parsing

• Define a function for every nonterminal• Every function work as follows– Find applicable production rule– Terminal function checks match with next input

token– Nonterminal function calls (recursively) other

functions• If there are several applicable productions for

a nonterminal, use lookahead

Boolean expressions example

not ( not true or false )

E => not E => not ( E OP E ) =>not ( not E OP E ) =>not ( not LIT OP E ) =>not ( not true OP E ) =>not ( not true or E ) =>not ( not true or LIT ) =>not ( not true or false )

( E OP E )

not LIT or LIT

true false

production to apply known from next token

Flavors of top-down parsers

• Manually constructed– Recursive descent (previous lecture, review now)

• Generated (this lecture)– Based on pushdown automata– Does not use recursion

Recursive descent parsing

• Define a function for every nonterminal• Every function work as follows– Find applicable production rule– Terminal function checks match with next input

token– Nonterminal function calls (recursively) other

functions• If there are several applicable productions for

a nonterminal, use lookahead

Matching tokens

• Variable current holds the current input token

match(token t) { if (current == t) current = next_token() else error}

Functions for nonterminals

E() { if (current {TRUE, FALSE}) // E LIT LIT(); else if (current == LPAREN) // E ( E OP E ) match(LPAREN); E(); OP(); E(); match(RPAREN); else if (current == NOT) // E not E match(NOT); E(); else error;}

LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error;}

Implementation via recursion

E() {if (current {TRUE, FALSE}) LIT();else if (current == LPAREN) match(LPARENT);

E(); OP(); E();match(RPAREN);

else if (current == NOT) match(NOT); E();else error;

LIT() {if (current == TRUE) match(TRUE);else if (current == FALSE) match(FALSE);else error;

OP() {if (current == AND) match(AND);else if (current == OR) match(OR);else if (current == XOR) match(XOR);else error;

How is prediction done?

• For simplicity, let’s assume no null production rules– See book for general case

• Find out the token that can appear first in a rule – FIRST sets

p. 189

FIRST sets• For every production rule Aα

– FIRST(α) = all terminals that α can start with – Every token that can appear as first in α under some derivation for α

• In our Boolean expressions example– FIRST( LIT ) = { true, false }– FIRST( ( E OP E ) ) = { ‘(‘ }– FIRST( not E ) = { not }

• No intersection between FIRST sets => can always pick a single rule

• If the FIRST sets intersect, may need longer lookahead– LL(k) = class of grammars in which production rule can be determined

using a lookahead of k tokens– LL(1) is an important and useful class

Computing FIRST sets

• Assume no null productions A 1. Initially, for all nonterminals A, set

FIRST(A) = { t | A tω for some ω }2. Repeat the following until no changes occur:

for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) FIRST(B)∪

• This is known as fixed-point computation

FIRST sets computation example

TERM EXPR STMT

1. Initialization

TERM EXPR STMTidconstant

zero?Not++--

ifwhile

2. Iterate 1

zero?Not++--

ifwhile

zero?Not++--

2. Iterate 2

zero?Not++--

ifwhile

idconstant

zero?Not++--

2. Iterate 3 – fixed-point

zero?Not++--

ifwhile

idconstant

zero?Not++--

idconstant

LL(k) grammars• A grammar is in the class LL(K) when it can be derived via:– Top-down derivation– Scanning the input from left to right (L)– Producing the leftmost derivation (L)– With lookahead of k tokens (k)– For every two productions Aα and Aβ we have FIRST(α) ∩

FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions)

• A language is said to be LL(k) when it has an LL(k) grammar

• What can we do if grammar is not LL(k)?

• Pushdown automaton uses– Prediction stack– Input stream– Transition table• nonterminals x tokens -> production alternative• Entry indexed by nonterminal N and token t contains

the alternative of N that must be predicated when current input starts with t

LL(k) parsing via pushdown automata

• Two possible moves– Prediction

• When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error

– Match• When top of prediction stack is a terminal T, must be equal to

next input token t. If (t == T), pop T and consume t• If (t ≠ T) syntax error

• Parsing terminates when prediction stack is empty– If input is empty at that point, success. Otherwise, syntax

Model of non-recursivepredictive parser

Predictive Parsing program

Parsing Table

$ b + a

Output

( ) not true false and or xor $E 2 3 1 1

LIT 4 5OP 6 7 8

(1) E → LIT(2) E → ( E OP E ) (3) E → not E(4) LIT → true(5) LIT → false(6) OP → and(7) OP → or(8) OP → xor

Input tokens

Example transition table

Which rule should be used

A A aAb A c

A aAb | caacbb$

Input suffix Stack content Move

aacbb$ A$ predict(A,a) = A aAbaacbb$ aAb$ match(a,a)

acbb$ Ab$ predict(A,a) = A aAbacbb$ aAbb$ match(a,a)

cbb$ Abb$ predict(A,c) = A ccbb$ cbb$ match(c,c)

bb$ bb$ match(b,b)

b$ b$ match(b,b)

$ $ match($,$) – success

Running parser example

A A aAb A c

A aAb | cabcbb$

Input suffix Stack content Move

abcbb$ A$ predict(A,a) = A aAbabcbb$ aAb$ match(a,a)

bcbb$ Ab$ predict(A,b) = ERROR

Illegal input example

Using top-down parsing approach

• Compute parsing table– If table is conflict-free then we have an LL(k)

parser• If table contains conflicts investigate– If grammar is ambiguous try to disambiguate– Try using left-factoring/substitution/left-recursion

elimination to remove conflicts

Marking “end-of-file”• Sometimes it will be useful to transform a

grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’ S $where $ is not part of the set of tokens

• To parse an input P with G’ we change it into P$• Simplifies top-down parsing with null

productions and LR parsing

Bottom-up parsing

• No problem with left recursion• Widely used in practice• Shift-reduce parsing: LR(k), SLR, LALR– All follow the same pushdown-based algorithm– Read input left-to-right producing rightmost

derivation– Differ on type of “LR Items”

• Parser generator CUP implements LALR

Some terminology

• The opposite of derivation is called reduction– Let Aα be a production rule– Let βAµ be a sentence– Replace left-hand side of rule in sentence:

βAµ => βαµ• A handle is a substring that is reduced during a

series of steps in a rightmost derivation

Rightmost derivation example

1 + (2) + (3)

E + (E) + (3)

E E + (E) E i

1 2 + 3

E + (3)

( ) ( )

E + (E)

E + (2) + (3) Each non-leaf node represents a handleRightmost

derivation in reverse

LR item

N αβ

Already matched To be matched

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

LR items

N αβ Shift Item

N αβ Reduce Item

Example

Z expr EOFexpr term | expr + termterm ID | ( expr )

Z E $E T | E + TT i | ( E )

(just shorthand of the grammar on the top)

Example: parsing with LR items

Z E $E T | E + TT i | ( E )

E T E E + TT i T ( E )

i + i $

Why do we need these additional LR items?Where do they come from?What do they mean?

-closure

• Given a set S of LR(0) items

• If P αNβ is in S• then for each rule N in the grammar

S must also contain N

-closure({Z E $}) = E T, E E + T,T i , T ( E ) }

{ Z E $,Z E $

E T | E + TT i | ( E )

i + i $

Z E $E T | E + TT i | ( E )

Items denote possible future handles

Remember position from which we’re trying to reduce

T i Reduce item!

i + i $

Z E $E T | E + TT i | ( E )

Match items with current token

Reduce item!

T + i $Z E $E T | E + TT i | ( E )

Reduce item!

E + i $Z E $E T | E + TT i | ( E )

E E+ T

E + i $Z E $E T | E + TT i | ( E )

E E+ T

Z E$ E

E+TT i T ( E )

E E+ T

Z E$ E

E+TT i T ( E )

E + T $

Z E $E T | E + TT i | ( E )

E E+ T

Z E$ E

E+TT i T ( E )

Reduce item!

Z E$E E+ T

Z E $E T | E + TT i | ( E )

Z E$E E+ T

Reduce item!

Z E $E T | E + TT i | ( E )

Z E$E E+ T

Reduce item!

Z E $E T | E + TT i | ( E )

Computing item sets

• Initial set– Z is in the start symbol – -closure({ Zα | Zα is in the grammar } )

• Next set from a set S and the next symbol X– step(S,X) = { NαXβ | NαXβ in the item set S}– nextSet(S,X) = -closure(step(S,X))

LR(0) automaton example

Z E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) Z E$

Z E$E E+ T E E+T

T iT (E)

T (E)E E+T

reduce stateshift state

GOTO/ACTION tables

State i + ( ) $ E T action

q0 q5 q7 q1 q6 shift

q1 q3 q2 shift

q2 ZE$q3 q5 q7 q4 Shift

q4 EE+Tq5 Tiq6 ETq7 q5 q7 q8 q6 shift

q8 q3 q9 shift

GOTO TableACTIONTable

empty – error move

LR(0) parser tables

• Two types of rows:– Shift row – tells which state to GOTO for current

token– Reduce row – tells which rule to reduce

(independent of current token)• GOTO entries are blank

LR parser data structures

• Input – remainder of text to be processed• Stack – sequence of pairs N, qi– N – symbol (terminal or non-terminal)– qi – state at which decisions are made

• Initial stack contains q0

+ i $input

q0stack i q5

LR(0) pushdown automaton

• Two moves: shift and reduce• Shift move

– Remove first token from input– Push it on the stack– Compute next state based on GOTO table– Push new state on the stack– If new state is error – report error

i + i $input

q0stack

+ i $input

q0stack

LR(0) pushdown automaton

• Reduce move– Using a rule N α– Symbols in α and their following states are removed from stack– New state computed based on GOTO table (using top of stack, before pushing N)– N is pushed on the stack– New state pushed on top of N

+ i $input

q0stack i q5

ReduceT i + i $input

q0stack T q6

GOTO/ACTION tableState i + ( ) $ E T

q0 s5 s7 s1 s6

q1 s3 s2

q2 r1 r1 r1 r1 r1 r1 r1

q3 s5 s7 s4

q4 r3 r3 r3 r3 r3 r3 r3

q5 r4 r4 r4 r4 r4 r4 r4

q6 r2 r2 r2 r2 r2 r2 r2

q7 s5 s7 s8 s6

q8 s3 s9

q9 r5 r5 r5 r5 r5 r5 r5

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

Warning: numbers mean different things!rn = reduce using rule number nsm = shift to state m

GOTO/ACTION table

st i + ( ) $ E Tq0 s5 s7 s1 s6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 s4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 s8 s6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

Stack Input Actionq0 i + i $ s5q0 i q5 + i $ r4q0 T q6 + i $ r2q0 E q1 + i $ s3q0 E q1 + q3 i $ s5q0 E q1 + q3 i q5 $ r4q0 E q1 + q3 T q4

q0 E q1 $ s2q0 E q1 $ q2 r1q0 Z

top is on the right

Are we done?

• Can make a transition diagram for any grammar

• Can make a GOTO table for every grammar

• Cannot make a deterministic ACTION table for every grammar

LR(0) conflicts

Z E $E T E E + TT i T ( E )T i[E]

Z E$E T

E E + TT i

T (E)T i[E] T i

T i[E]

E Shift/reduce conflict

LR(0) conflicts

Z E $E T E E + TT i V iT ( E )

Z E$E T

E E + TT i

T (E)T i[E] T i

E reduce/reduce conflict

LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)• Inherent shift/reduce conflict– A – reduce item– P αAβ – shift item– A can always be predicted from P αAβ

Back to the GOTO/ACTIONS tables

q1 q3 q2 shift

q2 ZE$q3 q5 q7 q4 Shift

q4 EE+Tq5 Tiq6 ETq7 q5 q7 q8 q6 shift

q8 q3 q9 shift

GOTO TableACTION

ACTION table determined only by transition diagram, ignores input

SRL grammars

• A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

• A reduce item N α is applicable only when the look-ahead is in FOLLOW(N)

• Differs from LR(0) only on the ACTION table

SLR ACTION table

(1)Z E $(2)E T (3)E E + T(4)T i (5)T ( E )

State i + ( ) $

q0 shift shift

q1 shift shift

q2 ZE$q3 shift shift

q4 EE+T EE+T EE+Tq5 Ti Ti Tiq6 ET ET ETq7 shift shift

q8 shift shift

q9 T(E) T(E) T(E)

Lookahead token from the

Remember: In contrast, GOTO table is indexed by state and a grammar symbol from the stack

SLR ACTION tableState i + ( ) [ ] $

q0 shift shift

q1 shift shift

q2 ZE$q3 shift shift

q4 EE+T EE+T EE+Tq5 Ti Ti shift Tiq6 ET ET ETq7 shift shift

q8 shift shift

q9 T(E) T(E) T(E)

state action

q0 shift

q1 shift

q2 ZE$q3 Shift

q4 EE+Tq5 Tiq6 ETq7 shift

q8 shift

SLR – use 1 token look-ahead LR(0) – no look-ahead

… as before…T i T i[E]

Are we done?

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

S’ → SS → L = RS → RL → * RL → idR → L

S’ → S

S → L = RR → L

S → R

L → * RR → LL → * RL → id

L → id

S → L = RR → LL → * RL → id

L → * R

R → L

S → L = R

shift/reduce conflict

• S → L = R vs. R → L • FOLLOW(R) contains =

– S L ⇒ = R * R ⇒ = R

• SLR cannot resolve the conflict either

S → L = RR → L

S → L = RR → LL → * RL → id

q2(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

LR(1) grammars

• In SLR: a reduce item N α is applicable only when the look-ahead is in FOLLOW(N)

• But FOLLOW(N) merges look-ahead for all alternatives for N

• LR(1) keeps look-ahead with each LR item• Idea: a more refined notion of follows

computed per item

LR(1) item

• LR(1) item is a pair – LR(0) item– Look-ahead token

• Meaning– We matched the part left of the dot, looking to match the part on the

right of the dot, followed by the look-ahead token.• Example

– The production L id yields the following LR(1) items

[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

-closure for LR(1)

• For every [A → α ● Bβ , c] in S – for every production B→δ and every token b in

the grammar such that b FIRST(βc) – Add [B → ● δ , b] to S

(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )

(S’ → S ∙ , $)

(S → L ∙ = R , $)(R → L ∙ , $)

(S → R ∙ , $)

(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)(L → id ∙ , =)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , =)(L → * R ∙ , $)

(R → L ∙ , =)(R → L ∙ , $)

(S → L = R ∙ , $)

(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)

(R → L ∙ , $)

(L → * R ∙ , $)

Back to the conflict

• Is there a conflict now?

(S → L ∙ = R , $)(R → L ∙ , $)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

• LR tables have large number of entries• Often don’t need such refined observation

(and cost)• LALR idea: find states with the same LR(0)

component and merge their look-ahead component as long as there are no conflicts

• LALR not as powerful as LR(1)

See you next time

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Documents

Course Title: Software Engineering Course no: CSC-351 : 70 ......compiler, compiler-construction tools. 1.6 A Simple One-Pass Compiler: Syntax Definition, Syntax directed translation,

Winter 2006-2007 Compiler Construction T4 – Syntax Analysis (Parsing, part 2 of 2)

Compiler 2 nd Phase Syntax Analysis Unit-2 : Syntax Analysis1

Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up

Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein

XML Syntax and Parsing Concepts - pearsoncmg.comptgmedia.pearsoncmg.com/images/0201703599/sample… · · 2009-06-09XML Syntax and Parsing Concepts Elements, Tags, Attributes,

6.864: Lecture 2, Fall 2005 Parsing and Syntax I · 2020. 12. 30. · Parsing and Syntax I. Overview • An introduction to the parsing problem • Context free grammars • A brief(!)

Compiler Principles Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 1 Mayer Goldberg and Roman Manevich Ben-Gurion University

Syntax and parsing an overview - folk.uio.nofolk.uio.no/liljao/inf5830/inf5830_syntax_handout.pdf · Syntax and parsing (today) Dependency grammar (10/10) Dependency parsing (17/10)

Winter 2006-2007 Compiler Construction T3 – Syntax Analysis (Parsing, part 1 of 2)

Syntax, Grammars & Parsing

1 Compiler Construction Parsing Part I 제 4 주 Parsing

Compiler Construction Parsing Part I

LL(k) Compiler Constructionryan/cse4251/eda180-notes.pdfCompiler Construction More LL parsing Abstract syntax trees Lennart Andersson Revision 2011{01{31 2010 Compiler Construction

2014 | Sem - VII | Syntax Analysis 170701 Compiler · PDF fileThe lexical analyzer is the first phase of a compiler ... (lookahead) to predict the parsing process. The simple block

Parsing Compiler Baojian Hua bjhua@ustc.edu.cn. Front End source code abstract syntax tree lexical analyzer parser tokens IR semantic analyzer

Compilation 0368-3133 (Semester A, 2013/14) Lecture 3: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:

Compiler Components and their Generators - Traditional Parsing Algorithms

Speech & NLP: Syntax & Parsing

Syntax and Parsing