Chapter 4: Syntax Analysis

Csci 465

Objectives

• Parser and its role in the design of a compiler
  – Techniques used to build hand-implemented parsers
    • Top-down parsing
    • LL parser
  – Algorithms used to build automated parser generators
    • Bottom-up parsing
    • LR parser
    • Simple LR (SLR)
• CFG
  – Derivations (leftmost and rightmost)
  – FIRST and FOLLOW
• Error recovery handling techniques

Syntax Analysis

• Every PL has a set of rules prescribing the syntactic structure of the programs written in that language
  – E.g., Pascal
    • A Pascal program is made out of blocks
    • A block is made out of statements
    • A statement is made out of expressions
    • An expression is made out of tokens
    • A token is made out of characters specified by a regular expression (RE)

Grammars

• Grammars?
  – The set of structural rules that guides the composition of clauses, phrases, and words in any given natural language
• Formal grammars?
  – A set of production rules for strings in a formal language

Significance of Grammars

• A grammar
  – Provides a precise, easy-to-understand syntactic specification
  – Allows an efficient parser to be constructed automatically
  – Supports evolvability of an existing language implementation by adding new programming constructs

Parser vs Scanner

• Lexical analyzer
  – Recognizes tokens (terminal symbols) from the sequence of characters in an input string
• Parser
  – Recognizes a set of related words (or phrases)
  – Determines how these words are combined to form a syntactically correct program

Limitations of Regular Expressions (revisited)

• Regular expressions and their recognizers are suitable for identifying errors at the word level
  – E.g., misspelling an identifier, keyword, or operator
• REs cannot be used to handle nested or balanced parentheses
  – E.g., an arithmetic expression with unbalanced parentheses
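To make the limitation concrete, here is a minimal sketch of a balance check. The function name `balanced` is hypothetical. The point is that the check needs a counter that can grow without bound (in effect a one-symbol stack), which a finite automaton, and therefore a regular expression, does not have.

```python
def balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0  # number of parentheses currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0  # nothing may remain open at the end
```

For example, `balanced("(a+(b*c))")` holds while `balanced("(a+b))")` does not.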

Role of parser

[Figure: Source Pg → LA → Parser → Rest of FE → code, with the labels Token/getchar() between LA and Parser, Parse tree after Parser, and a Sym. Table shared by the phases]

Types of Parser

• Universal parsing methods
  – Cocke-Younger-Kasami algorithm
    • Parses any grammar
    • Not efficient enough to use in production compilers
• Top-down
  – LL parsers (hand-written)
• Bottom-up
  – LR parsers (automated)

Context-Free Grammar (CFG)

• A grammar can be used to describe most of the syntax of a PL
  – PLs allow sentence construction with nested and matched parentheses
• Some PL constructs cannot be defined by a grammar
  – E.g., define/use
• These languages are specified by CFGs
  – Every language defined by a CFG can be recognized by a push-down automaton (PDA), and every language accepted by a PDA is context-free

CFG and PDA

• The focus here is on context-free languages (CFLs) that are accepted by PDAs
• CFLs of interest:
  – Languages defined by LL(k) context-free grammars
• LL?
  – Parses the input from Left to right and constructs a Leftmost derivation of the sentence

LL Parsing

• What is an LL(k) grammar?
  – A grammar from which we can construct a deterministic, top-down PDA that looks ahead at most k symbols on the input tape
• What is an LL(1) grammar?
  – The most common form of LL(k) grammar
  – Looks ahead at most one symbol
  – The easiest to convert into a PDA

Predictive Parsing

PDA

• A push-down automaton is formally defined as a 7-tuple
  – P = (Σ, Q, Δ, H, h0, q0, F)
    • Σ: input alphabet
    • Q: finite set of states
    • Δ: transition function
    • H: finite stack alphabet
    • h0: initial stack symbol in H
    • q0: initial state
    • F: finite set of final states

PDA

• Δ has the following functionality
  – T: Q × (Σ ∪ {ε}) × H → Q × H*
• I.e., every transition is defined for a particular state; it
  – reads one input token or skips the input (ε)
  – always pops one symbol off the stack
  – moves to a new state
  – pushes a string of zero or more (i.e., *) symbols back onto the stack

Model of PDA

Example 1

• Let P0 = (Σ = {a, b, c}, Q = {A, B, C}, Δ, H = {h, i}, h0 = i, q0 = A, F = { }) be a PDA
• where Δ is defined as follows
  – T(A, a, i) = (B, h)
  – T(B, a, h) = (B, hh)
  – T(C, b, h) = (C, ε)
  – T(A, c, i) = (A, ε)
  – T(B, c, h) = (C, h)

Table: trace for the input aacbb

Configuration    Transition             Action
(A, aacbb, i)    T(A, a, i) = (B, h)    read a, pop i, push h, go to B
(B, acbb, h)     T(B, a, h) = (B, hh)   read a, pop h, push hh, go to B
(B, cbb, hh)     T(B, c, h) = (C, h)    read c, pop h, push h, go to C
(C, bb, hh)      T(C, b, h) = (C, ε)    read b, pop h, go to C
(C, b, h)        T(C, b, h) = (C, ε)    read b, pop h, go to C
(C, ε, ε)        STOP                   string is successfully parsed
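The trace above can be replayed mechanically. The following is a minimal sketch, assuming acceptance by empty stack (F is empty in the example) and relying on the fact that this particular PDA has no ε-moves. The names `DELTA` and `run_pda` are hypothetical.

```python
# Transitions of Example 1: (state, input symbol, stack top) -> (new state, string pushed)
DELTA = {
    ("A", "a", "i"): ("B", "h"),
    ("B", "a", "h"): ("B", "hh"),
    ("C", "b", "h"): ("C", ""),
    ("A", "c", "i"): ("A", ""),
    ("B", "c", "h"): ("C", "h"),
}

def run_pda(text, state="A", stack="i"):
    """Return True if all input is consumed and the stack ends up empty.

    The stack is a string whose first character is the top of the stack.
    """
    for ch in text:
        if not stack:
            return False          # nothing left to pop
        key = (state, ch, stack[0])
        if key not in DELTA:
            return False          # the PDA blocks
        state, pushed = DELTA[key]
        stack = pushed + stack[1:]  # pop the top, then push the new symbols
    return stack == ""
```

Running `run_pda("aacbb")` follows exactly the six configurations in the table, while `run_pda("aacb")` stops with one `h` still on the stack and is rejected.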

Push Down Automata (PDA): Implementation

• A PDA used to implement a top-down parser
  – Starts with the goal symbol on the stack
  – Rewrites the leftmost non-terminal until the leftmost symbol is a terminal matching the first token of the input string
  – Takes the transition that reads (matches) that token
  – Repeats the process until the entire input has been read or the PDA blocks

Top-Down Parsing (revisited)

• Top-down parsing
  – Builds a parse tree for an input string
    • Starting from the root
    • Creating the nodes of the tree in preorder (depth-first) fashion
  – Finds a leftmost derivation for an input

Example: Grammar for Arithmetic Expressions

Suppose G is defined as follows:
  S → c A d
  A → a b | a

FIRST and FOLLOW

• The construction of both top-down and bottom-up parsers requires two functions
  – FIRST()
  – FOLLOW()
• These functions help to select the appropriate production

FIRST and FOLLOW Sets

• To show that a grammar is LL(k), we need to build
  – Firstk(w) for all right-hand sides w in the grammar's productions
  – Followk(N) for all nonterminals N in the grammar
  – Selection sets for all productions
• FIRST and FOLLOW sets help to fill in the entries of the parsing table

First and Follow

FIRSTk(w)

• FIRSTk of any string w is the set of all terminal strings of k tokens or fewer that can be derived from w
  – Firstk(uv) = Firstk(Firstk(u) Firstk(v))
    • (i.e., Firstk of u concatenated with Firstk of v, truncated to k tokens)
  – Firstk(N) = ∪ Firstk(w)
    • (i.e., the union over all w such that N → w is a production)
  – Firstk(x) = {x}
    • (i.e., for any terminal x)
  – Firstk(ε) = {ε}
    • (i.e., for the empty string)

Example 1

• First2(uv) = First2(First2(u) First2(v))
  – where
    • First2(u) = {ab, cd, d, dd, ε}
    • First2(v) = {cc, d, ε}
• Therefore
  – First2(uv) is formed by concatenating each element of First2(u) with each element of First2(v)
    • {abcc, abd, ab, cdcc, cdd, cd, dcc, dd, d, ddcc, ddd, dd, cc, d, ε}
  – Take the first two characters of each string
    • {ab, ab, ab, cd, cd, cd, dc, dd, d, dd, dd, dd, cc, d, ε}
  – Remove the duplicates
    • First2(uv) = {ab, cd, dc, dd, d, cc, ε}
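The concatenate, truncate, and de-duplicate steps of this example fit in one set comprehension. This is a small sketch, assuming each token is a single character and ε is written as the empty string; `first_k_concat` is a hypothetical name.

```python
def first_k_concat(first_u, first_v, k):
    """Firstk(uv): concatenate every pair, keep the first k tokens, drop duplicates."""
    return {(x + y)[:k] for x in first_u for y in first_v}

first2_u = {"ab", "cd", "d", "dd", ""}   # "" stands for the empty string ε
first2_v = {"cc", "d", ""}
first2_uv = first_k_concat(first2_u, first2_v, 2)
```

Because the result is a Python set, the duplicate removal happens automatically.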

Example 2

• Consider the simple grammar G:
  – A → Ba
  – B → b
  – B → c
• Compute First1(A) = First1(First1(B) First1(a))
  – = First1((First1(b) ∪ First1(c)) First1(a))
  – = First1({b, c}{a})
  – = First1({ba, ca})
  – = {b, c}
• where
  – First1(b) = {b}
  – First1(c) = {c}
  – First1(a) = {a}

Followk(A)

• Followk of a nonterminal A
  – Refers to the set of all terminal strings of k tokens that can follow whatever A derives

Example: Follow set

• For every production B → uAv, Followk(A) can be built as
  – Followk(A) = ∪ Firstk(Firstk(v) Followk(B))
• This means that
  – to construct Follow(A), look in the grammar for all productions in which A occurs on the right-hand side (R.H.S.) and apply the following rules:
    1. Take the FIRST of everything to the right of A, followed by Follow(B), where B is the non-terminal on the L.H.S. // B → uAv
    2. If A is the rightmost symbol in some sentential form, then add the end marker $ to Follow(A)
    3. If v is not empty, then everything in FIRST(v) except ε is placed in Follow(A)
    4. If v derives ε (v ⇒* ε), then everything in Follow(B) is added to Follow(A)

Follow: Example 1

• Consider the following grammar
  – S → Bx
  – A → aA
  – A → b
  – B → yAzA
• Compute Follow1(A)

Follow: Example 1 (solution)

• Find every occurrence of A on the R.H.S.
• Find any terminal right after A
  – In B → yAzA, the first A is followed by z, so add z to the set: {z}
• Find the Follow of the non-terminal on the L.H.S. when A is rightmost
  – In B → yAzA, the last A is rightmost, so add Follow(B) = First(x) = {x}
• In A → aA, the L.H.S. is A itself, so the recursion adds nothing new and is ignored
  – Follow(A) = {x, z}

Example 2: First and Follow

• Consider the following grammar
  – E → TE'
  – E' → +TE' | ε
  – T → FT'
  – T' → *FT' | ε
  – F → (E) | id

Solution for FIRST()

  – FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
  – FIRST(E') = {+, ε}
  – FIRST(T') = {*, ε}

Solution for FOLLOW()

• FOLLOW(E) = FOLLOW(E') = {), $} // applied rules 2, 1
• FOLLOW(T) = FOLLOW(T') = {+, ), $} // applied rules 3, 4
• FOLLOW(F) = {*, +, ), $} // applied rules 3, 4
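These sets can also be computed by iterating the FIRST and FOLLOW rules until nothing changes. The sketch below is one common fixed-point formulation for k = 1, not necessarily the procedure used in the lecture. The representation (a dict of lists, ε as the empty string) and the names `first_of`, `compute_first`, and `compute_follow` are assumptions made for illustration.

```python
EPS = ""    # stands for the empty string ε
END = "$"   # end-of-input marker

# Grammar of Example 2; an empty list is an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can vanish
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 2: the goal symbol is followed by $
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # only non-terminals have FOLLOW sets
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}          # rule 3
                    if EPS in rest:
                        new |= follow[head]     # rule 4
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

On this grammar the computed `FIRST` and `FOLLOW` dictionaries agree with the sets listed on the solution slides.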

Selection Sets

• The selection set Selectk of a production is the set of lookahead strings of k tokens that guides the selection of that production in a deterministic top-down parser

More on Selection

• For each production A → w in a grammar
  – Selectk(A → w) = Firstk(Firstk(w) Followk(A))
• A nonterminal A in a grammar is LL(k) iff
  – for any two selection sets S1 and S2 of distinct productions of A, the following condition holds
    • S1 ∩ S2 = { }
• A grammar is LL(k) if every non-terminal in that grammar is LL(k)

Example of Selection

• Consider the simple grammar G
  1. S → aSb
  2. S → ε

More on Selection

• S → aSb
  – Select1(S → aSb) = First1(First1(aSb) Follow1(S))
    • = First1({a}{$, b}) // $ is in Follow1(S) because S is the goal symbol
    • = First1({a$, ab})
    • = {a}

Cont'd (S → ε)

• S → ε
  – Select1(S → ε) = First1(First1(ε) Follow1(S))
    • = First1({ε} Follow1(S))
    • = First1({ε}{$, b})
    • = {$, b}
  – {$, b} ∩ {a} = { }
    • The two selection sets have no elements in common, so G is LL(1)
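The same calculation can be written down directly. This is a sketch only: it takes First1 of each right-hand side and Follow1(S) = {$, b} from the slides as given rather than computing them, writes ε as the empty string, and uses the hypothetical name `first_k_concat`.

```python
def first_k_concat(first_w, follow_a, k=1):
    """Firstk(Firstk(w) Followk(A)): concatenate every pair and keep k tokens."""
    return {(x + y)[:k] for x in first_w for y in follow_a}

FOLLOW_S = {"$", "b"}                        # Follow1(S), taken from the example

select_1 = first_k_concat({"a"}, FOLLOW_S)   # production 1: S -> aSb
select_2 = first_k_concat({""}, FOLLOW_S)    # production 2: S -> ε
is_ll1 = select_1.isdisjoint(select_2)       # LL(1) iff the selection sets do not overlap
```

Here `select_1` is `{"a"}` and `select_2` is `{"$", "b"}`, so `is_ll1` is true.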

In-Class Quiz

• Consider the following grammar
  – S → Bx
  – A → aA
  – A → b
  – B → yAzA
  – B → AA
• Compute Follow1(A)

Converting CFG to PDA: 1

• A PDA can be constructed from a CFG as follows:
  – PDA.Σ == CFG.Σ // input alphabet = terminals of the grammar
  – PDA.H == N ∪ Σ // finite stack alphabet: non-terminals and terminals
  – PDA.h0 == the goal symbol of the CFG
  – PDA.Q == {q}, the only state; the PDA halts on empty stack

Converting CFG to PDA: 2

• Two rules
  – 1. T(q, x, x) = (q, ε) (i.e., for every terminal x, match the input and pop)
  – 2. T(q, ε, A) = (q, w) (i.e., replace non-terminal A by w)
    • where w is the string of terminal and non-terminal symbols on the R.H.S. of a production A → w

Example: From CFG to PDA

• Consider the following grammar G1 that generates a's followed by an equal number of b's
  – L(G1) = {ε, ab, aabb, aaabbb, …}
• 1) S → aSb
• 2) S → ε
• First(S) = {a, ε}
• Follow(S) = {b, $}

Example 2: Transitions

• Convert G1 to a PDA
  1. T(q, ε, S) = (q, aSb)
  2. T(q, ε, S) = (q, ε)
  3. T(q, a, a) = (q, ε)
  4. T(q, b, b) = (q, ε)

Example 2: Parsing

• Input string: aabb
• Cnfg0: (q, aabb, S)
• Transitions:
  1. T(q, ε, S) = (q, aSb)
  2. T(q, ε, S) = (q, ε)
  3. T(q, a, a) = (q, ε)
  4. T(q, b, b) = (q, ε)

Step  Input   Stack   Transition
1     .aabb   S
2     .aabb   aSb     1
3     a.abb   Sb      3
4a    a.abb   b       2? (rejected)
4b    a.abb   aSbb    1 (use FIRST)
5     aa.bb   Sbb     3
6a    aa.bb   aSbbb   1? (rejected)
6b    aa.bb   bb      2 (use FOLLOW)
7     aab.b   b       4
8     empty   empty   4

Note on Example

• The grammar is LL(1)
  – because non-determinism is resolved by looking at only one symbol
  – Applied 1 (i.e., S → aSb)
    • the remaining input is 'abb' and 'a' is the first token of the input
  – Applied 2 (i.e., S → ε)
    • the next input symbol is 'b', which is the token that follows S in S → aSb

LL(1) Grammars

• Predictive parsers
  – A recursive-descent parser with no backtracking can be constructed for LL(1) grammars
  – LL(1) grammars are rich enough to cover most PL constructs
• A grammar G is LL(1) if for any two distinct productions of G, A → α | β, the following conditions are satisfied
  – First(α) ∩ First(β) = { }
  – At most one of α or β derives the empty string
  – If β ⇒* ε, then First(α) ∩ Follow(A) = { }; likewise, if α ⇒* ε, then First(β) ∩ Follow(A) = { }

Construction of Predictive Parser

• The following algorithm collects the information from FIRST and FOLLOW sets into a predictive parsing table M[A, a]
  – where
    • M is a 2D array
    • A is a non-terminal
    • a is a terminal or the end marker $

The Algorithm

• The algorithm follows this idea
  – Select production A → α if the next input symbol a is in FIRST(α)
  – A complication occurs if α ⇒* ε or α = ε
    • Select A → α if the current input symbol is in FOLLOW(A), or
    • if the input is at $ and $ is in FOLLOW(A)

Parsing Table Algorithm

Algorithm 4.31 (p. 224)

• INPUT: Grammar G
• OUTPUT: Parsing table M
• METHOD:
  – For each production A → α of G, do the following:
    1. For each terminal a in FIRST(α), add A → α to M[A, a]
    2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
    3. If ε is in FIRST(α) and $ is in FOLLOW(A), then add A → α to M[A, $]
    4. If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (shown by a blank in the table)
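The four steps translate almost line for line into code. The sketch below applies them to the expression grammar used throughout this chapter, with the FIRST and FOLLOW sets from the earlier solution slides hard-coded. The data layout and the names `PRODUCTIONS`, `first_of`, and `build_table` are assumptions for illustration; error entries are simply absent from the dictionary (step 4).

```python
EPS = ""  # stands for ε

PRODUCTIONS = [
    ("E",  ("T", "E'")),
    ("E'", ("+", "T", "E'")),
    ("E'", ()),                      # E' -> ε
    ("T",  ("F", "T'")),
    ("T'", ("*", "F", "T'")),
    ("T'", ()),                      # T' -> ε
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
# FIRST and FOLLOW sets as given on the solution slides.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"*", "+", ")", "$"}}

def first_of(symbols):
    """FIRST of a right-hand side."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table():
    """Map (non-terminal, lookahead) to the list of applicable right-hand sides."""
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of(body)
        lookaheads = body_first - {EPS}      # step 1
        if EPS in body_first:
            lookaheads |= FOLLOW[head]       # steps 2 and 3 ($ is in FOLLOW when it applies)
        for a in lookaheads:
            table.setdefault((head, a), []).append(body)
    return table

TABLE = build_table()
```

Every entry of `TABLE` holds exactly one production here, which is the table-level sign that this grammar is LL(1); a grammar that is not LL(1) would produce some entry with two or more productions.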

Example: Table-Driven

• Consider the following grammar
  – E → TE'
  – E' → +TE' | ε
  – T → FT'
  – T' → *FT' | ε
  – F → (E) | id

Parsing Table for the Grammar (Algorithm 4.31)

[Figure: parsing table M]

• Consider E → TE' and E' → +TE' | ε
  – FIRST(TE') = FIRST(T) = FIRST(F) = {(, id} // apply rule 1
  – FIRST(E') = {+, ε} // apply rule 1
  – FOLLOW(E') = {), $} // apply rule 3
  – E.g., M[E, id] = E → TE'

Implication of Algorithm 4.31

• The algorithm can be applied to any grammar G to produce a parsing table
• For any LL(1) grammar, each parsing table entry uniquely identifies a production or signals an error
• For a grammar G that is not LL(1), some entries may hold multiple productions

Non-recursive Predictive Parsing: Table-Driven

• A table-driven parser consists of
  – An input buffer
    • Contains the string to be parsed
  – A stack (used instead of recursive calls)
    • Holds grammar symbols with $ at the bottom
  – A parsing table
    • Two-dimensional array M[A, a] indexed by non-terminal A and terminal a
  – Output
    • Desired code (or a call to an error recovery routine)

Model of a table-driven predictive parser

[Figure: the predictive parsing program (PP-Program) reads the input, manipulates the stack, consults parsing table M, and produces the output]

The semantics of the parser

• The behavior of the parser can be specified in terms of its configurations
  – The initial configuration consists of
    • w$ in the input buffer, where
      – w is the string of input symbols
      – $ is the end-of-input marker
    • the goal symbol S on top of the stack

Model of a table-driven predictive parser

[Figure: the same model, with the end-of-input marker $ labeled]

Program using parsing table (p. 226 in ASU)

Algorithm: Non-recursive predictive parsing
Input: A string w and a parsing table M for grammar G
Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication
Method: [shown as a figure on the slide; not reproduced]
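Since the method itself survives only as a figure, here is a minimal sketch of the usual stack loop for a table-driven predictive parser. It assumes the expression grammar of this chapter with its parsing table hard-coded, tokens supplied as a list of strings, and `SyntaxError` as the error indication. The names `TABLE`, `NONTERMINALS`, and `parse` are hypothetical.

```python
# Parsing table M for the expression grammar: (non-terminal, lookahead) -> right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],  ("T'", "+"): [],  ("T'", ")"): [],  ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]   # w$ in the input buffer
    stack = ["$", start]            # goal symbol on top, $ at the bottom
    pos = 0
    output = []
    while stack[-1] != "$":
        top, look = stack[-1], tokens[pos]
        if top not in NONTERMINALS:             # terminal on top: must match the input
            if top != look:
                raise SyntaxError(f"expected {top!r}, found {look!r}")
            stack.pop()
            pos += 1
        elif (top, look) in TABLE:              # expand the non-terminal using M[top, look]
            body = TABLE[(top, look)]
            stack.pop()
            stack.extend(reversed(body))        # leftmost symbol ends up on top
            output.append((top, body))
        else:                                   # blank table entry
            raise SyntaxError(f"unexpected {look!r} while expanding {top}")
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r} after the expression")
    return output
```

Calling `parse(["id", "+", "id", "*", "id"])` returns the productions in the order a leftmost derivation applies them, starting with E → TE'.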

Example: Table-Driven

• Consider the following grammar
  – E → TE'
  – E' → +TE' | ε
  – T → FT'
  – T' → *FT' | ε
  – F → (E) | id

Parsing Table for Grammar 4.11

[Figure: parsing table]

Figure 4.16

[Figure not reproduced]

Syntax Errors

• Programs can contain errors at many different levels
  – Lexical
    • E.g., misspelling
  – Syntactic
    • E.g., arithmetic expression with unbalanced parentheses
  – Semantic
    • E.g., incompatible types
  – Logical
    • E.g., infinite loop

Parsing Error Handler

• The error handler in the parser should
  – Report the presence and the nature of errors clearly
  – Recover from each error quickly

Issues

• How should the error handler report the presence of an error?
  – Print the offending line
• How should the parser recover?
  – Quitting is not an option because more errors may remain in the input

Error-Recovery Strategies

• Panic mode (discards input symbols)
• Phrase level (performs local correction on the rest of the input)
• Error productions (augment G with error productions)
• Global correction (given incorrect input x, find a correct y that can be obtained from x with minimal changes)
  – Expensive (time/space)

Panic-mode error recovery

• Works with synchronizing tokens, which are tokens used as delimiters
  – E.g., semicolon, end, etc.
• Panic-mode error recovery
  – Skips symbols on the input until a token in the sync set is found
  – where the sync set is
    • a set of tokens built using the FOLLOW function (e.g., semicolon in C and Pascal)
    • FOLLOW and FIRST sets can both be used in the sync set
  – The sync set is then used in the parsing table as follows:
    • If M[A, a] = sync, then pop the non-terminal A
    • If M[A, a] is blank, then skip input symbol a
    • If input symbol a does NOT match the terminal on top of the stack, then
      – pop the terminal on top of the stack
      – issue a message (e.g., token is inserted)
      – resume parsing

Solution for FOLLOW()

• FOLLOW(E) = {), $}
• FOLLOW(T) = {+, ), $}
• FOLLOW(F) = {*, +, ), $}
• Sync set = {*, +, ), $}

Left Recursion and Left Factoring

• A grammar is left-recursive if it has a non-terminal A such that
  – A ⇒+ Aα for some string α
• Left-recursion elimination methods
  – Immediate left recursion
  – Non-immediate left recursion

Immediate left recursion

• A → Aα | β
  – can be transformed to
    • A → βA'
    • A' → αA' | ε
• Elimination process
  1. Group the offending productions A → Aα
  2. Introduce a new non-terminal A'
  3. Append A' to the right of both types of productions (i.e., offending and non-offending); the offending ones, with their leading A removed, become productions of A', together with A' → ε

Left Recursion Elimination: General Technique

• A → Aα1 | Aα2 | Aα3 | … | Aαm | β1 | β2 | … | βn
  – where no βi begins with A
• can be transformed into
  – A → β1A' | β2A' | … | βnA'
  – A' → α1A' | α2A' | … | αmA' | ε
  – where no αi is empty
• It does not eliminate left recursion involving derivations of two or more steps
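The general transformation is mechanical enough to sketch in a few lines. The code below handles immediate left recursion only, as stated above. Productions are lists of symbols, an empty list stands for ε, and the function name and the convention of naming the new non-terminal by appending `'` are assumptions for illustration.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]  # tails of offending productions
    betas = [body for body in bodies if not body or body[0] != head]    # non-offending productions
    if not alphas:
        return {head: bodies}        # nothing to do
    new_head = head + "'"            # the introduced non-terminal A'
    return {
        head: [beta + [new_head] for beta in betas],                # A  -> b1 A' | ... | bn A'
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],  # A' -> a1 A' | ... | am A' | ε
    }

# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | ε
expr = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applied to E → E + T | T, this yields the E and E' productions of the expression grammar used for the LL(1) examples in this chapter.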

Non-immediate elimination

• Consider the following grammar
  – S → Aa | b
  – A → Ac | Sd | ε
• Is S left-recursive? Why? S ⇒ Aa ⇒ Sda
• First eliminate the indirect derivation
  – S → Aa | b
  – A → Ac | Aad | bd | ε // replace S by its R.H.S. in A
• Then apply the immediate left-recursion elimination method
  – S → Aa | b
  – A → bdA' | A'
  – A' → cA' | adA' | ε

Left Factoring

• Used when the choice between the alternatives of a production is NOT clear
  – Need to defer the selection until there is enough information

Left Factoring: Example

• Suppose we have the following G
  – Stmt → if Exp then Stmt
  – Stmt → if Exp then Stmt else Stmt
• G is NOT LL(1)
  – Why not? Both alternatives begin with the same prefix, if Exp then Stmt, so one token of lookahead cannot choose between them
• Left-factored grammar
  – Stmt → if Exp then Stmt Rest
  – Rest → else Stmt | ε

Summary of Non-Context-Free Language Constructs

• Programming constructs that cannot be specified using a context-free grammar
  – Define/use
  – Matching the number of formal parameters with the number of actual parameters

Examples

• Define/use: L1 = {wcw | w is in (a|b)*}
  – where the first w = declaration; c = program text; the second w = use
  – Grammars for C, Java, or Pascal do not distinguish among identifiers spelled with different characters
• Checking the number of formal parameters and actual parameters in procedure calls: L2 = {Formal^n Actual^m | n ≥ 1 and m ≥ 1}
• Solution:
  – Leave them to the semantic analyzer

Bottom-up parsing

• Bottom-up parsing
  – Refers to the construction of a parse tree for an input string starting from the leaves (bottom) and working up towards the root of the tree

Example: id * id

G:
  E → E + T | T
  T → T * F | F
  F → (E) | id

Bottom-up Parsing: Shift-Reduce Parsing (SRP)

• General bottom-up parsing technique
  – Shift-reduce parsing (SRP)
    • The largest class of grammars handled by SRP is known as LR
    • Builds a parse tree from the leaves to the root
    • Think of the process as a sequence of reduction steps that reduces a string w to the goal symbol S
• Reduction step?
  – A step that replaces a specific substring matching the R.H.S. of a production with the non-terminal on its L.H.S.
• Example: id * id
  – id * id, F * id, T * id, T * F, T, E

Example

• Consider the following grammar
  – S → aABe
  – A → Abc | b
  – B → d
• Input: abbcde
• abbcde can be reduced to S (i.e., the goal symbol)
  – abbcde, aAbcde, aAde, aABe, S

Rightmost (rm) Derivation

• Using a rightmost derivation starting from S
  – S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
• Grammar
  – S → aABe
  – A → Abc | b
  – B → d

Handles

• Handle?
  – Reducing a handle is the opposite of a derivation step
  – A substring that matches the R.H.S. of a production and whose replacement by the non-terminal on the L.H.S. is one step in the reverse of a rightmost derivation
• Example
  – abbcde
  – aAbcde; b is the handle at position 2
    • S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

Example 4.23

• Consider the following grammar
  – E → E + E
  – E → E * E
  – E → (E)
  – E → id

Example 4.23 (cont'd)

Right-sentential form   Handle   Reducing production
id1 + id2 * id3         id1      E → id
E + id2 * id3           id2      E → id
E + E * id3             id3      E → id
E + E * E               E * E    E → E * E
E + E                   E + E    E → E + E
E                       -        -

Stack Implementation of SRP

• Two issues need to be addressed with handles
  – How to identify the substring to reduce?
  – How to identify the right rule?
• SRP can be implemented with a stack
• Initial state
  – Stack: $
  – Input: w$
  – where
    • w is the input string
    • $ is the end-of-stack/end-of-input marker
• Final state
  – Stack: $S
  – Input: $

More on SRP

• SRP works as follows
  – Shifts input symbols onto the stack until a handle β is on top of the stack
  – Reduces β to the L.H.S. using the appropriate production rule
  – Repeats the process until
    • an error is found (no match), or
    • the stack holds the goal symbol S and the input is at its end ($)

Configurations of SRP for id1 + id2 * id3

Stack             Input                Action
$                 id1 + id2 * id3 $    shift
$ id1             + id2 * id3 $        reduce by E → id
$ E               + id2 * id3 $        shift
$ E +             id2 * id3 $          shift
$ E + id2         * id3 $              reduce by E → id
$ E + E           * id3 $              shift
$ E + E *         id3 $                shift
$ E + E * id3     $                    reduce by E → id
$ E + E * E       $                    reduce by E → E * E
$ E + E           $                    reduce by E → E + E
$ E               $                    accept

Primary Operations of Shift-Reduce Parsing

• Main operations include
  – Shift
  – Reduce
  – Accept
  – Error

Operations: shift action

• Shift action?
  – The next input symbol is shifted onto the top of the stack

Operations: reduce action

• Reduce action
  – The parser knows the right end of the handle is at the top of the stack
  – The parser needs to locate the left end of the handle within the stack
  – The parser then decides which non-terminal replaces the handle

Operations: accept & error actions

• Accept action
  – The parser announces successful completion of parsing
• Error action
  – The parser discovers that a syntax error has occurred
    • and calls an error recovery routine

Example: Dangling Else

• stmt → if expr then stmt | if expr then stmt else stmt | other

Stack: … if expr then stmt      Input: else … $
The parser cannot tell whether "if expr then stmt" is the handle (shift/reduce conflict)

Example: reduce/reduce conflict (procedure call or array)

• Suppose we have a statement like A(I, J) using this grammar:

[Figure: grammar with numbered productions; not reproduced]

Cont'd: reduce/reduce conflict (procedure call or array)

After shifting the first three tokens onto the stack:

Stack: … id ( id      Input: , id ) …

Note: the id on top of the stack must be reduced, but the parser does not know which rule to apply (5 or 7?)

Cont'd: reduce/reduce conflict (procedure call or array)

One solution is to change the token id in production 1 to procid

Stack: … id ( id          Input: , id ) …
Stack: … procid ( id      Input: , id ) …

Model of LR Parsing

[Figure: model of an LR parser]

Example: LR parsing action and goto functions

• Consider the following grammar
  1) E → E + T
  2) E → T
  3) T → T * F
  4) T → F
  5) F → (E)
  6) F → id

Fig. 4-31

[Figure: LR parsing table]

Legend:
  si: shift and stack state i
  rj: reduce by production numbered j
  acc: accept
  blank: error

Fig. 4-38

[Figure not reproduced]

More Examples on LR Parsing

LR(1) Grammars

LR Grammars: Building the Parsing Table

• How do we build an LR parsing table for a given grammar?
• If we can build a conflict-free LR parsing table for a grammar G, then G is said to be an LR grammar
  – Being LR implies that
    • a left-to-right shift-reduce parser can identify handles on top of the stack

LL(k) vs. LR(k)

• LR
  – More general and more expressive than LL
  – Decisions are made at the right end of a rule, after seeing the whole R.H.S. of the production
  – Uses more context than LL(k)

Constructing Simple LR (SLR) Parsing Tables

• To build the SLR parsing table
  – Build the collection of items from the grammar
  – Group items into sets
  – Use the sets as the states of the SLR parser

An item of LR(0)

• Item of a grammar G?
  – A production of G with a dot "." at some position of the R.H.S.
  – The dot indicates
    • how much of a production has been seen at a given point (before the dot)
    • how much of a production remains to be seen (after the dot)

Construction of the items for LR(0)

• To construct the canonical LR(0) collection for a grammar G
  – Augment the grammar G
  – Use two functions
    • CLOSURE(I) {adds more items to a set of items}
    • GOTO(I, X) {moves the "." past the symbol X}
  – If G is a grammar with start symbol S, then G' is the augmented grammar for G with a new start symbol S'
    • S' → S
    • S' is used to signal successful parsing (stop!)
  – Construct the transition diagram of a DFA D that recognizes the prefixes of right-sentential forms (i.e., X1X2…Xi ai ai+1…an) that may appear on the stack of the SRP

Example

• Example: A → XYZ
  – The production A → XYZ yields the following items
    • A → .XYZ
      – (i.e., we hope to see a substring derivable from XYZ next on the input)
    • A → X.YZ
      – (i.e., we have seen a string derivable from X on the input and hope to see a string derivable from YZ)
    • A → XY.Z
    • A → XYZ.
      – (i.e., it is time to reduce XYZ to A)

Closure of Item Sets using the CLOSURE function

• If I is a set of items for a grammar G, then CLOSURE(I) is obtained by these rules:
  1. Initially, add every item in I to CLOSURE(I)
  2. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → .γ to CLOSURE(I)
  3. Repeat rule 2 until no more new items can be added to CLOSURE(I)

Example 4.40

• Consider the augmented expression grammar G:
  – E' → E
  – E → E + T | T
  – T → T * F | F
  – F → (E) | id

Cont'd

• If I is the set of one item {[E' → .E]}, then CLOSURE(I) contains the set of items I0 in Figure 4.31
• How to compute the closure of E' → .E?
  – E' → .E
    • <by rule 1, add the item itself>
  – E → .E + T | .T
    • <by rule 2, because E is right after the dot, add the E productions with the dot at the left end>

Complete

• If I is the set of one item {[E' → .E]}, then CLOSURE(I) is
  – E' → .E <by rule 1>
  – E → .E + T <by rule 2: if A → α.Bβ, then add B → .γ>
  – E → .T <by rule 2>
  – T → .T * F <by rule 2>
  – T → .F <by rule 2>
  – F → .(E) <by rule 2>
  – F → .id <by rule 2>
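The closure rules can be sketched with a worklist. In the code below an item is represented as a triple (head, body, dot position); this representation and the names `GRAMMAR` and `closure` are assumptions made for illustration.

```python
# Augmented expression grammar of Example 4.40.
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """CLOSURE(I) for LR(0) items written as (head, body, dot)."""
    result = set(items)          # rule 1: start with every item of I
    work = list(items)
    while work:                  # repeat until no new item can be added
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:   # the dot stands before a non-terminal B
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)      # rule 2: add B -> .γ
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

I0 = closure({("E'", ("E",), 0)})
```

`I0` contains the seven items listed in the completed example above.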

Create the collection of items using closure

[Figure: sets of items obtained by applying rule 2]

The function GOTO

• The second useful function is GOTO(I, X)
  – where
    • I is a set of items
    • X is a grammar symbol
  – GOTO(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I
  – Used to define the transitions in the LR(0) automaton for grammar G

Example

• If I is the set of two items
  – {[E' → E.], [E → E. + T]},
• then GOTO(I, +) contains the items
  – E → E + .T
  – T → .T * F
  – T → .F
  – F → .(E)
  – F → .id
• I.e., move the "." over + and then compute the closure
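GOTO is a one-line step on top of CLOSURE. The sketch below is self-contained and therefore repeats a closure routine; items are again written as (head, body, dot position) triples, and all names are hypothetical.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """CLOSURE(I) for LR(0) items written as (head, body, dot)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

def goto(items, symbol):
    """GOTO(I, X): move the dot over X wherever possible, then take the closure."""
    moved = {(head, body, dot + 1)
             for (head, body, dot) in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)

I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}   # [E' -> E.] and [E -> E. + T]
after_plus = goto(I, "+")
```

`after_plus` holds the five items listed in the example; the item E' → E. contributes nothing because its dot is already at the right end.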

Constructing an SLR Parsing Table

• Assume the augmented grammar G'
  1. Build the collection of sets of LR(0) items for G'
  2. Construct state i from Ii
  3. The parsing actions for state i are determined as follows
    a) If [A → α.aβ] is in Ii and GOTO(Ii, a) = Ij, then action[i, a] = sj (shift and go to state j); here a is a terminal
    b) If [A → α.] is in Ii, then action[i, a] = reduce A → α for ALL a in FOLLOW(A) (assuming A is not S')
    c) If [S' → S.] is in Ii, then action[i, $] = accept
  4. The goto transitions for state i are built for non-terminals A: if GOTO(Ii, A) = Ij, then goto[i, A] = j
  5. All entries not defined by rules 3 and 4 are made "error"
  6. The initial state of the parser is the one constructed from the set of items containing [S' → .S]

Example 4.47: 1

• Let us build the SLR table for the augmented expression grammar (see Fig. 4.13). Consider the set of items I0:
  – E' → .E
  – E → .E + T
  – E → .T
  – T → .T * F
  – T → .F
  – F → .(E)
  – F → .id

Example 4.47: 2

• Consider I0 (state 0)
  – The item F → .(E)
    • gives action[0, (] = s4
  – The item F → .id
    • gives action[0, id] = s5
  – The other items give no actions

Fig. 4-31

[Figure: SLR parsing table; acc: accept; blank: error]

Example 4.38 (Reduce)

• Consider I2
  – E → T.
  – T → T. * F
• Since FOLLOW(E) = {$, +, )} and E ≠ E', then
  – action[2, +] = action[2, )] = action[2, $] = reduce E → T (r2)
    • using the rule: if [A → α.] is in Ii, then action[i, a] = reduce A → α for all a in FOLLOW(A)

Second Item

• The second item of I2, T → T. * F, gives
  – action[2, *] = s7

Advanced Topics

• Optimizing a grammar
• Reducing the size of LR(1) tables
  – Combining rows/columns
  – Shrinking the grammar
  – Directly encoding the table
  – Using better construction algorithms

Optimizing a Grammar

• There is a correlation between the number of production rules applied and the amount of work needed to parse
• Top-down parser
  – Works with derivations
• Bottom-up parser
  – Performs a reduction for every derivation step

Derivation

• Rewrite the grammar to reduce the height of the parse tree, because a shorter tree translates into
  – a shorter derivation
  – a shorter time to parse
  – fewer reductions
• The optimization has no effect on the behavior of the parser (the same language is accepted)

Example: Expression Tree

Parse Tree for the Expression Grammar

[Figure: parse tree for (x – 2 * y)]

A node with a single child is a candidate for optimization

Revised Grammar

• Obtain the revised grammar by substituting the alternatives of Factor for each occurrence of Factor
• This increases the number of alternatives for Term
• It shrinks the parse tree by eliminating a layer

Revised Expression Grammar

[Figure: parse tree for (x – 2 * y) under the revised grammar]

General rule

• Useless productions
  – Productions with a single symbol on the R.H.S.
  – Sometimes kept to perform a specific action
• Fold away useless productions
  – Can increase the size of the table in LR(1)
    • May remove one column while adding many rows
  – Can increase the number of comparisons in LL(1)

Reducing the size of LR(1) tables: Combining Rows/Columns

• LR(1) tables generated for even small grammars can be large
• Combining rows/columns results in a direct reduction in table size at the cost of an extra indirection to access the table
• Need to understand the tradeoff

Combining Rows (or Columns)

• Find identical rows (or columns)
  – The table generator stores each set of identical rows only once
  – Remap each parser state to a row index in the action table
  – Do the same for identical columns

Shrinking the grammar

• To shrink the grammar
  – Reduce the number of production rules
• Example
  – Factor → num | ident ( * /)
  – can be rewritten as
    • Factor → val
  – Action table?
    • Removes a column from the action table
    • The scanner must return val for both ident and num
    • See the table

Directly Encoding the Table

• Forget about the table-driven approach
• Use a hard-coded implementation
  – Each state is coded as a case statement (IF-THEN-ELSE)
    • Test the type of the next symbol
    • Perform
      – Shift
      – Reduce
      – Accept
      – Error
• No table operations, no lookup overhead (good)
• Larger code size (bad)
• Not readable (bad)

Using Other Construction Algorithms

• LR(1)
  – The most general table construction algorithm
  – Produces the largest tables, but accepts the largest class of grammars
• Are there other variations of the LR(1) algorithm?
  – Simple LR(1) (or SLR)
    • Accepts a smaller class of grammars than LR(1)
    • Uses the FOLLOW set instead of a lookahead symbol to decide whether to shift or reduce
    • Using the FOLLOW set results in smaller tables
  – LookAhead LR(1) (or LALR(1))
    • Works on the assumption that some items in a set (state) are critical and that the rest can be derived from them

Top-Down vs. Bottom-Up

• Advantages of top-down
  – Easy to build a hand-coded parser
  – Provides excellent opportunities to detect and recover from errors
  – Easy to find ambiguities
  – Faster (well-constructed ones!)
• Advantages of bottom-up
  – Tool support
  – Handles a larger class of grammars
