Chapter 4: Syntax Analysis
Csci 465
Objectives
• Parser and its role in the design of a compiler
  – Techniques used to build hand-implemented parsers
    • Top-down parsing
    • LL parser
  – Algorithms used to build automated parser generators
    • Bottom-up parsing
    • LR parser
    • Simple LR (SLR)
• CFG
  – Derivations (leftmost and rightmost)
  – FIRST and FOLLOW
• Error recovery and handling techniques
Syntax Analysis
• Every PL has a set of rules prescribing the syntactic structure of the programs written in that language
  – E.g., Pascal
    • A Pascal program is made up of blocks
    • A block is made up of statements
    • A statement is made up of expressions
    • An expression is made up of tokens
    • A token is made up of characters specified by a RE
Grammars
• Grammars?
  – The set of structural rules that guides the composition of clauses, phrases, and words in any given natural language
• Formal grammars?
  – A set of production rules for strings in a formal language
Significance of Grammars
• A grammar
  – Provides a precise, easy-to-understand syntactic specification
  – Automates the construction of an efficient parser
  – Supports evolvability of an existing language implementation by adding new programming constructs
Parser vs scanner
• Lexical analyzer
  – Recognizes tokens (terminal symbols) from the sequence of characters in an input string
• Parser
  – Recognizes a set of related words (or phrases)
  – Determines how these words are combined to form a syntactically correct program
Limitation of Regular Expressions (revisited)
• Regular expressions and their recognizers are suitable for identifying errors at the word level
  – E.g., misspelling an identifier, keyword, or operator
• REs cannot be used to handle nested or balanced parentheses
  – E.g., an arithmetic expression with unbalanced parentheses
Role of parser
[Figure: Source Pg → LA → Parser → Rest of FE; edge labels: getchar(), Token, Parse tree, code; the Sym. Table appears alongside the components]
Types of Parser
• Universal parsing methods
  – Cocke-Younger-Kasami algorithm
    • Can parse any grammar
    • Not efficient enough to use in production compilers
• Top-down
  – LL parsers (hand-written)
• Bottom-up
  – LR parsers (automated)
Context Free Grammar (CFG)
• Grammars can be used to describe most of the syntax of a PL
  – PLs allow sentence construction with nested and matched parentheses
• Some PL constructs cannot be defined by a grammar
  – E.g., define/use
• These languages are specified by CFGs
  – Every language defined by a CFG can be recognized by a Push Down Automaton (PDA), and every language accepted by a PDA is context-free
CFG and PDA
• The focus here is on Context Free Languages (CFLs) that are accepted by PDAs
• CFL:
  – Languages defined by LL(k) context-free grammars
• LL?
  – Parses the input from Left to right, and constructs a Leftmost derivation of the sentence
LL Parsing
• What is an LL(k) grammar?
  – A grammar from which we can construct a deterministic, top-down PDA that looks ahead at most k symbols on the input tape
• What is an LL(1) grammar?
  – The most common form of LL(k) grammar
  – Looks ahead at most one symbol
  – The easiest to convert into a PDA
Predictive Parsing
PDA
• A push-down automaton is formally defined as a 7-tuple as follows
  – P = (Σ, Q, ▲, H, h0, q0, F)
    • Σ: input alphabet
    • Q: set of states
    • ▲: transition function
    • H: finite stack alphabet
    • h0: initial stack symbol in H
    • q0: initial state
    • F: finite set of final states
PDA
• ▲ (written T in the transitions) has the following functionality
  – T: Q × (Σ ∪ {ε}) × H → Q × H*
• i.e., every transition is defined for a particular state and
  – reads one input token or skips the input (ε)
  – always pops one symbol off the stack
  – moves to a new state
  – pushes a string of zero or more (i.e., *) symbols back onto the stack
Model of PDA
Example 1
• Let P0 = (Σ = {a, b, c}, Q = {A, B, C}, ▲, H = {h, i}, h0 = i, q0 = A, F = { }) be a PDA
• where ▲ is defined as follows
  – T(A, a, i) = (B, h)
  – T(B, a, h) = (B, hh)
  – T(C, b, h) = (C, ε)
  – T(A, c, i) = (A, ε)
  – T(B, c, h) = (C, h)
Table
Configuration    Transition             Action
(A, aacbb, i)    T(A, a, i) = (B, h)    read a, pop i, push h, go to B
(B, acbb, h)     T(B, a, h) = (B, hh)   read a, pop h, push hh, go to B
(B, cbb, hh)     T(B, c, h) = (C, h)    read c, pop h, push h, go to C
(C, bb, hh)      T(C, b, h) = (C, ε)    read b, pop h, go to C
(C, b, h)        T(C, b, h) = (C, ε)    read b, pop h, go to C
(C, ε, ε)        STOP                   String is successfully parsed
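The trace above can be reproduced with a small simulator. This is only a sketch under stated assumptions: the stack is modeled as a Python string whose first character is the top, and the input is accepted when both the input and the stack are empty (F is empty in the example). The names `DELTA` and `run_pda` are illustrative and do not come from the slides.

```python
# Transition function of the example PDA P0:
# (state, input symbol, stack top) -> (new state, string pushed in place of the top)
DELTA = {
    ("A", "a", "i"): ("B", "h"),
    ("B", "a", "h"): ("B", "hh"),
    ("C", "b", "h"): ("C", ""),
    ("A", "c", "i"): ("A", ""),
    ("B", "c", "h"): ("C", "h"),
}


def run_pda(text, state="A", stack="i"):
    """Run P0 on text; return (accepted, list of configurations)."""
    trace = [(state, text, stack)]
    while text and stack:
        key = (state, text[0], stack[0])
        if key not in DELTA:
            return False, trace  # no transition applies: the PDA blocks
        state, pushed = DELTA[key]
        # Consume one input symbol, pop the top, push the replacement string.
        text, stack = text[1:], pushed + stack[1:]
        trace.append((state, text, stack))
    # Assumption: accept by empty stack once the whole input has been read.
    return text == "" and stack == "", trace


accepted, trace = run_pda("aacbb")
```

Printing `trace` lists the same six configurations as the table, ending in `("C", "", "")`.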
Push Down Automata (PDA): Implementation
• A PDA can be used to implement a top-down parser
  – Starts with the goal symbol on the stack
  – Rewrites the leftmost nonterminal until the leftmost symbol is a terminal matching the first token of the input string
  – Takes the transition that reads (matches) that token
  – Repeats the process until the entire input has been read or the PDA blocks
Top-Down Parsing (revisited)
• Top-down parsing
  – Building a parse tree for the input string
    • Starting from the root
    • Creating the nodes of the tree in preorder (depth-first) fashion
  – Finding a leftmost derivation for the input
Example: Grammar for Arithmetic Expression
Suppose G is defined as follows:
  S → c A d
  A → a b | a
FIRST and FOLLOW
• The construction of both top-down and bottom-up parsers requires two functions
  – FIRST()
  – FOLLOW()
• These functions help to select the appropriate production
FIRST and FOLLOW Sets
• To show that a grammar is LL(k), we need to
  – Build Firstk(w) for all right-hand sides w of the grammar’s productions
  – Build Followk(N) for all nonterminals N in the grammar
  – Create selection sets for all productions
• First and Follow sets help to fill in the entries of the parsing table
First and Follow
FIRSTk(w)
• FIRSTk of any string w is the set of all terminal strings of k tokens or fewer that can be derived from w
  – Firstk(uv) = Firstk(Firstk(u) Firstk(v))
    • (i.e., First of u concatenated with First of v)
  – Firstk(N) = ∪ Firstk(w)
    • (i.e., the union over all w such that N → w is a production)
  – Firstk(x) = {x}
    • (i.e., for any terminal x)
  – Firstk(ε) = {ε}
    • (i.e., for the empty string)
Example 1
• First2(uv) = First2(First2(u) First2(v))
  – where
    • First2(u) = {ab, cd, d, dd, ε}
    • First2(v) = {cc, d, ε}
  – therefore First2(uv) is formed by concatenating each element of First2(u) with each element of First2(v)
    • {abcc, abd, ab, cdcc, cdd, cd, dcc, dd, d, ddcc, ddd, dd, cc, d, ε}
  – Take the first two characters of each string
    • {ab, ab, ab, cd, cd, cd, dc, dd, d, dd, dd, dd, cc, d, ε}
  – Remove the duplicates
    • First2(uv) = {ab, cd, dc, dd, d, cc, ε}
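The concatenate, truncate, and deduplicate steps of this example can be sketched in a few lines of Python. The sketch assumes every token is a single character and represents ε as the empty string; the function name `first_k_concat` is illustrative.

```python
def first_k_concat(first_u, first_v, k):
    """Return First_k(First_k(u) First_k(v)): pairwise concatenation cut to k tokens.

    Assumption: each token is one character, and "" stands for the empty string ε.
    Using a set removes the duplicates automatically.
    """
    return {(x + y)[:k] for x in first_u for y in first_v}


first2_u = {"ab", "cd", "d", "dd", ""}
first2_v = {"cc", "d", ""}
first2_uv = first_k_concat(first2_u, first2_v, 2)
```

With the sets from the slide, `first2_uv` holds the seven strings ab, cd, dc, dd, d, cc, and the empty string.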
Example 2
• Consider the simple grammar G:
  – A → B a
  – B → b
  – B → c
• First1(A) = First1(First1(B) First1(a))
  – = First1((First1(b) ∪ First1(c)) First1(a))
  – = First1({b, c}{a})
  – = First1({ba, ca})
  – = {b, c}
• where
  – First1(b) = {b}
  – First1(c) = {c}
  – First1(a) = {a}
Followk(A)
• Followk of a nonterminal A
  – Refers to the set of all terminal strings of k tokens that can follow whatever A derives
Example: Follow set
• For all productions B → u A v, Followk(A) can be built as
  – Followk(A) = ∪ Firstk(Firstk(v) Followk(B))
• It means that
  – to construct Follow(A), look in the grammar for all productions in which A occurs on the right-hand side (R.H.S.) and apply the following rules:
    1. Take the FIRST of everything to the right of A, including Follow(B), where B is the nonterminal on the L.H.S. // B → u A v
    2. If A is the rightmost symbol in some sentential form, then add the end marker ($) to Follow(A)
    3. If v is a nonterminal, then everything in FIRST(v) except ε is placed in Follow(A)
    4. If v derives ε (v ⇒* ε), then everything in Follow(B) is also placed in Follow(A)
Follow: Example 1
• Consider the following grammar
  – S → B x
  – A → a A
  – A → b
  – B → y A z A
• Compute Follow1(A)
Follow: Example 1 (solution)
• Consider the following grammar
  – S → B x
  – A → a A
  – A → b
  – B → y A z A
• Compute Follow1(A)
  – Find all occurrences of A on the R.H.S.
  – Find any terminal right after A
    • In B → y A z A, z follows the first A, so add z: {z}
  – Find the Follow of the nonterminal on the L.H.S. where A is rightmost
    • In B → y A z A, the last A is rightmost, so add Follow(B) = First(x) = {x}
  – In A → a A the L.H.S. is A itself, so the recursion adds nothing new and is ignored
  – Follow(A) = {x, z}
Example 2: First and Follow
• Consider the following grammar
  – E → T E’
  – E’ → + T E’ | ε
  – T → F T’
  – T’ → * F T’ | ε
  – F → ( E ) | id
Solution for FIRST()
• FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
• FIRST(E’) = {+, ε}
• FIRST(T’) = {*, ε}
Solution for Follow()
• Consider the following grammar
  – E → T E’
  – E’ → + T E’ | ε
  – T → F T’
  – T’ → * F T’ | ε
  – F → ( E ) | id
• FOLLOW(E) = FOLLOW(E’) = {), $} // applied rules 2, 1
• FOLLOW(T) = FOLLOW(T’) = {+, ), $} // applied rules 3, 4
• FOLLOW(F) = {*, +, ), $} // applied rules 3, 4
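One common way to compute these sets for k = 1 is a fixed-point iteration: keep applying the rules until no set grows. The sketch below is not taken from the slides; the grammar encoding (a dict from nonterminal to lists of right-hand sides, with `[]` for ε) and the names `first_of`, `compute_first`, and `compute_follow` are assumptions made for illustration.

```python
EPS, END = "ε", "$"

# Expression grammar from the slide; an empty list encodes the ε alternative.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        f = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol can vanish (or the string is empty)
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # the goal symbol is followed by the end marker
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest can vanish: inherit FOLLOW(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

For this grammar the computed `FIRST` and `FOLLOW` agree with the solutions listed on the two slides above.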
Selection Sets
• The selection set Selectk of a production is the set of lookahead strings of k tokens that guides the selection of that production in a deterministic top-down parser
More on Selection
• For each production A → w in a grammar
  – Selectk(A → w) = Firstk(Firstk(w) Followk(A))
• A nonterminal A in a grammar is LL(k) iff
  – for any two selection sets S1 and S2 of the productions of A, the following condition holds
    • S1 ∩ S2 = {}
• A grammar is LL(k) if every nonterminal in that grammar is LL(k)
Example of Selection
• Consider the simple grammar G
  1. S → a S b
  2. S → ε
More on Selection
• S → a S b
  – Select1(S → a S b) = First1(First1(a S b) Follow1(S))
    • = First1({a}{$, b}), where $ is in Follow1(S) because S is the goal symbol
    • = First1({a$, ab})
    • = {a}
Cont’d (S → ε)
• S → ε
  – Select1(S → ε) = First1(First1(ε) Follow1(S))
    • = First1({ε} Follow1(S))
    • = First1({ε}{$, b})
    • = {$, b}
  – {$, b} ∩ {a} = {}
    • The two selection sets have no elements in common, so G is LL(1)
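The two selection sets and the disjointness test can be checked mechanically. This sketch hard-codes First1 of each right-hand side and Follow1(S) exactly as given on the slides, represents ε as the empty string, and treats every token as one character; the names are illustrative.

```python
def truncate_concat(xs, ys, k=1):
    """First_k of the pairwise concatenation of two sets of token strings."""
    return {(x + y)[:k] for x in xs for y in ys}


# Values copied from the slides; "" stands for ε.
FIRST1_RHS = {"S -> a S b": {"a"}, "S -> ε": {""}}
FOLLOW1_S = {"$", "b"}

# Select_1(A -> w) = First_1(First_1(w) Follow_1(A))
SELECT = {prod: truncate_concat(f, FOLLOW1_S) for prod, f in FIRST1_RHS.items()}

# S (and hence G) is LL(1) when the selection sets are pairwise disjoint.
IS_LL1 = SELECT["S -> a S b"].isdisjoint(SELECT["S -> ε"])
```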
In Class Quiz
• Consider the following grammar
  – S → B x
  – A → a A
  – A → b
  – B → y A z A
  – B → A A
• Compute Follow1(A)
Converting CFG to PDA: 1
• A PDA can be constructed from a CFG as follows:
  – PDA.Σ = CFG.Σ
  – PDA.H = N ∪ Σ // finite stack alphabet: the nonterminals N, plus the terminals that are pushed with each R.H.S.
  – PDA.h0 = the goal symbol of the CFG
  – PDA.Q = {q}, the only state; the PDA halts on an empty stack
Converting CFG to PDA: 2
• Two rules
  – 1. T(q, x, x) = (q, ε) (i.e., for every terminal x)
  – 2. T(q, ε, A) = (q, α) (i.e., replace nonterminal A by α)
    • where α is the string of terminal and nonterminal symbols on the R.H.S. of a production A → α
Example: From CFG to PDA
• Consider the following grammar G1, which generates a’s followed by an equal number of b’s
  – L(G1) = {ε, ab, aabb, aaabbb, …}
  – 1) S → a S b
  – 2) S → ε
• First(S) = {a, ε}
• Follow(S) = {$, b}
Example 2: Transitions
• Convert G1 to a PDA
  1. T(q, ε, S) = (q, aSb)
  2. T(q, ε, S) = (q, ε)
  3. T(q, a, a) = (q, ε)
  4. T(q, b, b) = (q, ε)
Example 2: Parsing
• Input string: aabb
• Initial configuration: (q, aabb, S)
• Transitions:
  1. T(q, ε, S) = (q, aSb)
  2. T(q, ε, S) = (q, ε)
  3. T(q, a, a) = (q, ε)
  4. T(q, b, b) = (q, ε)
Step  Input   Stack   Transition
1     .aabb   S
2     .aabb   aSb     1
3     a.abb   Sb      3
4a    a.abb   b       2?
4b    a.abb   aSbb    1
5     aa.bb   Sbb     3
6a    aa.bb   aSbbb   1?
6b    aa.bb   bb      2
7     aab.b   b       4
8     empty   empty   4
• Steps 4a/4b: use FIRST
• Steps 6a/6b: use FOLLOW
Note on example
• The grammar is LL(1)
  – Because the non-determinism is resolved by looking at only one symbol
  – Applied transition 1 (i.e., S → a S b)
    • the remaining input is ‘abb’ and ‘a’ is the first token of the input
  – Applied transition 2 (i.e., S → ε)
    • the next input token is ‘b’, which is the token that follows S in S → a S b
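The way one token of lookahead resolves the choice between transitions 1 and 2 can be sketched as a one-state PDA in Python. The function name `parse_anbn`, the list-based stack, and the use of "$" for end of input are illustrative assumptions.

```python
def parse_anbn(text):
    """Parse text with G1 (1: S -> aSb, 2: S -> ε) using one token of lookahead.

    Returns the list of rule numbers applied, or None if the input is rejected.
    """
    stack, pos, rules = ["S"], 0, []
    while stack:
        top = stack.pop()
        look = text[pos] if pos < len(text) else "$"
        if top == "S":
            if look == "a":  # 'a' is in FIRST(aSb): expand with rule 1
                rules.append(1)
                stack.extend(reversed("aSb"))  # leftmost symbol ends up on top
            elif look in ("b", "$"):  # lookahead is in FOLLOW(S): use rule 2
                rules.append(2)
            else:
                return None
        elif top == look:  # transitions 3 and 4: match a terminal
            pos += 1
        else:
            return None
    return rules if pos == len(text) else None
```

For the input aabb this applies rule 1 twice and then rule 2, mirroring steps 2, 4b, and 6b of the table.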
LL(1) Grammars
• Predictive parsers
  – A recursive-descent parser with no backtracking can be constructed for LL(1) grammars
  – LL(1) grammars are rich enough to cover most PL constructs
• A grammar G is LL(1) if for any two distinct productions of G, A → α | β, the following conditions are satisfied
  – First(α) ∩ First(β) = {}
  – At most one of α or β derives the empty string
  – If β ⇒* ε, then First(α) ∩ Follow(A) = {} (and likewise with α and β swapped)
Construction of Predictive Parser
• The following algorithm collects the information from the FIRST and FOLLOW sets into a predictive parsing table M[A, a]
  – where
    • M is a 2D array
    • A is a nonterminal
    • a is a terminal or $ (a ∈ Σ ∪ {$})
The Algorithm
• The algorithm follows this idea
  – Select production A → α if the next input symbol a is in FIRST(α)
  – A complication occurs if α ⇒* ε or α = ε
    • then select A → α if the current input symbol is in FOLLOW(A), or if the input is $ and $ is in FOLLOW(A)
Parsing Table Algorithm
Algorithm 4.31 (p. 224)
• INPUT: G (grammar)
• OUTPUT: Parsing table M
• METHOD:
  – For each production A → α of G, do the following:
    1. For each terminal a in FIRST(α), add A → α to M[A, a]
    2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
    3. If ε is in FIRST(α) and $ is in FOLLOW(A), then add A → α to M[A, $]
    4. If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (shown by a blank in the table)
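The method can be sketched in Python for the expression grammar used in these slides. The FIRST and FOLLOW sets are hard-coded from the earlier solution slides rather than computed, missing dictionary keys play the role of blank (error) entries, and the names `PRODUCTIONS`, `first_of`, and `build_table` are illustrative.

```python
EPS = "ε"

# Productions of the expression grammar; an empty tuple encodes ε.
PRODUCTIONS = [
    ("E", ("T", "E'")),
    ("E'", ("+", "T", "E'")),
    ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "F", "T'")),
    ("T'", ()),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]

# Sets copied from the FIRST and FOLLOW solution slides.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}


def first_of(symbols):
    """FIRST of a right-hand side, using the precomputed FIRST sets."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # a terminal is its own FIRST set
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def build_table():
    """Map (nonterminal, lookahead) to the list of productions selected there."""
    table = {}
    for lhs, rhs in PRODUCTIONS:
        f = first_of(rhs)
        lookaheads = f - {EPS}            # step 1
        if EPS in f:
            lookaheads |= FOLLOW[lhs]     # steps 2 and 3 ($ is stored in FOLLOW)
        for a in lookaheads:
            table.setdefault((lhs, a), []).append(rhs)
    return table                          # step 4: absent keys are error entries


M = build_table()
```

Every entry of `M` holds exactly one production here, which is what the LL(1) property predicts for this grammar.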
Example: Table Driven
• Consider the following grammar
  – E → T E’
  – E’ → + T E’ | ε
  – T → F T’
  – T’ → * F T’ | ε
  – F → ( E ) | id
Parsing Table for the grammar 4.31
[Table: parsing table M, including rows E’ and T’ and a $ column; image not recovered]
• Consider E → T E’ and E’ → + T E’ | ε
  – FIRST(T E’) = FIRST(T) = FIRST(F) = {(, id} // apply rule 1, which fills M[E, id]
  – FIRST(E’) = {+, ε} // apply rule 1
  – FOLLOW(E’) = {), $} // apply rule 3
Implication of Algorithm 4.31
• The algorithm can be applied to any grammar G to produce a parsing table
• For any LL(1) grammar, each parsing table entry uniquely identifies a production or signals an error
• For a grammar G that is not LL(1), some entries may hold multiple productions
Non-recursive Predictive Parsing: Table-driven
• A table-driven parser has
  – An input buffer
    • Contains the string to be parsed
  – A stack (used instead of recursive calls)
    • Holds grammar symbols, with $ at the bottom
  – A parsing table
    • Two-dimensional array M[A, a] indexed by nonterminal A and terminal a
  – An output
    • Desired code (or a call to an error recovery routine)
Model of a table-driven predictive parser
[Figure: components labeled Input, stack, PP-Program, Parsing Table M, and output]
The semantics of parser
• The behavior of the parser can be specified in terms of its configurations
  – The initial configuration consists of
    • w$ in the input buffer, where w is the string of input symbols and $ is the end-of-input marker
    • the goal symbol S on top of the stack
Model of a table-driven predictive parser
[Figure: the same model with two callouts labeled “End of input marker”]
Program using parsing table (p. 226 in ASU)
• Algorithm: Non-recursive predictive parsing
• Input: A string w and a parsing table M for G
• Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication
• Method:
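The method body did not survive on this slide. The following is a hedged sketch of the usual stack-driven loop, not a transcription of the textbook's pseudocode: the table `TABLE` is written out by hand for the expression grammar used in these slides, and the names `TABLE`, `NONTERMINALS`, and `parse` are illustrative.

```python
# Parsing table M for the expression grammar: (nonterminal, lookahead) -> R.H.S.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # w$ in the input buffer
    stack = ["$", start]               # goal symbol on top of $
    pos, output = 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                   # terminal on top matches the input
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:  # terminal mismatch
            raise SyntaxError(f"expected {top!r}, found {a!r}")
        elif (top, a) not in TABLE:    # blank entry in M
            raise SyntaxError(f"no entry M[{top}, {a}]")
        else:
            rhs = TABLE[(top, a)]
            output.append((top, tuple(rhs)))
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost R.H.S. symbol ends up on top
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r} after end of expression")
    return output


derivation = parse(["id", "+", "id", "*", "id"])
```

Each element of `derivation` is one production used in the leftmost derivation, in order.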
Parsing Table for the grammar 4.11
[Table: image not recovered]
Figure 4.16
[Figure: image not recovered; surviving labels: $, T’, E’]
Syntax Errors
• A program can contain errors at many different levels
  – Lexical
    • E.g., misspelling
  – Syntactic
    • E.g., an arithmetic expression with unbalanced parentheses
  – Semantic
    • E.g., incompatible types
  – Logical
    • E.g., an infinite loop
Parsing Error Handler
• The error handler in the parser should
  – Report the presence and the nature of errors clearly
  – Recover from each error quickly
Issues
• How should the error handler report the presence of an error?
  – Print the offending line
• How should the parser recover?
  – Quitting
    • Not an option, because more errors may still be present in the rest of the input
Error-Recovery Strategies
• Panic mode (discards input symbols)
• Phrase level (performs local correction on the rest of the input)
• Error productions (augment G with error productions)
• Global correction (given an incorrect input x, find a correct y that can be obtained from x with minimal changes)
  – Expensive (time/space)
Panic-mode error recovery
• Works with synchronizing tokens, which are tokens used as delimiters
  – E.g., semicolon, end, etc.
• Panic-mode error recovery
  – Skips symbols on the input until a token in the Sync set is found
    • where Sync is a set of tokens built using the FOLLOW function (e.g., semicolon in C and Pascal)
    • Both FOLLOW and FIRST sets can be used in the Sync set
  – The Sync set is then used in the parsing table as follows:
    • If M[A, a] = sync, then pop the nonterminal A
    • If M[A, a] is blank, then skip the input symbol a
    • If the input symbol a does NOT match the terminal on top of the stack, then
      – pop the terminal on top of the stack
      – issue a message (e.g., token is inserted)
      – resume parsing
Solution for Follow()
• Consider the following grammar
  – E → T E’
  – E’ → + T E’ | ε
  – T → F T’
  – T’ → * F T’ | ε
  – F → ( E ) | id
• FOLLOW(E) = {), $}
• FOLLOW(T) = {+, ), $}
• FOLLOW(F) = {*, +, ), $}
• Synch set = {*, +, ), $}
Left Recursion and Left Factoring
• A grammar is left recursive if it has a nonterminal A such that
  – A ⇒+ A α
• Left recursion elimination methods
  – Immediate left recursion
  – Non-immediate left recursion
Immediate left recursion
• A → A α | β
  – Can be transformed to
    • A → β A’
    • A’ → α A’ | ε
• Elimination process
  1. Group the offending productions A → A α
  2. Introduce a new nonterminal A’
  3. Append the nonterminal A’ to the right of both types of productions (i.e., offending and non-offending)
Left Recursion elimination: General technique
• A → A α1 | A α2 | A α3 | … | A αm | β1 | β2 | … | βn
  – where none of the βi begins with A
• Can be transformed into
  – A → β1 A’ | β2 A’ | … | βn A’
  – A’ → α1 A’ | α2 A’ | … | αm A’ | ε
    • where none of the αi is empty
• It does not eliminate left recursion involving derivations of two or more steps
Non-immediate elimination
• Consider the following grammar
  – S → A a | b
  – A → A c | S d | ε
• S is left-recursive. Why? S ⇒ A a ⇒ S d a
• First eliminate the derivation
  – S → A a | b
  – A → A c | A a d | b d | ε // replace S by its R.H.S. in A
• Then apply the immediate left recursion elimination method
  – S → A a | b
  – A → b d A’ | A’
  – A’ → c A’ | a d A’ | ε
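The immediate elimination step used in the last bullet can be sketched as a small function. The encoding (each alternative is a tuple of symbols, `()` for ε, and the new nonterminal named by appending an apostrophe) is an assumption for illustration.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion.

    alternatives is a list of tuples of symbols; the empty tuple encodes ε.
    Returns a dict mapping each nonterminal to its new list of alternatives.
    """
    new_nt = nt + "'"
    # The alpha parts: what follows the leading A in each offending alternative.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The beta parts: the alternatives that do not begin with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: list(alternatives)}  # nothing to eliminate
    return {
        nt: [beta + (new_nt,) for beta in betas],                 # A  -> beta A'
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],   # A' -> alpha A' | ε
    }


# A -> A c | A a d | b d | ε, as obtained after substituting S on the slide.
result = eliminate_immediate_left_recursion(
    "A", [("A", "c"), ("A", "a", "d"), ("b", "d"), ()]
)
```

The `result` corresponds to A → b d A’ | A’ and A’ → c A’ | a d A’ | ε, the same productions as on the slide.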
Left Factoring
• When the choice between the alternatives of a production is NOT clear
  – Need to defer the selection until there is enough information
Left Factoring: Example
• Suppose we have the following G
  – Stmt → if Exp then Stmt
  – Stmt → if Exp then Stmt else Stmt
• G is NOT LL(1)
  – Why not? Both alternatives begin with the same tokens (if Exp then Stmt), so one lookahead token cannot choose between them
• Apply left factoring
  – Stmt → if Exp then Stmt Rest
  – Rest → else Stmt | ε
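A simplified version of the transformation can be sketched as follows. It factors out only the prefix shared by all alternatives of one nonterminal, which is enough for the Stmt example but is not the general algorithm (which groups alternatives by common prefix); the function name and the list-of-symbols encoding are illustrative.

```python
def left_factor(nt, alternatives, new_nt):
    """Factor out the longest prefix common to ALL alternatives of nt.

    alternatives is a list of lists of symbols; an empty list encodes ε.
    Simplification: alternatives that share a prefix with only some of the
    others are not handled.
    """
    prefix = []
    for column in zip(*alternatives):   # walk the alternatives in lockstep
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    if not prefix:
        return {nt: [list(alt) for alt in alternatives]}
    n = len(prefix)
    return {
        nt: [prefix + [new_nt]],                         # A    -> prefix Rest
        new_nt: [list(alt[n:]) for alt in alternatives], # Rest -> the differing tails
    }


factored = left_factor(
    "Stmt",
    [["if", "Exp", "then", "Stmt"],
     ["if", "Exp", "then", "Stmt", "else", "Stmt"]],
    "Rest",
)
```

Here `factored` encodes Stmt → if Exp then Stmt Rest and Rest → ε | else Stmt.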
Summary of Non-Context-Free Language Constructs
• Programming constructs that cannot be specified using a grammar
  – Define/use
  – Counting the number of formal parameters and actual parameters
Examples
• Define/use: L1 = {wcw | w is in (a|b)*}
  – where the first w is the declaration, c is the program, and the second w is the use
  – Grammars for C, Java, or Pascal do not distinguish among identifiers spelled with different characters
• Checking the number of formal parameters and actual parameters in procedure calls: L2 = {Formal^n Actual^m | n ≥ 1 and m ≥ 1}
• Solution:
  – Leave them to the semantic analyzer
Bottom up parsing
• Bottom-up parsing
  – Refers to the construction of a parse tree for an input string starting from the leaves (bottom) and working towards the root of the tree
Example id*id
G:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
Bottom-up Parsing: Shift-Reduce Parsing (SRP)
• General bottom-up parsing technique
  – Shift-Reduce Parsing (SRP)
    • The largest class of SRP is known as LR
    • Builds a parse tree from the leaves to the root
    • Think of the process as a sequence of reduction steps that reduces a string w to the goal symbol S
• Reduction step?
  – A step that replaces a specific substring matching the R.H.S. of a production with the nonterminal on its L.H.S.
• Example: id * id
  – id * id, F * id, T * id, T * F, T, E
Example
• Consider the following grammar
  – S → a A B e
  – A → A b c | b
  – B → d
• Input: abbcde
• abbcde can be reduced to S (i.e., the goal symbol)
  – abbcde, aAbcde, aAde, aABe, S
Rightmost (rm) Derivation
• Using a rightmost derivation starting from S
  – S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
• Grammar: S → a A B e, A → A b c | b, B → d
Handles
• Handle?
  – Reducing a handle is the opposite of a derivation step
  – A substring that matches the R.H.S. of a production and whose replacement by the nonterminal on the L.H.S. is one step in the reverse of a rightmost derivation
• Example
  – abbcde
  – aAbcde; b is the handle at position 2
    • S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
Example 4.23
• Consider the following grammar
  – E → E + E
  – E → E * E
  – E → ( E )
  – E → id
Example 4.23 (cont’d)
Right-Sentential Form   Handle   Reducing Production
id1 + id2 * id3         id1      E → id
E + id2 * id3           id2      E → id
E + E * id3             id3      E → id
E + E * E               E * E    E → E * E
E + E                   E + E    E → E + E
E                       -        -
Stack Implementation of SRP
• Two issues need to be addressed with handles
  – How to identify the substring?
  – How to identify the right rule?
• SRP can be implemented with a stack
• Initial state
  – Stack: $
  – Input: w$
    • where w is the input string and $ is the end-of-stack/end-of-input marker
• Final state
  – Stack: $ S
  – Input: $
More on SRP
• SRP works as follows
  – Shifts zero or more input symbols onto the stack until a handle β is on top of the stack
  – Reduces β to the L.H.S. of the appropriate production rule
  – Repeats the process until
    • an error is found (no match), or
    • the stack holds the goal symbol S and the input holds only the end marker $
Configurations of SRP for id1 + id2 * id3
Stack             Input                Action
$                 id1 + id2 * id3 $    shift
$ id1             + id2 * id3 $        reduce by E → id
$ E               + id2 * id3 $        shift
$ E +             id2 * id3 $          shift
$ E + id2         * id3 $              reduce by E → id
$ E + E           * id3 $              shift
$ E + E *         id3 $                shift
$ E + E * id3     $                    reduce by E → id
$ E + E * E       $                    reduce by E → E * E
$ E + E           $                    reduce by E → E + E
$ E               $                    accept
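The sequence of actions in this table can be reproduced by a small shift-reduce loop. Because the grammar E → E + E | E * E | (E) | id is ambiguous, the sketch has to choose between shifting and reducing somehow; it assumes operator precedence (* binds tighter than +, both left associative), which is a choice made here for illustration and is not stated on the slide. The names `PREC` and `shift_reduce` are also illustrative.

```python
# Assumed precedences used to settle shift/reduce choices; not from the slide.
PREC = {"+": 1, "*": 2}


def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, rest, actions = ["$"], list(tokens) + ["$"], []
    while True:
        look = rest[0]
        if stack[-1] == "id":                         # handle: id
            stack[-1] = "E"
            actions.append("reduce E -> id")
        elif stack[-3:] == ["(", "E", ")"]:           # handle: ( E )
            stack[-3:] = ["E"]
            actions.append("reduce E -> (E)")
        elif (len(stack) >= 4 and stack[-1] == "E" and stack[-2] in PREC
              and stack[-3] == "E"
              and PREC.get(look, 0) <= PREC[stack[-2]]):
            # Handle E op E: reduce unless the lookahead operator binds tighter.
            op = stack[-2]
            stack[-3:] = ["E"]
            actions.append(f"reduce E -> E{op}E")
        elif stack == ["$", "E"] and look == "$":
            actions.append("accept")
            return actions
        elif look == "$":
            raise SyntaxError(f"cannot reduce stack {stack}")
        else:
            stack.append(rest.pop(0))                 # shift the next token
            actions.append("shift")


actions = shift_reduce(["id", "+", "id", "*", "id"])
```

With the assumed precedences, `actions` lists the same eleven steps as the Action column above.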
Primary Operations of Shift-Reduce Parsing
• Main operations include
  – Shift
  – Reduce
  – Accept
  – Error
Operations: shift action
• Shift action?
  – The next input symbol is shifted onto the top of the stack
Operations: Reduce action
• Reduce action
  – The parser knows that the right end of the handle is at the top of the stack
  – The parser needs to locate the left end of the handle within the stack
  – The parser then decides which nonterminal replaces the handle
Operations: accept & error actions
• Accept action
  – The parser announces successful completion of parsing
• Error action
  – The parser discovers that a syntax error has occurred
    • and calls an error recovery routine
Example: Dangling Else
• stmt → if expr then stmt | if expr then stmt else stmt | others
Stack                      Input
… if expr then stmt        else … $
• The parser cannot tell whether “if expr then stmt” on top of the stack is the handle
Example: reduce/reduce conflict (procedure call or array)
• Suppose we have a statement like A(I, J) using this grammar:
[Grammar: image not recovered]
Cont’d: reduce/reduce conflict (procedure call or array)
• After shifting the first three tokens onto the stack:
Stack            Input
… id ( id        , id ) …
• Note: the id on top of the stack must be reduced, but the parser does not know which rule to apply (5 or 7?)
Cont’d: reduce/reduce conflict (procedure call or array)
• One solution is to change the token id in production 1 to procid
Stack                Input
… id ( id            , id ) …
… procid ( id        , id ) …
Model of LR Parsing
Example: LR parsing actions and goto functions
• Consider the following grammar
  1) E → E + T
  2) E → T
  3) T → T * F
  4) T → F
  5) F → ( E )
  6) F → id
Fig. 4-31
[Figure: LR parsing table; image not recovered]
• si: shift and stack state i
• rj: reduce by the production numbered j
• acc: accept
• blank: error
Fig. 4-38
More Examples on LR Parsing
LR(1) Grammars
LR Grammars: Building Parsing Table
• How do we build an LR parsing table for a given grammar?
• If we can build an LR parsing table for a grammar G, then G is said to be an LR grammar
  – Being LR implies that
    • a left-to-right shift-reduce parser can identify handles when they appear on top of the stack
LL(k) vs. LR(k)
• LR
  – More general and more expressive than LL
  – Decisions are made at the right end of a rule’s R.H.S., using some or all of the production already seen
  – Uses more context than LL(k)
Constructing Simple LR (SLR) Parsing Tables
• To build the SLR parsing table
  – Build the collection of items from the grammar
  – Group the items into sets
  – Use the sets as the states of the SLR parser
An item of LR(0)
• Item of a grammar G?
  – A production of G with a dot “.” at some position of the R.H.S.
  – The dot indicates
    • how much of a production has been seen at a given point (before the dot)
    • how much of a production remains to be seen (after the dot)
Construction of the items for LR(0)
• To construct the canonical LR(0) collection for a grammar G
  – Augment the grammar G
  – Use two functions
    • CLOSURE(I) {adds more items to a set of items}
    • GOTO(I, X) {moves the “.” past the symbol X}
  – If G is a grammar with start symbol S, then G’ is the augmented grammar for G with a new start symbol S’ and the production
    • S’ → S
    • S’ is used to signal successful parsing (stop!)
  – Construct the transition diagram of a DFA D that recognizes the prefixes of right-sentential forms (i.e., X1 X2 … Xm ai ai+1 … an) which may appear on the stack of the SRP
Example
• Example: A → X Y Z
  – The production A → X Y Z yields the following items
    • A → . X Y Z
      – (i.e., we are hoping to see a substring derivable from XYZ next on the input)
    • A → X . Y Z
      – (i.e., we have seen on the input a string derivable from X and are hoping to see a string derivable from YZ)
    • A → X Y . Z
    • A → X Y Z .
      – (i.e., it is time to reduce XYZ to A)
Closure of Item Sets using the Closure function
• If I is a set of items for a grammar G, then CLOSURE(I) can be obtained by these rules:
  1. Initially, add every item in I to CLOSURE(I)
  2. If A → α . B β is in CLOSURE(I) and B → q is a production, then add the item B → . q to CLOSURE(I)
     – Repeat this rule until no more new items can be added to CLOSURE(I)
Example 4.40
• Consider the augmented expression grammar G:
  – E’ → E
  – E → E + T | T
  – T → T * F | F
  – F → ( E ) | id
Cont’d
• If I is the set of one item {[E’ → . E]}, then CLOSURE(I) contains the set of items I0 in Figure 4.31
• How to compute the closure of E’ → . E?
  – E’ → . E
    • <by rule 1, add it because it is in I>
  – E → . E + T | . T
    • <by rule 2, because E is right after the dot on the R.H.S., add the E productions with dots at the left end>
Complete
• If I is the set of one item {[E’ → . E]}, then CLOSURE(I) is
  – E’ → . E <by rule 1>
  – E → . E + T <by rule 2: if A → α . B β is in the set, then add B → . q>
  – E → . T <by rule 2>
  – T → . T * F <by rule 2>
  – T → . F <by rule 2>
  – F → . ( E ) <by rule 2>
  – F → . id <by rule 2>
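The two closure rules can be sketched with a worklist. In this sketch an item is a tuple (L.H.S., R.H.S., dot position), which is an encoding chosen for illustration; the names `GRAMMAR` and `closure` are likewise assumptions.

```python
# Augmented expression grammar; each R.H.S. is a tuple of symbols.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """CLOSURE of a set of LR(0) items, each encoded as (lhs, rhs, dot)."""
    result = set(items)           # rule 1: every item of I is in CLOSURE(I)
    work = list(items)
    while work:                   # rule 2, repeated until nothing new is added
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # a nonterminal follows the dot
            for production in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], production, 0)     # B -> . q
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


I0 = closure({("E'", ("E",), 0)})
```

`I0` contains the seven items listed on the slide.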
Create collection of items using closure
[Figure: image not recovered; surviving labels: three “<rule 2>” callouts]
The function GOTO
• The second useful function is GOTO(I, X)
  – where
    • I is a set of items
    • X is a grammar symbol
  – GOTO(I, X) is defined to be the closure of the set of all items [A → α X . β] such that [A → α . X β] is in I
    • Used to define the transitions in the LR(0) automaton for grammar G
Example
• If I is the set of two items
  – {[E’ → E .], [E → E . + T]}
• then GOTO(I, +) contains the items
  – E → E + . T
  – T → . T * F
  – T → . F
  – F → . ( E )
  – F → . id
• Moved the “.” over + and then computed the closure
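GOTO can be sketched directly from its definition: move the dot over X in every item where X follows the dot, then take the closure. As before, the item encoding (L.H.S., R.H.S., dot position) and the function names are illustrative assumptions.

```python
# Augmented expression grammar; each R.H.S. is a tuple of symbols.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """CLOSURE of a set of LR(0) items, each encoded as (lhs, rhs, dot)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for production in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def goto(items, symbol):
    """GOTO(I, X): advance the dot over X wherever possible, then close the set."""
    moved = {
        (lhs, rhs, dot + 1)
        for (lhs, rhs, dot) in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


# I = {[E' -> E .], [E -> E . + T]}
I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
after_plus = goto(I, "+")
```

`after_plus` holds the five items listed in the example.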
Constructing an SLR-parsing table
• Assume grammar G’
  1. Build the collection of sets of LR(0) items for G’
  2. Construct state i from Ii
  3. The parsing actions for state i are determined as follows
     a) If [A → α . a β] is in Ii and GOTO(Ii, a) = Ij, then action[i, a] = sj (shift and go to state j)
     b) If [A → α .] is in Ii, then action[i, a] = reduce A → α for ALL a in FOLLOW(A) (assuming A is not S’)
     c) If [S’ → S .] is in Ii, then action[i, $] = accept
  4. The goto transitions for state i are built using the nonterminals A: if GOTO(Ii, A) = Ij, then goto[i, A] = j
  5. All entries not defined by rules 3 and 4 are made “error”
  6. The initial state of the parser is the one constructed from the set of items containing [S’ → . S]
Example 4.47: 1
• Let us build the SLR table for the augmented expression grammar (see Fig. 4.13). Consider the set of items I0:
  – E’ → . E
  – E → . E + T
  – E → . T
  – T → . T * F
  – T → . F
  – F → . ( E )
  – F → . id
Example 4.47: 2
• Consider I0 (state 0)
  – The item F → . ( E )
    • gives action[0, (] = s4
  – The item F → . id
    • gives action[0, id] = s5
  – The other items yield no actions
Example 4.38 (Reduce)
• Consider I2
  – E → T .
  – T → T . * F
• Since FOLLOW(E) = {$, +, )} and E ≠ E’, then
  – action[2, +] = action[2, )] = action[2, $] = reduce E → T (r2)
    • using this rule: if [A → α .] is in Ii, then action[i, a] = reduce A → α for all a in FOLLOW(A)
Second Item
• The second item of I2 (T → T . * F) gives
  – action[2, *] = s7
Advanced Topics
• Optimizing a grammar
• Reducing the size of LR(1) tables
  – Combining rows/columns
  – Shrinking the grammar
  – Directly encoding the table
  – Using better construction algorithms
Optimizing a Grammar
• There is a correlation between the number of production rules applied and the amount of work needed to parse
• Top-down parser
  – Works with a derivation
• Bottom-up parser
  – Performs a reduction for every step of the derivation
Derivation
• Rewrite the grammar to shorten the height of the parse tree, because a shorter tree translates into
  – A shorter derivation
  – A shorter time to parse
  – Fewer reductions
• The optimization has no effect on the behavior of the parser
Example: Expression Tree
Parse Tree for Exp. Grammar
[Figure: parse tree for (x – 2 * y); image not recovered]
• A node with a single child is a candidate for optimization
Revised Grammar
• Obtain the revised grammar by substituting for Factor its alternatives
• Increases the number of alternatives for Term
• Shrinks the parse tree by eliminating a layer
Revised Expression Grammar
[Figure: parse tree for (x – 2 * y) under the revised grammar; image not recovered]
General rule
• Useless productions
  – Productions having a single symbol on the R.H.S.
  – Sometimes used to perform a specific action
• Fold away useless productions
  – Can increase the size of the table in LR(1)
    • Removes one column, while it may add many rows
  – Can increase the number of comparisons in LL(1)
Reducing the size of LR(1) tables: Combining Rows/Columns
• LR(1) tables generated for even small grammars can be large
• Combining rows/columns results in a direct reduction in table size, at the cost of an extra indirection to access the table
• Need to understand the tradeoff
Combining Rows (or columns)
• Find two identical rows (or columns)
  – The table generator can implement each set of identical rows once
  – Remap parser state to row index in the action table
  – Same thing for identical columns
Shrinking the grammar
• To shrink the grammar
  – Reduce the number of production rules
• Example
  – Factor → num | ident ( * /)
  – Can be rewritten as
    • Factor → val
  – Action tables?
    • Removes a column from the action table
    • The scanner must return val for ident or num
    • See the table
Directly Encoding the table
• Forget about the table-driven approach
• Use a hard-coded implementation
  – Each state is coded as a case statement (IF-THEN-ELSE)
    • Test the type of the next symbol
    • Perform
      – Shift
      – Reduce
      – Accept
      – Error
• No table operations, no overhead (good)
  – No table lookup
• Larger code size (bad)
• Not readable (bad)
Using Other Construction Algorithms
• LR(1)
  – The most general table construction algorithm
  – Produces the largest tables, but accepts the largest class of grammars
• Are there other variations of the LR(1) algorithm?
  – Simple LR(1) (or SLR)
    • Accepts a smaller class of grammars than LR(1)
    • Uses the FOLLOW set instead of a lookahead symbol to decide whether to shift or reduce
    • Using FOLLOW sets results in smaller tables
  – LookAhead LR(1) (or LALR(1))
    • Works on the assumption that some items in a set (state) are critical and that the rest can be derived from them
Top-Down vs. Bottom-Up
• Adv. of top-down
  – Easy to build a hand-coded parser
  – Provides excellent opportunities to detect and recover from errors
  – Easy to find ambiguities
  – Faster (well-constructed ones!)
• Adv. of bottom-up
  – Tool support
  – Handles a larger class of grammars