View
213
Download
0
Embed Size (px)
Citation preview
1
CMPSC 160Translation of Programming Languages
Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and
Linda Torczon
Lecture-Module #7
Parsing
2
Announcements
• Programming assignment 2 will be on the class webpage, due in two weeks, October 31, Thursday
– In this assignment you will work as teams of two. Please find a partner.
– Start the project early, don’t leave it to the last weekend!
• Homework 2 will be due
• Read chapter 4
• Midterm will be in two weeks, November 5, Tuesday
– in class
– closed books, closed notes
3
Parsing Techniques
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree from the start symbol and grow toward leaves (similar to a derivation)
• Pick a production and try to match the input
• Bad “pick” may need to backtrack
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward root
• We can think of the process as reducing the input string to the start symbol
• At each reduction step a particular substring matching the right-side of a production is replaced by the symbol on the left-side of the production
• Bottom-up parsers handle a large class of grammars
4
Eliminating Immediate Left Recursion
To remove left recursion, we can transform the grammar
Consider a grammar fragment of the form
A A |
where or are strings of terminal and nonterminal symbols
and neither nor start with A
We can rewrite this as
A RR R |
where R is a new non-terminal
This accepts the same language, but uses only right recursion
A
A
A
A
R
R
R
5
Left-Recursive and Right-Recursive Expression Grammar
1 S Expr2 Expr Expr + Term3 | Expr – Term4 | Term5 Term Term * Factor6 | Term / Factor7 | Factor8 Factor num9 | id
1 S Expr2 Expr Term Expr3 Expr + Term Expr4 | – Term Expr5 | 6 Term Factor Term7 Term * Factor Term8 | / Factor Term9 | 10 Factor num11 | id
6
Predictive Parsing
Basic idea
Given A , the parser should be able to choose between &
FIRST sets
For a string of grammar symbols , define FIRST() as the set of tokens that appear as the first symbol in some string that derives from
That is, x FIRST() iff * x , for some
The LL(1) Property
If A and A both appear in the grammar, we would like
FIRST() FIRST() =
This would allow the parser to make a correct choice with a lookahead of exactly one symbol !
(Pursuing this idea leads to LL(1) parser generators...)
7
Recursive Descent Parsing
Recursive-descent parsing
• A top-down parsing method
• The term descent refers to the direction in which the parse tree is traversed (or built).
• Use a set of mutually recursive procedures (one procedure for each nonterminal symbol)
– Start the parsing process by calling the procedure that corresponds to the start symbol
– Each production becomes one clause in procedure
• We consider a special type of recursive-descent parsing called predictive parsing
– Use a lookahead symbol to decide which production to use
8
Recursive Descent Parsing: Expression Grammar
void main() { lookahead=getNextToken(); S(); match(EOF); }
void S() { Expr(); } void Expr() { Term(); ExprPrime(); } void ExprPrime() { switch(lookahead) { case PLUS : match(PLUS); Term(); ExprPrime(); break; case MINUS : match(MINUS); Term();
ExprPrime(); break; default: return; }}
void Term() { Factor(); TermPrime(); } void TermPrime() { switch(lookahead) { case TIMES: match(TIMES); Factor(); TermPrime(); break; case DIV: match(DIV); Factor(); TermPrime(); break; default: return; }}
void Factor() { switch(lookahead) { case ID : match(ID); break; case NUMBER: match(NUMBER); break; default: error();}
int PLUS=1, MINUS=2, ...int lookahead;void match(int token) { if (lookahead==token) lookahead=getNextToken(); else error(); }
9
Recursive Descent Parsing: Another Grammar
1 S if E then S else S2 | begin S L3 | print E4 L end5 | ; S L6 E num = num
void S() { switch(lookahead) { case IF: match(IF); E(); match(THEN); S(); match(ELSE); S(); break; case BEGIN: matvh(BEGIN); S(); L(); break; case PRINT: match(PRINT); E(); break; default: error(); }}
void E() { match(NUM); match(EQ); match(NUM); }
void L() { switch(lookahead) { case END: match(END); break; case SEMI: match(SEMI); S(); L(); break; default: error(); }}
void main() { lookahead=getNextToken(); S(); match(EOF); }
10
Example Execution For Input: if 2=2 then print 5=5 else print 1=1
main: call S();
S1: find the production for (S, IF) : Sif E then S else S
S1: match(IF);
S1: call E();
E1: find the production for (E, NUM): Enum = num
E1: match(NUM); match(EQ); match(NUM);
E1: return from E1 to S1
S1: match(THEN);
S1:call S();
S2: find the production for (S, PRINT): Sprint E
S2: match(PRINT);
S2: call E();
E2: find the production for (E, NUM): Enum = num
E2: match(NUM); match(EQ); match(NUM);
E2: return from E2 to S2
S2: return from S2 to S1
S1: match(ELSE);
S1: call S();
S3: find the production for (S, PRINT): Sprint E
S3: match(PRINT);
S3: call E();
E3: find the production for (E, NUM): Enum = num
E3: match(NUM); match(EQ); match(NUM);
E3: return from E2 to S3
S3: return from S3 to S1
S1: return from S1 to mainmain: match(EOF); return success;
11
Another Approach: Stack-Based Table-Driven Parsing
The parsing table
• A two dimensional array
M[A, a] gives a production
– A: a nonterminal symbol
– a: a terminal symbol
• What does it mean?
– If top of the stack is A and the lookahead symbol is a then we apply the production M[A, a]
IF BEGIN PRINT END SEMI NUM
S Sif E then S else S Sbegin S L Sprint E
L L end L ; S L
E Enum = num
12
Table-driven Parsers
A table-driven parser looks like
Parsing tables can be built automatically!
ScannerTable-driven
Parser
ParsingTable
ParserGenerator
sourcecode
grammar
IR
Stack
13
Table-Driven Predictive Parsing Algorithm
• Push the end-of-file symbol ($) and the start symbol onto the stack
• Consider the symbol X on the top of the stack and lookahead symbol a
– If X = a = $ announce successful parse and halt
– If X = a $ pop X off the stack and advance the input pointer to the next input symbol
– If X is a nonterminal, look at the production M[X, a]
• If there is no such production (M[X, a] = error), then call an error routine
• If M[X, a] is a production X Y1 Y2 ... Yk , then pop X and push Yk , Yk-1 , ..., Y1 onto the stack with Y1 on top
– If none of the cases above apply, then call an error routine
14
Table-Driven Predictive Parsing Algorithm
Push($); // $ is the end-of-file symbolPush(S); // S is the start symbol of the grammarlookahead = get_next_token();repeat X = top_of_stack(); if (X is a terminal or X == $) then if (X == lookahead) then pop(X); lookahead = get_next_token(); else error(); else // X is a non-terminal if ( M[X, lookahead] == X Y1 Y2 ... Yk) then pop(X); push(Yk); push(Yk-1); ... push(Y1); else error();until (X = $)
15
Recursive Descent Parser On: if 2=2 then print 5=5 else print 1=1
main: call S();
S1: find the production for (S, IF) : Sif E then S else S
S1: match(IF);
S1: call E();
E1: find the production for (E, NUM): Enum = num
E1: match(NUM); match(EQ); match(NUM);
E1: return from E1 to S1
S1: match(THEN);
S1:call S();
S2: find the production for (S, PRINT): Sprint E
S2: match(PRINT);
S2: call E();
E2: find the production for (E, NUM): Enum = num
E2: match(NUM); match(EQ); match(NUM);
E2: return from E2 to S2
S2: return from S2 to S1
S1: match(ELSE);
S1: call S();
S3: find the production for (S, PRINT): Sprint E
S3: match(PRINT);
S3: call E();
E3: find the production for (E, NUM): Enum = num
E3: match(NUM); match(EQ); match(NUM);
E3: return from E2 to S3
S3: return from S3 to S1
S1: return from S1 to mainmain: match(EOF); return success;
16
Table Driven Parser On: if 2=2 then print 5=5 else print 1=1$
Stack lookahead Parse-table lookup$S IF M[S,IF]: Sif E then S else S $S,ELSE,S,THEN,E,IF IF$S,ELSE,S,THEN,E NUM M[E,NUM]: Enum = num $S,ELSE,S,THEN,NUM,EQ,NUM NUM$S,ELSE,S,THEN,NUM,EQ EQ$S,ELSE,S,THEN,NUM NUM$S,ELSE,S,THEN THEN$S,ELSE,S PRINT M[S,PRINT]: Sprint E $S,ELSE,E,PRINT PRINT$S,ELSE,E NUM M[E,NUM]: Enum = num $S,ELSE,NUM,EQ,NUM NUM$S,ELSE,NUM,EQ EQ$S,ELSE,NUM NUM$S,ELSE ELSE$S PRINT M[S,PRINT]: Sprint E $E,PRINT PRINT$E NUM M[E,NUM]: Enum = num $NUM,EQ,NUM NUM$NUM,EQ EQ$NUM NUM$ $ report success!
17
How to Build Parse Tables? FIRST Sets
For a string of grammar symbols define FIRST() as
• The set of tokens that appear as the first symbol in some string that derives from
• If * , then is in FIRST()
To construct FIRST(X) for a grammar symbol X, apply the following rules until no more symbols can be added to FIRST(X)
• If X is a terminal FIRST(X) is {X}
• If X is a production then is in FIRST(X)
• If X is a nonterminal and X Y1Y2 ... Yk is a production then put every symbol in
FIRST(Y1) other than to FIRST(X)
• If X is a nonterminal and X Y1Y2 ... Yk is a production, then put terminal a in
FIRST(X) if a is in FIRST(Yi) and is in FIRST(Yj) for all 1 j i
• If X is a nonterminal and X Y1Y2 ... Yk is a production, then put in FIRST(X) if is in FIRST(Yi) for all 1 i k
18
Computing FIRST Sets for Strings of Symbols
To construct the FIRST set for any string of grammar symbols X1X2 ... Xn (given the FIRST sets for symbols X1 , X2 , ... Xn ) apply the following rules.
FIRST(X1X2 ... Xn ) contains:
– Any symbol in FIRST(X1) other than
– Any symbol in FIRST(Xi) other than , if is in FIRST(Xj) for all 1 j i
, if is in FIRST(Xj) for all 1 i n
19
FIRST Sets
1 S Expr2 Expr Term Expr3 Expr + Term Expr4 | - Term Expr5 | 6 Term Factor Term7 Term * Factor Term8 | / Factor Term9 | 10 Factor num11 | id
Symbol FIRSTS {num, id}Expr {num, id}
Expr { , +, - }Term {num, id}
Term { , *, / }Factor {num, id}num {num}id {id}+ {+}- {-}* {*}/ {/}
20
How to build Parse Tables? FOLLOW Sets
For a non-terminal symbol A, define FOLLOW(A) as:
The set of terminal symbols that can appear immediately to the right of A in some sentential form
To construct FOLLOW(A) for a non-terminal symbol A apply the following rules until no more symbols can be added to FOLLOW(A)
• Place $ in FOLLOW(S) ($ is the end-of-file symbol, S is the start symbol)
• If there is a production A B , then everything in FIRST() except is placed in FOLLOW(B)
• If there is a production A B, then everything in FOLLOW(A) is placed in FOLLOW(B)
• If there is a production A B , and is in FIRST() then everything in FOLLOW(A) is placed in FOLLOW(B)
21
FOLLOW Sets
1 S Expr2 Expr Term Expr3 Expr + Term Expr4 | - Term Expr5 | 6 Term Factor Term7 Term * Factor Term8 | / Factor Term9 | 10 Factor num11 | id
Symbol FOLLOWS { $ } Expr { $ }Expr { $ }Term { $, +, - }
Term { $, +, - }Factor { $, +, -, *, / }
22
LL(1) Parse Table Construction
• For all productions A , perform the following steps:
– For each terminal symbol a in FIRST(), add A to M[A, a]
– If is in FIRST(), then add A to M[A, b] for each terminal symbol b in FOLLOW(A) and add A to M[A, $] if $ is in FOLLOW(A)
• Set all the undefined entries in M to error
23
1 S Expr2 Expr Term Expr3 Expr + Term Expr4 | - Term Expr5 | 6 Term Factor Term7 Term * Factor Term8 | / Factor Term9 | 10 Factor num11 | id
id num + - * / $
S SE SE E ET E ET E
E’ E + T E E - T E E
T TF T TF T
T’ T’ T’ T * F T T / F T T’
F F id F num
Grammar:
LL(1) Parse table:
24
LL(1) gramars
Left-to-right scan of the input, Leftmost derivation, 1-token lookahead
Two alternative definitions of LL(1) grammars:
1. A grammar G is LL(1) if there are no multiple entries in its LL(1) parse table
2. A grammar G is LL(1) if for each set of its productions A 1 | 2 | ... | n
• FIRST(1), FIRST(2), ..., FIRST(n), are all pairwise disjoint
• If i * , then FIRST (j) FOLLOW (A) = for all 1 i
n, i j