CS 410 / 510 Mastery in Programming
Chapter 5: LL(1) Parsing
Herbert G. Mayer, PSU CS
Status 7/14/2013

Syllabus

• Goal
• Grammars Formally, Intuitively
• BNF, EBNF
• A Sample Ambiguous Grammar G1
• Suitable Grammar
• Left-Recursion Elimination
• Lambda-Rule Elimination
• Suitable G5 and G6
• Use Grammar for Parsing
• Recursive Descent
• Recursive Descent Parser For s
• Parser for G2
• Parser for G4
• References

Goal

• Become familiar with suitable grammars. Suitable means that certain rules are not allowed, such as left-recursive rules, circular rules, and lambda-producing rules. Note: with one exception!
• The rules of a programming language L specify how to generate strings in L; all other strings are not part of L
• The number of strings in L (i.e. the size of the set L) is generally unbounded for usable programming languages
• One way of expressing language rules is through some grammar G
• The class of grammars handled here is restricted to context-free ones; the more powerful class of grammars with context-sensitive rules is excluded
• A side goal is to learn a particular notation for writing grammars, but that notation is simply a convenience, a handy way of writing
• We'll focus on Backus Naur Form (BNF), AKA Backus Normal Form, from the early days of Algol-60

Grammars Formally

• A grammar G for language L, named G(L), is a quintuple { terminals, nonterminals, metasymbols, start symbol, productions } defining all strings in L; a string in L is named a program
• Start Symbol: One of the nonterminals starts the process of generating strings in L; it doesn't have to be the first nonterminal defined in G
• Terminal: Is a final token in a program; e.g. "hello". Such a token cannot derive other strings; it solely represents itself
• Nonterminal Symbol: Is a grammar symbol, used as short-hand for a string of other symbols; it is defined on the left-hand side of a production; multiple alternatives are grouped via the metasymbol |
• Metasymbol: Symbol of the grammar defining the process of string generation; it is not part of the language L defined by G; instead it is a grammar short-hand; hence the name metasymbol
• Production: Rule that defines a nonterminal; consists of the nonterminal being defined on the left-hand side, followed by the "produces" metasymbol, plus some string of symbols on the right-hand side that is not circular

Grammars, Some Terminology

• The empty string is referred to as lambda. We'll use lambda as a convenience in grammar writing; it is also frequently referred to in the literature as epsilon
• Lambda is superfluous as a grammar tool, except if the language allows the empty program. In all other cases, rules that produce lambda can be replaced by other rules that do not use lambda, at the expense of a more complex grammar
• The right-hand side of a suitable production (AKA alternative) eventually starts with a terminal; there could be several terminals, for several alternatives. The set of all distinct terminals that can start a right-hand side is called the first set

Grammars Intuitively

• A grammar G is a set of rules to produce programs; programs are strings of characters in a programming language L
• Each rule has a name on the left-hand side, the nonterminal, that generates at least one sequence of other symbols; those can be terminals or nonterminals listed on the right-hand side
• Terminal is a symbol expressing a value directly, like 500. It can also be some fixed symbol, like + or ( or END. A terminal symbol cannot produce other strings
• Nonterminal is a name that can be used on the right-hand side of a production. It occurs at least once on the left-hand side of a production, and is defined by a string of nonterminals and terminals
• When there are multiple rules (AKA productions) for a nonterminal, we call these alternatives
• One of the nonterminals is the start symbol. That is where the generating process starts; it is often written as the first rule, but must be clearly identified somehow

Grammars

Example for grammar G0:

    s : s ( s )
      |

Discussion of G0:

• The only nonterminal symbol used in grammar G0 is s. Hence s must also be the start symbol
• There are 2 metasymbols, or if we are picky 3:
    - Metasymbol : means "the left side produces the string on the right"
    - Metasymbol | means "another alternative for s"
    - End of all rules means it is the end of G0
• Nothing to the right of | means: "this alternative generates the empty string", i.e. nothing, or lambda; some authors call this epsilon
• The first alternative of the 2 above productions in G0 is left-recursive
• There are 2 terminal symbols in L(G0), these are ( and )
• We can debate whether the empty string lambda is also a terminal symbol. I do not count the empty string, as this would create a situation in which an infinite sequence of the same terminal symbols (of nothings) is the same as a single occurrence; not suitable for language grammars

BNF, EBNF

While authoring the report on the language Algol60 in the late 1950s, John Backus developed a convenient short-hand, ably supported by ideas from Peter Naur

• Backus Normal Form, AKA Backus Naur Form
• Typical metasymbols in the Algol60 report: ::= | <> []
• [ .. ] encloses an optional phrase; allowed once or not at all
• < .. > defines the nonterminal enclosed; allows disambiguation between, say, nonterminal <start> and terminal symbol start
• ::= is the "produces" symbol; we'll use a simpler one
• | starts another alternative for a production

The notation found wide acceptance; it was extended to allow multiple options, by using the additional { and } metasymbols

• { .. } states that the .. part is included 0 or more times
• { .. }+ states that the .. part is included 1 or more times
• [ .. ] states that the .. part is optional, i.e. included once or not at all
• Hence this type of grammar is called EBNF, for Extended BNF
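The EBNF metasymbols map quite directly onto control flow in a hand-written recognizer: [ .. ] becomes an if statement and { .. } becomes a while loop. The following is a minimal sketch under stated assumptions: the rule integer : [ sign ] digit { digit } and the function name is_integer are hypothetical, chosen only for illustration, and are not taken from the Algol60 report.

```c
#include <ctype.h>

// Hypothetical EBNF rules, used only for illustration:
//     integer : [ sign ] digit { digit }
//     sign    : + | -
// Returns 1 if the whole string text is an integer, else 0.
int is_integer( const char *text )
{
    const char *p = text;

    // [ sign ]   optional phrase: an if statement
    if ( *p == '+' || *p == '-' ) {
        p++;
    }

    // digit      mandatory part: must be present exactly here
    if ( !isdigit( (unsigned char) *p ) ) {
        return 0;
    }
    p++;

    // { digit }  included 0 or more times: a while loop
    while ( isdigit( (unsigned char) *p ) ) {
        p++;
    }

    return *p == '\0';   // nothing may follow the integer
}
```

With this sketch, is_integer( "-7" ) and is_integer( "42" ) succeed, while is_integer( "+" ) and is_integer( "12a" ) fail.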

A Sample Ambiguous Grammar G1

Metasymbol : means "produces"
Metasymbol | means "r.h.s. also produces …"; another alternative
Nonterminals: e and n
Terminals: + - * / ^ ( ) 0 1 2 3 4 5 6 7 8 9
Start Symbol: e

Grammar G1:

e : e + e -- addition, but that is “semantics”

| e - e -- subtraction

| e * e -- multiplication

| e / e -- division

| e ^ e -- exponentiation

| ( e ) -- grouping with parentheses

| n -- non-terminal for 10 terminals below:

n : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Strings in G1

    0+7
    6*6-4+2
    2*(3+2)
    (((7)))
    ((9)+8)*(((5-4)/2)/0)

Operator precedence and ambiguity:

• In conventional arithmetic, * and / have stronger binding than + and -; this binding strength is AKA precedence or priority
• G1 alone cannot express that!
• The parser discussed does not account for precedences! This can be encoded in the grammar too, but is not covered here, since we do not include a discussion of semantics, i.e. code generation
• If there are multiple parse trees for some string of terminals, then the grammar is called ambiguous! G1 is ambiguous
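To get a feeling for how ambiguous G1 is, one can count the parse trees it allows. The sketch below assumes a parenthesis-free string such as 6*6-4+2: since G1 gives all operators the same standing, any operator may sit at the root of the tree, with the operators to its left forming the left subtree and those to its right forming the right subtree. The function name count_parse_trees is hypothetical.

```c
// Number of distinct parse trees that G1 allows for a parenthesis-free
// string with the given number of binary operators, e.g. 3 for 6*6-4+2.
// Assumption: no parentheses occur, so only the rules e : e op e and
// e : n are involved.
unsigned long count_parse_trees( int operators )
{
    if ( operators == 0 ) {
        return 1;                     // a single number n: exactly one tree
    }
    unsigned long total = 0;
    for ( int root = 0; root < operators; root++ ) {
        // operator number root is the tree root: root operators end up
        // in the left subtree, the remaining ones in the right subtree
        total += count_parse_trees( root )
               * count_parse_trees( operators - 1 - root );
    }
    return total;
}
```

For 0+7 (one operator) there is a single tree, so that string alone does not reveal the ambiguity; for 6*6-4+2 (three operators) the count is already 5.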

Grammar G2

Rewrite G1 to be suitable for RD parsing; introduce metasymbols { } for repetition 0 or more times; see G2

expression : term { plus_op term }

plus_op : + | -

term : factor { mult_op factor }

mult_op : * | /

factor : primary { ^ primary }

primary : ( expression )| number

number : 0 | 1 | 2 | 3 | 4

| 5 | 6 | 7 | 8 | 9

Note that the position of the semantic action effectively defines precedence; this is important for ^, which is right-associative! Others are usually left-associative, except in APL! We won't cover semantics in CS 410/510

Suitable Grammar

• Definition: Parsing means "analysing a string for grammatical correctness, according to the rules of language L"
• Definition: A program written in language L is a string of terminal symbols; these symbols are strung together according to the grammar rules of L
• Such a program can be empty only if there is a way for the start symbol to generate lambda
• We parse program strings in a top down fashion. Top down means: we start the string generation with the start symbol, matching terminals from the input stream one symbol (i.e. terminal) at a time. Other methods exist that are not covered here; e.g. bottom-up parsing
• When we see several alternatives during the parse that may have created the source string so far, we look ahead one source symbol to determine the correct next alternative
• Thus was coined the short-hand LL(1): Left-to-right reading of symbols, Left-to-right grammar use (the parser constructs a leftmost derivation), 1 symbol look-ahead. Notation: LL(1)

Suitable Grammar (continued)

A meaningful grammar G is suitable for LL(1) parsing, if it adheres to:

1. No lambda productions:
   Except for the start symbol, no other nonterminal is allowed to generate the empty string; the reason is that a parser can always succeed finding an empty string, so there is no real information in finding lambda
2. No left-recursive rules:
   In the presence of left-recursive rules, the resulting recursive descent parser would cause infinite regress; i.e. self-recursive calls until stack overflow. Details belong in the compiler course
3. No circular productions:
   There cannot be productions of the type
       a : a …            (left-recursive without intermediate productions!)
       a : b …  b : a …   (circular: left-recursive with intermediate productions!)
4. No context-sensitive rules:
   Two or more nonterminals a and b do not occur together on the left side of some production, defining unique strings different from the concatenation of a and b:
       a b : some unique sequence   (is not permitted)
5. Empty intersection of first sets:
   The set of possible tokens that can start a production is called the First Set. If some productions do share a token: factorise out!
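Rule 5 can be checked mechanically once the first sets of two alternatives are known. The sketch below assumes that every token is a single character, so that a first set can be written as a C string; the function name first_sets_disjoint is hypothetical.

```c
#include <string.h>

// Each first set is given as a string of single-character tokens
// (an assumption made for this sketch).
// Returns 1 if no token occurs in both sets, else 0.
int first_sets_disjoint( const char *first_a, const char *first_b )
{
    for ( const char *t = first_a; *t != '\0'; t++ ) {
        if ( strchr( first_b, *t ) != NULL ) {
            return 0;   // shared token: one look-ahead symbol cannot decide
        }
    }
    return 1;           // empty intersection
}
```

For the two alternatives of primary in G2, ( expression ) and number, the call first_sets_disjoint( "(", "0123456789" ) succeeds. Two alternatives that both start with a terminal B would fail the check; factorising B out, so that it is matched once before the choice is made, restores the property.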

Left-Recursion Elimination

Grammar G is suitable for LL(1) parsing, if it includes no left-recursive rules

If we start out with a left-recursive G, replace the left-recursive productions with equivalent ones that are not left-recursive

Sample grammar G with 2 terminals 'A' and 'B', one nonterminal 'a', producing all strings of a single B followed by any number of A:

    G    a  : a A
            | B

Replace G with G' which is not left-recursive, but as a result introduces lambda-productions:

    G'   a  : B a'
         a' : A a'
            | lambda
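Because G' is no longer left-recursive, it can be turned into recursive functions without infinite regress. The following recognizer is a minimal sketch, assuming the input is a C string of the characters A and B; the names gp_cursor, gp_a, gp_a_prime and accepts_g_prime are hypothetical.

```c
// Recursive descent recognizer for G':
//     a  : B a'
//     a' : A a' | lambda
// Assumption: the input is a C string; gp_cursor points at the
// next unread symbol.
static const char *gp_cursor;

static int gp_a_prime( void )
{
    if ( *gp_cursor == 'A' ) {    // look-ahead selects alternative A a'
        gp_cursor++;
        return gp_a_prime();
    }
    return 1;                     // lambda alternative: consume nothing
}

static int gp_a( void )
{
    if ( *gp_cursor != 'B' ) {
        return 0;                 // every string must start with B
    }
    gp_cursor++;
    return gp_a_prime();
}

// Returns 1 if text is a single B followed by any number of A, else 0.
int accepts_g_prime( const char *text )
{
    gp_cursor = text;
    return gp_a() && *gp_cursor == '\0';
}
```

A direct translation of the original rule a : a A would make the function for a call itself before consuming any input, which is exactly the infinite regress that the transformation avoids.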

Lambda-Rule Elimination

A grammar is suitable for LL(1) parsing, if it exhibits no lambda-productions

But if we start out with lambda-productions, we have to transform the grammar into an equivalent one that is free of such rules, except if lambda is a program

    G'   a  : B a'
         a' : A a'
            | lambda

One method of lambda-elimination is to expand the grammar with additional productions for each nonterminal that can generate lambda:

    G''  a  : B a'
            | B        -- additional rule
         a' : A a'
            | A        -- additional rule
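In G'' both alternatives of a start with B and both alternatives of a' start with A, so a recognizer has to match the shared terminal first and then look ahead to see whether a' follows. The sketch below does this by hand; it is one possible reading of G'' under the assumption that the input is a C string, and the names gpp_a_prime and accepts_g_double_prime are hypothetical.

```c
// Recognizer for G'' with the shared leading terminals matched first:
//     a  : B a' | B     is parsed as  B, then a' only if an A follows
//     a' : A a' | A     is parsed as  A, then a' only if an A follows
// Assumption: the input is a C string; *p is the next unread symbol.
static int gpp_a_prime( const char **p )
{
    if ( **p != 'A' ) {
        return 0;                  // a' can no longer produce lambda
    }
    (*p)++;
    if ( **p == 'A' ) {            // look-ahead: another a' follows
        return gpp_a_prime( p );
    }
    return 1;
}

// Returns 1 if text is a single B followed by any number of A, else 0.
int accepts_g_double_prime( const char *text )
{
    const char *p = text;
    if ( *p != 'B' ) {
        return 0;
    }
    p++;
    if ( *p == 'A' && !gpp_a_prime( &p ) ) {
        return 0;
    }
    return *p == '\0';             // all input must be consumed
}
```

The accepted strings are the same as for G and G': B, BA, BAA and so on.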

Suitable G5 and G6

For both of the grammars G5 and G6 below, analyse their suitability for LL(1) parsing. In both, the start symbol is s, and A and B are terminals:

• Is G5 context free?
• Is G5 lambda-free, aside from the start symbol producing lambda?
• Is G5 free from left-recursive rules?

    G5   s : A b | B a | lambda
         a : A | A s | B a a
         b : B | B s | A b b

Describe the languages. Compare L(G5) and L(G6): Are they similar?

• Is G6 context free?
• Is G6 lambda-free, aside from the start symbol producing lambda?
• Is G6 free from left-recursive rules?

    G6   s : A s B | B s A | s s
           | lambda

Use Grammar for Parsing

1. Once we have a suitable grammar G, use G to methodically (mechanically, automatically) design a parser for language L(G). The method is named Recursive Descent Parsing; a common, old method, outlined below
2. Once we have a suitable grammar G, encode G directly as a data structure. Then write a simple loop that reads the source and traverses the data structure driven by the incoming token stream, deciding at each point which production of G to use that would allow the current source symbol (AKA token)
3. If indeed a person can "mechanically implement a parser for all strings in L" given G, then a program can do so as well; Church's Thesis. These programs exist and are called parser generators. Their inventors sometimes call them "Compiler Compilers"; sounds fancier
4. A widely used industrial quality parser generator is YACC, so named after the tongue-in-cheek phrase: Yet Another Compiler Compiler. Available on Unix systems
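The second approach in the list above (grammar as data, plus a simple loop) can be sketched for a very small grammar. The example assumes the nested-parentheses grammar s : ( s ) s | lambda, written here with S for the nonterminal, and single-character tokens; the names rhs_for, table_parse and MAX_STACK are hypothetical. It is an illustration of the idea, not the method of any particular parser generator.

```c
#include <string.h>

#define MAX_STACK 256

// Grammar encoded as data: for nonterminal 'S' and look-ahead '(' the
// right-hand side is "(S)S"; for any other look-ahead it is "" (lambda).
// 'S' is the only nonterminal; '(' and ')' are the terminals.
static const char *rhs_for( char nonterminal, char look_ahead )
{
    (void) nonterminal;                        // only 'S' exists here
    return look_ahead == '(' ? "(S)S" : "";
}

// Returns 1 if text is in the language, else 0.
int table_parse( const char *text )
{
    char stack[ MAX_STACK ];
    int  top = 0;
    stack[ top++ ] = 'S';                      // push the start symbol

    while ( top > 0 ) {
        char symbol = stack[ --top ];          // pop next grammar symbol
        if ( symbol == 'S' ) {
            const char *rhs = rhs_for( symbol, *text );
            int len = (int) strlen( rhs );
            if ( top + len > MAX_STACK ) {
                return 0;                      // nesting too deep for sketch
            }
            for ( int i = len - 1; i >= 0; i-- ) {
                stack[ top++ ] = rhs[ i ];     // push r.h.s. in reverse
            }
        } else if ( symbol == *text ) {
            text++;                            // terminal matches the input
        } else {
            return 0;                          // terminal mismatch: error
        }
    }
    return *text == '\0';                      // all input consumed?
}
```

The loop never calls itself; the explicit stack plays the role that the call stack plays in recursive descent. Strings such as () and ()()() are accepted, while )( is rejected.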


Now for the MAIN idea:

Recursive Descent

Goal: Describe an algorithm to mechanically produce a parser for language L(G) using a suitable grammar G

Preparation: Write a scanner, AKA lexical analyzer, scan() that reads the source program one character at a time, and returns a token t for each string of characters constituting a whole token, AKA lexeme. Lambda is not one of the possible tokens. And then:

1. For each nonterminal n defined in G, define a recursive function (procedure) by that name n(); we'll re-write some nonterminals
2. For each nonterminal n used on the right-hand side in G, call n()
3. For each terminal t that is required by any alternative in G, call must_be( t ) to verify that t was found, and scan() the next token after t
4. When a production has multiple alternatives, use the mutually exclusive first sets of the alternatives and the next input token t (i.e. look ahead 1 token) to determine which nonterminal n() to call; if the first sets do not resolve this: error; we don't have a suitable grammar!

Recursive Descent Parser For s

Grammar G0:

    s : ( s ) s
      |

Sample strings in L(G0):

    ()  or  ((()))  or  ()()()  but not  )(

scan(): For such simple tokens (AKA lexemes) that consist of single characters '(' and ')', a scanner can be as simple as the C/C++ function getchar(). But generally, tokens are multi-character symbols

Function must_be( t ) simply checks for expected symbol t:

    // assume globals: int next_char (int, so that it can also hold EOF)
    // and void function scan()
    void must_be( char expected )
    { // must_be
        if ( next_char != expected ) {
            printf( " Expect '%c', is '%c'.\n", expected, next_char );
        } //end if
        scan();
    } //end must_be

Recursive Descent Parser For s (continued)

    // other declarations here . . .

    void scan( )
    { // scan
        next_char = getchar();          // read next input character
        if ( BLANK == next_char ) {     // skip ' '
            scan();
        }else{
            printf( "%c", next_char );  // echo the non-blank found
        } //end if
    } //end scan

    void s()                            // start for grammar G0
    { // s
        if ( next_char == OPEN ) {      // that is open parenthesis '('
            scan();                     // throw away the '('
            s();                        // recurse for nested (
            must_be( CLOSED );          // i.e. closed parenthesis ')'
            s();                        // recurse for sequence ( ) ( )
        } //end if                      // no more OPEN found; return
    } //end s

    int main()
    { // main
        scan();                         // get first ever token
        s();                            // language
        if ( next_char != EOF ) {       // next_char must be an int for this
            printf( "Garbage found\n" );
        } //end if
        return 0;
    } //end main


Repeat of Grammar G2

expression : term { plus_op term }

plus_op : + | -

term : factor { mult_op factor }

mult_op : * | /

factor : primary { ^ primary }

primary : ( expression )| number

number : 0 | 1 | 2 | 3 | 4

| 5 | 6 | 7 | 8 | 9

Parser For G2 expression(), 1

    // parser for grammar G2:
    //
    // expression : term { plus_op term }
    // plus_op    : '+' | '-'
    // term       : factor { mult_op factor }
    // mult_op    : '*' | '/'
    // factor     : primary { ^ primary }
    // primary    : '(' expression ')'
    //            | number
    // number     : '0' | '1' | '2' ... '9'

    #include <stdio.h>

    #define BLANK   ' '
    #define EOL     '\n'
    #define OPEN    '('
    #define CLOSED  ')'

    char next_char = BLANK;     // globally used for "token"

    #define ASSERT( c )                                                   \
        if ( next_char != c ) {                                           \
            printf( "Error, expected '%c', found '%c'\n", c, next_char ); \
        }else{                                                            \
            scan();                                                       \
        } //end if

    void scan( )
    { // scan
        next_char = getchar();
        if ( BLANK == next_char ) {
            scan();
        }else{
            printf( "%c", next_char );  // echo non-blank found
        } //end if
    } //end scan

    void expression();          // forward announcement!!

Parser For G2 expression(), 2

    // scans a single digit number; if not found: error
    void number()
    { // number
        if ( ( next_char >= '0' ) && ( next_char <= '9' ) ) {
            scan();
        }else{
            printf( "primary expression 0,1,2 .. or '(' expected.\n" );
        } //end if
    } //end number

    // parse primary expression, either:
    // ( ... ) or a number
    void primary()
    { // primary
        if ( next_char == OPEN ) {
            scan();
            expression();
            ASSERT( CLOSED );
        }else{
            number();
        } //end if
    } //end primary

Parser For G2 expression(), 3

    // parse highest priority operator ^
    void factor()
    { // factor
        primary();
        while ( next_char == '^' ) {
            scan();
            primary();
        } //end while
    } //end factor

    // parse multiply operators; no need to write mult_op nonterminal
    void term()
    { // term
        factor();
        while ( ( next_char == '*' ) || ( next_char == '/' ) ) {
            // note: abbreviation from "mult_op()"
            scan();
            factor();
        } //end while
    } //end term

    // parse adding operators + and -, no need to write plus_op nonterminal
    void expression()
    { // expression
        term();
        while ( ( next_char == '+' ) || ( next_char == '-' ) ) {
            // note: abbreviation from "plus_op()"
            scan();
            term();
        } //end while
    } //end expression

Parser For G2 expression(), 4

    // get first token
    // then parse complete expression
    // assert no more source after expression
    //
    int main()
    { // main
        scan();
        expression();
        ASSERT( EOL );
        return 0;
    } //end main


Sample Input for expression()

(( 8) )

( ( ( 5 + 3* 3 ) / ( 5^6 ) - 2 ) ^ ( 2 ^ 6 ^ 7 ) )
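To try the sample inputs, the G2 parsing functions can be packaged as a small recognizer. The following is only a sketch: scan(), next_char and primary() are used on the slides but not defined in this part of the chapter, so the versions below are assumptions, and the names src, errors and accepts() are hypothetical helpers added for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed scanner state; the slides use next_char and scan() without
   showing their definitions here. */
static const char *src;   /* remaining input text */
static char next_char;    /* current lookahead character, '\0' at end */
static int errors;        /* number of syntax errors seen so far */

/* Assumption: scan() skips blanks and fetches the next character. */
static void scan(void)
{
    while (*src == ' ') src++;
    next_char = *src;
    if (*src != '\0') src++;
}

static void expression(void);   /* forward declaration for primary() */

/* primary : ( expression ) | number   (hypothetical reconstruction) */
static void primary(void)
{
    if (next_char == '(') {
        scan();
        expression();
        if (next_char == ')') scan(); else errors++;
    } else if (isdigit((unsigned char)next_char)) {
        scan();
    } else {
        errors++;
    }
}

/* factor, term and expression follow the slides: one loop per level. */
static void factor(void)
{
    primary();
    while (next_char == '^') { scan(); primary(); }
}

static void term(void)
{
    factor();
    while (next_char == '*' || next_char == '/') { scan(); factor(); }
}

static void expression(void)
{
    term();
    while (next_char == '+' || next_char == '-') { scan(); term(); }
}

/* Returns 1 if text is one complete G2 expression, 0 otherwise. */
int accepts(const char *text)
{
    assert(text != NULL);
    src = text;
    errors = 0;
    scan();            /* get first token */
    expression();      /* parse complete expression */
    return errors == 0 && next_char == '\0';   /* no source left over */
}
```

Calling accepts() with either sample input above should return 1, while an input such as "5 + " or "( 8" should return 0, because a primary or a closing parenthesis is missing.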


A Parsing Variation

• We broke the general rule for Recursive Descent Parsing, namely defining a recursive function for each nonterminal symbol in G
• We coded the scanning of operators directly in-line, e.g. + and -, or * and /, using a while loop to parse one or more of the [repeated] operators instead!
• In such cases, the semantic actions can be associated with the operator just scanned in a left-to-right fashion
    • i.e. the semantic actions are done left-associatively
• An equally elegant way is to use an If-Statement and call the parsing function directly recursively
    • Easily allowing right-associative semantic actions
    • Recursion parses multiple operators of the same precedence
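The effect of the two coding styles on semantic actions can be seen on a chain of subtractions. The sketch below is not from the slides: it assumes blank-free input with single-digit operands, and the names p, number(), minus_loop(), minus_recursive(), eval_loop() and eval_recursive() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;   /* hypothetical cursor into a blank-free input string */

/* Read one single-digit number, as in rule "number" of the grammar. */
static int number(void)
{
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}

/* While-loop form: each operator is applied as soon as its right operand
   has been parsed, so 8-3-2 is computed as (8-3)-2. */
static int minus_loop(void)
{
    int value = number();
    while (*p == '-') {
        p++;
        value = value - number();
    }
    return value;
}

/* If-plus-recursion form: the rest of the chain is evaluated first,
   so 8-3-2 is computed as 8-(3-2). */
static int minus_recursive(void)
{
    int value = number();
    if (*p == '-') {
        p++;
        value = value - minus_recursive();
    }
    return value;
}

int eval_loop(const char *s)      { p = s; return minus_loop(); }
int eval_recursive(const char *s) { p = s; return minus_recursive(); }
```

Both functions accept the same strings, but for "8-3-2" the loop version yields 3 and the recursive version yields 7. The choice of coding style therefore decides the associativity of the semantic actions, not the set of accepted inputs.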


Change Grammar G2 to G3

expression : term [ plus_op expression ]

plus_op : + | -

term : factor [ mult_op term ]

mult_op : * | /

factor : primary [ ^ factor ]

primary : ( expression ) | number

number : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


Modified Parser For G3

// parse highest priority operator ^
void factor()
{ // factor
    primary();
    if ( '^' == next_char ) {
        scan();
        factor(); // <- parse repeated ^ operators
    } //end if
} //end factor

// parse multiply operators; no need to write mult_op nonterminal
void term()
{ // term
    factor();
    if ( ( next_char == '*' ) || ( next_char == '/' ) ) {
        // note: abbreviation from "mult_op()"
        scan();
        term(); // <- parse repeated * and / operators
    } //end if
} //end term

// parse adding operators + and -, skip plus_op nonterminal
void expression()
{ // expression
    term();
    if ( ( next_char == '+' ) || ( next_char == '-' ) ) {
        // note: abbreviation from "plus_op()"
        scan();
        expression(); // <- parse repeated + and - operators
    } //end if
} //end expression
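If values were attached to the G3 parser exactly as written, - and / would also group to the right, so 8 - 3 - 2 would evaluate as 8 - ( 3 - 2 ). One possible compromise, sketched here as an assumption and not taken from the slides, keeps the while loops for + - * / and uses the G3 recursion only for ^. The names src, power() and evaluate() are hypothetical, scan() and primary() are assumed reconstructions, and malformed input simply trips an assert.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *src;   /* assumed scanner state, not shown on the slides */
static char next_char;

static void scan(void)    /* assumption: skip blanks, fetch one character */
{
    while (*src == ' ') src++;
    next_char = *src;
    if (*src != '\0') src++;
}

static long expression(void);

static long primary(void)   /* ( expression ) | single digit */
{
    long value = 0;
    if (next_char == '(') {
        scan();
        value = expression();
        assert(next_char == ')');
        scan();
    } else {
        assert(isdigit((unsigned char)next_char));
        value = next_char - '0';
        scan();
    }
    return value;
}

static long power(long base, long exp)   /* small non-negative exponents only */
{
    long result = 1;
    while (exp-- > 0) result *= base;
    return result;
}

/* G3 style: if plus recursion, so ^ groups to the right. */
static long factor(void)
{
    long value = primary();
    if (next_char == '^') {
        scan();
        value = power(value, factor());
    }
    return value;
}

/* G2 style: while loop, so * and / group to the left. */
static long term(void)
{
    long value = factor();
    while (next_char == '*' || next_char == '/') {
        char op = next_char;
        scan();
        long rhs = factor();
        if (op == '*') value *= rhs;
        else { assert(rhs != 0); value /= rhs; }
    }
    return value;
}

/* G2 style: while loop, so + and - group to the left. */
static long expression(void)
{
    long value = term();
    while (next_char == '+' || next_char == '-') {
        char op = next_char;
        scan();
        long rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

long evaluate(const char *text)
{
    src = text;
    scan();
    long value = expression();
    assert(next_char == '\0');   /* no source left after the expression */
    return value;
}
```

With this mix, evaluate("8 - 3 - 2") gives 3 and evaluate("2 ^ 3 ^ 2") gives 512, which matches the usual arithmetic conventions. Large exponents overflow a long in this sketch.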


Data Structure and Grammar

• To be handled in compiler course
• Possibly a future extension at CS 410/510


Grammar G4 For Statement s()

s : statement [ s ]

statement : if_statement

| assign_statement

if_statement : IF_SYM expression THEN_SYM statement
               [ ELSE_SYM statement ] FI_SYM ';'

assign_statement : ident '=' expression ';'

-- separate ideas:

expression : as discussed earlier

*_SYM : tokens returned by scan(), e.g. IF_SYM
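A token-level scan() for G4 could look roughly like the sketch below. It is an illustration under assumptions: the token names follow the parser slides, but number_sym, eof_sym and bad_sym, the keyword spellings if, then, else, fi, and the helpers classify() and nth_token() are hypothetical and do not appear in the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token names follow the parser slides; number_sym, eof_sym and bad_sym
   are hypothetical additions. */
typedef enum {
    if_sym, then_sym, else_sym, fi_sym, ident,
    assign_sym, semi_sym, number_sym, eof_sym, bad_sym
} token_t;

static const char *src;   /* remaining source text */
static token_t token;     /* current lookahead token */

/* Assumed keyword spellings: if, then, else, fi. Anything else is ident. */
static token_t classify(const char *word, size_t len)
{
    static const struct { const char *text; token_t sym; } keywords[] = {
        { "if", if_sym }, { "then", then_sym },
        { "else", else_sym }, { "fi", fi_sym }
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i].text) == len &&
            strncmp(keywords[i].text, word, len) == 0) {
            return keywords[i].sym;
        }
    }
    return ident;
}

/* Skip white space, then set token to the next symbol found in src. */
static void scan(void)
{
    while (isspace((unsigned char)*src)) src++;
    if (*src == '\0') {
        token = eof_sym;
    } else if (isalpha((unsigned char)*src)) {
        const char *start = src;
        while (isalnum((unsigned char)*src)) src++;
        token = classify(start, (size_t)(src - start));
    } else if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        token = number_sym;
    } else if (*src == '=') {
        src++; token = assign_sym;
    } else if (*src == ';') {
        src++; token = semi_sym;
    } else {
        src++; token = bad_sym;
    }
}

/* Helper for experiments: scan text and return its n-th token (0-based). */
token_t nth_token(const char *text, int n)
{
    assert(text != NULL && n >= 0);
    src = text;
    do { scan(); } while (n-- > 0 && token != eof_sym);
    return token;
}
```

For the text "if 1 then y = 2 ; fi ;" this scanner produces if_sym, number_sym, then_sym, ident, assign_sym, number_sym, semi_sym, fi_sym, semi_sym and finally eof_sym. A word such as "iffy" is classified as ident because keywords are matched on the whole word.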


Parser For G4 Statements s(), Part 1

void s(); // forward announcement

void assign_statement()
{ // assign_statement
    must_be( ident );
    must_be( assign_sym );
    expression();
    must_be( semi_sym );
} //end assign_statement


Parser For G4 Statements s(), Part 2

void if_statement()
{ // if_statement
    must_be( if_sym );
    expression();
    must_be( then_sym );
    s();
    if ( else_sym == token ) {
        scan();
        s();
    } //end if
    must_be( fi_sym );
    must_be( semi_sym );
} //end if_statement


Parser For G4 Statements s(), Part 3

void statement()
{ // statement
    if ( if_sym == token ) {
        if_statement();
    }else{
        assign_statement();
    } //end if
} //end statement

void s()
{ // s
    statement();
    // use first-set: more statements?
    if ( ( if_sym == token ) || ( ident == token ) ) {
        s();
    } //end if
} //end s


Parser For G4 Statements s(), Part 4

int main()
{ // main
    // ...          // initializations
    scan();         // find first token
    s();            // list of statements
    ASSERT( EOF );  // no junk after program
    return 0;
} //end main
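Parts 1 to 4 can be assembled into a recognizer that compiles and runs on its own. The sketch below makes several assumptions that are not in the slides: scan() is a stand-in that treats each character as one token (i = if, t = then, e = else, f = fi, x = ident, 9 = number), must_be() is assumed to check the lookahead and then consume it, expression() is reduced to operand { + operand }, and src, errors, operand(), parse(), number_sym, plus_sym, eof_sym and bad_sym are hypothetical names. As on the slides, if_statement() calls s() in both branches, so a branch may hold a list of statements, which is slightly more general than the if_statement rule of G4 as printed.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    if_sym, then_sym, else_sym, fi_sym, ident, assign_sym,
    semi_sym, number_sym, plus_sym, eof_sym, bad_sym
} token_t;

static const char *src;   /* remaining input, one character per token */
static token_t token;     /* current lookahead token */
static int errors;        /* syntax errors found so far */

static void scan(void)    /* stand-in scanner, see assumptions above */
{
    while (*src == ' ') src++;
    switch (*src) {
    case 'i':  token = if_sym;     break;
    case 't':  token = then_sym;   break;
    case 'e':  token = else_sym;   break;
    case 'f':  token = fi_sym;     break;
    case 'x':  token = ident;      break;
    case '9':  token = number_sym; break;
    case '=':  token = assign_sym; break;
    case ';':  token = semi_sym;   break;
    case '+':  token = plus_sym;   break;
    case '\0': token = eof_sym;    return;   /* stay on the terminator */
    default:   token = bad_sym;    break;
    }
    src++;
}

/* Assumed behavior of must_be(): check the lookahead, then consume it. */
static void must_be(token_t expected)
{
    if (token == expected) scan(); else errors++;
}

static void operand(void)
{
    if (token == number_sym || token == ident) scan(); else errors++;
}

static void expression(void)   /* simplified stand-in for G2 */
{
    operand();
    while (token == plus_sym) { scan(); operand(); }
}

static void s(void);   /* forward declaration */

static void assign_statement(void)
{
    must_be(ident);
    must_be(assign_sym);
    expression();
    must_be(semi_sym);
}

static void if_statement(void)
{
    must_be(if_sym);
    expression();
    must_be(then_sym);
    s();
    if (token == else_sym) { scan(); s(); }
    must_be(fi_sym);
    must_be(semi_sym);
}

static void statement(void)
{
    if (token == if_sym) if_statement(); else assign_statement();
}

static void s(void)
{
    statement();
    if (token == if_sym || token == ident) s();   /* first-set test */
}

/* Returns 1 if text is a complete list of statements, 0 otherwise. */
int parse(const char *text)
{
    assert(text != NULL);
    src = text;
    errors = 0;
    scan();   /* find first token */
    s();      /* list of statements */
    return errors == 0 && token == eof_sym;   /* no junk after program */
}
```

For example, parse("i 9 t x = 9 ; e x = x + 9 ; f ;") should return 1, while parse("i 9 t x = 9 ; ;") should return 0 because fi is missing.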

