Session 14 (DM62) / 15 (DM63) Recursive Descent Parsing


Learning Objectives

Be able to explain the differences between regular and context-free languages (recursive rules).

Be able to understand context-free grammars described in, for example, BNF.

Be able to explain how context-free languages can be parsed using recursive descent (syntax trees).

Be able to build a recursive-descent parser from a simple context-free grammar (BNF).


The Translation Process

A compiler consists of a number of logical layers and components.


Parsing

Parsing (syntax analysis) is the task of determining whether a program is syntactically correct.

In doing so, the parser determines the syntactic structure of the program, usually in the form of a parse tree or syntax tree. This structure guides the rest of the translation process.

The syntax is defined by the grammar rules of a context-free grammar.

Grammar rules are defined in a manner similar to regular expressions. The major difference is that grammar rules are recursive. There is no * operation; repetition is expressed through recursion instead.

There are two general categories of parsing algorithm: top-down parsing and bottom-up parsing.


Context-free Grammars

A context-free grammar is a specification for the syntactic structure of a programming language.

As a running example, we will use simple integer arithmetic expressions:

exp -> exp op exp | ( exp ) | number
op -> + | - | *

where number is a token defined by a regular expression.

The vertical bar | means choice.

Concatenation is also used as a standard operation.

Note the recursive nature of the definition of exp.

Note also that the rules use tokens defined by regular expressions as symbols. That is, the rules are defined over an alphabet that consists of tokens.

We also need a symbol ε for the empty string of tokens.


Programming Language

Context-free grammar rules determine a programming language: The set of legal strings of tokens.

For example, (34-3)*42 corresponds to the legal string of seven tokens defined by exp:

( number - number ) * number

On the other hand, the string (34-3*42 corresponds to the illegal string of six tokens:

( number - number * number

Grammar rules are sometimes called productions because they "produce" strings in the language.
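The mapping from characters to tokens can be sketched in Java. This is a minimal illustration, not part of the original slides: the class name `TokenDemo` and the method `tokenize` are hypothetical, and the only assumption is that every run of digits becomes one `number` token while each of `( ) + - *` is a token by itself.

```java
import java.util.ArrayList;
import java.util.List;

class TokenDemo {
    // Hypothetical helper: converts characters into token names.
    // A run of digits becomes the single token "number".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("number");
            } else if ("()+-*".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(34-3)*42")); // seven tokens
        System.out.println(tokenize("(34-3*42"));  // six tokens
    }
}
```

Note that the tokenizer accepts both strings. Deciding that the second token string is illegal is the job of the parser, not of the scanner.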


Backus-Naur Form (BNF)

Grammar rules using this form are said to be in Backus-Naur form (BNF).

A BNF for Pascal will begin with grammar rules such as:

program -> program_heading ; program_block .
program_heading -> program . . .
program_block -> statements …
statements -> statements ; statement | statement
statement -> if_statement | assign_statement | . .
assign_statement -> identifier := exp ;

program is called the start symbol.

program, program_heading, program_block, statements, statement and assign_statement are called nonterminals.

The tokens program, identifier and := are examples of terminals.


Derivation

A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.

As an example we look at a derivation for the arithmetic expression (34 – 3) * 42:
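One possible leftmost derivation, written out as a sketch using the grammar exp -> exp op exp | ( exp ) | number and op -> + | - | * from the running example. In each step the leftmost nonterminal is replaced. Other derivation orders (for example a rightmost one) reach the same token string, and this is not necessarily the order the original slide showed.

```latex
\begin{align*}
\textit{exp} &\Rightarrow \textit{exp}\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{exp}\,)\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{exp}\ \textit{op}\ \textit{exp}\,)\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{number}\ \textit{op}\ \textit{exp}\,)\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{number} - \textit{exp}\,)\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{number} - \textit{number}\,)\ \textit{op}\ \textit{exp} \\
 &\Rightarrow (\,\textit{number} - \textit{number}\,) * \textit{exp} \\
 &\Rightarrow (\,\textit{number} - \textit{number}\,) * \textit{number}
\end{align*}
```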


Parse Tree

A parse tree corresponding to a derivation is a labeled tree in which:

the interior nodes are labeled by non-terminals,

the leaf nodes are labeled by terminals, and

the children of each interior node represent the replacement of the associated non-terminal in one step of the derivation.


Abstract Syntax Tree

A parse tree contains more information than is absolutely necessary for a compiler to produce object code.

Abstract syntax trees can be thought of as a tree representation of a shorthand notation called abstract syntax.
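To make the idea concrete, here is a minimal Java sketch of an abstract syntax tree for the arithmetic expressions of the running example. The type names `Exp`, `Num` and `BinOp` are hypothetical and chosen only for illustration. The tree keeps numbers and operators but drops the parentheses and the intermediate exp and op nodes that a parse tree would contain, because the tree shape already encodes the grouping.

```java
class AstDemo {
    // Hypothetical node type for the abstract syntax of expressions.
    interface Exp {
        int eval();
    }

    // Leaf node: an integer literal.
    static final class Num implements Exp {
        final int value;

        Num(int value) {
            this.value = value;
        }

        public int eval() {
            return value;
        }

        @Override
        public String toString() {
            return Integer.toString(value);
        }
    }

    // Interior node: an operator with two operand subtrees.
    static final class BinOp implements Exp {
        final char op;
        final Exp left;
        final Exp right;

        BinOp(char op, Exp left, Exp right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        public int eval() {
            int l = left.eval();
            int r = right.eval();
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default: throw new IllegalStateException("Unknown operator: " + op);
            }
        }

        @Override
        public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    public static void main(String[] args) {
        // Abstract syntax tree for (34-3)*42.
        Exp tree = new BinOp('*', new BinOp('-', new Num(34), new Num(3)), new Num(42));
        System.out.println(tree + " = " + tree.eval());
    }
}
```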


Ambiguous Grammars

Consider the simple integer arithmetic grammar

exp -> exp op exp | ( exp ) | number
op -> + | - | *

and consider the string 34-3*42. This string has two different parse trees.

Exercise: Draw two different parse trees for the expression 34-3*42.


A grammar that generates a string with two distinct parse trees is called an ambiguous grammar (a serious problem)

Which one is correct?
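The choice matters because the two trees describe different computations. As a worked check, assuming the usual meaning of - and *, the two groupings of 34-3*42 evaluate as follows:

```latex
\begin{align*}
(34 - 3) * 42 &= 31 * 42 = 1302 \\
34 - (3 * 42) &= 34 - 126 = -92
\end{align*}
```

Under the standard convention that multiplication binds tighter than subtraction, the second grouping is the intended one.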


Ambiguous Grammars

Two basic methods are used to deal with ambiguities:

Disambiguating rule: State a rule that specifies in each ambiguous case which of the parse trees is the correct one. This resolves the ambiguity without changing the grammar, but the syntax of the language is then no longer given by the BNF rules alone.

Changing the grammar: We can change the grammar into a different grammar that generates the same language but is unambiguous. This will often complicate the grammar.


Ambiguous Grammars

To remove the ambiguity in the integer arithmetic grammar, we could simply state a disambiguating rule that establishes the relative precedences of the three operations +, - and *, and that makes subtraction left associative.

To remove the ambiguity without a disambiguating rule (preferable), we must:

group the operators into groups of equal precedence, and

make subtraction (or all operators) left associative.

Exercise: Draw the syntax tree for 34-3*42 using this grammar. Is there more than one? Is operator precedence ok?


Extended Backus-Naur Form

Repetitive and optional constructs are common in programming languages, and thus in BNF grammar rules. Therefore the BNF notation is sometimes extended to include:

Repetition

BNF (left recursive): A -> A a | b
EBNF: A -> b { a }

Optional

BNF:
statement -> if-stmt | other
if-stmt -> if ( exp ) statement | if ( exp ) statement else statement
exp -> 0 | 1

EBNF:
statement -> if-stmt | other
if-stmt -> if ( exp ) statement [ else statement ]
exp -> 0 | 1
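The EBNF forms translate directly into control flow: a repetition { ... } becomes a loop, and an optional part [ ... ] becomes a single if. The following Java sketch illustrates the repetition case for the rule A -> b { a }, treating the terminals a and b as single characters. The class name `RepetitionDemo` and the method `matchesA` are hypothetical names used only here.

```java
class RepetitionDemo {
    // Recognizes A -> b { a }: one 'b' followed by zero or more 'a'.
    static boolean matchesA(String input) {
        int pos = 0;
        // The mandatory b.
        if (pos >= input.length() || input.charAt(pos) != 'b') {
            return false;
        }
        pos++;
        // The repetition { a } becomes a while loop.
        while (pos < input.length() && input.charAt(pos) == 'a') {
            pos++;
        }
        // Accept only if the whole input was consumed.
        return pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesA("b"));    // true
        System.out.println(matchesA("baaa")); // true
        System.out.println(matchesA("ab"));   // false
    }
}
```

The left-recursive BNF form A -> A a | b describes the same strings, but coding it naively as a method that first calls itself would recurse forever, which is one reason the EBNF form is convenient for handwritten parsers.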


Syntax Diagram

Graphical representations of BNF or EBNF rules are called syntax diagrams. They consist of:

oval boxes indicating terminals,

rectangles indicating non-terminals, and

arrowed lines representing sequencing and choices.

As an example, consider the grammar rule

factor -> ( exp ) | number


Exercises

Draw the syntax diagram for:

if-statement -> if ( exp ) statement | if ( exp ) statement else statement
exp -> true | false

Write down the derivation and syntax tree for the following expression:

3-(4+5*6)


Context-Free Grammar for TINY

Exercise: Draw syntax diagrams that define this part of the TINY grammar:


Top-Down Parsing

A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation.

Top-down parsers come in two forms:

Predictive parsers: attempt to predict the next construction in the input string using one or more lookahead tokens.

Backtracking parsers: try different possibilities for a parse of the input (slow).

There are two kinds of top-down parsers:

Recursive-descent parsing (suitable for handwritten parsers)

LL(1) parsing (rarely used in practice today)


Recursive-Descent

The idea of recursive-descent parsing is simple:

We view the grammar rule for a non-terminal A as a definition for a method that will recognize an A.

The right-hand side of the grammar rule specifies the code structure:

A choice corresponds to alternatives (if-statements or a case-statement).

Non-terminals correspond to calls to other methods.

Recursive-descent parsing is important in connection with XML. XML parsers of the DOM type use recursive descent.
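As a sketch of this idea, the Java class below has one method per non-terminal for an unambiguous version of the arithmetic grammar. The grammar is an assumption made for this example (the common layered form in EBNF), not necessarily the one shown on the original slides:

exp -> term { ( + | - ) term }
term -> factor { * factor }
factor -> ( exp ) | number

The class name `ExpParser` and all method names are hypothetical. For brevity the parser works directly on characters instead of a token stream, and it computes the value of the expression instead of building a tree.

```java
class ExpParser {
    private final String input;
    private int pos = 0;

    ExpParser(String input) {
        this.input = input.replaceAll("\\s+", ""); // ignore whitespace
    }

    // Parses and evaluates a complete expression.
    static int evaluate(String text) {
        ExpParser parser = new ExpParser(text);
        int value = parser.exp();
        if (parser.pos != parser.input.length()) {
            throw parser.error("unexpected input");
        }
        return value;
    }

    // One character of lookahead; '\0' marks the end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // exp -> term { ( + | - ) term }
    private int exp() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            // Accumulating in the loop makes + and - left associative.
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term -> factor { * factor }
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor -> ( exp ) | number
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = exp(); // the recursion in "recursive descent"
            if (peek() != ')') {
                throw error("expected ')'");
            }
            pos++;
            return value;
        }
        if (!Character.isDigit(peek())) {
            throw error("expected number or '('");
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("34-3*42"));   // -92
        System.out.println(evaluate("(34-3)*42")); // 1302
    }
}
```

Because * is handled one level deeper than + and -, it binds tighter, so 34-3*42 is grouped as 34-(3*42). The illegal string (34-3*42 is rejected with an exception when the closing parenthesis is missing. The parser is predictive: a single character of lookahead (`peek`) is enough to choose between alternatives.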


Recursive-Descent – small example

Identifiers described in BNF (usually one would use regular expressions):

<identifier> ::= <letter>|<letter><charSeq>
<charSeq> ::= <char>|<char><charSeq>
<char> ::= <letter>|<digit>
<letter> ::= a|b|…|z
<digit> ::= 0|1|…|9

C#-code


Recursive-Descent – small example – now in Java!!

Java-code
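The slide's own code is not reproduced in this text, so the following is only a sketch of how a recursive-descent recognizer for the identifier grammar might look in Java. The class name `IdentifierParser` and the method names are hypothetical. Each non-terminal of the BNF becomes one method, and the method for `<char>` is called `character` because `char` is a reserved word in Java.

```java
class IdentifierParser {
    private final String input;
    private int pos = 0;

    IdentifierParser(String input) {
        this.input = input;
    }

    // Returns true if the whole text is an <identifier>.
    static boolean isIdentifier(String text) {
        IdentifierParser parser = new IdentifierParser(text);
        return parser.identifier() && parser.pos == text.length();
    }

    // <identifier> ::= <letter> | <letter><charSeq>
    private boolean identifier() {
        if (!letter()) {
            return false;
        }
        // More input left: take the second alternative.
        return pos == input.length() || charSeq();
    }

    // <charSeq> ::= <char> | <char><charSeq>
    private boolean charSeq() {
        if (!character()) {
            return false;
        }
        return pos == input.length() || charSeq();
    }

    // <char> ::= <letter> | <digit>
    private boolean character() {
        return letter() || digit();
    }

    // <letter> ::= a|b|…|z
    private boolean letter() {
        if (pos < input.length() && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') {
            pos++;
            return true;
        }
        return false;
    }

    // <digit> ::= 0|1|…|9
    private boolean digit() {
        if (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("x1")); // true
        System.out.println(isIdentifier("1x")); // false
    }
}
```

The choice between the two alternatives of `<identifier>` and `<charSeq>` is made by checking whether any input remains, which is a simple form of lookahead.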


Exercises

Write a recursive-descent parser for the grammar that defines integers:

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<sign> ::= +|-
<unsigned integer> ::= <digit>|<digit><unsigned integer>
<integer> ::= <unsigned integer>|<sign><unsigned integer>

Look at the Java code for the small English grammar. Rewrite the code in C#.


Exercises - Extra

Modify the grammar for integers so decimals are accepted:

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<sign> ::= +|-
<unsigned integer> ::= <digit>|<digit><unsigned integer>
<integer> ::= <unsigned integer>|<sign><unsigned integer>

Write a recursive-descent parser for the grammar that defines decimals:

<decimal> ::= <integer><decimal-sign><unsigned integer>
<integer> ::= <unsigned integer>|<sign><unsigned integer>
<unsigned integer> ::= <digit>|<digit><unsigned integer>
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<sign> ::= +|-
<decimal-sign> ::= .