Compiler Samuel2005@126.com Chapter 4 Top-Down Parsing Recursive-Descent Gang S. Liu College of Computer Science & Technology Harbin Engineering University


Chapter 4

Top-Down Parsing

Recursive-Descent

Gang S. Liu

College of Computer Science & Technology

Harbin Engineering University


Top-down Parsing

• A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation.

• The traversal of the parse tree occurs from the root to the leaves.

• Two forms of top-down parsing:

1. Predictive parsers.

• Attempt to predict the next construction in the input string using one or more lookahead tokens.

2. Backtracking parsers.

• Try different possibilities for a parse of the input, backing up an arbitrary amount in the input. May require exponential time.


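Examples

Leftmost derivation (corresponds to a preorder numbering of the parse tree):

(1) exp => exp op exp

(2) => number op exp

(3) => number + exp

(4) => number + number

Parse tree: the root exp is numbered 1; its children exp, op, exp are numbered 2, 3, 4 from left to right; the leaves are number + number.

Rightmost derivation (corresponds to the reverse of a postorder numbering of the parse tree):

(1) exp => exp op exp

(2) => exp op number

(3) => exp + number

(4) => number + number

Parse tree: the root exp is numbered 1; its children exp, op, exp are numbered 4, 3, 2 from left to right; the leaves are number + number.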
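The correspondence between derivation order and tree numbering can be checked with a small sketch. The `Node` class and the helper names below are illustrative assumptions, not part of the slides; only interior (nonterminal) nodes are numbered, since those are the nodes a derivation step expands.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    label: str
    children: list = field(default_factory=list)


def preorder(node):
    """Interior nodes in preorder: the node first, then its children."""
    if not node.children:
        return []  # leaves (tokens) are not expanded by any derivation step
    out = [node]
    for child in node.children:
        out += preorder(child)
    return out


def postorder(node):
    """Interior nodes in postorder: the children first, then the node."""
    if not node.children:
        return []
    out = []
    for child in node.children:
        out += postorder(child)
    return out + [node]


# Parse tree for "number + number" under exp -> exp op exp.
left = Node("exp", [Node("number")])
op = Node("op", [Node("+")])
right = Node("exp", [Node("number")])
root = Node("exp", [left, op, right])

# Order in which a leftmost derivation expands the nonterminals.
leftmost_order = preorder(root)
# Order in which a rightmost derivation expands the nonterminals.
rightmost_order = list(reversed(postorder(root)))
```

Here `leftmost_order` is root, left exp, op, right exp, while `rightmost_order` is root, right exp, op, left exp, matching the numberings 1 2 3 4 and 1 4 3 2 on the two trees.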


Two Kinds of Top-Down Parsing

1. Recursive-descent parsing

• Versatile

• Suitable for handwritten parser

2. LL(1) parsing

• No longer often used

• Simple scheme with explicit stack

• Prelude for more powerful and complex bottom-up algorithms

• First “L” – the input is processed from left to right

• Second “L” – leftmost derivation

• 1 – one lookahead symbol


Recursive-Descent

• The grammar rule for a nonterminal A is viewed as a definition for a procedure that will recognize an A.

exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number

factor()
    switch token
        case ( :
            match( ( );
            exp();
            match( ) );
            break;
        case number:
            match(number);
            break;
        default: error;

match matches the current token with its parameter and advances the input if it succeeds.

match(expToken)
    if token = expToken then
        getToken();
    else
        error;
    end if;
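The match and factor procedures can be sketched in Python as a small recognizer. This is a minimal sketch under stated assumptions: the `Parser` class, the `ParseError` exception, the `"$"` end marker and the pre-split token list are all hypothetical, and `"number"` stands for any all-digit token. The `exp` and `term` methods are written with loops rather than as direct transcriptions of the left-recursive rules, because a procedure that began by calling itself would never terminate.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumed)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def get_token(self):
        self.pos += 1

    def match(self, expected):
        # Classify the current token; any all-digit token is a "number".
        kind = "number" if self.token.isdigit() else self.token
        if kind == expected:
            self.get_token()
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def factor(self):
        # factor -> ( exp ) | number
        if self.token == "(":
            self.match("(")
            self.exp()
            self.match(")")
        elif self.token.isdigit():
            self.match("number")
        else:
            raise ParseError(f"unexpected token {self.token!r}")

    def term(self):
        # Loop form of term -> term mulop factor | factor
        self.factor()
        while self.token == "*":
            self.match("*")
            self.factor()

    def exp(self):
        # Loop form of exp -> exp addop term | term
        self.term()
        while self.token in ("+", "-"):
            self.match(self.token)
            self.term()


def recognizes(tokens):
    """Return True if the whole token list is a valid exp."""
    parser = Parser(tokens)
    try:
        parser.exp()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, `recognizes(["(", "1", "+", "2", ")", "*", "3"])` succeeds, while `recognizes(["(", "1", "+", ")"])` fails inside `factor` because `)` cannot start a factor.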


Choice

statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1

ifStmt()
    match(if);
    match( ( );
    exp();
    match( ) );
    statement();
    if token = else then
        match(else);
        statement();
    end if;

EBNF is designed to closely mirror the actual code of a recursive-descent parser.
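The optional part in square brackets becomes a plain `if` test on the current token. The following is a minimal sketch of that idea, assuming a pre-split token list and hypothetical function names; it also builds a small tuple tree so the result of the choice is visible.

```python
def expect(tokens, pos, expected):
    """Check the token at pos and return the next position."""
    if pos < len(tokens) and tokens[pos] == expected:
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise SyntaxError(f"expected {expected!r}, found {found!r}")


def statement(tokens, pos=0):
    # statement -> if-stmt | other
    if pos < len(tokens) and tokens[pos] == "if":
        return if_stmt(tokens, pos)
    return "other", expect(tokens, pos, "other")


def exp(tokens, pos):
    # exp -> 0 | 1
    if pos < len(tokens) and tokens[pos] in ("0", "1"):
        return tokens[pos], pos + 1
    raise SyntaxError("expected 0 or 1")


def if_stmt(tokens, pos):
    # if-stmt -> if ( exp ) statement [ else statement ]
    pos = expect(tokens, pos, "if")
    pos = expect(tokens, pos, "(")
    test, pos = exp(tokens, pos)
    pos = expect(tokens, pos, ")")
    then_part, pos = statement(tokens, pos)
    else_part = None
    # The bracketed part of the rule: taken only if the next token is "else".
    if pos < len(tokens) and tokens[pos] == "else":
        pos = expect(tokens, pos, "else")
        else_part, pos = statement(tokens, pos)
    return ("if", test, then_part, else_part), pos
```

Parsing `"if ( 0 ) if ( 1 ) other else other".split()` with `statement` attaches the `else` to the inner `if`, because the inner call sees the `else` token first.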


Repetition

• Left recursive grammar:

• A ::= A α | β – Equivalent to β α*

exp → exp addop term | term

exp → term { addop term }

exp()
    term();
    while token = + or token = - do
        match(token);
        term();
    end while;
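The loop form also preserves left associativity if the running result is updated inside the loop. The sketch below assumes, for brevity, that a term is just a number token; the `evaluate` function and its token-list input are illustrative, not from the slides.

```python
def evaluate(tokens):
    """Evaluate exp -> term { addop term }, with term simplified to a number."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected number at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    value = term()
    # Each pass through the loop consumes one "addop term" repetition.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        right = term()
        # Folding into the running value makes the operators left associative.
        value = value + right if op == "+" else value - right
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value
```

With this sketch, `evaluate(["8", "-", "3", "-", "2"])` computes (8 - 3) - 2 = 3 rather than 8 - (3 - 2) = 7.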


Problems with Recursive-Descent

1. It may be difficult to convert a grammar into EBNF.

2. It may be difficult to distinguish two or more grammar rule options A → α | β, if both α and β begin with nonterminals. (First set)

3. For a rule A → ε, it may be necessary to know which tokens can come after the nonterminal A. (Follow set)
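First and Follow sets can be computed by iterating until nothing changes. The sketch below is one common fixed-point formulation, not necessarily the one used in the course. It assumes a hypothetical BNF version of the if-statement grammar in which the optional else-part is a separate nonterminal `else-part` that can derive ε; the names `GRAMMAR`, `EPS` and `END` are illustrative.

```python
EPS = "ε"  # marker for the empty string
END = "$"  # end-of-input marker

# Assumed BNF rewrite of the if-statement grammar (else-part is hypothetical).
GRAMMAR = {
    "statement": [["if-stmt"], ["other"]],
    "if-stmt": [["if", "(", "exp", ")", "statement", "else-part"]],
    "else-part": [["else", "statement"], [EPS]],
    "exp": [["0"], ["1"]],
}


def first_of_sequence(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, options in GRAMMAR.items():
            for option in options:
                new = first_of_sequence(option, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="statement"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, options in GRAMMAR.items():
            for option in options:
                for i, sym in enumerate(option):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have Follow sets
                    rest = first_of_sequence(option[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # what follows nt can also follow sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

For this grammar, `FIRST["statement"]` is `{"if", "other"}`, which is what lets `statement` choose between its two options. `FOLLOW["else-part"]` contains `else` as well as `$`, so the Follow set alone does not decide whether to take the ε option; this is the dangling-else situation.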


Reporting Errors

• At a minimum, a parser must indicate that an error exists if the program contains a syntax error.

• Usually, a parser will attempt to give a meaningful error message and determine the location where that error has occurred.

• Some parsers may attempt some form of error correction.


General Principles

1. A parser should determine that an error has occurred as soon as possible.

2. The parser must pick a place to resume the parse. A parser must try to parse as much of the code as possible.

3. A parser should try to avoid the error cascade problem.

4. A parser must avoid infinite loops on errors.


Panic Mode

• A standard form of error recovery in recursive-descent parsers is called panic mode.

• The basic mechanism is a set of synchronizing tokens.

– Tokens may be added to the set as parsing proceeds.

– If an error is encountered, the parser scans ahead until it sees one of the synchronizing tokens. Then parsing is resumed.

– Error cascades are avoided.

• What tokens to add to the set?

– Symbols like semicolons, commas, parentheses
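Panic-mode recovery can be sketched for a tiny statement language. Everything here is an assumption made for illustration: the grammar `statement → number { + number } ;`, the fixed synchronizing set containing only the semicolon, and the names `parse_program` and `expect`. A fuller parser would pass a growing synchronizing set down through its procedures.

```python
END = "$"  # assumed end-of-input marker


def parse_program(tokens, sync=frozenset({";"})):
    """Parse `number { + number } ;` statements with panic-mode recovery."""
    pos = 0
    errors = []  # (token position, message) pairs
    ok = 0       # statements parsed without error

    def peek():
        return tokens[pos] if pos < len(tokens) else END

    def expect(kind):
        nonlocal pos
        tok = peek()
        found = "number" if tok.isdigit() else tok
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {tok}")
        pos += 1

    def statement():
        expect("number")
        while peek() == "+":
            expect("+")
            expect("number")
        expect(";")

    while peek() != END:
        try:
            statement()
            ok += 1
        except SyntaxError as err:
            errors.append((pos, str(err)))
            # Panic mode: scan ahead to a synchronizing token.
            while peek() != END and peek() not in sync:
                pos += 1
            if peek() != END:
                pos += 1  # consume the synchronizing token, so we always advance
    return ok, errors
```

On the input `1 + 2 ; 3 + + 4 ; 5 ;` this sketch reports a single error at the second `+`, skips to the next semicolon, and then parses `5 ;` normally, so one mistake does not trigger a cascade of follow-on errors. Consuming the synchronizing token guarantees progress and avoids an infinite loop on errors.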


Homework

• 4.2 Given the grammar A → ( A ) A | ε, write pseudocode to parse this grammar by recursive-descent.

• 4.3 Given the grammar

statement → assign-stmt | call-stmt | other
assign-stmt → identifier := exp
call-stmt → identifier ( exp-list )

write pseudocode to parse this grammar by recursive-descent.