2/8/2008 Prof. Hilfinger CS164 Lecture 8 1 Introduction to Parsing Lecture 8 Adapted from slides by G. Necula and R. Bodik

Source: inst.cs.berkeley.edu/~cs164/sp08/lectures/lecture8.pdf


Introduction to Parsing

Lecture 8
Adapted from slides by G. Necula and R. Bodik


Outline

• Limitations of regular languages
• Parser overview
• Context-free grammars (CFGs)
• Derivations
• Syntax-Directed Translation


Languages and Automata

• Formal languages are very important in CS
  – Especially in programming languages

• Regular languages
  – The weakest formal languages widely used
  – Many applications

• We will also study context-free languages


Limitations of Regular Languages

• Intuition: A finite automaton that runs long enough must repeat states

• A finite automaton can’t remember the number of times it has visited a particular state

• A finite automaton has finite memory
  – Only enough to store which state it is in
  – Cannot count, except up to a finite limit

• E.g., the language of balanced parentheses is not regular: { (^i )^i | i ≥ 0 }
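
A small sketch of why counting matters here: recognizing i opening parentheses followed by i closing ones takes a counter that can grow without bound, which a fixed, finite set of states cannot provide. The function name `matches_nested` is illustrative and not from the lecture.

```python
def matches_nested(s: str) -> bool:
    """Return True iff s is i '(' characters followed by i ')' characters."""
    depth = 0            # unbounded counter: the memory a finite automaton lacks
    seen_close = False   # once a ')' has appeared, no further '(' is allowed
    for ch in s:
        if ch == "(" and not seen_close:
            depth += 1
        elif ch == ")" and depth > 0:
            seen_close = True
            depth -= 1
        else:
            return False
    return depth == 0
```

For example, `matches_nested("(())")` is true while `matches_nested("(()")` is false; any fixed bound on `depth` would reject some valid, deeper input.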


The Structure of a Compiler

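[Figure: compiler pipeline. Source → Lexical analysis → Tokens → Parsing → Interm. Language → Code Gen. → Machine Code, with Optimization applied to the intermediate language. "Today we start" marks the Parsing stage.]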


The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: abstract syntax tree of the program


Example

• Pyth:
    if x == y: z = 1
    else: z = 2

• Parser input: IF ID == ID : ID = INT ↵ ELSE : ID = INT ↵

• Parser output (abstract syntax tree):

IF-THEN-ELSE
  ==
    ID
    ID
  =
    ID
    INT
  =
    ID
    INT


Why A Tree?

• Each stage of the compiler has two purposes:
  – Detect and filter out some class of errors
  – Compute some new information or translate the representation of the program to make things easier for later stages

• Recursive structure of tree suits recursive structure of language definition

• With a tree, later stages can easily find, e.g., “the else clause” rather than having to scan through tokens to find it.
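
To make the contrast concrete, here is the if/else example from the previous slide written out as Python data. The tuple layout and the helper `else_clause` are assumptions made for illustration; the course's own AST classes may look different.

```python
# Token stream handed to the parser, as (kind, text); NEWLINE stands for the ↵ token.
tokens = [
    ("IF", "if"), ("ID", "x"), ("==", "=="), ("ID", "y"), (":", ":"),
    ("ID", "z"), ("=", "="), ("INT", "1"), ("NEWLINE", "\n"),
    ("ELSE", "else"), (":", ":"),
    ("ID", "z"), ("=", "="), ("INT", "2"), ("NEWLINE", "\n"),
]

# The same program as a tree: (operator, child, child, ...).
ast = ("IF-THEN-ELSE",
       ("==", ("ID", "x"), ("ID", "y")),
       ("=", ("ID", "z"), ("INT", 1)),
       ("=", ("ID", "z"), ("INT", 2)))

def else_clause(node):
    """Pick out the else clause by position, with no scanning through tokens."""
    assert node[0] == "IF-THEN-ELSE"
    return node[3]
```

With the flat token list, finding the same clause would mean searching for the ELSE token and then working out where the clause ends.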


Comparison with Lexical Analysis

Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Syntax tree


The Role of the Parser

• Not all sequences of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid sequences of tokens

• We need
  – A language for describing valid sequences of tokens
  – A method for distinguishing valid from invalid sequences of tokens


Programming Language Structure

• Programming languages have recursive structure
• Consider the language of arithmetic expressions with integers, +, *, and ( )
• An expression is either:
  – an integer
  – an expression followed by “+” followed by an expression
  – an expression followed by “*” followed by an expression
  – a “(” followed by an expression followed by “)”

• int , int + int , ( int + int ) * int are expressions


Notation for Programming Languages

• An alternative notation:
    E → int
    E → E + E
    E → E * E
    E → ( E )

• We can view these rules as rewrite rules
  – We start with E and replace occurrences of E with some right-hand side
• E → E * E → ( E ) * E → ( E + E ) * E → … → (int + int) * int


Observation

• All arithmetic expressions can be obtained by a sequence of replacements

• Any sequence of replacements that leaves no E behind forms a valid arithmetic expression

• This means that we cannot obtain ( int ) ) by any sequence of replacements. Why?

• This set of rules is a context-free grammar


Context-Free Grammars

• A CFG consists of
  – A set of non-terminals N
    • By convention, written with a capital letter in these notes
  – A set of terminals T
    • By convention, either lower-case names or punctuation
  – A start symbol S (a non-terminal)
  – A set of productions
    • Assuming E ∈ N, each has the form
        E → ε , or
        E → Y1 Y2 ... Yn where Yi ∈ N ∪ T
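
One possible way to hold the four components in code is sketched below. The class `Grammar` and its `check` method are hypothetical names, and requiring N and T to be disjoint is the usual convention rather than something stated on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S, a member of N
    productions: tuple        # pairs (lhs, rhs); rhs is a tuple of symbols, () for ε

    def check(self) -> bool:
        """True iff the four components fit the definition of a CFG."""
        symbols = self.nonterminals | self.terminals
        return (self.start in self.nonterminals
                and not (self.nonterminals & self.terminals)
                and all(lhs in self.nonterminals and all(y in symbols for y in rhs)
                        for lhs, rhs in self.productions))

# The arithmetic-expression grammar used throughout these slides.
EXPR = Grammar(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"int", "+", "*", "(", ")"}),
    start="E",
    productions=(("E", ("int",)), ("E", ("E", "+", "E")),
                 ("E", ("E", "*", "E")), ("E", ("(", "E", ")"))),
)
```

`EXPR.check()` holds; a grammar whose start symbol is not in N, or whose right-hand sides mention an undeclared symbol, fails the check.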


Examples of CFGs

Simple arithmetic expressions:
    E → int
    E → E + E
    E → E * E
    E → ( E )
  – One non-terminal: E
  – Several terminals: int, +, *, (, )
    • Called terminals because they are never replaced
  – By convention the non-terminal for the first production is the start one


The Language of a CFG

Read productions as replacement rules:

X → Y1 ... Yn
    Means X can be replaced by Y1 ... Yn

X → ε
    Means X can be erased (replaced with empty string)


Key Idea

1. Begin with a string consisting of the start symbol “S”

2. Replace any non-terminal X in the string by a right-hand side of some production X → Y1 … Yn

3. Repeat (2) until there are only terminals in the string

4. The successive strings created in this way are called sentential forms.
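
The four steps can be sketched as a short generator. Step 2 allows any non-terminal to be replaced; this sketch always picks the leftmost one, which is one choice among several. The names `derive`, `RULES`, and `choices` are illustrative assumptions.

```python
def derive(start, productions, choices):
    """Yield the sentential forms obtained by rewriting the leftmost non-terminal.

    productions maps each non-terminal to a list of right-hand sides (tuples);
    choices[k] says which right-hand side to use at step k.
    """
    form = (start,)                      # step 1: the start symbol alone
    yield form
    for pick in choices:
        # step 2: find a non-terminal in the string (here: the leftmost one)
        i = next((k for k, sym in enumerate(form) if sym in productions), None)
        if i is None:                    # step 3: only terminals remain, so stop
            return
        form = form[:i] + productions[form[i]][pick] + form[i + 1:]
        yield form                       # step 4: each string is a sentential form

RULES = {"E": [("int",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")]}

# E → E * E → ( E ) * E → ( E + E ) * E → … → ( int + int ) * int
for form in derive("E", RULES, [2, 3, 1, 0, 0, 0]):
    print(" ".join(form))
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a sentence of the language.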


The Language of a CFG (Cont.)

More formally, may write

X1 … Xi-1 Xi Xi+1 … Xn → X1 … Xi-1 Y1 … Ym Xi+1 … Xn

if there is a production

Xi → Y1 … Ym


The Language of a CFG (Cont.)

Write X1 … Xn →* Y1 … Ym

if X1 … Xn → … → … → Y1 … Ym

in 0 or more steps


The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

L(G) = { a1 … an | S →* a1 … an and every ai is a terminal }


Examples:

• S → 0
  S → 1
  also written as S → 0 | 1

  Generates the language { “0”, “1” }
• What about S → 1 A
             A → 0 | 1
• What about S → 1 A
             A → 0 | 1 A
• What about S → ε | ( S )
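
One way to explore the "what about" questions is to enumerate the short strings each grammar generates. The sketch below does a breadth-first search over sentential forms. The function name `language` and the length bound are assumptions, and the pruning is only adequate for small grammars like these, where every form holds at most one non-terminal.

```python
from collections import deque

def language(start, productions, max_len):
    """Return the terminal strings of length <= max_len derivable from start.

    Breadth-first search over sentential forms, always rewriting the leftmost
    non-terminal. Forms with more than max_len terminals, or more than
    max_len + 1 symbols in total, are dropped so that the search terminates.
    """
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, s in enumerate(form) if s in productions), None)
        if i is None:                      # only terminals left: a sentence
            found.add("".join(form))
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            n_terminals = sum(1 for s in new if s not in productions)
            if n_terminals <= max_len and len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(language("S", {"S": [("0",), ("1",)]}, 4)))
print(sorted(language("S", {"S": [("1", "A")], "A": [("0",), ("1",)]}, 4)))
print(sorted(language("S", {"S": [("1", "A")], "A": [("0",), ("1", "A")]}, 4)))
print(sorted(language("S", {"S": [(), ("(", "S", ")")]}, 4)))
```

The third and fourth grammars generate infinitely many strings, so the bound only shows the beginning of each language.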


Pyth Example

A fragment of Pyth:

Compound → while Expr: Block
         | if Expr: Block Elses
Elses → ε | else: Block | elif Expr: Block Elses
Block → Stmt_List | Suite

(Formal language papers use one-character non-terminals, but we don’t have to!)


Derivations and Parse Trees

• A derivation is a sequence of sentential forms resulting from the application of a sequence of productions

    S → … → …

• A derivation can be represented as a parse tree
  – Start symbol is the tree’s root
  – For a production X → Y1 … Yn add children Y1, …, Yn to node X


Derivation Example

• Grammar E → E + E | E * E | (E) | int
• String int * int + int


Derivation Example (Cont.)

E
→ E + E
→ E * E + E
→ int * E + E
→ int * int + E
→ int * int + int


Derivation in Detail

[Figure sequence, slides (1) to (6): the parse tree grows by one production per derivation step.]

(1) E                     tree: the root E alone
(2) → E + E               root E gets children E + E
(3) → E * E + E           left child E gets children E * E
(4) → int * E + E         first E under * gets child int
(5) → int * int + E       second E under * gets child int
(6) → int * int + int     right child E of the root gets child int

Final parse tree:

E
  E
    E
      int
    *
    E
      int
  +
  E
    int


Notes on Derivations

• A parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes

• A left-right traversal of the leaves is the original input

• The parse tree shows the association of operations; the input string does not!
  – There may be multiple ways to match the input
  – Derivations (and parse trees) choose one
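
These notes can be checked on a small example. In the sketch below a parse tree node is a pair (symbol, list of children); that layout and the names `leaf`, `leaves`, `tree`, and `other` are assumptions made for illustration. The two trees group int * int + int differently yet have the same leaves.

```python
def leaf(terminal):
    """A terminal has no children."""
    return (terminal, [])

# Parse tree from the derivation E → E + E → E * E + E → …: the root is +.
tree = ("E", [
    ("E", [("E", [leaf("int")]), leaf("*"), ("E", [leaf("int")])]),
    leaf("+"),
    ("E", [leaf("int")]),
])

# Another parse tree for the same input: the root is *.
other = ("E", [
    ("E", [leaf("int")]),
    leaf("*"),
    ("E", [("E", [leaf("int")]), leaf("+"), ("E", [leaf("int")])]),
])

def leaves(node):
    """Left-to-right traversal of the leaves."""
    symbol, children = node
    if not children:
        return [symbol]
    return [s for child in children for s in leaves(child)]

print(" ".join(leaves(tree)))
print(" ".join(leaves(other)))
```

Both calls print the original input, so the input string alone cannot tell the two associations apart; only the tree records which one was chosen.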


The Payoff: parser as a translator

[Figure: syntax-directed translation. A stream of tokens enters the parser, which is driven by syntax + translation rules (typically hardcoded in the parser) and produces ASTs or assembly code.]


Mechanism of syntax-directed translation

• Syntax-directed translation is done by extending the CFG
  – a translation rule is defined for each production

    given
        X → d A B c
    the translation of X is defined recursively using
    • translation of nonterminals A, B
    • values of attributes of terminals d, c
    • constants


To translate an input string:

1. Build the parse tree.
2. Working bottom-up
   • Use the translation rules to compute the translation of each nonterminal in the tree

Result: the translation of the string is the translation of the parse tree's root nonterminal.

Why bottom up?
• a nonterminal's value may depend on the value of the symbols on the right-hand side,
• so translate a non-terminal node only after children translations are available.


Example 1: Arithmetic expression to value

Syntax-directed translation rules:

E → E + T      E1.trans = E2.trans + T.trans
E → T          E.trans = T.trans
T → T * F      T1.trans = T2.trans * F.trans
T → F          T.trans = F.trans
F → int        F.trans = int.value
F → ( E )      F.trans = E.trans
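
As a rough sketch, these rules can be applied bottom-up to a parse tree that is already built. The tree layout is an assumption for illustration: interior nodes are (symbol, list of children), punctuation is a plain string, and an int token is ("int", value). The helper names `trans` and `f_int` are hypothetical.

```python
def trans(node):
    """Compute the translation of a node from the translations of its children."""
    symbol, kids = node
    if symbol == "int":                       # terminal: use its attribute int.value
        return kids
    shape = [k if isinstance(k, str) else k[0] for k in kids]
    if shape == ["E", "+", "T"]:              # E → E + T
        return trans(kids[0]) + trans(kids[2])
    if shape == ["T", "*", "F"]:              # T → T * F
        return trans(kids[0]) * trans(kids[2])
    if shape == ["(", "E", ")"]:              # F → ( E )
        return trans(kids[1])
    if shape in (["T"], ["F"], ["int"]):      # E → T, T → F, F → int
        return trans(kids[0])
    raise ValueError(f"no rule for {symbol} → {' '.join(shape)}")

def f_int(value):
    """Parse tree for F → int with the given token value."""
    return ("F", [("int", value)])

# Parse tree for the input 2 * (4 + 5)
four_plus_five = ("E", [("E", [("T", [f_int(4)])]), "+", ("T", [f_int(5)])])
tree = ("E", [("T", [("T", [f_int(2)]), "*", ("F", ["(", four_plus_five, ")"])])])

print(trans(tree))   # translation of the root: 2 * (4 + 5)
```

The recursion returns from the children before combining their values, which is the bottom-up order described above.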


Example 1: Bison/Yacc Notation

E : E '+' T     { $$ = $1 + $3; }

T : T '*' F     { $$ = $1 * $3; }

F : int         { $$ = $1; }
F : '(' E ')'   { $$ = $2; }

• KEY:
    $$ : Semantic value of left-hand side
    $n : Semantic value of nth symbol on right-hand side


Example 1 (cont): Annotated Parse Tree

Input: 2 * (4 + 5)

E (18)
  T (18)
    T (2)
      F (2)
        int (2)
    *
    F (9)
      (
      E (9)
        E (4)
          T (4)
            F (4)
              int (4)
        +
        T (5)
          F (5)
            int (5)
      )


Example 2: Compute the type of an expression

E -> E + E      if $1 == INT and $3 == INT: $$ = INT
                else: $$ = ERROR

E -> E and E    if $1 == BOOL and $3 == BOOL: $$ = BOOL
                else: $$ = ERROR

E -> E == E     if $1 == $3 and $1 != ERROR: $$ = BOOL
                else: $$ = ERROR

E -> true       $$ = BOOL
E -> false      $$ = BOOL
E -> int        $$ = INT
E -> ( E )      $$ = $2
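
The same rules can be sketched as a recursive Python function. Expressions are given here as nested tuples (op, left, right) with Python ints and bools at the leaves; that layout and the name `type_of` are assumptions. Parentheses need no case of their own because the rule for ( E ) only passes the inner type through. The `==` case requires the operand types to be equal and not ERROR.

```python
INT, BOOL, ERROR = "INT", "BOOL", "ERROR"

def type_of(e):
    """Return INT, BOOL, or ERROR for an expression tree."""
    if isinstance(e, bool):                  # test bool first: bool is a subclass of int
        return BOOL
    if isinstance(e, int):
        return INT
    op, left, right = e
    t1, t3 = type_of(left), type_of(right)   # the types written $1 and $3 in the rules
    if op == "+":
        return INT if t1 == INT and t3 == INT else ERROR
    if op == "and":
        return BOOL if t1 == BOOL and t3 == BOOL else ERROR
    if op == "==":
        return BOOL if t1 == t3 and t1 != ERROR else ERROR
    return ERROR

print(type_of(("==", ("+", 2, 2), 4)))       # the input (2 + 2) == 4
```

An ill-typed operand such as `("+", 2, True)` yields ERROR, and that ERROR propagates to any enclosing expression.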


Example 2 (cont)

• Input: (2 + 2) == 4

E (BOOL)
  E (INT)
    (
    E (INT)
      E (INT)
        int (INT)
      +
      E (INT)
        int (INT)
    )
  ==
  E (INT)
    int (INT)


Building Abstract Syntax Trees

• In the examples so far, streams of tokens were translated into
  – integer values, or
  – types

• Translating into ASTs is not very different


AST vs. Parse Tree

• AST is a condensed form of a parse tree
  – operators appear at internal nodes, not at leaves.
  – "Chains" of single productions are collapsed.
  – Lists are "flattened".
  – Syntactic details are omitted
    • e.g., parentheses, commas, semi-colons

• AST is a better structure for later compiler stages
  – omits details having to do with the source language,
  – only contains information about the essential structure of the program.


Example: 2 * (4 + 5) Parse tree vs. AST

Parse tree:

E
  T
    T
      F
        int (2)
    *
    F
      (
      E
        E
          T
            F
              int (4)
        +
        T
          F
            int (5)
      )

AST:

*
  2
  +
    4
    5


AST-building translation rules

E → E + T      $$ = new PlusNode($1, $3)
E → T          $$ = $1
T → T * F      $$ = new TimesNode($1, $3)
T → F          $$ = $1
F → int        $$ = new IntLitNode($1)
F → ( E )      $$ = $2
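
A sketch of these rules in Python follows. The class names PlusNode, TimesNode, and IntLitNode come from the rules; their fields, the parse-tree layout (interior nodes as (symbol, list of children), punctuation as strings, int tokens as ("int", value)), and the function `build` are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntLitNode:
    value: int

@dataclass(frozen=True)
class PlusNode:
    left: object
    right: object

@dataclass(frozen=True)
class TimesNode:
    left: object
    right: object

def build(node):
    """Apply the AST-building rule for the production used at this parse-tree node."""
    symbol, kids = node
    shape = [k if isinstance(k, str) else k[0] for k in kids]
    if shape == ["E", "+", "T"]:                 # $$ = new PlusNode($1, $3)
        return PlusNode(build(kids[0]), build(kids[2]))
    if shape == ["T", "*", "F"]:                 # $$ = new TimesNode($1, $3)
        return TimesNode(build(kids[0]), build(kids[2]))
    if shape == ["int"]:                         # $$ = new IntLitNode($1)
        return IntLitNode(kids[0][1])
    if shape == ["(", "E", ")"]:                 # $$ = $2: parentheses vanish
        return build(kids[1])
    return build(kids[0])                        # E → T, T → F: $$ = $1

def f_int(value):
    return ("F", [("int", value)])

# Parse tree for 2 * (4 + 5)
inner = ("E", [("E", [("T", [f_int(4)])]), "+", ("T", [f_int(5)])])
tree = ("E", [("T", [("T", [f_int(2)]), "*", ("F", ["(", inner, ")"])])])

print(build(tree))
```

The result has three operand leaves and two operator nodes, while the parse tree it came from has many more nodes: the chains E → T → F and the parentheses leave no trace.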


Example: 2 * (4 + 5): Steps in Creating AST

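[Figure: the parse tree for 2 * (4 + 5), with the AST fragments computed as semantic values attached to its nodes: 2 for the left operand, + with children 4 and 5 for the parenthesized sum, and * with children 2 and (+ 4 5) at the root. (Only some of the semantic values are shown.)]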


Leftmost and Rightmost Derivations

Leftmost derivation: always act on leftmost non-terminal

E
→ E + E
→ E * E + E
→ int * E + E
→ int * int + E
→ int * int + int

Rightmost derivation: always act on rightmost non-terminal

E
→ E + E
→ E + int
→ E * E + int
→ E * int + int
→ int * int + int


Rightmost Derivation in Detail

[Figure sequence, slides (1) to (6): the same parse tree, grown in rightmost order.]

(1) E                     tree: the root E alone
(2) → E + E               root E gets children E + E
(3) → E + int             right child E of the root gets child int
(4) → E * E + int         left child E of the root gets children E * E
(5) → E * int + int       second E under * gets child int
(6) → int * int + int     first E under * gets child int


Aside: Canonical Derivations

• Take a look at that last derivation in reverse.
• The active part (red on the slides) tends to move left to right.
• We call this a reverse rightmost or canonical derivation.
• Comes up in bottom-up parsing. We’ll return to it in a couple of lectures.


Derivations and Parse Trees

• For each parse tree there is exactly one leftmost and one rightmost derivation

• The difference is the order in which branches are added, not the structure of the tree.
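
To see both derivations come out of one tree, the sketch below expands a parse tree node by node, choosing either the leftmost or the rightmost unexpanded non-terminal at each step. The node layout (symbol, list of children) and the function name `derivation` are assumptions. Epsilon productions are not handled, since a node without children is treated as a terminal.

```python
def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation of tree."""
    form = [tree]                         # current sentential form, as tree nodes
    steps = [[tree[0]]]
    while True:
        # positions of non-terminals that have not been expanded yet
        idx = [i for i, (_, kids) in enumerate(form) if kids]
        if not idx:
            return [" ".join(s) for s in steps]
        i = idx[0] if leftmost else idx[-1]
        form[i:i + 1] = form[i][1]        # replace the node by its children
        steps.append([sym for sym, _ in form])

def leaf(terminal):
    return (terminal, [])

# Parse tree for int * int + int with + at the root
tree = ("E", [
    ("E", [("E", [leaf("int")]), leaf("*"), ("E", [leaf("int")])]),
    leaf("+"),
    ("E", [leaf("int")]),
])

print("\n→ ".join(derivation(tree, leftmost=True)))
print("\n→ ".join(derivation(tree, leftmost=False)))
```

The two printed derivations differ only in the order of their steps; both start from E, end in int * int + int, and were read off the same tree.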


Summary of Derivations

• We are not just interested in whether s ∈ L(G)
• Also need derivation (or parse tree) and AST.
• Parse trees slavishly reflect the grammar.
• Abstract syntax trees abstract from the grammar, cutting out detail that interferes with later stages.
• A derivation defines a parse tree
  – But one parse tree may have many derivations
• Derivations drive translation (to ASTs, etc.)
• Leftmost and rightmost derivations most important in parser implementation