Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3

Preview:

DESCRIPTION

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Shift-reduce parsing How does a shift-reduce parser work? How do we construct a shift-reduce parse table? LR(1), SLR(1), LALR(1) - PowerPoint PPT Presentation

Citation preview

Winter 2012-2013Compiler Principles

Syntax Analysis (Parsing) – Part 3Mayer Goldberg and Roman Manevich

Ben-Gurion University

2

Today

• Shift-reduce parsing– How does a shift-reduce parser work?– How do we construct a shift-reduce parse table?– LR(1), SLR(1), LALR(1)– Meaning of shift-reduce / reduce-reduce conflicts– Behavior on left-recursion/right-recursion

• Automatic parser generation (CUP)– Handling ambiguity

3

Model of an LR parser

LRParsing program

0

T

2

+

7

id

5

Stack

$ id + id + id

Outputstate

symbol

goto action

Input

4

LR parser stack

• Sequence made of state, symbol pairs• For instance a possible stack for the grammar

S E $E TE E + TT id T ( E )could be: 0 T 2 + 7 id 5

Form of LR parsing table

5

state terminals non-terminals

shift/reduceactions

gotopart

01...

sn

rk

shift state n reduce by rule k

gm

goto state m

acc

accept

error

6

LR parser table example

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

goto action STATE

T E $ ) ( + id

g6 g1 s7 s5 0

acc s3 1

2

g4 s7 s5 3

r3 r3 r3 r3 r3 4

r4 r4 r4 r4 r4 5

r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7

s9 s3 8

r5 r5 r5 r5 r5 9

7

Shift move

LRParsing program

q...

Stack

$ … a …

Output

goto action

Input

If action[q, a] = sn

8

Result of shift

If action[q, a] = sn

LRParsing program

naq...

Stack

$ … a …

Output

goto action

Input

9

Reduce move

• If action[qn, a] = rk• Production: (k) A β• If β= σ1… σn

Top of stack looks like q1 σ1… qn σn

• goto[q, A] = qm

LRParsing program

qn

q…

Stack

$ … a …

Output

goto action

Input

2*|β|

10

Result of reduce move

• If action[qn, a] = rk• Production: (k) A β• If β= σ1… σn

Top of stack looks like q1 σ1… qn σn

• goto[q, A] = qm

LRParsing program

Stack

Output

goto action

2*|β| qm

A

q

$ … a …Input

11

Accept move

LRParsing program

q...

Stack

$ a …

Output

goto action

Input

If action[q, a] = acceptparsing completed

12

Error move

LRParsing program

q...

Stack

$ … a …

Output

goto action

Input

If action[q, a] = errorparsing discovered a syntactic error

13

Parsing id+id$

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Stack Input Action0 id + id $ s5

Initialize with state 0

14

Parsing id+id$

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Stack Input Action0 id + id $ s5

Initialize with state 0

15

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

16

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

pop id 5

17

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

push T 6

18

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r2

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

19

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s3

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

20

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s5

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

21

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r4

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

22

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r3

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

23

Parsing id+id$

Stack Input Action0 id + id $ s50 id 5 + id $ r40 T 6 + id $ r20 E 1 + id $ s30 E 1 + 3 id $ s50 E 1 + 3 id 5 $ r40 E 1 + 3 T 4 $ r30 E 1 $ s2

goto action ST E $ ) ( + id

g6 g1 s7 s5 0acc s3 1

2g4 s7 s5 3

r3 r3 r3 r3 r3 4r4 r4 r4 r4 r4 5r2 r2 r2 r2 r2 6

g6 g8 s7 s5 7s9 s3 8

r5 r5 r5 r5 r5 9

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

24

Constructing an LR parsing table

1. Construct a (determinized) transition diagram from LR items

2. If there are conflicts – stop3. Fill table entries from diagram

Form of LR(0) items

N αβ

Already matched To be matched

Input

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

25

Types of LR(0) items

N αβ Shift Item

N αβ Reduce Item

26

27

LR(0) items enumeration example

• All items can be obtained by placing a dot at every position for every production:

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

1: S E$2: S E $3: S E $ 4: E T5: E T 6: E E + T7: E E + T8: E E + T9: E E + T 10: T i11: T i 12: T (E)13: T ( E)14: T (E )15: T (E)

Grammar LR(0) items

28

Operations for transition diagram construction

• Initial = {S’S$}• For an item set I

Closure(I) = Closure(I) + {Xµ is in grammar| NαXβ in I}

• Goto(I, X) = { NαXβ | NαXβ in I}

29

Initial example

• Initial = {S E $}(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

30

Closure example

• Initial = {S E $}• Closure({S E $}) =

S E $E TE E + TT id T ( E )

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

31

Goto example

• Initial = {S E $}• Closure({S E $}) =

S E $E TE E + TT id T ( E )

• Goto({S E $ , E E + T, T id}, E) = {S E $, E E + T}

(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

Grammar

32

Constructing the transition diagram

1. Start with state 0 containing itemClosure({S E $})

2. Repeat until no new states are discovered– For every state p containing item set Ip, and

symbol N, compute state q containing item setIq = Closure(goto(Ip, N))

33

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) S E$

S E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6

q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

34

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$

q0

Initialize

35

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$E TE E + TT iT (E)

q0

applyClosure

36

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$E TE E + TT iT (E)

q0E T

q6

T

T (E)E TE E + TT iT (E)

(

T i

q5i

S E$E E+ T

q1

E

37

Automaton construction example(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

S E$E TE E + TT iT (E)

T (E)E TE E + TT iT (E)

E E + T

T (E) S E$

Z E$E E+ T E E+T

T iT (E)

T i

T (E)E E+T

E Tq0

q1

q2

q3

q4

q5

q6

q7

q8

q9

T

(

i

E

+

$

T

)

+

E

i

T

(i

(

terminal transition corresponds to shift action in parse table

non-terminal transition corresponds to goto action in parse table

a single reduce item corresponds to reduce action

38

Conflicts

• Can construct a diagram for every grammar but some may introduce conflicts

• shift-reduce conflict: an item set contains at least one shift item and one reduce item

• reduce-reduce conflict: an item set contains two reduce items

39

LR(0) conflicts

S E $E T E E + TT i T ( E )T i[E]

S E$E TE E + TT iT (E)T i[E] T i

T i[E]

q0

q5

T

(

i

E Shift/reduce conflict

40

LR(0) conflicts

S E $E T E E + TT i V iT ( E )

S E$E TE E + TT iT (E)T i[E] T i

V i

q0

q5

T

(

i

E reduce/reduce conflict

41

LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)• Inherent shift/reduce conflict– A – reduce item– P αAβ – shift item– A can always be predicted from P αAβ

LR variants

• LR(0) – what we’ve seen so far• SLR(0)– Removes infeasible reduce actions via FOLLOW set

reasoning• LR(1)– LR(0) with one lookahead token in items

• LALR(0)– LR(1) with merging of states with same LR(0)

component

42

SRL parsing• A handle should not be reduced to a non-terminal

N if the lookahead is a token that cannot follow N• A reduce item N α is applicable only when the

lookahead is in FOLLOW(N)– If b is not in FOLLOW(N) we just proved there is no

derivation S =>* βNαb and thus it is safe to remove the reduce item from the conflicted state

• Differs from LR(0) only on the ACTION table– Now a row in the parsing table may contain both shift

actions and reduce actions and we need to consult the current token to decide which one to take

43

SLR action tableState i + ( ) [ ] $

0 shift shift

1 shift accept

2

3 shift shift

4 EE+T EE+T EE+T5 Ti Ti shift Ti6 ET ET ET7 shift shift

8 shift shift

9 T(E) T(E) T(E)

vs.

state action

q0 shift

q1 shift

q2

q3 shift

q4 EE+Tq5 Tiq6 ETq7 shift

q8 shift

q9 TE

SLR – use 1 token look-ahead LR(0) – no look-ahead

… as before…T i T i[E]

Lookahead token from the

input

44

Going beyond SLR(0)

• Some common language constructs introduce conflicts even for SLR

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

45

S’ → SS → L = RS → RL → * RL → idR → L

S’ → S

S → L = RR → L

S → R

L → * RR → LL → * RL → id

L → id

S → L = RR → LL → * RL → id

L → * R

R → L

S → L = R

S

L

R

id

*

=

R

*

id

R

L*

L

id

q0

q4

q7

q1

q3

q9

q6

q8

q2

q5

46

47

shift/reduce conflict

• S → L = R vs. R → L • FOLLOW(R) contains =

– S L ⇒ = R * R ⇒ = R

• SLR cannot resolve conflict

S → L = RR → L

S → L = RR → LL → * RL → id

=q6

q2

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

48

Inputs requiring shift/reduce

• For the input id the rightmost derivationS’ => S => R => L => id requires reducing in q2

• For the input id = idS’ => S => L = R => L = L => L = id => id = idrequires shifting

S → L = RR → L

S → L = RR → LL → * RL → id

=q6

q2(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

LR(1) grammars

• In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N)

• But FOLLOW(N) merges lookahead for all alternatives for N– Insensitive to the context of a given production

• LR(1) keeps lookahead with each LR item• Idea: a more refined notion of follows

computed per item

49

50

LR(1) items• LR(1) item is a pair

– LR(0) item– Lookahead token

• Meaning– We matched the part left of the dot, looking to match the part on the right of

the dot, followed by the lookahead token• Example

– The production L id yields the following LR(1) items

[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

[L → ● id][L → id ●]

LR(0) items

LR(1) items

-closure for LR(1)

• For every [A → α ● Bβ , c] in S – for every production B→δ and every token b in

the grammar such that b FIRST(βc) – Add [B → ● δ , b] to S

51

(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )

(S’ → S ∙ , $)

(S → L ∙ = R , $)(R → L ∙ , $)

(S → R ∙ , $)

(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)(L → id ∙ , =)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , =)(L → * R ∙ , $)

(R → L ∙ , =)(R → L ∙ , $)

(S → L = R ∙ , $)

S

L

R

id*

=

R

*id

R

L

*

L

id

q0

q4 q5

q7

q6

q9

q3

q1

q2

q8

(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)

(R → L ∙ , $)

(L → * R ∙ , $)

q11

q12

q10

Rq13

id

52

Back to the conflict

• Is there a conflict now?

53

(S → L ∙ = R , $)(R → L ∙ , $)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

=

q6

q2

LALR(1)• LR(1) tables have huge number of entries• Often don’t need such refined observation (and

cost)• Idea: find states with the same LR(0) component

and merge their lookaheads component as long as there are no conflicts

• LALR(1) not as powerful as LR(1) in theory but works quite well in practice– Merging may not introduce new shift-reduce conflicts,

only reduce-reduce, which is unlikely in practice

54

(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )

(S’ → S ∙ , $)

(S → L ∙ = R , $)(R → L ∙ , $)

(S → R ∙ , $)

(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)(L → id ∙ , =)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , =)(L → * R ∙ , $)

(R → L ∙ , =)(R → L ∙ , $)

(S → L = R ∙ , $)

S

L

R

id*

=

R

*id

R

L

*

L

id

q0

q4 q5

q7

q6

q9

q3

q1

q2

q8

(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)

(R → L ∙ , $)

(L → * R ∙ , $)

q11

q12

q10

Rq13

id

55

(S’ → ∙ S , $)(S → ∙ L = R , $)(S → ∙ R , $)(L → ∙ * R , = )(L → ∙ id , = )(R → ∙ L , $ )(L → ∙ id , $ )(L → ∙ * R , $ )

(S’ → S ∙ , $)

(S → L ∙ = R , $)(R → L ∙ , $)

(S → R ∙ , $)

(L → * ∙ R , =)(R → ∙ L , =)(L → ∙ * R , =)(L → ∙ id , =)(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → id ∙ , $)(L → id ∙ , =)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , =)(L → * R ∙ , $)

(R → L ∙ , =)(R → L ∙ , $)

(S → L = R ∙ , $)

S

L

R

id*

=

R

*id

R

L

*

L

id

q0

q4 q5

q7

q6

q9

q3

q1

q2

q8

(L → * ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

(L → * R ∙ , $)

q10

Rq13

id

56

57

Left/Right- recursion

• At home: create a simple grammar with left-recursion and one with right-recursion

• Construct corresponding LR(0) parser– Any conflicts?

• Run on simple input and observe behavior– Attempt to generalize observation for long inputs

Automated Parser Generation(via CUP)

58

High-level structure

JFlex javacLexerspec

Lexical analyzer

text

tokens

.java

CUP javacParserspec .java Parser

AST

TPL.cup

TPL.lex

sym.javaParser.java

Lexer.java

(Token.java)

59

Expression calculator

expr expr + expr| expr - expr| expr * expr| expr / expr| - expr| ( expr )| number

Goals of expression calculator parser:• Is 2+3+4+5 a valid expression?• What is the meaning (value) of this expression?

60

Syntax analysis with CUP

CUP javacParserspec .java Parser

AST

CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer

tokens

61

CUP spec file

• Package and import specifications• User code components• Symbol (terminal and non-terminal) lists– Terminals go to sym.java– Types of AST nodes

• Precedence declarations• The grammar– Semantic actions to construct AST

62

Expression Calculator – 1st Attempt

terminal Integer NUMBER;terminal PLUS, MINUS, MULT, DIV;terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr| LPAREN expr RPAREN| NUMBER

;

Symbol typeexplained later

63

Ambiguities

a * b + c

a b c

+

*

a b c

*

+

a + b + ca b c

+

+

a b c

+

+

64

terminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV;terminal LPAREN, RPAREN;terminal UMINUS;non terminal Integer expr;

precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;

expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER

;

Expression Calculator – 2nd Attempt

Increasing precedence

Contextual precedence

65

Parsing ambiguous grammars using precedence declarations

• Each terminal assigned with precedence– By default all terminals have lowest precedence– User can assign his own precedence– CUP assigns each production a precedence

• Precedence of last terminal in production• or user-specified contextual precedence

• On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce

• In case of equal precedences left/right help resolve conflicts– left means reduce– right means shift

• More information on precedence declarations in CUP’s manual

66

Resolving ambiguity

a + b + c

a b c

+

+

a b c

+

+

precedence left PLUS

67

Resolving ambiguity

a * b + c

a b c

+

*

a b c

*

+

precedence left PLUSprecedence left MULT

68

Resolving ambiguity

- a - b

a b

-

-

MINUS expr %prec UMINUS

a

-b

-

69

Resolving ambiguityterminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV;terminal LPAREN, RPAREN;terminal UMINUS;

precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;

expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec

UMINUS| LPAREN expr RPAREN| NUMBER

;

Rule has precedence of UMINUS

UMINUS never returnedby scanner

(used only to define precedence)

70

More CUP directives• precedence nonassoc NEQ– Non-associative operators: < > == != etc.– 1<2<3 identified as an error (semantic error?)

• start non-terminal– Specifies start non-terminal other than first non-terminal– Can change to test parts of grammar

• Getting internal representation– Command line options:

• -dump_grammar• -dump_states • -dump_tables• -dump

71

import java_cup.runtime.*;%%%cup%eofval{ return new Symbol(sym.EOF);%eofval}NUMBER=[0-9]+%%<YYINITIAL>”+” { return new Symbol(sym.PLUS); }<YYINITIAL>”-” { return new Symbol(sym.MINUS); }<YYINITIAL>”*” { return new Symbol(sym.MULT); }<YYINITIAL>”/” { return new Symbol(sym.DIV); }<YYINITIAL>”(” { return new Symbol(sym.LPAREN); }<YYINITIAL>”)” { return new Symbol(sym.RPAREN); }<YYINITIAL>{NUMBER} {

return new Symbol(sym.NUMBER, new Integer(yytext()));}<YYINITIAL>\n { }<YYINITIAL>. { }

Parser gets terminals from the scanner

Scanner integration

Generated from tokendeclarations in .cup file

72

Recap

• Package and import specifications and user code components

• Symbol (terminal and non-terminal) lists– Define building-blocks of the grammar

• Precedence declarations– May help resolve conflicts

• The grammar– May introduce conflicts that have to be resolved

73

Assigning meaning

• So far, only validation• Add Java code implementing semantic actions

expr ::= expr PLUS expr| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER

;

74

• Symbol labels used to name variables• RESULT names the left-hand side symbol

expr ::= expr:e1 PLUS expr:e2{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}| expr:e1 MINUS expr:e2{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}| expr:e1 MULT expr:e2{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}| expr:e1 DIV expr:e2{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}| MINUS expr:e1{: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS| LPAREN expr:e1 RPAREN{: RESULT = e1; :}| NUMBER:n {: RESULT = n; :};

Assigning meaning

75

Building an AST

• More useful representation of syntax tree– Less clutter– Actual level of detail depends on your design

• Basis for semantic analysis• Later annotated with various information– Type information– Computed values

76

Parse tree vs. AST

+

expr

1 2 + 3

expr

expr

( ) ( )

expr

expr

1 2

+

3

+

77

AST construction• AST Nodes constructed during parsing– Stored in push-down stack

• Bottom-up parser– Grammar rules annotated with actions for AST

construction– When node is constructed all children available

(already constructed)– Node (RESULT) pushed on stack

• Top-down parser– More complicated

78

1 + (2) + (3)

expr + (expr) + (3)

+

expr

1 2 + 3

expr

expr + (3)

expr

( ) ( )

expr + (expr)

expr

expr

expr

expr + (2) + (3)

int_constval = 1

pluse1 e2

int_constval = 2

int_constval = 3

pluse1 e2

expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :}

AST construction

79

terminal Integer NUMBER;terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;terminal UMINUS;non terminal Integer expr;non terminal expr_list, expr_part; precedence left PLUS, MINUS;precedence left DIV, MULT;precedence left UMINUS;

expr_list ::= expr_list expr_part | expr_part

; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI

; expr ::= expr PLUS expr

| expr MINUS expr| expr MULT expr| expr DIV expr| MINUS expr %prec UMINUS| LPAREN expr RPAREN| NUMBER

;

Designing an AST

80

See you next time

Recommended