53
EDAN65: Compilers, Lecture 03 Context-free grammars, introduction to parsing Görel Hedin Revised: 2018-09-10

EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

  • Upload
    others

  • View
    18

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

EDAN65:Compilers,Lecture 03

Context-free grammars,introductionto parsingGörelHedinRevised:2018-09-10

Page 2: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Courseoverview

Semantic analyzer

Intermediatecode generator

Optimizer

Targetcodegenerator

2

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

machine

runtime system

stack

heap

codeanddata

objects

activationrecords

Interpreter

target code

tokens

Attributed AST

intermediate code

sourcecode (text)

AST(Abstractsyntaxtree)

intermediate code

garbagecollection

Virtualmachine

This lecture

Page 3: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Analyzing programtext

3

sum = sum + k ;

AssignStmt

Exp

Add

Exp Exp

ID EQ ID PLUS ID SEMIprogram text

tokens

parse tree

Page 4: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Analyzing programtext

4

AssignStmt

Exp

Add

Exp Exp

ID EQ ID PLUS ID SEMIprogram text

tokens

parse tree

sum = sum + k ; \n

non-tokens(likewhitespace)arediscarded

Page 5: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Recall:Generatingthecompiler:

Semantic analyzer

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

ScannergeneratorJFlex

Context-freegrammar

ParsergeneratorBeaver

Attributegrammar

Attribute evaluatorgenerator

We will use aparsergeneratorcalled Beaver

5

tokens

text

tree

Page 6: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

6

Context-Free Grammars

Page 7: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Regular ExpressionsvsContext-Free Grammars

7

AnREcan have iteration

ACFGcan also have recursion(itispossible toderive asymbol,e.g.,Stmt,fromitself)

Example REs:WHILE = "while"ID = [a-z][a-z0-9]*LPAR = "("RPAR = ")"PLUS = "+"...

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtExp –> IDExp –> Exp PLUS Exp...

Page 8: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

ElementsofaContext-Free Grammar

8

Production rules:X –> s1 s2 … sn

where sk isasymbol(terminalornonterminal),n>=0

Nonterminal symbols

Terminalsymbols(tokens)

Startsymbol(one ofthenonterminals,usually theleft-handside of thefirst production)

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtAssignStmt –> ID EQ Exp SEMIC…

Page 9: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Shorthand foralternatives

9

Stmt –> WhileStmtStmt –> AssignStmt

Stmt –> WhileStmt | AssignStmt

isequivalent to

Page 10: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Shorthand forrepetition

10

Stmt*

StmtList –> e | Stmt StmtList

isequivalent to

StmtList

where

Page 11: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

ExerciseConstruct agrammar covering this programandsimilar ones:

11

Example program:while (k <= n) {sum = sum + k; k = k+1;}

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> ...Exp –> ...LessEq –> ...Add –> ...

(Often,simpletokensarewritten directly astextstrings)

Page 12: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

SolutionConstruct agrammar covering this programandsimilar ones:

12

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> LessEq | Add | ID | INTLessEq –> Exp "<=" ExpAdd –> Exp "+" Exp

Example program:while (k <= n) {sum = sum + k; k = k+1;}

Page 13: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

ParsingUse thegrammar toderive atree foraprogram:

13sum = sum + k ;

StmtExample program:sum = sum + k;

Startsymbol

Page 14: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Parse treeUse thegrammar toderive aparse tree foraprogram:

14sum = sum + k ;

Stmt

AssignStmt

Exp

Add

Exp Exp

Example program:sum = sum + k;

Nonterminalsareinnernodes

Startsymbol

Terminalsareleafs

Aparse tree includes alltheinputtokensasleafs.

Page 15: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Corresponding abstractsyntaxtree(will bediscussed inlaterlecture)

15sum = sum + k ;

AssignStmt

Add

IdExp IdExp

Example program:sum = sum + k;

IdExp

Anabstractsyntaxtree issimilarto aparse tree,but simpler.

Itincludesonlysomeofthetokens.

(Tokensthatcouldbegeneratedfromthetreeareexcluded.)

Page 16: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

EBNFvsCanonical Form

16

EBNF:Stmt –> AssignStmt | CompoundStmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> Add | IDAdd –> Exp "+" Exp

Canonical form:Stmt –> ID "=" Exp ";"Stmt –> "{" Stmts "}"Stmts –> eStmts –> Stmt StmtsExp –> Exp "+" ExpExp –> ID

(Extended)Backus-Naur Form:• Compact,easytoreadandwrite• EBNFhasalternatives,repetition,optionals,parentheses (likeREs)

• Commonnotationforpracticaluse

Canonical form:• Core formalismforCFGs• Useful forproving properties andexplaining algorithms

Page 17: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Realworldexample:TheJavaLanguageSpecification

17

See http://docs.oracle.com/javase/specs/jls/se8/html/index.html• See Chapter 2about theJavagrammar notation.• Lookatsome other chapters to see other syntaxexamples.

CompilationUnit:[PackageDeclaration]{ImportDeclaration}{TypeDeclaration}

PackageDeclaration:{PackageModifier}package Identifier {.Identifier};

PackageModifier:Annotation

Page 18: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

18

Formaldefinitionof CFGs

Page 19: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

FormaldefinitionofCFGs (canonical form)

19

Acontext-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

So,theleft-hand side Xofarule isanonterminal.

Andtheright-hand side Y1 Y2 …Yn isasequence of nonterminalsandterminals.

If therhs foraproduction isempty,i.e.,n=0,we writeX–>e

Page 20: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

AgrammarG defines alanguageL(G)

20

A context-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

Gdefines alanguage L(G) overthealphabet T

T*isthesetofallpossible sequences ofTsymbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbolS,byfollowing theproduction rules P.

Page 21: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Exercise

21

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = { }

T = { }

S =

L(G) = {

}

Page 22: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Solution

22

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = {Stmt, Exp, Stmts}

T = {ID, "=", "{", "}", ";", "+"}

S = Stmt

L(G) = {"{" "}","{" "{" "}" "}",ID "=" ID ";","{" ID "=" ID ";" "}",ID "=" ID "+" ID ";","{" "{" "}" "{" "}" "}","{" "{" "{" "}" "}" "}","{" ID "=" ID "+" ID ";" "}",ID "=" ID "+" ID "+" ID ";",...

}

Thesequences inL(G)areusually called sentences orstrings

Page 23: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

23

Derivations

Page 24: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Derivationstep

24

If we have asequence ofterminalsandnonterminals,e.g.,

XaYY b

we can replace one ofthenonterminals,applying aproductionrule.Thisiscalled aderivationstep.(Swedish:Härledningssteg)

Supposethere isaproduction

Y–>Xa

andwe apply itforthefirstYinthesequence.We write thederivationstepasfollows:

XaYY b=>XaXaYb

Page 25: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Derivation

25

Aderivation,issimply asequence ofderivationsteps,e.g.:

g0 =>g1 =>…=>gn (n≥0)

where each gi isasequence ofterminalsandnonterminals

Ifthere isaderivationfromg0 togn,we can write thisas

g0 =>*gn

Sothismeans itispossible togetfromthesequence g0 tothesequence gn byfollowing theproduction rules.

Page 26: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Definitionofthelanguage L(G)

26

Recall that:

G=(N,T,P,S)

T* isthesetofallpossible sequences ofT symbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbol S,byfollowing theproduction rules P.

Using theconcept ofderivations,we can formally define L(G) asfollows:

L(G)={w∈ T*| S=>*w}

Page 27: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Exercise:Prove thatasentence belongs toalanguage

27

Prove that

INT+INT*INT

Proof:(byshowing allthederivationstepsfromthestartsymbolExp):

Exp=>

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Page 28: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Solution:Prove thatasentence belongs toalanguage

28

Prove that

INT+INT*INT

Proof:(byshowing allthederivationstepsfromthestartsymbolExp)

Exp=>p1 Exp "+" Exp=>p3 INT"+"Exp=>p2 INT"+"Exp "*"Exp=>p3 INT"+"INT"*"Exp=>p3 INT"+"INT"*"INT

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Page 29: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Leftmost andrightmost derivations

29

Inaleftmost derivation,theleftmost nonterminalisreplacedineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

LLparsingalgorithms use leftmost derivation.LRparsingalgorithms use rightmost derivation.Willbediscussed inlaterlectures.

Inarightmost derivation,therightmost nonterminalisreplaced ineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>Exp "+"Exp "*"Exp =>Exp "+"Exp "*"INT=>Exp "+"INT"*"INT=>INT"+"INT"*"INT

Page 30: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Aderivationcorresponds tobuilding aparse tree

30

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Exercise:build theparse tree(also called derivationtree).

Page 31: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Aderivationcorresponds tobuilding aparse tree

31

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Parse tree (derivationtree):

Exp

Exp Exp

Exp Exp

"+"

INT"*"

INT INT

Page 32: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

32

Ambiguities

Page 33: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Exercise:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

33

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Anotherderivation:

Exp =>

Page 34: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Solution:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

34

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Exp

Exp"*"

INT

Exp

Exp Exp"+"

INT INT

Anotherderivation:

Exp =>Exp "*" Exp =>Exp "+"Exp "*"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Which parse tree would we prefer?

Page 35: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Ambiguous context-freegrammars

35

ACFGisambiguous if asentence inthelanguage can bederived bytwo (ormore)differentparse trees.

ACFGisunambiguous if each sentence inthelanguage canbederived byonly one parse tree.

(Swedish:tvetydig,otvetydig)

Note!There can bemany differentderivationsthat give thesameparse tree.

Page 36: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

HowcanweknowifaCFGisambiguous?

36

Ifwe find anexample of anambiguity,we know thegrammar isambiguous.

There are algorithms fordeciding if aCFGbelongs to certainsubsets of CFGs,e.g.LL,LR,etc.(See laterlectures.)Thesegrammarsare unambiguous.

But inthegeneralcase,theproblemisundecidable:itisnotpossible to construct ageneralalgorithm that decidesambiguity foranarbitrary CFG.

Strategies foreliminating ambiguities,next lecture.

Page 37: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

37

Parsing

Page 38: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Differentparsingalgorithms

38

Ambiguous

Unambiguous

Allcontext-freegrammars

LR

LL

LL:Left-to-rightscanLeftmost derivationBuilds tree top-downSimpleto understand

LR:Left-to-rightscanRightmost derivationBuilds tree bottom-upMore powerful

Page 39: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

LLandLRparsers:main idea

39

...if IDthen ID=ID;ID...

LR(1):decidestobuildAssignafterseeingthefirsttokenfollowingitssubtree.Thetreeisbuiltbottomup.

Id Assign

Id Id

Thetokeniscalled lookahead.LL(k)andLR(k)use k lookahead tokens.

Inpractice,k=1isusuallyused

...if IDthen ID=ID;ID...

IfStmt

Id Assign

LL(1):decidestobuildAssignafterseeingthefirsttokenofitssubtree.Thetreeisbuilttopdown.

CompoundStmt

Page 40: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Recursive-descentparsingAwayofprogramminganLL(1)parserbyrecursivemethodcalls

40

Assume aBNFgrammar with exactly one production rule foreach nonterminal.(Can easily begeneralized to EBNF.)

Each production rule RHSiseither1. asequence of token/nonterminal symbols,or2. asetof nonterminal symbolalternatives

Foreach nonterminal,amethod isconstructed.Themethod1. matches tokensandcallsnonterminal methods,or2. callsone of thenonterminal methods – which one depends onthe

lookahead token.

Ifthelookahead tokendoes notmatch,aparsingerror isreported.

A–>B|C|DB–>eCfDC–>...D–>...

Page 41: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

ExampleJavaimplementation:overview

41

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE...

class Parser{privateint token; //current lookahead tokenvoid accept(int t){...} //accepttandreadinnext tokenvoid error(Stringstr){...} //generate error messagevoid statement(){...}void assignment (){...}void compoundStmt (){...}...

}

Page 42: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Example:recursivedescentmethods

42

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE

class Parser{void statement(){switch(token){case ID:assignment();break;case LBRACE:compoundStmt();break;default:error("Expecting statement,found:"+token);}}void assignment(){accept(ID);accept(ASSIGN);expr();accept(SEMICOLON);}void compoundStmt(){accept(LBRACE);while (token!=RBRACE){statement();}accept(RBRACE);}...}

Page 43: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Example:Parserskeletondetails

43

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEexpr –>...

class Parser{finalstatic int ID=1,WHILE=2,DO=3,ASSIGN=4,...;privateint token; //current lookahead tokenvoid accept(int t){ //accepttandreadinnext tokenif (token==t){token=nextToken();}else {error("Expected "+t+",but found "+token);}}void error(Stringstr){...} //generate error messageprivateint nextToken(){...}//readnext tokenfromscannervoid statement()......

}

Page 44: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

ArethesegrammarsLL(1)?

44

expr –>name params |name

Whatwouldhappeninarecursive-descentparser?

Could they beLL(2)?LL(k)?

Commonprefix

expr –>expr "+" term Left recursion

Page 45: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Dealingwithcommonprefixoflimitedlength:Locallookahead

45

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement()...

Page 46: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

46

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement(){switch(token){caseID:if(lookahead(2) ==ASSIGN){assignment();}else{callStmt();}break;caseLBRACE:compoundStmt();break;default:error("Expectingstatement,found:"+token);}}

Dealingwithcommonprefixoflimitedlength:Locallookahead

Page 47: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Generatingtheparser:

Syntactic analyzer(parser)

Context-freegrammar Parsergenerator

47

tokens

tree

Page 48: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Beaver:anLR-based parsergenerator

ParserinJava

Context-freegrammar,

with semanticactionsinJava

Beaver

48

tokens

tree

Page 49: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Example beaver specification

49

%class "LangParser";%package "lang";...%terminalsLET,IN,END,ASSIGN,MUL,ID,NUMERAL;

%goal program;//Thestartsymbol

//Context-free grammarprogram=exp;exp =factor |exp MULfactor;factor =let |numeral |id;let =LETidASSIGNexp INexp END;numeral =NUMERAL;id=ID;

Lateron,we will extend this specification with semantic actionsto build thesyntaxtree.

Page 50: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

50

RE CFGTypicalAlphabet

characters terminalsymbols(tokens)

Language isasetof ...

strings(charsequences)

sentences(tokensequences)

Used for... tokens parsetreesPower iteration recursionRecognizer DFA DFAwith stack

RegularExpressionsvsContext-FreeGrammars

Page 51: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

51

Grammar Rulepatterns Typeregular X –>aY orX –>a orX –>e 3

contextfree X–> g 2context sensitive a X b –>a g b 1

arbitrary g –>d 0

TheChomskyhierarchy of formalgrammars

a – terminalsymbola, b, g, d – sequences of (terminalornonterminal)symbols

Type(3)⊂ Type (2)⊂ Type(1)⊂ Type(0)

Regular grammarshave thesamepower asregular expressions(tail recursion =iteration).

Type 2and3are of practicaluse incompiler construction.Type 0and1are only of theoretical interest.

Page 52: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Courseoverview

Semantic analyzer

52

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

tokens

sourcecode (text)

AST(Abstractsyntaxtree)

What we have covered:Context-free grammars,derivations,parse treesAmbiguous grammarsIntroduction to parsing,recursive-descent

You can now finishassignment 1

Page 53: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer

Summary questions

53

• Construct aCFGforasimplepartof aprogramming language.• What isanonterminal symbol?Aterminalsymbol?Aproduction?Astartsymbol?Aparse tree?• What isaleft-handside of aproduction?Aright-handside?• Givenagrammar G,what ismeant bythelanguage L(G)?• What isaderivationstep?Aderivation?Aleftmost derivation?Arighmostderivation?• How does aderivationcorrespond to aparse tree?• What does itmean foragrammar to beambiguous?Unambiguous?• Give anexample anambiguous CFG.• What isthedifference between anLLandanLRparser?• What isthedifference between LL(1)andLL(2)?Orbetween LR(1)andLR(2)?• Construct arecursive descent parserforasimplelanguage.• Give typical examples of grammarsthat cannot behandledbyarecursive-descent parser.• Explain why context-free grammarsare more powerful than regularexpressions.• Inwhat senseare context-free grammars"context-free"?