CSE 5317/4305 L5: Abstract Syntax

Leonidas Fegaras


Abstract Syntax Tree (AST)

• A parser typically generates an Abstract Syntax Tree (AST):

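• A parse tree is not an AST: the parse tree records every grammar rule used to derive the input, while the AST keeps only the operators and operands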

[Figure: source file → scanner → parser → AST; the parser asks the scanner to "get token", and the scanner reads the source file with "get next character"]

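[Figure: for the input id(x) + id(y) * id(z), the parse tree built from the nonterminals E, T, and F, next to the much smaller AST, in which + has the children x and *, and * has the children y and z]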


Building Abstract Syntax Trees in Java

abstract class Exp {
}

class IntegerExp extends Exp {
    public int value;
    public IntegerExp ( int n ) { value=n; }
}

class TrueExp extends Exp {
    public TrueExp () {}
}

class FalseExp extends Exp {
    public FalseExp () {}
}

class VariableExp extends Exp {
    public String value;
    public VariableExp ( String n ) { value=n; }
}


Exp (cont.)

class BinaryExp extends Exp {
    public String operator;
    public Exp left;
    public Exp right;
    public BinaryExp ( String o, Exp l, Exp r ) { operator=o; left=l; right=r; }
}

class UnaryExp extends Exp {
    public String operator;
    public Exp operand;
    public UnaryExp ( String o, Exp e ) { operator=o; operand=e; }
}

class ExpList {
    public Exp head;
    public ExpList next;
    public ExpList ( Exp h, ExpList n ) { head=h; next=n; }
}


Exp (cont.)

class CallExp extends Exp {
    public String name;
    public ExpList arguments;
    public CallExp ( String nm, ExpList s ) { name=nm; arguments=s; }
}

class ProjectionExp extends Exp {
    public Exp value;
    public String attribute;
    public ProjectionExp ( Exp v, String a ) { value=v; attribute=a; }
}


Exp (cont.)

class RecordElements {
    public String attribute;
    public Exp value;
    public RecordElements next;
    public RecordElements ( String a, Exp v, RecordElements el )
        { attribute=a; value=v; next=el; }
}

class RecordExp extends Exp {
    public RecordElements elements;
    public RecordExp ( RecordElements el ) { elements=el; }
}


Examples

• The AST for the input (x-2)+3 is

    new BinaryExp("+",
                  new BinaryExp("-",
                                new VariableExp("x"),
                                new IntegerExp(2)),
                  new IntegerExp(3))

• The AST for the input f(x.A,true) is

    new CallExp("f",
                new ExpList(new ProjectionExp(new VariableExp("x"),
                                              "A"),
                            new ExpList(new TrueExp(),null)))
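As a minimal sketch of how such trees are consumed, the Java code below evaluates and prints the AST for (x-2)+3. It copies only the slide classes it needs; `ExpEval`, `eval`, and `show` are hypothetical names introduced for illustration, and only the operators `+`, `-`, and `*` are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal copies of the slide classes used by this sketch.
abstract class Exp {}

class IntegerExp extends Exp {
    public int value;
    public IntegerExp(int n) { value = n; }
}

class VariableExp extends Exp {
    public String value;
    public VariableExp(String n) { value = n; }
}

class BinaryExp extends Exp {
    public String operator;
    public Exp left;
    public Exp right;
    public BinaryExp(String o, Exp l, Exp r) { operator = o; left = l; right = r; }
}

// ExpEval, eval, and show are hypothetical helpers, not part of the slides.
class ExpEval {
    // Evaluates an AST recursively; env maps variable names to values.
    static int eval(Exp e, Map<String, Integer> env) {
        if (e instanceof IntegerExp) {
            return ((IntegerExp) e).value;
        } else if (e instanceof VariableExp) {
            String name = ((VariableExp) e).value;
            Integer v = env.get(name);
            if (v == null) throw new IllegalArgumentException("unbound variable: " + name);
            return v;
        } else if (e instanceof BinaryExp) {
            BinaryExp b = (BinaryExp) e;
            int l = eval(b.left, env);
            int r = eval(b.right, env);
            switch (b.operator) {
                case "+": return l + r;
                case "-": return l - r;
                case "*": return l * r;
                default: throw new IllegalArgumentException("unknown operator: " + b.operator);
            }
        }
        throw new IllegalArgumentException("unsupported node: " + e.getClass().getSimpleName());
    }

    // Prints an AST in fully parenthesized infix form.
    static String show(Exp e) {
        if (e instanceof IntegerExp) return Integer.toString(((IntegerExp) e).value);
        if (e instanceof VariableExp) return ((VariableExp) e).value;
        BinaryExp b = (BinaryExp) e;
        return "(" + show(b.left) + b.operator + show(b.right) + ")";
    }

    public static void main(String[] args) {
        // The AST for (x-2)+3, built exactly as above.
        Exp ast = new BinaryExp("+",
                      new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                      new IntegerExp(3));
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 10);
        System.out.println(show(ast) + " = " + eval(ast, env)); // ((x-2)+3) = 11
    }
}
```

Each method dispatches on the node's class with `instanceof` and recurses into the children, so the shape of the code follows the shape of the class hierarchy.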


Gen

• A Java package for constructing and manipulating ASTs
• You are required to use Gen for your project
• It is basically a Java preprocessor that adds syntactic constructs to the Java language to make the task of handling ASTs easier
  – uses a universal class Ast to capture any kind of AST
  – supports easy construction of ASTs using the #<...> syntax
  – supports pattern matching, editing, pretty-printing, etc.
  – includes a symbol table class
• Architecture:
    file.gen → Gen → file.java → javac → file.class


The Gen Ast Class

abstract class Ast {
}

class Number extends Ast {
    public long value;
    public Number ( long n ) { value = n; }
}

class Real extends Ast {
    public double value;
    public Real ( double n ) { value = n; }
}

class Variable extends Ast {
    public String value;
    public Variable ( String s ) { value = s; }
}

class Astring extends Ast {
    public String value;
    public Astring ( String s ) { value = s; }
}


AST Nodes are Instances of Node

class Node extends Ast {
    public String name;
    public Arguments args;
    public Node ( String n, Arguments a ) { name = n; args = a; }
}

class Arguments {
    public Ast head;
    public Arguments tail;
    // the members below are shown as signatures only; their bodies are omitted
    public Arguments ( Ast h, Arguments t );
    public final static Arguments nil;          // the empty argument list
    public Arguments append ( Ast e );
}


Example

To construct Binop(Plus,x,Binop(Minus,y,z)) in Java, use:

    new Node("Binop",
             Arguments.nil.append(new Variable("Plus"))
                          .append(new Variable("x"))
                          .append(new Node("Binop",
                                           Arguments.nil.append(new Variable("Minus"))
                                                        .append(new Variable("y"))
                                                        .append(new Variable("z")))))

• Ugly!
• You should never use this kind of code in your project

[Figure: the AST Binop(Plus, x, Binop(Minus, y, z)) drawn as a tree]
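The slides show the `Node` and `Arguments` members only as signatures. The following self-contained mock fills in plausible bodies so that the construction above can actually run; it is a sketch, not Gen's implementation. In particular, `append` is assumed to return a new list rather than modify the receiver, since every construction starts from the shared `Arguments.nil`, and the `toString` methods and the `BinopDemo` class are hypothetical additions for printing.

```java
// Hypothetical stand-ins for Gen's Ast, Variable, Arguments, and Node;
// the method bodies are assumptions, not Gen's actual code.
abstract class Ast {}

class Variable extends Ast {
    public String value;
    public Variable(String s) { value = s; }
    public String toString() { return value; }
}

class Arguments {
    public Ast head;
    public Arguments tail;
    public Arguments(Ast h, Arguments t) { head = h; tail = t; }

    // Assumed representation of the empty list: a unique sentinel object.
    public final static Arguments nil = new Arguments(null, null);

    boolean isEmpty() { return this == nil; }

    // Assumed non-destructive: returns a new list with e added at the end,
    // so the shared Arguments.nil is never modified.
    public Arguments append(Ast e) {
        if (isEmpty()) return new Arguments(e, nil);
        return new Arguments(head, tail.append(e));
    }

    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Arguments a = this; !a.isEmpty(); a = a.tail) {
            if (a != this) sb.append(",");
            sb.append(a.head);
        }
        return sb.toString();
    }
}

class Node extends Ast {
    public String name;
    public Arguments args;
    public Node(String n, Arguments a) { name = n; args = a; }
    public String toString() { return name + "(" + args + ")"; }
}

class BinopDemo {
    // The construction from the slide, unchanged.
    static Ast build() {
        return new Node("Binop",
                   Arguments.nil.append(new Variable("Plus"))
                                .append(new Variable("x"))
                                .append(new Node("Binop",
                                    Arguments.nil.append(new Variable("Minus"))
                                                 .append(new Variable("y"))
                                                 .append(new Variable("z")))));
    }

    public static void main(String[] args) {
        System.out.println(build()); // Binop(Plus,x,Binop(Minus,y,z))
    }
}
```

Printing the result makes it easy to check that the nested calls really produce Binop(Plus,x,Binop(Minus,y,z)), which is hard to see by reading the construction code itself.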


The #< > Brackets

When you write

    #<Binop(Plus,x,Binop(Minus,y,z))>

in your Gen file, it generates the following Java code:

    new Node("Binop",
             Arguments.nil.append(new Variable("Plus"))
                          .append(new Variable("x"))
                          .append(new Node("Binop",
                                           Arguments.nil.append(new Variable("Minus"))
                                                        .append(new Variable("y"))
                                                        .append(new Variable("z")))))

which represents the AST:

    Binop(Plus,x,Binop(Minus,y,z))


Escaping a Value Using Backquote

• Objects of the class Ast can be included in the form generated by the #< > brackets by “escaping” them with a backquote (`)
• The operand of the escape operator is expected to be an object of class Ast that provides the value to “fill in” the hole in the bracketed text at that point
  – actually, an escaped string/int/double value is also lifted to an Ast
• For example,

    Ast x = #<join(a,b,p)>;
    Ast y = #<select(`x,q)>;
    Ast z = #<project(`y,A)>;

  are equivalent to:

    Ast x = #<join(a,b,p)>;
    Ast y = #<select(join(a,b,p),q)>;
    Ast z = #<project(select(join(a,b,p),q),A)>;


BNF of #< >

bracketed ::= "#<" expr ">"                            an AST construction
            | "#[" arg "," ... "," arg "]"             an Arguments construction
expr      ::= name                                     the representation of a variable name
            | integer                                  the repr. of an integer
            | real                                     the repr. of a real number
            | string                                   the repr. of a string
            | "`" name                                 escaping to the value of name
            | "`(" code ")"                            escaping to the value of code
            | name "(" arg "," ... "," arg ")"         the repr. of an AST node with >=0 children
            | "`" name "(" arg "," ... "," arg ")"     the repr. of an AST node with escaped name
            | expr opr expr                            an AST node that represents a binary infix opr
            | "`" name "[" expr "]"                    variable substitution
arg       ::= expr                                     the repr. of an expression
            | "..." name                               escaping to a list of ASTs bound to name
            | "...(" code ")"                          escaping to a list of ASTs returned by code


“...” is for Arguments

• The three dots (...) construct is used to indicate a list of children in an AST node
  – name in "...name" must be an instance of the class Arguments
• For example, in

    Arguments r = #[join(a,b,p),select(c,q)];
    Ast z = #<project(...r)>;

  z will be bound to #<project(join(a,b,p),select(c,q))>


Example

For example,

    #<`f(6,...r,g("ab",`(k(x))),`y)>

is equivalent to the following Java code:

    new Node(f,
             Arguments.nil.append(new Number(6))
                          .append(r)
                          .append(new Node("g",Arguments.nil.append(new Astring("ab"))
                                                            .append(k(x))))
                          .append(y))

• If f="h", r=#[2,z], y=#<m(1,"a")>, and k(x) returns the value #<8>, then the above term is equivalent to #<h(6,2,z,g("ab",8),m(1,"a"))>
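To make the different kinds of holes concrete, here is a hand expansion of this example in plain Java. It is a sketch over a simplified, hypothetical AST in which `java.util.List` stands in for `Arguments` and each leaf is stored as the text it prints as (string literals keep their quotes); `Leaf`, `SpliceDemo`, and `build` are illustrative names, not part of Gen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical AST (not Gen's classes): java.util.List stands
// in for Arguments, and every leaf is kept as the text it prints as.
abstract class Ast {}

class Leaf extends Ast {
    final String text;                 // e.g. 6, z, or "ab" including its quotes
    Leaf(String text) { this.text = text; }
    public String toString() { return text; }
}

class Node extends Ast {
    final String name;
    final List<Ast> args;
    Node(String name, List<Ast> args) { this.name = name; this.args = args; }
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append("(");
        for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? "," : "").append(args.get(i));
        return sb.append(")").toString();
    }
}

class SpliceDemo {
    // Hypothetical k(x) that returns #<8>, as in the example.
    static Ast k(Ast x) { return new Leaf("8"); }

    // Hand expansion of #<`f(6,...r,g("ab",`(k(x))),`y)>:
    //   `f      -> the node name comes from the String f
    //   ...r    -> each element of r becomes a separate child
    //   `(k(x)) -> the Ast returned by k(x) becomes one child
    //   `y      -> the Ast y becomes one child
    static Ast build(String f, List<Ast> r, Ast x, Ast y) {
        List<Ast> children = new ArrayList<>();
        children.add(new Leaf("6"));
        children.addAll(r);
        children.add(new Node("g", Arrays.<Ast>asList(new Leaf("\"ab\""), k(x))));
        children.add(y);
        return new Node(f, children);
    }

    public static void main(String[] args) {
        List<Ast> r = Arrays.<Ast>asList(new Leaf("2"), new Leaf("z"));                 // #[2,z]
        Ast y = new Node("m", Arrays.<Ast>asList(new Leaf("1"), new Leaf("\"a\"")));  // #<m(1,"a")>
        System.out.println(build("h", r, new Leaf("x"), y)); // h(6,2,z,g("ab",8),m(1,"a"))
    }
}
```

The only difference between `...r` and `` `y `` is `addAll` versus `add`: the first splices a list of children in place, the second inserts one subtree.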


Pattern Matching

• Gen provides a case statement syntax with patterns
• Patterns match the Ast representations with similar shape
• Escape operators applied to variables inside these patterns represent variable patterns, which “bind” to corresponding subterms upon a successful match
• This capability makes it particularly easy to write functions that perform source-to-source transformations


Example

• A function that simplifies arithmetic expressions:

    Ast simplify ( Ast e ) {
      #case e
      | plus(`x,0) => return x;
      | times(`x,1) => return x;
      | times(`x,0) => return #<0>;
      | _ => return e;
      #end;
    }

  where the _ pattern matches any value.
• For example, simplify(#<times(z,1)>) returns #<z>
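For comparison, this is roughly what the `#case` above looks like when written by hand in plain Java. It is a sketch over a hypothetical minimal AST (`Num`, `Var`, `Node`), not Gen's classes; `Simplifier`, `isNum`, and `isBinary` are illustrative names. The cases are tried in order, and the first one that matches wins.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal AST (not Gen's classes): numbers, names, and nodes.
abstract class Ast {}

class Num extends Ast {
    final long value;
    Num(long value) { this.value = value; }
    public String toString() { return Long.toString(value); }
}

class Var extends Ast {
    final String name;
    Var(String name) { this.name = name; }
    public String toString() { return name; }
}

class Node extends Ast {
    final String name;
    final List<Ast> args;
    Node(String name, Ast... args) { this.name = name; this.args = Arrays.asList(args); }
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append("(");
        for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? "," : "").append(args.get(i));
        return sb.append(")").toString();
    }
}

class Simplifier {
    // true if a is the integer literal n
    static boolean isNum(Ast a, long n) { return a instanceof Num && ((Num) a).value == n; }

    // true if e is a node with the given name and exactly two children
    static boolean isBinary(Ast e, String name) {
        return e instanceof Node && ((Node) e).name.equals(name) && ((Node) e).args.size() == 2;
    }

    // Hand translation of the #case; `x binds to the first child.
    static Ast simplify(Ast e) {
        if (isBinary(e, "plus") && isNum(((Node) e).args.get(1), 0))
            return ((Node) e).args.get(0);                 // plus(`x,0)  => x
        if (isBinary(e, "times") && isNum(((Node) e).args.get(1), 1))
            return ((Node) e).args.get(0);                 // times(`x,1) => x
        if (isBinary(e, "times") && isNum(((Node) e).args.get(1), 0))
            return new Num(0);                             // times(`x,0) => 0
        return e;                                          // _ => e
    }

    public static void main(String[] args) {
        System.out.println(simplify(new Node("times", new Var("z"), new Num(1)))); // z
    }
}
```

Every case needs its own shape test and explicit child extraction, which is the bookkeeping that the pattern syntax hides.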


BNF

case_stmt ::= "#case" code case ... case "#end"
case      ::= "|" expr guard "=>" code
guard     ::= ":" code                                 an optional condition
            |
expr      ::= name                                     exact match with a variable name
            | integer                                  exact match with an integer
            | real                                     exact match with a real number
            | string                                   exact match with a string
            | "`" name                                 match with the value of name
            | "`(" code ")"                            match with the value of code
            | name "(" arg "," ... "," arg ")"         match with an AST node with zero or more children
            | "`" name "(" arg "," ... "," arg ")"     match with an AST node with escaped name
            | expr opr expr                            an AST node that represents a binary infix operation
            | "`" name "[" expr "]"                    second-order matching
            | "_"                                      match any Ast
arg       ::= expr                                     match with an Ast
            | "..." name                               match with a list of ASTs bound to name
            | "...(" code ")"                          match with a list of ASTs returned by code
            | "..."                                    match the rest of the arguments


Examples

• The pattern `f(...r) matches any Ast Node
  – when it is matched with #<join(a,b,c)>, it binds
    1) f to the string "join"
    2) r to the Arguments #[a,b,c]
• The following function adds the terms #<8> and #<9> as children to any Node e:

    Ast add_arg ( Ast e ) {
      #case e
      | `f(...r) => return #<`f(8,9,...r)>;
      | `x => return x;
      #end;
    }
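A plain-Java sketch of `add_arg`, again over a hypothetical minimal AST with `java.util.List` children in place of `Arguments` (`Leaf`, `AddArg`, `node`, and `addArg` are illustrative names, not Gen's):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal AST for illustration (not Gen's classes).
abstract class Ast {}

class Leaf extends Ast {
    final String text;
    Leaf(String text) { this.text = text; }
    public String toString() { return text; }
}

class Node extends Ast {
    final String name;
    final List<Ast> args;
    Node(String name, List<Ast> args) { this.name = name; this.args = args; }
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append("(");
        for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? "," : "").append(args.get(i));
        return sb.append(")").toString();
    }
}

class AddArg {
    static Node node(String name, Ast... kids) { return new Node(name, Arrays.asList(kids)); }

    // | `f(...r) => return #<`f(8,9,...r)>;   any Node: f = its name, r = its children
    // | `x       => return x;                 anything else is returned unchanged
    static Ast addArg(Ast e) {
        if (e instanceof Node) {
            Node n = (Node) e;
            List<Ast> children = new ArrayList<>();
            children.add(new Leaf("8"));
            children.add(new Leaf("9"));
            children.addAll(n.args);            // splice ...r after 8 and 9
            return new Node(n.name, children);  // same name `f, new children
        }
        return e;
    }

    public static void main(String[] args) {
        System.out.println(addArg(node("join", new Leaf("a"), new Leaf("b"), new Leaf("c"))));
        // join(8,9,a,b,c)
    }
}
```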


Another Example

• The following function switches the inputs of a binary join found as a parameter to a Node e:

    Ast switch_join_args ( Ast e ) {
      #case e
      | `f(...r,join(`x,`y),...s) => return #<`f(...r,join(`y,`x),...s)>;
      | `x => return x;
      #end;
    }
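The same transformation written by hand, as a sketch over a hypothetical minimal AST (`Leaf`, `Node`, `SwitchJoin` are illustrative names). The slide does not say which join is switched when a node has several; this version assumes the leftmost one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal AST for illustration (not Gen's classes).
abstract class Ast {}

class Leaf extends Ast {
    final String text;
    Leaf(String text) { this.text = text; }
    public String toString() { return text; }
}

class Node extends Ast {
    final String name;
    final List<Ast> args;
    Node(String name, List<Ast> args) { this.name = name; this.args = args; }
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append("(");
        for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? "," : "").append(args.get(i));
        return sb.append(")").toString();
    }
}

class SwitchJoin {
    static Node node(String name, Ast... kids) { return new Node(name, Arrays.asList(kids)); }

    // | `f(...r,join(`x,`y),...s) => return #<`f(...r,join(`y,`x),...s)>;
    // | `x => return x;
    // Assumption: the leftmost binary join among the children is switched.
    static Ast switchJoinArgs(Ast e) {
        if (e instanceof Node) {
            Node n = (Node) e;
            for (int i = 0; i < n.args.size(); i++) {      // children before i play the role of r
                Ast a = n.args.get(i);
                if (a instanceof Node && ((Node) a).name.equals("join") && ((Node) a).args.size() == 2) {
                    Node j = (Node) a;
                    List<Ast> children = new ArrayList<>(n.args);   // copies r, join, s
                    children.set(i, node("join", j.args.get(1), j.args.get(0)));
                    return new Node(n.name, children);
                }
            }
        }
        return e;   // no binary join among the children: the `x case
    }

    public static void main(String[] args) {
        Ast e = node("f", new Leaf("a"), node("join", new Leaf("b"), new Leaf("c")), new Leaf("d"));
        System.out.println(switchJoinArgs(e)); // f(a,join(c,b),d)
    }
}
```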


Second-Order Pattern Matching

• When `f[expr] is matched against an Ast e, it traverses the entire tree representation of e (in preorder) until it finds a tree node that matches the pattern expr
  – it fails when it does not find a match
  – when it finds a match
    • it succeeds
    • it binds the variables in the pattern expr
    • it binds the variable f to a list of Ast (of class Arguments) that represents the path from the root Ast to the Ast node that matched the pattern
• This is best used in conjunction with the bracketed expression `f[e], which uses the path bound in f to construct a new Ast in which the node that matched expr is replaced by e
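The following Java sketch mimics second-order matching by hand: a preorder search that records the path to the first matching subtree, followed by a rebuild that replaces the subtree at that path. As a simplifying assumption, the path is stored as a list of child positions rather than as Gen's `Arguments` of Asts; `SecondOrder`, `find`, `subtreeAt`, and `replaceAt` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical minimal AST for illustration (not Gen's classes).
abstract class Ast {}

class Leaf extends Ast {
    final String text;
    Leaf(String text) { this.text = text; }
    public String toString() { return text; }
}

class Node extends Ast {
    final String name;
    final List<Ast> args;
    Node(String name, List<Ast> args) { this.name = name; this.args = args; }
    public String toString() {
        StringBuilder sb = new StringBuilder(name).append("(");
        for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? "," : "").append(args.get(i));
        return sb.append(")").toString();
    }
}

class SecondOrder {
    static Node n(String name, Ast... kids) { return new Node(name, new ArrayList<>(Arrays.asList(kids))); }
    static Leaf l(String text) { return new Leaf(text); }

    // Preorder search: the child positions leading from e to the first
    // subtree that satisfies p, or null when nothing matches (the match fails).
    static List<Integer> find(Ast e, Predicate<Ast> p) {
        if (p.test(e)) return new ArrayList<>();
        if (e instanceof Node) {
            List<Ast> kids = ((Node) e).args;
            for (int i = 0; i < kids.size(); i++) {
                List<Integer> path = find(kids.get(i), p);
                if (path != null) { path.add(0, i); return path; }
            }
        }
        return null;
    }

    // Follows a path down from e.
    static Ast subtreeAt(Ast e, List<Integer> path) {
        for (int i : path) e = ((Node) e).args.get(i);
        return e;
    }

    // Rebuilds e with the subtree at path replaced (the role of `f[e]).
    static Ast replaceAt(Ast e, List<Integer> path, int depth, Ast replacement) {
        if (depth == path.size()) return replacement;
        Node node = (Node) e;
        List<Ast> kids = new ArrayList<>(node.args);
        int i = path.get(depth);
        kids.set(i, replaceAt(kids.get(i), path, depth + 1, replacement));
        return new Node(node.name, kids);
    }

    // Switches the inputs of the first binary join found anywhere in the tree.
    static String demo() {
        Ast e = n("project", n("select", n("join", l("a"), l("b")), l("q")), l("A"));
        List<Integer> path = find(e, a -> a instanceof Node
                && ((Node) a).name.equals("join") && ((Node) a).args.size() == 2);
        if (path == null) return "no match";
        Node j = (Node) subtreeAt(e, path);
        return path + " " + replaceAt(e, path, 0, n("join", j.args.get(1), j.args.get(0)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [0, 0] project(select(join(b,a),q),A)
    }
}
```

Unlike the earlier switch_join_args, which only looks at the direct children of a node, this finds the join at any depth.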


Misc

• Another syntactic construct in Gen is a for-loop that iterates over Arguments:

    "#for" name "in" code "do" code "#end"

• For example,

    #for v in #[a,b,c] do
      System.out.println(v);
    #end;


Adding Semantic Actions to a Parser

• Grammar:

    E  ::= T E'
    E' ::= + T E'
         | - T E'
         |
    T  ::= num

• Recursive descent parser:

    int E () { return Eprime(T()); }

    int Eprime ( int left ) {
      if (current_token=='+') {
        read_next_token();
        return Eprime(left + T());
      } else if (current_token=='-') {
        read_next_token();
        return Eprime(left - T());
      } else return left;
    }

    int T () {
      if (current_token==NUM) {        // NUM: assumed constant for the num token
        int n = num_value;
        read_next_token();
        return n;
      } else { error(); return 0; }    // error() reports a syntax error
    }
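A self-contained, runnable version of this parser follows. The scanner is an assumption (the slide only names `read_next_token`): it reads unsigned integers, `+`, and `-` from a string, skipping blanks, and uses `'n'` as the token code for num and `'$'` for end of input. The parsing methods keep the slide's names; `RDCalc` and `eval` are hypothetical.

```java
// Runnable sketch of the recursive-descent evaluator; the scanner is assumed.
class RDCalc {
    private final String input;
    private int pos = 0;
    private char currentToken;   // '+', '-', 'n' (num), or '$' (end of input)
    private int numValue;        // value of the current num token

    RDCalc(String input) { this.input = input; readNextToken(); }

    private void readNextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos == input.length()) { currentToken = '$'; return; }
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            int n = 0;
            while (pos < input.length() && Character.isDigit(input.charAt(pos)))
                n = n * 10 + (input.charAt(pos++) - '0');
            numValue = n;
            currentToken = 'n';
        } else if (c == '+' || c == '-') {
            currentToken = c;
            pos++;
        } else {
            throw new IllegalArgumentException("unexpected character: " + c);
        }
    }

    // E ::= T E'
    int E() { return Eprime(T()); }

    // E' ::= + T E' | - T E' | (empty); left is the value computed so far
    int Eprime(int left) {
        if (currentToken == '+') {
            readNextToken();
            return Eprime(left + T());
        } else if (currentToken == '-') {
            readNextToken();
            return Eprime(left - T());
        } else return left;
    }

    // T ::= num
    int T() {
        if (currentToken == 'n') {
            int n = numValue;
            readNextToken();
            return n;
        }
        throw new IllegalStateException("number expected");
    }

    static int eval(String s) {
        RDCalc p = new RDCalc(s);
        int v = p.E();
        if (p.currentToken != '$') throw new IllegalStateException("trailing input");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("1+5-2"));   // 4
        System.out.println(eval("10-3-2"));  // 5
    }
}
```

Passing `left` down into `Eprime` is what makes `10-3-2` evaluate as `(10-3)-2`, even though E' itself is right-recursive.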


Table-Driven Predictive Parsers

• They use the parse stack to push/pop both actions and symbols, but they use a separate semantic stack to execute the actions:

    push(S);
    read_next_token();
    repeat
      X = pop();
      if (X is a terminal or '$')
        if (X == current_token)
          read_next_token();
        else error();
      else if (X is an action)
        perform the action;
      else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")
        { push(Yk);
          ...
          push(Y1);
        }
      else error();
    until X == '$';


Example

• Need to embed actions { code; } in the grammar rules
• Suppose that pushV and popV are the functions to manipulate the semantic stack
• The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:

    E  ::= T E' $ { print(popV()); }
    E' ::= + T { pushV(popV() + popV()); } E'
         | - T { pushV(-popV() + popV()); } E'
         |
    T  ::= num { pushV(num); }

• For example, for 1+5-2, we have the following sequence of actions (the semantic stack holds 6 after the addition and 4 after the subtraction, so 4 is printed):

    pushV(1); pushV(5); pushV(popV()+popV()); pushV(2);
    pushV(-popV()+popV()); print(popV());
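The next sketch puts the table-driven algorithm and these semantic actions together for this grammar. Symbols are plain strings, action symbols are written in braces, and the tokenizer, `PredictiveCalc`, `table`, `perform`, and `run` are assumptions for illustration. There are two deliberate deviations from the pseudocode: the loop runs until the parse stack is empty rather than stopping when X == '$', so that the `{ print(popV()); }` action placed after `$` still runs; and the value of `num` is saved when the token is matched, because the action that uses it is popped one step later. The `-popV()+popV()` action relies on Java's left-to-right evaluation of operands: the first `popV()` returns the right operand.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Table-driven predictive parser with embedded actions for the grammar above.
// Grammar symbols are Strings; names in braces are action symbols.
class PredictiveCalc {
    private final List<String> tokens = new ArrayList<>();       // "num", "+", "-", "$"
    private final List<Integer> values = new ArrayList<>();      // value of each "num" token
    private int pos = 0;                                         // index of current_token
    private int lastNum;                                         // value of the last matched num
    private final Deque<Integer> semantic = new ArrayDeque<>();  // the semantic stack
    private final List<Integer> printed = new ArrayList<>();     // output of print(popV())

    PredictiveCalc(String input) {                               // assumed tokenizer
        for (int i = 0; i < input.length(); ) {
            char c = input.charAt(i);
            if (c == ' ') {
                i++;
            } else if (Character.isDigit(c)) {
                int n = 0;
                while (i < input.length() && Character.isDigit(input.charAt(i)))
                    n = n * 10 + (input.charAt(i++) - '0');
                tokens.add("num");
                values.add(n);
            } else if (c == '+' || c == '-') {
                tokens.add(String.valueOf(c));
                values.add(0);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        tokens.add("$");
        values.add(0);
    }

    // The parse table M[X, token]: the rhs to push, or null for an error entry.
    private static List<String> table(String x, String tok) {
        switch (x) {
            case "E":   // E ::= T E' $ {print}
                return tok.equals("num") ? Arrays.asList("T", "E'", "$", "{print}") : null;
            case "E'":
                if (tok.equals("+")) return Arrays.asList("+", "T", "{add}", "E'");
                if (tok.equals("-")) return Arrays.asList("-", "T", "{sub}", "E'");
                if (tok.equals("$")) return Collections.<String>emptyList();   // E' ::= (empty)
                return null;
            case "T":   // T ::= num {push}
                return tok.equals("num") ? Arrays.asList("num", "{push}") : null;
            default:
                return null;
        }
    }

    private void perform(String action) {
        switch (action) {
            case "{push}":  semantic.push(lastNum); break;                          // pushV(num)
            case "{add}":   semantic.push(semantic.pop() + semantic.pop()); break;  // pushV(popV() + popV())
            case "{sub}":   semantic.push(-semantic.pop() + semantic.pop()); break; // pushV(-popV() + popV())
            case "{print}": printed.add(semantic.pop()); break;                     // print(popV())
            default: throw new IllegalStateException("unknown action " + action);
        }
    }

    int run() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("E");
        while (!stack.isEmpty()) {   // until empty, so the action after $ is performed
            String x = stack.pop();
            String tok = tokens.get(pos);
            if (x.startsWith("{")) {
                perform(x);
            } else if (x.equals("num") || x.equals("+") || x.equals("-") || x.equals("$")) {
                if (!x.equals(tok)) throw new IllegalStateException("expected " + x + ", found " + tok);
                if (x.equals("num")) lastNum = values.get(pos);
                if (pos < tokens.size() - 1) pos++;   // read_next_token(); stays on $ at the end
            } else {
                List<String> rhs = table(x, tok);
                if (rhs == null) throw new IllegalStateException("no rule for " + x + " on " + tok);
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i));   // push Yk ... Y1
            }
        }
        return printed.get(0);
    }

    static int eval(String s) { return new PredictiveCalc(s).run(); }

    public static void main(String[] args) {
        System.out.println(eval("1+5-2")); // 4
    }
}
```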


Bottom-Up Parsers

• Bottom-up parsers can only perform an action after a reduction
• We can only have rules of the form

    X ::= Y1 ... Yn { action }

  where the action is always at the end of the rule; this action is evaluated after the rule X ::= Y1 ... Yn is reduced
• How? In addition to state numbers, the parser pushes values into the parse stack
• If we want to put an action in the middle of the rhs of a rule, we use a dummy nonterminal, called a marker. For example,

    X ::= a { action } b

  is equivalent to

    X ::= M b
    M ::= a { action }


CUP

• Both terminals and non-terminals are associated with typed values
  – these values are instances of the Object class (or of some subclass of the Object class)
  – the value associated with a terminal is in most cases an Object, except for an identifier, whose value is a String, an integer, whose value is an Integer, etc.
  – the typical values associated with non-terminals in a compiler are ASTs, lists of ASTs, etc.
• You can retrieve the value of a symbol s on the rhs of a rule by using the notation s:x, where x is a variable name that hasn't appeared elsewhere in this rule
• The value of the non-terminal defined by a rule is called RESULT and should always be assigned a value in the action
  – e.g., if the non-terminal E is associated with an Integer object, then

      E ::= E:n PLUS E:m {: RESULT = n+m; :}


Machinery

• The parse stack elements are of type

    struct( state: int, value: Object )

  – int is the state number
  – Object is the value
• When a reduction occurs, the RESULT value is calculated from the values in the stack and is pushed along with the GOTO state
• Example: after the reduction by

    E ::= E:n PLUS E:m {: RESULT = n+m; :}

  the RESULT value is

    stack[top-2].value + stack[top].value

  which is the new value pushed in the stack along with the GOTO state
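A minimal Java sketch of one such reduction, with the parse stack as a list of (state, value) entries. The state numbers are made up, and the GOTO state is passed in as a parameter instead of being looked up in a GOTO table; `StackEntry`, `ReduceDemo`, and `reducePlus` are hypothetical names, not CUP's internal classes.

```java
import java.util.ArrayList;
import java.util.List;

// One parse-stack entry: struct( state: int, value: Object ).
class StackEntry {
    final int state;
    final Object value;
    StackEntry(int state, Object value) { this.state = state; this.value = value; }
}

class ReduceDemo {
    // Reduction by  E ::= E:n PLUS E:m {: RESULT = n+m; :}
    // The three rhs entries are on top of the stack. gotoState stands in for
    // the GOTO table lookup (an assumption to keep the sketch short).
    static void reducePlus(List<StackEntry> stack, int gotoState) {
        int top = stack.size() - 1;
        Integer n = (Integer) stack.get(top - 2).value;   // the value labeled n
        Integer m = (Integer) stack.get(top).value;       // the value labeled m
        Integer result = n + m;                           // RESULT = n+m
        for (int i = 0; i < 3; i++) stack.remove(stack.size() - 1);   // pop the rhs
        stack.add(new StackEntry(gotoState, result));     // push (GOTO state, RESULT)
    }

    public static void main(String[] args) {
        List<StackEntry> stack = new ArrayList<>();
        stack.add(new StackEntry(0, null));   // initial state
        stack.add(new StackEntry(2, 3));      // E with value 3
        stack.add(new StackEntry(4, null));   // PLUS (its value is unused)
        stack.add(new StackEntry(5, 4));      // E with value 4
        reducePlus(stack, 2);
        StackEntry t = stack.get(stack.size() - 1);
        System.out.println(t.state + " " + t.value);   // 2 7
    }
}
```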


ASTs in CUP

• Need to associate each non-terminal symbol with an AST type:

    non terminal Ast exp;
    non terminal Arguments expl;

    exp  ::= exp:e1 PLUS exp:e2    {: RESULT = new Node(plus_exp,e1,e2); :}
           | exp:e1 MINUS exp:e2   {: RESULT = new Node(minus_exp,e1,e2); :}
           | id:nm LP expl:el RP   {: RESULT = new Node(call_exp,el.reverse()
                                                         .cons(new Variable(nm))); :}
           | INT:n                 {: RESULT = new Number(n.intValue()); :}
           ;

    expl ::= expl:el COMMA exp:e   {: RESULT = el.cons(e); :}
           | exp:e                 {: RESULT = Arguments.nil.cons(e); :}
           ;
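To see the effect of the two expl actions, the sketch below replays them in reduction order on a hypothetical immutable cons list (`AList`, holding Strings) that stands in for `Arguments`; it is not Gen's class, and `ExplDemo` and `collect` are illustrative names. Because each later argument is consed onto the front, the collected list comes out reversed, which the call_exp action undoes with `reverse()` before consing the function name.

```java
// Hypothetical immutable cons list standing in for Arguments; only
// cons (add at the front) and reverse are modeled, over Strings.
final class AList {
    static final AList nil = new AList(null, null);
    final String head;
    final AList tail;

    private AList(String head, AList tail) { this.head = head; this.tail = tail; }

    AList cons(String e) { return new AList(e, this); }

    AList reverse() {
        AList r = nil;
        for (AList a = this; a != nil; a = a.tail) r = r.cons(a.head);
        return r;
    }

    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (AList a = this; a != nil; a = a.tail) {
            if (a != this) sb.append(",");
            sb.append(a.head);
        }
        return sb.append("]").toString();
    }
}

class ExplDemo {
    // Replays the expl actions in the order an LR parser reduces them
    // for the argument list e1, e2, ..., en:
    //   expl ::= exp:e               {: RESULT = nil.cons(e); :}  (first argument)
    //   expl ::= expl:el COMMA exp:e {: RESULT = el.cons(e); :}   (each later one)
    static AList collect(String... exps) {
        AList el = AList.nil.cons(exps[0]);
        for (int i = 1; i < exps.length; i++) el = el.cons(exps[i]);
        return el;
    }

    public static void main(String[] args) {
        AList el = collect("a", "b", "c");
        System.out.println(el);                       // [c,b,a]
        System.out.println(el.reverse().cons("f"));   // [f,a,b,c], as in the call_exp action
    }
}
```

Consing at the front keeps each action constant-time, at the cost of one reverse per call.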