21
009 YEAR 20 732 Software Tools ANTLR COMPSCI ANTLR ealand Part II - Lecture 8 kland | New Ze versity of Auck The Univ 1

09 2 Software Tools ANTLR - cs.auckland.ac.nz€¦ · 0 09 YEAR 2 732 Software Tools ANTLR COMPSCI e aland Part II - Lecture 8 k land | New Z v ersity of Auc The Uni 1

  • Upload
    dinhthu

  • View
    221

  • Download
    6

Embed Size (px)

Citation preview

009

YEAR

20 7

32

Software ToolsANTLR

CO

MP

SC

I ANTLR

eala

nd Part II - Lecture 8

klan

d | N

ew Z

eve

rsity

of A

uck

The

Uni

v

1

Today’s Outline009

y

YEAR

20 7

32

• Introduction to ANTLRP i A ti

CO

MP

SC

I • Parsing Actions• Generators

eala

ndkl

and

| New

Ze

vers

ity o

f Auc

kTh

e U

niv

2

009

YEAR

20 7

32C

OM

PS

CI

Introduction to ANTLR

eala

nd

Introduction to ANTLR

klan

d | N

ew Z

eve

rsity

of A

uck

The

Uni

v

3

ANTLR009 • Parser/lexer generator: takes a grammar and

YEAR

20 7

32

g ggenerates a LL(k) lexer and/or parser for you– Written in Java, open source software

CO

MP

SC

I

– Can generate Java, C#, C++, Python, …– Uses the regular expression / grammar syntax

eala

nd

g p g ythat we have learned in the last lecture

– Grammar files have suffix .g

klan

d | N

ew Z

e

• Besides simple LL(k), ANTLR supports backtracking:– If it is unclear which rule alternative to apply,

vers

ity o

f Auc

k

alternatives are applied speculatively– If a choice turns out to be wrong, backtracking is

d d h l i i i d

The

Uni

v used and another alternative is tried4

ANTLR Example: Java.g009

p ggrammar Java;options { backtrack true; memoize true; }

YEAR

20 7

32

options { backtrack=true; memoize=true; }

compilationUnit:( (annotations)? packageDeclaration )?

CO

MP

SC

I (importDeclaration)* (typeDeclaration)* ;

packageDeclaration: 'package' qualifiedName ';' ;

importDeclaration:

eala

nd

importDeclaration:'import' ('static')? IDENTIFIER '.' '*' ';'

| 'import' ('static')?IDENTIFIER ('.' IDENTIFIER)+ ('.' '*')? ';' ;

klan

d | N

ew Z

e

• The “Java” grammar uses backtracking

( ) ( ) ; ;

qualifiedImportName: IDENTIFIER ('.' IDENTIFIER)* ; // ...

vers

ity o

f Auc

k The Java grammar uses backtracking• Some grammar rules define simple tokens directly,

e.g. 'import', 'static'

The

Uni

v

• Grammar rules also refer to tokens of the lexer, which is defined later on in Java.g 5

The Lexer in Java.g009

gLONGLITERAL: IntegerNumber LongSuffix ;

INTLITERAL: IntegerNumber ;

YEAR

20 7

32

INTLITERAL: IntegerNumber ;

fragment IntegerNumber: '0' // number zero| '1'..'9' ('0'..'9')* // decimal numbers| '0' ('0' '7')+ // octal n mbe s

CO

MP

SC

I | '0' ('0'..'7')+ // octal numbers| HexPrefix HexDigit+ ; // hexadecimal numbers

fragment HexPrefix: '0x' | '0X' ;

eala

nd

fragment HexDigit: ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment LongSuffix: 'l' | 'L' ;

ABSTRACT ' b t t' //

klan

d | N

ew Z

e

• The lexer rules come right after the parser rules( h i l l N )

ABSTRACT: 'abstract' ; // ...

vers

ity o

f Auc

k (some grammars have an optional lexer lexerName; )• Lexer rules use essentially the same syntax as parser rules

L l s s s b l s (f t l s) th t d t

The

Uni

v • Lexer rules can use subrules (fragment rules) that do not define tokens themselves but are used by other rules 6

Generating and Using L x rs nd P rs rs

009

Lexers and Parsers• Generate Java classes for parser and lexer by executing

YEAR

20 7

32

p y gclass org.antlr.Tool with command line arguments:-Xconversiontimeout 100000 -o src\pdstore\java Java.g

(timeout for backtracking) (output folder) (input)

CO

MP

SC

I (timeout for backtracking) (output folder) (input)• This generates classes JavaLexer and JavaParser, which

can be used from other classes, e.g.

eala

nd

import org.antlr.runtime.*; // ...public class Import {

public static void main(String[] args) { // ...

klan

d | N

ew Z

e public static void main(String[] args) { // ...CharStream input = new ANTLRFileStream(args[0]); JavaLexer lexer = new JavaLexer(input);CommonTokenStream tokens = new CommonTokenStream();

vers

ity o

f Auc

k ();tokens.setTokenSource(lexer);JavaParser parser = new JavaParser(tokens);

// start parsing at the compilationUnit rule

The

Uni

v

7

// start parsing at the compilationUnit ruleparser.compilationUnit(); // ...

} }

009

YEAR

20 7

32C

OM

PS

CI

Parsing Actions

eala

nd

Parsing Actions

klan

d | N

ew Z

eve

rsity

of A

uck

The

Uni

v

8

Parsing Actions009

g• We want to do things with the source code we parse

YEAR

20 7

32

• Idea: whenever we have recognized part of the language, execute some code (“action”)

CO

MP

SC

I • Actions can be at beginning (@init{ }), end (@after{ }) or anywhere else in the rule body ({ })

eala

nd

compilationUnit@init { System.out.println("Rule application has begun");}@after{ System.out.println("Rule application has ended");}

klan

d | N

ew Z

e @after{ System.out.println( Rule application has ended );}: ((annotations)? packageDeclaration

{ System.out.println("Parsed packageDeclaration");})?

vers

ity o

f Auc

k )(importDeclaration{ System.out.println("Parsed importDeclaration"); })*

The

Uni

v

9(typeDeclaration { ….println("Parsed typeDeclaration"); })*

;

Accessing and Returning V lu s fr m Rul s

009

Values from Rules• Rules are used to generate methods that can return

YEAR

20 7

32

gvalues: add returns [Type varName] after rule name

• To access return values, assign a variable d f ld

CO

MP

SC

I

var=ruleName or var=TOKEN and access its fields• The variable is declared for you by ANTLR and can be

d i ti ith $

eala

nd

accessed in actions with $var• Tokens have their text string in $var.text

klan

d | N

ew Z

e

packageDeclaration: 'package' name=qualifiedName{ System.out.println("qualifiedName="+$name.value); }

vers

ity o

f Auc

k ';' ;

qualifiedName returns [String value]: id=IDENTIFIER { $value = $id.text; }

The

Uni

v

10

: id IDENTIFIER { $value $id.text; }('.' id=IDENTIFIER { $value += "." + $id.text; } )* ;

Example: Accessing and R turnin V lu s

009

Returning ValuesThe following rule prints out the source code it parses:

YEAR

20 7

32

g p p

importDeclaration@init{ String s = "import "; }

CO

MP

SC

I

@ { g p ; }: 'import' ('static' { s += "static "; } )?

id=IDENTIFIER '.' '*' ';'{ S t t i tl ( + $id t t + " * ") }

eala

nd

{ System.out.println(s + $id.text + ".*;"); }| 'import' ('static' { s += "static "; } )?

id=IDENTIFIER

klan

d | N

ew Z

e

{ s += $id.text; }('.' id=IDENTIFIER { s += "." + $id.text; } )+(' ' '*' { s += " *"; } )?

vers

ity o

f Auc

k ( . { s += . ; } )?';'{ System.out.println(s + ";"); }

The

Uni

v

;11

Debugging Parsing Actions009

gg g g• ANTLR will not check the Java code in the actions,

YEAR

20 7

32

i.e. the generated class might contain errors• Eclipse’s compiler will show you syntax errors after

l di h d j fil (F5 f l d)

CO

MP

SC

I reloading the generated .java file (F5 for reload)• For each rule, ANTLR will generate a method with the

l

eala

nd

rule namepublic final PDJavaImport importDeclaration() throws

importDeclaration returns [PDJavaImport value]

klan

d | N

ew Z

e importDeclaration() throws RecognitionException {…if (state.backtracking==0 )

[PDJavaImport value] …: 'import' ('static')?id=IDENTIFIER '.' '*' ';'{ ANTLR

vers

ity o

f Auc

k ( g ){PDJavaPackage package = …

}

{PDJavaPackage package = …

}…

The

Uni

v

12…

}; Error: package

is a Java keyword

Example:Buildin n AST

009

Building an ASTIdea: each rule returns AST node and gets the returned AST

n d s f th l s it s sYEAR

20 7

32

nodes of the rules it usescompilationUnit returns [JavaCompilationUnit value]@init {

CO

MP

SC

I @init {$value = new JavaCompilationUnit();

} : ( (annotations)?

eala

nd

packageDecl=packageDeclaration{ $value.setPackage($packageDecl.value); }

)?

klan

d | N

ew Z

e )?(importDecl=importDeclaration{ $value.addImports($importDecl.value); }

vers

ity o

f Auc

k )*(typeDecl=typeDeclaration{ $value.addTypeDefinitions($typeDecl.value); }

The

Uni

v

)* ;13

Building an AST Cont’d009

gimportDeclaration returns [JavaImport value]@init {

YEAR

20 7

32

$value = new JavaImport();

String name = null;boolean isPackage false;

CO

MP

SC

I boolean isPackage = false;} : …| 'import' ('static')? id=IDENTIFIER { name = $id.text; }

eala

nd

('.' id=IDENTIFIER { name += "." + $id.text; } )+('.' '*' { isPackage = true; } )? ';'{

klan

d | N

ew Z

e {if (isPackage) {JavaPackage p = new JavaPackage();

vers

ity o

f Auc

k p.setName(name); $value.setPackage(p);} else {JavaType type = new JavaType();

The

Uni

v

type.setName(name); $value.setType(type);} } ;

14

009

YEAR

20 7

32C

OM

PS

CI

Generators

eala

nd

Generators

klan

d | N

ew Z

eve

rsity

of A

uck

The

Uni

v

15

Writing Generators009

g

• Generators traverse the AST that was generated by YEAR

20 7

32

g ythe parser

• For each AST node, they generate some output

CO

MP

SC

I

• Easy way to implement:– For important AST node types, write a generator

eala

nd

p yp gmethod

– Method for AST node type X calls other methods

klan

d | N

ew Z

e

to do generation for child node types of X• Examples:

vers

ity o

f Auc

k

– Source code printer– Source code converter (i.e. print another language)

The

Uni

v

16

Java Printer009 public class JavaPrinter {

PrintStream s;YEAR

20 7

32

PrintStream s;

public JavaPrinter(OutputStream out) {s = new PrintStream(out);

CO

MP

SC

I

}

public void printCompilationUnit(Ja aCompilationUnit compilationUnit) {

eala

nd

JavaCompilationUnit compilationUnit) {s.println("package " +

compilationUnit.getPackage().getName() + ";");

klan

d | N

ew Z

e

s.println(); // use separate method to print importsfor (JavaImport i : compilationUnit.getImports())printImport(i);

vers

ity o

f Auc

k printImport(i);s.println(); // use separate method to print typesfor (JavaType t : compilationUnit.getTypeDefinitions())

The

Uni

v printType(t);} … 17

Java Printer Cont.009 public void printImport(JavaImport javaImport) {

if (javaImport getPackage() != null)YEAR

20 7

32

if (javaImport.getPackage() != null)s.println("import " + javaImport.getPackage().getName()

+ ".*;");

CO

MP

SC

I else if (javaImport.getType() != null)s.println("import " + javaImport.getType().getName()

+ ";");

eala

nd

}

public void printType(JavaType type) {

klan

d | N

ew Z

e

if (type.getJavaInterface() != null)s.println("interface " + type.getJavaInterface().getName()

+ " { }");

vers

ity o

f Auc

k + { ... } );else if (type.getJavaClass() != null)s.println("class " + type.getJavaClass().getName()

The

Uni

v + " { ... }");} 18

Using the Java Printer009

gpublic class Import {public static void main(String[] args) {

YEAR

20 7

32

public static void main(String[] args) { …// Create a parser that reads from the token streamJavaParser parser = new JavaParser(tokens);

CO

MP

SC

I

// start parsing at the compilationUnit ruleJavaCompilationUnit compilationUnit =

eala

nd

parser.compilationUnit();

// set up a JavaPrinter that prints to the standard outputi i i ( )

klan

d | N

ew Z

e JavaPrinter printer = new JavaPrinter(System.out);

// print the ASTprinter printCompilationUnit(compilationUnit);

vers

ity o

f Auc

k printer.printCompilationUnit(compilationUnit);…

} }

The

Uni

v

19

009

YEAR

20 7

32C

OM

PS

CI

Summary

eala

nd

Summary

klan

d | N

ew Z

eve

rsity

of A

uck

The

Uni

v

20

Today’s Summary009

y y

• ANTLR is a tool that can generate LL(k) parsers and YEAR

20 7

32

g ( ) plexers in Java

• By adding actions to a parser rule, Java code can be

CO

MP

SC

I executed after something has been parsed• Actions can create ASTs

eala

nd

• Generators traverse an AST and produce output recursively for each AST node

klan

d | N

ew Z

e

References:– ANTLR Homepage with Online Documentation.

vers

ity o

f Auc

k p ghttp://www.antlr.org/

– Scott Stanchfield. An ANTLR 2.0 Tutorial.http://javadude com/articles/antlrtut/

The

Uni

v

21

http://javadude.com/articles/antlrtut/