Compiler Construction: Lexical Analysis

Compiler Construction: Lexical Analysis. Administration: project teams; send me your group at ortamir@post.tau.ac.il.

Administration

Project teams
- Send me your group: ortamir@post.tau.ac.il
- Send me an email if you’re unable to find partners
- Use the forum to team up with others

First PA: submission in two weeks

Generic compiler structure

[Figure: Source text (txt) → Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe); the frontend and backend together form the compiler]

IC compiler

[Figure: IC Program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (exe)]

Lexical Analysis

Converts characters to tokens.

class Quicksort {
    int[] a;
    int partition(int low, int high) {
        int pivot = a[low];
        ...}

1: CLASS
1: CLASS_ID(Quicksort)
1: LCBR
2: INT
2: LB
2: RB
2: ID(a)
2: SEMI
. . .

Lexical Analysis: Tokens

Tokens
- ID: size, num, Car
- Num: 7, 5, 9, 4926209837, 07
- COMMA: ,
- SEMI: ;
- …

Non-tokens
- Comment: //
- Whitespace
- …
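To make the token/non-token split concrete, here is a minimal hand-rolled Java sketch (not JFlex output) that recognizes only the four token kinds listed above and discards whitespace and // comments. The class name TinyTokenizer and the textual token format such as ID(size) are illustrative choices. Lexemes are kept as strings, so a number like 4926209837, which does not fit in a Java int, is still tokenized.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative scanner: emits ID, NUM, COMMA and SEMI tokens and silently
// skips the non-tokens (whitespace and "//" comments up to the end of line).
class TinyTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                      // whitespace: not a token
            } else if (text.startsWith("//", i)) {
                while (i < text.length() && text.charAt(i) != '\n') {
                    i++;                                  // comment: not a token
                }
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add("ID(" + text.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add("NUM(" + text.substring(start, i) + ")");  // kept as text
            } else if (c == ',') {
                tokens.add("COMMA");
                i++;
            } else if (c == ';') {
                tokens.add("SEMI");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }
}
```

For example, tokenize("size, 07; // a comment\nCar 4926209837") returns [ID(size), COMMA, NUM(07), SEMI, ID(Car), NUM(4926209837)]: the comment and all blanks produce nothing.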

Problem

Input
- Program text
- Tokens specification

Output
- Sequence of tokens

Solution

Write a lexical analyzer by hand:

Token nextToken()
{
    int c;                          /* int, so that EOF can be told apart */
loop:
    c = getchar();
    switch (c) {
    case ' ':
        goto loop;                  /* skip blanks */
    case ';':
        return Semicolon;
    case '+':
        c = getchar();              /* one character of lookahead */
        switch (c) {
        case '+': return PlusPlus;
        case '=': return PlusEqual;
        default:
            ungetc(c, stdin);       /* push the lookahead back */
            return Plus;
        }
    case '<':
    case 'w':
    …
    }
}
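A runnable Java counterpart of the hand-written approach may help show where the corner cases come from. This is a sketch under assumptions: the token set (HandTok) is invented for illustration, only blanks, ';' and the three '+' forms are handled, and java.io.PushbackReader.unread stands in for C's ungetc.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative token set; the names are not taken from any real grammar.
enum HandTok { SEMICOLON, PLUS, PLUS_PLUS, PLUS_EQUAL, EOF }

// Hand-written scanner mirroring nextToken() above.
class HandWrittenLexer {
    private final PushbackReader in;

    HandWrittenLexer(String text) {
        in = new PushbackReader(new StringReader(text));
    }

    HandTok nextToken() {
        while (true) {
            int c = read();                   // int, so end of input (-1) is distinguishable
            switch (c) {
                case -1:
                    return HandTok.EOF;
                case ' ':
                    continue;                 // skip blanks (the "goto loop" case)
                case ';':
                    return HandTok.SEMICOLON;
                case '+': {
                    int next = read();        // one character of lookahead
                    if (next == '+') return HandTok.PLUS_PLUS;
                    if (next == '=') return HandTok.PLUS_EQUAL;
                    if (next != -1) unread(next);  // not part of this token: push it back
                    return HandTok.PLUS;
                }
                default:
                    throw new IllegalStateException("unexpected character: " + (char) c);
            }
        }
    }

    private int read() {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void unread(int c) {
        try {
            in.unread(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Scanning "+ ++ += ;+" yields PLUS, PLUS_PLUS, PLUS_EQUAL, SEMICOLON, PLUS, EOF. Even this tiny case needs the end-of-input guard before unread, which is exactly the kind of detail a generator handles for you.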

Solution’s Problem

- A lot of work
- Corner cases
- Error prone
- Hard to debug
- Exhausting
- Boring
- Hard to reuse
- Hard to understand
- …

JFlex

Off-the-shelf lexical analyzer generator
- Input: scanner specification file
- Output: lexical analyzer written in Java

[Figure: IC.lex → JFlex → Lexer.java → javac → Lexical analyzer; IC text → Lexical analyzer → tokens]

JFlex

- Simple
- Good for reuse
- Easy to understand
- Many developers and users have debugged the generators

"+"        { return new Symbol(sym.PLUS); }
"boolean"  { return new Symbol(sym.BOOLEAN); }
"int"      { return new Symbol(sym.INT); }
"null"     { return new Symbol(sym.NULL); }
"while"    { return new Symbol(sym.WHILE); }
"="        { return new Symbol(sym.ASSIGN); }
…

JFlex Spec File

User code
    Copied directly to the Java file (a possible source of javac errors down the road)
%%
JFlex directives
    Define macros and state names, e.g.
    DIGIT= [0-9]
    LETTER= [a-zA-Z]
    (YYINITIAL is the built-in initial state)
%%
Lexical analysis rules
    How to break the input into tokens, and the action to take when a token is matched, e.g.
    {LETTER}({LETTER}|{DIGIT})*

User Code

package IC.Parser;
import IC.Parser.Token;

…any scanner-helper Java code…

JFlex Directives

Control JFlex internals:
%line                           switches line counting on (yyline)
%char                           switches character counting on (yychar)
%class class-name               changes the default name of the generated class
%cup                            CUP compatibility mode
%type token-class-name          sets the return type of the scanning method
%public                         makes the generated class public (package-private by default)
%function read-token-method     sets the name of the scanning method
%scanerror exception-type-name  sets the exception the scanner throws on internal errors

JFlex Directives

State definitions
    %state state-name
    e.g. %state STRING

Macro definitions
    macro-name = regex

Regular Expressions

.        any character except the newline
"..."    string
{name}   macro expansion
*        zero or more repetitions
+        one or more repetitions
?        zero or one repetitions
(...)    grouping within regular expressions
a|b      match a or b
[...]    class of characters: any one character enclosed in the brackets
a-b      range of characters
[^…]     negated class: any one character not enclosed in the brackets
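Most of these operators have direct counterparts in Java's java.util.regex package, which is handy for experimenting before writing a spec. The following sketch is only an analogy: java.util.regex is a backtracking engine, not JFlex, and its quoting and brace syntax differ as noted in the comments. The class name RegexOperators is illustrative.

```java
import java.util.regex.Pattern;

// Each row pairs a java.util.regex pattern with an input it should match in
// full, one row per operator in the table. JFlex-only notation differs:
// a JFlex string "a+b" corresponds to Pattern.quote("a+b") here, and {name}
// in Java means a repetition count such as a{3}, not a macro.
class RegexOperators {
    static final String[][] EXAMPLES = {
        {".",      "x"},     // any character except newline
        {"a*",     ""},      // zero or more repetitions
        {"a+",     "aaa"},   // one or more repetitions
        {"a?",     ""},      // zero or one repetition
        {"(ab)+",  "abab"},  // grouping
        {"a|b",    "b"},     // a or b
        {"[xyz]",  "y"},     // any one character of x, y, z
        {"[a-c]",  "b"},     // range of characters
        {"[^a-c]", "d"},     // any one character outside a..c
    };

    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    static boolean allExamplesMatch() {
        for (String[] row : EXAMPLES) {
            if (!fullMatch(row[0], row[1])) {
                return false;
            }
        }
        return true;
    }
}
```

Note that fullMatch("a+b", "a+b") is false, because the unquoted + is a repetition operator; fullMatch(Pattern.quote("a+b"), "a+b") is true, which is the behavior the "..." form gives in a JFlex spec.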

Examples of Macros

ALPHA=[A-Za-z]
DIGIT=[0-9]
ALPHA_NUMERIC={ALPHA}|{DIGIT}
IDENT={ALPHA}({ALPHA_NUMERIC})*
NUMBER=({DIGIT})+
NUMBER=[0-9]+     (an equivalent definition of NUMBER; use only one of the two)
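Macro use can be pictured as substitution. The hypothetical MacroExpander below (not part of JFlex) replaces each {NAME} with its definition wrapped in parentheses, recursively, and produces a java.util.regex pattern. The parentheses keep each macro acting as one unit: without them, {ALPHA_NUMERIC}* would become [A-Za-z]|[0-9]*, which means something else. It assumes Java 9 or later for Matcher.appendReplacement(StringBuilder, String).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands {NAME} macro references into java.util.regex
// syntax. Each expansion is parenthesized so a macro stays a single unit.
// Cyclic definitions (A={B}, B={A}) would recurse forever in this sketch.
class MacroExpander {
    private static final Pattern REF = Pattern.compile("\\{([A-Za-z_][A-Za-z_0-9]*)\\}");
    private final Map<String, String> macros = new HashMap<>();

    void define(String name, String regex) {
        macros.put(name, regex);
    }

    String expand(String regex) {
        Matcher m = REF.matcher(regex);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = macros.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("undefined macro: " + m.group(1));
            }
            // quoteReplacement keeps any '$' or '\' in the expansion literal
            m.appendReplacement(out, Matcher.quoteReplacement("(" + expand(body) + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the definitions above, expand("{ALPHA_NUMERIC}") yields (([A-Za-z])|([0-9])), and the expansion of {IDENT} matches car7 but not 7car.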

Rules

[states] regexp {action as Java code}
- states: in which states should the regexp be evaluated?
- regexp: breaks the input into tokens
- action: invoked when the regexp matches

Priorities
- Longest match, e.g. break vs. breaker
- Order in the lex file, e.g. int: identifier or the INT keyword?

Rules should match all inputs!!!
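The two priority rules can be sketched in a few lines of Java. This is an illustration of the selection policy only, not how JFlex implements it (JFlex compiles all rules into one DFA). The rule names BREAK, INT and ID are hypothetical, and because java.util.regex backtracks, "longest match per rule" is only guaranteed here for simple greedy patterns like these.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the two priority rules for choosing among rules at a position:
//  1. the longest match wins;
//  2. among equally long matches, the rule listed first wins.
class LongestMatch {
    static final class Rule {
        final String name;
        final Pattern regex;

        Rule(String name, String regex) {
            this.name = name;
            this.regex = Pattern.compile(regex);
        }
    }

    static final class Match {
        final String rule;
        final String lexeme;

        Match(String rule, String lexeme) {
            this.rule = rule;
            this.lexeme = lexeme;
        }
    }

    // Returns the winning rule and lexeme at pos, or null if no rule matches.
    static Match pick(List<Rule> rules, String input, int pos) {
        Match best = null;
        for (Rule r : rules) {                                   // file order
            Matcher m = r.regex.matcher(input).region(pos, input.length());
            if (m.lookingAt() && m.end() > pos) {                // ignore empty matches
                String lexeme = m.group();
                // strictly longer only, so ties keep the earlier rule
                if (best == null || lexeme.length() > best.lexeme.length()) {
                    best = new Match(r.name, lexeme);
                }
            }
        }
        return best;
    }
}
```

With the rules BREAK ("break"), INT ("int") and ID ("[a-z]+") in that order, "breaker" goes to ID (7 characters beat 5), while "break" and "int" go to the keyword rules because they tie with ID and are listed first. Listing ID first would turn every keyword into an identifier. A null result is the "no rule matches" case that the warning above is about.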

Rules Examples

<YYINITIAL> {DIGIT}+ {
    /* CUP's Symbol(int id, int left, int right, Object value); yycolumn needs %column */
    return new Symbol(sym.NUMBER, yyline, yycolumn, yytext());
}

<YYINITIAL> \t { }

<YYINITIAL> "-" {
    return new Symbol(sym.MINUS, yyline, yycolumn, yytext());
}

<YYINITIAL> [a-zA-Z]([a-zA-Z0-9])* {
    return new Symbol(sym.ID, yyline, yycolumn, yytext());
}

Rules – Action

Action
- Java code
- Can use special methods and variables, e.g. yyline and yytext()
- Returns a token for a token
- "Eats" characters for non-tokens
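The action side can also be sketched in plain Java. In the hypothetical ActionDriver below (a simplification, not JFlex's generated code), each action receives the matched text in the role of yytext() and the current 0-based line in the role of yyline; returning null models an action such as { } that eats non-token characters. Rule selection reuses the longest-match, first-rule-wins policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of how a scanner drives rule actions: an action returns a token
// for a token, or null to "eat" non-token text.
class ActionDriver {
    static final class Rule {
        final Pattern regex;
        final BiFunction<String, Integer, String> action;   // (yytext, yyline) -> token

        Rule(String regex, BiFunction<String, Integer, String> action) {
            this.regex = Pattern.compile(regex);
            this.action = action;
        }
    }

    static List<String> scan(List<Rule> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int line = 0;
        while (pos < input.length()) {
            Rule winner = null;
            String yytext = "";
            for (Rule r : rules) {                      // longest match; earlier rule wins ties
                Matcher m = r.regex.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.group().length() > yytext.length()) {
                    winner = r;
                    yytext = m.group();
                }
            }
            if (winner == null) {
                throw new IllegalStateException("no rule matches at line " + line);
            }
            String token = winner.action.apply(yytext, line);
            if (token != null) {
                tokens.add(token);                      // a token for a token
            }                                           // null: non-token text is eaten
            line += (int) yytext.chars().filter(ch -> ch == '\n').count();
            pos += yytext.length();
        }
        return tokens;
    }
}
```

With rules for numbers, "-", identifiers and whitespace (the whitespace action returns null), scanning "x1 - 42\ny" yields 0: ID(x1), 0: MINUS, 0: NUMBER(42), 1: ID(y).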

Rules – State

State
- Which regexps should be evaluated?
- YYINITIAL: JFlex’s initial state
- yybegin(stateX): jumps to stateX

Rules – State

<YYINITIAL> "//"  { yybegin(COMMENTS); }
<COMMENTS> [^\n]  { }
<COMMENTS> [\n]   { yybegin(YYINITIAL); }

[Figure: state diagram. YYINITIAL moves to COMMENTS on '//'; COMMENTS loops on ^\n and returns to YYINITIAL on \n]
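The two-state machine above can be simulated directly in Java to see what the rules do to the input. This is a hand-written sketch (CommentStates is an illustrative name), where keeping a character stands in for whatever the other YYINITIAL rules would do. As in the rules, the newline that ends a comment is consumed in the COMMENTS state.

```java
// Simulation of the YYINITIAL/COMMENTS rules: "//" switches to COMMENTS,
// COMMENTS eats every character except '\n', and '\n' switches back.
class CommentStates {
    enum State { YYINITIAL, COMMENTS }

    static String stripLineComments(String input) {
        StringBuilder kept = new StringBuilder();
        State state = State.YYINITIAL;                 // the scanner starts in YYINITIAL
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.YYINITIAL) {
                if (input.startsWith("//", i)) {       // <YYINITIAL> "//" { yybegin(COMMENTS); }
                    state = State.COMMENTS;
                    i += 2;
                } else {
                    kept.append(c);                    // stand-in for the other YYINITIAL rules
                    i++;
                }
            } else {
                if (c == '\n') {                       // <COMMENTS> [\n] { yybegin(YYINITIAL); }
                    state = State.YYINITIAL;
                }                                      // <COMMENTS> [^\n] { } eats the character
                i++;
            }
        }
        return kept.toString();
    }
}
```

For example, stripLineComments("a // note\nb") returns "a b": the comment text and its terminating newline are both eaten.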

Lines Count Example

import java_cup.runtime.Symbol;
%%
%cup
%{
    private int lineCounter = 0;
%}
%eofval{
    System.out.println("line number=" + lineCounter);
    return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
<YYINITIAL>{NEWLINE} { lineCounter++; }
<YYINITIAL>[^\n]     { }
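For comparison, the behavior of this generated scanner can be written by hand in a few lines of Java. The sketch below (LineCounter is an illustrative name) maps each part of the spec to code: the NEWLINE rule increments the counter, every other character is eaten, and the end-of-file action prints the total.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Plain-Java equivalent of the lineCount spec above.
class LineCounter {
    static int countNewlines(Reader in) {
        int lineCounter = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '\n') {
                    lineCounter++;          // <YYINITIAL>{NEWLINE} { lineCounter++; }
                }                           // any other character: { } (eaten)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        System.out.println("line number=" + lineCounter);   // the %eofval{ ... %eofval} action
        return lineCounter;
    }
}
```

Like the spec, this counts newline characters, so a final line without a trailing newline is not counted: countNewlines(new StringReader("a\nb\nc")) returns 2.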

Lines Count Example

[Figure: lineCount.lex → JFlex → Yylex.java; javac compiles Yylex.java, Main.java and sym.java into the lexical analyzer; text → lexical analyzer → tokens]

JFlex and JavaCup must be on the CLASSPATH.

java JFlex.Main lineCount.lex
javac *.java

Testbed

import java.io.*;
import java_cup.runtime.Symbol;

public class Main {
    public static void main(String[] args) {
        Symbol currToken;
        // try-with-resources closes the input file when scanning ends
        try (FileReader txtFile = new FileReader(args[0])) {
            Yylex scanner = new Yylex(txtFile);
            do {
                currToken = scanner.next_token();
                // do something with currToken
            } while (currToken.sym != sym.EOF);
        } catch (Exception e) {
            throw new RuntimeException("IO Error (brutal exit): " + e.toString());
        }
    }
}

Programming Assignment 1

Implement a scanner for IC
- class Token
  - At least: line, id, value
  - Should extend java_cup.runtime.Symbol
  - Numeric token ids in sym.java (later on, these will be generated by JavaCup)
- class Compiler
  - Testbed: calls the scanner to print the list of tokens
- class LexicalError
  - Caught by Compiler

Don’t forget to regenerate the scanner and recompile the Java sources when you change the spec.
You need to download and install both JFlex and JavaCup.
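The assignment names LexicalError and Compiler but not their exact shape. The following sketch shows one hypothetical way they could fit together: the testbed prints each token and turns a LexicalError into a message instead of a crash. To keep it compilable without JFlex or JavaCup, the generated Lexer is replaced by a hypothetical TokenSource interface and tokens are plain strings; the field, method and class names are assumptions, not the required API.

```java
// Hypothetical LexicalError: carries the line where scanning failed.
class LexicalError extends Exception {
    final int line;

    LexicalError(String message, int line) {
        super(message);
        this.line = line;
    }
}

// Hypothetical stand-in for the JFlex-generated Lexer: returns the next
// token as text, or null at end of input.
interface TokenSource {
    String nextToken() throws LexicalError;
}

// Testbed in the spirit of class Compiler: prints every token and reports a
// LexicalError instead of letting it crash the program.
class CompilerTestbed {
    static String run(TokenSource scanner) {
        StringBuilder out = new StringBuilder();
        try {
            for (String t = scanner.nextToken(); t != null; t = scanner.nextToken()) {
                out.append(t).append('\n');
            }
        } catch (LexicalError e) {
            out.append(e.line).append(": Lexical error: ").append(e.getMessage()).append('\n');
        }
        System.out.print(out);
        return out.toString();
    }
}
```

Tokens printed before the error are kept in the output, so the report shows how far scanning got.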

sym.java

public class sym {
    public static final int EOF = 0;
    public static final int ID = 1;
    ...
}

- Defines symbol constant ids
- Communicates between the parser and the scanner
- Actual values don’t matter, but each token needs a unique value
- Will be generated by CUP in PA2

Token class

import java_cup.runtime.Symbol;

public class Token extends Symbol {
    public int getId() {...}
    public Object getValue() {...}
    public int getLine() {...}
    ...
}
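A minimal sketch of how Token might carry id, value and line, assuming for illustration that the line is stored in the inherited left field. Because java_cup.runtime.Symbol is not on the classpath here, a stand-in class with the same public fields (sym, left, right, value) is used; in the assignment, Token would extend the real Symbol instead. The sym constants follow the sym.java slide.

```java
// Stand-in for java_cup.runtime.Symbol so the sketch compiles on its own;
// the real class has these public fields (and more).
class SymbolStandIn {
    public int sym;          // token id, e.g. a constant from sym.java
    public int left, right;  // source positions
    public Object value;     // semantic value, e.g. the lexeme

    SymbolStandIn(int id, int left, int right, Object value) {
        this.sym = id;
        this.left = left;
        this.right = right;
        this.value = value;
    }
}

// Token ids as in the sym.java slide (only two shown).
class sym {
    public static final int EOF = 0;
    public static final int ID = 1;
}

// Hypothetical Token: keeps the line in "left"; "right" is unused here.
class Token extends SymbolStandIn {
    Token(int id, int line, Object value) {
        super(id, line, 0, value);
    }

    public int getId() { return sym; }          // the inherited int field, not the class
    public Object getValue() { return value; }
    public int getLine() { return left; }
}
```

Storing the line in left is just one choice; it has the side benefit that CUP-generated parsers, which read left and right for error positions, would see the line number there.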

JFlex directives to use

%line         (count lines)
%type Token   (the scanning method returns Token objects)
%class Lexer  (name of the generated scanner class)
%cup          (integrate with CUP)

%cup
%class Lexer
%scanerror LexicalError
%function next_token
%type Token
…
%eofval{
    return new Token(sym.EOF, …);
%eofval}

Structure

[Figure: IC.lex → JFlex → Lexer.java; javac compiles Lexer.java, sym.java, Token.java, LexicalError.java and Compiler.java into the lexical analyzer; test.ic → lexical analyzer → tokens]

Directions

- Download Java
- Download JFlex
- Download JavaCup
- Put JFlex and JavaCup in the classpath
- Apache Ant: use ant build.xml
- Eclipse: import jflex and javacup

Directions

- Use the skeleton from the website
- Read the assignment