JLex
Lecture 4
Mon, Jan 24, 2005
JLex is a lexical analyzer generator written in Java. It is based on the well-known lex, a lexical analyzer generator for C. The lexical analyzer generator flex is also based on lex.
JLex reads a description of a set of tokens and outputs a Java program that breaks its input into those tokens.
The JLex Input File
By convention, the input file to JLex uses the extension .lex.
The file is divided into three parts, in this order: user code, JLex directives, and regular expression rules.
These three sections are separated by lines consisting of %%.
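To make the three-part layout concrete, here is a small sketch that splits the text of a specification into its sections at each %% line. The class LexSpecSections and its split method are hypothetical helpers for illustration; JLex does its own parsing internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a JLex specification into its three sections,
// treating a line that contains only %% (ignoring surrounding blanks) as a separator.
class LexSpecSections {
    static List<String> split(String spec) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n", -1)) {
            if (line.trim().equals("%%")) {
                sections.add(current.toString());   // close the current section
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        sections.add(current.toString());           // the last section has no %% after it
        return sections;
    }

    public static void main(String[] args) {
        String spec = String.join("\n",
            "import java.io.*;",                    // user code
            "%%",
            "%integer",                             // JLex directives
            "%%",
            "[0-9]+ { return 1; }");                // regular expression rules
        System.out.println(split(spec).size());    // 3
    }
}
```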
JLex User Code
See Section 2.1 of the JLex User’s Manual. Any code written in the user-code section is copied directly to the top of the Java source file created by JLex.
JLex creates a class named Yylex, which is at the heart of the lexer. The user code is not incorporated into this class; it appears in the file before the class.
JLex Directives
See Section 2.2 of the JLex User’s Manual. Any code bracketed within %{ and %} is copied directly into the Yylex class, at the beginning.
Although this code is incorporated into the Yylex class, it is not incorporated into any Yylex member function.
Thus, we may use it to declare Yylex member variables or additional member functions.
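As a rough picture of where the user code and the %{ %} code end up, here is a hypothetical, heavily simplified Java file laid out the way the slides describe. It is not JLex's actual output: the class Sym, the field tokenCount, and the methods count and tokensSeen are invented for illustration, and only the name Yylex comes from JLex.

```java
// (1) User-code section: copied to the top of the output file, outside Yylex.
//     Hypothetical example: token codes that a parser could share.
class Sym {
    static final int NUM = 1;
    static final int ID = 2;
}

class Yylex {
    // (2) Code from %{ ... %}: copied into the body of Yylex, at the beginning,
    //     outside every method. It can declare fields and helper methods.
    private int tokenCount = 0;

    int count(int code) {          // hypothetical helper that actions could call
        tokenCount++;
        return code;
    }

    int tokensSeen() {
        return tokenCount;
    }

    // (3) JLex's generated members (buffers, tables, constructors, yylex())
    //     would follow here. An action such as { return count(Sym.NUM); }
    //     is placed inside yylex(), so it can call count() directly.
}
```

Because the %{ %} code becomes ordinary class members, every rule action can see those fields and methods.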
The init Directive
Code bracketed within %init{ and %init} is copied into the Yylex default constructor, which is called by the other constructors.
%init{
System.out.println("In the constructor");
%init}
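The constructor arrangement described above can be sketched as follows. This is a simplified, hypothetical picture: JLex's real constructors also set up its input buffer and internal state, and the field names input and initRan are invented here.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class Yylex {
    private Reader input;           // stands in for JLex's buffered input
    boolean initRan = false;        // illustration only: records that %init ran

    // The no-argument constructor receives the %init code.
    private Yylex() {
        // Code copied from %init{ ... %init}:
        System.out.println("In the constructor");
        initRan = true;
    }

    // The constructors that callers use delegate to it with this(),
    // so the %init code runs whichever one is chosen.
    Yylex(Reader reader) {
        this();
        this.input = reader;
    }

    Yylex(InputStream in) {
        this(new InputStreamReader(in));
    }
}
```

Putting the %init code in one constructor and chaining to it with this() means it is written once but runs for every way of creating a lexer.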
The eof Directive
Code bracketed within %eof{ and %eof} is copied into the Yylex function yy_do_eof(), which is called once upon end of file.
%eof{
System.out.println("In yy_do_eof()");
%eof}
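One way to guarantee that the %eof code runs only once is to guard it with a flag, as in the sketch below. The method name yy_do_eof comes from the slide; the flag yy_eof_done, the counter eofRuns, and the method simulateEndOfInput are assumptions made for illustration, not a copy of JLex's output.

```java
class Yylex {
    private boolean yy_eof_done = false;   // has the %eof code already run?
    int eofRuns = 0;                       // illustration only: counts executions

    private void yy_do_eof() {
        if (!yy_eof_done) {
            // Code copied from %eof{ ... %eof}:
            System.out.println("In yy_do_eof()");
            eofRuns++;
        }
        yy_eof_done = true;
    }

    // Hypothetical stand-in for the part of yylex() that detects end of input.
    void simulateEndOfInput() {
        yy_do_eof();
    }
}
```

Even if the lexer reaches end of input several times, the guarded block executes only on the first.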
JLex Token Types
Unless we specify otherwise, the data type of the returned tokens is Yytoken.
This class is not generated by JLex; we must define it ourselves, typically in the user-code section.
We may change the return type to int with the directive %integer, or to Integer with the directive %intwrap. We may set the return type to any other type with the directive %type, followed by the name of the type.
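Since Yytoken must be supplied by hand when the default return type is kept, a minimal version might look like the class below, placed in the user-code section. The fields kind and text and the constants NUM and ID are hypothetical; all that matters to the generated lexer is that a class named Yytoken exists.

```java
class Yytoken {
    // Hypothetical token codes.
    static final int NUM = 1;
    static final int ID = 2;

    final int kind;        // which kind of token this is
    final String text;     // the matched lexeme

    Yytoken(int kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    @Override
    public String toString() {
        return "Yytoken(" + kind + ", \"" + text + "\")";
    }
}
```

An action could then return, for example, new Yytoken(Yytoken.NUM, yytext()).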
If the return type is Yytoken or Integer, then the EOF token is null.
If the return type is int, then the EOF token is -1.
For any other type, we need to specify the EOF value.
JLex EOF Value
By using the %eofval directive, we may indicate what value yylex() returns upon EOF.
We write
%eofval{
return new type(value);
%eofval}
where type is the token type and value identifies the EOF token.
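To see the effect of %eofval, here is a hypothetical sketch that assumes a directive %type Token together with %eofval{ return new Token(Token.EOF, ""); %eofval}. The Token class is invented, and the Yylex class is only a stand-in (it splits its input on spaces); the point is that once input runs out, yylex() returns whatever the %eofval code produces.

```java
class Token {
    static final int EOF = 0;    // hypothetical token codes
    static final int WORD = 1;

    final int kind;
    final String text;

    Token(int kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

// Stand-in for a generated lexer whose return type is Token.
class Yylex {
    private final String[] words;
    private int pos = 0;

    Yylex(String input) {
        words = input.isEmpty() ? new String[0] : input.split(" ");
    }

    Token yylex() {
        if (pos >= words.length) {
            return new Token(Token.EOF, "");   // body of %eofval{ ... %eofval}
        }
        return new Token(Token.WORD, words[pos++]);
    }
}
```

A caller can then stop on a token whose kind is Token.EOF instead of testing for null.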
JLex Regular Expression Rules
Each regular expression rule consists of a regular expression followed by an associated action.
The associated action is a segment of Java code, enclosed in braces { }.
Typically, the action will be to return the appropriate token.
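When more than one rule could match, JLex chooses the rule that matches the longest string and, among equally long matches, the rule listed first. The toy sketch below imitates a rule list with that policy using java.util.regex. It is not how JLex works internally (JLex compiles the rules into a finite automaton), and the class RuleLexer and its methods are hypothetical. An action returning null means "skip this lexeme".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleLexer {
    static final class Rule {
        final Pattern pattern;
        final Function<String, String> action;
        Rule(Pattern pattern, Function<String, String> action) {
            this.pattern = pattern;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Add a rule: a regular expression and the action to run on its lexeme.
    RuleLexer rule(String regex, Function<String, String> action) {
        rules.add(new Rule(Pattern.compile(regex), action));
        return this;
    }

    List<String> run(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : rules) {
                Matcher m = r.pattern.matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at pos; strict > keeps the
                // earlier rule when two matches have the same length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
            String token = best.action.apply(input.substring(pos, pos + bestLen));
            if (token != null) {
                tokens.add(token);
            }
            pos += bestLen;
        }
        return tokens;
    }
}
```

For instance, with the rules "if" and "[a-z]+" in that order, the input if yields the keyword (equal lengths, first rule wins), while iffy yields an identifier (longer match wins).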
JLex Regular Expressions
Regular expressions are expressed using ASCII characters (0 – 127).
The following characters are metacharacters:
? * + | ( ) ^ $ . [ ] { } " \
Metacharacters have special meaning; they do not represent themselves. All other characters represent themselves.
Let r and s be regular expressions.
r? matches zero or one occurrences of r.
r* matches zero or more occurrences of r.
r+ matches one or more occurrences of r.
r|s matches r or s.
rs matches r concatenated with s.
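These five operators mean the same thing in Java's own java.util.regex package, so it is a convenient place to try them out. The sketch below uses Java's regex engine, not JLex, and the class RegexOperators is hypothetical. In each pattern, r is the single character or group just before the operator.

```java
import java.util.regex.Pattern;

class RegexOperators {
    // True only if the WHOLE string matches the pattern.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab?", "a"));        // r? with r = b: true (zero b's)
        System.out.println(matches("ab*", "abbb"));     // r* with r = b: true
        System.out.println(matches("ab+", "a"));        // r+ with r = b: false (needs one b)
        System.out.println(matches("cat|dog", "dog"));  // r|s: true
        System.out.println(matches("ab", "ab"));        // rs: true
    }
}
```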
Parentheses are used for grouping. For example, ("+"|"-")? matches an optional plus or minus sign.
If a regular expression begins with ^, then it is matched only at the beginning of a line. If a regular expression ends with $, then it is matched only at the end of a line.
The dot . matches any character except newline.
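The grouping, anchor, and dot rules above can be checked in java.util.regex, with one caveat: Java has no "..." quoting, so the JLex expression ("+"|"-")? is written (\+|-)? in Java. Java also needs the MULTILINE flag for ^ to match at the start of every line rather than only at the start of the input. The class GroupingAnchors is a hypothetical illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingAnchors {
    // An optionally signed integer: the parenthesized group is optional as a whole.
    static final Pattern SIGNED = Pattern.compile("(\\+|-)?[0-9]+");

    // With MULTILINE, ^ matches at the beginning of each line.
    static final Pattern LINE_START_WORD = Pattern.compile("^[a-z]+", Pattern.MULTILINE);

    static int countLineStartWords(String text) {
        Matcher m = LINE_START_WORD.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(SIGNED.matcher("-42").matches());       // true
        System.out.println(SIGNED.matcher("+-42").matches());      // false: at most one sign
        System.out.println(countLineStartWords("ab cd\nef gh"));   // 2 (ab and ef)
        System.out.println(Pattern.matches("a.c", "a\nc"));        // false: . skips newline
    }
}
```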
Brackets [ ] match any single character listed within the brackets. [abc] matches a or b or c. [A-Za-z] matches any letter.
If the first character after [ is ^, then the brackets match any character except those listed. [^A-Za-z] matches any nonletter.
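Bracket expressions behave the same way in java.util.regex, so the examples above can be verified directly. The hypothetical helper oneOf tests whether a single character belongs to a bracketed class.

```java
import java.util.regex.Pattern;

class CharClasses {
    // True if the one-character string c is matched by the bracket expression cls.
    static boolean oneOf(String cls, char c) {
        return Pattern.matches(cls, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(oneOf("[abc]", 'b'));       // true
        System.out.println(oneOf("[A-Za-z]", 'Q'));    // true
        System.out.println(oneOf("[^A-Za-z]", '7'));   // true: 7 is not a letter
        System.out.println(oneOf("[^A-Za-z]", 'x'));   // false
    }
}
```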
A single character within double quotes, or following \, represents itself.
In particular, metacharacters lose their special meaning and represent themselves when they stand alone within double quotes or follow \. Both "?" and \? match ?.
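When a rule must match a fixed string that contains metacharacters, one option is to put \ in front of each of them. The hypothetical helper below does exactly that, using the metacharacter list given earlier; it handles only those characters and is an illustration, not part of JLex.

```java
class JLexQuote {
    // The JLex metacharacters: ? * + | ( ) ^ $ . [ ] { } " \
    private static final String META = "?*+|()^$.[]{}\"\\";

    // Returns a JLex regular expression that matches the literal text exactly.
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');      // metacharacter: make it stand for itself
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a+b"));   // a\+b
        System.out.println(escape("x?"));    // x\?
    }
}
```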
JLex Escape Sequences
Some escape sequences:
\n matches newline.
\b matches backspace.
\r matches carriage return.
\t matches tab.
\f matches formfeed.
If c is not a special escape-sequence character, then \c matches c.
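The escape rules above amount to a small translation from two-character sequences to single characters. The hypothetical helper below applies them: the five listed sequences become control characters, and \c for any other c stands for c itself. JLex accepts further escape forms that this sketch does not model.

```java
class JLexEscapes {
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {   // ordinary char, or a trailing \
                sb.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;     // newline
                case 'b': sb.append('\b'); break;     // backspace
                case 'r': sb.append('\r'); break;     // carriage return
                case 't': sb.append('\t'); break;     // tab
                case 'f': sb.append('\f'); break;     // formfeed
                default:  sb.append(next);            // \c matches c
            }
        }
        return sb.toString();
    }
}
```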
Running JLex
The lexical analyzer generator is the class Main in the JLex package (the JLex folder).
To create a lexical analyzer from the file filename.lex, type
java JLex.Main filename.lex
This produces a file filename.lex.java, which must be compiled (for example, with javac) to create the lexical analyzer.
Running the Lexical Analyzer
To run the lexical analyzer, a Yylex object must first be created.
The Yylex constructor has one parameter specifying an input stream.
For example:
Yylex lexer = new Yylex(System.in);
Then each call to the yylex() member function returns the next token:
token = lexer.yylex();
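Putting the two steps together gives the usual driver loop: create one Yylex, then call yylex() until it returns the EOF token, which is null for the default Yytoken type. So that the sketch compiles on its own, Yytoken and Yylex below are hypothetical stand-ins (the stand-in lexer just returns whitespace-separated words); in a real project Yytoken comes from the user-code section and Yylex is generated by JLex. The class LexerDriver and method tokenize are also invented names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class Yytoken {
    final String text;
    Yytoken(String text) { this.text = text; }
}

class Yylex {
    private final InputStream in;

    Yylex(InputStream in) { this.in = in; }

    // Stand-in for the generated yylex(): one word per call, null at end of file.
    Yytoken yylex() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                word.append((char) c);
            } else if (word.length() > 0) {
                break;                          // whitespace ends the current word
            }
        }
        return word.length() > 0 ? new Yytoken(word.toString()) : null;
    }
}

class LexerDriver {
    // Create one lexer, then call yylex() until it returns the EOF token (null).
    static List<String> tokenize(InputStream in) {
        Yylex lexer = new Yylex(in);
        List<String> texts = new ArrayList<>();
        try {
            Yytoken token;
            while ((token = lexer.yylex()) != null) {
                texts.add(token.text);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : tokenize(System.in)) {
            System.out.println(text);
        }
    }
}
```

With %integer the loop test would be against -1 instead of null, since that is the EOF token for an int return type.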