Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of...

CodeGenerator

Translator Architecture

ParserTokenizer

string ofcharacters

(source code)

string oftokens

abstractprogram

string ofintegers

(object code)

A Tokenizing Machine

Another greatmachine!

Ready to Dispense?

In(Chars)

Out(Tokens)

Tokenizing Machine Continued…

Ready toDispense?

In Out

3: Tokenscome outhere

2: Light comes on

1: Charactersgo in here

Tokenizing Machine Continued… Type

BL_Tokenizing_Machine_Kernel is modeled by (

buffer : string of character ready_to_dispense : boolean

) constraint …

Initial Value (empty_string, false)

Tokenizing Machine Continued…

Operations m.Insert (ch) m.Dispense (token_text, token_kind) m.Is_Ready_To_Dispense () m.Flush_A_Token (token_text, token_kind) m.Size ()

A State-Transition View

not readyto dispense

readyto dispense

Insert

Flush_A_Token

Insert

Is_Ready_To_DispenseIs_Ready_To_Dispense

Dispense

Tokenizing BL Programs

Token Types KEYWORD IDENTIFIER CONDITION WHITE_SPACE COMMENT ERROR

A Very Useful Extensionprocedure_body Get_Next_Token ( alters Character_IStream& str, produces Text& token_text, produces Integer& token_kind ){

while (not self.Is_Ready_To_Dispense () and not str.At_EOS ()) { object Character ch; str.Read (ch); self.Insert (ch); } if (self.Is_Ready_To_Dispense ()) { self.Dispense (token_text, token_kind); } else { self.Flush_A_Token (token_text, token_kind); }

Another Useful Extensionprocedure_body Get_Next_Non_Separator_Token ( alters Character_IStream& str, produces Text& token_text, produces Integer& token_kind ){

self.Get_Next_Token (str, token_text, token_kind); while ((token_kind == WHITE_SPACE) or (token_kind == COMMENT)) { self.Get_Next_Token (str, token_text, token_kind); }

How Does Insert Work?

buffer: “PROGRAM”ready_to_dispense: false

buffer: “PROGRAM ”ready_to_dispense: true

Here’s anothercharacter.

‘ ’

The Specification of Insert

procedure Insert ( preserves Character ch ) is_abstract; /*! requires self.ready_to_dispense = false ensures self.buffer = #self.buffer * <ch> and self.ready_to_dispense = IS_COMPLETE_TOKEN_TEXT (#self.buffer, ch) !*/

An Important Math Operation

math definition IS_COMPLETE_TOKEN_TEXT (

s: string of character

c: character

): boolean is

( s is in OK_STRINGS and

s * <c> is not in OK_STRINGS ) or

( <c> is in PREFIX (OK_STRINGS) and

s * <c> is not in PREFIX (OK_STRINGS) )

s is a complete“valid” token

c can start a “valid” token, buts*<c> doesn’t start a “valid” token

Other Math Definitions

OK_STRINGS ={s: string of character (IS_KEYWORD (s))} union{s: string of character (IS_IDENTIFIER (s))} union{s: string of character (IS_CONDITION_NAME (s))} union{s: string of character (IS_WHITE_SPACE (s))} union{s: string of character (IS_COMMENT (s))}

PREFIX (s_set) ={x: string of character (there exists y: string of character (x * y is in s_set))}

PREFIX Examples

s_set = {“abc”} PREFIX (s_set) =

{“”, “a”, “ab”, “abc”}

s_set = {“abc”, “de”} PREFIX (s_set) =

{“”, “a”, “ab”, “abc”, “d”, “de”}

Tokenizing Machine: Implementation

Obvious Representation Text buffer_rep Boolean token_ready

Insert (ch)? check if IS_COMPLETE_TOKEN_TEXT

(self[buffer_rep], ch), andset self[token_ready] accordingly

append ch at end of self[buffer_rep]

Tokenizing Machine: Implementation Continued…

Dispense (token_text, token_kind)? set token_text to all but the last

character of self[buffer_rep] set token_kind to the value of

WHICH_KIND (token_text) set self[token_ready] to false

Tokenizing Machine: Implementation Continued…

How do we “check if IS_COMPLETE_TOKEN_TEXT (self[buffer_rep], ch)”?

How do we determine“WHICH_KIND (token_text)”?

How do we do these things quickly?

Making Decisions Quickly

Keep track of the “state” of the buffer by adding one field to the representation: Text buffer_rep Boolean token_ready Integer buffer_state

Possible Buffer States

How many interestingly different buffer states do you think there may be?

Let’s start enumerating them…

Buffer States Continued… Initial state (empty buffer) How many states after inserting

the first character? ‘B’, ‘D’, ‘E’, ‘I’, ‘P’, ‘T’, ‘W’, ‘n’, ‘r’, ‘t’,

identifier (any other letter) white_space (‘ ’, ‘\n’, ‘\t’) comment (‘#’) error (any other character)

Buffer States Continued…

How many states after inserting the second character? “BE”, “DO”, “EL”, “EN”, “IF”, “IN”, “IS”,

“PR”, “TH”, “WH”, “ne”, “ra”, “tr”, identifier (any other id character)

white_space (‘ ’, ‘\n’, ‘\t’) comment (any other character but ‘\n’) error (any character that cannot start a

new “good” token)

A State Transition Diagram:Transitions Out of ‘empty’ Only

identifierwhite_space

comment

‘B’

‘D’

‘E’ ‘I’‘P’

‘T’

‘W’

‘n’

‘r’

‘t’

any otherletter

‘ ’,’\n’,’\t’‘#’

any othercharacter

Structure of Body of Insertcase_select (self[buffer_state]){ case empty: // case for buffer = empty_string case B: // case for buffer = “B” case D: // case for buffer = “D” case E: // case for buffer = “E” . . . case error: // case for buffer holding an error token}

A Simplified View

Buffer States EMPTY_BS ID_OR_KEYWORD_OR_CONDITION_BS WHITE_SPACE_BS COMMENT_BS ERROR_BS

The State Transition Diagram

EMPTY_BS

ID_OR_KEYWORD_OR_CONDITION_BS

COMMENT_BS

WHITE_SPACE_BS

ERROR_BS

‘ ’, ‘\n’, ‘\t’

‘#’

‘a’..’z’,‘A’..’Z’

any othercharacter

any characterexcept ‘\n’

‘ ’, ‘\n’, ‘\t’

‘a’..’z’,‘A’..’Z’,‘0’..’9’,

‘-’

any character except‘a’..’z’, ‘A’..’Z’, ‘ ’, ‘\n’, ‘\t’, ‘#’

Useful Private Functions Is_White_Space_Character (ch) Is_Digit_Character (ch) Is_Alphabetic_Character (ch) Is_Identifier_Character (ch) Can_Start_Token (ch) Id_Or_Keyword_Or_Condition (t) Buffer_Type (ch) Token_Kind (bs, str)

Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of...

Documents

skinzwear.com...style code.....name.....base price...page men’s swimwear M1R cutaway string thong $28.00 14 M1R1 deeper string thong $28.00 14 M1R2 deepest string thong $28.00 14

Cryptograms A Tour of Code. Initial stuff package cryptograms import java.io.File import Code._ object Cryptograms { type Word = String type Pattern =

Aplikasi Algoritma Runut Balik dan Pencocokan String dalam …rinaldi.munir/Stmik/... · 2006. 5. 19. · 1 Aplikasi Algoritma Runut Balik dan Pencocokan String dalam Code Translation

Engine & Database Changes - PRIMAVERA BSS...Linha Integer Proposal line number Artigo String Item Code Descricao String Item description C Unidade String C Quantidade Double CQuantity

MaxCompiler: State Machine Tutorialbedford-computing.co.uk/learning/wp-content/uploads/2016/03/max... · 2 Kernel State Machine Example7 ... Morse Code Tokenizer ... 2 Kernel State

StringTokenizer tokenizer

Object-Oriented Application Development Using VB.NET 1 Creating a String Array Code to create a String array: ' declare a String array with 4 elements

Integer code Integer code name String spelling

Adding Source Code Searching Capability to Yioop€¦ · code and then finds the maximal and conditionally maximal sub-strings. A string is called a maximal string if it does not

Completion Data SubscriptionsDRILLING_CONTRACTOR_NM String IS_MULTIPLE_COMPLETION String Y, N DID_PREFLOW_48_HOURS String Y, N PIPELINE_CONNECTION String ANY_CONDENSATE String Y, N

Token classification using Bengali Tokenizer

The String Landscape as Genetic Alphabet: The Subtle Virtues of a Non-Unique Cosmic Code

Password security Dr.Patrick A.H. Bours. 2 Password: Kinds of passwords Password A string of characters: A,B,C,…d,e,f,…1,2,3…!,”,@,… PIN-code A string

String and string manipulation

Release Notes - Overview - Software AG · 2014. 6. 13. · CONSOLE DESCR-CODE-STRING New,A16 HOLD-FLAG New,A1 ROUTE-CODE-STRING New,A28 TIME-STAMP-LONG New,A11 CONSOLE-LOG LOG-TIME-LONG

Inverted Index Construction Tokenizer Token stream. Friends RomansCountrymen Linguistic modules Modified tokens. friend romancountryman Indexer Inverted

String Inverter (SUN2000-30KTL-A) - Huawei · String Inverter (SUN2000-30KTL-A) Technical Specifications SUN2000-30KTL-A ... EN/IEC 62109-2, IEC 60529 Grid Code IEC 61727, IEC 62116,

CHAPTER 8 File Input Output Part 1: String. The String Class Constructing a String: String message = "Welcome to Java“; String message = new String("Welcome

String 21 String 29 String 30 String 38 String 39 String ...mmadsen/Diagrams images/6-DOMNames-Poster.pdf44 Napthalene Waterspout Covini_C6W Traumatophilia Osmophilia Stout Mononucleosis

Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object