CS400 Compiler Construction
Sheldon X. Liang Ph. D.
Computer Science at Azusa Pacific University
April 20, 2023, Azusa, CA
Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/
Why Lexical Analysis Is a Separate Phase
• Simplifies the design of the compiler
  – LL(1) or LR(1) parsing with one token of lookahead would not be possible if the parser had to match the multiple characters that make up each token
• Provides an efficient implementation
  – Systematic techniques to implement lexical analyzers by hand or automatically from specifications
  – Stream buffering methods to scan the input
• Improves portability
  – Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)
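Trigraph normalization can be sketched as a small pass that rewrites each of the nine ISO C trigraphs to the single character it stands for. This is an illustration under assumptions: the name `normalize_trigraphs` is hypothetical, and the pass is shown standalone rather than inside a lexer (in C, trigraph replacement happens in translation phase 1, before tokens are formed).

```python
import re

# The nine ISO C trigraphs, keyed by the character that follows "??".
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}

# "??" followed by one of the nine trigraph terminators.
TRIGRAPH_RE = re.compile(r"\?\?([=(/)'<!>-])")

def normalize_trigraphs(text: str) -> str:
    """Replace every trigraph in text with its single-character equivalent."""
    return TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], text)

print(normalize_trigraphs("??=define ARR(x) x??(0??)"))  # #define ARR(x) x[0]
```

Using one regular expression with a lookup table keeps the pass single-scan; because no replacement character is itself `?` or a trigraph terminator, a replacement can never create a new trigraph.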
Interaction of the Lexical Analyzer with the Parser
[Figure: Source Program → Lexical Analyzer ⇄ Parser. The parser sends "Get next token"; the lexical analyzer returns token, tokenval. Both use the Symbol Table, and both can report errors.]
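The interaction in the figure can be sketched as a hand-written scanner from which the parser pulls one token per call. All names here (`LexicalAnalyzer`, `get_next_token`, `collect_tokens`, the token names) are hypothetical, and using the symbol-table index as the tokenval of an identifier is one common convention assumed for this sketch; a real parser would take the place of the `collect_tokens` loop.

```python
import string

ID, NUM, ASSIGN, DONE = "id", "num", "assign", "done"  # hypothetical token names
LETTERS, DIGITS = string.ascii_letters, string.digits

class LexicalAnalyzer:
    """Hand-written scanner; the parser asks it for one token at a time."""

    def __init__(self, source, symtable):
        self.src = source
        self.pos = 0
        self.symtable = symtable  # shared with the parser

    def lookup_or_insert(self, lexeme):
        """Return the symbol-table index of lexeme, entering it if it is new."""
        if lexeme not in self.symtable:
            self.symtable.append(lexeme)
        return self.symtable.index(lexeme)

    def get_next_token(self):
        """Return the next (token, tokenval) pair, or (DONE, None) at end of input."""
        src = self.src
        while self.pos < len(src) and src[self.pos].isspace():
            self.pos += 1  # white space separates tokens but is not passed on
        if self.pos == len(src):
            return (DONE, None)
        start, c = self.pos, src[self.pos]
        if c in LETTERS:  # identifier: letter followed by letters and digits
            while self.pos < len(src) and src[self.pos] in LETTERS + DIGITS:
                self.pos += 1
            return (ID, self.lookup_or_insert(src[start:self.pos]))
        if c in DIGITS:  # number: tokenval is its integer value
            while self.pos < len(src) and src[self.pos] in DIGITS:
                self.pos += 1
            return (NUM, int(src[start:self.pos]))
        if src.startswith(":=", self.pos):
            self.pos += 2
            return (ASSIGN, None)
        if c in "+*":  # operator tokens are named by their own symbol
            self.pos += 1
            return (c, None)
        raise SyntaxError(f"lexical error: unexpected {c!r} at position {start}")

def collect_tokens(source):
    """Stand-in for the parser: request tokens until the lexer reports end of input."""
    symtable = []
    lexer = LexicalAnalyzer(source, symtable)
    tokens = []
    tok = lexer.get_next_token()
    while tok[0] != DONE:
        tokens.append(tok)
        tok = lexer.get_next_token()
    return tokens, symtable

tokens, symtable = collect_tokens("y := 31 + 28*x")
print(tokens)    # [('id', 0), ('assign', None), ('num', 31), ('+', None), ('num', 28), ('*', None), ('id', 1)]
print(symtable)  # ['y', 'x']
```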
Attributes of Tokens
[Figure: the lexical analyzer turns the input y := 31 + 28*x into the token stream <id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>; the parser receives each token together with its tokenval (token attribute).]
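A minimal sketch that reproduces the token stream above, assuming Python's `re` module for the patterns. The attribute kept for each token (the lexeme for id, the integer value for num, nothing for operators) follows the figure; the names `tokenize`, `attribute`, and `show` are hypothetical.

```python
import re

# Token specification: each token name with the pattern for its lexemes.
TOKEN_SPEC = [
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("num", re.compile(r"[0-9]+")),
    ("assign", re.compile(r":=")),
    ("+", re.compile(r"\+")),
    ("*", re.compile(r"\*")),
]
WHITESPACE = re.compile(r"\s+")

def attribute(token, lexeme):
    """Attribute kept with each token: lexeme for id, value for num, none otherwise."""
    if token == "id":
        return lexeme
    if token == "num":
        return int(lexeme)
    return None

def tokenize(text):
    """Turn text into a list of (token, attribute) pairs."""
    pos, pairs = 0, []
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        for token, pattern in TOKEN_SPEC:
            m = pattern.match(text, pos)
            if m:
                pairs.append((token, attribute(token, m.group())))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no token matches at position {pos}")
    return pairs

def show(pairs):
    """Render pairs in the <token, attribute> notation of the figure."""
    parts = []
    for token, attr in pairs:
        if attr is None:
            parts.append(f"<{token}, >")
        elif isinstance(attr, str):
            parts.append(f'<{token}, "{attr}">')
        else:
            parts.append(f"<{token}, {attr}>")
    return " ".join(parts)

print(show(tokenize("y := 31 + 28*x")))
# <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
```

Taking the first pattern that matches is enough here because no two of these patterns can match at the same position; a general lexer would apply the longest-match rule instead.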
Formalization
Lexical Analysis & Lexical Analyzer Generators
• Regular Expressions
• Finite Automata
• RE-to-FA Conversion
• Lexer Design
Keep in mind the following questions
• Token
  – Lexical unit
  – Atomic parse element
  – Abstracted in the syntax, e.g. id
• Lexeme
  – Specific string making up a token
  – Value / attribute related to a token
  – Concrete in the language, e.g. Amt
• Specification of patterns for tokens
  – Alphabet $\Sigma$: a finite set of symbols
  – String $s$: a finite sequence of symbols from $\Sigma$
  – Language: a specific set of strings
Tokens, Patterns, and Lexemes
• A token is a classification of lexical units
  – For example: id and num
• Lexemes are the specific character strings that make up a token
  – For example: abc and 123
• Patterns are rules describing the set of lexemes belonging to a token
  – For example: “letter followed by letters and digits” and “non-empty sequence of digits”
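The two example patterns can be written as regular expressions and used to decide which token a lexeme belongs to. This sketch assumes ASCII letters and digits; `classify` is a hypothetical helper.

```python
import re

PATTERNS = {
    "id": re.compile(r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters and digits
    "num": re.compile(r"[0-9]+"),               # non-empty sequence of digits
}

def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return token
    return None

for lexeme in ["abc", "123", "x9", "9x"]:
    print(lexeme, "->", classify(lexeme))  # abc -> id, 123 -> num, x9 -> id, 9x -> None
```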
Specification of Patterns for Tokens: Definitions
• An alphabet $\Sigma$ is a finite set of symbols (characters)
• A string $s$ is a finite sequence of symbols from $\Sigma$
  – $|s|$ denotes the length of string $s$
  – $\varepsilon$ denotes the empty string, thus $|\varepsilon| = 0$
• A language is a specific set of strings over some fixed alphabet $\Sigma$
Specification of Patterns for Tokens: String Operations
• The concatenation of two strings $x$ and $y$ is denoted by $xy$
• The exponentiation of a string $s$ is defined by
  $s^0 = \varepsilon$, $s^i = s^{i-1}s$ for $i > 0$
  note that $\varepsilon s = s\varepsilon = s$
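The recursive definition of exponentiation translates directly into code, with Python string concatenation standing in for $xy$ and the empty string `""` standing in for $\varepsilon$. The name `power` is hypothetical and used only for illustration.

```python
EPSILON = ""  # the empty string

def power(s, i):
    """s^i by the recursive definition: s^0 = epsilon, s^i = s^(i-1) s for i > 0."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return EPSILON
    return power(s, i - 1) + s

print(power("ab", 3))                            # ababab
print(EPSILON + "ab" == "ab" + EPSILON == "ab")  # True: epsilon is the identity for concatenation
```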
Specification of Patterns for Tokens: Language Operations
• Union: $L \cup M = \{s \mid s \in L \text{ or } s \in M\}$
• Concatenation: $LM = \{xy \mid x \in L \text{ and } y \in M\}$
• Exponentiation: $L^0 = \{\varepsilon\}$; $L^i = L^{i-1}L$
• Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$
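These operations can be tried out on small finite languages represented as Python sets of strings. Because $L^*$ and $L^+$ are infinite whenever $L$ contains a non-empty string, the sketch assumes a bound `n` and computes only the union of $L^0, \dots, L^n$ (respectively $L^1, \dots, L^n$); all function names are hypothetical.

```python
from itertools import product

def union(L, M):
    return L | M

def concat(L, M):
    """LM = {xy | x in L and y in M}."""
    return {x + y for x, y in product(L, M)}

def power_lang(L, i):
    """L^0 = {epsilon}; L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene_upto(L, n):
    """Finite approximation of L*: the union of L^0 .. L^n."""
    out = set()
    for i in range(n + 1):
        out |= power_lang(L, i)
    return out

def positive_upto(L, n):
    """Finite approximation of L+: the union of L^1 .. L^n."""
    out = set()
    for i in range(1, n + 1):
        out |= power_lang(L, i)
    return out

print(sorted(concat({"a", "b"}, {"0", "1"})))  # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene_upto({"a"}, 3)))           # ['', 'a', 'aa', 'aaa']
```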
Specification of Patterns for Tokens: Regular Expressions
• Basis symbols:
  – $\varepsilon$ is a regular expression denoting the language $\{\varepsilon\}$
  – $a \in \Sigma$ is a regular expression denoting $\{a\}$
• If $r$ and $s$ are regular expressions denoting languages $L(r)$ and $M(s)$ respectively, then
  – $r \mid s$ is a regular expression denoting $L(r) \cup M(s)$
  – $rs$ is a regular expression denoting $L(r)M(s)$
  – $r^*$ is a regular expression denoting $L(r)^*$
  – $(r)$ is a regular expression denoting $L(r)$
• A language defined by a regular expression is called a regular set
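The inductive definition can be mirrored by a function with one case per rule that computes the strings of $L(r)$ up to a length bound. The tuple encoding of regular expressions, the name `lang`, and the bound `n` are assumptions of this sketch, not part of the definition.

```python
# Regular expressions as nested tuples (an encoding chosen for this sketch):
#   ("eps",)       epsilon
#   ("sym", a)     a single symbol a from the alphabet
#   ("alt", r, s)  r | s
#   ("cat", r, s)  rs
#   ("star", r)    r*
# Parentheses (r) need no node: grouping is implicit in the tree.

def lang(r, n):
    """The strings of L(r) whose length is at most n."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        left, right = lang(r[1], n), lang(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        # Keep appending strings of L(r) until no new string within the bound appears.
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
a_or_b_star = ("star", ("alt", a, b))
abb = ("cat", ("cat", a, b), b)
print(sorted(lang(("cat", a_or_b_star, abb), 4)))  # ['aabb', 'abb', 'babb']
```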
Nondeterministic Finite Automata
• An NFA is a 5-tuple $(S, \Sigma, \delta, s_0, F)$ where
  – $S$ is a finite set of states
  – $\Sigma$ is a finite set of symbols, the alphabet
  – $\delta$ is a mapping from $S \times (\Sigma \cup \{\varepsilon\})$ to a set of states
  – $s_0 \in S$ is the start state
  – $F \subseteq S$ is the set of accepting (or final) states
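The 5-tuple maps onto a small data structure; the sketch below also simulates the NFA by tracking the set of states it could be in after each input symbol. The encoding of $\delta$ as a dict from (state, symbol) to a set of states, the empty-string label for ε-edges, and the example NFA for (a|b)*abb are assumptions chosen for illustration.

```python
from dataclasses import dataclass

EPS = ""  # label used for epsilon-transitions in this sketch

@dataclass
class NFA:
    states: set      # S
    alphabet: set    # Sigma
    delta: dict      # (state, symbol) -> set of next states
    start: int       # s0
    accepting: set   # F

    def eps_closure(self, T):
        """All states reachable from T using epsilon-edges alone."""
        closure, stack = set(T), list(T)
        while stack:
            s = stack.pop()
            for t in self.delta.get((s, EPS), set()):
                if t not in closure:
                    closure.add(t)
                    stack.append(t)
        return closure

    def accepts(self, w):
        """Simulate the NFA on w by following every possible path at once."""
        current = self.eps_closure({self.start})
        for a in w:
            moved = set()
            for s in current:
                moved |= self.delta.get((s, a), set())
            current = self.eps_closure(moved)
        return bool(current & self.accepting)

# A four-state NFA for (a|b)*abb
nfa = NFA(
    states={0, 1, 2, 3},
    alphabet={"a", "b"},
    delta={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    start=0,
    accepting={3},
)
print(nfa.accepts("babb"), nfa.accepts("abba"))  # True False
```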
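Conversion of an NFA into a DFA
• The subset construction algorithm converts an NFA into a DFA using:
  – $\varepsilon\text{-closure}(s) = \{s\} \cup \{t \mid s \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} t\}$
  – $\varepsilon\text{-closure}(T) = \bigcup_{s \in T} \varepsilon\text{-closure}(s)$
  – $\mathit{move}(T, a) = \{t \mid s \xrightarrow{a} t \text{ and } s \in T\}$
• The algorithm produces:
  – Dstates, the set of states of the new DFA, each consisting of a set of states of the NFA
  – Dtran, the transition table of the new DFA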
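A minimal Python sketch of the subset construction, built from the three operations above and producing Dstates and Dtran. The NFA encoding (a dict from (state, symbol) to a set of states, with `""` as the ε label), the function names, and the Thompson-style example NFA for (a|b)*abb are assumptions for illustration; empty target sets are simply left out of Dtran rather than mapped to an explicit dead state.

```python
from collections import deque

EPS = ""  # label for epsilon-transitions in this sketch

def eps_closure(T, delta):
    """epsilon-closure(T): NFA states reachable from T on epsilon-edges alone."""
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(T, a, delta):
    """move(T, a): NFA states reachable from some s in T on one a-edge."""
    return {t for s in T for t in delta.get((s, a), ())}

def subset_construction(delta, start, alphabet):
    """Return (Dstates, Dtran, start state of the DFA)."""
    d_start = eps_closure({start}, delta)
    dstates, dtran = {d_start}, {}
    unmarked = deque([d_start])
    while unmarked:
        T = unmarked.popleft()  # mark T
        for a in sorted(alphabet):
            U = eps_closure(move(T, a, delta), delta)
            if not U:
                continue  # no transition on a from T
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran, d_start

# Thompson-style NFA for (a|b)*abb with accepting state 10
delta = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
dstates, dtran, d_start = subset_construction(delta, 0, {"a", "b"})
accepting = {T for T in dstates if 10 in T}  # DFA states containing an NFA final state
print(len(dstates), sorted(d_start))  # 5 [0, 1, 2, 4, 7]
```

For this NFA the construction yields five DFA states, with start state $\{0, 1, 2, 4, 7\}$ and a single accepting state (the one containing NFA state 10).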
Got it? Revisit the following questions
• Tokens
  – Lexical unit
  – Atomic parse element
  – Abstracted in the syntax, e.g. id
• Lexeme
  – Specific string making up a token
  – Value / attribute related to a token
  – Concrete in the language, e.g. Amt
• Specification of patterns for tokens
  – Alphabet $\Sigma$: a finite set of symbols
  – String $s$: a finite sequence of symbols from $\Sigma$
  – Language: a specific set of strings
Thank you very much!
Questions?