Introduction
CPSC 388
Ellen Walker
Hiram College
Why Learn About Compilers?
• Practical application of important computer science theory
• Ties together computer architecture and programming
• Useful tools for developing language interpreters
– Not just programming languages!
Computer Languages
• Machine language
– Binary numbers stored in memory
– Bits correspond directly to machine actions
• Assembly language
– A “symbolic face” for machine language
– Line-for-line translation
• High-level language (our goal!)
– Closer to human expressions of problems, e.g. mathematical notation
Assembler vs. HLL
• Assembler
Ldi $r1, 2 -- put the value 2 in register r1
Sto $r1, x -- store that value in x
• HLL
x = 2;
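A toy sketch of the translation above, assuming only statements of the form `name = integer;`. The function name `translate_assignment` is hypothetical, and `Ldi`/`Sto` are the slide's illustrative mnemonics, not a real instruction set.

```python
import re

# Matches only the toy form "<identifier> = <integer>;" (an assumption for this sketch).
ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*;\s*")


def translate_assignment(stmt: str) -> list[str]:
    """Translate one HLL assignment into the slide's two pseudo-instructions."""
    m = ASSIGN.fullmatch(stmt)
    if m is None:
        raise ValueError(f"unsupported statement: {stmt!r}")
    name, value = m.groups()
    return [
        f"Ldi $r1, {value}",  # load the immediate value into register r1
        f"Sto $r1, {name}",   # store register r1 into the variable's location
    ]


if __name__ == "__main__":
    for line in translate_assignment("x = 2;"):
        print(line)
```

One HLL line becomes two assembler lines here; for richer expressions the gap grows, which is the work a compiler automates.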
Characteristics of HLLs
• Easier to learn (and remember)
• Machine independent
– No knowledge of architecture needed
– … as long as there is a compiler for that machine!
Early Milestones
• FORTRAN (Formula Translation)
– IBM (John Backus), 1954-1957
– First widely used high-level language and compiler
• Chomsky Hierarchy (1950s)
– Formal description of natural language structure
– Ranks languages according to the complexity of their grammar
Chomsky Hierarchy
• Type 3: Regular languages
– Too simple for programming languages
– Good for tokens, e.g. numbers
• Type 2: Context-free languages
– Standard representation of programming languages
• Type 1: Context Sensitive Languages
• Type 0: Unrestricted
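Another View of the Hierarchy
[Figure: nested regions, with RL (regular languages) inside CFL (context-free languages) inside CSL (context-sensitive languages)]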
Formal Language & Automata Theory
• Machines that recognize each language class
– Turing Machine (unrestricted, i.e. recursively enumerable, languages)
– Pushdown Automaton (context-free languages)
– Finite Automaton (regular languages)
• Use machines to prove that a given language belongs to a class
• Formally prove that a given language does not belong to a class
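A sketch contrasting the two lower classes, with hypothetical helper names: a table-driven finite automaton for unsigned integers (the kind of number token mentioned above), and a stack-based recognizer for balanced brackets. Balanced brackets form a context-free language that is not regular: recognizing it requires tracking arbitrarily deep nesting, which a fixed, finite set of states cannot do, so the second function keeps a stack in the way a pushdown automaton does.

```python
# Finite automaton for unsigned integers (a regular language).
# Transition table: (state, character class) -> next state; missing entries reject.
TRANSITIONS = {
    ("start", "digit"): "number",
    ("number", "digit"): "number",
}
ACCEPTING = {"number"}


def char_class(ch: str) -> str:
    return "digit" if ch in "0123456789" else "other"


def dfa_accepts_integer(text: str) -> bool:
    """Simulate the DFA; its only memory is the current state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False  # no transition: the implicit dead state
    return state in ACCEPTING


# Balanced brackets (a context-free language) need unbounded memory,
# so this recognizer keeps an explicit stack, as a pushdown automaton does.
OPENERS = {")": "(", "]": "["}


def balanced(text: str) -> bool:
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in OPENERS:
            if not stack or stack.pop() != OPENERS[ch]:
                return False
    return not stack  # accept only if every opener was closed
```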
Practical Applications of Theory
• Translate from grammar to formal machine description
• Implement the formal machine to parse the language
• Tools:
– Scanner Generator (RL / FA): LEX, FLEX
– Parser Generator (CFL / PDA): YACC, Bison
Beyond Parsing
• Code generation
• Optimization
– Techniques to “mindlessly” improve code
– Usually after code generation
– Rarely “optimal”, simply better
Phases of a Compiler
• Scanner -> tokens
• Parser -> syntax tree
• Semantic Analyzer -> annotated tree
• Source code optimizer -> intermediate code
• Code generator -> target code
• Target code optimizer -> better target code
Additional Tables
• Symbol table
– Tracks all variable names and other symbols that will have to be mapped to addresses later
• Literal table
– Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
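A minimal sketch of both tables, under the simplifying assumption that each one only needs to give every distinct entry a stable index that later phases can use in place of the name or value. Real symbol tables also record attributes such as type, scope, and eventually address; the `Table` class is hypothetical.

```python
class Table:
    """Maps each distinct key to a small integer index, in first-seen order."""

    def __init__(self):
        self.index = {}    # key -> position
        self.entries = []  # position -> key

    def insert(self, key):
        """Return the existing index for key, or add it and return the new index."""
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append(key)
        return self.index[key]


symbols = Table()   # identifiers such as variable names
literals = Table()  # constants such as numbers and strings

for name in ["a", "j", "a"]:  # a repeated name is stored only once
    symbols.insert(name)
for value in [4, 2]:
    literals.insert(value)

print(symbols.entries)   # ['a', 'j']
print(literals.entries)  # [4, 2]
```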
Scanner
• Read a stream of characters
• Perform lexical analysis to generate tokens
• Update symbol and literal tables as needed
• Example:
Input: a[j] = 4 + 2
Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
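A regex-based scanner sketch that produces the token sequence above. It uses Python's `re` module in place of a generated (LEX/FLEX) scanner; the token names come from the slide, while the `SKIP` and `ERROR` categories are assumptions for skipping whitespace and reporting stray characters.

```python
import re

# Each alternative is a named group; the group name is the token kind.
# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUM", r"[0-9]+"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"[ \t\n]+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (kind, lexeme) pairs for the source string."""
    tokens = []
    for m in MASTER.finditer(source):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


if __name__ == "__main__":
    print(" ".join(kind for kind, _ in scan("a[j] = 4 + 2")))
```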
Parser
• Performs syntax analysis
• Relates the sequence of tokens to the grammar
• Builds a tree that represents this relationship, the parse tree
Partial Grammar
• assign-expr -> expr = expr
• array-expr -> ID [ expr ]
• expr -> array-expr
• expr -> expr + expr
• expr -> ID
• expr -> NUM
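A recursive-descent sketch for this grammar. Two assumptions to flag: the rule `expr -> expr + expr` is left-recursive (and ambiguous), so the sketch rewrites it as the loop `expr -> term { + term }`, grouping additions to the left; and the tree it returns uses hypothetical nested tuples such as `("add", left, right)` that drop the punctuation tokens, so it is closer to an abstract syntax tree than a full parse tree. Token kinds follow the scanner example.

```python
def parse_assign(tokens):
    """Recursive-descent parser for assign-expr -> expr = expr.

    tokens: list of (kind, lexeme) pairs, e.g. from a scanner.
    Returns an abstract syntax tree built from nested tuples.
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def term():
        # term -> ID [ expr ] | ID | NUM; one token of lookahead picks the rule
        if peek() == "NUM":
            return ("num", int(expect("NUM")))
        name = expect("ID")
        if peek() == "Lbrack":
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("array", ("id", name), index)
        return ("id", name)

    def expr():
        # expr -> term { + term }: a loop replaces the left-recursive rule
        # expr -> expr + expr and groups additions to the left
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("add", node, term())
        return node

    target = expr()
    expect("EQL")
    value = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} after end of statement")
    return ("assign", target, value)


if __name__ == "__main__":
    toks = [("ID", "a"), ("Lbrack", "["), ("ID", "j"), ("Rbrack", "]"),
            ("EQL", "="), ("NUM", "4"), ("PLUS", "+"), ("NUM", "2")]
    print(parse_assign(toks))
```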
Example Parse
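[Figure: parse tree for the example statement. The root assign-expression has children expression, =, expression. The left expression is an array-expression (ID [ expression ], where that expression is an ID); the right expression is an add-expression (expression + expression, each a NUM).]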
Abstract Syntax Tree
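[Figure: abstract syntax tree for the same statement. The root assign-expression has an array-expression child (with two ID leaves) and an add-expression child (with two NUM leaves); the tokens [ ] = + no longer appear.]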
Semantic Analyzer
• Determine the meaning (not structure) of the program
• This is “compile-time” or static semantics only
• Example: a[j] = 4 + 2
– a refers to an array location
– a contains integers
– j is an integer
– j is in the range of the array (not checked in C)
• The parse (syntax) tree is “decorated” with this information
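A sketch of the static checks listed above. The tree is encoded as hypothetical nested tuples such as `("array", ("id", "a"), ("id", "j"))`, and the declarations are assumed to be `int a[10]; int j;`. Each node is rebuilt with its type inserted as the second field, which is one simple way to “decorate” the tree. The range check on `j` is deliberately absent: the value of `j` is generally unknown at compile time, and C does not check it either.

```python
# Hypothetical declarations, as if from "int a[10]; int j;".
SYMBOLS = {"a": ("array", "int"), "j": "int"}


def annotate(node, symbols):
    """Return (decorated_node, type); each decorated node carries its static type."""
    kind = node[0]
    if kind == "num":
        return (kind, "int", node[1]), "int"
    if kind == "id":
        name = node[1]
        if name not in symbols:
            raise NameError(f"undeclared identifier {name!r}")
        t = symbols[name]
        return (kind, t, name), t
    if kind == "array":
        base, base_t = annotate(node[1], symbols)
        index, index_t = annotate(node[2], symbols)
        if not (isinstance(base_t, tuple) and base_t[0] == "array"):
            raise TypeError("subscripted value is not an array")
        if index_t != "int":
            raise TypeError("array index must be an int")
        elem_t = base_t[1]  # a[j] has the array's element type
        return (kind, elem_t, base, index), elem_t
    if kind == "add":
        left, lt = annotate(node[1], symbols)
        right, rt = annotate(node[2], symbols)
        if lt != "int" or rt != "int":
            raise TypeError("+ expects int operands")
        return (kind, "int", left, right), "int"
    if kind == "assign":
        target, tt = annotate(node[1], symbols)
        value, vt = annotate(node[2], symbols)
        if tt != vt:
            raise TypeError(f"cannot assign {vt} to {tt}")
        return (kind, tt, target, value), tt
    raise ValueError(f"unknown node kind {kind!r}")
```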
Source Code Optimizer
• Simplify and improve the source code by applying rules
– Constant folding: replace “4+2” by 6
– Combine common subexpressions
– Reorder expressions (often prior to constant folding)
– Etc.
• Result: modified, decorated syntax tree or Intermediate Representation
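A sketch of just one of the rules above, constant folding, applied bottom-up to a syntax tree encoded as hypothetical nested tuples (`("num", 4)`, `("id", "j")`, `("add", left, right)`, and so on). Folding children first lets an inner fold enable an outer one, but the sketch never reorders operands, so `(x + 1) + 2` is left unchanged; that case is what the reordering rule is for.

```python
def fold_constants(node):
    """Return a copy of the tree with additions of two constants replaced by their sum."""
    kind = node[0]
    if kind in ("num", "id"):
        return node  # leaves are unchanged
    children = [fold_constants(child) for child in node[1:]]  # fold bottom-up
    if kind == "add" and all(child[0] == "num" for child in children):
        return ("num", children[0][1] + children[1][1])
    return (kind, *children)


if __name__ == "__main__":
    tree = ("assign", ("array", ("id", "a"), ("id", "j")), ("add", ("num", 4), ("num", 2)))
    print(fold_constants(tree))
```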
Code Generator
• Generates code for the target machine
• Example:
– MOV R0, j -- value of j into R0
– MUL R0, 2 -- 2*j in R0 (an int occupies 2 words)
– MOV R1, &a -- address of a into R1
– ADD R1, R0 -- a+2*j in R1 (address of a[j])
– MOV *R1, 6 -- 6 into the address held in R1
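A sketch that emits exactly the sequence above from a tree in which `4 + 2` has already been folded to 6, for the one shape it handles (`array[id] = constant`). The tree is encoded as hypothetical nested tuples; the register names, mnemonics, and the 2-word `int` come from the slide's hypothetical machine. `gen_assign` is an illustrative name, and a real code generator would handle arbitrary expressions and allocate registers.

```python
WORDS_PER_INT = 2  # from the slide's example machine: an int occupies 2 words


def gen_assign(tree):
    """Emit pseudo-assembly for 'array[id] = constant' (the only shape handled)."""
    kind, target, value = tree
    if kind != "assign" or target[0] != "array" or value[0] != "num":
        raise NotImplementedError("only array[id] = constant is supported in this sketch")
    _, base, index = target
    if base[0] != "id" or index[0] != "id":
        raise NotImplementedError("array and index must be plain identifiers")
    return [
        f"MOV R0, {index[1]}",       # value of the index variable into R0
        f"MUL R0, {WORDS_PER_INT}",  # scale the index to a word offset
        f"MOV R1, &{base[1]}",       # address of the array into R1
        "ADD R1, R0",                # R1 now holds the address of the element
        f"MOV *R1, {value[1]}",      # store the constant through R1
    ]


if __name__ == "__main__":
    folded = ("assign", ("array", ("id", "a"), ("id", "j")), ("num", 6))
    print("\n".join(gen_assign(folded)))
```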
Target Code Optimizer
• Apply rules to improve machine code
• Example:
– MOV R0, j
– SHL R0 -- shift left to multiply by 2
– MOV &a[R0], 6 -- use a more complex machine instruction (indexed addressing) to replace simpler ones
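A peephole-optimizer sketch that slides a window over the instruction list and applies the two rewrites from this slide. It matches instructions as text with regular expressions, which is a simplification, and it assumes the address register (R1 here) is not used after the indexed store, since the combined instruction no longer leaves the element address there. The mnemonics are the slide's hypothetical ones; `peephole` is an illustrative name.

```python
import re

MUL_BY_2 = re.compile(r"MUL (R\d+), 2$")
LOAD_ADDR = re.compile(r"MOV (R\d+), &(\w+)$")
ADD_REG = re.compile(r"ADD (R\d+), (R\d+)$")
STORE_IND = re.compile(r"MOV \*(R\d+), (\w+)$")


def peephole(code):
    """Return a new instruction list with the two local rewrites applied."""
    out = []
    i = 0
    while i < len(code):
        # Rule 1: multiplying by 2 is a left shift by one bit.
        m = MUL_BY_2.match(code[i])
        if m:
            out.append(f"SHL {m.group(1)}")
            i += 1
            continue
        # Rule 2: load address + add offset + indirect store -> one indexed store.
        # Assumes the address register is dead afterwards.
        if i + 2 < len(code):
            a = LOAD_ADDR.match(code[i])
            b = ADD_REG.match(code[i + 1])
            c = STORE_IND.match(code[i + 2])
            if a and b and c and a.group(1) == b.group(1) == c.group(1):
                out.append(f"MOV &{a.group(2)}[{b.group(2)}], {c.group(2)}")
                i += 3
                continue
        out.append(code[i])  # no rule applies: keep the instruction
        i += 1
    return out


if __name__ == "__main__":
    before = ["MOV R0, j", "MUL R0, 2", "MOV R1, &a", "ADD R1, R0", "MOV *R1, 6"]
    print("\n".join(peephole(before)))
```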
Major Data Structures
• Tokens
• Syntax Tree
• Symbol Table
• Literal Table
• Intermediate Code
• Temporary files
Structuring a Compiler
• Analysis vs. Synthesis
– Analysis = understanding the source code
– Synthesis = generating the target code
• Front end vs. Back end
– Front end: scanning, parsing, semantic analysis & intermediate code generation (independent of the target machine)
– Back end: target code generation
• Optimization included in both parts
Multiple Passes
• Each pass processes the source code (or an intermediate form of it) once
– One pass per phase
– One pass for several phases
– One pass for the entire compilation
• Language definition can preclude one-pass compilation
Runtime Environments
• Static (e.g. FORTRAN)
– No pointers, no dynamic allocation, no recursion
– All memory allocation done prior to execution
• Stack-based (e.g. C family)
– Stack for nested allocation (call/return)
– Heap for arbitrary allocation (new)
• Fully dynamic (e.g. LISP)
– Allocation is automatic (not in source code)
– Garbage collection required
Error Handling
• Each phase finds and handles its own types of errors
– Scanning: malformed tokens, e.g. 1o1 (invalid ID)
– Parsing: syntax errors
– Semantic analysis: type errors
• Runtime errors are handled by the runtime environment
– Exception handling by the programmer is often allowed
Compiling the Compiler
• Using machine language
– Immediately executable, hard to write
– Necessary for the first (FORTRAN) compiler
• Using a language with an existing compiler and the same target machine
• Using the language to be compiled (bootstrapping)
Bootstrapping
• Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL)
• Write a complete compiler in the language subset
• Compile the complete compiler using the “quick & dirty” compiler