CADSL
Compiler Construction: An Introduction
Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: [email protected]
EE717/453: Advance Computing for Electrical Engineers
Lecture 17 (30 Sep 2013)
CADSLEE-717/EE-453@IITB 2
What is a compiler?
A program that reads a program written in one language and translates it into another language.
Source language → Target language
Traditionally, compilers go from high-level languages to low-level languages.
30 Sep 2012
Compilers
Common compilation tasks
● Language translation
● Error checking and reporting
● Performance improvement
Fundamental compilation principles
● The compiler must preserve the meaning of the source program
● The compiler must improve the source program in some discernible way
Compiler Architecture
Source Language → Front End (language specific) → Intermediate Language → Back End (machine specific) → Target Language
Separation of concerns; retargeting
In more detail:
Compiler Architecture
Source language
→ Scanner (lexical analysis) → tokens
→ Parser (syntax analysis) → syntactic structure
→ Semantic Analysis (IC generator) → intermediate language
→ Code Optimizer → intermediate language
→ Code Generator → target language
Symbol Table: shared by the phases
Translation of an assignment statement
[Figure: translation of an assignment statement]
Lexical Analysis
Character stream → token stream
● Recognize “words” of a language
● Theoretical problem: specify and recognize patterns in strings
● Scanner as a practical application
● Regular expressions, finite automata
● Tools that automatically generate scanners are commonly used
Input: index := start + step * 20
After scanning: index := start + step * 20, with each lexeme labelled as an identifier, operator, or number
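The scanning step can be sketched with regular expressions. This is a minimal illustration, not the lecture's scanner: the token classes come from the slide, while the patterns and the names `TOKEN_SPEC`, `MASTER`, and `scan` are assumptions made for this toy language.

```python
import re

# Token classes from the slide; the regular expressions are assumptions
# chosen for this toy language.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r":=|[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Turn a character stream into a list of (token_class, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":             # whitespace produces no token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = scan("index := start + step * 20")
```

Scanner generators do essentially this at scale: they compile the regular expressions into a finite automaton instead of trying alternatives one by one.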
Syntactical Analysis
Token stream → syntax tree
● Recognize “sentences” of a language
● Grammars and parsers
● CFG (context-free grammars)
● Parsers can be automatically generated
● Top-down and bottom-up parsing
● Predictive parsing
● Driving process of compiler front-ends
After scanning: index := start + step * 20
After parsing:
Assign
  ID (index)
  :=
  Exp
    Exp
      ID (start)
    +
    Exp
      Exp
        ID (step)
      *
      Exp
        Num (20)
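A top-down (recursive-descent) parser for the assignment above might look as follows. This is a sketch under assumptions: the grammar (Assign → ID := Exp, with * binding tighter than +) and the tuple-shaped tree are chosen for illustration, and the token list is written by hand so the example stands alone.

```python
def parse_assign(tokens):
    """Parse ID := Exp from (class, lexeme) pairs; returns a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def take(kind, lexeme=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (lexeme is not None and tok[1] != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {tok[1]!r}")
        pos += 1
        return tok[1]

    def factor():                             # Factor -> Num | ID
        if peek()[0] == "number":
            return ("Num", int(take("number")))
        return ("ID", take("identifier"))

    def term():                               # Term -> Factor { * Factor }
        node = factor()
        while peek() == ("operator", "*"):
            take("operator")
            node = ("*", node, factor())
        return node

    def exp():                                # Exp -> Term { + Term }
        node = term()
        while peek() == ("operator", "+"):
            take("operator")
            node = ("+", node, term())
        return node

    target = take("identifier")
    take("operator", ":=")
    tree = ("Assign", ("ID", target), exp())
    take("eof")                               # reject trailing tokens
    return tree

tokens = [("identifier", "index"), ("operator", ":="), ("identifier", "start"),
          ("operator", "+"), ("identifier", "step"), ("operator", "*"),
          ("number", "20")]
tree = parse_assign(tokens)
```

Each function decides what to do by looking only at the next token, which is what makes the parser predictive.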
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.
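Both rules above (integer array indices, int-to-float coercion) can be sketched as a small type checker. The tuple node shapes, the `inttofloat` marker, and the `symtab` dictionary are hypothetical names for this illustration; a real compiler would annotate its own tree structure.

```python
def check(node, symtab):
    """Return (type, annotated_node) for a tuple AST; raise TypeError on misuse."""
    kind = node[0]
    if kind == "Num":
        return ("float" if isinstance(node[1], float) else "int"), node
    if kind == "ID":
        return symtab[node[1]], node
    if kind == "Index":                       # ("Index", array_name, index_expr)
        idx_type, idx = check(node[2], symtab)
        if idx_type != "int":
            raise TypeError("array index must be an integer")
        return symtab[node[1]], ("Index", node[1], idx)
    # Binary arithmetic operator: ("+", left, right) or ("*", left, right)
    lt, left = check(node[1], symtab)
    rt, right = check(node[2], symtab)
    if lt == rt:
        return lt, (kind, left, right)
    # Mixed int/float operands: coerce the integer side to float.
    if lt == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return "float", (kind, left, right)

# Assumed declarations: step is a float, a is an array of floats.
symtab = {"step": "float", "a": "float"}
typ, annotated = check(("*", ("ID", "step"), ("Num", 20)), symtab)
```

Here `step * 20` gets type float, and the integer constant is wrapped in an explicit conversion node that later phases can translate.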
Semantic Analysis
Understand/annotate meaning of the program
● Syntax-directed translation
● Check semantic errors
  ● Inconsistent variable definitions and uses
  ● Type systems
● Collect knowledge of the input program
  ● Symbol tables
  ● Scopes
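One common way to combine symbol tables and scopes is a stack of dictionaries. The class below is a minimal sketch; the name `SymbolTable` and its methods are assumptions for illustration, not an interface from the lecture.

```python
class SymbolTable:
    """A stack of dictionaries: one dictionary per open scope."""

    def __init__(self):
        self.scopes = [{}]                    # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        if name in self.scopes[-1]:
            raise NameError(f"{name} already declared in this scope")
        self.scopes[-1][name] = type_

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost scope first
            if name in scope:
                return scope[name]
        raise NameError(f"{name} used but not declared")

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")                   # shadows the outer x
inner = table.lookup("x")
table.exit_scope()
outer = table.lookup("x")
```

The two `NameError` cases correspond to the "inconsistent variable definitions and uses" that semantic analysis reports.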
Intermediate Code Generation
Representation of the input program
● Internal to the compiler
● Encodes knowledge collected during compilation
● Varied forms and levels
  ● Typically a compiler uses more than one kind of intermediate representation (IR)
● Desired properties
  ● Easy to produce, manipulate, and translate into the target code
IR scheme
• front end produces IR
• optimizer transforms IR to more efficient program
• back end transforms IR to target code
Kinds of IR
● Abstract syntax trees (AST)
● Linear operator form of tree (e.g., postfix notation)
● Directed acyclic graphs (DAG)
● Control flow graphs (CFG)
● Program dependence graphs (PDG)
● Static single assignment form (SSA)
● 3-address code
● Hybrid combinations
Categories of IR
Structural
● graphically oriented (trees, DAGs)
● nodes and edges tend to be large
● heavily used in source-to-source translators
Linear
● pseudo-code for an abstract machine
● large variation in level of abstraction
● simple, compact data structures
● easier to rearrange
Hybrid
● combination of graphs and linear code (e.g. CFGs)
● attempt to achieve the best of both worlds
Important IR properties
● Ease of generation
● Ease of manipulation
● Cost of manipulation
● Level of abstraction
● Freedom of expression (!)
● Size of typical procedure
● Original or derivative
Subtle design decisions in the IR can have far-reaching effects on the speed and effectiveness of the compiler! The degree of exposed detail can be crucial.
Abstract Syntax Tree
An AST is a parse tree with the nodes for most non-terminals removed.
[Figure: AST for x – 2 * y]
A linear operator form of this tree (postfix) would be:
x 2 y * -
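The postfix form is what a post-order walk of the AST produces. A minimal sketch, assuming the AST is encoded as nested tuples `(op, left, right)` with plain values as leaves (an encoding chosen here for brevity):

```python
def to_postfix(node):
    """Post-order walk: operands first, then the operator."""
    if isinstance(node, tuple):               # interior node: (op, left, right)
        op, left, right = node
        return to_postfix(left) + to_postfix(right) + [op]
    return [str(node)]                        # leaf: a name or a constant

ast = ("-", "x", ("*", 2, "y"))               # x - 2 * y
postfix = " ".join(to_postfix(ast))
```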
Directed Acyclic Graph
A DAG is an AST with unique, shared nodes for each value.
x := 2 * y + sin(2*x)
z := x / 2
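One way to get the unique, shared nodes is hash-consing: before creating a node, look it up by its operator and children. The sketch below builds the DAG for the two statements above; `DagBuilder` and the leaf name `x0` (the value of x before the first assignment) are assumptions for this illustration.

```python
class DagBuilder:
    """Hash-consing: structurally identical nodes are created only once."""

    def __init__(self):
        self.nodes = {}                       # (op, child ids...) -> node id

    def node(self, op, *children):
        key = (op, *children)
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

dag = DagBuilder()
# x := 2 * y + sin(2 * x)   (x0 is the value of x before the assignment)
two, y, x0 = dag.node("2"), dag.node("y"), dag.node("x0")
x1 = dag.node("+", dag.node("*", two, y), dag.node("sin", dag.node("*", two, x0)))
# z := x / 2 reuses the node for the new x and the shared constant 2
z = dag.node("/", x1, dag.node("2"))
```

The constant 2 appears three times in the source but only once in the graph. An AST for the same code would duplicate it.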
Control Flow Graph
A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)
if x = y then
  S1
else
  S2
end
S3
3-address code
Statements take the form: x = y op z
● single operator and at most three names
x – 2 * y translates to:
t1 = 2 * y
t2 = x – t1
Advantages:
● Compact form
● Names for intermediate values
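The translation from an expression tree to 3-address code can be sketched as a recursive walk that gives every intermediate value a name. The tuple encoding of the tree and the `t1, t2, …` naming scheme are assumptions for this example.

```python
def gen_tac(node, code):
    """Emit 3-address statements for a tuple AST; return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                      # leaf: a variable or a constant
    op, left, right = node
    lhs = gen_tac(left, code)
    rhs = gen_tac(right, code)
    temp = f"t{len(code) + 1}"                # fresh temporary (hypothetical naming)
    code.append(f"{temp} = {lhs} {op} {rhs}")
    return temp

code = []
result = gen_tac(("-", "x", ("*", 2, "y")), code)
```

After the call, `code` holds the two statements from the slide and `result` names the temporary that carries the value of the whole expression.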
Typical 3-address codes
assignments: x = y op z; x = op y; x = y[i]; x = y
branches: goto L
conditional branches: if x relop y goto L
procedure calls: param x; param y; call p
address and pointer assignments: x = &y; *y = z
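Branch instructions are what split linear 3-address code into basic blocks. The sketch below uses the usual "leader" rule (the first instruction, every branch target, and every instruction after a branch starts a block) and then connects the blocks into a CFG. The `(label, text)` instruction encoding and the function name `build_cfg` are assumptions for this illustration.

```python
def build_cfg(instrs):
    """Split labelled 3-address code into basic blocks and connect them.

    instrs is a list of (label_or_None, text) pairs.
    """
    labels = {lab: i for i, (lab, _) in enumerate(instrs) if lab}
    leaders = {0}
    for i, (_, text) in enumerate(instrs):
        if "goto" in text:
            leaders.add(labels[text.split()[-1]])      # branch target
            if i + 1 < len(instrs):
                leaders.add(i + 1)                     # instruction after a branch
    starts = sorted(leaders)
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(instrs)])]
    block_of = {s: b for b, s in enumerate(starts)}
    edges = set()
    for b, block in enumerate(blocks):
        text = instrs[block[-1]][1]
        if "goto" in text:
            edges.add((b, block_of[labels[text.split()[-1]]]))
        if not text.startswith("goto") and b + 1 < len(blocks):
            edges.add((b, b + 1))                      # fall through
    return blocks, edges

# if x = y then S1 else S2 end; S3
program = [
    (None, "if x == y goto L1"),
    (None, "S2"),
    (None, "goto L2"),
    ("L1", "S1"),
    ("L2", "S3"),
]
blocks, edges = build_cfg(program)
```

The result is the familiar diamond: the test block branches to the S1 and S2 blocks, and both flow into the block holding S3.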
IR choices
Other hybrids exist
● combinations of graphs and linear codes
● CFG with 3-address code for basic blocks
Many variants used in practice
● no widespread agreement
● compilers may need several different IRs!
Advice:
● choose IR with right level of detail
● keep manipulation costs in mind
Static Single Assignment Form
Goal: simplify procedure-global optimizations
Definition: a program is in SSA form if every variable is assigned only once
Static Single Assignment (SSA)
Each assignment to a temporary is given a unique name
● All uses reached by that assignment are renamed
● Compact representation
● Useful for many kinds of compiler optimization …
Ron Cytron, et al., “Efficiently computing static single assignment form and the control dependence graph,” ACM TOPLAS, 1991.
Original:
x := 3;
x := x + 1;
x := 7;
x := x*2;
SSA:
x1 := 3;
x2 := x1 + 1;
x3 := 7;
x4 := x3*2;
Why Static?
● We only look at the static program
● One assignment per variable in the program
At runtime variables are assigned multiple times!
Example: Sequence
Easy to do for sequential programs:
Original          SSA
a := b + c        a1 := b1 + c1
b := c + 1        b2 := c1 + 1
d := b + c        d1 := b2 + c1
a := a + 1        a2 := a1 + 1
e := a + b        e1 := a2 + b2
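For straight-line code the renaming is mechanical: keep one version counter per variable, rename the uses on the right-hand side first, then bump the counter for the target. A minimal sketch, assuming statements are strings of the form `target := expression` and that a variable read before any assignment gets version 1 (which matches b1 and c1 in the example):

```python
import re

def rename_straight_line(stmts):
    """SSA-rename a list of 'target := expression' statements."""
    version = {}

    def use(match):
        name = match.group()
        version.setdefault(name, 1)           # value live on entry gets version 1
        return f"{name}{version[name]}"

    out = []
    for stmt in stmts:
        target, expr = (part.strip() for part in stmt.split(":="))
        expr = re.sub(r"[A-Za-z_]\w*", use, expr)      # rename uses first
        version[target] = version.get(target, 0) + 1   # then create a new definition
        out.append(f"{target}{version[target]} := {expr}")
    return out

ssa = rename_straight_line([
    "a := b + c",
    "b := c + 1",
    "d := b + c",
    "a := a + 1",
    "e := a + b",
])
```

This only works without branches. As soon as two definitions can reach the same use, a single counter is no longer enough.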
Example: Condition
Conditions: what to do on control-flow merge?
Original          SSA
if B then         if B then
  a := b            a1 := b
else              else
  a := c            a2 := c
end               end
… a …             … a? …
Solution: Φ-Function
Original          SSA
if B then         if B then
  a := b            a1 := b
else              else
  a := c            a2 := c
end               end
                  a3 := Φ(a1,a2)
… a …             … a3 …
The Φ-Function
Φ-functions are always at the beginning of a basic block
Select between values depending on control-flow
a_{k+1} := Φ(a_1, …, a_k): the block has k preceding blocks
Φ-functions are evaluated simultaneously within a basic block.
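Simultaneous evaluation matters when Φ-functions in the same block refer to each other's targets, for example in a loop that swaps two variables. The interpreter fragment below is a sketch: `run_phis`, the `phis` dictionary, and `pred_index` (which predecessor control came from) are names invented for this illustration.

```python
def run_phis(phis, env, pred_index):
    """Evaluate all Φ-functions at the head of a block in one parallel step.

    phis maps each target to its argument list (one argument per predecessor);
    pred_index says which predecessor control arrived from.
    """
    # Read every selected argument before writing any target.
    selected = {target: env[args[pred_index]] for target, args in phis.items()}
    env.update(selected)
    return env

# Loop header of a swap loop: a2 := Φ(a1, b2); b2 := Φ(b1, a2)
phis = {"a2": ["a1", "b2"], "b2": ["b1", "a2"]}
env = {"a1": 1, "b1": 2, "a2": 1, "b2": 2}
run_phis(phis, env, pred_index=1)             # arrive via the back edge
```

Evaluated one after the other, both a2 and b2 would end up as 2. Reading all arguments first gives the intended swap.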
SSA and CFG
SSA is normally used for control-flow graphs (CFG)
Basic blocks are in 3-address form
Recall: Control flow graph
A CFG models transfer of control in a program
● nodes are basic blocks (straight-line blocks of code)
● edges represent control flow (loops, if/else, goto …)
© Marcus Denker
if x = y then
  S1
else
  S2
end
S3
SSA: a Simple Example
if B then
  a1 := 1
else
  a2 := 2
end
a3 := Φ(a1,a2)
… a3 …
Recall: IR
• front end produces IR
• optimizer transforms IR to more efficient program
• back end transforms IR to target code
SSA as IR
Transforming to SSA
Problem: Performance / Memory
● Minimize number of inserted Φ-functions
● Do not spend too much time
Many relatively complex algorithms
● We do not go too much into detail
● See literature!
Minimal SSA
Two steps:
● Place Φ-functions
● Rename variables
Where to place Φ-functions?
We want the minimal number of Φ-functions
● Save memory
● Algorithms will work faster
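The slides leave placement to the literature. One standard answer, from the Cytron et al. paper cited earlier, is to place Φ-functions on the iterated dominance frontier of the blocks that define a variable. The sketch below is a simplified illustration and not the paper's algorithm as published: it computes dominators by plain set iteration, and it assumes every block is reachable from `entry` and that `entry` has no predecessors. The names `dominance_frontiers`, `place_phis`, and the `succ` dictionary encoding are hypothetical.

```python
def dominance_frontiers(succ, entry):
    """Dominance frontier of every block in a CFG given as a successor map."""
    nodes = list(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:                            # iterative dominator sets
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    # The immediate dominator is the strict dominator with the most dominators.
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in nodes if n != entry}
    df = {n: set() for n in nodes}
    for n in nodes:
        if len(preds[n]) >= 2:                # only join points appear in frontiers
            for p in preds[n]:
                runner = p
                while runner != idom[n]:
                    df[runner].add(n)
                    runner = idom[runner]
    return df

def place_phis(succ, entry, def_blocks):
    """Blocks needing a Φ for one variable: the iterated dominance frontier."""
    df = dominance_frontiers(succ, entry)
    phi_blocks, work = set(), list(def_blocks)
    while work:
        for y in df[work.pop()]:
            if y not in phi_blocks:
                phi_blocks.add(y)
                work.append(y)                # a Φ is itself a new definition
    return phi_blocks

# The if/else diamond: a is assigned in both branches.
succ = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
phi_blocks = place_phis(succ, "entry", {"then", "else"})
```

For the diamond, the only block that receives a Φ for `a` is the join block, which is where a3 := Φ(a1,a2) appears in the earlier example.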
Optimization: The Idea
Transform the program to improve efficiency
● Performance: faster execution
● Size: smaller executable, smaller memory footprint
Tradeoffs:
1) Performance vs. size
2) Compilation speed and memory
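No Magic Bullet!
There is no perfect optimizer
Example: optimize for simplicity
● Opt(P): the smallest program equivalent to P
● Q: a program with no output that does not stop
● Opt(Q)?
The smallest such program is an empty infinite loop (e.g., L1: goto L1), so a perfect Opt would have to recognize every non-terminating program, which means solving the halting problem.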
Optimization on many levels
Optimizations both in the optimizer and back-end
© Marcus Denker
Examples for Optimizations
● Constant Folding / Propagation
● Copy Propagation
● Algebraic Simplifications
● Strength Reduction
● Dead Code Elimination
  ● Structure Simplifications
● Loop Optimizations
● Partial Redundancy Elimination
● Code Inlining
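As one concrete instance from the list, constant folding and propagation can be sketched over straight-line 3-address code. The `(dest, a, op, b)` tuple encoding and the name `fold_constants` are assumptions for this example, and the sketch deliberately ignores branches: it assumes folded temporaries are used only within this one block.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Constant propagation and folding over straight-line (dest, a, op, b) tuples."""
    known, out = {}, []
    for dest, a, op, b in code:
        a, b = known.get(a, a), known.get(b, b)        # propagate known constants
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)                # fold at compile time
        else:
            known.pop(dest, None)                      # dest is no longer constant
            out.append((dest, a, op, b))
    return out, known

code = [
    ("t1", 2, "*", 10),        # t1 = 2 * 10
    ("t2", "x", "+", "t1"),    # t2 = x + t1
    ("t3", "t1", "-", 5),      # t3 = t1 - 5
]
optimized, constants = fold_constants(code)
```

Three instructions shrink to one, `t2 = x + 20`, and the values of t1 and t3 are known at compile time. Removing the now-unused computations is the job of dead code elimination.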