Instruction Selection
Mooly Sagiv
Schrierber 317, 03-640-7606
Wed 10:00-12:00
http://www.math.tau.ac.il/~msagiv/courses/wcc.html
Already Studied
Source program (string)
  → lexical analysis → Tokens
  → syntax analysis → Abstract syntax tree
  → semantic analysis → Abstract syntax tree
  → Translate → Tree IR
  → Canon → Canonical Tree IR
Instruction Selection
• Input:
  – Canonical IR
  – Description of translation rules from IR into machine language
• Output:
  – Machine code
• The output may use an unbounded number of registers (temporaries)
• Some prologue and epilogue instructions are still missing
LABEL(l3)
CJUMP(EQ, TEMP t128, CONST 0, l0, l1)
LABEL(l1)
MOVE(TEMP t131, TEMP t128)
MOVE(TEMP t130, CALL(nfactor, BINOP(MINUS, TEMP t128, CONST 1)))
MOVE(TEMP t129, BINOP(TIMES, TEMP t131, TEMP t130))
LABEL(l2)
MOVE(TEMP t103, TEMP t129)
JUMP(NAME lend)
LABEL(l0)
MOVE(TEMP t129, CONST 1)
JUMP(NAME l2)
l3: beq t128, $0, l0
l1: or t131, $0, t128
addi t132, t128, -1
or $4, $0, t132
jal nfactor
or t130, $0, $2
or t133, $0, t131
mult t133, t130
mflo t133
or t129, $0, t133
l2: or t103, $0, t129
b lend
l0: addi t129, $0, 1
b l2
The Challenge
• “Clumps” of trees can be translated into a single machine instruction:
  MOVE(TEMP t1, MEM(BINOP(PLUS, TEMP t2, CONST c)))
  is translated into
  lw t1, c(t2)
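A minimal sketch of recognizing that one clump, assuming a hypothetical node type (Node, with kinds MOVE, MEM, PLUS, TEMP, CONST) rather than the compiler's own tree representation. The hypothetical matcher match_lw checks the whole shape at once and prints the single lw instruction:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IR node with just enough kinds for this one clump. */
typedef enum { MOVE, MEM, PLUS, TEMP, CONST } Kind;
typedef struct Node {
    Kind kind;
    int value;            /* temp number for TEMP, constant for CONST */
    struct Node *kid[2];  /* children; unused slots are NULL */
} Node;

/* If s is MOVE(TEMP t1, MEM(PLUS(TEMP t2, CONST c))), print the single MIPS
   instruction "lw t1, c(t2)" into out and return 1; otherwise return 0. */
int match_lw(const Node *s, char *out, size_t n) {
    assert(s != NULL);
    if (s->kind != MOVE) return 0;
    const Node *dst = s->kid[0], *src = s->kid[1];
    if (dst->kind != TEMP || src->kind != MEM) return 0;
    const Node *addr = src->kid[0];
    if (addr->kind != PLUS) return 0;
    const Node *base = addr->kid[0], *off = addr->kid[1];
    if (base->kind != TEMP || off->kind != CONST) return 0;
    snprintf(out, n, "lw t%d, %d(t%d)", dst->value, off->value, base->value);
    return 1;
}
```

For MOVE(TEMP t1, MEM(BINOP(PLUS, TEMP t2, CONST 8))) it produces lw t1, 8(t2); any other shape is rejected and would have to be covered by several smaller tiles.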
Outline
• The “Tiling” problem
• An optimal solution
• An optimum solution (via dynamic programming)
• Tree grammars
• The Pentium architecture
• Instruction selection for Tiger
  – Abstract data type for machine instructions
Instruction Set in the Jouette Machine
Name    Effect
ADD     ri ← rj + rk
MUL     ri ← rj * rk
SUB     ri ← rj - rk
DIV     ri ← rj / rk
ADDI    ri ← rj + c
SUBI    ri ← rj - c
LOAD    ri ← M[rj + c]
STORE   M[ri + c] ← rj
MOVEM   M[ri] ← M[rj]
• Register r0 always contains 0
Tree Patterns for Jouette Machine
Name    Effect           Trees
(none)  ri               TEMP
ADD     ri ← rj + rk     +(e1, e2)
MUL     ri ← rj * rk     *(e1, e2)
SUB     ri ← rj - rk     -(e1, e2)
DIV     ri ← rj / rk     /(e1, e2)
Tree Patterns for Jouette Machine (cont.)
Name    Effect            Trees
ADDI    ri ← rj + c       +(e, CONST c)
                          +(CONST c, e)
                          CONST c
SUBI    ri ← rj - c       -(e, CONST c)
LOAD    ri ← M[rj + c]    MEM(+(e, CONST c))
                          MEM(+(CONST c, e))
                          MEM(CONST c)
                          MEM(e)
STORE   M[ri + c] ← rj    MOVE(MEM(+(e1, CONST c)), e2)
                          MOVE(MEM(+(CONST c, e1)), e2)
                          MOVE(MEM(CONST c), e2)
                          MOVE(MEM(e1), e2)
MOVEM   M[ri] ← M[rj]     MOVE(MEM(e1), MEM(e2))
The Tiling Problem
• Cover the tree with non-overlapping tiles from the tree patterns
• Minimize “the cost” of the generated code
Example
• Tiger input: a[e] := x
• IR tree:
MOVE(MEM(BINOP(PLUS, MEM(BINOP(PLUS, TEMP FP, CONST -8)),
                     BINOP(TIMES, TEMP te, CONST 4))),
     MEM(BINOP(PLUS, TEMP FP, CONST -4)))
• One tiling:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
LOAD  r2 ← M[FP + -4]
STORE M[r1 + 0] ← r2
• Another tiling:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← FP + -4
MOVEM M[r1] ← M[r2]
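Both tilings should leave the same value in a[e]. A small Jouette interpreter makes that checkable. It is only a sketch: the names Machine, Instr, and run, the word-per-address memory array, and the register numbers chosen for FP and te are all assumptions. As in the instruction set, r0 always reads as 0.

```c
#include <assert.h>

/* Toy Jouette interpreter (a sketch). Memory is an int array indexed
   directly by address; its size is an arbitrary assumption. */
enum { NREGS = 32, MEMSIZE = 256 };
typedef enum { ADD, MUL, SUB, DIV, ADDI, SUBI, LOAD, STORE, MOVEM } Op;

/* i, j, k are register numbers and c an immediate, as in the Effect column:
   e.g. LOAD ri <- M[rj + c], STORE M[ri + c] <- rj, MOVEM M[ri] <- M[rj]. */
typedef struct { Op op; int i, j, k, c; } Instr;
typedef struct { int r[NREGS]; int mem[MEMSIZE]; } Machine;

/* Bounds-checked access to the memory cell at address a. */
static int *cell(Machine *m, int a) {
    assert(a >= 0 && a < MEMSIZE);
    return &m->mem[a];
}

static void step(Machine *m, Instr in) {
    int *r = m->r;
    switch (in.op) {
    case ADD:   r[in.i] = r[in.j] + r[in.k]; break;
    case MUL:   r[in.i] = r[in.j] * r[in.k]; break;
    case SUB:   r[in.i] = r[in.j] - r[in.k]; break;
    case DIV:   assert(r[in.k] != 0); r[in.i] = r[in.j] / r[in.k]; break;
    case ADDI:  r[in.i] = r[in.j] + in.c; break;
    case SUBI:  r[in.i] = r[in.j] - in.c; break;
    case LOAD:  r[in.i] = *cell(m, r[in.j] + in.c); break;
    case STORE: *cell(m, r[in.i] + in.c) = r[in.j]; break;
    case MOVEM: *cell(m, r[in.i]) = *cell(m, r[in.j]); break;
    }
    r[0] = 0;  /* r0 always reads as 0: discard any write to it */
}

/* Execute n instructions in order (there are no jumps in this fragment). */
void run(Machine *m, const Instr *code, int n) {
    for (int p = 0; p < n; p++) step(m, code[p]);
}
```

With FP assumed to be r30 and te assumed to be r10, the first tiling is encoded as {LOAD, 1, 30, 0, -8}, {ADDI, 2, 0, 0, 4}, {MUL, 2, 10, 2, 0}, {ADD, 1, 1, 2, 0}, {LOAD, 2, 30, 0, -4}, {STORE, 1, 2, 0, 0}. Placing the base of a at address 40, e = 3, and x = 77, both sequences store 77 at address 52.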
The Tiling Problem (cont.)
• Ensure that every tree can be covered
  – Provide tree patterns for all the “tiny” tiles (one tile per individual IR node)
• The same a[e] := x tree covered with tiny tiles:
ADDI  r1 ← r0 + -8
ADD   r1 ← FP + r1
LOAD  r1 ← M[r1 + 0]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← r0 + -4
ADD   r2 ← FP + r2
LOAD  r2 ← M[r2 + 0]
STORE M[r1 + 0] ← r2
Optimal vs. Optimum Tiling
• Optimum tiling
  – The sum of the tile costs is minimal
• Optimal tiling
  – No two adjacent tiles can be combined into a single tile of lower cost
[Figures: two tilings of the a[e] := x tree, repeated from the previous slides: the six-instruction tiling and a tiling built mostly from tiny tiles]
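Optimum Tiling
• If every instruction costs 1, both of the following tilings cost 6, and no tiling of this tree is cheaper:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
LOAD  r2 ← M[FP + -4]
STORE M[r1 + 0] ← r2

LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← FP + -4
MOVEM M[r1] ← M[r2]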
RISC vs. CISC Machines
Feature               RISC                            CISC
Registers             32                              6, 8, 16
Register classes      One                             Some
Arithmetic operands   Registers                       Memory + registers
Instructions          3-address                       2-address
Addressing modes      r, M[r + c] (load/store only)   Several
Instruction length    32 bits                         Variable
Side effects          None                            Some
Instruction cost      “Uniform”                       Varied
Architecture and Tiling Algorithm
• RISC
  – The cost of operations is uniform
  – Optimal tiling usually suffices
• CISC
  – Optimum tiling may be significantly better
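Optimal Tiling using “Maximal Munch”
• Top-down traversal of the IR tree
• At every node, try the relevant tree patterns in “cost order” (largest tile first)
• Tiles are chosen in the reverse of execution order; emitting each instruction after munching its subtrees puts the code back in execution order
• Tiny tiles guarantee that we can never get stuck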
#include <assert.h>
/* Assumes the T_stm/T_exp tree types, munchExp(), and emit() from the surrounding compiler. */

static void munchStm(T_stm s) {
  switch (s->kind) {
  case T_MOVE: {
    T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
    if (dst->kind == T_MEM) {
      T_exp addr = dst->u.MEM;
      if (addr->kind == T_BINOP && addr->u.BINOP.op == T_PLUS &&
          addr->u.BINOP.right->kind == T_CONST) {
        /* MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2) */
        T_exp e1 = addr->u.BINOP.left, e2 = src;
        munchExp(e1); munchExp(e2); emit("STORE");
      } else if (addr->kind == T_BINOP && addr->u.BINOP.op == T_PLUS &&
                 addr->u.BINOP.left->kind == T_CONST) {
        /* MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2) */
        T_exp e1 = addr->u.BINOP.right, e2 = src;
        munchExp(e1); munchExp(e2); emit("STORE");
      } else if (src->kind == T_MEM) {
        /* MOVE(MEM(e1), MEM(e2)) */
        T_exp e1 = addr, e2 = src->u.MEM;
        munchExp(e1); munchExp(e2); emit("MOVEM");
      } else {
        /* MOVE(MEM(e1), e2) */
        T_exp e1 = addr, e2 = src;
        munchExp(e1); munchExp(e2); emit("STORE");
      }
    } else if (dst->kind == T_TEMP) {
      /* MOVE(TEMP t, e) */
      T_exp e = src;
      munchExp(e); emit("ADD");
    } else {
      assert(0);
    }
    break;
  }
  /* JUMP, CJUMP, LABEL, ...: not shown on these slides */
  }
}
static void munchStm(T_stm s) {
  MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2)   munchExp(e1); munchExp(e2); emit("STORE");
  MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2)   munchExp(e1); munchExp(e2); emit("STORE");
  MOVE(MEM(e1), MEM(e2))                    munchExp(e1); munchExp(e2); emit("MOVEM");
  MOVE(MEM(e1), e2)                         munchExp(e1); munchExp(e2); emit("STORE");
  MOVE(TEMP t, e)                           munchExp(e); emit("ADD");
  JUMP(e)                                   …
  CJUMP(e)                                  …
  LABEL(l)
}
static void munchExp(T_exp e) {
  MEM(BINOP(PLUS, e1, CONST c))   munchExp(e1); emit("LOAD");
  MEM(BINOP(PLUS, CONST c, e1))   munchExp(e1); emit("LOAD");
  MEM(CONST c)                    emit("LOAD");
  MEM(e1)                         munchExp(e1); emit("LOAD");
  BINOP(PLUS, e1, CONST c)        munchExp(e1); emit("ADDI");
  BINOP(PLUS, CONST c, e1)        munchExp(e1); emit("ADDI");
  CONST c                         emit("ADDI");
  BINOP(PLUS, e1, e2)             munchExp(e1); munchExp(e2); emit("ADD");
  …
  TEMP t                          (no instruction needed)
}
Example
• Tiger input: a[e] := x
[Figure: the IR tree for a[e] := x, as above]
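The munchStm and munchExp skeletons above can be turned into a small runnable program. This sketch uses a hypothetical Node type instead of T_stm/T_exp, records mnemonics in a string instead of emitting real instructions, and handles only the patterns needed for the a[e] := x tree (statements of the form MOVE(MEM(...), ...)):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tree type: only the node kinds that occur in a[e] := x. */
typedef enum { MOVE, MEM, PLUS, TIMES, TEMP, CONST } Kind;
typedef struct Node { Kind kind; int value; struct Node *l, *r; } Node;

static char out[256];  /* emitted mnemonics, separated by spaces */

static void emit(const char *op) {
    if (out[0] != '\0') strcat(out, " ");
    strcat(out, op);
}

static int is(const Node *n, Kind k) { return n != NULL && n->kind == k; }

/* Largest matching tile first; children are munched before the parent's
   instruction is emitted, so the output is in execution order. */
static void munchExp(const Node *e) {
    switch (e->kind) {
    case MEM:
        if (is(e->l, PLUS) && is(e->l->r, CONST))      { munchExp(e->l->l); emit("LOAD"); }
        else if (is(e->l, PLUS) && is(e->l->l, CONST)) { munchExp(e->l->r); emit("LOAD"); }
        else if (is(e->l, CONST))                      { emit("LOAD"); }
        else                                           { munchExp(e->l); emit("LOAD"); }
        break;
    case PLUS:
        if (is(e->r, CONST))      { munchExp(e->l); emit("ADDI"); }
        else if (is(e->l, CONST)) { munchExp(e->r); emit("ADDI"); }
        else                      { munchExp(e->l); munchExp(e->r); emit("ADD"); }
        break;
    case TIMES: munchExp(e->l); munchExp(e->r); emit("MUL"); break;
    case CONST: emit("ADDI"); break;  /* ADDI ri <- r0 + c */
    case TEMP:  break;                /* already in a register: no instruction */
    default:    assert(0);
    }
}

/* Only statements of the form MOVE(MEM(a), src) are handled here. */
static void munchStm(const Node *s) {
    assert(is(s, MOVE) && is(s->l, MEM));
    const Node *a = s->l->l, *src = s->r;
    if (is(a, PLUS) && is(a->r, CONST))      { munchExp(a->l); munchExp(src); emit("STORE"); }
    else if (is(a, PLUS) && is(a->l, CONST)) { munchExp(a->r); munchExp(src); emit("STORE"); }
    else if (is(src, MEM))                   { munchExp(a); munchExp(src->l); emit("MOVEM"); }
    else                                     { munchExp(a); munchExp(src); emit("STORE"); }
}
```

On the a[e] := x tree it emits LOAD ADDI MUL ADD ADDI MOVEM. At the root, MOVEM covers three nodes (the MOVE and both MEMs), so maximal munch prefers it to the general STORE tile and ends up with the MOVEM tiling of this tree.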
Optimum Tiling
• Maximal munch does not necessarily produce optimum results
• The number of potential code sequences is quite big
• But dynamic programming yields an optimum solution in linear time (for a fixed set of tree patterns)
• Assign an optimum cost to every subtree
• Two-phase solution
  – Find the optimum cost of every subtree in a bottom-up traversal
  – Generate the optimum solution in a top-down traversal
  – Skip subtrees that are covered by the chosen tiles
Dynamic Programming
• For each subtree with root n
  – For each tile t of cost c that matches at n
    • Compute the cost of t as $c + \sum_i c_i$, where $c_i$ is the optimum cost of the i-th subtree left at a leaf of t
  – The cost of the subtree rooted at n is the minimum over all matching tiles
• Generate the optimum code during a top-down traversal
Example
• Tree: MEM(BINOP(PLUS, CONST 1, CONST 2))

CONST 1
Tile      Instruction  Tile cost  Leaves cost  Total cost
CONST c   ADDI         1          0            1

CONST 2
Tile      Instruction  Tile cost  Leaves cost  Total cost
CONST c   ADDI         1          0            1

BINOP(PLUS, CONST 1, CONST 2)
Tile                       Instruction  Tile cost  Leaves cost  Total cost
BINOP(PLUS, e1, e2)        ADD          1          1+1          3
BINOP(PLUS, e, CONST c)    ADDI         1          1            2
BINOP(PLUS, CONST c, e)    ADDI         1          1            2
MEM(BINOP(PLUS, CONST 1, CONST 2))
Tile                            Instruction  Tile cost  Leaves cost  Total cost
MEM(e)                          LOAD         1          2            3
MEM(BINOP(PLUS, e1, e2))        LOAD         1          1+1          3
MEM(BINOP(PLUS, e, CONST c))    LOAD         1          1            2
MEM(BINOP(PLUS, CONST c, e))    LOAD         1          1            2
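Top-Down Code Generation
• Best tiles found bottom-up (with costs): CONST 1: ADDI (1), CONST 2: ADDI (1), PLUS: ADDI (2), MEM: LOAD (2)
• At the root, the LOAD tile MEM(BINOP(PLUS, e, CONST c)) covers MEM, PLUS, and CONST 2, so only CONST 1 needs its own instruction:
ADDI r1 ← r0 + 1
LOAD r1 ← M[r1 + 2]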
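Both phases fit in a short C sketch. It assumes unit cost for every instruction, covers only the Jouette expression patterns (TEMP, CONST, +, *, MEM), and uses a hypothetical Node type; label is the bottom-up phase and reduce the top-down one (both names are made up for this sketch). Ties keep the first tile tried, so on the example above it picks MEM(+(e, CONST c)), as the slide does:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical expression tree; every Jouette instruction costs 1. */
typedef enum { MEM, PLUS, TIMES, TEMP, CONST } Kind;
typedef enum { T_TEMP, T_ADDI_C, T_ADD, T_ADDI_R, T_ADDI_L, T_MUL,
               T_LOAD, T_LOAD_R, T_LOAD_L, T_LOAD_C } Tile;
typedef struct Node { Kind kind; struct Node *l, *r; int cost; Tile tile; } Node;

static int is(const Node *n, Kind k) { return n != NULL && n->kind == k; }

/* Keep tile t if its total cost c beats the best found so far. */
static void try_tile(Node *n, Tile t, int c) {
    if (c < n->cost) { n->cost = c; n->tile = t; }
}

/* Phase 1, bottom-up: optimum cost (and best tile) for every subtree.
   Total cost = cost of the tile (1) + optimum costs of its leaves. */
static void label(Node *n) {
    if (n->l) label(n->l);
    if (n->r) label(n->r);
    n->cost = 1 << 30;
    switch (n->kind) {
    case TEMP:  try_tile(n, T_TEMP, 0); break;
    case CONST: try_tile(n, T_ADDI_C, 1); break;                    /* ADDI ri <- r0 + c */
    case TIMES: try_tile(n, T_MUL, 1 + n->l->cost + n->r->cost); break;
    case PLUS:
        try_tile(n, T_ADD, 1 + n->l->cost + n->r->cost);            /* +(e1, e2) */
        if (is(n->r, CONST)) try_tile(n, T_ADDI_R, 1 + n->l->cost); /* +(e, CONST c) */
        if (is(n->l, CONST)) try_tile(n, T_ADDI_L, 1 + n->r->cost); /* +(CONST c, e) */
        break;
    case MEM: {
        const Node *a = n->l;
        try_tile(n, T_LOAD, 1 + a->cost);                            /* MEM(e) */
        if (is(a, PLUS) && is(a->r, CONST)) try_tile(n, T_LOAD_R, 1 + a->l->cost);
        if (is(a, PLUS) && is(a->l, CONST)) try_tile(n, T_LOAD_L, 1 + a->r->cost);
        if (is(a, CONST)) try_tile(n, T_LOAD_C, 1);                  /* MEM(CONST c) */
        break;
    }
    }
}

static char out[256];  /* emitted mnemonics, separated by spaces */
static void emit(const char *s) { if (out[0] != '\0') strcat(out, " "); strcat(out, s); }

/* Phase 2, top-down: expand the chosen tile at n and recurse only into the
   subtrees it leaves uncovered; covered subtrees are skipped. */
static void reduce(const Node *n) {
    switch (n->tile) {
    case T_TEMP:   break;
    case T_ADDI_C: emit("ADDI"); break;
    case T_ADD:    reduce(n->l); reduce(n->r); emit("ADD"); break;
    case T_MUL:    reduce(n->l); reduce(n->r); emit("MUL"); break;
    case T_ADDI_R: reduce(n->l); emit("ADDI"); break;
    case T_ADDI_L: reduce(n->r); emit("ADDI"); break;
    case T_LOAD:   reduce(n->l); emit("LOAD"); break;
    case T_LOAD_R: reduce(n->l->l); emit("LOAD"); break;
    case T_LOAD_L: reduce(n->l->r); emit("LOAD"); break;
    case T_LOAD_C: emit("LOAD"); break;
    }
}
```

For MEM(BINOP(PLUS, CONST 1, CONST 2)) it records cost 2 at the PLUS node and cost 2 at the root, then emits ADDI followed by LOAD, matching ADDI r1 ← r0 + 1; LOAD r1 ← M[r1 + 2]. The best tile stored at the PLUS node is never used: the root's tile covers that node, so the top-down phase skips it.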
The “Schizo”-Jouette Machine
• In the spirit of the Motorola 68000
• Two types of registers
  – Data registers
  – Address registers
• Arithmetic is performed on data registers
• Load and store use address registers
• Machine instructions convert between addresses and data
Tree Patterns for Schizo-Jouette
Name    Effect           Trees
(none)  dj               d → TEMP
(none)  aj               a → TEMP
ADD     di ← dj + dk     d → +(d, d)
MUL     di ← dj * dk     d → *(d, d)
SUB     di ← dj - dk     d → -(d, d)
DIV     di ← dj / dk     d → /(d, d)
Tree Patterns for Schizo-Jouette Machine (cont.)
Name    Effect             Trees
ADDI    di ← dj + c        d → +(d, CONST c)
                           d → +(CONST c, d)
                           d → CONST c
SUBI    di ← dj - c        d → -(d, CONST c)
LOAD    di ← M[aj + c]     d → MEM(+(a, CONST c))
                           d → MEM(+(CONST c, a))
                           d → MEM(CONST c)
                           d → MEM(a)
STORE   M[ai + c] ← dj     MOVE(MEM(+(a, CONST c)), d)
                           MOVE(MEM(+(CONST c, a)), d)
                           MOVE(MEM(CONST c), d)
                           MOVE(MEM(a), d)
MOVEM   M[ai] ← M[aj]      MOVE(MEM(a), MEM(a))
Tree Patterns for Schizo-Jouette (cont.)
Name    Effect     Trees
MOVEA   di ← aj    d → a
MOVED   ai ← dj    a → d
Tree Grammars
• A generalization of dynamic programming
• Input
  – A (usually ambiguous) context-free grammar describing the machine tree patterns
    • Nonterminals correspond to machine types
    • Every production has a machine cost
  – A linearized IR tree
• Output
  – A parse tree with the minimum cost
Partial Grammar for Schizo-Jouette
d → TEMP
a → TEMP
d → +(d, d)
d → +(d, CONST)
d → +(CONST, d)
d → MEM(+(a, CONST))
d → MEM(+(CONST, a))
d → MEM(CONST)
d → MEM(a)
d → a
a → d
• Example input: MEM(+(CONST 1, CONST 2))
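With two nonterminals, each node gets one cost per nonterminal, and the chain rules d → a (MOVEA) and a → d (MOVED) are applied after the other productions. The sketch below labels the example input under stated assumptions: every production costs 1 except the two TEMP rules (cost 0), and the rule d → CONST c is taken from the ADDI row of the pattern table, since the partial grammar omits it. The names Node, label, and keep are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cost-only labeling for the partial Schizo-Jouette grammar (a sketch). */
typedef enum { TEMP, CONST, PLUS, MEM } Kind;
enum { D, A, NT };                 /* nonterminals: data, address */
#define INF 1000000
typedef struct Node { Kind kind; struct Node *l, *r; int cost[NT]; } Node;

static int is(const Node *n, Kind k) { return n != NULL && n->kind == k; }
static void keep(int *best, int c) { if (c < *best) *best = c; }

/* Bottom-up: cost[D] and cost[A] are the cheapest derivations of the
   subtree from d and from a, respectively. */
static void label(Node *n) {
    if (n->l) label(n->l);
    if (n->r) label(n->r);
    int *d = &n->cost[D], *a = &n->cost[A];
    *d = *a = INF;
    switch (n->kind) {
    case TEMP:  keep(d, 0); keep(a, 0); break;              /* d -> TEMP, a -> TEMP */
    case CONST: keep(d, 1); break;                          /* d -> CONST (ADDI) */
    case PLUS:
        keep(d, 1 + n->l->cost[D] + n->r->cost[D]);         /* d -> +(d, d) */
        if (is(n->r, CONST)) keep(d, 1 + n->l->cost[D]);    /* d -> +(d, CONST) */
        if (is(n->l, CONST)) keep(d, 1 + n->r->cost[D]);    /* d -> +(CONST, d) */
        break;
    case MEM: {
        const Node *x = n->l;
        keep(d, 1 + x->cost[A]);                            /* d -> MEM(a) */
        if (is(x, CONST)) keep(d, 1);                       /* d -> MEM(CONST) */
        if (is(x, PLUS) && is(x->r, CONST)) keep(d, 1 + x->l->cost[A]); /* MEM(+(a, CONST)) */
        if (is(x, PLUS) && is(x->l, CONST)) keep(d, 1 + x->r->cost[A]); /* MEM(+(CONST, a)) */
        break;
    }
    }
    /* Chain rules d -> a (MOVEA) and a -> d (MOVED), applied to a fixed point. */
    for (int changed = 1; changed; ) {
        changed = 0;
        if (*a + 1 < *d) { *d = *a + 1; changed = 1; }
        if (*d + 1 < *a) { *a = *d + 1; changed = 1; }
    }
}
```

The root gets cost 3 as a d: CONST 1 costs 1 as a d plus 1 for MOVED to turn it into an address, and MEM(+(a, CONST)) adds one LOAD. Deriving the root as an a needs one more MOVED, for a total of 4.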
Simple Instruction Selection in the Pentium Architecture
• Six general-purpose registers
• The multiply instruction requires its left argument to be in eax
• Two-address instructions
• Arithmetic on memory
• Several addressing modes
• Variable-length instructions
• Instructions with side effects
• Rely on good register allocation
• For t1 ← t2 * t3
  – mov eax, t2
  – mul t3
  – mov t1, eax
• For t1 ← t2 + t3
  – mov t1, t2
  – add t1, t3
• The single instruction add [ebp-8], ecx is equivalent to
  – mov eax, [ebp-8]
  – add eax, ecx
  – mov [ebp-8], eax
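The two expansions above are mechanical, so a selector can print them directly. A minimal sketch, treating temporaries as strings (emit_add and emit_mul are hypothetical names). One case the simple addition expansion gets wrong is t1 ← t2 + t1: copying t2 into t1 first would destroy the right operand, so the sketch relies on addition being commutative and just adds t2 to t1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: expand three-address operations into two-address Pentium code.
   Temporaries are plain strings here; a register allocator would later map
   them to registers and delete redundant moves. */

/* t1 <- t2 + t3 */
void emit_add(char *out, size_t n, const char *t1, const char *t2, const char *t3) {
    assert(n > 0);
    if (strcmp(t1, t3) == 0)
        /* t1 already holds the right operand; addition commutes, so add t2 */
        snprintf(out, n, "add %s, %s\n", t1, t2);
    else
        /* copy the left operand into the destination, then add */
        snprintf(out, n, "mov %s, %s\nadd %s, %s\n", t1, t2, t1, t3);
}

/* t1 <- t2 * t3: mul multiplies eax by its operand (result in edx:eax) */
void emit_mul(char *out, size_t n, const char *t1, const char *t2, const char *t3) {
    assert(n > 0);
    snprintf(out, n, "mov eax, %s\nmul %s\nmov %s, eax\n", t2, t3, t1);
}
```

Because mul writes its result to edx:eax, edx is also overwritten; the register allocator has to treat both registers as clobbered by this sequence.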
Instruction Selection in the Tiger Compiler
• Use maximal munch
• Store the generated code in an abstract data type
  – The following phases are machine-independent
  – The control flow of the program is explicitly represented
  – Special representation of MOVE
    • So that register allocation can remove moves
/* assem.h */
/* Possible jump targets of an instruction */
typedef struct {Temp_labelList labels;} AS_targets;
AS_targets AS_Targets(Temp_labelList labels);

typedef struct AS_instr_ *AS_instr;
typedef enum {I_OPER, I_LABEL, I_MOVE} AS_instr_kind;
struct AS_instr_ {
  AS_instr_kind kind;
  union {
    /* ordinary instruction: assembly text, defined temps, used temps, jump targets */
    struct {string assem; Temp_tempList dst, src; AS_targets jumps;} OPER;
    /* label definition */
    struct {string assem; Temp_label label;} LABEL;
    /* register-to-register move, kept separate so register allocation can remove it */
    struct {string assem; Temp_tempList dst, src;} MOVE;
  } u;
};

AS_instr AS_Oper(string a, Temp_tempList d, Temp_tempList s, AS_targets j);
AS_instr AS_Label(string a, Temp_label label);
AS_instr AS_Move(string a, Temp_tempList d, Temp_tempList s);
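The assem string of an instruction is a template. Assuming the backquote convention used with this interface in Appel's Tiger compiler, `d0 names the first destination temp, `s1 the second source temp, and `j0 the first jump target, so printing an instruction means substituting the actual names. The sketch below is a simplified stand-in, not the compiler's own printer: it assumes operands are already plain strings (rather than Temp_temp and Temp_label values) and allows only single-digit indices. format_assem is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for printing an instruction's assem template:
   `dN, `sN and `jN are replaced by the N-th destination temp, source temp,
   or jump target. Operands are plain strings and N is a single digit. */
void format_assem(char *out, size_t n, const char *assem,
                  const char *dst[], const char *src[], const char *jumps[]) {
    size_t k = 0;
    if (n == 0) return;
    for (const char *p = assem; *p != '\0' && k + 1 < n; p++) {
        if (p[0] == '`' && (p[1] == 'd' || p[1] == 's' || p[1] == 'j') &&
            p[2] >= '0' && p[2] <= '9') {
            const char **table = p[1] == 'd' ? dst : p[1] == 's' ? src : jumps;
            assert(table != NULL);
            for (const char *name = table[p[2] - '0']; *name != '\0' && k + 1 < n; name++)
                out[k++] = *name;
            p += 2;  /* skip the kind letter and the digit */
        } else {
            out[k++] = *p;
        }
    }
    out[k] = '\0';
}
```

For example, the template "ADD `d0 <- `s0 + `s1" with destination t101 and sources t102, t103 prints as ADD t101 <- t102 + t103. Keeping the temps outside the string is what lets register allocation rename them without reparsing assembly text.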
Summary
• Types of machines
  – CISC (Pentium, MC68000, IBM 370)
  – RISC (MIPS, Sparc), 1990-
  – Other
    • VLIW (Itanium), 2000-
• Types of instruction-selection algorithms
  – Ad hoc
  – Optimal, using maximal munch
  – Optimum, using dynamic programming
  – Optimum, using tree grammars and ambiguous parsers