42
COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden

Chapter Eight(1)

  • Upload
    bolovv

  • View
    1.169

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chapter Eight(1)

COMPILER CONSTRUCTION

Principles and Practice

Kenneth C. Louden

Page 2: Chapter Eight(1)

8. Code Generation

PART ONE

Page 3: Chapter Eight(1)

• Generate executable code for a target machine that is a faithful representation of the semantics of the source code

• Depends not only on the characteristics of the source language but also on detailed information about the target architecture, the structure of the runtime environment, and the operating system running on the target machine

Page 4: Chapter Eight(1)

Contents

Part One8.1 Intermediate Code and Data Structure for code Generation8.2 Basic Code Generation Techniques

Other Parts8.3 Code Generation of Data Structure Reference8.4 Code Generation of Control Statements and Logical Expression8.5 Code Generation of Procedure and Function calls8.6 Code Generation on Commercial Compilers: Two Case Studies8.7 TM: A Simple Target Machine8.8 A Code Generator for the TINY Language8.9 A Survey of Code Optimization Techniques8.10 Simple Optimizations for TINY Code Generator

Page 5: Chapter Eight(1)

8.1 Intermediate Code and Data Structures for Code Generation

Page 6: Chapter Eight(1)

8.1.1 Three-Address Code

Page 7: Chapter Eight(1)

• A data structure that represents the source program during translation is called an intermediate representation, or IR, for short

• Such an intermediate representation that resembles target code is called intermediate code– Intermediate code is particularly useful when the goal of the

compiler is to produce extremely efficient code;– Intermediate code can also be useful in making a compiler more

easily retarget-able.

• Study two popular forms of intermediate code: Three -Address code and P-code

• The most basic instruction of three-address code is designed to represent the evaluation of arithmetic expressions and has the following general form:

X=y op z

Page 8: Chapter Eight(1)

2*a+(b-3) with syntax tree + * - 2 a b 3 The corresponding three-address code is T1 = 2 * a T2 = b – 3 T3 = t1 + t2

Page 9: Chapter Eight(1)

Figure 8.1 Sample TINY program: { sample program in TINY language -- computes factorial }

read x ; { input an integer }if 0 > x then { don’t compute if x <= 0 } fact:=1; repeat fact:=fact*x; x:=x-1until x=0;write fact { output factorial of x } ends

Page 10: Chapter Eight(1)

• The Three-address codes for above TINY program read x

t1=x>0if_false t1 goto L1fact=1label L2t2=fact*xfact=t2t3=x-1x=t3t4= x= =0if_false t4 goto L2

write factlabel L1halt

Page 11: Chapter Eight(1)

8.1.2 Data Structures for the Implementation of Three-Address Code

Page 12: Chapter Eight(1)

• The most common implementation is to implement three-address code as quadruple, which means that four fields are necessary: – One for the operation and three for the addresses

• A different implementation of three-address code is called a triple:– Use the instructions themselves to represent the

temporaries.

• It requires that each three-address instruction be reference-able, either as an index in an array or as a pointer in a linked list.

Page 13: Chapter Eight(1)

• Quadruple implementation for the three-address code of the previous example (rd, x , _ , _ )(gt, x, 0, t1 )(if_f, t1, L1, _ )(asn, 1,fact, _ )(lab, L2, _ , _ )(mul, fact, x, t2 )(asn, t2, fact, _ )(sub, x, 1, t3 )(asn, t3, x, _ )(eq, x, 0, t4 )(if_f, t4, L2, _)(wri, fact, _, _ )(lab, L1, _ , _ )(halt, _, _, _ )

Page 14: Chapter Eight(1)

• C code defining data structures for the quadruples

Typedef enum { rd, gt, if_f, asn, lab, mul, sub, eq, wri, halt, …} OpKind; Typedef enum { Empty, IntConst, String } AddrKind; Typedef struct

{ AddrKind kind; Union

{ int val; char * name;} contents;

} Address Typedef struct

{ OpKind op; Address addr1, addr2, addr3;} Quad

Page 15: Chapter Eight(1)

• A representation of the three-address code of the previous example as triples(0) (rd, x , _ )(1) (gt, x, 0)(2) (if_f, (1), (11) )(3) (asn, 1,fact )(4) (mul, fact, x)(5) (asn, (4), fact )(6) (sub, x, 1 )(7) (asn, (6), x)(8) (eq, x, 0 )(9) (if_f, (8), (4))(10) (wri, fact, _ )(11) (halt, _, _)

Page 16: Chapter Eight(1)

8.1.3 P-Code

Page 17: Chapter Eight(1)

• It was designed to be the actual code for a hypothetical stack machine, called the P-machine, for which an interpreter was written on various actual machines

• Since P-code was designed to be directly executable, it contains an implicit description of a particular runtime environment, including data sizes, as well as a great deal of information specific to the P-machines.

Page 18: Chapter Eight(1)

• The P-machine consists of a code memory, an unspecified data memory for named variables, and a stack for temporary data, together with whatever registers are needed to maintain the stack and support execution

2*a+(b-3)

ldc 2 ; load constant 2lod a ; load value of variable ampi ; integer multiplicationlod b ; load value of variable bldc 3 ; load constant 3sbi ; integer subtractionadi ; integer addition

Page 19: Chapter Eight(1)

1) Comparison of P-Code to Three-Address Code P-code is in many respects closer to actual machine code than three-address code.

P-code instructions also require fewer addresses:One the other hand, P-code is less compact than three-address code in terms of numbers of instructions, and P-code is not “self-contained” in that the instructions operate implicitly on a stack.

2) Implementation of P-Code Historically, P-code has largely been generated as a text file, but the previous descriptions of internal data structure implementations for three-address code will also work with appropriate modification for P-code.

Page 20: Chapter Eight(1)

8.2 Basic Code Generation Techniques

Page 21: Chapter Eight(1)

8.2.1 Intermediate Code or Target Code as a Synthesized Attribute

Page 22: Chapter Eight(1)

• Intermediate code generation can be viewed as an attribute computation.

• This code becomes a synthesized attribute that can be defined using an attribute grammar and generated either directly during parsing or by a post-order traversal of the syntax tree.

• For an example: a small subset of C expressions:Exp id = exp | aexp

Aexp aexp + factor | factor

Factor (exp) | num | id

Page 23: Chapter Eight(1)

Grammar Rule Semantic Rules

Exp1 id = exp2 exp1.pcode=”lda”|| id.strval++exp2.pcode++”stn”

Exp aexp exp.pcode=aexp.pcode

Aexp1 aexp2 + factor aexp1.pcode=aexp2.pcode++factor.pcode++”adi”

Aexp factor aexp.pcode=factor.pcode

Factor (exp) factor.pcode=exp.pcode

Factor num factor.pcode=”ldc”||num.strval

Factor id factor.pcode=”lod”||id.strval

1) P-Code

The expression (x=x+3)+4 has the following P-Code attribute:Lda xLod xLdc 3AdiStnLdc 4Adi

Page 24: Chapter Eight(1)

Grammar Rule Semantic Rules

Exp1 id = exp2 exp1.name=exp2.name Exp1.tacode=exp2.tacode++ id.strval||”=”||exp2.name

Exp aexp exp.name=aexp.name; exp.tacode=aexp.tacode

Aexp1 aexp2 + factor aexp1.name=newtemp( ) aexp1.tacode=aexp2.tacode++factor.tacode++ aexp1.name||”=”||aexp2.name||”+”||factor.name

Aexp factor aexp.name=factor.name; aexp.tacode=factor.tacode

Factor (exp) factor.name=exp.name; factor.tacode=exp.tacode

Factor num factor.namenum.strval; factor.tacode=” “

Factor id factor.namenum.strval; factor.tacode=” “

T1=x+3 x=t1 t2=t1+4

2) Three-Address Code

Page 25: Chapter Eight(1)

8.2.2 Practical Code Generation

Page 26: Chapter Eight(1)

The basic algorithm can be described as the following recursive procedure

Procedure gencode (T: treenode);BeginIf T is not nil then

Generate code to prepare for code of left child of T;Gencode(left child of T);Generate code to prepare for code of right child of T;Gencode(right child of T);Generate code to implement the action of T;

End;

Page 27: Chapter Eight(1)

Typedef enum {plus, assign} optype;Typedef enum {OpKind, ConstKind, IdKind} NodeKind;

Typedef struct streenode{ NodeKind kind; Optype op; /* used with OpKind */ Struct streenod *lchild, *rchild; Int val; /* used with ConstKind */ Char * strval; /* used for identifiers and numbers */} streenode;

typedef streenode *syntaxtree;

Page 28: Chapter Eight(1)

Expression (x=x+3) + 4 +

x= 4

+

x 3

Page 29: Chapter Eight(1)

• A genCode procedure to generate P-code.

Void genCode (SyntaxTree t){ char codestr[CODESIZE];/* CODESIZE = max length of 1 line of P-code */if (t !=NULL){ switch (t->kind)

{ case opKind:switch(t->op){ case Plus:

genCode(t->lchild);genCode(t->rchild);emitCode(“adi”);break;

Page 30: Chapter Eight(1)

case Assign: sprintf(codestr, “%s %s”, “lda”, t->strval);

emitcode(codestr);genCode(t->lchild);emitCode(“stn”);break;

}break;

case ConstKind:sprintf(codestr,”%s %s”,”ldc”,t->strval);emitCode(codestr);break;

Page 31: Chapter Eight(1)

case IdKind:

sprintf(codestr,”%s %s”,”lod”,t->strval);

emitCode(codestr);

break;

default:

emitCode(“error”);

break;

}

}

}

Page 32: Chapter Eight(1)

Yacc specification for the generation P-code according to the attribute grammar of Table 8.1

%{#define YYSTYPE char *

/* make Yacc use strings as values *//* other inclusion code … */%}%token NUM ID%%exp : ID

{ sprintf (codestr, “%s %s”, “lda”, $1); emitCode ( codestr); }‘ = ‘ exp{ emitCode (“stn”); }

Page 33: Chapter Eight(1)

| aexp;aexp : aexp ‘+’ factor {emitCode(“adi”);}| factor;factor : ‘(‘ exp ‘)’| NUM {sprintf(codestr, “%s %s”,”ldc”,$1);

emitCode(codestr);}| ID {sprintf(codestr,”%s %s”, “lod”,$1);

emitCode(codestr);};

%%/*utility functions… */

Page 34: Chapter Eight(1)

8.2.3 Generation of Target Code from Intermediate Code

Page 35: Chapter Eight(1)

• Code generation from intermediate code involves either or both of two standard techniques: – Macro expansion and Static simulation

• Macro expansion involves replacing each kind of intermediate code instruction with an equivalent sequence of target code instructions.

• Static simulation involves a straight-line simulation of the effects of the intermediate code and generating target code to match these effects.

Page 36: Chapter Eight(1)

• Consider the expression (x=x+3) +4, translate the P-code into three-address code:

Lad xLod xLdc 3Adi t1=x+3Stn x=t1Ldc 4Adi t2=t1+4

• We perform a static simulation of the P-machine stack to find three-address equivalence for the given code

Page 37: Chapter Eight(1)

< -- top of stack 3

X Address of x

T1=x+3

< -- top of stack T1

Addrss of x

Page 38: Chapter Eight(1)

X=t1 < -- top of stack

T1 < -- top of stack

4 T1

T2=t1+4

< -- top of stack T2

Page 39: Chapter Eight(1)

• We now consider the case of translating from three-address code to P-code, by simple macro expansion. A three-address instruction:

a = b + c• Can always be translated into the P-code sequence

lda a

lod b

lod c

adi

sto

Page 40: Chapter Eight(1)

• Then, the three-address code for the expression (x=x+3)+4:T1 = x + 3X = t1T2 = t1 + 4

• Can be translated into the following P-code:Lda t1Lod xLdc 3AdiStoLad xLod t1StoLda t2Lod t1Ldc 4AdiSto

Page 41: Chapter Eight(1)

If we want to eliminate the extra temporaries, then a more sophisticated scheme than pure macro expansion must be used.

T2 + X, t1 + 4 X 3

Page 42: Chapter Eight(1)

End of Part One

THANKS