C program compiler presentation


Rigvendra Kumar Vardhan

M.TECH ECE

Pondicherry University


Interpreters (direct execution)
Assemblers
Preprocessors
Text formatters (non-WYSIWYG)
Analysis tools

CS 540 Spring 2013 GMU 2

Interpreter
  A program that reads a source program and produces the results of executing that program

Compiler
  A program that translates a program from one language (the source) to another (the target)


Correctness
Speed (runtime and compile time)
  Degrees of optimization
  Multiple passes
Space
Feedback to user
Debugging


Interpreter
  Execution engine
  Program execution interleaved with analysis

    running = true;
    while (running) {
        analyze next statement;
        execute that statement;
    }

  May involve repeated analysis of some statements (loops, functions)
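The analyze-then-execute loop can be sketched in Python. This is a minimal illustration only: the three-statement toy language (`set`, `add`, `jump_if_less`) and the function name `interpret` are assumptions made for the example, not something the slides define. The counter shows how a loop body gets analyzed again on every iteration.

```python
def interpret(lines):
    """Run a toy program; return (variables, number of analysis steps)."""
    env = {}          # variable name -> integer value
    pc = 0            # index of the next statement to run
    analyses = 0      # how many times any statement was analyzed
    running = True
    while running:
        # Analyze the next statement: split the text into opcode and operands.
        op, *args = lines[pc].split()
        analyses += 1
        pc += 1
        # Execute that statement.
        if op == "set":               # set x 0 -> x = 0
            env[args[0]] = int(args[1])
        elif op == "add":             # add x 1 -> x = x + 1
            env[args[0]] += int(args[1])
        elif op == "jump_if_less":    # jump_if_less x 3 1 -> if x < 3 goto line 1
            if env[args[0]] < int(args[1]):
                pc = int(args[2])
        else:
            raise ValueError(f"unknown statement: {op}")
        running = pc < len(lines)
    return env, analyses


# Hypothetical toy program: count x up to 3 with a backward jump.
program = ["set x 0", "add x 1", "jump_if_less x 3 1"]
env, analyses = interpret(program)
```

The program has three statements, but the interpreter performs more analysis steps than that because the two statements in the loop are split and decoded again each time around.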


Read and analyze entire program
Translate to semantically equivalent program in another language
  Presumably easier to execute or more efficient
  Should “improve” the program in some fashion
Offline process
  Tradeoff: compile time overhead (preprocessing step) vs execution performance


Compilers
  FORTRAN, C, C++, Java, COBOL, etc.
  Strong need for optimization, etc.
Interpreters
  Perl, Python, awk, sed, sh, csh, PostScript printer, Java VM
  Effective if interpreter overhead is low relative to execution cost of language statements


Source code
-> Compiler -> Assembly code
-> Assembler -> Object code (machine code)
-> Linker -> Fully-resolved object code (machine code)
-> Loader -> Executable image

Series of program representations
Intermediate representations optimized for program manipulations of various kinds (checking, optimization)
Become more machine-specific, less language-specific as translation proceeds


First approximation
  Front end: analysis
    Read source program and understand its structure and meaning
  Back end: synthesis
    Generate equivalent target language program


Source -> Front End -> Back End -> Target

Must recognize legal programs (& complain about illegal ones)
Must generate correct code
Must manage storage of all variables
Must agree with OS & linker on target format


Need some sort of Intermediate Representation (IR)
Front end maps source into IR
Back end maps IR to target machine code


Source language
-> Scanner (lexical analysis) -> tokens
-> Parser (syntax analysis) -> Syntactic structure
-> Semantic Analysis (IC generator) -> Intermediate Language
-> Code Optimizer -> Intermediate Language
-> Code Generator -> Target language

Symbol Table (shared among the phases)

Lexical Analysis
Syntax Analysis
Semantic Analysis
Runtime environments
Code Generation
Code Optimization


Source code (character stream)
-> Lexical analysis -> Token stream
-> Parsing -> Abstract syntax tree
-> Intermediate Code Generation -> Intermediate code
-> Optimization -> Intermediate code
-> Code generation -> Assembly code

The earlier stages form the front end (machine-independent); the later stages form the back end (machine-dependent).

Split into two parts
  Scanner: Responsible for converting character stream to token stream
    Also strips out white space, comments
  Parser: Reads token stream; generates IR
Both of these can be generated automatically
  Source language specified by a formal grammar
  Tools read the grammar and generate scanner & parser (either table-driven or hard coded)


source -> Scanner -> tokens -> Parser -> IR

Token stream: Each significant lexical chunk of the program is represented by a token
  Operators & Punctuation: {}[]!+-=*;: …
  Keywords: if while return goto
  Identifiers: id & actual name
  Constants: kind & value; int, floating-point, character, string, …


Input text

    // this statement does very little
    if (x >= y) y = 42;

Token Stream

    IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON

Note: tokens are atomic items, not character strings
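A scanner for this example can be sketched with Python's `re` module. The token names (IF, LPAREN, ID, GEQ, RPAREN, BECOMES, INT, SCOLON) come from the slide; the regular expressions, the `(kind, value)` tuple layout, and the function name `scan` are assumptions chosen for illustration. Comments and white space are matched and then discarded.

```python
import re

# Token kinds follow the slide's names; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("SPACE",   r"\s+"),
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "SPACE"):
            continue                              # stripped by the scanner
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append((KEYWORDS[lexeme], None))   # keyword, no value needed
        elif kind == "INT":
            tokens.append((kind, int(lexeme)))        # constant: kind & value
        elif kind == "ID":
            tokens.append((kind, lexeme))             # identifier: id & actual name
        else:
            tokens.append((kind, None))               # operator or punctuation
    return tokens


source = "// this statement does very little\nif (x >= y) y = 42;"
tokens = scan(source)
```

Each token is a small record rather than a substring of the input, which is the sense in which tokens are atomic items.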

Many different forms (engineering tradeoffs)
Common output from a parser is an abstract syntax tree
  Essential meaning of the program without the syntactic noise


Token Stream Input

    IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON

Abstract Syntax Tree

    ifStmt
      >=
        ID(x)
        ID(y)
      assign
        ID(y)
        INT(42)
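The step from token stream to abstract syntax tree can be sketched as a small hand-written parser. It handles only the statement shape shown here; the nested-tuple tree layout and the names `parse_if`, `expect`, and `operand` are assumptions for illustration. The parentheses and semicolon are checked and then dropped, so the tree keeps only the meaning.

```python
def parse_if(tokens):
    """Parse IF LPAREN operand GEQ operand RPAREN ID BECOMES operand SCOLON."""
    pos = 0

    def expect(kind):
        # Consume one token of the given kind and return its value.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[pos][1]
        pos += 1
        return value

    def operand():
        # An operand is an identifier or an integer constant.
        kind = tokens[pos][0] if pos < len(tokens) else None
        if kind == "ID":
            return ("ID", expect("ID"))
        return ("INT", expect("INT"))

    expect("IF")
    expect("LPAREN")
    left = operand()
    expect("GEQ")
    right = operand()
    expect("RPAREN")
    target = ("ID", expect("ID"))
    expect("BECOMES")
    value = operand()
    expect("SCOLON")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after statement")
    return ("ifStmt", (">=", left, right), ("assign", target, value))


# Token stream for: if (x >= y) y = 42;
tokens = [
    ("IF", None), ("LPAREN", None), ("ID", "x"), ("GEQ", None), ("ID", "y"),
    ("RPAREN", None), ("ID", "y"), ("BECOMES", None), ("INT", 42), ("SCOLON", None),
]
ast = parse_if(tokens)
```

A production parser would normally be generated from a grammar or written to cover the whole language; this sketch only mirrors the one example.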

During or (more common) after parsing
  Type checking
  Check for language requirements like “declare before use”, type compatibility
  Preliminary resource allocation
  Collect other information needed by back end analysis and code generation
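The two checks named above, declare before use and type compatibility, can be sketched as a recursive walk over a tuple-based AST. The tree layout, the symbol table as a plain dictionary, and the name `check` are assumptions for illustration; real compilers carry far more information per symbol.

```python
def check(node, symbols):
    """Return the type of an AST node, raising on a semantic error."""
    kind = node[0]
    if kind == "INT":
        return "int"
    if kind == "ID":
        name = node[1]
        if name not in symbols:                    # declare before use
            raise NameError(f"'{name}' used before declaration")
        return symbols[name]
    if kind in (">=", "=="):
        left, right = check(node[1], symbols), check(node[2], symbols)
        if left != right:                          # type compatibility
            raise TypeError(f"cannot compare {left} with {right}")
        return "boolean"
    if kind == "assign":
        target, value = check(node[1], symbols), check(node[2], symbols)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    if kind == "ifStmt":
        if check(node[1], symbols) != "boolean":
            raise TypeError("if condition must be boolean")
        check(node[2], symbols)
        return None                                # statements have no type
    raise ValueError(f"unknown node kind: {kind}")


# AST for: if (x >= y) y = 42;
ast = ("ifStmt", (">=", ("ID", "x"), ("ID", "y")), ("assign", ("ID", "y"), ("INT", 42)))
condition_type = check(ast[1], {"x": "int", "y": "int"})
```

Calling `check(ast, {"x": "int"})` raises `NameError` because `y` is missing from the symbol table.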


Responsibilities
  Translate IR into target machine code
  Should produce fast, compact code
  Should use machine resources effectively
    Registers
    Instructions
    Memory hierarchy


Typically split into two major parts with sub phases
  “Optimization”: code improvements
    May well translate parser IR into another IR
  Code generation
    Instruction selection & scheduling
    Register allocation


Input

if (x >= y) y = 42;

Output

    mov eax,[ebp+16]
    cmp eax,[ebp-8]
    jl L17
    mov [ebp-8],42
L17:
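A code generator for this one statement shape can be sketched as a function that walks the AST and emits text. The frame offsets (x at ebp+16, y at ebp-8) and the label L17 are read off the output above; the tuple-based AST and the name `gen_if` are assumptions for illustration. The branch is inverted: the code jumps past the assignment when the condition is false.

```python
def gen_if(ast, offsets, label):
    """Emit x86-style text for: if (a >= b) target = constant;"""
    _, (op, left, right), (_, target, value) = ast
    if op != ">=":
        raise NotImplementedError("only >= is handled in this sketch")

    def slot(node):
        # Each variable lives at a fixed offset from the frame pointer ebp.
        return f"[ebp{offsets[node[1]]:+d}]"

    return [
        f"mov eax,{slot(left)}",              # load the left operand into a register
        f"cmp eax,{slot(right)}",             # compare it with the right operand
        f"jl {label}",                        # skip the assignment when left < right
        f"mov {slot(target)},{value[1]}",     # store the constant into the target
        f"{label}:",
    ]


# AST for: if (x >= y) y = 42;
ast = ("ifStmt", (">=", ("ID", "x"), ("ID", "y")), ("assign", ("ID", "y"), ("INT", 42)))
code = gen_if(ast, {"x": 16, "y": -8}, "L17")
```

This skips instruction selection choices, scheduling, and register allocation entirely; it always uses `eax` and a fixed instruction pattern.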


Unoptimized Code

    lda $30,-32($30)
    stq $26,0($30)
    stq $15,8($30)
    bis $30,$30,$15
    bis $16,$16,$1
    stl $1,16($15)
    lds $f1,16($15)
    sts $f1,24($15)
    ldl $5,24($15)
    bis $5,$5,$2
    s4addq $2,0,$3
    ldl $4,16($15)
    mull $4,$3,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    ldl $3,16($15)
    addq $3,1,$4
    mull $2,$4,$2
    stl $2,20($15)
    ldl $0,20($15)
    br $31,$33
$33:
    bis $15,$15,$30
    ldq $26,0($30)
    ldq $15,8($30)
    addq $30,32,$30
    ret $31,($26),1

Optimized Code

    s4addq $16,0,$0
    mull $16,$0,$0
    addq $16,1,$16
    mull $0,$16,$0
    mull $0,$16,$0
    ret $31,($26),1
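One of the improvements visible in the two listings is that the unoptimized code computes an "add 1" twice while the optimized code computes it once. Reading the optimized listing, the function appears to compute 4*x*x*(x+1)*(x+1); that reading, the tuple-based expression tree, and the name `lower` are assumptions for illustration. The sketch lowers the expression to three-address code, optionally reusing a temporary when the same operation on the same operands has already been emitted (common subexpression elimination).

```python
def lower(expr, reuse):
    """Lower an expression tree to three-address code: (dest, op, left, right)."""
    code = []
    seen = {}    # (op, left, right) -> temporary that already holds the value

    def visit(node):
        if not isinstance(node, tuple):
            return str(node)                   # variable name or constant
        op, left, right = node
        key = (op, visit(left), visit(right))
        if reuse and key in seen:
            return seen[key]                   # reuse the earlier result
        temp = f"t{len(code)}"
        code.append((temp,) + key)
        seen[key] = temp
        return temp

    result = visit(expr)
    return code, result


# Assumed source expression: 4 * x * x * (x + 1) * (x + 1)
x_plus_1 = ("+", "x", 1)
expr = ("*", ("*", ("*", ("*", 4, "x"), "x"), x_plus_1), x_plus_1)

naive, _ = lower(expr, reuse=False)
shared, shared_result = lower(expr, reuse=True)
```

With reuse enabled the second `x + 1` produces no new instruction, so `shared` is one instruction shorter than `naive`. The real optimizer also keeps values in registers instead of reloading them from the stack, which this sketch does not model.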


Source code (character stream)

    if (b == 0) a = b;

-> Lexical analysis -> Token stream

    if ( b == 0 ) a = b ;

-> Parsing -> Abstract syntax tree (AST)

    if
      ==
        b
        0
      =
        a
        b
      ;

-> Semantic Analysis -> Decorated AST

    if
      == : boolean
        int b
        int 0
      = : int
        int a (lvalue)
        int b
      ;


Decorated AST

    if
      == : boolean
        int b
        int 0
      = : int
        int a (lvalue)
        int b
      ;

-> Intermediate Code Generation -> Intermediate code

    CJUMP ==
      MEM(fp + 8)
      CONST 0
      MOVE
        MEM(fp + 4)
        MEM(fp + 8)
      NOP

-> Optimization -> Intermediate code

    CJUMP ==
      CX
      CONST 0
      MOVE
        DX
        CX
      NOP

-> Code generation -> Assembly code

    CMP CX, 0
    CMOVZ DX,CX

Compiler techniques are everywhere
  Parsing (little languages, interpreters)
  Database engines
  AI: domain-specific languages
  Text processing
    TeX/LaTeX -> DVI -> PostScript -> PDF
  Hardware: VHDL; model-checking tools
  Mathematics (Mathematica, Matlab)


Fascinating blend of theory and engineering
  Direct applications of theory to practice
    Parsing, scanning, static analysis
  Some very difficult problems (NP-hard or worse)
    Resource allocation, “optimization”, etc.
    Need to come up with good-enough solutions


Ideas from many parts of CSE
  AI: Greedy algorithms, heuristic search
  Algorithms: graph algorithms, dynamic programming, approximation algorithms
  Theory: Grammars, DFAs and PDAs, pattern matching, fixed-point algorithms
  Systems: Allocation & naming, synchronization, locality
  Architecture: pipelines & hierarchy management, instruction set use


program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


There are several syntax notations for productions in common use; all mean the same thing
  ifStmt ::= if ( expr ) stmt
  ifStmt -> if ( expr ) stmt
  <ifStmt> ::= if ( <expr> ) <stmt>


a = 1 ;
if ( a + 1 )
  b = 2 ;
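Whether a piece of text such as this one belongs to the toy grammar can be checked with a small recursive-descent recognizer. The sketch makes several assumptions: tokens are separated by white space, `stmt` in the `ifStmt` production is taken to mean `statement`, and the left-recursive `expr ::= expr + expr` is handled with a loop. The names `recognize`, `peek`, and `eat` are illustrative.

```python
IDS = set("abcijknxyz")      # id  ::= a | b | c | i | j | k | n | x | y | z
INTS = set("0123456789")     # int ::= 0 | 1 | ... | 9


def recognize(text):
    """Return the number of top-level statements, or raise SyntaxError."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def operand():                    # id | int
        nonlocal pos
        if peek() not in IDS | INTS:
            raise SyntaxError(f"expected id or int, found {peek()!r}")
        pos += 1

    def expr():                       # expr ::= id | int | expr + expr
        operand()
        while peek() == "+":          # left recursion rewritten as a loop
            eat("+")
            operand()

    def statement():                  # statement ::= assignStmt | ifStmt
        nonlocal pos
        if peek() == "if":            # ifStmt ::= if ( expr ) stmt
            eat("if")
            eat("(")
            expr()
            eat(")")
            statement()
        elif peek() in IDS:           # assignStmt ::= id = expr ;
            pos += 1
            eat("=")
            expr()
            eat(";")
        else:
            raise SyntaxError(f"expected a statement, found {peek()!r}")

    count = 0
    while True:                       # program ::= statement | program statement
        statement()
        count += 1
        if peek() is None:
            return count


statements = recognize("a = 1 ; if ( a + 1 ) b = 2 ;")
```

The recognizer only answers whether the input is derivable; it does not build a tree, and it does not resolve the ambiguity that `expr + expr` introduces for longer sums.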


program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse tree

    program
      program
        stmt
          assign
            ID(a)
            expr
              int (1)
      stmt
        ifStmt
          expr
            expr
              ID(a)
            expr
              int (1)
          stmt
            assign
              ID(b)
              expr
                int (2)
