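Chapter 5 Compiler Overview

5.1 Introduction
5.2 General Structure of a Compiler
5.3 Compiler Generating Tools
5.4 Lexical Analysis
5.5 Syntax Analysis
    Parsing methods
    Output of a parser
    Top-down methods : recursive-descent parser, LL parser
    Bottom-up methods : shift-reduce parsing, LR parser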

5.1 Introduction

Compiler : “A compiler is a computer program which translates programs written in a particular high-level programming language into executable code for a specific target computer.”

ex) C compiler on SPARC : takes a C program as input and outputs code that can be executed on SPARC.

5.2 General Structure of a Compiler

Source Programs → Compiler → Object Programs (Assembly Language, Machine Language)

Compiler Structure
    Front-End : language-dependent part
    Back-End : machine-dependent part

Source Programs → Front-End → IC (intermediate code) → Back-End → Object Programs

General compiler structure

Source Programs
    → Lexical Analyzer              (output : Token)
    → Syntax Analyzer               (output : Tree)
    → Intermediate Code Generator   (output : Intermediate Code)
    → Code Optimizer                (output : Optimized Code)
    → Target Code Generator
    → Object Programs

1. Lexical Analyzer (Scanner)
Converts each token of the source program into an integer (token number) that is efficient and easy to handle inside the compiler.

ex) if ( a > 10 ) ...

    Token        : if  (  a  >   10  )  ...
    Token Number : 32  7  4  25  5   8

Source Programs → Lexical Analyzer → A sequence of tokens
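The mapping from source text to token numbers can be sketched as follows. This is only a minimal illustration: the token numbers are the ones in the example above, while `scan`, `TOKEN_NUMBERS`, and the convention of using 0 as the value of tokens that carry no value are assumptions made for this sketch.

```python
import re

# Token numbers copied from the slide's example; a real compiler defines its own table.
TOKEN_NUMBERS = {"if": 32, "(": 7, ">": 25, ")": 8}
IDENTIFIER, CONSTANT = 4, 5

# One token per match: an identifier/keyword, an integer constant, or any single symbol.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")

def scan(source):
    """Return a list of (token number, token value) pairs."""
    tokens = []
    pos = 0
    source = source.rstrip()          # guarantees a non-space character remains
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        word, number, symbol = match.groups()
        if word is not None:
            if word in TOKEN_NUMBERS:             # keyword
                tokens.append((TOKEN_NUMBERS[word], 0))
            else:                                 # identifier
                tokens.append((IDENTIFIER, word))
        elif number is not None:                  # integer constant
            tokens.append((CONSTANT, int(number)))
        elif symbol in TOKEN_NUMBERS:             # operator or delimiter
            tokens.append((TOKEN_NUMBERS[symbol], 0))
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
        pos = match.end()
    return tokens
```

Calling `scan("if ( a > 10 )")` yields the token numbers 32 7 4 25 5 8, with 'a' and 10 kept as token values.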

2. Syntax Analyzer (Parser)
Function : syntax checking, tree generation.
Output : incorrect program - error message
         correct program - program structure (in tree form)

ex) if (a > 10) a = 1;

            if
          /    \
         >      =
        / \    / \
       a  10  a   1

A sequence of tokens → Syntax Analyzer → Error message or syntactic structure (Tree)

3. Intermediate Code Generator
Function : semantic checking, intermediate code generation.

ex) if (a > 10) a = 1.0;   ☞ semantic error when a is an integer!

ex) a = b + 1;

    Tree :    =
             / \
            a   +
               / \
              b   1

    Ucode :  lod 1 2
             ldc 1
             add
             str 1 1

    - variable reference : (base, offset)
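A post-order walk of the tree is enough to produce the Ucode shown in the example. The sketch below is illustrative only: the function names and the tuple-based tree are assumptions, and the `SYMBOLS` table simply records the (base, offset) pairs implied by the example, a = (1, 1) and b = (1, 2).

```python
# Hypothetical symbol table: name -> (base, offset), as implied by the example.
SYMBOLS = {"a": (1, 1), "b": (1, 2)}
OPCODES = {"+": "add"}   # only the operator used in the example

def gen_expr(node, code):
    """Emit stack-machine code for an expression tree in post-order."""
    if isinstance(node, int):                 # constant operand
        code.append(f"ldc {node}")
    elif isinstance(node, str):               # variable operand
        base, offset = SYMBOLS[node]
        code.append(f"lod {base} {offset}")
    else:                                     # (operator, left, right)
        op, left, right = node
        gen_expr(left, code)
        gen_expr(right, code)
        code.append(OPCODES[op])

def gen_assign(target, expr):
    """Emit code for 'target = expr': evaluate expr, then store the result."""
    code = []
    gen_expr(expr, code)
    base, offset = SYMBOLS[target]
    code.append(f"str {base} {offset}")
    return code
```

For `gen_assign("a", ("+", "b", 1))` the result is the four instructions listed above: lod 1 2, ldc 1, add, str 1 1.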

4. Code Optimizer
An optional phase that identifies inefficient code and replaces it with more efficient code.

Meaning of optimization
    major part : improve running time
    minor part : reduce code size

ex) LDC R1, 1
    LDC R1, 1   (x)   redundant load; can be eliminated

Criteria for optimization
    preserve the program meanings
    speed up on average
    be worth the effort

Local optimization
A method that finds inefficient code through local inspection and replaces it with more efficient code.
    1. Constant folding
    2. Eliminating redundant load, store instructions
    3. Algebraic simplification
    4. Strength reduction

Global optimization
Uses flow analysis techniques.
    1. Common subexpression
    2. Moving loop invariants
    3. Removing unreachable codes
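Three of the local optimizations listed above can be illustrated on a small expression tree. This is a sketch under stated assumptions: expressions are `(op, left, right)` tuples with only `+` and `*`, leaves are integers or variable names, and the function name `optimize` is hypothetical.

```python
def optimize(node):
    """Bottom-up rewrite of (op, left, right) tuples; leaves are ints or names."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = optimize(left), optimize(right)
    # Constant folding: both operands are known at compile time.
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # Algebraic simplification: x + 0 = x and x * 1 = x.
    if (op, right) in (("+", 0), ("*", 1)):
        return left
    # Strength reduction: x * 2 becomes the cheaper x + x.
    if (op, right) == ("*", 2):
        return ("+", left, left)
    return (op, left, right)
```

For instance, `optimize(("+", "a", ("*", 2, 3)))` folds the constant subexpression and returns `("+", "a", 6)`.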

5. Target Code Generator
Generates machine instructions from the intermediate code.

Code generator tasks
    1. instruction selection & generation
    2. register management
    3. storage allocation
    4. code optimization (machine-dependent optimization)

Intermediate Code → Target Code Generator → Target Code

6. Error Recovery
    Error recovery - adjusting the analysis so that an error does not affect other statements
    Error repair - correcting the error when it occurs

Error Handling
    Error detection
    Error recovery
    Error reporting
    Error repair

Error
    Syntax Error
    Semantic Error
    Run-time Error

Example : translation of a statement

    position := initial + rate * 60

        ↓ Lexical analyzer

    id1 := id2 + id3 * 60

        ↓ Syntax analyzer

          :=
         /  \
       id1   +
            / \
          id2  *
              / \
            id3  60

        ↓ Semantic analyzer

          :=
         /  \
       id1   +
            / \
          id2  *
              / \
            id3  inttoreal
                    |
                    60

        ↓ Intermediate code generator

    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

        ↓ Code optimizer

    temp1 := id3 * 60.0
    id1 := id2 + temp1

        ↓ Code generator

    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1

Symbol Table
    1  position  ...
    2  initial   ...
    3  rate      ...
    4

5.3 Compiler Generating Tools

Compiler Generating Tools (= Compiler-Compiler, Translator Writing System)

As languages and machines develop, more compilers are needed.
New languages are developed because the application areas of computers keep widening.

Implementing N languages on M computers requires N*M compilers.

ex) 2 languages : C, Java
    3 machines : IBM, SPARC, Pentium

    C-to-IBM, C-to-SPARC, C-to-Pentium
    Java-to-IBM, Java-to-SPARC, Java-to-Pentium

Compiler-compiler Model

Language Description : L + Machine Description : M → Compiler-Compiler → Compiler
Program written in L → Compiler → Executable form on M

Language description uses grammar theory, but machine description has not yet been formalized.

HDL : Hardware Description Language, used to design computer architectures.

Automatic compiler generation has been studied as machine architectures and programming languages have developed.

1. LEX
Devised by M. E. Lesk in 1975. A useful tool for writing programs that find tokens, described by regular expressions, in an input stream.

Regular Expression + Action Code → LEX → Lexical Analyzer (lex.yy.c)
Source Program → Lexical Analyzer (lex.yy.c) → Token Stream

2. Parser Generator (PGS: Parser Generating System)

(1) Stanford PGS
    John Hennessy
    Written in Pascal : 5000 lines
    Feature : obtains the syntactic structure in AST form.
    Output : a parsing table that includes Abstract Syntax Tree (AST) information.

Grammar Description → PGS → Parsing Table
Input Program → Parser (using the Parsing Table) → Output (program structures)

(2) Wisconsin PGS
    C. N. Fischer
    Written in Pascal : 10000 lines
    Feature : error recovery

(3) YACC (Yet Another Compiler Compiler)
    Runs on UNIX.
    Written in C.

<Lexical Analysis> Regular Expression + Action Code → LEX → lex.yy.c
<Syntax Analysis>  Grammar Rule + Action Code → YACC → y.tab.c
Source Program → lex.yy.c → y.tab.c → Result by Action Code

3. Automatic Code Generation

Three aspects
    1. Machine Description : ISP, ISPS, HDL
    2. Intermediate language
    3. Code generating algorithm (CGA)
        Pattern matching code generation
        Table driven code generation

Machine Description → Code-Generator Generator → Table
Intermediate Code → Code Generator (using the Table) → Target Code

4. Compiler Compiler System

(1) PQCC (Production Quality Compiler Compiler System)
    W. A. Wulf (Carnegie-Mellon University)
    Takes a language description and a target machine description as input and outputs a PQC (Production Quality Compiler) and a table.
    Uses TCOL, a tree-structured intermediate language.
    Generates code by Pattern Matching Code Generation.

(2) ACK (Amsterdam Compiler Kit)
    A tool for automating the compiler Back-End, developed by a group led by Andrew S. Tanenbaum at the Vrije Universiteit.
    Starts from the UNCOL concept (N*M => N+M).
    Uses an Abstract Machine Code called EM as the intermediate language.
    Convenient for building portable compilers.

PQCC Model

Language Description + Machine Description → PQCC → Table
Source Program → Front-End → TCOL → PQC (using the Table) → Object Code

ACK Model

Source Program → Front-End → EM → Back-End → Object Code
                             EM → Interpreter → Result

    Front-End languages : FORTRAN, ALGOL, PASCAL, C, ADA
    Back-End targets : Intel 8080/8086/80386, Motorola 6800/6809/68000/68020, Zilog Z80/Z8000, VAX, SPARC

5.4 Lexical Analysis

Lexical Analysis : the process by which the compiler groups certain strings of characters into individual tokens.

Lexical Analyzer = Scanner = Lexer

Source Program → Lexical Analyzer → Token Stream

Token : the smallest grammatically meaningful unit.

    Token - a single syntactic entity (terminal symbol).
    Token Number - an integer number used for efficient string processing.
    Token Value - numeric value or string value.

ex) if ( a > 10 ) ...

    Token Number : 32  7  4    25  5   8
    Token Value  : 0   0  ‘a’  0   10  0

Token classes

Special form - defined by the language designer
    1. Keyword --- const, else, if, int, ...
    2. Operator symbols --- +, -, *, /, ++, -- etc.
    3. Delimiters --- ;, ,, (, ), [, ] etc.

General form - defined by the programmer
    4. identifier --- stk, ptr, sum, ...
    5. constant --- 526, 3.0, 0.1234e-10, ‘c’, “string” etc.

Token Structure - represented by regular expression.

ex) id = (l + _)(l + d + _)*

Uses of the symbol table
    Collects and stores information about identifiers during lexical analysis (L.A) and syntax analysis (S.A).
    Used during semantic analysis and code generation.
    Entry : name + attributes

ex) Hashed symbol table - see Chapter 12

    bucket → symbol table entry (name, attributes)

Text p.134

5.4.2 Token Recognition

Specification of token structure - RE (regular expression)
Specification of PL - CFG (context-free grammar)

Scanner design steps
    1. Describe the structure of tokens in regular expressions.
    2. Or, directly design a transition diagram for the tokens.
    3. Program a scanner according to the diagram.
    4. Verify the scanner action through regular language theory.

Character classification
    letter : a | b | c | ... | z | A | B | C | ... | Z      (abbreviated l)
    digit : 0 | 1 | 2 | ... | 9                              (abbreviated d)
    special character : + | - | * | / | . | , | ...

Identifier Recognition

Transition diagram
    start → S
    S --( l, _ )--> A
    A --( l, d, _ )--> A        (A is the final state)

Regular grammar
    S → lA | _A
    A → lA | dA | _A | ε

Regular expression
    S = lA + _A = (l + _)A
    A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*

    ∴ S = (l + _)(l + d + _)*
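The transition diagram for identifiers translates almost directly into a table-driven recognizer. The following is a minimal sketch; the names `TRANSITIONS`, `char_class`, and `is_identifier` are chosen for illustration, and letters are assumed to be ASCII only.

```python
import string

LETTERS = set(string.ascii_letters)   # character class l
DIGITS = set(string.digits)           # character class d

# (state, character class) -> next state, following the transition diagram.
TRANSITIONS = {
    ("S", "l"): "A", ("S", "_"): "A",
    ("A", "l"): "A", ("A", "d"): "A", ("A", "_"): "A",
}

def char_class(ch):
    """Map a character to l, d, or itself (so '_' stays '_')."""
    if ch in LETTERS:
        return "l"
    if ch in DIGITS:
        return "d"
    return ch

def is_identifier(text):
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:             # no transition: reject
            return False
    return state == "A"               # A is the only final state
```

With this table, `stk` and `_tmp1` are accepted, while `1abc` and the empty string are rejected.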

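Integer Number Recognition

Form : integers are classified as decimal, octal, or hexadecimal.
    decimal : starts with a non-zero digit
    octal : starts with 0
    hexadecimal : starts with 0x or 0X

Transition diagram
    start → S
    S --n--> A,  A --d--> A
    S --0--> B
    B --o--> C,  C --o--> C
    B --x, X--> D
    D --h--> E,  E --h--> E
    final states : A, B, C, E

    n : non-zero digit    o : octal digit    h : hexa digit    d : digit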

Regular grammar
    S → nA | 0B
    A → dA | ε
    B → oC | xD | XD | ε
    C → oC | ε
    D → hE
    E → hE | ε

Regular expression (⁺ denotes one or more repetitions; a plain + denotes alternation)
    E = hE + ε = h*ε = h*
    D = hE = hh* = h⁺
    C = oC + ε = o*
    B = oC + xD + XD + ε = o⁺ + (x + X)D + ε = o⁺ + (x + X)h⁺ + ε
    A = dA + ε = d*
    S = nA + 0B = nd* + 0(o⁺ + (x + X)h⁺ + ε) = nd* + 0 + 0o⁺ + 0(x + X)h⁺

    ∴ S = nd* + 0 + 0o⁺ + 0(x + X)h⁺
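The final regular expression can be checked against sample literals with Python's `re` module. This is a sketch: the function name `classify_integer` is an assumption, and a lone `0` is reported as octal simply because it starts with 0, following the classification rule given for the integer forms.

```python
import re

# nd* + 0 + 0o+ + 0(x + X)h+ written in Python regex syntax.
INTEGER_RE = re.compile(r"[1-9][0-9]*|0|0[0-7]+|0[xX][0-9a-fA-F]+")

def classify_integer(text):
    """Return 'decimal', 'octal', 'hexadecimal', or None if text is not an integer."""
    if not INTEGER_RE.fullmatch(text):
        return None
    if text[0] != "0":
        return "decimal"
    if len(text) > 1 and text[1] in "xX":
        return "hexadecimal"
    return "octal"                    # includes the single digit 0
```

For example, `526` is decimal, `017` is octal, `0x1F` is hexadecimal, and `08` or `0x` are rejected.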

5.5 Syntax Analysis

    5.5.1 Parsing methods
    5.5.2 Output of a parser
    5.5.3 Top-down methods
    5.5.4 Bottom-up methods

5.5.1 Parsing Methods

Parsing : how to check whether an input string is a sentence of a grammar, and how to construct a parse tree for the string.

A parser for grammar G is a program that takes as input a string ω and produces as output either a parse tree (or derivation tree) for ω, if ω is a sentence of G, or an error message indicating that ω is not a sentence of G.

Parsing : ω ∈ L(G) ?

A sequence of tokens → Parser → Correct sentence : Parse tree
                               Incorrect sentence : Error message

Two basic types of parsers for context-free grammars

① Top down - starting with the root and working down to the leaves.
    recursive descent parser, predictive parser.

② Bottom up - beginning at the leaves and working up to the root.
    precedence parser, shift-reduce parser.

ex) A → XYZ

              A
            / | \
           X  Y  Z

    bottom-up : reduce, “towards the start symbol”
    top-down : expand, “towards a sentence”

5.5.2 Output of a Parser

The output of a parser:
    ① Parse - left parse, right parse
    ② Parse tree
    ③ Abstract syntax tree

ex) G : 1. E → E + T        string : a + a * a
        2. E → T
        3. T → T * F
        4. T → F
        5. F → (E)
        6. F → a

left parse : a sequence of production rule numbers applied in leftmost derivation.

    E ⇒(1) E + T ⇒(2) T + T ⇒(4) F + T ⇒(6) a + T ⇒(3) a + T * F
      ⇒(4) a + F * F ⇒(6) a + a * F ⇒(6) a + a * a

    ∴ 1 2 4 6 3 4 6 6

right parse : reverse order of production rule numbers applied in rightmost derivation.

    E ⇒(1) E + T ⇒(3) E + T * F ⇒(6) E + T * a ⇒(4) E + F * a
      ⇒(6) E + a * a ⇒(2) T + a * a ⇒(4) F + a * a ⇒(6) a + a * a

    ∴ 6 4 2 6 4 6 3 1
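Both parses can be checked mechanically by replaying the rule numbers as a derivation. The sketch below assumes the six-rule grammar G of this example; the function name `derive` and the list-based sentential form are illustrative choices.

```python
# Grammar G from the example: rule number -> (left side, right side).
RULES = {
    1: ("E", ["E", "+", "T"]), 2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]), 4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]), 6: ("F", ["a"]),
}
NONTERMINALS = {"E", "T", "F"}

def derive(rule_numbers, leftmost=True):
    """Apply the rules in order, always rewriting the leftmost (or rightmost) nonterminal."""
    form = ["E"]                                   # start symbol
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[i]}")
        form[i:i + 1] = rhs                        # replace the nonterminal by the right side
    return " ".join(form)
```

Replaying the left parse as a leftmost derivation, or the right parse in reverse as a rightmost derivation, regenerates `a + a * a` in both cases.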

parse tree : derivation tree

    string : a + a * a

    E
    ├─ E
    │   └─ T
    │       └─ F
    │           └─ a
    ├─ +
    └─ T
        ├─ T
        │   └─ F
        │       └─ a
        ├─ *
        └─ F
            └─ a

Abstract Syntax Tree (AST) ::= a transformed parse tree that is a more efficient representation of the source program.

    leaf node - operand (identifier or constant)
    internal node - operator (meaningful production rule name)

ex) G: 1. E → E + T   add        string : a + a * a
       2. E → T
       3. T → T * F   mul
       4. T → F
       5. F → (E)
       6. F → a

    AST :   add
           /   \
          a    mul
              /   \
             a     a

ex) if (a > b) a = b + 1; else a = b - 2;

    IF_ST
    ├─ GT
    │   ├─ a
    │   └─ b
    ├─ ASSIGN_OP
    │   ├─ a
    │   └─ ADD
    │       ├─ b
    │       └─ 1
    └─ ASSIGN_OP
        ├─ a
        └─ SUB
            ├─ b
            └─ 2

※ meaningful terminal → terminal node
   meaningful production rule → nonterminal node
   → naming : chosen by the compiler designer.

5.5.3 Top-Down Methods

::= Beginning with the start symbol of the grammar, it attempts to produce a string of terminal symbols that is identical to a given source string. This matching process proceeds by successively applying the productions of the grammar to produce substrings from nonterminals.

::= In the terminology of trees, this is moving from the root of the tree to a set of leaves in the parse tree for a program.

Top-Down parsing methods
    (1) Parsing with backup or backtracking.
    (2) Parsing with limited or partial backup.
    (3) Parsing with no backtracking.

backtracking : making repeated scans of the input.

General Top-Down Parsing method : called a brute-force method with backtracking (Top-Down parsing with full backup).

    1. Given a particular nonterminal that is to be expanded, the first production for this nonterminal is applied.
    2. Compare the newly expanded string with the input string. In the matching process, a terminal symbol is compared with an input symbol, and a nonterminal symbol is selected for expansion and its first production is applied.
    3. If the generated string does not match the input string, an incorrect expansion has occurred. In that case the process is backed up by undoing the most recently applied production, and the next production of this nonterminal is used as the next expansion.
    4. This process continues either until the generated string becomes the input string or until there are no further productions to be tried. In the latter case, the given string cannot be generated from the grammar.
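The four steps can be sketched as a short recursive search. This is an illustration under assumptions: the grammar is the expression grammar with left recursion already removed (a left-recursive grammar would make this search loop forever), and the names `GRAMMAR`, `derives`, and `accepts` are hypothetical.

```python
# Expression grammar without left recursion; [] stands for ε.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["a"]],
}

def derives(symbols, tokens):
    """True if the symbol sequence can generate exactly the remaining tokens."""
    if not symbols:
        return not tokens                     # success only if the input is used up
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try each production in order; a failed alternative is undone (backtracking)
        # simply by returning and moving on to the next one.
        return any(derives(alt + rest, tokens) for alt in GRAMMAR[head])
    # Terminal: it must match the current input symbol.
    return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])

def accepts(text):
    return derives(["E"], text.split())
```

For example, `accepts("a + a * a")` succeeds, while `accepts("a + * a")` exhausts every alternative and fails.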

Several problems with the top-down parsing method

left recursion
    A nonterminal A is left recursive if A ⇒+ Aα for some α.
    A grammar G is left recursive if it has a left-recursive nonterminal.
    ⇒ A left-recursive grammar can cause a top-down parser to go into an infinite loop.
    ∴ eliminate the left recursion.

Backtracking
    the repeated scanning of the input string.
    the speed of parsing is much slower (very time consuming).
    ⇒ the conditions for no backtracking : defined formally using FIRST and FOLLOW.

Elimination of left recursion

    direct left recursion : A → Aα ∈ P
    indirect left recursion : A ⇒+ Aα

    general form : A → Aα ┃ β
                   A = Aα + β = βα*

    introducing a new nonterminal A' which generates α*.

    ==> A → βA'
        A' → αA' ┃ ε

ex) E → E + T | T
    T → T * F | F
    F → (E) | a

    E = E + T | T = T(+T)*
    E' generates (+T)* :  E' → +TE' | ε

    ※ E → TE'
       E' → +TE' | ε

general method :
    A → Aα1 ┃ Aα2 ┃ ... ┃ Aαm ┃ β1 ┃ β2 ┃ ... ┃ βn

    ==> A → β1A' | β2A' | ... | βnA'
        A' → α1A' | α2A' | ... | αmA' | ε
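The general method for direct left recursion is mechanical enough to sketch in a few lines. The function name and the representation (each right side as a list of symbols, with `[]` standing for ε) are assumptions made for illustration; indirect left recursion is not handled.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Rewrite A → Aα1 | ... | Aαm | β1 | ... | βn without direct left recursion."""
    # The αi: what follows A in the left-recursive alternatives.
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nonterminal]]
    # The βj: alternatives that do not start with A.
    others = [alt for alt in alternatives if alt[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: alternatives}        # nothing to eliminate
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in others],           # A  → βj A'
        new: [alpha + [new] for alpha in recursive] + [[]],       # A' → αi A' | ε
    }
```

Applied to E → E + T | T, this yields E → TE' and E' → +TE' | ε, matching the example.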

Left-factoring

    If A → αβ | αγ are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ or to αγ.

    ==> left-factoring : the process of factoring out the common prefixes of alternates.

    method : A → αβ | αγ
             ==> A → α(β | γ)
             ==> A → αA', A' → β | γ

ex) S → iCtS | iCtSeS | a
    C → b

    S → iCtS | iCtSeS | a
      → iCtS(ε | eS) | a

    ∴ S → iCtSS' | a
       S' → ε | eS
       C → b
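Left-factoring can be sketched in the same style. This is a single-pass illustration only: alternatives are grouped by their first symbol, the longest common prefix of each group is factored out, and the new nonterminals are not factored again. The names `common_prefix` and `left_factor` are hypothetical, and `[]` stands for ε.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nonterminal, alternatives):
    """Factor alternatives that start with the same symbol (one pass only)."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else "", []).append(alt)
    result = {nonterminal: []}
    for group in groups.values():
        if len(group) == 1:                        # nothing shared: keep as is
            result[nonterminal].append(group[0])
            continue
        alpha = common_prefix(group)
        new = nonterminal + "'" * len(result)      # S', S'', ... one per factored group
        result[nonterminal].append(alpha + [new])  # A  → α A'
        result[new] = [alt[len(alpha):] for alt in group]   # A' → β | γ
    return result
```

For S → iCtS | iCtSeS | a, the call `left_factor("S", [list("iCtS"), list("iCtSeS"), ["a"]])` gives S → iCtSS' | a and S' → ε | eS.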

No-backtracking ::= deterministic selection of the production rule to be applied.

5.5.4 Bottom-Up Methods

::= Reducing a given string to the start symbol of the grammar.

::= It attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

ex) G: S → aAcBe        string : abbcde
       A → Ab | b
       B → d

    S
    ├─ a
    ├─ A
    │   ├─ A
    │   │   └─ b
    │   └─ b
    ├─ c
    ├─ B
    │   └─ d
    └─ e

[Def 3.1] reduce : the replacement of the right side of a production with the left side.

    S ⇒* αβω, A → β ∈ P  :  reducing β to A gives αAω.

[Def 3.2] handle : If S ⇒*rm αAω ⇒rm αβω, then β is a handle of αβω.

[Def 3.3] handle pruning :
    S = r0 ⇒rm r1 ⇒rm ... ⇒rm rn-1 ⇒rm rn = ω
    rn → rn-1 → rn-2 → ... → S        “reduce sequence”

ex) G : S → bAe          ω : b a ; a e
        A → a;A | a

    Reduce : b a ; a e → b a ; A e → b A e → S

Shift-Reduce Parsing

::= a bottom-up style of parsing.

Two problems for automatic parsing
    1. How to find a handle in a right sentential form.
    2. What production to choose in case there is more than one production with the same right-hand side.

====> The method depends on the kind of grammar, but a stack is used to hold the handle.

Four actions of a shift-reduce parser

“The action is determined by consulting the parsing table with the stack top and the current input symbol.”

    1. shift : the next input symbol is shifted to the top of the stack.
    2. reduce : the handle is reduced to the left side of the production.
    3. accept : the parser announces successful completion of parsing.
    4. error : the parser discovers that a syntax error has occurred and calls an error recovery routine.

input ... $ → Shift-Reduce Parser (stack : $ ... Sn, consulting the Parsing Table) → output

ex) G: E → E + T | T        string : a + a * a
       T → T * F | F
       F → (E) | a

     STACK         INPUT          ACTION
     ----------    -----------    ------------------
(1)  $             a + a * a $    shift a
(2)  $a              + a * a $    reduce F → a
(3)  $F              + a * a $    reduce T → F
(4)  $T              + a * a $    reduce E → T
(5)  $E              + a * a $    shift +
(6)  $E +              a * a $    shift a
(7)  $E + a              * a $    reduce F → a
(8)  $E + F              * a $    reduce T → F
(9)  $E + T              * a $    shift *
(10) $E + T *              a $    shift a
(11) $E + T * a              $    reduce F → a
(12) $E + T * F              $    reduce T → T * F
(13) $E + T                  $    reduce E → E + T
(14) $E                      $    accept

<< Thinking points >>

1. The handle will always eventually appear on top of the stack, never inside.
    ∵ rightmost derivation in reverse.
    The contents of the stack together with the remaining input string form a right sentential form. Therefore it is always the top part of the stack that is reduced.

2. How to make a parsing table for a given grammar.
    → The method of building the parsing table depends on the kind of grammar.
    SLR (Simple LR), LALR (LookAhead LR), CLR (Canonical LR)

Constructing a Parse Tree

    1. shift : create a terminal node labeled the shifted symbol.
    2. reduce : A → X1X2...Xn.
        (1) A new node labeled A is created.
        (2) The X1X2...Xn are made direct descendants of the new node.
        (3) If A → ε, then the parser merely creates a node labeled A with no descendants.

ex) G : 1. LIST → LIST , ELEMENT
        2. LIST → ELEMENT
        3. ELEMENT → a

    string : a , a

Step  STACK              INPUT   ACTION     PARSE TREE
----  -----------------  ------  ---------  ---------------
(1)   $                  a,a$    shift a    Build Node
(2)   $a                  ,a$    reduce 3   Build Tree
(3)   $ELEMENT            ,a$    reduce 2   Build Tree
(4)   $LIST               ,a$    shift ,    Build Node
(5)   $LIST ,              a$    shift a    Build Node
(6)   $LIST , a             $    reduce 3   Build Tree
(7)   $LIST , ELEMENT       $    reduce 1   Build Tree
(8)   $LIST                 $    accept     return the tree

Resulting parse tree :

    LIST
    ├─ LIST
    │   └─ ELEMENT
    │       └─ a
    ├─ ,
    └─ ELEMENT
        └─ a

AST :   list
       /    \
      a      a

LR Parser

    An efficient bottom-up parser for a large and useful class of context-free grammars.
    The “L” stands for left-to-right scan of the input; the “R” for constructing a rightmost derivation in reverse.

The attractive reasons of LR parsers
    (1) LR parsers can be constructed for most programming languages.
    (2) LR parsing method is more general than LL parsing method.
    (3) LR parsers can detect syntactic errors as soon as possible.

But, it is too much work to implement an LR parser by hand for a typical programming-language grammar.

=====> Parser Generator

Parser Generating Systems

Grammar <BNF Notations> → PGS → Parsing Table
Input → Driver Routine (using the Parsing Table) → Output

The driver routine is the same for all LR parsers; only the parsing table changes from one parser to another.

Three Methods : CLR, LALR, SLR

The techniques for producing LR parsing tables
    Simple LR (SLR) - LR(0) items, FOLLOW
    Canonical LR (CLR) - LR(1) items
    Lookahead LR (LALR) - ① LR(1) items ② LR(0) items, Lookahead

Structure of an LR Parser

input : a1 ... ai ... an $
    → LR parser = Driver Routine + Parsing Table, with a stack whose top is Sm

Stack : S0X1S1X2 ••• XmSm, where Si : state and Xi ∈ V.
Configuration of an LR parser : (S0X1S1 ••• XmSm, aiai+1 ••• an$)
                                  stack contents    unscanned input

LR Parsing Table (ACTION table + GOTO table)
    rows : states
    columns : symbols - <Terminals> for the ACTION table, <Nonterminals> for the GOTO table

The LR parsing algorithm ::= same as the shift-reduce parsing algorithm.

Four Actions : shift, reduce, accept, error

    1. ACTION[Sm, ai] = shift S ::=
        (S0X1S1 ••• XmSm, aiai+1 ••• an$) ⊢ (S0X1S1 ••• XmSmaiS, ai+1 ••• an$)

    2. ACTION[Sm, ai] = reduce A → α and |α| = r ::=
        (S0X1S1 ••• XmSm, aiai+1 ••• an$) ⊢ (S0X1S1 ••• Xm-rSm-r, aiai+1 ••• an$), GOTO(Sm-r, A) = S
                                          ⊢ (S0X1S1 ••• Xm-rSm-rAS, aiai+1 ••• an$)

    3. ACTION[Sm, ai] = accept : parsing is completed.

    4. ACTION[Sm, ai] = error : the parser has discovered an error and calls an error recovery routine.


LR Parsing Example

G: 1. LIST → LIST , ELEMENT
   2. LIST → ELEMENT
   3. ELEMENT → a

Parsing Table : (use this parsing table to show the parsing process for a,a)

    state |  a    ,    $   | LIST  ELEMENT
    ------+----------------+--------------
      0   |  s3            |   1      2
      1   |       s4  acc  |
      2   |       r2  r2   |
      3   |       r3  r3   |
      4   |  s3            |          5
      5   |       r1  r1   |

where sj means shift and stack state j, ri means reduce by production numbered i, acc means accept, and blank means error.
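The parsing process for a,a can be traced with a small driver routine. The sketch below encodes the parsing table of this example as two dictionaries; the names `ACTION`, `GOTO`, `RULES`, and `lr_parse` are illustrative, and the stack holds only states, since the grammar symbols are implied by them.

```python
# ACTION and GOTO tables for the LIST grammar; missing entries mean error.
ACTION = {
    (0, "a"): ("s", 3),
    (1, ","): ("s", 4), (1, "$"): ("acc", None),
    (2, ","): ("r", 2), (2, "$"): ("r", 2),
    (3, ","): ("r", 3), (3, "$"): ("r", 3),
    (4, "a"): ("s", 3),
    (5, ","): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}
# rule number -> (left side, length of right side)
RULES = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

def lr_parse(tokens):
    """Return the rule numbers in the order they are reduced (the right parse)."""
    stack = [0]                                   # initial state S0
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[pos]), ("err", None))
        if kind == "s":                           # shift: push state, advance input
            stack.append(arg)
            pos += 1
        elif kind == "r":                         # reduce by rule number arg
            lhs, length = RULES[arg]
            del stack[-length:]                   # pop |α| states
            stack.append(GOTO[(stack[-1], lhs)])  # push GOTO(Sm-r, A)
            reductions.append(arg)
        elif kind == "acc":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
```

Running `lr_parse(["a", ",", "a"])` performs the reductions 3, 2, 3, 1, the same sequence as in the parse tree construction example for a , a.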

Constructing a Parser

Grammar → PGS (Parser Generating System) → Parsing table, Documents
Token stream → Driver Routine (using the Parsing table) → Result of parsing