Compilers: Principles, Techniques, and Tools
Jing-Shin Chang
Department of Computer Science & Information Engineering
National Chi-Nan University

Goals

- What is a compiler? Why do we need one? What are its applications?
- How to write a compiler by hand?
- Theories and principles behind compiler construction: parsing, translation & compiling
- Techniques for efficient parsing
- How to write a compiler with tools

Table of Contents

1. Introduction: What, Why & Apps
2. How: A Simple Compiler
   - What Is a Better & Typical Compiler
3. Lexical Analysis: Regular Expression and Scanner
4. Syntax Analysis: Grammars and Parsing
5. Top-Down Parsing: LL(1)
6. Bottom-Up Parsing: LR(1)

Table of Contents (continued)

7. Syntax-Directed Translation
8. Semantic Processing
9. Symbol Tables
10. Run-time Storage Organization

Table of Contents (continued)

11. Translation of Special Structures
   - Modular Program Structures
   - Declarations
   - Expressions and Data Structure References
   - Control Structures
   - Procedures and Functions
12. General Translation Scheme: Attribute Grammars

Table of Contents (continued)

13. Code Generation
14. Global Optimization
15. Tools: Compiler Compiler

What Is a Compiler?

- Functional blocks
- Forms of compilers

The Compiler

What is a compiler?
- A program that translates programming languages into machine languages
- source language => target language

Why compilers?
- To fill the gap between the programmer and the computer hardware

Compiler: A Bridge Between PL and Hardware

Figure: Applications (high-level language) → Compiler → Hardware (low-level language), running under the operating system; the assembly codes target register-based or stack-based machines.

    A := B + C * D        (high-level source)

    MOV A, C              (assembly codes)
    MUL A, D
    ADD A, B
    MOV va, A

Typical Machine Instructions – Register-based Machines

Data Transfer
- MOV A, B
- MOV A, [mem]
- More: IN/OUT, Push, Pop, ...

Arithmetic Operation
- ADD A, C        // A := A + C
- MUL A, D        // A := A * D
- More: ADC, SUB, SBB, INC, ...

Logical Operation
- AND A, 00001111B        // A := A & 00001111B
- More: OR, NOT, XOR, Shift, Rotate

Program Control
- JMP, JZ, JNZ, Call, ...

Low-level instruction features:
- Mostly simple binary operators (using source & target operands)

Figure: registers of an Intel 8085 processor: A; B, C; D, E; H, L
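As a rough illustration of the two-operand, accumulator-style format above, the following Python sketch executes the register-machine code for A := B + C * D. It is a toy model, not an 8085 emulator (the real 8085 has no multiply instruction): the tuple encoding, the function name `run_accumulator_machine`, and the choice to model only the accumulator A as a register (every other operand, including `va`, is treated as a memory cell) are assumptions for illustration.

```python
def run_accumulator_machine(program, memory):
    """Run (OP, target, source) instructions and return the accumulator A."""
    regs = {"A": 0}  # assumption: only the accumulator is modeled as a register

    def read(name):
        # A register name reads the register; any other name reads memory.
        return regs[name] if name in regs else memory[name]

    def write(name, value):
        if name in regs:
            regs[name] = value
        else:
            memory[name] = value

    for op, target, source in program:
        if op == "MOV":        # target := source
            write(target, read(source))
        elif op == "ADD":      # target := target + source
            write(target, read(target) + read(source))
        elif op == "MUL":      # target := target * source
            write(target, read(target) * read(source))
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs["A"]


# A := B + C * D; the variable A lives in memory cell "va"
program = [("MOV", "A", "C"), ("MUL", "A", "D"), ("ADD", "A", "B"), ("MOV", "va", "A")]
memory = {"B": 2, "C": 3, "D": 4}
acc = run_accumulator_machine(program, memory)
print(acc, memory["va"])  # 14 14
```

Each instruction reads its target as one source operand and overwrites it, which is exactly the "shared source & destination operand" property of these machines.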

Typical Machine Instructions – Stack-based Machines

Data Transfer
- Push A          // SP++; *(SP) := A
- Push [mem]      // SP++; *(SP) := [mem]
- Dup             // *(SP+1) := *(SP); SP++
- Pop [mem]       // [mem] := *(SP); SP--

Arithmetic Operation
- ADD             // *(SP-1) := *(SP) + *(SP-1); SP--
- MUL             // *(SP-1) := *(SP) x *(SP-1); SP--

Logical Operation ...
Program Control ...

Low-level instruction features:
- Mostly simple binary operators
  - Operations are applied to the topmost 2 source operands
  - and return the result to the new stack top (destination operand)
- Almost no general-purpose registers

Figure: the operand stack, with SP pointing at the top element *SP and SP-1 at the element below it
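The stack-machine semantics in the comments above can be sketched the same way. In this hypothetical Python model a list plays the role of the stack (its last element is *(SP)), the operands of PUSH/POP are memory names, and the instruction names and tuple encoding are assumptions. It evaluates A := B + C * D with the post-order sequence PUSH B, PUSH C, PUSH D, MUL, ADD, POP A.

```python
def run_stack_machine(program, memory):
    """Run (OP,) / (OP, operand) instructions; returns what is left on the stack."""
    stack = []  # stack[-1] plays the role of *(SP)
    for instr in program:
        op = instr[0]
        if op == "PUSH":      # SP++; *(SP) := [mem]
            stack.append(memory[instr[1]])
        elif op == "POP":     # [mem] := *(SP); SP--
            memory[instr[1]] = stack.pop()
        elif op == "DUP":     # *(SP+1) := *(SP); SP++
            stack.append(stack[-1])
        elif op == "ADD":     # *(SP-1) := *(SP) + *(SP-1); SP--
            top = stack.pop()
            stack[-1] = top + stack[-1]
        elif op == "MUL":     # *(SP-1) := *(SP) x *(SP-1); SP--
            top = stack.pop()
            stack[-1] = top * stack[-1]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


# A := B + C * D
program = [("PUSH", "B"), ("PUSH", "C"), ("PUSH", "D"), ("MUL",), ("ADD",), ("POP", "A")]
memory = {"B": 2, "C": 3, "D": 4}
leftover = run_stack_machine(program, memory)
print(memory["A"], leftover)  # 14 []
```

Note that no instruction names a register: operands are implicit (the top two stack cells), which is why stack code tends to be longer but each instruction is shorter.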

Compiler (1) - Compilation

Figure: Source program/code (P.L., formal spec.) → Compiler → Target program/code (P.L., assembly, machine code), plus error messages.

    A := B + C * D        (source)

    MOV A, C              (target)
    MUL A, D
    ADD A, B
    MOV va, A
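One way to see how a compiler could arrive at the target code above is a small tree-walking generator for an accumulator machine. This is a sketch under stated assumptions, not the course's algorithm: expressions are nested `(op, left, right)` tuples, leaves are variable names, the target variable A is stored in a memory cell named `va` (so it does not clash with the accumulator A), and when both operands are sub-expressions the right one is spilled to a hypothetical temporary (`t1`, `t2`, and so on).

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
COMMUTATIVE = {"+", "*"}


def gen_expr(node, code, temps):
    """Append accumulator code that leaves the value of `node` in A."""
    if isinstance(node, str):                     # leaf: load a variable
        code.append(f"MOV A, {node}")
        return
    op, left, right = node
    if isinstance(right, str):                    # A := left; A := A op right
        gen_expr(left, code, temps)
        code.append(f"{OPCODES[op]} A, {right}")
    elif isinstance(left, str) and op in COMMUTATIVE:
        gen_expr(right, code, temps)              # complex side first, then swap
        code.append(f"{OPCODES[op]} A, {left}")
    else:                                         # spill the right value to a temporary
        gen_expr(right, code, temps)
        temp = f"t{len(temps) + 1}"
        temps.append(temp)
        code.append(f"MOV {temp}, A")
        gen_expr(left, code, temps)
        code.append(f"{OPCODES[op]} A, {temp}")


def compile_assignment(target_cell, expr):
    code, temps = [], []
    gen_expr(expr, code, temps)
    code.append(f"MOV {target_cell}, A")          # store the result
    return code


tree = ("+", "B", ("*", "C", "D"))                # A := B + C * D
for line in compile_assignment("va", tree):
    print(line)                                   # MOV A, C / MUL A, D / ADD A, B / MOV va, A
```

For a commutative operator whose left operand is a plain variable, evaluating the complex right operand first is what lets B + C * D come out as four instructions with no temporary.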

Compiler (2a) – Execution

Figure: the loader loads the compiled target code into the real machine; running the compiled code in the real machine reads the input and produces the output.

Compiler (2b) – Compile & Go

Two working phases in two passes

Figure: Source program → Compiler → Target code (plus error messages); the target code then runs in the real machine, reading input and producing output.

Compiler: two independent phases to complete the work
- (1) Compilation phase: source-to-target compilation
- (2) Execution phase: run the compiled code, respond to input & produce output

Compiler (2c) – Compile & Go

Two working phases in two passes

Figure: Source program → Compiler (+Loader) → executable target code, which is loaded into the real machine and run, reading input and producing output.

Compiler: two independent phases to complete the work
- (1) Compilation phase: source-to-target compilation
- (2) Execution phase: run the compiled code, respond to input & produce output

Interpreter (1)

Figure: Source program → Interpreter, which reads the input and produces the output (plus error messages).

Interpreter: one single pass to complete the two-phase work
- Each source statement is compiled and then immediately executed
- The next statement is then handled in the same way

Interpreter (2)

- Compiles and then executes each incoming statement
- Does not save compiled code in executable files
  - Saves storage
- Re-compiles the same statements when a loop goes back to them
  - Slower
- Detects (compilation & runtime) errors as they occur during execution time
- Compiler, by contrast: detects syntax/semantic errors ("compilation errors") during compilation time
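To make the contrast concrete, here is a minimal Python sketch that borrows Python's built-in `compile()` and `exec()` as the per-statement "compiler" and "machine". The helper names `interpret` and `compile_all` and the toy program are assumptions. The interpreter runs the first two statements before it ever sees the syntax error in the third, whereas the compiler-style check reports the error before anything executes.

```python
def interpret(lines):
    """Interpreter style: translate one statement, run it, then move on."""
    env, log = {}, []
    for lineno, stmt in enumerate(lines, start=1):
        try:
            code = compile(stmt, f"<line {lineno}>", "exec")  # translate this statement only
        except SyntaxError:
            log.append(f"line {lineno}: syntax error found during execution")
            break
        exec(code, env)                                       # execute it immediately
        log.append(f"line {lineno}: ok")
    return env, log


def compile_all(lines):
    """Compiler style: check every statement before anything is executed."""
    errors = []
    for lineno, stmt in enumerate(lines, start=1):
        try:
            compile(stmt, f"<line {lineno}>", "exec")
        except SyntaxError:
            errors.append(lineno)
    return errors


program = ["a = 2", "b = a * 3", "c = b +", "d = 1"]
env, log = interpret(program)
print(log)                   # lines 1 and 2 ran before the error on line 3 was found
print(compile_all(program))  # [3]
```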

Hybrid: Compiler + Interpreter?

Figure: Source program → Compiler (plus error messages) → Intermediate program → Interpreter (with/without JIT), which reads input and produces output.

Hybrid: Compiler + Interpreter?

Intermediate program:
- free of syntax/semantic errors
- machine independent

Interpreter:
- does not interpret the high-level source
- but the compiled low-level code
- easy to interpret + efficient

Hybrid Method & Virtual Machine

Figure: Source program → Translator (compiler) → Intermediate program → Virtual Machine (VM) (interpreter with/without JIT), which reads input and produces output.

Example: Java Compiler & Java VM

Figure: Java program (app.java) → Java compiler (javac) → Java bytecodes (app.class) → Java Virtual Machine (interpreter with/without JIT), which reads input and produces output.

Hybrid Method & Virtual Machine

- Compile the source program into platform-independent code
  - E.g., Java => bytecodes (stack-based instructions)
- Execute the code with a virtual machine
- High portability: the platform-independent code can be distributed on the web, downloaded, and executed on any platform that has the VM pre-installed
- Good for cross-platform applications

Just-in-time (JIT) Compilation

- Compile a new statement (only once) when it is encountered for the first time
  - And save the compiled code
  - Executed by the virtual/real machine
  - Do not re-compile it when a loop goes back to it
- Example:
  - Java VM (simple interpreter version, without JIT): high performance penalty due to interpretation
  - Java VM + JIT: performance improved by a factor on the order of 10
- JIT: translates bytecodes at run time into the instruction set of the native target machine
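The difference between re-translating and saving translated code can be counted directly. In this sketch Python's `compile()` stands in for translation (it produces Python bytecode, not native machine code, so this only mimics the bookkeeping of a JIT), and the loop body, the function names and the iteration count are assumptions for illustration.

```python
def run_interpreted(body, iterations, env):
    """Simple interpreter: translate a statement every time it is reached."""
    translations = 0
    for _ in range(iterations):
        for stmt in body:
            code = compile(stmt, "<stmt>", "exec")
            translations += 1
            exec(code, env)
    return translations


def run_jit(body, iterations, env):
    """JIT style: translate a statement on first use, then reuse the saved code."""
    cache = {}
    for _ in range(iterations):
        for stmt in body:
            if stmt not in cache:
                cache[stmt] = compile(stmt, "<stmt>", "exec")
            exec(cache[stmt], env)
    return len(cache)  # number of translations performed


body = ["s = s + i", "i = i + 1"]             # loop body: sums 0..9 over 10 passes
env1, env2 = {"s": 0, "i": 0}, {"s": 0, "i": 0}
print(run_interpreted(body, 10, env1), env1["s"])  # 20 45
print(run_jit(body, 10, env2), env2["s"])          # 2 45
```

With 10 passes over a 2-statement body, the simple interpreter translates 20 times while the JIT-style runner translates twice; both compute the same result.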

Comparison of Different Compile-and-Go Schemes

Normal compilers
- Generate code for all statements, whether or not they will be executed
- Separate the compilation phase and the execution phase into two different phases
- Syntax & semantic errors are detected at compilation time

Interpreters and JIT compilers
- Can generate code only for the statements that are actually executed
- This depends on your input: different execution flows mean different sets of executed code
- Interpreter: syntax & semantic errors are detected at run/execution time

JIT vs. simple interpreter
- JIT: saves the target machine code
  • can be re-used, so each statement is compiled at most once
- Interpreter: does not save the target machine code
  • a statement may be compiled more than once

Register-Based Virtual Machine for Android Phones – Dalvik VM

Java VM (JVM) – stack-based instruction set
- Normally less efficient than RISC or CISC instructions
- Limited memory organization
- Requires too many swap and copy operations

Figure: Java program → Java compiler → Java bytecodes (stack-based) → Java Virtual Machine

Register-Based Virtual Machine for Android Phones – Dalvik VM

Dalvik VM (for Android OS) – register-based instruction set
- Smaller size
- Better memory efficiency
- Good for phones and other embedded systems

Generation and execution of Dalvik bytecodes
- Compiled/translated from Java bytecode into a new bytecode:
    app.java (Java source)
      =|| javac (Java compiler) ||=> app.class (executable by the JVM)
      =|| dx (in the Android SDK tools) ||=> app.dex (Dalvik Executable)
      =|| compression ||=> apps.apk (Android Application Package)
      =|| Dalvik VM ||=> (execution)

Figure: Java program → Java compiler → Java bytecodes (stack-based) → dx (+compression) → Dalvik bytecodes (register-based) → Dalvik Virtual Machine

How To Construct A Compiler

- Language Processing Systems
- High-Level and Intermediate Languages
- Processing Phases
- Quick Review on Syntax & Semantics
- Processing Phases in Detail
- Structure of Compilers

A Language-Processing System

Figure:
    Source program
      → Preprocessor → modified source program
      → Compiler → target assembly program
      → Assembler → relocatable machine code
      → Linker/Loader (with library files and/or relocatable object files) → target machine code

Programming Languages vs. Natural Languages

- Natural languages: for communication between native speakers of the same or different languages
  - Chinese, English, French, Japanese
- Programming languages: for communication between programmers and computers
  - Generic high-level programming languages: Basic, Fortran, COBOL, Pascal, C/C++, Java
  - Typesetting languages: TROFF (+TBL, EQN, PIC), LaTeX/TeX, PostScript
  - Markup languages (structured documents): SGML, HTML, XML, ...
  - Script languages: csh, bsh, awk, perl, python, javascript, asp, jsp, php

Machine-Independent Intermediate Instructions

Low-level instruction features:
- Mostly simple binary operators
- The result is often saved to the accumulator (A register)
- Not intuitive to programmers

Intermediate instructions:
- 3-address codes (for register-based machines):
  - A := B + C
  - 2 source operands, one destination operand
  - Easy to map to machine instructions (which share one source & destination operand)
    • A := A + B
- Stack machine codes (for stack-based machines)
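Both kinds of intermediate code can be produced by a post-order walk of the expression tree. The sketch below is a minimal illustration under assumed conventions (expressions as `(op, left, right)` tuples, temporaries named T1, T2, and so on, stack instruction names PUSH/POP/ADD/MUL); it reproduces the 3-address code T1 := C * D, T2 := B + T1, A := T2 for A := B + C * D, and the matching stack-machine code.

```python
from itertools import count

OPNAMES = {"+": "ADD", "*": "MUL"}


def three_address(target, tree):
    """3-address code: every operator result goes to a fresh temporary."""
    code, temps = [], count(1)

    def gen(node):
        if isinstance(node, str):
            return node                           # a variable is its own address
        op, left, right = node
        lhs, rhs = gen(left), gen(right)          # operands first (post-order)
        temp = f"T{next(temps)}"
        code.append(f"{temp} := {lhs} {op} {rhs}")
        return temp

    code.append(f"{target} := {gen(tree)}")
    return code


def stack_code(target, tree):
    """Stack-machine code: push operands, then apply the operator to the top two."""
    code = []

    def gen(node):
        if isinstance(node, str):
            code.append(f"PUSH {node}")
            return
        op, left, right = node
        gen(left)
        gen(right)
        code.append(OPNAMES[op])

    gen(tree)
    code.append(f"POP {target}")
    return code


tree = ("+", "B", ("*", "C", "D"))                # A := B + C * D
print(three_address("A", tree))  # ['T1 := C * D', 'T2 := B + T1', 'A := T2']
print(stack_code("A", tree))     # ['PUSH B', 'PUSH C', 'PUSH D', 'MUL', 'ADD', 'POP A']
```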

Compiler: A Bridge Between PL and Hardware

Figure: Applications (high-level language) → Compiler → intermediate codes → Hardware (low-level language), running under the operating system; the assembly codes target register-based or stack-based machines.

    A := B + C * D        (high-level source)

    T1 := C * D           (intermediate codes)
    T2 := B + T1
    A := T2

    MOV A, C              (assembly codes)
    MUL A, D
    ADD A, B
    MOV va, A

Compiler: with Intermediate Codes

Figure: Source program/code (P.L., formal spec.) → Compiler → Target program/code (P.L., assembly, machine code), plus error messages.

    A := B + C * D        (source)

    T1 := C * D           (intermediate codes)
    T2 := B + T1
    A := T2

    MOV A, C              (target)
    MUL A, D
    ADD A, B
    MOV va, A

Typical Phases of a Compiler

Source program:
    float position, initial, rate
    position := initial + rate * 60

lexical analyzer → tokens:
    id1 := id2 + id3 * 60

syntax analyzer → parse tree or syntax tree:
    := (id1, + (id2, * (id3, 60)))

semantic analyzer → syntax tree or annotated syntax tree:
    := (id1, + (id2, * (id3, inttoreal (60))))

intermediate code generator → 3-address codes, or stack machine codes:
    temp1 := inttoreal (60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

code optimizer → optimized codes:
    temp1 := id3 * 60.0
    id1 := id2 + temp1

code generator → assembly (or machine) codes:
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
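The code-optimizer step above can be reproduced with three simple local transformations on quadruples. This is a hedged sketch, not how production optimizers are organized (they rely on data-flow analysis): it folds `inttoreal` applied to an integer literal and propagates the constant, removes the final copy `id1 := temp3` by writing the result directly into `id1`, and renumbers the temporaries. The quadruple layout and the assumption that every temporary is used exactly once are illustrative choices.

```python
def is_temp(x):
    return isinstance(x, str) and x.startswith("temp")


def optimize(code):
    """code: list of (dest, op, arg1, arg2); op "copy" means dest := arg1."""
    # 1. Fold inttoreal(<int literal>) at compile time and propagate the constant.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttoreal" and isinstance(a, int):
            consts[dest] = float(a)
            continue
        folded.append((dest, op, a, b))
    # 2. Replace "x := tempN" by writing the previous result straight into x
    #    (assumes each temporary is used exactly once, as in this example).
    merged = []
    for dest, op, a, b in folded:
        if op == "copy" and is_temp(a) and merged and merged[-1][0] == a:
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))
    # 3. Renumber the surviving temporaries as temp1, temp2, ...
    names = {}
    for dest, *_ in merged:
        if is_temp(dest):
            names[dest] = f"temp{len(names) + 1}"

    def rename(x):
        return names.get(x, x) if is_temp(x) else x

    return [(rename(d), op, rename(a), rename(b)) for d, op, a, b in merged]


def fmt(quad):
    dest, op, a, b = quad
    if op == "copy":
        return f"{dest} := {a}"
    if op == "inttoreal":
        return f"{dest} := inttoreal({a})"
    return f"{dest} := {a} {op} {b}"


code = [
    ("temp1", "inttoreal", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for quad in optimize(code):
    print(fmt(quad))  # temp1 := id3 * 60.0, then id1 := id2 + temp1
```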

Analysis-Synthesis Model of a Compiler

Analysis: program => constituents => I.R.
- Lexical analysis: linear => tokens
- Syntax analysis: hierarchical, nested => tree
  - Identify relations/actions among tokens: e.g., add(b, mult(c, d))
- Semantic analysis: check legal constraints / meanings
  - By examining attributes associated with tokens & relations

Synthesis: I.R. => I.R.* => target language
- Intermediate code generation: generate the intermediate representation (I.R.) from the syntax
- Code optimization: generate better, equivalent I.R.
  - machine independent + machine dependent
- Code generation

Typical Modules of a Compiler

Figure: Source code → Lexical Analyzer → tokens → Syntax Analyzer → syntax tree → Semantic Analyzer → annotated syntax tree → Intermediate Code Generator → IR → Code Optimizer → IR → Code Generator → target code. The Symbol Table, Literal Table and Error Handler are shared by the modules.

How To Construct A Compiler

- Language Processing Systems
- High-Level and Intermediate Languages
- Processing Phases
- Quick Review on Syntax & Semantics
- Processing Phases in Detail
- Structure of Compilers

Syntax Analysis: Structure

Figure: syntax analysis matches the tokens id1 := id2 + id3 * 60 against the grammar and builds a parse tree (concrete syntax tree):
    s → id1 := e
    e → id2 + t
    t → id3 * 60

- Syntax analysis (parsing): match the input tokens against a grammar of the language
  - To ensure that the input tokens form a legal sentence (statement)
  - To build a structural representation of the input tokens
  - So that the structure can be used for translation (or code generation)
- Knowledge sources:
  - A grammar in CFG (context-free grammar) form
  - Additional semantic rules for semantic checks and translation (in later phases)

Grammar:
    S → id := e      S → …
    e → id + t       e → …
    t → id * n       t → …
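The three productions shown can be turned directly into a recursive-descent parser, one function per non-terminal. This Python sketch is illustrative only: it implements just the listed productions (the `…` alternatives are omitted), assumes the lexer supplies `(kind, text)` pairs with kinds `id`, `:=`, `+`, `*` and `n`, and builds the parse tree as nested tuples.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent for:  S → id := e   e → id + t   t → id * n"""

    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs from the lexer
        self.pos = 0

    def expect(self, kind):
        """Consume one token of the given kind and return its text."""
        if self.pos >= len(self.tokens) or self.tokens[self.pos][0] != kind:
            raise ParseError(f"expected {kind!r} at token {self.pos}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def parse_S(self):  # S → id := e
        tree = ("S", self.expect("id"), self.expect(":="), self.parse_e())
        if self.pos != len(self.tokens):
            raise ParseError("extra tokens after the statement")
        return tree

    def parse_e(self):  # e → id + t
        return ("e", self.expect("id"), self.expect("+"), self.parse_t())

    def parse_t(self):  # t → id * n
        return ("t", self.expect("id"), self.expect("*"), self.expect("n"))


tokens = [("id", "id1"), (":=", ":="), ("id", "id2"), ("+", "+"),
          ("id", "id3"), ("*", "*"), ("n", "60")]
print(Parser(tokens).parse_S())
# ('S', 'id1', ':=', ('e', 'id2', '+', ('t', 'id3', '*', '60')))
```

An input that does not match the grammar (for example, the same tokens without the final `60`) raises `ParseError`, which is the "legal sentence" check described above.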

Grammar: Context-Free Grammar

Context-Free Grammar (CFG): Specification for Structures & Constituency

Parse tree: a graphical representation of structure
- root node (S): a sentential-level structure
- internal nodes: constituents of the sentence
- arcs: relationships between parent nodes and their children (constituents)
- terminal nodes: surface forms of the input symbols (e.g., words)
- alternative representation: bracketed notation, e.g., [I saw [the [girl [in [the park]]]]]

Example: Figure: a sub-tree in which the NP "the girl" and the PP ("in" + NP "the park") combine into a larger NP.

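Parse Tree: “I saw the girl in the park”

Figure: parse tree with root S and phrase nodes NP, VP and PP over the tagged words I/pron saw/v the/det girl/n in/p the/det park/n; the PP "in the park" attaches to the NP "the girl".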

CFG: Components

CFG: formal specification of parse trees
G = {Σ, N, P, S}
- Σ: terminal symbols
- N: non-terminal symbols
- P: production rules
- S: start symbol

Σ: terminal symbols
- the input symbols of the language
  - programming languages: tokens (reserved words, variables, operators, …)
  - natural languages: words or parts of speech
  - pre-terminals: parts of speech (when words are regarded as terminals)

N: non-terminal symbols
- groups of terminals and/or other non-terminals

S: start symbol: the largest constituent of a parse tree

P: production (re-writing) rules
- form: α → β (α: non-terminal, β: string of terminals and non-terminals)
- meaning: α re-writes to ("consists of", "is derived into") β, or β is reduced to α
- start with "S-productions" (S → β)

CFG: Example Grammar

Grammar rules
    S → NP VP
    NP → Pron | Proper-Noun | Det Norm
    Norm → Noun Norm | Noun
    VP → Verb | Verb NP | Verb NP PP | Verb PP
    PP → Prep NP

S: sentence, NP: noun phrase, VP: verb phrase, Pron: pronoun, Det: determiner, Norm: nominal, PP: prepositional phrase, Prep: preposition

Lexicon (in CFG form)
    Noun → girl | park | desk
    Verb → like | want | is | saw | walk
    Prep → by | in | with | for
    Det → the | a | this | these
    Pron → I | you | he | she | him
    Proper-Noun → IBM | Microsoft | Berkeley
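A naive top-down, backtracking parser is enough to try this grammar on the example sentence, because none of its rules is left-recursive. In the sketch below the grammar and lexicon are transcribed from the rules above; the function names and the bracketed output format are assumptions. With exactly these rules there is one parse, and the PP attaches to the VP through VP → Verb NP PP.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pron"], ["Proper-Noun"], ["Det", "Norm"]],
    "Norm": [["Noun", "Norm"], ["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "Noun": {"girl", "park", "desk"},
    "Verb": {"like", "want", "is", "saw", "walk"},
    "Prep": {"by", "in", "with", "for"},
    "Det": {"the", "a", "this", "these"},
    "Pron": {"I", "you", "he", "she", "him"},
    "Proper-Noun": {"IBM", "Microsoft", "Berkeley"},
}


def parse(symbol, words, i):
    """Yield (bracketed_tree, next_index) for each way `symbol` can start at words[i]."""
    if symbol in LEXICON:                          # pre-terminal: match one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield f"[{symbol} {words[i]}]", i + 1
        return
    for rhs in GRAMMAR[symbol]:                    # try every production, with backtracking
        for children, j in parse_seq(rhs, words, i):
            yield f"[{symbol} {' '.join(children)}]", j


def parse_seq(symbols, words, i):
    """Yield (list_of_subtrees, next_index) for a sequence of grammar symbols."""
    if not symbols:
        yield [], i
        return
    for first, j in parse(symbols[0], words, i):
        for rest, k in parse_seq(symbols[1:], words, j):
            yield [first] + rest, k


def parses(sentence):
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


for tree in parses("I saw the girl in the park"):
    print(tree)
```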

Syntax vs. Semantic Analyses

- Syntax: What do the input tokens look like? Do they form a legal structure?
  - Analysis of the relationships between elements
    - e.g., operator-operand relationships
- Semantics: What do they mean? And, thus, how do they act?
  - Analysis of the detailed attributes of elements, and checking of constraints over them under the given syntax
  - Not all knowledge about the relationships between elements can be conveniently represented by a simple syntactic structure; various kinds of attributes are associated with the sub-structures in the given syntax

Syntax vs. Semantic Analyses

Examples:
    int a, b, c, d; float f; char s1[], s2[];
    a = b + c * d ;
    a = b + f * d ;   // OK, but not strictly right
    a = b + s1 * s2 ; // BAD: * is undefined for strings
    a = b + s1 * 3 ;  // OK? if properly defined

- All the above statements have the same look
  - Convenient to represent them with the same syntactic structure (grammar/production rules)
- But semantically …
  - Not all of them are meaningful (?? string * string ??)
    • You have to check their other attributes for meaning
  - Not all meaningful statements mean/act the same and have the same code (*: int * int, int * float, string * int)
    • You have to generate different code according to other attributes of the tokens, since instructions are limited
    • E.g., INT and FLOAT additions may use different machine instructions, like ADD and ADDF respectively.

Figure: all the statements share the syntax tree := (id1, + (id2, * (id3, id4))); after the semantic analyzer, an inttoreal node may be inserted above id4: := (id1, + (id2, * (id3, inttoreal (id4)))).
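Attribute-driven checking and instruction selection can be sketched as a recursive walk that computes each sub-expression's type. The symbol types come from the declarations above; the signature table, the implicit int-to-float promotion and the warning text are assumptions for illustration. Note that `string * int` is rejected here; a language that defines it would add a corresponding entry to `SIGNATURES`.

```python
SYMBOLS = {"a": "int", "b": "int", "c": "int", "d": "int",
           "f": "float", "s1": "string", "s2": "string"}

# (operator, left type, right type) -> (result type, machine instruction)
SIGNATURES = {
    ("+", "int", "int"): ("int", "ADD"),
    ("*", "int", "int"): ("int", "MUL"),
    ("+", "float", "float"): ("float", "ADDF"),
    ("*", "float", "float"): ("float", "MULF"),
}


def check(node, instrs):
    """Return the type of `node`; append the instruction chosen for each operator."""
    if isinstance(node, str):
        return "int" if node.isdigit() else SYMBOLS[node]
    op, left, right = node
    lt, rt = check(left, instrs), check(right, instrs)
    if {lt, rt} == {"int", "float"}:
        lt = rt = "float"                  # assumed implicit int -> float promotion
    if (op, lt, rt) not in SIGNATURES:
        raise TypeError(f"{op} is undefined for {lt} and {rt}")
    result, instr = SIGNATURES[(op, lt, rt)]
    instrs.append(instr)
    return result


def check_assignment(target, expr):
    instrs = []
    expr_type = check(expr, instrs)
    warning = None
    if SYMBOLS[target] == "int" and expr_type == "float":
        warning = "narrowing float -> int"   # "OK, but not strictly right"
    return instrs, warning


print(check_assignment("a", ("+", "b", ("*", "c", "d"))))  # (['MUL', 'ADD'], None)
print(check_assignment("a", ("+", "b", ("*", "f", "d"))))  # (['MULF', 'ADDF'], 'narrowing float -> int')
try:
    check_assignment("a", ("+", "b", ("*", "s1", "s2")))
except TypeError as err:
    print("error:", err)                                   # error: * is undefined for string and string
```

The same tree shape yields different instruction sequences (or an error) purely because of the type attributes attached to the leaves.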

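Semantic Analysis: Attributes

Figure: semantic analysis (semantic checks & abstraction) uses the semantic rules associated with the grammar productions to turn the parse tree (concrete syntax tree) into a syntax tree (abstract syntax tree):
    parse tree:             s → id1 := e,  e → id2 + t,  t → id3 * 60
    syntax tree:            := (id1, + (id2, * (id3, 60)))
    after semantic checks:  := (id1, + (id2, * (id3, i2r (60))))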
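The i2r (inttoreal) node can be inserted by a bottom-up pass that compares operand types. In this sketch the types of id1, id2 and id3 are assumed to be float (as in the `float position, initial, rate` declaration), integer literals are typed int, and narrowing on assignment is deliberately not handled; the function names are hypothetical.

```python
TYPES = {"id1": "float", "id2": "float", "id3": "float"}  # from `float position, initial, rate`


def type_of(leaf):
    return "int" if isinstance(leaf, int) else TYPES[leaf]


def annotate(node):
    """Return (annotated_node, type); wrap int operands of float operations in inttoreal."""
    if not isinstance(node, tuple):
        return node, type_of(node)
    op, left, right = node
    (lhs, lt), (rhs, rt) = annotate(left), annotate(right)
    if lt == "float" and rt == "int":
        rhs, rt = ("inttoreal", rhs), "float"
    elif lt == "int" and rt == "float" and op != ":=":   # never convert an assignment target
        lhs, lt = ("inttoreal", lhs), "float"
    return (op, lhs, rhs), lt


tree = (":=", "id1", ("+", "id2", ("*", "id3", 60)))       # id1 := id2 + id3 * 60
print(annotate(tree)[0])
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('inttoreal', 60))))
```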

How To Construct A Compiler

- Language Processing Systems
- High-Level and Intermediate Languages
- Processing Phases
- Quick Review on Syntax & Semantics
- Processing Phases in Detail
- Structure of Compilers

Symbol Table Management

Symbols: variable names, procedure names, constant literals (3.14159)

Symbol table: a record for each name, describing its attributes
• Manages information about names

Variable attributes:
• Type, register/storage allocated, scope

Procedure names:
• Number and types of arguments
• Method of argument passing
– By value, by address, by reference
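A minimal sketch of a symbol table in Python, assuming a stack of scopes (one common design, not necessarily the one used later in this course). The `Symbol` record, its `kind` values, and the attribute keys such as `"storage"` and `"args"` are hypothetical names chosen to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    """One symbol-table record: a name plus its attributes."""
    name: str
    kind: str                      # "variable", "procedure", or "constant"
    type: Optional[str] = None     # e.g. "float"
    attrs: dict = field(default_factory=dict)  # storage, arguments, value, ...


class SymbolTable:
    """A stack of scopes; each scope maps names to Symbol records."""

    def __init__(self) -> None:
        self.scopes: list = [{}]   # index 0 is the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        self.scopes.pop()

    def declare(self, sym: Symbol) -> None:
        scope = self.scopes[-1]
        if sym.name in scope:
            raise KeyError(f"{sym.name} already declared in this scope")
        scope[sym.name] = sym

    def lookup(self, name: str) -> Optional[Symbol]:
        # Search the innermost scope first, then the enclosing ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare(Symbol("rate", "variable", "float", {"storage": "R1"}))
table.declare(Symbol("f", "procedure", attrs={"args": [("x", "int", "by value")]}))
table.declare(Symbol("3.14159", "constant", "float", {"value": 3.14159}))
table.enter_scope()
table.declare(Symbol("rate", "variable", "int"))  # shadows the global rate
print(table.lookup("rate").type)   # int
table.exit_scope()
print(table.lookup("rate").type)   # float
```

Keeping one dictionary per scope makes both the scope attribute and shadowing fall out of the lookup order.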

[1] Lexical Analysis: Tokenization

Lexical Analysis

final := initial + rate * 60   [f := i + r * 60]
=> id1 := id2 + id3 * 60

I saw the girls   [I see the girls]
=> I(+1p+sg) see(+ed) the girl(+s)   [I(+1p+sg) see(+prs) the girl(+s)]

Symbol table:
1  id1     "final"    float  R2
2  id2     "initial"  float  R1
3  id3     "rate"     float
4  const1  "60"       const  60.0

Word table (normalized form, surface form, features):
1  "I"     "I"      +1p+sg
2  "see"   "saw"    +ed
3  "the"   "the"
4  "girl"  "girls"  +3p+pl +s

Both look the same, so you want to represent them with the same normalized token string and hide the detailed features as additional attributes.
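A small regular-expression scanner in Python that reproduces the normalization above, as a sketch only: the token names (`NUM`, `ID`, ...) and the `tokenize` function are assumptions, and constants are kept as lexemes here rather than being entered as `const1` in the table. A LEX-generated scanner would be built from patterns much like these.

```python
import re

# Token patterns, tried in order at each position.
TOKEN_SPEC = [
    ("NUM", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source: str):
    """Return (normalized tokens, symbol table of identifier lexemes)."""
    tokens, symbols, index = [], [], {}
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "ID":
            # Each distinct name gets one symbol-table entry; the token keeps
            # only a reference (id1, id2, ...), hiding the actual spelling.
            if text not in index:
                symbols.append(text)
                index[text] = len(symbols)
            tokens.append(f"id{index[text]}")
        else:
            tokens.append(text)
    return tokens, symbols


toks, syms = tokenize("final := initial + rate * 60")
print(" ".join(toks))   # id1 := id2 + id3 * 60
print(syms)             # ['final', 'initial', 'rate']
```

`tokenize("f := i + r * 60")` yields the same token list, which is exactly the "both look the same" effect: the spelling moves into the symbol table as an attribute.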

[2] Syntax Analysis: Structure

Syntax Analysis

id1 := id2 + id3 * 60        I see(+ed) the girl(+s)

[Figure: parse trees (concrete syntax trees) built according to the grammar: s → id1 := e, e → id2 + t, t → id3 * 60; and Sentence → NP verb NP over "I see(+ed) the girl(+s)"]

Normalized tokens have the same parse/syntax tree, whether they were "see"/"saw" or "girl"/"girls".

[3] Semantic Analysis: Attributes

[Figure: semantic checks & abstraction, driven by semantic rules associated with grammar productions, turn each parse tree (concrete syntax tree) into a syntax tree (abstract syntax tree). For id1 := id2 + id3 * 60 (s → id1 := e, e → id2 + t, t → id3 * 60), semantic analysis yields := ( id1, + ( id2, * ( id3, i2r ( 60 ) ) ) ); for "I see(+ed) the girl(+s)", Sentence → NP verb NP becomes Sentence → NP.subject verb NP.object.]

[3] Semantic Analysis: Attributes

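[Figure: the parse tree (concrete syntax tree) of id1 := id2 + id3 * 60, built from s → id1 := e, e → id2 + t, t → id3 * 60, undergoes semantic checks & abstraction to become the syntax tree (abstract syntax tree) := ( id1, + ( id2, * ( id3, 60 ) ) ); semantic analysis then inserts i2r above 60. Semantic rules are associated with grammar productions.]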
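A sketch of the semantic-analysis step in the figure: a type-checking walk over the syntax tree that inserts `i2r` wherever an int operand meets a float. It assumes a tuple encoding of trees, `(op, left, right)`, and a `types` dictionary standing in for the symbol table; rejecting `int := float` is an assumption too (some languages truncate instead).

```python
def annotate(node, types):
    """Type-check a syntax tree and insert i2r where an int meets a float.

    Nodes: identifier names (str), int/float literals, or (op, left, right).
    Returns (annotated_node, type).
    """
    if isinstance(node, str):
        return node, types[node]        # identifier type from the symbol table
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, int):
        return node, "int"
    op, left, right = node
    left, lt = annotate(left, types)
    right, rt = annotate(right, types)
    if op == ":=":
        if lt == "int" and rt == "float":
            raise TypeError("cannot assign a float to an int variable")
        if lt == "float" and rt == "int":
            right = ("i2r", right)       # convert before storing
        return (op, left, right), lt
    if lt != rt:                         # mixed int/float arithmetic
        if lt == "int":
            left = ("i2r", left)
        else:
            right = ("i2r", right)
        return (op, left, right), "float"
    return (op, left, right), lt


tree = (":=", "id1", ("+", "id2", ("*", "id3", 60)))
types = {"id1": "float", "id2": "float", "id3": "float"}
print(annotate(tree, types)[0])
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('i2r', 60))))
```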

Semantic Checking

[Figure: sentence → subject verb object over "I see(+ed) the girl(+s)"; abstraction turns it into see(+ed) with subject I and object the girl(+s)]

Semantic constraints:
• Agreement (somewhat syntactic):
– Subject-verb: I have, she has/had, I do have, she does not
– NP: quantifier-noun: a book, two books
• Selectional constraint:
– Kill → Animate
– Kiss → Animate

Semantic Checking

See[+ed](I, the girl[+s])   (semantically meaningful)

Kill/Kiss(John, the Stone)   (semantically meaningless unless the Stone refers to an animate entity)

[Figure: semantic checking applied to sentence → subject verb object over "I see(+ed) the girl(+s)"]

Semantic constraints:
• Agreement (somewhat syntactic):
– Subject-verb: I have, she has/had, I do have, she does not
– NP: quantifier-noun: a book, two books
• Selectional constraint:
– Kill → Animate
– Kiss → Animate
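The selectional constraint check can be sketched in a few lines of Python. The feature lexicon and the constraint table below are hypothetical, and the sketch models only the constraint on the object (as in the Kill/Kiss example); agreement checks are left out.

```python
# Hypothetical lexicon: semantic features of each noun phrase.
FEATURES = {
    "I": {"animate"},
    "John": {"animate"},
    "Bill": {"animate"},
    "the girl": {"animate"},
    "the Stone": set(),          # an ordinary stone: not animate
}

# Selectional constraints: features each verb requires of its object.
REQUIRES_OBJECT = {"kill": {"animate"}, "kiss": {"animate"}}


def is_meaningful(verb: str, obj: str) -> bool:
    """Check a verb-object pair against the selectional constraints."""
    required = REQUIRES_OBJECT.get(verb, set())
    return required <= FEATURES.get(obj, set())


print(is_meaningful("see", "the girl"))    # True
print(is_meaningful("kill", "the Stone"))  # False
print(is_meaningful("kiss", "Bill"))       # True
```

Like a type check in a compiler, the test compares attributes of the node (features) against what the operator (verb) demands, not the shape of the tree.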

Parse Tree vs. Syntax Tree

Parse tree (aka concrete syntax tree):
• Concrete tree representation drawn according to a grammar
• For validating the syntactic correctness of the input
• For easy parsing (or fitting the constraints of the parsing algorithm)
• Normally constructed incrementally during parsing

Syntax tree (aka abstract syntax tree):
• Logical tree representation that characterizes the abstract relationships between constituents
• For representing semantic relationships & semantic checking
• Normalizes various parse trees with the same "meaning" (semantics)
• May ignore non-essential syntactic details
• Not always the same as the parse tree
• May be constructed in parallel with the parse tree during parsing, or converted from the parse tree after syntactic parsing

Annotated syntax tree (AST):
• Syntax tree annotated with attributes

Parse Tree vs. Syntax Tree

Parse tree (depends on the grammar):
• Input: T + T + T
• G1: T ((+ T) (+ T) …)
– E → T R'
– R' → + T R'
– R' → <null>
• G2: ((T) + T) + T
– E → E + T
– E → T

Syntax tree: abstract representation of the syntax defined by G1/G2
• Uses operations as parent nodes and operands as children nodes
• The operation-operand relationship makes instruction selection easy in code generation (e.g., ADD R1, R2)

[Figure: parse tree for G1; parse tree for G2; syntax tree (independent of G1 or G2)]
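A Python sketch showing that G1 and G2 yield different parse trees but the same syntax tree for T + T + T. It assumes a tuple encoding of trees and uses the operands a, b, c in place of the three T's. G1 is parsed by recursive descent; G2 is left-recursive, so it is handled here with a loop, which is one standard workaround and not the only one.

```python
def parse_g1(tokens):
    """Recursive descent for G1: E -> T R',  R' -> + T R' | <null>.

    Returns (parse_tree, syntax_tree). The syntax tree built so far is passed
    down into R' (an inherited attribute) to keep + left-associative.
    """
    pos = 0

    def parse_t():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return ("T", tok), tok

    def parse_r(left):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            t_pt, t_st = parse_t()
            rest_pt, st = parse_r(("+", left, t_st))
            return ("R'", "+", t_pt, rest_pt), st
        return ("R'", "<null>"), left

    t_pt, t_st = parse_t()
    r_pt, st = parse_r(t_st)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("E", t_pt, r_pt), st


def parse_g2(tokens):
    """G2: E -> E + T | T. Left recursion is handled with a loop here."""
    pt, st = ("E", ("T", tokens[0])), tokens[0]
    i = 1
    while i < len(tokens) and tokens[i] == "+":
        pt = ("E", pt, "+", ("T", tokens[i + 1]))
        st = ("+", st, tokens[i + 1])
        i += 2
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return pt, st


tokens = ["a", "+", "b", "+", "c"]   # T + T + T with T = a, b, c
pt1, st1 = parse_g1(tokens)
pt2, st2 = parse_g2(tokens)
print(pt1 == pt2)   # False: the parse trees follow different grammars
print(st1 == st2)   # True: both give ('+', ('+', 'a', 'b'), 'c')
```

The syntax tree puts the operation on top and the operands below, so a later code generator can read ADD instructions straight off it regardless of which grammar was used.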

[4] Intermediate Code Generation

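[Figure: attribute evaluation over the annotated syntax tree := ( id1, + ( id2, * ( id3, i2r ( 60 ) ) ) ) yields 3-address codes (assembly codes are attributes for code generation); on the natural-language side, the tree see(+ed) with subject I and object the girl(+s), carrying Action(+anim, +anim) on the verb and +anim on both arguments, yields a logic form.]

Intermediate Code Generation

3-address codes:
temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

Logic form:
See[+ed](I, the girl[+s])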
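The translation from the annotated syntax tree to the 3-address codes above can be sketched as a post-order walk in Python. The tuple tree encoding and the `gen_three_address` name are assumptions; temporaries are numbered in the order their operator nodes finish, which reproduces the listing.

```python
import itertools


def gen_three_address(tree):
    """Post-order walk that emits one 3-address instruction per operator node."""
    code = []
    temps = itertools.count(1)

    def gen(node):
        if not isinstance(node, tuple):
            return str(node)                 # identifier or literal: no code needed
        if node[0] == "i2r":                 # unary conversion node
            arg = gen(node[1])
            t = f"temp{next(temps)}"
            code.append(f"{t} := i2r ( {arg} )")
            return t
        op, left, right = node
        if op == ":=":
            value = gen(right)
            code.append(f"{left} := {value}")
            return left
        l, r = gen(left), gen(right)         # children first: their temps come first
        t = f"temp{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    gen(tree)
    return code


tree = (":=", "id1", ("+", "id2", ("*", "id3", ("i2r", 60))))
print("\n".join(gen_three_address(tree)))
# temp1 := i2r ( 60 )
# temp2 := id3 * temp1
# temp3 := id2 + temp2
# id1 := temp3
```

Each returned string (a name or a temp) plays the role of the node's "place" attribute, so this is attribute evaluation in the sense of the next slides.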

Syntax-Directed Translation (1)

Translation from input to target can be regarded as attribute evaluation.
• Evaluate the attributes of each node, in a well-defined order, based on the particular sub-tree structure (syntax) in which the attributes are evaluated.

Attributes: the particular properties associated with a tree node (a node may have many attributes)
• An abstract representation of the sub-tree rooted at that node
• The attributes of the root node represent the particular properties of the whole input statement or sentence.
• E.g., the value associated with a mathematical sub-expression
• E.g., the machine code associated with a sub-expression
• E.g., the language translation associated with a sub-sentence

Syntax-Directed Translation (2)

Synthesized attributes:
• Attributes that can be evaluated from the attributes of the children nodes
• E.g., the value of a math expression can be obtained from the values of its sub-expressions (and the operators being applied)
– a := b + c * d   (a.val = b.val + tmp.val, where tmp.val = c.val * d.val)
– girls = girl + s   (tr.girls = tr.girl + tr.s = 女孩 + 們 = 女孩們, i.e., "girl" + plural marker = "girls")

Inherited attributes:
• Attributes evaluated from the parent and/or sibling nodes
• E.g., the data type of a variable can be obtained from the type declaration on its left or from the type of its left sibling
– int a, b, c;   (a.type = INT & b.type = a.type & …)
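Both kinds of attributes from the examples above, as a Python sketch. The tree encoding, the `env` dictionary of leaf values, and the function names are assumptions; only `+` and `*` are supported. `val` computes a synthesized attribute bottom-up, while `declare` passes the declared type down (and along) the identifier list as an inherited attribute.

```python
def val(node, env):
    """Synthesized attribute: node.val is computed from the children's .val."""
    if isinstance(node, str):
        return env[node]                 # leaf: value of a variable
    op, left, right = node
    lv, rv = val(left, env), val(right, env)
    return lv + rv if op == "+" else lv * rv


def declare(type_attr, names, table=None):
    """Inherited attribute: the declared type flows down a list like `int a, b, c;`.

    Each identifier takes its type from the parent, and the rest of the list
    inherits the same value (b.type = a.type = INT, ...).
    """
    table = {} if table is None else table
    if names:
        table[names[0]] = type_attr
        declare(type_attr, names[1:], table)
    return table


# a := b + c * d   (a.val = b.val + tmp.val, tmp.val = c.val * d.val)
print(val(("+", "b", ("*", "c", "d")), {"b": 1, "c": 2, "d": 3}))  # 7
print(declare("INT", ["a", "b", "c"]))  # {'a': 'INT', 'b': 'INT', 'c': 'INT'}
```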

Syntax-Directed Translation (3)

Attribute evaluation order:
• Any order that evaluates each attribute AFTER all the attributes it depends on have been evaluated will result in correct evaluation.
• General: topological order
– Analyze the dependencies between attributes and construct an attribute tree or forest
– Evaluate the attribute of any leaf node and mark it as "evaluated", thus logically removing it from the attribute tree or forest
– Repeat for any leaf nodes that have not been marked, until no unmarked node remains
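The evaluate-and-mark procedure above, sketched in Python. The `rules` format (attribute name mapped to its dependencies and an evaluation function) is an assumption; in each round, every attribute whose dependencies are already evaluated counts as a "leaf" of the remaining graph. If a round finds no such attribute, the dependencies are circular and no valid order exists.

```python
def evaluate_attributes(rules):
    """Evaluate attributes in a topological order of their dependencies.

    rules maps an attribute name to (dependencies, function). In each round,
    every attribute whose dependencies are all evaluated is evaluated and
    marked, which logically removes it from the dependency graph.
    """
    values, order = {}, []
    pending = dict(rules)
    while pending:
        ready = [a for a, (deps, _) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError(f"circular dependency among {sorted(pending)}")
        for attr in ready:
            deps, fn = pending.pop(attr)
            values[attr] = fn(*(values[d] for d in deps))
            order.append(attr)
    return values, order


# a := b + c * d, with hypothetical leaf values b = 1, c = 2, d = 3
rules = {
    "a.val": (["b.val", "tmp.val"], lambda b, t: b + t),
    "tmp.val": (["c.val", "d.val"], lambda c, d: c * d),
    "b.val": ([], lambda: 1),
    "c.val": ([], lambda: 2),
    "d.val": ([], lambda: 3),
}
values, order = evaluate_attributes(rules)
print(order)            # ['b.val', 'c.val', 'd.val', 'tmp.val', 'a.val']
print(values["a.val"])  # 7
```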

[5] Code Optimization [Normalization]

Code Optimization

temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
=>
temp1 := id3 * 60.0
id1 := id2 + temp1

Normalization into a better equivalent form (optional)
See[+ed](I, the girl[+s]) => See[+ed](I, the girl[+s])
Was_Kill[+ed](Bill, John) => Kill[+ed](John, Bill)
• Unify passive/active voices
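A Python sketch of two local optimizations that turn the four instructions above into the two-instruction form. The `(dest, op, args)` instruction encoding and the `"copy"` op name are assumptions, and the copy elimination assumes each compiler temp is used only once, which holds for code produced by a simple tree walk like the one on the previous slide. Real optimizers check this with data-flow analysis.

```python
def optimize(code):
    """Two local passes over (dest, op, args) 3-address instructions.

    1. Constant folding: i2r applied to an integer literal is done at compile
       time, and the float constant replaces later uses of the temp.
    2. Copy elimination: `x := t`, where t was computed by the previous
       instruction, is merged into that instruction (assumes t is used once).
    """
    consts, out = {}, []
    for dest, op, args in code:
        args = [consts.get(a, a) for a in args]
        if op == "i2r" and args[0].isdigit():
            consts[dest] = args[0] + ".0"
            continue
        if op == "copy" and out and out[-1][0] == args[0] and args[0].startswith("temp"):
            _, prev_op, prev_args = out.pop()
            out.append((dest, prev_op, prev_args))
            continue
        out.append((dest, op, args))
    return renumber(out)


def renumber(code):
    """Rename the surviving temps to temp1, temp2, ... in order of definition."""
    names, result = {}, []
    for dest, op, args in code:
        args = [names.get(a, a) for a in args]
        if dest.startswith("temp"):
            names[dest] = f"temp{len(names) + 1}"
            dest = names[dest]
        result.append((dest, op, args))
    return result


code = [
    ("temp1", "i2r", ["60"]),
    ("temp2", "*", ["id3", "temp1"]),
    ("temp3", "+", ["id2", "temp2"]),
    ("id1", "copy", ["temp3"]),
]
for dest, op, (a, b) in optimize(code):
    print(f"{dest} := {a} {op} {b}")
# temp1 := id3 * 60.0
# id1 := id2 + temp1
```

Both passes preserve meaning: the conversion result is a compile-time constant, and the removed temp had no other readers.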

[6] Code Generation

Code Generation

temp1 := id3 * 60.0
id1 := id2 + temp1
=>
movf id3, r2
mulf #60.0, r2
movf id2, r1
addf r2, r1
movf r1, id1
• Selection of usable codes & order of codes & allocation of available registers

See[+ed](I, the girl[+s])
=> Lexical: 看到[了](我, 女孩[們])
=> Structural: 我 看到 女孩[們] [了]
• Selection of target words & order of phrases
(看到 = see, 了 = aspect marker for the completed action (+ed), 我 = I, 女孩 = girl, 們 = plural marker (+s))
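A naive code generator in Python that reproduces the instruction sequence above, as a sketch only. It assumes the `(dest, op, args)` encoding, the two-operand `op src, dst` form used on the slide, `#` for immediates, and a pool of two registers handed out last-first. It never frees or spills registers, so it handles only short sequences; real register allocation is far more involved.

```python
OPCODES = {"+": "addf", "-": "subf", "*": "mulf", "/": "divf"}


def codegen(tac, registers=("r1", "r2")):
    """Translate float 3-address code into two-operand `op src, dst` code."""
    free = list(registers)   # pop() hands out the last register first
    where = {}               # temp name -> register currently holding it
    asm = []

    def operand(x):
        if x in where:
            return where[x]                  # temp already in a register
        if x.replace(".", "", 1).isdigit():
            return "#" + x                   # numeric literal: immediate
        return x                             # variable in memory

    for dest, op, (a, b) in tac:
        if a in where:
            reg = where[a]                   # reuse the temp's register
        else:
            reg = free.pop()                 # no spilling: fails if we run out
            asm.append(f"movf {operand(a)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(b)}, {reg}")
        if dest.startswith("temp"):
            where[dest] = reg                # keep the temp in its register
        else:
            asm.append(f"movf {reg}, {dest}")  # store a named variable
    return asm


tac = [("temp1", "*", ["id3", "60.0"]), ("id1", "+", ["id2", "temp1"])]
print("\n".join(codegen(tac)))
# movf id3, r2
# mulf #60.0, r2
# movf id2, r1
# addf r2, r1
# movf r1, id1
```

The three concerns named above show up directly: which opcode to use (`OPCODES`), the order of emitted instructions, and which register holds each value (`free`, `where`).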

Objectives of Optimizing Compilers

Correct code: preserve meaning
Better performance
• Maximum execution efficiency
• Minimum code size
– Embedded systems
• Minimizing power consumption
– Mobile devices
– Typically, faster execution also implies lower power
Reasonable compilation time
Manageable engineering and maintenance effort

Optimization for Computer Architectures (1)

Parallelism
Instruction level: multiple operations are executed simultaneously
• Processors check dependencies in sequential instructions and issue them in parallel
– Hardware scheduler: changes the order of instructions
• Compilers: rearrange instructions to make instruction-level parallelism more effective
• Instruction set support:
– Very long instruction word (VLIW): issues multiple operations in parallel
– Instructions that operate on vector data at the same time
• Compilers: generate code for such machines from sequential code
Processor level: different threads of the same application run on different processors
• Multiprocessors + multithreaded code
– Programmer: writes multithreaded code, vs.
– Compiler: generates parallel code automatically

Optimization for Computer Architectures (2)

Memory Hierarchies
No storage is both fast and large
• Registers (tens to hundreds of bytes), caches (KB to MB), main/physical memory (MB to GB), secondary/virtual memory (hard disks) (GB to TB)
Using registers effectively is probably the single most important problem in optimizing a program
Cache management by hardware is not effective for scientific code that has large data structures (arrays)
Improve the effectiveness of memory hierarchies:
• By changing the layout of data, or
• By changing the order of instructions accessing the data
Improve the effectiveness of the instruction cache:
• By changing the layout of code

How To Construct A Compiler

- Language Processing Systems
- High-Level and Intermediate Languages
- Processing Phrases
- Quick Review on Syntax & Semantics
- Processing Phrases in Detail
- Structure of Compilers

Structure of a Compiler

Front end: source dependent
• Lexical analysis
• Syntax analysis
• Semantic analysis
• Intermediate code generation
• (Code optimization: machine independent)

Back end: target dependent
• Code optimization
• Target code generation

Structure of a Compiler

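[Figure: front ends for Fortran, Pascal, and C translate into a common intermediate code, from which back ends generate code for MIPS, SPARC, and Pentium]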

History

1st Fortran compiler: 1950s
• Efficient? (compared with assembly programs)
– Not bad, but programs are much easier to write
– High-level languages are feasible.
• 18 man-years, ad hoc structure

Today, we can build a simple compiler in a few months.

Crafting an efficient and reliable compiler is still challenging.

Cousins of the Compiler

Preprocessors: macro definition/expansion
Interpreters
• Compiler vs. interpreter vs. just-in-time compilation
Assemblers: 1-pass / 2-pass
Linkers: link object code with library functions
Loaders: load executables into memory
Editors: edit source code (with/without syntax prediction)
Debuggers: provide a symbolic, stepwise trace
Profilers: gprof (call graph and time analysis)
Project managers: IDE
• Integrated Development Environment
Disassemblers, decompilers: low-level to high-level language conversion

Applications of Compilation Techniques

Applications of Compilation Techniques

Virtually any programming language or specification language with a regular, well-defined grammatical structure needs a kind of compiler (or a variant or part of one) to analyze and then process it.

Applications of Lexical Analysis

Text/Pattern Processing:
• grep: get lines matching a specified pattern
– Ex: grep '^From ' /var/spool/mail/andy
• sed: stream editor, editing specified patterns
– Ex: ls *.JPG | sed 's/JPG/jpg/'
• tr: simple translation between character sets (e.g., lowercase to uppercase)
– Ex: tr 'a-z' 'A-Z' < mytext > mytext.uc
• AWK: pattern-action rule processing, based on regular expressions
– Ex: awk '$1=="John"{count++} END{print count}' < Students.txt

Applications of Lexical Analysis

Search engines/information retrieval
• Full-text search, keyword matching, fuzzy matching
Database machines
• Fast matching over large databases
• Database filters
Fast & multiple matching algorithms
• Optimized/specialized lexical analyzers (FSA)
• Examples: KMP, Boyer-Moore (BM), …
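As an illustration of the first algorithm named above, here is a Knuth-Morris-Pratt search in Python. It is a standard textbook formulation rather than anything specific to this course; the `kmp_search` name is an assumption. The failure table plays the role of a matching automaton, so the text is scanned once without backing up.

```python
def kmp_search(text: str, pattern: str) -> list:
    """Knuth-Morris-Pratt: all start positions of pattern in text, O(n + m)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # fail[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it; this table encodes the matching FSA.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]          # fall back instead of re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]          # allow overlapping matches
    return matches


print(kmp_search("abababa", "aba"))                   # [0, 2, 4]
print(kmp_search("From andy\nFrom bob\n", "From "))   # [0, 10]
```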

Applications of Syntax Analysis

Structured editors/word processors
• Integrated Development Environment (IDE)
– Automatic formatting, keyword insertion
• Incremental parsing vs. full-blown parsing
– Incremental: patch the existing analysis for incremental changes, instead of re-parsing or re-compiling
Pretty printers: beautify nested structures
• cb (C beautifier)
• indent (an even more versatile C beautifier)

Applications of Syntax Analysis

Static checker/debugger: lint
• Checks for errors without actually running the program, e.g.,
– Statement not reachable
– Used before defined
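A toy version of the "used before defined" check in Python, restricted to straight-line code. The statement encoding and the `used_before_defined` name are assumptions; a real lint runs this as a data-flow analysis over every path of the control-flow graph, and with no branches there is only one path.

```python
def used_before_defined(statements, params=()):
    """Flag names read before any assignment in straight-line code.

    statements: list of (assigned_name, names_read) in program order.
    """
    defined = set(params)
    warnings = []
    for lineno, (target, reads) in enumerate(statements, start=1):
        for name in reads:
            if name not in defined:
                warnings.append(f"line {lineno}: {name} used before defined")
        defined.add(target)
    return warnings


program = [
    ("a", ["b"]),        # a = b;      b has no value yet
    ("b", []),           # b = 1;
    ("c", ["a", "b"]),   # c = a + b;
]
print(used_before_defined(program))   # ['line 1: b used before defined']
```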

Application of Optimization Techniques

Data flow analysis
• Software testing: locating errors before running (static checking)
– Locate errors along all possible execution paths, not only on the test data set
• Type checking
– Dereferencing null or freed pointers
– "Dangerous" user-supplied strings
• Bound checking
– Security vulnerability: buffer overrun attack
– Tracking values of pointers across procedures
• Memory management
– Garbage collection

Applications of Compilation Techniques

Preprocessor: macro definition/expansion
Active webpage processing
• Script or programming languages embedded in webpages for interactive transactions
• Examples: JavaScript, JSP, ASP, PHP
• Compiler apps: expansion of embedded statements, in addition to web page parsing
Database query language: SQL

Applications of Compilation Techniques

Interpreters
• No pre-compilation
• Executed on the fly
• E.g., BASIC
Script languages: C shell, Perl
• Function: batch processing of multiple files/databases
• Mostly interpreted, some pre-compiled
• Some are interpreted and save the compiled code

Applications of Compilation Techniques

Text formatters
• troff, LaTeX, eqn, pic, tbl
VLSI design: silicon compilers
• Hardware description languages
– Variables => control signals / data
• Circuit synthesis
• Preliminary circuit simulation by software

Applications of Compilation Techniques

VLSI Design

Advanced Applications

Natural Language Processing
• Advanced search engines: retrieve relevant documents
– More than keyword matching
– Natural language queries
• Information extraction: acquire relevant information (into structured form)
• Text summarization: get the briefest & most relevant paragraphs
• Text/web mining: mine information & rules from text/web

Advanced Applications

Machine Translation
• Translating one natural language into another
• Models:
– Direct translation
– Transfer-based model
– Inter-lingua model
• Transfer-based model: Analysis-Transfer-Generation (or Synthesis) model

Tools for Compiler Construction

Tools: Automatic Generation of Lexical Analyzers and Compilers

Lexical analyzer generator: LEX
• Input: token pattern specification (in regular expressions)
• Output: a lexical analyzer
Parser generator: YACC
• "Compiler-compiler"
• Input: grammar specification (in context-free grammar)
• Output: a syntax analyzer (aka "parser")

Tools

Syntax-directed translation engines
• Translations associated with nodes
• Translations defined in terms of the translations of children
Automatic code generation
• Translation rules
• Template matching
Data flow analyses
• Dependency of variables & constructs

Programming Languages

- Issues about Modern PLs
- Module programming & Parameter passing
- Nested modules & Scopes
- Static vs. dynamic allocation

Programming Language Basics

Static vs. dynamic issues or policies
• Static: determined at compile time
• Dynamic: determined at run time

Scopes of declarations
• The region in which a use of x refers to a particular declaration of x

Static scope (aka lexical scope):
• Possible to determine the scope of a declaration by looking at the program text
• C, Java (and most PLs)
– Delimited by block structures

Dynamic scope:
• At run time, the same use of x could refer to any of several declarations of x.
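A Python sketch contrasting the two policies. Python itself is statically scoped, so the first half runs natively; dynamic scope is only simulated in the second half, with an explicit stack of activation environments (`env_stack` and `lookup` are hypothetical helpers, not a language feature). The same use of x in f yields 1 under static scope and 2 under dynamic scope.

```python
# Static (lexical) scope, which Python itself uses: the free x in f_static
# is bound where f_static is written, i.e. to the global x.
x = 1


def f_static():
    return x


def g_static():
    x = 2                 # local to g_static; invisible to f_static
    return f_static()


# Dynamic scope, simulated with an explicit stack of activation environments:
# a free name is resolved in the most recent active declaration.
env_stack = [{"x": 1}]    # global environment


def lookup(name):
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def f_dynamic():
    return lookup("x")


def g_dynamic():
    env_stack.append({"x": 2})   # g's own declaration of x
    try:
        return f_dynamic()       # f now sees g's x
    finally:
        env_stack.pop()


print(g_static())    # 1
print(g_dynamic())   # 2
```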

Programming Language Basics

Variable declaration

Static variables:
  • The compiler can determine the location in memory where the declared variable can be found.
  • public static int x; // Java
  • Only one copy of x, whose location can be determined at compile time.
  • Global declarations and declared constants can also be made static.

Dynamic variables: variables declared without the “static” keyword (e.g., non-static class fields)
  • Each object of the class would have its own location where x would be held.
  • At run time, the same use of x in different objects could refer to any of several different locations.
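Python has no `static` keyword for fields, but a class attribute versus an instance attribute gives a rough analogy for "one copy of x" versus "one location per object". This is an assumption-laden sketch: unlike a Java static field, the Python class attribute is not laid out by a compiler at compile time; only the one-copy/per-object distinction carries over. `Counter`, `shared` and `own` are hypothetical names.

```python
class Counter:
    shared = 0                 # class attribute: one copy, like a Java "static" field

    def __init__(self):
        self.own = 0           # instance attribute: each object gets its own slot

    def bump(self):
        Counter.shared += 1    # updates the single shared location
        self.own += 1          # updates this object's own location

a, b = Counter(), Counter()
a.bump(); a.bump(); b.bump()
print(Counter.shared, a.own, b.own)  # 3 2 1
```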

Programming Language Basics

Parameter Passing Mechanisms

Call by value
  • make a copy of the physical value
Call by reference
  • make a copy of the address of the physical object
Call by name (Algol 60)
  • the callee executes as if the actual parameter were substituted literally for the formal parameter in the code of the callee
  • like macro expansion: each occurrence of the formal parameter is replaced by the actual parameter
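The three mechanisms can be imitated in Python under stated assumptions: a plain integer argument stands in for call by value, a one-element list stands in for an address (call by reference), and zero-argument lambdas (thunks) re-evaluated on every use stand in for call by name. The sum over `a[i] * a[i]` is Jensen's device, the classic demonstration of call by name; all function names are hypothetical.

```python
def inc_by_value(v):
    v = v + 1                  # changes only the callee's copy

def inc_by_reference(cell):
    cell[0] = cell[0] + 1      # cell plays the role of an address: caller sees the update

def sum_by_name(get_i, set_i, lo, hi, get_term):
    # Jensen's device: get_term is re-evaluated on every use, as if the
    # actual parameter's text were substituted for the formal parameter.
    total = 0
    set_i(lo)
    while get_i() <= hi:
        total += get_term()
        set_i(get_i() + 1)
    return total

x = 10
inc_by_value(x)
print(x)                       # 10

box = [10]
inc_by_reference(box)
print(box[0])                  # 11

env = {"i": 0}
a = [1, 2, 3, 4]
s = sum_by_name(lambda: env["i"],
                lambda v: env.__setitem__("i", v),
                0, len(a) - 1,
                lambda: a[env["i"]] * a[env["i"]])
print(s)                       # 30 (1 + 4 + 9 + 16)
```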

Formal Languages

Languages, Grammars and Recognition Machines

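[Figure: a Grammar (expression) defines/generates a Language; a Parser (automaton) accepts the Language; the Parser's Parsing Table is constructed from the Grammar. Example sentence: “I saw a girl in the park …”. Example grammar: S → NP VP, NP → pron | det n; the corresponding dotted items used in the parsing table: S → · NP VP, NP → · pron | · det n.]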

Languages

Alphabet: any finite set of symbols
  • {0, 1}: the binary alphabet
String: a finite sequence of symbols from an alphabet
  • 1011: a string of length 4
  • ε: the empty string
Language: any set of strings over an alphabet
  • {00, 01, 10, 11}: the set of strings of length 2
  • ∅: the empty set

Grammars

The sentences in a language may be defined by a set of rules called a grammar.
  • L: {00, 01, 10, 11} (the set of binary strings of length 2)
  • G: (0|1)(0|1)
Languages of different degrees of regularity can be specified with grammars of different “expressive powers”.
  • Chomsky Hierarchy (in increasing order of expressive power): Regular Grammar < Context-Free Grammar < Context-Sensitive Grammar < Unrestricted Grammar
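As a quick check that the grammar G generates exactly L, the sketch below writes G as a Python regular expression and filters all binary strings of length 0 to 3. The names `G`, `ALPHABET` and `candidates` are illustrative assumptions.

```python
import itertools
import re

ALPHABET = "01"
G = re.compile(r"(0|1)(0|1)")   # the grammar G written as a regular expression

# Enumerate every string over the alphabet of length 0..3 and keep
# those that G generates; the result should be exactly L.
candidates = ("".join(p) for n in range(4)
              for p in itertools.product(ALPHABET, repeat=n))
L = {s for s in candidates if G.fullmatch(s)}
print(sorted(L))   # ['00', '01', '10', '11']
```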

Automata

An acceptor/recognizer of a language is an automaton that determines whether an input string is a sentence in the language.

A transducer of a language is an automaton that determines whether an input string is a sentence in the language and, if it is, may produce strings as output.

Implementation: state transition functions (parsing table)
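A minimal sketch of both kinds of automaton for the language (0|1)(0|1), with the state transition function stored as a table. The transducer's output rule (complementing each bit) is a hypothetical choice made only to show that a transducer emits output while it recognizes; `DELTA`, `OUTPUT`, `accepts` and `transduce` are illustrative names.

```python
# Transition table for (0|1)(0|1): state -> {input symbol -> next state}.
# State 2 is the only accepting state.
DELTA = {
    0: {"0": 1, "1": 1},
    1: {"0": 2, "1": 2},
    2: {},                    # no moves out of the final state
}
START, ACCEPT = 0, {2}

def accepts(s: str) -> bool:
    """Acceptor: answer yes/no by following the transition table."""
    state = START
    for ch in s:
        if ch not in DELTA[state]:
            return False      # no table entry: reject
        state = DELTA[state][ch]
    return state in ACCEPT

# Transducer: same table, but every move also emits an output symbol.
OUTPUT = {"0": "1", "1": "0"}   # hypothetical translation: complement each bit

def transduce(s: str):
    """Return the translated string if s is in the language, else None."""
    state, out = START, []
    for ch in s:
        if ch not in DELTA[state]:
            return None
        out.append(OUTPUT[ch])
        state = DELTA[state][ch]
    return "".join(out) if state in ACCEPT else None

print(accepts("10"), accepts("101"))     # True False
print(transduce("10"), transduce("1"))   # 01 None
```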

Transducer

[Figure: grammar G1 defines/generates language L1 and grammar G2 defines/generates language L2; an automaton constructed from the grammars accepts sentences of L1 and translates them into sentences of L2.]

Meta-languages

Meta-language: a language used to define another language.

Different meta-languages are used to define the various components of a programming language so that these components can be analyzed automatically.

Definition of Programming Languages
  • Lexical tokens: regular expressions
  • Syntax: context-free grammars
  • Semantics: attribute grammars
  • Intermediate code generation: attribute grammars
  • Code generation: tree grammars

Implementation of Programming Languages
  • Regular expressions: finite automata, lexical analyzer
  • Context-free grammars: pushdown automata, parser
  • Attribute grammars: attribute evaluators, type checker and intermediate code generator
  • Tree grammars: finite tree automata, code generator
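Why a context-free grammar needs a pushdown automaton can be seen on the language { aⁿbⁿ : n ≥ 0 }, generated by S → a S b | ε: a finite automaton cannot count the a's, but a stack can. The function below is a hand-written, deterministic stack recognizer for that one language (a hypothetical example, not a general construction from a grammar).

```python
def accepts_anbn(s: str) -> bool:
    """Stack-based recognizer for L = { a^n b^n : n >= 0 }, i.e. S -> a S b | ε."""
    stack = []
    i = 0
    # phase 1: push one marker for every leading 'a'
    while i < len(s) and s[i] == "a":
        stack.append("A")
        i += 1
    # phase 2: pop one marker for every 'b'
    while i < len(s) and s[i] == "b":
        if not stack:
            return False      # more b's than a's
        stack.pop()
        i += 1
    # accept only if all input is consumed and the stack is empty
    return i == len(s) and not stack

print([w for w in ["", "ab", "aabb", "aab", "abab"] if accepts_anbn(w)])
```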

Appendix: Machine Translation

Machine Translation (Transfer Approach)

[Figure: SL Text → Analysis → SL IR → Transfer → TL IR → Synthesis → TL Text. Analysis draws on the SL dictionaries & grammar, Transfer on the SL-TL dictionaries and transfer rules, and Synthesis on the TL dictionaries & grammar. The figure also shows an inter-lingua level connecting SL and TL.]

SL: source language; TL: target language; IR: Intermediate Representation

Analysis is target-independent, and Generation (Synthesis) is source-independent.

Analysis
  • Morphological and Lexical Analysis
  • Part-of-speech (POS) Tagging:
    n. Miss | n. Smith | v. put (+ed) | q. two | n. book (+s) | p. on | d. this | n. dining table

Example: Miss Smith put two books on this dining table.

Syntax Analysis

[Parse tree: S → NP VP; VP → V NP PP; NP = “Miss Smith”, V = “put(+ed)”, NP = “two book(s)”, PP = “on this dining table”]

Example: Miss Smith put two books on this dining table.

Transfer

(1) Lexical Transfer
  Miss → 小姐
  Smith → 史密斯
  put (+ed) → 放
  two → 兩
  book (+s) → 書
  on → 在…上面
  this → 這
  dining table → 餐桌

Example: Miss Smith put two books on this dining table.
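Lexical transfer amounts to a bilingual dictionary lookup on the analyzed root forms. A minimal sketch, assuming the analysis step has already produced (root, POS) pairs; `LEXICON`, `analyzed` and `lexical_transfer` are hypothetical names, and the dictionary holds only this example's entries. The … in 在…上面 marks the slot that structural transfer later fills with the object 這餐桌.

```python
# Hypothetical SL-TL dictionary holding only this example's entries.
LEXICON = {
    "Miss": "小姐", "Smith": "史密斯", "put": "放", "two": "兩",
    "book": "書", "on": "在…上面", "this": "這", "dining table": "餐桌",
}

# Analysis output: (root form, POS) pairs; inflections such as +ed/+s
# are assumed to have been stripped by morphological analysis already.
analyzed = [("Miss", "n"), ("Smith", "n"), ("put", "v"), ("two", "q"),
            ("book", "n"), ("on", "p"), ("this", "d"), ("dining table", "n")]

def lexical_transfer(tokens):
    """Replace each source root with its target word, keeping the POS tag."""
    return [(LEXICON[root], pos) for root, pos in tokens]

print("".join(word for word, _ in lexical_transfer(analyzed)))
# 小姐史密斯放兩書在…上面這餐桌
```

Word order is still English order at this point; reordering is left to the structural-transfer step.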

Transfer

(2) Phrasal/Structural Transfer
  小姐史密斯放兩書在上面這餐桌 → 史密斯小姐放兩書在這餐桌上面

Example: Miss Smith put two books on this dining table.

Generation: Morphological & Structural

史密斯小姐放兩書在這餐桌上面
→ 史密斯小姐放兩 (本) 書在這 (張) 餐桌上面 (insert the measure words 本 and 張)
→ 史密斯小姐 (把) 兩 (本) 書放在這 (張) 餐桌上面 (insert 把 and move the object 兩本書 before the verb 放)

Chinese translation: 史密斯小姐把兩本書放在這張餐桌上面

Example: Miss Smith put two books on this dining table.

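[Figure, after [Aho 86]: the phases of a compiler. source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program. The symbol-table manager and the error handler interact with all phases.]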

position := initial + rate * 60
  ↓ lexical analyzer
id1 := id2 + id3 * 60
  ↓ syntax analyzer
syntax tree (prefix form, operator(operands)): :=( id1, +( id2, *( id3, 60 ) ) )
  ↓ semantic analyzer
annotated tree: :=( id1, +( id2, *( id3, inttoreal( 60 ) ) ) )

SYMBOL TABLE
  1  position  …
  2  initial   …
  3  rate      …
  4

[Aho 86]
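A minimal regex-based lexical analyzer for this one statement, reproducing the first step above: identifiers are entered into a symbol table and replaced by id1, id2, id3. The token set in `TOKEN_SPEC` and the function `scan` are assumptions for illustration; a real scanner would cover the whole token set of the language.

```python
import re

# Token patterns: an assumed, minimal token set for this one example.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return the token stream and a symbol table mapping names to entry numbers."""
    symtab = {}                            # identifier -> symbol-table index (1-based)
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "ID":
            index = symtab.setdefault(text, len(symtab) + 1)
            tokens.append(f"id{index}")    # identifiers become pointers into the table
        elif kind != "SKIP":
            tokens.append(text)
        pos = m.end()
    return tokens, symtab

tokens, symtab = scan("position := initial + rate * 60")
print(" ".join(tokens))   # id1 := id2 + id3 * 60
print(symtab)             # {'position': 1, 'initial': 2, 'rate': 3}
```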

  ↓ intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
  ↓ code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
  ↓ code generator
Binary Code

[Aho 86]
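The intermediate code above is what a post-order walk of the annotated syntax tree produces: children are translated first, and each interior node gets a fresh temporary. A minimal sketch, assuming the tree is encoded as nested tuples `(operator, operand, ...)` with strings as leaves; `gen_three_address` is a hypothetical name, and no optimization is attempted.

```python
from itertools import count

def gen_three_address(tree):
    """Emit three-address code for an assignment tree (":=", lhs, rhs) in post-order."""
    code = []
    temps = count(1)

    def gen(node):
        if isinstance(node, str):            # identifiers and constants are already addresses
            return node
        op, *kids = node
        args = [gen(k) for k in kids]        # translate the children first
        temp = f"temp{next(temps)}"
        if len(args) == 1:                   # unary operator, e.g. inttoreal
            code.append(f"{temp} := {op}({args[0]})")
        else:                                # binary operator
            code.append(f"{temp} := {args[0]} {op} {args[1]}")
        return temp

    _, target, expr = tree
    code.append(f"{target} := {gen(expr)}")  # gen() runs before this append
    return code

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
for line in gen_three_address(tree):
    print(line)
```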

Detailed Steps (1): Analysis

Text Pre-processing (separating texts from tags)
  • Clean up garbage patterns (usually introduced during file conversion)
  • Recover sentences and words (e.g., <B>C</B> omputer)
  • Separate Processing-Regions from Non-Processing-Regions (e.g., File-Header-Sections, Equations, etc.)
  • Extract and mark strings that need special treatment (e.g., Topics, Keywords, etc.)
  • Identify and convert markup tags into internal tags (de-markup; however, markup tags also provide information)

Discourse and Sentence Segmentation
  • Divide text into various primary processing units (e.g., sentences)
  • Discourse: Cue Phrases
  • Sentence: mainly classify the type of “Period” and “Carriage Return” in English (“sentence stops” vs. “abbreviations/titles”)
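The period-classification problem can be sketched as a rule-based splitter: a period ending a whitespace-delimited token is a sentence stop unless that token is a known abbreviation or title. This is a deliberately simplified illustration; `ABBREVIATIONS` is a hypothetical list, and practical segmenters use far larger lists or trained classifiers.

```python
import re

# A hypothetical, deliberately tiny list of abbreviations/titles.
ABBREVIATIONS = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split after a token ending in '.', unless that token is a known abbreviation."""
    sentences, start = [], 0
    for m in re.finditer(r"\S+\.(?=\s|$)", text):
        if m.group() in ABBREVIATIONS:
            continue                       # "abbreviation/title": not a sentence stop
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)             # trailing text without a final period
    return sentences

print(split_sentences("Dr. Smith arrived. He met Mrs. Lee at noon."))
# ['Dr. Smith arrived.', 'He met Mrs. Lee at noon.']
```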

Detailed Steps (2): Analysis (Cont.)

Stemming
  • English: perform morphological analysis (e.g., -ed, -ing, -s, -ly, re-, pre-, etc.) and identify the root form (e.g., got <get>, lay <lie/lay>, etc.)
  • Chinese: mainly detect suffix lexemes (e.g., the plural suffix 們 in 孩子們 “children”, 學生們 “students”, etc.)
  • Text normalization: Capitalization, Hyphenation, …

Tokenization
  • English: mainly identify split idioms (e.g., turn NP on) and compounds
  • Chinese: Word Segmentation (e.g., [土地 “land”] [公有 “public ownership”] [政策 “policy”])
  • Regular Expression: numerical strings/expressions (e.g., twenty million), dates, … (each associated with a specific type)

Tagging
  • Assign Part-of-Speech (e.g., n, v, adj, adv, etc.)
  • From this step on, the associated forms are basically language-independent
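Chinese word segmentation is often introduced with forward maximum matching: scan left to right and take the longest dictionary word at each position. A minimal sketch with a hypothetical toy dictionary (`LEXICON`) and a hypothetical `max_len` limit; it is one simple technique among many, not necessarily the one used in the course.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word that matches; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words

LEXICON = {"土地", "公有", "政策", "土", "地", "公"}   # hypothetical toy dictionary
print(forward_max_match("土地公有政策", LEXICON))   # ['土地', '公有', '政策']
```

The example is a classic ambiguity: if the dictionary also contains 土地公 (the Earth God), the greedy match takes it first and yields [土地公] [有] [政策], which is why segmenters often combine backward matching or statistical disambiguation with this method.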

Detailed Steps (3): Analysis (Cont.)

Parsing
  • Decide suitable syntactic relationships (e.g., PP-Attachment)

Decide Word-Sense
  • Decide the appropriate lexicon sense (e.g., River-Bank, Money-Bank, etc.)

Assign Case-Label
  • Decide suitable semantic relationships (e.g., Patient, Agent, etc.)

Anaphora and Antecedent Resolution
  • Pronoun reference (e.g., “he” refers to “the president”)

Detailed Steps (4): Analysis (Cont.)

Decide Discourse Structure
  • Decide suitable relationships between discourse segments (e.g., Evidence, Concession, Justification, etc. [Marcu 2000])

Convert into Logical Form (Optional)
  • Co-reference resolution (e.g., “president” refers to “Bill Clinton”), scope resolution (e.g., negation), temporal resolution (e.g., today, last Friday), spatial resolution (e.g., here, next), etc.
  • Identify roles of Named Entities (Person, Location, Organization), and determine IS-A (also Part-of) relationships, etc.
  • Mainly used in inference-related applications (e.g., Q&A, etc.)

Detailed Steps (5): Transfer

Decide suitable Target Discourse Structure
  • For example: Evidence, Concession, Justification, etc. [Marcu 2000].

Decide suitable Target Lexicon Senses
  • Sense mapping may not be one-to-one (sense resolution might differ across languages, e.g., “snow” has more senses in Eskimo)
  • Sense-token mapping may not be one-to-one (lexical representation power might differ across languages, e.g., “DINK” (“double income, no kids”), “睨” (“to look at sideways”), etc.). It could be 2-1, 1-2, etc.

Decide suitable Target Sentence Structure
  • For example: verb nominalization, constituent promotion and demotion (usually occurs when the sense-token mapping is not 1-1)

Decide appropriate Target Case
  • Case labels might change after the structure has been modified
  • (Example) verb nominalization: “… that you (AGENT) invite me” → “… your (POSS) invitation”

Detailed Steps (6): Generation

Adopt suitable Sentence Syntactic Pattern
  • Depends on style (i.e., the distributions of lexicon selections and syntactic patterns adopted)

Adopt suitable Target Lexicon
  • Select from the synonym set (depends on style)
  • Add “de” (Chinese), commas, tense, measure words (Chinese), etc.
  • Morphological generation is required for target-specific tokens

Text Post-processing
  • Final string substitution (replace the markers of special strings)
  • Extract and export associated information (e.g., Glossary, Index, etc.)
  • Restore the customer’s markup tags (re-markup) to save typesetting work