77
UNIT 1 INTODUCTION TO COMPILERS BY :- NAMRATHA NAYAK w w w . B o o k s p a r . c o m | W e b s i t e f o r S t u d e n t s | V T U - N o t e s - Q u e s t i o n P a p e r s

UNIT 1 INTODUCTION TO COMPILERS By :- Namratha Nayak

  • Upload
    tambre

  • View
    58

  • Download
    0

Embed Size (px)

DESCRIPTION

UNIT 1 INTODUCTION TO COMPILERS By :- Namratha Nayak. TOPICS. Language Processors Structure of a Compiler Evolution of Programming Languages Science of Building a Compiler Applications of Compiler Technology Programming Language Basics. Language Processors. COMPILER - PowerPoint PPT Presentation

Citation preview

Page 1: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

UNIT 1

INTODUCTION TO COMPILERS

BY :- NAMRATHA NAYAK

Page 2: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TOPICS Language Processors Structure of a Compiler Evolution of Programming Languages Science of Building a Compiler Applications of Compiler Technology Programming Language Basics

Page 3: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS COMPILER

Read a program in source language and translate into the target language Source language – High-level language like C, C++ Target language – object code of the target machine

Report any errors detected in the source program during translation

Page 4: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS INTERPRETER

Directly executes the operations specified in the source program on inputs supplied by the user

Target program is not produced as output of translation Gives better error-diagnostics than a compiler

Executes source program statement by statement Target program produced by compiler is much faster at

mapping inputs to outputs

Page 5: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS EXAMPLE

Java language processors combine compilation and interpretation Source program is first compiled into bytecodes Bytecodes are then interpreted by a virtual machine

Just-in-time compilers Translate bytecodes into machine language before they runt he

intermediate program to process input

Page 6: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS A Language-Processing System

Page 7: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS Preprocessor

Source program may be divided into modules in separate files Accomplishes the task of collecting the source program Can delete comments, include other files, expand macros

Assembler Compiler produces an assembly-language program Symbolic form of the machine language Produces relocatable machine code as output

Page 8: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LANGUAGE PROCESSORS Linker/Loader

Relocatable Code Not ready to execute Memory references are made relative to an undetermined starting

address in memory Relocatable machine code may have to be linked with other

object files Linker

Resolves external memory addresses Code in file referring to a location in another file

Loader Resolve all relocatable addresses relative to a given starting address Puts together all the executable object files into memory for

execution

Page 9: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE STRUCTURE OF A COMPILER Analysis Phase

Break up source program into token or constituent pieces Impose a grammatical structure Create an intermediate representation of the source program If source program is syntactically incorrect or semantically

wrong Provide informative messages to the user

Symbol Table Stores the information collected about the source program Given to the synthesis phase along with the intermediate

representation

Page 10: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE STRUCTURE OF A COMPILER Synthesis Phase

Constructs the desired target program from Intermediate representation Information in symbol table

Back end of the compiler Analysis phase is called front end of the compiler

Compilation process is a sequence of phases Each phase transforms one representation of source program

into another Several phases may be grouped together Symbol table is used by all the phases of the compiler

Page 11: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE STRUCTURE OF A COMPILER

Page 12: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LEXICAL ANALYSIS Lexical Analyzer

Reads stream of characters in the source program Groups the characters into meaningful sequences – lexemes For each lexeme, a token is produced as output

<token-name , attribute-value> Token-name : symbol used during syntax analysis Attribute-value : an entry in the symbol table for this token

Information from symbol table is needed for syntax analysis and code generation

Consider the following assignment statement

Page 13: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SYNTAX ANALYSIS Parsing

Parser uses the tokens to create a tree-like intermediate representation

Depicts the grammatical structure of the token stream Syntax tree is one such representation

Interior node – operation Children - arguments of the operation

Other phases use this syntax tree to help analyze source program and generate target program

Page 14: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SEMANTIC ANALYSIS Semantic Analyzer

Checks semantic consistency with language using: Syntax tree Information in symbol table

Gathers type information and save in syntax tree or symbol table

Type Checking Checks each operator for matching operands Ex: Report error if floating point number is used as index of an array

Coercions or type conversions Binary arithmetic operator applied to a pair of integers or floating point

numbers If applied to floating point and integer, compiler may convert integer to

floating-point number

Page 15: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SEMANTIC ANALYSIS Semantic Analyzer

For our assignment statement Position, rate and initial are floating-point numbers Lexeme 60 is an integer

Type checker finds that * is applied to floating-point ‘rate’ and integer ‘60’ Integer is converted to floating-point

Page 16: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

INTERMEDIATE CODE GENERATION After syntax and semantic analysis

Compilers generate machine-like intermediate representation This intermediate representation should have the two properties:

Should be easy to produce Should be easy to translate into target machine

Three-address code Sequence of assembly-like instructions with three operands per

instruction Each operand acts like a register

Page 17: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

INTERMEDIATE CODE GENERATION Points to be noted about three-address instructions are:

Each assignment instruction has at most one operator on the right side

Compiler must generate a temporary name to hold the value computed by a three-address instruction

Some instructions have fewer than three operands

Page 18: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

CODE OPTIMIZATION Attempt to improve the target code

Faster code, shorter code or target code that consumes less power Optimizer can deduce that

Conversion of 60 from int to float can be done once at compile time

So, the inttofloat can be eliminated by replacing 60 with 60.0 t3 is used only once to transmit its value to id1

Page 19: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

CODE GENERATION Code Generator

Takes intermediate representation as input Maps it into target language If target language is machine code

Registers or memory locations are selected for each of the variables used Intermediate instructions are translated into sequences of machine

instructions performing the same task Assignment of registers to hold variables is a crucial aspect

First operand of each instruction specifies a destination

Page 20: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

Page 21: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SYMBOL-TABLE MANAGEMENT Essential function of Compiler

Record variable names used in source program Collect information about storage allocated for a name

Type Scope – where in the program the value may be used In case of procedure names,

Number and type of its argument Method of passing each argument Type returned

Symbol Table Data structure containing a record for each variable name with

fields for attributes

Page 22: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

COMPILER-CONSTRUCTION TOOLS Commonly used compiler-construction tools

Parser Generators Scanner Generators Syntax-directed translation engines Code-generator Generators Data-flow analysis engines Compiler-construction Toolkits

Page 23: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

EVOLUTION OF PROGRAMMING LANGUAGES Move to Higher-Level Languages

Development of mnemonic assembly languages in 1950’s Classification of Languages

Generation First-generation : machine languages Second-generation : assembly languages Third-generation : C, C++, C#, Java Fourth-generation : SQL, Postscript Fifth-generation : Prolog

Imperative and Declarative Imperative : how a computation is to be done Declarative : what computation is to be done

Object-oriented Language Scripting Languages

Page 24: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

EVOLUTION OF PROGRAMMING LANGUAGES Impact on Compilers

Advances in PL’s placed new demands on compiler writers Devise algorithms and representations to support new features Performance of a computer is dependent on compiler technology Good software-engineering techniques are essential for creating

and evolving modern language processors Compiler writers must evaluate tradeoffs about

What problems to deal with What heuristics to use to approach the problem of generating efficient

code

Page 25: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SCIENCE OF BUILDING A COMPILER Modeling in Compiler Design and Implementation

Study of compilers is a study of how To design the right mathematical models and Choose the right algorithms

Finite-state machines and regular expressions Useful for describing the lexical units of a program (keywords,

identifiers) Used to describe the algorithms used to recognize those units

Context-free Grammars Describe syntactic structure of PL Nesting of parentheses, control constructs

Page 26: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SCIENCE OF BUILDING A COMPILER Science of Code Optimization

“Optimization” – an attempt to produce code that is more efficient Processor architectures have become more complex Important to formulate the right problem to solve Need a good understanding of the programs Compiler design must meet the following design objectives

Optimization must be correct, i.e., preserve the meaning of compiled program

Optimization must improve the performance of many programs Compilation time must be kept reasonable Engineering effort required must be manageable

Page 27: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Implementation of high-level programming languages

High-level programming language defines a programming abstraction

Low-level language have more control over computation and produce efficient code Hard to write, less portable, prone to errors and harder to maintain Example : register keyword

Common programming languages (C, Fortran, Cobol) support User-defined aggregate data types (arrays, structures, control flow ) Data-flow optimizations

Analyze flow of data through the program and remove redundancies Key ideas behind object oriented languages are

Data Abstraction Inheritance of properties

Page 28: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Implementation of high-level programming languages

Java has features that make programming easier Type-safe – an object cannot be used as an object of an unrelated type Array accesses are checked to ensure that they lie within the bounds Built in garbage-collection facility Optimizations developed to overcome the overhead by eliminating

unnecessary range checks

Page 29: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Optimizations for Computer Architectures

Parallelism Instruction level : multiple operations are executed simultaneously Processor level : different threads of the same application run on different

processors Memory hierarchies

Consists of several levels of storage with different speeds and sizes Average memory access time is reduces Using registers effectively is the most important problem in optimizing a

program Caches and physical memories are managed by the hardware Improve effectiveness by changing the layout of data or order of

instructions accessing the data

Page 30: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Design of new Computer Architectures

RISC (Reduced Instruction-Set Computer) CISC (Complex Instruction-Set Computer) –

Make assembly programming easier Include complex memory addressing modes

Optimizations reduce these instructions to a small number of simpler operations

PowerPC, SPARC, MIPS, Alpha and PA-RISC Specialized Architectures

Data flow machines, vector machines, VLIW, SIMD, systolic arrays Made way into the designs of embedded machines Entire systems can fit on a single chip Compiler technology helps to evaluate architectural designs

Page 31: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Program Translations

Binary Translation Translate binary code of one machine to that of another Allow machine to run programs compiled for another instruction set Used to increase the availability of software for their machines Can provide backward compatibility

Hardware synthesis Hardware designs are described in high-level hardware description

languages like Verilog and VHDL Described at register transfer level (RTL)

Variables represent registers Expressions represent combinational logic

Tools translate RTL descriptions into gates, which are then mapped to transistors and eventually to physical layout

Page 32: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Program Translations

Database Query Interpreters Languages are useful in other applications Query languages like SQL are used to search databases

Queries consist of predicates containing relational and boolean operators Can be interpreted or compiled into commands to search a database

Compiled Simulation Simulation

Technique used in scientific and engg disciplines Understand a phenomenon or validate a design

Inputs include description of the design and specific input parameters for that run

Page 33: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Software Productivity Tools

Testing is a primary technique for locating errors in a program Use data flow analysis to locate errors statically Problem of finding all program errors is undecidable Ways in which program analysis has improved software productivity

Type Checking Catch inconsistencies in the program

Operation applied to wrong type of object Parameters to a procedure do not match the signature

Go beyond finding type errors by analyzing flow of data If pointer is assigned null and then dereferenced, the program is clearly in error

Page 34: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

APPLICATIONS OF COMPILER TECHNOLOGY Software Productivity Tools

Bounds Checking Security breaches are caused by buffer overflows in programs written in C Data-flow analysis can be used to locate buffer overflows Failing to identify a buffer overflow may compromise the security of the

system Memory-management tools

Automatic memory management removes all memory-management errors like memory leaks

Tools developed to help programmers find memory management errors Purify - dynamically catches memory management errors as they occur

Page 35: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS The Static/Dynamic Distinction

What decision can the compiler make about a program Static Policy - Language uses a policy that allows compiler to decide

an issue, i.e., at compile time Dynamic Policy – Policy that allows a decision to be made when we

execute the program, i.e. at run time Scope of Declarations

Scope declaration of x is the region of the program in which uses of x refer to this declaration

Static or Lexical scope : Used if it is possible to determine the scope of a declaration by looking only at the program

Dynamic Scope : As the program runs, the same use of x could refer to any several different declaration of x

Example : public static int x;

Page 36: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Environments and States

Whether the changes that occur as the program is run Affects the values of the data elements Affect interpretation of names for that data

Association of names with locations on memory (store) and then with values is described as a two-stage mapping Environment – Mapping from names to locations in the store State – Mapping from locations in store to their values. It maps l-values to

their corresponding r-values

Page 37: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Environments and States

Example

Exceptions to environment and state mappings Static versus dynamic binding of names to locations Static versus dynamic binding of locations to values

Page 38: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Static Scope and Block Structure

Scope rules for C – based on program structure Scope of a declaration – determined by the location of its appearance Languages like C++,C# and Java provide explicit control over scopes

– public, private and protected Static scope rules for a language with blocks – a grouping of

declarations and statements C static scope policy is as follows:

C program is a sequence of top-level declarations of variables & functions Functions may have variable declarations within them, scope of which is

restricted to the function in which it appears Scope of a top-level declaration of a name x consists of the entire program that

follows

Page 39: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Static Scope and Block Structure

The syntax of blocks in C is given by It is a type of statement and can appear anywhere that other statements can

appear Is a sequence of declarations followed by a sequence of statements, all

surrounded by braces Block structure – nesting property of blocks Static scope rule for variable declaration is as follows:

If declaration D of name x belongs to block B, Then scope of D is all of B, except for any blocks B’ nested to any depth

within B in which x is redeclared

Page 40: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Static Scope and Block Structure

Blocks in a C++ program

Page 41: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Explicit Access Control

Classes and structures introduce new scope for their members If p is an object of a class with a field x, then use of x in p.x refers to field x in

the class definition The scope of declaration x in a class C extends to any subclass C’, except if C’

has a local declaration of the same name x Public, private and protected – provide explicit control over access to

member names in a super class In C++, class definition may be separated from the definition of some

or all of its methods A name x associated with the class C may have a region of code that is outside

its scope followed by another region within its scope

Page 42: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Dynamic Scope

Based on factors that can be known only when the program executes A use of a name x refers to the declaration of x in the most recently

called procedure with such a declaration Macro expansion in the C preprocessor

Dynamic scope resolution is essential for polymorphic procedures

Page 43: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Dynamic Scope

Method resolution in OOP The procedure called when x.m() is executed depends on the class of the

object denoted by x at that time Example:

Class C with a method named m() D is a subclass of C , and D has its own method named m() There is a use of m of the form x.m(), where x is an object of class C

Impossible to tell at compile time whether x will be of class C or of the subclass D

Cannot be decided until runtime which definition of m is the right one Code generated by compiler must determine the class of the object x, and call

one or the other method named m

Page 44: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Parameter Passing Mechanisms

How actual parameters are associated with formal parameters Actual parameters – used in the call of a procedure Formal parameters – used in the procedure definition

Call-by-Value The actual parameter is evaluated or copied Value is placed in the location belonging to the corresponding formal

parameter of the called procedure Computation involving formal parameters done by the called procedure is

local to that procedure and actual parameters cannot be changed In C, we can pass a pointer to a variable to allow that variable to be changed

by the callee Array names passed as parameters in C,C++ or Java give the called procedure

what is in effect a pointer or reference to the array itself

Page 45: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Parameter Passing Mechanisms

Call-by-Reference Address of actual parameter is passed to the callee as the value of the

corresponding formal parameter Changes to formal parameter appear as changes to the actual parameter Essential when the formal parameter is a large object, array or a structure

Strict call-by-value requires that the caller copy the entire actual parameter into the space of the corresponding formal parameter

Copying is expensive when the parameter is large Call-by-Name

The callee executes as if the actual parameter were substituted literally for the formal parameter in the code of the callee

Page 46: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE BASICS Aliasing

Consequence of call-by-reference parameter passing Possible that two formal parameters can refer to the same location Such variables are said to be aliases of one another Example:

a is an array belonging to procedure p, and p calls another procedure q(x,y) with a call q(a,a)

Parameters are passed by value but the array names are references to the location where the array is stored

So, x and y become aliases of each other Understanding aliasing is essential for a compiler that optimizes a program

Page 47: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers LEXICAL ANALYSIS

Page 48: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

OBJECTIVES Role of Lexical analyzer Lexical analysis using formal language definitions with

Finite Automata Specification of Tokens Recognition of Tokens

Page 49: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

PROGRAMMING LANGUAGE STRUCTURE A Programming Language is defined by:

SYNTAX Decides whether a sentence in a language is well-formed

SEMANTICS Determines the meaning , if any, of a syntactically well-formed sentence

GRAMMAR Provides a generative finite description of the language

Well developed tools (regular, context-free and attribute grammars) are available for the description of syntax

Lexical analyzer and the Parser handle the syntax of the programming language

Page 50: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE ROLE OF THE LEXICAL ANALYZER Main task of lexical analyzer

Read input characters in a source program Group them into lexemes Produce as output a sequence of tokens for each lexeme Stream of tokens is sent to the parser Whenever a lexeme is found, it is entered into the symbol table

Page 51: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE ROLE OF THE LEXICAL ANALYZER Other tasks performed by the lexical analyzer

Removing comments and whitespace Correlating error messages generated by compiler with source program Associates a line number with each error message Makes a copy of the source program with error messages

Cascade of two processes Scanning

Processes that do not require the tokenization of input, like, deletion of comments and compaction of whitespaces

Lexical analysis Scanner produces the sequence of tokens as output

Page 52: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

THE ROLE OF THE LEXICAL ANALYZER Lexical Analysis versus Parsing

Simplicity of design Separation of lexical and syntactic analysis allows to simplify one of these tasks A parser that has to deal with comments and whitespace is more complex

Compiler efficiency is improved Allows to apply specialized techniques that serve only the lexical task Specialized buffering techniques for reading input

Compiler portability is enhanced Input device specific peculiarities can be restricted to lexical analyzer

Page 53: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TOKENS, PATTERNS, AND LEXEMES Token

Pair consisting of a token name and an optional attribute value Token name – abstract symbol for a lexical unit, like keyword

Pattern Description of the form that the lexemes of a token may take If keyword is a token, pattern is a sequence of characters that form the

keyword Lexeme

Sequence of characters in the source program that matches the pattern for a token

Identified by the lexical analyzer as an instance of that token

Page 54: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TOKENS, PATTERNS, AND LEXEMES Typical tokens in a Programming Language

One token for each keyword Tokens for the operators Token representing all identifiers One or more tokens representing constants, such as numbers and literal

strings Tokens for each punctuation symbol, such as comma, semicolon, left and

right parentheses

Page 55: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

ATTRIBUTES FOR TOKENS When more than one lexeme matches a pattern, additional

information about the lexeme must be passed Example : Pattern for token number matches both 0 and 1 So, lexical analyzer returns to the parser both the token name and attribute

value describing the lexeme Token name influences parsing decisions Attribute value influences translation of tokens after the parse

Appropriate attribute value for an identifier is a pointer to the symbol-table entry for that entry

Page 56: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

LEXICAL ERRORS Lexical analyzer is unable to proceed because none of the

patterns for tokens matches any prefix of the remaining input Error recovery strategy

“Panic mode” recovery Delete successive characters from the remaining input, until the lexical analyzer

can find a well-formed token at the beginning of the input left Delete one character from the remaining input Insert a missing character into the remaining input Replace a character by another character Transpose two adjacent characters See whether a prefix of the remaining input can be transformed into a valid

lexeme by a single transformation

Page 57: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

INPUT BUFFERING Buffer Pairs

Specialized buffering techniques to reduce the amount of overhead to process a single input character

Scheme involving two buffers that are alternately reloaded

eof – marks the end of the source file Two pointers to the input

lexemeBegin – marks beginning of the current lexeme forward – scans ahead until a pattern match is found

Page 58: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

INPUT BUFFERING Buffer Pairs

Once the next lexeme is determined, forward is set to the character at its right end

After lexeme is recorded as an attribute value, lexemeBegin is set to the character immediately after the lexeme just found

To advance forward pointer, Test whether the end of one of the buffers has been reached If so, then reload the other buffer from the input Move forward to the beginning of the newly loaded buffer

Page 59: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

INPUT BUFFERING Sentinels

In previous scheme, each time the forward pointer is advanced, Must check that we have not moved off one of the buffers If we do, then reload the other buffer

For each character read, make two tests End of buffer Determine what character is read

Combine the buffer-end test with the test for current character, if we extend each buffer to hold a sentinel character at the end Sentinel is a special character that is not a part of the source program

Page 60: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Strings and Languages

Alphabet Finite set of symbols, e.g., letters, digits and punctuation Binary alphabet – {0,1}

String over an alphabet Finite sequence of symbols drawn from that alphabet Length of string s (|s|) – number of occurrences of symbols in s Empty string (ε) –string of length zero

Language Set of strings over some fixed alphabet Ex :

{ε}, set containing only the empty string Set of all syntactically well-formed C programs

Page 61: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Strings and Languages

Prefix of a string s String obtained by removing zero or more symbols from the end of s

Suffix of string s String obtained by removing zero or more symbols from the beginning if s

Substring of s String obtained by deleting any prefix and any suffix from s

Proper prefixes, suffixes and substrings of a string s : Prefixes, suffixes and substrings of s that are not or not equal to s itself

Subsequence of s Any string formed by deleting zero or more not necessarily consecutive positions

of s Concatenation of x and y (xy)

String formed by appending y to x

Page 62: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Operations on Languages

Kleene Closure (L*) Set of strings obtained by concatenating L zero or more times L0 - concatenation of L zero times, that is ,{ }

Positive Closure (L+) Same as Kleene closure but without the term L0

ε

Page 63: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Regular Expressions

Notation used for describing all languages that can be built from these operators applied to the symbols of some alphabet

Ex: Language of C identifiers letter (letter | digit)* Each regular expression r denotes a language L(r), defined recursively

from the languages denoted by r’s sub expressions Rules that define the RE’s over some alphabet

ε is a regular expression and L(ε ) is {ε } If a is a symbol in alphabet, then a is a regular expression and L(a) = {a}

r and s are RE’s denoting languages L(r) and L(s), then (r)|(s) is a RE denoting the language L(r) U L(s) (r)(s) is a RE denoting the language L(r)L(s) (r)* is a RE denoting (L(r))*

(r) is a RE denoting L(r)

Page 64: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Regular Expressions

Parentheses in RE’s may be dropped if we adopt the following Unary operator * has highest precedence and is left associative Concatenation has second highest precedence and is left associative | has lowest precedence and is left associative

Example: (a) | ((b)*(c) a | b*c Regular set : Language defined by a RE

Two RE’s are equivalent if they denote the same regular set Ex: (a|b) = (b|a)

Page 65: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Regular Definitions

If is an alphabet of symbols, then a regular definition is a sequence of definitions of the form

d1 r1

d2 r2

........dn rn

Each di is a new symbol, not in and not the same as any other of the d’s Each ri is a RE over the alphabet U {d1, d2,... ,di-1}

Avoid recursive definition by restricting ri to and the previously defined d’s

Construct a RE over alone for each ri

Page 66: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

SPECIFICATION OF TOKENS Extensions of Regular Expressions

One or more instances Unary operator +, represents positive closure of a RE and its language r* = r+| ε and r+ = rr* = r*r

Zero or one instance Unary operator ? Means “zero or one occurence” r? = r|ε or L(r?) = L(r) U {ε}

Character classes Regular expression, a1| a2|....| an can be replaced by [a1 a2... an ] [abc] is short form for a|b|c

Page 67: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

RECOGNITION OF TOKENS Study how to take patterns for all the needed tokens

Build a piece of code that examines the input string Find a prefix that is a lexeme matching one of the patterns

Simple form of branching statements and conditional expressions Terminals of the grammar : if, then, else, relop, id, number

Page 68: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

RECOGNITION OF TOKENS Patterns for the tokens are described using regular definitions

Recognize the token ws, to remove whitespaces

Page 69: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

RECOGNITION OF TOKENS Tokens, their patterns and attribute values

Page 70: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TRANSITION DIAGRAMS Convert patterns into flowcharts called transition diagrams

States Represents a condition that could occur during the scanning of input

that matches a patternEdges

Directed from one state to another Labelled by a symbol or set of symbols

If in some state s, and next input symbol is a, Look for an edge out of state s labelled by a If such an edge is found, advance the forward pointer and enter the

state to which that edge leads

Page 71: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TRANSITION DIAGRAMS Important conventions about transition diagrams

Certain states are said to be final or accepting Indicate that a lexeme has been found If there is an action to be taken – returning a token an attribute value to the

parser – attach that action tot he accepting state If it is necessary to retract the forward pointer one position, then

place a * near the accepting state One state is the start state or initial state

Transition diagram always begins in the start state before any input symbols have been read

Page 72: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

TRANSITION DIAGRAMS Transition diagram for relop

Page 73: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

RECOGNITION OF RESERVED WORDS AND IDENTIFIERS Keywords like if or then are reserved, even though they look

like identifiers they are not identifiers

Two ways to handle reserved words that look like identifiers1. Install the reserved words in the symbol table initially

When an identifier is found, call to installID places it in the symbol table and returns a pointer to the symbol-table entry

Any identifier not in the symbol table during lexical analysis has a token id getToken examines the symbol table entry for the lexeme found and returns

the token name

Page 74: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

RECOGNITION OF RESERVED WORDS AND IDENTIFIERS Two ways to handle reserved words that look like identifiers

2. Create separate transition diagrams for each keyword Such a diagram consists of states representing the situation after each

successive letter of keyword is seen , followed by a test for “nonletter-or-digit”

Necessary to check that the identifier has ended, or else would return token then in situations where correct token was id

Page 75: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

ARCHITECTURE OF A TRANSITION-DIAGRAM-BASED LEXICAL ANALYZER Collection of transition diagrams can be used to build a lexical

analyzer Each state is represented by a piece of code

Variable state holding the number of the current state A switch based on the value of state takes us to the code for each of

the possible states, where action of that state is found Code for a state is itself a switch statement or multiway branch that

determines the next state

Page 76: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

ARCHITECTURE OF A TRANSITION-DIAGRAM-BASED LEXICAL ANALYZER

Fig : Implementation of relop transition diagram

Page 77: UNIT 1 INTODUCTION TO  COMPILERS By :- Namratha Nayak

www.Bookspar.com | W

ebsite for Students | VTU - Notes - Question Papers

ARCHITECTURE OF A TRANSITION-DIAGRAM-BASED LEXICAL ANALYZER Ways in which the code could fit into the entire lexical analyzer

Arrange the transition diagrams for each token to be tried sequentially Function fail() resets the forward pointer and starts next transition diagram Allows to use transition diagrams for the individual keywords

Run various transition diagrams “in parallel” Feed the next input character to all of them an allow each one to make the

transitions required Must be careful to resolve the case where

One diagram finds a lexeme that matches the pattern While one or more other diagrams are still able to process the input

Combine all transition diagrams into one Allow to read input until there is no possible next state Take the longest lexeme that matched any pattern