
Q1. The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned. Initially both pointers point to the first character of the input.

The forward pointer moves ahead in search of the end of the lexeme. As soon as a blank space is encountered, it indicates the end of the lexeme. For example, when fp encounters the blank space after the characters "int", the lexeme "int" is identified. The fp then skips over the white space, and both bp and fp are set to the start of the next token.

The input characters are read from secondary storage, but reading character by character from secondary storage is costly; hence a buffering technique is used. A block of data is first read into a buffer and is then scanned by the lexical analyzer.
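A minimal sketch of this two-pointer scan, assuming the block of input is already in an in-memory buffer and lexemes are delimited by white space (function and variable names are illustrative, not from the original):

# bp marks the start of the current lexeme; fp moves ahead until
# a blank space signals the end of the lexeme.
def lexemes(buffer):
    bp = 0
    while bp < len(buffer):
        # skip white space, then set both pointers at the next token
        while bp < len(buffer) and buffer[bp].isspace():
            bp += 1
        fp = bp
        # advance fp to the end of the lexeme
        while fp < len(buffer) and not buffer[fp].isspace():
            fp += 1
        if fp > bp:
            yield buffer[bp:fp]      # e.g. "int"
        bp = fp                      # both pointers move past the lexeme

print(list(lexemes("int i = 10 ;")))  # ['int', 'i', '=', '10', ';']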

Q2. The LA (lexical analyzer) is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.


Upon receiving a 'get next token' command from the parser, the lexical analyzer reads input characters until it can identify the next token. The LA returns to the parser a representation for the token it has found. The representation will be an integer code if the token is a simple construct such as a parenthesis, comma or colon.

The LA may also perform certain secondary tasks at the user interface. One such task is stripping out from the source program comments and white space in the form of blank, tab and newline characters. Another is correlating error messages from the compiler with the source program.
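A minimal sketch of this interface, assuming integer token codes for simple constructs and a combined identifier/number token (all names here are illustrative):

ID = 256  # codes above the single-character range for multi-character tokens

def get_next_token(source, pos):
    # secondary task: strip white space (blank, tab and newline)
    while pos < len(source) and source[pos] in " \t\n":
        pos += 1
    if pos == len(source):
        return None, pos
    ch = source[pos]
    if ch.isalnum():                       # identifier or number
        start = pos
        while pos < len(source) and source[pos].isalnum():
            pos += 1
        return (ID, source[start:pos]), pos
    return (ord(ch), None), pos + 1        # simple construct: integer code

tok, pos = get_next_token("count , x", 0)
print(tok)   # (256, 'count')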

Q3. FINITE AUTOMATA-

Finite Automata (FA) is the simplest machine to recognize patterns. It takes a string of symbols as input and changes its state accordingly; when the desired symbol is found, the transition occurs. At the time of a transition, the automaton can either move to the next state or stay in the same state. A finite automaton has accept states and reject states: when the input string is processed successfully and the automaton ends in a final (accept) state, the string is accepted.

A Finite Automata consists of the following :

Q : Finite set of states.

∑ : set of Input Symbols.

q : Initial state.

F : set of Final States.

δ : Transition Function.

Formal specification of machine is {Q, ∑, q, F, δ }
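A minimal sketch of simulating such a machine, assuming a DFA over {0, 1} that accepts strings ending in 1 (the states and transition table are illustrative):

# Q = {'q0','q1'}, input symbols {'0','1'}, start state 'q0',
# F = {'q1'}, transition function delta given as a dict.
delta = {
    ('q0', '0'): 'q0', ('q0', '1'): 'q1',
    ('q1', '0'): 'q0', ('q1', '1'): 'q1',
}

def accepts(s, start='q0', final={'q1'}):
    state = start
    for symbol in s:              # each symbol triggers one transition
        state = delta[(state, symbol)]
    return state in final         # accept iff we end in a final state

print(accepts("0101"))  # True  - ends in 1
print(accepts("10"))    # False - ends in 0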

REGULAR EXPRESSION-

Regular Expressions are used to denote regular languages. An expression is regular if:

• ɸ is a regular expression for the regular language ɸ.

• ɛ is a regular expression for the regular language {ɛ}.

• If a ∈ Σ (Σ represents the input alphabet), a is a regular expression with language {a}.

• If a and b are regular expressions, a + b is also a regular expression, with language {a, b}.

• If a and b are regular expressions, ab (the concatenation of a and b) is also regular.

• If a is a regular expression, a* (0 or more repetitions of a) is also regular.
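For example, combining these rules, (a + b)*abb denotes the strings over {a, b} ending in abb. A quick check with Python's re module (where union is written | rather than +):

import re

# (a+b)*abb in textbook notation is (a|b)*abb in re syntax
pattern = re.compile(r"(a|b)*abb")

print(bool(pattern.fullmatch("aabb")))   # True  - ends in abb
print(bool(pattern.fullmatch("abab")))   # False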

Q4. ACTIVATION RECORD

o The control stack is a run-time stack which is used to keep track of live procedure activations, i.e., to find the procedures whose execution has not yet been completed.

o When a procedure is called (activation begins), its name is pushed onto the stack, and when it returns (activation ends), it is popped.

o An activation record is used to manage the information needed by a single execution of a procedure.

o An activation record is pushed onto the stack when a procedure is called and popped when control returns to the caller.
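A minimal sketch of this push/pop behaviour (names are illustrative):

# Each call pushes an activation (here just the procedure name);
# each return pops it, mirroring the bullets above.
control_stack = []

def call(proc):
    control_stack.append(proc)     # activation begins
    print("enter", proc, "stack =", control_stack)

def ret():
    proc = control_stack.pop()     # activation ends
    print("leave", proc, "stack =", control_stack)

call("main"); call("f"); call("g")
ret(); ret(); ret()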

Q5. DAG (Directed Acyclic Graph)

Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values among the basic blocks, and supports optimization. A DAG provides easy transformation of basic blocks. A DAG can be understood as follows:

• Leaf nodes represent identifiers, names or constants.

• Interior nodes represent operators.

• Interior nodes also represent the results of expressions or the identifiers/name where the

values are to be stored or assigned.
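A minimal sketch of DAG construction with node sharing (the (op, left, right) encoding is illustrative):

# Build a DAG for a basic block: identical (op, left, right)
# triples are shared instead of duplicated.
nodes = {}          # (op, left, right) -> node id

def node(op, left=None, right=None):
    key = (op, left, right)
    if key not in nodes:          # reuse an existing node if possible
        nodes[key] = len(nodes)
    return nodes[key]

# a = b + c ; d = b + c   -- the second use shares the '+' node
b, c = node('b'), node('c')
t1 = node('+', b, c)
t2 = node('+', b, c)
print(t1 == t2)   # True: the common subexpression is one interior node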

Q6. Different types of error in compiler-

Program errors are detected and reported by the parser. The parser handles the errors encountered and the rest of the input is parsed. Errors may be encountered at various stages of the compilation process:

• Lexical : name of some identifier typed incorrectly

• Syntactical : missing semicolon or unbalanced parenthesis

• Semantical : incompatible value assignment

• Logical : code not reachable, infinite loop
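For illustration, hedged Python-flavoured analogues of each kind (kept in comments, since each fragment is erroneous by design):

# Lexical    : totl = total + 1     ('totl' typed incorrectly for 'total')
# Syntactical: print((x + y)        (unbalanced parenthesis)
# Semantical : n = "5" + 3          (incompatible operand types)
# Logical    : while True: pass
#              print("done")        (unreachable code after an infinite loop)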

Q7. Parameter passing is the communication medium among the procedures. By using some

mechanism, the variable values from the calling procedure are transferred to the called

procedure. Some of the terms related to value are:

r-value

The r-value is the value of an expression. If a single variable appears on the right-hand side of the assignment operator, the variable acts as an r-value. R-values are always assigned to some other variable.

l-value

The memory location where the expression is stored is known as the l-value of the expression. It always appears on the left-hand side of an assignment operator.

For instance:

day = 1;
week = day * 7;
month = 1;
year = month * 12;

It is understood that constant values like 1, 7 and 12, and variables like day, week, month and year, all have r-values. The variables also have l-values, as they represent the memory locations assigned to them.

For instance:

7=x+y;


is an l-value error, as the constant 7 does not represent any memory location.

Q8: PEEPHOLE OPTIMIZATION:

A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying "optimizing" transformations to the target program.

A simple but effective technique for improving the target code is peephole optimization, a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.

The peephole is a small, moving window on the target program. The code in the peephole need

not be contiguous, although some implementations do require this. It is characteristic of peephole

optimization that each improvement may spawn opportunities for additional improvements.

Characteristics of peephole optimizations:

Redundant-instructions elimination

Flow-of-control optimizations

Algebraic simplifications

Use of machine idioms

Unreachable code elimination
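A minimal sketch of one such transformation, redundant load elimination, over a toy instruction list (the instruction tuples are illustrative):

# Slide a 2-instruction window over the code; drop a load that
# immediately follows a store to the same location.
code = [
    ("STORE", "R0", "a"),
    ("LOAD",  "R0", "a"),   # redundant: R0 already holds a
    ("ADD",   "R0", "b"),
]

def peephole(code):
    out = []
    for instr in code:
        if (out and instr[0] == "LOAD" and out[-1][0] == "STORE"
                and instr[1:] == out[-1][1:]):
            continue            # value is already in the register
        out.append(instr)
    return out

print(peephole(code))  # the redundant LOAD is removed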

Q9. (The original grammar is not reproduced; from the answer it was evidently S → (L) / a, L → L,S / S.) The grammar after eliminating left recursion is-

S → (L) / a

L → SL'

L' → ,SL' / ∈

Q10. (The original grammar is not reproduced; from Step-01 it was evidently S → bSSaaS / bSSaSb / bSb / a, with common prefix bS.)

Step-01:

S → bSS' / a

S' → SaaS / SaSb / b

Again, this is a grammar with common prefixes.

Step-02:

S → bSS’ / a

S’ → SaA / b

A → aS / Sb

This is a left factored grammar.


PART-B

Q1. ABSTRACT SYNTAX TREE-

Syntax trees are an abstract or compact representation of parse trees. They are also called Abstract Syntax Trees.

Syntax trees are called Abstract Syntax Trees because-

• They are an abstract representation of the parse trees.

• They do not provide every characteristic detail of the real syntax.

• For example- no rule nodes, no parentheses etc.

i. (a + b) * (c - d) + ((e / f) * (a + b))

Step-01:

We convert the given arithmetic expression into a postfix expression as-

(a + b) * (c - d) + ((e / f) * (a + b))

ab+ * (c - d) + ((e / f) * (a + b))

ab+ * cd- + ((e / f) * (a + b))

ab+ * cd- + (ef/ * (a + b))

ab+ * cd- + (ef/ * ab+)

ab+ * cd- + ef/ab+*

ab+cd-* + ef/ab+*

ab+cd-*ef/ab+*+

Step-02:

We draw a syntax tree for the above postfix expression.
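The tree itself appears as a figure in the original; reconstructed from the postfix string ab+cd-*ef/ab+*+, it is:

+
├── *
│   ├── + (a, b)
│   └── - (c, d)
└── *
    ├── / (e, f)
    └── + (a, b)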

ii. A + 4 - b + 3

Step-01:

Since + and - are left associative, the expression groups as ((A + 4) - b) + 3. Converting to postfix-

A + 4 - b + 3

A4+ - b + 3

A4+b- + 3

A4+b-3+

Step-02:

We draw a syntax tree for the above postfix expression.
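The tree itself appears as a figure in the original; reconstructed from the postfix string A4+b-3+, it is:

+
├── -
│   ├── + (A, 4)
│   └── b
└── 3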


Q2. The design of a compiler can be decomposed into several phases, each of which converts one form of the source program into another.

The different phases of compiler are as follows:

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Intermediate code generation

5. Code optimization

6. Code generation

All of the aforementioned phases involve the following tasks:

• Symbol table management.

• Error handling.

Lexical Analysis

• Lexical analysis is the first phase of compiler which is also termed as scanning.

• The source program is scanned to read the stream of characters, and those characters are grouped to form sequences called lexemes, from which tokens are produced as output.


• Token: Token is a sequence of characters that represent lexical unit, which matches with the

pattern, such as keywords, operators, identifiers etc.

• Lexeme: A lexeme is an instance of a token, i.e., a group of characters forming a token.

• Pattern: Pattern describes the rule that the lexemes of a token takes. It is the structure that must

be matched by strings.

• Once a token is generated the corresponding entry is made in the symbol table.

Input: stream of characters

Output: Token

Token Template: <token-name, attribute-value>

(eg.) c=a+b*5;

Lexemes and tokens

Lexemes    Tokens
c          identifier
=          assignment symbol
a          identifier
+          + (addition symbol)
b          identifier
*          * (multiplication symbol)
5          5 (number)

Hence, <id,1> <=> <id,2> <+> <id,3> <*> <5>
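A minimal sketch of this phase for the example statement, assuming the lexemes arrive space-separated (token names and the symbol-table layout are illustrative):

# Tokenize c = a + b * 5 ; into <token-name, attribute> pairs.
# Identifiers are entered into a symbol table; their attribute is
# the symbol-table index, as in <id,1> above.
symtab = []

def tokenize(stmt):
    tokens = []
    for lexeme in stmt.split():
        if lexeme.isidentifier():
            if lexeme not in symtab:
                symtab.append(lexeme)          # new symbol-table entry
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))      # operator / punctuation
    return tokens

print(tokenize("c = a + b * 5 ;"))
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3),
#  ('*', None), ('number', 5), (';', None)]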

Syntax Analysis

• Syntax analysis is the second phase of the compiler; it is also called parsing.

• The parser converts the tokens produced by the lexical analyzer into a tree-like representation called a parse tree.

• A parse tree describes the syntactic structure of the input.

• A syntax tree is a compressed representation of the parse tree in which the operators appear as interior nodes and the operands of an operator are the children of the node for that operator.

Input: Tokens

Output: Syntax tree

Semantic Analysis

• Semantic analysis is the third phase of compiler.


• It checks for the semantic consistency.

• Type information is gathered and stored in the symbol table or in the syntax tree.

• Performs type checking.

Intermediate Code Generation

• Intermediate code generation produces intermediate representations for the source program

which are of the following forms:

o Postfix notation

o Three address code

o Syntax tree

Most commonly used form is the three address code.

t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Properties of intermediate code-

• It should be easy to produce.

• It should be easy to translate into target program.

Code Optimization

• Code optimization phase gets the intermediate code as input and produces optimized

intermediate code as output.

• It results in faster running machine code.

• It can be done by reducing the number of lines of code for a program.

• This phase reduces the redundant code and attempts to improve the intermediate code so that

faster-running machine code will result.

• During the code optimization, the result of the program is not affected.

• To improve the code generation, the optimization involves

o Deduction and removal of dead code (unreachable code).

o Calculation of constants in expressions and terms.

o Collapsing of repeated expressions into temporary variables.

o Loop unrolling.

o Moving code outside the loop.

o Removal of unwanted temporary variables.

t1 = id3 * 5.0


id1 = id2 + t1

Code Generation

• Code generation is the final phase of a compiler.

• It gets input from code optimization phase and produces the target code or object code as result.

• Intermediate instructions are translated into a sequence of machine instructions that perform the

same task.

• The code generation involves

o Allocation of register and memory.

o Generation of correct references.

o Generation of correct data types.

o Generation of missing code.

LDF R2, id3

MULF R2, #5.0

LDF R1, id2

ADDF R1, R2

STF id1, R1


Q3. Part-01:

Three address code for the given code is-

(1) prod = 0
(2) i = 1
(3) T1 = 4 x i
(4) T2 = a[T1]
(5) T3 = 4 x i
(6) T4 = b[T3]
(7) T5 = T2 x T4
(8) T6 = T5 + prod
(9) prod = T6
(10) T7 = i + 1
(11) i = T7
(12) if (i <= 10) goto (3)

Part-02:

Step-01:

We identify the leader statements as-

• prod = 0 is a leader because first statement is a leader.

• T1 = 4 x i is a leader because target of conditional or unconditional goto is a leader.

Step-02:

The above generated three address code can be partitioned into 2 basic blocks as-


Step-03:

The flow graph is:
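The figure itself is not reproduced; reconstructed from the leaders in Step-01, block B1 contains statements (1)-(2) and block B2 contains statements (3)-(12). The edges are B1 → B2 (fall-through) and B2 → B2 (the conditional goto back to statement (3)).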

Q4. TYPE CHECKING

A compiler must check that the source program follows both syntactic and semantic conventions

of the source language. This checking, called static checking, detects and reports programming

errors.

Some examples of static checks:

1. Type checks - A compiler should report an error if an operator is applied to an incompatible

operand. Example: If an array variable and function variable are added together.

2. Flow-of-control checks - Statements that cause the flow of control to leave a construct must have some place to which to transfer the flow of control. Example: a break statement that is not enclosed within a while, for or switch statement.


A type checker verifies that the type of a construct matches that expected by its context. For

example: arithmetic operator mod in Pascal requires integer operands, so a type checker verifies

that the operands of mod have type integer. Type information gathered by a type checker may be

needed when code is generated.

TYPE SYSTEMS

A type system is a collection of rules for assigning type expressions to the various parts of a

program. A type checker implements a type system. It is specified in a syntax-directed manner.

Different type systems may be used by different compilers or processors of the same language.

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target program

runs is termed dynamic. Any check can be done dynamically, if the target code carries the type

of an element along with the value of that element.

Sound type system

A sound type system eliminates the need for dynamic checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs. That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run.

Strongly typed language

A language is strongly typed if its compiler can guarantee that the programs it accepts will

execute without type errors.

Error Recovery

Since type checking has the potential for catching errors in program, it is desirable for type

checker to recover from errors, so it can check the rest of the input. Error handling has to be

designed into the type system right from the start; the type checking rules must be prepared to

cope with errors.

TYPE EXPRESSIONS

The type of a language construct will be denoted by a “type expression.” A type expression is

either a basic type or is formed by applying an operator called a type constructor to other type

expressions. The sets of basic types and constructors depend on the language to be checked. The

following are the definitions of type expressions:

1. Basic types such as boolean, char, integer, real are type expressions.

A special basic type, type_error , will signal an error during type checking; void denoting “the

absence of a value” allows statements to be checked.

2. Since type expressions may be named, a type name is a type expression.

3. A type constructor applied to type expressions is a type expression.

Constructors include:


Arrays : If T is a type expression then array (I,T) is a type expression denoting the type of an

array with elements of type T and index set I.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type

expression.

Records : The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types.

For example:

type row = record
    address: integer;
    lexeme: array[1..15] of char
end;
var table: array[1..101] of row;

declares the type name row representing the type expression record((address X integer) X (lexeme X array(1..15,char))) and the variable table to be an array of records of this type.

Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type

“pointer to an object of type T”.

For example, var p: ↑ row declares variable p to have type pointer(row).

Functions : A function in programming languages maps a domain type D to a range type R. The

type of such function is denoted by the type expression D → R

4. Type expressions may contain variables whose values are type expressions.
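A minimal sketch of representing such type expressions and the mod check mentioned above (the tuple encoding is an assumption, not from the original):

# Type expressions as nested tuples: basic types are strings,
# constructed types are ('array', I, T), ('x', T1, T2),
# ('pointer', T) and ('->', D, R).
INT, TYPE_ERROR = "integer", "type_error"

def check_mod(t1, t2):
    # mod requires integer operands; otherwise signal type_error
    return INT if t1 == INT and t2 == INT else TYPE_ERROR

print(check_mod(INT, INT))               # integer
print(check_mod(INT, ("pointer", INT)))  # type_error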

TYPE CONVERSION:

Type conversion generally means converting one type to another when two different types are assigned. In other words, when two variables of different data types are used together or assigned to each other, the type of one variable must be converted to match the type of the other. In programming languages this generally involves the basic data types such as int, char, float, long int, double and so on.

Type conversion: The process of converting one predefined type to another is called type conversion. Type conversions are of two kinds:

1. IMPLICIT - Implicit conversion is performed when the compiler automatically converts one predefined data type to another data type.

2. EXPLICIT - Explicit type conversion is a conversion done explicitly by the user instead of being left to the compiler to perform automatically.
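A quick illustration in Python terms (the same idea as converting between int and float in C):

# Implicit: mixing int and float converts the int automatically.
x = 5 + 2.0          # 5 is implicitly converted; x == 7.0 (float)

# Explicit: the user requests the conversion with a cast/constructor.
y = int(7.9)         # y == 7, fractional part discarded
print(type(x), y)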


Q5. ACTIVATION RECORD

o The control stack is a run-time stack which is used to keep track of live procedure activations, i.e., to find the procedures whose execution has not yet been completed.

o When a procedure is called (activation begins), its name is pushed onto the stack, and when it returns (activation ends), it is popped.

o An activation record is used to manage the information needed by a single execution of a procedure.

o An activation record is pushed onto the stack when a procedure is called and popped when control returns to the caller.

ACTIVATION TREE

A program is a sequence of instructions combined into a number of procedures, and the instructions of a procedure are executed sequentially. A procedure has a start and an end delimiter, and everything in between is its body, which comprises the procedure identifier and a finite sequence of instructions.

The process of executing a procedure is called activation. The information necessary for calling a procedure is kept in an activation record. Depending on the source language used, the activation record contains units such as:

Temporaries : Stores temporary and intermediate values of an expression.

Local Data : Stores the local data of the called procedure.

Machine Status : Stores the machine status, such as registers and the program counter, before the procedure is called.

Control Link : Stores the address of the activation record of the caller procedure.

Access Link : Stores information about data outside the local scope.

Actual Parameters : Stores the actual parameters, i.e., the parameters used to send input to the called procedure.

Return Value : Stores return values.

The control stack stores the activation record when a procedure is executed. When a procedure calls another procedure, the execution of the caller is suspended until the called procedure finishes, and the stack stores the activation record of the called procedure.

It is assumed that program control flows sequentially: at a call, control is transferred to the called procedure, and it is returned to the caller when the called procedure finishes. This pattern of control flow is represented by the activation tree.
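A minimal sketch of an activation record with the units listed above, using Python dataclasses (the field names are illustrative):

from dataclasses import dataclass, field

@dataclass
class ActivationRecord:
    proc: str
    actual_parameters: list
    return_value: object = None
    control_link: "ActivationRecord" = None   # caller's record
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)

stack = []
def call(proc, args):
    caller = stack[-1] if stack else None
    stack.append(ActivationRecord(proc, args, control_link=caller))

call("main", []); call("factorial", [5])
print([ar.proc for ar in stack])   # ['main', 'factorial']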


Q6. BASIC BLOCK

A basic block is a sequence of consecutive statements in which flow of control enters at the

beginning and leaves at the end without halt or possibility of branching except at the end.

The following sequence of three-address statements forms a basic block:

t1 := a*a

t2 := a*b

t3 := 2*t2

t4 := t1+t3

t5 := b*b

t6 := t4+t5

Some terminology used in basic blocks are given below:

• A three-address statement x := y + z is said to define x and to use y and z. A name in a basic block is said to be live at a given point if its value is used after that point in the program, perhaps in another basic block.

• The following algorithm can be used to partition a sequence of three-address statements

into basic blocks.

Algorithm: Partition into basic blocks.

Input: A sequence of three-address statements.

Output: A list of basic blocks with each three-address statement in exactly one block.

Method:

1. We first determine the set of leaders, for that we use the following rules:

I) The first statement is a leader.

II) Any statement that is the target of a conditional or unconditional goto is a leader.

III) Any statement that immediately follows a goto or conditional goto statement is a leader.

2. For each leader, its basic block consists of the leader and all statements up to but not

including the next leader or the end of the program.
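A minimal sketch of this algorithm, assuming each statement is a string and the goto targets (rule II) are supplied as 1-based statement positions:

def basic_blocks(stmts, targets):
    # Rule I: the first statement is a leader.
    # Rule II: any goto target is a leader.
    leaders = {1} | set(targets)
    # Rule III: any statement right after a goto is a leader.
    for i, s in enumerate(stmts, start=1):
        if "goto" in s and i < len(stmts):
            leaders.add(i + 1)
    order = sorted(leaders)
    blocks = []
    for k, lead in enumerate(order):
        # block runs from this leader up to, but not including, the next
        end = order[k + 1] - 1 if k + 1 < len(order) else len(stmts)
        blocks.append(stmts[lead - 1:end])
    return blocks

# e.g. for the twelve flow-graph statements later in this answer,
# basic_blocks(code, targets=[3]) yields two blocks: (1)-(2) and (3)-(12)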

TRANSFORMATION OF BASIC BLOCK

1) STRUCTURE-PRESERVING TRANSFORMATIONS

The primary structure-preserving transformations on basic blocks are:

1. common sub-expression elimination

2. dead-code elimination

3. renaming of temporary variables

4. interchange of two independent adjacent statements

• Common sub-expression elimination

✓ Consider the basic block

a:= b+c


b:= a-d

c:= b+c

d:= a-d

✓ The second and fourth statements compute the same expression, hence this basic block may

be transformed into the equivalent block

a:= b+c

b:= a-d

c:= b+c

d:= b

✓ Although the 1st and 3rd statements in both cases appear to have the same expression on the

right, the second statement redefines b. Therefore, the value of b in the 3rd statement is

different from the value of b in the 1st, and the 1st and 3rd statements do not compute the same

expression.

• Dead-code elimination

✓ Suppose x is dead, that is, never subsequently used, at the point where the statement x: = y+z

appears in a basic block. Then this statement may be safely removed without changing the

value of the basic block.

• Renaming temporary variables

✓ Suppose we have a statement t: = b+c, where t is a temporary. If we change this statement to

u:= b+c, where u is a new temporary variable, and change all uses of this instance of t to u,

then the value of the basic block is not changed.

✓ In fact, we can always transform a basic block into an equivalent block in which each

statement that defines a temporary defines a new temporary. We call such a basic block a

normal-form block.

• Interchange of statements

✓ Suppose we have a block with the two adjacent statements

t1:= b+c

t2:= x+y

✓ Then we can interchange the two statements without affecting the value of the block if and

only if neither x nor y is t1 and neither b nor c is t2. A normal-form basic block permits all

statement interchanges that are possible.

2) ALGEBRIC TRANSFORMATIONS

✓ Countless algebraic transformations can be used to change the set of expressions computed by a basic block into an algebraically equivalent set. The useful ones are those that simplify expressions or replace expensive operations by cheaper ones.

✓ Example: x = x + 0 or x = x * 1 can be eliminated.

3) FLOW GRAPH

✓ A graph representation of three-address statements, called a flow graph, is useful for understanding code-generation algorithms.

✓ Nodes in the flow graph represent computations, and the edges represent the flow of control.

✓ Example of a flow graph for the following three-address code:

(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a[t1]
(5) t3 := 4*i
(6) t4 := b[t3]
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)

Here statements (1)-(2) form block B1 and statements (3)-(12) form block B2, with edges B1 → B2 and B2 → B2; the figure itself is not reproduced.

Q7. Step-01:

We convert the given grammar into operator precedence grammar.

The equivalent operator precedence grammar is-

E → E + E | E x E | id

Step-02:

The terminal symbols in the grammar are { id, + , x , $ }

We construct the operator precedence table as-

        id    +    x    $
id            >    >    >
+       <     >    <    >
x       <     >    >    >
$       <     <    <

Operator Precedence Table

Parsing Given String-

Given string to be parsed is id + id x id.

We follow the following steps to parse the given string-

Step-01:

We insert $ symbol at both ends of the string as-

$ id + id x id $

We insert precedence operators between the string symbols as-

$ < id > + < id > x < id > $

Step-02:

We scan and parse the string as-

$ < id > + < id > x < id > $

$ E + < id > x < id > $

$ E + E x < id > $

$ E + E x E $

$ + x $

$ < + < x > $

$ < + > $

$ $

PART-C

Q1. Augmented Grammar:

S’→S


CLR Parsing Table:


Q2.

Q3. POSTFIX

a+a*(b-c)+(b-c)*d

a+a*bc-+bc-*d

a+abc-*+bc-d*

aabc-*+ + bc-d*

aabc-*+bc-d*+


SYNTAX TREE

THREE ADDRESS CODE
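The code itself appears as a figure in the original; a plausible three-address translation of the expression (the temporaries t1-t6 are illustrative) is-

t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = b - c
t5 = t4 * d
t6 = t3 + t5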

DAG

QUADRUPLE


TRIPLE

INDIRECT TRIPLE
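These representations appear as figures in the original; a plausible reconstruction for the three-address code above is-

QUADRUPLES (op, arg1, arg2, result):
(1)  -   b    c    t1
(2)  *   a    t1   t2
(3)  +   a    t2   t3
(4)  -   b    c    t4
(5)  *   t4   d    t5
(6)  +   t3   t5   t6

TRIPLES (op, arg1, arg2), where (n) refers to the result of triple n:
(1)  -   b    c
(2)  *   a    (1)
(3)  +   a    (2)
(4)  -   b    c
(5)  *   (4)  d
(6)  +   (3)  (5)

Indirect triples keep the same triples plus a separate list of pointers to them, so statements can be reordered without renumbering the triples. In the DAG, the repeated subexpression b - c becomes a single shared node.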

Q4. (a)


(b) Parsing input string “id*id+id”

     STACK               INPUT       ACTION
1    0                   id*id+id$   shift 5
2    0 id 5              *id+id$     reduce F→id
3    0 F 3               *id+id$     reduce T→F
4    0 T 2               *id+id$     shift 7
5    0 T 2 * 7           id+id$      shift 5
6    0 T 2 * 7 id 5      +id$        reduce F→id
7    0 T 2 * 7 F 10      +id$        reduce T→T*F
8    0 T 2               +id$        reduce E→T
9    0 E 1               +id$        shift 6
10   0 E 1 + 6           id$         shift 5
11   0 E 1 + 6 id 5      $           reduce F→id
12   0 E 1 + 6 F 3       $           reduce T→F
13   0 E 1 + 6 T 9       $           reduce E→E+T
14   0 E 1               $           accept

Parse tree-


Q5. (a)


(b).