Lovely Professional University Term paper Topic:-STAGE COMPILER

System Software



Submitted to: Neha Malhotra

Submitted by: Satnam Singh

RD1901A21

10902610

Acknowledgement

I am very thankful to God, who gave me the opportunity to do this project. I am thankful to my respected teacher, Miss Neha Malhotra, who guided me to do as well as I could in this term paper. My topic is "STAGE COMPILER", and it is a very interesting topic for me. With the help of Miss Neha Malhotra, I have gained a lot of knowledge on this topic. Besides this, my friends also helped me a lot with this topic.


CONTENTS

Introduction

Compiler versus Interpreter

Compiler Phases

1. Front End Phases

A. Line reconstruction

B. Lexical analysis

C. Preprocessing.

D. Syntax analysis

E. Semantic analysis

2. Back End Phases

o Analysis

o Optimization

o Code generation

Compilation Processes

o Lexical Analyzer Phase


o Syntax Analyzer Phase

o Semantic Analyzer Phase

o Intermediate Code Generator Phase

o Code Optimizer Phase

o Code Generator Phase

o Error Handling Methods

o Symbol Table & Symbol Table Manager

Compiler Design

Compiler Architectural Design

Compiler Structuring

BIBLIOGRAPHY

Introduction:-

A compiler is a special type of computer program that translates a human-readable text file into a form that the computer can more easily understand. At its most basic level, a computer can only understand two things: a 1 and a 0. At this level, a human would operate very slowly and find the information contained in a long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.

In the beginning, compilers were very simple programs that could only translate symbols into the bits, the 1s and 0s, that the computer understood. Programs were also very simple, composed of a series of steps that were originally translated by hand into data the computer could understand. This was a very time-consuming task, so portions of it were automated, or programmed, and the first compiler was written. This program assembled, or compiled, the steps required to execute the step-by-step program.

These simple compilers were used to write a more sophisticated compiler. With the newer version, more rules could be added to the compiler program to allow a more natural language structure for the human programmer to operate with. This made writing programs easier and allowed more people to begin writing programs. As more people started writing programs, more ideas about writing programs were offered and used to make more sophisticated compilers. In this way, compiler programs continue to evolve, improve and become easier to use.

Compiler programs can also be specialized. Certain language structures are better suited for a particular task than others, so specific compilers were developed for specific tasks or languages. Some compilers are multistage or multiple pass. A first pass could take a very natural language and make it closer to a computer understandable language. A second or even a third pass could take it to the final stage, the executable file.

The intermediate output in a multistage compiler is usually called pseudo-code, since it is not directly usable by the computer. Pseudo-code is very structured, like a computer program, not free-flowing and verbose like a more natural language. The final output is called the executable file, since it is what is actually executed, or run, by the computer. Splitting the task up like this made it easier to write more sophisticated compilers, as each sub-task is different. It also made it easier for the computer to point out where it had trouble understanding what it was being asked to do.

Errors that limit the compiler in understanding a program are called syntax errors. Errors in the way the program functions are called logic errors. Logic errors are much harder to spot and correct. Syntax errors are like spelling mistakes, whereas logic errors are a bit more like grammatical errors.

Cross compiler programs have also been developed. A cross compiler allows a text file set of instructions that is written for one computer designed by a specific manufacturer to be compiled and run on a different computer from a different manufacturer. For example, a program that was written to run on an Intel computer can sometimes be cross compiled to run on a computer developed by Motorola. This frequently does not work very well. At the level at which computer programs operate, the computer hardware can look very different, even if the machines look similar to you.

Cross compilation is different from having one computer emulate another computer. If a computer is emulating a different computer, it is pretending to be that other computer. Emulation is frequently slower than cross compilation, since two programs are running at once: the program that is pretending to be the other computer and the program that is actually running. However, for cross compilation to work, you need both the original natural-language text that describes the program and a target computer that is sufficiently similar to the original for the program to function on it. This is not always possible, so both techniques are in use.

Compiler versus Interpreter:-

An interpreter reads the source code one instruction or line at a time, converts that line into machine code and executes it.


The machine code is then discarded and the next line is read. The advantage of this is that it is simple, and you can interrupt the program while it is running, change it, and either continue or start again. The disadvantage is that every line has to be translated every time it is executed, even if it is executed many times as the program runs. Because of this, interpreters tend to be slow. Examples of interpreters are Basic on older home computers, script interpreters such as JavaScript, and languages such as Lisp and Forth.
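To make the line-at-a-time behaviour concrete, here is a minimal interpreter sketch (not from the paper; Python's built-in `eval` stands in for the translate-and-execute step, and the `name = expression` statement form is invented for the example):

```python
def interpret(lines, env=None):
    """A tiny line-at-a-time interpreter: each line is 'name = expression'.
    Each line is translated and executed immediately, then discarded;
    nothing compiled is kept between lines."""
    env = {} if env is None else env
    for line in lines:
        name, expr = line.split("=", 1)
        env[name.strip()] = eval(expr, {}, env)
    return env

print(interpret(["x = 2 + 3", "y = x * 10"]))   # {'x': 5, 'y': 50}
```

Note that `x * 10` is re-translated every time such a line is reached, which is exactly why interpreters tend to be slow on loops.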

A compiler reads the whole source code and translates it into a complete machine code program to perform the required tasks, which is output as a new file. This completely separates the source code from the executable file. The biggest advantage of this is that the translation is done once only, and as a separate process. The program that is run is already translated into machine code, so it is much faster in execution. The disadvantage is that you cannot change the program without going back to the original source code, editing that and recompiling (though for a professional software developer this is more of an advantage, because it stops source code being copied). Current examples of compiled languages are Visual Basic, C, C++, C#, Fortran, Cobol, Ada, Pascal and so on.

Compiler Phases:- A compiler is organized into two groups of phases:

1. Front End Phases

The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which include some of the following:

1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of Algol and Coral66) are examples of stropped languages whose compilers would have a line reconstruction phase.

2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.

3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.


4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.

5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), or object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.

2. Back End Phases:-

The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.


The main phases of the back end include the following:

1. Analysis: This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.

2. Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation or even automatic parallelization.

3. Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes.

Compilation Processes:-

Compilation refers to the compiler's process of translating a high-level language program into a low-level language program. This process is very complex; hence, from the logical as well as the implementation point of view, it is customary to partition the compilation process into several phases, which are nothing more than logically cohesive operations that input one representation of a source program and output another representation.

Lexical Analyzer Phase:- The lexical analyzer is the part of the compiler in which the stream of characters making up the source program is read from left to right and grouped into tokens.

Tokens are sequences of characters with a collective meaning. There are usually only a small number of token classes for a programming language: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words.
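As an illustration, such a scanner can be sketched with a single master regular expression built from one pattern per token class; the token names and the tiny expression language here are invented for the example:

```python
import re

# Token specification: each pair is (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),      # whitespace is discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Scan source text left to right into (kind, text) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 3 + 42"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '42')]
```

Each alternative in the master pattern is a regular language, so the whole scanner is effectively the finite automaton mentioned above.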

Syntax Analyzer Phase:- In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language. If the tokens in a string are grouped according to the language's rules of syntax, then the string of tokens generated by the lexical analyzer is accepted as a valid construct of the language; otherwise, an error handler is called.
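A minimal recursive-descent parser for a toy expression grammar can illustrate this phase; the grammar and the nested-tuple tree shape below are assumptions for the sketch, not part of any particular language:

```python
def parse(tokens):
    """Recursive-descent parser for the toy grammar:
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> NUMBER
       Returns a nested-tuple parse tree; raises SyntaxError otherwise."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        pos += 1
        return int(tok)

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse(["2", "+", "3", "*", "4"]))   # ('+', 2, ('*', 3, 4))
```

Note how the grammar's rules shape the tree: `*` binds tighter than `+`, so `3 * 4` becomes a subtree of the addition.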

Semantic Analyzer Phase:- After completing the lexical and syntax functions without errors, we move on to the semantic analyzer, where we delve even deeper to check whether the tokens form a sensible set of instructions in the programming language. Whereas any old noun phrase followed by some verb phrase makes a syntactically correct English sentence, a semantically correct one has subject-verb agreement and proper use of gender, and its components go together to express an idea that makes sense. For a program to be semantically valid, all variables, functions, classes, etc. must be properly defined, expressions and variables must be used in ways that respect the type system, access control must be respected, and so forth. The semantic analyzer is the front end's penultimate phase and the compiler's last chance to weed out incorrect programs. We need to ensure the program is sound enough to carry on to code generation.
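A tiny type checker over an expression tree can illustrate two of these checks, type checking and definition checking; the two-type system and node shapes are invented for the sketch:

```python
def check(node, symtab):
    """Return the type of an expression tree, or raise on a semantic error.
    Nodes are (op, left, right) tuples, int literals, or variable names."""
    if isinstance(node, tuple):
        op, left, right = node
        lt, rt = check(left, symtab), check(right, symtab)
        if lt != rt:                       # type checking
            raise TypeError(f"type mismatch: {lt} {op} {rt}")
        return lt
    if isinstance(node, int):              # integer literal
        return "int"
    if node not in symtab:                 # variable must be defined
        raise NameError(f"undefined variable {node!r}")
    return symtab[node]

symbols = {"x": "int", "s": "float"}
print(check(("+", "x", 2), symbols))       # 'int'
```

A syntactically perfect tree such as `("+", "x", "s")` is still rejected here, which is exactly the "last chance to weed out incorrect programs" described above.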

Intermediate Code Generator Phase:- The intermediate code generator is the part of the compilation process in which the compiler converts its internal representation of the source code into a form from which executable code can be more readily produced.

The input to the intermediate code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)
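The tree-to-linear-sequence step can be sketched as follows; the tuple encoding of instructions and the `t0, t1, ...` temporary-name scheme are assumptions for the example:

```python
def gen_tac(node):
    """Flatten an expression tree into three-address code: each instruction
    is (dest, op, arg1, arg2), introducing fresh temporaries t0, t1, ..."""
    code = []
    temp = iter(range(1000))               # source of fresh temporary numbers

    def walk(n):
        if not isinstance(n, tuple):       # literal or variable name
            return n
        op, left, right = n
        a, b = walk(left), walk(right)
        dest = f"t{next(temp)}"
        code.append((dest, op, a, b))
        return dest

    result = walk(node)
    return code, result

code, result = gen_tac(("+", ("*", "a", "b"), "c"))
print(code)    # [('t0', '*', 'a', 'b'), ('t1', '+', 't0', 'c')]
```

Each instruction has at most three addresses (one destination, two operands), which is what makes the representation "three-address" code.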

Code Optimizer Phase:-

The code optimizer takes as input the intermediate representation code generated by the intermediate code generator and performs the compiler optimization process, finally producing the optimized code that becomes the input to the code generator. Here we can define compiler optimization as the process of tuning the intermediate representation code to minimize or maximize some attribute of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has also created a market for minimizing the power consumed by a program.

Code optimization refers to the techniques used by the compiler to improve the execution efficiency of the generated object code. It involves a complex analysis of the intermediate code and the performance of various transformations; but every optimizing transformation must also preserve the semantics of the program. That is, a compiler should not attempt any optimization that would lead to a change in the program's semantics.
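Constant folding, one simple semantics-preserving optimization, can be sketched like this; the tuple tree representation is invented for the example:

```python
def fold(node):
    """Constant-fold an expression tree: any subtree whose operands are all
    known constants is replaced by its computed value at compile time."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)

print(fold(("+", ("*", 2, 3), "x")))   # ('+', 6, 'x')
```

The transformation is safe because `2 * 3` and `6` are equivalent in every execution; a subtree involving the unknown variable `x` is left untouched, preserving the program's semantics.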

Code Generator Phase:-

This is the last phase in the compiler's operation, and since it is a machine-dependent phase, it is not possible to generate good code without considering the details of the particular machine for which the compiler is expected to generate code. Even so, a carefully selected code generation algorithm can produce code that is twice as fast as code generated by an ill-considered code generation algorithm.

The main operations of this phase are the mapping of the logical addresses of our compiled program onto physical addresses, allocating the registers of the target machine, and linking the contents of the symbol table with the optimized code produced by the code optimizer to finally produce the target file, or object file.
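A deliberately naive sketch of this phase: translating three-address instructions into a toy load/store assembly. The instruction set and the fixed two-register strategy are invented for illustration; a real code generator would allocate registers far more carefully:

```python
def gen_code(tac):
    """Translate three-address instructions (dest, op, arg1, arg2) into a
    toy load/store assembly, always working in registers R1 and R2."""
    asm = []
    for dest, op, a, b in tac:
        mnemonic = {"+": "ADD", "*": "MUL"}[op]
        asm.append(f"LOAD R1, {a}")        # fetch first operand
        asm.append(f"LOAD R2, {b}")        # fetch second operand
        asm.append(f"{mnemonic} R1, R2")   # compute into R1
        asm.append(f"STORE R1, {dest}")    # spill result back to memory
    return asm

for line in gen_code([("t0", "*", "a", "b"), ("t1", "+", "t0", "c")]):
    print(line)
```

The sketch makes the paper's point visible: `t0` is stored and immediately reloaded, exactly the kind of waste a better register allocation algorithm would eliminate.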


Error Handling Methods:- One of the important tasks that a compiler must perform is the detection of and recovery from errors. Recovery from errors is important, because the compiler will be scanning and compiling the entire program, perhaps in the presence of errors; so as many errors as possible need to be detected. Every phase of a compilation expects its input to be in a particular format, and whenever that input is not in the required format, an error is returned. When detecting an error, a compiler scans some of the tokens that are ahead of the error's point of occurrence. The fewer the number of tokens that must be scanned ahead of the point of error occurrence, the better the compiler's error-detection capability.
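One common recovery method, panic mode, simply skips ahead to a synchronizing token so that parsing can resume and later errors can still be detected; a sketch (the choice of ';' as the synchronizing token is an assumption for the example):

```python
def panic_recover(tokens, pos):
    """Panic-mode recovery: after an error at tokens[pos], skip ahead to the
    next synchronizing token (';') and return the position just past it,
    so parsing can resume on the next statement."""
    while pos < len(tokens) and tokens[pos] != ";":
        pos += 1
    return pos + 1

tokens = ["x", "=", "@", "oops", ";", "y", "=", "2", ";"]
print(panic_recover(tokens, 2))   # 5 -> parsing resumes at 'y'
```

Skipping fewer tokens before resynchronizing is better, for exactly the reason given above: every skipped token is a stretch of the program in which further errors go unreported.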

Symbol Table & Symbol Table Manager:-

A symbol table is a data structure used by a compiler to keep track of scope/binding information about names. This information is used in the source program to identify the various program elements, like variables, constants, procedures, and the labels of statements. The symbol table is searched every time a name is encountered in the source text. When a new name or new information about an existing name is discovered, the content of the symbol table changes. Therefore, a symbol table must have an efficient mechanism for accessing the information held in the table as well as for adding new entries to the symbol table.
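A minimal scope-aware symbol table can be sketched as a stack of dictionaries; the class and method names here are invented for the example:

```python
class SymbolTable:
    """Scope-aware symbol table: lookups search the innermost scope first,
    then enclosing scopes; new names go into the current scope."""

    def __init__(self):
        self.scopes = [{}]                 # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def define(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undefined symbol {name!r}")

table = SymbolTable()
table.define("x", {"type": "int"})
table.enter_scope()
table.define("x", {"type": "float"})       # shadows the outer x
print(table.lookup("x"))                   # {'type': 'float'}
table.exit_scope()
print(table.lookup("x"))                   # {'type': 'int'}
```

A dictionary per scope gives the efficient access and insertion the paper asks for, and popping a scope on exit automatically discards its local bindings.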

Bibliography:-

[1] O.G. Kakde, "Algorithms for Compiler Design", Charles River Media, 2002.

[2] Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools".

[3] Ian Sommerville, "Software Engineering".

[4] Y.N. Srikant and Priti Shankar, "The Compiler Design Handbook", 2nd Edition, Dec. 2007.

Website: www.wikipedia.com