B.tech CS S8 Principles of Programming Languages Notes Module 1

MODULE I

Introduction – Role of programming languages - Effects of Environments on languages - Language Design issues - Virtual computers and binding times, Language Paradigms.

LANGUAGE DESIGN ISSUES

A programming language is a notation for the description of algorithms and data structures.

WHY STUDY PROGRAMMING LANGUAGES?

There are six primary reasons:

1. To improve your ability to develop effective algorithms

Many languages provide features that, when used properly, benefit the programmer but, when used improperly, may waste large amounts of computer time or lead the programmer into time-consuming logical errors. Even a programmer who has used a language for years may not understand all of its features. A typical example is recursion, a handy programming feature that, when properly used, allows the direct implementation of elegant and efficient algorithms. When used improperly, it may cause an excessive increase in execution time.
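The recursion example above can be made concrete with a small sketch. Python is used here purely for illustration, and the names fib_naive, fib_memo, and calls are hypothetical. Both functions implement the same recursive definition of the Fibonacci numbers; the first recomputes the same subproblems over and over, while the second remembers results it has already computed.

```python
from functools import lru_cache

# Call counters, kept only to make the difference in work visible.
calls = {"naive": 0, "memo": 0}

def fib_naive(n):
    """Direct recursion: the same subproblems are recomputed many times."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but each distinct argument is computed only once."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

Calling each function with the same argument and then comparing calls["naive"] with calls["memo"] shows how quickly the number of calls grows in the naive version, even though both return the same value.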

2. To improve your use of your existing programming language

By understanding how features in your language are implemented, you greatly increase your ability to write efficient programs. For example, understanding how data such as arrays, strings, lists, or records are created and manipulated by your language, knowing the implementation details of recursion, or understanding how object classes are built allows you to build more efficient programs consisting of such components.

3. To increase your vocabulary of useful programming constructs

By studying the constructs provided by a wide range of languages, a programmer increases his programming vocabulary. The understanding of implementation techniques is particularly important because, to use a construct while programming in a language that does not provide it directly, the programmer must provide an implementation of the construct in terms of the primitive elements actually provided by the language. For example, the subprogram control structure known as a coroutine is useful in many programs, but few languages provide a coroutine feature directly.
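As an illustration of the coroutine idea, Python is one of the languages that does offer a closely related feature, the generator. The sketch below is only an example under that assumption; the names running_average and feed are hypothetical. The coroutine suspends itself at each yield and later resumes exactly where it left off, keeping its local variables between activations.

```python
def running_average():
    """Coroutine: suspends at each yield and resumes where it left off."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the result, wait for the next input
        total += value
        count += 1
        average = total / count

def feed(values):
    """Drive the coroutine with a list of values and collect its outputs."""
    co = running_average()
    next(co)                    # advance to the first yield
    return [co.send(v) for v in values]
```

For instance, feed([10, 20, 30]) returns the successive averages 10.0, 15.0, and 20.0. In a language without such a feature, the programmer would have to save and restore total and count explicitly between calls.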

4. To allow a better choice of programming language

Knowledge of a variety of languages may allow the choice of just the right language for a particular project, thereby reducing the required coding effort. Knowing the basic features, strengths, and weaknesses of each language gives the programmer a broader choice of alternatives. Applications requiring numerical calculations can be readily written in languages like C, FORTRAN, or Ada. Applications useful in decision making, such as artificial intelligence applications, are more easily written in LISP, ML, or Prolog. Internet applications are more readily designed using Perl and Java.

5. To make it easier to learn a new language

A linguist, through a deep understanding of the underlying structure of natural languages, often can learn a new foreign language more quickly and easily than struggling novices who understand little of the structure even of their native tongue. Similarly, a thorough knowledge of a variety of programming language constructs and implementation techniques allows the programmer to learn a new programming language more easily when the need arises.

6. To make it easier to design a new language

Many new languages are based on C or Pascal as implementation models. This aspect of program design is often simplified if the programmer is familiar with a variety of constructs and implementation methods from ordinary programming languages.

ROLE OF PROGRAMMING LANGUAGES

Factors influencing the revision of language:

1. Computer capabilities

Computers have evolved from the small, slow, and costly vacuum-tube machines of the 1950s to the supercomputers and microcomputers of today. At the same time, layers of operating system software have been inserted between the programming language and the underlying computer hardware. These factors have influenced both the structure and cost of using the features of high-level languages.

2. Applications

Computer use has spread rapidly from the original concentration on military, scientific, business, and industrial applications in the 1950s, where the cost could be justified, to the computer games, PCs, Internet, and applications in every area of human activity seen today. The requirements of these new application areas affect the designs of new languages and the revisions and extensions of older ones.

3. Programming methods

Language designs have evolved to reflect our changing understanding of good methods for writing large and complex programs and to reflect the changing environment in which programming is done.

4. Implementation methods

The development of better implementation methods has affected the choice of features to include in new language designs.

5. Theoretical studies

Research into the conceptual foundation for language design and implementation, using formal mathematical methods, has deepened our understanding of the strengths and weaknesses of language features, which has influenced the inclusion of these features in new language designs.

6. Standardization

The need for standard languages that can be implemented easily on a variety of computer systems, which allow programs to be transported from one computer to another, has provided a strong conservative influence on the evolution of language designs.

1. WHAT MAKES A GOOD LANGUAGE

Attributes of a Good Language

1. Clarity, simplicity, and unity

A programming language provides both a conceptual framework for thinking about algorithms and a means of expressing those algorithms. The language should be an aid to the programmer long before the actual coding stage. It should provide a clear, simple, and unified set of concepts that can be used as primitives in developing algorithms. To this end, it is desirable to have a minimum number of different concepts, with the rules for their combination being as simple and regular as possible. We call this attribute conceptual integrity.

The syntax of a language affects the ease with which a program may be written, tested, and later understood and modified. The readability of programs in a language is a central issue here. A syntax that is particularly terse or cryptic often makes a program easy to write but difficult to read when the program must be modified later. Many languages contain syntactic constructs that encourage misreading by making two almost identical statements actually mean radically different things. For example, the presence of a blank character, which is an operator, in a SNOBOL4 statement may entirely alter its meaning. A language should have the property in which constructs that mean different things look different; that is, semantic differences should be mirrored in the language syntax.

2. Orthogonality

The term orthogonality refers to the attribute of being able to combine various features of a language in all possible combinations, with every combination being meaningful. For example, suppose a language provides for an expression that can produce a value, and it also provides for a conditional statement that evaluates an expression to get a true or false value. These two features of the language, expression and conditional statement, are orthogonal if any expression can be used (and evaluated) within the conditional statement.

When the features of a language are orthogonal, the language is easier to learn and programs are easier to write because there are fewer exceptions and special cases to remember. The negative aspect of orthogonality is that a program will often compile without errors even though it contains a combination of features that are logically incoherent or extremely inefficient to execute.

3. Naturalness for the application

A language needs a syntax that, when properly used, allows the program structure to reflect the underlying logical structure of the algorithm. Ideally, it should be possible to translate such a program design directly into appropriate program statements that reflect the structure of the algorithm. Sequential algorithms, concurrent algorithms, logic algorithms, and others all have differing natural structures that are represented by programs in those languages.

The language should provide appropriate data structures, operations, control structures, and a natural syntax for the problem to be solved.

4. Support for abstraction

Even with the most natural programming language for an application, there is always a gap between the abstract data structures and operations that characterize the solution to a real-life problem and the primitive data structures and operations built into the language. For example, consider a program that schedules college students to attend lectures. The abstract data structures student, class section, lecture, and teacher, and the abstract operation of assigning a student to a section, lecture, and teacher, are natural to the application but are not provided directly by C.

The programmer's task is to design appropriate abstractions for the problem's solution and then implement these abstractions using the more primitive features of the language. Ideally, the language should provide the data structures, data types, and operations needed to define and maintain such abstractions. C++ is one widely used language that provides such facilities.
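The scheduling example can be sketched as a small user-defined abstraction. Python is used here only for illustration, and the class Section, its fields, and the assign operation are hypothetical names chosen for this sketch. The point is that "class section" and "assign a student" are not built into the language; they are defined once by the programmer and then used as if they were.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """Abstract data structure: one class section of a course."""
    course: str
    teacher: str
    room: str
    capacity: int
    students: list = field(default_factory=list)

    def assign(self, student):
        """Abstract operation: assign a student to this class section."""
        if len(self.students) >= self.capacity:
            raise ValueError(f"{self.course} is full")
        self.students.append(student)
```

Code that uses Section works in terms of the application (sections, teachers, students) rather than in terms of lists and integers, which is the kind of support for abstraction this attribute describes.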

5. Ease of program verification

The reliability of programs written in a language is always a central problem. There are many techniques for verifying that a program correctly performs its required function. A program may be proved correct by a formal verification method, it may be informally proved correct by desk checking (reading and visually checking the program text), it may be tested by executing it with test input data and checking the output results against the specifications, and so on. For large programs, some combination of all these methods is often used. Simplicity of semantic structure is a primary aspect that tends to simplify program verification.

6. Programming environment

The technical structure of a programming language is only one aspect affecting its utility. The presence of an appropriate programming environment may make a technically weak language easier to work with than a stronger language that has little external support. A long list of factors might be included as part of the programming environment. The availability of a reliable, efficient, and well-documented implementation of the language must head the list. Special editors and testing packages tailored to the language may greatly speed the creation and testing of programs. Facilities for maintaining and modifying multiple versions of a program may make working with large programs much simpler. Smalltalk was specifically designed around a programming environment consisting of windows, menus, mouse input, and a set of tools to operate on programs written in Smalltalk.

7. Portability of programs

A language that is widely available and whose definition is independent of the features of a particular machine forms a useful base for the production of transportable programs. Ada, FORTRAN, C, and Pascal all have standardized definitions allowing for portable applications to be implemented. Others, like ML, come from a single-source implementation allowing the language designer some control over portable features of the language.

8. Cost of use

Cost is certainly a major element in the evaluation of any programming language, but cost means many different things:

a. Cost of program execution

In the early years of computing, questions of cost were concerned almost exclusively with program execution. Research on the design of optimizing compilers, efficient register allocation, and the design of efficient run-time support mechanisms was important. Cost of program execution, although always of some importance in language design, is of primary importance for large production programs that will be executed repeatedly.

b. Cost of program translation

When a language like FORTRAN or C is used in teaching, the question of efficient translation (compilation) rather than efficient execution may be paramount. Typically, student programs are compiled many times while being debugged but are executed only a few times. In such a case, it is important to have a fast and efficient compiler rather than a compiler that produces optimized executable code.

c. Cost of program creation, testing, and use

For a certain class of problems, a solution may be designed, coded, tested, modified, and used with a minimum investment of programmer time and energy. Smalltalk and Perl are cost-effective in that the overall time and effort expended in solving a problem on the computer is minimized even though execution time of the program may be higher than with other languages.

d. Cost of program maintenance

Many studies have shown that the largest cost involved in any program that is used over a period of years is not the cost of initial design, coding, and testing of the program, but total life cycle costs including development costs and the cost of maintenance of the program while it is in production use. Maintenance includes the repair of errors discovered after the program is put into use, changes in the program required as the underlying hardware or operating system is updated, and extensions and enhancements to the program that are needed to meet new needs. A language that makes it easy for a program to be repeatedly modified, repaired, and extended by different programmers over a period of many years may be, in the long run, much less expensive to use than any other.

Syntax and Semantics

The syntax of a programming language is what the program looks like. To give the rules of syntax for a programming language means to tell how statements, declarations, and other language constructs are written. The semantics of a programming language is the meaning given to the various syntactic constructs. For example, in C, to declare a 10-element vector, V, of integers, you would give a declaration, such as,

int V[10];

In contrast, in Pascal, it would be specified as

V:array[0..9] of integer;

Although both create similar data objects at run time, their syntax is very different. To understand the meaning of the declaration, you need to know the semantics of both Pascal and C for such array declarations.

2. LANGUAGE PARADIGMS

There are four basic computational models that describe most programming today: imperative, applicative, rule based, and object oriented.

Imperative languages

Imperative or procedural languages are command-driven or statement-oriented languages. The basic concept is the machine state, the set of all values for all memory locations in the computer. A program consists of a sequence of statements, and the execution of each statement causes the computer to change the value of one or more locations in its memory, that is, to enter a new state. The syntax of such languages generally has the form

statement1;

statement2;

…

Program development consists of building the successive machine states needed to arrive at the solution. This is often the first view one has of programming and many widely used languages (e.g., C, C++, FORTRAN, ALGOL, PL/I, Pascal, Ada, Smalltalk, and COBOL) support this model.

Applicative languages

An alternative view of the computation performed by a programming language is to look at the function that the program represents rather than just the state changes as the program executes, statement by statement. We can achieve this by looking at the desired result rather than at the available data. In other words, rather than looking at the sequence of states that the machine must pass through in achieving an answer, the question to be asked is: What is the function that must be applied to the initial machine state by accessing the initial set of variables and combining them in specific ways to get an answer? Languages that emphasize this view are called applicative or functional languages.

Program development proceeds by developing functions from previously developed functions to build more complex functions that manipulate the initial set of data until the final function can be used to compute an answer from the initial data. Rather than looking at the successive machine states of a computation, we consider the successive functional transformations that we must make on data to arrive at our answer. The syntax of such languages generally is similar to

function_n(…function_2(function_1(data))…)

LISP and ML are two functional languages that support this model.
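To contrast the imperative and applicative views on one small problem, the following sketch computes a sum of squares both ways. It is written in Python only for illustration (Python is not one of the functional languages named above), and all function names are hypothetical. The first version updates a memory location step by step; the second builds the answer by composing functions, following the pattern function_n(…function_2(function_1(data))…).

```python
from functools import reduce

def sum_squares_imperative(data):
    """Imperative view: each statement changes the machine state (total)."""
    total = 0
    for x in data:
        total += x * x
    return total

def compose(*functions):
    """compose(f1, f2, ..., fn)(data) computes fn(...f2(f1(data))...)."""
    return lambda data: reduce(lambda acc, f: f(acc), functions, data)

def square_all(data):
    """Return a new list of squares; the input list is not modified."""
    return [x * x for x in data]

# Applicative view: a new function built from previously developed functions.
sum_squares_applicative = compose(square_all, sum)
```

Both versions return 14 for the input [1, 2, 3]; what differs is whether the program is read as a sequence of state changes or as a transformation applied to the initial data.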

Rule-based languages

Rule-based languages execute by checking for the presence of a certain enabling condition and, when present, executing an appropriate action. The most common rule-based language is Prolog, also called a logic programming language, because the basic enabling conditions are certain classes of predicate logic expressions. Execution of a rule-based language is similar to an imperative language except that statements are not sequential. Enabling conditions determine the order of execution. The syntax of such languages generally is similar to the following:

enabling condition_1 → action_1

enabling condition_2 → action_2

…

enabling condition_n → action_n

The common business application of decision tables is a form of rule-based programming.
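The condition-action pattern can be sketched with a very small rule interpreter. This is an assumption-laden illustration in Python, not how Prolog works internally (Prolog uses unification and backward chaining); the names run_rules and RULES and the sample facts are hypothetical. Note that the order in which the rules are written does not determine the order in which they fire; the enabling conditions do.

```python
def run_rules(rules, facts):
    """Fire any rule whose enabling condition holds until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            # A rule is enabled when all of its condition facts are present.
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)   # the action: record a new fact
                changed = True
    return facts

# Each rule is (enabling condition, action); the second must fire first.
RULES = [
    (frozenset({"bird", "healthy"}), "can_fly"),
    (frozenset({"has_feathers"}), "bird"),
]
```

Starting from the facts has_feathers and healthy, the interpreter first derives bird and only then can_fly, even though the can_fly rule is listed first.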

Object-oriented programming

In this case, complex data objects are built, then a limited set of functions are designed to operate on those data. Complex objects are designed as extensions of simpler objects, inheriting properties of the simpler object. By building concrete data objects, an object-oriented program gains the efficiency of imperative languages. By building classes of functions that use a restricted set of data objects, we build the flexibility and reliability of the applicative model.
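A minimal sketch of objects built as extensions of simpler objects follows. It is written in Python for illustration, and the classes Shape, Rectangle, and Square are hypothetical. Each class offers a limited set of functions on its data, and each more specialized class inherits the properties of the simpler one.

```python
class Shape:
    """Simplest object: has a name and knows how to describe itself."""
    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses supply the area")

    def describe(self):
        return f"{self.name} with area {self.area()}"

class Rectangle(Shape):
    """Extends Shape with concrete data and an area function."""
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Square(Rectangle):
    """Extends Rectangle; inherits area() and describe() unchanged."""
    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"
```

Square defines no functions of its own beyond its constructor, yet Square(3).describe() works because describe is inherited from Shape and area from Rectangle.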

3. LANGUAGE STANDARDIZATION

Standards are of two types:

a. Proprietary standards: These are definitions by the company that developed and owns the language. For the most part, proprietary standards do not work for languages that have become popular and widely used. Variations in implementations soon appear with many enhancements and incompatibilities.

b. Consensus standards: These are documents produced by organizations based on an agreement by the relevant participants. Consensus standards, or simply standards, are the major method to ensure uniformity among several implementations of a language.

To use standards effectively, we need to address three issues:

1. Timeliness: When do we standardize a language?

2. Conformance: What does it mean for a program to adhere to a standard and for a compiler to compile a standard?

3. Obsolescence: When does a standard age, and how does it get modified?

Timeliness

One important issue is when to standardize a language. FORTRAN was initially standardized in 1966 after there were many incompatible versions. This led to problems because each implementation was different from the others. At the other extreme, Ada was initially standardized in 1983, before there were any implementations; therefore, it was not clear when the standard was produced whether the language would even work. The first effective Ada compilers did not even appear until 1987, and several idiosyncrasies were identified by these early implementations. One would like to standardize a language early enough so that there is enough experience in using the language, yet not so late as to encourage many incompatible implementations.

Of the languages, FORTRAN was standardized fairly late, when there were many incompatible variations; Ada was standardized very early, before any implementations existed.

Obsolescence

Standards have to be reviewed every 5 years and either be renewed or dropped. The 5-year cycle often gets stretched out somewhat, but the process is mostly effective.

One problem with updating a standard is what to do with the existing collection of programs written for the older standard. Companies have significant resources invested in their software, and to rewrite all of this code for a new version of a language is quite costly. Because of this, most standards require backward compatibility; the new standard must include older versions of the language.

A feature is obsolescent if it is a candidate feature that may be dropped in the next version of the standard. This warns users that the feature is still available, but in the next 5 to 10 years, it will be dropped. That gives a fair warning to rewrite any code using that feature. A deprecated feature may become obsolescent with the next standard, and hence may be dropped after two revisions. This gives a longer 10- to 20-year warning. New programs should not use these features.

4. INTERNATIONALIZATION

With the globalization of commerce and the emergence of the WWW, programming is increasingly a global activity, and it is important for languages to be readily usable in multiple countries. There is increasing need for computers to “speak” many different languages. For example, use of an 8-bit byte, which can store up to 256 different character representations, to represent a character is often insufficient. This issue has generally gone under the name of internationalization (I18N issue).

Often local conventions affect the way data are stored and processed. Such issues as character codes, collating sequences, formats for date and time, and other local standards affect input and output data. Some of the relevant issues are as follows:

Collating sequences: In what collating sequence should the characters be ordered?

• Sorting: The position of non-Roman characters, such as Å, Ø, ß, ö, and others is not uniformly defined and may have different interpretations in different countries.

• Case: Some languages like Japanese, Arabic, Hebrew, and Thai have no uppercase-lowercase distinction.

• Scanning direction: Most languages read from left to right, but others exist (e.g., right to left, top to bottom).

Country-specific date formats: 11/26/02 in the United States is 26/11/02 in England; 26.11.02 in France; 26-XI-02 in Italy, etc.

Country-specific time formats: 5:40 p.m. in the United States is 17:40 in Japan, 17.40 in Germany, 17h40 in France, and so on.

Time zones: Although the general rule is 1 hour of change for each 15 degrees of longitude, it is more a guideline than a reality. Time zones are generally an integral number of hours apart, but some vary by 15 or 30 minutes. Time changes (e.g., daylight saving time in the United States and summer time in Europe) do not occur uniformly around the world. Translating local time into a worldwide standard time is nontrivial. In the southern hemisphere, the transformation for summer time is opposite that of the northern hemisphere.

Ideographic systems: Some written languages are not based on a small number of characters forming an alphabet, but instead use large numbers of ideographs (e.g., Japanese, Chinese, and Korean). Often 16 bits might be needed to represent text in those languages.

Currency: Representation of currency (e.g., $, £, ¥) varies by country.
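Several of the issues listed above can be observed directly in a short program. The following is a sketch in Python using only its standard library; the function names and the ROMAN_MONTHS table are hypothetical helpers written for this illustration. It formats one date in the four conventions quoted above, measures how many bytes a character needs in two common Unicode encodings, and shows what happens when text is sorted by raw character codes with no national collating sequence.

```python
from datetime import date

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]

def format_date(d, convention):
    """Render one date in the conventions listed in the text."""
    if convention == "US":
        return d.strftime("%m/%d/%y")
    if convention == "England":
        return d.strftime("%d/%m/%y")
    if convention == "France":
        return d.strftime("%d.%m.%y")
    if convention == "Italy":
        return f"{d.day:02d}-{ROMAN_MONTHS[d.month - 1]}-{d.strftime('%y')}"
    raise ValueError(f"unknown convention: {convention}")

def encoded_sizes(ch):
    """Bytes needed for one character in UTF-8 and in UTF-16."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

def code_point_sort(words):
    """Sort by raw character codes, ignoring national collating rules."""
    return sorted(words)
```

With date(2002, 11, 26), format_date reproduces the four date strings given above. Sorting by character code places Å after z, which is unlikely to match any country's expected alphabetical order, and an ideograph such as 日 does not fit in a single 8-bit byte in either encoding.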

PROGRAMMING ENVIRONMENTS

A programming environment is the environment in which programs are created and tested, and it tends to have less influence on language design than the operating environment in which programs are expected to be executed. A programming environment consists primarily of a set of support tools and a command language for invoking them. Each support tool is another program that may be used by the programmer as an aid during one or more of the stages of creation of a program. Typical tools in a programming environment include editors, debuggers, verifiers, test data generators, and pretty printers.

Effects on Language Design

Programming environments have affected language design primarily in two major areas: features aiding separate compilation and assembly of a program from components and features aiding program testing and debugging.

Separate compilation:

In the construction of any large program, it is ordinarily desirable to have different programmers or programming groups design, code, and test parts of the program before a final assembly of all the components into a complete program. This requires the language to be structured so that individual subprograms or other parts can be separately compiled and executed, without the other parts, and then later merged without change into the final program.

Separate compilation is made difficult by the fact that in compiling one subprogram, the compiler may need information about other subprograms or shared data objects, as in the following situation:

1. The specification of the number, order, and type of parameters expected by any subprogram called allows the compiler to check whether a call of the external subprogram is valid. The language in which the other subprogram is coded may also need to be known so that the compiler may set up the appropriate calling sequence of instructions to transfer data and control information to the external subprogram during execution in the form expected by that subprogram.

2. The declaration of data type for any variable referenced is needed to allow the compiler to determine the storage representation of the external variable so that the reference may be compiled using the appropriate accessing formula for the variable (e.g., the correct offset within the common environment block).

3. The definition of a data type that is defined externally but is used to declare any local variable within the subprogram is needed to allow the compiler to allocate storage and compute accessing formulas for local data.

To provide this information about separately compiled subprograms, shared data objects, and type definitions, either (1) the language may require that the information be redeclared within the subprogram (as in FORTRAN); (2) it may prescribe a particular order of compilation, requiring that compilation of each subprogram be preceded by compilation of the specifications of all called subprograms and shared data (as in Ada and, to some extent, Pascal); or (3) it may require the presence of a library containing the relevant specifications during compilation so that the compiler may retrieve them as needed (as in Java and C++).

The term independent compilation is usually used for option 1. Each subprogram may be independently compiled without any external information; the subprogram is entirely self-contained. Independent compilation has the disadvantage that ordinarily there is no way to check the consistency of the information about external subprograms and data that are redeclared in the subprogram. If the declarations within the subprogram do not match the actual structure of the external data or subprogram, then a subtle error appears in the final assembly stage that will not have been detected during testing of the independently compiled program parts.

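When a subprogram calls another subprogram that has not yet been written or compiled, a dummy stand-in for the missing subprogram, termed a stub, is supplied so that the calling code can still be compiled and tested.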

Another aspect of separate compilation that affects language design is in the use of shared names. If several groups are writing portions of a large program, it is often difficult to ensure that the names used by each group for subprograms, common environments, and shared type definitions are distinct. A common problem is to find, during assembly of the final complete program, that several subprograms or other program units have the same names. This often means a tedious and time-consuming revision of already-tested code. Languages employ three methods to avoid this problem:

1. Each shared name, such as in an extern statement in C, must be unique, and it is the obligation of the programmer to ensure that is so. Naming conventions must be adopted at the outset so that each group has a distinct set of names they may use for subprograms (e.g., “all names used by your group must begin with QQ”).

2. Languages often use scoping rules to hide names. If one subprogram is contained within another subprogram, only the names in the outermost subprogram are known to other separately compiled subprograms. Languages like Pascal, C, and Ada use this mechanism.

3. Names may be known by explicitly adding their definitions from an external library. This is the basic mechanism of inheritance in object-oriented languages. By including an externally defined class definition into a subprogram, other objects defined by that class become known, as in Ada and C++. In Ada, names may also be overloaded so that several objects may have the same name. As long as the compiler can resolve which object is actually referenced, no change is needed in the calling program.

Testing and debugging

Most languages contain some features to aid program testing and debugging. A few typical examples are the following:

1. Execution trace features: Prolog, LISP, and many other interactive languages provide features that allow particular statements and variables to be tagged for tracing during execution. Whenever a tagged statement is executed or a tagged variable is assigned a new value, execution of the program is interrupted, and a designated trace subprogram is called (which typically prints appropriate debugging information).

2. Breakpoints: In an interactive programming environment, languages often provide a feature where the programmer can specify points in the program as breakpoints. When a breakpoint is reached during execution, execution of the program is interrupted, and control is given to the programmer at a terminal. The programmer may inspect and modify values of variables and then restart the program from the point of interruption.

3. Assertions: An assertion is a conditional expression inserted as a separate statement in a program, for example,

assert((X>0 and A=1) or (X=0 and A>B+10))

The assertion states the relationships that must hold among the values of the variables at that point in the program. When the assertion is enabled, the compiler inserts code into the compiled program to test the conditions stated. During execution, if the conditions fail to hold, then execution is interrupted, and an exception handler is invoked to print a message or take other action. After the program is debugged, the assertions may be disabled so that the compiler generates no code for their checking. They then become useful comments that aid in documenting the program. This is a simple concept that exists in several languages, including C++.
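The same assertion can be written in a language that supports the feature directly. The sketch below uses Python's assert statement; the function name check_state and the variable names x, a, and b are hypothetical stand-ins for X, A, and B in the example above. In Python, running the interpreter with the -O option disables assert statements, which corresponds to the "disabled" mode described in the text.

```python
def check_state(x, a, b):
    """Verify the relationship that must hold among x, a, and b."""
    # Same condition as the assertion in the text, in Python syntax.
    assert (x > 0 and a == 1) or (x == 0 and a > b + 10), \
        f"invariant violated: x={x}, a={a}, b={b}"
    return True
```

When the condition fails and assertions are enabled, an AssertionError exception is raised, and a handler can print a message or take other action.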

VIRTUAL COMPUTERS AND BINDING TIMES

A computer is an integrated set of algorithms and data structures capable of storing and executing programs. A given computer might be constructed in several ways:

1. Through a hardware realization, representing the data structures and algorithms directly with physical devices.

2. Through a firmware realization, representing the data structures and algorithms by microprogramming a suitable hardware computer.

3. Through a virtual machine, representing the data structures and algorithms by programs and data structures in some other programming language.

4. Through some combination of these techniques, representing various parts of the computer directly in hardware, in microprograms, or by software simulation as appropriate.

1. Virtual Computers and Language Implementations

Each time the language is implemented on a different computer, the implementer tends to see a slightly (or very) different virtual computer in the language definition. Thus, two different implementations of the same language may utilize a different set of data structures and operations in the implementation, particularly for data structures and operations that are hidden in the program syntax. Each implementer has wide latitude in determining the virtual computer structures that are the basis for a particular implementation.

When a programming language is being implemented on a particular computer, the implementer first determines the virtual computer that represents an interpretation of the semantics of the language and then constructs that virtual computer out of the hardware and software elements provided by the underlying computer.

The implementer must also determine precisely what is to be done during translation of a program and what during execution.

Three factors lead to differences among implementations of the same language:

1. Differences in each implementer’s conception of the virtual computer that is implicit in the language definition.

2. Differences in the facilities provided by the host computer on which the language is to be implemented.

3. Differences in the choices made by each implementer as to how to simulate the virtual computer elements using the facilities provided by the underlying computer and how to construct the translator so as to support these choices of virtual computer representation.

2. Hierarchies of Virtual Machines

The virtual machine that a programmer uses to create an application is in fact formed from a hierarchy of virtual computers. At the bottom, there must, of course, lie an actual hardware computer. However, the ordinary programmer seldom has any direct dealing with this computer. Instead, this hardware computer is successively transformed by layers of software (or microprograms) into a virtual machine that may be radically different. The second level of virtual computer (or the third if a microprogram forms the second level) is usually defined by the complex collection of routines known as the operating system.

Typically the operating system provides simulations for a number of new operations and data structures that are not directly provided by the hardware (e.g., external file structures or time-of-day functions).

3. Binding and Binding Time

The binding of a program element to a particular characteristic or property is simply the choice of the property from a set of possible properties. The time during program formulation or processing when this choice is made is termed the binding time of that property for that element.

Classes of Binding Times

Although there is no simple categorization of the various types of bindings, a few main binding times may be distinguished if we recall our basic assumption that the processing of a program, regardless of the language, always involves a translation step followed by execution of the translated program:

1. Execution time (run time): Many bindings are performed during program execution. These include bindings of variables to their values, as well as (in many languages) the binding of variables to particular storage locations. Two important subcategories may be distinguished:

a. On entry to a subprogram or block: In most languages, important classes of bindings are restricted to occur only at the time of entry to a subprogram or block during execution. For example, in C and C++, the binding of formal to actual parameters and the binding of formal parameters to particular storage locations may occur only on entry to a subprogram.

b. At arbitrary points during execution: Some bindings may occur at any point during execution of a program. The most important example is the basic binding of variables to values through assignment. Some languages, such as LISP, Smalltalk, and ML, also permit the binding of names to storage locations to occur at arbitrary points in the program.

2. Translation time (compile time): Three different classes of translation time bindings may be distinguished:

a. Bindings chosen by the programmer: In writing a program, the programmer consciously makes many decisions regarding choices of variable names, types for variables, program statement structures, and so on that represent bindings during translation. The language translator makes use of these bindings to determine the final form of the object program.

b. Bindings chosen by the translator: Some bindings are chosen by the language translator without direct programmer specification. For example, the relative location of a data object in the storage allocated for a procedure is generally handled without knowledge or intervention by the programmer. How arrays are stored and how descriptors for the arrays, if any, are created are decisions made by the language translator. Different implementations of a given language may choose to provide these features in different ways.

c. Bindings chosen by the loader: A program usually consists of several subprograms that must be merged into a single executable program. The translator typically binds variables to addresses within the storage designated for each subprogram. However, this storage must be allocated actual addresses within the physical computer that will execute the program. This occurs during load time (also called link time).

3. Language implementation time: Some aspects of a language definition may be the same for all programs that are run using a particular implementation of a language, but they may vary between implementations. For example, often the details associated with the representations of numbers and arithmetic operations are determined by the way that arithmetic is done in the underlying hardware computer. A program written in the language that uses a feature whose definition has been fixed at implementation time will not necessarily run on another implementation of the same language; even more troublesome, it may run and give different results.

4. Language definition time: Most of the structure of a programming language is fixed at the time the language is defined, in the sense of specification of the alternatives available to a programmer when writing a program. For example, the possible alternative statement forms, data structure types, program structures, and so on are all fixed at language definition time.
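Some of these binding times can be observed in a short program. The sketch below uses Python purely for illustration; the names scale, total, and float_digits are assumptions introduced here. It shows a binding made on entry to a subprogram, a binding made at an arbitrary point through assignment, and one property that is fixed by the language implementation rather than by the program.

```python
import sys


def scale(value, factor):
    # On entry to the subprogram: the formal parameters 'value' and 'factor'
    # are bound to the actual arguments supplied by this particular call.
    return value * factor


# At arbitrary points during execution: each assignment rebinds 'total'
# to a new value.
total = 0
for reading in [3, 4, 5]:
    total = total + scale(reading, 2)

# Language implementation time: the number of decimal digits a float can
# hold reliably is decided by the implementation and underlying hardware,
# so the program can only query it, not choose it.
float_digits = sys.float_info.dig

print(total, float_digits)
```

The value printed for float_digits may differ between implementations, which is exactly why programs that depend on such properties are not guaranteed to behave identically everywhere.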

To illustrate the variety of bindings and binding times, consider the simple assignment statement

X=X+10

written in a language L. We might inquire into the bindings and binding times of the following elements of this statement:

1. Set of types for variable X. Variable X in the statement usually has a data type associated with it, such as real, integer, or Boolean. The set of allowable types for X is often fixed at language definition time (e.g., only types real, integer, Boolean, set, and character might be allowed). Alternatively, a language may allow each program to define new types, as in C, Java, and Ada, so that the set of possible types for X is fixed at translation time.

2. Type of variable X. The particular data type associated with variable X is often fixed at translation time through an explicit declaration in the program such as float X, which is the C designation for a real data type. In other languages, such as Smalltalk and Perl, the data type of X may be found at execution time through assignment of a value of a particular type to X. In these languages, X may refer to an integer at one point and to a string at a later point in the same program.

3. Set of possible values for variable X. If X has data type real, then its value at any point during execution is one of a set of bit sequences representing real numbers. The precise set of possible values for X is determined by the real numbers that can be represented and manipulated in the virtual computer defining the language, which ordinarily is the set of real numbers that can be represented conveniently in the underlying hardware computer.

4. Value of variable X. At any point during program execution, a particular value is bound to variable X. This value is determined at execution time through assignment of a value to X. The assignment X=X+10 changes the binding of X, replacing its old value by a new one that is 10 more than the old one.

5. Representation of the constant 10. The integer 10 has both a representation as a constant in the text of the program, using the string 10, and a representation at execution time, commonly as a sequence of bits. The choice of decimal representation in the program (i.e., using 10 for ten) is usually made at language definition time, whereas the choice of a particular sequence of bits to represent 10 at execution time is usually made at language implementation time.

6. Properties of the operator +. The choice of symbol + to represent the addition operation is made at language definition time. However, it is common to allow the same symbol + to be overloaded by representing real addition, integer addition, complex addition, and so on, depending on the context. In a compiled language, it is common to make the determination of which operation is represented by + at compile time. The mechanism for specifying the binding desired is usually the typing mechanism for variables: If X is type integer, then the + in X+10 represents integer addition; if X is type real, then the + represents real addition; and so on.

In summary, for a language like C, the symbol + is bound to a set of addition operations at language definition time, each addition operation in the set is defined at language implementation time, each particular use of the symbol + in a program is bound to a particular addition operation at translation time, and the particular value of each particular addition operation for its operands is determined only at execution time.
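For contrast with the C case, the sketch below runs the same statement in Python, a language in which the type of X and the operation denoted by + are both bound at execution time. The variable names first_type, second_type, and type_checked_at_run_time are assumptions introduced only to record what happens.

```python
x = 5                   # x is bound to an integer value
x = x + 10              # '+' is integer addition here; x is rebound to 15
first_type = type(x).__name__

x = 2.5                 # the same name is now bound to a real (float) value
x = x + 10              # '+' is floating-point addition here; x becomes 12.5
second_type = type(x).__name__

# Because the binding of '+' is made at execution time, an unsuitable
# operand type is detected only when the statement actually runs.
type_checked_at_run_time = False
try:
    "ten" + 10
except TypeError:
    type_checked_at_run_time = True

print(first_type, second_type, type_checked_at_run_time)
```

In a language with translation-time typing, the last case would normally be rejected before the program ever ran.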

Importance of Binding Times

A language like FORTRAN, in which most bindings are made during translation, early in the processing of a program, is said to have early binding; languages with late binding, such as ML or HTML, delay most bindings until execution time. The advantages and disadvantages of early binding versus late binding revolve around a conflict between efficiency and flexibility. In languages where execution efficiency is a prime consideration, such as FORTRAN, Pascal, and C, it is common to design the language so that as many bindings as possible may be performed during translation. Where flexibility is the prime determiner, as in ML and LISP, most bindings are delayed until execution time so that they may be made data dependent. In a language designed for both efficiency and flexibility, such as Ada, multiple options are often available that allow choices of binding times.