OBJECT MODULE FORMATS


The object module format we have employed as an educational device is called OMF (Relocatable Object Module Format). It’s one of the earliest forms, but all the subsequent formats contain the basic elements that are present in OMF.


Here is a depiction of the main formats that followed (the oldest formats are in the bottom row):

PE/COFF+    Mach-O for Mac OS X 10.6
PE/COFF     ELF
COFF        Mach-O
OMF         a.out


All of them contain separate sections for data, code, and relocation information (i.e. fixups).

All of them, incidentally, were designed by committees with the objective of making them machine- and language-independent to varying degrees.

So the committees included a wealth of fields that they thought might possibly be helpful, but which in practice are seldom, if ever, used.


So why didn’t we pick one of these later formats to employ for our Project 4?

It just would not have been possible to do this in a one-semester compiler course.

Even in a two-semester course, the amount of extra detail required would be out of proportion to the gain in educational value.


OMF was devised by Intel, and at roughly the same time, AT&T released a.out for use with Unix systems.


To provide for debugging information and shared libraries, AT&T released COFF (Common Object File Format) together with the introduction of Unix System V.


The object module formats in use today by Linux, Unix, and Microsoft are basically variants of COFF.


COFF supported symbolic debugging by, in effect, including a symbol table that specified not only the offsets of variables, but also the offset of the code corresponding to each line number of the source, e.g. to aid in the setting of breakpoints.
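A minimal Python sketch of how such a line-number table lets a debugger place a breakpoint. It assumes the record layout the Microsoft specification gives for COFF line numbers (a 4-byte code offset followed by a 2-byte line number, 6 bytes per record); the helper names are hypothetical, and for simplicity the line numbers are treated as absolute and the records with line number 0 that mark the start of each function are left out, whereas in COFF line numbers are counted from the start of the enclosing function.

```python
import struct

# Assumed record layout: 4-byte code offset + 2-byte line number = 6 bytes.
LINENUM = struct.Struct("<IH")


def pack_line_table(entries):
    """Pack (code_offset, line) pairs into raw line-number records."""
    return b"".join(LINENUM.pack(offset, line) for offset, line in entries)


def breakpoint_offset(raw, line):
    """Return the code offset of the first record for `line`, or None.

    Simplification: line numbers are absolute here; in COFF they are
    relative to the start of the enclosing function.
    """
    for offset, ln in LINENUM.iter_unpack(raw):
        if ln == line:
            return offset
    return None


raw = pack_line_table([(0x00, 10), (0x05, 11), (0x0C, 13)])
print(len(raw))                    # 18 (three 6-byte records)
print(breakpoint_offset(raw, 11))  # 5
print(breakpoint_offset(raw, 12))  # None (no code was generated for line 12)
```

A debugger would then plant its breakpoint at the load address of the code section plus the returned offset.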


Limitations of COFF include: it limits the length of section names (which correspond to our segment names) and the number of sections allowed, and its symbolic debugging information is insufficient to support some of the features of languages such as C++.


In response, AT&T released ELF, a minor variant of COFF, with the introduction of System V Release 4.


Microsoft created its own version of COFF. For the sake of concreteness, let’s examine its main features as described in the Microsoft document “Microsoft Portable Executable and Common Object File Format Specification”, September 21, 2010.


The name of the specification is abbreviated as PE/COFF, while the version released to accommodate 64-bit machines is called PE/COFF+ (the specification itself calls this image format PE32+).


PE is the format of the output of the linker, which the loader reads: the various modules that make up the program have been linked, all external references resolved, and all relocation (fixups) completed, and the resulting image is finally written into memory.


The COFF component of PE/COFF is the format of the object module that serves as input to the linker.


It closely follows the original COFF specification. The main difference is that the Microsoft version does not make use of the debugging facilities supplied by the original COFF, such as the line-number information; instead, it relies on Visual C++ debug information.


As a compiler writer, your responsibility in writing a compiler for Windows is to produce an object module for input to the linker. The PE-formatted output of the linker, and the operating system, are the responsibility of Microsoft.


MICROSOFT’S COFF FORMAT

Here is an illustration of the COFF structure.


SECTIONS

The sections correspond to our segments.

Except for the segment associated with uninitialized data, each segment consists of a header, the raw data, and a relocation component.


The .text section is the code section, and the relocation information corresponds to our fixups.
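To make the correspondence with our fixups concrete, here is a minimal Python sketch of a linker applying relocation entries to a .text section. It assumes the Microsoft specification's 10-byte relocation record (VirtualAddress, SymbolTableIndex, Type) and handles only the x86 type IMAGE_REL_I386_DIR32 (0x0006), which asks for the target's 32-bit address; the function name and the symbol-address dictionary are hypothetical stand-ins for what a real linker computes.

```python
import struct

# Assumed relocation record: VirtualAddress (4), SymbolTableIndex (4), Type (2).
RELOC = struct.Struct("<IIH")
IMAGE_REL_I386_DIR32 = 0x0006  # the target's 32-bit address


def apply_dir32_fixups(code, relocs, symbol_addresses):
    """Patch every DIR32 site in `code` by adding the target symbol's address.

    `symbol_addresses` maps a symbol-table index to its final address
    (a hypothetical input; a real linker derives it while laying out sections).
    """
    patched = bytearray(code)
    for site, sym_index, rtype in RELOC.iter_unpack(relocs):
        if rtype != IMAGE_REL_I386_DIR32:
            raise NotImplementedError(f"relocation type {rtype:#06x}")
        (addend,) = struct.unpack_from("<I", patched, site)  # value already in place
        target = (addend + symbol_addresses[sym_index]) & 0xFFFFFFFF
        struct.pack_into("<I", patched, site, target)
    return bytes(patched)


# mov eax, [num]: opcode A1 followed by a 4-byte address needing a fixup at offset 1
code = bytes([0xA1, 0x00, 0x00, 0x00, 0x00])
relocs = RELOC.pack(1, 0, IMAGE_REL_I386_DIR32)  # symbol 0 stands for num
print(apply_dir32_fixups(code, relocs, {0: 0x00403000}).hex())  # a100304000
```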


There are two data sections. One is for initialized data, e.g. to contain the initial values of variables, as in:

num dw 23

The other data section, called .bss above, is for uninitialized data, as in:

array2 dw 1000 dup(?)


The .bss section consists only of a header that specifies how much space is to be reserved at execution time.
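The difference between the two data sections can be sketched in a few lines of Python: an initialized word contributes bytes to the .data raw data, while an uninitialized array only adds to the size recorded for .bss. The layout helper and its tuple format are hypothetical, and only 16-bit dw items are modelled.

```python
import struct

WORD = struct.Struct("<H")  # one 16-bit "dw" item, little-endian


def layout(decls):
    """Split (name, count, init) declarations into .data bytes and a .bss size.

    `init` is an int for initialized data, or None for "?" (uninitialized).
    Returns (data_bytes, bss_size, offsets), where offsets maps each name
    to (section, offset within that section).
    """
    data, bss_size, offsets = bytearray(), 0, {}
    for name, count, init in decls:
        if init is None:
            offsets[name] = (".bss", bss_size)
            bss_size += count * WORD.size    # space reserved, nothing stored
        else:
            offsets[name] = (".data", len(data))
            data += WORD.pack(init) * count  # initial values stored as raw data
    return bytes(data), bss_size, offsets


# num dw 23  and  array2 dw 1000 dup(?)
data, bss_size, offsets = layout([("num", 1, 23), ("array2", 1000, None)])
print(data.hex(), bss_size)  # 1700 2000
```

Only the two bytes for num end up in the object file; the 2000 bytes for array2 exist only as a size in the .bss header until the space is reserved at execution time.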

The “named sections”, if present, may be used for purposes such as holding the functions that the program employs. The name of the section would then normally be the same as that of the function.


SECTION HEADERS

The fields involved in the section headers include:

the section name. If the name has 8 characters or less, it is contained in the header; otherwise it is included in the string table (which corresponds to our ID_S), and the name field of the section header then contains its offset there, written as a slash followed by the offset in decimal (e.g. /4).

the section’s virtual address (i.e. offset within the object module itself).

the section’s physical address (i.e. the offset from the start of the program that it will have at execution time)


the size of the section

a pointer to the section’s raw data

a pointer to the corresponding relocation entries

a specification of whether the section contains executable code, initialized data, or uninitialized data

a specification of whether the section may or may not be read, written, or executed
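The fields above can be seen together in a minimal Python sketch that packs a section header in the 40-byte layout of the Microsoft specification. There, the slot the original COFF used for the physical address is named VirtualSize, and the specification recommends setting it and VirtualAddress to zero in object files, which the sketch does. The flag constants are the specification's IMAGE_SCN_* values; the helper names, file offsets, and the long section name compute_totals are hypothetical.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in all.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def encode_section_name(name, string_table):
    """Return the 8-byte name field, moving long names into `string_table`.

    `string_table` is a bytearray that already begins with its 4-byte size
    field, so the first string lands at offset 4.
    """
    raw = name.encode("ascii")
    if len(raw) <= 8:
        return raw.ljust(8, b"\0")
    offset = len(string_table)
    string_table.extend(raw + b"\0")
    return f"/{offset}".encode("ascii").ljust(8, b"\0")


def pack_section_header(name_field, size, raw_ptr, reloc_ptr, nrelocs, flags):
    # VirtualSize, VirtualAddress and the line-number fields are left at zero.
    return SECTION_HEADER.pack(name_field, 0, 0, size, raw_ptr, reloc_ptr,
                               0, nrelocs, 0, flags)


strtab = bytearray(4)  # size field, to be filled in when the table is written
# .text: 0x40 bytes of code with 2 relocations (made-up file offsets)
text = pack_section_header(encode_section_name(".text", strtab), 0x40, 0x8C, 0xCC, 2,
                           IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ)
# .bss: a size, but no raw data in the file
bss = pack_section_header(encode_section_name(".bss", strtab), 2000, 0, 0, 0,
                          IMAGE_SCN_CNT_UNINITIALIZED_DATA | IMAGE_SCN_MEM_READ
                          | IMAGE_SCN_MEM_WRITE)
print(len(text), encode_section_name("compute_totals", strtab))
# 40 b'/4\x00\x00\x00\x00\x00\x00'
```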


THE FILE HEADER

The fields involved in the file header include:

a number identifying the target machine, e.g. those employing the 386 or later Pentium, or various machines produced by Hitachi, Mitsubishi, etc.

a time and date stamp, indicating when the file was created

the number of section headers

a pointer to the symbol table’s starting address
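A minimal Python sketch of packing this header, assuming the 20-byte layout in the Microsoft specification. That layout has two fields not listed above, the number of symbol-table entries and the size of the optional header (zero in an object file); 0x014C is the specification's machine code for the 386 and later processors, and the helper name is hypothetical.

```python
import struct
import time

# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics: 20 bytes.
FILE_HEADER = struct.Struct("<HHIIIHH")
IMAGE_FILE_MACHINE_I386 = 0x014C


def pack_file_header(nsections, symtab_ptr, nsymbols, timestamp=None):
    if timestamp is None:
        timestamp = int(time.time())  # seconds since 1970-01-01 00:00 UTC
    return FILE_HEADER.pack(
        IMAGE_FILE_MACHINE_I386,
        nsections,
        timestamp & 0xFFFFFFFF,  # only the low 32 bits are stored
        symtab_ptr,
        nsymbols,
        0,  # SizeOfOptionalHeader: no optional header in an object file
        0,  # Characteristics
    )


hdr = pack_file_header(nsections=3, symtab_ptr=0x1F4, nsymbols=12, timestamp=0)
machine, nsections, stamp, symtab_ptr, nsymbols, opt_size, flags = FILE_HEADER.unpack(hdr)
print(len(hdr), hex(machine), nsections, hex(symtab_ptr))  # 20 0x14c 3 0x1f4
```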


THE SYMBOL TABLE

The symbol table entries are each 18 bytes long, and include:

the name of the symbol. A similar scheme is employed as described above for section header names: if the name is longer than 8 bytes, it is stored in the string table and its offset there is used instead (in this case the first 4 bytes of the name field are zero, and the next 4 bytes hold the offset).


the section the item is defined in

its offset within that section

its storage class, e.g. whether it is external, static, or is a function


Some of the entries, e.g. those for functions, require more than the 18 bytes an entry provides for their information. In such cases, the main entry for the name is followed by one or more additional entries (referred to as auxiliary entries), each also 18 bytes long.
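Here is a minimal Python sketch of the 18-byte symbol records, assuming the Microsoft specification's layout (Name, Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols) and its auxiliary record for function definitions. It writes one function symbol followed by its auxiliary entry and one external reference, then walks the table, skipping auxiliary entries as a linker must. The symbol names compute and helper are hypothetical; section number 0 marks a symbol that is not defined in this module.

```python
import struct

SYMBOL = struct.Struct("<8sIhHBB")       # Name, Value, SectionNumber, Type,
                                         # StorageClass, NumberOfAuxSymbols: 18 bytes
AUX_FUNCTION = struct.Struct("<IIII2x")  # TagIndex, TotalSize, PointerToLinenumber,
                                         # PointerToNextFunction, 2 unused: 18 bytes
IMAGE_SYM_UNDEFINED = 0       # section number of a symbol defined in another module
IMAGE_SYM_CLASS_EXTERNAL = 2  # storage class: visible outside this module
TYPE_FUNCTION = 0x20          # Type value Microsoft tools use for a function


def name8(name):
    return name.encode("ascii").ljust(8, b"\0")  # short names only in this sketch


table = (
    # compute: a function at offset 0 of section 1, followed by one auxiliary entry
    SYMBOL.pack(name8("compute"), 0, 1, TYPE_FUNCTION, IMAGE_SYM_CLASS_EXTERNAL, 1)
    + AUX_FUNCTION.pack(0, 0x20, 0, 0)  # TotalSize = 0x20 bytes of code
    # helper: an external reference, to be resolved by the linker
    + SYMBOL.pack(name8("helper"), 0, IMAGE_SYM_UNDEFINED, TYPE_FUNCTION,
                  IMAGE_SYM_CLASS_EXTERNAL, 0)
)


def iter_symbols(table):
    """Yield (name, value, section, storage_class, naux), skipping auxiliary entries."""
    pos = 0
    while pos < len(table):
        name, value, section, _type, sclass, naux = SYMBOL.unpack_from(table, pos)
        yield name.rstrip(b"\0").decode("ascii"), value, section, sclass, naux
        pos += SYMBOL.size * (1 + naux)


for sym in iter_symbols(table):
    print(sym)
# ('compute', 0, 1, 2, 1)
# ('helper', 0, 0, 2, 0)
```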


THE STRING TABLE

As mentioned, this corresponds to our ID_S.

It starts off with 4 bytes specifying its total length in bytes, including those 4 bytes themselves.

This is followed by null-terminated strings, in general representing names.
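A minimal Python sketch of building the string table and the symbol name fields that point into it, assuming the Microsoft specification's conventions: the 4-byte size counts itself, offsets are measured from the start of the table (so the first string is at offset 4), and a long symbol name is replaced by 4 zero bytes followed by its 4-byte offset. The class and function names are hypothetical.

```python
import struct


class StringTable:
    """COFF string table: a 4-byte total size (counting itself), then
    null-terminated strings. Offsets are measured from the start of the table."""

    def __init__(self):
        self._strings = bytearray()

    def add(self, name):
        offset = 4 + len(self._strings)  # skip the size field
        self._strings += name.encode("ascii") + b"\0"
        return offset

    def to_bytes(self):
        return struct.pack("<I", 4 + len(self._strings)) + bytes(self._strings)


def symbol_name_field(name, strtab):
    """8-byte symbol name: inline if it fits, else 4 zero bytes + 4-byte offset."""
    raw = name.encode("ascii")
    if len(raw) <= 8:
        return raw.ljust(8, b"\0")
    return struct.pack("<II", 0, strtab.add(name))


strtab = StringTable()
print(symbol_name_field("count", strtab))                   # b'count\x00\x00\x00'
print(symbol_name_field("compute_checksum", strtab).hex())  # 0000000004000000
print(strtab.to_bytes())  # b'\x15\x00\x00\x00compute_checksum\x00'
```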


Note that the segdef, pubdef, and extdef records we have been using are replaced by entries in the symbol table and the string table (and, in the case of segdef, by the section headers).
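The correspondence can be sketched in Python as a translation from simplified versions of our records to COFF symbol-table fields. This is an illustrative mapping under stated assumptions, not what any particular compiler emits: each segdef becomes a section (numbered from 1) plus a static section symbol, each pubdef becomes an external symbol with a nonzero section number and its offset as the value, and each extdef becomes an external symbol with section number 0 (undefined). The input formats, function name, and symbol names are hypothetical.

```python
IMAGE_SYM_UNDEFINED = 0
IMAGE_SYM_CLASS_EXTERNAL = 2
IMAGE_SYM_CLASS_STATIC = 3


def omf_to_coff_symbols(segdefs, pubdefs, extdefs):
    """Translate simplified OMF records into COFF symbol fields.

    segdefs: segment names, in order
    pubdefs: (name, segment, offset) tuples
    extdefs: names of symbols defined in other modules
    Returns (name, Value, SectionNumber, StorageClass) tuples.
    """
    section_number = {seg: i + 1 for i, seg in enumerate(segdefs)}  # 1-based
    symbols = []
    for seg in segdefs:
        # segdef -> a section header plus a section symbol
        symbols.append((seg, 0, section_number[seg], IMAGE_SYM_CLASS_STATIC))
    for name, seg, offset in pubdefs:
        # pubdef -> defined here and visible to other modules
        symbols.append((name, offset, section_number[seg], IMAGE_SYM_CLASS_EXTERNAL))
    for name in extdefs:
        # extdef -> defined elsewhere; the linker resolves it
        symbols.append((name, 0, IMAGE_SYM_UNDEFINED, IMAGE_SYM_CLASS_EXTERNAL))
    return symbols


syms = omf_to_coff_symbols([".text", ".data"],
                           [("compute", ".text", 0x10), ("num", ".data", 0)],
                           ["helper"])
for sym in syms:
    print(sym)
```

Any name longer than 8 characters would, in addition, be moved into the string table as described above.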


THE PE MODULE FORMAT

As mentioned, in the case where the target is not an intermediate language, the compiler writer is concerned with producing the object module that is input to the linker.

He or she is not directly involved with the PE module that the linker produces. Let us, however, look at the main features of the PE format.


Here is a diagram of its structure


The components the linker has added to the COFF format are:

(a) the DOS stub

(b) the optional file header

(c) the data directories


THE DOS STUB

The purpose of the DOS stub is to detect when an attempt is made to execute the program under DOS, and then issue an error message such as:

This program can only be run under Windows
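The stub works because a PE file begins as an ordinary MS-DOS executable: DOS runs the stub, while Windows follows a pointer stored at offset 0x3C of the stub to the 4-byte signature "PE\0\0" that precedes the COFF file header. A minimal Python sketch of that check, with a hypothetical function name and a synthetic image in place of a real file:

```python
import struct


def find_pe_header(image):
    """Return the file offset of the PE signature, validating the DOS stub first."""
    if image[:2] != b"MZ":  # every MS-DOS executable starts with "MZ"
        raise ValueError("not an MS-DOS or PE executable")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # pointer to the signature
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("MS-DOS program only: no PE signature")
    return pe_offset


# Synthetic image: a 128-byte stub pointing at a signature placed right after it.
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x80)
image = bytes(stub) + b"PE\0\0" + bytes(20)  # signature + (empty) COFF file header
print(hex(find_pe_header(image)))  # 0x80
```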


THE OPTIONAL FILE HEADER

The loader needs to be able to relocate the program in the case where it is unable to load it into the base location employed by the linker.

Some of the items listed below are included for this purpose.


The information the optional file header contains includes:

(a) the amount of memory space that will be occupied by executable code, initialized data, and uninitialized data

(b) the offsets from the beginning of the program where the above items will be located in memory

(c) the offset from the beginning of the program of its entry point


(d) the amount of space needed for the stack

(e) the amount of space needed for the heap

(f) the alignment of the sections’ raw data within the file. The default is an address divisible by 512, but any power of 2 between 512 and 64K can be used.

(g) the offsets within the module of the data directories, and their sizes.
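Item (f) amounts to rounding each section's size up to a multiple of the alignment before placing the next one. A minimal Python sketch, with hypothetical helper names and made-up section sizes, assuming the default of 512 and that the alignment is a power of 2 (which lets the rounding use a bit mask):

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment` (a power of 2)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return (value + alignment - 1) & ~(alignment - 1)


def layout_raw_data(sections, headers_size, file_alignment=512):
    """Return (name, file_offset, size_on_disk) for each (name, size) section."""
    offset = align_up(headers_size, file_alignment)
    result = []
    for name, size in sections:
        on_disk = align_up(size, file_alignment)
        result.append((name, offset, on_disk))
        offset += on_disk
    return result


for name, offset, size in layout_raw_data([(".text", 0x1234), (".data", 0x10)], 0x3A0):
    print(name, hex(offset), hex(size))
# .text 0x400 0x1400
# .data 0x1800 0x200
```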


THE DATA DIRECTORIES

These include:

(a) the Export Table

(b) the Import Table

(c) the Resource Table

(d) the Base Relocation Table


The Export Table is employed mainly by DLLs to supply the entry points of the various functions they provide.

The Import Table is used by programs to supply the external references that the linker was unable to resolve, usually those to DLL functions.

Note that the location of the DLL functions may change from one load and execution of the program to the next.


The unresolved calls in the memory image to such external routines are not directly fixed up.

Instead, the linker turns them into calls through a table of external addresses (the Import Address Table), which the loader fills in.

The Pentium has an indirect call instruction for this purpose.
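A minimal Python sketch of the mechanism. On 32-bit x86, an indirect call through a memory operand, call dword ptr [slot], is encoded as the bytes FF 15 followed by the 4-byte slot address, so the linker can emit the call once and only the table entry needs to change at load time. The slot addresses, function names, and resolved addresses below are made up, and a real import table also records DLL names and ends with a null entry, which the sketch omits.

```python
import struct


def encode_call_indirect(slot_address):
    """32-bit x86 'call dword ptr [slot_address]': opcode FF, ModRM 15, disp32."""
    return b"\xFF\x15" + struct.pack("<I", slot_address)


# Linker: reserve one 4-byte slot per imported function (hypothetical names).
IAT_BASE = 0x00402000
imports = ["dll_func_a", "dll_func_b"]
iat_slot = {name: IAT_BASE + 4 * i for i, name in enumerate(imports)}
call_insn = encode_call_indirect(iat_slot["dll_func_b"])


def fill_iat(iat_slot, resolved):
    """Loader: map each slot address to the function's address for this run."""
    return {iat_slot[name]: address for name, address in resolved.items()}


# Two runs in which the DLL happens to be loaded at different addresses.
for resolved in ({"dll_func_a": 0x10001000, "dll_func_b": 0x10001230},
                 {"dll_func_a": 0x20001000, "dll_func_b": 0x20001230}):
    memory = fill_iat(iat_slot, resolved)
    (slot,) = struct.unpack_from("<I", call_insn, 2)  # the call's memory operand
    print(call_insn.hex(), hex(memory[slot]))
# ff1504204000 0x10001230
# ff1504204000 0x20001230
```

The instruction bytes are identical in both runs; only the table entry the loader writes differs.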


The Resource Table contains information about resources the program employs, such as dialog boxes, menus, icons, etc.

The Base Relocation Table replaces the COFF relocation entries, as much of the relocation and linking involved has already been carried out by the linker; it lists only the locations the loader must adjust if it cannot load the image at the base address the linker assumed.
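A minimal Python sketch of what the loader does with this table, assuming the Microsoft specification's layout: blocks that each start with a 4-byte page RVA and a 4-byte block size, followed by 2-byte entries whose top 4 bits give the type and low 12 bits the offset within the page. Only type 3 (IMAGE_REL_BASED_HIGHLOW, add the load delta to a 32-bit address) and the padding type 0 are handled, the image is modelled as a byte array indexed by RVA, and the helper names and addresses are hypothetical.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3   # add the delta to a 32-bit address


def make_block(page_rva, offsets):
    """Linker side: one block of HIGHLOW entries for a single 4 KB page."""
    entries = [(IMAGE_REL_BASED_HIGHLOW << 12) | off for off in offsets]
    if len(entries) % 2:  # pad so the next block starts on a 4-byte boundary
        entries.append(IMAGE_REL_BASED_ABSOLUTE << 12)
    header = struct.pack("<II", page_rva, 8 + 2 * len(entries))
    return header + struct.pack(f"<{len(entries)}H", *entries)


def apply_base_relocations(image, reloc_data, preferred_base, actual_base):
    """Loader side: adjust every listed address by (actual_base - preferred_base)."""
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    image = bytearray(image)
    pos = 0
    while pos < len(reloc_data):
        page_rva, block_size = struct.unpack_from("<II", reloc_data, pos)
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", reloc_data, pos + 8 + 2 * i)
            rtype, offset = entry >> 12, entry & 0x0FFF
            if rtype == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if rtype != IMAGE_REL_BASED_HIGHLOW:
                raise NotImplementedError(f"base relocation type {rtype}")
            site = page_rva + offset
            (value,) = struct.unpack_from("<I", image, site)
            struct.pack_into("<I", image, site, (value + delta) & 0xFFFFFFFF)
        pos += block_size
    return bytes(image)


# The linker assumed base 0x00400000 and stored the absolute address 0x00403000
# at RVA 0x1004; the loader instead places the image at 0x10000000.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x00403000)
relocated = apply_base_relocations(image, make_block(0x1000, [0x004]), 0x00400000, 0x10000000)
print(hex(struct.unpack_from("<I", relocated, 0x1004)[0]))  # 0x10003000
```

If the loader can use the preferred base after all, the delta is zero and the table can simply be ignored.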


SOURCES

1. Microsoft, Microsoft Portable Executable and Common Object File Format Specification, Revision 8.2, September 2010.

2. Texas Instruments, Application Report SPRAA08, April 2009.