Outline
• Multiple source file projects
• Compiling
• Linking
• Loading
• Libraries
  – Static libraries
  – Overlays
  – Shared libraries
• DLLs
From Source To Execution
• What is responsible for each step (each arrow)?
[Figure: first.m, second.m → first.o, second.o → a.out → running program; mouse.m, gui.m → gui.o, mouse.o → libgui.a → a.out]
                 Unix      Windows
    Source       file.m    file.m
    Object       file.o    file.obj
    Library      file.a    file.lib
    Executable   a.out     file.exe
A contingent fact of history
• Stephen Jay Gould said “human equality is a contingent fact of history”: it’s true, but it did not have to be, and history explains why.
• Separation into compilers, linkers, and loaders is a contingent fact of history. There are other ways to do it.
• Lisp, Smalltalk, Pop-2, Prolog, APL: the compiler is part of the runtime, and the interactive programmer loads source code into the running system. The programming language can even be the operating system!
• BlueJ approximates that for Java.
• Whole-program compilers (SmartEiffel, MIPS) analyse/optimise the whole program.
Separate compilation
• Big broken programs, small machines!
• Break program into replaceable parts (repair).
• Break program into reusable parts (libraries).
• Don’t load what you don’t need (configurable).
• Small pieces ⇒ compiler has room and time
• Many pieces ⇒ can compile in parallel
• TANSTAAFL!
• How do you put the pieces together?
The secret
• We don’t have to compile all the way to directly executable code.
• The output of a compiler can be a description of the code.
• “It’s sort of like this but when you find the Amulet of Yendor do that to it.”
• Compiler output = meta-program executed by linker/loader to generate actual code.
• The classic steps are to relocate code from logical addresses to physical addresses
• and to resolve references to external names to their actual addresses.
Separate Source Files
• use.c
    extern int another;

    int main(void) {
        another = 1234;
        return 0;
    }
• declare.c
    int another;
• How does the compiler know:
  – Where another is stored in memory?
• How can the compiler produce the machine code?
The Compiler
• Leave gaps in machine code when referencing externs
• use.c
    extern int another;
    int main(void) { another = 1234; }
• Compiler output for use.c
    0001 18c0           sect 0
    0002              _main:
    0003 18c0 cc 04 d2  ldd #1234
    0004 18c3 fd 00 00  std _another
    0005 18c6 39        rts
• On line 0004 another is at location 0000! (The operand bytes 00 00 are the gap.)
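One way to picture the gap is as a record that says "a 16-bit address belongs at this offset". The sketch below is only an illustration: the struct fixup and the function patch_address are hypothetical names, not part of any real object-file format. The code bytes are copied from the listing, and addresses are written high byte first because that is how the listing shows them (cc 04 d2 for 1234).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup record: "a 16-bit address goes at this offset". */
struct fixup {
    size_t offset;              /* offset of the gap within the code bytes */
};

/* Fill a gap with a 16-bit address, high byte first as in the listing. */
void patch_address(uint8_t *code, size_t code_len,
                   struct fixup fix, uint16_t address)
{
    assert(fix.offset + 2 <= code_len);
    code[fix.offset]     = (uint8_t)(address >> 8);
    code[fix.offset + 1] = (uint8_t)(address & 0xff);
}

/* Object code for main in use.c, copied from the compiler output. */
uint8_t use_code[] = {
    0xcc, 0x04, 0xd2,           /* ldd #1234                    */
    0xfd, 0x00, 0x00,           /* std _another  (00 00 = gap)  */
    0x39                        /* rts                          */
};

/* The gap starts one byte after the std opcode, at offset 4. */
const struct fixup another_fixup = { 4 };
```

Calling patch_address(use_code, sizeof use_code, another_fixup, 0x4000) turns fd 00 00 into fd 40 00, which is the form the instruction takes in the linked program.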
The Compiler
• Allocate space for global variables
• declare.c
    int another;
• Compiler output for declare.c
    0001 4000  sect 1
    0002     _another:
    0003 4000  rmb 2
• On line 0003 space (2 bytes) is allocated for another
Revision: Segments
• Program
  – The code or text segment
• Local variables
  – The stack segment
• Global variables
  – The data segment
• In the example
    Segment   Example
    Code      sect 0
    Data      sect 1
    Stack
[Figure: Software Model. Regions: Stack (stack segment); Heap and Globals (data segment), with “the break” marked; Program (code (text) segment).]
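A small C probe can make the segments visible on a hosted system. This is a sketch only: the names probe_segments, struct addresses and global_counter are invented for the illustration, and where each address actually lands depends on the platform, so nothing here assumes a particular layout or ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int global_counter = 42;        /* global variable: data segment */

struct addresses {
    uintptr_t code;             /* address of a function (code/text segment) */
    uintptr_t data;             /* address of a global (data segment)        */
    uintptr_t heap;             /* address of a malloc'd block (heap)        */
    uintptr_t stack;            /* address of a local variable (stack)       */
};

/* Record one sample address from each region while it is still valid. */
struct addresses probe_segments(void)
{
    struct addresses a;
    int local = 0;              /* local variable: stack segment */
    void *block = malloc(16);   /* dynamic allocation: heap      */

    assert(block != NULL);      /* sketch: no recovery if allocation fails */
    a.code  = (uintptr_t)&probe_segments;
    a.data  = (uintptr_t)&global_counter;
    a.heap  = (uintptr_t)block;
    a.stack = (uintptr_t)&local;
    free(block);
    return a;
}
```

Printing the four fields (for example with printf and a cast to unsigned long) on a typical system shows them falling into clearly separate address ranges.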
The Linker
• Loads a set of object files and outputs an executable file
• Each input file is a set of segments (not related to x86 segments)
  – Code / Data
  – Symbol table
  – Debugging information (not loaded)
• Pass 1
  – Scan the input files to compute the segment sizes
  – Collect the symbol tables together
• Process
  – Allocate locations for each symbol
  – Lay the symbols out in the output (executable) file
• Pass 2
  – Read and relocate the object code
  – Replace symbol references with memory locations
  – Copy segments into the output (executable) file
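The two passes can be sketched as a toy linker. Everything here is an assumption made for illustration: each object file is reduced to a single segment, objects are placed back to back starting at a base address, and the names struct symbol, struct object and link_objects are hypothetical. A real linker handles several segments per file, duplicate definitions, and much more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 8
#define MAX_SYMBOLS 32

struct symbol {                 /* used for both definitions and references */
    const char *name;
    uint16_t offset;            /* offset within the object's bytes */
};

struct object {                 /* toy object file: one segment only */
    const uint8_t *bytes;
    uint16_t size;
    const struct symbol *defs;  size_t ndefs;   /* symbols defined here      */
    const struct symbol *refs;  size_t nrefs;   /* 2-byte gaps to be filled  */
};

/* Returns 0 on success, -1 on overflow or an undefined symbol. */
int link_objects(const struct object *objs, size_t nobjs, uint16_t base,
                 uint8_t *out, size_t out_size)
{
    const char *names[MAX_SYMBOLS];
    uint16_t addresses[MAX_SYMBOLS];
    size_t start[MAX_OBJECTS];
    size_t nsyms = 0, total = 0, i, j, k;

    assert(objs != NULL && out != NULL);
    if (nobjs > MAX_OBJECTS)
        return -1;

    /* Pass 1: compute sizes, place each object, collect the symbols. */
    for (i = 0; i < nobjs; i++) {
        start[i] = total;
        for (j = 0; j < objs[i].ndefs; j++) {
            if (nsyms == MAX_SYMBOLS)
                return -1;
            names[nsyms] = objs[i].defs[j].name;
            addresses[nsyms] = (uint16_t)(base + total + objs[i].defs[j].offset);
            nsyms++;
        }
        total += objs[i].size;
    }
    if (total > out_size)
        return -1;

    /* Pass 2: copy the code and replace references with addresses. */
    for (i = 0; i < nobjs; i++) {
        memcpy(out + start[i], objs[i].bytes, objs[i].size);
        for (j = 0; j < objs[i].nrefs; j++) {
            size_t gap = start[i] + objs[i].refs[j].offset;
            for (k = 0; k < nsyms; k++)
                if (strcmp(names[k], objs[i].refs[j].name) == 0)
                    break;
            if (k == nsyms || gap + 2 > total)
                return -1;      /* undefined symbol or bad offset */
            out[gap]     = (uint8_t)(addresses[k] >> 8);
            out[gap + 1] = (uint8_t)(addresses[k] & 0xff);
        }
    }
    return 0;
}
```

Because this toy places everything in one run of memory, linking use.o (7 bytes) and declare.o (2 bytes) at base $1800 puts _another at $1807 rather than at $4000 as in the slides, where code and data go into separate sections.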
The Combined (Linked) Program
    0001                * define starting addresses
    0002 18c7             sect 0   * code
    0003 1800             org $1800
    0004 0000             sect 1   * data
    0005 4000             org $4000
    0006 7ffb           stackbase equ $7ffb
    0007                *
    0008                * start of code
    0009                *
    0010 1800             sect 0
    0011 1800 8e 7f fb    lds #stackbase
    0012 1803 bd 18 c0    jsr _main
    …
    0002                _main:
    0003 18c0 cc 04 d2    ldd #1234
    0004 18c3 fd 40 00    std _another
    0005 18c6 39          rts
    0006                L1.use:
    0001 4000             sect 1
    0002                _another:
    0003 4000             rmb 2
The Loader
• Read the executable file
• Allocate memory space for it
• Load each segment
• Initialize the stack (if needed)
  – Create stack segment (if needed)
• Set up environment, etc.
• Jump to program start
  – The start-up code initializes the stack (if needed)
[Figure: the a.out file (Header, Code, Data, Other) is loaded into memory (Code, Data, Heap, Stack).]
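The loader steps can be sketched against a simulated machine. The executable layout below (struct executable, struct segment) and the function load_program are hypothetical, invented for the illustration; real formats such as a.out or ELF carry far more in their headers. "Jumping" to the program start is modelled by setting a program counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEMORY_SIZE 0x8000u

struct segment {
    uint16_t load_address;      /* where the segment goes in memory    */
    uint16_t size;              /* number of bytes                     */
    const uint8_t *contents;    /* NULL means "reserve and zero-fill"  */
};

struct executable {             /* stand-in for the file header */
    uint16_t entry;             /* address of the first instruction */
    uint16_t stack_base;        /* initial stack pointer            */
    const struct segment *segments;
    size_t nsegments;
};

struct machine {                /* simulated target */
    uint8_t memory[MEMORY_SIZE];
    uint16_t pc;
    uint16_t sp;
};

/* Returns 0 on success, -1 if a segment does not fit in memory. */
int load_program(const struct executable *exe, struct machine *m)
{
    size_t i;

    assert(exe != NULL && m != NULL);
    memset(m->memory, 0, sizeof m->memory);     /* allocate memory space */

    for (i = 0; i < exe->nsegments; i++) {      /* load each segment */
        const struct segment *s = &exe->segments[i];
        if ((size_t)s->load_address + s->size > MEMORY_SIZE)
            return -1;
        if (s->contents != NULL)
            memcpy(m->memory + s->load_address, s->contents, s->size);
    }
    m->sp = exe->stack_base;                    /* initialize the stack  */
    m->pc = exe->entry;                         /* jump to program start */
    return 0;
}
```

With the linked program from the slides, the segments would be the start-up code at $1800, main at $18c0, and two reserved bytes at $4000, with entry $1800 and stack base $7ffb.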
Relocation
• In some systems (old and embedded systems)
  – Multiple programs in memory at one time
  – No virtual address space
• Executable format has a patch or relocation table
• Loader
  – Loads the executable at some base location
  – For every direct memory address:
    • Adds the base to the address
    • Uses the patch table to do this
• Some hardware requires special attention
  – E.g. Intel 8088 segmentation
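The "add the base to every direct address" step can be sketched as follows. The patch table is assumed here to be a plain list of byte offsets, each marking a 16-bit address stored high byte first as in the earlier listings; the function name relocate_image and this table layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add base to every 16-bit address named in the patch table.
   Each table entry is the offset of an address within the image. */
void relocate_image(uint8_t *image, size_t image_size,
                    const size_t *patch_table, size_t npatches,
                    uint16_t base)
{
    size_t i;

    for (i = 0; i < npatches; i++) {
        size_t at = patch_table[i];
        uint16_t address;

        assert(at + 2 <= image_size);
        address = (uint16_t)((image[at] << 8) | image[at + 1]);
        address = (uint16_t)(address + base);
        image[at]     = (uint8_t)(address >> 8);
        image[at + 1] = (uint8_t)(address & 0xff);
    }
}
```

For an image linked as if it started at 0, a jsr to $00c0 (bytes bd 00 c0, patch table entry 1) becomes bd 18 c0 when loaded at base $1800. An immediate operand such as the 04 d2 in ldd #1234 is not in the patch table, so it is left alone.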
Dynamic loaders
• Do the loading tasks after program starts.
• Also have to do some linker tasks.
• Position-independent code (PIC): executable code that contains no absolute addresses so that it can be loaded anywhere in memory.
• PC-relative code for branches; base+displacement addressing for external calls and data; use Global Offset Table in UNIX for resolution.
• UNIX shared objects require PIC.
• Windows DLLs are dynamically relocated if not loaded at their preferred address.
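The idea behind the Global Offset Table can be imitated in plain C: the client code never contains the address of an external function, only a slot in a table, and the slot is filled by name after the program has started. This is a sketch in the spirit of a GOT, not how any real dynamic loader is implemented; the names library_exports, offset_table, bind_slots and call_twice are all invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

struct export {                 /* what a library offers: name -> address */
    const char *name;
    func_ptr address;
};

struct slot {                   /* what the client needs: name -> (unfilled) address */
    const char *name;
    func_ptr address;
};

/* "Library" functions; the client does not know their addresses. */
int add_one(int x) { return x + 1; }
int twice(int x)   { return 2 * x; }

const struct export library_exports[] = {
    { "add_one", add_one },
    { "twice",   twice   }
};

/* The client's table, empty until bind_slots() has run. */
struct slot offset_table[] = {
    { "twice", NULL }
};

/* The linker-like task done at run time: resolve each name.
   Returns the number of slots left unresolved. */
int bind_slots(void)
{
    size_t nslots = sizeof offset_table / sizeof offset_table[0];
    size_t nexports = sizeof library_exports / sizeof library_exports[0];
    size_t i, j;
    int unresolved = 0;

    for (i = 0; i < nslots; i++) {
        offset_table[i].address = NULL;
        for (j = 0; j < nexports; j++)
            if (strcmp(offset_table[i].name, library_exports[j].name) == 0)
                offset_table[i].address = library_exports[j].address;
        if (offset_table[i].address == NULL)
            unresolved++;
    }
    return unresolved;
}

/* Client code: the call goes through the table slot. */
int call_twice(int x)
{
    assert(offset_table[0].address != NULL);
    return offset_table[0].address(x);
}
```

Because only the table holds addresses, the client code itself needs no patching wherever the library ends up in memory.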
Static Libraries
• Just a collection of object files stored together plus a combined symbol table (see ranlib(1)).
  – In Unix they are created using ar
    • The archive program
  – In Windows they are created using LIB
• Linux
    gcc -c first.c
    gcc -c second.c
    ar -r my.a first.o second.o
    gcc -c use.c
    gcc use.o my.a
• Windows
    cl -c first.c
    cl -c second.c
    lib /out:my.lib first.obj second.obj
    cl -c use.c
    link use.obj my.lib
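The combined symbol table is what lets the linker find which member of the archive defines a symbol without scanning every object file. A minimal sketch of that lookup, assuming first.c and second.c define functions called first and second (the slides do not say what they contain), with the hypothetical names struct archive_entry and find_member:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the archive's combined symbol table. */
struct archive_entry {
    const char *symbol;         /* name defined somewhere in the archive */
    const char *member;         /* object file that defines it           */
};

/* Return the member defining symbol, or NULL if the archive lacks it. */
const char *find_member(const struct archive_entry *index, size_t n,
                        const char *symbol)
{
    size_t i;

    assert(index != NULL && symbol != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(index[i].symbol, symbol) == 0)
            return index[i].member;
    return NULL;
}

/* Assumed index for my.a; the symbol names are illustrative. */
const struct archive_entry my_a_index[] = {
    { "first",  "first.o"  },
    { "second", "second.o" }
};
```

A linker using such an index pulls in only the members that satisfy still-undefined symbols, which is how a program ends up containing just the parts of a static library it uses.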
Overlays
• What if the program is larger than memory?
• Used:
  – When no virtual memory manager (VMM) available
  – Before VMMs existed (e.g. DOS/360, MS-DOS)
• Loader calls A or D (and loads one or the other)
  – If A calls B then load B
  – If A calls C then load C
  – B cannot call C
    • C cannot call B
  – A/B/C cannot call D
    • D cannot call A/B/C
• A/B/C/D/Loader are sets of methods / objects
[Figure: overlay layout with the Loader on top, A with B and C beneath it, and D.]
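The calling rules follow from treating the overlays as a tree: the Loader is the root, A and D are its children, and B and C are children of A. A call is safe only when caller and callee lie on the same path from the root, since only those can be in memory together. The sketch below encodes that reading of the slide; the enum names and the function can_call are hypothetical.

```c
#include <assert.h>

enum { OV_LOADER, OV_A, OV_B, OV_C, OV_D, OVERLAY_COUNT };

/* Parent of each overlay in the tree; -1 marks the root. */
const int overlay_parent[OVERLAY_COUNT] = {
    -1,         /* Loader: always resident */
    OV_LOADER,  /* A */
    OV_A,       /* B */
    OV_A,       /* C */
    OV_LOADER   /* D */
};

/* True if ancestor lies on the path from node up to the root. */
int is_ancestor_or_self(int ancestor, int node)
{
    while (node != -1) {
        if (node == ancestor)
            return 1;
        node = overlay_parent[node];
    }
    return 0;
}

/* A call is allowed only between overlays on one root-to-leaf path. */
int can_call(int from, int to)
{
    assert(from >= 0 && from < OVERLAY_COUNT);
    assert(to >= 0 && to < OVERLAY_COUNT);
    return is_ancestor_or_self(from, to) || is_ancestor_or_self(to, from);
}
```

Under this model can_call(OV_A, OV_B) holds while can_call(OV_B, OV_C) and can_call(OV_A, OV_D) do not, matching the rules on the slide. It also permits B to call back into A, which is reasonable because A is necessarily in memory while B runs.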
Dynamic unloading
• An overlay may be unloaded when no procedure call is using it. (See dlclose(3) in UNIX.)
• Fortran and COBOL: “static” variables may be reinitialised on re-entry to a procedure, because the procedure might be in an overlay. When an overlay is unloaded, it is not just the code that goes away; the data does too.
• The ability to dynamically unload and reload a module means that a running program can be patched.
• Erlang “hot loading”: modules can be replaced even while they are in use.
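The point about "static" variables can be simulated with one shared memory region that two overlays take turns occupying. This is a toy model under stated assumptions: an overlay is reduced to its initial data, loading means copying that data into the region, and all names (overlay_region, ensure_loaded, b_count_calls, c_touch) are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REGION_WORDS 4

struct overlay_image {
    const char *name;
    int initial_data[REGION_WORDS];     /* data as stored in the file */
};

int overlay_region[REGION_WORDS];       /* memory shared by B and C */
const struct overlay_image *resident = NULL;

const struct overlay_image overlay_b = { "B", { 0, 0, 0, 0 } };
const struct overlay_image overlay_c = { "C", { 7, 7, 7, 7 } };

/* Load an overlay into the shared region unless it is already there. */
void ensure_loaded(const struct overlay_image *ov)
{
    assert(ov != NULL);
    if (resident != ov) {
        memcpy(overlay_region, ov->initial_data, sizeof overlay_region);
        resident = ov;
    }
}

/* A procedure in overlay B keeping a "static" call counter in word 0. */
int b_count_calls(void)
{
    ensure_loaded(&overlay_b);
    return ++overlay_region[0];
}

/* A procedure in overlay C; calling it evicts B, code and data alike. */
int c_touch(void)
{
    ensure_loaded(&overlay_c);
    return overlay_region[0];
}
```

Two calls to b_count_calls() return 1 and then 2. After c_touch() has displaced B, the next call to b_count_calls() returns 1 again, because B's data was reloaded from its initial image.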
Static Shared Libraries
• Linker:
  – Loads the library
  – Binds addresses (entry points) to the executable
    • Often via a branch table
  – Throws the library away
• Loader
  – Loads the library when the program starts
• Advantages:
  – The library is only stored on disc once
  – The program cannot be broken by library changes
• Problems:
  – Must be present when program is run
  – Can’t change the library (much) once bound
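A branch table can be sketched as an array of entry points with fixed slot numbers: the program is bound to the slots, not to the addresses of the functions behind them. The library functions, the slot names and program_compute below are all hypothetical, chosen only to show the indirection.

```c
#include <assert.h>

typedef int (*entry_point)(int);

/* Slot numbers are the contract between library and program. */
enum { ENTRY_SQUARE = 0, ENTRY_NEGATE = 1, ENTRY_COUNT };

/* Library internals: free to move or change between versions. */
int lib_square(int x) { return x * x; }
int lib_negate(int x) { return -x; }

/* The branch table: lives at a fixed place with a fixed slot order. */
const entry_point branch_table[ENTRY_COUNT] = {
    lib_square,     /* slot ENTRY_SQUARE */
    lib_negate      /* slot ENTRY_NEGATE */
};

/* Every call from the program goes through a slot. */
int call_entry(int slot, int x)
{
    assert(slot >= 0 && slot < ENTRY_COUNT);
    return branch_table[slot](x);
}

/* Program code, bound at link time to slot numbers only. */
int program_compute(int x)
{
    return call_entry(ENTRY_NEGATE, call_entry(ENTRY_SQUARE, x));
}
```

This is why the library cannot change much once bound: the functions behind the slots may be rewritten, but reordering or removing slots would break every program already linked against the table.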
DLL / OCX
• DLL: Dynamic link libraries
  – Load and bind at run time
    • Allows library to change after program written
• VBX / OCX / ActiveX
  – Dynamic link and load of objects
  – Load and bind at run time
    • Allows library to change after program written