
Languages and Compilers (SProg og Oversættere)

Lecture 12

Heap Management and Garbage Collection

Bent Thomsen

Department of Computer Science

Aalborg University

With acknowledgement to Norm Hutchinson, Patrick Earl, Ethan Henry, Simon Leonard and Jack Newton, whose slides this lecture is based on.


The “Phases” of a Compiler

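[Figure: the phases of a compiler. Source Program -> Syntax Analysis -> Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree -> Code Generation -> Object Code. Syntax Analysis and Contextual Analysis each produce Error Reports. A “Last lecture” marker highlights part of the pipeline.]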


Remember Lecture 10

High Level Program

Low-level Language Processor

How to model high-level computational structures and data structures in terms of low-level memory and machine instructions.

Procedures

Expressions

Variables

Arrays

Records

Objects

Methods

Registers

Machine Instructions

Bits and Bytes

Machine Stack

How to model?


Sometimes it goes the other way round

High Level Program

Low-level Language Processor

How to reflect low-level memory and machine data structures in terms of high-level computational structures.

Objects

Registers

Bits and Bytes

How to model?

Pointers

Indirect addressing

References


Memory usage

RECAP: we have seen 2 storage allocation models so far:
• static allocation
  – code segment
  – global variables
  – exist “forever”
• stack allocation
  – local variables and arguments for procedures and functions
  – lifetime follows procedure activation

NEXT:
• Dynamic allocation, also called Heap Storage
  – Some programs may ask for and get memory allocated at arbitrary points during execution
  – When this memory is no longer used it should be freed


Heap Storage

• Memory allocation under explicit programmatic control
  – C malloc, C++ / Pascal / Java / C# new operation.
• Memory allocation implicit in language constructs
  – Lisp, Scheme, Haskell, SML, … most functional languages
  – Autoboxing/unboxing in Java 1.5 and C#
• Deallocation under explicit programmatic control
  – C, C++, Pascal
• Deallocation implicit
  – Java, C#, Lisp, Scheme, Haskell, SML, …


Heap Storage Deallocation

Explicit versus Implicit Deallocation

Examples:
• Implicit: Java, Scheme
• Explicit: Pascal and C
  To free heap memory a specific operation must be called.
    Pascal ==> dispose
    C ==> free
    C++ ==> delete
• Implicit and Explicit: Ada
  Deallocation on leaving scope

In explicit memory management, the program must explicitly call an operation to release memory back to the memory management system.

In implicit memory management, heap memory is reclaimed automatically by a “garbage collector”.


How do things become garbage?

int *p, *q;
…
p = malloc(sizeof(int));
p = q;    // newly created space becomes garbage

for (int i = 0; i < 10000; i++) {
    SomeClass obj = new SomeClass(i);
    System.out.println(obj);
}

Creates 10000 objects, which become garbage just after the print.


Problem with explicit heap management

int *p, *q;
…
p = malloc(sizeof(int));
q = p;
free(p);    // dangling pointer in q now

float myArray[100];
p = myArray;
*(p+i) = …  // equivalent to myArray[i]

They can be hard to recognize


OO Languages and heap allocation

Objects are a lot like records and instance variables are a lot like fields.
=> The representation of objects is similar to that of a record.

Methods are a lot like procedures.
=> Implementation of methods is similar to routines.

But… there are differences:

Objects have methods as well as instance variables, records only have fields.

The methods have to somehow know what object they are associated with (so that methods can access the object’s instance variables)


Representation of a simple Java object

A simple Java object (no inheritance)

class Point {
    int x, y;
    public Point(int x, int y) { this.x = x; this.y = y; }        // (1)
    public void move(int dx, int dy) { x = x + dx; y = y + dy; }  // (2)
    public float area() { ... }                                   // (3)
    public float dist(Point other) { ... }                        // (4)
}


Representation of a simple Java object

A simple Java object (no inheritance)

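Point p = new Point(2,3);
Point q = new Point(0,0);

new allocates an object in the heap.

[Figure: p and q each refer to a heap object with the fields class, x and y (p: x = 2, y = 3; q: x = 0, y = 0). The class field of both objects refers to the shared Point class object, which holds the entries Point: constructor (1), move: method (2), area: method (3), dist: method (4).]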


Objects can become garbage

Point p = new Point(2,3);
Point q = new Point(0,0);
p = q;


Stacks and dynamic allocations are incompatible

Why can’t we just do dynamic allocation within the stack?

[Figure from: Programming Language design and Implementation, 4th Edition. Copyright © Prentice Hall, 2000]


Automatic Storage Deallocation (Garbage Collection)

Everybody probably knows what a garbage collector is.

But here are two “one liners” to make you think again about what a garbage collector really is!

1) Garbage collection provides the “illusion of infinite memory”!

2) A garbage collector predicts the future!

It’s a kind of magic! :-)

Let us look at how this magic is done!


Where to put the heap?

• The heap is an area of memory which is dynamically allocated.
• Like a stack, it may grow and shrink during runtime.
• Unlike a stack it is not a LIFO => more complicated to manage
• In a typical programming language implementation we will have both heap-allocated and stack-allocated memory coexisting.

Q: How do we allocate memory for both?


• A simple approach is to divide the available memory at the start of the program into two areas: stack and heap.
• Another question then arises
  – How do we decide what portion to allocate for stack vs. heap?
  – Issue: if one of the areas is full, then even though we still have more memory (in the other area) we will get out-of-memory errors


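Q: Isn’t there a better way?
A: Yes, there is an often used “trick”: let both stack and heap share the same memory area, but grow towards each other from opposite ends!

[Figure: one memory area shared by stack and heap. The stack memory area lies between SB (stack base) and ST (stack top) and grows downward. The heap memory area lies between HB (heap base) and HT (heap top) and can expand upward.]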
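The shared-area trick can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class SharedMemory and its methods are hypothetical names, addresses are plain word indices, and the register names SB, ST, HB and HT follow the figure.

```java
// Sketch: stack and heap share one memory area and grow towards each other.
// All names here are hypothetical; addresses are word indices.
class SharedMemory {
    final int sb;   // SB: stack base, the highest address; the stack grows downward
    final int hb;   // HB: heap base, the lowest address; the heap expands upward
    int st;         // ST: lowest address currently used by the stack
    int ht;         // HT: first address above the heap

    SharedMemory(int size) {
        hb = 0;
        sb = size;
        ht = hb;
        st = sb;
    }

    // Words that are still available to either area.
    int free() { return st - ht; }

    // Push a stack frame of the given size and return its address.
    int pushFrame(int words) {
        if (words > free()) throw new OutOfMemoryError("stack meets heap");
        st -= words;
        return st;
    }

    void popFrame(int words) { st += words; }

    // Expand the heap upward and return the address of the new space.
    int expandHeap(int words) {
        if (words > free()) throw new OutOfMemoryError("heap meets stack");
        int addr = ht;
        ht += words;
        return addr;
    }

    public static void main(String[] args) {
        SharedMemory m = new SharedMemory(100);
        m.pushFrame(30);
        m.expandHeap(60);
        System.out.println("free words: " + m.free());
    }
}
```

With this layout an out-of-memory error only occurs when the two areas actually meet, whichever of them grew.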


How to keep track of free memory?

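Stack is LIFO allocation => ST moves up/down. Everything above ST is in use/allocated. Below is free memory. This is easy!
But … the heap is not LIFO. How to manage free space in the “middle” of the heap?

[Figure: the shared memory area. The stack between SB and ST is allocated, the area between ST and HT is free, and the heap between HT and HB is mixed: allocated and free blocks. Can the free blocks be reused?]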


How to manage free space in the “middle” of the heap?

=> keep track of free blocks in a data structure: the “free list”. For example we could use a linked list pointing to free blocks.

[Figure: a freelist pointer leads to a chain of list nodes. Each node has a Free pointer to a free block in the heap (between HB and HT) and a Next pointer to the next node.]

A freelist! Good idea!

But where do we find the memory to store this data structure?


Q: Where do we find the memory to store a freelist data structure?
A: Since the free blocks are not used for anything by the program => memory manager can use them for storing the freelist itself.

[Figure: HF points to the first free block in the heap (between HB and HT). Each free block stores its size and a pointer to the next free block.]
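A minimal sketch of a free list stored inside the free blocks themselves could look as follows. The layout is an assumption made for illustration: word 0 of a free block holds its size and word 1 the address of the next free block. The class name FreeListHeap, the first-fit strategy, and the lack of coalescing of neighbouring free blocks are also choices of this sketch, not something the slides prescribe.

```java
// Sketch: a first-fit allocator whose free list lives inside the free blocks.
// Assumed layout of a free block: mem[b] = size, mem[b + 1] = next free block.
class FreeListHeap {
    static final int NIL = -1;   // end-of-list marker
    final int[] mem;             // the heap, addressed by word index
    int hf;                      // HF: address of the first free block

    FreeListHeap(int words) {    // assumes words >= 2
        mem = new int[words];
        mem[0] = words;          // initially one big free block
        mem[1] = NIL;
        hf = 0;
    }

    // First fit: returns the address of a block of `size` words, or NIL.
    int allocate(int size) {
        size = Math.max(size, 2);            // a block must later hold size + next
        int prev = NIL;
        for (int b = hf; b != NIL; prev = b, b = mem[b + 1]) {
            int rest = mem[b] - size;
            if (rest < 0 || rest == 1) continue;   // too small, or 1-word leftover
            int next = mem[b + 1];
            if (rest >= 2) {                 // split: the tail stays on the free list
                int tail = b + size;
                mem[tail] = rest;
                mem[tail + 1] = next;
                next = tail;
            }
            if (prev == NIL) hf = next; else mem[prev + 1] = next;
            return b;
        }
        return NIL;
    }

    // Explicit deallocation: push the block on the front of the free list.
    void free(int addr, int size) {
        mem[addr] = Math.max(size, 2);
        mem[addr + 1] = hf;
        hf = addr;
    }

    public static void main(String[] args) {
        FreeListHeap h = new FreeListHeap(16);
        int a = h.allocate(4);
        h.free(a, 4);
        System.out.println("reused address: " + h.allocate(4));
    }
}
```

No memory outside the heap array is needed for bookkeeping apart from the single HF register.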


Why Garbage Collect?

• Software Engineering
  – Garbage collection increases the abstraction level of software development.
  – Simplifies interfaces and decreases coupling of modules.
  – Studies have shown a significant amount of development time is spent on memory management bugs [Rovner, 1985].


Implicit memory management

• Current trend of modern programming language development: to give the programmer only implicit means of memory management:
  – The constant increase of hardware memory justifies the policy of automatic memory management
  – Explicit memory management distracts the programmer from his primary tasks: let everyone do what is required of them and nothing else!
  – The philosophy of high-level languages conforms to implicit memory management

• Other arguments for implicit memory management:
  – Anyway, a programmer cannot control memory management for temporary variables!
  – The difficulty of combining two memory management mechanisms: the system’s and the programmer’s

• History repeats: in the 70’s people thought that implicit memory management had finally replaced all other mechanisms


Classes of Garbage Collection Algorithms

• Direct Garbage Collectors: a record is associated with each node in the heap. The record for node N indicates how many other nodes or roots point to N.

• Indirect/Tracing Garbage Collectors: usually invoked when a user’s request for memory fails because the free list is exhausted. The garbage collector visits all live nodes, and returns all other memory to the free list. If sufficient memory has been recovered from this process, the user’s request for memory is satisfied.


Terminology

• Roots: values that a program can manipulate directly (i.e. values held in registers, on the program stack, and global variables.)
• Node/Cell/Object: an individually allocated piece of data in the heap.
• Children Nodes: the list of pointers that a given node contains.
• Live Node: a node whose address is held in a root or is the child of a live node.
• Garbage: nodes that are not live, but are not free either.
• Garbage collection: the task of recovering (freeing) garbage nodes.
• Mutator: The program running alongside the garbage collection system.


Types of garbage collectors

• The “Classic” algorithms
  – Reference counting
  – Mark and sweep

• Copying garbage collection

• Generational garbage collection

• Incremental Tracing garbage collection


Reference Counting

• Every cell has an additional field: the reference count. This field represents the number of pointers to that cell from roots or heap cells.

• Initially, all cells in the heap are placed in a pool of free cells, the free list.

• When a cell is allocated from the free list, its reference count is set to one.

• When a pointer is set to reference a cell, the cell’s reference count is incremented by 1; if a pointer to the cell is deleted, its reference count is decremented by 1.

• When a cell’s reference count reaches 0, its pointers to its children are deleted and it is returned to the free list.
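The rules above can be sketched on a toy heap. Everything in this example is hypothetical (the names RcHeap, Cell, write and release, and the freed list that stands in for the free list); it only models the counting and does not involve the Java runtime's own collector.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of reference counting on a toy heap; all names are hypothetical.
class RcHeap {
    static class Cell {
        int count;               // the reference count field
        final Cell[] fields;     // pointers to children
        final String name;
        Cell(String name, int nFields) {
            this.name = name;
            this.fields = new Cell[nFields];
        }
    }

    // Stands in for the free list: names of cells in the order they were freed.
    final List<String> freed = new ArrayList<>();

    // Allocation: the count starts at one (the pointer handed to the caller).
    Cell allocate(String name, int nFields) {
        Cell c = new Cell(name, nFields);
        c.count = 1;
        return c;
    }

    // Store p into x.fields[i] and adjust both counts.
    void write(Cell x, int i, Cell p) {
        if (p != null) p.count++;    // increment first so that x.f = x.f is safe
        Cell old = x.fields[i];
        x.fields[i] = p;
        release(old);
    }

    // Delete one pointer to c. At count 0 the cell's pointers to its
    // children are deleted and the cell is returned to the free list.
    void release(Cell c) {
        if (c == null) return;
        c.count--;
        if (c.count == 0) {
            for (int i = 0; i < c.fields.length; i++) {
                release(c.fields[i]);
                c.fields[i] = null;
            }
            freed.add(c.name);
        }
    }

    public static void main(String[] args) {
        RcHeap h = new RcHeap();
        Cell a = h.allocate("a", 1);
        Cell b = h.allocate("b", 0);
        h.write(a, 0, b);   // b is now referenced by a root and by a
        h.release(b);       // the root pointer to b is deleted
        h.release(a);       // a dies and takes b with it
        System.out.println("freed: " + h.freed);
    }
}
```

Note that release is recursive: freeing one cell can trigger a cascade of further frees, so the cost of deleting a single pointer is not bounded.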


Reference Counting Example

[Figure, shown over four slides: a graph of cells annotated with their reference counts. As counts drop to 0, the affected cells are returned to the free list.]


Reference Counting: Advantages and Disadvantages

• Advantages:
  – Garbage collection overhead is distributed.
  – Locality of reference is no worse than mutator.
  – Free memory is returned to free list quickly.


• Disadvantages:
  – High time cost (every time a pointer is changed, reference counts must be updated).
    • In place of a single assignment x.f = p:

        z = x.f
        c = z.count
        c = c - 1
        z.count = c
        if c = 0 call putOnFreeList(z)
        x.f = p
        c = p.count
        c = c + 1
        p.count = c

  – Storage overhead for reference counter can be high.
  – Unable to reclaim cyclic data structures.
  – If the reference counter overflows, the object becomes permanent.


Reference Counting: Cyclic Data Structure

[Figure, before and after deleting a pointer into a cyclic structure: one reference count drops from 2 to 1, the other counts are unchanged, and the cells on the cycle keep non-zero counts.]


Deferred Reference Counting

• Optimisation
  – Cost can be improved by special treatment of local variables.
  – Only update reference counters of objects on the stack at fixed intervals.
  – Reference counts are still affected by pointers from one heap object to another.


Mark-Sweep

• The first tracing garbage collection algorithm

• Garbage cells are allowed to build up until heap space is exhausted (i.e. a user program requests a memory allocation, but there is insufficient free space on the heap to satisfy the request.)

• At this point, the mark-sweep algorithm is invoked, and garbage cells are returned to the free list.

• Performed in two phases:
  – Mark: identifies all live cells by setting a mark bit. Live cells are cells reachable from a root.
  – Sweep: returns garbage cells to the free list.


Mark and Sweep Garbage Collection

[Figure sequence: a stack (SB to ST) and a heap (HB to HT) holding the blocks a, b, c, d and e, shown in four stages: before gc; mark as free phase (every block is marked X); mark reachable (blocks reachable from the stack are marked as reachable); collect free (the blocks still marked as free are collected, leaving b, d and e).]


Mark-Sweep Example

Returned to free list


Mark and Sweep Garbage Collection

Algorithm pseudo code:

void garbageCollect() {
    mark all heap variables as free
    for each frame in the stack
        scan(frame)
    for each heapvar (still) marked as free
        add heapvar to freelist
}

void scan(region) {
    for each pointer p in region
        if p points to region marked as free then
            mark region at p as reachable
            scan(region at p)
}

Q: This algorithm is recursive. What do you think about that?
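One way to avoid the recursion is to keep the cells that still need scanning on an explicit work stack. The sketch below is an illustration, not the lecture's own code: the names MarkSweep and Node are hypothetical, roots and children are assumed to be non-null, and a mark bit meaning "reachable" is used instead of first marking everything as free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch of mark-sweep with a non-recursive mark phase; names are hypothetical.
class MarkSweep {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean marked;          // the mark bit: true means reachable
        Node(String name) { this.name = name; }
    }

    // Mark phase: an explicit work stack replaces the recursion in scan().
    static void mark(List<Node> roots) {
        Deque<Node> work = new ArrayDeque<>();
        for (Node r : roots) {
            if (!r.marked) { r.marked = true; work.push(r); }
        }
        while (!work.isEmpty()) {
            Node n = work.pop();
            for (Node c : n.children) {
                if (!c.marked) { c.marked = true; work.push(c); }
            }
        }
    }

    // Sweep phase: visits every cell in the heap. Unmarked cells are moved to
    // the returned free list; marked cells are unmarked for the next collection.
    static List<Node> sweep(List<Node> heap) {
        List<Node> freeList = new ArrayList<>();
        for (Node n : heap) {
            if (n.marked) n.marked = false; else freeList.add(n);
        }
        heap.removeAll(freeList);
        return freeList;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.children.add(b);
        c.children.add(d);
        d.children.add(c);       // c and d form an unreachable cycle
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));
        System.out.println("freed cells: " + sweep(heap).size());
    }
}
```

The explicit stack can still grow in proportion to the number of live cells, which is awkward at a moment when memory is exhausted; pointer reversal (the Deutsch-Schorr-Waite technique) is the classic way to mark without any extra stack.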


Mark-Sweep: Advantages and Disadvantages

• Advantages:
  – Cyclic data structures can be recovered.
  – Tends to be faster than reference counting.


• Disadvantages:
  – Computation must be halted while garbage collection is being performed
  – Every live cell must be visited in the mark phase, and every cell in the heap must be visited in the sweep phase.
  – Garbage collection becomes more frequent as residency of a program increases.
  – May fragment memory.


• Disadvantages:
  – Has negative implications for locality of reference. Old objects get surrounded by new ones (not suited for virtual memory applications).

• However, if objects tend to survive in clusters in memory, as they apparently often do, this can greatly reduce the cost of the sweep phase.


Mark-Compact Collection

• Remedy the fragmentation and allocation problems of mark-sweep collectors.

• Two phases:
  – Mark: identical to mark sweep.
  – Compact: marked objects are compacted, moving most of the live objects until all the live objects are contiguous.


Heap Compaction

To fight fragmentation, some memory management algorithms perform “heap compaction” once in a while.

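[Figure, before and after compaction: before, the allocated blocks in the heap (between HB and HT) are separated by free blocks on the free list starting at HF; after, the allocated blocks (labelled a, b, c and d) are contiguous.]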


Mark-Compact: Advantages and Disadvantages

• Advantages:
  – The contiguous free area eliminates the fragmentation problem. Allocating objects of various sizes is simple.

– The garbage space is "squeezed out", without disturbing the original ordering of objects. This improves locality.


• Disadvantages:
  – Several passes over the data are required. "Sliding compactors" take two, three or more passes over the live objects.
    • One pass computes the new location
    • Subsequent passes update the pointers to refer to new locations, and actually move the objects
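The passes listed above can be sketched for a simplified heap. The assumptions are loud ones: every cell occupies exactly one heap slot, pointers are slot indices, the mark phase has already run, and the names SlidingCompactor, Cell and compact are hypothetical. The three-pass structure follows the classic "LISP 2" style of sliding compaction.

```java
// Sketch of a three-pass sliding compactor over one-slot cells.
// Assumptions: pointers are heap indices, marking has already been done.
class SlidingCompactor {
    static final int NIL = -1;

    static class Cell {
        final String name;
        boolean marked;          // set by an earlier mark phase
        final int[] ptrs;        // heap indices of the children
        int forward = NIL;       // new address, computed in pass 1
        Cell(String name, boolean marked, int... ptrs) {
            this.name = name;
            this.marked = marked;
            this.ptrs = ptrs;
        }
    }

    // Compacts the heap in place, rewrites the roots, returns the new heap top.
    static int compact(Cell[] heap, int[] roots) {
        // Pass 1: compute the new location of every live cell.
        int top = 0;
        for (Cell c : heap) {
            if (c != null && c.marked) c.forward = top++;
        }
        // Pass 2: update the pointers in roots and live cells.
        for (int i = 0; i < roots.length; i++) {
            if (roots[i] != NIL) roots[i] = heap[roots[i]].forward;
        }
        for (Cell c : heap) {
            if (c == null || !c.marked) continue;
            for (int i = 0; i < c.ptrs.length; i++) {
                if (c.ptrs[i] != NIL) c.ptrs[i] = heap[c.ptrs[i]].forward;
            }
        }
        // Pass 3: slide the live cells down, preserving their order.
        for (int addr = 0; addr < heap.length; addr++) {
            Cell c = heap[addr];
            heap[addr] = null;                 // garbage simply disappears
            if (c != null && c.marked) {
                heap[c.forward] = c;           // forward <= addr, so this is safe
                c.marked = false;
                c.forward = NIL;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        Cell[] heap = {
            new Cell("a", true, 2), new Cell("x", false),
            new Cell("b", true), new Cell("y", false), new Cell("c", true, 0)
        };
        int[] roots = {4};
        System.out.println("new heap top: " + compact(heap, roots));
    }
}
```

Because live cells only ever move towards lower addresses and keep their relative order, the move in pass 3 never overwrites a cell that has not been processed yet.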


Copying Garbage Collection

• Like mark-compact, copying garbage collection does not really "collect" garbage.

• Rather it moves all the live objects into one area and the rest of the heap is known to be available.

• Copying collectors integrate the traversal and the copying process, so that objects need only be traversed once.

• The work needed is proportional to the amount of live data (all of which must be copied).


Semispace Collector Using the Cheney Algorithm

• The heap is subdivided into two contiguous subspaces (FromSpace and ToSpace).

• During normal program execution, only one of these semispaces is in use.

• When the garbage collector is called, all the live data are copied from the current semispace (FromSpace) to the other semispace (ToSpace).


[Figure, before and after collection: before, the live objects A, B, C and D are linked together in FromSpace and ToSpace is empty; after, they have been copied to ToSpace and lie contiguously in the order A, B, C, D.]


• Once the copying is completed, the ToSpace is made the "current" semispace.

• A simple form of copying traversal is the Cheney algorithm.

• The immediately reachable objects form the initial queue of objects for a breadth-first traversal.

• A scan pointer is advanced through the first object location by location.

• Each time a pointer into FromSpace is encountered, the referred-to-object is transported to the end of the queue and the pointer to the object is updated.


• Objects reachable via multiple paths must not be copied to ToSpace multiple times.

• When an object is transported to ToSpace, a forwarding pointer is installed in the old version of the object.

• The forwarding pointer signifies that the old object is obsolete and indicates where to find the new copy.
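The traversal with a scan pointer and forwarding pointers can be sketched as follows. This is a simplified model rather than a real collector: the names CheneyCollector, Obj, forward and collect are hypothetical, and ToSpace is modelled as a Java list whose end serves as the end of the queue.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Cheney-style copying; ToSpace is modelled as a list that
// doubles as the breadth-first queue. All names are hypothetical.
class CheneyCollector {
    static class Obj {
        final String name;
        final Obj[] fields;
        Obj forward;             // forwarding pointer, set once the object is copied
        Obj(String name, int nFields) {
            this.name = name;
            this.fields = new Obj[nFields];
        }
    }

    final List<Obj> toSpace = new ArrayList<>();

    // Transport o to ToSpace unless that was already done; return the new copy.
    Obj forward(Obj o) {
        if (o == null) return null;
        if (o.forward == null) {
            Obj copy = new Obj(o.name, o.fields.length);
            System.arraycopy(o.fields, 0, copy.fields, 0, o.fields.length);
            o.forward = copy;            // the old object is now obsolete
            toSpace.add(copy);           // appended at the end of the queue
        }
        return o.forward;
    }

    // Copies everything reachable from the roots; the roots are updated in place.
    List<Obj> collect(Obj[] roots) {
        for (int i = 0; i < roots.length; i++) roots[i] = forward(roots[i]);
        // The scan pointer advances through ToSpace; each FromSpace pointer it
        // meets is replaced by a pointer to the ToSpace copy.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.fields.length; i++) {
                o.fields[i] = forward(o.fields[i]);
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A", 2), b = new Obj("B", 1), c = new Obj("C", 1), d = new Obj("D", 0);
        a.fields[0] = b; a.fields[1] = c; b.fields[0] = d; c.fields[0] = d;
        Obj[] roots = {a};
        System.out.println("objects copied: " + new CheneyCollector().collect(roots).size());
    }
}
```

No recursion stack or separate queue is needed: the region of ToSpace between the scan pointer and the end of the copied objects is the queue.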


Copying Garbage Collection: Advantages and Disadvantages

• Advantages:
  – Allocation is extremely cheap.
  – Excellent asymptotic complexity.
  – Fragmentation is eliminated.
  – Only one pass through the data is required.


• Disadvantages:
  – The use of two semi-spaces doubles memory requirement
  – Poor locality. Using virtual memory will cause excessive paging.


Problems with Simple Tracing Collectors

• Difficult to achieve high efficiency in a simple garbage collector, because large amounts of memory are expensive.

• If virtual memory is used, the poor locality of the allocation/reclamation cycle will cause excessive paging.

• Even as main memory becomes steadily cheaper, locality within cache memory becomes increasingly important.


• With a simple semispace copy collector, locality is likely to be worse than mark-sweep.

• The memory issue is not unique to copying collectors.

• Any efficient garbage collection involves a trade-off between space and time.

• The problem of locality is an indirect result of the use of garbage collection.


Generational Garbage Collection

• Attempts to address weaknesses of simple tracing collectors such as mark-sweep and copying collectors:
  – All active data must be marked or copied.
  – For copying collectors, each page of the heap is touched every two collection cycles, even though the user program is only using half the heap, leading to poor cache behavior and page faults.
  – Long-lived objects are handled inefficiently.


• Generational garbage collection is based on the generational hypothesis:

    Most objects die young.

• As such, concentrate garbage collection efforts on objects likely to be garbage: young objects.


Generational Garbage Collection: Object Lifetimes

• When we discuss object lifetimes, the amount of heap allocation that occurs between the object’s birth and death is used rather than the wall time.

• For example, an object that was created when 1 KB of heap had been allocated and was no longer referenced when 4 KB of heap data had been allocated would have lived for 3 KB.

• Typically, between 80 and 98 percent of all newly-allocated heap objects die before another megabyte has been allocated.


Generational Garbage Collection

• Objects are segregated into different areas of memory based on their age.

• Areas containing newer objects are garbage collected more frequently.

• After an object has survived a given number of collections, it is promoted to a less frequently collected area.


Generational Garbage Collection: Example

[Figure sequence over three slides: a root set pointing into an Old Generation and a New Generation, each shown with a memory usage bar. The first slide shows the objects S, A, B and C; the second adds the object R; the third adds the object D next to R.]


Generational Garbage Collection: Example

• This example demonstrates several interesting characteristics of generational garbage collection:
  – The young generation can be collected independently of the older generations (resulting in shorter pause times).
  – An intergenerational pointer was created from R to D. These pointers must be treated as part of the root set of the New Generation.
  – Garbage collection in the new generation results in S becoming unreachable, and thus garbage. Garbage in older generations (sometimes called tenured garbage) cannot be reclaimed via garbage collections in younger generations.
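How intergenerational pointers extend the root set of a young-generation collection can be sketched as below. The design is an assumption for illustration: a write barrier records old objects that point to young ones in a "remembered set", and the names Generations, Obj, write and minorCollect are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of a young-generation ("minor") collection with a remembered set.
// All names are hypothetical; promotion to the old generation is not modelled.
class Generations {
    static class Obj {
        final String name;
        final boolean old;                       // true: lives in the old generation
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, boolean old) { this.name = name; this.old = old; }
    }

    final Set<Obj> young = new LinkedHashSet<>();
    // Old objects that hold pointers into the young generation.
    final Set<Obj> remembered = new LinkedHashSet<>();

    Obj allocate(String name) {
        Obj o = new Obj(name, false);
        young.add(o);
        return o;
    }

    // Write barrier: record old-to-young pointers when they are created.
    void write(Obj src, Obj target) {
        src.refs.add(target);
        if (src.old && !target.old) remembered.add(src);
    }

    // Traces the young generation only. The roots are the program roots plus
    // the young objects referenced from the remembered set. Returns the garbage.
    Set<Obj> minorCollect(List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj r : roots) if (!r.old && live.add(r)) work.push(r);
        for (Obj o : remembered)
            for (Obj t : o.refs) if (!t.old && live.add(t)) work.push(t);
        while (!work.isEmpty()) {
            for (Obj t : work.pop().refs) if (!t.old && live.add(t)) work.push(t);
        }
        Set<Obj> garbage = new LinkedHashSet<>(young);
        garbage.removeAll(live);
        young.retainAll(live);
        return garbage;
    }

    public static void main(String[] args) {
        Generations g = new Generations();
        Obj r = new Obj("R", true);
        Obj d = g.allocate("D");
        g.allocate("X");
        g.write(r, d);                           // intergenerational pointer R -> D
        System.out.println("garbage found: " + g.minorCollect(new ArrayList<>()).size());
    }
}
```

Old objects are never traced here, so an unreachable old object that still points to a young one keeps it alive until the old generation itself is collected.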


Generational Garbage Collection: Implementation

• Usually implemented as a copying collector, where each generation has its own semispace:

Old Generation New Generation

FromSpace FromSpace

ToSpace ToSpace


Generational Garbage Collection: Issues

• Choosing an appropriate number of generations:
– If we benefit from dividing the heap into two generations, can we further benefit by using more than two generations?

• Choosing a promotion policy:
– How many garbage collections should an object survive before being moved to an older generation?

• Tracking intergenerational pointers:
– Inter-generational pointers need to be tracked, since they form part of the root set for younger generations.

• Collection scheduling:
– Can we attempt to schedule garbage collection in such a way that we minimize disruptive pauses?

Generational Garbage Collection: Multiple Generations

[Figure: the heap divided into four regions labelled Generation 1, Generation 2, Generation 3 and Generation 4.]

Generational Garbage Collection: Multiple Generations

• Advantages:
– Keeps youngest generation’s size small.
– Helps address mistakes made by the promotion policy by creating more intermediate generations that still get garbage collected fairly frequently.

• Disadvantages:
– Collections for intermediate generations may be disruptive.
– Tends to increase number of inter-generational pointers, increasing the size of the root set for younger generations.

• Most generational collectors are limited to just two or three generations.

Generational Garbage Collection: Promotion Policies

• A promotion policy determines how many garbage collection cycles (the cycle count) an object must survive before being advanced to the next generation.

• If the cycle count is too low, objects may be advanced too fast; if too high, the benefits of generational garbage collection are not realized.

• With a cycle count of just one, objects created just before the garbage collection will be advanced, even though the generational hypothesis states they are likely to die soon.

• Increasing the cycle count to two denies advancement to recently created objects.

• Under most conditions, increasing the cycle count beyond two does not significantly reduce the amount of data advanced.
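The effect of the cycle count can be made concrete with a small sketch. The function `promote` and the object names are hypothetical; every object passed in is assumed to survive the current collection.

```python
def promote(ages, cycle_count):
    """Split surviving young objects into (promoted, kept) for one collection.

    ages maps an object name to the number of collections it had survived
    before this one; every object in ages is assumed to survive this one too.
    """
    promoted, kept = [], {}
    for name, age in ages.items():
        survived = age + 1
        if survived >= cycle_count:
            promoted.append(name)      # advanced to the next generation
        else:
            kept[name] = survived      # stays young, with its age recorded
    return promoted, kept


# "fresh" was allocated just before the collection; "older" has survived one already.
young = {"fresh": 0, "older": 1}
print(promote(young, cycle_count=1))   # (['fresh', 'older'], {})
print(promote(young, cycle_count=2))   # (['older'], {'fresh': 1})
```

With a cycle count of one the freshly allocated object is advanced immediately; with a cycle count of two it must survive one more collection first.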

Generational Garbage Collection: Inter-generational Pointers

• Inter-generational pointers can be created in two ways:
– When an object containing pointers is promoted to an older generation.
– When a pointer to an object in a newer generation is stored in an object in an older generation.

• The garbage collector can easily detect promotion-caused inter-generational pointers, but handling pointer stores is a more complicated task.

• Pointer stores can be tracked via the use of a write barrier:
– Pointer stores must be accompanied by extra bookkeeping instructions that let the garbage collector know of pointers that have been updated.

• Often implemented at the compiler level.
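A write barrier of this kind might look roughly as follows. This is a sketch under assumptions: `Obj`, `write_field` and `remembered_set` are hypothetical names, and a real compiler would emit the check inline around each pointer store rather than call a function.

```python
class Obj:
    """Toy object with a generation number and named pointer fields."""
    def __init__(self, name, generation):
        self.name = name
        self.generation = generation   # 0 = youngest, larger = older
        self.fields = {}


remembered_set = set()   # older objects that may point into a younger generation


def write_field(target, field, value):
    """Pointer store plus the bookkeeping a compiler would emit around it."""
    target.fields[field] = value
    # Write barrier: record stores that create an old -> young pointer.
    if value is not None and target.generation > value.generation:
        remembered_set.add(target)


old = Obj("old", 1)
young = Obj("young", 0)
write_field(young, "next", old)    # young -> old: nothing to record
write_field(old, "child", young)   # old -> young: recorded for the next minor GC
print([o.name for o in remembered_set])   # ['old']
```

The remembered set is then scanned as part of the root set when the younger generation is collected.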

Generational Garbage Collection: Collection Scheduling

• Generational garbage collection aims to reduce pause times. When should these (hopefully short) pauses occur?

• Two strategies exist:
– Hide collections when the user is least likely to notice a pause, or
– Trigger efficient collections when there is likely to be lots of garbage to collect.

Generational Garbage Collection: Advantages

• In practice it has proven to be an effective garbage collection technique.

• Minor garbage collections are performed quickly.

• Good cache and virtual memory behaviour.

Sun HotSpot

• Sun JDK 1.0 used mark-compact

• Sun improved memory management in the Java 2 VMs (JDK 1.2 and on) by switching to a generational garbage collection scheme

• The heap is separated into two regions:
– New Objects
– Old Objects

New Object Region

• The idea is to use a very fast allocation mechanism and hope that objects all become garbage before you have to garbage collect

• The New Object Region is subdivided into three smaller regions:
– Eden, where objects are allocated
– 2 “Survivor” semi-spaces: “From” and “To”

New Object Region

• The Eden area is set up like a stack - an object allocation is implemented as a pointer increment

• When the Eden area is full, the GC does a reachability test and then copies all the live objects from Eden to the “To” region

• The labels on the regions are swapped:
– “To” becomes “From”: now the “From” area has objects
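Stack-like allocation in Eden can be sketched as a bump pointer. The class `Eden` and its byte offsets are hypothetical and only model the idea; they do not reflect HotSpot's actual data structures.

```python
class Eden:
    """Toy Eden area: allocation just advances a pointer into a fixed-size region."""
    def __init__(self, size):
        self.size = size
        self.top = 0  # offset of the next free byte

    def allocate(self, nbytes):
        """Return the start offset of the new object, or None when Eden is full."""
        if self.top + nbytes > self.size:
            return None          # caller would trigger a minor collection here
        start = self.top
        self.top += nbytes       # the whole allocation is a pointer increment
        return start

    def reset(self):
        """After live objects have been copied out, the whole area is free again."""
        self.top = 0


eden = Eden(64)
print(eden.allocate(24), eden.allocate(24), eden.allocate(24))  # 0 24 None
eden.reset()                     # pretend the survivors were copied to "To"
print(eden.allocate(24))         # 0
```

No free list is searched and nothing is freed individually, which is why allocation in this scheme is so cheap.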

New Object Region

• The next time Eden fills, objects are copied from both the “From” region and Eden to the “To” area

• There’s a “Tenuring Threshold” that determines how many times an object can be copied between survivor spaces before it’s moved to the Old Object region

• Note that one side-effect is that one survivor space is always empty
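The interplay of Eden, the two survivor spaces and the tenuring threshold can be sketched as below. This is a simplified model under assumptions: `minor_gc`, the dictionary-based objects and the `is_live` callback are hypothetical, and liveness is supplied by the caller instead of being computed by a reachability test.

```python
def minor_gc(eden, from_space, old, is_live, threshold):
    """One minor collection over Eden and the "From" survivor space.

    Objects are dicts with a "name" and an "age" (number of copies so far).
    Returns the new (eden, from_space) pair; `old` is extended in place.
    """
    to_space = []
    for obj in eden + from_space:
        if not is_live(obj):
            continue                      # dead objects are simply not copied
        if obj["age"] >= threshold:
            old.append(obj)               # tenured: moved to the Old Object region
        else:
            obj["age"] += 1
            to_space.append(obj)          # copied into the "To" survivor space
    # Swap labels: "To" becomes the new "From"; Eden and the old "From" are empty.
    return [], to_space


old = []
eden = [{"name": "a", "age": 0}, {"name": "b", "age": 0}]
alive = lambda obj: obj["name"] != "b"    # pretend "b" is unreachable
eden, survivors = minor_gc(eden, [], old, alive, threshold=1)
print([o["name"] for o in survivors], [o["name"] for o in old])   # ['a'] []
eden.append({"name": "c", "age": 0})
eden, survivors = minor_gc(eden, survivors, old, alive, threshold=1)
print([o["name"] for o in survivors], [o["name"] for o in old])   # ['c'] ['a']
```

After each collection one survivor space holds the survivors and the other is empty, matching the side effect noted above.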

Old Object Region

• The old object region is for objects that will have a long lifetime

• The hope is that, because most garbage is generated by short-lived objects, you won’t need to GC the old object region very often

Generational Garbage Collection

[Figure: five snapshots of the heap layout Eden, SS1, SS2, Old, with steps labelled First GC and Second GC. Eden, SS1 and SS2 form the New Object Region; Old is the Old Object Region.]

Generational Garbage Collection: Disadvantages

• Performs poorly if any of the main assumptions are false:
– That objects tend to die young.
– That there are relatively few pointers from old objects to young ones.

• Frequent pointer writes to older generations will increase the cost of the write barrier, and possibly increase the size of the root set for younger generations.

Incremental Tracing Collectors

• Program (Mutator) and Garbage Collector run concurrently.
– Can think of the system as similar to two threads: one performs collection, and the other represents the regular program in execution.

• Can be used in systems with real-time requirements, for example process control systems. The challenge is to:
– allow the mutator to do its job without destroying the collector’s ability to keep track of modifications of the object graph, and at the same time
– allow the collector to do its job without interfering with the mutator

Coherence & Conservatism

• Coherence: A proper state must be maintained between the mutator and the collector.

• Conservatism: How aggressive the garbage collector is at finding objects to be deallocated.

Tricolouring

• White – Not yet traversed. A candidate for collection.

• Black – Already traversed and found to be live. Will not be reclaimed.

• Grey – In traversal process. Defining characteristic is that its children have not necessarily been explored.
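The three colours can be seen at work in a minimal marking sketch. The names `tricolour_mark` and `heap` are hypothetical, and the heap is modelled as a dictionary from object names to the names of their children.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


def tricolour_mark(graph, roots):
    """Colour every node of graph (name -> list of child names) by reachability."""
    colour = {node: WHITE for node in graph}
    grey = []                      # worklist of grey nodes
    for r in roots:
        if colour[r] == WHITE:
            colour[r] = GREY
            grey.append(r)
    while grey:
        node = grey.pop()
        for child in graph[node]:
            if colour[child] == WHITE:
                colour[child] = GREY
                grey.append(child)
        colour[node] = BLACK       # all children are now grey or black
    return colour


heap = {"A": ["B"], "B": ["A"], "C": ["A"], "D": []}
colours = tricolour_mark(heap, roots=["A"])
print(colours)   # {'A': 'black', 'B': 'black', 'C': 'white', 'D': 'white'}
garbage = [n for n, c in colours.items() if c == WHITE]
print(garbage)   # ['C', 'D']
```

When the grey worklist is empty, every object that is still white is unreachable and can be reclaimed.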

The Tricolour Abstraction

Incremental Tracing Collectors Overview

• Baker’s Algorithm
– Based on Cheney’s copy-collect algorithm
– Based on tricolouring

• Principle:
– whenever the mutator accesses a pointer to a white object, the object is coloured grey
– this ensures that the mutator never points to a white object, hence part 1 of the invariant holds

• At every read the mutator performs, the invariant must be checked (expensive); if the pointer read refers to a white object, the object must be forwarded (using the standard forward algorithm) to to-space

• There are variants:
– read-barrier
– write-barrier
– synchronising on page-level
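The read-barrier variant can be sketched as follows. This is a simplified model, not Baker's full algorithm: `Cell`, `forward` and `read_field` are hypothetical names, and the incremental scanning that the collector performs alongside the mutator is omitted.

```python
class Cell:
    """Toy heap cell living in from-space or to-space."""
    def __init__(self, name, space="from"):
        self.name = name
        self.space = space       # "from" (white) or "to" (grey or black)
        self.fields = {}
        self.forward = None      # to-space copy, once one exists


def forward(cell):
    """Return the to-space copy of cell, creating it on first use."""
    if cell is None or cell.space == "to":
        return cell
    if cell.forward is None:
        copy = Cell(cell.name, space="to")
        copy.fields = dict(cell.fields)   # fields are fixed up lazily, on read
        cell.forward = copy
    return cell.forward


def read_field(obj, field):
    """Read barrier: the mutator never receives a from-space (white) pointer."""
    value = forward(obj.fields[field])
    obj.fields[field] = value             # remember the forwarded pointer
    return value


a, b = Cell("A"), Cell("B")
a.fields["next"] = b
root = forward(a)                # roots are forwarded when collection starts
nxt = read_field(root, "next")   # reading B forwards it to to-space
print(root.space, nxt.space, nxt is b.forward)   # to to True
```

The check on every pointer read is the expensive part, which motivates the write-barrier and page-level variants.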

Garbage Collection: Summary

| Method | Conservatism | Space | Time | Fragmentation | Locality |
|---|---|---|---|---|---|
| Mark Sweep | Major | Basic | 1 traversal + heap scan | Yes | Fair |
| Mark Compact | Major | Basic | Many passes of heap | No | Good |
| Copying | Major | Two Semispaces | 1 traversal | No | Poor |
| Reference Counting | No | Reference count field | Constant per assignment | Yes | Very Good |
| Deferred Reference Counting | Only for stack variables | Reference count field | Constant per assignment | Yes | Very Good |
| Incremental | Varies depending on algorithm | Varies | Can be guaranteed real-time | Varies | Varies |
| Generational | Variable | Segregated areas | Varies with number of live objects in new generation | Yes (Non-Copying), No (Copying) | Good |

[Side labels on the original slide group the rows under “Tracing” and “Incremental”.]

Comparing Garbage Collection Algorithms

• Directly comparing garbage collection algorithms is difficult – there are many factors to consider.

• Some factors to consider:
– Cost of reclaiming cells
– Cost of allocating cells
– Storage overhead
– How does the algorithm scale with residency?
– Will user program be suspended during garbage collection?
– Does an upper bound exist on the pause time?
– Is locality of data structures maintained (or maybe even improved?)

Comparing Garbage Collection Algorithms

• Zorn’s study in 1989/93 compared garbage collection to explicit deallocation:
– Non-generational garbage collection:
  • Between 0% and 36% more CPU time.
  • Between 40% and 280% more memory.
– Generational garbage collection:
  • Between 5% and 20% more CPU time.
  • Between 30% and 150% more memory.

• Wilson feels these numbers can be improved, and they are also out of date.

• A well implemented garbage collector will slow a program down by approximately 10 percent relative to explicit heap deallocation.

Garbage Collection: Conclusions

• Despite this cost, garbage collection is a feature in many widely used languages:

– Lisp (1959)

– SML, Haskell, Miranda (1980-)

– Perl (1987)

– Java (1995)

– C# (2001)

– Microsoft’s Common Language Runtime (2002)

Different choices for different reasons

• JVM
– Sun Classic: Mark, Sweep and Compact
– Sun HotSpot: Generational (two generations + Eden)
  • -Xincgc: an incremental collector that breaks the old-object region into smaller chunks and GCs them individually
  • -Xconcgc: concurrent GC that allows other threads to keep running in parallel with the GC
– BEA JRockit JVM: concurrent, even on another processor
– IBM: Improved Concurrent Mark, Sweep and Compact with a notion of weak references
– Real-Time Java
  • Scoped LTMemory, VTMemory, RawMemory

• .Net CLR
– Managed and unmanaged memory (memory blob)
– PC version: Self-tuning Generational Garbage Collector
– .Net CF: Mark, Sweep and Compact

Garbage Collection: Conclusions

• Relieves the burden of explicit memory allocation and deallocation.

• Software module coupling related to memory management issues is eliminated.

• An extremely dangerous class of bugs is eliminated.

• The compiler generates code for allocating objects.

• The compiler must also generate code to support GC:
– The GC must be able to recognize root pointers from the stack.
– The GC must know about data layout and object descriptors.

• The choice of garbage collection algorithm has to take many factors into consideration.
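To illustrate why the GC needs root information and object descriptors from the compiler, here is a minimal sketch. Everything in it is hypothetical: the `DESCRIPTORS` table, the `stack_map` list and the integer addresses merely stand in for the metadata a real compiler would emit.

```python
# Hypothetical compiler-generated descriptors: for each type, which slots hold pointers.
DESCRIPTORS = {
    "Pair": {"size": 2, "pointer_slots": [0, 1]},
    "IntBox": {"size": 1, "pointer_slots": []},
}


def reachable(heap, stack, stack_map):
    """Addresses reachable from the stack slots that stack_map marks as pointers.

    heap      -- address -> (type name, list of slot values)
    stack     -- list of raw slot values for the current frame
    stack_map -- indices of stack slots that hold pointers (not plain integers)
    """
    worklist = [stack[i] for i in stack_map if stack[i] is not None]
    seen = set()
    while worklist:
        addr = worklist.pop()
        if addr in seen:
            continue
        seen.add(addr)
        type_name, slots = heap[addr]
        # The descriptor tells the GC which slots to follow and which to skip.
        for i in DESCRIPTORS[type_name]["pointer_slots"]:
            if slots[i] is not None:
                worklist.append(slots[i])
    return seen


heap = {100: ("Pair", [200, None]), 200: ("IntBox", [300]), 300: ("IntBox", [7])}
stack = [100, 300]   # slot 0 is a pointer; slot 1 is the integer 300
print(sorted(reachable(heap, stack, stack_map=[0])))   # [100, 200]
```

Without the stack map and descriptors, the integer 300 would be indistinguishable from a pointer to the object at address 300.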