Incremental Garbage Collection I

BACHELOR'S THESIS (Seminar aus Softwareentwicklung: Garbage Collection)

for the attainment of the academic degree of Bakkalaureus/Bakkalaurea of the technical sciences in the field of study COMPUTER SCIENCE

Submitted by: WIRTH Christian, 0355354

Carried out at: Institut für Systemsoftware

Supervision: Prof. Dr. Hanspeter Mössenböck

Pasching/Linz, January 2006

C. Wirth, Incremental Garbage Collection I. Seminar Aus Softwareentwicklung



Abstract--Garbage collection systems ease the burden on programmers of manually handling memory management. There is no need to keep track of used and unused memory objects any longer; the automatic collector takes over that work.

    Many garbage collection algorithms have been introduced, most of them having to suspend the main program in order to identify and release unused nodes in memory.

In this paper I describe methods that allow the garbage collector to incrementally identify garbage without having to suspend the program, letting the collector run concurrently with it.

This paper is the first part of a three-paper series on incremental garbage collection. Its main intention is to give a general overview of the motivation for, the fundamental principles of, and some basic algorithms used for this advanced type of garbage collector system.

Index Terms--Garbage collection, incremental garbage collection, tricolor marking, write barrier, read barrier, Baker's algorithm.

I. NOMENCLATURE

Several terms and definitions used in this paper deserve separate explanation:

Mutator: This is an arbitrary program that mutates (allocates, uses, abandons) data in memory. It is the task of the (garbage) collector to search for and free memory the mutator no longer needs, without the mutator explicitly having to order that.

    Collector: Short for garbage collector. This is the process that, using techniques described in this paper, identifies memory not used by the mutator any longer and releases it.

Floating garbage: This term is used for unreferenced objects that are not (yet) recognized by the algorithm as garbage. Usually it takes another run of the garbage collector for the floating garbage to be recognized as such.

Conservativity: The conservativity of an algorithm describes how eager it is to free potential garbage. The more conservative an algorithm is, the higher the chances that garbage is not recognized immediately and thus not freed during the current run. This is independent of the correctness of the algorithm: even the least conservative algorithms still have to be correct, meaning they may free only real garbage.

Fromspace, Tospace: Copying algorithms (like Baker's algorithm) use two separate memory blocks. During one run of the collector, all used objects are copied from the currently used block (Fromspace) to the other one (Tospace). All unreferenced objects remain in Fromspace and are freed when the whole block is.

Flip: When using a copying collector with a Fromspace/Tospace model, flip describes the act of swapping Tospace with Fromspace. This is done at the start of the run of the collector. After a flip, all objects are in Fromspace and only those at least indirectly reachable from the root set will be copied to Tospace and thus survive the current run.

Root set: The root set is an (abstract) set of pointers. Objects on the heap are only visible to the program if they are (at least indirectly) referenced by a pointer that is available to the program. The root set consists of all those pointers that are available to the program directly, without having to follow other references first. This includes global variables, the stack and the registers.

II. INTRODUCTION

Incremental garbage collection is a method of allowing the garbage collection run to take place simultaneously with the mutator process(es). In doing so, the collector has to take care of changes made by the mutator.

Garbage collection in general provides a system that supports automatic memory management. Without the assistance of such a helpful system the application programmer has to keep track of all the memory he uses. He has to return unused objects to the operating system lest his program's memory consumption rise perpetually. This causes quite a work overhead when writing applications, while the programmer should be concentrating on implementing the system he needs.

    Incremental Garbage Collection: The Basic Algorithms

    Christian Wirth


The first automatic garbage collection system was introduced in 1960 by John McCarthy, the inventor of the programming language LISP. Using such a system, the programmer could from then on concentrate on implementing his algorithms while being liberated from the laborious memory management.

A. Drawbacks of simple algorithms

Traditional, simple garbage collection methods have one major drawback. They usually have to stop the mutator process in order to have a consistent view of the state of the process. They are thus often called stop-the-world collectors. Only once the garbage collector has finished its run may the mutator proceed to manipulate the data in memory. Were it to do so while the collector was still running, both could badly interfere with each other, resulting in memory being released that was still in use.

    Incremental garbage collection tries to solve this misery. It enables the mutator to continue its work even during the run of the garbage collector. In most cases the mutator still has to be suspended for a short period of time but may proceed then while the collector is doing its job.

The collector can either be implemented as a separate thread or process, or it can be embedded into the mutator process and incrementally do a few steps of work every now and then. It is even possible for the collector to execute on a completely separate processor in a multi-processor environment, only sharing the same memory. This increases the need for further synchronization between the collector and the mutators, though.

This paper will give an overview of the motivation for using an incremental garbage collector. It will present some of the basic algorithms, especially methods to adapt the three basic garbage collecting algorithms (mark-sweep, copying collectors and reference counting) for incremental or concurrent use. It will discuss the advantages and disadvantages of this approach.

Based on those fundamentals, my colleagues Schatz [11] and Würthinger [12] will address some special features and more sophisticated algorithms in further detail in their separate papers.

III. GENERAL ADVANTAGES AND DISADVANTAGES

It is hard to exactly state the pros and cons for a big and diverse class of different algorithms like those of incremental garbage collectors. Depending on their actual method of implementation, the hardware they are used on and other factors, different advantages or causes for trouble are inherent in their implementation.

Given their aim, their biggest advantage in general is apparently that they allow the mutator to smoothly continue its computations while the collector does its work at virtually the same time. The algorithms try to minimize the length of forced pause times of the mutator. This is especially important for interactive systems or other real-time systems that heavily rely on low latency.

However, those algorithms tend to be slightly more expensive overall than others. This is due to the special overhead of having to keep track of the changes the mutator makes while the collector is already working on the mutator's data. While for simple algorithms the collector is the only one to manipulate data during its run, this is not true when using incremental techniques. Additional routines have to be used to guarantee safe but fast synchronization. This can easily get quite tricky, for example when several different processors have to access the same memory space concurrently.

    IV. INCREMENTAL VERSUS CONCURRENT GARBAGE COLLECTION

    The terms incremental, parallel and concurrent are sometimes inaccurately used synonymously to describe incremental garbage collection.

A. Incremental or parallel garbage collection

To be precise, the definition of incremental or parallel garbage collection describes any collecting system that runs in parallel to the mutator. It is the basic term for the class of algorithms described in this paper and thus its title.

One possible method to implement such a system is to directly interweave the collector with the mutator. A commonly used method is, for example, to incrementally do a few steps of the collecting task every time new memory is requested. This can be done by inserting a few lines of code before every malloc (in C) or new (in Java or C++). Whenever such a memory request is made, the collector does a few steps of its job before actually serving the mutator's request for new memory. An algorithm using this method is Baker's copying collector, which will be explained in more detail later.
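The interweaving just described can be sketched in a few lines of Python. This is purely illustrative: the class, constant and method names below are our own, not part of any real collector; each allocation first performs a bounded number of collector work steps before the request is served.

```python
# Illustrative sketch: every allocation performs a bounded amount of
# collector work before serving the request. All names are hypothetical.

GC_STEPS_PER_ALLOC = 3          # tuning knob: collector steps per request

class IncrementalHeap:
    def __init__(self):
        self.work_queue = []    # pending collector work (e.g. grey nodes)
        self.steps_done = 0

    def collector_step(self):
        # one bounded unit of collector work, e.g. scanning one node
        if self.work_queue:
            self.work_queue.pop()
        self.steps_done += 1

    def new(self):
        # the code inserted before every malloc/new: do a few steps first
        for _ in range(GC_STEPS_PER_ALLOC):
            self.collector_step()
        return object()         # then actually serve the allocation
```

Because the per-allocation work is bounded, mutator pauses stay short and roughly proportional to the allocation rate.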

B. Concurrent garbage collection

Concurrent garbage collection, on the other hand, is just another possibility to implement an incremental collector. It describes a garbage collector that is executed in a separate process or thread. The collector could use time-sharing techniques to run quasi-parallel to the mutator. In a multiprocessor environment it can even be executed on another processor. Especially when running on a separate processor it is really able to fulfil its task concurrently with the mutator thread.


Note that this suggested usage of the three terms is not always followed by the authors in the literature. It seems to be the most logical and also the most often used one, though.

V. INCREMENTAL MARK AND SWEEP

A well-known algorithm for garbage collection is the mark and sweep scheme. In the following paragraphs the possible use of this method for incremental garbage collection is analyzed.

A. Tricolor marking

A basic technique to identify the garbage during the marking run is the so-called tricolor marking. For simple, non-incremental mark and sweep methods a two-color marking scheme usually suffices. The need to identify mutator changes during the marking run makes it necessary to use three colors here.

The collector starts with the root set. Following the pointers from there it visits all nodes in the heap of the system memory recursively. Doing so it eventually visits all nodes in memory the mutator can access. All visited nodes are colored with one of the three colors black, white and grey, according to the following rules:

    Black: The node and all of its direct descendants (meaning all the objects it directly references) have been visited. The node has been fully processed and does not need to be visited again during that run.

    Grey: The node is marked to have already been found by the coloring run but not yet been processed. It needs to be visited again before the tricolor marking is complete.

    White: The node has not been visited yet. If it stays white until the end of the marking run it cannot be reached from the mutator and can thus be considered garbage.

Initially, all nodes are white. For each referenced node the collector finds, it has to decide how to color it. Nodes are colored black once their treatment is completed. This is when they do not reference white nodes any more, when all their direct descendants are at least grey. Until then the found but not yet treated node is colored grey, marking it as reachable but not yet finished.

In an abstract view of this process, the collector pushes a wave-front of grey nodes through the memory, turning white nodes into black ones. At the end of this process, that is, when there are no grey nodes left, all nodes are either black or white.

The collector has visited all nodes the mutator can possibly reach, having marked those black. The remaining white nodes can be considered garbage since they cannot be reached from within the mutator.
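The wave-front described above can be simulated in a few lines of Python. This is a hedged sketch: the adjacency-dict heap representation and all names are our own assumptions, not taken from any algorithm in this paper.

```python
# Minimal tricolor marking sketch over an object graph given as an
# adjacency dict {node: [children]}; "roots" models the root set.

WHITE, GREY, BLACK = "white", "grey", "black"

def tricolor_mark(heap, roots):
    color = {node: WHITE for node in heap}   # initially, all nodes white
    grey_nodes = list(roots)
    for r in roots:
        color[r] = GREY                      # roots are found immediately
    while grey_nodes:                        # push the grey wave-front
        node = grey_nodes.pop()
        for child in heap[node]:
            if color[child] == WHITE:        # found but not yet processed
                color[child] = GREY
                grey_nodes.append(child)
        color[node] = BLACK                  # all children at least grey
    return color                             # remaining white = garbage
```

For heap = {"A": ["B"], "B": [], "C": ["C"]} with root set ["A"], the run ends with A and B black and the unreachable C white, i.e. garbage.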

This algorithm was first introduced by E. W. Dijkstra [7] to describe incremental garbage collection. When used in a static system the algorithm seems quite easy to implement. The basic idea would be to first pause the mutator, then let the collector start its tricolor marking run. Once all the garbage is identified, it is removed. Only then may the mutator proceed with its work. Apparently, this huge delay in the run of the mutator cannot always be tolerated. Using incremental garbage collection, this pause is eliminated or at least significantly shortened.

B. Implementation of tricolor marking

There are two possibilities to implement tricolor marking. The obvious first one is to add two color bits to every object.

    The other possible way is to use a marker bit for each node and a mark stack. Unmarked nodes are considered white. Marked nodes are black unless they are on the mark stack in which case they are grey. This method makes it easier to find the next grey node during the marking run.
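A small sketch of this second representation (all names are illustrative): the color is never stored directly but derived from the mark bit and stack membership.

```python
# Color derived from a per-node mark bit plus a mark stack:
# unmarked -> white; marked and on the stack -> grey; marked and
# already popped off the stack -> black (fully processed).

def color_of(node, mark_bit, mark_stack):
    if not mark_bit[node]:
        return "white"               # never reached so far
    if node in mark_stack:
        return "grey"                # found, still waiting to be scanned
    return "black"                   # processed, all children shaded
```

Finding the next grey node then amounts to popping the mark stack, which is exactly the advantage the text mentions.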

C. Mutator changes during the collecting run

The described marking algorithm can be used in a system that allows the mutator to continue applying changes to the memory. There are a few additional provisions that have to be included, though. Consider the following example.

    Fig. 1 Possible failure of tricolor marking system

Of the three nodes (A, B, C), the objects A and B are in the root set. B points to C. Now the collector starts its run and finds A. Since it has no descendants, A is colored black. Unfortunately the collector is stopped in favour of the mutator, which edits both A and B so that now only A points to C. Only after that may the collector proceed, and it visits the only node left in the root set, B. Since B has no descendants it is marked black, too.

C has never been visited, thus it is still white. There are neither grey nodes left nor unvisited links from the root set, so the marking run is finished. Because of its color, C is considered garbage. That is wrong, of course, since it is still referenced by A.

The cause of trouble in this example is allowing the mutator to change the state of a black node (A). Concretely, during the marking run the mutator may not set up a pointer from a black node to a white one that is not referenced from anywhere else. This is demonstrated by the example: C is only referenced from (black) A. There is no possible way that C will be visited during the current run, since C's only possible father node has already been visited. This renders the node garbage.

Any other possible change the mutator could make cannot harm the system. The worst thing that can happen due to a change is that a node meant to be garbage is overlooked in the current run. This floating garbage has to wait for the next run to be treated correctly. How often this happens depends on the conservativity of the algorithm.

To express it formally, two preconditions must hold at some point during the marking phase to cause trouble:

1. A pointer to a white node is written into a black one.
2. That link is the only one referencing the white node.

If this holds for any (white) node, that node will incorrectly be treated as garbage: since there is no other node referencing it, the collector could only reach it via the black node. The collector does not do so because it does not visit the black node or its children again. So the node will stay white, marking it as garbage.
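The two preconditions can be reproduced in a short, self-contained Python simulation of the Fig. 1 scenario. Object names follow the example; everything else is our own sketch of a naive, interruptible marker.

```python
# Naive interrupted marking: A is processed, the mutator then moves the
# only pointer to C from B into the already-black A, and C stays white.

heap = {"A": [], "B": ["C"], "C": []}        # roots are A and B
color = {node: "white" for node in heap}

color["A"] = "black"                         # collector: A has no children

# Mutator interleaves; both failure preconditions now hold for C:
heap["A"].append("C")                        # 1. white C written into black A
heap["B"].remove("C")                        # 2. that was C's only other link

for child in heap["B"]:                      # collector resumes with B ...
    color[child] = "grey"
color["B"] = "black"                         # ... and never revisits black A

reachable_but_white = color["C"] == "white" and "C" in heap["A"]
```

At the end C is reachable via A yet still white, so a naive sweep would free live data: exactly the failure the barrier methods below prevent.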

    By preventing (at least) one of the two conditions mentioned above an algorithm can ensure the system does not fail. This is usually done with a barrier limiting the access to the system memory.

VI. BARRIER METHODS

To prevent the mutator from linking a white node from a black one, two methods exist. Read barriers supervise read access to nodes while write barriers do so for write access. The mutator may access the nodes only once the barrier has ensured no link from a black to a white node can possibly be set up by the mutator. The code of the barrier is usually put into the mutator itself, for example into the memory allocation routines.

A. Read barrier

Using a read barrier ensures the mutator never sees a white node. Before the barrier allows the mutator to access a white node, it is treated by the collector. Painting it grey, for example, ensures the node will be visited again later. That way it is ensured that no node that is still referenced will be treated as garbage.

Depending on the algorithm's policy, it may color all nodes grey or immediately visit them (and possibly their descendants, too), coloring them black, and only then give the mutator access to the node. Described in a Tospace/Fromspace model, the barrier ensures the mutator only accesses nodes in Tospace.

A read barrier is usually implemented by inserting a few lines of code before each pointer read instruction. This can significantly increase the length of the code, depending on the number of instructions inserted and the number of pointer read instructions compared to the total number of instructions. It also slows down the execution. Methods to handle this overhead are shown later.

B. Write barrier

A write barrier records when the mutator writes changes to an object. Whenever it tries to write a black-to-white link, this operation is supervised. One or possibly even both nodes (father and son) have to be visited, or be marked to be visited again later this run.

To implement a write barrier, some instructions have to be added to the mutator's code at each pointer-write operation. Whenever a pointer is about to be changed, first one of the affected objects is colored grey and only then is the pointer changed to its new target object.

There are two types of write barriers.

1) Snapshot-at-the-beginning

As the name indicates, these algorithms conceptually take a snapshot of the state of the heap at the beginning of the marking run. Whenever a write operation is detected by the barrier, the originally referenced and now possibly unlinked object is colored grey.

Actually, this is usually done by coloring the original referent of a pointer grey before the pointer is changed to its new target. Creating a copy-on-write virtual copy of the active data structure at the beginning of the collection cycle would be too expensive and is not used for this. However, Furusou's algorithm, which will be discussed later, makes use of such a technique.

Note that generating black-to-white links is not prohibited for this algorithm. Instead it guarantees that the second precondition for a failure never holds. The old link to the object is not lost, in the sense that it is used to color the object grey, guaranteeing that the object is not treated as garbage during this collector's run.

As a consequence those algorithms are very conservative. The unlinked object is kept no matter whether it is garbage or not. Consequently, objects that are created during the marking phase are colored black even though it is quite probable that such objects are relinquished before the end of the current cycle. This method is presented in further detail when Yuasa's algorithm is discussed.

2) Incremental-update

Algorithms of this type record potentially harmful pointer changes and color one of the two involved objects grey, depending on the actual algorithm. They are much less conservative than snapshot-at-the-beginning algorithms because they leave the object that lost its reference white. That node can be deleted in the same collector's run if no other non-garbage node points at it.

The algorithm's conservativeness also depends on which node the algorithm shades grey. It can choose either the black father node or the new white child node. Choosing the child node is much more conservative because, again, that node cannot become garbage during that run any more. If the collector shaded the father node grey instead, the mutator could still unlink the (probably still white) child node, allowing that node to be removed during the run.

Barriers in general can be implemented in either software or hardware. Former hardware architectures like Symbolics, Explorer and SPUR provided hardware support, but modern general purpose machines usually don't. Software read barriers are usually quite expensive. They increase the size of the generated code, slow down execution and complicate caching techniques.

VII. EXAMPLE MARK-SWEEP COLLECTORS

Copying collectors have to protect the mutator from the changes made by copying the objects. This is usually done with the help of a read barrier. This method is quite expensive, though, and is rarely used for non-copying collectors. They rather use write barriers to synchronize the mutator and the collector. Probably the most famous non-copying collectors are those of the mark-sweep pattern.

In this paragraph some common mark-sweep algorithms will be compared. They all use an incremental-update write-barrier method, except for Yuasa's algorithm, which uses a snapshot-at-the-beginning technique:

Steele's Multiprocessing Compactifying algorithm
Dijkstra's On the Fly collector
Kung and Song's four color algorithm
Yuasa's sequential algorithm

    To explain their functionality the following example operation will be used.

    Fig. 2 Mutator updates pointer from B to C

The mutator updates A so that it points to C instead of B. It is not defined whether there is another pointer to B. No matter what other objects point at B and C, both have to be treated correctly (as garbage or not). Since C existed before and was accessible by the mutator, there has been at least one link to it before the update operation.

A. Yuasa's algorithm

Yuasa's algorithm is a snapshot-at-the-beginning algorithm using a write barrier. The tricolor marking is implemented using a mark bit and a stack.

shade(P) =
    if not marked(P)
        mark_bit(P) = marked
        gcpush(P, mark_stack)

Update(A, C) =
    if phase == mark_phase
        shade(*A)
    *A = C

Algorithm 1 Yuasa's write-barrier update operation

During the marking phase the barrier traps pointer updates and shades the white object that was pointed at before (B in the example) grey. In doing so, that object is kept no matter whether it became garbage due to the unlink operation or not, making the algorithm a very conservative one. Consequently, newly allocated objects are colored grey immediately. Even though chances are high that a new object does not survive even a single run of the collector (many allocated objects are just used temporarily and are not stored for a longer time), it can become garbage and be released only as early as the end of the next run.

Fig. 3 Yuasa's snapshot write barrier
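As a hedged illustration, Algorithm 1 can be rendered in executable Python; the adjacency-dict heap and the mark-bit/mark-stack representation below are our own assumptions, only the barrier logic follows the pseudocode.

```python
# Yuasa-style snapshot barrier: before a pointer in slot i of src is
# overwritten, the old referent is shaded grey (marked and pushed).

def shade(node, mark_bit, mark_stack):
    if not mark_bit[node]:
        mark_bit[node] = True            # mark_bit(P) = marked
        mark_stack.append(node)          # gcpush(P, mark_stack)

def update(heap, src, i, new_target, mark_bit, mark_stack, marking):
    if marking:
        old = heap[src][i]               # the snapshot referent (B)
        if old is not None:
            shade(old, mark_bit, mark_stack)
    heap[src][i] = new_target            # *A = C; black-to-white allowed
```

Running the Fig. 2 update (A now points to C instead of B) during the marking phase shades B grey, while C itself stays untouched.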

What might look strange at first is the fact that Yuasa's algorithm allows a black-to-white link (as from A to C in the example).


    Nevertheless it ensures that only garbage will be removed.

Unlike most other algorithms, it does not rely on averting the first but the second precondition we defined as necessary for a problem to arise. While it allows black-to-white links (first precondition), it ensures that every time an object is unlinked (possibly removing the last link to it), that node will be colored grey. This prevents the second precondition by using the old link to the node to color it and ensure its survival.

That the black-to-white link is set up is no problem either. See the figure: there has to be another way to reach C, since only then can a link from A to C be set up by the mutator. C will obviously not be visited via A, but it should be via the other link. This is guaranteed by Yuasa's algorithm: if that other link is destroyed too, C will be set grey just as B was, via its former parent node.

That way it will always either be reached via the other node referencing it, or it will be colored grey when that link is deleted, also ensuring it is not treated as garbage.

This behaviour is a result of the snapshot-at-the-beginning type of algorithm. A static view of the memory system is assumed. All nodes reachable in this static image are assumed not to be garbage. Even if they are unlinked (like B) they still will not be released, since they are reachable in the image. Such nodes have to wait as floating garbage for the next run, where they are no longer linked in the original image and will get released. The black-to-white link is no problem here because it is assumed that there is another way to reach the white node. If this other link is deleted, the object is colored grey so it is not lost.

B. Dijkstra's algorithm

This is the most conservative of the incremental-update algorithms. It was introduced by Dijkstra, Lamport, et al. in 1976 and 1978.

It colors white cells grey whenever a reference is created to them, no matter what color the parent node has. That the color grey is used and not black is obvious, because the white node could have white sons, so that coloring it black could violate the rule of no black-to-white pointers. Unlike in Yuasa's algorithm, white cells that are unlinked can be released in the same run, though.

Fig. 4 Dijkstra's write-barrier

Dijkstra's main concern when researching this algorithm was its correctness. He concentrated on a correct, easy-to-prove design. That accounted for some of the features that led to its conservativeness. For example, new objects are marked black or grey but never white.

An interesting and very important detail in the implementation of the write barrier of this algorithm is that when a pointer is updated, first the pointer has to be changed and only after that may the new target be set to grey. At first sight this seems the wrong way round. It temporarily creates a link from a black to a white node, which is said to be prohibited for this algorithm. Nevertheless it is the correct way of treating this operation, as we will soon discuss.

shade(P) =
    if white(P)
        color(P) = grey

Update(A, C) =
    *A = C
    shade(C)

Algorithm 2 Dijkstra's write-barrier update operation

There is no need to lock during the Update operation. The system just has to ensure that both lines of code represent atomic operations and are not interrupted internally.

Now to the order of the operations again. If the order of assignment and shading were swapped, a permanent black-to-white link could be created. Assume an already black A that the mutator updates to point to the still white C. The write barrier would trap this prohibited operation and shade C grey. The mutator is then possibly suspended and the collector allowed to proceed. It would reach C (because it is grey) and treat it correctly.

Once the collector has finished its run it starts a new one, reaches A and marks it black, while C is not visited since it is not yet a descendant of A. Only now is the mutator scheduled to proceed and set up the link. This results in a black-to-white link from A to C! If C is unlucky enough not to have any other parent nodes referencing it, it will be treated as garbage although it is not! This is why the counter-intuitive order of instructions in the update operation is correct. For the correct order it suffices that both instructions are atomic operations; there is no need to lock the whole block.

Since the parent node is left black and only the descendant is colored grey, this child will be kept even if the link is broken during the run. This makes the algorithm even more conservative.
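Rendered as executable Python (an illustrative sketch; the explicit color table and worklist are our assumptions), Algorithm 2's order of operations looks like this:

```python
# Dijkstra-style incremental-update barrier: install the pointer first,
# then shade the new target grey. Each statement is assumed atomic.

def shade(node, color, grey_worklist):
    if color[node] == "white":
        color[node] = "grey"
        grey_worklist.append(node)           # collector will revisit it

def update(heap, src, i, new_target, color, grey_worklist):
    heap[src][i] = new_target                # 1. *A = C  (write first!)
    shade(new_target, color, grey_worklist)  # 2. only then shade C
```

After the Fig. 2 update of a black A to point at a white C, C ends up grey and on the worklist, so it cannot be missed during the current run.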


1) Proof of correctness

As mentioned above, the main concern of Dijkstra and his colleagues was to find a correctly working algorithm, while not caring so much about efficiency. It took him and his colleagues quite some effort to succeed in this task. He is quoted to have expressed:

"Our exercise has not only been very instructive, but at times even humiliating, as we have fallen into nearly every logical trap possible ... It was only too easy to design what looked (sometimes even for weeks and to many people) like a perfectly valid solution, until the effort to prove it correct revealed a (sometimes deep) bug."

When they finally succeeded, they had proven their algorithm to fulfil two criteria:

Safety: No accessible node is ever appended to the free list.
Liveness: Every garbage node is eventually collected.

Both assertions have to be true for any correctly working garbage collection system.

What Dijkstra and his colleagues still had to prove manually has later been automated. Russinoff [4] used an automatic proof system to prove the correctness of an algorithm similar to Dijkstra's, Ben-Ari's incremental garbage collection algorithm, which is derived from Dijkstra's but uses only two colors (black and white) so as to be even simpler to prove.

C. Steele's algorithm

Unlike Dijkstra, Steele's main concern was efficiency. His variant of an incremental-update algorithm is less conservative.

Its major difference compared to the algorithm of Dijkstra is that it does not shade C but A grey, leaving the child object white for the moment. That way C gets a chance to be unlinked by the mutator and become garbage even during the same run of the collector. The implementation of the tricolor marking is done with a mark bit and stack, as Yuasa's is.

Fig. 5 Steele's write-barrier

shade(P) =
    mark_bit(P) = unmarked
    gcpush(P)

Update(A, C) =
    LOCK gcstate
        *A = C
        if phase == marking_phase
            if marked(A) and unmarked(C)
                shade(A)

Algorithm 3 Steele's write-barrier update operation

By coloring the parent node and not the child node, the algorithm retreats the grey wave-front, in contrast to advancing it as Dijkstra's algorithm does. It is also more selective, as it only shades black parents of white sons. In all other cases there is no need to shade, because it is already ensured that all objects that are not garbage will eventually be visited. The child has already been visited (or marked grey) if it is no longer white. While this leads to some extra tests and accesses to A, it reduces the amount of floating garbage at the end of the collection cycle.

Unlike Dijkstra's, Steele's algorithm needs a lock to ensure the collector does not finish its marking phase while the mutator is currently working inside a barrier, which could otherwise corrupt the system.
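An executable sketch of Algorithm 3 (our own rendering; a threading lock stands in for LOCK gcstate, and the heap/mark-bit representation is assumed as before):

```python
# Steele-style barrier: only a marked (black) parent that receives an
# unmarked (white) child is shaded back grey; the write happens under a
# lock on the collector state.

import threading

gcstate = threading.Lock()

def shade(node, mark_bit, mark_stack):
    mark_bit[node] = False               # un-mark: node turns grey again
    mark_stack.append(node)              # gcpush(P)

def update(heap, src, i, new_target, mark_bit, mark_stack, marking):
    with gcstate:                        # LOCK gcstate
        heap[src][i] = new_target        # *A = C
        if marking and mark_bit[src] and not mark_bit[new_target]:
            shade(src, mark_bit, mark_stack)   # retreat the wave-front
```

Note that the child stays white: only the parent A is pushed back onto the mark stack to be rescanned, which is exactly what keeps the algorithm less conservative.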

D. Kung and Song's algorithm

This algorithm is basically an improved version of Dijkstra's. Kung and Song paint free objects with a fourth color, off-white, when the data is released in the sweep phase, unlike Dijkstra, who colors those free nodes white. In doing so they get rid of some of the troubles Dijkstra's algorithm has in differentiating between free nodes and nodes marked white.

    Another difference is that they use a deque (double ended queue) instead of a marking stack. This reduces the need for critical sections in their concurrent implementation.

    New() = temp = allocate() if phase == mark_phase color(R) = black return temp

    shade(P) = if white(P) or off-white(P) color(P) = grey gcpush(P, Mutator-end of queue)

    Update(A, C) = *A = C if phase == mark_phase shade(C)

Algorithm 4 Kung and Song's mutator code
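As a rough illustration of the pseudocode above, here is a hypothetical Python sketch of Kung and Song's mutator operations; the class names, the use of Python's deque for their double-ended queue, and the numeric color encoding are all invented assumptions, not their implementation.

```python
from collections import deque

# Hypothetical sketch of Kung & Song's scheme (names invented):
# four colors, with off-white marking free cells, and a double-ended
# queue on which the mutator shades at one end while the collector
# works at the other.

WHITE, OFF_WHITE, GREY, BLACK = range(4)

class Cell:
    def __init__(self):
        self.color = OFF_WHITE       # cells start out on the free list
        self.children = []

work = deque()                       # the marking deque

def shade(p):
    if p.color in (WHITE, OFF_WHITE):
        p.color = GREY
        work.append(p)               # mutator end of the queue

def update(a, c, marking=True):
    a.children.append(c)             # the pointer store
    if marking:
        shade(c)                     # Dijkstra-style: shade the new child

def new(marking=True):
    temp = Cell()
    temp.color = BLACK if marking else WHITE
    return temp
```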


VIII. TREATMENT OF NEW OBJECTS

The treatment of objects allocated during the run of the collector has a major influence on how conservative an algorithm is. New cells have a high chance of dying very soon after being allocated.

This of course depends on the mutator's behaviour, especially the task the mutator has to fulfill and how it is implemented. Such allocated-and-released nodes should ideally be freed during the same run of the collector, while identifying them should not cost too much additional computation time.

There are two possibilities for treating new cells. The conservative method is to color them grey, or even black by visiting them immediately. This way they cannot be reclaimed in the same run of the collector. Example algorithms using this method are Yuasa's and Dijkstra's. Yuasa's is slightly less conservative, as it allows nodes that are allocated and relinquished outside a marking phase to be reclaimed during the next one.

The other possibility is to allocate new cells white. This has the advantage that objects can be allocated, relinquished and reclaimed in the same cycle, diminishing the work overhead on nodes that are frequently allocated and freed within one cycle. On the other hand, it has the drawback that all surviving nodes have to be traversed, since they are not yet colored.

Steele's algorithm as well as Kung and Song's are inherently less conservative. To further support that feature, they take a closer look at the situation to decide which color allocated cells should get. Kung and Song color new cells grey, or white outside a marking phase. Steele also colors new nodes grey during a marking phase but uses a sophisticated heuristic to determine the color outside a marking phase; most objects will be allocated white this way, while some will be allocated black.

IX. INITIALIZATION

Another point of concern is the question of when to initiate the garbage collection run. The next run can start immediately after the last one has finished, or it can wait until the mutator runs out of free memory space.

The first possibility can cause too much overhead when the mutator generates only little garbage, making the collector waste its effort on just a few freed nodes. In contrast, the second way can cause mutator starvation: the mutator runs out of free memory and has to wait for the collector to catch up, free some garbage and thus provide free memory. This degrades the response time of the mutator, since it can do nothing but wait.

As usual, the best method lies somewhere in between. The next run should not be initiated immediately but should wait some time. On the other hand, it should not wait until no free memory is left but start when the number of free nodes falls below a certain threshold. Yuasa, for example, suggested initiating his algorithm once only about 22 percent of the available heap space is left free.
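The threshold policy amounts to a one-line predicate; here is a minimal sketch (the 22 percent default merely mirrors Yuasa's suggestion quoted above, and the function name is invented):

```python
# Threshold-based GC trigger: start a new cycle once the fraction of
# free nodes drops below the threshold (0.22 echoes Yuasa's figure).

def should_start_collection(free_nodes, total_nodes, threshold=0.22):
    return free_nodes / total_nodes < threshold
```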

A. User stack treatment by Dijkstra and Kung and Song

Of further interest is how to efficiently treat the user stack. A simple sequential implementation would be to suspend the mutator and shade all the roots grey in one operation.

For an incremental algorithm, neither Dijkstra nor Kung and Song give a hint on how to do this. Dijkstra only suggests shading all the roots, without giving further details on how to shorten the necessary pause of the mutator. Kung and Song add all the roots to a marking queue at the start of the collection cycle.

B. User stack treatment by Yuasa

Yuasa optimizes the solution by copying the entire program stack using a block-copying operation. Only the registers and global variables are copied directly onto the mark stack, because he assumes there is only a very limited number of such root nodes. The entries from the saved stack are then transferred to the mark stack in small quantities whenever it is in danger of becoming empty, so that the mark stack does not get too deep. The mark phase is finished when both the mark stack and the save stack are empty.

This system makes Yuasa claim his algorithm is real-time. That is only true in the sense that the time complexity of the write-barrier and allocation routines is bounded by constants. He also relies on the existence of the fast copying operator. Depending on the actual hardware and software his algorithm is implemented on, those bounding constants can grow unacceptably high.

C. User stack treatment by Steele

Steele suggests first marking all objects reachable from the roots, one root element being traced after the other. The stack is left untouched until the very end of the marking phase because of its volatility. It is then traced one element at a time. Unfortunately, the mutator can still push objects onto the stack during this, so it has to push them onto the marking stack too once the collector has finished tracing the elements on the program stack.

When used in a concurrent system, the marking stack is locked by the collector. If it then finds the stack empty, the mark phase can be considered complete; otherwise the lock is released and the marking continues.


X. TERMINATION

Terminating the marking phase is quite expensive for Dijkstra's algorithm. It sequentially scans the heap for grey nodes over and over again. When it completes a full pass through the heap without finding a grey node, the marking phase is finished. This method has quadratic time complexity in the worst case; unfortunately, it is easy to construct realistic examples of that complexity.

Fig. 6 Dijkstra's quadratic complexity

One simple example is a linked list on an otherwise empty heap. The list is placed on the heap in reverse order: its head lies near the end of the heap, and each element points to the one directly preceding it. I leave aside the fact that the example shows a LISP implementation using two pointers and a data field for each cell and treat each list element as if it were one object.

There would be a link from the root set to the last element in the list. Every pass through the heap would therefore find just one grey element, mark it black and color the preceding one grey. The collector would then have to travel almost a full round through the heap to find the next grey node. That continues until the first element is found and colored black. This leads to n passes through the heap (where n is the number of elements in the linked list), each time checking n nodes for greyness but treating only one node per pass. The time complexity is thus O(n²), quadratic.
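The quadratic behaviour can be reproduced with a small simulation. The sketch below is illustrative only (a heap modeled as an array of cells, all names invented); it counts how many cells Dijkstra-style repeated sweeps inspect for the reversed list described above, which works out to n * (n + 1):

```python
# Simulation of Dijkstra-style termination by repeated heap sweeps on
# the reversed linked list: element i points to element i - 1 and only
# the last element is grey initially, so each forward sweep makes
# progress on exactly one cell.

WHITE, GREY, BLACK = 0, 1, 2

def collect(n):
    color = [WHITE] * n
    child = [i - 1 for i in range(n)]    # -1 means "no child"
    color[n - 1] = GREY                  # shaded from the root set
    inspected = 0
    progress = True
    while progress:                      # one loop iteration = one sweep
        progress = False
        for i in range(n):
            inspected += 1
            if color[i] == GREY:
                color[i] = BLACK
                if child[i] >= 0 and color[child[i]] == WHITE:
                    color[child[i]] = GREY
                progress = True
    return inspected
```

For n list elements the collector performs n productive sweeps plus one final empty sweep, inspecting n cells each time.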

XI. PARALLEL EXECUTION OF MARK AND SWEEP

A way of improving mark-and-sweep algorithms has been introduced by Lamport and by Queinnec et al. [5]. They have shown that the mark phase and the sweep phase can be run concurrently. To be more precise: while the sweep phase of the nth cycle is in progress, the (n+1)th marking cycle may already be running. To achieve this, they attached two color fields to each element. Cycles (both mark and sweep) with an even number access the first color field, while odd cycles access the second one.

There is a hidden trap, though. Since Dijkstra's algorithm marks both used and free memory, one has to take care not to violate the condition that no black-to-white pointers are created. When a node is released and thus added to the pool of free nodes, and its color (as seen by the current cycle) is white, then its other color field has to be set to grey. Otherwise a black-to-white link could be set up.
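A minimal sketch of the two-color-field bookkeeping (names invented): each cycle, mark or sweep, selects its color field by the parity of its cycle number.

```python
# Two color fields per node; even-numbered cycles use field 0, odd
# cycles field 1, so the sweep of cycle n can overlap the mark of
# cycle n + 1 without the two interfering.

class Node:
    def __init__(self):
        self.colors = ["white", "white"]   # one field per parity

def color_for(node, cycle):
    return node.colors[cycle % 2]

def set_color(node, cycle, color):
    node.colors[cycle % 2] = color
```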

A. Very Concurrent Garbage Collector

A similar approach is undertaken by Huelsbergen and Winterbottom [9]. Their Very Concurrent Garbage Collector allows mutator, marker and sweeper to execute as three different, autonomous threads. Mutating, marking and sweeping run in consecutive phases. While the marker marks the nth generation of data, the sweeper releases data marked as garbage during the (n-1)th run of the marker. In parallel, the mutator may already alter data marked in the nth generation. This approach is not to be confused with generational garbage collection, though.

All this can be done with virtually no synchronization between the threads. Only the marker and the sweeper have to wait for each other to complete their phases; the sweeper may of course only begin sweeping out the garbage of the nth phase when the marker has completed marking it. There is no need to synchronize the mutator and the two collector threads with each other while the collector is running. Only when a new phase is started does the mutator have to be suspended for a short time so the roots can be marked correctly. For more details on this algorithm I refer to the paper Mark and Sweep [10] of this seminar.

XII. VIRTUAL MEMORY TECHNIQUES

Using a software write-barrier implies the disadvantage of an overhead on all pointer updates done by the mutator. Often this overhead can be reduced with the assistance of the virtual memory system.

An example is the Boehm-Demers-Shenker collector, which marks objects incrementally with the help of the operating system. It uses the dirty bits of virtual memory pages for synchronization. When the algorithm wants to terminate, all mutator threads are suspended and the dirty bits of all pages are checked. For all dirty pages the marking process is restarted, beginning with the root and shaded objects on them. Once this marking run is finished, the collector again tries to terminate.

This method needs only a little overhead in the paging code, which is furthermore only executed when the program pages. On the other hand, one dirty bit per memory page is extremely coarse: many different objects can reside on one page, and altering only one of them sets the dirty bit and forces another marking run over the whole page.

Since this method has to suspend the mutators every time it wants to terminate, and examining the dirty bits and scanning the pages can be quite expensive, this algorithm is usually not the best choice. Its huge advantage, on the other hand, is that it can be implemented without having to modify the compiler. It can thus easily be adapted to support new languages.


Another possibility of utilizing the virtual memory is to use it to implement a snapshot-at-the-beginning write-barrier. An actual snapshot can be created incrementally with copy-on-write pages. This technique has been used by Furusou et al. to implement a concurrent conservative collector for object-oriented languages.

The biggest advantage of this method is that it has the best pause-time behaviour of all the algorithms utilizing virtual memory. The mutator threads only have to be suspended for a short time while the virtual memory tables are prepared for the copy-on-write process. From then on the mutators need not be stopped again.

Once the snapshot is taken, the collector marks on this static image of the heap while the mutators use (and possibly modify) the real data. Furusou actually used Yuasa's algorithm on the copy of the heap. When the collector removes garbage, it does so in the actual data structures. There is no need for further synchronization between mutators and collector.

Tested in practice, Furusou's collector proved to be quite disappointing, though. It was hard to avoid additional copying operations, since copy-on-write copies a page only the first time it is written to.

Additionally, the memory management turned out to be horribly inefficient. The manager was only able to deliver a few thousand objects per second, while it would have to deliver several million to be reasonably usable in a concurrent object-oriented system. Furusou et al. suggested assigning memory to the mutators in units of full pages.

XIII. CONCURRENT REFERENCE COUNTING

Reference counting should be an appropriate algorithm for an incremental garbage collection system, since both techniques rely on a strong interconnection between mutator and collector. Apart from its common drawbacks (inability to detect cyclic garbage, computational expense and close coupling to the user program), reference counting is especially badly suited for multiprocessor environments, though. Updating a reference count has to be synchronized between all threads possibly accessing the object: it must be an atomic operation, thus requiring locks on all affected objects. This further increases the already high overhead of pointer assignment.

    Fig. 7 Modula-2+ concurrent reference counting architecture

One implementation of this scheme is the Modula-2+ collector by Rovner, later improved by DeTreville. Mutator and collector communicate via a transaction queue. The mutator does not keep track of the reference counts itself but logs all changes it applies to pointers in the transaction log. The collector uses this logged data from time to time to recalculate the reference count of every object. Nodes whose count has dropped to zero are destroyed.

Update(A, C) =
    LOCK mutex
        insert(A, C, tq)
        if tq is full
            notify_collector(tq)
            tq = get_next_block()
        *A = C

    Algorithm 5 Mutator code for Modula-2+ collector
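The division of labour can be sketched as follows. This is a deliberately simplified, hypothetical Python model of the transaction-queue idea (no locking, strings standing in for objects, invented names), not Rovner's or DeTreville's actual code:

```python
from collections import defaultdict

# The mutator only logs (old, new) pointer pairs; the collector later
# replays the log into reference counts and reports zero-count objects.

log = []                                 # the transaction queue

def update(cell, field, new):
    """Mutator side: perform the store and log the old/new pair."""
    old = cell.get(field)
    log.append((old, new))
    cell[field] = new

def recount(roots):
    """Collector side: replay the log and return objects whose count
    dropped to zero (candidates for the zero-count list)."""
    counts = defaultdict(int)
    for old, new in log:
        if new is not None:
            counts[new] += 1
        if old is not None:
            counts[old] -= 1
    return [obj for obj, c in counts.items() if c == 0 and obj not in roots]
```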

This can be further optimized by distinguishing between variables only the current thread has access to (like those on the stack or in a register) and variables accessible to all threads (like heap data or global variables). In fact, the collector only reference-counts shared pointer-valued variables. This of course complicates the collector, since the reference count now only represents a lower bound of the actual count and not the full, correct value.

To determine whether an object may be freed, the following system is used. Once a transaction block has been filled, it is transferred to the collector. This block holds the information up to some time t0. The collector now scans the state of each thread, one after the other, holding the mutex so that the thread is not inside an Update operation. The time when all threads have been scanned is t1. After that, the reference counts of all variables are recounted using the data of the transaction block. Variables with a reference count of zero are placed in a Zero-Count List (ZCL).

If an object had a reference count of zero at t0, was not visible in any thread's state between t0 and t1, and does not appear in the transaction queue between t0 and t1, it is garbage and can be freed.

The last structure to be considered is the ZCL. If an object in the ZCL now has a shared reference count higher than zero, it is removed from the list. If it was found in a thread's state, it is left in the ZCL and will eventually be freed. Otherwise the object is removed from the list and freed.

This algorithm was not ideal in many ways. As a reference counter it was not able to reclaim cyclic structures, and the cost of assignments to shared references was quite high. Other problems experienced were working-set size, locality and a tendency of the collector to fall behind the mutator. After testing several different algorithms, a combination of reference counting and mark-sweep collecting was used.

A. Combining reference counting with mark-sweep collecting

An algorithm of that type was first introduced by Deutsch and Bobrow in 1974 [2]. By combining garbage collection and reference counting they tried to eradicate the shortcomings of both systems.

They stated that while reference counting has an overhead proportional to the number of transactions, garbage collection (mark-sweep) has an overhead proportional to the size of the allocated memory, which made them suggest reference counting as the preferred method to reclaim unused space. Since cyclic garbage cannot be detected using this method, they added a mark-sweep collector that ran much less frequently but guaranteed that all garbage was found.

Their reference counter also made use of the transaction log system described above. Any operation with a possible effect on the accessibility of allocated data is logged: the allocation of new memory, the creation of a pointer to an object and the destruction of a pointer to an object. They assumed that a reference count of 1 is the most common one, so they only stored objects not having that count. Objects with a count of 2 or higher are stored in a hashtable named Multi Reference Table (MRT), those with a count of zero in a Zero Count Table (ZCT). Each transaction stored in the log is then inspected and treated as follows:

    Allocate transactions generate an entry in the ZCT since there are no pointers to the new cell yet.

Create pointer transactions delete the referenced value from the ZCT if it is found there (its count is now 1), increase its count if it is in the MRT, or put it in the MRT with a count of 2 if it is found in neither of the two tables.

Delete pointer transactions whose referenced object is found in the MRT trigger a decrement of the count value in the MRT, unless the count is already at its maximum (then it is kept there) or it is 2, in which case the entry is deleted from the MRT. If the entry is not found in the MRT, it had a count of 1, so it is now put in the ZCT.

Now those objects in the ZCT that are not referenced by a program variable can be deleted. This is verified using a hashtable named Variable Count Table (VCT) containing all pointers from the stack. Entries that are in the ZCT but not in the VCT can be reclaimed. Of course, the reference counts of the objects referenced by such a (now deleted) object then have to be decremented. This can be done by storing the information that the object has been deleted in the normal transaction log. On the other hand, there could be a significantly large structure ready to be reclaimed at once; it has to be taken into account that this could negatively affect the mutator through the large number of disk accesses required if the structure occupies space on many pages.
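The three transaction rules above translate almost directly into table operations. The following Python sketch is only an illustration under the stated assumptions; the sticky maximum value and all names are invented, and strings stand in for objects:

```python
# Deutsch-Bobrow style bookkeeping: only counts != 1 are stored.
# The MRT holds counts >= 2, the ZCT holds the zero-count entries.

MAX_COUNT = 255                     # hypothetical sticky maximum

mrt = {}                            # object -> count (>= 2)
zct = set()                         # objects with count 0

def allocate(obj):
    zct.add(obj)                    # no pointers to the new cell yet

def create_pointer(obj):
    if obj in zct:
        zct.discard(obj)            # count is now the implicit 1
    elif obj in mrt:
        if mrt[obj] < MAX_COUNT:
            mrt[obj] += 1
    else:
        mrt[obj] = 2                # was 1, now 2

def delete_pointer(obj):
    if obj in mrt:
        if mrt[obj] == MAX_COUNT:
            pass                    # sticky count, never decremented
        elif mrt[obj] == 2:
            del mrt[obj]            # back to the implicit count of 1
        else:
            mrt[obj] -= 1
    else:
        zct.add(obj)                # count was 1, now 0
```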

A first simple optimization is to ignore allocate and create-pointer operations that directly succeed each other and reference the same object. That pair would put the object into the ZCT and delete it right away again, so it can be ignored.

A further optimization is to store a block of the transaction log as a hashtable and find patterns of operations referencing the same object that cause no change in the MRT or ZCT. That would be the mentioned allocate/create-pointer combination as well as a create/destroy combination. Deutsch and Bobrow argue that even if such an optimization sounds too expensive to be used, it should give good performance, especially if one block of the transaction log contains only a few thousand entries and can be kept in core.

In addition to the reference counting scheme mentioned above, they use a garbage collector. The task of that collector is not only to reclaim cyclically referencing garbage but also to compact and defragment the memory in use; it is not primarily used for the dynamic reclamation of abandoned data that a garbage collector usually serves. The collector they researched is based on Minsky's tracing techniques for linearization and copies the non-garbage data into a linearized form as described by Fenichel and Yochelson.

The whole system they describe is capable of being executed on a separate processor as a true concurrent collector. The mutator only has to generate and send the transaction log and needs no access to the collector. The collector receives the log and calculates the reference counts (along with the MRT and ZCT) from that data. The not yet solved problem of recursively reclaiming freed objects can simply be postponed to the next run of the collector if treated correctly as described above.


Deutsch and Bobrow are certain their system could meet the requirements of a real-time system even when run in a single-processor environment. They do not provide any proof, though.

XIV. BAKER'S ALGORITHM

The incremental copying collector of Baker [1], first presented in 1978, is one of the best-known algorithms using this scheme. Standard copying collectors have sometimes erroneously been referred to as Baker's because of this. It must not be confused with Baker's Treadmill algorithm, though, which is described in the paper of Schatz [11].

Fig. 8 Baker's Tospace layout

Baker adapted Cheney's copying collector so that it can run in parallel with the mutator. Tospace is organized so that all the old surviving data is compacted at the bottom end (B) and newly created objects are placed at the top end (T).

Consequently, all new objects are allocated black. This means the collector does not have to scan the new objects. Even if they have been initialized with links to objects in Fromspace, this does no harm, since the read barrier takes care of it. The drawback is that new objects can only be reclaimed during the next cycle of the collector, even if they were relinquished earlier in the current run. This makes Baker's algorithm more conservative than incremental-update write-barriers (like Steele's) but less so than snapshot-at-the-beginning algorithms (like Yuasa's).

New(n) =
    if B > T - n
        if scan < B
            abort "Could not finish"
        flip()
        for root R
            R = copy(R)
    repeat k times while scan < B
        for P in Children(scan)
            *P = copy(*P)
        scan = scan + size(scan)
    if B == T
        abort "Heap full"
    T = T - n
    return T

read(T) =
    T' = copy(T)
    return T'

Algorithm 6 Baker's incremental copying algorithm

As usual for a copying collector, Baker's starts with a flip of Tospace and Fromspace. This can happen, for example, once the B and T pointers meet. Waiting that long has the advantage of giving objects maximum time to die, which minimizes the copying effort: fewer objects have to be copied to Tospace and more garbage can be reclaimed by a single flip. The disadvantage, on the other side, is that more heap memory is allocated and possibly quite fragmented, which increases the number of paging requests to be fulfilled. So it might be better to start the next run as soon as possible, guaranteeing lower fragmentation and less paging but increasing the need to copy objects from Fromspace to Tospace.

The flip also ensures that Tospace offers sufficient space and expands it if necessary.

Each time the mutator allocates new memory, up to k objects are scanned by the collector, possibly copied to Tospace and then marked grey (or black). When a bigger structure is allocated, this number is increased to n*k (where n is the size of the structure in words).

The read-barrier only affects pointer load operations, while write operations may be done without notice by the collector or the barrier. This allows white objects to die and be reclaimed during the same cycle of the collector: they do not have to be copied when they are merely written to. The read-barrier only monitors access to white nodes, copying them and marking them grey before allowing access to them. Access to grey nodes is allowed freely, avoiding the need to immediately visit an evacuated node and its descendants.
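The essence of the read barrier can be sketched in a few lines of Python; the forward field, the class name and the flat tospace list are invented stand-ins for Baker's actual pointer machinery:

```python
# Baker-style read barrier: loading a pointer to a Fromspace (white)
# object first evacuates it to Tospace, so the mutator only ever sees
# grey or black objects.

tospace = []

class Obj:
    def __init__(self, data):
        self.data = data
        self.forward = None          # set once evacuated

def copy(obj):
    if obj.forward is None:          # not yet evacuated
        new = Obj(obj.data)
        tospace.append(new)          # lands in the unscanned (grey) region
        obj.forward = new
    return obj.forward

def read(ptr):
    """Read barrier on pointer loads: always return the Tospace version."""
    return copy(ptr)
```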


A. Drawbacks of Baker's algorithm

This basic version of the algorithm has a major drawback, as it has to scan the whole root set at flip time. If the root set is big enough, for example when it includes a program stack, the latency can grow quite high. Another problem can be the cost of evacuating larger objects. The usability of Baker's algorithm is also limited by its close connection to the mutator, making it hard to predict when the pauses for the mutator occur and how long they last: it cannot be predicted when an allocation operation has to flip and thus causes a longer pause for the mutator thread.

The first problem can be solved by incrementally scavenging a number (k) of stack cells at each allocation. Doing so spreads the overhead over a wider number of operations, eliminating the one long pause for the mutator. The value of k is recomputed every time; it is adapted so that k relates to the number of stack nodes as the number of stack nodes relates to the number of heap nodes.
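The stated ratio k / stack = stack / heap gives k = stack² / heap; a trivial sketch (the function name and the rounding policy are invented):

```python
# Adaptive choice of k: k relates to the stack size as the stack size
# relates to the heap size, so k = stack_nodes**2 / heap_nodes.

def stack_cells_per_allocation(stack_nodes, heap_nodes):
    return max(1, round(stack_nodes * stack_nodes / heap_nodes))
```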

Incrementally scavenging the stack of course complicates the routines accessing it. Each pop operation may have to adjust the position of the stack's scan pointer. Secondly, the read-barrier needs to be extended so that it also intercepts objects taken from the stack and not only those taken from the heap. The push operation does not need any special attention, since the read-barrier already ensures that only grey or black objects (which thus reside in Tospace) can be accessed.

The process of scanning the stack can begin either at its bottom or at its top. While Baker favours starting at the top, others (Brooks, Steele) suggest starting at the bottom: objects residing nearer to the bottom are changed much less often, so the collector reduces the chance of copying garbage into Tospace.

The solution for the second problem is to copy large objects only lazily, meaning that the whole object is not copied at once but incrementally in parts. This requires an additional pointer in both the Fromspace and the Tospace object, called the forwarding address and the backward link. When the object is evacuated, the necessary space for the object (plus this pointer) is reserved in Tospace. The forwarding address in Fromspace is set to point to the copy in Tospace; the backward link in Tospace points back to the original version of the object in Fromspace. Scanning then continues on the version in Tospace, with the object's data being copied over from Fromspace incrementally.

    Fig. 9 Lazy scanning using forwarding address and backward link

Write accesses to the object have to be intercepted. If the address of the field to be accessed is higher than the position of scan, the data is stored in the original copy in Fromspace; it will be copied to Tospace anyway when the scanning wavefront reaches this address. If the address is lower than scan, the version in Tospace is altered, since scan has already handled that region and the data was already copied to the object in Tospace. When the scanning of the object is finished (meaning scan is higher than the address of the last field of the object in Tospace), the backward link is cleared (set to nil).
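The write rule can be sketched as follows; representing an object as a plain array of fields and all the names are invented assumptions for illustration:

```python
# Lazy copying of a large object: fields the scan pointer has not
# reached yet are written in the Fromspace original (the wavefront
# will copy them over later); fields below scan go straight to the
# Tospace copy, which is already authoritative for that region.

def write_field(from_obj, to_obj, scan, index, value):
    if index >= scan:
        from_obj[index] = value      # will be copied when scan reaches it
    else:
        to_obj[index] = value        # this region was already copied

def scan_step(from_obj, to_obj, scan):
    """Advance the copying wavefront by one field."""
    to_obj[scan] = from_obj[scan]
    return scan + 1
```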

Since the algorithm is so closely interwoven with the mutator, it relies on hardware support to reach full efficiency. The overhead of a micro-coded read-barrier as suggested by Baker, lacking any special hardware support, was predicted to be about 30% by Wholey and Fahlman in 1984. Later research by Zorn in 1990 [8], on the other hand, showed that a well-coded software read-barrier can achieve much better results; he suggests 2% to 6% is possible on modern systems when using appropriate methods.

Another problem is that the cost of accessing an object depends on whether it is found in Fromspace or in Tospace. This makes the behaviour of the read-barrier highly unpredictable, since the cost of an actual access depends on whether the object has already been visited by the collector and transferred to Tospace in the current run, which can hardly be predicted.

The last drawback of Baker's algorithm is that it conservatively allocates new objects black, making it impossible for them to be released before the next run of the collector. This has already been discussed in detail for other algorithms.

B. Variations on Baker

1) Brooks's indirection pointers

Brooks suggested adding the indirection field used for lazy scanning above to all objects. This somewhat simplifies the read-barrier. As long as the object has not been copied, both indirection fields - the one in the original Fromspace version and the one in the Tospace copy - point to the Fromspace object. Once the object has been copied, both indirection pointers reference the version of the object in Tospace.

To ensure no black-to-white pointers are set up, destructive operations like Update have to forward their second argument before installing it, since the mutator can see both the Fromspace and the Tospace version.

Fig. 10 Brooks's forwarding pointers

Actually, this method is more akin to an incremental-update write-barrier algorithm. Its major drawback is the overhead of storing the indirection pointer: applied to LISP cons cells, this means 50% more space consumption. Another property is of course the time penalty of having to follow the indirection pointer on every object access. However, this is partially compensated by the fact that the write-barrier is triggered significantly less frequently than a read-barrier would be.
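A minimal Python sketch of the idea (all names invented): the forwarding field initially points to the object itself, so every access is one unconditional indirection rather than a conditional read barrier.

```python
# Brooks-style indirection: accessing an object always follows its
# forwarding field; before evacuation that field points to the object
# itself, afterwards to the Tospace copy.

class BrooksObj:
    def __init__(self, data):
        self.data = data
        self.forward = self          # self-referential until copied

def access(obj):
    return obj.forward               # always one indirection, no test

def evacuate(obj):
    new = BrooksObj(obj.data)
    obj.forward = new                # both versions now reach the copy
    return new
```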

2) North and Reppy

A further improved version of Brooks's variation was introduced by North and Reppy in 1987. They use Brooks's indirection pointer method but store all those pointers together in a separate dataspace. This reduces the space overhead, since the indirection pointers are stored only once, while the original method stores them twice. The time overhead is slightly increased, though, by the fact that this pointer space needs to be visited by a garbage collector itself.

3) Dawson

Dawson tries to tackle the conservatism of Baker's algorithm, in which newly allocated objects are colored black. The improvement offered by Dawson is to allocate new objects in Fromspace whenever possible and color them white. The next cycle of the collector is also initiated as soon as possible, namely as soon as the last one has finished.

Fig. 11 Heap layout for Dawson's collector

4) MultiLisp implementation by Halstead

Halstead adapted Baker's algorithm so it can be used in a multi-processor environment running Concert MultiLisp. Several processors are able to access the same memory. Each processor has a garbage collection thread of its own and separate Tospace and Fromspace regions. Since they all share the same memory, garbage collection has to be synchronized between all collectors. Halstead's solution adds a lock bit to each pointer and each object, allowing a consistent run of all the collectors.

XV. DYNAMIC REGROUPING

Static regrouping is used to compact the data, bringing related objects nearer to each other. This should result in better runtime efficiency, since paging should be needed less often. When using an incremental garbage collector, the additional improvement of a dynamic regrouping strategy is conceivable; it can execute quite efficiently, since it can be interwoven with the rest of the garbage collector.

One such technique, proposed by Baker and Dawson, is dynamic regrouping. It is another improved version of Baker's copying collector and additionally tries to reorganize the data dynamically to increase the locality of reference.

The original algorithm is adapted so that the mutator always puts its objects (newly allocated elements and those evacuated from Fromspace by the read-barrier) at the top end of Tospace, at position T. The scanner uses an additional pointer previousB (see Figure 11 above for the layout of Tospace). It scans at that position whenever possible instead of using the position of scanB. That way the data structures are traversed in a depth-first manner, which linearizes the data.

This is especially useful for LISP programs that make heavy use of lists: for a list cell, first the cdr and only later the car pointer is followed. By using dynamic regrouping, LISP programs can have their lists linearized, which should improve the execution time significantly, while the technique described here causes little overhead; it is simply a by-product of Baker's algorithm. The only major difference is that the scanner has to decide whether to continue scanning at previousB or at scanB. That is just one additional comparison, though, and has the further advantage of not affecting the mutator in any way.

    Another approach was taken by Courts. His Temporal Garbage Collector exploited the fact that even when a program uses a large amount of heap space, only a considerably smaller fraction of this data is heavily used during a typical session. Using a simple regrouping strategy, he wanted to group that fraction together while more or less ignoring the part of memory that is used rarely or not at all.

    He did that by letting the user run a training session of the most often used functions of the system. That session told the collector which data was really used and which was not. During the session a read-barrier copied all the used objects into the Tospace while the unused ones stayed in Fromspace; the scanner was turned off and did not evacuate any objects. At the end of the training run all the used data was in Tospace while the unused data still resided in Fromspace. The objects in Tospace were marked as static and were not copied during the following collection runs. This training set was also stored on disk to be used in future sessions.
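A toy model of such a training run might look like this; the class and method names are invented for illustration, and real From- and Tospaces are contiguous memory regions rather than dictionaries:

```python
class Heap:
    """Toy model of a training run: a read-barrier evacuates each object
    on first use; the scanner stays off, so untouched objects remain in
    Fromspace and can later be treated as cold data."""
    def __init__(self, objects):
        self.fromspace = dict(objects)   # name -> value, all data initially here
        self.tospace = {}                # objects touched during training

    def read(self, name):
        # read-barrier: evacuate on first access during training
        if name not in self.tospace and name in self.fromspace:
            self.tospace[name] = self.fromspace.pop(name)
        return self.tospace.get(name)
```

After the session, exactly the touched objects sit together in Tospace (the future "static" set), while everything else is left behind in Fromspace.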

    While the underlying idea is good, its major drawback is of course that it is still a simple static method. It neither employs a dynamic regrouping strategy nor adapts to changes in the user's behaviour pattern, and it strictly relies on data that may have been collected years ago. Nonetheless, Courts observed that paging time for his programs was reduced by 30 to 50 percent when applying this technique.

    He later combined this method with an adaptive training strategy. It used a series of shorter training sessions, one at each start of the collector. First a small amount of data was trained without the scanner running, then the rest of the collection continued as usual. This generated a smaller set of often-used data that was, however, more closely related to the currently used functions. By applying further improvements he reached an overall decrease in paging time of up to 75 to 80 percent for his programs. These results were later confirmed by Johnson and Llames, although not as dramatically as described by Courts.

    LIST OF FIGURES

    1. Possible failure of tricolor marking system ... 4
    2. Mutator updates pointer from B to C ... 6
    3. Yuasa's snapshot write-barrier ... 6
    4. Dijkstra's write-barrier ... 7
    5. Steele's write-barrier ... 8
    6. Dijkstra's quadratic complexity ... 9
    7. Modula-2+ concurrent ref. counting architecture ... 11
    8. Baker's Tospace layout ... 12
    9. Lazy scanning ... 14
    10. Brooks's forwarding pointers ... 14
    11. Heap layout for Dawson's collector ... 15

    REFERENCES

    [1] H. G. Baker, "List processing in real time on a serial computer," Commun. ACM, vol. 21, no. 4, pp. 280-294, 1978.
    [2] L. P. Deutsch, D. G. Bobrow, "An efficient incremental automatic garbage collector," Commun. ACM, vol. 19, no. 9, pp. 522-526, Sept. 1976.
    [3] H. G. Baker, "List processing in real-time on a serial computer," Commun. ACM, vol. 21, no. 4, pp. 280-294, 1978.
    [4] D. M. Russinoff, "A mechanically verified incremental garbage collector," Formal Aspects of Computing, vol. 6, pp. 359-390, 1994.
    [5] C. Queinnec et al., "Mark DURING Sweep rather than Mark THEN Sweep," Lecture Notes in Computer Science, vol. 365, 1989.
    [6] G. L. Steele, "Multiprocessing compactifying garbage collection," Commun. ACM, vol. 18, no. 9, pp. 495-508, Sept. 1975.
    [7] E. W. Dijkstra et al., "On-the-fly garbage collection: An exercise in cooperation," Commun. ACM, vol. 21, no. 11, pp. 965-975, Nov. 1978.
    [8] B. Zorn, "Barrier Methods for Garbage Collection," Technical Report CU-CS-494-90, University of Colorado at Boulder, 1990.
    [9] L. Huelsbergen, P. Winterbottom, "Very concurrent mark-&-sweep garbage collection without fine-grain synchronization," Symposium on Memory Management, Vancouver, pp. 166-175, 1998.
    [10] Schartner, "Mark & Sweep," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.
    [11] R. Schatz, "Incremental Garbage Collection II," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.
    [12] T. Würthinger, "Incremental Garbage Collection III," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.