Monday, January 19, 2009

Back to basics: Why use garbage collection

This is part of a series of posts on GC. See the index here.

Garbage collection does exactly what its fancier name, “automatic dynamic memory management”, suggests. As I discussed in my last post on memory allocation, dynamic memory is hard to manage; GC attempts to do that automatically and relieves the coder of the hard task.

GC basically takes care of two scenarios: removing garbage and avoiding dangling pointers. The two are closely related, but they are different scenarios.

Garbage collection

Consider a typical object reference graph. In the graph every rectangle is an object and the arrows denote object references, so A is an object which references B. For a program to be able to use an object, the object must be in one of these reference chains. The reference chains start from what are called roots, which are typically references held in registers, on the stack as local variables, or in global variables.

image

Let’s assume that due to some operation, A relinquishes the reference to B and the graph becomes something like this…

image 

Now B, and hence C, is not reachable from any valid root in the program, so both have become garbage (unreachable). Without GC, the programmer must make sure to follow all such references and free (de-allocate) the objects. One of the duties of a GC system is to automate this process by tracking down such objects (using various algorithms) and reclaiming the memory they use. So in a GC system, when the reference is broken, the collector will figure out that B and hence C are not reachable and will de-allocate them.
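The reachability idea above can be sketched in a few lines of Python. This is only a toy model of the "tracking down" step, assuming the graph from the figures with A as the single root; the names `reachable`, `references` and `roots` are made up for illustration and do not correspond to any real collector's API.

```python
# Toy object graph: each key is an object, each value lists the objects it references.
# A, B and C follow the figures; everything else here is illustrative.
def reachable(roots, references):
    """Return the set of objects reachable from the roots (a simple 'mark' pass)."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(references.get(obj, []))
    return marked

references = {"A": ["B"], "B": ["C"], "C": []}
roots = ["A"]

# Before the reference is broken, nothing is garbage.
before = set(references) - reachable(roots, references)

# A relinquishes its reference to B.
references["A"].remove("B")

# Anything no longer reachable from a root is garbage: here, B and C.
garbage = set(references) - reachable(roots, references)
```

A real collector walks actual memory rather than a dictionary, but the principle is the same: whatever the walk from the roots does not reach can be reclaimed.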

Hanging/dangling reference

Let’s consider another object graph which is similar to the one above, but in which, in addition to A, another object A’ also has a reference to B (in other words, B is shared between them).

image

Here too, after some operation, object A no longer needs its reference to B. The programmer does what he thinks is right and de-allocates B and C.

image

However, A’ still has a reference to B, and that reference is now hanging, or in more specific terms, pointing to invalid memory; it would typically return unpredictable results when accessed. The key here is the unpredictable behavior. The program will not necessarily crash. Until the memory location of B is re-used, it will seem to hold valid data and de-references from A’ will work fine. So the failure will come up in unexpected ways and in totally unrelated places in the program, which makes locating the root cause extremely hard.
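To make the "seems fine until the memory is re-used" behavior concrete, here is a small simulation. Python itself is garbage collected, so the sketch models manual memory with a hypothetical `ToyHeap` class (fixed slots plus a free list); the class and all names in it are assumptions for illustration, not a real allocator.

```python
class ToyHeap:
    """A hypothetical manual allocator: fixed slots plus a free list."""

    def __init__(self, size):
        self.slots = [None] * size
        # Reversed so that pop() hands out slot 0 first.
        self.free_list = list(range(size - 1, -1, -1))

    def alloc(self, value):
        addr = self.free_list.pop()
        self.slots[addr] = value
        return addr

    def free(self, addr):
        # Like many real allocators, freeing does not wipe the old contents.
        self.free_list.append(addr)

    def read(self, addr):
        return self.slots[addr]


heap = ToyHeap(4)
b = heap.alloc("B's data")
a_ref = b          # A holds B's address
a_prime_ref = b    # A' shares the same address

heap.free(a_ref)   # B is freed on A's behalf; A' still points at it

stale_read = heap.read(a_prime_ref)    # still "B's data": looks perfectly valid
other = heap.alloc("unrelated data")   # the allocator re-uses B's slot
corrupt_read = heap.read(a_prime_ref)  # A' now silently sees someone else's data
```

Nothing fails at the point of the bad `free`. The wrong value only shows up later, when some unrelated allocation happens to land in the same slot.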

GC helps by automatically taking care of both of the above scenarios and ensuring that the system doesn’t end up in either of them.

How important is GC

If GC is so cool, then why don’t all systems use it? There are multiple reasons, but with newer technology most of them are going away. One of the primary reasons for not using GC is the performance overhead. This makes GC less attractive for real-time systems, device drivers and even gaming platforms. However, there are examples where GC is used even on these systems, e.g. the .NET Compact Framework (the team I work for) is used with full GC support very successfully in XNA games on Xbox.

However, in some systems, such as functional languages, which rely heavily on closures and their deferred execution, the execution flow becomes very unpredictable and hence GC becomes almost mandatory.
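A small closure example hints at why. Python is not a functional language, but its closures behave similarly for this purpose; `make_counter` is a made-up name used only for illustration.

```python
def make_counter():
    count = 0  # local to make_counter, yet it must outlive the call

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()  # make_counter has returned, but `count` is still alive
first = counter()
second = counter()
```

Here `count` can only be reclaimed once the last reference to `counter` goes away, and that can happen anywhere in the program, at any time. Working that out by hand for every captured variable quickly becomes impractical, which is where a collector earns its keep.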
