Garbage Collector in .NET



Garbage collector is a familiar term to developers working with high-level languages and platforms such as Java, .NET, and Ruby.

We all know that the garbage collector (GC) is responsible for memory management in the applications we write, but how, at the end of the day, does the GC actually do that?

That is what we are going to look at in today's discussion. We'll take up the case of the garbage collector in the .NET Framework: how it works internally and how we can control it manually.

To make things clearer for readers (according to my understanding): garbage collection is a concept that can be implemented with any of the algorithms described in books and on the web, or you can even come up with your own algorithm that best suits your environment (by environment I mean an OS, a custom framework, a language you have created, etc.). The reference at the end of the article links to descriptions of various garbage collection algorithms.

Ok, coming back to how Garbage collection is implemented in .NET:

The .NET CLR (Common Language Runtime) implements garbage collection for us. All objects are allocated from the heap (also termed the managed heap). When a process is initialized, the CLR reserves a contiguous region of address space, and this region is the managed heap.

The heap also maintains a pointer (let us call it ptr), which indicates where the next object in the address space is to be allocated. Initially this pointer is set to the base address of the managed heap. Whenever 'new' is used to create an object in C#/VB/Visual C++, it is compiled into the IL instruction 'newobj', which tells the CLR to do the following things:

  1. Calculate the number of bytes required for that type.
  2. Add the bytes required for the object's overhead (we'll learn more on this next).
  3. Check whether the required bytes are available on the managed heap. If they are, the object is placed starting at the address pointed to by ptr, and ptr is then advanced past the object. To elaborate a bit more:

    gar1.gif

    As you can see, the pointer points to the location where the next object will be allocated in the heap.
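The three steps above can be sketched as a toy "bump pointer" allocator. This is only a model for illustration: BumpHeap, Allocate, and the byte counts passed in are hypothetical names and values, not CLR internals.

```csharp
using System;

// Toy model of the managed heap's "next object" pointer.
// BumpHeap and Allocate are hypothetical names used only for this sketch.
public sealed class BumpHeap
{
    private readonly int _capacity;          // size of the reserved range, in bytes

    // Offset at which the next object will start (the article's ptr).
    public int Ptr { get; private set; }

    public BumpHeap(int capacity)
    {
        _capacity = capacity;
        Ptr = 0;                              // initially the base of the heap
    }

    // Returns the start offset of the new object, or -1 if there is no room
    // (the point at which a real runtime would start a garbage collection).
    public int Allocate(int fieldBytes, int overheadBytes)
    {
        int total = fieldBytes + overheadBytes;   // steps 1 and 2
        if (total > _capacity - Ptr)              // step 3: is there room?
            return -1;

        int address = Ptr;
        Ptr += total;                             // advance past the new object
        return address;
    }
}
```

With a 64-byte heap, two calls to Allocate(16, 8) return offsets 0 and 24, and a third call returns -1 because only 16 bytes remain.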

Coming back to the 2nd point in the steps above: every object in .NET requires some additional space, the size of which depends on whether the process runs as 32-bit or 64-bit. Among other things, this additional space stores the object's metadata (sometimes called the header information).
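As a rough illustration of the 32-bit versus 64-bit dependency, the sketch below estimates the overhead as two pointer-sized fields. That two-field layout is an assumption based on how the CLR object header is commonly described; ObjectOverhead and EstimateBytes are hypothetical names.

```csharp
using System;

public static class ObjectOverhead
{
    // Assumption: the per-object overhead is two pointer-sized fields
    // (a sync block index slot and a pointer to the type's metadata).
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static int EstimateBytes()
    {
        return 2 * IntPtr.Size;
    }
}
```

Under that assumption the estimate works out to 8 bytes in a 32-bit process and 16 bytes in a 64-bit process.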

Let us compare this more closely with memory allocation in C and see why .NET fares well with this approach:

In C, allocating memory for an object typically requires walking a linked list of free blocks. When a block large enough for the object is found, that block is split, and the list's pointers are updated to reflect the new start addresses of the allocated and unallocated blocks. In .NET, by contrast, allocating an object is as simple as moving the pointer to the next free address in the contiguous managed heap. Objects allocated one after another in C could end up separated by a sizeable number of bytes, in some cases even megabytes, whereas in .NET they are allocated contiguously. This gives .NET allocation the advantage of 'locality of reference', which brings performance gains for objects that share a strong relationship.

For this allocation scheme to work, the CLR needs a large enough amount of contiguous free space at its disposal. To achieve this, .NET uses the garbage collector.

Let us see how GC in .NET works:

In the previous discussion we saw the case where space is available: the CLR simply goes ahead and allocates the object. But what if space isn't available? This is where the concept of generations comes in. The CLR's managed heap is organized into generations 0, 1, and 2, and this is still the case in CLR 4 (.NET 4).
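The generations can be observed from code through the System.GC class. The sketch below uses the real GC.GetGeneration and GC.MaxGeneration members; GenerationDemo and its method names are hypothetical wrappers added for illustration.

```csharp
using System;

public static class GenerationDemo
{
    // Generation the runtime reports for a freshly allocated small object.
    public static int GenerationOfNewObject()
    {
        object fresh = new object();
        return GC.GetGeneration(fresh);
    }

    // Highest generation number the runtime supports.
    public static int HighestGeneration()
    {
        return GC.MaxGeneration;
    }
}
```

On Microsoft's CLR a newly created small object is normally reported in generation 0, and the highest generation is normally 2; other runtimes may report different values.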

The GC checks whether any object in the heap is no longer being used by the application; if so, that object is removed from the heap and its memory is recovered. But how does the CLR determine whether an object is still referenced (that is, used by the application)? That makes an interesting topic in itself, but to cut it short and keep it simple (which, mind you, it isn't): every application has a set of roots. These roots are nothing but pointers to reference type objects, so each root either points to an object or is null.

Roots can be static fields defined within a type, method parameters, or local variables of reference type (value types aren't considered roots).

So when the GC starts, all objects allocated in the managed heap are assumed to be garbage, and each one has to be checked to see whether the application still uses it. This stage is called the marking phase. If a root referring to an object in the managed heap is found, that object is marked. Marking is done by turning on a bit in the object's sync block index field (part of the per-object overhead we discussed in step 2 of memory allocation).

gar2.gif

In the above figure, the application's roots refer directly to objects A and C, so both of these objects are marked. While marking ObjC, the GC sees that it in turn refers to ObjF, so ObjF is also marked.

After all roots are traversed we have two sets of objects: those that are marked and those that are not. The unmarked objects are considered garbage, and their memory is reclaimed.
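The marking phase amounts to a graph traversal starting from the roots. The following is a minimal sketch over a toy object graph, not the CLR's actual implementation: Node, Marker, and the Marked flag (standing in for the mark bit) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy object for illustrating the marking phase.
public sealed class Node
{
    public string Name { get; }
    public bool Marked { get; set; }                        // stands in for the mark bit
    public List<Node> References { get; } = new List<Node>();

    public Node(string name) { Name = name; }
}

public static class Marker
{
    // Marks every object reachable from the roots.
    // Whatever is still unmarked afterwards is garbage.
    public static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>();
        foreach (Node root in roots)
        {
            if (root != null) pending.Push(root);           // a root may be null
        }

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Marked) continue;                   // already visited (handles cycles)
            current.Marked = true;

            foreach (Node child in current.References)
            {
                if (child != null) pending.Push(child);
            }
        }
    }
}
```

For example, with nodes A, C, and F where C refers to F, plus a node B that nothing refers to, calling Marker.Mark(new[] { a, c }) leaves A, C, and F marked and B unmarked.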

Next comes the compaction phase, in which the collector traverses the heap linearly looking for contiguous blocks of unmarked (garbage) objects. The GC moves the marked objects down in memory to compact the heap. Because the marked objects are at new addresses after compaction, the variables and CPU registers that hold pointers to them must be corrected as well.
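Compaction can be pictured as sliding the surviving objects toward the base of the heap. The sketch below is a simplified model with hypothetical names (HeapObject, Compactor). References in this toy hold the HeapObject itself, so updating its Address field stands in for the pointer correction described above.

```csharp
using System;
using System.Collections.Generic;

// Toy heap entry: start offset, size in bytes, and mark bit.
public sealed class HeapObject
{
    public string Name;
    public int Address;
    public int Size;
    public bool Marked;
}

public static class Compactor
{
    // Drops unmarked objects, slides marked ones down in address order,
    // and returns the new position of the "next object" pointer.
    public static int Compact(List<HeapObject> heap)
    {
        heap.Sort((x, y) => x.Address.CompareTo(y.Address));
        heap.RemoveAll(o => !o.Marked);          // reclaim the garbage

        int next = 0;
        foreach (HeapObject obj in heap)
        {
            obj.Address = next;                  // the object's new location
            obj.Marked = false;                  // clear the mark for the next collection
            next += obj.Size;
        }
        return next;
    }
}
```

Given A at offset 0 (24 bytes, marked), B at 24 (16 bytes, unmarked), and C at 40 (32 bytes, marked), Compact removes B, moves C down to offset 24, and returns 56 as the new allocation pointer.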

This is, naturally, a performance hit that comes with garbage collection, which is why invoking the GC from your own code is not normally considered good programming practice.
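For completeness, this is what invoking the GC from code looks like. The sketch uses the real GC.Collect(int) and GC.CollectionCount(int) members; CollectionCounter and ForceGen0AndCount are hypothetical names, and the explicit collection is shown for demonstration only.

```csharp
using System;

public static class CollectionCounter
{
    // Forces a generation 0 collection and returns how many additional
    // generation 0 collections the runtime reports afterwards.
    public static int ForceGen0AndCount()
    {
        int before = GC.CollectionCount(0);
        GC.Collect(0);                        // explicit collection: demonstration only
        return GC.CollectionCount(0) - before;
    }
}
```

The returned value should be at least 1. It can be higher if the runtime performs collections of its own in the meantime.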

Normally, all of the above happens when generation 0 is full, that is, when the objects allocated so far occupy all the space available to generation 0 and there is no room left for a new object.

So in this article we saw how memory allocation is performed in .NET and how the Garbage collection takes place.

We'll see more on the topic in the next article, along with how we can manually control the lifetime of an object through the managed APIs exposed by the .NET Framework.

Reference: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science), where you can find some good algorithms for implementing garbage collection.

Author:

Suchit Khanna.
  

