Classes, Objects, and References
It is important to further clarify the distinction between classes, objects, and reference variables. Recall that a class is nothing more than a blueprint that describes how an instance of this type will look and feel in memory.
Classes, of course, are defined within a code file (which in C# takes a *.cs extension by convention). Consider the following simple Car class defined within a new C# application project named SimpleGC:
// Car.cs
public class Car
{
    public int CurrentSpeed { get; set; }
    public string PetName { get; set; }
    public Car() { }
    public Car(string name, int speed)
    {
        PetName = name;
        CurrentSpeed = speed;
    }
    public override string ToString()
    {
        return string.Format("{0} is going {1} MPH",
            PetName, CurrentSpeed);
    }
}
After a class has been defined, you may allocate any number of objects using the C# new keyword. Understand, however, that the new keyword returns a reference to the object on the heap, not the actual object itself. If you declare the reference variable as a local variable in a method scope, the reference itself is stored on the stack for further use in your application. When you want to invoke members on the object, apply the C# dot operator to the stored reference, like so:
using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("***** GC Basics *****");
        // Create a new Car object on
        // the managed heap. We are
        // returned a reference to this
        // object ("refToMyCar").
        Car refToMyCar = new Car("Zippy", 50);
        // The C# dot operator (.) is used
        // to invoke members on the object
        // using our reference variable.
        Console.WriteLine(refToMyCar.ToString());
        Console.ReadLine();
    }
}
Figure 13-1 illustrates the class, object, and reference relationship.
Figure 13-1. References to objects on the managed heap
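To see the distinction between an object and a reference in code, consider the following sketch. It uses a condensed copy of the Car class so that it compiles on its own; ReferenceDemo and CopyReference() are hypothetical names used only for illustration. Assigning one reference variable to another copies the reference, not the object, so both variables end up pointing at the same Car on the managed heap.

```csharp
using System;

// Condensed copy of the Car class above, repeated so this sketch compiles on its own.
public class Car
{
    public int CurrentSpeed { get; set; }
    public string PetName { get; set; } = "";
    public Car() { }
    public Car(string name, int speed) { PetName = name; CurrentSpeed = speed; }
    public override string ToString() => $"{PetName} is going {CurrentSpeed} MPH";
}

public static class ReferenceDemo
{
    // Copies a reference (not the object) and modifies the Car through the copy.
    public static (Car original, Car copy) CopyReference()
    {
        Car refToMyCar = new Car("Zippy", 50);
        Car anotherRef = refToMyCar;   // copies the reference; no new Car is created
        anotherRef.CurrentSpeed = 75;  // changes the one shared Car on the heap
        return (refToMyCar, anotherRef);
    }
}
```

Calling Console.WriteLine(ReferenceDemo.CopyReference().original) from Main would print Zippy is going 75 MPH, because the change made through anotherRef is visible through refToMyCar.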
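Structures are value types. A structure declared as a local variable is typically allocated directly on the stack rather than on the .NET managed heap; however, a structure stored in a field of a class or as an element of an array lives on the managed heap as part of that object, and a boxed structure becomes a new heap object of its own. In other words, heap allocation is driven by instances of reference types (classes, including arrays and strings) and by boxing, not by structures as such.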
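The following sketch contrasts this with a structure. Point, ValueTypeDemo, and CopyAndBox() are hypothetical names. Assigning one Point variable to another copies the whole value, and boxing a Point into an object variable copies it into a new object on the managed heap, so later changes to the original are not seen by either copy.

```csharp
using System;

// A hypothetical value type used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class ValueTypeDemo
{
    public static (int originalX, int copyX, int boxedX) CopyAndBox()
    {
        Point p1 = new Point { X = 10, Y = 20 }; // local struct: typically lives on the stack
        Point p2 = p1;                           // assignment copies the entire value
        p2.X = 99;                               // p1 is unaffected

        object boxed = p1;                       // boxing copies p1 into a new heap object
        p1.X = 42;                               // the boxed copy still has X == 10

        return (p1.X, p2.X, ((Point)boxed).X);
    }
}
```

CopyAndBox() returns (42, 99, 10): the local p1 was changed after boxing, p2 is an independent copy, and the boxed copy still holds the value p1 had when it was boxed.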
The Basics of Object Lifetime
When you are building your C# applications, you are correct to assume that the .NET runtime environment (a.k.a. the CLR) will take care of the managed heap without your direct intervention. In fact, the golden rule of .NET memory management is simple:
"Allocate a class instance onto the managed heap using the new keyword and forget about it."
Once an object has been instantiated, the garbage collector will destroy it when it is no longer needed. The next obvious question, of course, is, “How does the garbage collector determine when an object is no longer needed?” The short (i.e., incomplete) answer is that the garbage collector removes an object from the heap only if it is unreachable by any part of your code base. Assume you have a method in your Program class that allocates a local Car object as follows:
static void MakeACar()
{
    // If myCar is the only reference to the Car object,
    // it *may* be destroyed when this method returns.
    Car myCar = new Car();
}
Notice that this Car reference (myCar) has been created directly within the MakeACar() method and has not been passed outside of the defining scope (via a return value or ref/out parameters). Thus, once this method call completes, the myCar reference is no longer reachable, and the associated Car object is now a candidate for garbage collection. Understand, however, that you can't guarantee that this object will be reclaimed from memory immediately after MakeACar() has completed. All you can assume at this point is that when the CLR performs the next garbage collection, the Car object that myCar referred to can be safely destroyed.
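One way to observe this reachability rule, offered here as a sketch rather than a recommended practice, is to hand out a System.WeakReference, which tracks an object without keeping it alive. ReachabilityDemo and IsCarAliveAfterCollection() are hypothetical names, and the explicit GC.Collect() calls exist only to make the effect observable; production code should normally let the CLR decide when to collect.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Car
{
    public string PetName { get; set; } = "";
}

public static class ReachabilityDemo
{
    // NoInlining keeps the local myCar confined to this method's frame, so the
    // JIT cannot extend its lifetime into the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference MakeACar()
    {
        Car myCar = new Car { PetName = "Zippy" };
        return new WeakReference(myCar); // a weak reference does not keep the Car alive
    }

    public static bool IsCarAliveAfterCollection()
    {
        WeakReference weak = MakeACar();
        // Forced collections are for demonstration purposes only.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

On CoreCLR, IsCarAliveAfterCollection() should return false, because once MakeACar() returns nothing roots the Car. The NoInlining attribute is there because the JIT might otherwise inline the method, and in Debug builds it keeps locals alive until the end of the enclosing method, either of which could keep the Car reachable longer than the source code suggests.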
As you will most certainly discover, programming in a garbage-collected environment greatly simplifies your application development. In stark contrast, C++ programmers are painfully aware that if they fail to manually delete heap-allocated objects, memory leaks are never far behind. In fact, tracking down memory leaks is one of the most time-consuming (and tedious) aspects of programming in unmanaged environments. By allowing the garbage collector to take charge of destroying objects, the burden of memory management has been lifted from your shoulders and placed onto those of the CLR.
The CIL of new
When the C# compiler encounters the new keyword, it emits a CIL newobj instruction into the method implementation. If you compile the current example code and investigate the resulting assembly using ildasm.exe, you'd find the following CIL statements within the MakeACar() method:
.method private hidebysig static void MakeACar() cil managed
{
    // Code size 8 (0x8)
    .maxstack 1
    .locals init ([0] class SimpleGC.Car myCar)
    IL_0000: nop
    IL_0001: newobj instance void SimpleGC.Car::.ctor()
    IL_0006: stloc.0
    IL_0007: ret
} // end of method Program::MakeACar
Before we examine the exact rules that determine when an object is removed from the managed heap, let's check out the role of the CIL newobj instruction in a bit more detail. First, understand that the managed heap is more than just a random chunk of memory accessed by the CLR. The .NET garbage collector is quite a tidy housekeeper of the heap, given that it will compact empty blocks of memory (when necessary) for purposes of optimization. To aid in this endeavor, the managed heap maintains a pointer (commonly referred to as the next object pointer or new object pointer) that identifies exactly where the next object will be located. With this in mind, the newobj instruction tells the CLR to perform the following core operations:
- Calculate the total amount of memory required for the object to be allocated (including the memory required by the data members and the base classes).
- Examine the managed heap to ensure that there is indeed enough room to host the object to be allocated. If there is, the specified constructor is called and the caller is ultimately returned a reference to the new object in memory, whose address just happens to be identical to the last position of the next object pointer.
- Finally, before returning the reference to the caller, advance the next object pointer to point to the next available slot on the managed heap.
Figure 13-2. The details of allocating objects onto the managed heap
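The three newobj steps can be modeled with a simple bump-pointer allocator. The sketch below is a hypothetical toy, not the CLR's actual allocator: ToyHeap, Allocate(), and the 16-byte per-object header are assumptions for illustration, and the toy does not run a constructor. It does show why allocation on the managed heap is cheap: the new object's address is simply the current value of the next object pointer, which then advances by the object's size.

```csharp
using System;

// A toy model (hypothetical, NOT the CLR's implementation) of the newobj steps.
public sealed class ToyHeap
{
    private readonly int _capacity;

    // Offset of the next free byte: the "next object pointer".
    public int NextObjectPointer { get; private set; }

    public ToyHeap(int capacity) => _capacity = capacity;

    // Returns the "address" (offset) of the new object, or null when the heap is
    // full, which is the point where the real CLR would trigger a collection.
    // headerBytes = 16 is an assumed per-object overhead for illustration.
    public int? Allocate(int fieldBytes, int baseClassBytes, int headerBytes = 16)
    {
        int total = fieldBytes + baseClassBytes + headerBytes; // step 1: total size
        if (NextObjectPointer + total > _capacity)             // step 2: enough room?
            return null;
        int address = NextObjectPointer;                       // step 3: object starts here...
        NextObjectPointer += total;                            // ...and the pointer advances
        return address;
    }
}
```

For example, with a ToyHeap of 100 bytes, Allocate(24, 0) returns 0 and moves the pointer to 40; a second call, Allocate(8, 8), returns 40 and moves it to 72; a third Allocate(24, 0) returns null, since another 40 bytes would not fit.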
As your application is busy allocating objects, the space on the managed heap may eventually become full. When processing the newobj instruction, if the CLR determines that the managed heap does not have sufficient memory to allocate the requested type, it will perform a garbage collection in an attempt to free up memory. Thus, the next rule of garbage collection is also quite simple:
"If the managed heap does not have sufficient memory to allocate a requested object, a garbage collection will occur."
Exactly how this garbage collection occurs, however, depends on which version of the .NET platform your application is running under.
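You can watch the allocation-triggered rule at work with GC.CollectionCount(), which reports how many collections of a given generation have occurred since the process started. In the sketch below (AllocationPressureDemo and Gen0CollectionsDuringChurn() are hypothetical names), the loop allocates many short-lived 1 KB arrays and never calls GC.Collect(); any increase in the count therefore comes from collections the CLR started on its own when it ran out of room for new allocations.

```csharp
using System;

public static class AllocationPressureDemo
{
    // Allocates many short-lived 1 KB arrays and reports how many generation 0
    // collections happened meanwhile. No explicit GC.Collect() call is made.
    public static int Gen0CollectionsDuringChurn(int iterations)
    {
        var keep = new byte[16][];   // small ring buffer so the allocations escape
        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
        {
            // Overwriting a slot makes the previous array in it unreachable.
            keep[i % keep.Length] = new byte[1024];
        }
        GC.KeepAlive(keep);
        return GC.CollectionCount(0) - before;
    }
}
```

The exact number depends on the runtime version, GC configuration, and machine, but with a million iterations (roughly 1 GB of total allocations) it should be greater than zero.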
The Role of Application Roots
So how does the garbage collector determine when an object is no longer needed? To understand the details, you need to be aware of the notion of application roots. Simply put, a root is a storage location containing a reference to an object on the managed heap. Strictly speaking, a root can fall into any of the following categories:
- References to global objects (though these are not allowed in C#, CIL code does permit allocation of global objects)
- References to any static objects/static fields
- References held by local variables in methods that are still executing
- References to object parameters passed into a method
- References to objects waiting to be finalized (described later in this chapter)
- Any CPU register that references an object
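The effect of a root can be observed by comparing an object referenced from a static field with the same object after that field is cleared. RootsDemo, ParkInGarage(), ClearGarage(), and s_garageCar are hypothetical names for this sketch, and the forced collections are for demonstration only.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Car
{
    public string PetName { get; set; } = "";
}

public static class RootsDemo
{
    // A static field is an application root: whatever it references stays reachable.
    private static Car? s_garageCar;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference ParkInGarage()
    {
        s_garageCar = new Car { PetName = "Rusty" };
        return new WeakReference(s_garageCar);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ClearGarage() => s_garageCar = null;

    private static void FullCollect()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    // Reports whether the Car survived while rooted, and after the root was cleared.
    public static (bool aliveWhileRooted, bool aliveAfterUnrooted) Run()
    {
        WeakReference weak = ParkInGarage();
        FullCollect();
        bool whileRooted = weak.IsAlive;  // the static field still roots the Car
        ClearGarage();
        FullCollect();
        return (whileRooted, weak.IsAlive);
    }
}
```

Run() should return (true, false) on CoreCLR: the static field keeps the Car alive through the first pair of collections, and once it is set to null the Car has no remaining root and is reclaimed.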
During a garbage collection process, the runtime will investigate objects on the managed heap to determine whether they are still reachable (i.e., rooted) by the application. To do so, the CLR will build an object graph that represents each reachable object on the heap. Object graphs are explained in some detail during the discussion of object serialization in Chapter 20. For now, just understand that object graphs are used to document all reachable objects. As well, be aware that the garbage collector will never graph the same object twice, thus avoiding the nasty circular-reference problem found in the reference-counting scheme of COM programming.
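Because the collector traces reachability from roots rather than counting references, a cycle of objects with no root is still garbage. The sketch below builds two objects of a hypothetical Node type that refer to each other, returns only weak references to them, and checks whether either survives a forced collection. CycleDemo, MakeCycle(), and EitherSurvives() are likewise illustrative names.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical node type: each instance can point at another Node.
public class Node
{
    public Node? Other;
}

public static class CycleDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static (WeakReference, WeakReference) MakeCycle()
    {
        var a = new Node();
        var b = new Node();
        a.Other = b;   // a -> b
        b.Other = a;   // b -> a: under reference counting, neither count would reach zero
        return (new WeakReference(a), new WeakReference(b));
    }

    public static bool EitherSurvives()
    {
        var (weakA, weakB) = MakeCycle();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        // Tracing from the roots never reaches a or b, so both are reclaimed.
        return weakA.IsAlive || weakB.IsAlive;
    }
}
```

EitherSurvives() should return false: in a COM-style reference-counting scheme each node's count would stay at one forever, but the tracing collector simply never reaches either node from a root.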
Assume the managed heap contains a set of objects named A, B, C, D, E, F, and G. During a garbage collection, these objects (as well as any internal object references they may contain) are examined for active roots. After the graph has been constructed, unreachable objects (which we will assume are objects C and F) are marked as garbage. Figure 13-3 diagrams a possible object graph for the scenario just described (you can read the directional arrows using the phrase "depends on" or "requires"; for example, E depends on G and B, A depends on nothing, and so on).
Figure 13-3. Object graphs are constructed to determine which objects are reachable by application roots
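The marking step can be sketched as a graph traversal that starts at the roots and follows references, skipping any object it has already visited. MarkPhaseSketch and FindGarbage() are hypothetical, and the objects are plain string names rather than heap addresses. Since the text gives only some of the arrows in Figure 13-3, the roots and edges used in the example after the code are assumptions chosen to reproduce the described outcome.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MarkPhaseSketch
{
    // Returns the names of the objects that are NOT reachable from any root.
    public static IReadOnlyList<string> FindGarbage(
        IEnumerable<string> allObjects,
        IEnumerable<string> roots,
        IReadOnlyDictionary<string, string[]> references)
    {
        var marked = new HashSet<string>();
        var pending = new Stack<string>(roots);
        while (pending.Count > 0)
        {
            string obj = pending.Pop();
            if (!marked.Add(obj)) continue;   // already in the graph: never visit twice
            if (references.TryGetValue(obj, out var targets))
                foreach (var target in targets) pending.Push(target);
        }
        return allObjects.Where(o => !marked.Contains(o)).ToList();
    }
}
```

For instance, assuming the roots are A, D, and E, that E refers to G and B, and that C refers to F, calling FindGarbage over the objects A through G returns C and F. Note that F is garbage even though C refers to it, because C itself is unreachable.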
After objects have been marked for termination (C and F in this case, since they are not accounted for in the object graph), they are swept from memory. At this point, the remaining space on the heap is compacted, which in turn causes the CLR to modify the set of active application roots (and the underlying pointers) to refer to the correct memory locations (this is done automatically and transparently). Last but not least, the next object pointer is readjusted to point to the next available slot.
Figure 13-4 illustrates the resulting readjustment.
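The compaction step can be sketched in the same spirit. In this hypothetical model (HeapObject, CompactionSketch, and Compact() are illustrative names, not CLR types), each object has an offset and a size; survivors slide down toward the start of the heap in address order, each survivor's new offset is the address its roots must be updated to, and the final running total becomes the readjusted next object pointer. The real CLR's compaction is considerably more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical): each object has a name, a current offset, and a size.
public record HeapObject(string Name, int Offset, int Size);

public static class CompactionSketch
{
    // Slides live objects down, in address order, to the start of the heap.
    // Returns each survivor's new offset (where its roots must now point)
    // and the readjusted next object pointer.
    public static (Dictionary<string, int> newOffsets, int nextObjectPointer) Compact(
        IEnumerable<HeapObject> heap, ISet<string> live)
    {
        var newOffsets = new Dictionary<string, int>();
        int next = 0;
        foreach (var obj in heap.OrderBy(o => o.Offset))
        {
            if (!live.Contains(obj.Name)) continue;  // swept: its space is reclaimed
            newOffsets[obj.Name] = next;             // survivor moves to the next free slot
            next += obj.Size;
        }
        return (newOffsets, next);
    }
}
```

For example, with A through G laid out back to back at offsets 0, 16, 40, 72, 88, 112, and 120 (sizes 16, 24, 32, 16, 24, 8, and 16) and C and F dead, the survivors A, B, D, E, and G move to offsets 0, 16, 40, 56, and 80, and the next object pointer becomes 96.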
Understanding Object Generations
When the CLR is attempting to locate unreachable objects, it does not literally examine each and every object placed on the managed heap. Doing so, obviously, would involve considerable time, especially in larger (i.e., real-world) applications.
To help optimize the process, each object on the heap is assigned to a specific “generation”. The idea behind generations is simple: the longer an object has existed on the heap, the more likely it is to stay there. For example, the object that represents the main window of a desktop application will be in memory until the program terminates. Conversely, objects that have only recently been placed on the heap (such as an object allocated within a method scope) are likely to be unreachable rather quickly.
Given these assumptions, each object on the heap belongs to one of the following generations:
- Generation 0: Identifies a newly allocated object that has not yet survived a garbage collection.
- Generation 1: Identifies an object that has survived one garbage collection (in other words, it was still reachable when a collection occurred, so it was not removed).
- Generation 2: Identifies an object that has survived more than one sweep of the garbage collector.
Note: Generations 0 and 1 are termed ephemeral generations. As you will see in the next section, the garbage collection process treats ephemeral generations differently.
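The System.GC class lets you query this generation information directly: GC.MaxGeneration reports the highest generation number the runtime supports, and GC.GetGeneration() reports the generation an object currently belongs to. GenerationQueryDemo and Query() are hypothetical names for this sketch.

```csharp
using System;

public static class GenerationQueryDemo
{
    // Reports the highest generation number the runtime supports and the
    // generation of a freshly allocated object.
    public static (int maxGeneration, int newObjectGeneration) Query()
    {
        var obj = new object();
        int gen = GC.GetGeneration(obj);
        return (GC.MaxGeneration, gen);
    }
}
```

On current .NET (CoreCLR) runtimes, Query() typically reports a maximum generation of 2 and a generation of 0 for the freshly allocated object.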
The garbage collector will investigate all generation 0 objects first. If marking and sweeping (or said more plainly, getting rid of) these objects results in the required amount of free memory, any surviving objects are promoted to generation 1. To see how an object's generation affects the collection process, ponder Figure 13-5, which diagrams how a set of surviving generation 0 objects (A, B, and E) are promoted once the required memory has been reclaimed.
Figure 13-5. Generation 0 objects that survive a garbage collection are promoted to generation 1
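Promotion can be observed by keeping an object reachable across an explicit generation 0 collection, requested with GC.Collect(0). In this sketch (PromotionDemo and CollectGen0() are hypothetical names), GC.KeepAlive() guarantees that the object is still reachable when the collection runs.

```csharp
using System;

public static class PromotionDemo
{
    // Records an object's generation before and after a generation 0 collection,
    // and whether the generation 0 collection count went up.
    public static (int before, int after, bool gen0CountIncreased) CollectGen0()
    {
        var survivor = new object();
        int before = GC.GetGeneration(survivor);
        int countBefore = GC.CollectionCount(0);
        GC.Collect(0);                        // collect generation 0 only
        int after = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor);               // survivor stays reachable through the collection
        return (before, after, GC.CollectionCount(0) > countBefore);
    }
}
```

On a typical CoreCLR run, the reported generation goes from 0 to 1 and the generation 0 collection count increases. As with the other examples, forcing collections like this is a diagnostic technique, not something to do in ordinary application code.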
If all generation 0 objects have been evaluated, but additional memory is still required, generation 1 objects are then investigated for reachability and collected accordingly. Surviving generation 1 objects are then promoted to generation 2. If the garbage collector still requires additional memory, generation 2 objects are evaluated. At this point, if a generation 2 object survives a garbage collection, it remains a generation 2 object, given the predefined upper limit of object generations. The bottom line is that by assigning a generational value to objects on the heap, newer objects (such as those referenced only by local variables) will be removed quickly, while older objects (such as a program's main window) are not “bothered” as often.
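Repeating full collections (GC.Collect() with no arguments) on an object that stays reachable shows the upper limit in action. GenerationCapDemo and GenerationsAcrossCollections() are hypothetical names for this sketch.

```csharp
using System;

public static class GenerationCapDemo
{
    // Runs several full collections while keeping one object reachable and records
    // its generation after each one. The value can climb, but not past GC.MaxGeneration.
    public static int[] GenerationsAcrossCollections(int collections)
    {
        var longLived = new object();
        var history = new int[collections];
        for (int i = 0; i < collections; i++)
        {
            GC.Collect();                         // blocking collection of all generations
            history[i] = GC.GetGeneration(longLived);
        }
        GC.KeepAlive(longLived);
        return history;
    }
}
```

A typical CoreCLR run with five collections records 1 after the first collection and 2 after each of the rest: the object climbs to generation 2 and then stays there, because GC.MaxGeneration is 2.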