The string class is used constantly in everyday code, so it is important to understand how strings behave. This matters most for performance when we operate on them, for example by appending text to an existing string. Before we get into the actual discussion, let us reiterate some important points about strings that you might already be aware of:
- String is a reference type that behaves like a value type.
- Being a reference type implies that the value of a string variable is NOT the actual data, but a pointer/reference to the actual data.
- From MSDN: Although string is a reference type, the equality operators (== and !=) are defined to compare the values of string objects, not references. This makes testing for string equality more intuitive.
This means that we might expect == or != on two strings to compare their references (the pointers to the actual data), but it does not. Instead, it compares the actual character data the strings hold.
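A minimal sketch of this behavior, assuming C# 9 or later (it uses top-level statements). The variable names `a` and `b` are purely illustrative. `b` is built from a char array with `new string(...)`, which allocates a separate string object, so the two variables hold equal data but different references:

```csharp
using System;

// A string literal, and a separate string object with the same characters.
// new string(char[]) allocates a new object on the heap.
string a = "Hello";
string b = new string(new[] { 'H', 'e', 'l', 'l', 'o' });

bool sameValue = a == b;                            // == on string compares the characters
bool sameReference = object.ReferenceEquals(a, b);  // compares object identity

Console.WriteLine($"a == b: {sameValue}");                    // a == b: True
Console.WriteLine($"ReferenceEquals(a, b): {sameReference}"); // ReferenceEquals(a, b): False
```

`object.ReferenceEquals` is the usual way to ask "same object?" explicitly when == has been overloaded to compare values.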
An object is called immutable when its state cannot be changed after it has been created. A string is an immutable type: once a string object is created, its contents are never altered. If we try to change the value of a string variable by concatenation (using the + operator) or by assigning a new value to it, a new string object is created and the variable is updated to hold a reference to that new object. It might seem that we have successfully altered the existing string. Behind the scenes, however, the original string object is left unchanged, and the variable simply points to the newly created one.
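To illustrate this, the sketch below (again assuming C# 9+ top-level statements, with hypothetical variable names) shows that string methods and the + operator return new strings and leave the original untouched:

```csharp
using System;

string greeting = "Hello";

// Each of these operations returns a NEW string; none of them changes `greeting`.
string upper = greeting.ToUpperInvariant();    // "HELLO"
string replaced = greeting.Replace('l', 'L');  // "HeLLo"
string extended = greeting + " User";          // "Hello User"

Console.WriteLine(greeting);                                   // Hello
Console.WriteLine(object.ReferenceEquals(greeting, extended)); // False

// Reassigning the variable does not modify the old object either:
// it only makes `greeting` refer to a different string object.
string before = greeting;
greeting += " again";
Console.WriteLine(before);    // Hello (the original object is untouched)
Console.WriteLine(greeting);  // Hello again
```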
Let's use an example to analyze this behavior. We will create one string and assign it to another string variable. Then we will compare both the values of the two variables (which, in this case, are references to the actual data) and the actual data they point to. So we write the following code and run the application.
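The original listing is not reproduced here, so the following is only a minimal sketch of such a test, assuming C# 9+ top-level statements. It uses `object.ReferenceEquals` to compare the references and == to compare the data:

```csharp
using System;

string s1 = "Hello";
string s2 = s1; // copies the reference, not the characters

// Compare the references (object identity) and the data (characters).
bool sameReference = object.ReferenceEquals(s1, s2);
bool sameData = s1 == s2;

Console.WriteLine($"Same reference: {sameReference}"); // Same reference: True
Console.WriteLine($"Same data: {sameData}");           // Same data: True
```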
In the code above, assigning one string variable to the other copies the reference (the pointer to the actual data) into the second variable, not the data itself. Both s1 and s2 now hold the same reference, so comparing their references and comparing their data both return true.
In the next step, we change only s2, by appending a string value to it. This time the data does change, but the existing string object is not modified. Instead, the value of s2 (its pointer to the actual data) is updated to point to a newly created string. Had the append modified the existing string, Hello, in place, then s1, which refers to the same object, would also have read Hello User, and comparing s1 and s2 would still have returned true. That did not happen, because changing the value of s2 created a new string object holding the new data. As a result, both the data and the references (pointers to the data) of the two variables now differ, so comparing either the data or the references returns false.
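Continuing the same hedged sketch (C# 9+ top-level statements assumed), appending to s2 leaves s1 pointing at the original "Hello" object:

```csharp
using System;

string s1 = "Hello";
string s2 = s1;          // both variables refer to the same "Hello" object

s2 += " User";           // builds a new string "Hello User" and points s2 at it

bool sameReference = object.ReferenceEquals(s1, s2);
bool sameData = s1 == s2;

Console.WriteLine(s1);                                  // Hello
Console.WriteLine(s2);                                  // Hello User
Console.WriteLine($"Same reference: {sameReference}");  // Same reference: False
Console.WriteLine($"Same data: {sameData}");            // Same data: False
```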
Neither the references held by s1 and s2 nor the data they point to match any more, and this is a direct consequence of string immutability. The entire concept can be represented diagrammatically as follows:
From the above discussion, it follows that string operations, especially large or repeated manipulations, should be done carefully, because every modification allocates a new string and copies the characters into it. As an alternative, .NET provides the StringBuilder class, which is much more efficient than string for repeated modifications. Its Append() method writes into the builder's own mutable buffer rather than creating a new string instance on every call. Thus, when we need to build up a large string, for example by appending in a loop, we should prefer the StringBuilder class over string concatenation. I hope you enjoyed reading this!
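As a rough comparison, here is a sketch (C# 9+ top-level statements assumed; the iteration count of 1000 is arbitrary) that builds the same string both ways. It also checks that `StringBuilder.Append` returns the same builder instance instead of a new object:

```csharp
using System;
using System.Text;

const int count = 1000;

// Concatenation in a loop: every += allocates a new string and copies
// all characters accumulated so far, so total copying grows roughly quadratically.
string text = "";
for (int i = 0; i < count; i++)
{
    text += "x";
}

// StringBuilder: Append writes into the builder's internal buffer
// (growing it when needed) instead of creating an intermediate string.
var builder = new StringBuilder();
for (int i = 0; i < count; i++)
{
    builder.Append('x');
}
string built = builder.ToString(); // one final string is created here

// Append returns the builder itself, which is what enables call chaining.
bool sameBuilder = object.ReferenceEquals(builder, builder.Append(""));

Console.WriteLine(text == built);  // True
Console.WriteLine(built.Length);   // 1000
Console.WriteLine(sameBuilder);    // True
```

For a handful of concatenations plain strings are fine. The difference matters when the number of appends grows, as in the loop above.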