Introduction
The .NET Framework provides an aspect-oriented solution (http://en.wikipedia.org/wiki/Aspect-oriented_programming) to XML serialization, which saves a lot of typing and debugging compared to a manual implementation. However, this convenience does not come without a price: a poorly designed class may not be suitable for XML serialization. This article demonstrates some typical design decisions that have to be made when programming classes intended to be serialized into XML.
The example concerns a class named Person which, as may be expected, represents a human being reduced to only a few features:
- Name - for simplicity, contains the full name of the person.
- Date of birth
- Gender - enumeration: male/female
Simple Class Example
Here is a simple implementation of the Person class that demonstrates basic XML serialization capabilities.
using System;
using System.Xml.Serialization;
using System.IO;

namespace SerializationTest
{
    public enum PersonGender
    {
        Male,
        Female
    }

    public class Person
    {
        public Person() // Default constructor must be available
        {
        }

        public Person(string name, DateTime dob, PersonGender gender)
        {
            _name = name;
            _dateOfBirth = dob;
            _gender = gender;
        }

        [XmlElement("Name")]
        public string Name
        {
            get { return _name; }
            set { _name = value; }
        }

        [XmlElement("DateOfBirth")]
        public DateTime DateOfBirth
        {
            get { return _dateOfBirth; }
            set { _dateOfBirth = value; }
        }

        [XmlElement("Gender")]
        public PersonGender Gender
        {
            get { return _gender; }
            set { _gender = value; }
        }

        private string _name;
        private DateTime _dateOfBirth;
        private PersonGender _gender;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Person joe = new Person("Joe", new DateTime(1970, 5, 12), PersonGender.Male);
            Person mary = new Person("Mary", new DateTime(1972, 3, 6), PersonGender.Female);
            XmlSerializer serializer = new XmlSerializer(typeof(Person));
            using (Stream output = Console.OpenStandardOutput())
            {
                serializer.Serialize(output, joe);
                serializer.Serialize(output, mary);
            }
        }
    }
}
The Main function simply instantiates two persons, Joe and Mary, and then serializes them to XML, one document after the other. The code produces output like this:
<?xml version="1.0"?>
<Person
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Joe</Name>
  <DateOfBirth>1970-05-12T00:00:00</DateOfBirth>
  <Gender>Male</Gender>
</Person><?xml version="1.0"?>
<Person
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Mary</Name>
  <DateOfBirth>1972-03-06T00:00:00</DateOfBirth>
  <Gender>Female</Gender>
</Person>
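The comment on the default constructor in the Person class points at one of the ways a class design can rule out XML serialization. The following is a minimal sketch of that failure, not part of the original example: PersonWithoutDefaultCtor, PersonWithDefaultCtor and ConstructorCheck are hypothetical names introduced only for illustration. It assumes that XmlSerializer reports the problem as an InvalidOperationException when the serializer is constructed.

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical class for illustration: the only constructor takes a parameter.
public class PersonWithoutDefaultCtor
{
    public PersonWithoutDefaultCtor(string name) { Name = name; }
    public string Name { get; set; }
}

// Hypothetical counterpart that keeps the implicit parameterless constructor.
public class PersonWithDefaultCtor
{
    public string Name { get; set; }
}

public static class ConstructorCheck
{
    // Returns true if an XmlSerializer can be created for the type,
    // false if XmlSerializer rejects the type with InvalidOperationException.
    public static bool CanCreateSerializer(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanCreateSerializer(typeof(PersonWithDefaultCtor)));
        Console.WriteLine(CanCreateSerializer(typeof(PersonWithoutDefaultCtor)));
    }
}
```

The check is deliberately coarse: it only tells whether the serializer accepts the type, which is usually enough for a unit test guarding against an accidentally removed constructor.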
Adding Functionality
Now suppose that we want something more. For example, we may decide to add properties named Mother and Father, referring to other instances of the Person class which, in the modelled system, play the roles of the person's parents.
In addition, we shall add a simple method, IsBrotherOrSister, which accepts another Person reference and tests whether both objects have the same mother and the same father. Here is the code added to the Person class:
[XmlElement("Mother")]
public Person Mother
{
    get { return _mother; }
    set { _mother = value; }
}

[XmlElement("Father")]
public Person Father
{
    get { return _father; }
    set { _father = value; }
}

public bool IsBrotherOrSister(Person p)
{
    return
        object.ReferenceEquals(_mother, p._mother) &&
        object.ReferenceEquals(_father, p._father);
}

private Person _mother;
private Person _father;
Now we can perform a slightly different test on this extended class. The test requires that Joe and Mary are recognized as brother and sister before serialization as well as after deserialization. To accomplish that, we will first serialize Joe and Mary into a memory stream, and then deserialize the stream into new objects representing Joe and Mary (called joe1 and mary1). Here is the code:
static void Main(string[] args)
{
    Person joe = new Person("Joe", new DateTime(1970, 5, 12), PersonGender.Male);
    Person mary = new Person("Mary", new DateTime(1972, 3, 6), PersonGender.Female);
    Person wilma = new Person("Wilma", new DateTime(1941, 2, 14), PersonGender.Female);
    Person harry = new Person("Harry", new DateTime(1938, 3, 18), PersonGender.Male);
    joe.Mother = mary.Mother = wilma;
    joe.Father = mary.Father = harry;

    bool related = joe.IsBrotherOrSister(mary);
    Console.WriteLine("{0} and {1} are {2}related.",
        joe.Name, mary.Name, related ? "" : "NOT ");

    XmlSerializer serializer = new XmlSerializer(typeof(Person));
    using (MemoryStream ms = new MemoryStream())
    {
        serializer.Serialize(ms, joe);
        ms.Position = 0;
        Person joe1 = serializer.Deserialize(ms) as Person;

        long startPos = ms.Position;
        serializer.Serialize(ms, mary);
        ms.Position = startPos;
        Person mary1 = serializer.Deserialize(ms) as Person;

        related = joe1.IsBrotherOrSister(mary1);
        Console.WriteLine("{0} and {1} are {2}related.",
            joe1.Name, mary1.Name, related ? "" : "NOT ");
    }
}
Output will look like this:
Joe and Mary are related.
Joe and Mary are NOT related.
As you can see, serialization followed by deserialization has caused an actual loss of information. Before serialization, both Joe and Mary referenced the same objects as their mother and father. After deserialization, this information is gone: two distinct objects have replaced Wilma, and the same happened to Harry. From the point of view of Joe and Mary, their parents are no longer the same objects, and consequently they are no longer considered brother and sister in our object model.
Adapting the Model
This simple case shows how serialization actually works. It serializes property values as it finds them, so some information about the semantics of our objects may be lost in serialization and cannot be recovered in deserialization. On closer inspection, what is lost is object identity: XML serialization does not preserve references. Every reference is written out as a full copy of the referenced object.
The solution is to plan for such situations in advance. Programmers must always be aware that a class is going to be serialized to XML and design it with that in mind. Some design decisions are made purely to allow a lossless serialization-deserialization round trip and serve no other purpose in code. We will show one such decision using the Person class as an example.
What we are facing with the Person class is that multiple objects of this class have to be serialized as a whole, not as a set of unrelated objects. For that purpose, we will create another class, named SerializablePersonSet. That class will not be a simple set of Person objects, since any available collection would do for that; it will also keep information about the parents of each person. The word Serializable is prepended to the name intentionally, to make it clear that this is a utility class that exists only to allow serialization and deserialization of multiple persons, not a functional set of persons.
The basic requirement for this new class is to preserve family relations: mothers and fathers must be created only once, and their children must reference the same objects after deserialization. What we are going to demonstrate here is a technique sometimes called dehydration/rehydration. When serializing, we replace information that would be lost because it is invisible to the serializer (references to mother/father objects) with more explicit information that the serializer can see (indices of mother/father objects in the structure).
For example, if we construct a family of Joe and Mary as children and Wilma and Harry as their mother and father, in that order, then we can record that Joe's mother is person #2 (zero-based) and his father is person #3. The same information (#2 and #3) will be attached to Mary. If a person has no known mother or father, the corresponding index will have the invalid value -1. With this modification, we can preserve the information about who is related to whom in an array of persons and let the references be forgotten along the way. This is the process of dehydration: data that the serializer cannot represent are replaced with simple counterparts that it can.
Once this is done, the serialized set will contain the persons themselves (without mother and father) and two arrays of integers holding the mothers' and fathers' indices in the original array. XML serialization cannot do this on its own; the additional data must be provided to the serializer explicitly.
Deserialization first restores the persons, without mothers and fathers. Then the indices of mother/father objects are used to set the Mother and Father properties to real references. This process is referred to as rehydration: helper information (simple integers indicating positions in an array) is replaced with its richer form (references to objects).
To make this possible, we have to change the Person class as well. This is what was meant above by design decisions made for serialization purposes only. First of all, Mother and Father will not be serialized any more: their XmlElement attributes will be replaced with XmlIgnore. Next, an integer value will be added to the Person class to indicate the object's position in an array. This value is of no general use, so it will be exposed as an internal property that code outside the assembly cannot modify. Needless to say, this property will not be serialized (XmlSerializer only sees public members). In addition, helper properties for the mother and father index will be added to the Person class.
Here are the modifications made to the Person class:
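The loss of references can also be observed within a single serialized document. The sketch below is an illustration under stated assumptions rather than the article's own code: SimplePerson is a hypothetical, cut-down stand-in for Person, and ReferenceLossDemo is a hypothetical helper. Two siblings sharing one mother object are serialized as an array and read back.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, simplified stand-in for the article's Person class.
public class SimplePerson
{
    public string Name { get; set; }
    public SimplePerson Mother { get; set; }
}

public static class ReferenceLossDemo
{
    // Serializes two siblings that share one mother object, then reads them back.
    // The produced XML is returned through the out parameter for inspection.
    public static SimplePerson[] RoundTrip(out string xml)
    {
        SimplePerson wilma = new SimplePerson { Name = "Wilma" };
        SimplePerson[] siblings =
        {
            new SimplePerson { Name = "Joe", Mother = wilma },
            new SimplePerson { Name = "Mary", Mother = wilma }
        };

        XmlSerializer serializer = new XmlSerializer(typeof(SimplePerson[]));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, siblings);
            xml = writer.ToString();
        }
        using (StringReader reader = new StringReader(xml))
        {
            return (SimplePerson[])serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml;
        SimplePerson[] copy = RoundTrip(out xml);
        Console.WriteLine(xml);
        // Each sibling now owns a separate copy of the mother.
        Console.WriteLine(object.ReferenceEquals(copy[0].Mother, copy[1].Mother));
    }
}
```

In the printed XML the mother appears once under each sibling, which is why the two deserialized siblings end up with separate mother objects.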
public class Person
{
    ...

    [XmlIgnore()]
    public Person Mother
    {
        get { return _mother; }
        set { _mother = value; }
    }

    [XmlIgnore()]
    public Person Father
    {
        get { return _father; }
        set { _father = value; }
    }

    internal int Id
    {
        get { return _id; }
        set { _id = value; }
    }

    internal int FatherId
    {
        get { return _father == null ? -1 : _father.Id; }
    }

    internal int MotherId
    {
        get { return _mother == null ? -1 : _mother.Id; }
    }

    private int _id;
}
Having these properties available, we can design the SerializablePersonSet class as follows.
// Requires: using System.Collections.Generic;
public class SerializablePersonSet
{
    public SerializablePersonSet()
    {
        _persons = new List<Person>();
    }

    public void Clear()
    {
        _persons.Clear();
    }

    [XmlArray(ElementName="Persons", Order=1)]
    [XmlArrayItem("Person")]
    public Person[] Persons
    {
        get { return _persons.ToArray(); }
        set
        {
            _persons.Clear();
            _persons.AddRange(value);
            // Assign each person its position in the array as its Id
            for (int i = 0; i < value.Length; i++)
            {
                value[i].Id = i;
            }
        }
    }

    [XmlArray(ElementName="Mothers", Order=2)]
    [XmlArrayItem("MotherIndex")]
    public int[] MotherIndices
    {
        get
        {
            // Dehydration: replace each Mother reference with her index (-1 if unknown)
            int[] indices = new int[_persons.Count];
            for (int i = 0; i < _persons.Count; i++)
                indices[i] = _persons[i].MotherId;
            return indices;
        }
        set
        {
            // Rehydration: turn indices back into references
            for (int i = 0; i < value.Length; i++)
                if (value[i] >= 0)
                    _persons[i].Mother = _persons[value[i]];
        }
    }

    [XmlArray(ElementName="Fathers", Order=3)]
    [XmlArrayItem("FatherIndex")]
    public int[] FatherIndices
    {
        get
        {
            // Dehydration: replace each Father reference with his index (-1 if unknown)
            int[] indices = new int[_persons.Count];
            for (int i = 0; i < _persons.Count; i++)
                indices[i] = _persons[i].FatherId;
            return indices;
        }
        set
        {
            // Rehydration: turn indices back into references
            for (int i = 0; i < value.Length; i++)
                if (value[i] >= 0)
                    _persons[i].Father = _persons[value[i]];
        }
    }

    private List<Person> _persons;
}
Observe that this class exposes three properties for serialization. The first is the array of persons, and the other two are the indices of mothers and of fathers in that array. Also note that the serialization order is strictly specified using the Order property of the XmlArray attribute. This is significant because the mother and father of each person are picked from the common array, so all persons must be present in the object before mothers and fathers are set. Hence, persons must be serialized and deserialized first, and only then may the mother and father indices follow.
First, let's see how SerializablePersonSet is used to serialize our family:
static void Main(string[] args)
{
    Person joe = new Person("Joe", new DateTime(1970, 5, 12), PersonGender.Male);
    Person mary = new Person("Mary", new DateTime(1972, 3, 6), PersonGender.Female);
    Person wilma = new Person("Wilma", new DateTime(1941, 2, 14), PersonGender.Female);
    Person harry = new Person("Harry", new DateTime(1938, 3, 18), PersonGender.Male);
    joe.Mother = mary.Mother = wilma;
    joe.Father = mary.Father = harry;

    Person[] family = new Person[] { joe, mary, wilma, harry };
    XmlSerializer serializer = new XmlSerializer(typeof(SerializablePersonSet));
    SerializablePersonSet set = new SerializablePersonSet();
    set.Persons = family;

    // Dispose the output stream, as in the first example
    using (Stream output = Console.OpenStandardOutput())
    {
        serializer.Serialize(output, set);
    }
}
Output will look like this:
<?xml version="1.0"?>
<SerializablePersonSet
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Persons>
    <Person>
      <Name>Joe</Name>
      <DateOfBirth>1970-05-12T00:00:00</DateOfBirth>
      <Gender>Male</Gender>
    </Person>
    <Person>
      <Name>Mary</Name>
      <DateOfBirth>1972-03-06T00:00:00</DateOfBirth>
      <Gender>Female</Gender>
    </Person>
    <Person>
      <Name>Wilma</Name>
      <DateOfBirth>1941-02-14T00:00:00</DateOfBirth>
      <Gender>Female</Gender>
    </Person>
    <Person>
      <Name>Harry</Name>
      <DateOfBirth>1938-03-18T00:00:00</DateOfBirth>
      <Gender>Male</Gender>
    </Person>
  </Persons>
  <Mothers>
    <MotherIndex>2</MotherIndex>
    <MotherIndex>2</MotherIndex>
    <MotherIndex>-1</MotherIndex>
    <MotherIndex>-1</MotherIndex>
  </Mothers>
  <Fathers>
    <FatherIndex>3</FatherIndex>
    <FatherIndex>3</FatherIndex>
    <FatherIndex>-1</FatherIndex>
    <FatherIndex>-1</FatherIndex>
  </Fathers>
</SerializablePersonSet>
Observe that information about mothers and fathers is stored in the Mothers and Fathers arrays, as expected. No information has been lost during serialization, although the references themselves were not serialized.
Deserialization is equally simple. As proof that all data was preserved, we will test the relation of Joe and Mary: after deserialization, they must point to the same objects representing Wilma and Harry:
static void Main(string[] args)
{
    Person joe = new Person("Joe", new DateTime(1970, 5, 12), PersonGender.Male);
    Person mary = new Person("Mary", new DateTime(1972, 3, 6), PersonGender.Female);
    Person wilma = new Person("Wilma", new DateTime(1941, 2, 14), PersonGender.Female);
    Person harry = new Person("Harry", new DateTime(1938, 3, 18), PersonGender.Male);
    joe.Mother = mary.Mother = wilma;
    joe.Father = mary.Father = harry;

    bool related = joe.IsBrotherOrSister(mary);
    Console.WriteLine("{0} and {1} are {2}related.",
        joe.Name, mary.Name, related ? "" : "NOT ");

    Person[] family = new Person[] { joe, mary, wilma, harry };
    SerializablePersonSet set = new SerializablePersonSet();
    set.Persons = family;

    using (MemoryStream ms = new MemoryStream())
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SerializablePersonSet));
        serializer.Serialize(ms, set);
        ms.Position = 0;

        SerializablePersonSet set1 = serializer.Deserialize(ms) as SerializablePersonSet;
        Person[] family1 = set1.Persons;
        Person joe1 = family1[0];
        Person mary1 = family1[1];

        related = joe1.IsBrotherOrSister(mary1);
        Console.WriteLine("{0} and {1} are {2}related.",
            joe1.Name, mary1.Name, related ? "" : "NOT ");
    }
}
This code produces the expected output:
Joe and Mary are related.
Joe and Mary are related.
This proves that deserialization has created correct relations between objects.
Conclusion
Serializing complex classes, essentially those containing references to other objects, requires special care. An effective way of preserving references is to create another class that represents a serializable counterpart of the original class, rather than insisting on making the original class serializable. This may further require that referenced classes are also replaced by their own serializable versions, which may lead to a relatively complex translation between original and serializable classes. Another way of dealing with the situation would be to perform custom serialization, if one finds that path less costly.
The result of this design process is twofold:
- All classes translated to their serializable versions can be serialized to XML and later deserialized and translated back to their original shape without loss of relevant information.
- Original classes do not have to expose public properties that might otherwise jeopardize encapsulation and other design goals.
This article has shown one technique in which references to objects are replaced with scalar information that identifies the referenced object. This can be pushed even further, by insisting that all classes in the object model contain unique serializable identifiers (e.g. of type integer or string). In that way, dehydration/rehydration of the object model would be simplified, since uniqueness of object identities would be guaranteed by design rather than enforced ad hoc when serialization begins. For example, file system objects (files, directories) can be uniquely identified by their full path. The object model would contain objects, but during serialization, references would be replaced by the full paths of the referenced objects, which would in turn allow the references to be reconstructed in deserialization.
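The identifier-based variant described above might look roughly like the following sketch. It is an assumption-laden illustration, not code from the article: IdentifiedPerson, IdentityDemo, Dehydrate and Rehydrate are hypothetical names, identifiers are plain strings assumed to be unique, and only the Mother reference is handled to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical variant of Person: every object carries its own unique identifier.
public class IdentifiedPerson
{
    public string Id { get; set; }
    public string Name { get; set; }

    // Serialized in place of the reference; null means "mother unknown".
    public string MotherId { get; set; }

    [XmlIgnore]
    public IdentifiedPerson Mother { get; set; }
}

public static class IdentityDemo
{
    // Dehydration: copy the identifier of each referenced object into MotherId.
    public static void Dehydrate(IEnumerable<IdentifiedPerson> persons)
    {
        foreach (IdentifiedPerson p in persons)
            p.MotherId = p.Mother == null ? null : p.Mother.Id;
    }

    // Rehydration: look identifiers up and restore real references.
    public static void Rehydrate(IList<IdentifiedPerson> persons)
    {
        Dictionary<string, IdentifiedPerson> byId = new Dictionary<string, IdentifiedPerson>();
        foreach (IdentifiedPerson p in persons)
            byId[p.Id] = p;

        foreach (IdentifiedPerson p in persons)
        {
            IdentifiedPerson mother;
            if (p.MotherId != null && byId.TryGetValue(p.MotherId, out mother))
                p.Mother = mother;
        }
    }

    // Round-trips a small family and reports whether the siblings
    // still share one mother object that is also the third list element.
    public static bool RoundTripKeepsSharedMother()
    {
        IdentifiedPerson wilma = new IdentifiedPerson { Id = "wilma", Name = "Wilma" };
        IdentifiedPerson joe = new IdentifiedPerson { Id = "joe", Name = "Joe", Mother = wilma };
        IdentifiedPerson mary = new IdentifiedPerson { Id = "mary", Name = "Mary", Mother = wilma };
        List<IdentifiedPerson> family = new List<IdentifiedPerson> { joe, mary, wilma };

        Dehydrate(family);
        XmlSerializer serializer = new XmlSerializer(typeof(List<IdentifiedPerson>));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, family);

        List<IdentifiedPerson> copy =
            (List<IdentifiedPerson>)serializer.Deserialize(new StringReader(writer.ToString()));
        Rehydrate(copy);

        return object.ReferenceEquals(copy[0].Mother, copy[1].Mother)
            && object.ReferenceEquals(copy[0].Mother, copy[2]);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripKeepsSharedMother());
    }
}
```

Compared with array indices, identifiers do not depend on the order of elements in the serialized collection, at the cost of requiring every object to carry an identifier that stays unique across the whole model.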
This article has shown that XML serialization, however easily it is added to code, is not a magic wand. When working with complex classes, serialization must be planned in full detail rather than expected to work by itself. A lack of planning leads to redesign work later on, when maintaining serialization of the original classes becomes too expensive, or even reaches the point at which the original classes cannot be serialized without loss of data.