General Formatter for .NET 1/4: Introduction


This is the first part of a four-part article:

  1. Introduction - Defines the goals and desired features of the solution.
  2. Design - Outlines the solution design and explains the critical details that the solution will have to implement.
  3. Implementation - Explains the actual implementation of the formatter classes.
  4. Example - Lists numerous examples of the formatter's use.

The last part of the article will have a compressed file attached, containing the complete, commented source code of the formatter classes, the source code of the demonstration project, and the compiled library.

Introduction

Converting objects to strings is an everyday programming task that takes one of two basic forms: serialization or formatting. Serialization is a reversible process in which an object is converted to a string (or some other form) from which it can later be restored to its original state. Formatting is a less rigid operation in which a string is built to resemble the contents of the object, without any requirement that the object be restorable from that string.

In this article we deal with the problem of formatting. We address the question of how to build a string that represents an object of unknown structure and contents. This contrasts with the typical situation, in which the programmer formats a string representing an object of known structure, so that the string can properly reflect the object's semantics.

For example, an instance of the Rectangle structure could be formatted like this:

Rectangle r = new Rectangle(1, 2, 3, 4);
Console.WriteLine("({0},{1})-({2},{3})", r.Left, r.Top, r.Right, r.Bottom);

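This piece of code produces the following output (the constructor arguments are x, y, width and height, so Right is 1+3=4 and Bottom is 2+4=6):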

(1,2)-(4,6)

Output presented like this shows the upper-left and lower-right corners of the rectangle. But to produce it, we must know that the object is a rectangle, and also that such a string will be informative to the reader. Under different circumstances the user might not be satisfied with this format and might require something different (e.g. a left-top-right-bottom listing of the rectangle's coordinates). One can think of several other formats applicable to the Rectangle structure, let alone other data types. The number of different formats that could be used in a complex software project is seemingly endless.
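To make the point concrete, here is a minimal sketch of two hand-written formats for the same rectangle. It assumes the Rectangle structure is System.Drawing.Rectangle; the class name RectangleFormats and the position-and-size notation are hypothetical and serve only as illustration.

```csharp
using System.Drawing;

public static class RectangleFormats
{
    // Corner notation, as in the example above: (left,top)-(right,bottom).
    public static string Corners(Rectangle r)
    {
        return string.Format("({0},{1})-({2},{3})", r.Left, r.Top, r.Right, r.Bottom);
    }

    // A hypothetical alternative: position followed by width x height.
    public static string PositionAndSize(Rectangle r)
    {
        return string.Format("({0},{1}) {2}x{3}", r.X, r.Y, r.Width, r.Height);
    }
}
```

For new Rectangle(1, 2, 3, 4) the first method returns (1,2)-(4,6) and the second returns (1,2) 3x4. Both methods rely on knowing that the object is a rectangle, which is exactly the knowledge a general formatter does not have.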

This naturally leads to the question of whether there is a format that could be applied to all objects uniformly. The answer is certainly yes, simply because formatting is a relaxed form of serialization to string: any serializer can legitimately be used as a formatter as well. But the reader may then become quite confused by the formatted string, because serializers typically produce output that is not very user friendly. Their purpose lies elsewhere, and they are not expected to put much effort into readability.

So our question regarding formatters needs to be refined. We might ask whether there is a format that could be applied uniformly to all objects in such a way that the resulting string is readable and sufficiently informative. In other words, we are looking for a formatter that converts any object to a string in which a human reader can visually search for the needed information (contained in the original object) and find it without much effort.

Answering this question positively means looking for a formatter that strikes a proper balance between the length of the string presented to the reader and how much of the object's contents it conveys. In this article we present a formatter that attempts to find such a balance. The complete design process will be explained and the full source code provided, along with numerous examples of its use.

Formatting Goals

When formatting a string that represents an object of unknown structure, we must first determine what the goal is. The formatter will have to deal with quite different objects over time, and all of them, from the simplest to the most complex, should be presented as human-readable strings according to the same set of fixed, predefined formatting rules.

So let's start naming the predefined formatting rules. First of all, every object is either primitive or consists of other objects. Primitive objects, like integer values, can simply be formatted as:

int Count = 4

A more complex object, like a Point, which has three properties (IsEmpty, X and Y), can be represented by simply listing the objects it contains:

Point Center { bool IsEmpty=false, int X=3, int Y=4 }

In this case, we have printed the three public properties of the Point structure on one line. Some might prefer a multi-line representation, in which indentation shows which object is a child of which other object:

Point Center = {
     bool IsEmpty = false
     int X = 3
     int Y = 4 }
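The single-line Point representation shown earlier could be produced by reflection over public properties. The following is only a minimal sketch under stated assumptions, not the formatter developed in this article: the class name OneLineFormatterSketch is hypothetical, only a few primitive types are recognized, properties are sorted by name (reflection does not guarantee any order), and nested objects are not expanded but printed with their own ToString.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

public static class OneLineFormatterSketch
{
    // C# keyword aliases for a few primitive types; other types are shown by type name.
    private static readonly Dictionary<Type, string> Aliases = new Dictionary<Type, string>
    {
        { typeof(bool), "bool" },
        { typeof(int), "int" },
        { typeof(double), "double" },
        { typeof(string), "string" }
    };

    private static string TypeName(Type type)
    {
        string alias;
        return Aliases.TryGetValue(type, out alias) ? alias : type.Name;
    }

    private static string ValueText(object value)
    {
        if (value == null) return "null";
        // bool.ToString() yields "True"/"False"; the article's samples use lowercase.
        if (value is bool) return (bool)value ? "true" : "false";
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    public static string Format(string name, object value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        Type type = value.GetType();

        // Primitive object: "int Count = 4".
        if (Aliases.ContainsKey(type))
            return TypeName(type) + " " + name + " = " + ValueText(value);

        // Composite object: list readable, non-indexed public instance properties.
        IEnumerable<string> parts = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => TypeName(p.PropertyType) + " " + p.Name + "=" + ValueText(p.GetValue(value)));

        return TypeName(type) + " " + name + " { " + string.Join(", ", parts) + " }";
    }
}
```

Assuming Point is System.Drawing.Point, calling OneLineFormatterSketch.Format("Center", new Point(3, 4)) returns Point Center { bool IsEmpty=false, int X=3, int Y=4 }.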

Things may become even more complex, as in the following example of a Rectangle structure:

Rectangle Window {
|-- int Bottom = 51
|-- int Height = 42
|-- bool IsEmpty = false
|-- int Left = 14
|-- Point Location = {
|    |-- bool IsEmpty = false
|    |-- int X = 14
|    +-- int Y = 9 }
|-- int Right = 31
|-- Size Size = {
|    |-- int Height = 42
|    |-- bool IsEmpty = false
|    +-- int Width = 17 }
|-- int Top = 9
|-- int Width = 17
|-- int X = 14
+-- int Y = 9 }


In this example we have formatted all public properties of the Rectangle structure as a tree, taking advantage of the fact that the string is printed in a monospaced font. However, the public properties exposed by the Rectangle type are so redundant that at least half of the output is redundant as well. A general formatter cannot detect such redundancies and has to live with them. What could be improved in the example above is to compact the representation of the Location and Size properties into one line each, rather than spreading them over multiple lines:

Rectangle Window {
|-- int Bottom=51
|-- int Height=42
|-- bool IsEmpty=false
|-- int Left=14
|-- Point Location { bool IsEmpty=false, int X=14, int Y=9 }
|-- int Right=31
|-- Size Size { int Height=42, bool IsEmpty=false, int Width=17 }
|-- int Top=9
|-- int Width=17
|-- int X=14
+-- int Y=9 }


This output is still full of redundancies, but at least it takes less room to print.
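The compacted tree layout can be sketched as follows. This is an illustration only, not the implementation presented later in the article: the Node type and the TreeFormatterSketch class are hypothetical, the tree is built by hand instead of by reflection, and the rule "print a composite on one line when all of its children are primitive" is an assumption made for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical node: a label such as "int Left" plus either a value or child nodes.
public sealed class Node
{
    public string Label { get; private set; }
    public string Value { get; private set; }   // non-null for primitives only
    public List<Node> Children { get; private set; }

    private Node() { Children = new List<Node>(); }

    public static Node Leaf(string label, string value)
    {
        return new Node { Label = label, Value = value ?? "null" };
    }

    public static Node Composite(string label, params Node[] children)
    {
        var node = new Node { Label = label };
        node.Children.AddRange(children);
        return node;
    }
}

public static class TreeFormatterSketch
{
    private static bool IsLeaf(Node n) { return n.Value != null; }

    // One-line form, e.g. "Point Location { bool IsEmpty=false, int X=14, int Y=9 }".
    private static string OneLine(Node n)
    {
        if (IsLeaf(n)) return n.Label + "=" + n.Value;
        return n.Label + " { " + string.Join(", ", n.Children.Select(c => OneLine(c))) + " }";
    }

    public static string Render(Node root)
    {
        var sb = new StringBuilder();
        sb.Append(root.Label).Append(" {");
        AppendChildren(root, "", sb);
        sb.Append(" }");
        return sb.ToString();
    }

    private static void AppendChildren(Node parent, string indent, StringBuilder sb)
    {
        for (int i = 0; i < parent.Children.Count; i++)
        {
            Node child = parent.Children[i];
            bool last = i == parent.Children.Count - 1;

            // "+-- " marks the last child of a node, "|-- " every other child.
            sb.Append('\n').Append(indent).Append(last ? "+-- " : "|-- ");

            // Assumed rule: primitives and composites made only of primitives fit on one line.
            if (IsLeaf(child) || child.Children.All(c => IsLeaf(c)))
            {
                sb.Append(OneLine(child));
            }
            else
            {
                sb.Append(child.Label).Append(" {");
                // Keep the vertical bar running while the parent has more children to print.
                AppendChildren(child, indent + (last ? "     " : "|    "), sb);
                sb.Append(" }");
            }
        }
    }
}
```

Building a Node tree with the same labels and values as the Rectangle Window example and passing it to TreeFormatterSketch.Render yields the same layout as the compacted output above, with lines separated by '\n'.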

The article continues in General Formatter for .NET 2/4: Design.