Abstract.
Based on a real world project this article shows how to handle XML data in .NET
using VB.NET DataSets and DataViews. The article presents step-by-step how to
bind, read and view XML data and generate graphical reports with the build-in
Crystal Reports engine. The article puts special emphasize on XML
standardization and interoperability issues experienced during the project, and
presents workarounds for existing limitations.
Introduction.
This project was part of a larger project in the
context of training acquisition and analysis. A Palm PDA is used for mobile data
acquisition; these data are transferred onto a PC into a well-formed and valid
XML data file. The task of this project was to develop a sophisticated data
analysis and reporting application on the PC using the given XML data file.
The .NET framework was chosen for the PC application
because of its great data handling capabilities, its powerful graphic
functionality and the built-in reporting engine Crystal Reports. We selected the
programming language VB.NET because we believe, that this is the language of the
future under .NET - and because there is no major difference between the
supported languages in VS.NET with respect to their offered functionality.
The idea was that both, the existing Palm application
and the new VB.NET application can share one common XML file as its data
source.
XML Access Methods in .NET.
.NET offers numerous XML-related classes for reading
and writing XML documents. Although the documentation states, that any class
conforms to the W3C XML 1.0 standard, there are trade-offs in respect to
performance, efficiency, productivity, and XML compliance between the different
implementations.
So we first investigated into these different access methods and then selected
the most appropriate for our application. This step also addresses important
issues that must be considered when dealing with legacy valid and well-formed
XML data in .NET.
The VB.NET developer can use any of the following five
methods to access XML data:
- XmlTextReader
- XmlValidatingReader
- XmlDocument (and the rest of the DOM API),
- XPathNavigator.
- ADO.NET DataSets
The detailed description of these methods would go
beyond the scope of this article. Thus the following table summarizes very
briefly the characteristics of each method and shows when to use which
technology.
XmlTextReader | - You need only read access.
- Performance is your highest priority.
- You don't need XSD/DTD validation.
- You don't need XSD type information at runtime.
- You don't need XPath/XSLT services.
|
XmlValidatingReader | - You need only read access.
- You need XSD/DTD validation.
- You need XSD type information at runtime.
|
XmlDocument | - You need to update the document (read/write).
- Productivity is your highest priority.
- You need XPath services.
|
XPathNavigator | - You need only read access.
- You need to execute an XSLT transformation.
- You want to leverage an implementation (like XPathDocument).
|
ADO.NET DataSets | - You need to update the document (read/write).
- You need greatest flexibility and functionality.
- You want greatest tool support.
- You need interop-support.
|
Table 1. XML Data Access Methods in .NET.
When dealing with complex XML Schemas, then ADO.NET
DataSet offers greatest support and flexibility. The DataSet,
which can be used either relational databases or XML data files, is an in-memory
cache of retrieved data. It is the major component for data handling in the
ADO.NET architecture.
The DataSet provides a consistent relational
programming model regardless of the data source; thus it consists of a
collection of DataTable objects that you can relate to each other with
DataRelation objects. A DataSet reads and writes data and schema as XML
documents. The data and schema can be transported across HTTP and used by any
application, on any platform that is XML-enabled.
Considering all these features and the fact that this
VB.NET application will need to be WebService interoperable with a server
application in future we decided to go for DataSet's.
XML Conformance in .NET.
Going with DataSets, ReadXML and ReadXMLSchema are the
methods of choice to read the XML data, and the corresponding XML schema.
However doing so with the given valid and well-formed
XML data file, raised the following runtime error:
"The same table {description} cannot be the child
table in two nested relations".
Looking deeper into the file shows that the valid and
well-formed (and thus fully W3C compliant) XML file had a child table called "description"
which had more than one parent table. This part of the XML Schema is shown in
Figure 1 .
Figure 1. Original XML Schema.
This is a valid and allowed design according to the W3C
XML schema specification. (Refer:
W3C
definition of Parent child Relationship). Also the XMLSpy tool validated
this XML schema as valid and well-formed.
More investigation showed that the .NET XML framework
does not (yet?) support the full XML 1.0 standard. It does not support recursive
schemas or designs where a child node type is beneath more than one parent node
type. Of course, also all classes based on DataSets, like the
System.Windows.Forms.DataGrid, which we were going to use, have the same
restriction. In other words:
XML Design Issue 1: ".NET
Datasets require that a table can only be a child of one other table."
Microsoft has identified this as a bug (Q325695) and
their knowledge base article [MS-Q325695] gives more information.
There are essentially three ways to solve this problem:
- Write custom functions
- Write a converter
- Change your XML Schema
If you can't break change the existing XML Schema (for
example because you are not the owner of the file), there are two ways to solve
the problem: you customize all your ode, or you write a converter.
Customizing all your code means you write special
methods for the XmlReader class and add records to a table with a hierarchical
type design. You also have to implement a custom filtering mechanism if you want
to display this data in bound controls, and a custom writing mechanism to write
to the type of hierarchical format you are reading from. Even worse, any change
in the schema file will cause changes in all of your
custom functions - so this isn't really an.
The second option, if you can't change the given XML
Schema, is to write a converter. That means you define a new .NET compliant XML
Schema for the given XML file and make use of the given DataSet
capabilities. Then you write a bi-directional converter to convert the data from
one XML Schema into the other before you use it. Although this still causes
additional effort, it causes the least additional effort.
If you are the owner of the XML Schema file, then
solution is to change the given the given XML Schema and make it also .NET
compliant. In our situation, this meant, to define three different description
type structures (Idescription, Ddecscription, Fdescription), for the
three occurrences of the description.
To avoid multiple definitions of the same structure
(which can cause maintenance errors in the future), we made use of the
inheritance capabilities of XML Schema. We derived the three description types
from the still existing base structure "description". The details are shown in
figure 2.
Figure 2. Avoid one child with several parents.
Reading XML Data with DataSets.
VB.NET developers have several approaches of using a
DataSet in the applications. This section explains the chosen approach
to create a DataSet from the modified XML Schema, the tools that were used and
how we populated the DataSet from the XML source file.
A DataSet can be typed or untyped.
A typed DataSet is a dataset, which is created based a given XML Schema
definition file (XSD file). Information from the schema (tables, columns, etc.)
is generated and compiled into the new DataSet class as a set of first-class
objects and properties.
Because a typed DataSet class inherits from
the base DataSet class, the typed class assumes all of the
functionality of the DataSet class and can be used with methods that
take an instance of a DataSet class as a parameter.
An untyped DataSet, in contrast, has no
corresponding built-in schema. As in a typed dataset, an untyped DataSet
contains tables, columns - but those are exposed only as collections. (However,
after manually creating the tables and other data elements in an untyped
DataSet, you can export the DataSet's structure as a schema using the
DataSet 's WriteXmlSchema method.).
You can use either type of DataSet in your
application. However, Visual Studio has more tool support for typed DataSets,
and they make programming with the DataSet much easier and less error-prone. So
having considered all these options the decision was to go with ADO.NET typed
DataSets.
Typed Datasets can be generated with the
XSD.EXE tool, which is part of the VS.NET environment. The tool
accepts a valid XML Schema file as input, as well as the language to use (C#,
VB). The following line shows a typical command line of the tool which uses the
XML Schema file XSDSchemaFileName.xsd.
xsd.exe /d /l:VB.NET XSDSchemaFileName.xsd /n:XSDSchema.Namespace.
The /d directive tells the tool to generate DataSets,
/l specifies the language to use, the optional /n defines the namespace to
generate. The generated DataSet classes will be saved in the source file
XSDSchemaFileName.vb.
Once the Typed DataSet classes are generated, the
further procedure is almost a child's play. The provided methods and properties
guarantee data access in a type safe manner.
So the next step was to populate the Typed DataSet at
runtime from the XML file. The ReadXml() and the WriteXml() methods of the typed
DataSet class do this very easily without any difficulty as the following to
lines of code show:
Dim myDS
As New DataSet.
myDS.ReadXml("input.xml", XmlReadMode.ReadSchema).
Viewing XML Data.
Having created and populated the DataSet the
next step was to bind the data to the windows controls for user interaction.
Since we were going to implement a grid view, this involved binding
DataTables to DataGrids with user navigation facilities and
providing parent-child relationships in the grid view so when a user selects a
parent row, the corresponding child rows are to be shown automatically for
editing purposes.
The DataTable is the primary building block of
ADO.NET. The DataTable is made up of a DataColumn and DataRow
collections. The columns define the schema of the DataTable and the rows make up
the actual data in the DataTable. A DataView is a bindable, customized view of a
DataTable. You can create multiple DataView's of the same
DataTable, each one can contain different data sorted in different order.
Additionally, you can add, delete, or edit information in each DataView.
DataTable and DataView use the same
instances of DataColumns, i.e. they share the common structure. However
DataTable and DataView each have its own row collections. The
DataTable is consists of DataRow's while the DataView
is made up DataRowView's. Figure 3 shows the relationship between
DataSet, DataTable and DataView.
Figure 3. Tables and Views.
The data binding of typed DataSets is very smooth in
VB.NET. You can bind the data sources to the WindowsForms controls at design
time or runtime without any difficulty. For example, you can specify which data
column to be bound to a ComboBox in a WindowsForm by setting the DataSource
and DataMember property of that control so that it gets automatically bound at
runtime.
We used the .NET DataView class to bind DataTables
to the DataGrid. DataView supports data filtering and data sorting at
design time or at runtime. For convenience a default DataView is
generated for the developer for customization.
In the analysis part, the application needs to display
some statistical information, which is calculated from several columns and which
is shown in some dynamically generated extra columns as shown in Figure 4 .
Figure 4. The Data View.
Some of the data columns needed for these calculations
were in different parent and child tables. Unfortunately it was not possible
join two tables in the dataset to show up in a single DataGrid.(for example, by
using the DataSet.Relations property).To further clarify the situation
say I have two Parent & child tables Table1(id,columnA,columnB) and
Table2(id,columnC,columnD) and I want resulting datagrid columns to be view
like (columnA , columnB , columnC , columnD). It is similar to the situation if
I had wrote a sql query like
select a.id , a.columnA , a.columnB , b.[columnC] ,
b.[columnD]
From Table1 a , Table2 b
where a.id =b.id.
It is important to understand that DataView is not
equivalent of a SQL VIEW . A Dataset does not contain a query processor and has
no mechanism to perform a join.
The other way to do it (most developers seems to
suggest) is to simply create a new datatable
That is typed the way you want it and manually moved
the data from both datatables to the new one. Also any data manipulation has
to copy over to the original tables using a nested loops. This is a brute force
method. It would be nice to here if anyone come up with something else.
Finally more references were added to the schema file.
But that is not the best workaround. The reason that the columns could not be
merged is founded in the relational programming model of the DataSets.
The existing XML file uses nested child tables instead of references. This is
not (yet?) resolved by the .NET DataSets, so that references are
missing to cross-reference tables and to establish relationships.
XML Design Issue 2: "Use
references in XML data files instead of nested child tables to show up a joined
table in a single DataGrid".
So the next change was to introduce references where
needed.
Figure 5. Adding references to tables.
In our application we use the Date column as a filter
criteria for the generated graphs in the Crystal Report Engine. Thus in Figure 5
a Date column has been added to the session table, which is a direct
reference of the date column of the day table.
Reporting XML Data.
Having completed the data binding to Windows forms the
next task was to generate the summarized reports in both graphical and tabular
forms. We have chosen the Crystal Reports Engine (which is delivered with the
VS.NET environment) with its powerful dynamic reporting capabilities.
Crystal Reports for Visual Studio .NET offers the
following three object models:
- The Engine Object Model.
- The Windows Forms Viewer Object Model.
- The Web Forms Viewer Object Model.
The top level Engine object model is the report
document object. The report document object offers sophisticated control over
your object. It contains all the properties and methods needed to interface with
and customize a report. How ever it does not provide the display of the reports.
This is where viewer objects and the report document
objects come together. The Windows Forms viewer allows you to view a report in a
Windows application and the Web Forms viewer does it for a Web application.
So we selected the Engine Object model together with
the Windows Object Model to accomplish the requirements. The Vb.NET
CrystalDecisions.Windows.Forms namespace in the
CrystalDecisions.Windows.Forms.dll assembly provides support for the
Windows Forms Viewer control and its associated classes. It can also dynamically
update the report being hosted and interact with controls within a windows
application.
When using the Crystal Report Engine you basically have
two options for the selection of data source. The first one is passing the raw
XML file, and the second is passing the populated DataSet to the Crystal Report
Engine.
When using the first approach you have to verify on
every usage, that the XML file contains correct data types (data validation).
Otherwise Crystal Report Engine would interpret the data types in the XML files
differently (e.g. float are used as strings), which prevents the data from being
processed further within Crystal Reports.
The second, recommended option is passing the already
populated typed DataSets to the Crystal Report Engine. This saves the runtime
verification of data types. The content and the scope of the report is
determined at runtime according to the user selection of the required report
format. Crystal Reports provides template files for this. Figure 6 shows a
sample report.
Figure 6. Crystal Report View.
Summary.
Visual Studio .NET has a very rich set of tightly
integrated tools that work together to support C# application developer. For XML
based applications it provides schema generation and validation tools. It offers
the methods of choice for XML handling. Especially the DataSet classes make
reading and writing of XML data extremely easy. The DataSet classes provide easy
data binding to the GUI and to even third-party tools like Crystal Reports for
advanced reporting.
Other very nice new features are the new source code
documentation support and application installer production. The code
documentation tool generates a very nice HTML documentation (they improved the
JavaDoc style), which is completely XML based. The integrated application
installer allows easy creation of a setup program for your application.
Despite the currently existing limitations of DataSets
concerning its XML standard support, it is the tool of choice for XML data
handling with .NET.
Once your XML data sources confirm to the ".NET XML
standard" the rest can be handled without much difficulty. It offers lot of
built-in powerful classes and methods for reading, writing and displaying data.
This allows the developer to concentrate on the more important application logic
rather than having to worry about data handling issues or framework specifics.
References.
[MSVSNET] Visual Studio .NET Help, Microsoft
Visual Studio IDE v7.0.
[W3CXML] Extensible Markup Language (XML) 1.0, 2nd
Ed.,
http://www.w3.org/TR/2000/REC-xml-20001006,
Oct 2000.
[W3CDOM] Document Object Model Technical Reports,
http://www.w3.org/DOM/DOMTR.
[W3CCHI] W3C definition of Parent child
Relationship,
http://www.w3.org/TR/2000/sec-well-formed.
[MS-Q325695] "Error Message When You View
Multi-Dimensional XSD Schemas in the Visual Studio .NET XML Schema Designer"
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q325695.