Reading XML Documents using C#

Mahesh Chand

This article has been excerpted from the book "A Programmer's Guide to ADO.NET in C#".

The XmlReader is an abstract base class for XML reader classes. This class provides fast, non-cached forward-only cursors to read XML documents.

The XmlTextReader, XmlNodeReader, and XmlValidatingReader classes derive from the XmlReader class. Figure 6-6 shows XmlReader and its derived classes.

Figure 6-6. XmlReader classes

You use the XmlTextReader, XmlNodeReader, and XmlValidatingReader classes to read XML documents. These classes define overloaded constructors that accept XML files, strings, streams, TextReader objects, an XmlNameTable, and combinations of these. After creating an instance, you simply call the Read method of the class to read the document. The Read method starts at the first node of the document and moves one node forward on each call until it returns false, which indicates there is no node left to read in the document. Listing 6-9 reads an XML file and displays some information about the file. In this example I'll use the books.xml file. You can use any XML file by replacing the path string passed to the constructor.

Listing 6-9. Reading an XML file

            XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");
            // Name and LocalName are empty until the reader is positioned on a node,
            // so move to the first content node (typically the root element) first.
            reader.MoveToContent();
            Console.WriteLine("General Information");
            Console.WriteLine("= = = = = = = = = ");
            Console.WriteLine(reader.Name);
            Console.WriteLine(reader.BaseURI);
            Console.WriteLine(reader.LocalName);
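
The constructors also accept a TextReader, so you can try the Read loop without a file on disk. The following is a minimal sketch, assuming a small made-up document held in a string (SampleXml and the ListNodes helper are my own names for illustration, not part of the book's listings). It records the type and name of every node until Read returns false.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReadLoopSketch
{
    // Hypothetical sample document, not the books.xml file used in the article.
    public const string SampleXml =
        "<?xml version=\"1.0\"?><bookstore><book><title>XML Basics</title></book></bookstore>";

    // Returns one "NodeType:Name" entry for each node the reader visits.
    public static List<string> ListNodes(string xml)
    {
        var nodes = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Read returns false once there is no node left in the document.
            while (reader.Read())
            {
                nodes.Add(reader.NodeType + ":" + reader.Name);
            }
        }
        return nodes;
    }

    public static void Main()
    {
        foreach (string node in ListNodes(SampleXml))
        {
            Console.WriteLine(node);
        }
    }
}
```

Note that end tags show up as separate EndElement nodes, and that text nodes have an empty Name.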

Getting Node Information

The Name property returns the name of the node with the namespace prefix, and the LocalName property returns the name of the node without the prefix.

Item is the indexer, which returns the value of an attribute. The Value property returns the value of the current node. You can even get the level of the node by using the Depth property, as shown in this example:

            XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");

            while (reader.Read())
            {
                if (reader.HasValue)
                {
                    Console.WriteLine("Name : " + reader.Name);
                    Console.WriteLine("Node Depth: " + reader.Depth.ToString());
                    Console.WriteLine("Value : " + reader.Value);
                }
            }
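
To see the difference between Name and LocalName, you need a document that uses a namespace prefix. Here is a small sketch under that assumption; the document, the urn:example:books namespace URI, and the Describe helper are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NameSketch
{
    // Hypothetical document with a namespace prefix; the URI is made up.
    public const string SampleXml =
        "<bk:book xmlns:bk=\"urn:example:books\"><bk:title>XML Basics</bk:title></bk:book>";

    // Describes the first element whose local name matches, or returns null.
    public static string Describe(string xml, string localName)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    return "Name=" + reader.Name
                        + ", LocalName=" + reader.LocalName
                        + ", Prefix=" + reader.Prefix
                        + ", Depth=" + reader.Depth;
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        // The root element is at depth 0, its children at depth 1.
        Console.WriteLine(Describe(SampleXml, "book"));
        Console.WriteLine(Describe(SampleXml, "title"));
    }
}
```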

The NodeType property returns the type of the current node as an XmlNodeType enumeration value:

XmlNodeType type = reader.NodeType;

This value identifies the type of the node. The XmlNodeType enumeration members include Attribute, CDATA, Comment, Document, Element, Whitespace, and so on. These represent XML document node types.

In Listing 6-10, you read a document's nodes one by one and count them by type. Once reading and counting are done, you display on the console how many comments, processing instructions, elements, attributes, whitespace nodes, and so on the document has. The XmlNodeType enumeration contains a member corresponding to each node type, so you can compare the value returned by XmlReader.NodeType with the XmlNodeType members to find out the type of a node.

Listing 6-10. Getting node information

using System;
using System.Xml;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            int DecCounter = 0, PICounter = 0, DocCounter = 0, CommentCounter = 0;
            int ElementCounter = 0, AttributeCounter = 0, TextCounter = 0, WhitespaceCounter = 0;
            XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");

            // Visit every node and count it according to its type
            while (reader.Read())
            {
                XmlNodeType nodetype = reader.NodeType;
                switch (nodetype)
                {
                    case XmlNodeType.XmlDeclaration:
                        DecCounter++;
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        PICounter++;
                        break;
                    case XmlNodeType.DocumentType:
                        DocCounter++;
                        break;
                    case XmlNodeType.Comment:
                        CommentCounter++;
                        break;
                    case XmlNodeType.Element:
                        ElementCounter++;
                        // Attributes are not visited by Read, so count them per element
                        if (reader.HasAttributes)
                            AttributeCounter += reader.AttributeCount;
                        break;
                    case XmlNodeType.Text:
                        TextCounter++;
                        break;
                    case XmlNodeType.Whitespace:
                        WhitespaceCounter++;
                        break;
                }
            }

            // print the info
            Console.WriteLine("White Spaces: " + WhitespaceCounter.ToString());
            Console.WriteLine("Processing Instructions: " + PICounter.ToString());
            Console.WriteLine("Declarations: " + DecCounter.ToString());
            Console.WriteLine("Document Types: " + DocCounter.ToString());
            Console.WriteLine("Comments: " + CommentCounter.ToString());
            Console.WriteLine("Elements: " + ElementCounter.ToString());
            Console.WriteLine("Attributes: " + AttributeCounter.ToString());
            Console.WriteLine("Text Nodes: " + TextCounter.ToString());
            Console.ReadLine();
        }
    }
}

The case statement can have values XmlNodeType.XmlDeclaration, XmlNodeType.ProcessingInstruction, XmlNodeType.DocumentType, XmlNodeType.Comment, XmlNodeType.Element, XmlNodeType.Text, XmlNodeType.Whitespace, and so on.

The XmlNodeType enumeration specifies the type of node. Table 6-4 describes its members.

Table 6-4. The XmlNodeType enumeration's members

MEMBER NAME	DESCRIPTION
Attribute	Attribute node
CDATA	CDATA section
Comment	Comment node
Document	Document object
DocumentFragment	Document fragment
DocumentType	The DTD, indicated by the <!DOCTYPE> tag
Element	Element node
EndElement	End of element
EndEntity	End of an entity
Entity	Entity declaration
EntityReference	Reference to an entity
None	Returned if the Read method has not been called yet
Notation	A notation in the document type declaration
ProcessingInstruction	Represents a processing instruction (PI) node
SignificantWhitespace	Represents white space between markup in a mixed content model
Text	Represents the text content of an element
Whitespace	Represents white space between markup
XmlDeclaration	Represents an XML declaration node

Moving to Content

You can use the MoveToContent method to move from the current node to the next content node of an XML document. A content node is a node of one of the following types: non-whitespace Text, CDATA, Element, EndElement, EntityReference, or EndEntity. If the current node is already a content node, MoveToContent stays where it is; otherwise it skips ahead over the other node types. For example, if the current node is an XmlDeclaration or DocumentType node, the method skips these nodes until it finds a content node. See the following example:

            XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");
            if (reader.Read())
            {
                Console.WriteLine(reader.Name);
                reader.MoveToContent();
                Console.WriteLine(reader.Name);
            }
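
MoveToContent also returns the XmlNodeType of the node it stops on, which makes the skipping easy to observe. The sketch below assumes a made-up document that starts with an XML declaration and a comment; the FirstContentNode helper is a hypothetical name used only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MoveToContentSketch
{
    // Hypothetical document: a declaration and a comment precede the root element.
    public const string SampleXml =
        "<?xml version=\"1.0\"?><!-- catalog --><bookstore><book/></bookstore>";

    // Returns "NodeType:Name" for the first content node in the document.
    public static string FirstContentNode(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skips the XmlDeclaration and Comment nodes and stops on the root element.
            XmlNodeType type = reader.MoveToContent();
            return type + ":" + reader.Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstContentNode(SampleXml));
    }
}
```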

Getting Attributes of a Node

The GetAttribute method is overloaded. You can use it to return the value of an attribute identified by name, by index, or by local name and namespace URI. You use the HasAttributes property to check whether a node has attributes, and AttributeCount returns the number of attributes on the node. The local name is the name of the current node without its prefix. For example, if <bk:book> is the name of a node, where bk is a namespace prefix and the colon (:) separates the prefix from the local name, the local name of the <bk:book> element is book. MoveToFirstAttribute moves to the first attribute. The MoveToElement method moves back to the element that contains the current attribute node (see Listing 6-11).

Listing 6-11. Getting attributes of a node

using System;
using System.Xml;

class XmlReaderSamp
{
    static void Main(string[] args)
    {
        XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");

        // Position on the root element and show its first attribute, if it has one
        reader.MoveToContent();
        if (reader.MoveToFirstAttribute())
        {
            Console.WriteLine("First Attribute Value: " + reader.Value);
            Console.WriteLine("First Attribute Name: " + reader.Name);
            reader.MoveToElement();
        }

        while (reader.Read())
        {
            if (reader.HasAttributes)
            {
                Console.WriteLine(reader.Name + " Attributes");
                for (int i = 0; i < reader.AttributeCount; i++)
                {
                    reader.MoveToAttribute(i);
                    Console.WriteLine("Name: " + reader.Name + ", Value: " + reader.Value);
                }

                // Return to the element that owns the attributes
                reader.MoveToElement();
            }
        }
        Console.ReadLine();
    }
}

You can move to attributes by using MoveToAttribute, MoveToFirstAttribute, and MoveToNextAttribute. MoveToFirstAttribute and MoveToNextAttribute move to the first and next attributes, respectively. After calling MoveToAttribute, the Name, NamespaceURI, and Prefix properties reflect the properties of the specified attribute.
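
As a rough illustration of these methods, the sketch below walks the attributes of a root element with MoveToFirstAttribute and MoveToNextAttribute, and also fetches single values with GetAttribute by name and by index. The sample document and the helper names (ListAttributes, AttributeByName, AttributeByIndex) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeSketch
{
    // Hypothetical element with two attributes.
    public const string SampleXml =
        "<book genre=\"novel\" year=\"1992\"><title>Sample</title></book>";

    // Walks the root element's attributes and returns them as "name=value" strings.
    public static List<string> ListAttributes(string xml)
    {
        var result = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (reader.MoveToFirstAttribute())
            {
                do
                {
                    result.Add(reader.Name + "=" + reader.Value);
                } while (reader.MoveToNextAttribute());

                // Go back to the element that owns the attributes.
                reader.MoveToElement();
            }
        }
        return result;
    }

    // Reads one attribute of the root element by name without moving the reader.
    public static string AttributeByName(string xml, string name)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.GetAttribute(name);
        }
    }

    // Reads one attribute of the root element by zero-based index.
    public static string AttributeByIndex(string xml, int index)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.GetAttribute(index);
        }
    }

    public static void Main()
    {
        foreach (string attribute in ListAttributes(SampleXml))
        {
            Console.WriteLine(attribute);
        }
        Console.WriteLine(AttributeByName(SampleXml, "genre"));
        Console.WriteLine(AttributeByIndex(SampleXml, 1));
    }
}
```

GetAttribute is handy when you need just one value, because it leaves the reader positioned on the element.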

Searching for a Node

The Skip method skips the children of the current node and moves the reader to the next node at the same level. It's useful when you're looking for a particular node and want to skip other nodes. In Listing 6-12, you read your books.xml document and compare XmlReader.Name (through XmlTextReader) against bookstore to look for a node with that name, and then display the name, level, and value of that node using XmlReader's Name, Depth, and Value properties.

Listing 6-12. Using the Skip method

        XmlTextReader reader = new XmlTextReader(@"C:\Documents and Settings\PuranMAC\My Documents\Visual Studio 2008\Projects\ConsoleApplication2\ConsoleApplication2\XMLFile1.xml");

        // Skip already advances the reader, so call Read only when Skip was not called;
        // calling both on every pass would jump over nodes.
        reader.Read();
        while (!reader.EOF)
        {
            // Look for an element with the name bookstore
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "bookstore")
            {
                Console.WriteLine("Name: " + reader.Name);
                Console.WriteLine("Level of the node: " + reader.Depth.ToString());
                Console.WriteLine("Value: " + reader.Value);
                reader.Read();
            }
            else
                reader.Skip();
        }
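
Skip is easiest to understand when you use it to jump over a whole subtree. The following sketch, which assumes a made-up document and a hypothetical ElementsExcept helper, collects element names but skips every element with a given name together with its children.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipSketch
{
    // Hypothetical document with two subtrees under the root.
    public const string SampleXml =
        "<bookstore><book><title>A</title></book><magazine><title>B</title></magazine></bookstore>";

    // Collects element names, skipping each element named skipName and its children.
    public static List<string> ElementsExcept(string xml, string skipName)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == skipName)
                {
                    // Skip leaves the reader on the node after the skipped subtree,
                    // so Read must not be called again on this pass.
                    reader.Skip();
                    continue;
                }

                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
                reader.Read();
            }
        }
        return names;
    }

    public static void Main()
    {
        // The book element and its title child are skipped.
        foreach (string name in ElementsExcept(SampleXml, "book"))
        {
            Console.WriteLine(name);
        }
    }
}
```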

Closing the Document

Finally, use Close to close the opened XML document.
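
A common way to make sure Close is always called is a try/finally block (or a using statement, which closes the reader when the block ends). Here is a minimal sketch of the try/finally form; the StateAfterClose helper and the inline document are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CloseSketch
{
    // Reads a document to the end, closes the reader, and reports its final state.
    public static ReadState StateAfterClose(string xml)
    {
        var reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Process nodes here.
            }
        }
        finally
        {
            // Close releases the underlying stream or TextReader, even if Read throws.
            reader.Close();
        }
        return reader.ReadState;
    }

    public static void Main()
    {
        Console.WriteLine(StateAfterClose("<bookstore><book/></bookstore>"));
    }
}
```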

Tables 6-5 and 6-6 list the XmlReader class properties and methods. I've discussed some of them already.

Table 6-5. XmlReader properties

PUBLIC INSTANCE PROPERTY	DESCRIPTION
AttributeCount	Returns the number of attributes on the current node
BaseURI	Returns the base URI of the current node
Depth	Returns the level of the current node
EOF	Indicates whether the reader is positioned at the end of the stream
HasAttributes	Indicates whether the current node has attributes
HasValue	Indicates whether the current node can have a value
IsDefault	Indicates whether the current node is an attribute generated from the default value defined in the DTD or schema
IsEmptyElement	Indicates whether the current node is an empty element (for example, <item/>)
Item	Returns the value of the attribute
LocalName	Name of the current node without the namespace prefix
Name	Name of the current node with the namespace prefix
NamespaceURI	Namespace URI of the current node
NameTable	Returns the XmlNameTable associated with this implementation
NodeType	Returns the type of the current node
Prefix	Returns the namespace prefix associated with the current node
ReadState	Returns the state of the reader
Value	Returns the value of the current node
XmlLang	Returns the current xml:lang scope
XmlSpace	Returns the current xml:space scope

Table 6-6. XmlReader methods

PUBLIC INSTANCE METHOD	DESCRIPTION
Close	Closes the stream and changes ReadState to Closed
GetAttribute	Returns the value of an attribute
IsStartElement	Checks whether the current content node is a start tag
LookupNamespace	Resolves a namespace prefix in the current element's scope
MoveToAttribute, MoveToContent, MoveToElement	Moves to the specified attribute, content node, or element
MoveToFirstAttribute, MoveToNextAttribute	Moves to the first and next attributes
Read	Reads the next node
ReadAttributeValue	Parses the attribute value into one or more Text and/or EntityReference node types
ReadContentAsXxx, ReadElementContentAsXxx (ReadContentAsBoolean, ReadContentAsInt, ReadContentAsDateTime, and so on)	Reads the content of a node or element as the specified type, including bool, int, double, string, DateTime, and so on
ReadInnerXml	Reads all the content, including markup, as a string
Skip	Skips the children of the current node

Conclusion

I hope this article has helped you understand how to read XML documents in C#. See the other articles on this website for further reference.

This essential guide to Microsoft's ADO.NET overviews C#, then leads you toward deeper understanding of ADO.NET.