This article has been excerpted from the book "A Programmer's Guide to ADO.NET in C#".
An XML document is a set of elements in a standard format. As mentioned earlier, a document is well-formed if it contains one or more elements and follows the syntax rules of the language exactly. A document is valid if it has a DTD associated with it and complies with that DTD. An XML parser will only parse a document that is well-formed, but the document doesn't necessarily have to be valid. This means that a document must have at least one element (a root element), but it doesn't matter whether it uses a DTD.
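The difference between well-formed and valid can be tried out with a small sketch. It uses Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against a DTD. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML (no DTD validation is done)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<StudentRecord><name>Ann</name></StudentRecord>"
bad = "<StudentRecord><name>Ann</StudentRecord>"  # <name> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Neither sample references a DTD, so neither could be called valid, yet the first one still parses because it is well-formed.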
An XML document has the following parts, each described in the sections that follow:
- Prolog
- DOCTYPE declaration
- Start and end tags
- Comments
- Character and entity references
- Empty elements
- Processing instructions
- CDATA section
- Attributes
- White spaces
Prolog
The prolog part of a document appears before the root tag. The prolog information applies to the entire document. It can contain the XML declaration (with the version and character encoding), stylesheet references, a DOCTYPE declaration, comments, and processing instructions. This is an example of a prolog:
- <?xml version ="1.0" ?>
- <?xml-stylesheet type="text/xsl" href="books.xsl" ?>
- <!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">
- <!-- my comments -->
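As a rough illustration, the sketch below parses a document with this kind of prolog and lists the nodes that come before the root element. It assumes Python's standard xml.dom.minidom module, and the empty StudentRecord root element is added only so the sample parses. The parser does not fetch mydtd.dtd, and the XML declaration itself does not show up as a node.

```python
from xml.dom import minidom

# Prolog from the example, followed by a made-up empty root element.
PROLOG_SAMPLE = """<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="books.xsl" ?>
<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">
<!-- my comments -->
<StudentRecord/>"""

doc = minidom.parseString(PROLOG_SAMPLE)
root = doc.documentElement

# Collect the names of the document-level nodes that precede the root element.
prolog_nodes = []
for node in doc.childNodes:
    if node is root:
        break
    prolog_nodes.append(node.nodeName)

print(prolog_nodes)
```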
DOCTYPE Declaration
A DOCTYPE declaration names the root element of your document and identifies the DTD used to validate it. The DTD can be read from an external file, and the reference to that file can be a URI. In a validating environment, a DOCTYPE declaration is a must. For example:
- <!DOCTYPE rootElement SYSTEM "URIreference">
or
- <!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">
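To see what a parser reads from such a declaration, here is a minimal sketch that assumes Python's standard xml.dom.minidom module. That module is non-validating, so it records the root element name and the DTD reference but never opens mydtd.dtd. The empty root element is added for illustration.

```python
from xml.dom import minidom

# The DOCTYPE line from the example plus a made-up empty root element.
doc = minidom.parseString(
    '<!DOCTYPE StudentRecord SYSTEM "mydtd.dtd">'
    "<StudentRecord/>"
)

doctype = doc.doctype
print(doctype.name)      # root element named by the declaration
print(doctype.systemId)  # reference to the external DTD
```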
Start and End tags
Start and end tags are the heart of the XML language. As mentioned earlier in the article, an XML document is nothing but a text file with start and end tags. Each element starts with a start tag, <TAG>, and ends with the matching end tag, </TAG>. If you want to add an element called book to your XML file, it must start with <book> and end with </book>, as shown in this example:
- <?xml version ="1.0" ?>
- <book xmlns="http://www.c-sharpcorner.com/xmlNet">
- <title> The Autobiography of Benjamin Franklin</title>
- <author>
- <first-name>Benjamin</first-name>
- <last-name>Franklin</last-name>
- </author>
- <price>8.99</price>
- </book>
Note: Empty elements don't have to follow this <tag>...</tag> pattern. I'll discuss empty tags later in the "Empty Elements" section.
Note: An element is another name for a start and end tag pair, together with everything between the two tags.
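The book document above can be read back with a few lines of code. This is only a sketch, assuming Python's standard xml.etree.ElementTree module. The prefix b in the NS mapping is an arbitrary name chosen here to stand for the default namespace that the book element declares.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0" ?>
<book xmlns="http://www.c-sharpcorner.com/xmlNet">
  <title> The Autobiography of Benjamin Franklin</title>
  <author>
    <first-name>Benjamin</first-name>
    <last-name>Franklin</last-name>
  </author>
  <price>8.99</price>
</book>"""

# "b" is a made-up prefix bound to the document's default namespace.
NS = {"b": "http://www.c-sharpcorner.com/xmlNet"}

book = ET.fromstring(BOOK_XML)

# Each start/end tag pair becomes an element whose text can be read.
title = book.findtext("b:title", namespaces=NS).strip()
author = " ".join(
    book.findtext(f"b:author/b:{part}", namespaces=NS)
    for part in ("first-name", "last-name")
)
price = float(book.findtext("b:price", namespaces=NS))

print(title, "|", author, "|", price)
```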
Comments
Using comments in your code is good programming practice. They help you and others understand your code by explaining certain lines. You use the <!-- and --> pair to write comments in an XML document, for example <!-- my comments -->.
XML parsers ignore comments.
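A quick way to observe this is to parse a document that contains a comment and look at what the parser hands back. The sketch assumes Python's standard xml.etree.ElementTree module with its default settings, and the book and title elements are sample data.

```python
import xml.etree.ElementTree as ET

# Sample document with a comment between the start tag and the first child.
root = ET.fromstring("<book><!-- my comments --><title>XML</title></book>")

# With the default parser settings, the comment does not appear in the tree.
child_tags = [child.tag for child in root]
print(child_tags)
```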
CDATA Sections
What if you want to use characters such as < and > in your XML file, but not as part of a tag? You can't use < directly, because the XML parser will interpret it as the start of a tag. A CDATA section provides the solution: inside it you can use XML markup characters and have the XML parser treat them as plain text. If you use the following line:
- <![CDATA[I want to use < and > characters]]>
the parser will treat those characters as data.
Here is another example of a CDATA section:
- <![CDATA[<Title>This is the title of a page</Title>]]>
In this case, the parser will treat <Title> and </Title> as data, not as markup tags.
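The effect of a CDATA section can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree module. The enclosing page element is made up so that the CDATA section has a parent element.

```python
import xml.etree.ElementTree as ET

# The CDATA example wrapped in a hypothetical <page> element.
doc = "<page><![CDATA[<Title>This is the title of a page</Title>]]></page>"
page = ET.fromstring(doc)

print(page.text)  # the markup characters arrive as plain text
print(len(page))  # number of child elements
```

The page element ends up with no child elements, because the parser never treated the Title text as a tag.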
Character and entity references
In some cases, you can't use a character directly in a document because of some limitation, such as the character being treated as markup or a device or processor limitation.
By using character and entity references, you can include a character in a document by reference rather than by typing the character itself.
A character reference is the numeric code for a character, written in decimal or hexadecimal. You put the hash symbol (#) before the value, and an x after the hash if the value is hexadecimal. The XML parser takes care of the rest. For example, the character reference for the Return key (carriage return) is &#x000D;.
The reference starts with an ampersand (&) and a #, and it ends with a semicolon (;). The syntax for decimal and hexadecimal references is &#value; and &#xvalue; respectively. XML also has some built-in entities. Use the lt, gt, and amp entities for less than, greater than, and ampersand, respectively. Table 6-2 shows the five XML built-in entities and their references. For example, if you want to write a > b or Jack & Jill, you can do that by using these entities:
a &gt; b and Jack &amp; Jill
Table 6-2. XML Built-in Entities
ENTITY | REFERENCE | DESCRIPTION
lt | &lt; | Less than: <
gt | &gt; | Greater than: >
amp | &amp; | Ampersand: &
apos | &apos; | Single quote: '
quot | &quot; | Double quote: "
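In practice you rarely type these references by hand. The sketch below, which assumes Python's standard xml.sax.saxutils and xml.etree.ElementTree modules, escapes a string for use in XML and then parses it back together with a decimal and a hexadecimal character reference for the letter A. The p element is a made-up wrapper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a > b and Jack & Jill"

# escape() replaces &, < and > with the amp, lt and gt entity references.
escaped = escape(raw)
print(escaped)

# &#65; (decimal) and &#x41; (hexadecimal) both refer to the letter A.
parsed = ET.fromstring(f"<p>{escaped} &#65;&#x41;</p>")
print(parsed.text)
```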
Empty elements
Empty elements have no content between their start and end tags. You can write one as a start tag followed immediately by its end tag, or as a single tag that starts with < and ends with />. For example:
- <Name></Name>
- <IMG SRC="img.jpg" />
- <tagname/>
are all examples of empty elements. The <IMG> element specifies an inline image, and the SRC attribute specifies the image's location. The image can be in any format, though browsers generally support only GIF, JPEG, and PNG images.
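A parser sees a start tag followed directly by its end tag and a self-closing tag as the same thing. The following sketch assumes Python's standard xml.etree.ElementTree module, and the helper is_empty is a made-up name. Note that for this parser a space between the tags counts as content.

```python
import xml.etree.ElementTree as ET


def is_empty(element: ET.Element) -> bool:
    """An element is empty when it has no text and no child elements."""
    return element.text is None and len(element) == 0


forms = ["<Name></Name>", "<tagname/>", '<IMG SRC="img.jpg" />']
results = [is_empty(ET.fromstring(form)) for form in forms]
print(results)

# A single space between the tags is reported as text content.
with_space = is_empty(ET.fromstring("<Name> </Name>"))
print(with_space)
```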
Processing Instructions
Processing instructions (PIs) play a vital role in XML processing. A PI holds instructions that the parser passes on to the program reading the document. If you noticed the first line of any of the XML samples discussed earlier, the XML declaration uses the same syntax as a PI:
- <?xml version ="1.0" ?>
All PIs start with <? and end with ?>. This is another example of a PI:
- <?xml-stylesheet type="text/xsl" href="myxsl.xsl"?>
This PI tells the program processing the document, such as a browser, to apply a stylesheet to it.
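A program reading the document receives a PI as a target name plus a data string, and it is up to the program to interpret the data. Here is a minimal sketch, assuming Python's standard xml.dom.minidom module. The books root element is made up so that the sample parses.

```python
from xml.dom import minidom

# The stylesheet PI followed by a hypothetical empty root element.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="myxsl.xsl"?><books/>'
)

pi = doc.firstChild
print(pi.target)  # the name right after "<?"
print(pi.data)    # everything else up to "?>", passed through as one string
```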
Attributes
Attributes let you add extra information to an element without creating another element. An attribute is a name and value pair. Both the name and value must be present in an attribute. The attribute value must be enclosed in quotes, either double or single; otherwise, the parser will give an error. Listing 6-8 is an example of attributes in a <table> tag. In the example, the <table> tag has border and width attributes, and each <td> tag has a width attribute.
Listing 6-8. Attributes in the <table> tag
- <table border="1" width="43%">
- <tr>
- <td width="50%">Row1, Column1</td>
- <td width="50%">Row1, Column2</td>
- </tr>
- <tr>
- <td width="50%">Row2, Column1</td>
- <td width="50%">Row2, Column2</td>
- </tr>
- </table>
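Reading attributes back is straightforward. The sketch below assumes Python's standard xml.etree.ElementTree module and uses a shortened copy of the listing. One width value is written with single quotes here only to show that the parser accepts both quote styles, while an unquoted value is rejected.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the listing; width='43%' uses single quotes on purpose.
TABLE_XML = """<table border="1" width='43%'>
  <tr>
    <td width="50%">Row1, Column1</td>
    <td width="50%">Row1, Column2</td>
  </tr>
</table>"""

table = ET.fromstring(TABLE_XML)
print(table.attrib)

# get() returns the attribute value, or None when the attribute is absent.
cell_widths = [td.get("width") for td in table.iter("td")]
print(cell_widths)

# An attribute value without quotes is not well-formed XML.
try:
    ET.fromstring("<table border=1></table>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```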
White spaces
XML preserves white space in element content, but it normalizes white space in attribute values. That means the parser reports all white space in the document's content to the application, so it may be displayed in the browser. However, white space is not allowed before the XML declaration. If white space appears before the declaration, the parser treats the declaration as an ordinary PI with the reserved name xml and reports an error.
For elements, the XML 1.0 standard defines the xml:space attribute to signal how white space in the element's content should be handled. The xml:space attribute accepts only two values: default and preserve. The default value is the same as not specifying an xml:space attribute; it lets the application apply its normal white space handling. The preserve value tells the application to keep all white space in the element. In attribute values, the parser keeps space characters but converts each line break into a single space.
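These rules can be observed with a small sketch that assumes Python's standard xml.etree.ElementTree module. The note element and its title attribute are made up. The parser reports xml:space like any other attribute, under the reserved XML namespace, and leaves it to the application to act on it.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a real line break; the content has padding spaces.
doc = (
    '<note title="line one\nline two" xml:space="preserve">'
    "  two spaces each side  "
    "</note>"
)
note = ET.fromstring(doc)

print(repr(note.text))          # white space in content is kept
print(repr(note.get("title")))  # the line break becomes a single space

# xml:space lives in the namespace reserved for the xml prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"
space_mode = note.get(f"{{{XML_NS}}}space")
print(space_mode)
```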
Conclusion
I hope this article has helped you understand the XML document and its parts. See the other articles on this website for further reference.