Rowan


How to import XML to Dataset excluding certain elements

Apr 25 2008 2:08 AM
Hi,

I am trying to import a 50 MB XML file into a DataSet using DataSet.ReadXml(). Unfortunately ReadXml() cannot read the file and raises the exception:

Cannot add a nested relation or an element column to a table containing a SimpleContent column.

I have traced the fault to 4 different elements in the XML, which occur approximately 800 times throughout the file. Fortunately I do not need these elements, so I am trying to work out the best way to exclude them from the input.

Ways I have thought of so far are:
1. Use a schema with the elements excluded
2. Pre-process the XML file and remove the elements before passing the output to ReadXml
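Option 1 can be sketched without writing an .xsd file at all: declare only the tables and columns you want in code, then call ReadXml with XmlReadMode.IgnoreSchema, which reads data into the existing schema and discards data that does not match it. This is a minimal sketch, not a drop-in fix: it assumes the wanted tables and columns are known in advance (the names below are taken from the sample network_objects.xml further down, and KnownSchemaReader is a hypothetical helper name), so it only helps if a fixed set of columns is acceptable.

```csharp
using System.Data;
using System.IO;

// KnownSchemaReader is a hypothetical helper, not a framework class.
public static class KnownSchemaReader
{
    // Builds a DataSet whose schema holds only the columns we care about.
    // Table and column names are assumptions based on the sample file.
    public static DataSet CreateSchema()
    {
        var dataSet = new DataSet("network_objects"); // must match the root element name
        DataTable table = dataSet.Tables.Add("network_object");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Class_Name", typeof(string));
        table.Columns.Add("enable_multicast_acceleration", typeof(string));
        return dataSet;
    }

    public static DataSet Read(TextReader source)
    {
        DataSet dataSet = CreateSchema();
        // IgnoreSchema: no inference takes place; elements with no matching
        // table or column (such as isakmp.encmethods) are discarded.
        dataSet.ReadXml(source, XmlReadMode.IgnoreSchema);
        return dataSet;
    }
}
```

Because nothing is inferred, the nested isakmp.encmethods elements never reach the code path that raises the SimpleContent exception.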

I cannot use a fixed schema, so I would have to generate the schema on the fly each time. The only way I know to do this is with ReadXmlSchema, but I cannot find a way to pass it a list of elements to ignore. ReadXmlSchema also fails with the same exception.

So the best option appears to be to read the XML file, scan for the elements that cause the exception, and write a new XML file with those elements removed. At the moment I use "sed" to pre-process the file, which is not ideal because I want to do this inside my program.
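Doing the sed step inside the program can be sketched with XmlReader and XmlWriter: copy the document node by node and call Skip() on any element whose name is in an exclusion set, which drops that element together with everything nested inside it. This is a minimal sketch assuming the problem elements can be identified by local name alone; XmlElementFilter, CopyExcluding and ReadXmlExcluding are hypothetical names, and only the common node types are copied.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Xml;

// XmlElementFilter is a hypothetical helper, not a framework class.
public static class XmlElementFilter
{
    // Streams nodes from 'input' to 'output', dropping every element (with its
    // whole subtree) whose local name is in 'excluded'.
    public static void CopyExcluding(XmlReader input, XmlWriter output, ISet<string> excluded)
    {
        input.Read(); // move from the initial state to the first node
        while (!input.EOF)
        {
            if (input.NodeType == XmlNodeType.Element && excluded.Contains(input.LocalName))
            {
                input.Skip(); // jumps past the end tag; reader is already on the next node
                continue;
            }

            switch (input.NodeType)
            {
                case XmlNodeType.Element:
                    output.WriteStartElement(input.Prefix, input.LocalName, input.NamespaceURI);
                    output.WriteAttributes(input, true); // reader is moved back to the element
                    if (input.IsEmptyElement)
                        output.WriteEndElement();
                    break;
                case XmlNodeType.EndElement:
                    output.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    output.WriteString(input.Value);
                    break;
                case XmlNodeType.CDATA:
                    output.WriteCData(input.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    output.WriteWhitespace(input.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    output.WriteProcessingInstruction(input.Name, input.Value);
                    break;
                case XmlNodeType.Comment:
                    output.WriteComment(input.Value);
                    break;
                // XmlDeclaration and DocumentType nodes are not copied.
            }
            input.Read();
        }
    }

    // Filters 'source', then loads the filtered XML into a new DataSet.
    public static DataSet ReadXmlExcluding(TextReader source, ISet<string> excluded)
    {
        var filtered = new StringWriter();
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(source, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(filtered, writerSettings))
        {
            CopyExcluding(reader, writer, excluded);
        }

        var dataSet = new DataSet();
        dataSet.ReadXml(new StringReader(filtered.ToString()));
        return dataSet;
    }
}
```

Because the reader only ever holds the current node, the copy itself is streaming. For the 50 MB file you would pass a StreamReader over network_objects.xml; writing the filtered output to a temporary file instead of a StringWriter would avoid keeping a second in-memory copy before ReadXml.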

So I am looking for the best way to achieve this; any help is much appreciated.

FYI, here is some more info for those interested in specifics:

// XML File
// network_objects.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="../network_objects.xsl"?>

<network_objects>
  <network_object>
    <Name>fbe</Name>
    <Class_Name>gateway_ckp</Class_Name>
    <isakmp.encmethods>
      <isakmp.encmethods>
        <![CDATA[DES]]>
      </isakmp.encmethods>
      <isakmp.encmethods>
        <![CDATA[3DES]]>
      </isakmp.encmethods>
      <isakmp.encmethods>
        <![CDATA[CAST]]>
      </isakmp.encmethods>
      <isakmp.encmethods>
        <![CDATA[AES-256]]>
      </isakmp.encmethods>
    </isakmp.encmethods>
    <enable_multicast_acceleration>false</enable_multicast_acceleration>
  </network_object>
</network_objects>

// C# code to process this

using System;
using System.Data;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string folder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);

        // 'using' closes the schema file even if ReadXml throws
        using (TextWriter tw = new StreamWriter(Path.Combine(folder, "output", "XML-Schema.xsd")))
        {
            DataSet dataset1 = new DataSet();

            dataset1.ReadXml(Path.Combine(folder, "xsl", "TCL-Firewalls-XML", "network_objects.xml")); // <--- Exception raised

            dataset1.WriteXmlSchema(tw);
        }
    }
}

// When I run this, an exception is thrown at the ReadXml line.
// Although the exception is thrown, I have discovered that I can still use Continue (F5)
// and the schema is written out to disk (although it is incomplete). This is what it looks like:
 
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="network_objects" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="isakmp.encmethods" nillable="true">
    <xs:complexType>
      <xs:simpleContent msdata:ColumnName="isakmp.encmethods_Text" msdata:Ordinal="1">
        <xs:extension base="xs:string">
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="network_objects" msdata:IsDataSet="true" msdata:Locale="en-US">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="network_object">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="xs:string" minOccurs="0" />
              <xs:element name="Class_Name" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element ref="isakmp.encmethods" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

// I can now use my text editor to search for the text "simpleContent".
// Once I have found it, I can determine which elements in the input XML file are causing the problem.

// This is how I tracked down the 4 elements that make ReadXml fail.
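The text-editor search for "simpleContent" can also be done in code: in the inferred DataSet, the elements written as xs:simpleContent become tables with a column whose ColumnMapping is MappingType.SimpleContent. A minimal sketch, assuming (as the debugger session above suggests) that the DataSet keeps the partially inferred schema after ReadXml throws; SimpleContentFinder, FindSimpleContentTables and Diagnose are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// SimpleContentFinder is a hypothetical helper, not a framework class.
public static class SimpleContentFinder
{
    // Returns the names of tables that hold a SimpleContent column, i.e. the
    // tables that appear as <xs:simpleContent> in the schema from WriteXmlSchema.
    public static List<string> FindSimpleContentTables(DataSet dataSet)
    {
        var names = new List<string>();
        foreach (DataTable table in dataSet.Tables)
        {
            foreach (DataColumn column in table.Columns)
            {
                if (column.ColumnMapping == MappingType.SimpleContent)
                {
                    names.Add(table.TableName);
                    break; // a table has at most one SimpleContent column
                }
            }
        }
        return names;
    }

    // Tries to load 'path'; if inference throws, inspects whatever schema was
    // inferred before the failure (an assumption based on the behaviour above).
    public static List<string> Diagnose(string path)
    {
        var dataSet = new DataSet();
        try
        {
            dataSet.ReadXml(path);
        }
        catch (Exception ex) // the exact exception type is not assumed here
        {
            Console.Error.WriteLine("ReadXml failed: " + ex.Message);
        }
        return FindSimpleContentTables(dataSet);
    }
}
```

The names returned by Diagnose could then feed the exclusion set used when pre-processing the file.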

