Introduction
To get information from an XML file, you normally parse it using DOM, SAX, StAX, or a similar API. You can also extract information using plain String processing. That sometimes performs well, but it cannot be generalized to every kind of XML document. Each parsing approach has its own pros and cons. It is always important to extract the required information and pass it to the system quickly and smoothly, and developers sometimes adopt different approaches to get the best result. In this article, I will show a small trick for gathering information from an XML file without using any conventional parsing technique, in a specific scenario. I will also compare the performance of both approaches (a parsing technique and text processing).
Technicalities
I was working on a project where a destination system returned its results as an XML file and the source system displayed those exact results to the end system. This is a very specific scenario in which a custom technique can give the best results. Consider a typical case: you call an external system connected to a device such as a Point of Sale (POS) device, which only returns information about your transaction. The transaction either succeeds or fails, and your system is only concerned with the core content of the XML. Assume, too, that you perform many transactions per second. The conventional approach is to parse the XML file and display the data enclosed in each tag. Which parsing technique to use, DOM or SAX, is open to debate and depends on the situation. In this scenario, the XML document is very small and the source system displays the data without any pre-processing or post-processing. I am not saying that XML parsing is wrong here, but if you can process the text of the XML file intelligently, it can help considerably. Consider the following source XML files and the resulting output.
Successful Transaction
<Transaction>
    <Status>
        <StatusCode>001</StatusCode>
        <StatusMsg>Transaction Successful</StatusMsg>
        <Date>09:09:2013</Date>
        <Time>14:54:53</Time>
    </Status>
</Transaction>
Failed Transaction
<Transaction>
    <Status>
        <StatusCode>009</StatusCode>
        <StatusMsg>Transaction Failed</StatusMsg>
        <Reason>UnExpected Error in reading Card</Reason>
        <Date>09:09:2013</Date>
        <Time>14:54:53</Time>
    </Status>
</Transaction>
Now the source system displays the following information:
001
Transaction Successful
09:09:2013
14:54:53
The preceding situation is hypothetical. The objective is to show how you can change your business algorithm to boost the performance of your application. You do not always need to follow the conventional approach to get good performance.
In the preceding case, the text displayed to the system is very small, and you can certainly use a conventional parsing technique to show it. But if you have a pile of XML documents of this structure and gather the information with an XML parsing technique, there may be a slight performance penalty. To avoid it, you can apply plain text processing with a simple regular expression. The two approaches are described below.
XML Parsing Technique
The following steps are required to get the information from the XML file.
- Get the XML contents as a String
- Parse the XML contents using DOM parsing
- Visit each element or tag of the XML doc and extract the contents
- Display the entire contents to the system
The brief code snippet for XML parsing is given below.
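import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static void processTxn(String contents) {
    DocumentBuilderFactory docBuilderFact = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder docBuilder = docBuilderFact.newDocumentBuilder();
        // Wrap the XML string in a reader so the DOM parser can consume it.
        InputSource inSrc = new InputSource(new StringReader(contents));
        Document doc = docBuilder.parse(inSrc);
        System.out.println("---------------Message From System-------------");
        // recursiveParsing is not shown in this snippet; it walks the tree from the root element.
        recursiveParsing(doc.getDocumentElement());
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // A single try block avoids using a null builder or document after a failure.
        e.printStackTrace();
    }
}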
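The recursiveParsing helper called in the snippet above is not listed in this article. The following is a minimal sketch of what such a helper might look like, assuming it only needs to print the text inside each tag. The class name RecursiveParsingSketch and the collectText and parse methods are hypothetical names used for illustration; they are not taken from the original project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical sketch; the original project's helper may differ.
class RecursiveParsingSketch {

    // Depth-first walk that collects the trimmed, non-empty text of every text node.
    static void collectText(Node node, List<String> out) {
        if (node instanceof Text) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Assumed behaviour of recursiveParsing: print each value on its own line.
    static void recursiveParsing(Node node) {
        List<String> values = new ArrayList<>();
        collectText(node, values);
        for (String value : values) {
            System.out.println(value);
        }
    }

    // Parses an XML string into a DOM document, rethrowing failures unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Transaction><Status>"
                + "<StatusCode>001</StatusCode>"
                + "<StatusMsg>Transaction Successful</StatusMsg>"
                + "<Date>09:09:2013</Date>"
                + "<Time>14:54:53</Time>"
                + "</Status></Transaction>";
        recursiveParsing(parse(xml).getDocumentElement());
    }
}
```

Collecting the values into a list before printing keeps the tree walk separate from the display step, which makes the walk easier to check on its own.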
Text Processing Technique
The following procedure is required to get the information from the XML file:
- Get the XML contents as a String
- Remove all the tags (<Tag>)
- Display the entire contents to the system
The brief code snippet for removal of XML tags is given below.
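import java.util.regex.Pattern;

// Compiled once and reused; "<[^>]*>" also matches a tag that spans several lines.
private static final Pattern TAG = Pattern.compile("<[^>]*>");

public static String removeXmlTags(String contents) {
    // replaceAll already replaces every match, so no find() loop is needed.
    return TAG.matcher(contents).replaceAll("");
}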
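As a usage illustration of the tag-removal idea, the sketch below turns a transaction response into a list of displayable values. It deliberately differs from the snippet above in one respect: each tag is replaced with a newline rather than an empty string, so that values from neighbouring tags stay separate even when the XML arrives on a single line. The class name TagStripUsage and the method extractValues are hypothetical. Note that this kind of text processing does not decode entities such as &amp;amp; and does not understand CDATA sections or comments, so it only suits simple documents like the ones shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative, not from the original project.
class TagStripUsage {

    // Matches any tag, including one that spans several lines.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the non-empty text values found between tags, in document order.
    static List<String> extractValues(String xml) {
        // Replace each tag with a newline so adjacent values do not run together.
        String stripped = TAG.matcher(xml).replaceAll("\n");
        List<String> values = new ArrayList<>();
        for (String line : stripped.split("\n")) {
            String value = line.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<Transaction><Status>"
                + "<StatusCode>009</StatusCode>"
                + "<StatusMsg>Transaction Failed</StatusMsg>"
                + "<Reason>UnExpected Error in reading Card</Reason>"
                + "<Date>09:09:2013</Date>"
                + "<Time>14:54:53</Time>"
                + "</Status></Transaction>";
        System.out.println("---------------Message From System-------------");
        for (String value : extractValues(xml)) {
            System.out.println(value);
        }
    }
}
```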
Comparison
To compare the time taken by the two approaches, let us run an experiment with 10 observations. The results are shown below.
********** Time Taken in TEXT Processing *************
NANOSECONDS    MILLISECONDS    SECONDS
5919837        5.919837        0.005920
********** Time Taken in XML Processing *************
NANOSECONDS    MILLISECONDS    SECONDS
35151999       35.151999       0.035152
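The measurement code behind these figures is not listed in this article. One way to take a comparable measurement is sketched below, assuming that each approach is run 10 times over the same small document and timed with System.nanoTime. The class name TimingSketch and its methods are hypothetical, and the numbers it prints will differ from the ones above depending on the machine and JVM. Simple timings like this are also sensitive to JVM warm-up, so treat them as a rough indication only.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical timing sketch; not the measurement code used for the results above.
class TimingSketch {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static final String SAMPLE = "<Transaction><Status>"
            + "<StatusCode>001</StatusCode>"
            + "<StatusMsg>Transaction Successful</StatusMsg>"
            + "<Date>09:09:2013</Date>"
            + "<Time>14:54:53</Time>"
            + "</Status></Transaction>";

    // Text processing: remove every tag with a regular expression.
    static String stripTags(String xml) {
        return TAG.matcher(xml).replaceAll("");
    }

    // DOM parsing: build the tree and read the text content of the root element.
    static String parseWithDom(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Runs the task the given number of times and returns the total elapsed nanoseconds.
    static long timeNanos(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    // Prints the elapsed time in nanoseconds, milliseconds, and seconds.
    static void report(String label, long nanos) {
        System.out.printf("%s: %d ns, %.6f ms, %.6f s%n", label, nanos, nanos / 1e6, nanos / 1e9);
    }

    public static void main(String[] args) {
        int runs = 10; // matches the 10 observations mentioned in the text
        report("TEXT processing", timeNanos(() -> stripTags(SAMPLE), runs));
        report("XML processing", timeNanos(() -> parseWithDom(SAMPLE), runs));
    }
}
```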
If you run the code in profile mode in the NetBeans IDE, you can see the difference in time consumption, as shown in the image below.
Configuration
For a clear understanding, download the complete project and open it in the Eclipse or NetBeans IDE. Run the following test classes, and also run them in profile mode in the NetBeans IDE.
Conclusion
I hope you have enjoyed my article about data extraction from an XML file in Java. Download the complete project and go through the source code to understand the concept and its usage. This is just one approach, and it may not be correct in every situation. Based on the complexity and design of your system, you can decide whether to use it. For any issues or errors, you can contact me at
[email protected].