Mark Chassy

Extracting data from a Word document

Sep 9 2010 9:15 AM
I am trying to extract semi-structured data from a Word document using C#.
The first step would be to extract the section titles from the document (Chapter 1, Section 1.1, and so on), including both the number and the title.
The second step would be to get the range for each section and search for bookmarks in that range.

For example, for Section "4.1.1 Rules for calculating price" I would first extract the section title and then any bookmarks in that section.

The documents will contain detailed functional specifications for an IT application.
I would ask the people who write the documents to insert a bookmark for each specification.

I have found code that gets all of the bookmarks for a specific range and lets me extract the name and text of each bookmark. One problem I have already noticed, though, is that by default the bookmarks come out in alphabetical order rather than in the order in which they appear in the text.
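One possible workaround for the ordering problem is to sort by character position instead of relying on the order of the collection. The sketch below is a hedged illustration, not code tested against Word: it assumes each bookmark's name and positions have already been copied out of Word Interop (`Bookmark.Name`, `Bookmark.Start`, `Bookmark.End`) into a hypothetical `BookmarkInfo` record, keeps only those that fall inside a given range, and orders them by `Start`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-data stand-in for a Word bookmark. With Word Interop the
// values would be read from Bookmark.Name, Bookmark.Start and Bookmark.End.
public record BookmarkInfo(string Name, int Start, int End);

public static class BookmarkOrdering
{
    // Keeps the bookmarks lying entirely inside [rangeStart, rangeEnd] and
    // orders them by where they start in the text (ties broken by End),
    // instead of by name.
    public static List<BookmarkInfo> InDocumentOrder(
        IEnumerable<BookmarkInfo> bookmarks, int rangeStart, int rangeEnd)
    {
        return bookmarks
            .Where(b => b.Start >= rangeStart && b.End <= rangeEnd)
            .OrderBy(b => b.Start)
            .ThenBy(b => b.End)
            .ToList();
    }
}
```

Working on plain records keeps the ordering logic independent of COM, so it can be checked without Word installed.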

I have not found code that would let me go through the sections and get their ranges.
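For the section step, here is a minimal sketch, assuming the headings use Word's built-in heading styles (so each heading paragraph reports an outline level) and have already been read, in document order, into a hypothetical `HeadingInfo` record. A section is taken to run from its heading to the next heading at the same or a higher level, or to the end of the document. The resulting start and end positions could then be turned back into a Word range (for example with `Document.Range`) whose `Bookmarks` can be inspected.

```csharp
using System.Collections.Generic;

// Hypothetical plain-data stand-in for a heading paragraph. With Word Interop
// the values could be read from each Paragraph in Document.Paragraphs:
//   Start  -> paragraph.Range.Start
//   Level  -> (int)paragraph.OutlineLevel (body text reports
//             WdOutlineLevel.wdOutlineLevelBodyText, i.e. 10, and is skipped)
//   Number -> paragraph.Range.ListFormat.ListString (e.g. "4.1.1")
//   Title  -> paragraph.Range.Text, trimmed of the trailing paragraph mark
public record HeadingInfo(int Start, int Level, string Number, string Title);

public record SectionInfo(string Number, string Title, int Start, int End);

public static class SectionRanges
{
    // Assumes 'headings' is sorted by Start. Each section ends where the next
    // heading of the same or a higher level (smaller Level value) begins,
    // or at documentEnd (e.g. Document.Content.End) if there is none.
    public static List<SectionInfo> Compute(IReadOnlyList<HeadingInfo> headings, int documentEnd)
    {
        var sections = new List<SectionInfo>();
        for (int i = 0; i < headings.Count; i++)
        {
            HeadingInfo h = headings[i];
            int end = documentEnd;
            for (int j = i + 1; j < headings.Count; j++)
            {
                if (headings[j].Level <= h.Level)
                {
                    end = headings[j].Start;
                    break;
                }
            }
            sections.Add(new SectionInfo(h.Number, h.Title, h.Start, end));
        }
        return sections;
    }
}
```

Because a subsection such as 4.1.1 ends only at the next heading of level 3 or higher, its range nests inside the range of 4.1, which matches the usual outline structure of a specification.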

So, in all, I have three questions:

  1. How can I get each section and its range?
  2. When I do a foreach (Bookmark bookmark in range.Bookmarks), how can I get the bookmarks in the order in which they appear in the text?
  3. Does anyone have a better suggestion than bookmarks, given that this will have to be implemented by people who are not very technical? Bookmarks seem a little finicky to me (for example, you can't rename a bookmark; you have to delete and recreate it).

Thanks for any suggestions.

Mark

