Introduction
This sample class reads Microsoft Word documents (.docx, Word 2007 through Microsoft 365) and returns their text content.
With Office 2007, Microsoft introduced a new document format, Office Open XML, that stores its data in a container. A .docx file is in fact a ZIP archive holding XML parts and resources.
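To make the container idea concrete, here is a minimal sketch that lists the parts inside a .docx file using only the standard System.IO.Compression types. It is an illustration, not part of the original sample: the class name DocxInspector and the method name ListParts are hypothetical, and on .NET Framework the project is assumed to reference System.IO.Compression.FileSystem.

```csharp
using System.Collections.Generic;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the Telerik sample.
public static class DocxInspector
{
    // Returns the full names of all entries stored in the .docx package.
    public static List<string> ListParts(string path)
    {
        var parts = new List<string>();

        // A .docx file can be opened like any other ZIP archive.
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                parts.Add(entry.FullName);
            }
        }

        return parts;
    }
}
```

For a typical Word document, the returned list includes an entry named word/document.xml, which holds the main body text.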
The code below shows how to read the text from .docx documents.
using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

namespace Telerik.WinForms.Documents
{
    public static class ReadDocx
    {
        // Reads a .docx file and returns its content as plain text.
        public static string ReadDocxContent(string file)
        {
            var docxFormatProvider = new DocxFormatProvider();

            // Open the file read-only; the stream is disposed when the method returns.
            using var input = File.OpenRead(file);
            var document = docxFormatProvider.Import(input);

            // Convert the imported document to plain text.
            var txtFormatProvider = new TxtFormatProvider();
            return txtFormatProvider.Export(document);
        }
    }
}
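This technique relies on the Progress Telerik WinForms document format providers, so the project must reference the Telerik assemblies.
Pass the path to a document file, and the function returns its content as plain text.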
var txt = ReadDocx.ReadDocxContent(@".\desktop\document1.docx");
This code also works with .NET Framework 4.x after a small refactoring: the using declaration, a C# 8 feature, is replaced by a classic using statement.
using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

namespace Telerik.WinForms.Documents
{
    public static class ReadDocx
    {
        public static string ReadDocxContent(string file)
        {
            var docxFormatProvider = new DocxFormatProvider();
            using (var input = File.OpenRead(file))
            {
                var document = docxFormatProvider.Import(input);
                var txtFormatProvider = new TxtFormatProvider();
                return txtFormatProvider.Export(document);
            }
        }
    }
}
Happy coding!