Reading a Microsoft Word Document (.docx) using Telerik

Introduction

This article presents a sample class that reads Microsoft Word documents (.docx, Word 2007 through Microsoft 365) and returns their content as plain text.

With Office 2007, Microsoft introduced the Office Open XML format, which stores a document as a set of XML parts inside a single container. In practice, a .docx file is a ZIP archive.
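
To make the ZIP structure concrete, here is a minimal sketch that lists the parts inside a .docx package using only System.IO.Compression, without Telerik. The class name DocxPackageInspector and the method ListParts are hypothetical names chosen for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for illustration; not part of Telerik or the article's class.
public static class DocxPackageInspector
{
    // Returns the names of the parts stored in the package,
    // treating the .docx stream as an ordinary ZIP archive.
    public static List<string> ListParts(Stream docxStream)
    {
        // leaveOpen: true keeps the caller's stream open after the archive is disposed.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

Calling DocxPackageInspector.ListParts on a stream opened with File.OpenRead typically returns entries such as [Content_Types].xml and word/document.xml, the part that usually holds the main body text. On .NET Framework, the project may need a reference to the System.IO.Compression assembly.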

The code below shows how to read the text from a .docx document with the Telerik format providers.

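using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

namespace Telerik.WinForms.Documents
{
    public static class ReadDocx
    {
        // Returns the text content of the .docx file at the given path.
        public static string ReadDocxContent(string file)
        {
            var docxFormatProvider = new DocxFormatProvider();
            // Using declaration (C# 8 or later): the stream is disposed when the method exits.
            using var input = File.OpenRead(file);
            // Import the .docx stream into Telerik's document model.
            var document = docxFormatProvider.Import(input);
            // Export the document model as plain text.
            var txtFormatProvider = new TxtFormatProvider();
            return txtFormatProvider.Export(document);
        }
    }
}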

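This technique relies on the Progress Telerik format providers imported above (DocxFormatProvider and TxtFormatProvider), so the project must reference the Telerik assemblies that contain them.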

Just pass the path to a .docx file, and the function returns the document's content as text.

var txt = ReadDocx.ReadDocxContent(@".\desktop\document1.docx");  
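
A relative path such as the one above is resolved against the process's current directory, which may not be the folder you expect. The following sketch shows one way to turn the argument into a full path and fail early with a clear error. DocxPathHelper and Resolve are hypothetical names introduced for this illustration; they are not part of Telerik.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of Telerik or the article's class.
public static class DocxPathHelper
{
    // Resolves a path to its absolute form and checks that it
    // points to an existing file with a .docx extension.
    public static string Resolve(string path)
    {
        // Relative paths are resolved against the current working directory.
        var fullPath = Path.GetFullPath(path);

        if (!string.Equals(Path.GetExtension(fullPath), ".docx", StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Expected a .docx file.", nameof(path));
        }

        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("Document not found.", fullPath);
        }

        return fullPath;
    }
}
```

The resolved path can then be passed on, for example ReadDocx.ReadDocxContent(DocxPathHelper.Resolve(path)).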

The ReadDocx class also works with .NET Framework 4.x after a small refactoring: the using declaration, which requires C# 8, is replaced with a classic using statement.

using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

namespace Telerik.WinForms.Documents
{
    public static class ReadDocx
    {
        // Returns the text content of the .docx file at the given path.
        public static string ReadDocxContent(string file)
        {
            var docxFormatProvider = new DocxFormatProvider();
            // Classic using statement: the stream is disposed at the end of the block.
            using (var input = File.OpenRead(file))
            {
                // Import the .docx stream into Telerik's document model.
                var document = docxFormatProvider.Import(input);
                // Export the document model as plain text.
                var txtFormatProvider = new TxtFormatProvider();
                return txtFormatProvider.Export(document);
            }
        }
    }
}

Happy coding!
