This can be done with the Microsoft Office Document Imaging (MODI) object model. To use it, you need to add the MODI library to your development project. The MODI object model consists of the following objects:
Document object: Represents an ordered collection of pages (images).
Image object: Represents a single page of a document.
Layout object: Represents the results of optical character recognition (OCR) on a page.
MiDocSearch object: Exposes document search functionality.
Viewer control: An ActiveX control that displays the pages of a document.
Example for extracting text from a TIFF file:
Dim strWordInfo As String = ""
Dim docs As New MODI.Document
docs.Create("C:\test.tif")
Dim Success As Integer = Analyse(docs)
If Success = 1 Then
    ' Append the recognized text of every page (index j, not always page 0)
    For j As Integer = 0 To docs.Images.Count - 1
        strWordInfo = strWordInfo & " " & docs.Images(j).Layout.Text
    Next
    ' Escape single quotes by doubling them
    strWordInfo = strWordInfo.Replace("'", "''")
End If
Function Analyse(ByVal Doc As MODI.Document) As Integer
    ' Returns 1 if the OCR call succeeded, 0 otherwise
    If Doc Is Nothing Then
        Return 0
    End If
    Try
        ' MODI call for OCR
        ' _MODIDocument.OCR(_MODIParameters.Language, _MODIParameters.WithAutoRotation, _MODIParameters.WithStraightenImage)
        Doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, True, True)
        Return 1
    Catch ex As Exception
        'MessageBox.Show("OCR was successful but no text was recognized")
        Return 0
    End Try
End Function
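
If the plain Text property is not enough, the Layout object also exposes the recognized words one by one. The following is only a minimal sketch under a few assumptions: it needs the same MODI reference as the example above, the module name ModiWordsSketch and the procedure name PrintWords are made up for illustration, and the explicit casts are there because the interop collections return their items as Object.

```vbnet
Imports System
Imports System.Runtime.InteropServices

Module ModiWordsSketch
    ' Hypothetical helper: prints every recognized word of every page.
    Sub PrintWords(ByVal path As String)
        Dim doc As New MODI.Document
        Try
            doc.Create(path)
            ' Same OCR call as in Analyse: English, auto-rotate, straighten
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, True, True)

            For j As Integer = 0 To doc.Images.Count - 1
                Dim pageImage As MODI.Image = DirectCast(doc.Images(j), MODI.Image)
                Dim pageLayout As MODI.Layout = pageImage.Layout

                For k As Integer = 0 To pageLayout.Words.Count - 1
                    Dim recognized As MODI.Word = DirectCast(pageLayout.Words(k), MODI.Word)
                    Console.WriteLine("Page {0}: {1}", j + 1, recognized.Text)
                Next
            Next

            ' Close without saving the OCR results back into the file
            doc.Close(False)
        Catch ex As COMException
            Console.WriteLine("OCR failed: " & ex.Message)
        End Try
    End Sub
End Module
```

You would call it as PrintWords("C:\test.tif"). Working per word instead of per page makes it easy to filter or post-process individual words before joining them into one string.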
Note: The most important requirement for all of the above is to add a reference to the "Microsoft Office Document Imaging Type Library" to your project.
For Microsoft Office 2003, add "Microsoft Office Document Imaging 11.0 Type Library".
For Microsoft Office 2007, add "Microsoft Office Document Imaging 12.0 Type Library".