Optical Character Reader Using Python

Shalin Dashora

Hello everyone! In this article, we will learn about OCR (Optical Character Recognition, sometimes called an Optical Character Reader) using Python.

What is OCR?

OCR is a technology that recognizes and extracts text from images such as scanned documents and photos. In this article we drive it from Python.

We will use PIL (the Python Imaging Library, installed today as the Pillow package) and pytesseract, a Python wrapper around the open-source Tesseract OCR engine.

Tesseract runs on Windows, macOS, and Linux. It supports Unicode (UTF-8) and more than 100 languages. In this article, we will start with the installation process and then test extracting text from an image.

So, let's start!

First, install pytesseract with the command pip install pytesseract
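To confirm that the installation worked, a small check like the following can help. It is only a sketch: the helper name missing_packages is made up for this example, and it assumes that Pillow (imported as PIL) was pulled in as a dependency of pytesseract, which is normally the case.

```python
import importlib.util


def missing_packages(names=("pytesseract", "PIL")):
    """Return the names from `names` that cannot be imported (hypothetical helper)."""
    # find_spec returns None when a top-level package is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pytesseract and PIL are importable.")
```

If anything is reported as missing, rerun the pip command in the same environment that you use to run the script.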


Now, after installation, let's import the dependencies and implement it.

def ocr():
    # Import the Image module from PIL (Pillow); fall back to the legacy module name
    try:
        from PIL import Image
    except ImportError:
        import Image
    import pytesseract

    # Path to the Tesseract binary on this machine
    pytesseract.pytesseract.tesseract_cmd = r'C:\Users\SHALIN DASHORA\anaconda3\envs\tesseract\Library\bin\tesseract.exe'
    # Open the image and extract its text
    text_from_image = pytesseract.image_to_string(Image.open(r"C:\Users\SHALIN DASHORA\Desktop\ocr1.png"))
    print(text_from_image)   # print the text found in the image

ocr()
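The function above hard-codes both paths. A slightly more reusable variant might look like the sketch below. The function name extract_text and its parameters are assumptions made for this example, not part of pytesseract; the calls it relies on (Image.open, pytesseract.image_to_string with its lang argument, and the tesseract_cmd setting) are the same ones used above.

```python
import os


def extract_text(image_path, tesseract_cmd=None, lang="eng"):
    """Return the text found in an image (hypothetical helper)."""
    # Fail early with a clear message if the image path is wrong
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")

    # Imported here so the path check above works even before the
    # OCR libraries are installed
    from PIL import Image
    import pytesseract

    # Only override the binary location when a path is given;
    # otherwise pytesseract looks for "tesseract" on the PATH
    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # The context manager closes the image file after reading
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang).strip()
```

You would call it as extract_text(r"path\to\image.png", tesseract_cmd=r"path\to\tesseract.exe"), with your own paths in place of these placeholders.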

Note
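You have to download and install the Tesseract engine binary separately, because pytesseract only wraps it. The tesseract_cmd path in the code must point to that binary. You can download it from here.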
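If you are not sure where the binary ended up, a lookup like this can find it. This is a minimal sketch using only the standard library: find_tesseract is a hypothetical helper, and it assumes the executable is named tesseract and is either on your PATH or at one of the extra paths you pass in.

```python
import os
import shutil


def find_tesseract(extra_paths=()):
    """Return the path of the Tesseract executable, or None (hypothetical helper)."""
    # First look for "tesseract" on the system PATH
    found = shutil.which("tesseract")
    if found:
        return found
    # Then try any explicit locations supplied by the caller
    for path in extra_paths:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract binary not found; install it or pass its full path.")
    else:
        print("Using Tesseract at:", path)
```

The returned path is what you would assign to pytesseract.pytesseract.tesseract_cmd.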

Output

The image we are reading the text from:

[Image: the input image, ocr1.png]

Output

[Image: screenshot of the text printed by the script]

This is the output we get. ☝️

Hope you like it! Thanks 🙂

Please Like and Share.