Robotic Process Automation (RPA) is the technology that lets anyone configure computer software, or a “robot”, to emulate and integrate the actions of a human interacting with digital systems to execute a business process.
RPA robots use the user interface to capture data and manipulate applications just as humans do. They interpret data, trigger responses, and communicate with other systems to perform a wide variety of repetitive tasks. Better still, an RPA software robot never sleeps and, once correctly configured, executes each step the same way every time.
UiPath is a leading Robotic Process Automation vendor providing a complete software platform to help organizations efficiently automate business processes.
UiPath Studio is a tool that can model an organization's business processes in a visual way.
The Read PDF With OCR activity extracts data from PDF documents that contain both text and images. If the document contains images in addition to text, the activity also extracts the text inside those images and returns everything as a single text output.
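To make the idea concrete, here is a minimal Python sketch of the concept described above. It is not UiPath's implementation: the `Page` structure (embedded text plus raw image bytes) and the pluggable `ocr` callable, which stands in for an engine such as Tesseract, are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical page model for this sketch: the PDF's embedded text layer
# plus the raw bytes of any images placed on the page.
@dataclass
class Page:
    text: str
    images: List[bytes] = field(default_factory=list)

# Hypothetical OCR engine signature: image bytes in, recognized text out.
OcrEngine = Callable[[bytes], str]

def read_pdf_with_ocr(pages: List[Page], ocr: OcrEngine) -> str:
    """Combine each page's embedded text with the OCR text of its images."""
    chunks: List[str] = []
    for page in pages:
        if page.text.strip():
            chunks.append(page.text.strip())
        for image in page.images:
            recognized = ocr(image).strip()
            if recognized:
                chunks.append(recognized)
    return "\n".join(chunks)

# Usage with a fake engine that "recognizes" text by decoding the bytes.
fake_ocr: OcrEngine = lambda data: data.decode("utf-8")
sample = [Page(text="Invoice 42", images=[b"Total: 100 USD"])]
print(read_pdf_with_ocr(sample, fake_ocr))  # prints "Invoice 42" then "Total: 100 USD"
```

Keeping the OCR engine as a parameter mirrors how the workflow lets you choose which engine performs the recognition.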
In this article, you will learn how to automate the extraction of text from a PDF document that contains both text and images with text, using the Read PDF With OCR activity and related activities in UiPath Studio Pro Community.
The following tools are required to develop UiPath bots:
- Windows 7/8.1/10 (Recommended)
- UiPath Studio Pro - Community Cloud (free software, available online at https://www.uipath.com/start-trial)
Now let's go through the bot development step by step.
Step 1
Open UiPath Studio -> Start -> New Project -> click Process.
Step 2
Create a new blank process, name it UiPdfImage, and enter a description.
Step 3
To extract both the text and the text inside images from a PDF document, create a new Sequence workflow named GetImagePDF.
Next, install the PDF package: go to Manage Packages -> Official, select UiPath.PDF.Activities, and install it.
After installing the package, click Activities -> search for the Read PDF With OCR activity -> drag and drop it into the sequence, and select the PDF file.
The sample PDF used here contains plain text as well as an image that itself contains text.
Create a String variable named extractimage. In the activity's properties, set Range to a single page (1) and set Output Text to extractimage.
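The Range property controls which pages the activity reads. As an illustration only, the Python sketch below parses a page-range string and keeps the matching pages. The accepted forms ("All", "1", "2-4", "1,3") and the helper names `parse_range` and `select_pages` are assumptions for this example, so check the UiPath.PDF.Activities documentation for the syntax the activity actually accepts.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def parse_range(spec: str, page_count: int) -> List[int]:
    """Turn an assumed range string such as 'All', '1', '2-4' or '1,3' into 1-based page numbers."""
    spec = spec.strip()
    if spec.lower() == "all":
        return list(range(1, page_count + 1))
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        # Reject pages that do not exist in the document, or reversed ranges.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        for n in range(start, end + 1):
            if n not in pages:
                pages.append(n)
    return pages

def select_pages(pages: Sequence[T], spec: str) -> List[T]:
    """Return only the pages named by the range string, in the order given."""
    return [pages[n - 1] for n in parse_range(spec, len(pages))]

# A range of "1" keeps only the first page, as in this walkthrough.
print(select_pages(["page one", "page two", "page three"], "1"))  # ['page one']
```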
Click Activities -> search for the Tesseract OCR engine activity -> drag and drop it into the OCR engine area inside the Read PDF With OCR activity.
Click Activities -> search for the Write Text File activity -> drag and drop it into the sequence, and set its FileName property to the output file path and its Text property to extractimage.
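If you want to reproduce or post-process the Write Text File step outside Studio, the minimal Python sketch below does the same job: it writes the extracted string to a file and creates the folder if needed. The helper name `write_text_file`, the path `output/extracted.txt`, and the UTF-8 encoding are assumptions for illustration. Because the sketch opens the file for writing, an existing file is overwritten.

```python
from pathlib import Path

def write_text_file(file_name: str, text: str) -> Path:
    """Write the extracted text to file_name, overwriting it if it already exists."""
    path = Path(file_name)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    path.write_text(text, encoding="utf-8")
    return path

# Placeholder for the Output Text that Read PDF With OCR stored in extractimage.
extractimage = "Invoice 42\nTotal: 100 USD"
saved = write_text_file("output/extracted.txt", extractimage)
print(saved.read_text(encoding="utf-8"))
```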
Step 4
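To run the project, select Debug File -> Run. The output of the UiPdfImage project is the text file written by the Write Text File activity, containing the text extracted from the PDF, including the text inside the image.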
Summary
You have now automated the extraction of text from a PDF document, including the text inside its images, using UiPath Studio.