Introduction to Data Scraping
Data scraping is the process of extracting data from a web browser, document, or application and storing it in a medium such as a database, CSV file, or spreadsheet. For this to work, the data must be organized in a structured, repeating pattern. Take the search results of an e-commerce website or a job portal as an example. The name, description, and price of each product can be extracted and stored in an Excel workbook, which can later be used for market surveys and analysis. Similarly, job search results for a given title, together with the company names, can be used to generate leads for your business.
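To make the idea of a structured pattern concrete, here is a minimal Python sketch that pulls product names and prices out of a small piece of HTML. The markup, the class names and the ProductParser class are all made up for illustration. Real websites use their own structure, and this is not how UiPath works internally.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; real sites use their own tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Desk Lamp</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Office Chair</span><span class="price">89.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> per product."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per product
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self.rows.append({})  # a new product starts here
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the fields we care about.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ProductParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

Because every product repeats the same structure, one small set of rules is enough to read all of them. That repetition is what makes a page scrapable.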
Data Scraping Steps
This article walks through scraping data from a job portal. The steps are as follows:
Step 1
Open a new process with a meaningful name in UiPath Studio
Step 2
Open a browser and navigate to indeed.com
Step 3
Search for any job title you prefer
N.B. You can perform Steps 2 and 3 manually or automate them by using the Open Browser activity
Step 4
Select Data Scraping from the Design tab
Step 5
Click Next
Step 6
Indicate the job title of the first job post
Step 7
As soon as you indicate the first job title, a pop-up appears asking whether to extract the whole table.
If the web page presents its data as a single table, as in the given image, selecting Yes works.
Our job portal, however, lists around 10 job posts on each page rather than one table. In this scenario, select No and move to the next step
Step 8
Click Next and indicate a second job title, taken from either the second or the last job post, to create the pattern for the bot to extract
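Why are two examples enough to create a pattern? One way to picture it is sketched below in Python. Each indicated element is described by its path in the page, and wherever the two paths differ, the position is treated as "any". The paths and the generalize and matches helpers are hypothetical, and this illustrates the idea only. It is not a description of UiPath's actual selector logic.

```python
def generalize(path_a, path_b):
    """Return a pattern matching both paths; differing indexes become None (wildcard)."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must have the same depth")
    pattern = []
    for (tag_a, idx_a), (tag_b, idx_b) in zip(path_a, path_b):
        if tag_a != tag_b:
            raise ValueError("paths must use the same tags")
        pattern.append((tag_a, idx_a if idx_a == idx_b else None))
    return pattern


def matches(pattern, path):
    """True if the path fits the pattern, treating None as 'any index'."""
    return len(pattern) == len(path) and all(
        p_tag == tag and (p_idx is None or p_idx == idx)
        for (p_tag, p_idx), (tag, idx) in zip(pattern, path)
    )


# Hypothetical paths of the first and the last job title on the page.
first_title = [("div", 1), ("ul", 1), ("li", 1), ("h2", 1)]
last_title = [("div", 1), ("ul", 1), ("li", 10), ("h2", 1)]

pattern = generalize(first_title, last_title)
fifth_title = [("div", 1), ("ul", 1), ("li", 5), ("h2", 1)]
is_job_title = matches(pattern, fifth_title)
```

Here only the list item index differs between the two examples, so the pattern accepts the title of any job post in the list and nothing else.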
Step 9
Configure the column name, in this case “Job Title”. We can also extract the URL by checking the Extract URL box and naming its column
Step 10
The extracted data looks like this
To extract more data, we must select “Extract Correlated Data”. Let’s try extracting the company name
Step 11
Similarly, indicate the company name of the first job post and then the company name of the second job post to create the pattern
Step 12
The extracted data will look like this. To complete the extraction, click Finish
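Correlated data means that values belonging to the same job post end up in the same row. The Python sketch below shows that idea by joining two columns on the position of the job post. The correlate helper and the sample values are assumptions made for illustration.

```python
def correlate(columns):
    """Join columns on the job post index.

    columns maps a column name to {post_index: value}.
    Returns one dict per post, ordered by index, with "" for missing values.
    """
    indexes = sorted(set().union(*(values.keys() for values in columns.values())))
    return [
        {name: values.get(i, "") for name, values in columns.items()}
        for i in indexes
    ]


# Hypothetical sample data: the second post does not show a company name.
titles = {1: "Data Analyst", 2: "RPA Developer", 3: "QA Engineer"}
companies = {1: "Acme Corp", 3: "Globex"}

table = correlate({"Job Title": titles, "Company": companies})
```

Joining on the post index rather than on list position keeps "Globex" next to "QA Engineer" even though one post has no company name.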
Step 13
If the results span multiple pages, indicate the Next button so that data is extracted from every page
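Following the Next button amounts to a simple loop: extract the current page, move to the next one, and stop when there is none. Here is a minimal Python sketch of that loop. It uses an in-memory dictionary in place of real web pages, and the page names, the scrape_all function and the max_pages safety cap are all assumptions made for this example.

```python
# Hypothetical stand-in for a paginated site: each page lists titles and
# names the page its Next button leads to (None on the last page).
PAGES = {
    "page1": {"titles": ["Data Analyst", "RPA Developer"], "next": "page2"},
    "page2": {"titles": ["QA Engineer"], "next": None},
}


def scrape_all(fetch, start, max_pages=50):
    """Collect titles page by page until there is no next page or the cap is hit."""
    results = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        page = fetch(current)
        results.extend(page["titles"])
        current = page["next"]  # follow the "Next" button
        visited += 1
    return results


all_titles = scrape_all(PAGES.get, "page1")
```

The cap on visited pages is a precaution against endless loops on sites whose last page still shows a Next button.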
We can now store the extracted data in an Excel workbook, and our workflow will look something like this. By default, the extracted DataTable is saved in the variable ExtractDataTable
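Outside UiPath, the same storing step can be sketched in a few lines of Python. The example below writes CSV text rather than a native Excel workbook, since CSV needs only the standard library and Excel can open it. The to_csv helper and the sample rows are hypothetical.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical extracted rows, one dict per job post.
extracted = [
    {"Job Title": "Data Analyst", "Company": "Acme Corp"},
    {"Job Title": "RPA Developer", "Company": "Globex"},
]

csv_text = to_csv(extracted, ["Job Title", "Company"])

# To save it to disk, write the text to a file, for example:
# with open("jobs.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```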
Conclusion
In this article, we have learned how to extract data from a web page. Try it yourself and extract data from any medium.