What is Web Scraping?
Web scraping is the automated extraction of data, often in large amounts, from websites. The data can then be saved to a local file on your computer or to a database table.
We can then analyze this data. For example, we can scrape product prices from e-commerce websites and compare them.
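To make the "save, then analyze" idea concrete, here is a minimal sketch using only the Python standard library. The product rows, the file names, and the helper names (save_to_csv, save_to_sqlite, average_price) are made up for illustration; in practice the rows would come from a scraper.

```python
import csv
import sqlite3
import statistics

# Hypothetical sample rows standing in for scraped product data.
products = [
    {"name": "Keyboard", "price": 49.99},
    {"name": "Mouse", "price": 19.50},
    {"name": "Monitor", "price": 189.00},
]


def save_to_csv(rows, path):
    """Write the rows to a local CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(rows)


def save_to_sqlite(rows, db_path):
    """Insert the rows into a 'products' table in a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)"
            )
            conn.executemany(
                "INSERT INTO products (name, price) VALUES (:name, :price)",
                rows,
            )
    finally:
        conn.close()


def average_price(rows):
    """A very small 'analysis' step: the mean of all prices."""
    return statistics.mean(row["price"] for row in rows)
```

Calling save_to_csv(products, "products.csv") or save_to_sqlite(products, "products.db") stores the rows, and average_price(products) gives a first number to compare across shops.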
Why Web Scraping?
Data displayed on a website can normally only be viewed in the browser; there is no built-in way to save it in a structured form. To keep it, we would have to copy and paste it by hand, which is tedious and error-prone. A scraper can instead collect the same information in a fraction of the time.
Scrapy Framework
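Scrapy is a web scraping framework written in Python. It can be used for various purposes such as data mining, monitoring, and automated testing. Scrapy is open source. At the time of writing it supported Python 2.7 and Python 3.4 and above; recent releases support Python 3 only, so check the Scrapy installation guide for the current minimum version.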
Here, we will see how easily we can scrape websites using the Scrapy framework.
Steps
- Requirement
Install Python (2.7, or 3.4 and above, at the time of writing). Python can be downloaded from python.org.
- Install Scrapy
Open your command prompt or terminal and type:
pip install scrapy
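If you want to confirm from Python that the installation worked, a small check like the following can help. This is only a sketch; the helper name is_installed is my own, and it simply asks Python whether a package with that name can be found.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Scrapy installed:", is_installed("scrapy"))
```

If this reports False after a successful pip install, pip and the python command are probably pointing at different Python installations.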
- Scrapy Shell
Scrapy includes an interactive shell, the Scrapy shell, which is useful for testing and debugging your scraping code. You can also fetch and scrape URLs directly from it. Once Scrapy is installed, run the following in your command prompt or terminal:
scrapy shell
- Fetch
Once the Scrapy shell has started, we can start scraping. fetch requests the given URL and makes the downloaded page available in the shell as the response object. For this example, I am using my friend's website, "ugentertainment.in".
fetch("http://ugentertainment.in/")
- View
view opens the fetched response in your default browser.
view(response)
The scraped page opens in your default browser, so you can compare it with the original website.
Scraped Website
Original Website
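To get a feel for what opening a response in the browser involves, here is a rough standard-library imitation: write the page body to a temporary .html file and hand that file to the default browser. This is a sketch of the idea, not Scrapy's own code, and the function names save_response_body and open_in_default_browser are hypothetical.

```python
import pathlib
import tempfile
import webbrowser


def save_response_body(body, suffix=".html"):
    """Write raw page bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("wb", suffix=suffix, delete=False) as f:
        f.write(body)
        return f.name


def open_in_default_browser(path):
    """Ask the operating system's default browser to open a local file."""
    return webbrowser.open(pathlib.Path(path).resolve().as_uri())


if __name__ == "__main__":
    # Made-up page body; in the Scrapy shell this would be response.body.
    saved = save_response_body(b"<html><body><h1>Hello</h1></body></html>")
    open_in_default_browser(saved)
```

Because the browser loads a local copy, images or styles referenced with relative URLs may not appear unless they are resolved against the original site, so the local copy can look different from the original.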
And with that, you have scraped your first website using Scrapy.