Web Scraping In C# Using ScraperAPI

“Wisdom…. comes not from age, but from education and learning.” – Anton Chekhov

Overview

In this article, we will explore how to create a real-life web scraper using ScraperAPI and HtmlAgilityPack.

What is ScraperAPI?

ScraperAPI is a service for extracting data from websites. Its main purpose is to make downloading large amounts of raw HTML quick and easy. To scrape a page, you send the URL you want to scrape to the API along with your API key, and the API returns the HTML response from that URL.
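As a rough illustration of the request described above, the sketch below builds the URL for a direct call to the API. It assumes the endpoint http://api.scraperapi.com with api_key and url query parameters; confirm the exact form against the sample API code on your dashboard. The ScraperApiUrl class and its Build method are hypothetical names, not part of the SDK.

```csharp
using System;

// Hypothetical helper, not part of the ScraperAPI SDK.
public static class ScraperApiUrl
{
    // Assumed endpoint; confirm against the sample API code on your dashboard.
    private const string Endpoint = "http://api.scraperapi.com/";

    // Builds the request URL: the API key and the escaped target URL
    // travel as query-string parameters.
    public static string Build(string apiKey, string targetUrl)
    {
        if (string.IsNullOrWhiteSpace(apiKey))
            throw new ArgumentException("API key is required.", nameof(apiKey));
        if (string.IsNullOrWhiteSpace(targetUrl))
            throw new ArgumentException("Target URL is required.", nameof(targetUrl));

        return Endpoint
            + "?api_key=" + Uri.EscapeDataString(apiKey)
            + "&url=" + Uri.EscapeDataString(targetUrl);
    }
}
```

Passing the result of ScraperApiUrl.Build(apiKey, "https://coinmarketcap.com/") to HttpClient.GetStringAsync should then return the page's HTML. The target URL is escaped so that its own query string, if any, is not confused with the API's parameters.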

Key features of ScraperAPI

  • IP address rotation with each request
  • Automatically retries failed requests
  • Full customization (request headers, request types, IP geolocation, and more)
  • Custom session support
  • Unlimited bandwidth
  • Speed and reliability
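Customization options such as geolocation and sessions are typically passed as extra query-string parameters. The sketch below appends such options to an existing request URL. The parameter names used in the usage note (country_code, session_number) are assumptions to verify against the ScraperAPI documentation, and the ScraperApiOptions helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the ScraperAPI SDK.
public static class ScraperApiOptions
{
    // Appends each option as "&name=value" to a request URL that already
    // has a query string. Names and values are escaped.
    public static string WithOptions(string requestUrl,
        IEnumerable<KeyValuePair<string, string>> options)
    {
        if (requestUrl == null) throw new ArgumentNullException(nameof(requestUrl));
        if (options == null) throw new ArgumentNullException(nameof(options));

        var builder = new StringBuilder(requestUrl);
        foreach (var option in options)
        {
            builder.Append('&')
                   .Append(Uri.EscapeDataString(option.Key))
                   .Append('=')
                   .Append(Uri.EscapeDataString(option.Value));
        }
        return builder.ToString();
    }
}
```

For example, passing a list containing ("country_code", "us") and ("session_number", "123") would append &country_code=us&session_number=123 to the request URL.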

Sign Up on ScraperAPI and Get an API Key

We need to pass the API key with each ScraperAPI request so that the request can be authenticated.

For that, you need to sign up for an account on the ScraperAPI website. After signing up, you get 1000 free requests per month as a trial.

ScraperAPI-SignUp

The dashboard shows your API key, sample API code, and sample proxy code, as shown in the image below.

ScraperAPI-Dashboard

Web Scraping in C#

We are going to scrape a page and parse its HTML. We will extract data from https://coinmarketcap.com and store that data in a CSV file.

This website lists information about cryptocurrencies, such as name, current price, percentage change over the last 24 hours and 7 days, market capitalization, etc.

Step 1 - Create Project

First, create a project. Here we are choosing Console App (.NET Core), but you can choose the project template based on your requirements.

Step 2 - Install NuGet Packages

We need to install the following NuGet packages:

  • ScraperAPI: This is the official C# SDK for ScraperAPI.
  • HtmlAgilityPack: It is a .NET code library that allows you to parse "out of the web" HTML files.

Open the Package Manager Console and run the following commands one by one:

Install-Package ScraperApi
Install-Package HtmlAgilityPack

Step 3 - GetDataFromWebPage() method

In this example, we are going to use the HttpClient and ScraperApiClient.

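// Requires: using System; using System.Net; using System.Net.Http; using System.Threading.Tasks;
// plus the namespace that exposes ScraperApiClient in the ScraperApi package.
static async Task GetDataFromWebPage() {
  try {
    Console.WriteLine("### Started Getting Data.");

    string apiKey = "**Add Your API Key Here**";
    // Create an HttpClient that sends its requests through the ScraperAPI proxy.
    HttpClient scraperApiHttpClient = ScraperApiClient.GetProxyHttpClient(apiKey);
    scraperApiHttpClient.BaseAddress = new Uri("https://coinmarketcap.com");

    var response = await scraperApiHttpClient.GetAsync("/");
    if (response.StatusCode == HttpStatusCode.OK) {
      // Read the response body (the page's HTML) and hand it to the parser.
      var htmlData = await response.Content.ReadAsStringAsync();
      ParseHtml(htmlData);
    } else {
      // Report non-OK responses instead of failing silently.
      Console.WriteLine("GetDataFromWebPage Failed: HTTP {0}", (int)response.StatusCode);
    }
  } catch (Exception ex) {
    Console.WriteLine("GetDataFromWebPage Failed: {0}", ex.Message);
  }
}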

Replace the placeholder in the “apiKey” variable with your ScraperAPI key.

GetProxyHttpClient() creates an HTTP client that uses the scraperapi.com proxy.

GetAsync() sends the request to the website. The response body is then read with ReadAsStringAsync() and stored in the local variable htmlData.
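Hardcoding the key is fine for a quick test, but it is easy to commit it to source control by accident. One alternative, sketched below, is to read the key from an environment variable. The variable name SCRAPERAPI_KEY and the ApiKeyProvider class are hypothetical choices, not something ScraperAPI prescribes.

```csharp
using System;

// Hypothetical helper for loading the API key at run time.
public static class ApiKeyProvider
{
    // Assumed variable name; pick whatever suits your environment.
    public const string VariableName = "SCRAPERAPI_KEY";

    // Returns the trimmed key, or throws if the variable is missing or blank.
    public static string GetApiKey()
    {
        var apiKey = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            throw new InvalidOperationException(
                "Set the " + VariableName + " environment variable to your ScraperAPI key.");
        }
        return apiKey.Trim();
    }
}
```

In GetDataFromWebPage(), the assignment would then become string apiKey = ApiKeyProvider.GetApiKey();, and a missing key fails fast with a clear message instead of an authentication error from the API.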

Step 4 - ParseHtml() method

Once we have the HTML data from the web page, we parse it using the HtmlDocument class, which comes from HtmlAgilityPack.

Next, load the HTML data and get the ‘tbody’ HTML tag from it. The tbody tag contains the rows of cryptocurrency data.

To drill down further, use the SelectSingleNode method. It returns the first HtmlNode that matches the XPath query, or null if no matching node is found. SelectNodes returns a collection of all matching HtmlNode objects, or null if there are none.

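// Requires: using System; using System.Collections.Generic; using HtmlAgilityPack;
static void ParseHtml(string htmlData) {
  try {
    Console.WriteLine("### Started Parsing HTML.");
    var coinData = new Dictionary<string, string>();
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(htmlData);

    // The first <tbody> in the document holds the cryptocurrency rows.
    var cmcTableBody = htmlDoc.DocumentNode.SelectSingleNode("//tbody");
    // SelectSingleNode and SelectNodes return null when nothing matches.
    var cmcTableRows = cmcTableBody?.SelectNodes("tr");
    if (cmcTableRows != null) {
      foreach (HtmlNode row in cmcTableRows) {
        var cmcTableColumns = row.SelectNodes("td");
        // Skip rows that lack the name (index 2) and price (index 3) cells.
        if (cmcTableColumns == null || cmcTableColumns.Count < 4) continue;
        string name = cmcTableColumns[2].InnerText;
        string price = cmcTableColumns[3].InnerText;
        // The indexer overwrites duplicates instead of throwing like Add().
        coinData[name] = price;
      }
    }
    WriteDataToCSV(coinData);
  } catch (Exception ex) {
    Console.WriteLine("ParseHtml Failed: {0}", ex.Message);
  }
}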
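The null-returning behavior of SelectSingleNode and SelectNodes can be tried out on a small inline HTML string without calling the live site. The following is a minimal sketch, assuming the HtmlAgilityPack package is installed. The TableParsingSketch class and its column-index parameters are hypothetical, and HtmlEntity.DeEntitize is used to decode entities such as &amp;amp; in the cell text.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper that mirrors the parsing steps on arbitrary HTML.
public static class TableParsingSketch
{
    // Returns (name, price) pairs from the first <tbody> in the HTML.
    // nameIndex and priceIndex are zero-based <td> positions within a row.
    public static List<KeyValuePair<string, string>> ExtractRows(
        string html, int nameIndex, int priceIndex)
    {
        var result = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode tableBody = doc.DocumentNode.SelectSingleNode("//tbody");
        if (tableBody == null) return result;

        // SelectNodes returns null (not an empty collection) on no match.
        HtmlNodeCollection rows = tableBody.SelectNodes("tr");
        if (rows == null) return result;

        int needed = Math.Max(nameIndex, priceIndex) + 1;
        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null || cells.Count < needed) continue; // malformed row

            string name = HtmlEntity.DeEntitize(cells[nameIndex].InnerText).Trim();
            string price = HtmlEntity.DeEntitize(cells[priceIndex].InnerText).Trim();
            result.Add(new KeyValuePair<string, string>(name, price));
        }
        return result;
    }
}
```

Calling ExtractRows on HTML without a table returns an empty list rather than throwing, which makes layout changes on the target site easier to detect and handle.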

Step 5 - WriteDataToCSV() method

In this example, we take the currency name and its price from the scraped data and store them in a CSV file.

// Requires: using System; using System.Collections.Generic; using System.IO; using System.Text;
static void WriteDataToCSV(Dictionary<string, string> cryptoCurrencyData) {
  try {
    var csvBuilder = new StringBuilder();

    csvBuilder.AppendLine("Name,Price");
    foreach (var item in cryptoCurrencyData) {
      // The price is wrapped in quotes because it can contain commas.
      csvBuilder.AppendLine(string.Format("{0},\"{1}\"", item.Key, item.Value));
    }
    // Writing to the root of C: may require elevated permissions; adjust the path if needed.
    File.WriteAllText("C:\\Cryptocurrency-Prices.csv", csvBuilder.ToString());

    Console.WriteLine("### Completed Writing Data To CSV File.");
  } catch (Exception ex) {
    Console.WriteLine("WriteDataToCSV Failed: {0}", ex.Message);
  }
}
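The method above quotes only the price column. If a currency name ever contains a comma or a double quote, the row would be split incorrectly. A more general approach, sketched below, is to escape every field using the usual CSV convention (RFC 4180): quote a field when it contains a comma, quote, or line break, and double any embedded quotes. The CsvSketch class is a hypothetical helper, not part of either package.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for building CSV text with properly escaped fields.
public static class CsvSketch
{
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Quotes the field only when needed, doubling any embedded quotes.
    public static string EscapeField(string field)
    {
        if (field == null) return "";
        if (field.IndexOfAny(SpecialChars) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Builds the full CSV text with a Name,Price header row.
    public static string BuildCsv(IEnumerable<KeyValuePair<string, string>> rows)
    {
        var builder = new StringBuilder("Name,Price\r\n");
        foreach (var row in rows)
        {
            builder.Append(EscapeField(row.Key))
                   .Append(',')
                   .Append(EscapeField(row.Value))
                   .Append("\r\n");
        }
        return builder.ToString();
    }
}
```

Since Dictionary<string, string> implements IEnumerable<KeyValuePair<string, string>>, the result of CsvSketch.BuildCsv(cryptoCurrencyData) could be passed straight to File.WriteAllText.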

Step 6 - Main() method

Replace the content of the Main method with the code below:

static async Task Main(string[] args) {
  await GetDataFromWebPage();
}

Output

Go to the path where the CSV file was created and open the file. Here our CSV file path is “C:\Cryptocurrency-Prices.csv”.

The output is a CSV file that contains two columns: 1) Currency Name and 2) Price. You can extract other columns and data based on your needs.

Cryptocurrency-Prices
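The prices in the CSV file are stored as text. If you need numeric values, for example to sort or filter rows, the text has to be parsed. The sketch below assumes the prices are formatted as US dollars with a leading "$", comma thousands separators, and a period as the decimal point; the PriceParsingSketch class is a hypothetical helper. It builds its own NumberFormatInfo so that the result does not depend on the machine's culture settings.

```csharp
using System.Globalization;

// Hypothetical helper for turning scraped price text into a decimal.
public static class PriceParsingSketch
{
    private static readonly NumberFormatInfo DollarFormat = CreateDollarFormat();

    // Invariant number format ("," for groups, "." for decimals) with "$" as the symbol.
    private static NumberFormatInfo CreateDollarFormat()
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.CurrencySymbol = "$";
        return format;
    }

    // Returns false instead of throwing when the text is not a valid price.
    public static bool TryParsePrice(string text, out decimal price)
    {
        return decimal.TryParse(text, NumberStyles.Currency, DollarFormat, out price);
    }
}
```

Using TryParse rather than Parse means a cell with unexpected content is simply reported as unparseable, so one bad row does not abort the whole run.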

Conclusion

In this article, we saw how to use the ScraperAPI and HtmlAgilityPack NuGet packages to scrape data from a site, filter the required data, and write it to a CSV file.

