Creating A Link Grabber And Filter In C#

Naveen Arumugam

Introduction

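In this blog, we will learn how to extract all the links from a web page in a Windows Forms application. The page can be downloaded with either a WebClient or a WebBrowser control; the code here uses the WebBrowser control. Without wasting time, let's dive directly into the code.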

Step 1

We are creating a link grabber. It's always a good idea to clarify the logic before building something, so let's define it first.

We need the address of the page to crawl. We can get it from a TextBox.
Next, we download the page. We can use either a WebClient or a WebBrowser control for this.
Once we have the HTML document, the next step is to extract the links from it.
Most of the useful links are contained in the href attribute of the anchor (a) tags.
So we want to grab all the anchor elements of the page. We can do this by calling GetElementsByTagName("a") on the document.
This gives us a collection of all the anchor elements.
We then read the href attribute of each anchor and add it to a list. Let this list be a CheckedListBox.
At this point, we have all the extracted links.
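
The steps above mention a WebClient as an alternative to the WebBrowser control. Here is a minimal sketch of that route, under a few assumptions: the class and method names (LinkExtractor, ExtractHrefs, Grab) are made up for illustration, and because WebClient only returns the page as text, a regular expression stands in for GetElementsByTagName. A regex is not a real HTML parser, so treat this as a rough approximation. Note also that WebClient is marked obsolete in newer .NET versions in favour of HttpClient.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are illustrative, not from the article.
public static class LinkExtractor
{
    // Matches <a ... href="..."> or <a ... href='...'>.
    // Simplification: unquoted attribute values are ignored.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the raw href value of every anchor found in the HTML text.
    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }

    // Downloads the page with WebClient and extracts its links.
    public static List<string> Grab(string pageUrl)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUrl);
            return ExtractHrefs(html);
        }
    }
}
```

In the form, the click handler could call LinkExtractor.Grab(textBox.Text) and add each returned string to the CheckedListBox. The href values come back exactly as written in the page, so relative links stay relative.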

Step 2

Open Visual Studio and choose "New Project".

Now, choose "Visual C#" -> "Windows" -> "Windows Forms Application".
Drop a TextBox from the Toolbox onto the form.
Drop a Button from the Toolbox onto the form and label it "grab".
Add a CheckedListBox from the Toolbox onto the form.
The code refers to these controls as textBox and checkedListBox, so make sure their Name properties match.
Double-click the button to generate the click handler.
Add the code below to the click handler.

The complete code for the form is as follows.

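using System;
using System.Windows.Forms;

namespace linkGrabber
{
    // Named Form1 (the Visual Studio default): a class called Form cannot
    // inherit from System.Windows.Forms.Form without a circular base class error.
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Click handler for the "grab" button.
        private void button_Click(object sender, EventArgs e)
        {
            Uri address;
            // Reject invalid input instead of letting the Uri constructor throw.
            if (!Uri.TryCreate(textBox.Text.Trim(), UriKind.Absolute, out address))
            {
                MessageBox.Show("Please enter an absolute URL.");
                return;
            }

            checkedListBox.Items.Clear();

            // This control is never shown; it only loads and parses the page.
            WebBrowser wb = new WebBrowser();
            wb.ScriptErrorsSuppressed = true;
            // Subscribe before starting the navigation.
            wb.DocumentCompleted += wb_DocumentCompleted;
            wb.Navigate(address);
        }

        private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser wb = (WebBrowser)sender;
            // The event can fire once per frame; wait until the whole page is ready.
            if (wb.ReadyState != WebBrowserReadyState.Complete) return;
            wb.DocumentCompleted -= wb_DocumentCompleted;
            extractLink(wb.Document);
        }

        // Adds the href of every anchor element to the checked list box.
        private void extractLink(HtmlDocument source)
        {
            HtmlElementCollection anchorList = source.GetElementsByTagName("a");
            foreach (HtmlElement anchor in anchorList)
            {
                string href = anchor.GetAttribute("href");
                // Skip anchors that have no href attribute.
                if (!string.IsNullOrEmpty(href))
                {
                    checkedListBox.Items.Add(href);
                }
            }
        }
    }
}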
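
The form code stops at listing every href it finds. As a minimal sketch of what a filter step could look like, the helper below keeps only absolute http/https links, drops duplicates, and optionally keeps only links containing a keyword. The class name LinkFilter, the method name Filter, and the keyword parameter are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the original form code.
public static class LinkFilter
{
    // Keeps absolute http/https links that contain the keyword (when one is given),
    // dropping empty entries and duplicates while preserving the original order.
    public static List<string> Filter(IEnumerable<string> links, string keyword)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string link in links)
        {
            Uri uri;
            if (string.IsNullOrWhiteSpace(link)) continue;
            if (!Uri.TryCreate(link, UriKind.Absolute, out uri)) continue;
            // Drops schemes such as javascript:, mailto: and file:.
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;
            if (!string.IsNullOrEmpty(keyword) &&
                link.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) < 0) continue;
            // HashSet.Add returns false for a link that was already seen.
            if (seen.Add(link)) result.Add(link);
        }
        return result;
    }
}
```

One way to use it would be to collect the href values into a List<string> inside extractLink, pass that list through LinkFilter.Filter, and add only the returned links to the checked list box. Passing null or an empty string as the keyword skips the keyword check.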

Conclusion

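In this blog, we learned how to create a link extractor in C#: the WebBrowser control loads the page, GetElementsByTagName("a") collects the anchor elements, and the href of each one is added to a checked list box.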