Introduction
This article explains how to implement C# code using ASP.NET in your website and web page to make a better web search for your website using just a few lines. This article explains the algorithm you can use to find the content inside your database and provide it to the user related to nearly every field in your database.
Background
I have seen many people asking how to build a better search engine, well I can't provide them the source code for Google or Bing, but I can tell them what the search engine actually does (just the database part, since I won't be working with meta tags, spiders and crawlers and so on) and how it provides the results back to the user depending on the query he provided to the server. Maybe in the future this article would help some other developer to make a search engine for his own ASP.NET web site or for his company.
Environment Requirements
This project was built using WebMatrix. If you already have it installed on your system, then don't worry, just skip this section of the article. If somehow you don't have WebMatrix, you can always get it from Microsoft's Web Platform Installer.
Just go to the Microsoft's website, (http://www.microsoft.com/web) and download the installer from that location. Once done, the software would install a software program in your machine. Don't worry, just accept and continue the process this would install the WebMatrix and all the related software that are required to run this website on your own personal computer machine. Otherwise this website project won't work and you'll not be able to test this project in your computer.
SQL Server CE is a must, since we're working with Databases, SQL Server CE is a free file based database that you can get for free and work with. If you're going to work with SQL Server Express or any other edition, feel free to do so, just ensure you're having the Connection String added to the web.config file of the project.
What a Search Engine is
Like all other internet terms, this term is also a fancy term where users get a highly agitated feeling about something belonging to some sort of alien stuff, but remember, this is just an English term, "Search Engine". In reality, this is just software or logic or algorithm to find the content that you're looking for.
For example, Google, they're not an alien company, they're just a company that provided the result that you're trying to look for. They ask you for a query and then run code against their database and look for the best fit for that. That is, they extract and use some special kind of ASP.NET (C#) code they get that gets results that suit you the best.
A similar thing can be done on a personal website too. You can create a database for your work and after it's done, you can create a table that you'll have the content saved in. Then, once you're done, you would know where and how to look forward for the data that the user is asking for. This is what other companies do, they use your query and look in their database to get the related data.
For example, when you search for "Christmas", they search for everything that has Christmas in it, like "Merry Christmas" and they provide the result of "Merry Christmas". There is no Black magic going around on the web server. It is just code and some data in their hard drive.
Using the Project
This project has everything set up for you, a simple search field (although not a search field, just an input field) and a button to work with. Once you submit it, it goes to the IIS Server and requests that the database data be read. IIS does so and provides the data back to the request. Then the response is generated with the content from the database that matches (or does not match) the query.
After this stage, the remaining job is handled by the code that we have inside the body element. Where we check if there is any data or not, if there is, then we find the values and other content and so on.
Building the Database
The database I have used in this project is SQL Server CE that you can get from Microsoft and use it for free. But remember it is (or soon will be) deprecated. So, I would prefer you to do yourself a favor to upgrade the server to SQL Server Express. It is also a free database from Microsoft and you can enjoy all of the features of SQL Server.
There is only one table in that database. It contains the ID that we will extract when there is a match and a PostTitle column for the Title search and a PostContent column for the content search. We will search for them both for any match that we can get into.
Note: This content was extracted from my Wordpress accounts Link1 and Link2.
Using the website
As already said, this website makes use of a simple HTML form, that gets an input from the user as in the following:
This is an empty form and it has not been filled with any value. Fill it with the following value and see what happens.
This form has been filled with enough details, now you can move forward and click the Submit button. Once you do so you will see the following results.
As you can see, the results have come out that have at least 1 occurrence of the term "Software". Others have been discarded from the List. Only the content that is related is posted on the website. There is another result showing the Posts whose content includes the term "Software". So this website searches for both, Title and Content for the Query. You can edit the UI to make it a better fit for your application, to either show the titles or the content or both as already done!
Now let's try another query, why not try a special character one? Try writing "C#" to the query box and hit the search button, you'll see the following result this time.
This might look like the previous one to you, but it is not. You can see that the post this time with the Title is a new post that was something else. So, you can see the code search the database for the new and fresh data every time, in other words it is not using a cache service. It continues to search for the relative data and provides you with the accurate queries that you're looking for and discards the remaining items that you don't want to use anywhere or anymore.
What about the terms that we don't have in our database?
Well, this is another feature of this website, that doesn't show you any exception or error or any bad UX. It tells you politely about the error that you're facing. It tells you that there is no result for the Query you entered, simple as that.
You can try to write "WebMatrix" in the search box and search for it, the following page would be generated.
This will be the result provided by the ASP.NET code and you can understand that currently you don't have any of the content that works with WebMatrix so you need to put some content into the database so that the user can get some results for the WebMatrix thing from your website.
Code for the website
The code for the website is pretty simple, it actually works as a search engine but not as powerful since there are only a few of the lines of the server-side ASP.NET code. Which handles the database handling, Query extraction and the rendering of the UI HTML too.
- @{
- Layout = "~/_SiteLayout.cshtml";
- Page.Title = "Home Page";
- }
-
- @{
- bool executed = false;
- List<int> titleResultStrings = new List<int>();
- List<int> contentResultStrings = new List<int>();
-
- if(IsPost) {
- // request was made, get the input and search the database.
- var query = "%" + Request.Form["input"] + "%";
- var db = Database.Open("StarterSite");
-
- // select the Query, parameters on
- var selectTitleQuery = "SELECT * FROM Content WHERE PostTitle LIKE @0";
- var selectContentQuery = "SELECT * FROM Content WHERE PostContent LIKE @0";
- var titleResult = db.Query(selectTitleQuery, query);
- var contentResult = db.Query(selectContentQuery, query);
-
- // append the post id to the list
- foreach (var row in titleResult) {
- titleResultStrings.Add(row.PostId);
- }
-
- foreach (var row in contentResult) {
- contentResultStrings.Add(row.PostId);
- }
-
- // got the content, now do the C# on it!
- executed = true;
- }
- }
-
- <form method="post">
- Write the input and get the result!<br />
- <input type="text" name="input" /><br />
- <input type="submit" value="Submit" />
- </form>
-
- // code executed
- @if(executed) {
- <p>Code has been executed!</p>
- // if the code was executed show the result.
- if(titleResultStrings.Count() != 0) {
- var db2 = Database.Open("StarterSite");
- <h4>Posts whose title have this character are</h4>
- int i = 0;
- foreach (var item in titleResultStrings) {
- // get the data for each item!
- var post = db2.Query("SELECT * FROM Content WHERE PostId = @0", titleResultStrings[i]);
- i++;
- foreach (var row in post) {
- <p>@row.PostTitle</p>
- }
- }
- } else {
- <p>No post for query "<b>@Request.Form["input"]</b>" was found!</p>
- }
-
- if(contentResultStrings.Count() != 0) {
- var db2 = Database.Open("StarterSite");
- <h4>Posts whose content have this character are</h4>
- int i = 0;
- foreach (var item in contentResultStrings) {
- // get the data for each item!
- var post = db2.Query("SELECT * FROM Content WHERE PostId = @0", contentResultStrings[i]);
- i++;
- foreach (var row in post) {
- <p>@row.PostContent</p>
- }
- }
- } else {
- <p>No post for query "<b>@Request.Form["input"]</b>" content was found!</p>
- }
- }
The code is pretty easy once you go through it. It is just 20% algorithm, 30% database extraction and query management and 50% HTML rendering for viewing the results.
Preventing SQL Injection
SQL injection is a method used by a hacker user to break your SQL query and do badly to your database. There are many types of injection that might edit the content in your database, update it or even delete your tables. The example query for the database search is the following:
- var result = db.Query("SELECT * FROM table_name WHERE column_name = value");
If you values that are valid, then it would execute correctly. But user attempts to a value that might break the query, the entire database might be exposed to him by the server. Suppose the value to be: "'; DROP TABLE table_name --".
The preceding value would break the query, delete the table and comment out (-- is a comment in SQL) the remaining query code.
One thing that you can do to handle this kind of situation is to prevent the Query from breaking. Until the query is Okay, the database won't be exposed even if there is no match for the data. I have made a new variable that the code would fill up! Since the ASP.NET Web Pages don't allow the parameterization inside the query, the variable is filled with the data and ed on.
- var query = "%" +Request.Form["input"] + "%";
- var selectTitleQuery = "SELECT * FROM Content WHERE PostTitle LIKE @0";
- var selectContentQuery = "SELECT * FROM Content WHERE PostContent LIKE @0";
This preceding code does the job.
Points of Interest
I have learned something new today, that is that I can convert from IEnumerable to a List object by adding each of the objects from IEnumberable to the List object! The second thing I learned is that it is not necessary to always use a class in the List object, you can use any data type since it is a generic data type object.
Another thing is that I have learned is the user of @ operator, I already did know, but I kind of forgot it and so today I learned it. That you cannot use @ inside @.