Sarah Reynolds
Matching a URL
Jan 27 2012 4:15 PM
Hi Everyone,
I'm trying to find a website's position in Google using C# and regex. The match works when the domain to check is a simple one, e.g. 'mywebsite.com', but it fails when the URL to check is something like 'mywebsite.com/a-product-name-p-23.html'.
I assume the problem is the regex I am using, but for the life of me I cannot work out what it should be. Basically, I want to run the script to check my website pages' positions in the SERPs, but the URLs could all be very different.
My code at the moment is:
public int GetPosition(Uri url, string searchTerm)
{
    string raw = "http://www.google.co.uk/search?q={0}&num=100&hl=en&lr=&ie=UTF-8&safe=off&output=search#q={0}&hl=en&lr=&safe=off&prmd=imvns&ei=avgiT6HOGILPsgaRwfHpCA&start=0&sa=N&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=c7b3e04f0f892e66&biw=1366&bih=624";
    string search = string.Format(raw, HttpUtility.UrlEncode(searchTerm));
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(search);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.ASCII))
        {
            string html = reader.ReadToEnd();
            return FindPosition(html, url);
        }
    }
}

private static int FindPosition(string html, Uri url)
{
    string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)";
    MatchCollection matches = Regex.Matches(html, lookup);
    for (int i = 0; i < matches.Count; i++)
    {
        string match = matches[i].Groups[2].Value;
        if (match.Contains(url.AbsoluteUri))
            return i + 1;
    }
    return 0;
}
An example of the data passed into 'string html' is:
<h3 class="r"><a href="http://www.bbc.co.uk/news/" class=l onmousedown="return rwt(this,'','','','2','AFQjCNE9TwMbS0bHLzYb5kDoTlS2JM66mw','','0CFUQFjAB',null,event)">BBC <em>News</em> - Home</a>
If I use the keyword 'World News' and the URL 'bbc.co.uk', a SERP position is retrieved, but if I use the URL 'http://news.sky.com/home/world-news' or 'www.msnbc.msn.com/id/3032507/', no position is retrieved, despite both URLs being on page 1 of the SERPs.
I guess the regex can't match the more complex URLs?
Are there any regex experts here who might be able to point me in the right direction, or is there an easier way to do this in C#?
Many thanks, Sarah
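One likely cause, going only by the code posted above: inside the character class, `.-?` is read as a range from '.' to '?', not as three separate characters. That range happens to cover digits, '/', ':' and '=', but it does not include a literal '-' (nor '_', '%' or '&'), so the capture stops at the first hyphen and 'http://news.sky.com/home/world-news' is cut down to 'http://news.sky.com/home/world'. In addition, `url.AbsoluteUri` always includes the scheme, so the captured text has to contain the full 'http://...' form exactly. Below is a minimal sketch of one way around both issues: capture everything up to the closing quote of the href, then compare with the scheme stripped. The class name `SerpPosition`, the helper `StripScheme`, and taking the target as a plain string instead of a `Uri` are my own assumptions for illustration, and the pattern relies on the `<h3 class="r">` markup shown in the example, which Google may change.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original post.
public static class SerpPosition
{
    // Capture the whole href value up to its closing quote, rather than
    // listing the characters a URL is allowed to contain.
    private static readonly Regex ResultLink = new Regex(
        "<h3 class=\"r\"><a href=\"([^\"]+)\"",
        RegexOptions.IgnoreCase);

    // Returns the 1-based position of the first result whose href contains
    // the target, or 0 when no result matches.
    public static int FindPosition(string html, string target)
    {
        // Compare without scheme or trailing slash so that 'bbc.co.uk' and
        // 'http://news.sky.com/home/world-news' are handled the same way.
        string needle = StripScheme(target).TrimEnd('/');

        MatchCollection matches = ResultLink.Matches(html);
        for (int i = 0; i < matches.Count; i++)
        {
            string href = matches[i].Groups[1].Value;
            if (href.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0)
                return i + 1;
        }
        return 0;
    }

    // Assumed helper: drops a leading 'http://', 'https://', etc. if present.
    private static string StripScheme(string url)
    {
        int idx = url.IndexOf("://", StringComparison.Ordinal);
        return idx >= 0 ? url.Substring(idx + 3) : url;
    }
}
```

Calling `SerpPosition.FindPosition(html, "http://news.sky.com/home/world-news")` on the downloaded page should then return the position of that result if its href appears in the markup. A substring comparison is deliberately loose (it is what lets 'bbc.co.uk' match 'http://www.bbc.co.uk/news/'); if you need exact page matches, compare the stripped href for equality instead.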