Implementing a middleware layer that filters incoming HTTP requests for known bot "User-Agent" strings is a straightforward way to prevent unwanted bot traffic in private online apps or websites. The middleware intercepts each request, extracts the "User-Agent" header, and compares it against a predefined list of user-agent strings typically associated with bots. If a match is detected, the request is redirected to another page, preventing the bot from accessing the intended resource.
Here is sample code for Program.cs in an ASP.NET Core application.
app.Use(async (context, next) =>
{
    // Read the User-Agent header sent by the client (empty string if missing).
    var userAgent = context.Request.Headers["User-Agent"].ToString();
    if (new Bots.BotsAgents().IsBotAgent(userAgent))
    {
        // Send known bots to a static page and stop the pipeline here.
        context.Response.Redirect("/MyBotPage.html");
        return;
    }
    await next();
});
In this case, the middleware redirects to a static MyBotPage.html and stops processing the request, but you can redirect to any URL you wish.
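If you would rather not maintain a bot landing page at all, an alternative sketch (same middleware shape, just a different response) is to short-circuit the pipeline with a 403 status code:

app.Use(async (context, next) =>
{
    var userAgent = context.Request.Headers["User-Agent"].ToString();
    if (new Bots.BotsAgents().IsBotAgent(userAgent))
    {
        // Reject the request outright instead of redirecting.
        context.Response.StatusCode = StatusCodes.Status403Forbidden;
        return;
    }
    await next();
});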
This is the code for the Bots namespace.
namespace Bots;
public class BotsAgents
{
// Substring fragments commonly found in bot User-Agent strings.
private readonly HashSet<string> Bots = new HashSet<string>(StringComparer.Ordinal)
{
"Googlebot",
"Bingbot",
"Slurp",
"DuckDuckBot",
"Baiduspider",
"YandexBot",
"Sogou",
"Exabot",
"facebookexternalhit",
"LinkedInBot",
"Twitterbot",
"Pinterestbot",
"WhatsApp",
"DotBot",
"spbot/",
"James BOT",
"baidu",
"Baidu",
"/bot",
"semantic-visions.com",
"spider",
"lipperhey",
"linkdexbot/",
"MJ12bot/",
"Lipperhey-Kaus-Australis/",
"BDCbot",
"AhrefsBot",
"SemrushBot",
"Alexa",
"Uptimebot",
"Crawl",
"Spider",
"PageSpeed",
"ZoominfoBot",
"Adidxbot",
"BLEXBot",
"SEOkicks",
"BlackWidow",
"BotALot",
"Buddy",
"BuiltWith",
"Curl",
"DISCo",
"Dotbot",
"Feedfetcher-Google",
"Geekbot",
"GrapeshotCrawler",
"GT::WWW",
"HTTP::Lite",
"HubSpot",
"ia_archiver",
"Jetbot",
"JetBrains Omea Reader",
"Mechanize",
"NetcraftSurveyAgent",
"Nutch",
"Outbrain",
"Python-urllib",
"rogerbot",
"ShowyouBot",
"SiteExplorer",
"Slackbot",
"Teoma",
"Twingly Recon",
"Via",
"Wget",
"Xenu Link Sleuth",
"ZmEu"
};
// Returns true when the supplied User-Agent contains any known bot fragment;
// matching is case-insensitive, since crawlers are not consistent about casing.
public bool IsBotAgent(string userAgent) =>
    !string.IsNullOrEmpty(userAgent) &&
    Bots.Any(bot => userAgent.Contains(bot, StringComparison.OrdinalIgnoreCase));
}
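A small design note: the snippet above creates a new Bots.BotsAgents on every request. Because the class only holds a fixed list, one option (a minimal sketch, assuming the standard ASP.NET Core dependency injection container in Program.cs) is to register it as a singleton and resolve it from the request's services:

builder.Services.AddSingleton<Bots.BotsAgents>();

app.Use(async (context, next) =>
{
    // Reuse the single registered instance instead of allocating one per request.
    var botsAgents = context.RequestServices.GetRequiredService<Bots.BotsAgents>();
    var userAgent = context.Request.Headers["User-Agent"].ToString();
    if (botsAgents.IsBotAgent(userAgent))
    {
        context.Response.Redirect("/MyBotPage.html");
        return;
    }
    await next();
});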
To keep the login page (or any other entry point) from being overloaded, the approach is deliberately simple and intended to repel only the most well-known bots.
It is advantageous for small to medium-sized applications that want to reduce the impact of bots on server traffic, analytics, and security. However, while helpful, this strategy has flaws: some bots disguise their user-agent strings to evade such filters, and there is also the possibility of false positives, where legitimate user requests are incorrectly labeled as bots. For comprehensive protection, this method should therefore be combined with other security measures such as rate limiting and CAPTCHA challenges.
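For example, .NET 7 and later ship rate-limiting middleware with ASP.NET Core. The sketch below, with the policy name "fixed" and limits picked purely for illustration, shows roughly how it could be wired up in Program.cs:

using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Return 429 instead of the default 503 when a request is throttled.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Illustrative policy: at most 100 requests per one-minute window.
    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
    });
});

var app = builder.Build();
app.UseRateLimiter();

// Endpoints opt in to the policy, for example:
// app.MapGet("/login", () => Results.Ok()).RequireRateLimiting("fixed");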
Will you implement something like this, or have you already implemented a user-agent blocking policy? Tell me!