Bots can be categorized as good bots or bad bots.
Good bots are beneficial to online businesses. They help create visibility for sites on the internet and help those businesses build online authority.
When you search for a site or phrases related to the site's products or services, you get relevant results listed on the search page. This is made possible with the help of search engine spiders/bots, or crawler bots.
Good bots are regulated. These 'regulated' bots follow a specific pattern, and you also get the option to tweak their crawler activity on your site.
These bots help improve a website's SEO.
Bad bots don’t play by the rules. They're illegitimate, follow a distinctly ‘malicious’ pattern, and are mostly unregulated. Imagine thousands of page visits originating from a single IP address within a very short span of time. This activity stresses your web servers and chokes the available bandwidth, directly impacting genuine users trying to access a product or service on your site.
Shield Security is focused on helping you detect characteristics and behavior commonly associated with illegitimate bots. To achieve this, we use bot detection rules, or "bot signals".
Signals are behaviours that indicate a visitor could be a bot. With enough of these signals, we can assess whether a particular bot is legitimate (a good bot) or not.
What is the Fake Web Crawler signal?
Picture the Google web crawler for a moment – the bot that scans your website for search engine results. How do we know that this particular bot is an official Google bot? Because it tells us it is, through the User Agent ID.
When browsing to a web page, your browser will send along a piece of text that gives the web server details of the browser you’re using. For example, that it’s Google Chrome, the version is 73.0.3686, it’s 64-bit, etc. This is the User Agent ID.
The Google web crawler does the same thing, and it usually has the text ‘Googlebot‘ in there somewhere, or something similar.
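To make this concrete, here is a minimal Python sketch (not Shield's actual code) of what "it tells us through the User Agent" looks like in practice. The check is deliberately naive: it only tests whether the string contains the 'Googlebot' token, which is exactly why it can be fooled.

```python
# A real Googlebot User Agent string looks like this:
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def claims_to_be_googlebot(user_agent: str) -> bool:
    """Return True if the User Agent string contains the 'Googlebot' token.

    Note: this only tells us what the visitor CLAIMS to be, not what it is.
    """
    return "googlebot" in user_agent.lower()

print(claims_to_be_googlebot(GOOGLEBOT_UA))  # True — but the claim is unverified
```

Any bot can send that same string, which is the problem the next paragraphs address.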
But what’s to stop any bot from throwing ‘Googlebot’ into its User Agent string and making us think it’s a Google web crawler?
Absolutely nothing, of course. And they do.
What can we do about that? Well, it turns out there are ways to confirm whether a bot really is a Google bot, or not. Shield Security has been doing this for a long time already.
And because we can confirm whether a bot is real-Google or fake-Google, we can now use this as a ‘bad bot’ signal. We can confidently state that a fake-Google user agent is a bad bot, masquerading as a good one.
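The standard technique for this confirmation (the one Google itself documents, though Shield's internal implementation may differ) is a reverse-DNS check: look up the hostname for the visitor's IP, confirm the hostname belongs to Google's crawler domains, then resolve that hostname forward and confirm it maps back to the same IP. A sketch, with the resolver functions injectable so the logic can be demonstrated without network access:

```python
import socket

def is_real_googlebot(ip: str,
                      reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
                      forward_lookup=socket.gethostbyname) -> bool:
    """Verify a 'Googlebot' claim via reverse DNS plus a forward-confirm step."""
    try:
        host = reverse_lookup(ip)
        # A genuine crawler's reverse DNS resolves to googlebot.com or google.com...
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # ...and the hostname must resolve back to the original IP,
        # otherwise the reverse record itself could be spoofed.
        return forward_lookup(host) == ip
    except OSError:
        return False
```

A fake-Google bot fails at the first step (its IP reverse-resolves to some other domain, or not at all), so it can be flagged with high confidence.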
What is the Empty User Agents signal?
We discussed what User Agents are, above. All normal human traffic includes a user agent, but some bots are sloppy and neglect to include one in their requests. This suggests the request may come from a bot.
Care needs to be taken with this setting, as not all webhosts are properly configured to populate the user agent request value in PHP. This can make it appear that no User Agent was sent with a request when, in fact, there was. You’ll need to test this on your hosting platform to ensure you can use this signal.
How to detect behaviors common to bots
Shield provides two effective ways (the "bot signals" explained above) to detect characteristics and behavior commonly associated with illegitimate bots. It achieves this through its Detect Bot Behaviors feature. You can use this feature to:
- Detect fake search engine crawlers
- Detect requests with empty user agents
To access the Detect Bot Behaviors feature, simply go to the Shield main menu => Settings => Block Bad IPs/Visitors => Detect Bot Behaviors:
Here you'll be able to configure each of the bot signals independently, and you’ll also be able to decide how you want Shield to respond. You’ll have four options to choose from:
- Audit Log Only. This option lets you see the activity of these bots on the audit trail before applying any transgressions or blocks to offenders. It’ll let you test-drive the signal before making it take effect.
- Increment Transgression (by 1). This option puts another black mark against an IP. As always with the transgression system, once the limit is reached for an IP address, it is blocked from accessing the site.
- Double Transgression (by 2). We’ve added the ability to give weight to certain behaviours. By allowing the transgression counter to increment by 2, the IP will reach the limit more quickly, and be blocked sooner.
- Immediate block. If you decide that a particular signal on your site is severe enough, you can have Shield immediately mark that IP as blocked.
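To illustrate how the four responses interact with the transgression system, here is a minimal sketch (not Shield's actual code; the limit of 10 is an assumed example, since Shield's limit is configurable):

```python
TRANSGRESSION_LIMIT = 10  # hypothetical limit for this example

counters: dict[str, int] = {}  # transgression count per IP address

def record_signal(ip: str, response: str) -> bool:
    """Apply one signal response for an IP; return True if the IP is now blocked."""
    if response == "audit_only":
        return False  # logged to the audit trail, no penalty applied
    if response == "block":
        counters[ip] = TRANSGRESSION_LIMIT  # immediate block
    else:
        step = 2 if response == "double" else 1  # weightier signals count double
        counters[ip] = counters.get(ip, 0) + step
    return counters[ip] >= TRANSGRESSION_LIMIT
```

With a limit of 10, an IP triggering a double-weight signal five times is blocked, while the same IP on single increments would take ten hits.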
Hint: You may also want to use Traffic Watch Viewer to review all logs of HTTP requests made to your WordPress site.
Read more about the Traffic Watch Viewer here.
Note: The Detect Bot Behaviors feature is available with Shield Pro only. To find out what extra features Shield Pro offers, read the article here.