AQTRONiX WebKnight - Robots
Robots
WebKnight blocks bad robots or bad user-agents in four possible ways:
Robots Database
WebKnight uses a robots database to block known bad bots or any additional bots the administrator specifies. This robots database is the file Robots.xml located in the WebKnight folder of your installation.
Download and overwrite the existing file in your WebKnight folder to have the latest database of known robots. WebKnight will automatically detect and load the new file.
Download the latest version of Robots.xml (right click and choose save as...).
When WebKnight blocks
- a user agent specified in the User Agent section of the WebKnight configuration
- a robot that is specified in the Robots section of the WebKnight configuration
it will generate a log entry with one or both of the following messages:
BLOCKED: User Agent not allowed
BLOCKED: '[token]' not allowed in User Agent
To find out what the robot or user agent is and why it was blocked:
Look up the user agent in our database
If you want to allow a blocked robot, remove its [token] and User Agent from the Robots.xml file, or uncheck the appropriate item in the Robots section of the WebKnight configuration. To find out which section your robot is in, look up the agent.
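The token check behind the second log message can be pictured as a substring test against the User Agent header. The following is a minimal sketch, not WebKnight's actual implementation: the token list, the function name and the case-insensitive comparison are all assumptions made for illustration.

```python
# Hypothetical tokens; the real entries live in Robots.xml and the
# WebKnight configuration.
DENIED_TOKENS = ["badbot", "evilscraper"]


def check_user_agent(user_agent, denied_tokens=DENIED_TOKENS):
    """Return a log message if the user agent contains a denied token, else None.

    Case-insensitive matching is an assumption of this sketch.
    """
    lowered = user_agent.lower()
    for token in denied_tokens:
        if token.lower() in lowered:
            return f"BLOCKED: '{token}' not allowed in User Agent"
    return None


if __name__ == "__main__":
    print(check_user_agent("Mozilla/5.0 (compatible; BadBot/1.0)"))
    print(check_user_agent("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

Removing a token from the list makes the matching user agents pass again, which mirrors the effect of removing the entry from Robots.xml.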
Bad Bot Trap
This feature enables you to block robots that do not obey robots.txt. It consists of three parts:
- Robots.txt file
- Hidden links to lure bad robots
- WebKnight configuration
Add the bot trap URLs to your robots.txt file. The robots.txt file should look like this (you can also find a default robots.txt file with the installation):
User-Agent: *
Disallow: /forbidden.for.web.robots/
If you have an existing robots.txt, make sure to add the Disallow statement to all of your User-Agent sections as well.
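One way to confirm that every User-Agent section carries the Disallow statement is to parse the file with Python's standard urllib.robotparser and ask whether a given agent may fetch the trap URL. This is a sketch under assumptions: the ExampleBot section and the /private/ path are made up for illustration, and the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example file: the second section is hypothetical and only shows that the
# trap URL has to be repeated in every User-Agent section.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forbidden.for.web.robots/

User-Agent: ExampleBot
Disallow: /private/
Disallow: /forbidden.for.web.robots/
"""


def trap_is_disallowed(robots_txt, agent, trap_url="/forbidden.for.web.robots/"):
    """Return True if a well-behaved robot named `agent` must avoid trap_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, trap_url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "SomeOtherBot"):
        print(agent, trap_is_disallowed(ROBOTS_TXT, agent))
```

If the ExampleBot section lacked the trap line, the check would return False for that agent: a robot with its own section only follows that section, so a well-behaved robot could wander into the trap.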
Now, to lure a bad bot into those URLs, add them to your web site as hidden anchors (you don't want anyone to actually click on these links):
<a href="/forbidden.for.web.robots/"></a>
Make sure this folder is also added to "Deny Bots BotTraps" in the Robots section of the WebKnight configuration. To catch all bad robots, do not add the trailing forward slash in the WebKnight config file. Some bots request the folder without the slash: if they get a redirection, they know the folder exists; if not, they know it is a bad bot trap. When WebKnight detects access to these URLs, it blocks the robot for several hours (36 hours by default). Blocking is done by the combination of IP address and User Agent. You will see this in your log file:
BLOCKED: Bad robot fell for trap: '/forbidden.for.web.robots'
BLOCKED: Robot not allowed until timeout expires
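The trap behaviour described above can be sketched as a small lookup keyed on IP address and User Agent. This is an illustration only, not WebKnight's code: the class name, the prefix matching and the injectable clock are assumptions, and the sketch returns one message per request, whereas the real log may record the two messages differently.

```python
import time

# Trap path without the trailing slash, as advised in the text.
TRAP_PREFIXES = ["/forbidden.for.web.robots"]
BLOCK_SECONDS = 36 * 3600  # default timeout mentioned in the text


class BotTrap:
    """Hypothetical sketch of a bot trap keyed on (IP address, User Agent)."""

    def __init__(self, prefixes=TRAP_PREFIXES, block_seconds=BLOCK_SECONDS,
                 clock=time.time):
        self.prefixes = list(prefixes)
        self.block_seconds = block_seconds
        self.clock = clock          # injectable so the timeout can be tested
        self.blocked_until = {}     # (ip, user_agent) -> expiry timestamp

    def check(self, ip, user_agent, path):
        """Return a log message when the request is blocked, else None."""
        key = (ip, user_agent)
        now = self.clock()
        expiry = self.blocked_until.get(key)
        if expiry is not None:
            if now < expiry:
                return "BLOCKED: Robot not allowed until timeout expires"
            del self.blocked_until[key]  # timeout expired, forget the block
        for prefix in self.prefixes:
            # Prefix match: catches the path with and without the slash.
            if path.startswith(prefix):
                self.blocked_until[key] = now + self.block_seconds
                return f"BLOCKED: Bad robot fell for trap: '{prefix}'"
        return None
```

Because the prefix has no trailing slash, a request for /forbidden.for.web.robots is caught just like /forbidden.for.web.robots/, which is the reason given above for omitting the slash in the configuration.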
Aggressive Bot Trap
This filter enables you to block robots that request too many pages in a short period of time. By default, this filter is not enabled. When it is enabled with the default settings, robots that make more than 180 requests in 3 minutes after their initial request for robots.txt will be blocked. Blocking is done by the combination of IP address and User Agent. You will see this in your log file:
BLOCKED: Aggressive robot not allowed
If robots do not request the file robots.txt, they will not be seen as robots and will not be blocked by this filter. If you want to block aggressive users as well, you can block them in the Connection settings (Use Connection Requests Limits).
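The rate check can be sketched as a per-client request counter that only starts once the client has asked for robots.txt. The sketch below is not WebKnight's implementation: it assumes a sliding 3-minute window (the filter's exact windowing is not documented here), and the class name and clock parameter are hypothetical.

```python
import time
from collections import deque

MAX_REQUESTS = 180        # default limit mentioned in the text
WINDOW_SECONDS = 3 * 60   # default period mentioned in the text


class AggressiveBotFilter:
    """Hypothetical sketch: rate-limit clients that requested robots.txt."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS,
                 clock=time.time):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.history = {}  # (ip, user_agent) -> deque of request timestamps

    def check(self, ip, user_agent, path):
        """Return a log message when the request is blocked, else None."""
        key = (ip, user_agent)
        now = self.clock()
        if path.lower() == "/robots.txt" and key not in self.history:
            self.history[key] = deque()  # from now on, treated as a robot
            return None
        times = self.history.get(key)
        if times is None:
            return None  # never asked for robots.txt: not seen as a robot
        times.append(now)
        # Drop timestamps that fell out of the sliding window (assumption).
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.max_requests:
            return "BLOCKED: Aggressive robot not allowed"
        return None
```

A client that never requests robots.txt is never counted, which matches the note above that such clients are not seen as robots by this filter.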
Robots.txt Cloak
WebKnight 2.5 and later support URL rewriting. Requests for robots.txt can be mapped to a server-side script without redirecting the client. IIS will execute this script, but the robot or browser will still see the file robots.txt in the URL. This enables you to prevent certain robots or hackers from seeing the true contents of your robots.txt file.
In addition, you can set session variables or block the IP address at the web application level.
A sample ASP script (robots.asp) is provided with WebKnight. To use this file, copy it to the root of your website and enable the Dynamic Robots setting.
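The idea behind a cloaked robots.txt can be sketched as a function that picks the response body from the User Agent. This is not the robots.asp script shipped with WebKnight, and it is written in Python rather than ASP purely for illustration; the trusted tokens and both file bodies are hypothetical.

```python
# Hypothetical contents; neither string comes from WebKnight.
REAL_ROBOTS_TXT = "User-Agent: *\nDisallow: /forbidden.for.web.robots/\n"
CLOAKED_ROBOTS_TXT = "User-Agent: *\nDisallow: /\n"

# Hypothetical list of user agent tokens allowed to see the real file.
TRUSTED_TOKENS = ["googlebot", "bingbot"]


def robots_txt_for(user_agent, trusted_tokens=TRUSTED_TOKENS):
    """Return the robots.txt body to serve for this user agent."""
    lowered = (user_agent or "").lower()
    if any(token in lowered for token in trusted_tokens):
        return REAL_ROBOTS_TXT
    # Unknown or missing user agents get the cloaked version.
    return CLOAKED_ROBOTS_TXT


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("curl/8.0"))
```

A User Agent header is easy to spoof, so a check like this hides the real file from casual scanners rather than from a determined attacker.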
Published: 17/04/2007 | Document Type: HOWTO
Last modified: 10/01/2017 | Target: Administrator
Visibility: Public | Language: English