Should I block the Yandex bot?

Should the bot from Yandex, the Russian search engine, be blocked from spidering your pages? I believe it should. I tested it thoroughly using server logs and honeypots, and this bot currently does not respect robots.txt. Worse still, it puts such a load on the server that it has to be contained.
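One way to run that kind of honeypot test is sketched below. This is an assumption, since the exact setup isn't described here: list a trap directory under Disallow in robots.txt, then have Apache write every request for it to a separate log. The path /honeypot-trap/ and the log file name are hypothetical, and CustomLog only works in the main server or virtual-host configuration, not in .htaccess.

```apache
# Hypothetical trap path; also list it as "Disallow: /honeypot-trap/" in robots.txt
SetEnvIf Request_URI "^/honeypot-trap/" honeypot_hit

# Server or <VirtualHost> context only: CustomLog is not allowed in .htaccess.
# The "combined" format records the User-Agent, so any crawler that appears in
# this log fetched a URL that robots.txt told it to stay away from.
CustomLog "logs/honeypot.log" combined env=honeypot_hit
```

The trap only catches anything if some page links to it, for example with a link that human visitors never see.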

I initially had this code (amongst others) in the .htaccess file:

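# Flag any request whose User-Agent starts with "Yandex bot" (case-insensitive)
SetEnvIfNoCase User-Agent "^Yandex bot" bad_bot

# Apache 2.2-style access control: allow everyone except flagged requests
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>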
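The Order/Allow/Deny directives are Apache 2.2 syntax; on Apache 2.4 they only keep working while mod_access_compat is loaded. Here is a sketch of the same rule in the 2.4 Require syntax, assuming the bad_bot variable is still set by the SetEnvIfNoCase line above and that AllowOverride includes AuthConfig for this directory. Dropping the <Limit GET POST> wrapper also means the block applies to every HTTP method, not just the ones named in <Limit>.

```apache
# Apache 2.4 (mod_authz_core): let everyone in except requests flagged as bad_bot.
# A negated "Require not" cannot grant access on its own, so it sits inside <RequireAll>.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```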

but because it’s a persistent little critter, I now have this as the first line:

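# No ^ anchor: YandexBot's User-Agent begins "Mozilla/5.0 (compatible; YandexBot/3.0; ..."
# so an anchored pattern never matches. This catches "Yandex" anywhere, in any case.
SetEnvIfNoCase User-Agent "Yandex" bad_bot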

If it bothers me any more, I shall fight back and start a campaign against it. One of our team got so fed up that he did this:

# Permanently redirect .shtml requests from one specific IP address
Options +FollowSymLinks
RewriteEngine on
# REMOTE_ADDR is the client IP (REMOTE_HOST may be a resolved hostname); anchor it exactly
RewriteCond %{REMOTE_ADDR} ^77\.88\.26\.27$
RewriteRule \.shtml$ https://www.youtube.com/watch?v=oHg5SJYRHA0 [R=301,L]
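A single address only catches whichever of the crawler's hosts happens to show up in the logs. As an alternative sketch, mod_rewrite can match the User-Agent instead and refuse the request outright. The 403 response here replaces the redirect, and the pattern assumes the crawler keeps identifying itself as Yandex.

```apache
# mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled
Options +FollowSymLinks
RewriteEngine On
# Match "Yandex" anywhere in the User-Agent, case-insensitively
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
# "-" means no substitution; [F] answers 403 Forbidden and stops rewriting
RewriteRule ^ - [F]
```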

Now the Yandex bot gets Rickrolled every time it requests a .shtml page from that address. Imagine half a million sites doing this...
