unwelcome user agents
Posted by ark
The following user agents are not welcome on my website:

SetEnvIfNoCase User-Agent "^Biz360" bad_bot
SetEnvIfNoCase User-Agent "^Blogslive" bad_bot
SetEnvIfNoCase User-Agent "^Cazoodle" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^FeedLounge" bad_bot
SetEnvIfNoCase User-Agent "^OmniExplorer" bad_bot
SetEnvIfNoCase User-Agent "^Sphere" bad_bot
SetEnvIfNoCase User-Agent "^SurveyBot" bad_bot
SetEnvIfNoCase User-Agent "^edgeio" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^nutch" bad_bot
SetEnvIfNoCase User-Agent "^panscient\.com" bad_bot
SetEnvIfNoCase User-Agent "^ping\.blo\.gs" bad_bot
SetEnvIfNoCase User-Agent "^topicblogs" bad_bot
SetEnvIfNoCase User-Agent "^Moreoverbot" bad_bot
SetEnvIfNoCase User-Agent "^BlogSearch" bad_bot
SetEnvIfNoCase User-Agent "Twiceler" bad_bot
SetEnvIfNoCase User-Agent "^BlogPulse" bad_bot
SetEnvIfNoCase User-Agent "FreeMyFeed" bad_bot
# SetEnvIfNoCase User-Agent "^Java" bad_bot
I didn't have the courage to deny all Java folks.

Then I just have this in my Apache config:

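<Directory /home/ark/html/>
    # Evaluate Allow first, then Deny; a matching Deny wins.
    Order allow,deny
    Allow from all
    # Refuse any request that was tagged bad_bot above.
    Deny from env=bad_bot
</Directory>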
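
The Order/Allow/Deny directives above are the Apache 2.2 access-control syntax; on Apache 2.4 they are only available through mod_access_compat. As a rough sketch, assuming Apache 2.4 with mod_authz_core and the same bad_bot variable set by the SetEnvIfNoCase lines, an equivalent block might look like this:

```apache
<Directory /home/ark/html/>
    # Assumes Apache 2.4 (mod_authz_core); not part of the original setup.
    <RequireAll>
        # Let everyone in by default...
        Require all granted
        # ...except requests tagged bad_bot by the SetEnvIfNoCase rules.
        Require not env bad_bot
    </RequireAll>
</Directory>
```

The RequireAll wrapper is needed because a negated Require can only veto access. It cannot grant access on its own, so it has to be paired with a positive rule such as "Require all granted".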
I'll try to keep this post up to date. Mostly you get on this list if you're a robot that's crawling (and indexing) my RSS feeds.

Posted Wednesday 23 April 2008