Software Monoculture
A lot of the blogs I read have been deluged in recent days with robots posting comments.
I got a couple of robot comments posted over the weekend, before I took steps to deal with the problem. Since then, I have not gotten any (though I have seen numerous attempts).
The problem is simple. All MovableType blogs have a comment-entry CGI script, mt-comments.cgi, and, by default, the comment-entry template makes no attempt to prevent search engines from indexing it. The result? If you go to Google and search for mt-comments.cgi, you’ll get millions of hits of MT comment-entry forms. Write a 'bot to post comments to that form and sit back and enjoy watching your Google PageRank explode. How could any spammer resist?
This needed fixing and, after the first shot over my bow, I wasted no time in taking the following steps to put a stop to it.
- Add a <meta name="robots" content="noindex,nofollow" /> line to the comment-entry template, so the comment forms don’t get indexed in the future.
- Change the name of the CGI script so that the previously-indexed one is inaccessible and spammers can’t go after the new one with a shot-in-the-dark URL.
- Point to the new script in mt.cfg:
  CommentScript somenewname.cgi
  and rebuild your blog pages.
- Sit back and enjoy watching spammers hammer away, attempting to access the old location of the comment-entry CGI script (adding their IP addresses to your IP Ban List).
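The last step lends itself to a little automation. What follows is only a minimal sketch, assuming the web server writes its access log in Apache's Common (or Combined) Log Format; the function name offending_ips and the sample addresses are hypothetical, not part of MovableType. It counts, per client IP, the requests aimed at the retired script name, giving a list of candidates to review before adding them to the IP Ban List.

```python
import re
from collections import Counter

# Assumption: Apache Common/Combined Log Format, for example
# 203.0.113.7 - - [15/Oct/2003:10:02:00 -0500] "POST /cgi-bin/mt-comments.cgi HTTP/1.0" 404 -
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)

OLD_SCRIPT = "mt-comments.cgi"  # the retired script name from the post


def offending_ips(lines, old_script=OLD_SCRIPT):
    """Count requests per client IP that target the retired comment script."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        # Compare only the last path segment, ignoring any query string.
        script = m.group("path").split("?", 1)[0].rsplit("/", 1)[-1]
        if script == old_script:
            hits[m.group("ip")] += 1
    return hits
```

Feeding it the lines of an access log (for instance, offending_ips(open("access_log"))) returns a Counter keyed by IP address. Requests for the renamed script are ignored, so legitimate commenters do not show up in the result.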
But what of the future?
Once spammers tire of this little game (I give 'em another month, maybe), there are several directions they can go. Needless to say, I think I’m ready. But I’m not going to give the game away just yet. Check back in a few months to read about the next stage in the arms race.
Update (10/16/2003): Ben Trott weighs in:
We’ve all seen that comment spam is becoming a serious problem. Particularly on Movable Type weblogs, where the generated pages are all very similar in structure and semantics, …
Yeah, Ben, that’s the problem, which is why content-based filtering is not really the solution. The real solution is to make robot-posting (regardless of content) infeasible. The above suggestions are the first step in that direction. I’ve implemented some further safeguards on this blog (which can be revealed by some assiduous viewing of source) and I’ve a few more tricks waiting in reserve for when the chickenboners wise up.
Update (10/16/2003): In a comment to this entry, I wrote “I, personally, prefer the CGI script to simply go ‘404’.” That, of course, is silly. What I really want is for the CGI script to go “410” (permanently gone). That’s a one-line addition to the mod_rewrite rules for the MovableType CGI directory (which have been modified to reflect the new comment script location):
RewriteRule ^mt-comments - [G]
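To see which requests that rule catches, here is a rough Python emulation of its pattern. It is a sketch, not mod_rewrite itself: it assumes the rule lives in a per-directory context, where the pattern is tested against the path relative to the CGI directory with the query string left out, and the helper name status_for is made up for illustration.

```python
import re

# The pattern from the RewriteRule above. Assumption: per-directory context,
# so it is matched against the path relative to the CGI directory.
GONE_PATTERN = re.compile(r"^mt-comments")


def status_for(relative_path):
    """Return 410 if the rule's pattern matches, else None (rule not applied)."""
    # RewriteRule patterns do not see the query string, so drop it first.
    path_only = relative_path.split("?", 1)[0]
    return 410 if GONE_PATTERN.search(path_only) else None
```

Because the pattern is anchored only at the start, any name beginning with mt-comments in that directory is answered with "410 Gone", while the renamed script (somenewname.cgi in the example above) falls through to be served normally.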
Update (11/17/2003): One month later, and still spam-free. Read this followup article for some further thoughts.
Posted by distler at October 15, 2003 10:02 AM
Re: Software Monoculture
Why shouldn’t comments be indexed? Are they not valid content? I think all comment spam solutions should follow a hippocratic rule. By the way, new problem: if I’m in this comment box and I press tab twice, the window closes. Moz 1.5 RC2.