May 20, 2007

Rewrite Magic

When you deal with the levels of trackback spam that we do, your defences have to be pretty fine-tuned. I’ve mentioned that I redirect trackback spammers to a tarpit. Here I’d like to explain one of the neat mod_rewrite tricks involved.

I maintain a DBM database of trackback-spammer IP addresses. How the database is populated is a closely-guarded secret. Suffice to say that it consists of key/value pairs, where the key is an IP address and the value is the relative URL of a tarpit CGI script.
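For illustration only, here is a sketch of the shape of that data, not of how the real database is populated (which the post does not reveal). It writes key/value pairs in the plain-text RewriteMap layout, one `key value` pair per line. The function name `write_text_map`, the file name, the tarpit path `tarpit.cgi` and the IP addresses (taken from the reserved documentation ranges) are all assumptions made up for the example.

```python
import ipaddress
from pathlib import Path


def write_text_map(entries, path):
    """Write {ip: tarpit_url} pairs as a plain-text map: one 'key value' per line.

    Hypothetical helper for illustration; the post does not describe its own tooling.
    """
    lines = []
    for ip, target in sorted(entries.items()):
        # Reject malformed keys early; raises ValueError for a bad address.
        ipaddress.ip_address(ip)
        # The value must be a single whitespace-free token (a relative URL).
        if not target or any(ch.isspace() for ch in target):
            raise ValueError(f"bad target for {ip}: {target!r}")
        lines.append(f"{ip} {target}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")
    return lines


if __name__ == "__main__":
    # Example addresses and tarpit path are placeholders, not real data.
    write_text_map(
        {"198.51.100.7": "tarpit.cgi", "192.0.2.10": "tarpit.cgi"},
        "tb_spammers.txt",
    )
```

On Apache versions that ship the `httxt2dbm` utility, a text file like this can be converted into a DBM file for use with a `dbm:` map; whether that matches the setup described here is not something the post says.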

The first thing to do was to move the trackback CGI script out of the way:

% mv mt-tb.cgi hidden-tb.cgi

Then, in the server-config section of my httpd.conf file, I added a RewriteMap directive:


RewriteEngine on
...
RewriteMap tb_spammers dbm:/file/path/to/the/database

Finally, in the Directory-config section for my MovableType installation, I placed:


RewriteEngine on
...
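# Match requests for the old script name; %1 captures whatever follows it
RewriteCond %{REQUEST_URI} mt-tb\.cgi(.*)
# Look up the client IP in the map; if it is absent, fall back to the renamed script
RewriteRule .* ${tb_spammers:%{REMOTE_ADDR}|hidden-tb.cgi%1} [L]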

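Legitimate trackbacks are rewritten to the renamed trackback script, with anything that followed the script name in the URL carried along. Requests from IP addresses in the database are rewritten to the tarpit instead. With no [R] flag and relative URLs as targets, both are internal rewrites rather than HTTP redirects.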
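To make the decision logic concrete, here is a small Python model of what that RewriteCond/RewriteRule pair computes. It is a sketch of the logic only, not of how mod_rewrite works internally; the function name `rewrite_target`, the example addresses and the tarpit path `tarpit.cgi` are assumptions, and a plain dict stands in for the DBM map.

```python
import re

# Same unanchored pattern as the RewriteCond; group 1 plays the role of %1.
TB_PATTERN = re.compile(r"mt-tb\.cgi(.*)")


def rewrite_target(request_uri, remote_addr, tb_spammers):
    """Return the rewritten target, or None if the condition does not match.

    tb_spammers is a dict standing in for the DBM map (IP -> tarpit URL).
    """
    match = TB_PATTERN.search(request_uri)
    if match is None:
        return None  # RewriteCond fails, so the rule leaves the request alone
    # Default after the '|' in ${tb_spammers:key|default}
    default = "hidden-tb.cgi" + match.group(1)
    return tb_spammers.get(remote_addr, default)


if __name__ == "__main__":
    spammers = {"192.0.2.10": "tarpit.cgi"}  # placeholder data
    for addr in ("192.0.2.10", "203.0.113.5"):
        print(addr, "->", rewrite_target("/cgi-bin/mt-tb.cgi/1282", addr, spammers))
```

Note that only the fallback carries the captured suffix: a listed address gets exactly the URL stored in the map.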

The advantages of using a database are several:

  1. No need to restart the server when IP addresses are added/deleted.
  2. It can handle a large number of IP addresses with equanimity.
  3. Lookup results are cached in-core until the mtime of the database changes or the server is restarted. So this method is much faster than any of the alternatives.
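The caching behaviour in the third point can be sketched as follows. This is a toy model of the idea (remember each lookup result, and throw the cache away when the file's mtime changes), not mod_rewrite's actual implementation. The class name `CachedLookup` is hypothetical, and a plain-text file of `key value` lines stands in for the DBM file.

```python
import os


class CachedLookup:
    """Cache per-key lookups in memory; flush when the backing file's mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._cache = {}
        self.backend_reads = 0  # counts how often the file is actually consulted

    def _fetch(self, key):
        # Stand-in for a DBM fetch: scan 'key value' lines for the key.
        self.backend_reads += 1
        with open(self.path, encoding="ascii") as fh:
            for line in fh:
                parts = line.split()
                if len(parts) >= 2 and parts[0] == key:
                    return parts[1]
        return None

    def lookup(self, key, default=None):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # File changed since the cache was built: discard stale results.
            self._cache.clear()
            self._mtime = mtime
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        value = self._cache[key]
        return default if value is None else value
```

Repeated lookups for the same address then cost one `stat` call and a dictionary access, and an edit to the file takes effect on the next request without a restart.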

mod_rewrite is an incredibly powerful tool. RewriteMap makes it even more powerful, by letting you interface with external databases or programs. In my case, it helps me keep the trackback spammers on a short leash.

Posted by distler at May 20, 2007 11:12 PM

1 Comment & 0 Trackbacks

Re: Rewrite Magic

I love the quote at the top of the mod_rewrite docs: “The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail.” —Brian Behlendorf

(This is true to an extent few people realise. Like sendmail.cf, mod_rewrite rulesets are Turing-complete.)

Posted by: Aristotle Pagaltzis on May 21, 2007 7:01 AM
