The Pace of Innovation

December 17, 2004

If you read any other blogs besides this one, you’ve probably heard that MovableType blogs are currently being hammered by a “new breed” of spambot. Webhosts are shutting down MT installations in self-defence. SixApart has offered soothing words that a fix is on the way.

As it turns out, the spammers in question had been visiting golem, upwards of a thousand times a day, until I … umh … made them go away. But to know that, you’d have to read my server logs. They had no noticeable effect on server load or on the amount of comment spam I’ve received (4 in the past two months).

Gleaning what I can from the experience of others, what strikes me is how little innovation a spambot writer needs to exercise to get his 15 minutes of fame. This “new breed” of spambot differs from the ones in common usage a year ago in only two respects:

Taking a page from the crapflooders, it operates from behind multiple anonymous proxies. Thus it can deposit hundreds of spam comments in a short interval of time, while evading MT’s lame-ass, ineffective comment throttle¹.
Upon finding a comment-script to post to (by looking for the POST method of the comment-entry form), it cycles through the BlogID parameter, thus depositing spam on all of the blogs hosted by that MT installation.

Other than that, apparently, it’s no more sophisticated than the easily-defeated spambots of yesteryear. You don’t even need to do something mildly sophisticated to render it ineffective.

So why the brouhaha?

Because an unprotected MT installation will buckle under the load of a crapflood (and, apparently, because MT3.x is even worse than MT2.x in this regard). So the decision by this spambot’s author to go for quantity over quality has made for a lot of unhappy people, right now.

Trackback Spam (Update)

Lest anyone think I am totally sanguine about beating these spammers, let me hasten to say that I am not. Changing only a few lines of code in their spambot, they could convert to sending Trackback spam instead of Comment spam. And neither I, nor anyone else, would have any good way of defending against them. Sure, trackback throttling (incorporated into MT 3.1) would stem the tide. But, unlike comments, which can be made difficult for machines to POST, trackbacks are supposed to be POSTed by machines.
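To make the throttling idea concrete, here is a minimal sketch of a site-wide throttle. It is not MT 3.1's implementation; the class name, the limit and the window length are all invented for illustration. Since the spambot arrives through many proxies, the sketch counts pings from all sources together rather than per IP address.

```python
from collections import deque


class PingThrottle:
    """Sliding-window throttle that counts pings from all sources together.

    Hypothetical helper for illustration; not part of MovableType.
    """

    def __init__(self, max_pings=10, window_seconds=60.0):
        self.max_pings = max_pings
        self.window_seconds = window_seconds
        self._accepted = deque()  # timestamps of accepted pings, oldest first

    def allow(self, now):
        """Return True (and record the ping) if one more fits in the window."""
        # Forget accepted pings that have aged out of the window.
        while self._accepted and now - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) >= self.max_pings:
            return False
        self._accepted.append(now)
        return True
```

A throttle like this only caps the damage. It cannot tell a legitimate ping from a spam ping, so during a flood it turns away both.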

My one thought on the matter was to demand that the URL of the trackback resolve to the same IP address as where the trackback ping came from, or at least to an address in the same /24. Theoretically, the blogging software that produced the trackback ping is running on the same host as the website in question. Trackback pings sent by spammers via anonymous proxies, however, would not match.
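The test described above might be sketched like this. This is not the half-finished MT plugin, just a standalone illustration; the function names and the four result labels are made up, and the resolver is passed in as a parameter so the logic can be exercised without touching DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def same_slash24(ip_a, ip_b):
    """True if two IPv4 addresses fall in the same /24 network."""
    net = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in net


def classify_ping(ping_ip, trackback_url, resolve=socket.gethostbyname):
    """Classify a ping as 'exact', 'same/24', 'mismatch' or 'dns-failure'.

    ping_ip is the address the ping was POSTed from; trackback_url is the
    URL the ping claims to come from.
    """
    host = urlsplit(trackback_url).hostname
    if host is None:
        return "dns-failure"
    try:
        url_ip = resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "dns-failure"
    if url_ip == ping_ip:
        return "exact"
    if same_slash24(url_ip, ping_ip):
        return "same/24"
    return "mismatch"
```

Under this scheme a plugin would accept the first two classes and reject the rest.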

I was halfway through coding up a plugin for MT 3.1, when it occurred to me to wonder how my existing, legitimate, trackback pings would have fared under such a system. So I wrote a little script to comb through the trackback pings I’ve received, and retrospectively apply this test to them. The results were a bit disappointing:

Of 172 trackback pings,

65 matched IP addresses exactly
13 more were in the same /24
89 didn’t come even close to matching
5 DNS lookups failed

This exaggerates the problem somewhat. Some of these pings are over 2 years old. If the person changed webhosts in the interim, the IP addresses certainly wouldn’t match today, even if they did match when the trackback was originally POSTed. But still, it looks like this strategy would have blocked over half the legitimate trackbacks I’ve received.
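For the record, the arithmetic behind “over half” can be checked in a few lines. The counts are the ones listed above; the variable names are only for illustration.

```python
# Counts from the retrospective test of 172 trackback pings.
results = {
    "exact match": 65,
    "same /24": 13,
    "no match": 89,
    "DNS failure": 5,
}

total = sum(results.values())


def share(count, out_of):
    """Percentage of out_of, rounded to one decimal place."""
    return round(100.0 * count / out_of, 1)


shares = {label: share(count, total) for label, count in results.items()}

# Pings that would have been accepted: exact matches plus same-/24 matches.
accepted = share(results["exact match"] + results["same /24"], total)

# Pings that would have been rejected outright for not matching.
rejected = shares["no match"]
```

Here `total` comes to 172, `accepted` to 45.3 and `rejected` to 51.7, so a little over half of the pings fail the test even before counting the five failed lookups.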

Back to the drawing board…

Update 2: Endearing Moderation

Chad Everett has created a plugin that has similar functionality to my forced comment previews, without the need to hack the MT source-code. As far as I can understand, though, his plugin doesn’t actually force a preview so much as toss any non-previewed comments into a moderation queue. As I’ve explained, mine was primarily a way to enforce XHTML validation and only secondarily to deter comment spam. So I say, “Moderation, schmoderation! If it hasn’t been previewed/validated, you can’t post it.”
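One way to enforce a “no preview, no post” rule is to have the preview step issue a token tied to the exact text that was previewed. The sketch below is an illustration of that idea only. It is neither my hack to the MT source nor Chad's plugin, and the function names and the server-side secret are assumptions.

```python
import hashlib
import hmac


def preview_token(secret, comment_text):
    """Token handed out by the preview page for the text it just validated."""
    return hmac.new(secret, comment_text.encode("utf-8"), hashlib.sha256).hexdigest()


def may_post(secret, comment_text, token):
    """Accept a comment only if it carries the token for this exact text."""
    expected = preview_token(secret, comment_text)
    # Tokens are hex strings; treat a missing token as an empty one.
    return hmac.compare_digest(expected, token or "")
```

A bot that POSTs straight to the comment script has no token, and a comment edited after the preview no longer matches its token, so it has to be previewed (and validated) again.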

I actually got another spam comment tonight, which was of the endearing, hand-crafted sort that you see once you’ve eliminated the mechanized ones. The author came in on the following Google search:

http://www.google.ca/search?q=.edu+mt-comments&hl=en&lr=&c2coff=1&start=60&sa=N

deciphered, as best he could, the entry in question, and then left the comment:

I have a blog with XHTML 1.1 and I have tried employing your Mathenable feature and I keep getting the same error code default #F3423 substring expected could I be applying it incorrectly as maybe my lib/mt/app is a different library version.

Really delightful! Alas, his URL no longer points to some schlocky bargain-finds-on-ebay site….

¹ If you want effective comment throttling, you need to install a plugin (a plugin!).

Posted by distler at December 17, 2004 3:40 PM

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/486

5 Comments & 0 Trackbacks

Re: The Pace of Innovation

The plugin solution only solves a small part of the problem… for MT3.X users. There is still a whole slew of MT2.X users out there, and these plugins are not backward compatible… but I think 6A has now learned that the problem is not version-specific.

The solution at the application level is going to have to involve a plethora of blog-specific changes. In other words, monoculture is our enemy, just as monoculture in the forest can wipe out an entire forest. Even among all MT users, we will need to find solutions that are not MT-community-wide because that is a weakness in our defense against the spambots. The solution will probably not be to just install a plugin and hope the problem goes away, because if the plugin is not forward and backward compatible, it’s only a bandaid and the spambots will eventually get around it.

My hope is that the folks at 6A realize that patching MT3.X is only a partial solution, and does nothing for those bloggers still running MT2.6XX. They may want to leave those people behind, but I don’t recommend that because it’s not smart business… the surest way to convince people to upgrade, and the best public relations act they could perform, would be to embrace those people still using the earlier versions – some of us do have legitimate reasons for still running the old software… in my case, it’s temporary, but the spambot problem is something I have to deal with today, especially if I want to keep my webhosts from shutting MT down on the server (and that’s my goal). We don’t want webhosts to start refusing MT installations, even those of the latest and greatest version, so we have to figure out solutions that will work for all MT users no matter what version they are using now or in the future.

My point, I guess, is that any fix that is application-wide will, likewise, be only a temporary fix until the spambots learn the new routine. What is needed on the application level is a fix that will somehow be unique to each blog, making the spambot people’s job a lot harder. For example, why does the comment script have to be exactly the same on every MT installation of a given version? Why does the base installation have to be the main blog (it isn’t on my setup)? And why not make the base installation of MT immune from googlebot via robots.txt file (installed automagically, when MT is installed, at the base level of the MT installation) so that the “unique” comment script name in that installation is not particularly “advertised” to one and all?

Fixes on the server level, however, that cut across all application platforms are, imnsho, a better solution in the wider viewpoint, because they are not limited to specific applications or versions of those applications. They help us all, no matter what apps we are using. A good conversation on this is also ongoing over in the TextDrive forums. :)

Posted by: Aine on December 17, 2004 4:55 PM

Partial Solutions

The plugin solution only solves a small part of the problem… for MT3.X users. There is still a whole slew of MT2.X users out there, and these plugins are not backward compatible… but I think 6A has now learned that the problem is not version-specific.

My original comment-throttle patch to the source code was for MT 2.6.x. I pleaded with the SixApart people to release a security update to 2.6.x, with my patch (or something similar) rolled in. They said it would all be fixed in 3.0, and not to worry.

Well guess what? It wasn’t fixed in 3.0 (though 3.1 did provide the hooks that enabled Phil to roll the patch into a plugin). There’s a certain irony that the same issue is coming back to haunt them now…

For example, why does the comment script have to be exactly the same on every MT installation of a given version?

It doesn’t. The installation script could easily give it a random name, and configure mt.cfg accordingly.
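A rough sketch of what such an installer step could do is below. The helper names are invented, and the sketch assumes mt.cfg takes a `CommentScript` directive naming the script. The installer would also have to rename the script file itself, which is not shown.

```python
import secrets


def random_script_name(prefix="c", nbytes=8, suffix=".cgi"):
    """Pick an unguessable name for the comment script (hypothetical helper)."""
    return f"{prefix}-{secrets.token_hex(nbytes)}{suffix}"


def config_line(script_name):
    """Line to write into mt.cfg, assuming a CommentScript directive."""
    return f"CommentScript {script_name}"


if __name__ == "__main__":
    name = random_script_name()
    print(config_line(name))
```

The name is different for every installation, so a spambot can no longer assume the default script name and has to discover each blog's script individually.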

And why not make the base installation of MT immune from googlebot via robots.txt file (installed automagically, when MT is installed, at the base level of the MT installation) so that the “unique” comment script name in that installation is not particularly “advertised” to one and all?

The default templates should not have a comment-entry form on the individual archive pages, and the comment popup template should have a

<meta name="robots" content="noindex,nofollow" />

That’s really all that’s required. You don’t need a robots.txt file (nor would one do any good if — as here — your blog is not located at the (virtual) server root level).

Fixes on the server level, however, that cut across all application platforms are, imnsho, a better solution in the wider viewpoint…

Yeah, though that requires an enlightened webhost. Perhaps anti-comment-spam defences could be a product-differentiator for enlightened webhosting services.

Posted by: Jacques Distler on December 17, 2004 5:29 PM

Re: The Pace of Innovation

I’ve filled in my comments here and added a few things in a new post at my blog. I’ve also posted similar over at the TextDrive forums. And Anil Dash has taken an interest in what’s going on over at those forums and even put his two cents in there a few days ago.

I think in any new MT installation, I’m going to robots.txt-protect my base installation and fix all templates, etc. in that and any new blogs created from it. I’ll use the base installation’s blog to document any changes (though not specifics of things like naming) that I make in my blogs (like a developer’s notebook of sorts). I always wondered why MT didn’t come with a little notepad in the controlpanel for such purposes, but the same thing can be had with a base installation’s blog to help you keep track of which plugins you’ve installed (and why) and what design changes you’ve made to templates or stylesheets, etc.

Posted by: Aine on December 17, 2004 6:11 PM

Re: The Pace of Innovation

I’m glad you are giving the trackback issue some thought. This is probably the more serious problem for me right now.

Between MT-Blacklist, closing out comments on older threads, and renaming the comments CGI to a non-standard name, my comment spam problem has been pretty manageable.

I’ve been thinking of hacking MT to make the close comments setting apply to trackbacks as well. That way when I do get hit, the spammer is only going to get the most recent articles. Now, they sometimes hit hundreds of trackbacks in a shot.
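The age cutoff described above amounts to a single date comparison. A minimal sketch, with a made-up function name and an arbitrary 14-day default (not an MT setting):

```python
from datetime import datetime, timedelta


def accepts_trackbacks(entry_posted, now, max_age_days=14):
    """True while an entry is still young enough to receive trackbacks."""
    return now - entry_posted <= timedelta(days=max_age_days)


if __name__ == "__main__":
    posted = datetime(2004, 12, 17, 15, 40)
    print(accepts_trackbacks(posted, datetime(2004, 12, 20, 15, 3)))
```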

And then, other times, I think I should just abandon MT for some other platform.

Posted by: chip on December 20, 2004 3:03 PM

Trackbacks

If the other platform supports Trackback, then it won’t make a whit of difference. The Trackback protocol is platform-independent.

Throttling them is a start. But there’s not much else you can do without redesigning the protocol.

If you were redesigning the protocol, you might think about some kind of challenge-response system, in which the initial trackback ping would cause your server to send a challenge to the purported originator of the ping. For the trackback to be accepted, an appropriate response would be required.

The point is to somehow assure that the website referenced in the trackback actually sent it. Unfortunately, what was a relatively simple protocol starts looking much more complicated.
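To see how much machinery even a bare-bones version needs, here is a toy, in-memory sketch of such a challenge-response exchange. Nothing in it is part of the actual Trackback protocol. The class names, the token field and the dictionary standing in for the network are all invented for illustration.

```python
import secrets


class Originator:
    """A weblog that sends pings and can later vouch for them."""

    def __init__(self, url):
        self.url = url
        self._pending = set()  # (target entry, token) pairs not yet confirmed

    def new_ping(self, target_entry):
        """Create a ping and remember its one-time token."""
        token = secrets.token_hex(16)
        self._pending.add((target_entry, token))
        return {"url": self.url, "entry": target_entry, "token": token}

    def answer_challenge(self, target_entry, token):
        """Confirm, once only, that this site really issued the ping."""
        key = (target_entry, token)
        if key in self._pending:
            self._pending.remove(key)
            return True
        return False


class Receiver:
    """A weblog that accepts a ping only after challenging its claimed source."""

    def __init__(self, reachable_sites):
        # Stand-in for the network: maps a URL to the site that answers there.
        self._sites = reachable_sites
        self.accepted = []

    def receive_ping(self, ping):
        site = self._sites.get(ping["url"])
        if site is None or not site.answer_challenge(ping["entry"], ping["token"]):
            return False
        self.accepted.append(ping)
        return True
```

A ping forged in someone else's name fails, because that site never issued the token. A spammer who pings on behalf of a site he controls, however, can answer the challenge just fine.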

And I’m not sure you’ve really solved the problem…

Posted by: Jacques Distler on December 20, 2004 3:30 PM
